Magic File Format

Magic files define rules for identifying file types through byte-level patterns. This chapter documents the magic file format supported by libmagic-rs.

Overview

Magic files contain rules that describe file formats by specifying byte patterns at specific offsets. Each rule consists of:

  1. Offset - Where to look in the file
  2. Type - How to interpret the bytes
  3. Value - What to match against
  4. Message - Description to display on match

Basic Format

offset  type  value  message

Example:

0       string  PK    ZIP archive data

This rule matches files starting with “PK” and labels them as “ZIP archive data”.

Basic Syntax

Rule Structure

[level>]offset    type    [operator]value    message
| Component | Required | Description |
|---|---|---|
| level> | No | Indentation level for nested rules |
| offset | Yes | Where to read data |
| type | Yes | Data type to read |
| operator | No | Comparison operator (default: =) |
| value | Yes | Expected value |
| message | Yes | Description text |
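
The rule structure above can be sketched as a small line parser. This is an illustrative sketch only, not the libmagic-rs parser: the `Rule` class and `parse_rule` function are hypothetical names, the offset is kept as an unparsed string, and a string value that itself begins with an operator character would be misread.

```python
from dataclasses import dataclass

# Two-character operators are listed first so ">=" is not read as ">".
OPERATORS = ("!=", ">=", "<=", "=", "!", ">", "<", "&", "^")


@dataclass
class Rule:
    level: int      # number of leading '>' characters
    offset: str     # raw offset text, e.g. "0x3c" or "(0x3c.l)"
    type: str
    operator: str
    value: str
    message: str


def parse_rule(line):
    """Split one rule line into its fields; return None for comments and blanks."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    level = len(stripped) - len(stripped.lstrip(">"))
    fields = stripped[level:].split(None, 3)  # the message may contain spaces
    if len(fields) < 3:
        raise ValueError(f"expected offset, type and value: {line!r}")
    offset, type_name, raw_value = fields[:3]
    message = fields[3] if len(fields) > 3 else ""
    operator, value = "=", raw_value          # '=' is the default operator
    if raw_value != "x":                      # 'x' matches any value
        for candidate in OPERATORS:
            if raw_value.startswith(candidate) and len(raw_value) > len(candidate):
                operator, value = candidate, raw_value[len(candidate):]
                break
    return Rule(level, offset, type_name, operator, value, message)
```

For example, `parse_rule(">>5     byte    1             LSB")` yields a level-2 rule with offset `5`, type `byte`, operator `=`, value `1`, and message `LSB`.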

Comments

Lines starting with # are comments:

# This is a comment
0  string  PK  ZIP archive

Whitespace

  • Fields are separated by whitespace (spaces or tabs)
  • Nesting level is indicated by the number of leading > characters (see Nested Rules)
  • Trailing whitespace is ignored

Offset Specifications

Absolute Offset

Direct byte position from file start:

0       string  \x7fELF   ELF executable
16      short   2         (executable)

Hexadecimal Offset

Use 0x prefix for hex offsets:

0x0     string  MZ        DOS executable
0x3c    long    >0        (PE offset present)

Negative Offset (From End)

Read from end of file:

-4      string  .ZIP      ZIP file (end marker)
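
As a rough sketch of how absolute, hexadecimal, and negative offsets could map to a byte position, the hypothetical helper below resolves an offset string against a file length. Returning `None` for an out-of-range position is an assumption of this sketch, not documented libmagic-rs behavior.

```python
def resolve_offset(spec, file_len):
    """Turn "16", "0x3c" or "-4" into a byte position, or None if out of range."""
    text = spec.lower()
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    value = int(digits, 16) if digits.startswith("0x") else int(digits, 10)
    # Negative offsets count back from the end of the file.
    position = file_len - value if negative else value
    if 0 <= position < file_len:
        return position
    return None
```

With a 100-byte file, `resolve_offset("0x3c", 100)` gives 60 and `resolve_offset("-4", 100)` gives 96.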

Indirect Offset

Read pointer value and use as offset:

# Read 4-byte pointer at offset 60, then check that location
(0x3c.l)   string  PE\0\0  PE executable

Indirect offset syntax:

  • (base.type) - Read pointer at base, interpret as type
  • (base.type+adj) - Add adjustment to pointer value

Types for indirect offsets:

  • .b - byte (1 byte)
  • .s - short (2 bytes)
  • .l - long (4 bytes)
  • .q - quad (8 bytes)
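
The indirect offset syntax can be illustrated with a short sketch. `resolve_indirect` is a hypothetical helper; it assumes the pointer is stored little-endian and unsigned, which this chapter does not specify, and it returns `None` when the pointer itself lies outside the buffer.

```python
import re
import struct

# Pointer type suffix -> struct code (unsigned; little-endian is assumed below).
POINTER_CODES = {"b": "B", "s": "H", "l": "I", "q": "Q"}
INDIRECT = re.compile(r"^\((0[xX][0-9a-fA-F]+|\d+)\.([bslq])([+-]\d+)?\)$")


def resolve_indirect(buf, spec):
    """Resolve "(base.type)" or "(base.type+adj)" to the final byte offset."""
    match = INDIRECT.match(spec)
    if match is None:
        raise ValueError(f"not an indirect offset: {spec!r}")
    base_text, kind, adjustment = match.groups()
    base = int(base_text, 16) if base_text.lower().startswith("0x") else int(base_text, 10)
    fmt = "<" + POINTER_CODES[kind]
    if base + struct.calcsize(fmt) > len(buf):
        return None  # the pointer itself is out of range
    (pointer,) = struct.unpack_from(fmt, buf, base)
    return pointer + int(adjustment or 0)


# A synthetic MZ/PE-style buffer: the 4-byte value at 0x3c points at 0x80.
sample = bytearray(0x90)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
sample[0x80:0x84] = b"PE\0\0"
```

Here `resolve_indirect(sample, "(0x3c.l)")` returns 0x80, which is where the `PE\0\0` signature was placed.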

Relative Offset

Offset relative to previous match:

0       string  PK\x03\x04   ZIP archive
&2      short   >0           (with data)

The & prefix indicates relative offset.

Type Specifications

Integer Types

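| Type | Size | Endianness |
|---|---|---|
| byte | 1 byte | N/A |
| short | 2 bytes | native |
| leshort | 2 bytes | little-endian |
| beshort | 2 bytes | big-endian |
| long | 4 bytes | native |
| lelong | 4 bytes | little-endian |
| belong | 4 bytes | big-endian |
| quad | 8 bytes | native |
| lequad | 8 bytes | little-endian |
| bequad | 8 bytes | big-endian |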

All integer types have unsigned variants prefixed with u:

  • ubyte, ushort, uleshort, ubeshort
  • ulong, ulelong, ubelong
  • uquad, ulequad, ubequad

Examples:

0       byte      0x7f      (byte match)
0       leshort   0x5a4d    DOS MZ signature
0       belong    0xcafebabe Java class file
0       lequad    0x1234567890abcdef  (64-bit little-endian)
8       uquad     >0x8000000000000000 (unsigned 64-bit check)
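
To make the size, endianness, and signedness columns concrete, here is a minimal sketch that reads an integer type with Python's `struct` module. `read_integer` is a hypothetical helper, and treating the unprefixed names as signed is an assumption drawn from the existence of the u-prefixed variants.

```python
import struct

# Base type name -> struct code for the signed variant.
SIZE_CODES = {"byte": "b", "short": "h", "long": "i", "quad": "q"}


def read_integer(buf, offset, type_name):
    """Read one integer of the named type, or None if it does not fit."""
    unsigned = type_name.startswith("u")
    name = type_name[1:] if unsigned else type_name
    if name[:2] in ("le", "be") and name[2:] in SIZE_CODES:
        order = "<" if name[:2] == "le" else ">"
        name = name[2:]
    else:
        order = "="  # native byte order, standard sizes
    code = SIZE_CODES[name]
    fmt = order + (code.upper() if unsigned else code)
    if offset < 0 or offset + struct.calcsize(fmt) > len(buf):
        return None
    return struct.unpack_from(fmt, buf, offset)[0]
```

The bytes `4d 5a` ("MZ") read as `leshort` give 0x5a4d. The bytes `ca fe ba be` read as `ubelong` give 0xcafebabe, while the signed `belong` reading of the same bytes is negative.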

Floating-Point Types

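| Type | Size | Endianness | IEEE 754 |
|---|---|---|---|
| float | 4 bytes | native | 32-bit |
| befloat | 4 bytes | big-endian | 32-bit |
| lefloat | 4 bytes | little-endian | 32-bit |
| double | 8 bytes | native | 64-bit |
| bedouble | 8 bytes | big-endian | 64-bit |
| ledouble | 8 bytes | little-endian | 64-bit |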

Floating-point types follow the IEEE 754 standard. Unlike integer types, they have no separate signed and unsigned variants, because the IEEE 754 encoding carries its own sign bit.

Examples:

0       lefloat   =3.14159   File with float value pi
0       bedouble  >1.0       Double value greater than 1.0

Float comparison behavior:

  • Equality: Uses epsilon-aware comparison (f64::EPSILON tolerance)
  • Ordering: Uses IEEE 754 semantics via partial_cmp
  • NaN: NaN is not equal to itself, and equality or ordering comparisons involving NaN return false
  • Infinity: Positive and negative infinity are properly ordered
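
The comparison rules above can be approximated in a few lines. This is a sketch of the described behavior, not the library's code: `float_equal` is a hypothetical name, and `sys.float_info.epsilon` is used because it has the same value as Rust's `f64::EPSILON`.

```python
import math
import sys


def float_equal(a, b):
    """Epsilon-aware equality: NaN never matches, infinities match themselves."""
    if math.isnan(a) or math.isnan(b):
        return False
    if a == b:  # also covers equal infinities, where a - b would be NaN
        return True
    return abs(a - b) <= sys.float_info.epsilon
```

Under this sketch `float_equal(0.1 + 0.2, 0.3)` is true even though the two values differ in their last bit, and ordering falls back to Python's ordinary IEEE 754 comparisons, so `float("nan") < 1.0` is false.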

Date/Timestamp Types

| Type | Size | Endianness | UTC/Local | Description |
|---|---|---|---|---|
| date | 4 bytes | native | UTC | 32-bit Unix timestamp (signed seconds since epoch), formatted as UTC |
| ldate | 4 bytes | native | Local | 32-bit Unix timestamp, formatted as local time |
| bedate | 4 bytes | big-endian | UTC | 32-bit Unix timestamp, big-endian byte order, UTC |
| beldate | 4 bytes | big-endian | Local | 32-bit Unix timestamp, big-endian byte order, local time |
| ledate | 4 bytes | little-endian | UTC | 32-bit Unix timestamp, little-endian byte order, UTC |
| leldate | 4 bytes | little-endian | Local | 32-bit Unix timestamp, little-endian byte order, local time |
| qdate | 8 bytes | native | UTC | 64-bit Unix timestamp (signed seconds since epoch), formatted as UTC |
| qldate | 8 bytes | native | Local | 64-bit Unix timestamp, formatted as local time |
| beqdate | 8 bytes | big-endian | UTC | 64-bit Unix timestamp, big-endian byte order, UTC |
| beqldate | 8 bytes | big-endian | Local | 64-bit Unix timestamp, big-endian byte order, local time |
| leqdate | 8 bytes | little-endian | UTC | 64-bit Unix timestamp, little-endian byte order, UTC |
| leqldate | 8 bytes | little-endian | Local | 64-bit Unix timestamp, little-endian byte order, local time |

Timestamp values are formatted as strings matching GNU file output format: “Www Mmm DD HH:MM:SS YYYY”

Examples:

# Match a timestamp field equal to the Unix epoch
0       date      =0        File created at epoch

# Check timestamp in file header (big-endian)
8       bedate    >946684800 File created after 2000-01-01

# 64-bit timestamp (little-endian, local time)
16      leqldate  x         \b, timestamp %s
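
For illustration, the sketch below reads a big-endian 32-bit timestamp and renders it in the "Www Mmm DD HH:MM:SS YYYY" layout. The helper names are hypothetical, and the zero-padded day is an assumption based on the "DD" in that pattern; the library's exact padding may differ.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_utc(seconds):
    """Format signed seconds since the epoch as a UTC date string."""
    return (EPOCH + timedelta(seconds=seconds)).strftime("%a %b %d %H:%M:%S %Y")


def read_bedate(buf, offset):
    """Read a bedate field: 4 bytes, big-endian, signed."""
    (seconds,) = struct.unpack_from(">i", buf, offset)
    return format_utc(seconds)
```

The value 946684800 used in the `bedate` example corresponds to midnight UTC on 2000-01-01.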

String Types

Match literal string data:

0       string    %PDF      PDF document
0       string    GIF89a    GIF image data

String escape sequences:

  • \x00 - hex byte
  • \n - newline
  • \t - tab
  • \\ - backslash
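
A minimal sketch of decoding these escapes into raw bytes follows. `unescape` is a hypothetical helper; besides the sequences listed above it also accepts `\r` and `\0`, which appear in examples elsewhere in this chapter, and it does not attempt the full escape grammar.

```python
SIMPLE_ESCAPES = {"n": b"\n", "t": b"\t", "r": b"\r", "0": b"\0", "\\": b"\\"}


def unescape(value):
    """Convert a rule's string value into the bytes it should match."""
    out = bytearray()
    i = 0
    while i < len(value):
        ch = value[i]
        if ch != "\\":
            out += ch.encode("latin-1")
            i += 1
        elif value[i + 1:i + 2] == "x":
            out.append(int(value[i + 2:i + 4], 16))  # \xHH: two hex digits
            i += 4
        else:
            escape = value[i + 1:i + 2]
            if escape not in SIMPLE_ESCAPES:
                raise ValueError(f"unsupported escape in {value!r}")
            out += SIMPLE_ESCAPES[escape]
            i += 2
    return bytes(out)
```

For instance, `unescape(r"PK\x03\x04")` produces the four bytes `50 4b 03 04`.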

Pascal String Type

Pascal string (pstring) is a length-prefixed string type. The length prefix can be 1, 2, or 4 bytes depending on the suffix flag. Unlike C strings, Pascal strings are not null-terminated.

Length Prefix Width

The default pstring type uses a 1-byte length prefix (0-255 range). Use suffix flags to specify different prefix widths:

| Suffix | Width | Endianness | Range |
|---|---|---|---|
| /B | 1 byte | N/A | 0-255 (default) |
| /H | 2 bytes | big-endian | 0-65535 |
| /h | 2 bytes | little-endian | 0-65535 |
| /L | 4 bytes | big-endian | 0-4294967295 |
| /l | 4 bytes | little-endian | 0-4294967295 |

Self-Inclusive Length (/J Flag)

The /J flag indicates that the stored length value includes the size of the length prefix itself (JPEG-style). This flag can be combined with any width variant.

Examples

Basic pstring with default 1-byte prefix:

0       pstring   =JPEG     JPEG image (Pascal string)

2-byte big-endian length prefix:

0       pstring/H =JPEG     JPEG image (2-byte BE prefix)

4-byte little-endian length prefix:

0       pstring/l x         \b, name: %s

Self-inclusive length with 2-byte big-endian prefix:

0       pstring/HJ x        \b, JPEG-style length

Self-inclusive length with default 1-byte prefix:

0       pstring/J  x        \b, self-inclusive length

The optional max_length parameter caps the length value:

0       pstring   x         \b, name: %s
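
The prefix widths and the /J flag can be sketched as follows. `read_pstring` is a hypothetical helper that takes the flag letters as a plain string such as "HJ"; returning `None` when the declared length runs past the buffer is an assumption of this sketch, and the max_length parameter is not modeled.

```python
import struct

# Width flag -> struct format for the length prefix.
PREFIX_FORMATS = {"B": "B", "H": ">H", "h": "<H", "L": ">I", "l": "<I"}


def read_pstring(buf, offset, flags=""):
    """Read a length-prefixed string; flags are the letters after 'pstring/'."""
    width_flag = next((f for f in flags if f in PREFIX_FORMATS), "B")
    fmt = PREFIX_FORMATS[width_flag]
    prefix_size = struct.calcsize(fmt)
    if offset + prefix_size > len(buf):
        return None
    (length,) = struct.unpack_from(fmt, buf, offset)
    if "J" in flags:
        length -= prefix_size  # stored length counts the prefix itself
    start = offset + prefix_size
    if length < 0 or start + length > len(buf):
        return None
    return buf[start:start + length]
```

With /HJ, the bytes `00 06 4a 50 45 47` decode to `JPEG`: the stored length 6 covers the 2-byte prefix plus the 4 payload bytes.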

String Flags (Not Yet Implemented)

Note: String flags are documented for libmagic compatibility reference but are not yet implemented in libmagic-rs.

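| Flag | Description |
|---|---|
| /c | Case-insensitive match |
| /w | Whitespace-insensitive (blanks in the pattern are optional) |
| /b | Force the test to be treated as a binary-file test |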

Example:

0       string/c  <!doctype  HTML document

Operators

Comparison Operators

| Operator | Description | Example |
|---|---|---|
| = | Equal (default) | 0 long =0xcafebabe |
| != | Not equal | 4 byte !=0 |
| > | Greater than | 8 long >1000 |
| < | Less than | 8 long <100 |
| >= | Greater than or equal | 8 long >=1000 |
| <= | Less than or equal | 8 long <=100 |
| & | Bitwise AND | 4 byte &0x80 |
| ^ | Bitwise XOR (not yet implemented) | 4 byte ^0xff |

Bitwise AND with Mask

Test specific bits:

# Check if bit 7 is set
4       byte    &0x80     (compressed)

# Check if lower nibble is 0x0f
4       byte    &0x0f=0x0f (all bits set)

Negation

Prefix operator with ! for negation:

# Match if NOT equal to zero
4       long    !0        (non-zero)
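
The operators above can be modeled as a small dispatch table. This is a sketch under stated assumptions: `evaluate` is a hypothetical helper, the `&` test follows the libmagic convention that every bit set in the mask must be set in the value, and the combined `&mask=value` form and the unimplemented `^` operator are left out.

```python
import operator

COMPARISONS = {
    "=": operator.eq,
    "!=": operator.ne,
    "!": operator.ne,   # prefix negation shorthand
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "&": lambda actual, mask: (actual & mask) == mask,
}


def evaluate(op, actual, expected):
    """Apply one comparison operator to the value read from the file."""
    return COMPARISONS[op](actual, expected)
```

For example, `evaluate("&", 0x81, 0x80)` is true because bit 7 is set, while `evaluate("&", 0x01, 0x80)` is false.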

Values

Numeric Values

# Decimal
0       long    1234

# Hexadecimal
0       long    0x4d5a

# Octal
0       byte    0177

String Values

# Plain string
0       string  RIFF

# With escape sequences
0       string  PK\x03\x04

# UTF-16 little-endian byte order mark (as raw bytes)
0       string  \xff\xfe

Special Values

| Value | Description |
|---|---|
| x | Match any value (always true) |

Example:

0       string  PK        ZIP archive
>4      short   x         version %d

The x value matches anything and %d formats the matched value.

Nested Rules

Rules can be nested to create hierarchical matches. Deeper matches indicate more specific identification.

Indentation Levels

Use > prefix for nested rules:

0       string  \x7fELF   ELF
>4      byte    1         32-bit
>4      byte    2         64-bit
>5      byte    1         LSB
>5      byte    2         MSB

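Evaluation:

  1. Check offset 0 for the ELF magic
  2. If it matches, check offset 4 for the word size (32-bit or 64-bit)
  3. Also check offset 5 for endianness; both level-1 tests depend only on the ELF match at level 0, not on each other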

Multiple Nesting Levels

0       string  \x7fELF       ELF
>4      byte    2             64-bit
>>5     byte    1             LSB
>>>16   short   2             (executable)
>>>16   short   3             (shared object)

Continuation Messages

Start a message with \b (backspace) to suppress the space that would otherwise separate it from the preceding text:

0       string  GIF8      GIF image data
>4      string  7a        \b, version 87a
>4      string  9a        \b, version 89a

Output for a GIF89a file: GIF image data, version 89a
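
Nested evaluation and \b continuation can be sketched together in a toy evaluator. This is a simplified model rather than the libmagic-rs engine: `describe` is a hypothetical function, rules are reduced to `(level, offset, expected_bytes, message)` tuples, and only exact byte matches are supported.

```python
def describe(buf, rules):
    """Evaluate (level, offset, expected_bytes, message) rules against buf."""
    matched = {}   # level -> did the most recent rule at that level match?
    parts = []
    for level, offset, expected, message in rules:
        if level == 0 and parts:
            break  # a top-level rule already matched: first match wins
        parent_ok = level == 0 or matched.get(level - 1, False)
        ok = parent_ok and buf[offset:offset + len(expected)] == expected
        # Forget results from deeper levels; they belonged to an earlier branch.
        matched = {k: v for k, v in matched.items() if k < level}
        matched[level] = ok
        if ok:
            parts.append(message)
    text = ""
    for message in parts:
        if message.startswith("\\b"):
            text += message[2:]                      # \b: join without a space
        else:
            text += (" " if text else "") + message
    return text


ELF_RULES = [
    (0, 0, b"\x7fELF", "ELF"),
    (1, 4, b"\x02", "64-bit"),
    (2, 5, b"\x01", "LSB"),
]
GIF_RULES = [
    (0, 0, b"GIF8", "GIF image data"),
    (1, 4, b"7a", "\\b, version 87a"),
    (1, 4, b"9a", "\\b, version 89a"),
]
```

Calling `describe(b"GIF89a", GIF_RULES)` returns `GIF image data, version 89a`. A 32-bit ELF header run against `ELF_RULES` yields only `ELF`, because the level-2 rule is skipped once its level-1 parent fails.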

Examples

ELF Executable

# ELF (Executable and Linkable Format)
0       string  \x7fELF       ELF
>4      byte    1             32-bit
>4      byte    2             64-bit
>5      byte    1             LSB
>5      byte    2             MSB
>16     leshort 2             (executable)
>16     leshort 3             (shared object)

ZIP Archive

# ZIP archive
0       string  PK\x03\x04    ZIP archive data
>4      leshort x             \b, version needed to extract %d (in tenths)
>6      leshort &0x0001       \b, encrypted
>6      leshort &0x0008       \b, with data descriptor

JPEG Image

# JPEG
0       string  \xff\xd8\xff  JPEG image data
>3      byte    0xe0          \b, JFIF standard
>3      byte    0xe1          \b, Exif format

PDF Document

# PDF
0       string  %PDF-         PDF document
>5      string  1.            \b, version 1.x
>5      string  2.            \b, version 2.x

PE Executable

# DOS MZ executable with PE header
0       string  MZ            DOS executable
>0x3c   lelong  >0            (PE offset)
>(0x3c.l) string PE\0\0       PE executable

GZIP Compressed

# GZIP
0       string  \x1f\x8b      gzip compressed data
>2      byte    8             \b, deflated
>3      byte    &0x01         \b, ASCII text
>3      byte    &0x02         \b, with header CRC
>3      byte    &0x04         \b, with extra field
>3      byte    &0x08         \b, with original name
>3      byte    &0x10         \b, with comment

PNG Image

# PNG
0       string  \x89PNG\r\n\x1a\n   PNG image data
>16     belong  x                   \b, %d x
>20     belong  x                   %d
>25     byte    0                   \b, grayscale
>25     byte    2                   \b, RGB
>25     byte    3                   \b, palette
>25     byte    4                   \b, grayscale+alpha
>25     byte    6                   \b, RGBA
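
As a cross-check of the PNG field positions, the sketch below decodes the same header fields directly. In the PNG IHDR chunk the width sits at offset 16, the height at 20, the bit depth at 24, and the color type at 25. `describe_png` and `sample` are hypothetical names, and the sample holds only the signature and the start of IHDR, not a complete image.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette", 4: "grayscale+alpha", 6: "RGBA"}


def describe_png(buf):
    """Describe a PNG header, or return None if the signature is absent."""
    if len(buf) < 26 or not buf.startswith(PNG_SIGNATURE):
        return None
    width, height = struct.unpack_from(">II", buf, 16)  # two big-endian 32-bit fields
    color = COLOR_TYPES.get(buf[25], "unknown color type")
    return f"PNG image data, {width} x {height}, {color}"


# Signature, then chunk length 13, "IHDR", width 640, height 480,
# bit depth 8, color type 6.
sample = PNG_SIGNATURE + struct.pack(">I4sIIBB", 13, b"IHDR", 640, 480, 8, 6)
```

Running `describe_png(sample)` returns `PNG image data, 640 x 480, RGBA`.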

Floating-Point Values

# Check for specific float value
0       lefloat   =3.14159   File with float value pi

# Float comparison
0       float     >1.0       Float value greater than 1.0

# Double precision
0       bedouble  =0.45455   Double value 0.45455

Best Practices

1. Order Rules by Specificity

Put more specific rules first:

# Good: Specific before general
0       string  PK\x03\x04   ZIP archive
0       string  PK           (generic PK signature)

# Bad: The general rule matches first, so the ZIP rule is never reached
0       string  PK           (generic PK signature)
0       string  PK\x03\x04   ZIP archive

2. Use Nested Rules for Details

# Good: Hierarchical structure
0       string  \x7fELF   ELF
>4      byte    2         64-bit
>>5     byte    1         LSB

# Bad: Flat rules
0       string  \x7fELF           ELF
4       byte    2                 64-bit
5       byte    1                 LSB

3. Document Complex Rules

# JPEG with Exif metadata
# The Exif APP1 marker (0xFFE1) contains camera metadata
0       string  \xff\xd8\xff    JPEG image data
>3      byte    0xe1            \b, Exif format

4. Test Edge Cases

Consider:

  • Empty files
  • Truncated files
  • Minimum valid file size
  • Maximum offset values

5. Use Appropriate Types

# Good: Match exact size needed
0       leshort 0x5a4d   DOS executable

# Bad: Over-reading
0       lelong  x        (reads 4 bytes when 2 needed)

6. Handle Endianness Explicitly

# Good: Explicit endianness
0       lelong  0xcafebabe   (little-endian)
0       belong  0xcafebabe   (big-endian)

# Risky: Native endianness
0       long    0xcafebabe   (platform-dependent)

Supported Features

Currently Supported

  • Absolute offsets
  • Relative offsets
  • Indirect offsets (basic)
  • Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
  • Float and double types (32-bit and 64-bit IEEE 754 floating-point)
  • Date and qdate types (32-bit and 64-bit Unix timestamps)
  • String and pstring types (null-terminated and length-prefixed strings)
  • Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
  • Bitwise AND operator
  • Nested rules
  • Comments

Not Yet Supported

  • Regex patterns
  • 128-bit integer types
  • Use/name directives
  • Default rules
  • Bitwise XOR operator (^)
  • String flags (/c, /w, /b)

Recently Added

  • Strength modifiers: The !:strength directive for adjusting rule priority
  • 64-bit integers: quad type family (quad, uquad, lequad, ulequad, bequad, ubequad)
  • Floating-point types: float and double type families (float, befloat, lefloat, double, bedouble, ledouble) with IEEE 754 semantics and epsilon-aware equality

Troubleshooting

Rule Not Matching

  1. Check offset is correct (0-indexed)
  2. Verify endianness matches file format
  3. Inspect the file's leading bytes with hexdump -C file | head
  4. Ensure no conflicting rules
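
Where hexdump is not available, a few lines of Python can show the same leading bytes. The layout below only loosely imitates `hexdump -C`, and `hex_lines` is a hypothetical helper.

```python
def hex_lines(data, width=16):
    """Yield 'offset  hex bytes  |printable text|' lines for the given bytes."""
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        text = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        # Pad the hex column so the text column lines up on short final rows.
        yield f"{start:08x}  {hex_part:<{width * 3 - 1}}  |{text}|"
```

To inspect a file, read its first bytes in binary mode and print each line, for example `for line in hex_lines(open(path, "rb").read(64)): print(line)`, where `path` is the file under test.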

Unexpected Results

  1. Check rule order (first match wins)
  2. Verify nested rule levels
  3. Test with simpler rules first

Performance Issues

  1. Avoid unnecessary string searches
  2. Use specific offsets over searches
  3. Order rules by likelihood of match
