Magic File Format

Magic files define rules for identifying file types through byte-level patterns.

Overview

Magic files contain rules that describe file formats by specifying byte patterns at specific offsets. Each rule consists of:

Offset - Where to look in the file
Type - How to interpret the bytes
Value - What to match against
Message - Description to display on match

Basic Format

offset  type  value  message

Example:

0       string  PK    ZIP archive data

This rule matches files starting with “PK” and labels them as “ZIP archive data”.

Basic Syntax

Rule Structure

[level>]offset    type    [operator]value    message

Component	Required	Description
`level>`	No	Nesting level for nested rules: one `>` per level (for example, `>>` for level 2)
`offset`	Yes	Where to read data
`type`	Yes	Data type to read
`operator`	No	Comparison operator (default: `=`)
`value`	Yes	Expected value
`message`	Yes	Description text

Comments

Lines starting with # are comments:

# This is a comment
0  string  PK  ZIP archive

Whitespace

Fields are separated by whitespace (spaces or tabs).
Nesting level is indicated by leading > characters (see Nested Rules).
Trailing whitespace is ignored.

Offset Specifications

Absolute Offset

Direct byte position from file start:

0       string  \x7fELF   ELF executable
16      short   2         (executable)

Hexadecimal Offset

Use 0x prefix for hex offsets:

0x0     string  MZ        DOS executable
0x3c    long    >0        (PE offset present)

Negative Offset (From End)

Read from end of file:

-4      string  .ZIP      ZIP file (end marker)

Indirect Offset

Read pointer value and use as offset:

# Read 4-byte pointer at offset 60, then check that location
(0x3c.l)   string  PE\0\0  PE executable

Indirect offset syntax:

(base.type) - Read pointer at base, interpret as type
(base.type+adj) - Add adjustment to pointer value

Types for indirect offsets:

.b - byte (1 byte)
.s - short (2 bytes)
.l - long (4 bytes)
.q - quad (8 bytes)
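
The indirect lookup described above can be sketched in a few lines of Rust. This is an illustration only, not the libmagic-rs implementation: `PtrType` and `resolve_indirect` are hypothetical names, and the sketch assumes the pointer is stored little-endian, which this page does not state.

```rust
/// Pointer width for an indirect offset, mirroring the .b/.s/.l/.q suffixes.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Clone, Copy)]
pub enum PtrType {
    Byte,
    Short,
    Long,
    Quad,
}

/// Resolve `(base.type+adj)`: read a pointer at `base`, add `adj`, and
/// return the final offset if it falls inside `buf`.
/// Assumption: the pointer is little-endian.
pub fn resolve_indirect(buf: &[u8], base: usize, ty: PtrType, adj: i64) -> Option<usize> {
    let width = match ty {
        PtrType::Byte => 1,
        PtrType::Short => 2,
        PtrType::Long => 4,
        PtrType::Quad => 8,
    };
    let end = base.checked_add(width)?;
    let bytes = buf.get(base..end)?;

    // Zero-extend into 8 bytes so a single conversion covers every width.
    let mut raw = [0u8; 8];
    raw[..width].copy_from_slice(bytes);
    let pointer = u64::from_le_bytes(raw);

    if pointer > i64::MAX as u64 {
        return None;
    }
    let target = (pointer as i64).checked_add(adj)?;
    if target < 0 {
        return None;
    }
    let target = target as usize;
    if target < buf.len() {
        Some(target)
    } else {
        None
    }
}
```

For the PE example, `resolve_indirect(buf, 0x3c, PtrType::Long, 0)` yields the position at which the `PE\0\0` string test would then run, or `None` when the pointer leads outside the buffer.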

Relative Offset

Offset relative to previous match:

0       string  PK\x03\x04   ZIP archive
&2      short   >0           (with data)

The & prefix indicates relative offset.

Type Specifications

Integer Types

Type	Size	Endianness
`byte`	1 byte	N/A
`short`	2 bytes	native
`leshort`	2 bytes	little-endian
`beshort`	2 bytes	big-endian
`long`	4 bytes	native
`lelong`	4 bytes	little-endian
`belong`	4 bytes	big-endian
`quad`	8 bytes	native
`lequad`	8 bytes	little-endian
`bequad`	8 bytes	big-endian

All integer types have unsigned variants prefixed with u:

ubyte, ushort, uleshort, ubeshort
ulong, ulelong, ubelong
uquad, ulequad, ubequad

Examples:

0       byte      0x7f      (byte match)
0       leshort   0x5a4d    DOS MZ signature
0       belong    0xcafebabe Java class file
0       lequad    0x1234567890abcdef  (64-bit little-endian)
8       uquad     >0x8000000000000000 (unsigned 64-bit check)
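
To make the endianness column concrete, here is a minimal Rust sketch of how fixed-width reads could be decoded with the standard library. The function names simply echo the magic type names and the `take` helper is hypothetical; none of this is taken from the libmagic-rs source.

```rust
/// Copy `N` bytes starting at `offset`, or return None if the buffer is too short.
fn take<const N: usize>(buf: &[u8], offset: usize) -> Option<[u8; N]> {
    let end = offset.checked_add(N)?;
    let slice = buf.get(offset..end)?;
    let mut out = [0u8; N];
    out.copy_from_slice(slice);
    Some(out)
}

/// `leshort`-style read: 2 bytes, least significant byte first.
pub fn read_leshort(buf: &[u8], offset: usize) -> Option<u16> {
    take::<2>(buf, offset).map(u16::from_le_bytes)
}

/// `beshort`-style read: 2 bytes, most significant byte first.
pub fn read_beshort(buf: &[u8], offset: usize) -> Option<u16> {
    take::<2>(buf, offset).map(u16::from_be_bytes)
}

/// `belong`-style read: 4 bytes, most significant byte first.
pub fn read_belong(buf: &[u8], offset: usize) -> Option<u32> {
    take::<4>(buf, offset).map(u32::from_be_bytes)
}

/// Signed `byte` read: 0xff is interpreted as -1.
pub fn read_byte(buf: &[u8], offset: usize) -> Option<i8> {
    buf.get(offset).map(|&b| b as i8)
}

/// Unsigned `ubyte` read: 0xff is interpreted as 255.
pub fn read_ubyte(buf: &[u8], offset: usize) -> Option<u8> {
    buf.get(offset).copied()
}
```

The bytes `M` `Z` (0x4d 0x5a) decode to 0x5a4d as a little-endian short and to 0x4d5a as a big-endian short, which is why the DOS signature example uses `leshort 0x5a4d`.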

Floating-Point Types

Type	Size	Endianness	IEEE 754
`float`	4 bytes	native	32-bit
`befloat`	4 bytes	big-endian	32-bit
`lefloat`	4 bytes	little-endian	32-bit
`double`	8 bytes	native	64-bit
`bedouble`	8 bytes	big-endian	64-bit
`ledouble`	8 bytes	little-endian	64-bit

Floating-point types follow IEEE 754 standard. Unlike integer types, float types do not have signed or unsigned variants (the IEEE 754 format handles sign internally).

Examples:

0       lefloat   =3.14159   File with float value pi
0       bedouble  >1.0       Double value greater than 1.0

Float comparison behavior:

Equality: Uses epsilon-aware comparison (f64::EPSILON tolerance)
Ordering: Uses IEEE 754 semantics via partial_cmp
NaN: NaN != NaN, comparisons with NaN always return false
Infinity: Positive and negative infinity are properly ordered
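
The comparison rules above can be sketched as follows. This is a hedged illustration rather than the library's code: `float_eq` and `float_gt` are hypothetical names, and the sketch assumes the tolerance is an absolute difference of at most `f64::EPSILON`.

```rust
use std::cmp::Ordering;

/// Epsilon-aware equality.
/// Assumption: absolute tolerance of f64::EPSILON.
pub fn float_eq(a: f64, b: f64) -> bool {
    // NaN never equals anything, including itself.
    if a.is_nan() || b.is_nan() {
        return false;
    }
    // Infinities are equal only to an infinity of the same sign.
    if a.is_infinite() || b.is_infinite() {
        return a == b;
    }
    (a - b).abs() <= f64::EPSILON
}

/// Ordering via partial_cmp: any comparison involving NaN yields None,
/// so the test is false.
pub fn float_gt(a: f64, b: f64) -> bool {
    a.partial_cmp(&b) == Some(Ordering::Greater)
}
```

With these definitions `float_eq(0.1 + 0.2, 0.3)` holds even though the two values differ in their last bit, while `float_gt(f64::NAN, 1.0)` and `float_eq(f64::NAN, f64::NAN)` are both false.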

Date/Timestamp Types

Type	Size	Endianness	UTC/Local	Description
`date`	4 bytes	native	UTC	32-bit Unix timestamp (signed seconds since epoch), formatted as UTC
`ldate`	4 bytes	native	Local	32-bit Unix timestamp, formatted as local time
`bedate`	4 bytes	big-endian	UTC	32-bit Unix timestamp, big-endian byte order, UTC
`beldate`	4 bytes	big-endian	Local	32-bit Unix timestamp, big-endian byte order, local time
`ledate`	4 bytes	little-endian	UTC	32-bit Unix timestamp, little-endian byte order, UTC
`leldate`	4 bytes	little-endian	Local	32-bit Unix timestamp, little-endian byte order, local time
`qdate`	8 bytes	native	UTC	64-bit Unix timestamp (signed seconds since epoch), formatted as UTC
`qldate`	8 bytes	native	Local	64-bit Unix timestamp, formatted as local time
`beqdate`	8 bytes	big-endian	UTC	64-bit Unix timestamp, big-endian byte order, UTC
`beqldate`	8 bytes	big-endian	Local	64-bit Unix timestamp, big-endian byte order, local time
`leqdate`	8 bytes	little-endian	UTC	64-bit Unix timestamp, little-endian byte order, UTC
`leqldate`	8 bytes	little-endian	Local	64-bit Unix timestamp, little-endian byte order, local time

Timestamp values are formatted as strings matching GNU file output format: “Www Mmm DD HH:MM:SS YYYY”

Examples:

# Match a timestamp field equal to the Unix epoch
0       date      =0        File created at epoch

# Check timestamp in file header (big-endian)
8       bedate    >946684800 File created after 2000-01-01

# 64-bit timestamp (little-endian, local time)
16      leqldate  x         \b, timestamp %s
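
For readers who want to see how a timestamp becomes the “Www Mmm DD HH:MM:SS YYYY” string, here is a standard-library-only Rust sketch for the UTC case. `format_utc` is a hypothetical name, the date arithmetic is the well-known days-to-civil-date conversion rather than anything taken from libmagic-rs, and padding a single-digit day with a space (as C's `ctime` does) is an assumption.

```rust
/// Format signed seconds since the Unix epoch as "Www Mmm DD HH:MM:SS YYYY" (UTC).
pub fn format_utc(secs: i64) -> String {
    // Day 0 (1970-01-01) was a Thursday.
    const DAYS: [&str; 7] = ["Thu", "Fri", "Sat", "Sun", "Mon", "Tue", "Wed"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];

    // Euclidean division keeps pre-1970 timestamps correct.
    let days = secs.div_euclid(86_400);
    let rem = secs.rem_euclid(86_400);
    let (hour, minute, second) = (rem / 3600, rem % 3600 / 60, rem % 60);

    // Convert a day count to a civil date, using 400-year eras that start on March 1.
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    let weekday = DAYS[days.rem_euclid(7) as usize];
    format!(
        "{} {} {:2} {:02}:{:02}:{:02} {}",
        weekday,
        MONTHS[(month - 1) as usize],
        day,
        hour,
        minute,
        second,
        year
    )
}
```

The local-time variants (`ldate`, `qldate`, and so on) would additionally need the host's time zone offset, which this sketch leaves out.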

String Types

Match literal string data:

0       string    %PDF      PDF document
0       string    GIF89a    GIF image data

String escape sequences:

\x00 - hex byte
\n - newline
\t - tab
\\ - backslash

Pascal String Type

Pascal string (pstring) is a length-prefixed string type. The length prefix can be 1, 2, or 4 bytes depending on the suffix flag. Unlike C strings, Pascal strings are not null-terminated.

Length Prefix Width

The default pstring type uses a 1-byte length prefix (0-255 range). Use suffix flags to specify different prefix widths:

Suffix	Width	Endianness	Range
`/B`	1 byte	N/A	0-255 (default)
`/H`	2 bytes	big-endian	0-65535
`/h`	2 bytes	little-endian	0-65535
`/L`	4 bytes	big-endian	0-4294967295
`/l`	4 bytes	little-endian	0-4294967295

Self-Inclusive Length (`/J` Flag)

The /J flag indicates that the stored length value includes the size of the length prefix itself (JPEG-style). This flag can be combined with any width variant.

Examples

Basic pstring with default 1-byte prefix:

0       pstring   =JPEG     JPEG image (Pascal string)

2-byte big-endian length prefix:

0       pstring/H =JPEG     JPEG image (2-byte BE prefix)

4-byte little-endian length prefix:

0       pstring/l x         \b, name: %s

Self-inclusive length with 2-byte big-endian prefix:

0       pstring/HJ x        \b, JPEG-style length

Self-inclusive length with default 1-byte prefix:

0       pstring/J  x        \b, self-inclusive length

The optional max_length parameter caps the length value:

0       pstring   x         \b, name: %s
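
The prefix widths and the /J flag can be summarised in a short Rust sketch. `Prefix` and `read_pstring` are hypothetical names used for illustration. The sketch does not model the max_length cap, and returning `None` on truncated input is an assumption about error handling.

```rust
/// Length-prefix layout, mirroring the suffix flags in the table above.
#[derive(Clone, Copy)]
pub enum Prefix {
    Byte,    // /B (default)
    BeShort, // /H
    LeShort, // /h
    BeLong,  // /L
    LeLong,  // /l
}

/// Read a Pascal string at `offset`. With `self_inclusive` (/J) the stored
/// length also counts the prefix bytes, so they are subtracted first.
pub fn read_pstring(
    buf: &[u8],
    offset: usize,
    prefix: Prefix,
    self_inclusive: bool,
) -> Option<&[u8]> {
    let rest = buf.get(offset..)?;
    let (width, stored) = match prefix {
        Prefix::Byte => (1usize, *rest.first()? as usize),
        Prefix::BeShort => {
            let b = rest.get(..2)?;
            (2, u16::from_be_bytes([b[0], b[1]]) as usize)
        }
        Prefix::LeShort => {
            let b = rest.get(..2)?;
            (2, u16::from_le_bytes([b[0], b[1]]) as usize)
        }
        Prefix::BeLong => {
            let b = rest.get(..4)?;
            (4, u32::from_be_bytes([b[0], b[1], b[2], b[3]]) as usize)
        }
        Prefix::LeLong => {
            let b = rest.get(..4)?;
            (4, u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
        }
    };
    // A self-inclusive length smaller than the prefix itself is malformed.
    let len = if self_inclusive {
        stored.checked_sub(width)?
    } else {
        stored
    };
    rest.get(width..width.checked_add(len)?)
}
```

With a 2-byte big-endian prefix, the bytes `00 06 J P E G` read as "JPEG" under /HJ because the stored length 6 includes the two prefix bytes, whereas plain /H expects `00 04 J P E G`.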

String Flags

String flags are now implemented (issue #234, landed in PR #288), providing libmagic-compatible string comparison semantics.

Flag	Description
`/c`	Case-insensitive (lowercase pattern chars trigger fold)
`/C`	Case-insensitive (uppercase pattern chars trigger fold)
`/w`	Whitespace-optional (pattern whitespace matches zero or more)
`/W`	Whitespace-required-compact (at least one, greedy consume)
`/T`	Trim leading/trailing ASCII whitespace from pattern
`/f`	Full-word match (post-match word boundary check)
`/b`	Force binary test (hint for MIME output)
`/t`	Force text test (hint for MIME output)

Note: /c and /C are asymmetric — the pattern character controls fold direction. With /c, only lowercase pattern chars cause the file byte to be folded to lowercase. With /C, only uppercase pattern chars cause the file byte to be folded to uppercase. See GOTCHAS section S6.5 for details on mixed-case behavior. /B (uppercase) is not a string flag; it is reserved for pstring length-width specification and is rejected on string types.
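
The asymmetry of /c and /C is easier to see in code. The following Rust sketch implements the rule exactly as stated in the note. `byte_matches` and `string_matches` are hypothetical names, and the sketch handles ASCII only.

```rust
/// Compare one pattern byte against one file byte.
/// `fold_lower` models /c and `fold_upper` models /C.
pub fn byte_matches(pattern: u8, file: u8, fold_lower: bool, fold_upper: bool) -> bool {
    if fold_lower && pattern.is_ascii_lowercase() {
        // /c: a lowercase pattern char folds the file byte to lowercase.
        file.to_ascii_lowercase() == pattern
    } else if fold_upper && pattern.is_ascii_uppercase() {
        // /C: an uppercase pattern char folds the file byte to uppercase.
        file.to_ascii_uppercase() == pattern
    } else {
        // Everything else is compared exactly.
        file == pattern
    }
}

/// Match `pattern` against the start of `data` using the per-byte rule above.
pub fn string_matches(pattern: &[u8], data: &[u8], fold_lower: bool, fold_upper: bool) -> bool {
    data.len() >= pattern.len()
        && pattern
            .iter()
            .zip(data)
            .all(|(&p, &d)| byte_matches(p, d, fold_lower, fold_upper))
}
```

Under /c the pattern `<!doctype` matches a file starting with `<!DOCTYPE`, but the pattern `<!DOCTYPE` does not match a file starting with `<!doctype`, because uppercase pattern characters are still compared exactly.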

Examples:

# Case-insensitive match
0       string/c  <!doctype  HTML document

# Whitespace-optional (matches "ab", "a b", "a  b")
0       string/w  a\ b       Pattern with flexible whitespace

# Combined flags
0       string/cw <!doctype\ html>  HTML document (case and space insensitive)

# Full-word boundary check
0       string/f  int        C int keyword (not "integer")

# Trim leading/trailing whitespace from the pattern (`/T` = STRING_TRIM)
0       string/T  "  hello  "  Hello marker (matches "hello" without surrounding spaces)

# Binary-mode hint (`/b` = STRING_BINTEST) -- parsed and stored; MIME-output
# wiring deferred to the `!:mime` evaluation work
24      string/b  FTCOMP      FTCOMP compressed archive

# Text-mode hint (`/t` = STRING_TEXTTEST) -- parsed and stored; MIME-output
# wiring deferred to the `!:mime` evaluation work
0       string/t  #!/bin/sh   POSIX shell script text

Note on /T empty patterns: string/T " " trims to an empty pattern. The evaluator treats this as no-match (with a warn! log) rather than letting it silently match every file. Fix the rule.
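
The whitespace-optional behaviour of /w (a pattern with one space matching "ab", "a b", and "a  b") can be sketched as a small matcher. `matches_optional_ws` is a hypothetical name, and treating "whitespace" as ASCII whitespace is an assumption.

```rust
/// /w-style match against the start of `data`: each whitespace byte in the
/// pattern consumes zero or more whitespace bytes from the data; every other
/// pattern byte must match exactly.
pub fn matches_optional_ws(pattern: &[u8], data: &[u8]) -> bool {
    let mut pos = 0;
    for &p in pattern {
        if p.is_ascii_whitespace() {
            // Greedily skip any run of whitespace, including an empty run.
            while pos < data.len() && data[pos].is_ascii_whitespace() {
                pos += 1;
            }
        } else {
            if data.get(pos) != Some(&p) {
                return false;
            }
            pos += 1;
        }
    }
    true
}
```

A /W variant would differ only in rejecting the match when a pattern whitespace byte consumes nothing.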

Search Flags

Search flags are specified as /flags after the range in search types: search/N/<flags>. libmagic-rs implements the full search-type flag semantics (issue #235).

Search flags share most semantics with string flags. Eight flags (/c, /C, /w, /W, /T, /f, /t, /b) carry the same comparison-altering or metadata-hint meanings as their string-type counterparts. The ninth flag, /s, is search-specific: it controls where the previous-match anchor lands for relative-offset children.

Flag	Description
`/s`	Start anchor: sets the previous-match anchor to match-START instead of match-END for relative-offset children
`/c`	Case-insensitive (lowercase): pattern lowercase letters match both cases in buffer
`/C`	Case-insensitive (uppercase): pattern uppercase letters match both cases in buffer
`/w`	Optional whitespace: pattern whitespace matches zero-or-more buffer whitespace
`/W`	Compact whitespace: pattern whitespace requires ≥1 buffer whitespace
`/T`	Trim whitespace: leading/trailing whitespace in pattern is ignored
`/f`	Full word: post-match word boundary check (same semantics as string type)
`/t`	Text test hint: MIME output hint (parsed, no comparison effect)
`/b`	Binary test hint: MIME output hint (parsed, no comparison effect)

Performance note: Flags /c, /C, /w, /W, /T, /f force byte-by-byte comparison, while /s, /t, /b preserve the fast SIMD-accelerated search path (via memchr::memmem::find).

/s anchor semantics: By default, a search match advances the previous-match anchor to the byte just past the matched pattern (match-END). With /s, the anchor lands on the first byte of the match (match-START). This is required for file formats that place magic signatures in trailers or use relative-offset children that reference the signature start (TGA footer, sfnt name table).
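
The difference between the two anchor positions can be shown with a plain search. This Rust sketch uses a naive window scan from the standard library instead of the memchr fast path mentioned above. `search_anchor` is a hypothetical name, and reading the /N range as the number of candidate start positions is an assumption.

```rust
/// Search for `pattern` starting at `start`, trying at most `range`
/// candidate positions. Returns the new previous-match anchor:
/// match-START when `start_anchor` (/s) is set, match-END otherwise.
pub fn search_anchor(
    buf: &[u8],
    start: usize,
    range: usize,
    pattern: &[u8],
    start_anchor: bool,
) -> Option<usize> {
    // An empty pattern is treated as no-match (and `windows(0)` would panic).
    if pattern.is_empty() {
        return None;
    }
    let window = buf.get(start..)?;
    let hit = window
        .windows(pattern.len())
        .take(range)
        .position(|w| w == pattern)?;
    let match_start = start + hit;
    Some(if start_anchor {
        match_start
    } else {
        match_start + pattern.len()
    })
}
```

A child rule with a relative offset of -8 would resolve to `anchor - 8`. With /s that lands 8 bytes before the signature, which is where trailer formats such as the TGA footer keep the fields preceding the signature.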

Examples:

# TGA footer with start-anchor (images:114)
# The magic string "TRUEVISION-XFILE.\0" is in the trailer; /s lets
# relative-offset children resolve against the signature's start position
0       search/4261301/s  TRUEVISION-XFILE.\0  TGA image data
>&-8    lelong            x                   \b, offset %d

# Python shebang with optional whitespace (commands:20)
# Pattern has one space; /w allows zero or more whitespace in the file
0       search/1/w  #!\040/usr/bin/python  Python script text executable

# BinHex with binary hint (macintosh:17)
# /b is parsed and stored; comparison-time MIME effect deferred to !:mime
0       search/2652/b  (This\ file\ must\ be\ converted\ with\ BinHex  BinHex binary text

Note on /T empty patterns: the /N range is mandatory, so a whitespace-only rule must still carry a window, for example search/256/T " ". Any search/N/T rule with a whitespace-only pattern trims to an empty pattern, and the evaluator treats that as no-match (with a warn! log) rather than letting it silently match every offset. Fix the rule. Bare search/T never reaches the evaluator: it is rejected as a parse error before the trim runs.

Operators

Comparison Operators

Operator	Description	Example
`=`	Equal (default)	`0 long =0xcafebabe`
`!=`	Not equal	`4 byte !=0`
`>`	Greater than	`8 long >1000`
`<`	Less than	`8 long <100`
`>=`	Greater than or equal	`8 long >=1000`
`<=`	Less than or equal	`8 long <=100`
`&`	Bitwise AND	`4 byte &0x80`
`^`	Bitwise XOR (not yet implemented)	`4 byte ^0xff`

Bitwise AND with Mask

Test specific bits:

# Check if bit 7 is set
4       byte    &0x80     (compressed)

# Check if lower nibble is 0x0f
4       byte    &0x0f=0x0f (all bits set)
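
The two mask tests above reduce to simple bit arithmetic, sketched here in Rust. `bit_set` and `masked_eq` are hypothetical helper names. For masks with more than one bit, whether a bare `&mask` test requires any or all of the masked bits depends on the implementation and is not specified on this page, so the first helper is only meant for single-bit masks.

```rust
/// `&0x80`-style test for a single-bit mask: true when that bit is set.
pub fn bit_set(value: u64, mask: u64) -> bool {
    (value & mask) != 0
}

/// `&0x0f=0x0f`-style test: mask first, then compare with the expected value.
pub fn masked_eq(value: u64, mask: u64, expected: u64) -> bool {
    (value & mask) == expected
}
```

For a byte of 0x85, `bit_set(0x85, 0x80)` is true, and `masked_eq(0x85, 0x0f, 0x05)` is true because only the lower nibble takes part in the comparison.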

Negation

Prefix operator with ! for negation:

# Match if NOT equal to zero
4       long    !0        (non-zero)

Values

Numeric Values

# Decimal
0       long    1234

# Hexadecimal
0       long    0x4d5a

# Octal
0       byte    0177
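
A literal in the value column could be classified by its prefix as sketched below. `parse_number` is a hypothetical helper, not the libmagic-rs parser, and it handles unsigned values only.

```rust
/// Parse a numeric literal: `0x` prefix means hexadecimal, a leading `0`
/// followed by more digits means octal, anything else is decimal.
pub fn parse_number(text: &str) -> Option<u64> {
    if let Some(hex) = text
        .strip_prefix("0x")
        .or_else(|| text.strip_prefix("0X"))
    {
        u64::from_str_radix(hex, 16).ok()
    } else if text.len() > 1 && text.starts_with('0') {
        u64::from_str_radix(&text[1..], 8).ok()
    } else {
        text.parse().ok()
    }
}
```

Note that a bare `7a` is not a valid number under these rules. Hex digits need the `0x` prefix, and `0177` is read as octal (decimal 127).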

String Values

# Plain string
0       string  RIFF

# With escape sequences
0       string  PK\x03\x04

# UTF-16LE byte-order mark (written as raw bytes)
0       string  \xff\xfe

Special Values

Value	Description
`x`	Match any value (always true)

Example:

0       string  PK        ZIP archive
>4      short   x         version %d

The x value matches anything and %d formats the matched value.

Nested Rules

Rules can be nested to create hierarchical matches. Deeper matches indicate more specific identification.

Indentation Levels

Use > prefix for nested rules:

0       string  \x7fELF   ELF
>4      byte    1         32-bit
>4      byte    2         64-bit
>5      byte    1         LSB
>5      byte    2         MSB

Evaluation:

Check offset 0 for ELF magic
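If the ELF magic matched, check offset 4 for the bit size
Still under the ELF match, check offset 5 for endianness (both children sit at level 1, so the offset 5 test does not depend on the offset 4 result)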
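
The evaluation order above amounts to a depth-first walk in which children are visited only when their parent matched. Here is a minimal Rust sketch restricted to exact byte matches. `Rule`, `leaf`, and `evaluate` are hypothetical names, and every sibling is tried (no stop-at-first-match).

```rust
/// A simplified rule: match `expected` at `offset`, then try the children.
pub struct Rule {
    pub offset: usize,
    pub expected: &'static [u8],
    pub message: &'static str,
    pub children: Vec<Rule>,
}

/// Convenience constructor for a rule without children.
pub fn leaf(offset: usize, expected: &'static [u8], message: &'static str) -> Rule {
    Rule { offset, expected, message, children: Vec::new() }
}

/// Depth-first evaluation: a child is only tested when its parent matched.
pub fn evaluate(rules: &[Rule], buf: &[u8], out: &mut Vec<&'static str>) {
    for rule in rules {
        let end = rule.offset.saturating_add(rule.expected.len());
        if buf.get(rule.offset..end) == Some(rule.expected) {
            out.push(rule.message);
            evaluate(&rule.children, buf, out);
        }
    }
}
```

Feeding the ELF rules above and a buffer beginning `\x7fELF\x02\x01` through `evaluate` collects "ELF", "64-bit", and "LSB". A buffer that fails the top-level test collects nothing, because the children are never visited.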

Multiple Nesting Levels

0       string  \x7fELF       ELF
>4      byte    2             64-bit
>>5     byte    1             LSB
>>>16   short   2             (executable)
>>>16   short   3             (shared object)

Continuation Messages

Use \b (backspace) to suppress space before message:

0       string  GIF8      GIF image data
>4      string  7a        \b, version 87a
>4      string  9a        \b, version 89a

Output: GIF image data, version 89a
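
How the messages of matched rules are concatenated can be sketched as below. `join_messages` is a hypothetical helper, and it assumes each message still starts with the two-character sequence `\b` exactly as written in the magic file.

```rust
/// Join the messages of matched rules with single spaces, except that a
/// message starting with the literal sequence `\b` is appended directly.
pub fn join_messages(parts: &[&str]) -> String {
    let mut out = String::new();
    for part in parts {
        match part.strip_prefix("\\b") {
            // Continuation: drop the marker and skip the separating space.
            Some(rest) => out.push_str(rest),
            None => {
                if !out.is_empty() {
                    out.push(' ');
                }
                out.push_str(part);
            }
        }
    }
    out
}
```

Joining "GIF image data" with `\b, version 89a` gives the output shown above, while the ELF messages "ELF", "64-bit", and "LSB" join to "ELF 64-bit LSB".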

Examples

ELF Executable

# ELF (Executable and Linkable Format)
0       string  \x7fELF       ELF
>4      byte    1             32-bit
>4      byte    2             64-bit
>5      byte    1             LSB
>5      byte    2             MSB
>16     leshort 2             (executable)
>16     leshort 3             (shared object)

ZIP Archive

# ZIP archive
0       string  PK\x03\x04    ZIP archive data
# The "version needed to extract" field stores major*10 + minor (20 means 2.0)
>4      leshort x             \b, version %d to extract
>6      leshort &0x0001       \b, encrypted
>6      leshort &0x0008       \b, with data descriptor

JPEG Image

# JPEG
0       string  \xff\xd8\xff  JPEG image data
>3      byte    0xe0          \b, JFIF standard
>3      byte    0xe1          \b, Exif format

PDF Document

# PDF
0       string  %PDF-         PDF document
>5      string  1.            \b, version 1.x
>5      string  2.            \b, version 2.x

PE Executable

# DOS MZ executable with PE header
0       string  MZ            DOS executable
>0x3c   lelong  >0            (PE offset)
>(0x3c.l) string PE\0\0       PE executable

GZIP Compressed

# GZIP
0       string  \x1f\x8b      gzip compressed data
>2      byte    8             \b, deflated
>3      byte    &0x01         \b, ASCII text
>3      byte    &0x02         \b, with header CRC
>3      byte    &0x04         \b, with extra field
>3      byte    &0x08         \b, with original name
>3      byte    &0x10         \b, with comment

PNG Image

# PNG
0       string  \x89PNG\r\n\x1a\n   PNG image data
>16     belong  x                   \b, %d x
>20     belong  x                   %d
>25     byte    0                   \b, grayscale
>25     byte    2                   \b, RGB
>25     byte    3                   \b, palette
>25     byte    4                   \b, grayscale+alpha
>25     byte    6                   \b, RGBA

Floating-Point Values

# Check for specific float value
0       lefloat   =3.14159   File with float value pi

# Float comparison
0       float     >1.0       Float value greater than 1.0

# Double precision
0       bedouble  =0.45455   File with double value 0.45455

Meta-types / Control Directives

Meta-types are pseudo-types that do not read bytes from the buffer. Instead, they control the evaluation flow: defining named subroutines, invoking them, providing fallbacks when no sibling matched, resetting per-level match state, or re-applying the entire rule database at a resolved offset.

Keyword	Syntax	Description
`name <id>`	`0 name part2`	Defines a named subroutine block; children are the subroutine body
`use <id>`	`>0 use part2`	Invokes a named subroutine at the resolved offset
`default`	`0 default x Fallback`	Fires only when no sibling at the same level has matched
`clear`	`0 clear`	Resets the per-level sibling-matched flag
`indirect`	`8 indirect x`	Re-applies the full rule database at the resolved offset
`offset`	`0 offset x at_offset %lld`	Emits the resolved file position as a `Value::Uint` for printf-style substitution

`name` and `use` — Named Subroutines

name <id> defines a named subroutine block at the top level; its children are the subroutine body. use <id> invokes that subroutine at a given offset.

# Define a reusable subroutine
0       name    part2
>0      search/64    ABC       found_ABC
>>&0    byte    x            followed_by 0x%x

# Top-level rule that invokes the subroutine
0       string  TEST          Testfmt
>0      use     part2
>64     use     part2

Top-level name blocks are hoisted out of the flat rule list at parse time into a NameTable keyed by identifier. Duplicate names retain the first definition and emit a warning. name rules nested inside another rule’s children are not well-defined in magic(5) and are scrubbed at load time.
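
The "first definition wins" rule for duplicate names can be sketched with a hash map. This is not the actual NameTable type: `build_name_table` is a hypothetical function, and each subroutine body is stood in for by a plain index.

```rust
use std::collections::HashMap;

/// Build a name table from (identifier, body index) pairs in file order.
/// The first definition of a name is kept; later duplicates produce a warning.
pub fn build_name_table(defs: &[(&str, usize)]) -> (HashMap<String, usize>, Vec<String>) {
    let mut table = HashMap::new();
    let mut warnings = Vec::new();
    for &(name, body) in defs {
        if table.contains_key(name) {
            warnings.push(format!("duplicate name '{}' ignored", name));
        } else {
            table.insert(name.to_string(), body);
        }
    }
    (table, warnings)
}
```

A later `use part2` would then look up `part2` in the table and evaluate the stored body at the resolved offset.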

`default` — Fallback Rule

A default rule at a given level fires only when none of its siblings at the same level have matched. The operator is conventionally x (any-value), and the value column is ignored.

0       byte    0xAA    Real-Match
0       default x       DEFAULT-FALLBACK

Against a buffer starting with 0xAA, only Real-Match fires. Against a buffer starting with any other byte, DEFAULT-FALLBACK fires.

`clear` — Reset Sibling-Matched Flag

A clear directive resets the per-level “sibling matched” flag, so a subsequent default at the same level can fire again even after an earlier sibling matched. Pair with EvaluationConfig::with_stop_at_first_match(false) to walk all top-level siblings.

0       byte    0xAA    Match-A
0       default x       DEFAULT-SKIPPED
0       clear
0       default x       DEFAULT-FIRES

Against a buffer starting with 0xAA: Match-A fires, DEFAULT-SKIPPED is suppressed (a sibling matched), clear resets the flag, and DEFAULT-FIRES fires.
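
The interplay of default and clear can be modelled with a single boolean per nesting level. In this Rust sketch `Step` and `run_level` are hypothetical names, all siblings are walked (as with stop-at-first-match disabled), and a default that fires is assumed to count as a match for later defaults.

```rust
/// One sibling at a given nesting level.
pub enum Step {
    /// An ordinary test with a precomputed outcome.
    Test { matched: bool, message: &'static str },
    /// A `default` rule.
    Default(&'static str),
    /// A `clear` directive.
    Clear,
}

/// Walk the siblings of one level and collect the messages that fire.
pub fn run_level(steps: &[Step]) -> Vec<&'static str> {
    let mut sibling_matched = false;
    let mut out = Vec::new();
    for step in steps {
        match *step {
            Step::Test { matched, message } => {
                if matched {
                    sibling_matched = true;
                    out.push(message);
                }
            }
            Step::Default(message) => {
                // Fires only when nothing at this level has matched yet.
                if !sibling_matched {
                    sibling_matched = true; // assumption: a fired default counts as a match
                    out.push(message);
                }
            }
            Step::Clear => sibling_matched = false,
        }
    }
    out
}
```

Running the four rules above with the first test matching yields Match-A and DEFAULT-FIRES, matching the walkthrough.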

`indirect` — Re-apply Root Rules at a Resolved Offset

An indirect rule resolves its offset, slices the buffer at that point, and re-applies the full rule database against the sub-buffer. Recursion is bounded by EvaluationConfig::max_recursion_depth.

0       byte    0x42    Inner-Match
8       indirect x

Against a 16-byte buffer with buf[8] = 0x42: the top-level byte rule at offset 0 does not match, and the indirect rule re-applies the root rules at offset 8 — where buf[8] = 0x42 matches the inner byte rule, producing Inner-Match.

Best Practices

1. Order Rules by Specificity

Put more specific rules first:

# Good: Specific before general
0       string  PK\x03\x04   ZIP archive
0       string  PK           (generic PK signature)

# Bad: General catches all
0       string  PK           (generic PK signature)
# The rule below is never reached
0       string  PK\x03\x04   ZIP archive

2. Use Nested Rules for Details

# Good: Hierarchical structure
0       string  \x7fELF   ELF
>4      byte    2         64-bit
>>5     byte    1         LSB

# Bad: Flat rules
0       string  \x7fELF           ELF
4       byte    2                 64-bit
5       byte    1                 LSB

3. Document Complex Rules

# JPEG with Exif metadata
# The Exif APP1 marker (0xFFE1) contains camera metadata
0       string  \xff\xd8\xff    JPEG image data
>3      byte    0xe1            \b, Exif format

4. Test Edge Cases

Consider:

Empty files
Truncated files
Minimum valid file size
Maximum offset values

5. Use Appropriate Types

# Good: Match exact size needed
0       leshort 0x5a4d   DOS executable

# Bad: Over-reading
0       lelong  x        (reads 4 bytes when 2 needed)

6. Handle Endianness Explicitly

# Good: Explicit endianness
0       lelong  0xcafebabe   (little-endian)
0       belong  0xcafebabe   (big-endian)

# Risky: Native endianness
0       long    0xcafebabe   (platform-dependent)

Supported Features

Currently Supported

Absolute offsets
Relative offsets
Indirect offsets (basic)
Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
Float and double types (32-bit and 64-bit IEEE 754 floating-point)
Date and qdate types (32-bit and 64-bit Unix timestamps)
String and pstring types (null-terminated and length-prefixed strings)
Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
Bitwise AND operator
Nested rules
Comments
String flags and search flags
Meta-types (name, use, default, clear, indirect, offset)

Not Yet Supported

Regex patterns
128-bit integer types
Bitwise XOR operator (^)

Recently Added

Strength modifiers: The !:strength directive for adjusting rule priority
64-bit integers: quad type family (quad, uquad, lequad, ulequad, bequad, ubequad)
Floating-point types: float and double type families (float, befloat, lefloat, double, bedouble, ledouble) with IEEE 754 semantics and epsilon-aware equality

Troubleshooting

Rule Not Matching

Check offset is correct (0-indexed)
Verify endianness matches file format
Inspect the actual bytes with hexdump -C file | head
Ensure no conflicting rules

Unexpected Results

Check rule order (first match wins)
Verify nested rule levels
Test with simpler rules first

Performance Issues

Avoid unnecessary string searches
Use specific offsets over searches
Order rules by likelihood of match
