Magic File Format

Magic files define rules for identifying file types through byte-level patterns.

Overview

Magic files contain rules that describe file formats by specifying byte patterns at specific offsets. Each rule consists of:

  1. Offset - Where to look in the file
  2. Type - How to interpret the bytes
  3. Value - What to match against
  4. Message - Description to display on match

Basic Format

offset  type  value  message

Example:

0       string  PK    ZIP archive data

This rule matches files starting with “PK” and labels them as “ZIP archive data”.

Basic Syntax

Rule Structure

[level>]offset    type    [operator]value    message

Component   Required   Description
level>      No         Indentation level for nested rules
offset      Yes        Where to read data
type        Yes        Data type to read
operator    No         Comparison operator (default: =)
value       Yes        Expected value
message     Yes        Description text

Comments

Lines starting with # are comments:

# This is a comment
0  string  PK  ZIP archive

Whitespace

  • Fields are separated by whitespace (spaces or tabs)
  • Leading whitespace indicates rule nesting level
  • Trailing whitespace is ignored
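
The field layout described above can be illustrated with a small parser. This is a minimal sketch, not the library's parser: `parse_rule` is a hypothetical helper, it assumes nesting is written with leading > characters as in the nested-rule examples, and it does not handle values that contain escaped spaces.

```python
def parse_rule(line):
    """Split one magic rule line into its fields (simplified sketch)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # blank line or comment

    # Count leading '>' characters to get the nesting level.
    level = len(stripped) - len(stripped.lstrip(">"))
    body = stripped[level:]

    # offset, type, value, then the rest of the line as the message.
    parts = body.split(None, 3)
    if len(parts) < 3:
        raise ValueError(f"incomplete rule: {line!r}")
    offset, type_name, value = parts[:3]
    message = parts[3] if len(parts) > 3 else ""
    return {
        "level": level,
        "offset": offset,
        "type": type_name,
        "value": value,
        "message": message,
    }
```

For example, `parse_rule("0  string  PK  ZIP archive")` yields level 0, offset "0", type "string", value "PK", and message "ZIP archive".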

Offset Specifications

Absolute Offset

Direct byte position from file start:

0       string  \x7fELF   ELF executable
16      short   2         (executable)

Hexadecimal Offset

Use 0x prefix for hex offsets:

0x0     string  MZ        DOS executable
0x3c    long    >0        (PE offset present)

Negative Offset (From End)

Read from end of file:

-4      string  .ZIP      ZIP file (end marker)

Indirect Offset

Read pointer value and use as offset:

# Read 4-byte pointer at offset 60, then check that location
(0x3c.l)   string  PE\0\0  PE executable

Indirect offset syntax:

  • (base.type) - Read pointer at base, interpret as type
  • (base.type+adj) - Add adjustment to pointer value

Types for indirect offsets:

  • .b - byte (1 byte)
  • .s - short (2 bytes)
  • .l - long (4 bytes)
  • .q - quad (8 bytes)
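
As a rough illustration of how an indirect offset such as (0x3c.l) resolves, the sketch below reads the pointer and returns the target position. `resolve_indirect` is a hypothetical helper; treating the pointer as unsigned little-endian is an assumption made for the example, not a statement about the evaluator.

```python
import struct

# Pointer widths from the list above. Unsigned little-endian is an assumption.
POINTER_FORMATS = {"b": "<B", "s": "<H", "l": "<I", "q": "<Q"}


def resolve_indirect(buf, base, ptr_type, adj=0):
    """Return the offset stored at `base` (plus `adj`), or None if out of range."""
    fmt = POINTER_FORMATS[ptr_type]
    size = struct.calcsize(fmt)
    if base < 0 or base + size > len(buf):
        return None  # the pointer itself lies outside the buffer
    (pointer,) = struct.unpack_from(fmt, buf, base)
    target = pointer + adj
    return target if 0 <= target <= len(buf) else None
```

With a buffer whose 4 bytes at 0x3c hold 0x80, `resolve_indirect(buf, 0x3c, "l")` returns 0x80, and the rule's string test is then applied at that position.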

Relative Offset

Offset relative to previous match:

0       string  PK\x03\x04   ZIP archive
&2      short   >0           (with data)

The & prefix indicates relative offset.
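
The sketch below walks through the two rules above by hand. It assumes the anchor for a relative offset is the byte just past the previous match, and it reads the field as little-endian for determinism (the rule itself uses the native-order short type). Both helper names are hypothetical.

```python
import struct


def match_string(buf, offset, pattern):
    """Return the position just past the match, or None if it does not match."""
    end = offset + len(pattern)
    return end if buf[offset:end] == pattern else None


def read_relative_leshort(buf, anchor, delta):
    """Read an unsigned 16-bit little-endian value at anchor + delta."""
    pos = anchor + delta
    if pos < 0 or pos + 2 > len(buf):
        return None
    return struct.unpack_from("<H", buf, pos)[0]
```

For a buffer that starts with PK\x03\x04, `match_string` returns 4, so &2 resolves to byte 6.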

Type Specifications

Integer Types

Type      Size      Endianness
byte      1 byte    N/A
short     2 bytes   native
leshort   2 bytes   little-endian
beshort   2 bytes   big-endian
long      4 bytes   native
lelong    4 bytes   little-endian
belong    4 bytes   big-endian
quad      8 bytes   native
lequad    8 bytes   little-endian
bequad    8 bytes   big-endian

All integer types have unsigned variants prefixed with u:

  • ubyte, ushort, uleshort, ubeshort
  • ulong, ulelong, ubelong
  • uquad, ulequad, ubequad

Examples:

0       byte      0x7f      (byte match)
0       leshort   0x5a4d    DOS MZ signature
0       belong    0xcafebabe Java class file
0       lequad    0x1234567890abcdef  (64-bit little-endian)
8       uquad     >0x8000000000000000 (unsigned 64-bit check)
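
To make the size and endianness columns concrete, here is a minimal sketch that decodes these type names with Python's struct module. `read_int` is a hypothetical helper, and treating the unprefixed names as signed is an assumption drawn from the existence of the u-prefixed variants.

```python
import struct

# struct format codes for the four integer widths (signed form).
WIDTH_CODES = {"byte": "b", "short": "h", "long": "i", "quad": "q"}


def read_int(buf, offset, type_name):
    """Decode an integer type such as 'leshort' or 'ubelong' from buf."""
    name = type_name
    unsigned = name.startswith("u")
    if unsigned:
        name = name[1:]

    order = "="  # native byte order with standard sizes
    if name[:2] in ("le", "be") and name[2:] in WIDTH_CODES:
        order = "<" if name[:2] == "le" else ">"
        name = name[2:]

    code = WIDTH_CODES[name]
    fmt = order + (code.upper() if unsigned else code)
    (value,) = struct.unpack_from(fmt, buf, offset)
    return value
```

Reading the bytes "MZ" as leshort gives 0x5a4d, which is why the DOS signature rule above compares against that value.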

Floating-Point Types

Type       Size      Endianness      IEEE 754
float      4 bytes   native          32-bit
befloat    4 bytes   big-endian      32-bit
lefloat    4 bytes   little-endian   32-bit
double     8 bytes   native          64-bit
bedouble   8 bytes   big-endian      64-bit
ledouble   8 bytes   little-endian   64-bit

Floating-point types follow IEEE 754 standard. Unlike integer types, float types do not have signed or unsigned variants (the IEEE 754 format handles sign internally).

Examples:

0       lefloat   =3.14159   File with float value pi
0       bedouble  >1.0       Double value greater than 1.0

Float comparison behavior:

  • Equality: Uses epsilon-aware comparison (f64::EPSILON tolerance)
  • Ordering: Uses IEEE 754 semantics via partial_cmp
  • NaN: NaN never equals NaN, and every equality or ordering comparison involving NaN returns false
  • Infinity: Positive and negative infinity are properly ordered
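
The comparison behavior listed above can be sketched in a few lines. This is an illustration only: applying the epsilon as an absolute difference is an assumption, and `float_equal` and `float_greater` are hypothetical names. Python's `sys.float_info.epsilon` has the same value as Rust's f64::EPSILON.

```python
import math
import sys

EPSILON = sys.float_info.epsilon  # same value as f64::EPSILON


def float_equal(a, b):
    """Epsilon-aware equality; NaN never equals anything."""
    if math.isnan(a) or math.isnan(b):
        return False
    if math.isinf(a) or math.isinf(b):
        return a == b  # infinities only equal themselves
    return abs(a - b) <= EPSILON  # assumption: absolute tolerance


def float_greater(a, b):
    """IEEE 754 ordering; any comparison with NaN is False."""
    return a > b
```

Under this sketch, 0.1 + 0.2 compares equal to 0.3, while a NaN operand makes both functions return False.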

Date/Timestamp Types

Type       Size      Endianness      UTC/Local   Description
date       4 bytes   native          UTC         32-bit Unix timestamp (signed seconds since epoch), formatted as UTC
ldate      4 bytes   native          Local       32-bit Unix timestamp, formatted as local time
bedate     4 bytes   big-endian      UTC         32-bit Unix timestamp, big-endian byte order, UTC
beldate    4 bytes   big-endian      Local       32-bit Unix timestamp, big-endian byte order, local time
ledate     4 bytes   little-endian   UTC         32-bit Unix timestamp, little-endian byte order, UTC
leldate    4 bytes   little-endian   Local       32-bit Unix timestamp, little-endian byte order, local time
qdate      8 bytes   native          UTC         64-bit Unix timestamp (signed seconds since epoch), formatted as UTC
qldate     8 bytes   native          Local       64-bit Unix timestamp, formatted as local time
beqdate    8 bytes   big-endian      UTC         64-bit Unix timestamp, big-endian byte order, UTC
beqldate   8 bytes   big-endian      Local       64-bit Unix timestamp, big-endian byte order, local time
leqdate    8 bytes   little-endian   UTC         64-bit Unix timestamp, little-endian byte order, UTC
leqldate   8 bytes   little-endian   Local       64-bit Unix timestamp, little-endian byte order, local time

Timestamp values are formatted as strings matching GNU file output format: “Www Mmm DD HH:MM:SS YYYY”

Examples:

# Match a timestamp field equal to the Unix epoch
0       date      =0        File created at epoch

# Check timestamp in file header (big-endian)
8       bedate    >946684800 File created after 2000-01-01

# 64-bit timestamp (little-endian, local time)
16      leqldate  x         \b, timestamp %s
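
As an illustration of the timestamp handling, the sketch below decodes a big-endian 32-bit timestamp and formats it in the "Www Mmm DD HH:MM:SS YYYY" layout using UTC. Both function names are hypothetical, and the zero-padded day is an assumption about the exact output, which may differ from what the library prints.

```python
import struct
import time


def read_bedate(buf, offset):
    """Decode a signed 32-bit big-endian Unix timestamp."""
    (seconds,) = struct.unpack_from(">i", buf, offset)
    return seconds


def format_utc(seconds):
    """Format seconds since the epoch as 'Www Mmm DD HH:MM:SS YYYY' in UTC."""
    return time.strftime("%a %b %d %H:%M:%S %Y", time.gmtime(seconds))
```

The value 946684800 used in the bedate rule above corresponds to midnight UTC on 2000-01-01.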

String Types

Match literal string data:

0       string    %PDF      PDF document
0       string    GIF89a    GIF image data

String escape sequences:

  • \x00 - hex byte
  • \n - newline
  • \t - tab
  • \\ - backslash

Pascal String Type

Pascal string (pstring) is a length-prefixed string type. The length prefix can be 1, 2, or 4 bytes depending on the suffix flag. Unlike C strings, Pascal strings are not null-terminated.

Length Prefix Width

The default pstring type uses a 1-byte length prefix (0-255 range). Use suffix flags to specify different prefix widths:

Suffix   Width     Endianness      Range
/B       1 byte    N/A             0-255 (default)
/H       2 bytes   big-endian      0-65535
/h       2 bytes   little-endian   0-65535
/L       4 bytes   big-endian      0-4294967295
/l       4 bytes   little-endian   0-4294967295

Self-Inclusive Length (/J Flag)

The /J flag indicates that the stored length value includes the size of the length prefix itself (JPEG-style). This flag can be combined with any width variant.

Examples

Basic pstring with default 1-byte prefix:

0       pstring   =JPEG     JPEG image (Pascal string)

2-byte big-endian length prefix:

0       pstring/H =JPEG     JPEG image (2-byte BE prefix)

4-byte little-endian length prefix:

0       pstring/l x         \b, name: %s

Self-inclusive length with 2-byte big-endian prefix:

0       pstring/HJ x        \b, JPEG-style length

Self-inclusive length with default 1-byte prefix:

0       pstring/J  x        \b, self-inclusive length

The optional max_length parameter caps the length value:

0       pstring   x         \b, name: %s

String Flags

String flags are now implemented (issue #234, landed in PR #288), providing libmagic-compatible string comparison semantics.

Flag   Description
/c     Case-insensitive (lowercase pattern chars trigger fold)
/C     Case-insensitive (uppercase pattern chars trigger fold)
/w     Whitespace-optional (pattern whitespace matches zero or more)
/W     Whitespace-required-compact (at least one, greedy consume)
/T     Trim leading/trailing ASCII whitespace from pattern
/f     Full-word match (post-match word boundary check)
/b     Force binary test (hint for MIME output)
/t     Force text test (hint for MIME output)

Note: /c and /C are asymmetric — the pattern character controls fold direction. With /c, only lowercase pattern chars cause the file byte to be folded to lowercase. With /C, only uppercase pattern chars cause the file byte to be folded to uppercase. See GOTCHAS section S6.5 for details on mixed-case behavior. /B (uppercase) is not a string flag; it is reserved for pstring length-width specification and is rejected on string types.
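
The asymmetry described in the note can be sketched byte by byte. This is an illustration of the stated rule, not the library's code; `byte_matches` and `string_matches` are hypothetical names, and only ASCII case folding is modeled.

```python
def byte_matches(pattern_byte, file_byte, fold_lower=False, fold_upper=False):
    """Compare one pattern byte with one file byte under /c and /C."""
    p = bytes([pattern_byte])
    f = bytes([file_byte])
    if fold_lower and p.islower():
        return f.lower() == p  # /c: lowercase pattern char folds the file byte
    if fold_upper and p.isupper():
        return f.upper() == p  # /C: uppercase pattern char folds the file byte
    return f == p  # everything else is compared exactly


def string_matches(pattern, data, fold_lower=False, fold_upper=False):
    """Match pattern against the start of data with optional case folding."""
    if len(data) < len(pattern):
        return False
    return all(
        byte_matches(p, d, fold_lower, fold_upper)
        for p, d in zip(pattern, data)
    )
```

With /c, the pattern <!doctype matches <!DOCTYPE in the file, but the pattern <!DOCTYPE does not match <!doctype, because uppercase pattern characters are compared exactly.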

Examples:

# Case-insensitive match
0       string/c  <!doctype  HTML document

# Whitespace-optional (matches "ab", "a b", "a  b")
0       string/w  a b        Pattern with flexible whitespace

# Combined flags
0       string/cw <!doctype html>  HTML document (case and space insensitive)

# Full-word boundary check
0       string/f  int        C int keyword (not "integer")

# Trim leading/trailing whitespace from the pattern (`/T` = STRING_TRIM)
0       string/T  "  hello  "  Hello marker (matches "hello" without surrounding spaces)

# Binary-mode hint (`/b` = STRING_BINTEST) -- parsed and stored; MIME-output
# wiring deferred to the `!:mime` evaluation work
24      string/b  FTCOMP      FTCOMP compressed archive

# Text-mode hint (`/t` = STRING_TEXTTEST) -- parsed and stored; MIME-output
# wiring deferred to the `!:mime` evaluation work
0       string/t  #!/bin/sh   POSIX shell script text

Note on /T empty patterns: string/T " " trims to an empty pattern. The evaluator treats this as no-match (with a warn! log) rather than letting it silently match every file. Fix the rule.

Search Flags

Search flags are specified as /flags after the range in search types: search/N/<flags>. libmagic-rs implements the full search-type flag semantics (issue #235).

Search flags share most semantics with string flags. Eight flags (/c, /C, /w, /W, /T, /f, /t, /b) carry the same comparison-altering or metadata-hint meanings as their string-type counterparts. The ninth flag, /s, is search-specific: it controls where the previous-match anchor lands for relative-offset children.

Flag   Description
/s     Start anchor: sets the previous-match anchor to match-START instead of match-END for relative-offset children
/c     Case-insensitive (lowercase): pattern lowercase letters match both cases in buffer
/C     Case-insensitive (uppercase): pattern uppercase letters match both cases in buffer
/w     Optional whitespace: pattern whitespace matches zero-or-more buffer whitespace
/W     Compact whitespace: pattern whitespace requires ≥1 buffer whitespace
/T     Trim whitespace: leading/trailing whitespace in pattern is ignored
/f     Full word: post-match word boundary check (same semantics as string type)
/t     Text test hint: MIME output hint (parsed, no comparison effect)
/b     Binary test hint: MIME output hint (parsed, no comparison effect)

Performance note: Flags /c, /C, /w, /W, /T, /f force byte-by-byte comparison, while /s, /t, /b preserve the fast SIMD-accelerated search path (via memchr::memmem::find).

/s anchor semantics: By default, a search match advances the previous-match anchor to the byte just past the matched pattern (match-END). With /s, the anchor lands on the first byte of the match (match-START). This is required for file formats that place magic signatures in trailers or use relative-offset children that reference the signature start (TGA footer, sfnt name table).
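
The difference between the two anchor positions can be sketched with a plain substring search. `search_with_anchor` is a hypothetical helper, and the interpretation of the window (a match must begin within `window` bytes of `start`) is an assumption for this example.

```python
def search_with_anchor(buf, pattern, start, window, anchor_at_start=False):
    """Return (match_start, anchor) for the first match, or None."""
    # Assumption: a match must begin within `window` bytes of `start`.
    limit = start + window + len(pattern) - 1
    index = buf.find(pattern, start, limit)
    if index < 0:
        return None
    # Default: anchor just past the match. With /s: anchor on its first byte.
    anchor = index if anchor_at_start else index + len(pattern)
    return index, anchor
```

If the 18-byte TGA signature is found at byte 10, the default anchor is 28 and the /s anchor is 10, so a child at relative offset -8 reads from byte 20 or byte 2 respectively.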

Examples:

# TGA footer with start-anchor (images:114)
# The magic string "TRUEVISION-XFILE.\0" is in the trailer; /s lets
# relative-offset children resolve against the signature's start position
0       search/4261301/s  TRUEVISION-XFILE.\0  TGA image data
>&-8    lelong            x                   \b, offset %d

# Python shebang with optional whitespace (commands:20)
# Pattern has one space; /w allows zero or more whitespace in the file
0       search/1/w  #!\040/usr/bin/python  Python script text executable

# BinHex with binary hint (macintosh:17)
# /b is parsed and stored; comparison-time MIME effect deferred to !:mime
0       search/2652/b  (This\ file\ must\ be\ converted\ with\ BinHex  BinHex binary text

Note on /T empty patterns: the /N range is mandatory for search types, so every search rule carries a window such as /256. A rule such as search/256/T " " (or any search/N/T with a whitespace-only pattern) trims to an empty pattern, and the evaluator treats that as no-match (with a warn! log) rather than letting it silently match every offset. Fix the rule. A bare search/T never reaches the evaluator: it is a parse error before the trim runs.

Operators

Comparison Operators

Operator   Description                         Example
=          Equal (default)                     0 long =0xcafebabe
!=         Not equal                           4 byte !=0
>          Greater than                        8 long >1000
<          Less than                           8 long <100
>=         Greater than or equal               8 long >=1000
<=         Less than or equal                  8 long <=100
&          Bitwise AND                         4 byte &0x80
^          Bitwise XOR (not yet implemented)   4 byte ^0xff

Bitwise AND with Mask

Test specific bits:

# Check if bit 7 is set
4       byte    &0x80     (compressed)

# Check if lower nibble is 0x0f
4       byte    &0x0f=0x0f (all bits set)
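
In plain arithmetic, the two tests above amount to the checks sketched below. The "all mask bits set" reading of & follows the magic(5) convention and is assumed here, not confirmed for this implementation; the function names are hypothetical.

```python
def mask_bits_set(value, mask):
    """True when every bit set in mask is also set in value."""
    return (value & mask) == mask


def masked_equals(value, mask, expected):
    """True when the masked value equals the expected value."""
    return (value & mask) == expected
```

A byte of 0x8f passes both checks: bit 7 is set, and its lower nibble is 0x0f.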

Negation

Prefix operator with ! for negation:

# Match if NOT equal to zero
4       long    !0        (non-zero)

Values

Numeric Values

# Decimal
0       long    1234

# Hexadecimal
0       long    0x4d5a

# Octal
0       byte    0177

String Values

# Plain string
0       string  RIFF

# With escape sequences
0       string  PK\x03\x04

# UTF-16 little-endian byte-order mark (as raw bytes)
0       string  \xff\xfe

Special Values

Value   Description
x       Match any value (always true)

Example:

0       string  PK        ZIP archive
>4      short   x         version %d

The x value matches anything and %d formats the matched value.

Nested Rules

Rules can be nested to create hierarchical matches. Deeper matches indicate more specific identification.

Indentation Levels

Use > prefix for nested rules:

0       string  \x7fELF   ELF
>4      byte    1         32-bit
>4      byte    2         64-bit
>5      byte    1         LSB
>5      byte    2         MSB

Evaluation:

  1. Check offset 0 for ELF magic
  2. If matched, check offset 4 for bit size
  3. If matched, check offset 5 for endianness
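
The evaluation order above can be sketched with a small loop that only tests a child when the most recent rule one level up matched. This is a simplified model with hypothetical names, using byte-string comparisons only and evaluating every sibling.

```python
def evaluate(rules, buf):
    """rules: list of (level, offset, expected_bytes, message) in file order."""
    messages = []
    matched = {}  # level -> did the most recent rule at that level match?
    for level, offset, expected, message in rules:
        # Forget state from deeper levels left over from earlier branches.
        for deeper in [k for k in matched if k > level]:
            del matched[deeper]
        if level > 0 and not matched.get(level - 1, False):
            matched[level] = False
            continue  # parent did not match, so skip this child
        ok = buf[offset:offset + len(expected)] == expected
        matched[level] = ok
        if ok:
            messages.append(message)
    return " ".join(messages)


ELF_RULES = [
    (0, 0, b"\x7fELF", "ELF"),
    (1, 4, b"\x01", "32-bit"),
    (1, 4, b"\x02", "64-bit"),
    (1, 5, b"\x01", "LSB"),
    (1, 5, b"\x02", "MSB"),
]
```

For a header beginning 7f 45 4c 46 02 01, `evaluate(ELF_RULES, header)` returns "ELF 64-bit LSB"; for a non-ELF buffer, no child is tested and the result is empty.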

Multiple Nesting Levels

0       string  \x7fELF       ELF
>4      byte    2             64-bit
>>5     byte    1             LSB
>>>16   short   2             (executable)
>>>16   short   3             (shared object)

Continuation Messages

Use \b (backspace) to suppress space before message:

0       string  GIF8      GIF image data
>4      string  7a        \b, version 87a
>4      string  9a        \b, version 89a

Output: GIF image data, version 89a
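
How the messages of matched rules are joined can be sketched as follows. `join_messages` is a hypothetical helper; it treats the literal two-character sequence \b at the start of a message as "do not insert a space".

```python
def join_messages(messages):
    """Join rule messages with spaces, honoring a leading literal \\b."""
    out = ""
    for msg in messages:
        if msg.startswith("\\b"):
            out += msg[2:]  # drop the marker and append without a space
        elif out:
            out += " " + msg
        else:
            out = msg
    return out
```

Joining "GIF image data" with "\b, version 89a" produces the output shown above.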

Examples

ELF Executable

# ELF (Executable and Linkable Format)
0       string  \x7fELF       ELF
>4      byte    1             32-bit
>4      byte    2             64-bit
>5      byte    1             LSB
>5      byte    2             MSB
>16     leshort 2             (executable)
>16     leshort 3             (shared object)

ZIP Archive

# ZIP archive
0       string  PK\x03\x04    ZIP archive data
>4      leshort x             \b, version %d to extract
>6      leshort &0x0001       \b, encrypted
>6      leshort &0x0008       \b, with data descriptor

JPEG Image

# JPEG
0       string  \xff\xd8\xff  JPEG image data
>3      byte    0xe0          \b, JFIF standard
>3      byte    0xe1          \b, Exif format

PDF Document

# PDF
0       string  %PDF-         PDF document
>5      string  1.            \b, version 1.x
>5      string  2.            \b, version 2.x

PE Executable

# DOS MZ executable with PE header
0       string  MZ            DOS executable
>0x3c   lelong  >0            (PE offset)
>(0x3c.l) string PE\0\0       PE executable

GZIP Compressed

# GZIP
0       string  \x1f\x8b      gzip compressed data
>2      byte    8             \b, deflated
>3      byte    &0x01         \b, ASCII text
>3      byte    &0x02         \b, with header CRC
>3      byte    &0x04         \b, with extra field
>3      byte    &0x08         \b, with original name
>3      byte    &0x10         \b, with comment

PNG Image

# PNG
0       string  \x89PNG\r\n\x1a\n   PNG image data
>16     belong  x                   \b, %d x
>20     belong  x                   %d
>25     byte    0                   \b, grayscale
>25     byte    2                   \b, RGB
>25     byte    3                   \b, palette
>25     byte    4                   \b, grayscale+alpha
>25     byte    6                   \b, RGBA

Floating-Point Values

# Check for specific float value
0       lefloat   =3.14159   File with float value pi

# Float comparison
0       float     >1.0       Float value greater than 1.0

# Double precision
0       bedouble  =0.45455   File with double value 0.45455

Meta-types / Control Directives

Meta-types are pseudo-types that do not read bytes from the buffer. Instead, they control the evaluation flow: defining named subroutines, invoking them, providing fallbacks when no sibling matched, resetting per-level match state, or re-applying the entire rule database at a resolved offset.

Keyword     Syntax                       Description
name <id>   0 name part2                 Defines a named subroutine block; children are the subroutine body
use <id>    >0 use part2                 Invokes a named subroutine at the resolved offset
default     0 default x Fallback         Fires only when no sibling at the same level has matched
clear       0 clear                      Resets the per-level sibling-matched flag
indirect    8 indirect x                 Re-applies the full rule database at the resolved offset
offset      0 offset x at_offset %lld    Emits the resolved file position as a Value::Uint for printf-style substitution

name and use — Named Subroutines

name <id> defines a named subroutine block at the top level; its children are the subroutine body. use <id> invokes that subroutine at a given offset.

# Define a reusable subroutine
0       name    part2
>0      search/64    ABC       found_ABC
>>&0    byte    x            followed_by 0x%x

# Top-level rule that invokes the subroutine
0       string  TEST          Testfmt
>0      use     part2
>64     use     part2

Top-level name blocks are hoisted out of the flat rule list at parse time into a NameTable keyed by identifier. Duplicate names retain the first definition and emit a warning. name rules nested inside another rule’s children are not well-defined in magic(5) and are scrubbed at load time.

default — Fallback Rule

A default rule at a given level fires only when none of its siblings at the same level have matched. The operator is conventionally x (any-value), and the value column is ignored.

0       byte    0xAA    Real-Match
0       default x       DEFAULT-FALLBACK

Against a buffer starting with 0xAA, only Real-Match fires. Against a buffer starting with any other byte, DEFAULT-FALLBACK fires.

clear — Reset Sibling-Matched Flag

A clear directive resets the per-level “sibling matched” flag, so a subsequent default at the same level can fire again even after an earlier sibling matched. Pair with EvaluationConfig::with_stop_at_first_match(false) to walk all top-level siblings.

0       byte    0xAA    Match-A
0       default x       DEFAULT-SKIPPED
0       clear
0       default x       DEFAULT-FIRES

Against a buffer starting with 0xAA: Match-A fires, DEFAULT-SKIPPED is suppressed (a sibling matched), clear resets the flag, and DEFAULT-FIRES fires.
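
The interplay of default and clear can be modeled with a single boolean per level. The sketch below is a simplified model with hypothetical names; it walks every sibling (as with stop-at-first-match disabled) and assumes that a firing default also counts as a match.

```python
def evaluate_siblings(rules, first_byte):
    """rules: list of (kind, arg, message) for one nesting level."""
    fired = []
    sibling_matched = False
    for kind, arg, message in rules:
        if kind == "byte":
            if first_byte == arg:
                fired.append(message)
                sibling_matched = True
        elif kind == "default":
            if not sibling_matched:
                fired.append(message)
                sibling_matched = True  # assumption: default counts as a match
        elif kind == "clear":
            sibling_matched = False  # reset the per-level flag
    return fired


RULES = [
    ("byte", 0xAA, "Match-A"),
    ("default", None, "DEFAULT-SKIPPED"),
    ("clear", None, None),
    ("default", None, "DEFAULT-FIRES"),
]
```

With a first byte of 0xAA, this model fires Match-A and DEFAULT-FIRES, matching the walkthrough above.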

indirect — Re-apply Root Rules at a Resolved Offset

An indirect rule resolves its offset, slices the buffer at that point, and re-applies the full rule database against the sub-buffer. Recursion is bounded by EvaluationConfig::max_recursion_depth.

0       byte    0x42    Inner-Match
8       indirect x

Against a 16-byte buffer with buf[8] = 0x42: the top-level byte rule at offset 0 does not match, and the indirect rule re-applies the root rules at offset 8 — where buf[8] = 0x42 matches the inner byte rule, producing Inner-Match.
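
A rough model of this behavior, with the two rules above hard-coded, is sketched below. `apply_rules` and its `max_depth` parameter are hypothetical stand-ins for the evaluator and for EvaluationConfig::max_recursion_depth; the real depth accounting may differ.

```python
def apply_rules(buf, depth=0, max_depth=4):
    """Apply the two example rules to buf, recursing for `indirect`."""
    if depth > max_depth:
        return []  # recursion bound reached
    messages = []
    # Rule 1: 0 byte 0x42 Inner-Match
    if buf[:1] == b"\x42":
        messages.append("Inner-Match")
    # Rule 2: 8 indirect x (re-apply all rules to the buffer sliced at 8)
    if len(buf) > 8:
        messages.extend(apply_rules(buf[8:], depth + 1, max_depth))
    return messages
```

For a 16-byte buffer whose only 0x42 is at index 8, the top-level byte rule fails and the recursive call on the sub-buffer yields Inner-Match.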

Best Practices

1. Order Rules by Specificity

Put more specific rules first:

# Good: Specific before general
0       string  PK\x03\x04   ZIP archive
0       string  PK           (generic PK signature)

# Bad: General catches all, so the ZIP archive rule below is never reached
0       string  PK           (generic PK signature)
0       string  PK\x03\x04   ZIP archive

2. Use Nested Rules for Details

# Good: Hierarchical structure
0       string  \x7fELF   ELF
>4      byte    2         64-bit
>>5     byte    1         LSB

# Bad: Flat rules
0       string  \x7fELF           ELF
4       byte    2                 64-bit
5       byte    1                 LSB

3. Document Complex Rules

# JPEG with Exif metadata
# The Exif APP1 marker (0xFFE1) contains camera metadata
0       string  \xff\xd8\xff    JPEG image data
>3      byte    0xe1            \b, Exif format

4. Test Edge Cases

Consider:

  • Empty files
  • Truncated files
  • Minimum valid file size
  • Maximum offset values

5. Use Appropriate Types

# Good: Match exact size needed
0       leshort 0x5a4d   DOS executable

# Bad: Over-reading
0       lelong  x        (reads 4 bytes when 2 needed)

6. Handle Endianness Explicitly

# Good: Explicit endianness
0       lelong  0xcafebabe   (little-endian)
0       belong  0xcafebabe   (big-endian)

# Risky: Native endianness
0       long    0xcafebabe   (platform-dependent)

Supported Features

Currently Supported

  • Absolute offsets
  • Relative offsets
  • Indirect offsets (basic)
  • Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
  • Float and double types (32-bit and 64-bit IEEE 754 floating-point)
  • Date and qdate types (32-bit and 64-bit Unix timestamps)
  • String and pstring types (null-terminated and length-prefixed strings)
  • Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
  • Bitwise AND operator
  • Nested rules
  • Comments

Not Yet Supported

  • Regex patterns
  • 128-bit integer types

Recently Added

  • Strength modifiers: The !:strength directive for adjusting rule priority
  • 64-bit integers: quad type family (quad, uquad, lequad, ulequad, bequad, ubequad)
  • Floating-point types: float and double type families (float, befloat, lefloat, double, bedouble, ledouble) with IEEE 754 semantics and epsilon-aware equality

Troubleshooting

Rule Not Matching

  1. Check offset is correct (0-indexed)
  2. Verify endianness matches file format
  3. Test with hexdump -C file | head
  4. Ensure no conflicting rules

Unexpected Results

  1. Check rule order (first match wins)
  2. Verify nested rule levels
  3. Test with simpler rules first

Performance Issues

  1. Avoid unnecessary string searches
  2. Use specific offsets over searches
  3. Order rules by likelihood of match
