Magic File Format
Magic files define rules for identifying file types through byte-level patterns.
Overview
Magic files contain rules that describe file formats by specifying byte patterns at specific offsets. Each rule consists of:
- Offset - Where to look in the file
- Type - How to interpret the bytes
- Value - What to match against
- Message - Description to display on match
Basic Format
offset type value message
Example:
0 string PK ZIP archive data
This rule matches files starting with “PK” and labels them as “ZIP archive data”.
Basic Syntax
Rule Structure
[level>]offset type [operator]value message
| Component | Required | Description |
|---|---|---|
| level> | No | Indentation level for nested rules |
| offset | Yes | Where to read data |
| type | Yes | Data type to read |
| operator | No | Comparison operator (default: =) |
| value | Yes | Expected value |
| message | Yes | Description text |
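The field layout above can be sketched as a small line splitter. This is a minimal illustration, not the libmagic-rs parser: `RuleLine`, `take_field`, and `parse_rule_line` are hypothetical names, and escaped spaces inside a value are deliberately not handled.

```rust
/// Fields of one rule line. `RuleLine` and `parse_rule_line` are
/// hypothetical names for this sketch, not libmagic-rs API.
#[derive(Debug, PartialEq)]
struct RuleLine {
    level: usize,
    offset: String,
    type_name: String,
    value: String,
    message: String,
}

/// Split off the next whitespace-delimited field, returning (field, rest).
fn take_field(s: &str) -> (&str, &str) {
    let s = s.trim_start();
    match s.find(char::is_whitespace) {
        Some(i) => (&s[..i], &s[i..]),
        None => (s, ""),
    }
}

/// Parse `[level>]offset type [operator]value message`.
/// Returns None for blank lines, comments, and lines without a type field.
/// Escaped spaces inside the value (`\ `) are not handled here.
fn parse_rule_line(line: &str) -> Option<RuleLine> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // Each leading '>' adds one nesting level.
    let level = line.chars().take_while(|&c| c == '>').count();
    let (offset, rest) = take_field(&line[level..]);
    let (type_name, rest) = take_field(rest);
    if type_name.is_empty() {
        return None;
    }
    // The operator, if any, stays attached to the value (for example ">0").
    let (value, rest) = take_field(rest);
    Some(RuleLine {
        level,
        offset: offset.to_string(),
        type_name: type_name.to_string(),
        value: value.to_string(),
        // Everything after the value is the message, spaces included.
        message: rest.trim().to_string(),
    })
}
```

Calling `parse_rule_line` on `0 string PK ZIP archive data` yields level 0, offset `0`, type `string`, value `PK`, and the message `ZIP archive data`.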
Comments
Lines starting with # are comments:
# This is a comment
0 string PK ZIP archive
Whitespace
- Fields are separated by whitespace (spaces or tabs)
- Nesting level is set by the leading > characters (see Nested Rules), not by indentation
- Trailing whitespace is ignored
Offset Specifications
Absolute Offset
Direct byte position from file start:
0 string \x7fELF ELF executable
16 short 2 (executable)
Hexadecimal Offset
Use 0x prefix for hex offsets:
0x0 string MZ DOS executable
0x3c long >0 (PE offset present)
Negative Offset (From End)
Read from end of file:
-4 string .ZIP ZIP file (end marker)
Indirect Offset
Read pointer value and use as offset:
# Read 4-byte pointer at offset 60, then check that location
(0x3c.l) string PE\0\0 PE executable
Indirect offset syntax:
- (base.type) - Read pointer at base, interpret as type
- (base.type+adj) - Add adjustment to pointer value
Types for indirect offsets:
- .b - byte (1 byte)
- .s - short (2 bytes)
- .l - long (4 bytes)
- .q - quad (8 bytes)
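Indirect resolution can be sketched as "read a pointer, add the adjustment, bounds-check". The sketch below assumes the pointer is stored little-endian (as in the PE example) and uses hypothetical names (`PtrType`, `resolve_indirect`); it is not the library's implementation.

```rust
use std::convert::TryFrom;

/// Pointer width for an indirect offset: .b, .s, .l, or .q.
/// Hypothetical type for this sketch.
#[derive(Clone, Copy)]
enum PtrType {
    Byte,
    Short,
    Long,
    Quad,
}

/// Resolve `(base.type+adj)`: read the pointer stored at `base`, add `adj`,
/// and return the target only if it lies inside the buffer.
/// Assumption: the pointer is little-endian.
fn resolve_indirect(buf: &[u8], base: usize, ty: PtrType, adj: i64) -> Option<usize> {
    let width = match ty {
        PtrType::Byte => 1,
        PtrType::Short => 2,
        PtrType::Long => 4,
        PtrType::Quad => 8,
    };
    let end = base.checked_add(width)?;
    let bytes = buf.get(base..end)?;
    // Zero-extend the pointer bytes to 64 bits before decoding.
    let mut raw = [0u8; 8];
    raw[..width].copy_from_slice(bytes);
    let pointer = i64::try_from(u64::from_le_bytes(raw)).ok()?;
    let target = usize::try_from(pointer.checked_add(adj)?).ok()?;
    if target < buf.len() {
        Some(target)
    } else {
        None
    }
}
```

For a buffer whose 4-byte value at 0x3c is 0x40, `resolve_indirect(&buf, 0x3c, PtrType::Long, 0)` returns `Some(0x40)`, which is where the `PE\0\0` test would then be applied.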
Relative Offset
Offset relative to previous match:
0 string PK\x03\x04 ZIP archive
&2 short >0 (with data)
The & prefix indicates relative offset.
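The three non-indirect offset forms reduce to simple arithmetic. The following sketch uses hypothetical names (`Offset`, `resolve_offset`) and assumes the relative anchor is the byte just past the previous match, which is the default described for search matches later in this document.

```rust
use std::convert::TryFrom;

/// Offset forms covered by this sketch (hypothetical type).
enum Offset {
    /// Byte position from the start of the file.
    Absolute(usize),
    /// Distance back from the end of the file (written as a negative offset).
    FromEnd(usize),
    /// Signed distance from the previous-match anchor (the & prefix).
    Relative(i64),
}

/// Turn an offset into a buffer position, or None if it falls outside.
/// Assumption: `anchor` is the position just past the previous match.
fn resolve_offset(offset: &Offset, buf_len: usize, anchor: usize) -> Option<usize> {
    let pos = match *offset {
        Offset::Absolute(n) => n,
        Offset::FromEnd(n) => buf_len.checked_sub(n)?,
        Offset::Relative(delta) => {
            let base = i64::try_from(anchor).ok()?;
            usize::try_from(base.checked_add(delta)?).ok()?
        }
    };
    if pos <= buf_len {
        Some(pos)
    } else {
        None
    }
}
```

With a 4-byte `PK\x03\x04` match at offset 0, the anchor is 4, so `&2` resolves to position 6 under this assumption.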
Type Specifications
Integer Types
| Type | Size | Endianness |
|---|---|---|
| byte | 1 byte | N/A |
| short | 2 bytes | native |
| leshort | 2 bytes | little-endian |
| beshort | 2 bytes | big-endian |
| long | 4 bytes | native |
| lelong | 4 bytes | little-endian |
| belong | 4 bytes | big-endian |
| quad | 8 bytes | native |
| lequad | 8 bytes | little-endian |
| bequad | 8 bytes | big-endian |
All integer types have unsigned variants prefixed with u:
- ubyte, ushort, uleshort, ubeshort
- ulong, ulelong, ubelong
- uquad, ulequad, ubequad
Examples:
0 byte 0x7f (byte match)
0 leshort 0x5a4d DOS MZ signature
0 belong 0xcafebabe Java class file
0 lequad 0x1234567890abcdef (64-bit little-endian)
8 uquad >0x8000000000000000 (unsigned 64-bit check)
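The effect of the endianness prefix can be seen by decoding the same bytes both ways. This is a minimal sketch using only the standard library; `read_u16` and `read_u32` are hypothetical helper names.

```rust
use std::convert::TryInto;

/// Read a 16-bit value at `offset`; None if the buffer is too short.
fn read_u16(buf: &[u8], offset: usize, big_endian: bool) -> Option<u16> {
    let end = offset.checked_add(2)?;
    let bytes: [u8; 2] = buf.get(offset..end)?.try_into().ok()?;
    Some(if big_endian {
        u16::from_be_bytes(bytes)
    } else {
        u16::from_le_bytes(bytes)
    })
}

/// Read a 32-bit value at `offset`; None if the buffer is too short.
fn read_u32(buf: &[u8], offset: usize, big_endian: bool) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let bytes: [u8; 4] = buf.get(offset..end)?.try_into().ok()?;
    Some(if big_endian {
        u32::from_be_bytes(bytes)
    } else {
        u32::from_le_bytes(bytes)
    })
}
```

The bytes `M` `Z` (0x4d 0x5a) decode to 0x5a4d as a leshort, which is why the DOS rule above compares against 0x5a4d rather than 0x4d5a.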
Floating-Point Types
| Type | Size | Endianness | IEEE 754 |
|---|---|---|---|
| float | 4 bytes | native | 32-bit |
| befloat | 4 bytes | big-endian | 32-bit |
| lefloat | 4 bytes | little-endian | 32-bit |
| double | 8 bytes | native | 64-bit |
| bedouble | 8 bytes | big-endian | 64-bit |
| ledouble | 8 bytes | little-endian | 64-bit |
Floating-point types follow IEEE 754 standard. Unlike integer types, float types do not have signed or unsigned variants (the IEEE 754 format handles sign internally).
Examples:
0 lefloat =3.14159 File with float value pi
0 bedouble >1.0 Double value greater than 1.0
Float comparison behavior:
- Equality: Uses epsilon-aware comparison (f64::EPSILON tolerance)
- Ordering: Uses IEEE 754 semantics via partial_cmp
- NaN: NaN != NaN, comparisons with NaN always return false
- Infinity: Positive and negative infinity are properly ordered
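The comparison rules listed above can be sketched with the standard library alone. The helper names are hypothetical, and treating f64::EPSILON as an absolute tolerance is an assumption of this sketch; the library may apply the tolerance differently.

```rust
use std::cmp::Ordering;

/// Epsilon-aware equality. Assumption: f64::EPSILON is used as an absolute
/// tolerance. The `a == b` shortcut makes equal infinities compare equal.
fn float_eq(a: f64, b: f64) -> bool {
    a == b || (a - b).abs() <= f64::EPSILON
}

/// Ordering via partial_cmp: any comparison involving NaN yields None,
/// so the test is false.
fn float_gt(a: f64, b: f64) -> bool {
    a.partial_cmp(&b) == Some(Ordering::Greater)
}
```

Under these definitions `float_eq(0.1 + 0.2, 0.3)` holds even though the two values differ in their last bits, while `float_eq(f64::NAN, f64::NAN)` does not.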
Date/Timestamp Types
| Type | Size | Endianness | UTC/Local | Description |
|---|---|---|---|---|
| date | 4 bytes | native | UTC | 32-bit Unix timestamp (signed seconds since epoch), formatted as UTC |
| ldate | 4 bytes | native | Local | 32-bit Unix timestamp, formatted as local time |
| bedate | 4 bytes | big-endian | UTC | 32-bit Unix timestamp, big-endian byte order, UTC |
| beldate | 4 bytes | big-endian | Local | 32-bit Unix timestamp, big-endian byte order, local time |
| ledate | 4 bytes | little-endian | UTC | 32-bit Unix timestamp, little-endian byte order, UTC |
| leldate | 4 bytes | little-endian | Local | 32-bit Unix timestamp, little-endian byte order, local time |
| qdate | 8 bytes | native | UTC | 64-bit Unix timestamp (signed seconds since epoch), formatted as UTC |
| qldate | 8 bytes | native | Local | 64-bit Unix timestamp, formatted as local time |
| beqdate | 8 bytes | big-endian | UTC | 64-bit Unix timestamp, big-endian byte order, UTC |
| beqldate | 8 bytes | big-endian | Local | 64-bit Unix timestamp, big-endian byte order, local time |
| leqdate | 8 bytes | little-endian | UTC | 64-bit Unix timestamp, little-endian byte order, UTC |
| leqldate | 8 bytes | little-endian | Local | 64-bit Unix timestamp, little-endian byte order, local time |
Timestamp values are formatted as strings matching the output format of file(1): “Www Mmm DD HH:MM:SS YYYY”
Examples:
# Match a timestamp field equal to the Unix epoch
0 date =0 File created at epoch
# Check timestamp in file header (big-endian)
8 bedate >946684800 File created after 2000-01-01
# 64-bit timestamp (little-endian, local time)
16 leqldate x \b, timestamp %s
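For the UTC variants, the “Www Mmm DD HH:MM:SS YYYY” layout can be produced from the raw seconds without any external crate. The sketch below uses the well-known days-to-civil-date arithmetic; `format_utc` is a hypothetical name, and padding the day with a space (ctime style) is an assumption, since the layout above does not say how single-digit days are padded. Local-time variants would additionally need the system time zone and are not covered.

```rust
/// Format Unix seconds as "Www Mmm DD HH:MM:SS YYYY" in UTC.
/// Assumption: single-digit days are padded with a space, as ctime does.
fn format_utc(secs: i64) -> String {
    const DAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    // Euclidean division keeps the time of day non-negative before 1970.
    let days = secs.div_euclid(86_400);
    let rem = secs.rem_euclid(86_400);
    let (hour, minute, second) = (rem / 3_600, rem % 3_600 / 60, rem % 60);
    // 1970-01-01 was a Thursday (index 4 when Sunday is 0).
    let weekday = (days + 4).rem_euclid(7) as usize;
    // Days-to-civil conversion over 400-year eras, with the year
    // internally starting on 1 March.
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    format!(
        "{} {} {:>2} {:02}:{:02}:{:02} {}",
        DAYS[weekday],
        MONTHS[(month - 1) as usize],
        day,
        hour,
        minute,
        second,
        year
    )
}
```

The threshold used in the bedate example, 946684800, formats as `Sat Jan  1 00:00:00 2000`.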
String Types
Match literal string data:
0 string %PDF PDF document
0 string GIF89a GIF image data
String escape sequences:
- \x00 - hex byte
- \n - newline
- \t - tab
- \\ - backslash
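Decoding these four escapes into raw bytes might look like the sketch below. `unescape` is a hypothetical helper; it handles only the escapes listed above and rejects everything else, so other forms that appear in this document's examples (such as \0, \r, \040, or an escaped space) are outside its scope.

```rust
/// Decode \xNN, \n, \t and \\ into raw bytes.
/// Returns None for a dangling backslash, bad hex digits, or any escape
/// that this sketch does not cover.
fn unescape(pattern: &str) -> Option<Vec<u8>> {
    let bytes = pattern.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'\\' {
            out.push(bytes[i]);
            i += 1;
            continue;
        }
        match *bytes.get(i + 1)? {
            b'n' => out.push(b'\n'),
            b't' => out.push(b'\t'),
            b'\\' => out.push(b'\\'),
            b'x' => {
                let hex = pattern.get(i + 2..i + 4)?;
                if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 2; // skip the two hex digits
            }
            _ => return None,
        }
        i += 2; // skip the backslash and the escape letter
    }
    Some(out)
}
```

For example, the pattern `PK\x03\x04` decodes to the four bytes 0x50 0x4b 0x03 0x04.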
Pascal String Type
Pascal string (pstring) is a length-prefixed string type. The length prefix can be 1, 2, or 4 bytes depending on the suffix flag. Unlike C strings, Pascal strings are not null-terminated.
Length Prefix Width
The default pstring type uses a 1-byte length prefix (0-255 range). Use suffix flags to specify different prefix widths:
| Suffix | Width | Endianness | Range |
|---|---|---|---|
| /B | 1 byte | N/A | 0-255 (default) |
| /H | 2 bytes | big-endian | 0-65535 |
| /h | 2 bytes | little-endian | 0-65535 |
| /L | 4 bytes | big-endian | 0-4294967295 |
| /l | 4 bytes | little-endian | 0-4294967295 |
Self-Inclusive Length (/J Flag)
The /J flag indicates that the stored length value includes the size of the length prefix itself (JPEG-style). This flag can be combined with any width variant.
Examples
Basic pstring with default 1-byte prefix:
0 pstring =JPEG JPEG image (Pascal string)
2-byte big-endian length prefix:
0 pstring/H =JPEG JPEG image (2-byte BE prefix)
4-byte little-endian length prefix:
0 pstring/l x \b, name: %s
Self-inclusive length with 2-byte big-endian prefix:
0 pstring/HJ x \b, JPEG-style length
Self-inclusive length with default 1-byte prefix:
0 pstring/J x \b, self-inclusive length
The optional max_length parameter caps the length value:
0 pstring x \b, name: %s
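Reading a length-prefixed string with the width and /J variants described above can be sketched as follows. `Prefix` and `read_pstring` are hypothetical names, and the max_length cap is not modeled.

```rust
/// Length-prefix variants: /B, /H, /h, /L, /l (hypothetical type).
#[derive(Clone, Copy)]
enum Prefix {
    OneByte,
    TwoBe,
    TwoLe,
    FourBe,
    FourLe,
}

/// Return the string bytes that follow the length prefix at `offset`.
/// With `self_inclusive` (/J), the stored length also counts the prefix.
fn read_pstring(buf: &[u8], offset: usize, prefix: Prefix, self_inclusive: bool) -> Option<&[u8]> {
    let (width, little_endian) = match prefix {
        Prefix::OneByte => (1, false),
        Prefix::TwoBe => (2, false),
        Prefix::TwoLe => (2, true),
        Prefix::FourBe => (4, false),
        Prefix::FourLe => (4, true),
    };
    let data_start = offset.checked_add(width)?;
    let raw = buf.get(offset..data_start)?;
    let mut len = 0usize;
    for i in 0..width {
        // Big-endian reads the prefix first-to-last, little-endian last-to-first.
        let b = if little_endian { raw[width - 1 - i] } else { raw[i] };
        len = (len << 8) | usize::from(b);
    }
    if self_inclusive {
        // /J: subtract the prefix size; a stored length smaller than the
        // prefix is malformed.
        len = len.checked_sub(width)?;
    }
    buf.get(data_start..data_start.checked_add(len)?)
}
```

A buffer `00 06 J P E G` read with a 2-byte big-endian prefix and /J yields `JPEG`: the stored length 6 counts the two prefix bytes plus four data bytes.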
String Flags
String flags are now implemented (issue #234, landed in PR #288), providing libmagic-compatible string comparison semantics.
| Flag | Description |
|---|---|
| /c | Case-insensitive (lowercase pattern chars trigger fold) |
| /C | Case-insensitive (uppercase pattern chars trigger fold) |
| /w | Whitespace-optional (pattern whitespace matches zero or more) |
| /W | Whitespace-required-compact (at least one, greedy consume) |
| /T | Trim leading/trailing ASCII whitespace from pattern |
| /f | Full-word match (post-match word boundary check) |
| /b | Force binary test (hint for MIME output) |
| /t | Force text test (hint for MIME output) |
Note: /c and /C are asymmetric — the pattern character controls fold direction. With /c, only lowercase pattern chars cause the file byte to be folded to lowercase. With /C, only uppercase pattern chars cause the file byte to be folded to uppercase. See GOTCHAS section S6.5 for details on mixed-case behavior. /B (uppercase) is not a string flag; it is reserved for pstring length-width specification and is rejected on string types.
Examples:
# Case-insensitive match
0 string/c <!doctype HTML document
# Whitespace-optional (matches "ab", "a b", "a  b")
0 string/w a\ b Pattern with flexible whitespace
# Combined flags
0 string/cw <!doctype\ html> HTML document (case and space insensitive)
# Full-word boundary check
0 string/f int C int keyword (not "integer")
# Trim leading/trailing whitespace from the pattern (`/T` = STRING_TRIM)
0 string/T " hello " Hello marker (matches "hello" without surrounding spaces)
# Binary-mode hint (`/b` = STRING_BINTEST) -- parsed and stored; MIME-output
# wiring deferred to the `!:mime` evaluation work
24 string/b FTCOMP FTCOMP compressed archive
# Text-mode hint (`/t` = STRING_TEXTTEST) -- parsed and stored; MIME-output
# wiring deferred to the `!:mime` evaluation work
0 string/t #!/bin/sh POSIX shell script text
Note on /T empty patterns: string/T " " trims to an empty pattern. The evaluator treats this as no-match (with a warn! log) rather than letting it silently match every file. Fix the rule.
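The asymmetry of /c and /C described in the note above can be sketched as a per-byte comparison. This follows the wording of that note only; `byte_matches` and `string_matches` are hypothetical names, and the other flags are not modeled.

```rust
/// Compare one pattern byte with one file byte.
/// /c (`lower_fold`): a lowercase pattern byte folds the file byte to lowercase.
/// /C (`upper_fold`): an uppercase pattern byte folds the file byte to uppercase.
/// Any other pattern byte must match exactly.
fn byte_matches(pattern: u8, file: u8, lower_fold: bool, upper_fold: bool) -> bool {
    if lower_fold && pattern.is_ascii_lowercase() {
        return file.to_ascii_lowercase() == pattern;
    }
    if upper_fold && pattern.is_ascii_uppercase() {
        return file.to_ascii_uppercase() == pattern;
    }
    file == pattern
}

/// Match `pattern` against the start of `buf` under the given fold flags.
fn string_matches(buf: &[u8], pattern: &[u8], lower_fold: bool, upper_fold: bool) -> bool {
    buf.len() >= pattern.len()
        && pattern
            .iter()
            .zip(buf)
            .all(|(&p, &f)| byte_matches(p, f, lower_fold, upper_fold))
}
```

Under /c, the pattern `<!doctype` matches a file starting with `<!DOCTYPE`, but the pattern `<!DOCTYPE` does not match a file starting with `<!doctype`, because uppercase pattern bytes are compared exactly.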
Search Flags
Search flags are specified as /flags after the range in search types: search/N/<flags>. libmagic-rs implements the full search-type flag semantics (issue #235).
Search flags share most semantics with string flags. Eight flags (/c, /C, /w, /W, /T, /f, /t, /b) carry the same comparison-altering or metadata-hint meanings as their string-type counterparts. The ninth flag, /s, is search-specific: it controls where the previous-match anchor lands for relative-offset children.
| Flag | Description |
|---|---|
| /s | Start anchor: sets the previous-match anchor to match-START instead of match-END for relative-offset children |
| /c | Case-insensitive (lowercase): pattern lowercase letters match both cases in buffer |
| /C | Case-insensitive (uppercase): pattern uppercase letters match both cases in buffer |
| /w | Optional whitespace: pattern whitespace matches zero-or-more buffer whitespace |
| /W | Compact whitespace: pattern whitespace requires ≥1 buffer whitespace |
| /T | Trim whitespace: leading/trailing whitespace in pattern is ignored |
| /f | Full word: post-match word boundary check (same semantics as string type) |
| /t | Text test hint: MIME output hint (parsed, no comparison effect) |
| /b | Binary test hint: MIME output hint (parsed, no comparison effect) |
Performance note: Flags /c, /C, /w, /W, /T, /f force byte-by-byte comparison, while /s, /t, /b preserve the fast SIMD-accelerated search path (via memchr::memmem::find).
/s anchor semantics: By default, a search match advances the previous-match anchor to the byte just past the matched pattern (match-END). With /s, the anchor lands on the first byte of the match (match-START). This is required for file formats that place magic signatures in trailers or use relative-offset children that reference the signature start (TGA footer, sfnt name table).
Examples:
# TGA footer with start-anchor (images:114)
# The magic string "TRUEVISION-XFILE.\0" is in the trailer; /s lets
# relative-offset children resolve against the signature's start position
0 search/4261301/s TRUEVISION-XFILE.\0 TGA image data
>&-8 lelong x \b, offset %d
# Python shebang with optional whitespace (commands:20)
# Pattern has one space; /w allows zero or more whitespace in the file
0 search/1/w #!\040/usr/bin/python Python script text executable
# BinHex with binary hint (macintosh:17)
# /b is parsed and stored; comparison-time MIME effect deferred to !:mime
0 search/2652/b (This\ file\ must\ be\ converted\ with\ BinHex BinHex binary text
Note on /T empty patterns: the /N range is mandatory for search types, so a rule such as search/256/T " " (or any search/N/T with a whitespace-only pattern) trims to an empty pattern. The evaluator treats that as no-match (with a warn! log) rather than letting it silently match every offset; the rule itself should be corrected. A bare search/T never reaches the evaluator: it is rejected as a parse error before the trim runs.
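The difference between the default anchor and the /s anchor can be sketched with a naive windowed search. `search` here is a hypothetical function, not the memchr-based path the library uses, and it assumes the range counts candidate start positions beginning at the rule's offset.

```rust
/// Search for `pattern` starting at `start`, trying at most `range` candidate
/// start positions. Returns (match_start, anchor), where the anchor is the
/// match start with /s and the byte just past the match otherwise.
/// Empty patterns never match.
fn search(
    buf: &[u8],
    start: usize,
    range: usize,
    pattern: &[u8],
    start_anchor: bool,
) -> Option<(usize, usize)> {
    if pattern.is_empty() {
        return None;
    }
    // Last position at which the whole pattern still fits in the buffer.
    let last_start = buf.len().checked_sub(pattern.len())?;
    let end = start.saturating_add(range).min(last_start + 1);
    let found = (start..end).find(|&i| &buf[i..i + pattern.len()] == pattern)?;
    let anchor = if start_anchor {
        found
    } else {
        found + pattern.len()
    };
    Some((found, anchor))
}
```

For a TGA-style trailer whose signature starts at byte 18 of a 36-byte buffer, the anchor is 18 with /s and 36 without it, so a child at relative offset -8 reads byte 10 only in the /s case.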
Operators
Comparison Operators
| Operator | Description | Example |
|---|---|---|
| = | Equal (default) | 0 long =0xcafebabe |
| != | Not equal | 4 byte !=0 |
| > | Greater than | 8 long >1000 |
| < | Less than | 8 long <100 |
| >= | Greater than or equal | 8 long >=1000 |
| <= | Less than or equal | 8 long <=100 |
| & | Bitwise AND | 4 byte &0x80 |
| ^ | Bitwise XOR (not yet implemented) | 4 byte ^0xff |
Bitwise AND with Mask
Test specific bits:
# Check if bit 7 is set
4 byte &0x80 (compressed)
# Check if lower nibble is 0x0f
4 byte &0x0f=0x0f (all bits set)
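The two mask forms above correspond to two different tests, sketched here with hypothetical helper names: a bare mask asks whether any masked bit is set, while the mask-and-compare form asks whether the masked bits equal a given value.

```rust
/// `&mask`: true when at least one of the masked bits is set.
fn any_bits_set(value: u64, mask: u64) -> bool {
    (value & mask) != 0
}

/// `&mask=expected`: true when the masked bits equal `expected` exactly.
fn masked_equals(value: u64, mask: u64, expected: u64) -> bool {
    (value & mask) == expected
}
```

For the byte 0x37, `any_bits_set(0x37, 0x0f)` is true because some low bits are set, but `masked_equals(0x37, 0x0f, 0x0f)` is false because the low nibble is 0x7, not 0xf.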
Negation
Prefix operator with ! for negation:
# Match if NOT equal to zero
4 long !0 (non-zero)
Values
Numeric Values
# Decimal
0 long 1234
# Hexadecimal
0 long 0x4d5a
# Octal
0 byte 0177
String Values
# Plain string
0 string RIFF
# With escape sequences
0 string PK\x03\x04
# UTF-16LE byte-order mark (as raw bytes)
0 string \xff\xfe
Special Values
| Value | Description |
|---|---|
| x | Match any value (always true) |
Example:
0 string PK ZIP archive
>4 short x version %d
The x value matches anything and %d formats the matched value.
Nested Rules
Rules can be nested to create hierarchical matches. Deeper matches indicate more specific identification.
Indentation Levels
Use > prefix for nested rules:
0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
>5 byte 1 LSB
>5 byte 2 MSB
Evaluation:
- Check offset 0 for the ELF magic
- If it matches, check offset 4 for the bit size
- Independently of the offset-4 result, check offset 5 for the endianness (all four child rules are siblings under the same parent)
Multiple Nesting Levels
0 string \x7fELF ELF
>4 byte 2 64-bit
>>5 byte 1 LSB
>>>16 short 2 (executable)
>>>16 short 3 (shared object)
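Nested evaluation amounts to a depth-first walk in which children are visited only when their parent matched. The sketch below models byte-string tests only; `Rule`, `rule`, and `evaluate` are hypothetical names and this is not the library's evaluator.

```rust
/// A rule that compares fixed bytes at an absolute offset (hypothetical type).
struct Rule {
    offset: usize,
    expected: &'static [u8],
    message: &'static str,
    children: Vec<Rule>,
}

/// Convenience constructor for building rule trees.
fn rule(
    offset: usize,
    expected: &'static [u8],
    message: &'static str,
    children: Vec<Rule>,
) -> Rule {
    Rule { offset, expected, message, children }
}

/// Collect the messages of all matching rules, depth first.
/// Children are evaluated only when their parent matched.
fn evaluate(rules: &[Rule], buf: &[u8], out: &mut Vec<&'static str>) {
    for rule in rules {
        let end = rule.offset + rule.expected.len();
        if buf.get(rule.offset..end) == Some(rule.expected) {
            out.push(rule.message);
            evaluate(&rule.children, buf, out);
        }
    }
}
```

With the three-level ELF tree above, a 64-bit little-endian header produces `ELF`, `64-bit`, `LSB`, while a 32-bit header produces only `ELF` because the endianness rule sits beneath the 64-bit rule.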
Continuation Messages
Use \b (backspace) to suppress space before message:
0 string GIF8 GIF image data
>4 string 7a \b, version 87a
>4 string 9a \b, version 89a
Output: GIF image data, version 89a
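Joining messages with \b handling can be sketched as below. `join_messages` is a hypothetical helper, and it assumes the message still contains the literal two-character sequence `\b` from the magic source rather than an already decoded backspace byte.

```rust
/// Join rule messages with single spaces, except that a message starting
/// with the literal two characters `\b` is appended without the space.
fn join_messages(parts: &[&str]) -> String {
    let mut out = String::new();
    for part in parts {
        if let Some(rest) = part.strip_prefix("\\b") {
            // Continuation: drop the marker and append directly.
            out.push_str(rest);
        } else {
            if !out.is_empty() {
                out.push(' ');
            }
            out.push_str(part);
        }
    }
    out
}
```

Joining `GIF image data` with `\b, version 89a` gives `GIF image data, version 89a`, whereas plain messages such as `ELF`, `64-bit`, `LSB` are joined as `ELF 64-bit LSB`.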
Examples
ELF Executable
# ELF (Executable and Linkable Format)
0 string \x7fELF ELF
>4 byte 1 32-bit
>4 byte 2 64-bit
>5 byte 1 LSB
>5 byte 2 MSB
>16 leshort 2 (executable)
>16 leshort 3 (shared object)
ZIP Archive
# ZIP archive
0 string PK\x03\x04 ZIP archive data
>4 leshort x \b, version field %d to extract
>6 leshort &0x0001 \b, encrypted
>6 leshort &0x0008 \b, with data descriptor
JPEG Image
# JPEG
0 string \xff\xd8\xff JPEG image data
>3 byte 0xe0 \b, JFIF standard
>3 byte 0xe1 \b, Exif format
PDF Document
# PDF
0 string %PDF- PDF document
>5 string 1. \b, version 1.x
>5 string 2. \b, version 2.x
PE Executable
# DOS MZ executable with PE header
0 string MZ DOS executable
>0x3c lelong >0 (PE offset)
>(0x3c.l) string PE\0\0 PE executable
GZIP Compressed
# GZIP
0 string \x1f\x8b gzip compressed data
>2 byte 8 \b, deflated
>3 byte &0x01 \b, ASCII text
>3 byte &0x02 \b, with header CRC
>3 byte &0x04 \b, with extra field
>3 byte &0x08 \b, with original name
>3 byte &0x10 \b, with comment
PNG Image
# PNG
0 string \x89PNG\r\n\x1a\n PNG image data
>16 belong x \b, %d x
>20 belong x %d
>25 byte 0 \b, grayscale
>25 byte 2 \b, RGB
>25 byte 3 \b, palette
>25 byte 4 \b, grayscale+alpha
>25 byte 6 \b, RGBA
Floating-Point Values
# Check for specific float value
0 lefloat =3.14159 File with float value pi
# Float comparison
0 float >1.0 Float value greater than 1.0
# Double precision
0 bedouble =0.45455 Double value 0.45455
Meta-types / Control Directives
Meta-types are pseudo-types that do not read bytes from the buffer. Instead, they control the evaluation flow: defining named subroutines, invoking them, providing fallbacks when no sibling matched, resetting per-level match state, re-applying the entire rule database at a resolved offset, or emitting the resolved offset itself.
| Keyword | Syntax | Description |
|---|---|---|
| name <id> | 0 name part2 | Defines a named subroutine block; children are the subroutine body |
| use <id> | >0 use part2 | Invokes a named subroutine at the resolved offset |
| default | 0 default x Fallback | Fires only when no sibling at the same level has matched |
| clear | 0 clear | Resets the per-level sibling-matched flag |
| indirect | 8 indirect x | Re-applies the full rule database at the resolved offset |
| offset | 0 offset x at_offset %lld | Emits the resolved file position as a Value::Uint for printf-style substitution |
name and use — Named Subroutines
name <id> defines a named subroutine block at the top level; its children are the subroutine body. use <id> invokes that subroutine at a given offset.
# Define a reusable subroutine
0 name part2
>0 search/64 ABC found_ABC
>>&0 byte x followed_by 0x%x
# Top-level rule that invokes the subroutine
0 string TEST Testfmt
>0 use part2
>64 use part2
Top-level name blocks are hoisted out of the flat rule list at parse time into a NameTable keyed by identifier. Duplicate names retain the first definition and emit a warning. name rules nested inside another rule’s children are not well-defined in magic(5) and are scrubbed at load time.
default — Fallback Rule
A default rule at a given level fires only when none of its siblings at the same level have matched. The value column is conventionally x (any-value) and is otherwise ignored.
0 byte 0xAA Real-Match
0 default x DEFAULT-FALLBACK
Against a buffer starting with 0xAA, only Real-Match fires. Against a buffer starting with any other byte, DEFAULT-FALLBACK fires.
clear — Reset Sibling-Matched Flag
A clear directive resets the per-level “sibling matched” flag, so a subsequent default at the same level can fire again even after an earlier sibling matched. Pair with EvaluationConfig::with_stop_at_first_match(false) to walk all top-level siblings.
0 byte 0xAA Match-A
0 default x DEFAULT-SKIPPED
0 clear
0 default x DEFAULT-FIRES
Against a buffer starting with 0xAA: Match-A fires, DEFAULT-SKIPPED is suppressed (a sibling matched), clear resets the flag, and DEFAULT-FIRES fires.
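The interplay of default and clear can be sketched as a single flag per nesting level. `Step` and `eval_level` are hypothetical names; the sketch walks every sibling (as with stop-at-first-match disabled) and assumes that a default which fires also sets the flag.

```rust
/// One sibling at a given nesting level (hypothetical type).
enum Step {
    /// An ordinary rule, with its test result precomputed for the sketch.
    Test { matched: bool, message: &'static str },
    /// A `default` rule.
    Default(&'static str),
    /// A `clear` directive.
    Clear,
}

/// Walk all siblings at one level and collect the messages that fire.
/// Assumption: a `default` that fires also sets the sibling-matched flag.
fn eval_level(steps: &[Step]) -> Vec<&'static str> {
    let mut sibling_matched = false;
    let mut out = Vec::new();
    for step in steps {
        match step {
            Step::Test { matched, message } => {
                if *matched {
                    out.push(*message);
                    sibling_matched = true;
                }
            }
            Step::Default(message) => {
                if !sibling_matched {
                    out.push(*message);
                    sibling_matched = true;
                }
            }
            Step::Clear => {
                sibling_matched = false;
            }
        }
    }
    out
}
```

Feeding it the four rules above with the byte test matching yields `Match-A` followed by `DEFAULT-FIRES`, mirroring the walkthrough.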
indirect — Re-apply Root Rules at a Resolved Offset
An indirect rule resolves its offset, slices the buffer at that point, and re-applies the full rule database against the sub-buffer. Recursion is bounded by EvaluationConfig::max_recursion_depth.
0 byte 0x42 Inner-Match
8 indirect x
Against a 16-byte buffer with buf[8] = 0x42: the top-level byte rule at offset 0 does not match, and the indirect rule re-applies the root rules at offset 8 — where buf[8] = 0x42 matches the inner byte rule, producing Inner-Match.
Best Practices
1. Order Rules by Specificity
Put more specific rules first:
# Good: Specific before general
0 string PK\x03\x04 ZIP archive
0 string PK (generic PK signature)
# Bad: General catches all
# (the ZIP archive rule below is never reached)
0 string PK (generic PK signature)
0 string PK\x03\x04 ZIP archive
2. Use Nested Rules for Details
# Good: Hierarchical structure
0 string \x7fELF ELF
>4 byte 2 64-bit
>>5 byte 1 LSB
# Bad: Flat rules
0 string \x7fELF ELF
4 byte 2 64-bit
5 byte 1 LSB
3. Document Complex Rules
# JPEG with Exif metadata
# The Exif APP1 marker (0xFFE1) contains camera metadata
0 string \xff\xd8\xff JPEG image data
>3 byte 0xe1 \b, Exif format
4. Test Edge Cases
Consider:
- Empty files
- Truncated files
- Minimum valid file size
- Maximum offset values
5. Use Appropriate Types
# Good: Match exact size needed
0 leshort 0x5a4d DOS executable
# Bad: Over-reading
0 lelong x (reads 4 bytes when 2 needed)
6. Handle Endianness Explicitly
# Good: Explicit endianness
0 lelong 0xcafebabe (little-endian)
0 belong 0xcafebabe (big-endian)
# Risky: Native endianness
0 long 0xcafebabe (platform-dependent)
Supported Features
Currently Supported
- Absolute offsets
- Relative offsets
- Indirect offsets (basic)
- Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
- Float and double types (32-bit and 64-bit IEEE 754 floating-point)
- Date and qdate types (32-bit and 64-bit Unix timestamps)
- String and pstring types (null-terminated and length-prefixed strings)
- Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
- Bitwise AND operator
- Nested rules
- Comments
Not Yet Supported
- Regex patterns
- 128-bit integer types
Recently Added
- Strength modifiers: The !:strength directive for adjusting rule priority
- 64-bit integers: quad type family (quad, uquad, lequad, ulequad, bequad, ubequad)
- Floating-point types: float and double type families (float, befloat, lefloat, double, bedouble, ledouble) with IEEE 754 semantics and epsilon-aware equality
Troubleshooting
Rule Not Matching
- Check offset is correct (0-indexed)
- Verify endianness matches file format
- Test with hexdump -C file | head
- Ensure no conflicting rules
Unexpected Results
- Check rule order (first match wins)
- Verify nested rule levels
- Test with simpler rules first
Performance Issues
- Avoid unnecessary string searches
- Use specific offsets over searches
- Order rules by likelihood of match
See Also
- magic(5) - Original magic format
- file(1) - The file command
- API Reference - libmagic-rs API documentation