Format specifications
Overview
This section provides detailed technical specifications for the Microsoft compression formats supported by Cabriolet. These documents go beyond usage guides to explain the low-level binary structures, data layouts, and algorithms that define each format.
Purpose
Format specifications are essential for:
- Library developers implementing or extending format support
- Security researchers analyzing file format vulnerabilities
- Forensics analysts examining corrupted or malformed archives
- Archivists understanding format preservation requirements
- Advanced users troubleshooting format-specific issues
Available specifications
CAB File Format Specification
Complete technical specification of the Microsoft Cabinet file format, including:
- Binary structure definitions (CFHEADER, CFFOLDER, CFFILE)
- Multi-volume spanning mechanisms
- Reserved and future-use fields
- Signature and checksum algorithms
- Cabinet set handling
Compression Type Codes
Comprehensive reference for compression algorithm identifiers:
- Compression type enumeration
- Algorithm-specific flags and parameters
- Legacy and deprecated compression types
- Vendor extensions and custom types
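As an illustration of what "algorithm-specific flags and parameters" means in practice, the sketch below splits a 16-bit CAB `typeCompress` value into its parts. It assumes the bit layout used by the public Cabinet SDK headers (method in bits 0-3, Quantum level in bits 4-7, window size in bits 8-12). The names `CompressionType` and `decode_compression_type` are hypothetical and are not part of Cabriolet's API.

```python
from dataclasses import dataclass

# Low 4 bits of typeCompress select the algorithm (assumed Cabinet SDK values).
COMPRESSION_NAMES = {0: "None", 1: "MSZIP", 2: "Quantum", 3: "LZX"}


@dataclass
class CompressionType:
    method: int       # algorithm identifier (bits 0-3)
    name: str         # human-readable name, or "Unknown"
    level: int        # bits 4-7, meaningful for Quantum only
    window_bits: int  # bits 8-12, meaningful for Quantum and LZX


def decode_compression_type(type_compress: int) -> CompressionType:
    """Split a 16-bit typeCompress value into method and parameters."""
    if not 0 <= type_compress <= 0xFFFF:
        raise ValueError("typeCompress must fit in 16 bits")
    method = type_compress & 0x000F
    return CompressionType(
        method=method,
        name=COMPRESSION_NAMES.get(method, "Unknown"),
        level=(type_compress >> 4) & 0x0F,
        window_bits=(type_compress >> 8) & 0x1F,
    )
```

Unknown method values are reported rather than rejected, so a caller can decide how to treat vendor extensions.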
Windows Help File Format Specification
Technical specification of the Windows Help (WinHelp) format, including:
- WinHelp 3.x and 4.x file structures
- Internal file system (|SYSTEM, |TOPIC)
- Zeck LZ77 compression algorithm
- B-tree index structures
- Phrase replacement compression
File Attributes Reference
Detailed documentation of file attribute flags:
- MS-DOS attribute flags
- Windows file attributes
- Special attribute combinations
- Platform-specific attributes
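A minimal sketch of decoding an attribute word into flag names follows. The bit values are the conventional MS-DOS attribute bits plus the two CAB-specific bits from the Cabinet SDK, stated here as an assumption to be checked against the File Attributes Reference. The names `ATTRIBUTE_FLAGS` and `decode_attributes` are hypothetical.

```python
from typing import List

# Assumed bit assignments for the CFFILE attribute word.
ATTRIBUTE_FLAGS = {
    0x01: "READONLY",
    0x02: "HIDDEN",
    0x04: "SYSTEM",
    0x20: "ARCHIVE",
    0x40: "EXEC",         # CAB-specific: execute after extraction
    0x80: "NAME_IS_UTF",  # CAB-specific: file name is UTF-encoded
}


def decode_attributes(attribs: int) -> List[str]:
    """Return the names of all known flags set in `attribs`, lowest bit first."""
    return [name for bit, name in sorted(ATTRIBUTE_FLAGS.items()) if attribs & bit]
```

Bits that are not in the table are ignored here. A stricter reader might report them instead.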
Reading these specifications
Structure notation
Binary structures use the following conventions:
    STRUCTURE_NAME {
        Type field_name;      // Offset: 0x00, Size: bytes, Description
        Type[n] array_field;  // Array with n elements
        Type :bits;           // Bitfield with specified bit count
    }

Types:
- u8, u16, u32, u64: Unsigned integers (8, 16, 32, 64-bit)
- i8, i16, i32, i64: Signed integers
- char[n]: Fixed-length character array
- byte[n]: Fixed-length byte array
- STRUCTURE: Nested structure reference
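To show how this notation maps onto parsing code, here is a minimal sketch that translates a list of (type, name) pairs into a Python `struct` format string and unpacks it. The field list is the fixed 36-byte part of CFHEADER as given in the public Cabinet SDK. It is included for illustration only and should be checked against the CAB File Format Specification. The helpers `build_format` and `parse_structure` are hypothetical, and bitfields and nested structures are not handled.

```python
import re
import struct

# Notation type name -> struct format character.
TYPE_CODES = {"u8": "B", "u16": "H", "u32": "I", "u64": "Q",
              "i8": "b", "i16": "h", "i32": "i", "i64": "q"}

# Fixed part of CFHEADER (assumed from the Cabinet SDK), 36 bytes in total.
CFHEADER_FIELDS = [
    ("char[4]", "signature"), ("u32", "reserved1"), ("u32", "cbCabinet"),
    ("u32", "reserved2"), ("u32", "coffFiles"), ("u32", "reserved3"),
    ("u8", "versionMinor"), ("u8", "versionMajor"), ("u16", "cFolders"),
    ("u16", "cFiles"), ("u16", "flags"), ("u16", "setID"), ("u16", "iCabinet"),
]


def build_format(fields):
    """Translate (type, name) pairs into a little-endian struct format."""
    parts = ["<"]  # little-endian, no alignment padding
    for type_name, _ in fields:
        array = re.fullmatch(r"(char|byte)\[(\d+)\]", type_name)
        if array:
            parts.append(array.group(2) + "s")  # fixed-length byte string
        else:
            parts.append(TYPE_CODES[type_name])
    return "".join(parts)


def parse_structure(fields, data, offset=0):
    """Unpack one structure from `data`, returning a dict keyed by field name."""
    fmt = build_format(fields)
    size = struct.calcsize(fmt)
    if offset < 0 or offset + size > len(data):
        raise ValueError("structure extends past end of data")
    values = struct.unpack_from(fmt, data, offset)
    return dict(zip((name for _, name in fields), values))
```

A caller would typically check `parse_structure(CFHEADER_FIELDS, data)["signature"] == b"MSCF"` before trusting any other field.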
Byte order
Unless otherwise specified:
- Little-endian byte order is used (Intel x86 convention)
- Multi-byte integers have least significant byte first
- Example: 0x12345678 is stored as 78 56 34 12
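The byte order rule can be checked with Python's standard `struct` module. This is a small sketch. The helper names `read_u32_le` and `write_u32_le` are made up for the example.

```python
import struct


def read_u32_le(data: bytes, offset: int = 0) -> int:
    """Read an unsigned 32-bit little-endian integer at `offset`."""
    if offset < 0 or offset + 4 > len(data):
        raise ValueError("not enough data for a u32")
    # "<" selects little-endian without padding; "I" is a 4-byte unsigned int.
    return struct.unpack_from("<I", data, offset)[0]


def write_u32_le(value: int) -> bytes:
    """Encode an unsigned 32-bit integer least significant byte first."""
    return struct.pack("<I", value)


stored = write_u32_le(0x12345678)
print(stored.hex(" "))           # 78 56 34 12
print(hex(read_u32_le(stored)))  # 0x12345678
```

The explicit length check matters because slicing a short buffer in Python does not raise on its own.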
Validation and testing
When implementing based on these specifications:
- Test with known-good files from official Microsoft tools
- Test edge cases including maximum sizes and boundary conditions
- Handle malformed input gracefully with appropriate error messages
- Verify checksums when specified in the format
- Check version compatibility for format variants
Format version history
Common implementation pitfalls
Off-by-one errors
- Array indices vs. counts (zero-based vs. one-based)
- Inclusive vs. exclusive range boundaries
- String null terminators in size calculations
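The null-terminator pitfall is easiest to see in code. The sketch below reads a null-terminated string and returns the offset of the byte after the terminator, so the caller cannot forget to count it. The function name `read_cstring` and the default `max_len` of 256 are assumptions for the example, not values taken from a specification.

```python
def read_cstring(data: bytes, offset: int, max_len: int = 256):
    """Return (string_bytes, next_offset) for a null-terminated string.

    `string_bytes` excludes the terminator; `next_offset` points just past it.
    """
    if not 0 <= offset <= len(data):
        raise ValueError("offset outside data")
    # The search window is exclusive at the top, so add 1 to leave room for
    # a terminator that follows a string of exactly max_len bytes.
    limit = min(len(data), offset + max_len + 1)
    end = data.find(b"\x00", offset, limit)
    if end == -1:
        raise ValueError("string is unterminated or longer than max_len")
    return data[offset:end], end + 1
```

For `b"a.txt\x00b.txt\x00"`, the first call returns `(b"a.txt", 6)`: five string bytes plus one terminator, which is where the second name begins.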
Integer overflow
- Size field multiplication (e.g., count × item_size)
- Offset calculations exceeding 32-bit limits
- Compressed vs. uncompressed size comparisons
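A sketch of a guarded size calculation follows. Python integers do not wrap, so the 32-bit limit is checked explicitly. The helper `wrapped_u32_product` only demonstrates what a fixed-width multiplication would produce. All names here are hypothetical.

```python
U32_MAX = 0xFFFFFFFF


def wrapped_u32_product(count: int, item_size: int) -> int:
    """Show the result a 32-bit unsigned multiplication would yield."""
    return (count * item_size) & U32_MAX


def checked_table_end(count: int, item_size: int, offset: int, file_size: int) -> int:
    """Return the end offset of a table of `count` items, or raise ValueError."""
    if count < 0 or item_size < 0 or offset < 0:
        raise ValueError("negative size or offset")
    total = count * item_size  # exact, because Python ints are unbounded
    if total > U32_MAX:
        raise ValueError("table size exceeds the 32-bit range")
    end = offset + total
    if end > U32_MAX:
        raise ValueError("end offset exceeds the 32-bit range")
    if end > file_size:
        raise ValueError("table extends past end of file")
    return end
```

With `count = 0x10000` and `item_size = 0x10000`, the wrapped product is 0, which would pass a naive bounds check. `checked_table_end` rejects the same input.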
Standards and references
Official documentation
- Microsoft Cabinet SDK documentation (1997)
- Microsoft Compression API documentation
- Windows Platform SDK specifications
Reverse engineering sources
- libmspack project (https://www.cabextract.org.uk/libmspack/)
- 7-Zip format documentation
- Wine project cabinet implementation
Contributing
If you find errors or omissions in these specifications:
- Check against reference implementations
- Test with multiple file samples
- Document the issue clearly
- Submit a detailed bug report or pull request
See Contributing Guide for details.
Format support matrix
| Format | Specification | Compression Algorithms | Special Features |
|---|---|---|---|
| CAB | Complete | None, MSZIP, Quantum, LZX | Multi-volume, reserves, signatures |
| CHM | Partial | LZX | Compound document, sections |
| HLP | Partial | LZ77 variant | Topic-based, phrases |
| KWAJ | Basic | LZSS, MSZIP, Quantum | Single file, header variants |
| LIT | Basic | DES + LZ | Encryption, DRM |
| OAB | Basic | LZ variant | Address book specific |
| SZDD | Complete | LZSS | Simple header |