Format specifications

Overview

This section provides detailed technical specifications for the Microsoft compression formats supported by Cabriolet. These documents go beyond usage guides to explain the low-level binary structures, data layouts, and algorithms that define each format.

Purpose

Format specifications are essential for:

  • Library developers implementing or extending format support

  • Security researchers analyzing file format vulnerabilities

  • Forensics analysts examining corrupted or malformed archives

  • Archivists understanding format preservation requirements

  • Advanced users troubleshooting format-specific issues

Available specifications

CAB File Format Specification

Complete technical specification of the Microsoft Cabinet file format, including:

  • Binary structure definitions (CFHEADER, CFFOLDER, CFFILE)

  • Multi-volume spanning mechanisms

  • Reserved and future-use fields

  • Signature and checksum algorithms

  • Cabinet set handling

Compression Type Codes

Comprehensive reference for compression algorithm identifiers:

  • Compression type enumeration

  • Algorithm-specific flags and parameters

  • Legacy and deprecated compression types

  • Vendor extensions and custom types

Windows Help File Format Specification

Technical specification of the Windows Help (WinHelp) format, including:

  • WinHelp 3.x and 4.x file structures

  • Internal file system (|SYSTEM, |TOPIC)

  • Zeck LZ77 compression algorithm

  • B-tree index structures

  • Phrase replacement compression

File Attributes Reference

Detailed documentation of file attribute flags:

  • MS-DOS attribute flags

  • Windows file attributes

  • Special attribute combinations

  • Platform-specific attributes

Reading these specifications

Structure notation

Binary structures use the following conventions:

STRUCTURE_NAME {
    Type        field_name;     // Offset: 0x00, Size: bytes, Description
    Type[n]     array_field;    // Array with n elements
    Type        :bits;          // Bitfield with specified bit count
}

Types:

  • u8, u16, u32, u64 - Unsigned integers (8, 16, 32, 64-bit)

  • i8, i16, i32, i64 - Signed integers

  • char[n] - Fixed-length character array

  • byte[n] - Fixed-length byte array

  • STRUCTURE - Nested structure reference
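
As an illustration of how this notation maps to code, the following Python sketch decodes the fixed 36-byte portion of a CFHEADER with the standard struct module. The field names follow Microsoft's published Cabinet format documentation. The CFHeader tuple and the parse_cfheader function are hypothetical names chosen for this example and are not part of Cabriolet's API.

```python
import struct
from collections import namedtuple

# Fixed part of CFHEADER: char[4] signature, five u32 fields, two u8 version
# bytes, then five u16 fields. "<" selects little-endian with no padding.
CFHEADER_FORMAT = "<4sIIIIIBBHHHHH"
CFHEADER_SIZE = struct.calcsize(CFHEADER_FORMAT)  # 36 bytes

# Hypothetical container for the decoded fields (names from Microsoft's docs).
CFHeader = namedtuple("CFHeader", [
    "signature", "reserved1", "cbCabinet", "reserved2", "coffFiles",
    "reserved3", "versionMinor", "versionMajor", "cFolders", "cFiles",
    "flags", "setID", "iCabinet",
])


def parse_cfheader(data: bytes) -> CFHeader:
    """Decode the fixed CFHEADER fields from the start of `data`."""
    if len(data) < CFHEADER_SIZE:
        raise ValueError(
            f"CFHEADER needs {CFHEADER_SIZE} bytes, got {len(data)}"
        )
    header = CFHeader._make(struct.unpack_from(CFHEADER_FORMAT, data, 0))
    if header.signature != b"MSCF":
        raise ValueError(f"unexpected signature {header.signature!r}")
    return header
```

Optional header fields (reserve sizes, previous/next cabinet names) follow the fixed part and depend on the flags value, so they are left out of this sketch.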

Byte order

Unless otherwise specified:

  • Little-endian byte order is used (Intel x86 convention)

  • Multi-byte integers have least significant byte first

  • Example: 0x12345678 stored as 78 56 34 12
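
The example above can be reproduced in a few lines of Python. This is only a sketch using the built-in integer conversion methods; the variable names are arbitrary.

```python
value = 0x12345678

# Encode as a 4-byte little-endian integer: least significant byte first.
stored = value.to_bytes(4, byteorder="little")
print(stored.hex(" "))  # 78 56 34 12

# Decoding with the same byte order recovers the original value.
decoded = int.from_bytes(stored, byteorder="little")
print(hex(decoded))  # 0x12345678
```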

Bit order

Bits within a field are numbered from the least significant bit (LSB, bit 0) to the most significant bit (MSB: bit 7, 15, 31, or 63, depending on field size).
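
Under this numbering, extracting a bitfield reduces to a shift and a mask on the already-decoded integer. The helper below is a minimal sketch; extract_bits and the sample value are hypothetical and do not correspond to any particular field in the specifications.

```python
def extract_bits(value: int, lsb: int, width: int) -> int:
    """Return `width` bits of `value`, starting at bit `lsb` (bit 0 = LSB)."""
    return (value >> lsb) & ((1 << width) - 1)


# Arbitrary 16-bit sample value: bits 15..12 = 1010, bits 1..0 = 11.
flags = 0b1010_0000_0000_0011

low_nibble = extract_bits(flags, 0, 4)    # 3
high_nibble = extract_bits(flags, 12, 4)  # 10 (0b1010)
top_bit = extract_bits(flags, 15, 1)      # 1
```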

Alignment

  • Structures may have padding for alignment

  • Specific alignment requirements noted in structure definitions

  • Unaligned access may be required in some formats
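
The difference between packed on-disk layouts and natively aligned in-memory layouts can be seen with Python's struct module. The record below (a u8 followed by a u32) is a hypothetical layout invented for this sketch, not one taken from the specifications.

```python
import struct

# Hypothetical on-disk record: u8 tag immediately followed by u32 length.
# "<" means little-endian, standard sizes, and no alignment padding.
PACKED = "<BI"
packed_size = struct.calcsize(PACKED)  # 5 bytes

record = bytes([0x07, 0x10, 0x00, 0x00, 0x00])
tag, length = struct.unpack(PACKED, record)  # the u32 is read at offset 1

# Native mode ("@") may insert padding so the u32 is aligned; the exact
# size is platform dependent, so it must not be used to parse file data.
native_size = struct.calcsize("@BI")
```

Using an explicit byte-order prefix for every file structure avoids depending on the host compiler's padding rules.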

Validation and testing

When implementing based on these specifications:

  1. Test with known-good files from official Microsoft tools

  2. Test edge cases including maximum sizes and boundary conditions

  3. Handle malformed input gracefully with appropriate error messages

  4. Verify checksums when specified in the format

  5. Check version compatibility for format variants
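
For point 3, one common approach is to route every read through a bounds-checked helper that reports what was being read and where. The sketch below assumes the whole input is already in memory; FormatError and read_exact are hypothetical names and not part of Cabriolet.

```python
class FormatError(ValueError):
    """Raised when input does not match the expected layout."""


def read_exact(data: bytes, offset: int, size: int, what: str) -> bytes:
    """Return exactly `size` bytes at `offset`, or raise a descriptive error."""
    if offset < 0 or size < 0 or offset + size > len(data):
        raise FormatError(
            f"{what}: need {size} bytes at offset {offset:#x}, "
            f"but input is only {len(data)} bytes long"
        )
    return data[offset:offset + size]


try:
    read_exact(b"MSCF", 0, 36, "CFHEADER")
except FormatError as exc:
    message = str(exc)
    # message names the structure, the offset, and the actual input length
```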

Format version history

CAB format versions

  • Version 1.1 (1996): Original Windows 95 format

  • Version 1.2 (1997): Added reserve areas

  • Version 1.3 (1997): Enhanced cabinet sets

Compression algorithm evolution

  • MSZIP (Type 1): Deflate variant, introduced 1996

  • Quantum (Type 2): Microsoft proprietary, introduced 1996

  • LZX (Type 3): Advanced compression, introduced 1997
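
The type numbers above can be modeled as an enumeration. The sketch below assumes, following Microsoft's Cabinet documentation, that type 0 means no compression and that the type occupies the low four bits of the folder's 16-bit compression field. CompressionType and compression_type are hypothetical names for this example.

```python
from enum import IntEnum


class CompressionType(IntEnum):
    NONE = 0     # stored without compression (assumed from Microsoft's docs)
    MSZIP = 1
    QUANTUM = 2
    LZX = 3


# Assumption: the low 4 bits of the compression field select the algorithm;
# the remaining bits carry algorithm-specific parameters.
TYPE_MASK = 0x000F


def compression_type(type_compress: int) -> CompressionType:
    """Map a raw 16-bit compression field to its algorithm.

    Raises ValueError for type codes outside the enumeration.
    """
    return CompressionType(type_compress & TYPE_MASK)
```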

Common implementation pitfalls

Off-by-one errors

  • Array indices vs. counts (zero-based vs. one-based)

  • Inclusive vs. exclusive range boundaries

  • String null terminators in size calculations
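
Null-terminated strings are a typical place for these mistakes: the terminator is not part of the text but does count toward the bytes consumed. The following sketch makes both quantities explicit; read_cstring and its max_len default are hypothetical choices for illustration.

```python
def read_cstring(data: bytes, offset: int, max_len: int = 256):
    """Read a NUL-terminated string starting at `offset`.

    `max_len` is the longest allowed text, excluding the terminator, so the
    search window is max_len + 1 bytes wide. Returns (text, next_offset),
    where next_offset points just past the terminator.
    """
    end = data.find(b"\x00", offset, offset + max_len + 1)
    if end == -1:
        raise ValueError(f"unterminated string at offset {offset:#x}")
    return data[offset:end], end + 1


buf = b"a.txt\x00b.txt\x00"
first, pos = read_cstring(buf, 0)     # (b"a.txt", 6): 5 text bytes + 1 NUL
second, pos = read_cstring(buf, pos)  # (b"b.txt", 12)
```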

Integer overflow

  • Size field multiplication (e.g., count × item_size)

  • Offset calculations exceeding 32-bit limits

  • Compressed vs. uncompressed size comparisons
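
Python integers do not overflow, but the same checks still matter when sizes are later written back into 32-bit fields or when the logic is ported to a fixed-width language. The sketch below validates a count × item_size product before it is used; checked_table_size and its parameters are hypothetical.

```python
U32_MAX = 0xFFFF_FFFF


def checked_table_size(count: int, item_size: int,
                       offset: int, file_size: int) -> int:
    """Return count * item_size after checking 32-bit and file bounds."""
    total = count * item_size  # exact in Python; may wrap in C-like languages
    if total > U32_MAX:
        raise ValueError(f"table size {total} exceeds 32-bit range")
    if offset + total > file_size:
        raise ValueError(
            f"table at offset {offset} with size {total} "
            f"runs past end of file ({file_size} bytes)"
        )
    return total


# What a wrapping 32-bit multiply would produce for 0x10000 * 0x10000:
wrapped = (0x10000 * 0x10000) & U32_MAX  # 0, which would pass a naive check
```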

Endianness issues

  • Mixing native and format byte order

  • Bitfield extraction on big-endian systems

  • Network vs. host byte order confusion

Resource limits

  • Memory allocation based on untrusted size fields

  • Excessive nesting or recursion

  • Decompression bombs (small compressed input that expands to a huge output)
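
One way to guard against decompression bombs is to cap the output while decompressing rather than trusting a declared size. The sketch below uses Python's zlib module as a stand-in for a real CAB decompressor (MSZIP framing differs from a zlib stream); bounded_inflate and the limit value are hypothetical.

```python
import zlib


def bounded_inflate(data: bytes, limit: int) -> bytes:
    """Inflate a zlib stream, refusing to produce more than `limit` bytes."""
    decompressor = zlib.decompressobj()
    # Ask for at most limit + 1 bytes: one extra byte is enough to detect
    # that the stream would exceed the limit without inflating all of it.
    out = decompressor.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError(f"decompressed data exceeds {limit} bytes")
    return out


# About one megabyte of zeros compresses to roughly a kilobyte.
bomb = zlib.compress(b"\x00" * 1_000_000)
try:
    bounded_inflate(bomb, 1024)
    rejected = False
except ValueError:
    rejected = True
```

The same idea applies to memory allocation: size buffers from a configured ceiling, not from a size field read out of the file.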

Standards and references

Official documentation

  • Microsoft Cabinet SDK documentation (1997)

  • Microsoft Compression API documentation

  • Windows Platform SDK specifications

Reverse engineering sources

Academic research

  • Compression algorithm papers and patents

  • File format security analysis

  • Digital preservation studies

Contributing

If you find errors or omissions in these specifications:

  1. Check against reference implementations

  2. Test with multiple file samples

  3. Document the issue clearly

  4. Submit a detailed bug report or pull request

See Contributing Guide for details.

Format support matrix

Format  Specification  Compression algorithms     Special features
------  -------------  -------------------------  ----------------------------------
CAB     Complete       None, MSZIP, Quantum, LZX  Multi-volume, reserves, signatures
CHM     Partial        LZX                        Compound document, sections
HLP     Partial        LZ77 variant               Topic-based, phrases
KWAJ    Basic          LZSS, MSZIP, Quantum       Single file, header variants
LIT     Basic          DES + LZ                   Encryption, DRM
OAB     Basic          LZ variant                 Address book specific
SZDD    Complete       LZSS                       Simple header