Bit-Level Reading
Why Bits Matter
Compression algorithms work at the bit level, not byte level, to achieve optimal encoding.
Byte stream: [0x4D] [0x53] [0x43] [0x46]
Binary: 01001101 01010011 01000011 01000110
Bit stream: 0 1 0 0 1 1 0 1 0 1 0 1 0 0 1 1 ...
↑
Read bit-by-bitImplementation
Cabriolet’s Binary::Bitstream class provides bit-level I/O with configurable bit ordering:
# Create bitstream with LSB-first ordering (default, used by MSZIP)
bitstream = Cabriolet::Binary::Bitstream.new(
io_system, handle, buffer_size,
bit_order: :lsb
)
# Create bitstream with MSB-first ordering (used by LZX, Quantum)
bitstream = Cabriolet::Binary::Bitstream.new(
io_system, handle, buffer_size,
bit_order: :msb
)
# Read bits from the stream
value = bitstream.read_bits(5) # Read 5 bits
# Align to byte boundary (discard remaining bits in current byte)
bitstream.byte_alignBit Ordering
LSB-First vs MSB-First
Different compression algorithms use different bit ordering conventions:
Byte: 0xB4 = 10110100
MSB First (Most Significant Bit):
Reads bits from left to right
First bit read: 1 (leftmost)
Read order: 1 0 1 1 0 1 0 0
LSB First (Least Significant Bit):
Reads bits from right to left
First bit read: 0 (rightmost)
Read order: 0 0 1 0 1 1 0 1Algorithm Bit Orders
| Algorithm | Bit Order | Notes |
|---|---|---|
MSZIP | LSB-first | DEFLATE standard, reads from right |
LZX | MSB-first | Microsoft proprietary, reads from left |
Quantum | MSB-first | Arithmetic coding, reads from left |
LZSS | LSB-first | Simple dictionary, reads from right |
Why It Matters
Using the wrong bit order causes complete decompression failure:
# WRONG: Using LSB for LZX data
bitstream = Bitstream.new(io, handle, 4096, bit_order: :lsb)
# Result: Garbled data, invalid Huffman codes
# CORRECT: Using MSB for LZX data
bitstream = Bitstream.new(io, handle, 4096, bit_order: :msb)
# Result: Proper decompressionEOF Handling and Salvage Mode
Graceful EOF Handling
The bitstream implements libmspack-compatible EOF handling:
# Normal mode: First EOF pads with zeros, second EOF raises error
bitstream = Bitstream.new(io, handle, 4096)
# Salvage mode: Pads with zeros indefinitely (for corrupted files)
bitstream = Bitstream.new(io, handle, 4096, salvage: true)Buffer Management
Efficient buffering is critical for performance:
class BufferedBitstream
BUFFER_SIZE = 4096
def initialize(io)
@io = io
@buffer = @io.read(BUFFER_SIZE)
@position = 0
end
def read_byte
refill if @position >= @buffer.bytesize
byte = @buffer.getbyte(@position)
@position += 1
byte
end
private
def refill
@buffer = @io.read(BUFFER_SIZE)
@position = 0
end
end