Parallel extraction

Purpose

This document explains how to use multi-core parallel extraction to significantly speed up archive processing on modern CPUs.

Use parallel extraction when:

* Processing large archives with many files
* Running on multi-core systems
* Time is more critical than memory
* Batch processing multiple archives

Concepts

Worker Pool

Worker threads process files concurrently, distributing the workload across CPU cores using Fractor’s lock-free Ractor-based architecture.

Work Distribution

Files are distributed to available workers for balanced processing via Fractor’s work queue system.
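To make the two concepts above concrete, here is a minimal sketch of a worker pool that pulls jobs from a shared queue. It is an illustration only and not Fractor's implementation: it swaps in plain Ruby threads and the standard-library `Queue` in place of Ractors, and the method name `process_in_parallel` is hypothetical.

```ruby
# Illustration only: a thread-based worker pool draining a shared queue.
# Fractor itself is Ractor-based; threads are used here to keep the sketch short.
# `process_in_parallel` is a hypothetical name, not part of Cabriolet or Fractor.
def process_in_parallel(items, workers: 4, &block)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [index, item] }
  queue.close # once drained, pop returns nil and the workers stop

  results = Array.new(items.size)
  threads = Array.new(workers) do
    Thread.new do
      # Each worker takes the next job as soon as it is free, so slow
      # jobs do not hold up the rest of the work.
      while (job = queue.pop)
        index, item = job
        results[index] = block.call(item)
      end
    end
  end

  threads.each(&:join)
  results
end

# Example: compute the length of each file name using two workers.
lengths = process_in_parallel(%w[a.txt bb.txt ccc.txt], workers: 2) { |name| name.length }
```

Because every worker pulls from the same queue, the load balances itself even when some files take much longer to process than others. Results are stored by index, so the output order matches the input order regardless of which worker finishes first.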

Basic Parallel Extraction

require 'cabriolet'

# Parse archive
cab = Cabriolet::CAB::Parser.new.parse('large.cab')

# Extract with 8 workers using the unified Extractor
extractor = Cabriolet::Extraction::Extractor.new(
  cab,
  'output/',
  workers: 8
)

stats = extractor.extract_all

puts "Extracted: #{stats[:extracted]} files"
# Typical speedup on an 8-core CPU is roughly 5-8x; this script does not measure it.

Progress Tracking

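# Reuses the extractor created in the previous example
extractor.extract_with_progress do |current, total, file|
  progress = (current.to_f / total * 100).round(1)
  # Use print rather than puts so "\r" rewrites the same terminal line
  print "\r[#{current}/#{total}] #{progress}% - #{file.name}"
end
puts # end the progress line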

Batch Processing

For batch processing multiple archives, use a simple loop with the Extractor:

# Process multiple archives
Dir.glob('*.cab').each do |cab_path|
  cab = Cabriolet.open(cab_path)
  output_dir = File.join('output', File.basename(cab_path, '.*'))

  extractor = Cabriolet::Extraction::Extractor.new(cab, output_dir, workers: 4)
  stats = extractor.extract_all

  puts "#{cab_path}: #{stats[:extracted]} files extracted"
end

Performance Guidelines

CPU Cores    Recommended Workers    Expected Speedup
2-4          2-4                    1.5-2.5x
4-8          4-8                    3-6x
8-16         8-12                   5-10x
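The guidelines above can be turned into a small helper that picks a worker count from the number of available cores. This is a sketch under stated assumptions: `recommended_workers` is a hypothetical name (not part of Cabriolet), and the cap of 12 is simply the upper bound listed for 8-16 cores.

```ruby
require 'etc'

# Hypothetical helper, not part of Cabriolet: one worker per core,
# capped at 12 to match the largest recommendation in the guidelines.
def recommended_workers(cores = Etc.nprocessors)
  cores.clamp(1, 12)
end

workers = recommended_workers

# The value could then be passed to the Extractor, for example:
#   Cabriolet::Extraction::Extractor.new(cab, 'output/', workers: workers)
```

The best value also depends on disk speed and archive layout, so treat the result as a starting point and measure on your own workload.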