Character Encoding Problems
Filename Encoding
Issue: Garbled or incorrect filenames
Cause: Encoding mismatch between archive and extraction
Solutions:
# Relabel the bytes as UTF-8 (no conversion; use when the bytes are already UTF-8)
filename = file.name.dup.force_encoding('UTF-8')
# Convert from Windows-1252 to UTF-8
filename = file.name.encode('UTF-8', 'Windows-1252')
# Replace invalid or unconvertible bytes instead of raising an error
filename = file.name.encode('UTF-8', invalid: :replace, undef: :replace)
Common Encoding Issues
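The three filename fixes above can be combined into one fallback chain. The following is a minimal sketch, not the library's own behavior: `normalize_filename` is a hypothetical helper, and the choice of Windows-1252 as the second guess is an assumption that suits archives created on Western-locale Windows systems.

```ruby
# Hypothetical helper: normalize a raw archive filename to valid UTF-8.
def normalize_filename(raw)
  # Work on a copy so the caller's string (possibly frozen) is untouched.
  name = raw.dup.force_encoding('UTF-8')
  return name if name.valid_encoding?

  begin
    # Assumed fallback: treat the bytes as Windows-1252 and convert them.
    raw.encode('UTF-8', 'Windows-1252')
  rescue EncodingError
    # Last resort: keep valid UTF-8 sequences, replace the rest with U+FFFD.
    name.scrub
  end
end
```

Checking `valid_encoding?` first avoids converting names that are already correct UTF-8, which would otherwise produce mojibake.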
Content Encoding
Issue: Text file content has encoding problems
Solution:
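# Detect the encoding (requires the charlock_holmes gem)
require 'charlock_holmes'
detection = CharlockHolmes::EncodingDetector.detect(file.data)
# The detector may report no encoding (for example, for binary data), so guard before converting
if detection && detection[:encoding]
  text = file.data.dup.force_encoding(detection[:encoding]).encode('UTF-8')
end
Best practices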
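For the content-encoding step above, the conversion can be made tolerant of encoding names Ruby does not recognize and of stray invalid bytes. This is a rough sketch using only the standard library: `to_utf8` is a hypothetical helper, and the encoding name passed in is assumed to come from whatever detector is in use (it may be `nil`).

```ruby
# Hypothetical helper: convert raw bytes to UTF-8 given a detected encoding name.
def to_utf8(data, encoding_name)
  source =
    begin
      encoding_name ? Encoding.find(encoding_name) : Encoding::UTF_8
    rescue ArgumentError
      # Name unknown to Ruby: assume the bytes are meant to be UTF-8.
      Encoding::UTF_8
    end

  # Copy before relabeling so the original buffer is not mutated.
  text = data.dup.force_encoding(source)
  # Already UTF-8: just replace invalid sequences with U+FFFD.
  return text.scrub if source == Encoding::UTF_8

  # Otherwise transcode, replacing anything invalid or unconvertible.
  text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end
```

For instance, `to_utf8(file.data, detection && detection[:encoding])` would slot in where the direct `force_encoding(...).encode('UTF-8')` call is used.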
- Always use UTF-8:
  # Set default encoding
  Encoding.default_external = Encoding::UTF_8
  Encoding.default_internal = Encoding::UTF_8
- Validate filenames:
  # Replace disallowed characters with "_" (note: \w is ASCII-only in Ruby, so non-ASCII letters are replaced too)
  safe_name = filename.gsub(/[^\w\s\-.]/, '_')
- Test with international content
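Because `\w` matches only ASCII word characters in Ruby, an ASCII-only filter turns names such as `日本語.txt` into underscores. The sketch below shows one possible Unicode-aware variant. `safe_filename` is a hypothetical helper, the allow-list is an assumption, and the input is assumed to be tagged as UTF-8 already.

```ruby
# Hypothetical helper: replace characters outside an allow-list while keeping
# non-ASCII letters, combining marks, and digits.
def safe_filename(name)
  # scrub first: gsub raises on strings containing invalid byte sequences.
  # \p{L}, \p{M}, \p{N} match Unicode letters, marks, and numbers.
  name.scrub('_').gsub(/[^\p{L}\p{M}\p{N}_ \-.]/, '_')
end

# A few international and hostile names to try it against.
["r\u00E9sum\u00E9 2024.pdf", "\u65E5\u672C\u8A9E.txt", "a/b:c*.txt"].each do |n|
  puts "#{n.inspect} -> #{safe_filename(n).inspect}"
end
```

Keeping a handful of names like these in a test suite is a cheap way to exercise the "international content" case.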