Formats

  • N columns
  • M row groups
  • Metadata: records the start location of every column's metadata
  • Metadata is written after the data, so the file can be written in a single pass
  • Readers should read the metadata first to locate the column chunks
  • Non-nested schema: nulls are encoded with run-length encoding, so a run of 1000 identical flags is stored once as (0, 1000 times)
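
The layout described above can be illustrated with a small sketch. It is a toy format, not the on-disk encoding of any real library: the JSON payloads, the 4-byte length fields, and the names write_file, read_column and rle_encode are all assumptions made for this example. It shows the two ideas from the list: the metadata goes after the data so the writer never has to seek back, and a long run of equal null flags collapses into a single (value, count) pair.

```python
import io
import json
import struct
from itertools import groupby


def rle_encode(flags):
    """Collapse runs of equal flags into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(flags)]


def write_file(buf, columns):
    """Write every column chunk, then a footer recording where each chunk starts.

    Toy layout (an assumption for this sketch):
    [len][chunk] ... [len][chunk] [footer JSON] [footer length, 4 bytes]
    """
    offsets = {}
    for name, values in columns.items():
        offsets[name] = buf.tell()  # start location of this column chunk
        payload = json.dumps(values).encode("utf-8")
        buf.write(struct.pack("<I", len(payload)))
        buf.write(payload)
    # Metadata comes last, so the data is written in one forward pass.
    footer = json.dumps(offsets).encode("utf-8")
    buf.write(footer)
    buf.write(struct.pack("<I", len(footer)))


def read_column(buf, name):
    """Read the footer first, then jump straight to the requested chunk."""
    buf.seek(-4, io.SEEK_END)
    (footer_len,) = struct.unpack("<I", buf.read(4))
    buf.seek(-(4 + footer_len), io.SEEK_END)
    offsets = json.loads(buf.read(footer_len))
    buf.seek(offsets[name])
    (size,) = struct.unpack("<I", buf.read(4))
    return json.loads(buf.read(size))


buf = io.BytesIO()
write_file(buf, {"id": [1, 2, 3], "name": ["a", None, "c"]})
print(read_column(buf, "name"))  # ['a', None, 'c']
print(rle_encode([0] * 1000))    # [(0, 1000)]
```

Because the footer holds the chunk offsets, a reader that needs only one column can skip the others entirely. This sketch writes a single row group; a file with M row groups would repeat the chunk section and keep one offset table per group.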

Last modified: 2019/10/28 22:45 by phreazer