Differences
This shows you the differences between two versions of the page.
distributed_computing:data_processing:formats [2019/10/28 22:37] – [Implementations] phreazer | distributed_computing:data_processing:formats [2019/10/28 22:45] – phreazer | ||
---|---|---|---|
Line 5: | Line 5: | ||
* Column-oriented | * Column-oriented | ||
* Dictionary encoding | * Dictionary encoding | ||
+ | * Record shredding and assembly algorithm (Dremel encoding): https:// | ||
+ | |||
+ | * N columns | ||
+ | * M row-groups | ||
+ | * Metadata: start locations of all column metadata | ||
+ | * Metadata is written after the data, which allows single-pass writing | ||
+ | * Readers should read the metadata first to locate the column chunks | ||
+ | * Non-nested schema: nulls are encoded with run-length encoding, e.g. 1000 consecutive nulls become one run (0, 1000 times) | ||
==== Implementations ==== | ==== Implementations ==== |
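The layout notes above (row groups, column chunks, metadata written after the data, run-length encoded nulls) can be illustrated with a small toy container. This is only a sketch under stated assumptions, not the real file format: the `TOY1` magic bytes, the JSON payloads and the function names `rle_encode`, `write_toy_file` and `read_column` are all hypothetical. Real implementations use a binary metadata encoding and a hybrid of run-length encoding and bit-packing.

```python
import io
import json
import struct

MAGIC = b"TOY1"  # hypothetical marker for this toy container


def rle_encode(values):
    """Collapse runs of equal values into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def write_toy_file(out, row_groups):
    """Write all column chunks in one pass, then append the metadata.

    row_groups: list of dicts mapping column name -> list of values,
    where None marks a null.
    """
    out.write(MAGIC)
    metadata = []
    for group in row_groups:
        chunks = {}
        for name, values in group.items():
            # Flat schema: level 0 means null, level 1 means value present.
            levels = rle_encode([0 if v is None else 1 for v in values])
            payload = json.dumps({
                "def_levels": levels,
                "values": [v for v in values if v is not None],
            }).encode("utf-8")
            # Record where this chunk starts; known only while writing.
            chunks[name] = {"offset": out.tell(), "length": len(payload)}
            out.write(payload)
        metadata.append(chunks)
    footer = json.dumps(metadata).encode("utf-8")
    out.write(footer)
    out.write(struct.pack("<I", len(footer)))  # footer length, little-endian
    out.write(MAGIC)


def read_column(f, group_index, name):
    """Read the metadata at the end first, then seek to one column chunk."""
    tail = 4 + len(MAGIC)
    f.seek(-tail, io.SEEK_END)
    (footer_len,) = struct.unpack("<I", f.read(4))
    f.seek(-(tail + footer_len), io.SEEK_END)
    metadata = json.loads(f.read(footer_len))
    chunk = metadata[group_index][name]
    f.seek(chunk["offset"])
    return json.loads(f.read(chunk["length"]))


buf = io.BytesIO()
write_toy_file(buf, [{"a": [None] * 1000, "b": [1, 2, 3]}])
column_a = read_column(buf, 0, "a")
column_b = read_column(buf, 0, "b")
```

Because the chunk offsets are only known once the data has been written, placing the metadata at the end lets the writer stream everything out without seeking back. The reader pays for this with one extra seek to the end of the file before it can jump to the column it needs.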