Formats
Parquet
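- Column-oriented: the values of each column are stored together, so a query can read only the columns it needs
- Dictionary encoding: repeated values in a column chunk are replaced by small integer indices into a dictionary of distinct values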
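The two properties above combine naturally: because a column's values sit together, a writer can build one dictionary per column and store compact indices instead of repeated strings. The sketch below shows only the idea in plain Python. The helper names `dictionary_encode` and `dictionary_decode` are hypothetical, and a real Parquet writer stores the dictionary in a dictionary page and compresses the indices with the RLE/bit-packing hybrid encoding, which this sketch does not attempt.

```python
from typing import Dict, Hashable, List, Tuple


def dictionary_encode(values: List[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Replace each value with its index into a list of distinct values.

    Hypothetical helper for illustration; not Parquet's on-disk format.
    """
    dictionary: List[Hashable] = []          # distinct values, in first-seen order
    positions: Dict[Hashable, int] = {}      # value -> index in `dictionary`
    indices: List[int] = []
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return dictionary, indices


def dictionary_decode(dictionary: List[Hashable], indices: List[int]) -> List[Hashable]:
    """Invert dictionary_encode by looking each index back up."""
    return [dictionary[i] for i in indices]


# Row-oriented input, split into columns as a column store would keep them.
rows = [("DE", 3), ("FR", 5), ("DE", 7), ("DE", 1)]
country_column = [r[0] for r in rows]

dictionary, indices = dictionary_encode(country_column)
print(dictionary, indices)  # ['DE', 'FR'] [0, 1, 0, 0]
```

Low-cardinality columns (country codes, status flags) benefit most, since the dictionary stays small while every row shrinks to an index.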
Implementations
- Read/write in Spark: `spark.read.parquet(...)` and `DataFrame.write.parquet(...)`
- Parallel (multithreaded) column reads in pyarrow: https://wesmckinney.com/blog/python-parquet-multithreading/