This is an old revision of the document!
Formats
Parquet
- Column-oriented
- Dictionary encoding
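To make the two properties above concrete, here is a minimal pure-Python sketch. It is a conceptual illustration only, not Parquet's actual on-disk layout; the helper names `dictionary_encode` and `dictionary_decode` and the sample rows are assumptions made up for this example.

```python
def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    positions = {}    # value -> its index in `dictionary`
    indices = []      # one small integer per original value
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from the dictionary and the indices."""
    return [dictionary[i] for i in indices]


# Hypothetical row-oriented input.
rows = [
    {"city": "Paris", "temp": 21},
    {"city": "Oslo", "temp": 14},
    {"city": "Paris", "temp": 23},
    {"city": "Paris", "temp": 19},
]

# Column-oriented: store all values of one field together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Dictionary encoding pays off on columns with few distinct values.
city_dict, city_idx = dictionary_encode(columns["city"])
```

In a real Parquet file the index stream is further compressed (run-length and bit-packing), which this sketch leaves out.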
Implementations
* Read/Write in Spark
* Parallel column read in pyarrow: https://wesmckinney.com/blog/python-parquet-multithreading/