


Formats

  • Column-oriented
  • Dictionary encoding
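A minimal sketch of the two ideas above, using only the Python standard library. The sample rows and the helper names `dictionary_encode` and `dictionary_decode` are made up for illustration and do not come from any particular file format library: rows are first pivoted into columns, then a repetitive column is stored as a small dictionary of distinct values plus a list of integer indices.

```python
# Hypothetical sample data, stored row by row.
rows = [
    {"city": "Berlin", "temp": 12},
    {"city": "Paris", "temp": 15},
    {"city": "Berlin", "temp": 11},
    {"city": "Berlin", "temp": 13},
]

# Column-oriented layout: one list per column instead of one dict per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def dictionary_encode(values):
    """Return (dictionary, indices): distinct values in first-seen order,
    and for each input value its position in that dictionary."""
    dictionary = []
    index_of = {}
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from the dictionary and the indices."""
    return [dictionary[i] for i in indices]


city_dictionary, city_indices = dictionary_encode(columns["city"])
restored = dictionary_decode(city_dictionary, city_indices)
```

Dictionary encoding pays off when a column has few distinct values relative to its length, since each repeated string is replaced by a small integer.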

  • Read/Write in Spark
  • Parallel column read in pyarrow: https://wesmckinney.com/blog/python-parquet-multithreading/

  • Last modified: 2019/10/28 22:36
  • by phreazer