Differences
distributed_computing:data_processing:hadoop:hdfs:small_files [2019/10/25 19:05] – created phreazer | distributed_computing:data_processing:hadoop:hdfs:small_files [2019/10/25 19:55] (current) – [Solutions] phreazer | ||
---|---|---|---|
Line 2: | Line 2: | ||
Memory overhead | Memory overhead | ||
- | * File/ | + | * File/ |
* Namenode is limited by main memory | * Namenode is limited by main memory | ||
===== Solutions ===== | ===== Solutions ===== | ||
- | * HAR: Hadoop archive; Layered file system on top of HDFS; Used for archiving (slow read) | + | |
- | * Sequence file: File name as key, contents as the value | + | * Sequence file: File name as key, contents as the value |
- | * Consolidator | + | * Consolidator |
- | * HBase: Stores data in indexed SequenceFiles | + | * HBase: Stores data in indexed SequenceFiles |
+ | * Spark compaction: https:// | ||
+ | * Filecrush: https:// |
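
The "file name as key, contents as the value" idea behind the sequence file entry can be sketched in a few lines. This is only an illustration of the consolidation principle, assuming a made-up length-prefixed record layout; it is not Hadoop's actual SequenceFile format or API, and the names `pack_records`, `unpack_records` and `small_files` are hypothetical.

```python
import io
import struct


def pack_records(files, out):
    """Append each (name, contents) pair to `out` as one length-prefixed record.

    Illustrative layout (an assumption, not the SequenceFile format):
    4-byte key length, 4-byte value length, key bytes, value bytes.
    """
    for name, data in files.items():
        key = name.encode("utf-8")
        out.write(struct.pack(">II", len(key), len(data)))
        out.write(key)
        out.write(data)


def unpack_records(inp):
    """Yield (name, contents) pairs back from a packed stream."""
    while True:
        header = inp.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated record header")
        key_len, val_len = struct.unpack(">II", header)
        name = inp.read(key_len).decode("utf-8")
        data = inp.read(val_len)
        yield name, data


# Three hypothetical small files consolidated into a single container.
small_files = {"a.log": b"first", "b.log": b"second", "c.log": b""}
buf = io.BytesIO()
pack_records(small_files, buf)

# Read the container back and recover every original file by name.
buf.seek(0)
restored = dict(unpack_records(buf))
```

The point of the sketch is that many small files become one large file, so the Namenode only has to track a single file's metadata instead of one entry per small file.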