distributed_computing:data_processing:hadoop:hdfs:small_files

  
Memory overhead
  * Every file, directory, and block is represented as an object in the namenode's memory and occupies roughly 150 bytes. 10,000,000 small files => (10,000,000 blocks + 10,000,000 file inodes) * 150 bytes => ~3 GB, before counting the per-replica block references that replication adds (see the estimate sketch after this list).
  * Namenode is limited by main memory
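
A quick back-of-the-envelope check of the figure above, in Python. The 150-bytes-per-object constant is the usual rule of thumb, and the helper function is purely illustrative, not part of any Hadoop API:

<code python>
# Rough namenode heap estimate using the ~150 bytes/object rule of thumb.
# Hypothetical helper for illustration only; not an actual HDFS API.
BYTES_PER_OBJECT = 150  # approximate cost of one file, directory, or block object

def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate heap for file inodes plus block objects (replication not counted)."""
    num_blocks = num_files * blocks_per_file
    return (num_files + num_blocks) * BYTES_PER_OBJECT

print(namenode_heap_bytes(10_000_000) / 10**9)  # 3.0 -> ~3 GB for 10M small files
</code>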
  
Solutions
  * Consolidator
  * HBase: stores data in indexed SequenceFiles
  * Spark compaction: https://github.com/KeithSSmith/spark-compaction (see the sketch after this list)
  * Filecrush: https://github.com/asdaraujo/filecrush
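
A minimal compaction sketch in PySpark, illustrating the general idea behind tools like spark-compaction rather than reproducing that tool itself. The input and output paths are placeholders, and the partition count is an assumption; a real job would derive it from the total input size and the HDFS block size:

<code python>
# Minimal small-file compaction sketch (illustrative): read many small
# files and rewrite them as a handful of larger ones.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-small-files").getOrCreate()

df = spark.read.text("hdfs:///data/small_files/")   # placeholder input path

# coalesce() merges partitions without a full shuffle; aim for output
# files near the HDFS block size (e.g. 128 MB each).
df.coalesce(16).write.mode("overwrite").text("hdfs:///data/compacted/")
</code>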