


Small file problem

Memory overhead

  • Every file, directory, and block is represented as an object in the namenode's memory, each occupying roughly 150 bytes. Example: 10,000,000 small files (one block each) ⇒ (10,000,000 file objects + 10,000,000 block objects) × 150 bytes ≈ 3 GB (see the sketch below).
  • The namenode is therefore limited by its main memory, not by the amount of data stored.
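
A minimal back-of-the-envelope sketch of this estimate; the 150-byte figure and one-block-per-file assumption come from the bullet above, the object counts are illustrative rather than measured:

  // NamenodeMemoryEstimate.java — rough namenode heap estimate for many small files
  public class NamenodeMemoryEstimate {
      public static void main(String[] args) {
          long smallFiles = 10_000_000L;      // assumed number of small files
          long blocksPerFile = 1;             // each small file fits in a single block
          long bytesPerObject = 150;          // approximate size of one namenode object
          long objects = smallFiles + smallFiles * blocksPerFile; // file + block objects
          double gb = objects * bytesPerObject / 1e9;
          System.out.printf("~%.1f GB of namenode heap for %,d small files%n", gb, smallFiles);
      }
  }

This prints roughly 3.0 GB, matching the estimate above.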

Solutions

  • HAR (Hadoop archive): layered file system on top of HDFS; used for archiving (reads are slow)
  • SequenceFile: file name as key, file contents as value
  • Consolidator: merges many small files into larger files
  • HBase: stores data in indexed SequenceFiles
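
A minimal sketch of the SequenceFile approach, using Hadoop's Java API; the class name and paths are hypothetical. Small files are packed into a single SequenceFile with the file name as the key and the raw bytes as the value:

  // PackSmallFiles.java — pack a local directory of small files into one HDFS SequenceFile
  import java.io.File;
  import java.nio.file.Files;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class PackSmallFiles {
      public static void main(String[] args) throws Exception {
          File inputDir = new File(args[0]);   // local directory of small files
          Path output = new Path(args[1]);     // e.g. hdfs:///archive/small.seq
          Configuration conf = new Configuration();
          try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                  SequenceFile.Writer.file(output),
                  SequenceFile.Writer.keyClass(Text.class),
                  SequenceFile.Writer.valueClass(BytesWritable.class))) {
              for (File f : inputDir.listFiles()) {
                  byte[] contents = Files.readAllBytes(f.toPath());
                  // key = original file name, value = file contents
                  writer.append(new Text(f.getName()), new BytesWritable(contents));
              }
          }
      }
  }

For the HAR option, an archive can be built with the hadoop archive tool, e.g. hadoop archive -archiveName files.har -p /user/in /user/out; reads then go through the layered har:// file system, which adds an index lookup and explains the slow reads noted above.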
