
Thursday, April 12, 2012

Disk space requirement for Hadoop

Hadoop requires storage for:
- input/output data in HDFS
- replicas in HDFS
- intermediate data from Map tasks on local disk

For example, if you run a 1TB terasort, the input plus the output take 2 times 1TB = 2TB. If your replication factor is two, that doubles to 4TB of HDFS space.
Meanwhile, Map tasks generate temporary data on local disk, and its size depends on the number of Map and Reduce tasks. It could be around half of the input data, about 500GB. So, if you have 50 data nodes, each node will need about 10GB of local disk space for the intermediate data, on top of roughly 80GB per node for the 4TB of HDFS data above.
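The arithmetic above can be sketched as a small calculator. This is a rough estimate only; the function name and the spill ratio of one half are assumptions from this post, not anything Hadoop itself reports.

```python
# Rough disk-space estimate for a Hadoop sort-style job.
# Assumptions (from the post, not from Hadoop): output is the same
# size as the input, and Map-side intermediate data is about half
# of the input.

def estimate_disk_gb(input_gb, replication, num_nodes, spill_ratio=0.5):
    """Return (hdfs_total_gb, local_per_node_gb, hdfs_per_node_gb)."""
    hdfs_total = input_gb * 2 * replication   # input + output, replicated
    intermediate = input_gb * spill_ratio     # Map-side temporary data
    local_per_node = intermediate / num_nodes
    hdfs_per_node = hdfs_total / num_nodes
    return hdfs_total, local_per_node, hdfs_per_node

# The 1TB terasort example: replication factor 2, 50 data nodes.
hdfs_total, local_node, hdfs_node = estimate_disk_gb(1000, 2, 50)
print(hdfs_total)   # 4000 GB of HDFS space in total
print(local_node)   # 10.0 GB of local disk per node for intermediates
print(hdfs_node)    # 80.0 GB of HDFS space per node
```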
