Hadoop requires storage for:
- input and output data in HDFS
- replicas of that data in HDFS
- intermediate data from Map tasks on each node's local disk
For example, if you run a 1TB TeraSort, the input and output together take 2 times 1TB = 2TB. If your replication factor is two, that doubles to 4TB in HDFS.
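Meanwhile, Map tasks generate temporary data on local disk, and the amount depends on the number of Map and Reduce tasks. It could be around half of the input data, or 500GB. So, if you have 50 data nodes, each node needs about 10GB of free local disk space for intermediate data. That is on top of its share of the HDFS data, which is 4TB / 50 = 80GB per node if the blocks are spread evenly.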
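The arithmetic above can be sketched as a small Python helper. This is only a rough estimate, not a Hadoop API: the function name `estimate_storage_gb` and its parameters are made up for illustration, the 0.5 intermediate ratio is the rough guess used in this post, and it assumes the output is as large as the input (as in TeraSort), that output is replicated like input, and that data is spread evenly across data nodes.

```python
def estimate_storage_gb(input_gb, replication=2, intermediate_ratio=0.5, data_nodes=50):
    """Rough disk estimate for a sort-like job (hypothetical helper, not part of Hadoop).

    Assumes output size == input size and an even spread of data across nodes.
    """
    # Input plus output, each stored `replication` times in HDFS.
    hdfs_gb = 2 * input_gb * replication
    # Temporary Map output on local disks; the ratio is a guess, not a measured value.
    intermediate_gb = input_gb * intermediate_ratio
    return {
        "hdfs_total_gb": hdfs_gb,
        "intermediate_total_gb": intermediate_gb,
        "hdfs_per_node_gb": hdfs_gb / data_nodes,
        "intermediate_per_node_gb": intermediate_gb / data_nodes,
        "total_per_node_gb": (hdfs_gb + intermediate_gb) / data_nodes,
    }


if __name__ == "__main__":
    # The 1TB example from this post, taking 1TB as 1000GB.
    for key, value in estimate_storage_gb(1000).items():
        print(f"{key}: {value}")
```

For your own cluster, change `replication`, `intermediate_ratio`, and `data_nodes` to match your settings, and leave some headroom since real jobs rarely spread data perfectly evenly.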