Wednesday, April 11, 2012

DFSClient write timeout in Hadoop MapReduce

When the DFSClient hits socket timeouts while accessing data nodes in Hadoop MapReduce, two parameters in hdfs-site.xml are worth considering:

dfs.socket.timeout, for the read timeout
dfs.datanode.socket.write.timeout, for the write timeout

In fact, the read timeout value is used for various connections in the DFSClient, so if you only increase dfs.datanode.socket.write.timeout, the timeouts can continue to happen.

I tried to generate 1 TB of data with teragen across more than 40 data nodes, and increasing only the write timeout did not fix the problem. When I raised both values above 600000 (the values are in milliseconds, so 10 minutes), the timeouts disappeared.
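As a sketch, the corresponding entries in hdfs-site.xml might look like this (both values in milliseconds; 600000 ms is 10 minutes):

```xml
<!-- hdfs-site.xml: raise both DFSClient timeouts -->
<property>
  <name>dfs.socket.timeout</name>
  <value>600000</value> <!-- read timeout, in ms -->
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>600000</value> <!-- write timeout, in ms -->
</property>
```

These settings need to be picked up by the client submitting the job, not just the data nodes, since the DFSClient reads them from its own configuration.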
