In XenServer, the number of VCPUs assigned to Domain-0 can be found with
cat /etc/sysconfig/unplug-vcpus
Tuesday, April 17, 2012
Thursday, April 12, 2012
Disk space requirement for Hadoop
Hadoop requires storage for:
- input/output in HDFS
- replicas in HDFS
- intermediate data from Map tasks on local disk
For example, if you run a 1TB terasort, the input and output together take 2 x 1TB = 2TB. If your replication factor is two, that doubles to 4TB.
Meanwhile, Map tasks generate temporary data on local disk; the amount depends on the number of Map tasks and Reduce tasks. It could be around half of the input data, i.e. 500GB. So, if you have 50 data nodes, each node will need about 10GB of free local disk space for intermediate data, in addition to its share of the 4TB in HDFS.
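The arithmetic above can be written as a small helper. This is only a sketch: the function name and parameters are made up for illustration, the 0.5 intermediate ratio is the rough guess from this post rather than a measured value, and 1TB is treated as 1000GB.

```python
def estimate_disk(input_tb, output_tb, replication, intermediate_ratio, data_nodes):
    """Return (hdfs_total_tb, intermediate_total_tb, intermediate_per_node_gb)."""
    # HDFS holds input and output, each stored `replication` times.
    hdfs_total_tb = (input_tb + output_tb) * replication
    # Map output spilled to local disk, assumed proportional to the input size.
    intermediate_total_tb = input_tb * intermediate_ratio
    # Spread the intermediate data evenly over the data nodes (1TB = 1000GB here).
    intermediate_per_node_gb = intermediate_total_tb * 1000.0 / data_nodes
    return hdfs_total_tb, intermediate_total_tb, intermediate_per_node_gb


hdfs_tb, inter_tb, per_node_gb = estimate_disk(1.0, 1.0, 2, 0.5, 50)
print("HDFS total: %.1f TB" % hdfs_tb)
print("Intermediate total: %.1f TB" % inter_tb)
print("Intermediate per node: %.1f GB" % per_node_gb)
```

Changing the replication factor or the number of data nodes shows how quickly the requirement grows or shrinks for a given job.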
Wednesday, April 11, 2012
DFSClient write timeout in Hadoop MapReduce
When DFSClient hits socket timeouts while accessing data nodes in Hadoop MapReduce, there are two parameters to consider in hdfs-site.xml:
dfs.socket.timeout, for read timeout
dfs.datanode.socket.write.timeout, for write timeout
In fact, the read timeout value is used for various connections in DFSClient, so if you only increase dfs.datanode.socket.write.timeout, the timeouts can continue to happen.
I tried to generate 1TB of data with teragen across more than 40 data nodes, and increasing the write timeout alone did not fix the problem. When I increased both values above 600000, the timeouts disappeared.
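To show what the two entries look like together, here is a small Python sketch that prints the corresponding property elements for hdfs-site.xml. The helper name is made up, and 600000 (milliseconds) is only an illustrative value taken from the threshold mentioned above; the right value depends on your cluster.

```python
import xml.etree.ElementTree as ET

# The two parameter names discussed in this post.
TIMEOUT_PROPERTIES = ("dfs.socket.timeout", "dfs.datanode.socket.write.timeout")


def build_timeout_config(timeout_ms):
    """Build a <configuration> element that sets both timeouts to timeout_ms."""
    config = ET.Element("configuration")
    for name in TIMEOUT_PROPERTIES:
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(timeout_ms)
    return config


config = build_timeout_config(600000)
print(ET.tostring(config, encoding="unicode"))
```

The printed property elements can be merged into the existing configuration element of hdfs-site.xml by hand.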
Tuesday, April 3, 2012
Connecting to PostgreSQL using psycopg2
import sys
from optparse import OptionParser

import psycopg2  # the PostgreSQL adapter

host_string = "localhost"
dbname_string = "mydatabase"
username_string = "myaccount"


def main():
    # Entry point: obtain the password and connect to the database.
    parser = OptionParser()
    parser.add_option("-p", "--password", dest="password",
                      help="password for database account", metavar="PASSWORD")
    parser.add_option("-o", "--output", dest="outfilename",
                      help="output filename", metavar="OUTPUT_FILE")
    (options, args) = parser.parse_args()
    if options.password is None:
        # No password given: show usage and stop.
        parser.print_help()
        sys.exit(1)

    conn_string = "host='%s' dbname='%s' user='%s' password='%s'" % (
        host_string, dbname_string, username_string, options.password)
    print("Connecting to database %s on host %s" % (dbname_string, host_string))
    try:
        conn = psycopg2.connect(conn_string)
    except psycopg2.Error:
        print("unable to connect to database %s on host %s for user %s"
              % (dbname_string, host_string, username_string))
        sys.exit(1)
    print("Connected to database %s" % dbname_string)

    cur = conn.cursor()  # obtain the cursor to execute SQL queries
    cur.execute("SELECT * from system_now;")  # execute SQL query
    rows = cur.fetchall()  # fetch all rows
    for row in rows:  # print the second column of each row
        print("  %s" % (row[1],))
    conn.close()


main()
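As a variation, psycopg2.connect also accepts keyword arguments, which avoids building the quoted connection string by hand (useful when the password contains quotes or spaces). The sketch below is one possible arrangement, not the only one: the helper names are made up, the defaults are the placeholder values from this post, and system_now is the same example table.

```python
def build_connect_kwargs(password, host="localhost", dbname="mydatabase", user="myaccount"):
    """Collect the connection settings as keyword arguments for psycopg2.connect."""
    return {"host": host, "dbname": dbname, "user": user, "password": password}


def fetch_rows(password):
    """Connect, run the example query, and return all rows."""
    # Imported here so build_connect_kwargs can be used without the driver installed.
    import psycopg2

    conn = psycopg2.connect(**build_connect_kwargs(password))
    try:
        cur = conn.cursor()
        cur.execute("SELECT * FROM system_now;")
        return cur.fetchall()
    finally:
        conn.close()  # always release the connection, even if the query fails
```

Calling fetch_rows(options.password) would then replace the connection and query part of main() above.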
Parsing command line options with optparse module
In Python, you can parse command line options as follows.
Note that the parsed values are stored in options, and each value is accessed as an attribute named by the dest string.
from optparse import OptionParser


def main():
    # Entry point: parse the command line and print the result.
    parser = OptionParser()
    parser.add_option("-p", "--password", dest="input", help="input", metavar="INPUT_VAL")
    (options, args) = parser.parse_args()
    print("options: %s" % str(options))
    print("arguments: %s" % (args,))
    print(options.input)  # the value is stored under the dest name, "input"


main()
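To try this without touching the real command line, parse_args also accepts an explicit list of arguments. The following sketch reuses the same option definition; the parse helper and the sample arguments are made up for illustration.

```python
from optparse import OptionParser


def parse(argv):
    """Parse an explicit argument list instead of sys.argv[1:]."""
    parser = OptionParser()
    parser.add_option("-p", "--password", dest="input", help="input", metavar="INPUT_VAL")
    return parser.parse_args(argv)


options, args = parse(["-p", "secret", "extra.txt"])
print(options.input)  # value given after -p, stored under dest="input"
print(args)           # leftover positional arguments
```

Here the flag is named --password but the value is read as options.input, because the attribute name comes from dest, not from the flag.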
Monday, April 2, 2012
How to add the current directory to PATH
1. vi ~/.bashrc
2. Add the following line.
export PATH=$PATH:.
3. Open a new shell, or run source ~/.bashrc, for the change to take effect.
linux-common-tools is not installed due to a conflict with linux-base
1. The first recommendation is
sudo apt-get -f install
2. However, in my case, the package linux-common-tools was only partially installed, and the command above did not resolve it.
I ran
sudo dpkg --remove linux-base
sudo apt-get -f install
It worked. In Ubuntu, linux-common-tools is equivalent to linux-base (a Debian package).