Tuesday, April 17, 2012

XenServer domain-0's VCPU counts

In XenServer, the number of VCPUs assigned to Domain-0 can be found with:

cat /etc/sysconfig/unplug-vcpus

Thursday, April 12, 2012

Disk space requirement for Hadoop

Hadoop requires storage for:
- input/output in HDFS
- replicas in HDFS
- intermediate data from Map tasks on local disk

For example, if you run a 1TB terasort, the input and output together take 2 times 1TB = 2TB. If your replication degree is two, that doubles to 4TB.
Meanwhile, Map tasks generate temporary data on local disk, and the amount depends on the number of Map tasks and Reduce tasks. It could be around half of the input data, about 500GB. So, if you have 50 data nodes, each node will need about 10GB of free local disk space for this intermediate data (500GB / 50), on top of its share of the HDFS data (4TB / 50 = 80GB).
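
The arithmetic above can be written as a small Python sketch. The function name and its parameters are my own for illustration. It assumes the output is the same size as the input (as in terasort), that intermediate data is a fixed fraction of the input (0.5 here, the rough guess above), and that data is spread evenly over the nodes.

```python
def hadoop_disk_estimate(input_tb, replication, intermediate_ratio, num_nodes):
    """Rough disk estimate for a sort-like job (hypothetical helper)."""
    # Input and output both live in HDFS, and each is replicated.
    hdfs_total_tb = 2 * input_tb * replication
    # Map-side temporary data lives on local disk, not in HDFS.
    local_total_tb = input_tb * intermediate_ratio
    return {
        "hdfs_total_tb": hdfs_total_tb,
        "local_total_tb": local_total_tb,
        # 1TB is treated as 1000GB, as in the text above.
        "hdfs_per_node_gb": hdfs_total_tb * 1000 / num_nodes,
        "local_per_node_gb": local_total_tb * 1000 / num_nodes,
    }


# The 1TB terasort example: replication 2, 50 data nodes.
estimate = hadoop_disk_estimate(1, 2, 0.5, 50)
for key in sorted(estimate):
    print(key, estimate[key])
```

Real clusters rarely balance perfectly, so the per-node numbers are better read as a lower bound.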

Wednesday, April 11, 2012

DFSClient write timeout in Hadoop MapReduce

When DFSClient hits socket timeouts while accessing data nodes in Hadoop MapReduce, there are two parameters in hdfs-site.xml to consider:

dfs.socket.timeout, for the read timeout
dfs.datanode.socket.write.timeout, for the write timeout

In fact, the read timeout value is used for various connections in DFSClient, so if you increase only dfs.datanode.socket.write.timeout, the timeouts can continue to happen.

I tried to generate 1TB of data with teragen across more than 40 data nodes, and increasing only the write timeout did not fix the problem. When I raised both values above 600000 (both are in milliseconds, so more than 10 minutes), the timeouts disappeared.
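
As a rough sketch, the two properties can be generated in the hdfs-site.xml property format with a few lines of Python. The value 700000 is only an illustrative number above 600000, not a recommendation, and the function name is hypothetical. The output is a fragment to merge into an existing hdfs-site.xml, not a complete file.

```python
import xml.etree.ElementTree as ET

# Illustrative value only: anything above 600000 ms matches the text above.
TIMEOUT_MS = 700000


def build_timeout_config(timeout_ms):
    """Build a <configuration> element that sets both timeouts (hypothetical helper)."""
    configuration = ET.Element("configuration")
    for name in ("dfs.socket.timeout", "dfs.datanode.socket.write.timeout"):
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(timeout_ms)
    return configuration


config = build_timeout_config(TIMEOUT_MS)
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

Setting both properties together is the point: raising only the write timeout leaves the read timeout at its old value.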

Tuesday, April 3, 2012

Connecting to PostgreSQL using psycopg2

import sys
from optparse import OptionParser

import psycopg2  # the PostgreSQL adapter

host_string = "localhost"
dbname_string = "mydatabase"
username_string = "myaccount"


def main():
    # The entry point: obtain the password and connect to the database.
    parser = OptionParser()
    parser.add_option("-p", "--password", dest="password", help="password for database account", metavar="PASSWORD")
    parser.add_option("-o", "--output", dest="outfilename", help="output filename", metavar="OUTPUT_FILE")
    (options, args) = parser.parse_args()
    if options.password is None:
        # Without a password there is nothing to connect with.
        parser.print_help()
        sys.exit(1)

    print("arguments: %s" % args)
    conn_string = "host='%s' dbname='%s' user='%s' password='%s'" % (
        host_string, dbname_string, username_string, options.password)
    # Do not print conn_string itself: it contains the password.
    print("Connecting to database %s on host %s" % (dbname_string, host_string))
    try:
        conn = psycopg2.connect(conn_string)
    except psycopg2.Error:
        print("unable to connect to database %s on host %s for user %s"
              % (dbname_string, host_string, username_string))
        sys.exit(1)
    print("Connected to database %s" % dbname_string)

    cur = conn.cursor()  # obtain the cursor to execute SQL queries
    cur.execute("SELECT * FROM system_now;")  # execute the SQL query
    rows = cur.fetchall()  # fetch all rows
    for row in rows:  # print the second column of each row
        print("  " + str(row[1]))
    conn.close()


main()

Parsing command line options with optparse module

In Python, you can parse command-line options as follows.

from optparse import OptionParser

def main():
    # The entry point: parse the command-line options and print them.
    parser = OptionParser()
    parser.add_option("-p", "--password", dest="input", help="input", metavar="INPUT_VAL")
    (options, args) = parser.parse_args()
    print("options: %s" % options)
    print("arguments: %s" % args)
    print(options.input)

main()

Note that the parsed values are stored in options, and each one is accessed as an attribute whose name is the dest string (here, options.input).
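
To see the dest behaviour without typing a real command line, parse_args also accepts an explicit argument list. Here is a small sketch. The sample arguments "secret" and "leftover" and the build_parser helper are made up for illustration.

```python
from optparse import OptionParser


def build_parser():
    # Same option as above: the flag is -p/--password, but dest is "input".
    parser = OptionParser()
    parser.add_option("-p", "--password", dest="input", help="input", metavar="INPUT_VAL")
    return parser


# Passing a list means the result does not depend on sys.argv.
options, args = build_parser().parse_args(["-p", "secret", "leftover"])
print(options.input)  # the value given to -p, stored under the dest name
print(args)           # positional arguments that are not options

# The long form stores its value under the same dest.
long_options, long_args = build_parser().parse_args(["--password=secret"])
print(long_options.input)
```

The attribute name comes from dest, not from the flag, so options.password does not exist in this example.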

Monday, April 2, 2012

how to add my current directory into path

1. vi ~/.bashrc
2. Add the following line.
    export PATH=$PATH:.
3. Run "source ~/.bashrc" or open a new shell so that the change takes effect.

linux-common-tools is not installed due to a conflict with linux-base

1. The first recommendation is
sudo apt-get -f install

2. However, in my case, the linux-common-tools package was only partially installed, and the command above did not resolve it.

I did
sudo dpkg --remove linux-base
sudo apt-get -f install

It worked. In Ubuntu, linux-common-tools is equivalent to linux-base (a Debian package).