Tuesday, November 13, 2012

Linear regression with Excel

Excel is a convenient tool, though often not so useful for science.

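If you want a proper linear regression with an r^2 value, don't use the trend line from the GUI. Instead, call LINEST with both const and stats set to TRUE (const = FALSE would force the intercept to zero) and pick the results out with INDEX(array, row, column):
=INDEX(LINEST(y's, x's, TRUE, TRUE), 1, 1) for slope
=INDEX(LINEST(y's, x's, TRUE, TRUE), 1, 2) for y-intercept
=INDEX(LINEST(y's, x's, TRUE, TRUE), 3, 1) for r^2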
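
As a cross-check outside Excel, the same three numbers can be computed by hand. The following is a minimal pure-Python sketch of an ordinary least-squares fit with an intercept; the function name linear_fit and the sample data are made up for illustration.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) of an ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # sums of squares and cross products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# illustrative data lying exactly on y = 2x + 1
slope, intercept, r2 = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r2)
```

If the values differ from what the spreadsheet reports, check whether the spreadsheet fit was forced through the origin.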

Thursday, November 8, 2012

Join multiple files

Join is a convenient text processor provided by Linux and other Unix-like systems. It merges the lines of two files that share a common field; both files must be sorted on that field. Joining two files is therefore pretty straightforward:
join FILE1 FILE2
However, joining three files becomes non-trivial, especially when you join on a field other than the first. For example, to join on the second field of each file, you may try the following:
join -j 2 FILE1 FILE2 | join -j 2 - FILE3
This does not work, because the first join writes the common field as the first field of its output. So, your second join should be:
join -j 2 FILE1 FILE2 | join -1 1 -2 2 - FILE3
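
To see why the second join has to use field 1 for its left input, here is a rough Python sketch of how join arranges its output columns. It is not a reimplementation of join (it ignores the sorted-input requirement and every other option); join_on and the sample rows are hypothetical.

```python
def join_on(left_rows, right_rows, left_field, right_field):
    """Join two lists of rows on 1-based field numbers, like join -1 N -2 M.

    As with join, each output row starts with the common field, followed by
    the remaining fields of the left row and then of the right row.
    """
    out = []
    for l in left_rows:
        for r in right_rows:
            if l[left_field - 1] == r[right_field - 1]:
                key = l[left_field - 1]
                rest_l = l[:left_field - 1] + l[left_field:]
                rest_r = r[:right_field - 1] + r[right_field:]
                out.append([key] + rest_l + rest_r)
    return out

file1 = [["a1", "k1"], ["a2", "k2"]]
file2 = [["b1", "k1"], ["b2", "k2"]]
file3 = [["c1", "k1"], ["c2", "k2"]]

first = join_on(file1, file2, 2, 2)   # the key is now in column 1
result = join_on(first, file3, 1, 2)  # so the left side joins on field 1
print(result)
```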

Friday, June 8, 2012

How to control the number of VCPUs of the control-domain (dom-0)

In Xen, both 'xm vcpu-pin' and 'xm vcpu-set' work on dom-0 just as they do on guest domains. In XenServer, however, the equivalent command does not work for the control domain.

1. xe vm-vcpu-hotplug new-vcpus=<n> uuid=<uuid of the control domain> yields

Error: No matching VMs found

Alternatively, you may want to find the menu list file in the boot directory and specify the number of CPUs and the size of memory in the boot options.

"dom0_max_vcpus=X"

To dedicate physical CPUs to those VCPUs, dom0_vcpus_pin also needs to be specified in the boot options.


Monday, June 4, 2012

matplotlib contourf blank area

I am drawing a contour plot with a fixed range (0,100). However, my data only spans (0,1), and when I call
plt.contourf(X,Y,Z,np.linspace(0,100,101), interpolation='neither',extend='neither')
it generates blank areas.

To fix this, add the following line before the contourf call.
plt.imshow(Z,interpolation='nearest',extent=[0,100,ymin,ymax], vmin=0,vmax=100)

Thursday, May 31, 2012

Draw a 3D surface figure with matplotlib from a data file

To draw a 3D surface figure with matplotlib, we generate three 2D matrices: X, Y, and Z. All of them need to have the same shape, since the elements at the same position in each matrix together represent one point on the 3D canvas.

For example, my data file looks like this:
sec id value
5 1 10
10 1 12

I want to draw sec as the x-axis, id as the y-axis, and value as the z-axis. Given the range and interval of sec and the range of id, the script is as follows (inputfile, timerange, and id_range must be set beforehand):

# to generate 3D plots
from mpl_toolkits.mplot3d.axes3d import Axes3D
import matplotlib.pyplot as plt

# to generate 3D surface plots
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter

# to manipulate matrices
import numpy as np

# read data file
xs =[]
ys=[]
zs=[]
fd = open(inputfile,'r')
fd.readline() # remove the header
for line in fd:
    p = line.split()
    xs.append(float(p[0]))
    ys.append(float(p[1]))
    zs.append(float(p[2]))
fd.close()

#generate 2D arrays  
X = np.arange(0, float(timerange), 1)
Y = np.arange(0, int(float(id_range)), 1)
X, Y = np.meshgrid(X,Y) # this makes the shape of matrix X and Y the same.
Z = np.zeros(shape=X.shape)
for i in range(0, len(zs)):
    Z[int(ys[i]), int(xs[i])] = zs[i] # array indices must be integers

# draw surface plot
fig=plt.figure()
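ax=fig.add_subplot(111, projection='3d') # newer matplotlib no longer attaches Axes3D(fig) to the figure automatically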
surf = ax.plot_surface(X,Y,Z,rstride=1, cstride=1, cmap=cm.jet, linewidth=0, antialiased=False)
ax.set_zlim(0.0, 100.0) # whatever  you want for the range of Z values

ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.2f'))
fig.colorbar(surf,ax=ax,shrink=0.5, aspect=5)
plt.show()

Draw figures with Python

You can use matplotlib to draw professional graphs in Python.

In many situations, you may want to generate graphics files instead of drawing on your screen. To do so, start your Python program with the following three lines.

import matplotlib
matplotlib.use('Agg') # to generate files
import matplotlib.pyplot as plt


Monday, May 7, 2012

SSH in python

Obtain paramiko, which requires PyCrypto.
After installing PyCrypto and paramiko, check that the module can be imported from the Python prompt:

python
>>> import paramiko





Tuesday, April 17, 2012

XenServer domain-0's VCPU counts

In XenServer, Domain-0's VCPU count can be found with

cat /etc/sysconfig/unplug-vcpus

Thursday, April 12, 2012

Disk space requirement for Hadoop

Hadoop requires storage for:
- input/output in HDFS
- replicas in HDFS
- intermediate data from Map tasks on local disk

For example, if you run a 1TB terasort, the input and output together take 2 x 1TB = 2TB. If your replication degree is two, that doubles to 4TB.
Meanwhile, Map tasks generate temporary data on local disk; the amount depends on the number of Map and Reduce tasks, and could be around half of the input data, i.e. 500GB. So, if you have 50 data nodes, each node needs about 10GB of local disk space for intermediate data, in addition to its share of the HDFS data (4TB / 50 = about 80GB).
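
The arithmetic above can be sketched as a small Python helper. It assumes 1TB = 1000GB, data spread evenly across the nodes, and intermediate data of about half the input, as in the estimate above; per_node_disk_gb is a made-up name, not part of Hadoop.

```python
def per_node_disk_gb(input_gb, replication, intermediate_ratio, nodes):
    """Estimate (HDFS, local) disk space per data node in GB for a sort-like job."""
    # input and output have the same size, and each is stored `replication` times
    hdfs_gb = 2 * input_gb * replication
    # temporary Map output kept on local disk (assumed fraction of the input)
    local_gb = input_gb * intermediate_ratio
    return hdfs_gb / float(nodes), local_gb / float(nodes)

# 1TB terasort, replication degree 2, 50 data nodes
hdfs_per_node, local_per_node = per_node_disk_gb(1000, 2, 0.5, 50)
print(hdfs_per_node, local_per_node)
```

The two numbers are kept separate because the HDFS share and the intermediate data may sit on different disks or partitions.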

Wednesday, April 11, 2012

DFSClient write timeout in Hadoop MapReduce

When DFSClient hits socket timeouts while accessing data nodes in Hadoop MapReduce, there are two parameters to consider in hdfs-site.xml:

dfs.socket.timeout,  for read timeout
dfs.datanode.socket.write.timeout, for write timeout

In fact, the read timeout value is used for various connections in DFSClient, so if you only increase dfs.datanode.socket.write.timeout, the timeouts can continue to happen.


When I tried to generate 1TB of data with teragen across more than 40 data nodes, increasing only the write timeout did not fix the problem. When I increased both values above 600000 (milliseconds), it disappeared.

Tuesday, April 3, 2012

connecting to postgreSQL using psycopg2

import sys
from optparse import OptionParser

import psycopg2 # the PostgreSQL package

host_string = "localhost"
dbname_string = "mydatabase"
username_string = "myaccount"


def main():
        # The entry point: obtain the password and connect to the database.
        parser=OptionParser()
        parser.add_option("-p", "--password", dest="password", help="password for database account", metavar="PASSWORD")
        parser.add_option("-o", "--output", dest="outfilename", help="output filename", metavar="OUTPUT_FILE")
        (options, args) = parser.parse_args()
        if options.password is None:
                parser.print_help()
                return
        print("arguments: %s" % args)
        # the password is deliberately not echoed to the screen
        conn_string = "host='%s' dbname='%s' user='%s' password='%s'" % (host_string, dbname_string, username_string, options.password)
        print("Connecting to database %s on host %s" % (dbname_string, host_string))
        try:
                conn=psycopg2.connect(conn_string)
        except psycopg2.Error:
                print("unable to connect to database %s on host %s for user %s" % (dbname_string, host_string, username_string))
                sys.exit(1) # without a connection there is nothing left to do
        print("Connected to database %s" % dbname_string)
        cur=conn.cursor() # obtain the cursor to execute SQL queries
        cur.execute("""SELECT * from system_now;""") # execute SQL query
        rows = cur.fetchall() # fetch all rows
        for row in rows: # print the second column of each row
                print(" %s" % row[1])
        conn.close()

main()

Parsing command line options with optparse module

In Python, you can parse command line options as follows.
from optparse import OptionParser

def main():
        # The entry point: parse the command line options and print them.
        parser=OptionParser()
        parser.add_option("-p", "--password", dest="input", help="input", metavar="INPUT_VAL")
        (options, args) = parser.parse_args()
        print("options: %s" % options)
        print("arguments: %s" % args)
        print(options.input)

main()

Note that the parsed values are stored in options, and each one is accessed as an attribute named after the string given as dest.
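
To try this without a command line, parse_args also accepts an explicit list of arguments. The short sketch below uses made-up argument values to show the dest attribute and the leftover positional arguments.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-p", "--password", dest="input", help="input", metavar="INPUT_VAL")

# parse an explicit list instead of sys.argv, just for demonstration
(options, args) = parser.parse_args(["-p", "secret", "extra"])

print(options.input)  # the value given to -p, stored under dest="input"
print(args)           # whatever was not consumed by an option
```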

Monday, April 2, 2012

how to add my current directory into path

1. vi ~/.bashrc
2. Add the following line.
    export PATH=$PATH:.

linux-common-tools are not installed due to conflict with linux-base

1. The first recommendation is
sudo apt-get -f install

2. However, in my case, the package linux-common-tools was incompletely installed, and the above did not resolve it.

I did
sudo dpkg --remove linux-base
sudo apt-get -f install

It worked. In Ubuntu, linux-common-tools is equivalent to linux-base (a Debian package).

Tuesday, March 20, 2012

ssh key exchange

1. ssh-keygen -t rsa
2. ssh-copy-id -i ~/.ssh/id_rsa.pub remote-host
3. ssh remote-host 'ls -ahl ~/.ssh/' : to check the permissions of the authorized_keys file.

Monday, March 19, 2012

Truncate the last letter in VI

Sometimes when you copy a text file from Windows, you may have an annoying character at the end of each line. To remove it, use

:%s/.$//g
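
The vi command above deletes the last character of every line, whatever it is. When the stray character is the carriage return of a Windows line ending (an assumption; it usually shows up as ^M), a more careful alternative is to strip only that character. A minimal Python sketch, with strip_trailing_cr as a made-up name:

```python
def strip_trailing_cr(text):
    """Remove a carriage return at the end of each line, if present."""
    return "\n".join(line[:-1] if line.endswith("\r") else line
                     for line in text.split("\n"))

# the last line has no carriage return and is left untouched
sample = "first\r\nsecond\r\nthird"
print(repr(strip_trailing_cr(sample)))
```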

Thursday, March 15, 2012

Disabling GSSAPIAuthentication in SSH

vi ~/.ssh/config

add

GSSAPIAuthentication no

for loop in bash scripts


The most portable form across bash versions is:
 
#!/bin/bash
        for i in `seq 1 10`;
        do
                echo $i
        done 
 
 

How does XenServer map vCPUs to physical processing contexts?

Please refer to http://support.citrix.com/article/ctx117960

XenServer enumerates physical processing contexts in a depth-first manner. So, by default, vCPU0 and vCPU1 will both be on core 0.
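
As an illustration of depth-first numbering, here is a simplified Python model. It is not XenServer code: the topology (4 cores per socket, 2 hardware threads per core) is hypothetical, and it assumes that vCPU n lands on context n by default.

```python
def context_location(index, cores_per_socket, threads_per_core):
    """Map a depth-first context index to (socket, core, thread)."""
    # threads of one core are numbered first, then the next core, then the next socket
    thread = index % threads_per_core
    core = (index // threads_per_core) % cores_per_socket
    socket = index // (threads_per_core * cores_per_socket)
    return socket, core, thread

for vcpu in range(4):
    print(vcpu, context_location(vcpu, 4, 2))
```

In this model, indices 0 and 1 are the two threads of core 0, which matches the observation above.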

Wednesday, March 14, 2012

Subtables

\usepackage{subfig}

and, within a table environment, wrap each tabular in \subfloat:
\begin{table}
\subfloat[caption1]
{
    \begin{tabular}
    \end{tabular}
}
\hspace{0.1cm}
\subfloat[caption2]
{
    \begin{tabular}
    \end{tabular}
}
\end{table}

Additionally, to resize a sub-table, wrap it in \scalebox (provided by the graphicx package):
\scalebox{0.7} {
    \begin{tabular}
    \end{tabular}
}

Tuesday, March 13, 2012

How to find out whether my CPU uses hyperthreads

cat /proc/cpuinfo | grep ht

Even if both your kernel and your CPU support hyperthreading, the kernel will not see a hyperthread as a CPU unless hyperthreading is enabled in the BIOS.
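
Since the ht flag alone does not show whether hyperthreading is actually active, one common check on x86 Linux is to compare the "siblings" and "cpu cores" fields of /proc/cpuinfo: more siblings than cores means more than one hardware thread per core. A minimal Python sketch, assuming those two fields are present; the function name and the sample text are made up.

```python
def hyperthreading_enabled(cpuinfo_text):
    """Return True if any processor entry reports more siblings than cores."""
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            continue
        key, value = [part.strip() for part in line.split(":", 1)]
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
        if siblings is not None and cores is not None and siblings > cores:
            return True
    return False

# Hypothetical excerpt; on a real machine, read the text from /proc/cpuinfo.
sample = "processor\t: 0\nsiblings\t: 8\ncpu cores\t: 4\n"
print(hyperthreading_enabled(sample))
```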

Sunday, March 11, 2012

How to change the bullet in itemize environment

\begin{itemize}[label= xx ]
\end{itemize}

For no bullet, use [label=]. The optional label argument requires \usepackage{enumitem}.

Section without numbering

Use
\section*{section title}

How to find CPU affinity of vCPUs in XenServer/XenCenter

xe vm-param-get uuid=<uuid> param-name=VCPUs-params
xe vm-param-set uuid=<uuid> VCPUs-params:mask=1