
Friday, October 18, 2013

Get Amazon EC2 instance price in a program

Although the JSON file at the URL below does not include RHEL, SUSE, or Windows with SQL Server instances, it should be useful for most cases.


-----------------------------------


import json
import urllib2

# Maps the (type, size) codes used in the pricing JSON to instance type names.
type_translation = {
    ('stdODI', 'sm'):                 'm1.small',
    ('stdODI', 'med'):                'm1.medium',
    ('stdODI', 'lg'):                 'm1.large',
    ('stdODI', 'xl'):                 'm1.xlarge',
    ('uODI', 'u'):                    't1.micro',
    ('hiMemODI', 'xl'):               'm2.xlarge',
    ('hiMemODI', 'xxl'):              'm2.2xlarge',
    ('hiMemODI', 'xxxxl'):            'm2.4xlarge',
    ('secgenstdODI', 'xl'):           'm3.xlarge',
    ('secgenstdODI', 'xxl'):          'm3.2xlarge',
    ('hiCPUODI', 'med'):              'c1.medium',
    ('hiCPUODI', 'xl'):               'c1.xlarge',
    ('clusterComputeI', 'xxxxl'):     'cc1.4xlarge',
    ('clusterComputeI', 'xxxxxxxxl'): 'cc2.8xlarge',
    ('clusterHiMemODI', 'xxxxxxxxl'): 'cr1.8xlarge',
    ('clusterGPUI', 'xxxxl'):         'cg1.4xlarge',
    ('hiIoODI', 'xxxxl'):             'hi1.4xlarge',
    ('hiStoreODI', 'xxxxxxxxl'):      'hs1.8xlarge',
}

# Maps pricing-page region names to canonical region names.
region_translation = {
    'us-east': 'us-east-1',
    'us-west-2': 'us-west-2',
    'us-west': 'us-west-1',
    'eu-ireland': 'eu-west-1',
    'apac-sin': 'ap-southeast-1',
    'apac-syd': 'ap-southeast-2',
    'apac-tokyo': 'ap-northeast-1',
    'sa-east-1': 'sa-east-1',
}

# Keyed by (region, instance type, platform column name) -> USD price string.
instance_price = {}

def getprice():
    response = urllib2.urlopen('http://aws.amazon.com/ec2/pricing/pricing-on-demand-instances.json')
    pricing = json.loads(response.read())
    for region in pricing['config']['regions']:
        for itype in region['instanceTypes']:
            for size in itype['sizes']:
                for column in size['valueColumns']:
                    key = (region_translation[region['region']],
                           type_translation[(itype['type'], size['size'])],
                           column['name'])
                    instance_price[key] = column['prices']['USD']
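Since the live URL may be unavailable, the parsing loop can be checked offline. The sketch below runs the same loop over a fabricated JSON snippet in the same shape as the pricing file; the snippet, its price value, and the trimmed-down translation tables are made up for illustration:

```python
import json

# A fabricated snippet mimicking the pricing JSON's shape (values are made up).
sample = json.loads("""
{"config": {"regions": [
  {"region": "us-east",
   "instanceTypes": [
     {"type": "stdODI",
      "sizes": [
        {"size": "sm",
         "valueColumns": [
           {"name": "linux", "prices": {"USD": "0.060"}}
         ]}
      ]}
   ]}
]}}
""")

# Trimmed-down versions of the translation tables above.
region_translation = {'us-east': 'us-east-1'}
type_translation = {('stdODI', 'sm'): 'm1.small'}

instance_price = {}
for region in sample['config']['regions']:
    for itype in region['instanceTypes']:
        for size in itype['sizes']:
            for column in size['valueColumns']:
                key = (region_translation[region['region']],
                       type_translation[(itype['type'], size['size'])],
                       column['name'])
                instance_price[key] = column['prices']['USD']

print(instance_price[('us-east-1', 'm1.small', 'linux')])  # -> 0.060
```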





Monday, September 16, 2013

Adding a physical volume to an existing LVM volume

LVM provides the flexibility of mapping a logical volume onto physical volumes. Using LVM, we can easily resize/extend a volume as well as the file system's capacity (if you are using a resizable file system such as ext4).

Say you have an 11GB volume and your filesystem is almost full. You may then want to add one more disk and resize your file system. The steps for this task are as follows:

1. Create a partition on the new hard disk, using fdisk or parted:
 fdisk /dev/xvdh (where /dev/xvdh is the name of your new hard disk)
2. pvcreate /dev/xvdh1
3. vgextend vg /dev/xvdh1 : the volume group vg now includes /dev/xvdh1 among its physical volumes.
4. lvextend -l +100%FREE -r /dev/vg/lv : lv is the name of the logical volume to extend; -r also resizes the file system.

If you want to stripe your data across the underlying n physical volumes with a 4KB stripe size, you can run
5. lvconvert --stripes <n> -I 4 /dev/vg/lv


So far, I haven't experienced any filesystem corruption when executing lvconvert.




Thursday, July 25, 2013

Setting up networking between Linux containers across EC2 instances (Ubuntu)

LXC is a good option for providing isolated execution environments to individual programs, including isolated physical resources such as CPU, memory, block I/O, and network devices, as well as software resources. However, setting up networking in EC2 poses a unique challenge: AWS forwards only packets whose IP addresses are known to it.

Assuming you are using the AWS VPC service, which allows you to use private addresses, you need to complete the following three major tasks:
1. Obtain IP addresses from AWS.
2. Update the host OS.
3. Properly configure the LXC instances.


For step 1, you can use the AWS web console, the software development kits, or the EC2 APIs.
For step 2, you need to update /etc/network/interfaces on the Ubuntu host.

Here, we will use eth0 as the management interface for SSH-ing into the EC2 instance, and eth1 as the production interface hosting a service that will run inside the Linux container.

On ubuntu host,
ubuntu@ip-10-0-1-xxx:~$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp
post-up ip route add default via 10.0.1.1 dev eth0 tab 1 #10.0.1.1 is your gateway.
post-up ip rule add from 10.0.1.170/32 tab 1 priority 500

auto eth1
iface eth1 inet dhcp
post-up ip route add default via 10.0.1.1 dev eth1 tab 2
post-up ip rule add from 10.0.1.190/32 tab 2 priority 600

For step 3, run ifconfig on the host and note eth1's address:
ubuntu@ip-10-0-1-xxx:~$ ifconfig eth1

Say AWS assigned 10.0.1.218 to eth1; then, in the container configuration file:
lxc.network.type = phys
lxc.network.flags = up
lxc.network.link = eth1
lxc.network.name = eth1
lxc.network.ipv4 = 10.0.1.218

After starting up your Linux container, update the container's /etc/network/interfaces file, either through lxc-console or via the container's rootfs on the host. For some reason, the default routing rule is not placed inside the container, so add:

post-up ip route add default via 10.0.1.1 (your gateway address)

Now you can log in to the container from a remote EC2 instance in the same subnet.
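The host-side interface stanzas above follow a fixed pattern, so they can be generated. Below is a small hypothetical helper (the function name and parameters are mine, not an LXC or AWS API) that produces one stanza given the interface, gateway, private address, routing table, and rule priority:

```python
# Sketch: generate one /etc/network/interfaces stanza of the form used above.
# All parameters are assumptions taken from this post's example values.
def iface_stanza(iface, gateway, addr, table, priority):
    return "\n".join([
        "auto %s" % iface,
        "iface %s inet dhcp" % iface,
        "post-up ip route add default via %s dev %s tab %d" % (gateway, iface, table),
        "post-up ip rule add from %s/32 tab %d priority %d" % (addr, table, priority),
    ])

# Reproduces the eth1 stanza from the host configuration above.
print(iface_stanza("eth1", "10.0.1.1", "10.0.1.190", 2, 600))
```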


Monday, July 22, 2013

Using instance storage in Amazon EC2

Some Amazon EC2 instance types provide instance storage. You may lose its data once you stop the instance, but it does not incur charges against your account, whereas EBS costs around 1 cent per GB. Thus, if you want to pay only while you actually use the instance, instance storage is an option worth considering.

In order to use the instance storage, we need to partition and format it. Additionally, we may consider creating an LVM volume over the stores, since some instance types provide multiple instance stores that we often want to consolidate.

1. Find the instance stores:
ls /dev/xvd[b-g]

2. Unmount them (by default, the first one is mounted at /mnt):
umount /mnt

3. Delete the primary partition:
parted /dev/xvdb rm 1

4. Make a new label, gpt (dos is the default):
parted /dev/xvdb mklabel gpt

5. Create a partition spanning the whole disk (repeat steps 3-5 for each store, e.g. /dev/xvdg):
parted /dev/xvdb mkpart primary xfs 1 -1

6. Create a volume group named vg over /dev/xvdb1 and /dev/xvdg1:
vgcreate vg /dev/xvdb1 /dev/xvdg1

7. Create a logical volume named docker on vg spanning the entire free space:
lvcreate -l 100%FREE -n docker vg

8. Format it as an XFS filesystem:
mkfs.xfs /dev/vg/docker

9. Edit /etc/fstab:
vi /etc/fstab
and add (note the xfs filesystem-type field):
/dev/vg/docker /docker xfs defaults 0 0

10. Mount it:
mount -a
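The steps above can be collected into a dry-run sketch that only prints the commands for review instead of executing them. The helper below is hypothetical; the device, volume-group, and logical-volume names are the ones assumed in this post:

```python
# Dry-run sketch: build the command list from the steps above for each
# instance-store device, rather than executing anything.
def storage_commands(devices, vg="vg", lv="docker"):
    cmds = []
    for dev in devices:
        cmds += [
            "umount %s" % dev,               # only if currently mounted
            "parted %s rm 1" % dev,          # delete the primary partition
            "parted %s mklabel gpt" % dev,   # new gpt label
            "parted %s mkpart primary xfs 1 -1" % dev,  # whole-disk partition
        ]
    # One volume group over all first partitions, one LV over all free space.
    cmds.append("vgcreate %s %s" % (vg, " ".join(d + "1" for d in devices)))
    cmds.append("lvcreate -l 100%%FREE -n %s %s" % (lv, vg))
    cmds.append("mkfs.xfs /dev/%s/%s" % (vg, lv))
    return cmds

for c in storage_commands(["/dev/xvdb", "/dev/xvdg"]):
    print(c)
```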


Tuesday, April 30, 2013

Calculating R^2 with gnuplot

Some relevant background on the ANOVA:

SST = SSR+SSE

where each term represents the following:
  • Total sum of squares, SST: the total variation in the observed values of the response variable.
  • Regression sum of squares, SSR: the variation in the observed values of the response variable explained by the regression.
  • Error sum of squares, SSE: the variation in the observed values of the response variable not explained by the regression.
Also, the coefficient of determination, R^2, is

 R^2 = SSR/SST

gnuplot has a 'fit' command.

The tricky part is obtaining SST with the fit command, which performs curve fitting but also produces some useful variables, FIT_WSSR and FIT_NDF. For details, refer to gnuplot tricks: Basic statistics.

mean(x)= m
fit mean(x) 'your data file' using 1:2 via m # 1 is the x axis and 2 is the y axis
SST = FIT_WSSR/(FIT_NDF+1)

f(x) = a*x + b
fit f(x) 'your data file' using 1:2 via a, b
SSE=FIT_WSSR/(FIT_NDF)

SSR=SST-SSE
R2=SSR/SST

set label sprintf("f(x)=%fx+%f\nR^2=%f", a, b, R2) # display R^2.
plot 'your data file' using 1:2 title 'data', f(x) notitle
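As a cross-check, the same quantities can be computed outside gnuplot. The sketch below fits f(x) = a*x + b by ordinary least squares and reports the plain R^2 = 1 - SSE/SST; note that, unlike the FIT_NDF-based formulas above, it does not divide by degrees of freedom. The data points are made up:

```python
# Ordinary least squares for f(x) = a*x + b, then R^2 = 1 - SSE/SST.
def r_squared(xs, ys):
    n = float(len(xs))
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope and intercept from the normal equations.
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - my) ** 2 for y in ys)                       # total
    return 1.0 - sse / sst

# A perfectly linear data set gives R^2 = 1.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # -> 1.0
```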




Thursday, January 24, 2013

Restarting the Hadoop Namenode when you have Java exceptions

Occasionally, accumulated log files can fill the disk, and the resulting disk-full errors stop Hadoop's Namenode daemon. To resolve this:

  1. Create available space, e.g., delete or back up logs.
  2. Then execute $HADOOP_HOME/bin/hadoop-daemon.sh start namenode
    1. Confirm that the namenode daemon continues to run after some seconds, e.g., 20 seconds.
    2. If yes, you are good.
  3. If not, you might be hitting one of these:
    • Namenode NullPointerException
    • Namenode NumberFormatException
  4. If you see the above exceptions,
    1. Go to the directory set as dfs.name.dir in your hdfs-site.xml (by default it is /dfs/name/current)
    2. $ printf "\xff\xff\xff\xee\xff" > edits
    3. If you have edits.new, then $ printf "\xff\xff\xff\xee\xff" > edits.new
  5. Go back to step 2 and check that your namenode works.
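If the shell printf escapes are inconvenient, the same five bytes from step 4.2 can be written from Python. The relative path "edits" here is an assumption; point it at the file under your actual dfs.name.dir:

```python
# Write the 5-byte sequence \xff\xff\xff\xee\xff that resets the edits log,
# equivalent to the printf in step 4.2. Adjust 'path' to your dfs.name.dir,
# e.g. /dfs/name/current/edits (the relative path here is for illustration).
path = "edits"

with open(path, "wb") as f:
    f.write(b"\xff\xff\xff\xee\xff")

with open(path, "rb") as f:
    print(len(f.read()))  # -> 5
```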

Tuesday, January 22, 2013

Running YCSB over Apache Cassandra

1. Introduction

Apache Cassandra is an open-source key-value store that buffers writes in an in-memory structure before persisting them to disk. It is inspired by Google's Bigtable and Amazon's Dynamo. To benchmark Cassandra, one of the common benchmarking tools is YCSB.

2. Setup

Installing Cassandra and YCSB is as simple as un-tarring the downloaded files. Instructions can be found here. I describe some updates to the referenced web site, which stem from changes in Cassandra 1.2.0.

The referenced instructions target Cassandra 0.8.0; in Cassandra 1.2.0 or later, a proper syntax to create the keyspace is:
create keyspace usertable with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}];

Everything else remains the same as in the link.
 

3. Run YCSB

A good guide from Brian Cooper is here.

So far, we have discussed a setup from scratch. Cassandra is generally not complex to get working, but you may run into trouble when you want to re-create a Cassandra cluster. Some of the random issues I faced are summarized in the remainder of this section; the list will grow as I encounter more.

3.1 Troubleshooting

Adding nodes to the cluster outputs a 'no other nodes seen' message.

This could be partly because your prior installation used a different seed node list in cassandra.yaml, and the corresponding system information is read by default when you re-launch Cassandra. The solution is to clear all files from /var/lib/cassandra/data/system (or whatever directory you configured).

How much traffic can I generate from a given number of YCSB client threads?

In my experience, once the number of threads exceeds a modest number such as 128 on a fixed set of client machines, the maximum achievable throughput flattens out when run without the throughput-throttling option (-target). Before running actual experiments, you should confirm the maximum achievable throughput with your client machines; otherwise, your results will be bounded by the clients instead of the Cassandra servers.