
Thursday, January 24, 2013

Restarting Hadoop Namenode when you have Java exceptions

Occasionally, accumulated log files fill up the disk, which stops Hadoop's Namenode daemon. To recover from this:

  1. Free up disk space, for example by deleting or backing up logs.
  2. After that, execute $HADOOP_HOME/bin/hadoop-daemon.sh start namenode
    1. Confirm that the namenode daemon is still running after a short wait, e.g., 20 seconds.
    2. If it is, you are good.
  3. If it is not, you might be hitting one of these situations:
    • Namenode NullPointerException
    • Namenode NumberFormatException
  4. If you have one of the above exceptions:
    1. Go to the current directory under the dfs.name.dir set in your hdfs-site.xml; by default it is /dfs/name/current
    2. $printf "\xff\xff\xff\xee\xff" > edits
    3. If you have edits.new, then $printf "\xff\xff\xff\xee\xff" > edits.new
  5. Go back to step 2 and check that your namenode works.
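
Steps 2 and 4 can be wrapped in two small shell functions. This is only a sketch: stays_up and reset_edits are hypothetical names, not part of Hadoop, and the backup copy is a precaution I added that is not in the steps above. The bytes are written with octal escapes (the same values as \xff\xff\xff\xee\xff) because \x is not understood by every printf.

```shell
#!/bin/sh
# Hypothetical helpers; neither ships with Hadoop.

# stays_up PID [SECONDS]: wait SECONDS (default 20, as in step 2), then
# succeed only if process PID is still alive.
stays_up() {
    pid=$1
    wait_s=${2:-20}
    sleep "$wait_s"
    kill -0 "$pid" 2>/dev/null
}

# reset_edits DIR: overwrite edits (and edits.new, if present) in DIR with
# the five bytes from step 4. DIR is assumed to be dfs.name.dir/current.
reset_edits() {
    dir=$1
    for f in edits edits.new; do
        # Skip a file that does not exist (edits.new is not always there)
        [ -f "$dir/$f" ] || continue
        # Keep a copy first; this backup is an added precaution
        cp -p "$dir/$f" "$dir/$f.bak" || return 1
        # Octal escapes for the bytes ff ff ff ee ff
        printf '\377\377\377\356\377' > "$dir/$f" || return 1
    done
}
```

For example, run reset_edits /dfs/name/current, start the namenode again as in step 2, and then call stays_up with the PID of the namenode process (hadoop-daemon.sh writes it to a pid file whose location depends on HADOOP_PID_DIR).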

Tuesday, January 22, 2013

Running YCSB over Apache Cassandra

1. Introduction

Apache Cassandra is an open source key-value store that uses an in-memory cache data structure in the application to store data. It is inspired by Google's Bigtable and Amazon's Dynamo. YCSB is one of the common tools for benchmarking Cassandra.

2. Setup

Installing Cassandra and YCSB is as simple as un-tarring the downloaded files. Instructions can be found here. Below I describe some changes to the instructions on that web site, which stem from updates in Cassandra 1.2.0.

For creating a keyspace in Cassandra 1.2.0, or in versions later than Cassandra 0.8.0:
 
The proper syntax to create the keyspace is:
create keyspace usertable with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}];

Everything else is the same as in the linked instructions.
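
If you re-create the keyspace often, the statement above can be generated by a small shell function and written to a file. This is a sketch only: keyspace_statement is a hypothetical helper, and the replication factor is made a parameter purely for illustration.

```shell
#!/bin/sh
# keyspace_statement NAME RF: print the "create keyspace" statement shown
# above for keyspace NAME with replication factor RF.
# Hypothetical helper; not part of Cassandra or YCSB.
keyspace_statement() {
    printf "create keyspace %s with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:%s}];\n" "$1" "$2"
}
```

For example, keyspace_statement usertable 1 > schema.txt writes the statement to a file, which can then be passed to cassandra-cli (I am assuming its -f option for reading statements from a file; check the help output of your version).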
 

3. Run YCSB

A good guide from Brian Cooper is here.

So far, we have discussed a setup from scratch. In general, Cassandra is not very complex to get working, but you may run into trouble when you want to re-create a Cassandra cluster. The remainder of this section collects some of the problems I have faced, and it will grow as I face more.

3.1 Troubleshooting

Adding nodes to the cluster outputs a 'no other nodes seen' message.

 This could be partly because your prior installation used a different seed node list in cassandra.yaml, and that list was stored in the system information that Cassandra refers to by default when you re-launch it. A solution is to clear all files from /var/lib/cassandra/data/system, or from the corresponding directory if you configured a different one.
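
 The clean-up can be sketched as a shell function. clear_system_dir is a hypothetical name, and I am assuming Cassandra is stopped on that node before it runs. The checks on the argument are an added guard so that an empty or mistyped path is never handed to rm.

```shell
#!/bin/sh
# clear_system_dir DIR: remove everything inside DIR, keeping DIR itself.
# Hypothetical helper; DIR would be /var/lib/cassandra/data/system unless
# you configured a different data directory.
clear_system_dir() {
    dir=$1
    # Refuse an empty or non-existent path (added safety check)
    [ -n "$dir" ] || return 1
    [ -d "$dir" ] || return 1
    # ${dir:?} aborts the expansion if dir is somehow empty
    rm -rf "${dir:?}"/*
}
```

 For example: clear_system_dir /var/lib/cassandra/data/system, followed by re-launching Cassandra on that node.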

How much traffic can I generate from a certain number of YCSB client threads?

 In my experience, when the number of threads exceeds a reasonably large number such as 128 and a fixed number of machines is used to generate the workload, the maximum achievable throughput without the throughput target option (-target) stays flat. Before running actual experiments, you should confirm the maximum throughput that your client machines can achieve. Otherwise, your experimental results will be bounded by the clients instead of the Cassandra servers.
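
 One way to find that client-side ceiling is to sweep the thread count and read the overall throughput that YCSB reports at the end of each run. The sketch below assumes the usual "[OVERALL], Throughput(ops/sec), <value>" summary line in the YCSB output. overall_throughput and sweep_threads are hypothetical helper names, and the thread counts are arbitrary examples.

```shell
#!/bin/sh
# overall_throughput: read YCSB output on stdin and print the value from
# the "[OVERALL], Throughput(ops/sec), <value>" summary line.
overall_throughput() {
    awk -F', ' '$1 == "[OVERALL]" && $2 == "Throughput(ops/sec)" { print $3 }'
}

# sweep_threads CMD...: run CMD once per thread count with "-threads N"
# appended and print "N throughput" for each run. CMD is whatever YCSB
# command line you already use, without -threads and without -target.
sweep_threads() {
    for n in 16 32 64 128 256; do
        ops=$("$@" -threads "$n" 2>/dev/null | overall_throughput)
        echo "$n $ops"
    done
}
```

 If the printed throughput stops growing as the thread count goes up, the client machines are probably the bottleneck, and adding client machines is the next thing to try before measuring the servers.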