Friday, May 25, 2012

Cassandra Stress test on a Raspberry Pi


So here are my initial results from using a Raspberry Pi to run Cassandra. At the moment I’m running a single Pi with Cassandra 0.8.10 (compression is not available, and Snappy will not currently run on the Pi). I’m using the Java stress tests that come with the Cassandra source. These tests were run with the stress test classes running on the same Pi as the Cassandra instance. To be honest it’s not looking great, but I’m looking at ways of getting faster IO and tuning:

Pi with a class 10 SD card:

total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
289,28,28,0.5914359861591696,11
1266,97,97,0.13869907881269192,22
2056,79,79,0.5817556962025316,32
3744,168,168,0.31356575829383887,43
5737,199,199,0.20355143000501758,54
7703,196,196,0.179558494404883,64
9514,181,181,0.2858961899503037,74
11305,179,179,0.1481675041876047,85

As you can see, this ramps up to an interval op rate of nearly 200 operations per second. Using an external hard disk on USB is actually slightly worse (extract from later in the run):

total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
889736,133,133,0.263335837716003,7116
890287,55,55,0.5460671506352087,7126
891445,115,115,0.37030224525043176,7136
892668,122,122,0.4226631234668847,7147
894149,148,148,0.23508845374746792,7159
895317,116,116,0.3649169520547945,7170
896475,115,115,0.23375302245250432,7180
897574,109,109,0.47590354868061874,7190
898558,98,98,0.41302845528455284,7201
899626,106,106,0.39788951310861426,7211
900910,128,128,0.21852570093457943,7221
902078,116,116,0.41339640410958906,7232

For comparison, here are the results from my new Apple MacBook Air with an SSD:

total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
30581,3058,3058,0.013996631895621465,10
137194,10661,10661,0.003453800193222215,20
277241,14004,14004,0.0024328332631188103,30
408411,13117,13117,0.0025887245559197986,40
535147,12673,12673,0.0027360339603585406,51
662768,12762,12762,0.0026654704163107954,61
792233,12946,12946,0.0026509326845093268,71
919061,12682,12682,0.002678690825369792,81
1000000,8093,8093,0.0026410383128034694,88
END
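To compare runs like these at a glance, a small awk filter can average the columns. This is a hedged sketch and not part of the Cassandra stress tool: the function name summarise_stress is hypothetical, it assumes the comma-separated layout shown above (interval_op_rate in column 2, avg_latency in column 4), and it takes a simple unweighted mean over the intervals, so intervals of slightly different length are weighted equally.

```shell
# summarise_stress (hypothetical helper): read stress-tool CSV output on stdin
# with the header "total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time"
# and print the unweighted mean of interval_op_rate and avg_latency.
# Lines that do not start with a digit (the header, "END") are skipped.
summarise_stress() {
    awk -F, '
        /^[0-9]/ { ops += $2; lat += $4; n++ }
        END {
            if (n == 0) { print "no data rows"; exit 1 }
            printf "rows=%d mean_op_rate=%.1f mean_latency_s=%.4f\n", n, ops / n, lat / n
        }'
}
```

For example, saving each run to a file and piping it through summarise_stress (e.g. summarise_stress < pi-sdcard.csv, file name hypothetical) gives one summary line per run, which makes the gap between the Pi and the SSD laptop easy to quote.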

Monday, May 21, 2012

Snappy Compression fails for Apache Cassandra on a Raspberry Pi

Although I’ve managed to get Apache Cassandra running on a Raspberry Pi, I’ve been struggling to make any use of it. I’ve been using the latest build of Cassandra (1.1.0), and whenever I’ve tried creating a column family I’ve been getting the following error:

SnappyCompressor.create() threw an error: org.xerial.snappy.SnappyError [FAILED_TO_LOAD_NATIVE_LIBRARY] null

As you may be aware, compression on column families was introduced in Cassandra 1.0 to save space and improve disk IO (http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression). By default the compressor is Snappy, via the snappy-java library (http://code.google.com/p/snappy-java/), which is a Java wrapper for Google Snappy (http://code.google.com/p/snappy/), written in C++. There is no port of Google Snappy for the Pi, and my initial attempts to build one have not been successful (autogen.sh fails with an undefined macro AC_DEFINE).

You should be able to create a column family with DeflateCompressor from cqlsh as follows:

create TABLE users (KEY varchar Primary key, password varchar, gender varchar) WITH compression_parameters:sstable_compressor = 'DeflateCompressor';

However, that’s failing with the same SnappyCompressor error. Describing the keyspace with cassandra-cli gives the following:
Keyspace: test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:1]
  Column Families:
    ColumnFamily: users
      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Replicate on write: true
      Caching: KEYS_ONLY
      Bloom Filter FP chance: default
      Built indexes: []
      Column Metadata:
        Column Name: password
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
        Column Name: gender
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        sstable_compressor: DeflateCompressor
        sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

Note the two compressor entries: the sstable_compressor option I supplied, and an sstable_compression option still set to SnappyCompressor. I’m not quite sure what’s going on, or whether this is the correct result at the moment, but I’m continuing to investigate (any answers gratefully received). So for now it’s back to Apache Cassandra 0.8.10, which doesn’t have compression and seems to work on the Pi (once the env file has been changed).

Update

Thanks to Jonathan Ellis for pointing out that the option is sstable_compression, not sstable_compressor, so the command should be:

Create TABLE users2 (KEY varchar Primary key, password varchar, gender varchar) WITH compression_parameters:sstable_compression = 'DeflateCompressor';

Thursday, May 17, 2012

Starting Cassandra on Raspberry Pi restart



(may be useful for other Raspberry Pi services)



A couple of days ago I blogged about getting Cassandra running on a Raspberry Pi, a fairly straightforward procedure! However, the way I had it set up, Cassandra had to be started manually each time the Pi was turned on, which is not ideal. We really want Cassandra to start each time the Pi boots and to run as a service in the background. Services on Debian (which I am using) are defined by scripts in /etc/init.d and are managed with the update-rc.d command. So we need a script in /etc/init.d that will be run when the Pi restarts. There is an example script at http://www.jansipke.nl/centos-cassandra-init-start-stop-script/, but on the Pi it has a number of problems.


The first problem is that the script uses the daemon command to run the Cassandra script. This command does not exist under Debian, so we need to use the start-stop-daemon command instead. Instead of:


daemon $CASSANDRA_BIN -p $CASSANDRA_PID >> $CASSANDRA_LOG 2>&1


we will use:


start-stop-daemon --start --pidfile $CASSANDRA_PID --startas $CASSANDRA_BIN -- -p $CASSANDRA_PID >> $CASSANDRA_LOG 2>&1
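Putting the pieces together, the start and stop actions of a Debian init script built on start-stop-daemon might look like the sketch below. It is not the script from the GitHub repository mentioned later: the three paths are hypothetical placeholders, and the LSB header block follows the usual Debian init-script convention. Note the -- separator: without it, start-stop-daemon reads -p as its own short form of --pidfile instead of passing it on to Cassandra.

```shell
#!/bin/sh
### BEGIN INIT INFO
# Provides:          cassandra
# Required-Start:    $remote_fs $network
# Required-Stop:     $remote_fs $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Apache Cassandra
### END INIT INFO

# Hypothetical locations: change these to match your install.
CASSANDRA_BIN=/usr/local/cassandra/bin/cassandra
CASSANDRA_PID=/var/run/cassandra.pid
CASSANDRA_LOG=/var/log/cassandra.log

do_start() {
    # Arguments after "--" go to the cassandra script, so "-p" tells
    # Cassandra (not start-stop-daemon) where to write its PID file.
    start-stop-daemon --start --quiet --oknodo --pidfile "$CASSANDRA_PID" \
        --startas "$CASSANDRA_BIN" -- -p "$CASSANDRA_PID" >> "$CASSANDRA_LOG" 2>&1
}

do_stop() {
    # Signal the process recorded in the PID file (TERM by default);
    # --oknodo makes "not running" count as success.
    start-stop-daemon --stop --quiet --oknodo --pidfile "$CASSANDRA_PID"
    rm -f "$CASSANDRA_PID"
}

case "${1:-}" in
    start)   do_start ;;
    stop)    do_stop ;;
    restart) do_stop; sleep 0.5; do_start ;;
    "")      ;;  # no action given (e.g. when sourced): only define the functions
    *)       echo "Usage: $0 {start|stop|restart}" >&2; exit 1 ;;
esac
```

Once installed as /etc/init.d/cassandra, the usual calls would be "sudo /etc/init.d/cassandra start" and "sudo /etc/init.d/cassandra stop".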


The second problem is that usleep doesn’t exist on this Debian install, so use “sleep 0.500000” instead. Finally, we need to add the Java path and JAVA_HOME to the start of the file:


PATH=$PATH:/usr/local/bin/java
JAVA_HOME=/usr/local/bin/java
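For the usleep problem mentioned above, an alternative to editing every usleep call in a script borrowed from another distribution is a small shim that defines usleep in terms of sleep. This is a hypothetical helper, not something Debian provides, and it relies on sleep accepting fractional seconds, just as the “sleep 0.500000” replacement does.

```shell
# Hypothetical shim: convert microseconds to seconds for sleep.
usec_to_sec() {
    # awk does the floating-point division, e.g. 500000 -> 0.5
    awk -v us="$1" 'BEGIN { print us / 1000000 }'
}

# Only define usleep if the system does not already provide one.
if ! command -v usleep >/dev/null 2>&1; then
    usleep() {
        sleep "$(usec_to_sec "$1")"
    }
fi
```

With this near the top of the init script, an existing line such as "usleep 500000" would sleep for half a second unchanged.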


The full file is available on GitHub here: https://github.com/acobley/CassandraStartup. Remember you will need to change the variables that point to the locations of Cassandra and its PID file and lock file. Once the file is copied to /etc/init.d (remember to make it executable with “sudo chmod +x /etc/init.d/cassandra”), you can use update-rc.d to add links to it in all the /etc/rc?.d directories:


sudo update-rc.d cassandra defaults


No doubt this procedure and the Cassandra file can be improved!


For more info on the update-rc.d command, see: http://www.debuntu.org/how-to-manage-services-with-update-rc.d


Thursday, May 10, 2012

Apache Cassandra on a Raspberry Pi


One of the reasons I got hold of a Raspberry Pi (the $35 ARM-based Linux machine) was to play around with building a cluster of them for handling "Big Data". This is a real exercise in tinkering, very much a “what if” scenario. The first thing I wanted to play with was getting Apache Cassandra to run on the Pi. Of course Cassandra is built with Java, and there is no Java on the Pi out of the box. Several people suggested building OpenJDK (http://openjdk.java.net/), but I plumped for Oracle’s Java SE for Embedded, available here:


Download Java SE for Embedded 7 (ARMv6/7 Linux - Headless) and install it on the Pi. Once it is installed, and with PATH and JAVA_HOME set correctly, you should be able to run Java on your Pi.

Next, get hold of a version of Apache Cassandra (http://cassandra.apache.org/); I was using version 1.1.0. Install it as usual. If you try to run Cassandra from the bin directory, it will fail to start with the error:

“Invalid initial eden size: -Xmn0M”

The problem is that cassandra-env.sh works out the heap sizes from the amount of memory on the machine, and caps the young generation (eden) size by multiplying a per-core figure by the number of processors (line 69):

max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb "*" $system_cpu_cores`

The number of system CPU cores is found with (line 22 or thereabouts):

system_cpu_cores=`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`

This fails on the Pi (there is no matching processor line in /proc/cpuinfo), so the core count comes out as 0 and the young generation size as 0 MB. For now I’ve manually altered cassandra-env.sh to report only one core:

system_cpu_cores=1

Cassandra now runs on the Pi. There must be a better way of doing this (my Linux programming is failing me for the moment), but for now I’m just tinkering, so it will do.
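To see why a zero core count produces -Xmn0M, here is a sketch of the young-generation rule: take the smaller of a per-core allowance times the number of cores, and a quarter of the maximum heap. The 100 MB per-core figure is an assumption based on the 1.1-era script, so check your own copy of cassandra-env.sh; the function name young_gen_size is hypothetical.

```shell
# Sketch of the young-generation (-Xmn) sizing rule in cassandra-env.sh:
#   min(per-core allowance * cores, max heap / 4)
# ASSUMPTION: 100 MB per core, as in the 1.1-era script; check your copy.
young_gen_size() {
    heap_mb=$1                              # maximum heap size in MB
    cores=$2                                # detected CPU core count
    per_core_mb=100                         # assumed per-core allowance
    sensible_mb=$(( per_core_mb * cores ))  # 0 whenever cores is 0
    desired_mb=$(( heap_mb / 4 ))
    if [ "$desired_mb" -gt "$sensible_mb" ]; then
        printf '%s\n' "-Xmn${sensible_mb}M"
    else
        printf '%s\n' "-Xmn${desired_mb}M"
    fi
}
```

For example, young_gen_size 128 0 prints -Xmn0M, the value in the error above, while young_gen_size 128 1 prints -Xmn32M.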

Next up: try to get some performance figures for the Pi running Cassandra.

Update

The correct way to fix this is to change the Linux section of cassandra-env.sh to:

    system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
    system_cpu_cores=`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`
    if [ "$system_cpu_cores" -lt "1" ]
    then
        system_cpu_cores="1"
    fi
    echo "Linux"
    echo "memory" $system_memory_in_mb
    echo "cores" $system_cpu_cores
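The same fallback can be wrapped in a small function and checked against a sample cpuinfo file rather than the live /proc/cpuinfo. This is a sketch: count_cpu_cores is a hypothetical name, and it uses grep -E with an anchored version of the pattern above.

```shell
# count_cpu_cores (hypothetical helper): count "processor : N" lines in a
# cpuinfo-format file (default /proc/cpuinfo), falling back to 1 when none match.
count_cpu_cores() {
    cpuinfo=${1:-/proc/cpuinfo}
    # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps
    # that from aborting scripts run with "set -e".
    cores=$(grep -Ec '^processor[[:space:]]+:' "$cpuinfo" 2>/dev/null || true)
    # An empty result (e.g. unreadable file) or 0 both become 1.
    if [ "${cores:-0}" -lt 1 ]; then
        cores=1
    fi
    printf '%s\n' "$cores"
}
```

In cassandra-env.sh the assignment would then become system_cpu_cores=`count_cpu_cores`.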