Friday, May 25, 2012

Cassandra Stress test on a Raspberry Pi

So here are my initial results from using a Raspberry Pi to run Cassandra.  At the moment I’m running a single Pi with Cassandra 0.8.10 (compression is not available in that version, and Snappy will not currently run on the Pi).  I’m using the Java stress tests that come with the Cassandra source.  These tests were run with the stress test classes running on the same Pi as the Cassandra instance.  To be honest it’s not looking great, but I’m looking at ways of getting faster IO and tuning:

Pi with class 10 SD card


As you can see this ramps up to an interval op rate of nearly 200.  Using an external HD on USB is actually slightly worse (extract from later in the run):


For comparison, here are the results from my new Apple MacBook Air with an SSD:


Monday, May 21, 2012

Snappy Compression fails for Apache Cassandra on a Raspberry Pi

Although I’ve managed to get Apache Cassandra running on a Raspberry Pi, I’ve been struggling to make any use of it.  I’ve been using the latest build of Cassandra (1.1.0) and whenever I’ve tried creating a column family I’ve been getting the following error:

SnappyCompressor.create() threw an error: org.xerial.snappy.SnappyError [FAILED_TO_LOAD_NATIVE_LIBRARY] null

As you are aware, compression on column families was introduced in Cassandra 1 to save space and improve disk IO.  By default the compressor is Snappy, used through a Java library that wraps Google Snappy, which is written in C++.  There is no port of Google Snappy for the Pi and my initial attempts to build one have not been successful (the build is failing with undefined macro AC_DEFINE).

You should be able to create a column family with DeflateCompressor from  cqlsh as follows:

create TABLE users (KEY varchar Primary key, password varchar, gender varchar) WITH compression_parameters:sstable_compressor = 'DeflateCompressor';

However that’s failing with the same Snappy Compressor error.  Describing the keyspace with cassandra-cli gives the following:
Keyspace: test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:1]
  Column Families:
    ColumnFamily: users
      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Replicate on write: true
      Caching: KEYS_ONLY
      Bloom Filter FP chance: default
      Built indexes: []
      Column Metadata:
        Column Name: password
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
        Column Name: gender
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        sstable_compressor: DeflateCompressor

Note the two compressors in action.  I’m not sure quite what’s going on, or if this is the correct result at the moment, but I’m continuing to investigate (any answers gratefully received).  So for now it’s back to Apache Cassandra 0.8.10, which doesn’t have compression and seems to work on the Pi (once the env file has been changed).


Thanks to Jonathan Ellis for pointing out that the option is sstable_compression, not sstable_compressor, so the command should be:

Create TABLE users2 (KEY varchar Primary key, password varchar, gender varchar) WITH compression_parameters:sstable_compression = 'DeflateCompressor';

Thursday, May 17, 2012

Starting Cassandra on Raspberry Pi restart

(may be useful for other Raspberry Pi services)

A couple of days ago I blogged about getting Cassandra running on a Raspberry Pi, a fairly straightforward procedure!  However, the way I had it set up required Cassandra to be restarted manually each time the Pi was turned on, which is not ideal.  We really want Cassandra to start each time the Pi is started and to run as a service in the background.  Services on Debian (which I am using) are defined in /etc/init.d and are managed by the update-rc.d command.  So, we need a command script to put into /etc/init.d that will be run when the Pi restarts.  There is an example script available, but on the Pi it has a number of problems.

The first problem is that the script uses the daemon command to run the Cassandra script.  This command does not exist under Debian, so we need to use the start-stop-daemon command instead.  In place of the daemon line in the example script we will use:

start-stop-daemon --start  --pidfile $CASSANDRA_PID --startas $CASSANDRA_BIN -p $CASSANDRA_PID  >> $CASSANDRA_LOG 2>&1

The second problem is that usleep doesn’t exist under this Debian; use “sleep 0.500000” instead.  Finally, we will need to add the Java path and JAVA_HOME to the start of the file.
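As a rough sketch of what those two lines at the top of the init script might look like: the install directory below is purely hypothetical (it is not taken from the original script), so substitute wherever Java SE Embedded was unpacked on your Pi.

```shell
# Hypothetical install location: replace with the directory where
# Java SE Embedded was actually unpacked on the Pi.
JAVA_HOME=/opt/java-embedded

# Put the JVM's bin directory first on the PATH so the Cassandra
# start script picks up this java.
PATH="$JAVA_HOME/bin:$PATH"

export JAVA_HOME PATH
```

Setting these inside the script matters because init scripts run at boot with a minimal environment, not with the settings from your login shell.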


The full file is available on GitHub.  Remember you will need to change the variables that point to the locations of Cassandra and its PID file and lock file.  Once the file is copied to /etc/init.d (remember to make it executable with “sudo chmod +x cassandra”) you can use update-rc.d to add it to all the /etc/rc?.d directories:

update-rc.d  cassandra defaults

No doubt this procedure and the cassandra file can be improved!

For more info on the update-rc.d command, see its man page (man update-rc.d).

Thursday, May 10, 2012

Apache Cassandra on a Raspberry Pi

One of the reasons I got hold of a Raspberry Pi (the $35 ARM-based Linux machine) was to play around with building a cluster of them for handling "Big Data".  This is a real exercise in tinkering, very much a “what if” scenario.  The first thing I wanted to play with was getting Apache Cassandra to run on the Pi.  Of course Cassandra is built with Java, and there is no Java on the Pi out of the box.  Several people suggested building OpenJDK, but I plumped for Oracle’s Java SE for Embedded.

Download Java SE for Embedded 7 (ARMv6/7 Linux - Headless) and install it on the Pi.  Once that is done, and with the path and JAVA_HOME correctly set, you should be able to run java on your Pi.

Next get hold of a version of Apache Cassandra.  I was using version 1.1.0.  Install it as usual.  If you try to run Cassandra from the bin directory it will fail to start with the error:

“Invalid initial eden size: -Xmn0M”

The problem is that the env file works out the young generation size by multiplying a per-core figure by the number of processors (line 69):

max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb "*" $system_cpu_cores`

The calculation of the number of system CPU cores (line 22 or thereabouts):

system_cpu_cores=`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`

is failing on the Pi (there is no processor line in /proc/cpuinfo), so the core count comes out as zero.  For now I’ve manually altered it to report only one core.


Cassandra now runs on the Pi.  There must be a better way of doing this (my Linux programming is failing me for the moment) but for now I’m just tinkering, so it will do.
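The failure can be reproduced away from the Pi with a short sketch.  The sample cpuinfo content below is made up for illustration (it is not captured from a real Pi), and the per-core figure of 100 MB is an assumption; only the egrep pattern and the multiplication come from the lines quoted above.

```shell
# Made-up stand-in for the Pi's /proc/cpuinfo: note there is no
# lowercase "processor : N" line for the pattern to match.
cpuinfo=$(mktemp)
printf 'Processor\t: ARMv6-compatible processor rev 7 (v6l)\nHardware\t: BCM2708\n' > "$cpuinfo"

# Same pattern as the env file. grep -c exits non-zero when the count
# is 0, hence the "|| true" to keep the sketch safe under "set -e".
system_cpu_cores=$(grep -Ec 'processor([[:space:]]+):.*' "$cpuinfo" || true)

# Assumed per-core figure, for illustration only.
max_sensible_yg_per_core_in_mb=100

# Equivalent to the expr multiplication quoted above.
max_sensible_yg_in_mb=$((max_sensible_yg_per_core_in_mb * system_cpu_cores))

echo "cores=$system_cpu_cores young_gen=-Xmn${max_sensible_yg_in_mb}M"
rm -f "$cpuinfo"
```

With no matching line the count is 0, the product is 0, and that would account for the -Xmn0M in the error message.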

Next up, try to get some performance figures for the Pi running Cassandra.


The correct way to fix this is to change the Linux section of the env file to:
            system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
            system_cpu_cores=`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`
            # Fall back to one core when no processor line is found (as on the Pi)
            if [ "$system_cpu_cores" -lt "1" ]
            then
                system_cpu_cores="1"
            fi
            echo "Linux"
            echo "memory" $system_memory_in_mb
            echo "cores" $system_cpu_cores
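To check a guard like this without touching a real /proc/cpuinfo, one option is to wrap it in a small function that takes the file to read as an argument.  This is only a sketch: the function name count_cores and both sample files are hypothetical and not part of Cassandra’s scripts.

```shell
# count_cores FILE: print the number of "processor : N" lines in FILE,
# falling back to 1 when none are found.
count_cores() {
    cores=$(grep -Ec 'processor([[:space:]]+):.*' "$1" || true)
    if [ "$cores" -lt 1 ]; then
        cores=1
    fi
    echo "$cores"
}

# Made-up sample files: one with no processor line, one with two.
pi_like=$(mktemp)
printf 'Hardware\t: BCM2708\n' > "$pi_like"
two_core=$(mktemp)
printf 'processor\t: 0\nprocessor\t: 1\n' > "$two_core"

pi_cores=$(count_cores "$pi_like")
pc_cores=$(count_cores "$two_core")
echo "pi-like: $pi_cores, two-core: $pc_cores"

rm -f "$pi_like" "$two_core"
```

The file without a processor line is reported as one core, while the two-processor file is still counted normally, so the guard only changes behaviour in the case that was failing.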