Saturday, June 16, 2012

3 Node / 4 Node Cassandra Stress test on a Raspberry Pi cluster

One of the things I’m interested in is using tiny Raspberry Pi computers for teaching database and network administration to undergraduate and MSc students. In the first instance I’ve been looking at building a large cluster of these devices to run a cluster of Apache Cassandra database servers. I’m in no way expecting these to get anywhere near the performance of real servers or even VM installations but, for me at least, they give a feeling of working with real hardware. The first thing I’m doing is conducting stress tests with various configurations, but I’m limited by the availability of the devices. I started out with a cluster of 3 and have just managed to add another node. The stress test uses the stress command Cassandra provides in the tools directory of a standard installation (some distributions omit the directory, so you may need to get the source and build the stress tool yourself). After we’ve looked at the chart, I’ll look a little at the process of adding a new node to a Cassandra cluster. For the record, the commands I used to stress the cluster are as follows:

./stress -d,, -o insert -I DeflateCompressor


./stress -d,, -o read

For the 4 node test I added the new node to the list of hosts. Note also that I’m using DeflateCompressor as I’ve not yet managed to get the Snappy compressor compiled for the Pi. I used a MacBook Air to drive the stress test over a wifi connection to the cluster, which is connected via a Netgear 10Meg switch that should handle the data rates from a Pi.

Here then is a graph combining inserts and reads for the 3 and 4 node clusters:

One thing I do want to note here: for both the 3 and 4 node clusters the insert performance drops suddenly towards the end of the run. I’m not sure why that happens. The clusters were in both cases balanced, with each node running at 90% CPU. Here’s the ring information for the two cluster arrangements (obtained from the nodetool command ./nodetool -h ring):

Address         DC          Rack        Status State   Load            Effective-Owership  Token
                                                                                           113427455640312821154458202477256070485
                datacenter1 rack1       Up     Normal  14.67 MB        33.33%              0
                datacenter1 rack1       Up     Normal  14.42 MB        33.33%              56713727820156410577229101238628035242
                datacenter1 rack1       Up     Normal  14.51 MB        33.33%              113427455640312821154458202477256070485

pi@raspberrypi:/home/space/apache-cassandra-1.1.0/bin$ ./nodetool -h ring
Address         DC          Rack        Status State   Load            Effective-Owership  Token
                                                                                           127605887595351923798765477786913079296
                datacenter1 rack1       Up     Normal  11.24 MB        25.00%              0
                datacenter1 rack1       Up     Normal  11.24 MB        25.00%              42535295865117307932921825928971026432
                datacenter1 rack1       Up     Normal  11.38 MB        25.00%              85070591730234615865843651857942052864
                datacenter1 rack1       Up     Normal  11.1 MB         25.00%              127605887595351923798765477786913079296

Moving from 3 to 4 nodes.

Here’s the procedure I used to move from 3 to 4 nodes. Provided your cluster is already balanced, with initial_token correctly set in each node’s cassandra.yaml file, you can add the new node with its correct token. Once it has bootstrapped, you can use nodetool move on each of the other nodes to change that node’s token, something like:

sudo ./nodetool -h move 42535295865117307932921825928971026432

Do this on each node that needs to be moved, that is, every node except the first node (token 0) and the new node you’ve just added, which already has the correct initial token. After a node has been moved you will need to run cleanup on it to delete any data that the node no longer needs:

./nodetool -h cleanup
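To see which nodes actually need a move when the cluster grows, the old and new balanced tokens can be compared. The following is a rough sketch under my own assumptions: tokens are spread evenly over 0 to 2**127, existing nodes keep their order in the ring, and the new node takes the highest token. The function names balanced_tokens and plan_growth are made up for illustration, and the printed commands leave out the -h host argument, which you would fill in for your own nodes.

```python
RING_SIZE = 2**127  # assumed token range, as in the token script in this post


def balanced_tokens(num_nodes):
    """Evenly spaced tokens for a ring of num_nodes nodes."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def plan_growth(old_count, new_count):
    """Work out the token changes when growing from old_count to new_count nodes.

    Returns (moves, new_nodes):
      moves     - (node_index, old_token, new_token) for existing nodes
                  whose token changes
      new_nodes - (node_index, initial_token) for the nodes being added
    """
    old = balanced_tokens(old_count)
    new = balanced_tokens(new_count)
    moves = [(i, old[i], new[i]) for i in range(old_count) if old[i] != new[i]]
    new_nodes = [(i, new[i]) for i in range(old_count, new_count)]
    return moves, new_nodes


moves, new_nodes = plan_growth(3, 4)
for index, token in new_nodes:
    print('node %d: set initial_token to %d before starting it' % (index, token))
for index, old_token, new_token in moves:
    print('node %d: nodetool move %d, then nodetool cleanup' % (index, new_token))
```

For the 3 to 4 node case this lists two moves (the second and third nodes) and one new node, and leaves the node with token 0 alone.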

There’s a simple Python script you can use to calculate the tokens (this version courtesy of a good friend on Twitter):

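import sys

# Take the node count from the command line, or prompt for it.
if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = int(input("How many nodes? :"))
# Spread the tokens evenly over the range 0 to 2**127.
for i in range(num):
    print('node %d: %d' % (i, i * (2**127) // num))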

I’m looking forward to going beyond 4 nodes soon!

Getting more memory on the Pi

The Pi is a little short on memory for this type of server. The situation isn’t helped by some of the memory being shared with the GPU, the default being 64 MB. You can bring this down to 32 MB by changing the start.elf file:

Change to /boot on the Pi
Copy start.elf to start.elf.old (sudo cp start.elf start.elf.old)
Copy arm224_start.elf to start.elf (sudo cp arm224_start.elf start.elf)

Reboot.  You can use the top command to see the performance of your Pi and how much memory it has.   See for more information on the elf files available and how much memory the GPU uses for each.
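If you’d rather check the result from a script than from top, the total memory Linux can see is also reported in /proc/meminfo. Here is a small sketch of reading it; the helper name mem_total_mb is my own, and it assumes the usual "MemTotal: <number> kB" line format.

```python
import os


def mem_total_mb(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and return it in MB."""
    for line in meminfo_text.splitlines():
        if line.startswith('MemTotal:'):
            kilobytes = int(line.split()[1])  # the value is reported in kB
            return kilobytes / 1024.0
    raise ValueError('no MemTotal line found')


# Only try to read the file where it exists (i.e. on Linux).
if os.path.exists('/proc/meminfo'):
    with open('/proc/meminfo') as handle:
        print('Memory visible to Linux: %.0f MB' % mem_total_mb(handle.read()))
```

Running this before and after swapping the start.elf file should show the figure go up once the GPU share has been reduced.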

A Pic of the setup
Just for completeness, here's a pic of 4 Raspberry Pis running Apache Cassandra


  1. Hi, thnks a lot for the information, What tool did you use to build the graph?

  2. I used EXCEL, nice and simple !

  3. Hi Andy,

    This is a great article and thanks.

    I have a project that needs some Cassandra skills and thus I'm looking for someone to configure and admin a cassandra cluster for me. Can you recommend anyone please?

    (a n d i AT m c b u r n i e DOT c o m)