Saturday, June 16, 2012

3 Node / 4 Node Cassandra Stress test on a Raspberry Pi cluster

One of the things I’m interested in is using tiny Raspberry Pi computers to teach database and network administration to undergraduate and MSc students. In the first instance I’ve been looking at building a large cluster of these devices to run a cluster of Apache Cassandra database servers. I’m in no way expecting these to get anywhere near the performance of real servers or even VM installations but, for me at least, they give a feeling of working with real hardware. The first thing I’m doing is conducting stress tests with various configurations, but I’m limited by the availability of the devices. I started out with a cluster of 3 and have just managed to add another node. The stress test uses the stress command Cassandra provides in the tools directory of a standard installation (some distributions omit the directory, so you may need to get the source and build the stress tool yourself). After we’ve looked at the chart, I’ll look a little at the process of adding a new node to a Cassandra cluster. For the record, the commands I used to stress the cluster are as follows:

./stress -d,, -o insert -I DeflateCompressor


./stress -d,, -o read

For the 4 node test I added the new node to the list of hosts. Note also that I’m using DeflateCompressor as I’ve not yet managed to get the Snappy compressor compiled for the Pi. I used a MacBook Air to drive the stress test over a wifi connection to the cluster, which is connected via a Netgear 10Meg switch that should handle the data rates from a Pi.
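As a rough sketch of how the host list changes between the 3 and 4 node runs, the commands can be assembled from a list of addresses. The addresses below are placeholders (not the real cluster), and stress_command is a hypothetical helper; only the -d, -o and -I options shown in the commands above are used.

```python
def stress_command(hosts, operation, compression=None):
    """Build the argument list for Cassandra's stress tool.

    hosts is joined into the comma-separated list that -d expects.
    """
    args = ["./stress", "-d", ",".join(hosts), "-o", operation]
    if compression is not None:
        args += ["-I", compression]
    return args


# Placeholder addresses for illustration only.
three_nodes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
four_nodes = three_nodes + ["192.0.2.4"]

for hosts in (three_nodes, four_nodes):
    print(" ".join(stress_command(hosts, "insert", "DeflateCompressor")))
    print(" ".join(stress_command(hosts, "read")))
```

Keeping the hosts in a list means the 4 node run differs from the 3 node run by one appended address and nothing else.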

Here then is a graph combining inserts and reads for 3 and 4 node clusters:

One thing I do want to note here: for both the 3 and 4 node clusters the insert performance drops suddenly towards the end of the run. I’m not sure why that happens. The clusters were balanced in both cases, with each node running at 90% CPU. Here’s the ring information for the cluster arrangements (obtained from the nodetool command ./nodetool -h ring):

Address         DC          Rack        Status State   Load            Effective-Owership  Token
                                                                                           113427455640312821154458202477256070485
                datacenter1 rack1       Up     Normal  14.67 MB        33.33%              0
                datacenter1 rack1       Up     Normal  14.42 MB        33.33%              56713727820156410577229101238628035242
                datacenter1 rack1       Up     Normal  14.51 MB        33.33%              113427455640312821154458202477256070485

pi@raspberrypi:/home/space/apache-cassandra-1.1.0/bin$ ./nodetool -h ring
Address         DC          Rack        Status State   Load            Effective-Owership  Token
                                                                                           127605887595351923798765477786913079296
                datacenter1 rack1       Up     Normal  11.24 MB        25.00%              0
                datacenter1 rack1       Up     Normal  11.24 MB        25.00%              42535295865117307932921825928971026432
                datacenter1 rack1       Up     Normal  11.38 MB        25.00%              85070591730234615865843651857942052864
                datacenter1 rack1       Up     Normal  11.1 MB         25.00%              127605887595351923798765477786913079296
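The ownership percentages in the ring output can be reproduced from the tokens alone. This is a minimal sketch, assuming the RandomPartitioner token space of 0 to 2**127 and a replication factor of 1 (so effective ownership equals each node's share of the ring); the ownership function is a hypothetical helper, not part of nodetool.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def ownership(tokens):
    """Return the percentage of the ring owned by each token, in token order.

    Each node owns the range from the previous token (exclusive) up to its
    own token (inclusive); the lowest token wraps around from the highest.
    """
    ordered = sorted(tokens)
    shares = []
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # i == 0 wraps to the last token
        width = (token - previous) % RING_SIZE
        if width == 0:  # a single-node ring owns everything
            width = RING_SIZE
        shares.append(100.0 * width / RING_SIZE)
    return shares


# Tokens copied from the two ring listings above.
three = [0,
         56713727820156410577229101238628035242,
         113427455640312821154458202477256070485]
four = [0,
        42535295865117307932921825928971026432,
        85070591730234615865843651857942052864,
        127605887595351923798765477786913079296]

for ring in (three, four):
    print(["%.2f%%" % share for share in ownership(ring)])
```

For these tokens every share rounds to 33.33% on the 3 node ring and is 25.00% on the 4 node ring, matching the Effective-Owership column.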

Moving from 3 to 4 nodes.

Here’s the procedure I used to move from 3 to 4 nodes. Provided your cluster is already balanced, with the initial_token correctly set in the cassandra.yaml file, you can add the new node with its correct token. Once it has bootstrapped, use nodetool move on each of the other nodes to change that node’s token, something like:

sudo ./nodetool -h move 42535295865117307932921825928971026432

Do this on each node that needs to be moved, so not the first node with a token of 0, nor the new node you’ve just added with the correct initial token. After a node is moved you will need to run cleanup to delete any data that the node no longer needs:

./nodetool -h cleanup

There’s a simple Python script you can use to calculate the tokens (this version courtesy of a good friend on Twitter):

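import sys

# Take the node count from the command line, or prompt for it.
if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = int(input("How many nodes? :"))

# Spread the initial tokens evenly over the 0 .. 2**127 token range.
for i in range(num):
    print('node %d: %d' % (i, i * (2**127) // num))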
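The same calculation can also show which existing nodes need a nodetool move when the cluster grows. This is a sketch under the assumption that the new nodes are added at the end of the ring and that existing nodes keep their position in the ring order; move_plan and balanced_tokens are hypothetical helper names.

```python
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced tokens for a ring of node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def move_plan(old_count, new_count):
    """Return (moves, additions) for growing a balanced ring.

    moves: (node index, old token, new token) for existing nodes whose
           token changes and so need a nodetool move plus cleanup.
    additions: (node index, initial_token) for the newly added nodes.
    """
    old = balanced_tokens(old_count)
    new = balanced_tokens(new_count)
    moves = [(i, old[i], new[i]) for i in range(old_count) if old[i] != new[i]]
    additions = [(i, new[i]) for i in range(old_count, new_count)]
    return moves, additions


moves, additions = move_plan(3, 4)
for index, old_token, new_token in moves:
    print("move node %d: %d -> %d" % (index, old_token, new_token))
for index, token in additions:
    print("new node %d: initial_token %d" % (index, token))
```

Going from 3 to 4 nodes this leaves node 0 alone, moves nodes 1 and 2, and gives the new node the last token, which lines up with the move to 42535295865117307932921825928971026432 shown earlier.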

I’m looking forward to going beyond 4 nodes soon!

Getting more memory on the Pi

The Pi is a little short on memory for this type of server. The situation isn’t helped by some of the memory being reserved for the GPU, the default being 64 MB. You can move this down to 32 MB by changing the start.elf file:

Change to /boot on the Pi
Copy start.elf to start.elf.old (sudo cp start.elf start.elf.old)
Copy arm224_start.elf to start.elf (sudo cp arm224_start.elf start.elf)

Reboot.  You can use the top command to see the performance of your Pi and how much memory it has.   See for more information on the elf files available and how much memory the GPU uses for each.
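Besides top, one way to confirm the new split after rebooting is to read MemTotal from /proc/meminfo. The sketch below assumes a Linux system; mem_total_mb is a hypothetical helper, and the figure reported will normally be somewhat below the nominal split because the kernel reserves some memory for itself.

```python
import os


def mem_total_mb(meminfo_text):
    """Return the MemTotal value from /proc/meminfo content, in MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kilobytes = int(line.split()[1])  # e.g. "MemTotal:  229376 kB"
            return kilobytes / 1024.0
    raise ValueError("MemTotal not found")


# Only read the file where it exists (Linux).
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as meminfo:
        print("%.0f MB visible to Linux" % mem_total_mb(meminfo.read()))
```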

A Pic of the setup
Just for completeness, here's a pic of 4 Raspberry Pi running Apache Cassandra

Thursday, June 7, 2012

Raspberry Pi, not just for teaching programming

There’s quite rightly been a lot of talk about the Raspberry Pi, and quite some discussion on what it’s good for. Whether it will succeed in its mission to create a new army of programmers is anyone’s guess at this point, but for me it’s already succeeding. No, not for programming but for teaching computer administration. I’ve got a small cluster of Pi (three, 2 borrowed at the moment) and I’ve been having a lot of fun configuring Apache Cassandra on them. So for less than £100 I’ve got a Linux cluster I can blow away at any moment and start again. I can reconfigure the Cassandra settings, start the cluster again and run stress tests on the thing.

I take backups of the SD cards once in a while, which is quite easy, so I can go back to previous configs at any point. On a Mac (or Linux) just put the card in a card reader and use the following command:

dd bs=1m if=/dev/rdisk1 of=disk.img

where rdisk1 is the device for the card reader, which you will have identified when creating your first image, and disk.img is the file you want to create.
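For anyone scripting these backups, here is a small sketch that builds the dd arguments for both directions. dd_command is a hypothetical helper; the restore direction simply swaps if= and of=, and the gnu_dd switch reflects that GNU dd on Linux expects bs=1M where the Mac's dd takes bs=1m. Restoring overwrites the whole card, so double-check the device name first.

```python
def dd_command(source, destination, gnu_dd=False):
    """Build a dd argument list copying source to destination in 1 MB blocks."""
    block_size = "bs=1M" if gnu_dd else "bs=1m"
    return ["dd", block_size, "if=" + source, "of=" + destination]


# Back up the card to an image, then the reverse to restore it.
backup = dd_command("/dev/rdisk1", "disk.img")
restore = dd_command("disk.img", "/dev/rdisk1")

print(" ".join(backup))
print(" ".join(restore))
```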

As a teacher, having cheap hardware around like this is going to allow us to give students machines to play on, to set up and muck around with, with little or no chance of damage. Our undergraduate networking course is going to get a whole lot more hands-on! Sure, you could do it with VMs, but that just won’t feel as real as plugging a cluster together. Once we have our Data Science MSc up and running (News here) I’m hoping that we can give the students access to a 2 data center cluster of 20 to 30 machines, all for around 100. Doing that with VMs is possible but a lot of work (although you can of course automate it) and would probably cost a lot more to set up!

Looks like the Pi can do a lot more than teach programming.