Friday, July 18, 2014

Solving Eclipse and Maven problems:

"Failed to read artifact descriptor for"


I've spent most of today trying to add a dependency for the Cassandra Java driver to a Maven pom in Eclipse with little luck at all.  Worse, when I tried it on another machine it worked fine, so it was something wrong with my Mac laptop.  Nothing I did would work; I kept getting an error "Failed to read artifact descriptor for com.datastax.cassandra" etc. Looking at:

Stackoverflow: Maven: Failed to read artifact descriptor 

It suggested Maven -> Update Project with the force option ticked.  No joy there!  It was only when I tried a manual mvn -U clean install command that I got the full error:
 "Failed to execute goal on project testmaven: Could not resolve dependencies for project uk.ac.dundee.computing.aec:testmaven:jar:0.0.1-SNAPSHOT: Failed to collect dependencies at com.datastax.cassandra:cassandra-driver-core:jar:2.0.3: Failed to read artifact descriptor for com.datastax.cassandra:cassandra-driver-core:jar:2.0.3: Could not transfer artifact com.datastax.cassandra:cassandra-driver-core:pom:2.0.3 from/to central (http://repo.maven.apache.org/maven2): Specified destination directory cannot be created: /Users/Administrator/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.0.3"
Looking at the permissions on /Users/Administrator/.m2/repository/com/datastax/cassandra/ I saw that the subdirectories were owned by root. I must have used sudo at some point to manually build the Cassandra Java driver from a git repo (in fact I knew I did).

The answer then was to delete /Users/Administrator/.m2/repository/com/datastax and then run the forced Maven project update in Eclipse.

All now works well !

Update to Java 1.8

I also ran into a problem changing the Java version of a project from 1.5 to 1.7 or 1.8.  Yes, you can change the project facet, but you'll find that a Maven update will change it back to 1.5.  This Stack Overflow question has the correct answer:

Java. Warning - Build path specifies execution environment J2SE-1.4

Open the pom.xml file and add the following to the <build> section:
      <pluginManagement>  
           <plugins>  
                <plugin>  
                     <groupId>org.apache.maven.plugins</groupId>  
                     <artifactId>maven-compiler-plugin</artifactId>  
                     <configuration>  
                          <source>1.8</source>  
                          <target>1.8</target>  
                     </configuration>  
                </plugin>  
           </plugins>  
      </pluginManagement>  


You will need to do a Maven project update from Eclipse after that.
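As an aside, recent Maven versions also let you set the compiler level with two properties at the top level of the pom, which the compiler plugin picks up as its defaults; either approach should work:

```xml
      <properties>  
           <maven.compiler.source>1.8</maven.compiler.source>  
           <maven.compiler.target>1.8</maven.compiler.target>  
      </properties>  
```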

Change Web app facet to 3.0

If you create a dynamic web app from File -> New -> Maven Project and select maven-archetype-webapp, you may find that it is "stuck" at version 2.3.  If you try to change it to 2.4 or higher (3.1 for instance) you'll be prevented. Deep inside this thread on Stack Overflow is the correct answer (for me):

Cannot change version of project facet Dynamic Web Module to 3.0?


In Eclipse, go to Window -> Show View -> Navigator.  Now in the Navigator window you should see the .settings folder.  Open the folder and open the file org.eclipse.wst.common.project.facet.core.xml.  Inside that file you should see jst.web, and you can change the web app facet version there.  Again, do a Maven update from Eclipse after that.
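For reference, the file looks something like this (the exact contents will vary by project, and the versions here are illustrative); the version attribute on the jst.web line is the one to change:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<faceted-project>
  <fixed facet="jst.web"/>
  <installed facet="java" version="1.8"/>
  <installed facet="jst.web" version="3.0"/>
</faceted-project>
```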

Wednesday, March 19, 2014

Saving an image in Cassandra BLOB field

We had an occasion today to store images in a blob field of a Cassandra table.  More to the point, I needed to extract an image and send it from a Java servlet to a web browser.  The code for storing the image is quite easy but there is a small gotcha when retrieving it.  So, suppose we have a table that looks something like:

String CreateTweetTable = "CREATE TABLE if not exists Messages ("+
                "user varchar,"+
                " interaction_time timeuuid,"+
                " tweet varchar,"+
                " image blob," +
                " imagelength int,"+
                " PRIMARY KEY (user,interaction_time)"+
                ") WITH CLUSTERING ORDER BY (interaction_time DESC);";

Our image will be in the blob field and we also store the size of the image for reference. We can load a picture from a file on the local machine's hard disk like this:

FileInputStream fis = new FileInputStream("/Users/Administrator/Desktop/mystery.png");
byte[] b = new byte[fis.available()];  // available() is fine here for a small local file
int length = b.length;
fis.read(b);
fis.close();
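(As an aside, on Java 7 and up the read-into-a-sized-array dance can be replaced with a single Files.readAllBytes call. A minimal sketch; the helper name readImageBytes is mine, not part of any API:)

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Read a whole file into a byte array in one call; no manual sizing needed.
byte[] readImageBytes(String path) throws IOException {
    return Files.readAllBytes(Paths.get(path));
}

// usage: byte[] b = readImageBytes("/Users/Administrator/Desktop/mystery.png");
//        int length = b.length;
```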

We now need to convert the byte array into a bytebuffer:

ByteBuffer buffer =ByteBuffer.wrap(b);

Writing the record becomes simply:

 PreparedStatement ps = session.prepare("insert into Messages ( image, user, interaction_time,imagelength) values(?,?,?,?)");
BoundStatement boundStatement = new BoundStatement(ps);
session.execute(  boundStatement.bind( buffer, "Andy",  convertor.getTimeUUID(),length));


Getting the image back is simple.  Use a Select to get the result set:

PreparedStatement ps = session.prepare("select user,image,imagelength from Messages where user =?");
BoundStatement boundStatement = new BoundStatement(ps);
ResultSet rs =session.execute ( boundStatement.bind("Andy"));

We can now loop through the result set (here we are assuming only one image comes back):

ByteBuffer bImage=null;
for (Row row : rs) {
 bImage = row.getBytes("image") ;
 length=row.getInt("imagelength");
}

However, to display the image we will need it as a byte array.  We can't use bImage.get(), as this reads the driver's raw internal buffer (see: https://groups.google.com/a/lists.datastax.com/forum/#!searchin/java-driver-user/blob$20ByteBuffer/java-driver-user/4_KegVX0teo/2OOZ8YOwtBcJ for details).  Instead we can use:

byte[] image = Bytes.getArray(bImage);
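If you'd rather not depend on the driver's Bytes utility class, the same copy can be done with the plain JDK by reading from a duplicate of the buffer; the duplicate has its own position and limit, so draining it leaves the original buffer untouched. A sketch, with a stand-in buffer in place of the one from row.getBytes:

```java
import java.nio.ByteBuffer;

// Stand-in for the buffer returned by row.getBytes("image")
ByteBuffer bImage = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});

// duplicate() shares the bytes but has an independent position/limit,
// so reading it out does not disturb the original buffer
ByteBuffer dup = bImage.duplicate();
byte[] image = new byte[dup.remaining()];
dup.get(image);
```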

In the servlet we can return this image in one of two ways:

response.setContentType("image/png");
response.setContentLength(image.length);
OutputStream out = response.getOutputStream();
out.write(image);

This writes the image as a single lump, which may use too much memory.  You might be better using a BufferedInputStream (http://stackoverflow.com/questions/2979758/writing-image-to-servlet-response-with-best-performance):

InputStream is = new ByteArrayInputStream(image);
BufferedInputStream input = new BufferedInputStream(is);
byte[] buffer = new byte[8192];
for (int length = 0; (length = input.read(buffer)) > 0;) {
    out.write(buffer, 0, length);
}
out.close();
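The chunked copy above can be tried end-to-end with in-memory streams (a ByteArrayOutputStream standing in for the servlet's output stream):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.Random;

// Stand-in for the image bytes
byte[] image = new byte[20000];
new Random(42).nextBytes(image);

// Copy in 8K chunks, exactly as in the servlet code above
InputStream input = new BufferedInputStream(new ByteArrayInputStream(image));
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buffer = new byte[8192];
for (int n; (n = input.read(buffer)) > 0; ) {
    out.write(buffer, 0, n);
}
input.close();
```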

Wednesday, January 22, 2014

Running Cassandra 2.x.x on Windows 7 and 8

This blog post describes how to get the Cassandra 2.x.x family running on a Windows machine.  It's clear that Cassandra should not be run in production on Windows (except perhaps on Azure), but if you're a student learning to use C* it may well be you have no choice but to run it on Windows 7 or 8 on your laptop.  Let's get started:

Install JRE 7

Open a command prompt and type java -version to see if it is installed properly.  If not, download a JRE from Oracle and install it.  Make sure it's at least version 7 (version 6 will not work).

You'll need to set JAVA_HOME. Find the Control Panel (on Windows 8, search for it).  Go to "System and Security" and then "System". Click on "Advanced system settings" and then the "Environment Variables" button. Click on the New button and in the Variable name box type JAVA_HOME; under the value you'll need to put in the path to the Java you are using.  Mine is

c:\program files\java\jre7

but yours may be different, especially if you have a JDK.  If you are going to program Java clients for C* you will need a JDK, but that's a different post.

 

Install Cassandra

Download Cassandra from http://cassandra.apache.org/ probably a file like
apache-cassandra-2.0.4-bin.tar.gz
You'll need to unpack this file, and how will depend on which flavor of Windows you have.  At this point I'll assume you have a legal copy of WinZip or similar.  Unpack the downloaded file to the root of the C: or D: drive on your machine.

You can now change to the Cassandra install directory in your command prompt, then change to the bin directory and type cassandra to start it.  The window will print a lot of information, but you are looking for a line like:

 INFO 19:00:31,031 Listening for thrift clients...

to make sure it's working.

CQLSH

So now we have C* running, we need to check we can connect to it.  Start by opening another command prompt and typing cqlsh. Sadly it won't start; cqlsh needs an installation of Python, so let's get one installed. Download one from http://www.python.org/: go to Downloads, then "Individual release", click on the 2.x stable release and scroll down to the download section.  You're looking for the Windows MSI installer.  I used:

http://www.python.org/ftp/python/2.7.6/python-2.7.6.msi

Download it and run it to install Python. You'll need a version 2 of Python, NOTE THIS WELL, version 3 will not work! This installs a nice Windows version of Python but does not add the executable to your path.  You'll need to set that by hand.  Once again find the Control Panel (on Windows 8, search for it), go to "System and Security" and then "System", click on "Advanced system settings" and then the "Environment Variables" button.  Under the system variables find PATH.  Highlight it and click Edit.

Careful!  We don't want to wipe the current contents (if you do, hit Cancel). Go to the end of the current path and enter

;c:\python27

Note the ; at the beginning.  Again, this will depend on the version of Python you've installed and should mirror the path to your Python installation. Click OK to close the dialog boxes and open a command prompt again.

Now you should be able to change to the Cassandra directory, then the bin directory, and type cqlsh.  With luck you should get the Cassandra cqlsh prompt:

Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.0 | Cassandra 2.0.4 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh>

Type "use system;" followed by "describe keyspaces;"; cqlsh should reply:

system  system_traces

You're now connected and ready to start work.

BTW, folks at DataStax and Apache Cassandra, why is this so hard?  Would DataStax DevCenter make this easier?

Setting up a Cassandra cluster on Windows with a vagrant virtualbox

Setting up Cassandra on Windows can be a pain with all its dependencies, but it's something I'll cover in a later post.  One simpler way is to get C* running in a VirtualBox VM, and perhaps even run it as a mini cluster.  This can be helped a lot by using Vagrant, but even that isn't quite straightforward.

The following has worked for me and is based heavily on the work done by calebgroom and his GitHub contribution vagrant-cassandra.  I've altered it a bit for use with the latest C*, which adds virtual nodes etc.  Using these instructions you should be able to provision a 3 node C* cluster with vnodes.

1: Install Oracle VM VirtualBox from https://www.virtualbox.org/ (the latest version should do).
2: Install Git for Windows from http://msysgit.github.io/ and ensure you select the option to run git from the command line.
3: Install Ruby for Windows from http://rubyinstaller.org/ (V2.x.x, select all options).
4: Download DevKit (DevKit-mingw64-32-4.7.2-20130224-1151-sfx.exe).
4.1: Extract it to a permanent location.
4.2: Start "Command Prompt with Ruby" from the Start menu.
4.3: Change to the DevKit location and run
    ruby dk.rb init
    ruby dk.rb install

5: At any location run gem install librarian-chef (this may take some time).
6: Download Vagrant (http://www.vagrantup.com/) and install it.
7: Open a command prompt and git clone https://github.com/acobley/vagrant-cassandra.git
8: Change to the directory vagrant-cassandra\vagrant and run
   librarian-chef install
9: Open vagrant/cookbooks/java/attributes and edit default.rb so that
    default['java']['jdk_version'] = '7'

10: If you want, comment out the "DL is deprecated, please use Fiddle" warning at C:\HashiCorp\Vagrant\embedded\lib\ruby\2.0.0\dl.rb
11: Change to the vagrant-cassandra directory and run
    vagrant up
This could take some time, but once it's finished you should be able to ssh to the virtual machine if you have an ssh client installed.

   vagrant ssh node1

If you don't have ssh installed, the Git installation comes with an ssh client, so add c:\program files\git\bin to your path:

 set PATH=%PATH%;c:\program files\git\bin

 Or set the path environment variable from the control panel.

 You can then ssh to the virtual host
  
  ssh vagrant@127.0.0.1 -p 2222 -i c:/users/*username*/.vagrant.d/insecure_private_key
 
Once inside the virtual machine you can test and see if it works by getting the C* status, typing:

/usr/local/cassandra/bin/nodetool -h 192.168.2.10 status

You can bring down the cluster with vagrant halt and remove it with vagrant destroy (but then you'll need to start again!).

Vagrant can also be run on a Mac.  Make sure you have VirtualBox installed, clone https://github.com/acobley/vagrant-cassandra.git and follow the instructions in the readme.

Saturday, November 2, 2013

Hadoop 2.x : jar file location for wordcount example

The jar files for Hadoop 2.x have moved location from Hadoop 1.x.  I found the following command

javac -classpath $HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar -d wordcount_classes myWordCount.java

will allow you to compile the standard wordcount example code.  You can see that the common jars are in share/hadoop/common/, the mapreduce jars are in share/hadoop/mapreduce/, and the common lib files are in share/hadoop/common/lib.

This post is in answer to this stackoverflow question:

http://stackoverflow.com/questions/19488894/compile-hadoop-2-2-0-job

(Or set your classpath like this:

 export CLASSPATH=$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar

and compile like this:

javac -classpath $CLASSPATH -d myWordCountClasses myWordCount.java

)

Wednesday, October 30, 2013

Hadoop 2 on Ubuntu on Azure.

This is to be read in conjunction with http://ac31004.blogspot.co.uk/2013/10/installing-hadoop-2-on-mac_29.html

Fire up an Azure Ubuntu server and ssh to it.

Install a Java JDK:
sudo apt-get install default-jdk

On your home machine, download a copy of Hadoop and secure copy it to the Azure machine (your username and machine name will be different):
scp hadoop-2.2.0.tar.gz user@Hadoopmachine.cloudapp.net:

Unzip it and untar it
gunzip hadoop-2.2.0.tar.gz
tar xvf  hadoop-2.2.0.tar

You'll still need to set up the env variables
export JAVA_HOME=/usr/lib/jvm/default-java
export HADOOP_INSTALL=/home/user/hadoop-2.2.0
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin


Also add JAVA_HOME and HADOOP_INSTALL, and change the path, in /etc/environment; see http://trentrichardson.com/2010/02/10/how-to-set-java_home-in-ubuntu/ for details.

After setting up core-site.xml and hdfs-site.xml you'll need to make the namenode and datanode directories:

mkdir -p /home/hadoop/yarn/namenode
mkdir /home/hadoop/yarn/datanode

Everything else should be the same.

Tuesday, October 29, 2013

Installing Hadoop 2 on a Mac

I've had a lot of trouble getting Hadoop 2 and YARN running on my Mac.  There are some tutorials out there, but they are often for beta and alpha versions of the Hadoop 2.0 family.  These are the steps I used to get Hadoop 2.2.0 working on my Mac running OS X 10.9.


Get hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/common/

make sure JAVA_HOME is set (if you have Java 6 on your machine):
export JAVA_HOME=`/usr/libexec/java_home -v1.6`

point HADOOP_INSTALL to the hadoop installation directory
export HADOOP_INSTALL=/Applications/hadoop-2.2.0

And set the path
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin

You can test hadoop is found with
hadoop version

make sure ssh is set up on your machine:
System Preferences -> Sharing -> Remote Login is ticked

try:

ssh <username>@localhost

where <username> is the name you used to log on.

In $HADOOP_INSTALL/etc/hadoop these are the conf files I changed.

core-site.xml

 <configuration>  
 <property>  
   <name>fs.default.name</name>  
   <value>hdfs://localhost:9000</value>  
  </property>  
 </configuration>  


hdfs-site.xml

 <configuration>  
 <property>  
   <name>dfs.replication</name>  
   <value>1</value>  
  </property>  
  <property>  
   <name>dfs.namenode.name.dir</name>  
   <value>file:/Users/Administrator/hadoop/namenode</value>  
  </property>  
  <property>  
   <name>dfs.datanode.data.dir</name>  
   <value>file:/Users/Administrator/hadoop/datanode</value>  
  </property>  
 </configuration>  


Make the directories for the namenode and datanode data (note: the file: paths above and the mkdir commands below will need to reflect where you want to store the files; I've stored mine in the home directory of the Administrator user on my Mac).

mkdir -p /Users/Administrator/hadoop/namenode
mkdir -p /Users/Administrator/hadoop/datanode

hadoop namenode -format

yarn-site.xml
 <configuration>  
 <!-- Site specific YARN configuration properties -->  
 <property>  
 <name>yarn.resourcemanager.address</name>  
 <value>localhost:8032</value>  
 </property>  
 <property>  
 <name>yarn.nodemanager.aux-services</name>  
 <value>mapreduce_shuffle</value>  
 </property>  
 </configuration>  


start-dfs.sh
start-yarn.sh
jps

should give
9430 ResourceManager
9325 SecondaryNameNode
9513 NodeManager
9225 DataNode
9916 Jps
9140 NameNode

If not, check the log files.  If the datanode did not start and you get an incompatible IDs error, stop everything, delete the datanode directory and recreate it.

try an ls:
hadoop fs -ls

if you get

ls: `.': No such file or directory

then there is no home directory in the hadoop file system.  So

hadoop fs -mkdir /user
hadoop fs -mkdir /user/<username>
where <username> is the name you are logged onto the machine with.

now change to the $HADOOP_INSTALL directory and upload a file:

hadoop fs -put LICENSE.txt


finally try a mapreduce job:

cd share/hadoop/mapreduce
hadoop jar ./hadoop-mapreduce-examples-2.2.0.jar wordcount LICENSE.txt out
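For what it's worth, stripped of the Hadoop mapper/reducer machinery, the counting that the wordcount example performs is just this (a plain-Java sketch for illustration, not the example's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Count occurrences of each whitespace-separated word in a string
Map<String, Integer> wordCount(String text) {
    Map<String, Integer> counts = new HashMap<>();
    for (String word : text.split("\\s+")) {
        if (!word.isEmpty()) {
            counts.merge(word, 1, Integer::sum);
        }
    }
    return counts;
}
```

Hadoop distributes exactly this: the map step emits (word, 1) pairs and the reduce step sums them per word.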