Back in November, we gave you an overview of the technology landscape that we use at Scandit to run our product analytics platform (Scanalytics). In response to that post, we received quite a few questions from people asking about our experience with the Apache Cassandra project, which we use for storing scan data and analytics reports. Because of this interest, we thought it would be a great idea to share some more details. So today, we are going to give a quick overview of how we upgraded our Cassandra 0.6.x cluster to the latest 1.0.8 release.
When we initially deployed our platform in spring 2010, Cassandra was at version 0.5.1. Even though the project was still pretty young at that time, we were very impressed by its performance and stability. After extensive testing, we found that it fit our needs and decided to use the 0.6.0 release for our first rollout. Over the next 12 months, we kept upgrading our cluster until we reached 0.6.13, which was the last release in the 0.6.x branch.
In the meantime, Cassandra was evolving at an amazing speed. Many cool new features, such as secondary indices, CQL and schema support, were added. Since we were very happy with our deployment, we moved a little slower and skipped the 0.7.x releases. Now that 1.0.x has been around for a few months, we decided it was time to upgrade. Because the list of changes between the two versions was fairly long, we did the upgrade in two steps: first from 0.6.13 to 0.8.7 and then from 0.8.7 to 1.0.8.
Upgrading from 0.6.13 to 0.8.7
Upgrading to version 0.8.7 can be complex because:
1. You have to take all nodes offline at the same time. The reason is that the network protocol used by nodes to exchange messages changed between the two versions.
2. The configuration file (storage-conf.xml) has a new name (cassandra.yaml) and format. Cassandra provides a tool (bin/config-converter) to convert your old configuration file to the new format.
3. Keyspace and column family definitions are no longer stored in a configuration file, but in the system keyspace. Cassandra provides a tool (bin/schematool) to import the keyspace and column family definitions from the configuration file into the system keyspace.
The upgrade steps are:
* Install Cassandra 0.8.7 in parallel to 0.6.13.
* Install Cassandra 0.7.9 too. While you will never start the 0.7.9 version, you still need to install this release because the “config-converter” and “schematool” tools are only available in the 0.7.x branch.
* Using the “config-converter” tool from the 0.7.9 installation, create a new configuration file (cassandra.yaml) for your 0.8.7 installation based on your 0.6.13 configuration (storage-conf.xml). Do this on all nodes.
* On all nodes, create a backup of your data. You can do so by running the “nodetool snapshot” command. This will flush all pending writes to disk and create a hard link for every data file (SSTable). Because no files are actually copied, this step doesn’t take long or eat up a lot of disk space.
* Empty the commitlog and stop the node from accepting writes by running “nodetool drain” on each node.
* On all nodes, stop Cassandra 0.6.13.
* On all nodes, start Cassandra 0.8.7.
* Import the keyspace and column family definitions using “schematool” from the 0.7.9 installation. You only need to do this once, on a single node.
* You should now be able to access your data. Do some testing. Make sure that you use the new Thrift API. (Clients still using the old Thrift API will not be able to access the upgraded cluster.)
* On all nodes, run “nodetool scrub”.
* Run “nodetool repair”.
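To make the ordering of these steps easier to review, here is a minimal dry-run sketch in Python. It only builds and prints the commands, phase by phase, and runs nothing. The host names, the `/opt/cassandra-0.7.9` path and the JMX port value are hypothetical placeholders. The `nodetool -h <host> <command>` form and the `schematool <host> <jmx port> import` invocation are written from memory of the 0.6 to 0.8 tooling, so check them against your own installation. Using `nodetool drain` for the "empty the commitlog" step and running repair on every node are assumptions as well.

```python
# Dry-run sketch: builds the upgrade commands per phase, executes nothing.
NODES = ["cass1", "cass2", "cass3"]   # hypothetical host names
OLD_TOOLS = "/opt/cassandra-0.7.9"    # hypothetical path of the 0.7.9 install
JMX_PORT = 7199                       # placeholder: use your nodes' JMX port


def nodetool(host, command):
    """Return a nodetool invocation as an argument list."""
    return ["nodetool", "-h", host, command]


def plan_0_6_to_0_8(nodes):
    """Return (phase, commands) pairs in the order of the steps above."""
    schematool = OLD_TOOLS + "/bin/schematool"
    return [
        ("snapshot", [nodetool(n, "snapshot") for n in nodes]),
        # Assumption: drain is the command that empties the commitlog.
        ("drain", [nodetool(n, "drain") for n in nodes]),
        # Between these phases: stop 0.6.13 and start 0.8.7 on every node.
        # How to do that depends on how the service is installed.
        ("import schema", [[schematool, nodes[0], str(JMX_PORT), "import"]]),
        ("scrub", [nodetool(n, "scrub") for n in nodes]),
        # Assumption: repair is run on each node in turn.
        ("repair", [nodetool(n, "repair") for n in nodes]),
    ]


if __name__ == "__main__":
    for phase, commands in plan_0_6_to_0_8(NODES):
        print("# " + phase)
        for cmd in commands:
            print(" ".join(cmd))
```

Each phase is completed on all nodes before the next one starts, which matches the requirement that the whole cluster goes offline together for this upgrade.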
Upgrading from 0.8.7 to 1.0.8
This step is easier. Because the inter-node communication of 0.8.7 is compatible with 1.0.8, you can upgrade one node at a time without taking the whole cluster offline.
* Install 1.0.8 in parallel to 0.8.7.
* On all nodes, create a backup of your data (see above).
* On each node, one at a time: stop Cassandra 0.8.7 and start Cassandra 1.0.8.
* On all nodes, run “nodetool scrub”.
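When upgrading one node at a time, it helps to wait until the restarted node is reachable again before touching the next one. The following Python sketch shows one simple way to do that by polling a TCP port. It assumes the Thrift port is the default 9160, and `restart` is a hypothetical callback that stops 0.8.7 and starts 1.0.8 on a host. An open port is only a minimal check and does not prove that the node has fully rejoined the ring.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection succeeds, False after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def rolling_upgrade(nodes, restart, port=9160, timeout=60.0, interval=1.0):
    """Restart nodes one at a time; stop if a node does not come back."""
    for node in nodes:
        restart(node)  # hypothetical: stop 0.8.7, start 1.0.8 on this node
        if not wait_for_port(node, port, timeout, interval):
            raise RuntimeError("%s did not come back up" % node)
```

Raising on the first node that fails to come back keeps the remaining nodes on 0.8.7, so the cluster stays available while you investigate.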
Hope you Cassandra geeks out there find this information helpful! Be sure to look out for more technical How-To posts from the Scandit team coming in the next few months, and be sure to post your comments and questions below!