21 Nov, 2011

Sneak peek at Scandit’s scalable backend infrastructure


As you probably know, Scandit is much more than a barcode recognition technology. If you integrate the Scandit SDK in your iPhone, Android or Nokia app, you also get access to our web-based product analytics platform (Scanalytics) that lets you see what kind of products your users scan (groceries, electronics, cosmetics, etc.), where they scan (at home, in store) and more. This means that our tech team is not just busy constantly improving our image recognition algorithms, but also looks after a backend infrastructure that collects statistics and processes product data. Every now and then, people ask us what technologies we use and what our infrastructure looks like. So we thought it might be time to give you a glimpse into how our backend is built.

For data storage, we use an Apache Cassandra cluster. Cassandra is basically a distributed hash table (DHT) based on a ring topology. Data is distributed among a number of physical servers in a redundant way. We use Cassandra for several reasons:

  1. The cluster is very easy to run and maintain. Every node looks exactly the same. There are no central coordinating nodes and no masters or slaves. This means that there is no single point of failure. If a server goes down, clients can simply connect to any other node, which will serve the request just like the original node.
  2. Cassandra performs very well in write-heavy environments, which is what we have.
  3. Cassandra scales linearly without the complexity introduced by sharding. As your data grows, you can just add more physical machines to the cluster and the ring will rebalance itself.
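
To make the ring idea more concrete, here is a minimal sketch of how a key could be mapped to nodes on a hash ring. It is a simplified illustration and not Cassandra's actual implementation: the `Ring` class, the node names and the `replicas` parameter are all assumptions made for this example, and each node owns just one position on the ring.

```ruby
require 'digest'

# Toy hash ring (illustrative only, not Cassandra's real code).
# Each node owns exactly one position ("token") on the ring.
class Ring
  def initialize(nodes, replicas: 2)
    @replicas = replicas
    # Sort nodes by their token so we can walk the ring in order.
    @tokens = nodes.map { |node| [token(node), node] }.sort
  end

  # Map any string to an integer position on the ring.
  def token(value)
    Digest::MD5.hexdigest(value).to_i(16)
  end

  # Find the first node at or after the key's position, then keep
  # walking clockwise to pick the additional replicas.
  def nodes_for(key)
    position = token(key)
    start = @tokens.index { |tok, _node| tok >= position } || 0
    count = [@replicas, @tokens.size].min
    (0...count).map { |i| @tokens[(start + i) % @tokens.size][1] }
  end
end

# Hypothetical node names and key.
ring = Ring.new(%w[node-a node-b node-c], replicas: 2)
owners = ring.nodes_for('scan:12345')
```

Because every node can compute the same mapping, a client can send a request to any node and that node knows where the data lives, which is why no central coordinator is needed.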


For our web applications, we use Ruby on Rails, jQuery and the Highcharts library. Highcharts is written entirely in JavaScript and lets you create really nice-looking charts for your data. Our REST APIs are also implemented in Ruby, but they use only the Rack framework. An Apache web server with the Phusion Passenger module enabled sits in front of all these applications.
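
For readers who have not seen a Rack-only API, the following is a rough sketch of what one looks like. A Rack application is simply an object that responds to `call(env)` and returns a status, headers and body. The `ScanStatsApp` class, the `/stats` path and the JSON payload are hypothetical and are not taken from our actual APIs.

```ruby
require 'json'

# A Rack application: any object with `call(env)` that returns
# [status, headers, body], where body responds to `each`.
# The endpoint and payload below are made up for illustration.
class ScanStatsApp
  def call(env)
    if env['REQUEST_METHOD'] == 'GET' && env['PATH_INFO'] == '/stats'
      body = JSON.generate('scans' => 0) # placeholder data
      [200, { 'content-type' => 'application/json' }, [body]]
    else
      [404, { 'content-type' => 'text/plain' }, ['Not Found']]
    end
  end
end
```

With the Rack gem installed, a `config.ru` file containing `run ScanStatsApp.new` is enough to serve an app like this, and Passenger picks up that same file.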

In addition to Cassandra, we also have a small MySQL database. It holds web application data that is relational in nature and therefore not well suited for storage in a DHT.

Finally, we also have a bunch of background worker processes (again written in Ruby) that take care of long-running jobs that cannot run in the Apache/Passenger environment. We use the Beanstalkd message queue to pass jobs from the web application to the worker processes.
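
The producer/worker pattern described above can be sketched in a few lines. Note that this example uses Ruby's built-in in-memory `Queue` as a stand-in, not Beanstalkd itself, so everything runs in a single process. The `run_jobs` helper and the job fields (`type`, `id`) are assumptions made for the illustration.

```ruby
require 'json'

# Sketch of the producer/worker pattern. Ruby's in-memory Queue stands
# in for the real message queue here (this is NOT a Beanstalkd client).
def run_jobs(payloads)
  jobs = Queue.new
  results = []

  # Worker side: block on the queue and process jobs until :stop arrives.
  worker = Thread.new do
    while (raw = jobs.pop) != :stop
      job = JSON.parse(raw)
      results << "processed #{job['type']} #{job['id']}"
    end
  end

  # Web application side: serialize each job, enqueue it and move on.
  payloads.each { |payload| jobs << JSON.generate(payload) }
  jobs << :stop

  worker.join
  results
end
```

The point of the design is that the web request only pays for serializing and enqueuing the job, while the slow work happens later in a separate process.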

So far, this mix of technologies has worked very well for us, and we’re excited to see our infrastructure grow with the number of apps that rely on Scandit.
