As you are certainly aware, an image recognition system lies at the core of all our offerings. Such a system functions very similarly to a text search engine, with some added complexity and pitfalls here and there. In fact, image recognition is much less advanced in its capabilities than text search. One of the challenges is posed by the current recognition capabilities themselves. While the technology is significantly more advanced than it was just 5 years ago, many image search tasks still suffer from the so-called “semantic gap”: many high-level queries (e.g. “give me a picture with a car in the rain”) are simply impossible to solve with current state-of-the-art technology.
At kooaba we currently offer services based on technology that works today: recognition of specific items (media covers, posters, landmark buildings, magazine pages) already works well. (Obviously, we will continuously expand our services as technology improves.)
However, even for this type of visual object retrieval, or perhaps in particular for it, one huge challenge remains: scalability. Only very recently have vision researchers been able to demonstrate real-time search in collections on the order of millions of items. Obviously, when rolling out such a system as a product, you need significant hardware resources. Even early on, the database/index is too large to fit into the RAM of a standard server, and CPU demands are also very high. The problems get worse when you have to handle many users, since you have to replicate the whole structure to handle the load in parallel. This used to mean a lot of upfront investment in hardware, which is a problem for any startup.
Thus, we at kooaba knew from the moment cloud computing offers started appearing a few years ago that it would be a crucial success factor for us. In particular, Amazon’s EC2 caught our eye. We made it a central part of our technology strategy.
Today, we have deployed a system which indexes and recognizes around 7 million items. The architecture is shown below.

We run our own local servers, which handle incoming traffic from our mobile applications (iPhone and Android) and also serve dynamic web pages for search results, the kooaba library, content and data management, etc.
However, the complete image recognition is deployed on a set of servers on EC2. Due to the demanding nature of our search engine, we rely on large instance types. (We were also very glad when reserved instances were announced, since they cut our costs further.) The instances run our custom images with our proprietary search software, written in C++, on Ubuntu Linux.
Requests are distributed to the instances in the Amazon cloud, and the results are collected and aggregated at our local servers.
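To make the distribute-and-aggregate step more concrete, here is a minimal sketch in Python. It is not our production code (the search software itself is C++). It assumes, purely for illustration, that the index is split across the instances and that each instance returns a list of (item id, score) pairs. The names `scatter_gather` and `make_fake_shard`, the result format, and the in-process stand-ins for remote instances are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq
from typing import Callable, Dict, List, Tuple

Match = Tuple[str, float]             # (item id, similarity score); hypothetical format
Shard = Callable[[str], List[Match]]  # stand-in for one remote recognition instance


def make_fake_shard(index: Dict[str, List[Match]]) -> Shard:
    """Build an in-process stand-in for one recognition instance.

    `index` maps a query key to the matches this shard would report.
    A real deployment would issue a network request instead.
    """
    def search(query: str) -> List[Match]:
        return index.get(query, [])
    return search


def scatter_gather(query: str, shards: List[Shard], top_k: int = 3) -> List[Match]:
    """Send the query to every shard in parallel and merge the best matches."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # Scatter: one task per shard; map() returns results in shard order.
        partial_results = list(pool.map(lambda shard: shard(query), shards))
    # Gather: flatten the per-shard lists and keep the top_k highest scores.
    all_matches = [m for matches in partial_results for m in matches]
    return heapq.nlargest(top_k, all_matches, key=lambda m: m[1])


shards = [
    make_fake_shard({"photo-1": [("poster-17", 0.91), ("cover-3", 0.40)]}),
    make_fake_shard({"photo-1": [("landmark-8", 0.75)]}),
    make_fake_shard({}),  # a shard with no match for this query
]
best = scatter_gather("photo-1", shards, top_k=2)
print(best)  # [('poster-17', 0.91), ('landmark-8', 0.75)]
```

The aggregation side stays cheap in this pattern: the heavy matching work runs on the cloud instances, while the local servers only merge short result lists.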
In addition to the image search, we also run Apache Solr for full-text search functions in the kooaba library. The text index is stored on Elastic Block Store (EBS) and backed up to S3 using snapshots triggered by cron jobs.
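A snapshot job of this kind could look roughly like the sketch below. It is an illustration and not the script we actually run: it assumes the boto3 library's EC2 client, whose `create_snapshot` call accepts `VolumeId` and `Description`, and the function name `snapshot_volume` and the `solr-index` label are made up for the example.

```python
from datetime import datetime, timezone


def snapshot_volume(ec2_client, volume_id: str, label: str = "solr-index") -> str:
    """Request an EBS snapshot of `volume_id` and return the new snapshot id.

    `ec2_client` is assumed to behave like boto3.client("ec2"); passing it in
    keeps the function easy to exercise without AWS credentials.
    """
    # Timestamp the description so successive backups can be told apart.
    description = f"{label} backup {datetime.now(timezone.utc):%Y-%m-%dT%H:%MZ}"
    response = ec2_client.create_snapshot(VolumeId=volume_id, Description=description)
    return response["SnapshotId"]
```

A cron job would then call this function with a real client, for example `snapshot_volume(boto3.client("ec2"), volume_id)`, where `volume_id` is the ID of the volume holding the text index. EBS snapshots are stored in S3 by the service itself, so no separate upload step is needed.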
So far we have never regretted our move to cloud computing. The benefit of avoiding up-front hardware investments is enormous, and the EC2 service has been very reliable; in fact, it has been even more reliable than our local server housing partners.
We are currently considering also moving our web applications (written in Ruby on Rails), relational databases (MySQL), and potentially even other parts of the infrastructure to EC2. Overall, we have become very hesitant to buy our own hardware and always compare against cloud computing offers first, not only for computation but also for storage.