Indexing data that's not in your database with Sphinx and Thinking Sphinx

October 13, 2010

At AboutUs, where I work, we recently re-implemented the site’s search feature. Behind the scenes, the new search uses Sphinx (a very fast, scalable search engine) and Thinking Sphinx (a Rails plugin that makes it easy to configure and query Sphinx from within a Rails application).

One thing that makes Sphinx so much faster than alternatives like Solr is that it reads directly from your database, using SQL queries, when it builds its search index. This is far faster than going through your application layer and indexing the results of your application models’ methods. In our case, we can reindex about 40 million records in around 4 hours; our previous Solr setup took days.