In my last post, I explained the REST-style API that underlies Maven Central's browser-based search UI. That API essentially comes "for free" with the main components on which Maven Central Search is built:
- Apache Solr, the popular, blazing-fast open source enterprise search platform from the Apache Lucene project -- http://lucene.apache.org/solr
- ajax-solr -- http://evolvingweb.github.com/ajax-solr
In this post, I will highlight those components and describe how they were used to implement Maven Central Search.
When we started the project, we looked at a couple of options for implementing search, including Solr and the existing Nexus search capability built directly on top of Apache Lucene. The Nexus approach initially seemed compelling: we have significant experience with it, and Nexus search even provides a REST API for full-text search that we could have leveraged. So why did we end up choosing Solr when we could have simply reused the search functionality in Nexus, or even crafted a web UI backed by an instance of Nexus running on top of Central? Two reasons:
- Flexibility -- We discovered early in the design phase of Central Search that we needed changes to the schemas, fields, and even field contents of the Lucene indices used by Nexus. Changing those schemas would have required corresponding changes elsewhere in the Nexus codebase. With Solr, we could simply point our Solr installation at an existing index, or have Solr build a new index from scratch by adding documents through Solr's REST API. We could rapidly prototype schema changes (often in one or two lines of XML, and often without even restarting Solr) and see our updated search results almost immediately.
- Scalability -- Solr bills itself as an "enterprise search platform." One of the enterprise features that attracted us to Solr was its built-in support for replication. As query load grows, we can balance it across multiple servers, each serving a copy of the same data. Solr's support for multiple indexes also leaves open a path to sharding our index data once it grows too large to serve from a single index on a single server.
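To make the two points above a little more concrete, here is a minimal Python sketch that builds (but does not send) a JSON request for Solr's `/update` handler and a query URL for the `/select` handler. The host, the core name `central`, and the document fields `g`, `a` and `v` are hypothetical placeholders, not Central's actual schema. `commit=true` and `shards` are standard Solr request parameters (the latter drives Solr's manually configured distributed search), but check their behavior against the Solr version you run.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr/central"  # hypothetical host and core name


def build_add_request(docs):
    """Build (but do not send) a POST that adds a list of documents to the index."""
    body = json.dumps(docs).encode("utf-8")
    # commit=true asks Solr to make the new documents searchable right away
    url = SOLR_BASE + "/update?" + urlencode({"commit": "true"})
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def build_query_url(q, rows=20, shards=None):
    """Build a /select URL; `shards` fans the query out over index partitions."""
    params = {"q": q, "rows": rows, "wt": "json"}
    if shards:
        # e.g. ["host1:8983/solr/central", "host2:8983/solr/central"]
        params["shards"] = ",".join(shards)
    return SOLR_BASE + "/select?" + urlencode(params)


# Hypothetical document; field names are illustrative only
docs = [{"id": "junit:junit:4.10", "g": "junit", "a": "junit", "v": "4.10"}]
req = build_add_request(docs)
print(req.get_method(), req.full_url)
print(build_query_url("g:junit"))
```

Sending `req` with `urllib.request.urlopen` against a running Solr core would index the document; the query URL can then be opened in a browser to see the result, which is the quick prototype-and-check loop described above.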
Once we made the decision to use Solr, we quickly discovered that practically all the search functionality we needed came "out of the box" with Solr's REST API. In fact, the entire second half of the Maven Central API Guide is simply a set of URLs that are proxied to our running Solr instance. We proxy these requests so that we can filter and transform them before they reach Solr, which prevents a malformed or malicious request from taking down our server.
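As a rough illustration of that kind of filtering, here is a hypothetical Python sketch, not the code behind our proxy. It accepts only a whitelist of query parameters, rejects duplicates and non-numeric paging values, caps `rows` so a single request can't ask for an enormous result set, and only then builds the URL forwarded to Solr. The whitelist, the 200-row cap, and the upstream URL are all assumptions.

```python
from urllib.parse import parse_qsl, urlencode

UPSTREAM = "http://localhost:8983/solr/select"  # hypothetical internal Solr URL
ALLOWED = {"q", "rows", "start", "wt", "core"}  # hypothetical parameter whitelist
MAX_ROWS = 200                                  # hypothetical cap on page size


class RejectedRequest(ValueError):
    """Raised when an inbound query string should not reach Solr."""


def to_upstream_url(query_string):
    """Validate an inbound query string and return the URL to forward to Solr."""
    params = parse_qsl(query_string, keep_blank_values=True)
    names = [name for name, _ in params]
    unknown = set(names) - ALLOWED
    if unknown:
        raise RejectedRequest("unsupported parameter(s): " + ", ".join(sorted(unknown)))
    if len(names) != len(set(names)):
        raise RejectedRequest("duplicate parameters are not allowed")
    cleaned = dict(params)
    if not cleaned.get("q"):
        raise RejectedRequest("missing query")
    for key in ("rows", "start"):
        # isdecimal() rejects negatives, blanks and non-numeric input
        if key in cleaned and not cleaned[key].isdecimal():
            raise RejectedRequest(key + " must be a non-negative integer")
    # Clamp the page size instead of rejecting, so oversized requests still succeed
    cleaned["rows"] = str(min(int(cleaned.get("rows", "20")), MAX_ROWS))
    return UPSTREAM + "?" + urlencode(cleaned)


print(to_upstream_url("q=g:junit&rows=5000"))
```

A whitelist is the conservative choice here: parameters such as `qt`, which in some Solr configurations selects a different request handler, are refused simply because they are not on the list, rather than because someone remembered to block them.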
The ajax-solr website is a great resource for understanding the architecture of ajax-solr, and it provides an excellent tutorial for building your first Solr-powered, AJAX-based website. Our developers worked through that tutorial and very quickly fashioned a prototype of Central Search. During development, two major benefits of ajax-solr stood out:
- MVC Pattern -- The Model-View-Controller (MVC) software architecture, often used for web applications, isolates "domain logic" from the user interface, permitting independent development, testing, and maintenance of each. Within the browser, ajax-solr applies the MVC pattern to Solr result sets, which makes for a clean, easily extensible way of working with Solr result sets (the model) and ajax-solr widgets (the views). It also helped that MVC is an easy-to-understand pattern.
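ajax-solr itself is written in JavaScript, so the following is only a rough Python analogue of the structure described above, not ajax-solr's API. A manager object plays the controller: it holds the query parameters and, after each request, hands the Solr response (the model) to every registered widget (the views), each of which renders only what it cares about. ajax-solr has a similar shape with its `Manager`, its widgets, and their `afterRequest` callbacks, but the Python class and method names here are hypothetical, and `fake_solr` is a canned stand-in for a real HTTP call.

```python
class Manager:
    """Controller: owns the query parameters and notifies widgets of new results."""

    def __init__(self, fetch):
        self.fetch = fetch          # callable(params) -> Solr-style response dict
        self.params = {"q": "*:*"}
        self.widgets = []
        self.response = None

    def add_widget(self, widget):
        widget.manager = self
        self.widgets.append(widget)

    def do_request(self):
        self.response = self.fetch(dict(self.params))
        for widget in self.widgets:
            widget.after_request()


class ResultWidget:
    """View: renders one entry per matching document."""

    def after_request(self):
        docs = self.manager.response["response"]["docs"]
        self.rendered = [doc["id"] for doc in docs]


class CountWidget:
    """View: renders only the total hit count."""

    def after_request(self):
        self.rendered = "results: %d" % self.manager.response["response"]["numFound"]


def fake_solr(params):
    """Canned response shaped like Solr's JSON output (no network involved)."""
    docs = [{"id": "junit:junit:4.10"}, {"id": "junit:junit:4.9"}]
    if params["q"] != "*:*":
        docs = [d for d in docs if params["q"] in d["id"]]
    return {"response": {"numFound": len(docs), "docs": docs}}


manager = Manager(fake_solr)
results, count = ResultWidget(), CountWidget()
manager.add_widget(results)
manager.add_widget(count)
manager.params["q"] = "4.10"
manager.do_request()
print(count.rendered, results.rendered)
```

Because each widget only reads `manager.response`, a new view (say, a facet list) can be added without touching the controller or the existing views, which is the kind of extensibility described above.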
In summary, we used several standard open source components to build Maven Central Search, and along the way our team added several new tools to our bag of tricks. We now have a very strong foundation for the continued improvement of Maven Central. We hope you find the new features useful, and we look forward to hearing your feedback at Get Satisfaction (http://getsatisfaction.com/sonatype).