RailsConf 2013

Using Elasticsearch with Rails Apps

by Brian Gugliemetti

The video "Using Elasticsearch with Rails Apps," presented by Brian Gugliemetti at RailsConf 2013, gives an overview of integrating Elasticsearch, an open-source search engine, into Ruby on Rails applications. It covers Elasticsearch's fundamental concepts, practical examples of queries and filters, and tools for managing and tuning a cluster.

Key Points Discussed:
- What is Elasticsearch:
  - An open-source, distributed, RESTful search engine.
  - Built on Apache Lucene; provides clustering, failover, and auto-discovery on top of a schemaless JSON document store.
- Real-World Application:
  - Gugliemetti shares how Spiceworks uses Elasticsearch to enhance product research tools for IT professionals, emphasizing features like autocomplete and complex filtering.
- Basic Terms and Concepts:
  - Explains nodes, shards (primary and replica), indices, document types, and document IDs within Elasticsearch, highlighting how they affect performance.
- Setup and Integration with Rails:
  - Guidance on how to install and run Elasticsearch, including basic curl commands to interact with the system.
  - Introduction to the Tire gem, which simplifies integrating Elasticsearch with ActiveRecord models in Rails applications.
  - Procedures to index existing models and keep the index up to date using ActiveRecord callbacks.
- Creating Complex Queries:
  - Demonstrates how to create autocomplete functionality and filtering with real-time results, leveraging Elasticsearch's querying capabilities.
  - Discusses advanced queries, including pagination, facets, and filters, and how to return relevant search results based on user criteria.
- Performance and Scalability:
  - Introduces monitoring plugins, such as BigDesk and Paramedic, to track cluster health and performance metrics.
  - Explains how to manage replicas, scale clusters, and when to create new indices for growing data needs.
- Conclusion:
  - Highlights the flexibility and efficiency of Elasticsearch in improving search in Rails applications, and encourages developers to experiment with its settings for optimal performance.

Overall, Gugliemetti emphasizes the necessity of adapting efficient search practices to cater to user needs and the benefits of utilizing Elasticsearch in modern web applications.

00:00:16.840 Happy May Day, everyone, here in the US! Welcome to the presentation on using Elasticsearch with Rails applications. My name is Brian Gugliemetti, and I work for Spiceworks. We're also hiring in Austin, Texas, so come see me.
00:00:23.640 In this talk, I have a few goals I'd like to accomplish. One is to teach you the Elasticsearch terms and concepts, and how you can use Elasticsearch within Rails. I will give you examples of queries and filters, and what you can do with Elasticsearch. Finally, I'll provide you with tools to manage and tune Elasticsearch for your environment.
00:00:41.000 So, what is Elasticsearch? It's an open-source, distributed, RESTful search engine built on top of Lucene. It provides clustering, failover, and auto-discovery. It's a schema-less document store that deals in JSON documents. By a show of hands, how many people do some sort of full-text search today? Awesome! And how many are doing that in a database? Oh good, not too many! How many are using Elasticsearch already? Awesome!
00:01:02.280 I'll give a little background on how Spiceworks decided to use Elasticsearch. We have a big focus on helping our users with product research, catering to many IT professionals in small- to medium-sized businesses. We want to give them tools to learn about products, do some research, and figure out what data they need to make a purchase. Along those lines, we wanted to offer auto-complete when searching for products to provide suggestions. If you look at a specific product group, we wanted the ability to filter and narrow down results. For example, if someone is looking for networking equipment, they might care about specific manufacturers, price range, and so forth.
00:01:34.479 As time went on and we became more familiar with Elasticsearch, we wanted to replace the inefficient searches we were doing in our database. We use PostgreSQL, which has a full-text search engine based on an inverted index. However, it’s not as efficient as Elasticsearch. Additionally, our traffic has grown immensely over the past couple of years, hitting the upper bounds of our free site searches. This prompted us to bring the search in-house, not only to reduce costs but also to add features.
00:02:24.720 Now, I want to go over a few Elasticsearch terms. The primary thing to understand is a node, which is an instance of Elasticsearch that belongs to a cluster. You can have a single standalone instance or multiple instances in a cluster. Next, we have shards, which are the basic data partitions stored on a node. There are two types of shards: primary shards and replica shards. You specify the number of primary shards at index creation time, and this number affects performance; more primary shards generally mean faster indexing of documents.
00:03:06.319 Replica shards, which are copies of primary shards, help improve performance and provide failover capabilities. An index in Elasticsearch is a top-level data partition, and multiple shards can exist within an index across various nodes. Document types are analogous to database tables within an index, and each document is identified by a unique document ID. Typically, when working with Rails, your ID column will serve as the document ID in the Elasticsearch index, but Elasticsearch can generate it if necessary.
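The shard and replica terms above correspond to index settings supplied when an index is created. The following is a minimal sketch, assuming illustrative values of five primary shards and one replica (the numbers are not from the talk); it only builds the JSON body and works out the resulting shard count.

```ruby
require 'json'

# Hypothetical index settings; the shard and replica counts are illustrative.
index_settings = {
  settings: {
    number_of_shards: 5,   # primary shards, fixed once the index exists
    number_of_replicas: 1  # copies kept of each primary shard
  }
}

primaries = index_settings[:settings][:number_of_shards]
replicas  = index_settings[:settings][:number_of_replicas]

# Each primary shard has `replicas` copies, so this is the total shard count.
total_shards = primaries * (1 + replicas)

# JSON body that would accompany an index-creation request.
body = JSON.generate(index_settings)
puts body
puts "total shards: #{total_shards}"
```

With these values the cluster would hold ten shards in total: five primaries plus one copy of each.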
00:04:08.040 Now, I want to show you how easy it is to get Elasticsearch running and start playing with it. We're using a browser to demonstrate this. First, I downloaded Elasticsearch, unzipped it, and entered that directory. You start Elasticsearch by running the command 'bin/elasticsearch', and it should be up shortly. If we go to the browser and check the endpoint, we can see that it's running and get information like the version and node name. Currently, there's nothing in Elasticsearch since it's a fresh install, but there are endpoints available to retrieve data.
00:05:10.320 Let’s load some data. I created some simple JSON documents based on this talk. From here, we can load this data into Elasticsearch using curl. You specify the index name you want to use and what file contains the JSON data. The response confirms whether the document was indexed successfully. We’ve successfully loaded a couple of documents. If we check the stats again, we can see the documents are in the index, and we have some statistics on the size and the time it took to index them.
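The curl step above amounts to an HTTP PUT of a JSON document to an index/type/id URL. As a rough Ruby equivalent, here is a sketch that assumes a local node on the default port 9200 and a hypothetical `talks` index with a `talk` document type; the document fields are made up for illustration, and the URL layout follows the Elasticsearch versions of the talk's era.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumptions: a local node on port 9200, a hypothetical "talks" index and
# "talk" document type, and made-up document fields.
document = { title: 'Using Elasticsearch with Rails Apps', speaker: 'Brian Gugliemetti' }
uri = URI('http://localhost:9200/talks/talk/1')

# PUT /{index}/{type}/{id} stores the JSON body under that document ID.
request = Net::HTTP::Put.new(uri.path, 'Content-Type' => 'application/json')
request.body = JSON.generate(document)

# Sending the request needs a running node, so it is left commented out:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.body

puts "#{request.method} #{request.path}"
puts request.body
```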
00:06:30.080 Now that we have some documents, let’s search for them. Elasticsearch exposes a search endpoint where we can search for keywords like ‘cloud,’ and we see documents that relate to it. That’s a basic example of how to get started and get documents indexed quickly. I have extracted all the talks from the conference and let's check how many were about the cloud this year—we found five results. This demonstrates how quickly you can get Elasticsearch up and running.
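The keyword search shown in the browser can be expressed as a URL with a `q` parameter on the `_search` endpoint. This sketch assumes a local node on port 9200 and a hypothetical `talks` index; the helper name `search_uri` is invented for illustration.

```ruby
require 'uri'

# Assumptions: local node on port 9200; the index name is supplied by the caller.
def search_uri(index, keyword)
  uri = URI("http://localhost:9200/#{index}/_search")
  # The q parameter carries a query-string search; encode it for the URL.
  uri.query = URI.encode_www_form(q: keyword)
  uri
end

puts search_uri('talks', 'cloud')
```

Opening the printed URL in a browser, or passing it to curl, would return the matching documents as JSON when a node is running.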
00:07:56.640 Now, let’s discuss how to use Elasticsearch within a Rails app. A popular gem called Tire has been around for a while and has progressed significantly in the past year or so, making it easy to integrate with. If you're adding this to a new project, you'll edit your Gemfile, add tire, run bundle install, and then you need to include a few modules in the ActiveRecord models you want to index. These modules are Tire::Model::Search and Tire::Model::Callbacks. We will touch on what those do shortly.
00:09:12.519 You can import data into Elasticsearch using a rake command or directly from the Rails console. I have a forum application where I have test data ready. I added the required modules to the Topic model and will now call Topic.import. This will batch-load all the topic entries into Elasticsearch, and we have about 9,000 documents being indexed.
00:10:24.959 Once the import is complete, we can check the statistics for the topics index in our browser. We loaded 9,606 documents, and we can perform searches on this data. For example, searching for 'Ruby' shows us a couple of relevant topics. Next, let’s discuss the modules we included earlier. Tire::Model::Search exposes the query DSL for interacting with Elasticsearch and allows you to define index settings, including the number of shards and replica settings.
00:11:44.880 It also defines the data types within the index. Mappings are important for ensuring Elasticsearch understands how to handle the data you import. You can define analyzers, which determine how Elasticsearch indexes a given field, and boost settings that increase the score for specific fields when performing searches.
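To make the idea of mappings concrete, here is a sketch of the kind of JSON mapping that declares field types, an analyzer, and a boost. The field names, the `snowball` analyzer, and the boost value are illustrative assumptions, and the `string` field type reflects the Elasticsearch versions of the talk's era rather than current releases.

```ruby
require 'json'

# Hypothetical mapping for a "topic" document type; field names, analyzer,
# and boost are illustrative choices, not taken from the talk.
topic_mapping = {
  topic: {
    properties: {
      subject:    { type: 'string', analyzer: 'snowball', boost: 10 },
      body:       { type: 'string', analyzer: 'snowball' },
      created_at: { type: 'date' }
    }
  }
}

puts JSON.pretty_generate(topic_mapping)
```

Boosting `subject` means a match in the subject counts for more than the same match in the body when results are scored.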
00:12:55.960 For example, if you have a sentence and you want to index it, Elasticsearch will break it down using a tokenizer that removes stop words and punctuation and converts everything to lowercase. This process is customizable per field. Now, let's show something cool with an autocomplete example we're working on in the Topic model. This involves defining a method using the Tire DSL to search on the subject for a given term.
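The analysis step described at the start of this passage (lowercase, strip punctuation, drop stop words) can be illustrated with a toy function. This is a deliberately simplified stand-in, not how Lucene implements analyzers, and the stop-word list is a small illustrative sample.

```ruby
# A toy stand-in for an analyzer, NOT Lucene's implementation.
# The stop-word list is a small illustrative sample.
STOP_WORDS = %w[a an and is it of the to with].freeze

def analyze(text)
  # Lowercase, then keep runs of letters and digits (punctuation is dropped).
  tokens = text.downcase.scan(/[a-z0-9]+/)
  # Remove stop words from the token stream.
  tokens.reject { |token| STOP_WORDS.include?(token) }
end

p analyze('The Quick Brown Fox jumps over the lazy dog!')
```

The terms that survive this step are what end up in the index, which is why a search for "quick" would match the original sentence regardless of capitalization.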
00:14:07.360 In the controller, we expose an endpoint for autocomplete, which matches the subject against user input and returns results. For the frontend, we use jQuery's autocomplete plugin to handle user input, connecting it all together for a smooth experience. For instance, searching for ‘Spiceworks’ gives us relevant topics linked to it.
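The transcript does not show the exact query behind the autocomplete endpoint. As one plausible shape, the sketch below builds a prefix-style match on a `subject` field; the choice of `match_phrase_prefix`, the helper name, and the default limit are all assumptions made for illustration.

```ruby
require 'json'

# Assumption: a prefix-style match on "subject". The talk does not show the
# real query, so match_phrase_prefix is an illustrative choice.
def autocomplete_query(term, limit = 10)
  {
    query: { match_phrase_prefix: { subject: term } },
    size: limit # cap the number of suggestions returned
  }
end

puts JSON.generate(autocomplete_query('spice'))
```

A controller action could send this body to the search endpoint and return the subjects of the hits as the suggestion list.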
00:15:01.440 Now let’s get a bit more advanced with our queries. We can enhance our queries to match specific criteria, such as requiring results from a particular forum, or restricting topics to those created in a specific month, like February. Additionally, we can ask Elasticsearch to highlight matched terms in the subject, providing better feedback.
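The three refinements above (a specific forum, a specific month, highlighting) can be combined in one request body. This is a sketch under stated assumptions: the field names `subject`, `forum_id`, and `created_at`, the forum ID, and the year are hypothetical, and the helper name is invented.

```ruby
require 'json'
require 'date'

# Assumptions: hypothetical field names (subject, forum_id, created_at);
# the forum ID and year passed in below are illustrative.
def topics_in_month(term, forum_id, year, month)
  first_day  = Date.new(year, month, 1)
  next_month = first_day >> 1 # first day of the following month
  {
    query: {
      bool: {
        must: [
          { match: { subject: term } },
          { term:  { forum_id: forum_id } },
          # Half-open range: on or after the 1st, before the next month starts.
          { range: { created_at: { gte: first_day.iso8601, lt: next_month.iso8601 } } }
        ]
      }
    },
    # Ask for matched terms in the subject to be wrapped in highlight tags.
    highlight: { fields: { subject: {} } }
  }
end

puts JSON.pretty_generate(topics_in_month('ruby', 42, 2013, 2))
```

Using an exclusive upper bound avoids having to know how many days the month has.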
00:16:01.079 These examples illustrate how powerful Elasticsearch is and how quickly you can set things up, making querying and result filtering easy. For paging through results, you can specify how many results to return and from which page, as is standard in search functionality. You also have facets and filters, which allow you to refine searches based on product attributes like vendor, manufacturer, and pricing.
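Paging as described above maps onto the `from` and `size` parameters of a search request. A minimal sketch, assuming 1-based page numbers as in most Rails pagination helpers (the helper name `paging` is invented):

```ruby
# Assumption: 1-based page numbers, as in most Rails pagination helpers.
def paging(page, per_page)
  page = [page.to_i, 1].max # treat missing or invalid pages as page 1
  { from: (page - 1) * per_page, size: per_page }
end

p paging(1, 20)
p paging(3, 20)
```

The resulting hash can be merged into any search body, so page 3 at 20 results per page skips the first 40 hits.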
00:17:29.520 For example, in a test harness I created that includes various product catalogs, I can filter results by vendor and manufacturer. You can also combine multiple filters to narrow down results further, aiding users as they search through large datasets. The syntax for defining these filters is straightforward and aligns with common needs across application development.
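Filtering by vendor and manufacturer, together with facet counts, might look like the request body sketched below. The field names and values are hypothetical, and the `filtered` query and `facets` section follow the syntax of 0.90-era Elasticsearch; later releases replaced facets with aggregations.

```ruby
require 'json'

# Assumptions: hypothetical product fields (name, vendor, manufacturer) and
# the facets/filtered syntax of 0.90-era Elasticsearch.
def product_search(term, filters = {})
  # Turn each selected attribute into a term filter, e.g. vendor => "acme".
  term_filters = filters.map { |field, value| { term: { field => value } } }

  query = { match: { name: term } }
  # Wrap the query in the selected filters only when there are any.
  unless term_filters.empty?
    query = { filtered: { query: query, filter: { and: term_filters } } }
  end

  {
    query: query,
    # Terms facets return per-vendor and per-manufacturer counts.
    facets: {
      vendors:       { terms: { field: 'vendor' } },
      manufacturers: { terms: { field: 'manufacturer' } }
    }
  }
end

puts JSON.pretty_generate(product_search('switch', vendor: 'acme', manufacturer: 'netco'))
```

Each additional entry in `filters` narrows the result set further, while the facet counts tell the user which values are available to filter on.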
00:18:58.080 There are also Boolean search capabilities, allowing you to create complex queries that combine must, should, and must-not clauses to refine results. The Tire DSL supports many common queries and filters, but more intricate queries handled by Elasticsearch's native API can still be constructed using JSON documents, which allows for great flexibility.
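A Boolean query with the three clause types named above can be written directly as a JSON document. The field names and values in this sketch are hypothetical and chosen only to show one clause of each kind.

```ruby
require 'json'

# Hypothetical field names and values, chosen to show the three clause types.
boolean_query = {
  query: {
    bool: {
      must:     [{ match: { subject: 'elasticsearch' } }], # required to match
      should:   [{ match: { subject: 'rails' } }],         # optional, raises the score
      must_not: [{ term:  { locked: true } }]              # excluded from results
    }
  }
}

puts JSON.pretty_generate(boolean_query)
```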
00:20:07.120 As you scale your application, keep in mind that utilizing a multi-node Elasticsearch setup requires load balancing solutions since the default setup only queries a single node. This is important for failure-resistant architectures. One common approach is to create client-only nodes that do not store data but proxy queries across the cluster, allowing for more balanced search workloads.
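The talk's suggestion is to use client-only nodes. Purely to illustrate what spreading queries across nodes means, the sketch below swaps in a different and simpler technique: client-side round-robin over a list of hosts. The class name and host names are hypothetical, and the class is not thread-safe.

```ruby
# A client-side round-robin sketch, swapped in for illustration; it is not
# the client-node approach from the talk. Host names are hypothetical.
class RoundRobinHosts
  def initialize(hosts)
    raise ArgumentError, 'at least one host is required' if hosts.empty?

    @hosts = hosts
    @index = 0
  end

  # Return the next host, wrapping back to the first after the last.
  def next_host
    host = @hosts[@index]
    @index = (@index + 1) % @hosts.size
    host
  end
end

hosts = RoundRobinHosts.new(%w[http://es1:9200 http://es2:9200 http://es3:9200])
4.times { puts hosts.next_host }
```

A production setup would also need to skip hosts that are down, which is one reason a proxy or client node is often preferred.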
00:21:54.880 Monitoring your Elasticsearch setup can be achieved through various plugins such as Paramedic and BigDesk, which provide insights into the performance metrics of your cluster, including JVM stats and request rates. These plugins allow you to track index behavior and resource usage, informing your optimization strategies.
00:23:06.960 When creating indexes, be mindful that the number of primary shards you define limits how many nodes that index's primary data can be spread across. Changing the number of primary shards is not possible once the index is created, so you need to analyze your future workload carefully when choosing this value.
00:24:17.360 Increasing replicas is advantageous for read-heavy workloads, as it allows search requests to be distributed across more nodes, and unlike primary shards, the replica count can be changed after the index is created. Recently, Elasticsearch introduced features allowing you to direct queries and index data strategically across nodes, optimizing search performance.
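Changing the replica count on a live index is done by updating the index settings. The sketch below builds such a request, assuming a local node on port 9200 and the `topics` index from the demo; the replica count of 2 is illustrative, and the request is built but not sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumptions: local node on port 9200 and the "topics" index from the demo;
# the replica count of 2 is illustrative.
uri = URI('http://localhost:9200/topics/_settings')

# number_of_replicas is a dynamic setting, so it can change on a live index.
request = Net::HTTP::Put.new(uri.path, 'Content-Type' => 'application/json')
request.body = JSON.generate({ index: { number_of_replicas: 2 } })

# Sending requires a running cluster, so it is left commented out:
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }

puts "#{request.method} #{request.path} #{request.body}"
```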
00:25:28.480 As we wrap up, I want to mention that Elasticsearch 0.90 was released recently, bringing Lucene 4's benefits, and improvements in document balancing with better shard management. Tomorrow at 9:00 a.m., there's a webinar dedicated to the new features of Elasticsearch. Thank you for your time, and I’m open to any questions you may have.
00:27:17.360 Thank you for your time.