RailsConf 2016
Multi-Table Full Text Search with Postgres

Multi-Table Full Text Search with Postgres

by Caleb Thompson

In the video "Multi-Table Full Text Search with Postgres," presented by Caleb Thompson at RailsConf 2016, the focus is on leveraging Postgres to build efficient search functionalities within applications without relying on external search services. The session explores a practical feature evolution that begins with a simple substring search and evolves into a complex full-text search solution, utilizing database views for better organization and performance optimization.

Key Points Discussed:
- Introduction to Search in Applications:
- Caleb begins by advocating for using the database as the primary search tool rather than external services, highlighting Postgres' capabilities.

  • Basic Search Techniques:

    • Discusses initial search methods using SQL's LIKE clause for basic substring searches.
    • Emphasizes the limitations of this approach for users who lack exact search terms or knowledge of casing.
  • Introduction of Full Text Search:

    • Describes the advantages of full-text search, including natural language processing features that refine search results by removing stop words and recognizing various forms of words (stemming).
  • Database Views for Efficient Searches:

    • Explains how to create database views that aggregate search queries from multiple tables, simplifying complex searches and enhancing performance.
    • Demonstrates SQL's CREATE VIEW syntax and its integration with ActiveRecord in Rails.
  • Use of Gems for Enhanced Search Capabilities:

    • Introduces Textacular, a gem that simplifies full-text search implementation in Rails applications. Claims that it supports a variety of search functionalities, which could be expanded as application needs grow.
  • Managing Changes in Views:

    • Discusses the importance of managing database views through migrations to keep the application’s data stable and updated.
  • Future Feature Expansions:

    • Illustrates how to adapt search mechanisms to include comments associated with articles, indicating a continuous evolution of search features.

Conclusions and Takeaways:

- The video reinforces the notion that with the right tools offered by Postgres, developers can implement robust and efficient search functionalities directly within their applications. By encapsulating complex logic in database views and leveraging libraries like Textacular, searching can be more intuitive for users and manageable for developers. This approach minimizes the need for extra dependencies, making maintenance and testing simpler.

00:00:10.379 Today, we're going to talk about a somewhat non-standard approach to search in your applications. Often, search is performed using an external service. However, instead of doing that, we're going to use the tool that's already responsible for storing and finding data. Welcome to multi-table full text search in Postgres. I'm Caleb Thompson, and we are not doing a Wednesday hug. You can find me at these places on the internet; feel free to tweet at me during the talk. I love that! But if you're not tweeting, please close your laptops. There will be a lot of code shown here, and it's going to be hard to get anything out of this talk if you're not paying attention.
00:00:21.750 Now that I've told you to close your laptops, I don't like to jump into credentials at the beginning of a talk. You're already here, and that's sort of the point. We had the abstract and the bio, so you'll hear a little more about me at the end. But I do have one claim to fame: my birthday is mentioned in the Simple Delegator documentation, so you know I'm an expert!
00:00:38.170 I'm going to talk to you about a real-life feature. We will iterate, explore other options, and optimize, just like we do when developing a feature in our applications. We will discuss full-text search: what it is, how it can help us. Hopefully, you could have guessed that from the title. We'll cover views, but not the HTML views. We will discuss database views. Naturally, we want to explore the performance implications of everything we're doing. We're going to look at some of those performance implications and how to mitigate them. Additionally, we will talk about materialized views as one way to improve performance. We'll examine a couple of gems that can assist us with this fun stuff while also considering some of the other options available.
00:01:38.919 Alright, let's look at the classic example: let's search for articles in our application. The simplest thing we could possibly do is to search for substrings. Here, we have articles where the body includes specific substrings. This approach works if your users know exactly what they're searching for. For instance, if they want to find an article that includes the word 'book' in it and they know that it's going to be lowercase somewhere in the middle of a sentence, they can search like this. This method, however, is limited. A small step forward we could take is to perform case-insensitive searches using the 'LIKE' clause in SQL. While it's not perfect, it does improve upon the basic substring search.
00:02:57.549 Let’s leave that for now since it sort of gets the job done. Of course, as features evolve, we need to expand our search capabilities. We probably should have seen that coming. When searching, we also want to consider titles of our articles. To accomplish this, we will extend our previous method by passing in the query twice instead of once and still use case-insensitive 'LIKE' with the '%' wildcard character to ensure it captures substrings anywhere in the text. Now, we might want to search by the author's name, which complicates things further. But it's something we should have anticipated.
00:04:31.449 Let’s join onto our user model. In this talk, we will use the user and authors tables interchangeably. Essentially, this is the same query, but we let Rails handle the join, pulling out the user's name along with the article's title and body. However, we soon discover that our query is starting to become cumbersome, which is undesirable in our codebase. One way to refactor is to encapsulate the logic in a query object. This method may be less performant, but it arguably makes the code easier to understand, especially when you're reviewing it. However, we still observe poor results since we're only searching for case-insensitive substrings.
00:05:33.910 What if the word is plural and we have a singular query? Or what if we're searching for different forms of the same word? Users expect our search features to handle these nuances, similar to what Google does. This is where full text search comes in. Full text search allows us to query rows based on natural language searching. Natural language searching removes stop words from queries—these are words that appear in nearly every sentence and don’t add semantic value.
00:06:16.600 For example, we don't want search results that include the word 'and' or 'or.' We seek to provide more relevant results. We also aim to handle casing appropriately and include synonyms in our searches, allowing users with varying vocabularies to find what they need. Additionally, stemming is a feature of natural language search. This process involves recognizing different forms of the same root words, such as 'try,' 'trying,' and 'tries,' which could return similar results in our searches. Now, let's explore making that same query with these natural language features enabled.
00:07:40.580 We'll examine some important pieces in the code as we integrate these features. We're going to look at the text that we want to join, using the title and concatenating it with an empty space followed by the body. We will call this concatenated text simply 'text' because we need to provide it a name for Postgres to work correctly. Next, we will pull in the author's name alongside the article's ID. It's important to ensure that we return unique results because we don’t want the same article appearing multiple times in our results if it is repeated in different fields.
00:08:44.000 Where do we put all of this code? We could integrate it back into our query object or simply use a scope to pass the SQL through. The challenge here is that SQL doesn't really belong in our Ruby files. Happy news: Postgres has a solution for us in the form of views. A view is a stored query that can return a set of columns to search on later. Since views are effectively just queries, they can pull from multiple source tables, making it simpler to manage complex searches.
00:09:16.630 For example, if we were to create a view for users/authors and articles, we could streamline our queries significantly. In this view, we will abstract away the complexity, focusing on just the relevant text, allowing us to perform meaningful 'WHERE' clauses later on. Building a view is relatively straightforward, as we utilize the 'CREATE VIEW' syntax. We can select distinct user IDs and gather comprehensive information about all users along with their last active timestamps. Our view can limit the selection to users who have been active within the last week.
00:10:45.450 With views, searching feels much like querying a regular table. We can use Rails’ ActiveRecord to interact with our views as if they were standard tables in our database. We can create a model to represent the active users, facilitate querying, and enhance our overall experience. This allows us to continue using our Rails applications seamlessly while taking advantage of the power of full-text search without introducing unnecessary complexity.
00:11:58.170 Our first gem of interest is Textacular. Developed by Aaron Patterson, it manages the full-text search aspects of our queries. Textacular presumes you want full-text search across all text fields in your records, providing several useful variant search options, including basic, fuzzy, and advanced search functionalities. Notably, we will focus on the basic search feature as it offers the most utility when building a simplified input for users.
00:12:28.310 Let's dive back and analyze that search we wrote. It’s designed to return articles based on their title, body, or the author's name. For our Rails implementation, we can develop a simple three-line class that includes Textacular, establishing a connection to our article through the designated field names. When searching, we simply map our query against the articles, enabling a simple search for terms like 'Sandy.' It’s important to note that we also have access to Enumerable, a powerful Ruby standard library feature that enhances our ability to work with collections elegantly.
00:14:04.350 Creating views is straightforward, but meticulously managing updates becomes essential. Your migration file would execute the SQL you need for creation, including the necessary DROP commands for backtracking, should that be required. Understanding this can be critical in view management because if you need to update a view while ensuring the database returns to a stable state afterward, the migration process can be tedious.
00:15:13.040 As features often evolve, a product manager might request additional capabilities. For example, comments associated with articles may need to be factored into search results. Now, we are searching through article titles, bodies, author names, and comment bodies, aggregating all relevant content. We would update our existing SQL command to include these additional parameters.