00:00:10.379
Today, we're going to talk about a somewhat non-standard approach to search in your applications. Often, search is performed using an external service. However, instead of doing that, we're going to use the tool that's already responsible for storing and finding data. Welcome to multi-table full text search in Postgres. I'm Caleb Thompson, and we are not doing a Wednesday hug. You can find me at these places on the internet; feel free to tweet at me during the talk. I love that! But if you're not tweeting, please close your laptops. There will be a lot of code shown here, and it's going to be hard to get anything out of this talk if you're not paying attention.
00:00:21.750
Now that I've told you to close your laptops, I don't like to jump into credentials at the beginning of a talk. You're already here, and that's sort of the point. We had the abstract and the bio, so you'll hear a little more about me at the end. But I do have one claim to fame: my birthday is mentioned in the Simple Delegator documentation, so you know I'm an expert!
00:00:38.170
I'm going to talk to you about a real-life feature. We will iterate, explore other options, and optimize, just like we do when developing a feature in our applications. We will discuss full-text search: what it is, how it can help us. Hopefully, you could have guessed that from the title. We'll cover views, but not the HTML views. We will discuss database views. Naturally, we want to explore the performance implications of everything we're doing. We're going to look at some of those performance implications and how to mitigate them. Additionally, we will talk about materialized views as one way to improve performance. We'll examine a couple of gems that can assist us with this fun stuff while also considering some of the other options available.
00:01:38.919
Alright, let's look at the classic example: let's search for articles in our application. The simplest thing we could possibly do is to search for substrings. Here, we have articles where the body includes specific substrings. This approach works if your users know exactly what they're searching for. For instance, if they want to find an article that includes the word 'book' in it and they know that it's going to be lowercase somewhere in the middle of a sentence, they can search like this. This method, however, is limited. A small step forward we could take is to perform case-insensitive searches using the 'LIKE' clause in SQL. While it's not perfect, it does improve upon the basic substring search.
00:02:57.549
Let’s leave that for now since it sort of gets the job done. Of course, as features evolve, we need to expand our search capabilities. We probably should have seen that coming. When searching, we also want to consider titles of our articles. To accomplish this, we will extend our previous method by passing in the query twice instead of once and still use case-insensitive 'LIKE' with the '%' wildcard character to ensure it captures substrings anywhere in the text. Now, we might want to search by the author's name, which complicates things further. But it's something we should have anticipated.
00:04:31.449
Let’s join onto our user model. In this talk, we will use the user and authors tables interchangeably. Essentially, this is the same query, but we let Rails handle the join, pulling out the user's name along with the article's title and body. However, we soon discover that our query is starting to become cumbersome, which is undesirable in our codebase. One way to refactor is to encapsulate the logic in a query object. This method may be less performant, but it arguably makes the code easier to understand, especially when you're reviewing it. However, we still observe poor results since we're only searching for case-insensitive substrings.
00:05:33.910
What if the word is plural and we have a singular query? Or what if we're searching for different forms of the same word? Users expect our search features to handle these nuances, similar to what Google does. This is where full text search comes in. Full text search allows us to query rows based on natural language searching. Natural language searching removes stop words from queries—these are words that appear in nearly every sentence and don’t add semantic value.
00:06:16.600
For example, we don't want search results that include the word 'and' or 'or.' We seek to provide more relevant results. We also aim to handle casing appropriately and include synonyms in our searches, allowing users with varying vocabularies to find what they need. Additionally, stemming is a feature of natural language search. This process involves recognizing different forms of the same root words, such as 'try,' 'trying,' and 'tries,' which could return similar results in our searches. Now, let's explore making that same query with these natural language features enabled.
00:07:40.580
We'll examine some important pieces in the code as we integrate these features. We're going to look at the text that we want to join, using the title and concatenating it with an empty space followed by the body. We will call this concatenated text simply 'text' because we need to provide it a name for Postgres to work correctly. Next, we will pull in the author's name alongside the article's ID. It's important to ensure that we return unique results because we don’t want the same article appearing multiple times in our results if it is repeated in different fields.
00:08:44.000
Where do we put all of this code? We could integrate it back into our query object or simply use a scope to pass the SQL through. The challenge here is that SQL doesn't really belong in our Ruby files. Happy news: Postgres has a solution for us in the form of views. A view is a stored query that can return a set of columns to search on later. Since views are effectively just queries, they can pull from multiple source tables, making it simpler to manage complex searches.
00:09:16.630
For example, if we were to create a view for users/authors and articles, we could streamline our queries significantly. In this view, we will abstract away the complexity, focusing on just the relevant text, allowing us to perform meaningful 'WHERE' clauses later on. Building a view is relatively straightforward, as we utilize the 'CREATE VIEW' syntax. We can select distinct user IDs and gather comprehensive information about all users along with their last active timestamps. Our view can limit the selection to users who have been active within the last week.
00:10:45.450
With views, searching feels much like querying a regular table. We can use Rails’ ActiveRecord to interact with our views as if they were standard tables in our database. We can create a model to represent the active users, facilitate querying, and enhance our overall experience. This allows us to continue using our Rails applications seamlessly while taking advantage of the power of full-text search without introducing unnecessary complexity.
00:11:58.170
Our first gem of interest is Textacular. Developed by Aaron Patterson, it manages the full-text search aspects of our queries. Textacular presumes you want full-text search across all text fields in your records, providing several useful variant search options, including basic, fuzzy, and advanced search functionalities. Notably, we will focus on the basic search feature as it offers the most utility when building a simplified input for users.
00:12:28.310
Let's dive back and analyze that search we wrote. It’s designed to return articles based on their title, body, or the author's name. For our Rails implementation, we can develop a simple three-line class that includes Textacular, establishing a connection to our article through the designated field names. When searching, we simply map our query against the articles, enabling a simple search for terms like 'Sandy.' It’s important to note that we also have access to Enumerable, a powerful Ruby standard library feature that enhances our ability to work with collections elegantly.
00:14:04.350
Creating views is straightforward, but meticulously managing updates becomes essential. Your migration file would execute the SQL you need for creation, including the necessary DROP commands for backtracking, should that be required. Understanding this can be critical in view management because if you need to update a view while ensuring the database returns to a stable state afterward, the migration process can be tedious.
00:15:13.040
As features often evolve, a product manager might request additional capabilities. For example, comments associated with articles may need to be factored into search results. Now, we are searching through article titles, bodies, author names, and comment bodies, aggregating all relevant content. We would update our existing SQL command to include these additional parameters.