Database Optimization

Find a bAr with the Power of PostGIS

Find a bAr with the Power of PostGIS

by Patryk Pastewski

In this presentation titled "Find a bAr with the Power of PostGIS," Patryk Pastewski explores the utilization of PostGIS—an extension of PostgreSQL meant for managing spatial data—in the context of creating a Ruby on Rails application aimed at helping users find bars and pubs in Hamburg. This introductory talk highlights the intricacies of working with geographic data, especially when considering the Earth's spherical shape rather than a flat plane.

Key points discussed include:

- Introduction to the Problem: Pastewski sets the scene by stating that finding the right bar is not as simple as it seems. He suggests that typical methods like random picks or asking strangers are insufficient and proposes the development of an app using spatial data.
- Spatial Data with PostGIS: He explains how PostGIS enhances PostgreSQL by adding spatial data types like points, lines, and polygons, along with useful functions for distance calculations and spatial relationships.
- Using Active Record with PostGIS: The discussion covers how to define geographical columns in a database and query them using Ruby on Rails, making the data more interactive, such as displaying bars on a map.
- Querying Nearby Bars: Pastewski details how to create queries to find bars within a specific radius from a given point, indicating that additional data context is necessary to enhance user experience.
- Performance Optimization: He emphasizes the importance of indexing for efficient queries, introducing the specialized GiST index that improves spatial data performance.
- Finding the Best District: A sample query is presented to find districts in Hamburg with the highest concentration of bars, demonstrating the use of predicates and joins in complex queries.
- Route Finding with PGRouting: The presentation introduces PGRouting, a PostGIS extension for pathfinding and routing, allowing users to find the best routes to bars using metro networks, illustrating the application of classic graph algorithms like Dijkstra's algorithm.
- Examples of Practical Application: The speaker shares his experience working with location-based data, noting how PostGIS played a crucial role in processing massive datasets in production environments.
- Invitation to Explore: He concludes by encouraging attendees to experiment with these technologies, referencing additional resources such as blog posts and tutorials for deeper understanding.

This talk underlines the functionality and practicality of using PostGIS and Ruby on Rails together to create user-centric applications catering to location-based needs. It encourages further exploration and understanding of spatial data handling through engaging real-world examples.

00:00:15.529 Okay, hello everyone! Actually, this is my first talk ever at a conference, and I came up with this idea just yesterday at a party. I want to share some of my experiences working with graphical data, especially in PostGIS, but I will show examples using Ruby on Rails.
00:00:20.820 As you can see, we are trying to solve a really serious problem here: finding the right bar where we can have fun and some drinks. Let's imagine you are visiting a very beautiful city, like Hamburg, for the first time. One of the best ways to get to know the city is to have some fun, but how do we find the right place to go? It's not easy! Normal people would simply pick a random bar off the street or ask someone for recommendations. But we’re not normal people—we don’t want to rely on random picks, and, of course, we don’t really like talking to strangers. So, what do we do? We can create an application to help us!
00:00:54.839 Let’s load a list of bars and pubs in Hamburg. We have places like the Lion Bar, some football pitches, and others. Unfortunately, that information is not enough to pick the perfect spot. We need to add some location context to measure distances and find the best district to visit. For example, we could observe a party in one park and move to another. Fortunately, it’s very easy to do because we can treat our Earth as a flat plane. We can use XY coordinates and apply some Cartesian geometry, which we learned in school. But wait! What if the Earth is not flat? What if it’s round? I don’t want to hurt anyone’s feelings, but is there anyone from the flat Earth society here? Okay, so the truth is the Earth is not flat.
00:02:06.750 This fact complicates things because linking geometry on a spheroid is not as straightforward. There are thousands of projections, and they depend on where you are on the Planet, which makes it a bit complex. Of course, we can import this data into Ruby and try to calculate distances, but maybe we don’t have to reinvent the wheel. Perhaps there's something that already exists and is production-ready. This is where PostGIS comes in—an extension for PostgreSQL that adds all the good spatial data types to our database. It allows us to work with spatial data just like we would with normal types, like integers or strings.
00:02:44.989 We have simple data types, such as points, line strings, and polygons. Additionally, there are numerous useful functions available. For instance, we can calculate the distance between two geometries or the length of a line string. There are hundreds of functions you can use, all documented, but I’ll highlight the most important ones. There are also predicates that check if two geometries intersect, if one is within another, or if they are within a certain distance. This makes it really nice to use. And since this is a Ruby conference, of course, you can use it with Rails. There’s a massive record PostGIS adapter that helps convert those PostGIS data types to Active Record.
00:03:50.480 These tools allow you to work with geometric data in Ruby, connecting PostgreSQL and Rails. With this, we can create a table and a migration that defines a column using the ST_Point data type, tagging it as a geographic column. Thus, the list of bars gets better because now we can display it to users on a map. You can see there are plenty of bars and pubs in Hamburg, and this data is sourced from OpenStreetMap. Further, we can create a model with a method to find bars within a certain radius from a given geographic point. For example, we can set a point where we are and pass in a distance in meters to find nearby bars.
00:05:27.740 Using the spatial query functions, we can calculate distances between the locations of our bars and the specified point. We check to see if the calculated distance is smaller than the given radius. While this process is quite simple, we need additional data. For example, we might not want to go to just one bar; instead, we may want to find the best district to visit. Therefore, we can create a query to find a district with a substantial number of bars situated within it. Using the predicates at our disposal, we can perform joins based on geographic context and group bars to count how many are in a specific district.
00:07:00.410 This is fantastic because we can determine if one geometry is within a certain radius of another. However, if you have hundreds of thousands of bars or other geometries, performing such queries can be quite slow. When performance issues arise in databases, we typically implement indexing, and this applies here too. It’s important to note that applying a standard SQL index may not yield good results since the data is stored in binary form and doesn’t make sense. That's why PostGIS provides a specialized GiST index, which indexes data based on their bounding boxes.
00:08:06.450 Using this type of index allows us to access a portion of the data very quickly. Now, let's find the district in Hamburg with the greatest number of bars. Can you guess which one it is? To identify a good district, I added the entire metro network of Hamburg to our database. This leads to another excellent PostGIS extension called PGRouting, which functions much like Google Maps. It can find paths to our desired locations, using graph representations of data. We’re all familiar with graph structures, where we have nodes and edges. PGRouting implements classic pathfinding algorithms, including Dijkstra’s algorithm, to find the best route.
00:09:12.600 To find a route from where we are to our chosen location, we first need to identify the closest metro station. This is done by selecting the ID of the nearest metro station and ordering the results by distance to that point. We can use functions that are built to return the coordinates in a specific projection; for example, the EPSG:4326 projection, which is the standard latitude and longitude format. After finding the closest metro station to our initial location, we also perform the same calculation for San Paulo. We then obtain the two station IDs and can utilize the PGRouting function based on Dijkstra’s approach.
00:10:38.060 This requires us to pass the list of edges that represent the segments connecting the metro stations. We calculate the travel cost based on the length of the distance between these stations. Next, we specify from which node we wish to travel to which other node, and we print the resulting data. The output may not look glamorous, but it provides the cost of each segment, detailing the various segments we must traverse to reach our destination. Additionally, we can see the total aggregated cost, which could be in meters.
00:12:17.720 And just like that, we have found the perfect route! It’s effectively a three-metro-line journey that works almost like Google Maps. You can create an app with this functionality, but keep in mind that it does not yet incorporate street routing. However, you can also implement that by importing data from the OpenStreetMap database if desired.
00:13:13.720 By advancing the cost function to include traffic conditions or road types, you can refine your application even further. I wanted to share this simple example, illustrating how quickly you can achieve this in just a few minutes. So, that is what I wanted to show you today! I encourage all of you to experiment with these technologies. It is fascinating to see how we can interact with data from completely different perspectives.
00:13:44.700 I learned a lot of this in production while working for a location-based company. We were processing hundreds of thousands of location points, and PostGIS was instrumental in assisting us with that. If you’re interested in learning more about this topic, I have written a couple of blog posts, which are tutorials available on Medium, and you can also find more examples on my GitHub. Thank you very much!
00:15:07.580 Any questions?
00:15:40.820 There is a software called QGIS, which allows you to connect to PostGIS databases and other data sources to generate visualizations. The example graphics I showed were produced using the Overpass API, which extracts data from OpenStreetMap's database.
00:15:52.630 However, I should mention that querying OpenStreetMap can be quite complex due to non-standardized naming conventions for metro systems in different cities. For extraction of specific data for a city, it can be easier to work with certain types of data, such as bicycle routes or drinking water points, since the system is open source.
00:16:01.740 This means anyone can add new points of interest, and the data gets updated very quickly. Any more questions?