00:00:15.529
Okay, hello everyone! Actually, this is my first talk ever at a conference, and I came up with this idea just yesterday at a party. I want to share some of my experiences working with graphical data, especially in PostGIS, but I will show examples using Ruby on Rails.
00:00:20.820
As you can see, we are trying to solve a really serious problem here: finding the right bar where we can have fun and some drinks. Let's imagine you are visiting a very beautiful city, like Hamburg, for the first time. One of the best ways to get to know the city is to have some fun, but how do we find the right place to go? It's not easy! Normal people would simply pick a random bar off the street or ask someone for recommendations. But we’re not normal people—we don’t want to rely on random picks, and, of course, we don’t really like talking to strangers. So, what do we do? We can create an application to help us!
00:00:54.839
Let’s load a list of bars and pubs in Hamburg. We have places like the Lion Bar, some football pitches, and others. Unfortunately, that information is not enough to pick the perfect spot. We need to add some location context to measure distances and find the best district to visit. For example, we could observe a party in one park and move to another. Fortunately, it’s very easy to do because we can treat our Earth as a flat plane. We can use XY coordinates and apply some Cartesian geometry, which we learned in school. But wait! What if the Earth is not flat? What if it’s round? I don’t want to hurt anyone’s feelings, but is there anyone from the flat Earth society here? Okay, so the truth is the Earth is not flat.
00:02:06.750
This fact complicates things because linking geometry on a spheroid is not as straightforward. There are thousands of projections, and they depend on where you are on the Planet, which makes it a bit complex. Of course, we can import this data into Ruby and try to calculate distances, but maybe we don’t have to reinvent the wheel. Perhaps there's something that already exists and is production-ready. This is where PostGIS comes in—an extension for PostgreSQL that adds all the good spatial data types to our database. It allows us to work with spatial data just like we would with normal types, like integers or strings.
00:02:44.989
We have simple data types, such as points, line strings, and polygons. Additionally, there are numerous useful functions available. For instance, we can calculate the distance between two geometries or the length of a line string. There are hundreds of functions you can use, all documented, but I’ll highlight the most important ones. There are also predicates that check if two geometries intersect, if one is within another, or if they are within a certain distance. This makes it really nice to use. And since this is a Ruby conference, of course, you can use it with Rails. There’s a massive record PostGIS adapter that helps convert those PostGIS data types to Active Record.
00:03:50.480
These tools allow you to work with geometric data in Ruby, connecting PostgreSQL and Rails. With this, we can create a table and a migration that defines a column using the ST_Point data type, tagging it as a geographic column. Thus, the list of bars gets better because now we can display it to users on a map. You can see there are plenty of bars and pubs in Hamburg, and this data is sourced from OpenStreetMap. Further, we can create a model with a method to find bars within a certain radius from a given geographic point. For example, we can set a point where we are and pass in a distance in meters to find nearby bars.
00:05:27.740
Using the spatial query functions, we can calculate distances between the locations of our bars and the specified point. We check to see if the calculated distance is smaller than the given radius. While this process is quite simple, we need additional data. For example, we might not want to go to just one bar; instead, we may want to find the best district to visit. Therefore, we can create a query to find a district with a substantial number of bars situated within it. Using the predicates at our disposal, we can perform joins based on geographic context and group bars to count how many are in a specific district.
00:07:00.410
This is fantastic because we can determine if one geometry is within a certain radius of another. However, if you have hundreds of thousands of bars or other geometries, performing such queries can be quite slow. When performance issues arise in databases, we typically implement indexing, and this applies here too. It’s important to note that applying a standard SQL index may not yield good results since the data is stored in binary form and doesn’t make sense. That's why PostGIS provides a specialized GiST index, which indexes data based on their bounding boxes.
00:08:06.450
Using this type of index allows us to access a portion of the data very quickly. Now, let's find the district in Hamburg with the greatest number of bars. Can you guess which one it is? To identify a good district, I added the entire metro network of Hamburg to our database. This leads to another excellent PostGIS extension called PGRouting, which functions much like Google Maps. It can find paths to our desired locations, using graph representations of data. We’re all familiar with graph structures, where we have nodes and edges. PGRouting implements classic pathfinding algorithms, including Dijkstra’s algorithm, to find the best route.
00:09:12.600
To find a route from where we are to our chosen location, we first need to identify the closest metro station. This is done by selecting the ID of the nearest metro station and ordering the results by distance to that point. We can use functions that are built to return the coordinates in a specific projection; for example, the EPSG:4326 projection, which is the standard latitude and longitude format. After finding the closest metro station to our initial location, we also perform the same calculation for San Paulo. We then obtain the two station IDs and can utilize the PGRouting function based on Dijkstra’s approach.
00:10:38.060
This requires us to pass the list of edges that represent the segments connecting the metro stations. We calculate the travel cost based on the length of the distance between these stations. Next, we specify from which node we wish to travel to which other node, and we print the resulting data. The output may not look glamorous, but it provides the cost of each segment, detailing the various segments we must traverse to reach our destination. Additionally, we can see the total aggregated cost, which could be in meters.
00:12:17.720
And just like that, we have found the perfect route! It’s effectively a three-metro-line journey that works almost like Google Maps. You can create an app with this functionality, but keep in mind that it does not yet incorporate street routing. However, you can also implement that by importing data from the OpenStreetMap database if desired.
00:13:13.720
By advancing the cost function to include traffic conditions or road types, you can refine your application even further. I wanted to share this simple example, illustrating how quickly you can achieve this in just a few minutes. So, that is what I wanted to show you today! I encourage all of you to experiment with these technologies. It is fascinating to see how we can interact with data from completely different perspectives.
00:13:44.700
I learned a lot of this in production while working for a location-based company. We were processing hundreds of thousands of location points, and PostGIS was instrumental in assisting us with that. If you’re interested in learning more about this topic, I have written a couple of blog posts, which are tutorials available on Medium, and you can also find more examples on my GitHub. Thank you very much!
00:15:07.580
Any questions?
00:15:40.820
There is a software called QGIS, which allows you to connect to PostGIS databases and other data sources to generate visualizations. The example graphics I showed were produced using the Overpass API, which extracts data from OpenStreetMap's database.
00:15:52.630
However, I should mention that querying OpenStreetMap can be quite complex due to non-standardized naming conventions for metro systems in different cities. For extraction of specific data for a city, it can be easier to work with certain types of data, such as bicycle routes or drinking water points, since the system is open source.
00:16:01.740
This means anyone can add new points of interest, and the data gets updated very quickly. Any more questions?