Exploring Graphs with Ruby

Balkan Ruby 2019

00:00:26.630 Now, after the previous talk, it can be difficult to stay optimistic, but I believe that to improve things, the first step is to recognize that they are not as good as they could be. So, I’m trying to be optimistic. I will talk about graphs with Ruby.

00:00:42.360 First, I would like to introduce myself and share what I know about graphs. My name is Eli Kroumova. I was born in Bulgaria, but I now live and work in Spain. I work at a very small company based in Bilbao, which is located in the north of Spain. We develop various products for clients.

00:01:10.020 We work closely with our clients, discussing the problems they face that cannot be solved with more generic solutions. We aim to develop applications that will solve their specific problems. Our process involves communicating with the client to identify the issues, creating proposals, making adjustments to fit their needs, and then installing and maintaining the product in production.

00:01:45.119 It’s an interesting process that can often be enjoyable, but not always. One of our clients is the International Union for Crystallography. I imagine most of you may not have heard of it before. This work connects to my previous experience at the university, as I am a physicist. I have been working in the fields of material science and solid-state physics.

00:02:07.829 We have some projects in collaboration with this union. One of the projects involved developing a website that allows users of their products to access information that is typically found in extensive printed volumes, such as large tables. Our goal was to make this information available online, and we also included simple calculations to ease the scientists' ability to find what they need.

00:02:53.630 As the project evolved, we needed to include more complicated calculations, including those related to graphs. Fortunately, the graphs we work with aren't very large, making them manageable with Ruby. We know that the maximum number of nodes in our graphs is about 230, so we can handle them effectively. Initially, we started by reversing the graphs to extract some insights, and last year, we developed an algorithm for finding the shortest paths within these graphs.

00:03:41.950 That’s when I began digging into graph algorithms, implementing some of them, and trying to understand how they work. Today, I will provide a gentle introduction to graphs. I won’t delve into material science or crystallography, but I have prepared simple examples to illustrate how graphs can be used, along with useful data structures for representing graph data, and some graph algorithms.

00:04:43.660 When I proposed this talk, I initially thought about discussing various graph algorithms with numerous examples. However, the subject is extensive, and it is impossible to cover everything in one session. Therefore, I will focus solely on graph traversal algorithms, as they form the foundation for other algorithms useful for addressing various problems.

00:05:10.050 So, what is a graph? By definition, a graph is a way to represent pairwise relationships between objects. These objects can be anything, and the important aspect of the graph is not only the objects themselves but the relationships between them.

00:05:34.120 Historically speaking, the first problem solved using graphs was Euler's Seven Bridges of Königsberg problem, which dates back to 1736. At the time, the term 'graph' did not yet exist. It was first used in 1878 as a mathematical concept, but lacked practical applications. The first textbook on graph theory was published in 1936, but it remained a niche topic until the definitive study on graph theory was released in 1969. From that moment, scientists across various disciplines began using graph theory to address real-world problems.

00:06:12.759 As I mentioned, in a graph, the objects that make up your system are important, as are the relationships between them. The terminology used in this talk will define the related objects as vertices of the graph and the relationships between them as edges. Thus, we will refer to the objects as vertices and the relationships as edges, which can be directed, indicating a one-way relationship.

00:06:29.810 What makes graphs particularly significant is that we can gather information not only about the objects but also about their relationships. For instance, if vertex A causes vertex B, but not vice versa, we can represent that in a graph. Additionally, we can identify connections between vertices that are not directly related, providing context for all objects within the graph.

00:06:49.180 Now that we understand what a graph is, how can we represent it effectively? When discussing general graphs, there are various data structures available. One simple and intuitive representation is the adjacency matrix, which is a matrix of rows and columns that correspond to vertices. Each position in the matrix indicates whether there is a relationship between two vertices, marked with either 1 (yes) or 0 (no).

00:08:00.100 From this matrix, you can ascertain the number of relationships each vertex has. If certain vertices are heavily connected, while others have few connections, you can also see that the adjacency matrix can consume a lot of storage, especially when many entries are zero. Another way to represent graphs is to use a row for each vertex and columns for the edges. If you are dealing with graphs where there are many vertices but few connections, this method may require less memory.

00:08:58.750 Alternatively, you can create a list of edges, indicating where each edge starts and ends. This representation depends on the nature of your data and whether it meets your needs. Another method is one where each vertex is assigned a list of its neighboring vertices, which can be particularly useful for dynamically changing graphs. However, if the algorithm you are applying requires fast access to edges, this representation may not be appropriate.

00:09:41.920 In recent years, graphs have gained popularity, particularly in conjunction with machine learning and big data. For these applications, standard data structures might fall short, warranting the use of graph databases. In these databases, vertices are treated as first-class citizens, making access to information much easier, which is crucial for effective graph traversal. These databases also allow for adaptive data models, as we often cannot predict the data we will collect.

00:10:06.710 Today, I will showcase simple examples where graph databases are unnecessary, illustrating that you can still play with graphs using Ruby. While it may not be production-ready, understanding how these systems work can be enjoyable. Sometimes, you find problems that are solvable with graphs, although writing your own algorithms may not be necessary, as ample libraries are available.

00:10:41.930 Understanding how these algorithms operate is essential for determining the right one for your situation, given the plethora of options available that can make it challenging to choose appropriately. To build a class for a graph, we might start by outlining a simple structure that allows us to manage vertices and edges, providing initializers, and including methods to add edges and find neighboring vertices.

00:11:07.840 Using a JSON-like adjacency matrix representation, where rows and columns depict vertices and each matrix element indicates the presence of an edge (1) or not (0), we establish a foundation. The Ruby code I will demonstrate isn’t particularly optimized. Rather, I aim for clarity and understanding of the concepts at hand.

00:11:40.870 With this setup, we can begin discussing graph algorithms. During the talk, I'll focus solely on graph traversal algorithms due to time constraints and their foundational importance to the topic. Typically, the first step in studying a graph involves examining its structure and the relationships present, enabling you to determine how to extract meaningful information.

00:12:19.400 You may be familiar with some of the algorithm names I will mention. The first one is depth-first search, which progresses through the graph from a selected source vertex to all connected vertices. The second is breadth-first search, which operates similarly but employs a different method.

00:13:06.500 I will also briefly discuss shortest-path algorithms: Dijkstra's and the A* algorithm, along with the cache shortest path algorithm, which we utilized in our applications for the International Union for Crystallography. The depth-first search is simple; you start with the source vertex and add its neighbors to a list of vertices to visit next, selecting the last one added for the next step.

00:13:53.540 In the case of this traversal, we use a stack to implement a last-in-first-out structure. For a straightforward graph, we start at vertex 'A', moving through 'C' and 'B' to traverse the entire structure. As we depict each step, it becomes clear that we traverse the graph by branches; for example, moving from 'A' to 'E' and backtracking to explore node 'C'.

00:14:40.770 This method is effective for determining connectivity within the graph, allowing us to identify groups of related vertices that are isolated from others. Additionally, it helps us understand how many groups exist with interrelated vertices but with no internal connections.

00:15:26.679 For instance, imagine professors who need classrooms for their lectures. There's a relationship between professors and classrooms, but not among professors or between classrooms themselves. This scenario illustrates a bipartite graph.

00:16:07.320 Knowing the types of graphs we have will guide us in selecting optimized algorithms for information retrieval since general algorithms can often be resource-intensive—making them difficult to implement and use in production.

00:16:54.960 One important application of depth-first search involves topological sorting, which is crucial for ordering related tasks. An example would be creating a rake task that depends on other tasks. When executing the rake command, these tasks must complete in the correct sequence, hence using depth-first search enables you to establish the order of execution effectively.

00:17:44.340 Another method for traversing a graph is breadth-first search, which focuses on finding the shortest path between two vertices. In this method, we examine the graph layer by layer, first accounting for all neighbors of the initial vertex, then moving on to their neighbors and so on.

00:18:41.700 Optimally, this algorithm does not require the traversal of the entire graph to find a specific target vertex, stopping as soon as we identify the shortest route to it. The illustration shows that starting from 'B' and 'C', we can find ‘U’ without needing to visit vertex 'D', yielding a shorter path to our target.

00:19:35.660 When discussing paths in this context, we measure the shortest route regarding the fewest intermediate vertices through which we traverse the graph. This same principle can be applied to compare words, where two words are connected if changing a single letter transforms one into the other.

00:20:36.480 Next, we will address weighted graphs, where edges have attributes attached to them. In this case, the breadth-first search algorithm is not effective because the path with the fewest vertices may not translate to the least weight. Instead, we often use Dijkstra's shortest path algorithm, which calculates the cost for a vertex based on the cumulative weights of the edges that link it to the source vertex.

00:21:44.920 As we navigate the graph, we need to maintain this cost rather than merely focusing on visited vertices. Consequently, the adjacency matrix would now include weights for edges that relate to the distances between vertices.

00:22:34.769 As mentioned, Dijkstra’s algorithm is conceptually similar to other traversal algorithms. You initiate from the source vertex and expand outward, though the key difference lies in how you assess which vertex to consider next.

00:23:21.630 Instead of solely utilizing distance, we incorporate the weight attributes into our calculations. During this process, we also maintain a log of the shortest distance found to each of the vertices, allowing us to know not only the shortest path between two targeted vertices but also the distances to all intermediate vertices.

00:24:09.100 Finally, when working with a weighted graph, the frontier transforms into a priority queue featuring vertices arranged based on their associated distances. In Ruby, there’s a library named 'priority queue' that can simplify the implementation of this structure.

00:24:53.419 During traversal, we also track the parent vertices for each visited vertex, allowing us to backtrack after finding a vertex to identify the path leading to it effectively. In the simple example used throughout the discussion, the shortest path from source vertex ‘A’ to target vertex ‘E’ would be 'A’ to 'C’, demonstrating the connection across the graph.

00:25:45.000 In practical applications, distances likewise correspond to corrected values determined by various means, such as the lengths or weights relating to specific routes. The representation of this graph can adapt to fit diverse systems, accommodating requirements pertaining to different contexts.

00:26:51.600 To summarize, not only can we represent graphs with relations among vertices, but we also address situations when complexities arise, or when weighted edges impose different limitations. Among the existing algorithms for shortest paths, some may require adaptations based on the type of problem at hand.

00:27:38.500 For instance, traveling salespersons face significant computational challenges due to the sheer number of possible routes as the number of cities increases. Although optimal solutions exist, performance limitations may render them impractical, resulting in the need for approximate algorithms.

00:28:36.540 In conclusion, if you take away a few key points from this talk, I hope it highlights the vast array of systems describable by graphs. This abstract representation allows for effective applications of graph algorithms to analyze and obtain additional insights about such systems.

00:29:34.810 However, remember that implementing graph algorithms can be challenging. Many recent frameworks offer assistance in overcoming these difficulties, but it’s critical to understand their functionalities to choose the best-suited one for your problem. Similar to any analytical task, you must first thoroughly grasp the problem, select the appropriate representation for your data, and finally, identify the optimal algorithm. With some patience, you can enjoy solving these problems with graphs. Thank you.