Euruko 2019

A gentle introduction to Data Structure Trees

by Ashley Jean

In her talk titled "A Gentle Introduction to Data Structure Trees" at EuRuKo 2019, Ashley Jean provides an accessible and thorough overview of tree data structures, their significance, and their practical applications in programming. She begins the session by introducing herself as a Ruby developer at MDLogix and emphasizing the importance of understanding tree structures in computer science. Ashley engages the audience with questions about their coding backgrounds to illustrate the diverse experiences present at the conference and highlights that many attendees may only know about trees for interview preparation.

The key points of her presentation include:

- Understanding Data Structures: Ashley differentiates between linear (arrays, linked lists) and non-linear (trees, graphs) data structures, explaining that trees organize data hierarchically.
- Defining Trees: She provides a clear definition of a tree as a collection of nodes, introducing essential concepts such as nodes, roots, edges, children, and subtrees.
- Visual Representation: She explains tree structures visually, likening them to family trees where hierarchical relationships are easy to follow.
- Calculations in Trees: The depth and height of nodes, as well as how to measure edges, are discussed, providing the audience with foundational knowledge about tree properties.
- Benefits of Using Trees: Ashley elaborates on why trees are beneficial, noting their efficiency in storing hierarchical data and their superior performance in searching, inserting, and deleting operations compared to simpler data structures, especially with larger datasets.
- Tree Variants: She explains binary search trees and their recursive nature, highlighting the logarithmic time complexity (O(log n)) for operations like insertion and searching, showcasing advantages over arrays.
- Tree Traversals: Traversal techniques such as pre-order, in-order, post-order, and breadth-first search are briefly mentioned.
- Real-World Applications: Ashley provides examples of how trees function in real-world scenarios, including navigation in dictionaries, database indexing, and B-trees for handling large datasets efficiently.
- Self-Balancing Trees: Finally, she discusses self-balancing trees like AVL trees and B-trees, emphasizing their importance in maintaining efficiency even with varying data sizes.

Ashley concludes her talk with encouragement to continue exploring this vital area of computer science, summarizing that trees help organize hierarchical data efficiently. She expresses her gratitude to the audience, inviting them to connect with her for any questions and offering additional resources for further study.

00:00:05.759 Welcome everyone. Let's move on to our next speaker, Ashley. Ashley is of Haitian and American descent, originally from New York, but she also has roots in Baltimore, which she says is pronounced 'Baltimore.' Ashley works at MDLogix, where they assist individuals with chronic care management. I'm very excited to introduce her to you. Please give a warm welcome to Ashley!
00:00:52.079 Thank you! Hello, folks! The title of my talk today is 'A Gentle Introduction to Tree Data Structures,' with a key emphasis on the gentle aspect. Let me share a little bit about myself. I work as a Ruby developer for a company called MDLogix, based in Baltimore, Maryland. Our company creates applications for behavioral health integration and chronic care management. Additionally, I'm a part-time graduate student at Towson University, studying computer science.
00:01:06.560 Before we dive in, I have a couple of questions for the audience. How many of you learned to code the traditional way, by majoring in computer science or studying in an academic setting? Please raise your hands. Okay, thank you, you can lower them now. Now, how many of you learned to code in non-traditional ways, such as through boot camps or being self-taught? Great! How many of you actually understand trees and data structures? That's wonderful! It's great to see such an educated crowd.
00:01:40.159 Now, here's the interesting part: how many of you know about trees only for the purpose of tackling interview questions? Yes, I hear that. I asked these questions to highlight the diverse experiences among us conference attendees and to show that data structures can still be a complex subject for many.
00:02:03.640 Personally, when I took an advanced data structures course last fall, my emotions were incredibly similar to this GIF. One day I would feel really confident about the material, but then a pop quiz would hit and I would realize, 'Nope, I don't got it!' From there, it was back to the drawing board. Computer science fundamentals can take time to fully grasp, but over time, you'll reach your goals.
00:02:54.000 In the next 30 minutes, we'll discuss why trees are useful and cover the fundamentals of commonly used trees and their properties. More importantly, I'll provide examples to show you how trees are relevant in our code. For those familiar with these concepts, this talk should serve as a review; for those new to the material, I hope it provides exposure and, ultimately, a foundation on which to build.
00:03:06.400 When it comes to data structures, there are many types to choose from. We have linear structures, such as arrays and linked lists. In a linear structure, every item is related to its previous and next item. We also have what's called a non-linear data structure, such as trees and graphs. In a non-linear data structure, every item can be connected to many other items.
00:03:42.560 So, let's start with defining what a tree is exactly. A tree manages data in a hierarchical way. For example, consider a family tree. In a family tree, the grandparents are at the top, followed by the parents, and at the bottom are the children. Every family member is displayed accordingly and can be accessed easily.
00:04:04.880 I want to provide you with a comprehensive definition from my data structures and algorithms textbook: a tree is a collection of nodes. This collection can be empty; otherwise, a tree consists of a distinguished node called the root and zero or more non-empty subtrees. Each of these roots is connected by directed edges. When I first read this definition in class, I thought, 'What does this even mean?' It may sound long-winded, so let's break it down into four key terms: nodes, root, edges, and subtrees.
00:04:55.440 In a tree, we often visualize it as being upside down. We start with a node, which is simply an object that contains data. To create a tree, we start with a root node, which is at the top of the tree; every element below the root is trickled down. Nodes are connected by what we define as edges. If a root node is connected to another node or nodes, we say that the root becomes the parent of those nodes. In this context, the yellow node acts as the parent to the red nodes.
00:05:47.840 Children nodes are the nodes below a parent. For example, in our case, the blue nodes are the children nodes of the red nodes. Importantly, the two red nodes are known as sibling nodes because they're on the same hierarchical level. We also have what are called subtrees; a subtree is a portion of a tree data structure that can be viewed as a complete tree in itself. For instance, beneath the root node, we have a left subtree and a right subtree.
00:06:35.360 It's worth mentioning that leaf nodes are the last nodes in our tree; they are the nodes that don't have any child nodes connected to them. Next, let's cover some calculations. The depth of a node is equal to the length of the path from the root to that node, while the height of a node is the longest path from that node to a leaf node.
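To make the depth and height definitions concrete, here is a minimal Ruby sketch; the Node struct and the example tree are hypothetical, not from the talk.

```ruby
# A minimal sketch (not from the talk) of a tree node in Ruby, used to make
# the depth and height definitions concrete.
Node = Struct.new(:value, :children) do
  # Height: the longest downward path from this node to a leaf, counted in edges.
  def height
    return 0 if children.empty?
    1 + children.map(&:height).max
  end

  # Depth of a descendant node: edges on the path from here down to that node.
  def depth_of(target, current = 0)
    return current if equal?(target)
    children.map { |child| child.depth_of(target, current + 1) }.compact.first
  end
end

grandchild = Node.new("grandchild", [])
child      = Node.new("child", [grandchild])
leaf       = Node.new("leaf", [])
root       = Node.new("root", [child, leaf])

puts root.height               # => 2  (root -> child -> grandchild is the longest path)
puts root.depth_of(grandchild) # => 2  (3 nodes on the path, so 3 - 1 = 2 edges)
puts leaf.height               # => 0  (a leaf has no children)
```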
00:07:09.919 Lastly, the number of edges is calculated by taking the number of nodes in that path and subtracting one. For example, if we're looking at the right-hand side with four nodes, subtracting one gives us three edges. So, why should we use trees? Trees are great for storing naturally hierarchical data and organizing it for quick searching, inserting, or deleting nodes in moderate time.
00:08:06.560 You might be familiar with examples such as a filing system, which consists of files and directories. We start at the root, and each folder breaks down into files and directories. Similarly, the DOM (Document Object Model) is a representation of HTML to the browser and allows you to manipulate the page like a tree, branching out from a single parent item into child items that further branch into smaller trees.
00:08:59.760 Trees are all around us, and we use them frequently without realizing their significance. But why should we care? The answer lies in performance; the runtime of using trees tends to be much faster than other data structures, especially when executing certain operations. For those unfamiliar, runtime is defined as the duration during which a program is running, and we measure that time using what's known as big-O notation.
00:09:50.240 In simple terms, big-O notation helps us understand how long an algorithm takes to run by measuring it in the worst-case scenario. On this graph, the best-case scenario is shaded in green, and in contrast, the worst-case scenario would be represented by a higher value like n factorial. Now, you might wonder why we're discussing logarithms while discussing trees; just bear with me—it'll all make sense in the coming slides.
00:10:48.800 Let's compare some data structures. For our first two examples, let's hypothetically say our dataset is on the smaller side (remember that phrase for later). The first one is an array. Searching through an array means iterating through the elements, which takes O(n) time; the same applies to removal, where we first have to locate the element to remove.
00:11:25.440 Next, for a linked list: much like an array, searching means walking through the elements, which is O(n), though inserting or removing a node at a known position takes O(1). With a small dataset these runtimes are perfectly workable, and you might wonder why you'd ever choose another data structure, especially when the big-O graph we previously viewed shows arrays and linked lists performing well for small inputs.
00:12:14.240 However, keep in mind that we said our dataset was small. If the dataset grows, searching and managing it takes longer, and those linear-time operations stop feeling cheap. Herein lies the role of trees. A binary search tree, for example, keeps its data in sorted order, which lets us perform these operations much faster than scanning through an array.
00:12:57.440 When a new number is inserted, a binary search tree compares it with the existing nodes, moving left or right depending on whether it is smaller or greater, and places it in the correct position; every comparison effectively cuts the remaining search space in half. Thus, the average runtime for searching, inserting, and deleting in a binary search tree is O(log n), making it a more efficient option than arrays or linked lists as the data grows.
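As a rough, back-of-the-envelope illustration of that difference (not from the talk), the snippet below compares the worst-case number of comparisons for a linear scan versus a balanced binary search tree at a few dataset sizes.

```ruby
# A rough, illustrative comparison: worst-case comparisons for a linear scan
# versus a balanced binary search tree.
[100, 10_000, 1_000_000].each do |n|
  puts "n = #{n}: linear scan ~#{n} comparisons, balanced BST ~#{Math.log2(n).ceil}"
end
# For a million elements, a linear scan may touch every element,
# while a balanced tree needs only about 20 comparisons.
```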
00:13:59.840 Now that we've touched on binary search trees, let's step back to the more general binary tree. A binary tree is structured so that each node can have at most two children. Individual trees may vary in appearance, but they all adhere to this limit: a node doesn't have to have any children at all, and the structure remains a binary tree as long as no node has more than two.
00:14:34.960 Let's shed light on the binary search tree. By definition, a binary search tree keeps its nodes ordered: the left subtree always contains values less than or equal to its parent's value, while the right subtree holds values greater than or equal to it. As a result, everything in a node's left subtree is smaller than everything in its right subtree. When numbers are inserted, the binary search tree maintains this ordering.
00:15:23.680 For example, if 21 is our root node, any number greater, like 28 or 32, is inserted on the right side, while smaller numbers, like 14 or 11, are placed on the left. A point of interest about binary search trees is that whenever a new node is inserted, it automatically becomes a leaf node, which contributes to how we traverse the trees.
00:16:08.160 Now, let's briefly discuss tree traversals. This process involves visiting each node in a tree and printing their values. This topic could take another hour to cover, and since we have a party to attend, I’ll keep it short. During traversal, there are typically four methods: pre-order, in-order, post-order, and breadth-first search. Each method outlines a way of accessing the nodes.
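For readers who want to see those four orders side by side, here is a hedged Ruby sketch; the BNode struct and the example tree (built from the values 21, 14, 11, 28, and 32 used earlier) are illustrative, not the speaker's code.

```ruby
# A minimal sketch of the four traversal orders mentioned above, assuming a
# simple binary node with value, left, and right.
BNode = Struct.new(:value, :left, :right)

def pre_order(node, out = [])   # visit the node, then left, then right
  return out if node.nil?
  out << node.value
  pre_order(node.left, out)
  pre_order(node.right, out)
end

def in_order(node, out = [])    # left, node, right -- sorted order for a BST
  return out if node.nil?
  in_order(node.left, out)
  out << node.value
  in_order(node.right, out)
end

def post_order(node, out = [])  # left, right, then the node itself
  return out if node.nil?
  post_order(node.left, out)
  post_order(node.right, out)
  out << node.value
end

def breadth_first(root)         # level by level, using a queue
  out, queue = [], [root].compact
  until queue.empty?
    node = queue.shift
    out << node.value
    queue << node.left  if node.left
    queue << node.right if node.right
  end
  out
end

#        21
#       /  \
#     14    28
#    /        \
#  11          32
root = BNode.new(21, BNode.new(14, BNode.new(11)), BNode.new(28, nil, BNode.new(32)))
p pre_order(root)     # => [21, 14, 11, 28, 32]
p in_order(root)      # => [11, 14, 21, 28, 32]
p post_order(root)    # => [11, 14, 32, 28, 21]
p breadth_first(root) # => [21, 14, 28, 11, 32]
```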
00:16:42.560 Returning to our examples, let's consider how binary search shows up in real-life lookups. If we were looking for a word in a dictionary, we would jump straight to the section where words starting with 'R' are located, repeatedly splitting the remaining sections in half. That halving mimics the searching mechanism of binary search and is what makes it efficient. Similarly, searching for a participant's information in a healthcare database relies on indexing to search and retrieve data effectively.
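The halving idea behind the dictionary example can be sketched as an ordinary binary search over a sorted array; the word list below is made up purely for illustration.

```ruby
# A sketch of the dictionary analogy: repeatedly halving a sorted list of words,
# just as binary search does.
def binary_search(sorted, target)
  low, high = 0, sorted.length - 1
  while low <= high
    mid = (low + high) / 2
    case target <=> sorted[mid]
    when 0  then return mid      # found it
    when -1 then high = mid - 1  # target comes earlier: keep the left half
    else         low  = mid + 1  # target comes later: keep the right half
    end
  end
  nil
end

words = %w[apple banana cherry mango peach raspberry ruby]
p binary_search(words, "raspberry") # => 5 (compared against "mango", then "raspberry")
```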
00:17:40.960 Another implementation is seen with B-trees, which are incredibly efficient for sorting data in database indexes. B-trees can manage multiple nodes, facilitating handling of large amounts of data, specifically when data may exceed main memory capacity. Searching for a participant begins from the root and traverses downwards, discarding unnecessary indexes along the way until the correct value is found.
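As a toy sketch of that descent (not a real B-tree implementation), the code below shows a node holding several sorted keys, with the search picking exactly one child to follow and discarding the rest.

```ruby
# A toy sketch of how a B-tree search descends: each node holds several sorted
# keys, and at every step we pick the one child whose range could contain the
# target. Real B-tree nodes are sized to match disk pages; this is only the
# search logic.
BTreeNode = Struct.new(:keys, :children) # internal nodes have keys.length + 1 children

def btree_search(node, target)
  return false if node.nil?
  i = node.keys.index { |k| target <= k } || node.keys.length
  return true if i < node.keys.length && node.keys[i] == target
  btree_search(node.children && node.children[i], target)
end

leaf1 = BTreeNode.new([2, 5], nil)
leaf2 = BTreeNode.new([12, 17], nil)
leaf3 = BTreeNode.new([30, 41], nil)
root  = BTreeNode.new([10, 20], [leaf1, leaf2, leaf3])

p btree_search(root, 17) # => true  (root -> middle child)
p btree_search(root, 7)  # => false (root -> left child, not found)
```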
00:18:22.560 This is the divide-and-conquer approach: we halve the workload repeatedly until the desired data is located. Now, let's explore how a binary search tree can be represented in Ruby.
00:19:10.240 In Ruby, we begin by initializing a node. The variables of interest include the value that the node carries and its connections to left and right nodes, if they exist. Additionally, we establish the root node to indicate where the tree originates.
00:19:31.440 When inserting a value into the root, we first check if the node is nil. If it is, we create the root node with that value. If not, we pass the value to the insert function, where further comparison is made with the existing nodes. If the value is less than or equal to the stored value, it will be placed in the left subtree. We utilize a ternary operator for ease of checking whether the left subtree is nil.
00:20:22.240 If the left is not nil, we continue inserting into the left subtree while repeating the process. Similarly, if the new value is greater than the existing value, a comparable approach is taken in the right subtree until the appropriate spot for the new node is found.
00:21:05.760 Now, what happens if we wish to search for a node? We again use a recursive approach to traverse the tree, halving our search with each iteration. By comparing values with the existing entries in the tree, if the value is less than the existing one, we leverage the search function to repeat the steps.
00:21:41.760 The same procedure applies if the value is greater, until we locate the node and return the desired value. While constructing the tree, we can visually represent it: as demonstrated, the root node appears at the top, with progressively smaller values on the left side.
00:21:51.680 For instance, values like six and three reside on the left while eight sits on the right of six. With ten being a leaf node, it lacks left or right values. Similarly, for the right subtree with eleven, values like sixteen exist alongside thirty-two, which is also a leaf node.
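The talk walks through this implementation verbally rather than printing it, so the Ruby sketch below is only one possible reading of that walkthrough; the class and method names are guesses, but it follows the same outline: a nil check for the root, a ternary on the left and right children, and recursive insert and search, exercised here with the example values from earlier.

```ruby
# One possible reading of the implementation described above; names and
# details are guesses, not the speaker's exact code.
class Node
  attr_accessor :value, :left, :right

  def initialize(value)
    @value = value
    @left  = nil
    @right = nil
  end
end

class BinarySearchTree
  def initialize
    @root = nil # where the tree originates
  end

  def insert(value)
    if @root.nil?
      @root = Node.new(value) # the first value becomes the root
    else
      insert_node(@root, value)
    end
  end

  def search(value)
    search_node(@root, value)
  end

  private

  # Recursively walk down, going left for smaller-or-equal values and
  # right for greater ones, until a free spot is found.
  def insert_node(node, value)
    if value <= node.value
      node.left.nil? ? (node.left = Node.new(value)) : insert_node(node.left, value)
    else
      node.right.nil? ? (node.right = Node.new(value)) : insert_node(node.right, value)
    end
  end

  # Halve the search space on every step by comparing against the current node.
  def search_node(node, value)
    return nil if node.nil?
    return node.value if value == node.value
    value < node.value ? search_node(node.left, value) : search_node(node.right, value)
  end
end

tree = BinarySearchTree.new
[21, 14, 28, 11, 32].each { |n| tree.insert(n) }
p tree.search(28) # => 28
p tree.search(99) # => nil
```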
00:22:32.000 As we conclude the discussion about trees, we've looked at examples and learned about their functionality and benefits. However, those familiar with trees will know that binary search trees are only effective when they are balanced.
00:23:01.600 If a tree is unbalanced, the runtime deteriorates and can become O(n), resulting in a considerably longer search. Consider the key visual distinction between a balanced and an unbalanced tree: searching an unbalanced tree is like looking for a word in a dictionary one page at a time, which is certainly not efficient.
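To see that degradation concretely, here is a small self-contained sketch (illustrative, not from the talk) that inserts the same fifteen values once in sorted order and once in a balanced order, then compares the resulting heights.

```ruby
# Insertion order affects balance: feeding already-sorted values into a plain
# BST produces a chain whose height grows linearly, which is where the O(n)
# worst case comes from.
BstNode = Struct.new(:value, :left, :right)

def insert(node, value)
  return BstNode.new(value) if node.nil?
  if value <= node.value
    node.left = insert(node.left, value)
  else
    node.right = insert(node.right, value)
  end
  node
end

def height(node)
  return -1 if node.nil?
  1 + [height(node.left), height(node.right)].max
end

sorted   = (1..15).inject(nil) { |tree, n| insert(tree, n) }
balanced = [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15].inject(nil) { |tree, n| insert(tree, n) }

puts height(sorted)   # => 14 -- effectively a linked list
puts height(balanced) # => 3  -- a balanced tree of the same 15 values
```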
00:23:57.280 We need a faster option; fortunately, we have self-balancing trees. The first self-balancing tree we will examine is the AVL tree, which uses a series of rotations to maintain balance. In simple terms, an AVL tree checks the height difference between a node's left and right subtrees, ensuring it doesn't exceed one. If the difference surpasses this limit, right or left rotations are performed until the tree is balanced again.
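A full AVL implementation is beyond a gentle introduction, but the core mechanism can be sketched briefly; the code below (illustrative only) computes a balance factor and applies a single right rotation to a left-heavy subtree. A real AVL tree also handles the mirrored case and the double-rotation cases.

```ruby
# A simplified sketch of the AVL idea: measure the height difference (balance
# factor) of a node and rotate when it exceeds one.
AvlNode = Struct.new(:value, :left, :right)

def height(node)
  return -1 if node.nil?
  1 + [height(node.left), height(node.right)].max
end

def balance_factor(node)
  height(node.left) - height(node.right)
end

# Right rotation: the left child becomes the new root of this subtree.
def rotate_right(node)
  new_root       = node.left
  node.left      = new_root.right
  new_root.right = node
  new_root
end

# A left-heavy subtree:  30          rebalanced:   20
#                       /                         /  \
#                     20              =>        10    30
#                    /
#                  10
unbalanced = AvlNode.new(30, AvlNode.new(20, AvlNode.new(10)))
puts balance_factor(unbalanced) # => 2 -- too left-heavy
balanced = rotate_right(unbalanced)
puts balance_factor(balanced)   # => 0 -- subtree heights now match
```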
00:25:47.200 Another variant is the splay tree, which organizes itself whenever a node is accessed, ensuring that frequently accessed nodes are quick to reach. There's also the B-tree, capable of managing multiple nodes and efficiently balancing data across vast datasets, especially suitable for scenarios where the data may not fit in memory.
00:26:12.480 Undoubtedly, other types of trees exist, such as heaps, tries, red-black trees, and more. The takeaway is that there's always more to explore. Now that we've revisited or learned about these foundational aspects of data structures, don't hesitate to continue your learning journey.
00:26:52.080 In summary, trees are beneficial for organizing naturally hierarchical data, allowing for quick searching, inserting, and deleting in moderate time. Binary trees are characterized by having at most two children per node; binary search trees additionally maintain the ordering criterion that values in the left subtree are less than those in the right.
00:27:02.560 Traversals are the processes used to visit each node within a tree, while B-trees are self-balancing trees whose nodes can hold more than two children, efficiently handling data that cannot fit in main memory.
00:27:12.960 The examples discussed today included family trees, HTML DOM, database indexing, and implementing caches as well as general database searches. If you're interested in pursuing further studies in trees or related data structures and algorithms, I have recommended resources.
00:27:38.560 To conclude my talk, I thank you for your attention. If you have questions, please feel free to find me in the speaker lounge afterward.
00:27:51.440 Thank you very much! Oh, and I have a small present for you all.