Talks

Beyond the language. An introduction to algorithms

Beyond the language. An introduction to algorithms

by Simone Carletti

In the opening keynote titled 'Beyond the Language' at RubyDay 2015, speaker Simone Carletti introduces the concept of algorithms through the practical problem of identifying a poisoned beignet among several on a table. Carletti explains that an algorithm is a set of rules or processes to follow for problem-solving, especially in computing, and is crucial for programmers. He starts with a relatable problem involving a coffee table with beignets, where one is poisoned and heavier than the others. The goal is to devise an efficient algorithm to identify the poisoned beignet without wasting time weighted against the desire to eat as many good beignets as possible.

Key points discussed include:
- Definition of Algorithms: An algorithm dictates the operational rules for solving problems, primarily through computation.
- Understanding Efficiency: The efficiency of an algorithm relates to its cost, measured in CPU time and storage, and is classified using Big O notation.
- Example of Algorithm Design: Carletti illustrates a simple algorithm to find the poisoned beignet by weighing each one against the others, originally presenting a linear complexity of O(n). He argues that efficiency doesn't merely relate to speed but involves finding faster algorithms for identical tasks.
- Advanced Algorithms: Multiple strategies are introduced to improve the initial approach, including the 'divide and conquer' method, which reduces the search exponentially by weighing groups. Another refined method involves weighing three groups, improving the efficiency of finding the poisoned beignet.
- Importance of Data Structures: Carletti discusses how data structures, like hash tables in Ruby, significantly affect algorithm performance. He explains that optimizing hash functions leads to better execution times. Poor structural design can yield inefficient algorithms.

In conclusion, while one does not need to master algorithms and data structures merely to be labeled a programmer, the pursuit of becoming a competent or exceptional programmer necessitates understanding these concepts. Carletti recommends seminal literature such as 'The Algorithm Design Manual' and 'Ruby Under a Microscope' for further exploration into algorithm design. He acknowledges his influences for the talk, reinforcing the importance of targeted learning in the field of programming.

00:00:14.040 I want to talk to you about something that is not directly connected with Ruby, but the presentation will also feature some Ruby details. What I want to discuss is algorithms.
00:00:20.920 I call this talk 'Beyond the Language.' You will understand why during the presentation.
00:00:26.720 I want to start the talk by presenting a problem. A little background: this talk was originally scheduled to be after the coffee break, which is why I mentioned the beignet. However, all of you probably had breakfast this morning, so we can keep the beignet here, maybe it was a croissant, but that's fine.
00:00:39.719 Today, the problem we want to solve is as follows: there is a coffee table with a lot of beignets. Of course, you're greedy and want to eat as much as possible, but unfortunately, there's a mean person who put some poison in one of the beignets. Because of the poison, that beignet weighs a little bit more than the others.
00:00:56.600 Let’s pretend that all the other beignets have the same weight. The only way for you to identify the poisoned beignet is to weigh them, either by hand or using a scale. Each time you weigh a beignet, you waste time.
00:01:07.960 Your goal is to eat as many beignets as possible without eating the poisoned one, unless you're accustomed to that, before all the good ones are gone. There are a lot of greedy people in the room.
00:01:19.479 So, what you really want is an algorithm that will tell you which beignet is poisoned in a fast and efficient way.
00:01:28.879 I leave this question for you to ponder for a moment. A quick introduction about myself: my name is Simone, and you can find me with the handle "weapos" pretty much everywhere, on GitHub, Twitter, and so on.
00:01:36.280 I work for DNSimple. Now, quick question: how many of you have installed a gem? Great! That means you use our service. We provide DNS hosting, domain registration, and SSL certificates for various services and for cool projects like RubyGems, Atom, GitHub, and Travis.
00:01:49.200 I have several interests. According to the conference page, my interests include diving, sharks, and algorithms. I was born and raised here in Turin, and I know that sharks and diving are not really relevant to Piedmont, so I will focus on algorithms.
00:02:05.440 It is important to start by defining what an algorithm is. An algorithm is a process, or better, a set of rules to follow in calculations or other problem-solving operations, especially by a computer. There is a close connection between algorithms and computers.
00:02:36.280 There is another definition: it is a term used by programmers when they do not want to explain what they did. For instance, if asked how they solved a problem, they might say, "It’s a complex algorithm. You wouldn’t understand." It's important to clarify that an algorithm is not strictly a program. A problem can also be viewed as an elaborate algorithm.
00:02:53.200 In general, computer programs contain several algorithms that specify the instructions a computer should perform to execute a specific task. Now, back to the original question I saw about a year ago on Programmer's Stack Exchange, which is a sister project to the more famous Stack Overflow.
00:03:10.720 The question was: Do I need to understand algorithms and data structures to be called a programmer? I don't want to answer that right now; I will address it at the end of this talk, but I want you to start thinking about it.
00:03:30.040 The whole purpose of this talk is to help you start thinking about discovering, learning, and studying algorithms. Let's go back to our original problem of the tasty beignet.
00:03:44.640 To recap, we have a coffee table with beignets, one of which has a poison that makes it heavier. I want to find the heavier one, but remember, there’s a cost associated with each weighing, and I want to eat as many beignets as I can, ideally without dying.
00:04:03.520 The next step is to formalize the algorithm. One of the most challenging tasks in solving a problem is finding a way to express the real-world scenario in a mathematical or formalized model. I will do that for you today.
00:04:31.600 So, the problem is to find the heaviest item among N items in a collection. The instance of the problem is defined as a set of items, including the one we want to find. The computational model for today is simple: a scale, but we can also use our hands.
00:04:47.160 Each weight operation takes a fixed cost, and our algorithm will focus on a waiting strategy. The goal is to create an algorithm that finds the heaviest item in the collection, ideally ensuring that it is correct and efficient.
00:05:08.160 Let’s try to find the simplest algorithm to do this. If you are older than 30, you might remember Kinder Surprise eggs and the tiny figures inside. What kids, including myself, used to do was take one egg and compare the others to find the best surprise.
00:05:23.520 The simplest algorithm takes the first item and compares it to all the others. Easy! So we take the first beignet and compare it with the second. Then we repeat this for the third, fourth, fifth, and sixth until we find the poisoned beignet.
00:05:40.560 In our example with eight beignets, it took six steps to find the poisoned one, which was in position seven. An important thing to remember is that when measuring an algorithm, we always consider the worst-case scenario. We don't care where the beignet is; we assume we have to account for the worst-case.
00:06:02.320 Is the algorithm correct? Raise your hand if you say yes. Raise your hand if you say no. It's generally considered correct, but the question of efficiency is more challenging. Raise your hand if you think it's efficient.
00:06:18.480 In reality, we need to talk about the cost of the algorithm: what does it mean for an algorithm to be efficient? The cost of an algorithm can be defined as an estimation of the resources it needs to solve a specific computational problem.
00:06:36.360 Resources can include CPU time, storage, etc. The lower the cost, the higher the efficiency of the algorithm. In computer science, we use Big O notation to classify the cost of an algorithm relative to changes in input size. This notation helps us understand the algorithm's behavior in worst-case scenarios.
00:07:05.360 The Big O notation is an asymptotic notation, which means it approximates the behavior of an algorithm as input size grows. Unfortunately, we need a bit of math to understand the complexity of an algorithm, but I'll keep it simple.
00:07:21.680 Here is a list of common costs for an algorithm depending on input size. Starting from two items to larger sizes, you can see how different types of algorithms perform.
00:07:33.600 For example, constant time (O(1)) performs similarly no matter the input size, which is the most desirable. Linear time grows with the input size, and logarithmic scales perform very slowly and are often more manageable.
00:07:50.760 Then we have algorithms with quadratic growth (O(n^2)) and exponential growth, which indicates that your algorithm might take a very long time to complete.
00:08:18.280 The graph here illustrates the execution time of an algorithm versus input size: elements on the X-axis and operations needed on the Y-axis. This growth affects the performance significantly.
00:08:33.800 The interesting part arises when we work with specific algorithms. For example, we could find complexities represented as O(n log n), characteristic of sorting algorithms. But the key point is to keep in mind the efficiency based on the specific situation.
00:09:02.680 Now, let's analyze the cost of our first algorithm: one versus all. We take the first item and compare it with each other. This algorithm is correct and exhibits linear complexity, which means its cost in Big O notation is O(n). But is it efficient?
00:09:29.440 The answer depends on how we define efficiency. An algorithm is considered efficient if there isn't a faster algorithm that can produce the same results.
00:09:44.120 Another algorithm can improve performance by comparing pairs of items one by one. The cost is halved, but in terms of Big O, it's still O(n). Though technically different, both algorithms yield similar performance outcomes because Big O notation ignores constant factors.
00:10:15.520 The third algorithm I want to introduce is called 'divide and conquer.' Here, we split the item collection into two sets, weigh them, and discard the lighter group, repeating this until we narrow it down to the poisoned beignet.
00:10:38.720 In this case, the cost of the logarithmic scale reduces computational complexity significantly, and we can even verify that the third algorithm is efficient.
00:11:04.080 Yet, I have a fourth algorithm that improves upon this approach by splitting into three groups. You weigh two of them, and if they're the same weight, you know the poisoned one is in the group you haven't weighed.
00:11:25.280 This algorithm also operates logarithmically, but it's marginally faster because of the way it eliminates possibilities.
00:11:44.120 In large scenarios, like if you have a huge coffee table, this speed difference becomes significant as you scale the number of items.
00:12:10.360 The performance between these algorithms mathematically demonstrates how crucial it is to choose an efficient algorithm over just a faster language. Algorithms significantly impact execution times, sometimes even more than language performance.
00:12:36.840 Now, I want to transition to how data structures align with algorithms. A significant data structure would be the hash table in Ruby. Each operation in an ideal hash table should have constant time, reflecting optimal performance.
00:13:01.840 However, if we poorly design this structure, like using a linked list that chains items, we run into inefficient linear time when trying to search through combined entries.
00:13:30.560 In contrast, Ruby maintains nearly constant time operations by utilizing arrays of buckets, handling key hashing with efficiency. It performs quick lookups, mainly relying on the number of entries facilitated by hashing.
00:14:03.280 If we define a bad object that always returns a constant value as a hash, we can mess up performance dramatically. When tested with various keys, you can see how poorly designed hash functions affect efficiency.
00:14:20.520 Through observations of paired performance graphs, you'll notice considerable differences in response times based on how well the hash functions were implemented.
00:14:43.680 To wrap up, let me return to our original question: Do you need to understand algorithms and data structures to be called a programmer? Probably not to simply be termed a programmer. But if you aspire to be a good or great one, then yes, you should start exploring algorithm design.
00:15:00.680 I recommend two great resources for this journey: "The Algorithm Design Manual" and "Ruby Under a Microscope." While exploring algorithm concepts in Ruby, you'll find these insights crucial.
00:15:18.960 Lastly, I want to thank Luan Gua, my professor who inspired me deeply for this talk, and Po, the author of the 'Ruby Under a Microscope' book.
00:15:39.200 Thank you!