00:00:14.040
I want to talk to you about something that is not directly connected with Ruby, but the presentation will also feature some Ruby details. What I want to discuss is algorithms.
00:00:20.920
I call this talk 'Beyond the Language.' You will understand why during the presentation.
00:00:26.720
I want to start the talk by presenting a problem. A little background: this talk was originally scheduled to run after the coffee break, which is why it features beignets. All of you probably had breakfast this morning instead, but we can keep the beignets anyway; pretend they were croissants if you like, that's fine.
00:00:39.719
Today, the problem we want to solve is as follows: there is a coffee table with a lot of beignets. Of course, you're greedy and want to eat as much as possible, but unfortunately, there's a mean person who put some poison in one of the beignets. Because of the poison, that beignet weighs a little bit more than the others.
00:00:56.600
Let’s pretend that all the other beignets have the same weight. The only way for you to identify the poisoned beignet is to weigh them, either by hand or using a scale. Each time you weigh a beignet, you waste time.
00:01:07.960
Your goal is to eat as many beignets as possible without eating the poisoned one, unless you're accustomed to that, before all the good ones are gone. There are a lot of greedy people in the room.
00:01:19.479
So, what you really want is an algorithm that will tell you which beignet is poisoned in a fast and efficient way.
00:01:28.879
I leave this question for you to ponder for a moment. A quick introduction about myself: my name is Simone, and you can find me with the handle "weppos" pretty much everywhere, on GitHub, Twitter, and so on.
00:01:36.280
I work for DNSimple. Now, quick question: how many of you have installed a gem? Great! That means you use our service. We provide DNS hosting, domain registration, and SSL certificates for various services and for cool projects like RubyGems, Atom, GitHub, and Travis.
00:01:49.200
I have several interests. According to the conference page, my interests include diving, sharks, and algorithms. I was born and raised here in Turin, and I know that sharks and diving are not really relevant to Piedmont, so I will focus on algorithms.
00:02:05.440
It is important to start by defining what an algorithm is. An algorithm is a process, or better, a set of rules to follow in calculations or other problem-solving operations, especially by a computer. There is a close connection between algorithms and computers.
00:02:36.280
There is another definition: it is a term used by programmers when they do not want to explain what they did. For instance, if asked how they solved a problem, they might say, "It's a complex algorithm. You wouldn't understand." It's important to clarify that an algorithm is not strictly a program; rather, a program can be viewed as an elaborate algorithm.
00:02:53.200
In general, computer programs contain several algorithms that specify the instructions a computer should perform to execute a specific task. Now, back to the original question I saw about a year ago on Programmer's Stack Exchange, which is a sister project to the more famous Stack Overflow.
00:03:10.720
The question was: Do I need to understand algorithms and data structures to be called a programmer? I don't want to answer that right now; I will address it at the end of this talk, but I want you to start thinking about it.
00:03:30.040
The whole purpose of this talk is to help you start thinking about discovering, learning, and studying algorithms. Let's go back to our original problem of the tasty beignet.
00:03:44.640
To recap, we have a coffee table with beignets, one of which has a poison that makes it heavier. I want to find the heavier one, but remember, there’s a cost associated with each weighing, and I want to eat as many beignets as I can, ideally without dying.
00:04:03.520
The next step is to formalize the algorithm. One of the most challenging tasks in solving a problem is finding a way to express the real-world scenario in a mathematical or formalized model. I will do that for you today.
00:04:31.600
So, the problem is to find the heaviest item among N items in a collection. The instance of the problem is defined as a set of items, including the one we want to find. The computational model for today is simple: a scale, but we can also use our hands.
00:04:47.160
Each weighing operation has a fixed cost, and our algorithm will be built around a weighing strategy. The goal is to create an algorithm that finds the heaviest item in the collection, and ideally to ensure that it is both correct and efficient.
00:05:08.160
Let’s try to find the simplest algorithm to do this. If you are older than 30, you might remember Kinder Surprise eggs and the tiny figures inside. What kids, including myself, used to do was take one egg and compare the others to find the best surprise.
00:05:23.520
The simplest algorithm takes the first item and compares it to all the others. Easy! So we take the first beignet and compare it with the second. Then we repeat this for the third, fourth, fifth, and sixth until we find the poisoned beignet.
00:05:40.560
In our example with eight beignets, it took six steps to find the poisoned one, which was in position seven. An important thing to remember is that when measuring an algorithm, we always consider the worst-case scenario. We don't care where the poisoned beignet actually happens to be; we always account for the worst case.
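In Ruby, a minimal sketch of this "one vs all" strategy might look like the following (my illustration, not code from the talk; it assumes `weights` is an array of beignet weights with exactly one heavier element, and returns the index of the heavy one):

```ruby
# One vs all: compare the first beignet against every other one.
# Worst case: n - 1 weighings.
def find_heaviest_one_vs_all(weights)
  reference = weights.first
  weights.each_with_index do |weight, index|
    next if index.zero?
    return index if weight > reference # found a heavier beignet
    return 0     if weight < reference # the reference itself is the heavy one
  end
  0 # unreachable when exactly one beignet is heavier; kept as a fallback
end
```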
00:06:02.320
Is the algorithm correct? Raise your hand if you say yes. Raise your hand if you say no. It's generally considered correct, but the question of efficiency is more challenging. Raise your hand if you think it's efficient.
00:06:18.480
In reality, we need to talk about the cost of the algorithm: what does it mean for an algorithm to be efficient? The cost of an algorithm can be defined as an estimation of the resources it needs to solve a specific computational problem.
00:06:36.360
Resources can include CPU time, storage, etc. The lower the cost, the higher the efficiency of the algorithm. In computer science, we use Big O notation to classify the cost of an algorithm relative to changes in input size. This notation helps us understand the algorithm's behavior in worst-case scenarios.
00:07:05.360
The Big O notation is an asymptotic notation, which means it approximates the behavior of an algorithm as input size grows. Unfortunately, we need a bit of math to understand the complexity of an algorithm, but I'll keep it simple.
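For reference, the standard formal definition behind the notation (not spelled out in the talk) is:

```latex
% f(n) is O(g(n)) when, past some threshold n_0, f is bounded above by a
% constant multiple of g.
f(n) = O\big(g(n)\big) \iff \exists\, c > 0,\ \exists\, n_0 \ge 0 \ \text{such that} \ 0 \le f(n) \le c \cdot g(n) \ \text{for all } n \ge n_0
```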
00:07:21.680
Here is a table of the common costs of an algorithm as the input size grows; starting from just two items and moving to much larger sizes, you can see how the different classes of algorithms behave.
00:07:33.600
For example, constant time (O(1)) takes the same amount of work no matter the input size, which is the most desirable. Linear time grows proportionally with the input size. Logarithmic time grows very slowly as the input increases, which makes it very manageable.
00:07:50.760
Then we have algorithms with quadratic growth (O(n^2)) and exponential growth, which indicates that your algorithm might take a very long time to complete.
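To make those growth rates concrete, here is a tiny snippet (my own illustration, not the table from the slides) that prints how much work each class requires at a few input sizes:

```ruby
# Rough operation counts for the common complexity classes at a few input sizes.
[2, 16, 256, 4_096].each do |n|
  puts format(
    "n = %5d | O(1) = 1 | O(log n) = %2d | O(n) = %5d | O(n^2) = %10d",
    n, Math.log2(n).to_i, n, n * n
  )
end
```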
00:08:18.280
The graph here illustrates the execution time of an algorithm versus input size: the number of elements on the X-axis and the number of operations needed on the Y-axis. As the input grows, the curves diverge dramatically, and that growth affects performance significantly.
00:08:33.800
The interesting part arises when we work with specific algorithms. For example, we often find complexities such as O(n log n), which is characteristic of sorting algorithms. But the key point is to always evaluate efficiency relative to the specific problem you are solving.
00:09:02.680
Now, let's analyze the cost of our first algorithm: one versus all. We take the first item and compare it with each of the others. This algorithm is correct and exhibits linear complexity, which means its cost in Big O notation is O(n). But is it efficient?
00:09:29.440
The answer depends on how we define efficiency. An algorithm is considered efficient if there isn't a faster algorithm that can produce the same results.
00:09:44.120
Another algorithm can improve performance by comparing pairs of items one by one. The cost is halved, but in terms of Big O, it's still O(n). Though technically different, both algorithms yield similar performance outcomes because Big O notation ignores constant factors.
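A sketch of how that pairwise strategy could look in Ruby (again my reading of the idea, under the same assumptions as before):

```ruby
# Pairwise: weigh the beignets two at a time and stop at the first pair
# that does not balance. Worst case: about n / 2 weighings.
def find_heaviest_pairwise(weights)
  weights.each_slice(2).with_index do |(a, b), pair_index|
    return pair_index * 2     if b.nil? || a > b # odd leftover, or left side heavier
    return pair_index * 2 + 1 if b > a           # right side heavier
  end
end
```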
00:10:15.520
The third algorithm I want to introduce is called 'divide and conquer.' Here, we split the item collection into two sets, weigh them, and discard the lighter group, repeating this until we narrow it down to the poisoned beignet.
00:10:38.720
In this case the cost grows on a logarithmic scale, O(log n), which reduces the number of weighings dramatically, and we can even verify that this third algorithm is efficient.
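Here is a minimal sketch of the divide-and-conquer idea (my illustration; when a group has an odd size, the leftover beignet is set aside and wins only if the two weighed halves balance):

```ruby
# Divide and conquer: weigh one half against the other and keep the heavier
# half. Worst case: about log2(n) weighings.
def find_heaviest_binary(weights)
  candidates = (0...weights.size).to_a
  while candidates.size > 1
    half  = candidates.size / 2
    left  = candidates.first(half)
    right = candidates[half, half]
    rest  = candidates.drop(half * 2) # leftover item when the count is odd
    left_weight  = left.sum  { |i| weights[i] }
    right_weight = right.sum { |i| weights[i] }
    candidates =
      if    left_weight > right_weight then left
      elsif right_weight > left_weight then right
      else  rest # the halves balance, so the heavy one is the leftover
      end
  end
  candidates.first
end
```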
00:11:04.080
Yet, I have a fourth algorithm that improves upon this approach by splitting into three groups. You weigh two of them, and if they're the same weight, you know the poisoned one is in the group you haven't weighed.
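A sketch of that three-way split (my illustration; the two weighed groups are always the same size, and the unweighed remainder wins when they balance):

```ruby
# Three-way split: weigh two equal-sized groups; if they balance, the heavy
# beignet is in the group we did not weigh. Worst case: about log3(n) weighings.
def find_heaviest_ternary(weights)
  candidates = (0...weights.size).to_a
  while candidates.size > 1
    group_size = (candidates.size / 3.0).ceil
    a = candidates.first(group_size)
    b = candidates[group_size, group_size]
    c = candidates.drop(group_size * 2)
    weigh = ->(group) { group.sum { |i| weights[i] } }
    candidates =
      if    weigh.(a) > weigh.(b) then a
      elsif weigh.(b) > weigh.(a) then b
      else  c # a and b balance, so the heavy one is in the unweighed group
      end
  end
  candidates.first
end
```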
00:11:25.280
This algorithm also operates logarithmically, but it's faster by a constant factor: each weighing eliminates two thirds of the candidates instead of half, so the cost is roughly log base 3 of n instead of log base 2 of n.
00:11:44.120
In large scenarios, like if you have a huge coffee table, this speed difference becomes significant as you scale the number of items.
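A quick back-of-the-envelope comparison (my numbers, not from the slides) shows how far apart the strategies end up on a table of one million beignets:

```ruby
n = 1_000_000
puts "one vs all:      ~#{n - 1} weighings"
puts "pairwise:        ~#{n / 2} weighings"
puts "split in halves: ~#{Math.log2(n).ceil} weighings"                # about 20
puts "split in thirds: ~#{(Math.log(n) / Math.log(3)).ceil} weighings" # about 13
```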
00:12:10.360
Comparing the performance of these algorithms demonstrates how crucial it is to choose an efficient algorithm rather than simply a faster language. The choice of algorithm can affect execution time significantly, sometimes even more than the raw performance of the language itself.
00:12:36.840
Now, I want to transition to how data structures relate to algorithms. A significant example is Ruby's hash table. In an ideal hash table, every operation, insertion, lookup, and deletion, takes constant time, which is the optimal behavior.
00:13:01.840
However, a poorly designed implementation, for example one that simply chains every entry into a single linked list, degrades to linear time, because a search has to walk through all the stored entries.
00:13:30.560
In contrast, Ruby keeps its operations close to constant time by using an array of buckets: the key's hash code determines which bucket an entry goes into, so a lookup only has to scan the few entries that share that bucket, regardless of how many entries the table holds.
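A deliberately simplified sketch of that bucketing idea (a toy model, not Ruby's actual implementation):

```ruby
# A toy bucketed hash table: the key's hash code picks a bucket, so a lookup
# only scans the entries that share that bucket. This simplified version
# never replaces an existing key.
class TinyHash
  def initialize(bucket_count = 16)
    @buckets = Array.new(bucket_count) { [] }
  end

  def []=(key, value)
    bucket_for(key) << [key, value]
  end

  def [](key)
    pair = bucket_for(key).find { |k, _| k.eql?(key) }
    pair && pair.last
  end

  private

  def bucket_for(key)
    @buckets[key.hash % @buckets.size]
  end
end
```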
00:14:03.280
If we define a bad object whose hash method always returns the same constant value, we can degrade performance dramatically, because every key ends up in the same bucket. Testing it with a growing number of keys shows how badly a poorly designed hash function affects efficiency.
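A hypothetical reconstruction of that experiment (the class names BadKey and GoodKey are mine, not from the talk):

```ruby
require "benchmark"

# BadKey#hash always returns the same constant, so every key lands in the
# same bucket and Ruby's Hash degenerates to a linear scan.
class BadKey
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def hash
    42 # constant hash code for every instance
  end

  def eql?(other)
    other.respond_to?(:id) && id == other.id
  end
end

# GoodKey distributes its hash codes properly, so lookups stay near O(1).
class GoodKey < BadKey
  def hash
    id.hash
  end
end

[GoodKey, BadKey].each do |klass|
  table = {}
  seconds = Benchmark.realtime do
    5_000.times { |i| table[klass.new(i)] = i }
  end
  puts format("%-7s inserting 5,000 keys took %.3fs", klass, seconds)
end
```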
00:14:20.520
Comparing the two performance graphs, you'll notice considerable differences in running time depending on how well the hash function is implemented.
00:14:43.680
To wrap up, let me return to our original question: Do you need to understand algorithms and data structures to be called a programmer? Probably not to simply be termed a programmer. But if you aspire to be a good or great one, then yes, you should start exploring algorithm design.
00:15:00.680
I recommend two great resources for this journey: "The Algorithm Design Manual" and "Ruby Under a Microscope." The first covers algorithm design in general, while the second shows how Ruby itself implements these ideas under the hood; both will prove valuable as you explore these concepts in Ruby.
00:15:18.960
Lastly, I want to thank my professor, Luan Gua, who deeply inspired this talk, and Pat Shaughnessy, the author of "Ruby Under a Microscope."
00:15:39.200
Thank you!