Chris Bloom

Simulated Annealing: The Most Metal Algorithm Ever 🤘

Simulated annealing is a fascinating algorithm that's designed to help find a particular type of solution (near-optimal, aka "good enough") to a particular type of problem (constrained optimization). It's inspired by the science of metallurgy, and because it's grounded in a real-world process I find it incredibly approachable. In this talk I'll explain in plain terms what simulated annealing is, what a constrained optimization problem is, why you might want a "good enough" solution, and how we can use the Annealing gem to add simulated annealing to our Ruby apps.

RubyConf 2022

00:00:00 Ready for takeoff.
00:00:17 Hey everybody! We have a lot to talk about, and I'm going to get started. Today, we're here to talk about simulated annealing, the most metal algorithm ever. In honor of this topic, I'm wearing a shirt from my favorite metal band, Overcast, whose members went on to join other great metal bands like Shadows Fall and Killswitch Engage.
00:00:22 But this isn't about them; this is about simulated annealing. My name is Chris, and I've been developing software since the late 90s. I currently work as a senior software engineer for GitHub, where I focus on open-source software supply chain security. I also work on several open-source projects, the most notable of which is the Annealing gem that I collaborate on with Louise Oscoitia, and we will see that in action a little later.
00:00:46 I've also been known to have a pretty metal past, but these days I'm a bit more chill. I live in Florida with my beautiful family, and we raise backyard chickens. I can't disclose the children's names since they value their privacy, but I can introduce you to our chickens! On the left, we have Peppa, followed by Peggy, Millie, and Cindy. You might say that I've settled into my lowest energy state these days. Don’t worry, that joke will make sense in about 20 minutes.
00:01:19 So, what is simulated annealing? It's an algorithm designed to find approximate solutions to complex optimization problems. In this talk, we'll introduce what optimization problems are, why we might want an approximate or good enough solution, and we'll go through a real-world example of an optimization problem. Finally, we'll see how I utilize simulated annealing to solve that problem. Let’s kick off by defining what an optimization problem is.
00:01:50 In mathematics and computer science, an optimization problem involves finding the best, or optimal, solution from all possible solutions according to some objective cost function. In particular, we'll be discussing two subclasses of optimization problems: combinatorial optimization problems and constrained optimization problems. Combinatorial optimization problems deal with finding the optimal arrangement of a set from among all potential arrangements, while constrained optimization problems involve one or more variables that are subject to constraints; when a constraint is not met, a potential solution may be disqualified outright or penalized via the objective cost function.
00:02:07 A classic example of a combinatorial optimization problem is the traveling salesperson problem, which has several variants. A simple version of the problem can be stated as follows: Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and then returns to the origin city?
00:02:45 Let's say you're in an edgy metal crossover band that also happens to be made up of software developers, and you're sitting in a pizza shop planning your next tour. You have a list of eight cities you want to visit and a list of distances between all those cities. In this case, the potential solutions consist of all permutations or arrangements of those cities, each constituting a potential route. The optimal solution is the arrangement of cities where the sum total of distances between them is the shortest.
00:03:17 To solve this problem, we need to generate a list of all potential routes, sum the total distances for each route, and then select the route with the shortest distance. We might define a cost function that looks like this: it accepts a potential route, adds the starting city to the end to make it a round trip, and totals the distances using our distance matrix.
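A cost function along those lines can be sketched in Ruby. The city names and distances below are illustrative stand-ins, not the talk's actual tour data:

```ruby
# Illustrative distance matrix: keys are [from, to] city pairs (entered in
# both directions), values are distances in miles. These numbers are made up.
DISTANCES = {
  %w[Atchison Boston]  => 1400,
  %w[Boston Chicago]   => 980,
  %w[Atchison Chicago] => 560,
}.flat_map { |pair, d| [[pair, d], [pair.reverse, d]] }.to_h

# Cost of a route: append the starting city to make it a round trip,
# then total the leg distances using the distance matrix.
def route_cost(route)
  round_trip = route + [route.first]
  round_trip.each_cons(2).sum { |a, b| DISTANCES[[a, b]] }
end

route_cost(%w[Atchison Boston Chicago])  # => 2940
```

Lower cost means a shorter tour, so the optimal route is simply the permutation that minimizes this function.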
00:03:49 Then, we can use it to find the optimal route like this: first, we gather a list of all our cities, then we find all permutations of those cities as our potential routes. Ruby has a built-in permutation method that makes this easy. Next, we loop over all potential routes, calculate the total distance along each route using our cost function, and compare the current route to the best route we've already selected. We keep the one that has the shortest distance.
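That brute-force search can be sketched like this. It's self-contained, with its own made-up four-city distance matrix as an assumption, and uses `Array#permutation` with `min_by` in place of the explicit comparison loop:

```ruby
# Made-up distance matrix in miles, entered symmetrically in both directions.
LEGS = {
  %w[A B] => 10, %w[B C] => 20, %w[A C] => 15,
  %w[A D] => 30, %w[B D] => 25, %w[C D] => 12,
}.flat_map { |pair, d| [[pair, d], [pair.reverse, d]] }.to_h

# Round-trip cost: close the loop back to the starting city, sum the legs.
def trip_cost(route)
  (route + [route.first]).each_cons(2).sum { |a, b| LEGS[[a, b]] }
end

cities = LEGS.keys.flatten.uniq

# Ruby's built-in Array#permutation enumerates every potential route;
# keep whichever one has the shortest round-trip distance.
best_route = cities.permutation.min_by { |route| trip_cost(route) }
```

With four cities this checks all 24 permutations instantly; the catch, as we'll see next, is how quickly that number grows.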
00:04:48 At the end, we find the shortest possible route, which originates from a city and follows a specific path. For example, one possible route is a 6,900-mile round trip that starts in Atchison, Kansas, and visits several cities before returning to Atchison, promising fame and glory.
00:05:00 For fun, here's an example of a less optimal route that spans 15,000 miles!
00:05:28 So if the goal of an optimization problem is to find the optimal solution, why would we ever want to settle for an approximate solution? One common issue with optimization problems is that they do not scale well. The formula we can use to calculate the number of permutations of a list—or the ways to arrange them—is n factorial. For eight cities, there are 40,320 potential routes to consider. But if we double that to 16 cities, we would face nearly 21 trillion routes to assess!
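The factorial growth is easy to confirm with a quick sketch:

```ruby
# Number of potential routes through n cities is n factorial.
def route_count(cities)
  (1..cities).reduce(1, :*)
end

route_count(8)   # => 40_320 potential routes
route_count(16)  # => 20_922_789_888_000, roughly 21 trillion
```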
00:06:16 On my M1 laptop, it took about 150 milliseconds to run the script that generates the 40,000 permutations for the eight-city tour, a rate of roughly 270,000 routes per second. If we assume that rate stays constant and ignore any other constraints, working through 21 trillion potential routes would take about 2.5 years. In computer science terms, this is because the traveling salesperson problem is NP-hard. That's a simplification, and we won't dive deeper into the topic here, but in practice it means that finding exact solutions becomes intractable as the input grows.
00:07:07 Here's an interesting sidebar for math enthusiasts: with a set of 100 items, there are 9.3 times 10 to the 157th permutations to consider. And while the route is a loop that can be started from any city, even if we eliminate those equivalent routes we still have an incredibly large number to deal with. Now consider that there may be groups of any number of members, sharing varying degrees of overlap with other groups; that adds even more complexity to finding unique arrangements.
00:07:29 So, let's break this down into some new requirements. Groups must contain subsets, supersets, or exclusive sets of members relative to each other, and each member prefers to be grouped with members they have not been grouped with before. Given enough time, every member will inevitably meet every other member, though at different rates depending on the groups they belong to.
00:08:01 With all these considerations, selecting an optimal arrangement is no longer simply a matter of finding a unique arrangement, even for very large groups. Instead, we should prefer unique arrangements when possible, and otherwise look for arrangements of members who have met least frequently or least recently. That means we need to define a new cost function.
00:08:56 This table gives an example of how we might consider calculating the cost of subgroups containing any two members, given a history of five previous groupings. The recency and favorability rows indicate their pairing history, and the columns represent potential meeting scenarios where costs scale as we go from left to right.
00:09:29 So, let's solve this for a small group first. Start by defining the cost function which accepts a potential arrangement of members into subgroups. Initially, we'll loop over each subgroup and find all combinations of members. For each pair, we check to see if they've met before and add a cost based on the recency of their meeting. If they've never met, their cost is zero.
00:10:05 Next, we'll apply this function to our permutations. If the cost of a subgroup equals zero, it means nobody within that subgroup has met before, and we can accept that arrangement as the best grouping. Otherwise, we compare the current cost to see if it's lower than the best grouping found previously.
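The cost function and search from these two steps can be sketched together. The member names, history depth, and penalty weights below are illustrative assumptions, not the talk's actual data:

```ruby
# Last-met history: maps a sorted pair of members to how many groupings ago
# they last met (1 = most recent). Pairs absent from the hash have never met.
# A depth of 5 matches the five previous groupings in the example.
LAST_MET = { %w[ann bob] => 1, %w[bob cat] => 4 }
HISTORY_DEPTH = 5

# Cost of an arrangement: for every pair inside each subgroup, add a penalty
# that grows with recency; pairs who have never met cost zero.
def arrangement_cost(subgroups)
  subgroups.sum do |group|
    group.sort.combination(2).sum do |pair|
      last = LAST_MET[pair]
      last ? (HISTORY_DEPTH - last + 1) : 0
    end
  end
end

# Brute-force search: slice each permutation into subgroups of the given
# size, keep the cheapest arrangement, and return early on a perfect
# zero-cost grouping (nobody in any subgroup has met before).
def best_grouping(members, size)
  best, best_cost = nil, Float::INFINITY
  members.permutation.each do |perm|
    arrangement = perm.each_slice(size).to_a
    cost = arrangement_cost(arrangement)
    return arrangement if cost.zero?
    best, best_cost = arrangement, cost if cost < best_cost
  end
  best
end
```

With the sample history above, `best_grouping(%w[ann bob cat dan], 2)` finds a pairing where nobody has met before, since ann/bob and bob/cat are the only prior pairings.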
00:10:34 This approach works for a small six-person group, but it doesn't scale to larger ones. As our history grows deeper, the cost calculations get more expensive, and we're still facing the same factorial explosion of permutations we saw earlier. It's time for an approximate solution, especially since our group consists of friendly Hubbers who just want to socialize.
00:11:06 It's time to introduce simulated annealing. As you might expect from the name, simulated annealing is modeled after physical annealing—a process of heating a material then cooling it slowly at a controlled rate. This lets the material, such as metal, rearrange its structure to achieve a more desired outcome.
00:11:47 In physical annealing, when temperatures are high, the molecular structure becomes weak, allowing significant changes. As the material cools, its structure becomes stronger and less likely to alter. The slow cooling rate supports the structure to eventually settle into its lowest energy arrangement.
00:12:00 Simulated annealing mimics this process but applies it to the problem we're attempting to solve. The different states represent potential solutions, while the cost functions reflect the energy of each state. The annealing schedule dictates the number of states sampled and the adjustments permitted between them.
00:12:28 The process unfolds like this: first, we define an annealing schedule with an initial temperature and a cooling rate. Next, we select a random solution as the current state and generate a new solution from it, utilizing the cost function to assess the energy of both states. Afterward, we decide which state will become the current one.
00:12:56 We then reduce the current temperature according to the annealing schedule and check the limits of the simulation to see if it has reached the end. We repeat this process, allowing the algorithm to occasionally select less optimal states to escape local minima.
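The loop described in these steps can be sketched in plain Ruby. This is a minimal illustration, not the Annealing gem's implementation; the starting temperature, cooling rate, and swap-two-elements neighbor move are assumptions chosen for the example:

```ruby
# Minimal simulated annealing sketch. `initial_state` is an array
# representing one potential solution; `cost_fn` returns its energy.
def anneal(initial_state, cost_fn, temperature: 10_000.0, cooling_rate: 0.003)
  current = initial_state
  current_cost = cost_fn.call(current)
  best, best_cost = current, current_cost

  while temperature > 1.0
    # Generate a neighboring solution by swapping two random positions.
    candidate = current.dup
    i, j = rand(candidate.size), rand(candidate.size)
    candidate[i], candidate[j] = candidate[j], candidate[i]
    candidate_cost = cost_fn.call(candidate)

    # Always accept a better state; sometimes accept a worse one, with a
    # probability that shrinks as the temperature drops. This is what lets
    # the algorithm escape local minima.
    delta = candidate_cost - current_cost
    if delta.negative? || rand < Math.exp(-delta / temperature)
      current, current_cost = candidate, candidate_cost
      best, best_cost = current, current_cost if current_cost < best_cost
    end

    # Reduce the temperature according to the annealing schedule.
    temperature *= (1.0 - cooling_rate)
  end

  best  # the lowest-energy state seen during the run
end

# Example: "sort" a shuffled list by annealing toward zero total displacement.
cost_fn = ->(arr) { arr.each_with_index.sum { |v, i| (v - i).abs } }
anneal((0..5).to_a.shuffle, cost_fn)  # typically [0, 1, 2, 3, 4, 5]
```

Returning the best state seen, rather than the final one, is a common refinement; either way the result is approximate, not guaranteed optimal.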
00:13:20 The goal is to determine which state to keep as we iterate, adjusting the temperature methodically until it approaches zero. For our application, we set up our simulation as an instance of an annealing simulator and run it to discover the optimal arrangement based on the defined parameters.
00:14:06 By the end of this simulation, we’ll have an approximate arrangement that we can evaluate to determine effectiveness. Hopefully, by now, you have an understanding of what an optimization problem is, why accepting approximate solutions may sometimes be necessary, and how simulated annealing can aid in identifying these.
00:14:45 If you're keen to explore simulated annealing more thoroughly, I highly recommend the original research paper, "Optimization by Simulated Annealing," published in 1983 in Science. For algorithms in general, "Grokking Algorithms" by Aditya Bhargava is a commendable read. And if optimization problems interest you, check out Google OR-Tools, an open-source suite designed to help solve them.
00:15:20 I’m not active on social media, but I welcome questions or comments at [email protected]. Don’t forget to check out the Annealing gem, and thank you for your time!