Samuel Giddins

Making CocoaPods Fast with Modern Ruby Tooling

Writing performant code is hard. Writing performant Ruby code that does lots of stuff is really hard. CocoaPods got to be pretty slow “at scale”, and this is the story of how we made our pod install times bearable again.

By Samuel Giddins https://twitter.com/@segiddins

Samuel Giddins is a developer well-versed in the rituals of writing developer tools that occasionally work. By day, Samuel works on making the mobile developer experience at Square less arduous; by night he can be found breaking Bundler and CocoaPods. Before this whole “developer” thing, Samuel studied in the highly impractical Mathematics & Economics departments at UChicago, learning subjects such as “numbers”, “social theory”, and “memes”. When not coding, Samuel is often in the kitchen, marveling at the fact that dinner smells better than it looks.

https://rubyonice.com/speakers/samuel_giddins

Ruby on Ice 2019

00:00:12.700 Welcome to this performance rollercoaster where we're going to talk about making CocoaPods faster with modern Ruby tooling.
00:00:18.770 Samuel Giddins is going to tell us about performance and how to write performant Ruby code for speeding up CocoaPods at scale.
00:00:25.750 Samuel has a background in Mathematics and Economics from the University of Chicago. He loves cooking, and right now works at Square helping mobile developers have a great development experience. He fiddles with Bundler and CocoaPods when he's not on the clock.
00:00:39.739 Welcome, Samuel!
00:00:58.070 So, fun story: when I submitted this talk, I just said 'making CocoaPods fast' or 'faster,' and the organizer said, 'Well, it sounds like an interesting talk, but I don't know that anyone's going to think it has anything to do with Ruby.' So, I promise it does.
00:01:12.560 The 'modern Ruby tooling' part you'll see is a bit of a fudge since I'm going to talk about both tooling and software development practices as a whole.
00:01:30.020 What spawned this talk? Writing performant code is hard. Writing performant Ruby code that does a lot of stuff is really hard. I think a bunch of talks we've seen so far this weekend have confirmed that.
00:01:46.520 CocoaPods got to be pretty slow at scale, and this is a bit of the story of how we made CocoaPods bearable again.
00:02:04.460 I'm Samuel Giddins, and you may have seen my name on the internet in connection with a few open-source projects before.
00:02:17.110 About a year ago, I joined the mobile developer experience team at Square, where it's my job to keep all of our iOS developers happy and productive. I do an okay job at that.
00:02:29.890 You might have also seen my name or my face on Bundler and RubyGems issues and pull requests. I've even obsessed a bit about performance in those projects.
00:02:35.440 But I'm not here today to talk about them.
00:02:48.670 So, what is this CocoaPods thing and why is this guy on stage talking about it here at a Ruby conference? CocoaPods is a pretty big command-line tool. It's built in Ruby and serves as a dependency and package manager for the Apple ecosystem.
00:03:10.930 At Square, we've got a pretty big set of iOS apps, and they're quite extensive. Running our `pod install` command would take about three minutes on a brand-new MacBook Pro, which was really bad, and people weren't happy.
00:03:28.480 It was time to dive in and help boost my team's productivity. With the help of some amazing tools, patience, and vague recollections of how computer science works, we managed to cut the time it takes to run the CocoaPods command in half.
00:03:37.060 We achieved this by changing less than 200 lines of code.
00:03:52.510 We saved 90 seconds per pod install, primarily through the types of performance improvements you can find in basically any Ruby library.
00:04:02.560 CocoaPods is essentially what would happen if you took RubyGems, combined it with Bundler, and smashed them together. It's both a package manager and a dependency manager.
00:04:20.799 CocoaPods combines a definition of how libraries work (a podspec) with a way to integrate them into a user's application (a Podfile). If you take the word 'pod' in CocoaPods and replace it with 'gem,' you'll likely understand what I'm talking about fairly easily.
00:04:41.560 This similarity is no coincidence. For comparison, RubyGems has a gemspec, while CocoaPods has a podspec. Bundler has a Gemfile, and CocoaPods has a Podfile.
00:05:00.940 The difference lies in Xcode. Has anyone used Xcode before? I see some hands and hear a big sigh, which I understand!
00:05:15.490 Xcode is Apple's proprietary toolchain for compiling apps for Apple platforms. If you have an iPhone, all your iOS apps were built using Xcode.
00:05:33.370 It uses its own unusual manifest file format, which I am way too familiar with, given that I wrote a parser for it a couple of years ago.
00:05:48.490 That parser was buggy. Xcode also invokes many compilation tools, including a C compiler, a Swift compiler, an asset catalog compiler, and a linker, all things we don't typically deal with in the Ruby world.
00:06:06.100 While the similarities between CocoaPods, RubyGems, and Bundler may make CocoaPods seem similar in functionality, it actually has to do a lot more.
00:06:21.330 Writing a build system for compiled artifacts is significantly more effort than simply downloading and unzipping a bunch of files while manipulating a few global variables.
00:06:33.820 As you may guess, when you run 'bundle install,' you want to install something from a Gemfile. What do you think you'd run to install something from a Podfile?
00:06:41.910 That's right—`pod install`. We were super original. This is what running `pod install` will look like for you: it fetches specifications, resolves dependencies using the same algorithm as Bundler and RubyGems, and downloads those dependencies.
00:07:00.460 Then it generates a project for Xcode, and this step would take over two and a half minutes when I started working at Square. This was the core of what CocoaPods was doing.
00:07:18.520 So, CocoaPods exists. Sam, you've told me it's slow. Why focus on performance?
00:07:28.990 It may seem like a silly question, but I think it's important to examine why we're talking so much about performance this weekend. One reason is that performance typically doesn't improve on its own—it takes a lot of active work.
00:07:53.470 You typically don't accidentally fix performance problems, so it's something we need to think about. Performance optimization can be fun, at least for someone like me, as we already have all the features in place.
00:08:09.180 We have our tests, and we have users using it, so all we need to do is make this thing faster—whatever it takes to achieve that goal.
00:08:28.300 From a business perspective, rapid iteration leads to happier and more productive developers, which, in turn, leads to monetary benefits for your boss.
00:08:40.970 Scale is also a significant consideration: how much worse does performance get as the system grows? As mentioned earlier, performance improvements are hard to come by, and they require substantial effort.
00:09:01.490 So, if you have a slow Ruby app, how do you make it faster? The first question you should ask is whether your app is really slow. What is it actually doing? Is it performing inherently complex tasks requiring extensive compute power or time?
00:09:15.340 For instance, downloading 20-gigabyte files from the internet is never going to be quick. How often does your app run, and who is using it?
00:09:31.090 If it runs once a day, or if it's an endpoint that only one person in the company uses once a month to generate some stats, who really cares if it's slow? Would improving performance result in a perceptible difference?
00:09:48.400 Alternatively, can you redefine your app's role to decrease its workload, such as pointing it at a file instead of downloading data from the internet?
00:10:06.800 How can you prevent slow aspects of your system from getting worse? It’s one thing to invest effort into improving performance, but maintaining those gains over time is crucial.
00:10:21.610 Another thought is how much you would be willing to invest to enhance your app’s performance. How much would you pay a developer or a SaaS company to make your app's workload instantaneous?
00:10:38.180 Improving performance always involves a trade-off: time spent on faster execution is time not spent fixing bugs or writing new features.
00:10:57.660 Finally, how do you identify which part of your app is slow? Is it a critical part, or is it just an ancillary function? It's complicated to fix problems when you lack a clear understanding of what is wrong.
00:11:10.840 Typically, the process starts with a feeling that something is slow, leading you to use a stopwatch or command-line tools to measure time and investigate any long-running functions.
00:11:27.340 This brings us to profiling, which is the practice of quantifying perceived slowness by assigning numbers to it. It's easier to discuss performance concretely in terms of data rather than subjective feelings.
00:11:46.890 So, what can you profile? You can analyze method calls, sample call stacks, memory allocations, disk I/O, network I/O, database queries, and garbage collection.
00:12:00.890 In my work with CocoaPods, I focused specifically on two of these profiling methods, in a particular project from about a year ago that aimed to make CocoaPods faster.
00:12:20.700 While I've utilized various tools, the correct one must be chosen for the issue at hand.
00:12:35.480 One of our contributors ran CocoaPods under a memory profiler and discovered that we allocated hundreds of millions of objects that we didn't need. Reducing that allocation resulted in huge performance gains.
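As a rough illustration of how allocation counts can be observed, here is a minimal sketch using Ruby's built-in `GC.stat` counter. It is a coarse stand-in, not the memory profiler the contributor used, and the helper name `allocations_during` and the toy data are assumptions made for this example.

```ruby
# Counts how many objects Ruby allocates while the block runs.
# GC.stat(:total_allocated_objects) is cumulative, so the difference
# between two readings is the number of allocations in between.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Toy data standing in for a list of pod names.
names = Array.new(1_000) { |i| "pod_#{i}" }
suffix = '.podspec'

# Builds a brand-new string for every element.
wasteful = allocations_during { names.map { |name| name + suffix } }

# Walks the same array but only asks a question of each string.
frugal = allocations_during { names.count { |name| name.end_with?(suffix) } }

puts "wasteful: #{wasteful} objects, frugal: #{frugal} objects"
```

A counter like this only says how many objects were created, not where. A dedicated allocation profiler is still needed to attribute them to call sites.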
00:12:52.760 Profilers are like most tools: using the right one for the problem helps immensely. A standard profiler can tell you how many times a method was called and the time spent in each call, which is a useful starting point.
00:13:11.920 However, tracing profilers can distort the numbers when the traced code runs in tight loops, because the per-call overhead adds up. Profiling this way can slow the program down significantly.
00:13:34.170 An anecdote: someone suggested running RubyProf to investigate a performance issue, but it took so long to run that it proved impractical.
00:13:50.130 Allocation profilers trace memory allocation records instead of method calls, but they can also create overhead and severely slow down Ruby applications.
00:14:07.780 Sampling profilers work by periodically peeking into your program without causing much interference, making them more suitable for production testing.
00:14:20.110 They lack the ability to tell how many times something has been called, however.
00:14:32.120 I coined a term: a manual profiler. This is akin to you running the time command or using a stopwatch.
00:14:48.270 Manual profiling allows you to generate specific data to understand suspected issues in your code. This method tends to yield more precise information than using larger profilers.
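A minimal sketch of manual profiling in Ruby, assuming a hypothetical helper called `time_section` and a global `TIMINGS` table. Neither comes from CocoaPods; they only show the stopwatch idea in code.

```ruby
# Hypothetical manual-profiling helper: times a block with a monotonic
# clock and stores the duration (in seconds) under a label.
TIMINGS = Hash.new { |hash, label| hash[label] = [] }

def time_section(label)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
ensure
  # Runs even if the block raises, so failed sections are still recorded.
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  TIMINGS[label] << elapsed
end

# Wrap only the section you suspect is slow.
total = time_section(:sum) { (1..100_000).sum }

TIMINGS.each do |label, durations|
  puts format('%s: %d call(s), %.6fs total', label, durations.size, durations.sum)
end
```

The helper returns the block's value, so it can be dropped around existing code without changing behavior.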
00:15:04.350 After realizing the utility of this approach, I built a tool called Chronometer to simplify profiling in CocoaPods.
00:15:20.370 Chronometer wraps methods and records timing data, allowing results to be outputted in a format compatible with Chrome’s tracing view.
00:15:36.840 This avoids the need for me to build a visualization tool to understand the timing data.
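The idea of wrapping methods and emitting Chrome-compatible trace data can be sketched as follows. This is not Chronometer's actual API. `TraceRecorder`, `wrap`, `chrome_trace_json`, and `ToyInstaller` are hypothetical names, and the sketch assumes Chrome's trace event format with "complete" events (`ph: "X"`, timestamps in microseconds).

```ruby
require 'json'

# Hypothetical sketch: wraps instance methods and records one Chrome
# "complete" trace event per call.
module TraceRecorder
  EVENTS = []

  def self.now_us
    Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond)
  end

  # Prepends a module whose methods time the originals and call super.
  def self.wrap(klass, *method_names)
    wrapper = Module.new do
      method_names.each do |method_name|
        define_method(method_name) do |*args, **kwargs, &block|
          started = TraceRecorder.now_us
          begin
            super(*args, **kwargs, &block)
          ensure
            TraceRecorder::EVENTS << {
              name: "#{klass}##{method_name}", ph: 'X',
              pid: Process.pid, tid: 0,
              ts: started, dur: TraceRecorder.now_us - started
            }
          end
        end
      end
    end
    klass.prepend(wrapper)
  end

  # JSON document that Chrome's tracing view can load.
  def self.chrome_trace_json
    JSON.generate(traceEvents: EVENTS)
  end
end

# Toy class standing in for real installer code.
class ToyInstaller
  def resolve
    sleep 0.01
    :resolved
  end
end

TraceRecorder.wrap(ToyInstaller, :resolve)
ToyInstaller.new.resolve
puts TraceRecorder.chrome_trace_json
```

Using `prepend` keeps the original method untouched, so the wrapper can be applied only when profiling is wanted.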
00:15:53.110 Profiling revealed a fundamental issue with CocoaPods architecture, alongside smaller, easier optimizations.
00:16:07.960 These optimizations included simple adjustments like relocating code outside of loops.
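To make "relocating code outside of loops" concrete, here is a small before-and-after sketch. The data and method names (`PODS`, `TARGETS`, `versions_slow`, `versions_fast`) are invented for illustration and are not taken from the CocoaPods codebase.

```ruby
# Toy data: each target lists the names of the pods it uses.
PODS = Array.new(200) { |i| { name: "Pod#{i}", version: "1.#{i}.0" } }
TARGETS = Array.new(50) do |i|
  { name: "Target#{i}", pod_names: ["Pod#{i}", "Pod#{i + 100}"] }
end

# Before: rebuilds the same name => pod index once per target.
def versions_slow(targets, pods)
  targets.map do |target|
    index = pods.to_h { |pod| [pod[:name], pod] } # loop-invariant work
    target[:pod_names].map { |name| index.fetch(name)[:version] }
  end
end

# After: the index does not depend on the target, so it is built once.
def versions_fast(targets, pods)
  index = pods.to_h { |pod| [pod[:name], pod] }
  targets.map do |target|
    target[:pod_names].map { |name| index.fetch(name)[:version] }
  end
end

puts versions_fast(TARGETS, PODS).first.inspect
```

Both methods return the same result. The second one does the indexing work once instead of once per target.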
00:16:22.200 However, the significant architecture issue was related to graph traversal, a common performance bottleneck across build systems.
00:16:40.200 Graph traversal involves finding all nodes or vertices that come after a specified one. This often leads to duplicate visits.
00:16:56.500 Recently, an Android team faced a similar graph traversal issue that led to a significant slowdown.
00:17:16.560 Properly addressing traversal can drastically reduce performance burdens across build tools.
00:17:31.320 Most build systems operate on directed acyclic graphs, ensuring nodes can't depend on themselves. Most operations using dependency graphs care about transitive closures.
00:17:48.330 Transitive closure means finding all nodes following a specific node, no matter how far away they are.
00:18:05.920 To address this, we transitioned from a naive recursive approach to one that keeps a set of visited nodes and only traverses a node if we have never seen it before.
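The difference between the two traversals can be sketched on a toy graph. The graph and the method names `naive_closure` and `closure` are assumptions for illustration, not the CocoaPods implementation.

```ruby
require 'set'

# Toy dependency graph (a DAG): each key depends on the nodes in its array.
GRAPH = {
  app: %i[ui networking],
  ui: %i[core],
  networking: %i[core],
  core: %i[foundation],
  foundation: []
}.freeze

# Naive recursion: shared dependencies are revisited once per path.
def naive_closure(graph, node, visits)
  graph.fetch(node).flat_map do |dep|
    visits[dep] += 1
    [dep] + naive_closure(graph, dep, visits)
  end.uniq
end

# Set-based traversal: descends into a node only the first time it is seen.
def closure(graph, node, seen = Set.new, visits = Hash.new(0))
  graph.fetch(node).each do |dep|
    next unless seen.add?(dep) # add? returns nil if dep was already present

    visits[dep] += 1
    closure(graph, dep, seen, visits)
  end
  seen
end

naive_visits = Hash.new(0)
naive_closure(GRAPH, :app, naive_visits)
set_visits = Hash.new(0)
closure(GRAPH, :app, Set.new, set_visits)

puts "naive visits: #{naive_visits}"
puts "set visits:   #{set_visits}"
```

In this small graph the naive version visits `core` and `foundation` twice each. With many stacked layers of shared dependencies the repeated visits multiply, while the set-based version processes each node once.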
00:18:21.970 This approach relates closely to memoization—the storing of computed values for later retrieval.
00:18:36.430 This technique helps avoid redundant computations and therefore improves performance.
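A minimal memoization sketch in Ruby follows. `ToyBuildSettings` and the setting values are made up for the example and do not reflect how CocoaPods actually generates build settings.

```ruby
# Toy class: memoizes an expensive computation per target name.
class ToyBuildSettings
  attr_reader :computations

  def initialize
    @cache = {}
    @computations = 0
  end

  # fetch with a block computes the value only on a cache miss, and unlike
  # ||= it also caches results that happen to be nil or false.
  def for_target(name)
    @cache.fetch(name) { @cache[name] = compute(name) }
  end

  private

  # Stand-in for real, expensive work.
  def compute(name)
    @computations += 1
    { 'PRODUCT_NAME' => name, 'SWIFT_VERSION' => '5.0' }.freeze
  end
end

settings = ToyBuildSettings.new
3.times { settings.for_target('App') }
puts "computed #{settings.computations} time(s)"
```

Freezing the cached value is a deliberate choice here: callers cannot mutate the shared result and silently corrupt the cache.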
00:18:52.400 While memoization usually provides benefits, we’d need to handle cases where cached values could be invalidated.
00:19:05.680 In CocoaPods, we rewrote our build settings generation to counter these problems, which removed a notable bottleneck.
00:19:21.170 This brought substantial performance improvements and rectified a long-standing bug hindering adoption of Swift at Square.
00:19:35.840 We optimized existing features while ensuring our architecture could handle future performance needs.
00:19:51.300 Despite all these efforts, CocoaPods is still not fast enough at high scales, likely due to Ruby's inherent performance limitations.
00:20:08.440 Some challenges stem from CocoaPods being an older project, lacking design foresight for current demands.
00:20:23.510 Mutable objects impede memoization, since altering attributes complicates the caching of stored results.
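The interaction between mutation and memoization can be shown with two toy classes. `MutableTarget` and `ImmutableTarget` are hypothetical and exist only to illustrate the stale-cache problem.

```ruby
# Toy target whose memoized value goes stale when its state is mutated.
class MutableTarget
  attr_accessor :dependencies

  def initialize(dependencies)
    @dependencies = dependencies
  end

  def dependency_count
    @dependency_count ||= dependencies.size
  end
end

# Variant with frozen dependencies and no setter: "changing" it builds a
# new object, so a cached value cannot disagree with its inputs.
class ImmutableTarget
  attr_reader :dependencies

  def initialize(dependencies)
    @dependencies = dependencies.dup.freeze
  end

  def dependency_count
    @dependency_count ||= dependencies.size
  end

  def with_dependency(name)
    self.class.new(dependencies + [name])
  end
end

mutable = MutableTarget.new(%w[A B])
mutable.dependency_count          # caches 2
mutable.dependencies << 'C'
puts mutable.dependency_count     # still the cached 2, although there are 3

immutable = ImmutableTarget.new(%w[A B])
bigger = immutable.with_dependency('C')
puts bigger.dependency_count      # computed fresh for the new object
```

With mutable objects, every setter would also have to know which caches to clear. The immutable variant sidesteps that bookkeeping at the cost of allocating new objects.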
00:20:38.920 Additionally, file system reads and writes can further impede CocoaPods' performance, as I/O can’t be parallelized safely.
00:20:54.590 CocoaPods has emphasized its user-friendly design, which has sometimes slowed execution due to its printing and formatting.
00:21:07.920 Progress output and error tracking can complicate parallelization, as multiple operations are challenging to display accurately.
00:21:23.890 Inefficient data structures also contribute to performance hurdles. CocoaPods relies on nested hashes and arrays, complicating cache management.
00:21:39.700 Caching must be managed carefully; if the validity of cached values cannot be successfully assured, performance suffers.
00:21:56.000 Moreover, encoding data in Ruby complicates our ability to compute stable hashes of contents, given the variability of representation.
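One way to get a stable content hash from nested Ruby data is to canonicalize it before digesting. The sketch below assumes that key order and the symbol-versus-string distinction should not affect the digest. `canonicalize` and `stable_digest` are hypothetical helpers, not part of CocoaPods.

```ruby
require 'digest'
require 'json'

# Recursively rewrites nested hashes so keys are strings in sorted order.
# Array order is preserved because it is usually meaningful.
def canonicalize(value)
  case value
  when Hash
    value.map { |key, val| [key.to_s, canonicalize(val)] }.sort_by(&:first).to_h
  when Array
    value.map { |item| canonicalize(item) }
  else
    value
  end
end

# SHA-256 over the canonical JSON form of the data.
def stable_digest(value)
  Digest::SHA256.hexdigest(JSON.generate(canonicalize(value)))
end

# Same content, different key order and key types.
first  = { name: 'A', deps: { 'x' => 1, 'y' => [1, 2] } }
second = { 'deps' => { 'y' => [1, 2], 'x' => 1 }, 'name' => 'A' }

puts stable_digest(first) == stable_digest(second)
```

Treating symbol and string keys as equal is a design choice of this sketch. Data where that distinction matters would need a different canonical form.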
00:22:13.040 Tracking changes also becomes challenging, yet it is necessary for optimal performance, because determining when an update is actually needed is crucial.
00:22:31.640 Square is wrapping up a project called Refinement aimed at enhancing runtime performance by allowing us to operate with better cache structures.
00:22:45.140 As we delve deeper into this field, we've observed that a well-considered architecture is imperative.
00:23:00.240 While algorithm optimization is valuable, it only goes so far when constrained by the inherent design of CocoaPods.
00:23:14.560 Looking forward, we still see a significant investment in CocoaPods due to its extensive use, with estimates suggesting it runs over a hundred million times annually.
00:23:32.200 The project's focus is now on enhancing scale. We're currently seeing great progress in incremental installations and various aspects of our tooling.
00:23:51.600 Efforts to optimize installations by tackling performance issues are ongoing and necessary.
00:24:09.280 Addressing performance is a continuous journey. We regularly assess metrics to monitor installation processes, enabling us to identify regressions immediately.
00:24:26.430 No build system has fully resolved the challenge of scaling. Continuing advancements in this domain are critical.
00:24:41.540 The work on CocoaPods will undeniably carry on, and the discussion of making it faster remains pertinent.
00:24:57.480 Thank you for your time. I believe we have a few moments for questions.
00:25:14.790 Thank you, Samuel, for concluding our final talk. Do we have questions?
00:25:24.920 One of the perks of working in a large company equipped with numerous iOS developers is the ease of acquiring metrics.
00:25:43.840 We don't collect data directly in CocoaPods, but I implemented a plugin that reports existing method timings back to our logging server.
00:26:00.480 At the end of the day, my focus is on improving Square's developer experience, and I can directly track productivity metrics.
00:26:16.530 That's one advantage of being at Google scale; I’d love to obtain metrics for open-source projects, which requires deeper integration.
00:26:31.760 How to collect data in a secure and privacy-preserving manner for open-source projects remains an unresolved issue.
00:26:46.530 Any more questions?
00:27:04.000 Looks like we've covered everything then. Thank you!