Optimizations in Multiple Dimensions

00:00:15 Thank you all so much for being here. There were so many good talks in this time slot. It's really exciting to see so many people attending mine. This is wonderful.

00:00:28 As Valerie mentioned, my name is Jamie Gaskins, and I'm from Baltimore.

00:00:34 Let's go ahead and jump right into it. I'm not sure how long this talk will take, as each rehearsal has gone differently.

00:00:48 I'll try not to hit the spacebar too hard while speaking.

00:00:54 When we talk about optimization, we need to understand what it means. It's important to make this fuzzy term more concrete so that we can communicate about it better.

00:01:08 At a high level, optimization is about improving a metric of your choosing. You want to move that metric in a desired direction along a particular axis.

00:01:21 For instance, if we are optimizing RAM consumption, we want to reduce the amount of memory our application consumes in production.

00:01:39 The desired direction here is towards zero, because the less memory we use in our application, the more we have available for capacity spikes.

00:01:53 This may seem straightforward, but we must ask: are we really done? Did any of the other metrics we observe get worse as we improved this one?

00:02:05 As complexity grows, optimizations might have unforeseen consequences. For example, if memory consumption is high due to in-memory caching and we reduce it by shrinking the cache, we might end up recalculating values, which can lead our Ruby processes to use more CPU time.

00:02:23 Today, we will discuss what metrics we should consider, how to communicate about performance, and the trade-offs we might introduce into our codebase.

00:02:40 First, we need to think about metrics. For instance, optimizing RAM consumption when your production machines approach 100% RAM usage can prevent significant delays in processing.

00:03:00 Milliseconds per transaction become important if customers must wait for a transaction to complete before proceeding. Here, 'transaction' is a generic term for a unit of work, such as a web request or background job.

00:03:20 Transactions per second is important for tracking processing capacity at scale. It's also essential to consider metrics we might not consciously think about but that influence our performance.

00:03:37 Time from feature inception to release is a metric that helps us understand the initial deployment of a feature or service. This can be seen as a greenfield metric.

00:03:56 It shows how long it takes before customers gain access to the feature. One might consider the time a feature spends in backlog; understanding its impact can be critical.

00:04:16 Inspecting time between deployments can help gauge how granular app improvements are over time. Do we need extensive changes to existing code before we can deploy a new feature, or can we just add new features easily?

00:04:33 The time from bug discovery to deployment of fixes informs us about the team's ability to respond to errors in production. Are we witnessing an increased difficulty in fixing bugs we introduce? These are metrics we may not consciously monitor but are still crucial.

00:04:52 Next, we will talk about communicating about performance. Often, we simply throw around the term optimization. However, it’s vital to understand that different people might mean different things when they discuss performance.

00:05:04 Typically, when we mention performance, we refer to execution speed. The discussion of performance centers on two questions: how long it takes to execute once, and how many times we can execute in a given time period.

00:05:24 Although they can seem inversely proportional, they may not be in complex systems. For example, if we can parallelize the process, the cost of executing multiple times could match that of executing once.
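A minimal sketch of that point, assuming the work is dominated by waiting (simulated here with `sleep`) rather than by CPU. The `simulated_io` and `elapsed` helpers are hypothetical names for illustration. On MRI, threads overlap while they wait on I/O or sleep, but CPU-bound Ruby code would not speed up the same way.

```ruby
# Stand-in for a unit of work that mostly waits (network call, disk read).
def simulated_io(seconds)
  sleep(seconds)
end

# Returns the wall-clock seconds taken by the given block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

once = elapsed { simulated_io(0.1) }

# Four executions in parallel threads; each thread waits independently.
four_parallel = elapsed do
  4.times.map { Thread.new { simulated_io(0.1) } }.each(&:join)
end

puts format("one execution:   %.3fs", once)
puts format("four in parallel: %.3fs", four_parallel)
```

Run sequentially, the four executions would take roughly four times as long as one; in parallel, the total stays close to the cost of a single execution.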

00:05:42 If caching is implemented, the cost of ten executions could range anywhere from one to ten times the cost of a single execution, depending on how many of the inputs are unique. Caching can help or harm your application, and it’s essential to understand its nuances.
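To make that range concrete, the sketch below counts how often the underlying work actually runs when ten calls go through a simple Hash-backed cache. `CountingCache` and `computations_for` are hypothetical names for illustration, and squaring a number stands in for expensive work.

```ruby
# Hypothetical cache wrapper that counts how often the real work runs.
class CountingCache
  attr_reader :computations

  def initialize(&work)
    @work = work
    @store = {}
    @computations = 0
  end

  # Returns the cached value, computing and storing it on a miss.
  def fetch(input)
    return @store[input] if @store.key?(input)

    @computations += 1
    @store[input] = @work.call(input)
  end
end

# Runs every input through a fresh cache and reports the computation count.
def computations_for(inputs)
  cache = CountingCache.new { |n| n * n } # stand-in for expensive work
  inputs.each { |input| cache.fetch(input) }
  cache.computations
end

same_input    = computations_for([7] * 10)     # ten calls, one unique input
unique_inputs = computations_for((1..10).to_a) # ten calls, ten unique inputs

puts "same input:    #{same_input} computation(s)"
puts "unique inputs: #{unique_inputs} computation(s)"
```

With all-unique inputs the cache saves nothing and still pays for storage and lookups, which is one way caching can harm rather than help.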

00:06:06 If a service runs on JRuby or another implementation featuring a just-in-time compiler (JIT), later iterations of your code might execute much faster than the first due to optimizations.

00:06:22 However, be mindful that early run times may not be indicative of later performance due to JIT-related adjustments.

00:06:37 It’s crucial to consider whether subsequent iterations might trigger garbage collection, which can slow them down.

00:06:49 When we factor in I/O, we open an entirely different set of performance challenges, which can add uncertainty around the optimization of one aspect over another.

00:07:03 Requests made to a remote API are subject to not only the target system's performance but also the network conditions. If persistent connections aren't maintained, secondary actions such as TCP handshakes may slow communication.

00:07:27 Interacting with a remote cache adds processing time to initial execution as we wait for cache keys to be acknowledged. This caching interaction shares many considerations with API requests.

00:07:41 File system access can also be slow, particularly in cloud environments where the disk may sit across a network and its latency is significantly higher than that of a disk on the same machine.

00:07:54 If your application uses a database, characteristics like its read/write capabilities and indexing need to be accounted for as well.

00:08:07 Distinguishing between latency and throughput is essential when discussing performance, as both terms have their distinct meanings.

00:08:24 When we discuss how long it takes to complete a task, we can refer to it as latency, while throughput describes how many tasks can be completed in a given time.
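A small sketch of measuring both numbers for the same block of work. The `measure` helper is a hypothetical name for illustration. Because this loop runs serially on a single worker, the two figures are simple reciprocals here; with parallelism, caching, or I/O in play they can move independently.

```ruby
# Runs the block `iterations` times and reports both metrics.
def measure(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  total = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

  {
    latency_ms: (total / iterations) * 1000.0, # average time per task
    throughput_per_s: iterations / total       # tasks completed per second
  }
end

result = measure(1_000) { (1..1_000).reduce(:+) } # arbitrary sample workload

puts format("latency:    %.4f ms per task", result[:latency_ms])
puts format("throughput: %.0f tasks per second", result[:throughput_per_s])
```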

00:08:37 Choosing between latency and throughput matters based on your needs at any given moment. If data volume compared to processing capacity is concerning, focus on throughput.

00:08:52 Conversely, if users must wait for events, prioritizing latency becomes more important. Even then, be cautious of diminishing returns.

00:09:09 Optimizing for higher throughput may result in increased latency for each transaction, but when scaled to millions of transactions, the savings in total processing time can be significant.

00:09:19 It’s vital to consider priority metrics in context. Once you've chosen a primary metric, remember it can shift over time due to changing traffic patterns.

00:09:34 If a highly-trafficked component of your application can handle spikes efficiently, prioritize latency; if not, throughput gains may need to become your focus.

00:09:46 In summary, communication about performance should focus on clarity to reduce ambiguity. Next, we will address trade-offs.

00:10:00 Potential trade-offs include CPU consumption versus RAM, small data sets versus large, and reading cached values versus recalculating them.

00:10:13 In data structure discussions, consider whether you’re optimizing for time or space efficiency, which translates into CPU or RAM saving considerations.

00:10:23 If you process a file line-by-line versus reading the entirety of its contents in one go, this can affect your performance outcome based on data size.
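The two approaches can be sketched as follows, assuming a plain text file; the method names are hypothetical. `File.read` holds the whole file in memory at once, while `File.foreach` holds only the current line.

```ruby
require "tempfile"

# Reads the entire file into one String, then splits it into lines.
# Simple, but memory use grows with the size of the file.
def count_lines_slurp(path)
  File.read(path).lines.count
end

# Streams the file one line at a time, so memory use stays roughly flat.
def count_lines_stream(path)
  File.foreach(path).count
end

Tempfile.create("sample") do |file|
  1_000.times { |i| file.puts("line #{i}") }
  file.flush

  puts "slurp:  #{count_lines_slurp(file.path)} lines"
  puts "stream: #{count_lines_stream(file.path)} lines"
end
```

For a small file the difference is negligible and reading it in one go may even be faster; for a file larger than available RAM, only the streaming version is viable.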

00:10:47 Similarly, memory allocation in Ruby is handled by assigning whole pages instead of calling for memory repeatedly, introducing a space/time trade-off.

00:11:01 Within MRI, memory allocation and garbage collection work in units of pages, speeding up allocation at a potential cost in RAM.
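One way to observe this is through `GC.stat`, sketched below. This is MRI-specific: the keys used here are the ones MRI reports, other implementations such as JRuby may not provide them, and the exact numbers vary by Ruby version. `heap_snapshot` is a hypothetical helper name.

```ruby
# Captures a few page- and slot-level counters from MRI's garbage collector.
def heap_snapshot
  stat = GC.stat
  {
    pages: stat[:heap_allocated_pages], # pages currently allocated
    live_slots: stat[:heap_live_slots], # slots holding live objects
    free_slots: stat[:heap_free_slots]  # slots ready for new objects
  }
end

before = heap_snapshot
objects = Array.new(100_000) { Object.new } # enough objects to need more pages
after = heap_snapshot

puts "allocated #{objects.size} objects"
puts "pages before: #{before[:pages]}, after: #{after[:pages]}"
puts "free slots before: #{before[:free_slots]}, after: #{after[:free_slots]}"
```

Free slots are the RAM side of the trade: memory the process already holds so that the next allocations do not have to ask for more.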

00:11:17 Memoization is also an important concept, relating to caching at the instance level, storing computed values to avoid redundancy.
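A minimal sketch of instance-level memoization using Ruby's `||=` idiom. The `Report` class and its counter are hypothetical, added only to show that the calculation runs once.

```ruby
class Report
  attr_reader :calculations

  def initialize(values)
    @values = values
    @calculations = 0
  end

  # Computes the total on the first call and reuses it afterwards.
  # Caveat: ||= recomputes every time if the result is nil or false.
  def total
    @total ||= begin
      @calculations += 1
      @values.sum
    end
  end
end

report = Report.new([1, 2, 3])
3.times { report.total }

puts "total: #{report.total}"
puts "calculated #{report.calculations} time(s)"
```

The stored value lives as long as the instance does, so the saved CPU time is paid for with memory held per object.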

00:11:31 Another critical trade-off is the choice between optimizing for small versus large data sets. Focusing on one metric can lead to poor outcomes for the other.

00:11:42 It's common to use algorithms optimized for larger sets on smaller data and vice versa, leading to performance inconsistencies during production loads.

00:11:59 Analyzing intersection points of algorithm performance can help in determining which one to prefer depending on your data size.
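A rough way to look for such an intersection point is to time two strategies across several data sizes. The sketch below compares a linear scan with `Array#include?` against a hashed lookup with `Set#include?`; `compare_lookup` and the chosen sizes are assumptions for illustration, and where the lines cross depends on your Ruby version and hardware.

```ruby
require "benchmark"
require "set"

# Times repeated membership checks against an Array and a Set of equal size.
def compare_lookup(size, lookups: 10_000)
  array = (0...size).to_a
  set = array.to_set
  target = size - 1 # last element: the worst case for the linear scan

  {
    size: size,
    array_s: Benchmark.realtime { lookups.times { array.include?(target) } },
    set_s: Benchmark.realtime { lookups.times { set.include?(target) } }
  }
end

[2, 8, 64, 1_024].each do |size|
  r = compare_lookup(size)
  puts format("n=%-5d array=%.5fs set=%.5fs", r[:size], r[:array_s], r[:set_s])
end
```

Measuring with sizes drawn from real production data, rather than guessing, is what tells you which side of the crossover you are on.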

00:12:16 Ruby does this internally with hashes: depending on the number of keys, it adjusts the underlying storage dynamically, from a flat array for small hashes to a more complex structure for larger ones.

00:12:32 This adaptability ensures better performance based on how many keys are involved, minimizing the overhead required.

00:12:45 A consideration of caching follows, which can feel like a magical solution to performance issues, but it's essential to recognize the trade-offs involved.

00:13:02 Caching does carry a cost, especially when it comes to in-memory caching where RAM becomes a factor, potentially leading to performance degradation.

00:13:18 Additionally, remote caching introduces latency costs and hits to performance, such that careful analysis is necessary to assess whether it offers true benefits.

00:13:34 Thus, balancing CPU time with wall clock time becomes imperative, prioritizing either based on the needs of your application at any point.
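The difference between the two clocks can be seen directly with `Process.clock_gettime`. In this sketch, `sleep` stands in for waiting on a remote cache and a summing loop stands in for recalculating locally; the `timings` helper is a hypothetical name.

```ruby
# Measures both wall-clock time and CPU time consumed by the block.
def timings
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  yield
  {
    wall_s: Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start,
    cpu_s: Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_start
  }
end

waiting = timings { sleep(0.1) }                      # mostly wall time
computing = timings { (1..2_000_000).reduce(:+) }     # mostly CPU time

puts format("waiting:   wall=%.3fs cpu=%.3fs", waiting[:wall_s], waiting[:cpu_s])
puts format("computing: wall=%.3fs cpu=%.3fs", computing[:wall_s], computing[:cpu_s])
```

A remote cache read looks like the first case: it costs the caller wall-clock time but leaves the CPU free for other work, whereas recalculating occupies the CPU for the whole duration.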

00:13:49 Cache invalidation becomes essential; without it, stale values undermine the reliability of your results, while frequent cache misses drive up costs.

00:14:05 Strategies like Least Recently Used (LRU) help maintain efficient cache management, but developers must tune these systems carefully.
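A minimal LRU sketch, relying on the fact that Ruby hashes preserve insertion order. `LruCache` is a hypothetical class for illustration, not a production implementation: it is not thread-safe, and its capacity counts entries rather than bytes.

```ruby
class LruCache
  def initialize(capacity)
    @capacity = capacity
    @store = {} # insertion order doubles as recency order, oldest first
  end

  # Returns the value and marks the key as most recently used.
  def read(key)
    return nil unless @store.key?(key)

    value = @store.delete(key)
    @store[key] = value
  end

  # Stores the value, evicting the least recently used entry when full.
  def write(key, value)
    @store.delete(key)
    @store[key] = value
    @store.shift if @store.size > @capacity # drops the oldest entry
    value
  end

  def keys
    @store.keys
  end
end

cache = LruCache.new(2)
cache.write(:a, 1)
cache.write(:b, 2)
cache.read(:a)     # :a becomes the most recently used key
cache.write(:c, 3) # cache is full, so :b is evicted

puts cache.keys.inspect
```

The capacity is the tuning knob: too small and useful entries are evicted before they are reused, too large and the cache competes with the rest of the application for RAM.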

00:14:24 Cache hit rates, ideally over 90%, ensure that the cost of requests remains viable. If rates dip below 90%, reevaluation becomes crucial.
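The effect of hit rate on average cost can be worked through with a short calculation. All figures below are hypothetical: a 2 ms cache lookup and a 50 ms computation, so a miss costs the lookup plus the computation.

```ruby
# Average cost per request for a given hit rate.
# miss_ms should include the failed lookup plus the recomputation.
def expected_cost_ms(hit_rate:, hit_ms:, miss_ms:)
  hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms
end

LOOKUP_MS = 2.0   # assumed cost of asking the cache
COMPUTE_MS = 50.0 # assumed cost of computing the value directly

[0.95, 0.90, 0.50, 0.10].each do |rate|
  cost = expected_cost_ms(
    hit_rate: rate,
    hit_ms: LOOKUP_MS,
    miss_ms: LOOKUP_MS + COMPUTE_MS
  )
  puts format("hit rate %3d%%: %.1f ms average", rate * 100, cost)
end
```

With these assumed numbers, a 90% hit rate averages about 7 ms per request against 50 ms uncached. As the lookup cost approaches the computation cost, the hit rate needed to justify the cache rises sharply.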

00:14:40 In addition, it’s vital to keep in mind that performance under load differs significantly from performance at idle. Testing against realistic loads is crucial.

00:14:56 Significant deviations between production and development performance underline the need for comprehensive performance profiling.

00:15:12 Understanding how different scales affect your application is essential for performance optimization, as even minor variations can have major implications.

00:15:28 In summary, we grapple with complex performance structures. Each adjustment can impact another aspect of your system.

00:15:40 Not all optimizations work equally; no silver bullet exists for performance. It's vital to measure, tune, and objectively analyze your outcomes.

00:15:56 Latency and throughput serve as distinct metrics that shouldn’t be merged in conversations. Understanding these differences guides better decisions.

00:16:10 Lastly, production workloads must be taken into account, as they differ notably from what you see in development environments.

00:16:25 Optimizations should be made preemptively, fostering a robust foundation for your application’s performance.

00:16:34 If you can, leverage metrics tracking services early to develop an understanding of your app's performance, leading to informed optimization decisions.

00:16:50 That concludes my talk. I am Jamie Gaskins. Please feel free to reach out for any questions or comments you may have. Thank you so much, everyone.