Ben Woosley

Lightning Talks: RejectConf

GoRuCo 2009

00:00:18.760 I’ll keep this really short because I plan to cover the topic in more depth at the next NYC.rb. I want to give you a brief overview of Chef, which is a dependency management system for installing packages on servers. It powers Engine Yard Solo and Flex, and I think it's pretty cool. Chef works through what it calls a 'cookbook.' A cookbook is essentially a directory full of recipes, and a recipe performs a task such as installing memcached, a bunch of gems, or MySQL. The idea is to outline all the dependencies your app requires so you can bring up a server in a deterministic way. You define these dependencies in a JSON file called 'dna.json,' though I'm unsure why it's named that.
00:00:40.600 You specify your list of packages. For example, when I was working with EC2, we needed various gems, and we had to set up the repository accordingly. It might be a bit hard to read, but here I tell it which recipes to use. A recipe is a small piece of Ruby code. For each Git repository, for example, you can specify actions such as running a git clone, updating submodules, and installing packages. For memcached, I want to install a couple of packages and retrieve some files. Chef is really cool, so come to NYC.rb if you want to hear more!
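As a rough illustration of the cookbook idea, here is a toy sketch in plain Ruby. It is not Chef's actual API: the ToyRecipe class, its package and gem_package methods, the keys in the JSON document, and the package names are all hypothetical stand-ins that only record what a recipe declares.

```ruby
require "json"

# Hypothetical stand-in for dna.json: which recipes to run and which gems
# the app needs. The keys are illustrative, not Chef's real schema.
dna = JSON.parse(<<~JSON)
  {
    "recipes": ["memcached", "gems"],
    "gems": ["rack", "json"]
  }
JSON

# Toy recipe container (NOT Chef's implementation). It only records the
# resources a recipe declares, in order, so every run yields the same plan.
class ToyRecipe
  attr_reader :resources

  def initialize(&block)
    @resources = []
    instance_eval(&block)
  end

  def package(name)
    @resources << [:package, name]
  end

  def gem_package(name)
    @resources << [:gem_package, name]
  end
end

gem_names = dna["gems"]

# A "cookbook" here is just a hash of named recipes.
cookbook = {
  "memcached" => ToyRecipe.new do
    package "memcached"
    package "libmemcached-dev"
  end,
  "gems" => ToyRecipe.new do
    gem_names.each { |name| gem_package name }
  end
}

# Expand the recipe list from the JSON into one ordered installation plan.
plan = dna["recipes"].flat_map { |name| cookbook.fetch(name).resources }
plan.each { |type, name| puts "#{type}: #{name}" }
```

The point of the sketch is only that the server's dependencies live in data plus small Ruby blocks, so the same inputs always expand to the same plan.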
00:02:22.519 My name is Pat Nakajima. Earlier, I mentioned that I wanted to give a lightning talk, but I was uncertain about the topic. So, I decided to write some software to assist me in that area. It's called Thunder because before lightning comes thunder. If I enter my GitHub name, I can see my projects listed and evaluate them one by one, determining which ones are interesting enough for a talk. Because I have many projects, it will take some time to go through them all.
00:02:41.400 To demonstrate this, I can look at Josh's projects since he doesn't have as many. By going through each project, I can identify the ones I want to talk about. That's the application I wanted to show you guys, and I want to emphasize that if you have any minor needs, you should consider writing software to address them.
00:03:18.400 Now, I won't be demoing a particular product, but I want to invite you to something. I work for the Sunlight Foundation, and we recently held an Apps for America contest. We invited people to build and submit applications based on various APIs related to money, politics, and influence. During Ben Stein's talk, it was mentioned that the Office of the Chief Information Officer of the federal government recently launched data.gov. This platform acts as a directory to help you find various raw data feeds from different government sources.
00:04:25.320 Data can range from mundane information, like the locations of copper smelters, to interesting details such as the FBI’s top ten most wanted list and airline quality information over the last 30 years. It's quite diverse. We are running a second contest similar to the first, featuring a cash prize of about $155,000 sponsored by Google, O’Reilly, and TechWeb. Different prizes will be awarded.
00:04:57.679 In the first contest, the winner was a site called 'Filibusted,' which was a Ruby on Rails app. The contest saw a strong level of Ruby participation, which was fantastic. I realize I'm covering a lot of ground quickly, which suits a lightning talk. The contest allows you to build any app around any data from data.gov, so submissions can span all kinds of ideas. We even received one submission within 24 hours called 'FBI Fugitive Concentration,' a memory game built on the FBI most wanted data I mentioned. Initially, it seemed absurd, but you end up memorizing the faces of individuals who pose a real danger to society. This application is now in the running for a $10,000 prize. If you think you can come up with something even more engaging or relevant, please give it a shot!
00:05:37.760 Yes, that’s what the Sunlight Foundation does, and the government is working hard to promote transparency and access to data. Everyone here loves APIs and building solutions on top of them. You are prime candidates to make a difference in this contest and help your country. Thank you!
00:06:25.639 My name is Eric Mill. Who's next?
00:06:31.479 I'm Aaron Quint, from Brooklyn. I have two quick things to discuss. A couple of months ago, I attended another conference, GoGaRuCo, in San Francisco. It was an awesome experience; however, there was an incident that we won't discuss today. What I will say is that some great initiatives emerged from the incident.
00:06:49.280 One of these initiatives was RailsBridge, an organization started by individuals who wanted to get more people involved in the Ruby community, address issues of diversity, and teach kids about programming. I started a small fundraising effort, initially putting in some of my own money and encouraging others to donate as well. This effort is ongoing, and as of now, we have raised about $900. What's great is that we plan to donate half of this money to the Anita Borg Foundation, which raises awareness for women in programming and runs some excellent initiatives. The other half will support the RailsBridge organization.
00:07:24.520 Our goal is to fund initiatives aimed at getting computers into schools and teaching Ruby, and some incredible people are involved in projects with Arduino and kids. It's really exciting! If you can donate even $10, it would be fantastic as these organizations are doing tremendous work. The second thing I want to mention is a project I created called 'Sammy.'
00:08:02.960 While it's not Ruby-based, I’m really excited about it. Sammy is similar to Sinatra, a Ruby library that I’m obsessed with, but for JavaScript. Building an app with Sammy consists of defining an application, passing it a selector for the element it operates on, and defining routes that respond to URL hash changes or form submissions and then run your code. Unfortunately, I can't show you a specific example right now, but I encourage you to check it out. I started a mailing list to connect with other people who are interested in it, and just a week after releasing it, someone named Alex Lang in Germany embedded it into CouchDB, creating an entire application.
00:08:43.800 In that application, Sammy acts as the client-side controller, which is really cool! You can find it on GitHub at github.com/quirkey or at code.quirkey.com/sammy. Please take a look!
00:09:50.399 It's my opinion that we've all heard about Hadoop and MapReduce. Many in our community either underestimate or underappreciate it. Personally, I've discovered a new appreciation for it in the past year. Even having heard about it before, I didn’t fully understand its potential. I wanted to help others experience that newfound understanding.
00:10:17.520 When I was first learning about Hadoop, the only examples I came across were word count examples. It seemed as if counting words in logs was the only operation possible. This limitation arose because they were using the streaming API, which is the simplest method of using MapReduce. You can write in any language with it, but you can only execute one MapReduce pass. The more interesting operations require using the Java API or DSLs that people have created, which allow for chaining, redirecting, and recursion of MapReduce jobs.
00:10:43.160 Thus, MapReduce can become a subcomponent of a larger process, akin to middleware connecting and interrelating through a common interface. Here’s an example demonstrating operations that allow filtering and grouping through the higher-level language. This approach is fascinating because it tackles complex problems—like large graphs—where MapReduce originated, mainly for mapping the web.
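To make the chaining idea concrete, here is a minimal in-memory sketch in plain Ruby. It does not use Hadoop, the streaming API, or any of the DSLs mentioned; the map_reduce helper and the sample log lines are assumptions made up for illustration. The first pass is the familiar word count, and the second pass takes the first pass's output as its input and groups words by frequency.

```ruby
# A generic in-memory MapReduce pass: map each record to [key, value]
# pairs, group the pairs by key (the "shuffle"), then reduce each group.
def map_reduce(records, mapper, reducer)
  records
    .flat_map { |record| mapper.call(record) }
    .group_by { |key, _value| key }
    .map { |key, pairs| reducer.call(key, pairs.map { |_key, value| value }) }
end

log_lines = ["error disk full", "warn disk slow", "error net down"]

# Pass 1: the classic word count.
word_counts = map_reduce(
  log_lines,
  ->(line) { line.split.map { |word| [word, 1] } },
  ->(word, ones) { [word, ones.sum] }
)

# Pass 2: consumes pass 1's output and groups words by how often they occur.
words_by_count = map_reduce(
  word_counts,
  ->((word, count)) { [[count, word]] },
  ->(count, words) { [count, words.sort] }
).to_h

puts words_by_count.inspect
```

Because each pass returns plain key and value pairs, any pass can feed another, which is the property that lets a single job become one step in a larger pipeline.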
00:11:17.280 So, through Hadoop, you can handle enormous datasets efficiently. Recently, they set a world record for sorting a petabyte of data, something that previously seemed impossible. I believe it's essential to explore Hadoop for data generated through user actions or available on the web. It's significant!
00:12:14.320 Additionally, I developed two small Ruby gems called Abstraction and FreezeRay. The Abstraction gem allows for creating abstract classes in Ruby, since Ruby does not inherently support this concept. An abstract class cannot be instantiated and serves as a base for subclasses. For example, in a user model where you eventually want two types of users—such as super user and ordinary user—it’s more efficient to create an abstract model since there are no users that fall in between. This method allows for better organization of your code.
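Here is one way an abstract class can be sketched in plain Ruby. This is not the Abstraction gem's API, which may look quite different; it only illustrates the behavior described, using a hypothetical User model with two subclasses.

```ruby
class User
  attr_reader :name

  # Refuse to instantiate User itself, while letting subclasses
  # fall through to the normal Class#new.
  def self.new(*args, &block)
    if equal?(User)
      raise NotImplementedError, "User is abstract and cannot be instantiated"
    end
    super
  end

  def initialize(name)
    @name = name
  end
end

class SuperUser < User; end
class OrdinaryUser < User; end

admin = SuperUser.new("root")
puts admin.name

begin
  User.new("nobody")
rescue NotImplementedError => e
  puts e.message
end
```

Note that NotImplementedError is not a StandardError, so a bare rescue will not catch it; the sketch rescues it by name.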
00:12:53.180 In this case, if you try to instantiate the abstract class directly, it will raise an error, which is beneficial for test-driven development. In one instance, our team at Dropo needed to split a class into two subclasses, but the original class was still referenced in tests. We needed to ensure that none of the tests would instantiate this class, and making it abstract guaranteed that. The other gem, FreezeRay, addresses an issue with ActiveRecord's dirty tracking.
00:13:29.760 Dirty tracking does not always work correctly in ActiveRecord: if you mutate a string attribute in place, the record has no awareness of the change, so it is not reported as dirty. By integrating FreezeRay into your application, you can freeze attribute strings, so that an attempted in-place mutation raises an error instead of silently slipping past dirty tracking.
00:14:10.320 With FreezeRay, if you attempt to mutate an object, you'll be alerted to the mutation, preventing it from slipping through undetected. While this gem may not be for everyone, it provides peace of mind if you want to ensure your dirty tracking works consistently. Look out for potential edge cases!
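The problem and the fix can be sketched without ActiveRecord. The ToyRecord class below is a hypothetical stand-in with setter-based dirty tracking; it is not ActiveRecord's implementation, and the freeze_values option is not FreezeRay's API. It only shows why in-place mutation goes unnoticed and how freezing turns it into an error.

```ruby
# Toy record whose dirty flag is only set by the attribute writer.
class ToyRecord
  attr_reader :name

  def initialize(name, freeze_values: false)
    @freeze_values = freeze_values
    @name = prepare(name)
    @changed = false
  end

  def name=(value)
    @changed = true
    @name = prepare(value)
  end

  def changed?
    @changed
  end

  private

  # Optionally store a frozen copy so in-place mutation raises.
  def prepare(value)
    @freeze_values ? value.dup.freeze : value
  end
end

plain = ToyRecord.new("pat".dup)
plain.name << "rick"   # in-place mutation bypasses the writer
puts "#{plain.name} changed? #{plain.changed?}"

frozen = ToyRecord.new("pat".dup, freeze_values: true)
error = nil
begin
  frozen.name << "rick"
rescue RuntimeError => e   # FrozenError on Ruby 2.5+, a RuntimeError subclass
  error = e
end
frozen.name = "patrick"    # going through the writer is tracked
puts "#{frozen.name} changed? #{frozen.changed?}"
```

With the frozen variant, the only way to change the value is through the writer, which is exactly the path that dirty tracking can see.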
00:14:54.420 In today's brief presentation, I will share how to achieve over a 10x performance boost by fixing Ruby threads through low-level optimizations. If you're interested in this kind of content, please check out my blog at timetobleed.com. This talk focuses on x86 and x86-64 architectures.
00:15:11.320 Let's start by looking at what a Ruby thread is. A thread is a schedulable set of registers, a set of Ruby VM state, and a copy of the thread stack. Ruby uses green threads—specifically, in versions 1.8 and 1.9. Green threads only have state in userland, with no kernel context. This is advantageous since creating threads is inexpensive and context switching should be quick, but in Ruby 1.8 and 1.9, this is not the case, leading to performance issues.
00:15:53.800 Green threads can lead to complications, especially when blocking I/O occurs. If one green thread blocks, all green threads suffer. Thus, it's challenging to efficiently utilize multicore resources, as the kernel is oblivious to threads running in userland. The core problem lies in how thread stacks are managed; transferring between two green threads during a context switch involves copying a significant amount of memory.
00:16:35.920 The solution is to eliminate unnecessary memory copy operations. One way to improve this is by utilizing memory mapping techniques. You create memory regions where the stack resides, allowing for fast context switching with minimal overhead. In my testing, this approach significantly improved performance across various benchmarks, yielding up to 10 times speed enhancements.
00:17:13.200 This speed improvement was most noticeable in applications whose threads make many nested function calls, because a deeper stack means more memory to copy on each context switch. In a 'Hello World' application with Sinatra, there was a 1.26x speed increase. Applications with deeper call stacks would see even greater gains. I'd love to talk more about this with anyone interested in low-level optimizations.
00:17:47.960 Now, I want to share a product I found called Google Perf Tools. Some of you may already know about it, especially if you use Ruby Enterprise Edition, which incorporates parts of it. One wonderful feature of Perf Tools is its built-in CPU profiler, which generates informative graphs detailing what's happening in your code. I started by profiling EventMachine to analyze its performance.
00:18:20.800 I delved into EventMachine and found it fascinating to visualize the various functions and their interactions. While interpreting this data, I discovered a long-standing performance issue when using threads alongside EventMachine. The top resource consumption originated from context switching, leading us to rectify some bugs and enhance performance.
00:19:02.399 Taking this one step further, I ran the profiler in production. This tool uses a kernel timer to profile applications and can be run in a live environment, offering a granular observation of processes. In a high-traffic scenario, I gained thousands of samples, revealing areas for performance improvement throughout our Ruby applications.
00:19:45.039 During this inspection, I discovered that many of my slow functions were rooted in other libraries and in parts of the code I did not call directly. In particular, the Date.parse method relies on regex-heavy code, and it was running on the large amounts of data being pulled from the database, which made it a hot spot. By diagnosing this issue and optimizing the code directly, I saw substantial gains in performance.
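Once a profiler points at a suspect such as Date.parse, a coarse follow-up check can be done with Ruby's standard Benchmark module. The sketch below is not the profiler from the talk and is not the fix the speaker applied; it simply times Date.parse against Date.strptime on made-up date strings, under the assumption that the input format is known in advance. Timings vary by machine and Ruby version, so none are shown.

```ruby
require "benchmark"
require "date"

# Made-up sample data: 10,000 ISO-style date strings.
stamps = Array.new(10_000) { |i| format("2009-05-%02d", (i % 28) + 1) }

parsed = nil
strict = nil

Benchmark.bm(16) do |bm|
  # Date.parse has to guess the format of every string.
  bm.report("Date.parse") do
    parsed = stamps.map { |s| Date.parse(s) }
  end

  # Date.strptime is told the format up front.
  bm.report("Date.strptime") do
    strict = stamps.map { |s| Date.strptime(s, "%Y-%m-%d") }
  end
end

puts "same results: #{parsed == strict}"
```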
00:20:29.600 After some patches to the Ruby VM and Perf Tools, I generated graphs from Ruby functions, deepening my understanding of application performance. Further examinations made it clear that various libraries exhibited different performance traits: Rails function calls vastly outperformed Merb and Sinatra, prompting important reflections on usage patterns and optimizations.
00:21:08.080 To illustrate this more clearly, I tested the Redis Ruby gem against its Python counterpart. Initially, it had performance issues, as it was relying on timers for socket reads. Removing those timers equalized performance and allowed it to match the speed of equivalent libraries in other languages. Now it's performing normally, adding to the overall efficiency of the application.
00:21:36.960 So, I encourage you to check out the GitHub repository I have for this performance profiling tool. It integrates well with your existing Ruby applications and comes with installation instructions. Additionally, the structure of your Ruby gems can always benefit from performance tuning and analysis.
00:22:30.280 To improve your Ruby gems, upgrade to the latest RubyGems version. You’ll get better warnings and error messages, making it easier to refine your packaging. New features also include a nicer presentation of gem descriptions. Always fill in all the necessary fields correctly and follow the available guidelines to improve the overall quality of your gem packaging.
00:23:10.800 It's essential to provide accurate documentation with contact information to facilitate communication regarding bugs and updates. As a resource, I encourage everyone to look into the documentation available within the Ruby community to enhance your projects. I have compiled a hall of shame for packaging mistakes, highlighting how crucial it is to maintain a standard while developing your gems.
00:23:52.880 One particular example in the hall of shame was gem descriptions that are too brief or too vague. If you're hoping people will choose your gem based on the information provided, give them clear and comprehensive descriptions. Today’s developments with RubyGems and related tools aim to streamline gem packaging, so take advantage of what’s available.
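For reference, here is a minimal sketch of a gem specification with the descriptive and contact fields filled in. The gem name, author, email address, homepage, and file list are placeholders invented for illustration, not a real gem.

```ruby
require "rubygems"

# All names, addresses, and URLs below are placeholders.
spec = Gem::Specification.new do |s|
  s.name        = "example_gem"
  s.version     = "0.1.0"
  # One-line summary shown in listings.
  s.summary     = "Parses example log files into Ruby hashes"
  # Longer description: say what the gem does and why someone would pick it.
  s.description = "example_gem reads example-format log files and returns " \
                  "one hash per line, with timestamps parsed into Time objects."
  # Contact details so users know where to report bugs.
  s.authors     = ["A. Developer"]
  s.email       = "dev@example.com"
  s.homepage    = "https://example.com/example_gem"
  s.files       = ["lib/example_gem.rb"]
end

puts "#{spec.name} #{spec.version}: #{spec.summary}"
```

A description that is longer and more specific than the summary gives people something to base their choice on, which is the point made about the hall of shame.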
00:24:34.960 To conclude, I want to talk about JIT Observer, my replacement for ActiveRecord observers, designed to optimize observers in Ruby on Rails applications by only instantiating them when necessary. By deferring loading, it can achieve significant improvements over the standard observer pattern and streamline how Rails applications manage observers. Thank you very much!
00:25:14.720 I'm excited about contributing to both existing applications and the Ruby community as a whole, and I'm looking for collaborators on this project. If you’d like to collaborate or help test it, you can find the repository on GitHub! Thank you all for your time!