Data Structures

Summarized using AI

Hidden Gems

James Edward Gray II • September 04, 2008 • Earth

In this talk titled 'Hidden Gems', James Edward Gray II presents a collection of underappreciated Ruby libraries during the LoneStarRuby Conf 2008. Gray starts by recapping his background in the Ruby community and his enthusiasm for discussing alternative subjects compared to previous years, such as the TV show 'Battlestar Galactica'. He draws parallels between the show's themes of underdogs and performance optimization in Ruby programming.

Key points discussed include:

  • Ruby's perceived performance issues, countered by emphasizing the potential for optimization using various libraries and better data structures.
  • Introduction of NArray, designed for numerical computations. Gray shares an example where he improved image generation speed significantly by switching from traditional arrays to NArray, demonstrating a performance boost from 1.3 seconds to about 0.01 seconds for generating a 400x200 pixel image.
  • Utilization of SQLite for data management. Gray discusses an example from the Ruby Quiz related to IP-to-country mapping, where using SQLite allowed efficient country retrieval for an IP address in about a third of a second.
  • Discussion on RBTree, which offers efficient binary searching capabilities and can help match IP addresses quickly, achieving search times below 1 millisecond.
  • Introduction of FSDB (File System Database), which allows hierarchical data management and supports multi-threading for integrity in data operations.
  • Highlighting Renda for inter-process communication, allowing different processes to work efficiently together and share data with ease using tuple spaces. Gray presents a practical example of processing scrambled words to find matches in a dictionary.

In conclusion, Gray encourages developers to experiment with these libraries to improve Ruby's performance and maintain competitive advantages. He concludes with a Q&A session, expressing willingness to answer any additional Ruby-related questions attendees may have. The talk emphasizes that there are numerous tools and techniques available to optimize Ruby applications effectively.

Hidden Gems
James Edward Gray II • September 04, 2008 • Earth

hidden gems 960x368 by: James Edward Gray II

Help us caption & translate this video!

http://amara.org/v/G1XH/

LoneStarRuby Conf 2008

00:00:06.359 The video equipment rental cost was paid for by PeepCode.
00:00:18.760 Today, I want to talk about some different Ruby libraries that are kind of underappreciated. I will show you what we can create using them.
00:00:25.160 I'm back at LoneStarRuby, which I'm really excited about! For those who don't know me, I'm James Edward Gray II. I created the Ruby Quiz originally and ran it for three years. Now, it's run by Matthew Moss, so I don't manage that anymore. However, I did start it.
00:00:38.520 I've released a few open-source libraries, including FasterCSV, which was mentioned earlier. I've also written a couple of Pragmatic Programmer books that contain quite a bit of Ruby content. I've had the privilege of speaking at every LoneStar Ruby Conference so far. How cool is that? You really don't realize what a miracle that is. I came down here last year and talked for one hour about the TV show 'Heroes', and they invited me back this year. That was pretty cool!
00:01:15.479 We can’t talk about that show this time because, well, season two wasn't great, right? So, I thought we could focus on some other interesting topics instead. What do you think? Let's go with that.
00:01:39.240 Of course, we might also touch upon Battlestar Galactica. For those who don’t know, the new show has created quite a buzz. Raise your hand if you haven't seen Battlestar Galactica. Oh my God! Alright, you folks have homework to do! In fact, you may have to skip the next talk. Now, in Battlestar Galactica, humans created the Cylons, who eventually grew tired of the human race and separated from them. At the beginning of the new series, the Cylons return to wipe out the human race.
00:02:19.640 This show has to be one of the most depressing on TV. It starts at the end of the world and goes downhill from there. So, why am I discussing Battlestar Galactica at a Ruby conference? Well, the ship the characters are on was supposed to be decommissioned; it was essentially their ancient junk. Yet, it becomes the only thing keeping the humans alive— their last holdout. Those of you who know me know that I love those kinds of odds. I love the underdogs!
00:02:41.800 But what does that have to do with Ruby? Well, you might have heard a rumor going around that Ruby is slow. I've heard a few people mention this and I want to share my opinion on that rumor: it's total nonsense! Some of you may remember the slides from my talk last year, which I thought was cool. Anyway, Ruby goes as fast as we want it to because we're in charge of that!
00:03:13.280 Some people say you can speed up Ruby by switching to C, but you can use C from inside Ruby, so we don’t even need to switch. However, you don’t really need to do that very often. You can use other libraries, many of which are written in C, which gives us that speed without having to switch languages entirely. If the libraries are specific to your problem, they can really speed things up!
00:03:54.480 The big win, in my opinion, always lies in adding better data structures and restructuring the problem to better fit what you're trying to do. So, we're going to focus mainly on this aspect, but also look at using libraries and processing power to speed Ruby up.
00:04:19.959 Let's start by discussing NArray. The 'N' stands for numerical and it's fantastic for numerical computations. The Cylons have a huge advantage over the humans—it's not about girls, it’s about numbers! That's right, just numbers! We'll get to the girls part later.
00:04:42.360 Most of the human race was wiped out in the initial attack, and if the Cylons die, they simply download into another body, so they continue coming back. This gives them a significant numerical advantage, so sometimes you just need to crunch numbers, and we all know that.
00:05:11.440 The numbers in Ruby were built for ease of use, which is great and we appreciate that, but unfortunately, it does make them a bit slower because they have to handle things like automatic conversions between Fixnum and Bignum or coercing different types. So, there's more processing to do, and as such, they can be slightly slower.
00:05:37.600 In contrast, C's numbers were built for speed; they can only go so big, but they operate much quicker. We can actually borrow C's numerical functionality inside Ruby using NArray. That's what it's designed for— to enable very fast numerical computations.
00:06:08.720 Let me provide an example. I have this PPM image library that is very simple. If you're unfamiliar, the PPM image format is easy to learn in about 30 seconds. It's essentially just a big grid of numbers and you can generate images from it.
00:06:38.160 This little wrapper I created is under 100 lines of code. I noticed it was taking about 1.3 seconds to generate a 400x200 pixel image. So, I decided to see if I could speed that up. I switched from using a two-dimensional array of color objects, which defined the image data, to a three-dimensional NArray.
00:07:04.599 I only had to change about ten lines of code; it wasn't complicated at all. Now, the same image generates in about one-hundredth of a second. That's a significant speed difference!
00:07:36.600 To change it, we start with the initial code, focusing on the last line where we're creating a two-dimensional array defined by width and height. In this array, I stored color objects in their respective positions.
00:08:01.840 When switching to using NArray, we require 'RubyGems' and 'NArray'. You can install NArray as a gem using 'gem install narray'. Then, the last line changes to this: we now define a three-dimensional NArray. The first dimension is the width, the second dimension is the height, and the third dimension accounts for the three RGB color values.
00:08:25.519 Next, we determine how we mark pixels. Initially, I marked pixels in row-major order by using the Y and then the X position. The updated marking in the NArray version now involves slicing into the NArray and using all dimensions together to grab all three byte values corresponding to the RGB colors.
00:08:56.639 In the drawing code, I used to iterate over the rows of the image, stringify the contents, and print them out. In the new code, iterating is best done by slicing the array to optimize the process. I can get the entire width with just one row of colors, then transpose it to match the desired layout.
00:09:18.880 NArray also has many other useful features. You saw its application where I used bytes; it can also work with two-byte integers, four-byte integers, and floats, even complex numbers. NArray offers data generation, arithmetic operations, comparisons, bitwise manipulations, and statistical functions.
00:09:37.760 There's a large API available that you can view online. The easiest way to learn how to use NArray is simply by opening IRB and creating a small NArray, perhaps a 3x3 array, and experimenting. Try adding one to it and observe what happens.
00:09:57.680 Another example, which I found in the online NArray examples, is an implementation of Conway's Game of Life. While I will simplify this demonstration, it shows how you can use NArray effectively. The code I show creates a 5x5 grid, taking the center three pixels and generating random ones and zeros for the simulation.
00:11:17.760 This code does all the necessary counting for the Game of Life by continuously slicing the NArray to get each neighboring cell. You simply add those counts to check how many neighbors each cell has.
00:11:42.160 The actual implementation of one step of the Game of Life with NArray utilizes built-in comparisons and operations to determine cell status. For instance, if a cell has three neighbors, it will be reborn, and if a neighbor was present, it survives. The operation efficiently updates the state of the entire grid.
00:12:00.240 Now, let's discuss another library that can also help improve Ruby's performance: SQLite. This is an underutilized library. While some of us use it within Rails, we often do so behind an ORM, which is a missed opportunity to explore its full capabilities.
00:12:37.600 SQLite has already solved numerous data storage issues. Using it would place you one step ahead of your data problems. The convenience of having an entire language at your disposal to express your data needs is invaluable, especially when your requirements are not clear from the outset.
00:14:10.360 Let’s review a real example. In the old Ruby Quiz, there was the IP to Country problem. The goal was to return the country associated with a given IP address by utilizing an online CSV database. Despite being a quiz, this task was something I often dealt with at my job. We perform site analysis on our website visitors, and many blogs employ similar methods.
00:14:44.640 The database essentially consists of ranges that include the beginning and end IP addresses. The large file is over 6 megabytes in size. Many participants solved this Ruby Quiz with binary search on the file, which is fast, of course.
00:15:20.360 Several of them pre-processed the file to simplify that search, and many applied smart techniques. However, I wanted to see how SQLite would handle this problem, and I was surprised by how easy it was to make SQLite match the speed of those binary solutions.
00:15:57.760 The first query I tried with SQLite was about a third of a second to find a country for an IP address. This was achieved without creating any indexes, which pleasantly astonished me. It was straightforward SQL, and I was able to utilize full country names rather than just codes.
00:16:19.680 Now, I will take you through how I set up the database. All that is required is the name of your desired file; SQLite databases reside in a single file. I use a statement to create a table with fields for low IP, high IP, and the country name.
00:17:08.880 The code used to actually load the data follows this structure. Using the open URI library alongside gzip reader, I am able to parse the CSV file directly from the internet without saving it locally. I use FasterCSV to parse the CSV data and pull the relevant pieces.
00:17:54.640 Here, you can see the hypothetical INSERT operation into the SQLite table. I've also included positional values for placeholders in my SQL statements. By using question marks, I can pass the arguments for those placeholders in the execute method.
00:18:35.679 While querying, the execution time remained around a third of a second. SQLite offers a handy helper method called 'get_first_value', which allows you to retrieve the first value from the first row in a query.
00:19:16.040 You can also use named parameters in SQLite, further enhancing usability. Many people are unaware of the exceptional qualities of SQLite such as it being completely free, with developers relinquishing copyright to place their code in the public domain.
00:19:56.040 SQLite allows you to obtain query results as hashes, working with names instead of index positions. Additionally, it can convert SQL types to Ruby types, and you can even work with in-memory databases, which can be fast.
00:20:32.040 SQLite allows for cross-database queries, enabling you to join separate database files and perform actions across them. You can also define SQL functions in pure Ruby, extending its functionality for your specific needs.
00:21:20.240 For instance, suppose I had a user database and attached the IP to country database to it. I could utilize this to look up a user's IP address within that user-managed database. To circumvent the complexity of converting an IP to an integer using SQL, I implemented a straightforward IP to int function in Ruby, facilitating the conversion for SQL statements.
00:22:45.040 Compiling the first row results as my answer was effortless, showcasing the vast potential that SQLite offers, which can sidestep known solutions like binary search.
00:23:27.520 Moving forward, let’s discuss RBTree. For those seeking efficient binary searching, I recommend utilizing RBTree, which is built to be a red-black tree. It's more efficient than an AVL tree, offering more speed and performance, specifically in C.
00:24:25.040 Using RBTree for IP to country matching, we can achieve search times that drop below one-thousandth of a second! This is commendable when considering Ruby's inherent speed limitations. The next time someone tells you that Ruby is slow, you have my permission to react with disbelief.
00:25:10.000 RBTree can easily be marshaled just like other Ruby objects, enabling us to output it for persistent databases. When performing searches using bound methods like upper_bound, lower_bound, or bound, you can obtain entries that either meet or are as close to a specified bound as possible.
00:26:07.840 Now let’s move on to the next library, FSDB, which stands for File System Database. Unfortunately, it's not a gem and cannot be installed as one, but it's available on SourceForge. This flexible and adaptive database management system allows you to manage your data similarly to how you would manage a hash.
00:27:03.440 In the software field, it's crucial to remain adaptable to the changing needs of your data. FSDB lets you accommodate these requirements effortlessly, whether it's adjusting formats or relationships.
00:27:19.760 Let’s consider a basic example where I will create a time series database. The FSDB setup resembles SQLite quite a bit, except you would define a directory name instead of a file. Each hash key represents a path, allowing you to drill down to particular data without unnecessary data considerations.
00:28:04.840 With FSDB, when storing a series of data, you can create a path ordered by year, month, day, and hour. Each object can be marshaled and stored in this directory format, allowing you to retrieve it as needed.
00:29:18.960 In practical use, invoking the data retrieval process looks just like any standard hash operation, but the benefit is that records are organized hierarchically by path, which simplifies computations. FSDB supports multi-threading, multi-processing and with it, you can ensure the integrity of your data operations using transactions.
00:30:05.760 Let me show you a few transaction examples. In FSDB, using the 'browse' method initiates a shared transaction for data reading while maintaining an exclusive lock for write transactions. You can reinsert or replace data per your needs.
00:30:57.760 I want to share about Renda, which offers a simple solution for inter-process communication in Ruby. Unlike other libraries, Renda does not require any installation because it comes with every copy of Ruby. Just like the Cylons, the power of networking becomes crucial to higher efficiency. Renda allows seamless communication between multiple running processes, making data sharing easy.
00:31:56.800 Suppose we want to process a scrambled word and compare it against a dictionary to find potential matches. This example is representative of many tasks requiring processing. The ‘signature’ method can be used to create a unique identifier for each word to aid in the comparison of characters.
00:32:36.960 When dividing the work amongst multiple processes, Renda's tuple space concept allows for easy data messaging, pulling in comparisons for distributed efforts. It shows how you can write messages that any process can retrieve, which is particularly beneficial for I/O heavy tasks.
00:33:11.360 Within your workers, you simply send off the results to the tuple space, which can be accessed continuously from any process. Each result can match a pattern identifiable by regular expressions simplifying the data retrieval.
00:33:55.760 Renda is equipped with functional capabilities beyond just basic messaging. You can set expiration times on messages, ensuring they remain relevant only for a specific period. Additionally, it embodies zero-configuration networking abilities for seamless device connections.
00:34:55.760 In conclusion, I've shared numerous ways to enhance Ruby's speed and efficiency through various libraries and methodologies. If you want to maintain a competitive edge with Ruby, these tools are invaluable. Renda is just one of those exciting options, inviting developers to explore their process optimization abilities.
00:36:08.480 Now, I’d be more than happy to take your questions if anyone has any! Feel free to ask anything regarding the topics we’ve discussed or any other Ruby-related queries.
00:37:18.480 One more thing: for those asking about the URL for my slides, I haven't uploaded them online yet, but I will work on getting them up soon. We plan to provide a server for the conference slides!
00:37:47.520 The video equipment rental cost was paid for by PeepCode.
Explore all talks recorded at LoneStarRuby Conf 2008
+18