00:00:06.359
The video equipment rental cost was paid for by PeepCode.
00:00:18.760
Today, I want to talk about some different Ruby libraries that are kind of underappreciated. I will show you what we can create using them.
00:00:25.160
I'm back at LoneStarRuby, which I'm really excited about! For those who don't know me, I'm James Edward Gray II. I created the Ruby Quiz originally and ran it for three years. Now, it's run by Matthew Moss, so I don't manage that anymore. However, I did start it.
00:00:38.520
I've released a few open-source libraries, including FasterCSV, which was mentioned earlier. I've also written a couple of Pragmatic Programmer books that contain quite a bit of Ruby content. I've had the privilege of speaking at every LoneStar Ruby Conference so far. How cool is that? You really don't realize what a miracle that is. I came down here last year and talked for one hour about the TV show 'Heroes', and they invited me back this year. That was pretty cool!
00:01:15.479
We can’t talk about that show this time because, well, season two wasn't great, right? So, I thought we could focus on some other interesting topics instead. What do you think? Let's go with that.
00:01:39.240
Of course, we might also touch upon Battlestar Galactica. For those who don’t know, the new show has created quite a buzz. Raise your hand if you haven't seen Battlestar Galactica. Oh my God! Alright, you folks have homework to do! In fact, you may have to skip the next talk. Now, in Battlestar Galactica, humans created the Cylons, who eventually grew tired of the human race and separated from them. At the beginning of the new series, the Cylons return to wipe out the human race.
00:02:19.640
This show has to be one of the most depressing on TV. It starts at the end of the world and goes downhill from there. So, why am I discussing Battlestar Galactica at a Ruby conference? Well, the ship the characters are on was supposed to be decommissioned; it was essentially their ancient junk. Yet, it becomes the only thing keeping the humans alive— their last holdout. Those of you who know me know that I love those kinds of odds. I love the underdogs!
00:02:41.800
But what does that have to do with Ruby? Well, you might have heard a rumor going around that Ruby is slow. I've heard a few people mention this and I want to share my opinion on that rumor: it's total nonsense! Some of you may remember the slides from my talk last year, which I thought was cool. Anyway, Ruby goes as fast as we want it to because we're in charge of that!
00:03:13.280
Some people say you can speed up Ruby by switching to C, but you can use C from inside Ruby, so we don’t even need to switch. However, you don’t really need to do that very often. You can use other libraries, many of which are written in C, which gives us that speed without having to switch languages entirely. If the libraries are specific to your problem, they can really speed things up!
00:03:54.480
The big win, in my opinion, always lies in adding better data structures and restructuring the problem to better fit what you're trying to do. So, we're going to focus mainly on this aspect, but also look at using libraries and processing power to speed Ruby up.
00:04:19.959
Let's start by discussing NArray. The 'N' stands for numerical and it's fantastic for numerical computations. The Cylons have a huge advantage over the humans—it's not about girls, it’s about numbers! That's right, just numbers! We'll get to the girls part later.
00:04:42.360
Most of the human race was wiped out in the initial attack, and if the Cylons die, they simply download into another body, so they continue coming back. This gives them a significant numerical advantage, so sometimes you just need to crunch numbers, and we all know that.
00:05:11.440
The numbers in Ruby were built for ease of use, which is great and we appreciate that, but unfortunately, it does make them a bit slower because they have to handle things like automatic conversions between Fixnum and Bignum or coercing different types. So, there's more processing to do, and as such, they can be slightly slower.
00:05:37.600
In contrast, C's numbers were built for speed; they can only go so big, but they operate much quicker. We can actually borrow C's numerical functionality inside Ruby using NArray. That's what it's designed for— to enable very fast numerical computations.
00:06:08.720
Let me provide an example. I have this PPM image library that is very simple. If you're unfamiliar, the PPM image format is easy to learn in about 30 seconds. It's essentially just a big grid of numbers and you can generate images from it.
00:06:38.160
This little wrapper I created is under 100 lines of code. I noticed it was taking about 1.3 seconds to generate a 400x200 pixel image. So, I decided to see if I could speed that up. I switched from using a two-dimensional array of color objects, which defined the image data, to a three-dimensional NArray.
00:07:04.599
I only had to change about ten lines of code; it wasn't complicated at all. Now, the same image generates in about one-hundredth of a second. That's a significant speed difference!
00:07:36.600
To change it, we start with the initial code, focusing on the last line where we're creating a two-dimensional array defined by width and height. In this array, I stored color objects in their respective positions.
00:08:01.840
When switching to using NArray, we require 'RubyGems' and 'NArray'. You can install NArray as a gem using 'gem install narray'. Then, the last line changes to this: we now define a three-dimensional NArray. The first dimension is the width, the second dimension is the height, and the third dimension accounts for the three RGB color values.
00:08:25.519
Next, we determine how we mark pixels. Initially, I marked pixels in row-major order by using the Y and then the X position. The updated marking in the NArray version now involves slicing into the NArray and using all dimensions together to grab all three byte values corresponding to the RGB colors.
00:08:56.639
In the drawing code, I used to iterate over the rows of the image, stringify the contents, and print them out. In the new code, iterating is best done by slicing the array to optimize the process. I can get the entire width with just one row of colors, then transpose it to match the desired layout.
00:09:18.880
NArray also has many other useful features. You saw its application where I used bytes; it can also work with two-byte integers, four-byte integers, and floats, even complex numbers. NArray offers data generation, arithmetic operations, comparisons, bitwise manipulations, and statistical functions.
00:09:37.760
There's a large API available that you can view online. The easiest way to learn how to use NArray is simply by opening IRB and creating a small NArray, perhaps a 3x3 array, and experimenting. Try adding one to it and observe what happens.
00:09:57.680
Another example, which I found in the online NArray examples, is an implementation of Conway's Game of Life. While I will simplify this demonstration, it shows how you can use NArray effectively. The code I show creates a 5x5 grid, taking the center three pixels and generating random ones and zeros for the simulation.
00:11:17.760
This code does all the necessary counting for the Game of Life by continuously slicing the NArray to get each neighboring cell. You simply add those counts to check how many neighbors each cell has.
00:11:42.160
The actual implementation of one step of the Game of Life with NArray utilizes built-in comparisons and operations to determine cell status. For instance, if a cell has three neighbors, it will be reborn, and if a neighbor was present, it survives. The operation efficiently updates the state of the entire grid.
00:12:00.240
Now, let's discuss another library that can also help improve Ruby's performance: SQLite. This is an underutilized library. While some of us use it within Rails, we often do so behind an ORM, which is a missed opportunity to explore its full capabilities.
00:12:37.600
SQLite has already solved numerous data storage issues. Using it would place you one step ahead of your data problems. The convenience of having an entire language at your disposal to express your data needs is invaluable, especially when your requirements are not clear from the outset.
00:14:10.360
Let’s review a real example. In the old Ruby Quiz, there was the IP to Country problem. The goal was to return the country associated with a given IP address by utilizing an online CSV database. Despite being a quiz, this task was something I often dealt with at my job. We perform site analysis on our website visitors, and many blogs employ similar methods.
00:14:44.640
The database essentially consists of ranges that include the beginning and end IP addresses. The large file is over 6 megabytes in size. Many participants solved this Ruby Quiz with binary search on the file, which is fast, of course.
00:15:20.360
Several of them pre-processed the file to simplify that search, and many applied smart techniques. However, I wanted to see how SQLite would handle this problem, and I was surprised by how easy it was to make SQLite match the speed of those binary solutions.
00:15:57.760
The first query I tried with SQLite was about a third of a second to find a country for an IP address. This was achieved without creating any indexes, which pleasantly astonished me. It was straightforward SQL, and I was able to utilize full country names rather than just codes.
00:16:19.680
Now, I will take you through how I set up the database. All that is required is the name of your desired file; SQLite databases reside in a single file. I use a statement to create a table with fields for low IP, high IP, and the country name.
00:17:08.880
The code used to actually load the data follows this structure. Using the open URI library alongside gzip reader, I am able to parse the CSV file directly from the internet without saving it locally. I use FasterCSV to parse the CSV data and pull the relevant pieces.
00:17:54.640
Here, you can see the hypothetical INSERT operation into the SQLite table. I've also included positional values for placeholders in my SQL statements. By using question marks, I can pass the arguments for those placeholders in the execute method.
00:18:35.679
While querying, the execution time remained around a third of a second. SQLite offers a handy helper method called 'get_first_value', which allows you to retrieve the first value from the first row in a query.
00:19:16.040
You can also use named parameters in SQLite, further enhancing usability. Many people are unaware of the exceptional qualities of SQLite such as it being completely free, with developers relinquishing copyright to place their code in the public domain.
00:19:56.040
SQLite allows you to obtain query results as hashes, working with names instead of index positions. Additionally, it can convert SQL types to Ruby types, and you can even work with in-memory databases, which can be fast.
00:20:32.040
SQLite allows for cross-database queries, enabling you to join separate database files and perform actions across them. You can also define SQL functions in pure Ruby, extending its functionality for your specific needs.
00:21:20.240
For instance, suppose I had a user database and attached the IP to country database to it. I could utilize this to look up a user's IP address within that user-managed database. To circumvent the complexity of converting an IP to an integer using SQL, I implemented a straightforward IP to int function in Ruby, facilitating the conversion for SQL statements.
00:22:45.040
Compiling the first row results as my answer was effortless, showcasing the vast potential that SQLite offers, which can sidestep known solutions like binary search.
00:23:27.520
Moving forward, let’s discuss RBTree. For those seeking efficient binary searching, I recommend utilizing RBTree, which is built to be a red-black tree. It's more efficient than an AVL tree, offering more speed and performance, specifically in C.
00:24:25.040
Using RBTree for IP to country matching, we can achieve search times that drop below one-thousandth of a second! This is commendable when considering Ruby's inherent speed limitations. The next time someone tells you that Ruby is slow, you have my permission to react with disbelief.
00:25:10.000
RBTree can easily be marshaled just like other Ruby objects, enabling us to output it for persistent databases. When performing searches using bound methods like upper_bound, lower_bound, or bound, you can obtain entries that either meet or are as close to a specified bound as possible.
00:26:07.840
Now let’s move on to the next library, FSDB, which stands for File System Database. Unfortunately, it's not a gem and cannot be installed as one, but it's available on SourceForge. This flexible and adaptive database management system allows you to manage your data similarly to how you would manage a hash.
00:27:03.440
In the software field, it's crucial to remain adaptable to the changing needs of your data. FSDB lets you accommodate these requirements effortlessly, whether it's adjusting formats or relationships.
00:27:19.760
Let’s consider a basic example where I will create a time series database. The FSDB setup resembles SQLite quite a bit, except you would define a directory name instead of a file. Each hash key represents a path, allowing you to drill down to particular data without unnecessary data considerations.
00:28:04.840
With FSDB, when storing a series of data, you can create a path ordered by year, month, day, and hour. Each object can be marshaled and stored in this directory format, allowing you to retrieve it as needed.
00:29:18.960
In practical use, invoking the data retrieval process looks just like any standard hash operation, but the benefit is that records are organized hierarchically by path, which simplifies computations. FSDB supports multi-threading, multi-processing and with it, you can ensure the integrity of your data operations using transactions.
00:30:05.760
Let me show you a few transaction examples. In FSDB, using the 'browse' method initiates a shared transaction for data reading while maintaining an exclusive lock for write transactions. You can reinsert or replace data per your needs.
00:30:57.760
I want to share about Renda, which offers a simple solution for inter-process communication in Ruby. Unlike other libraries, Renda does not require any installation because it comes with every copy of Ruby. Just like the Cylons, the power of networking becomes crucial to higher efficiency. Renda allows seamless communication between multiple running processes, making data sharing easy.
00:31:56.800
Suppose we want to process a scrambled word and compare it against a dictionary to find potential matches. This example is representative of many tasks requiring processing. The ‘signature’ method can be used to create a unique identifier for each word to aid in the comparison of characters.
00:32:36.960
When dividing the work amongst multiple processes, Renda's tuple space concept allows for easy data messaging, pulling in comparisons for distributed efforts. It shows how you can write messages that any process can retrieve, which is particularly beneficial for I/O heavy tasks.
00:33:11.360
Within your workers, you simply send off the results to the tuple space, which can be accessed continuously from any process. Each result can match a pattern identifiable by regular expressions simplifying the data retrieval.
00:33:55.760
Renda is equipped with functional capabilities beyond just basic messaging. You can set expiration times on messages, ensuring they remain relevant only for a specific period. Additionally, it embodies zero-configuration networking abilities for seamless device connections.
00:34:55.760
In conclusion, I've shared numerous ways to enhance Ruby's speed and efficiency through various libraries and methodologies. If you want to maintain a competitive edge with Ruby, these tools are invaluable. Renda is just one of those exciting options, inviting developers to explore their process optimization abilities.
00:36:08.480
Now, I’d be more than happy to take your questions if anyone has any! Feel free to ask anything regarding the topics we’ve discussed or any other Ruby-related queries.
00:37:18.480
One more thing: for those asking about the URL for my slides, I haven't uploaded them online yet, but I will work on getting them up soon. We plan to provide a server for the conference slides!
00:37:47.520
The video equipment rental cost was paid for by PeepCode.