00:00:20.960
Hello, everyone. Thank you for coming. I'm going to talk about how to play nice with others. This presentation mainly focuses on tools that you can use in mixed language environments.
00:00:26.880
My name is Jeremy Hinegardner. You can find me on Twitter at @copiousfreetime or at my websites jeremyhinegardner.org or copiousfreetime.org. I work for a company called Collective Intellect.
00:00:39.360
While that part isn’t too important, the fun part is the variety of technologies we use to build our products. Last year, I gave a talk about building a Ruby infrastructure. How many of you were here last year? Did anyone attend my talk?
00:00:52.239
Great! Since then, our Ruby infrastructure has grown quite a bit. We have added several new systems to our total infrastructure, including Java services, a couple of C++ libraries, a Groovy application, and around 20 micro Rails apps, along with some Sinatra apps and a wide range of gems.
00:01:10.240
In this multi-language environment, we need technologies that can interact harmoniously with one another. It’s essential for Java applications to use some of the same resources as Ruby applications. Similarly, Groovy applications and C++ libraries need to communicate effectively.
00:01:24.000
So, what tools can we use, aside from the ever-popular relational database, which sometimes may not be the best option for the task?
00:01:30.400
To kick things off, let’s do a little survey. Raise your hand if you have a favorite programming language. Keep those hands up! It’s great to see so many diverse languages represented here.
00:01:47.600
Now, I’ll call out some languages, and if I mention yours, put your hand down: Ruby, Smalltalk, Java, C#, C++, Lisp, Fortran, PHP, JavaScript, Perl. We’ve got a lot of languages in the room today. Ruby is obviously a favorite at a Ruby conference, but it’s fantastic to see others as well.
00:02:22.080
All these languages need to communicate with each other somehow. You might have a program written in Fortran that needs to interact with another program written in Java, Ruby, or Smalltalk. The question is, how do they exchange information?
00:02:34.480
The simplest method might be through plain files, but I started looking into commonalities between these various coding languages. What can we learn about how programming languages can communicate?
00:02:47.920
How many people here have a computer science degree or a background in computer science? What are some of the things you learned in your studies that may not necessarily relate directly to any specific language?
00:03:11.519
For example, big O notation, computational complexity, and in-depth knowledge about data structures. Who has that big white book with the blue sweep on it, authored by Rivest and others? It covers all these essential concepts about data structures.
00:03:45.360
Data structures are crucial because every language has an implementation of various data structures. In Ruby, for example, we have integers, floats, rationals, and even complex numbers, which can all be considered data structures.
00:04:11.760
In addition to data structures, another commonality across programming languages is the method of communication between them.
00:04:27.440
So, let’s do a quick survey. What does everyone currently use to communicate data structures between different applications?
00:04:33.600
Options include SOAP, Sagan, CORBA, JSON, HTTP, delimited files, Marshal, YAML, and others. The interesting challenge is ensuring these tools can effectively communicate across different languages.
00:04:58.560
I break down communications into two realms: network-based communication, which covers the exchange of data between different physical machines, and local communication, which refers to interactions within the same system without relying on a network API.
00:05:14.639
Additionally, we need to consider the aspect of persistence. I define persistence roughly in three ways: none, where there’s no persistence at all; snapshot persistence, which involves taking a snapshot of the data structure and saving it to a medium like a disk; and lifetime persistence, where data remains useful for a certain duration.
00:05:40.800
Let’s see if anyone has been paying attention. Using persistence, communication, and data structures, can you guess what tools I'm describing? The first tool has network communication, no persistence, and utilizes a hash data structure.
00:06:06.240
Can anyone take a guess? Yes, it’s memcached! It has network communication, no persistence, and relies on a hash data structure.
00:06:14.639
The next tool has network communication, lifetime persistence, and operates using a struct data structure.
00:06:28.639
Any ideas? Yes, it’s any database you prefer. We’re categorizing tools using the taxonomy of communication, persistence, and data structure.
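As a rough sketch of this taxonomy, the two quiz answers above can be written down as records with three fields. The `Tool` struct, the `TOOLS` list, and the `guess` helper are illustrative names invented here, not part of any library.

```ruby
# Each tool is described by the three axes of the taxonomy:
# how it communicates, how it persists, and what data structures it offers.
Tool = Struct.new(:name, :communication, :persistence, :data_structures)

# Only the two quiz answers from the talk are listed here.
TOOLS = [
  Tool.new("memcached",           :network, :none,     [:hash]),
  Tool.new("relational database", :network, :lifetime, [:struct]),
]

# Return the names of tools that match a description on all three axes.
def guess(tools, communication:, persistence:, data_structure:)
  tools.select do |tool|
    tool.communication == communication &&
      tool.persistence == persistence &&
      tool.data_structures.include?(data_structure)
  end.map(&:name)
end
```

Calling `guess(TOOLS, communication: :network, persistence: :none, data_structure: :hash)` plays the first round of the guessing game.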
00:06:39.680
To be considered a cross-language tool for communicating these types of data structures, it should ideally support at least three languages.
00:06:53.000
Let’s have a quick show of hands. How many here are working on a project that involves more than one programming language? Great! More than two? Fantastic. And even more? That’s awesome!
00:07:44.400
In my experience, the average number of languages in a project tends to be around three. Even in a standard Ruby on Rails project, you’re likely to encounter Ruby, JavaScript, and SQL for database interactions.
00:08:02.640
Now, I would like to talk about a few different tools that I enjoy working with and how they fit into this mixed-language context.
00:08:20.800
First off is Tokyo Cabinet. Who's familiar with these products? Not many? Interesting! Over the past year, I've noticed a surge of simple tools that offer versatile features across many different languages.
00:08:28.800
Tokyo Cabinet and its other products have been prominent in this context. In fact, I think there were a few talks on Tokyo products at RubyKaigi and the recent conference in Toronto.
00:08:45.760
I'm currently using Tokyo Cabinet in production for Tyrant. In terms of data structures, Tokyo Cabinet supports arrays, hashes, and structs. It has several file formats: hash, B+ tree, table, and a fixed-length array.
00:09:01.440
In terms of communication, it is local: Tokyo Cabinet is a plain library that you link into your process. For persistence, it offers lifetime storage, so another process can access the data later.
00:09:13.440
Tokyo Cabinet ships with bindings for a variety of languages such as C, Perl, Ruby, Java, Lua, and Python.
00:09:23.840
Next, we've got Tokyo Tyrant, which converts any Tokyo Cabinet database into a network server. It supports the same data structures as Tokyo Cabinet.
00:09:30.560
Tokyo Tyrant offers several cool features, including compression for key-value stores, with automatic compression and decompression using zlib.
00:09:48.960
Another valuable feature is that it fully understands the memcached protocol. This allows you to persist your memcached data to disk easily.
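To make "understands the memcached protocol" concrete, here is a minimal sketch of the bytes a client sends for the text-protocol `set` and `get` commands. The helper names `memcached_set` and `memcached_get` are invented for illustration; the sketch only builds the request strings and does not open a connection.

```ruby
# Build the bytes for a memcached text-protocol "set":
#   set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
# The byte count must be the size of the data in bytes, not characters.
def memcached_set(key, value, flags: 0, exptime: 0)
  data = value.to_s
  "set #{key} #{flags} #{exptime} #{data.bytesize}\r\n#{data}\r\n"
end

# Build the bytes for a "get" of a single key.
def memcached_get(key)
  "get #{key}\r\n"
end
```

Any server that speaks this protocol should accept these strings written to a TCP socket, which is why an existing memcached client can be pointed at a different backend without code changes.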
00:10:05.920
Additionally, Tokyo Tyrant has a full RESTful API that makes it simple to interact with, allowing for GET and PUT requests to store and retrieve values.
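A sketch of what storing and retrieving a value over HTTP might look like from Ruby, using only the standard library. The host, the port (1978 is assumed here as the server's default), and the helper names are assumptions for illustration. The code builds the requests without sending them, since sending needs a running server.

```ruby
require "net/http"

# Assumed location of the server; adjust to your own setup.
TYRANT_HOST = "localhost"
TYRANT_PORT = 1978

# Build (but do not send) a PUT that stores `value` under `key`.
# Keys containing characters that are not URL-safe would need escaping first.
def put_request(key, value)
  request = Net::HTTP::Put.new("/#{key}")
  request.body = value
  request
end

# Build (but do not send) a GET that retrieves the value stored under `key`.
def get_request(key)
  Net::HTTP::Get.new("/#{key}")
end

# Sending requires a live server, for example:
#   Net::HTTP.start(TYRANT_HOST, TYRANT_PORT) do |http|
#     http.request(put_request("greeting", "hello"))
#     puts http.request(get_request("greeting")).body
#   end
```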
00:10:17.920
Moreover, it has Lua extensions for executing various functions as required and offers replication options such as master-master and master-slave setups.
00:10:32.960
Now, let’s move on to Redis. Has anyone heard of Redis? More familiar faces! Great!
00:10:39.680
Redis is a data structure server that can hold different types of data structures. It provides lists, hashes, sets, and standard key-value pairs.
00:10:56.320
The outstanding feature of Redis is its ability to process list and set data structures. It communicates over the network using its own protocol and supports asynchronous snapshot persistence.
00:11:16.640
With Redis, data is saved in the background, but if your server dies unexpectedly, you may lose data between the last save and the time of the crash.
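The data-loss window of snapshot persistence can be illustrated with a small model. This is not how Redis itself is implemented; it is a sketch that writes a structure to a file as JSON and shows that changes made after the last save do not survive. The helper names `save_snapshot` and `load_snapshot` are invented here.

```ruby
require "json"
require "tempfile"

# Write the whole structure to disk in one go (the "snapshot").
def save_snapshot(path, data)
  File.write(path, JSON.generate(data))
end

# Read back whatever state the last snapshot captured.
def load_snapshot(path)
  JSON.parse(File.read(path))
end

file  = Tempfile.new("snapshot")
names = { "first" => ["ada", "grace"] }

save_snapshot(file.path, names)  # the last successful save
names["first"] << "barbara"      # changed after the save, never snapshotted

# Simulate a crash and restart: only the snapshot contents come back.
recovered = load_snapshot(file.path)
```

After the simulated restart, `recovered` holds the two names that were present at save time, and the third is gone.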
00:11:31.680
Redis supports various languages including Ruby, Python, PHP, Erlang, Tcl, Perl, Lua, and Java, making it quite versatile.
00:11:47.920
Furthermore, Redis provides replication capabilities, allowing for both master-master and master-slave setups, which facilitate data streaming effectively.
00:12:11.840
The fun part is the in-server set operations. For example, you can easily find intersections between sets stored within Redis.
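For readers who have not used set operations before, this is the same idea in plain Ruby with the standard `Set` class. It runs in-process; the difference with Redis is that the intersection (its `SINTER` command) is computed on the server, so only the result crosses the network. The names below are made up for illustration.

```ruby
require "set"

# Two illustrative sets of first names (not real census data).
MALE_NAMES   = Set["james", "jordan", "taylor", "robert"]
FEMALE_NAMES = Set["mary", "jordan", "taylor", "linda"]

# Names that appear in both sets.
shared = MALE_NAMES & FEMALE_NAMES

puts shared.to_a.sort.join(", ")
```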
00:12:27.680
Next, let's touch on Libjlog. How many of you have heard of it? It’s an excellent tool that acts as a library for publish-subscribe messaging between processes.
00:12:41.920
While I haven’t yet used it in production, I find its concept of enabling communication between two processes to be very intriguing.
00:12:57.920
Currently, it has support for C, Perl, and PHP, and I'm working on integrating it into Ruby as well.
00:13:17.920
After that, we have Beanstalkd. Have you heard of it? Many of you, excellent! This is another tool that I really appreciate.
00:13:29.040
Beanstalk uses a simple job queue structure and does not offer persistence yet; however, in the next minor version, it is set to include persistence.
00:13:45.760
With Beanstalk, you can push jobs onto a queue and process them with multiple workers. It’s straightforward; once a worker reserves a job, that worker is the only one that can work on it.
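The reserve semantics can be sketched with a small in-memory model. This is not the Beanstalk client API; `TinyJobQueue` and its methods are hypothetical names that only mimic the put, reserve, and delete steps, and the class is not thread-safe.

```ruby
# In-memory model of a job queue with reserve semantics.
class TinyJobQueue
  Job = Struct.new(:id, :body)

  def initialize
    @ready    = []  # jobs waiting for a worker
    @reserved = {}  # job id => name of the worker holding it
    @next_id  = 1
  end

  # Producer side: add a job and return its id.
  def put(body)
    job = Job.new(@next_id, body)
    @next_id += 1
    @ready << job
    job.id
  end

  # Worker side: take the next ready job, or nil if the queue is empty.
  # A reserved job leaves the ready list, so no other worker can get it.
  def reserve(worker)
    job = @ready.shift
    @reserved[job.id] = worker if job
    job
  end

  # Worker side: remove a finished job. Only the reserving worker may do so.
  def delete(worker, id)
    return false unless @reserved[id] == worker
    @reserved.delete(id)
    true
  end
end
```

With two jobs queued, two workers calling `reserve` each get a different job, and a third worker gets `nil` until more work is put on the queue.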
00:14:05.920
Lastly, let's touch upon ZeroMQ, which has the potential for being a plumbing component for any message system you wish to develop.
00:14:20.000
ZeroMQ provides high throughput messaging, and its latest version supports lifetime persistence based on the usefulness of messages.
00:14:36.159
It’s a very flexible messaging library that allows you to implement your own messaging models as you see fit.
00:14:52.080
Now, let’s move on to MongoDB, which is a network database that has been gaining traction in the field.
00:15:06.160
There’s been a historic shift toward NoSQL technologies, allowing for various flexible data structures.
00:15:29.680
Additionally, there are newer technologies like Flare, which supports sharding and scales automatically.
00:15:46.080
Other interesting technologies include Cassandra and CouchDB, both of which have had a significant impact. Their use for large-scale data processing has grown immensely in the last couple of years.
00:16:00.480
And we cannot forget about Solr. Solr represents a powerful search platform that effortlessly integrates with various programming languages.
00:16:16.240
Please share any of your favorite tools that I haven't mentioned. I'm starting to compile a growing list of technologies, and I'd love your input.
00:16:38.720
Let's transition into some demos. I’ll start with Tokyo Cabinet, using sample data from the US Census for a variety of names.
00:17:01.520
The dataset includes first names and last names, which I’ll read in to illustrate the functionalities.
00:17:18.080
Tokyo Cabinet is often overlooked for its table file format, a simple key-value store with the values as hashes.
00:17:33.680
I will demonstrate how creating an index on the name field allows for super-fast lookups to return names in a matter of seconds.
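Why an index helps can be shown with a plain-Ruby model of the table format: each primary key maps to a hash of columns, and the index maps a column value back to primary keys. This is a sketch of the idea only, not the Tokyo Cabinet API; the sample records and helper names are invented.

```ruby
# Table model: primary key => hash of columns.
sample_table = {
  "1" => { "name" => "smith",   "kind" => "last"  },
  "2" => { "name" => "mary",    "kind" => "first" },
  "3" => { "name" => "johnson", "kind" => "last"  },
}

# Without an index, a lookup by name has to scan every record.
def scan_by_name(table, name)
  table.select { |_key, columns| columns["name"] == name }.keys
end

# An index maps each name to the primary keys that hold it,
# so a lookup becomes a single hash access instead of a full scan.
def build_name_index(table)
  index = Hash.new { |hash, key| hash[key] = [] }
  table.each { |key, columns| index[columns["name"]] << key }
  index
end

name_index = build_name_index(sample_table)
puts name_index.fetch("mary", []).inspect
```

Both approaches return the same keys; the index trades extra space and write-time work for faster reads.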
00:17:52.880
Overall, we managed to insert a multitude of names and observed how quickly the process can be done.
00:18:16.720
Next, let’s execute the same demo using Tokyo Tyrant, which operates as a network server instead.
00:18:29.760
So, let’s do that now.
00:18:37.600
In the case of Tokyo Tyrant, there’s a slight difference in speed, but it’s a very efficient option overall.
00:18:51.680
We will also explore using Lua inside Tokyo Tyrant for some advanced functionality.
00:19:06.720
Next up is Redis. I’ll be running through a quick demo that showcases storing male and female first names with Redis to find sets and intersections.
00:19:22.080
This should also demonstrate how Redis handles these set operations on the server side.
00:19:34.240
Lastly, I'm excited to conclude with Beanstalkd, which makes job processing in queues exceedingly simple.
00:19:57.360
I think we have time left for one more, or would anyone prefer to wrap up? Thank you all for your attention and participation.
00:20:19.360
Are there any final questions? It was great discussing this with you all!
00:20:34.160
Thank you!