00:00:11.200
Hello, everyone. My name is Sarah Mei, and I'm a Ruby developer at Pivotal Labs in San Francisco. We do a lot of Rails development, and today, we're going to see some Rails code in these slides.
00:00:15.360
I'm here to talk to you about Ruby APIs for NoSQL. I also like to call this talk 'Polyglot Persistence.' We'll discuss how to store data in Ruby when you're writing a system that uses more than just a relational database.
00:00:27.119
But I'd like to start out with a little bit of audience participation. Let's see a show of hands: who has written an application in Ruby that uses a relational database? That's great to see!
00:00:41.280
Now, who has written a Ruby app that uses a relational database along with some kind of alternative data store? For example, Memcached or a file system? And who has written a Ruby app that uses only non-relational storage? Wow! That's a lot more than I thought.
00:00:58.960
It's really awesome to see that NoSQL is becoming more popular. When I started asking this question six months ago, I would get just one or two hands in the room. Now, we're looking at a much more significant number, which shows how far we've come.
00:01:33.680
So, I want to start off by showing you a diagram you've probably seen a hundred times before. This is a very vanilla Rails application: requests come in and go through the routes to the controller, which interacts with the models. The models talk to the database through ActiveRecord and retrieve data from SQL, and then the controller renders a response through the views.
00:02:15.760
A friend of mine pointed out that if you show a diagram like this, you're legally obligated to have a little diagram of a cloud somewhere. So, I added one! But let me emphasize that it's unlikely anyone in this room has written an application with real users where the system diagram looks like that.
00:02:30.960
In particular, I would be surprised if anyone here has written one where MySQL is the only means by which data is persisted. Usually, you don't set out to create a polyglot persistence system; it just kind of happens over time as you introduce more complexity.
00:03:07.920
You start with your nice little Rails app, and then the product owner says, for example, that we need free-text search. So you add Solr. Soon after, you might add S3 for file storage, then perhaps Redis for caching, and before you know it, you've got multiple data stores.
00:03:36.880
By that point, you could find yourself with four or five different data stores, and you haven't even tackled anything overly complicated. In fact, I would argue that this setup is actually more common than the base Rails application that I initially presented.
00:04:10.960
In recent years, we've recognized that much of the data most applications deal with doesn't fit neatly into relational persistence. Some relationships become quite complex when modeled in SQL, and large blocks of text usually need full-text search, which relational databases don't handle well.
00:04:51.840
Many of the alternative data stores we now rely on, like Cassandra and Redis, were once considered unconventional. Until recently, many developers didn't even think of Memcached or S3 as legitimate data stores.
00:05:36.400
However, the NoSQL movement has reshaped our understanding of what constitutes a data store. In the 90s, you might have had someone persist data to XML text files, which really exemplifies early NoSQL thinking.
00:06:41.920
But this presents a challenge: how do we encapsulate this information into a cohesive class when the data model is scattered across different storage systems? That's one question to consider.
00:07:13.600
Furthermore, as applications grow, you may need to replace the primary data store with something else, which has historically not been easy in Rails. So how do we build a unified model that draws on data from multiple stores while keeping its behavior intact?
00:07:56.720
To sum up this section: using multiple data stores is a reality even in simple applications. This is referred to as 'polyglot persistence,' and it's not a new idea.
00:08:01.280
I think it's important for Ruby developers to broaden their perspective beyond just MySQL and PostgreSQL. In reality, applications will likely use an assortment of technologies, whether or not you plan for it. To illustrate, I built a simple example application: a 'cephalopod social network,' mainly because I like the word cephalopod.
00:09:36.640
To frame our development process, we start with a basic 'Squid' class that inherits from ActiveRecord::Base to get all the familiar relational functionality. Then, to add free-text search, we integrate Solr using the sunspot gem, adding a searchable block that declares which attributes are indexed.
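Sunspot's real searchable block lives in an ActiveRecord::Base subclass and talks to a running Solr server, so here is a dependency-free sketch of the same shape. It is an illustration under our own assumptions, not sunspot's implementation: the Searchable module, its text macro, index!, and the in-memory search (a naive case-insensitive substring match) are all hypothetical stand-ins. The point is that the class itself declares which attributes are indexed.

```ruby
# Hypothetical stand-in for sunspot's `searchable` DSL: it records which
# attributes are indexed and searches them in memory. No Solr involved.
module Searchable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class macro: `searchable { text :name, :bio }` declares indexed fields.
    def searchable(&block)
      @indexed_fields = []
      instance_eval(&block)
    end

    # Called inside the searchable block to declare full-text fields.
    def text(*fields)
      @indexed_fields.concat(fields)
    end

    def indexed_fields
      @indexed_fields || []
    end

    # In-memory "index" of every instance that has been indexed.
    def index
      @index ||= []
    end

    # Naive case-insensitive substring match over the indexed fields only.
    def search(term)
      needle = term.downcase
      index.select do |record|
        indexed_fields.any? { |f| record.public_send(f).to_s.downcase.include?(needle) }
      end
    end
  end

  # Adds this record to its class's in-memory index.
  def index!
    self.class.index << self
    self
  end
end

class Squid
  include Searchable
  attr_reader :name, :bio, :password

  def initialize(name:, bio:, password:)
    @name = name
    @bio = bio
    @password = password
  end

  searchable do
    text :name, :bio # password is deliberately not indexed
  end
end
```

Because password is left out of the searchable block, searching for a squid's password finds nothing; declaring indexed fields explicitly serves the same purpose with the real gem.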
00:10:45.600
Because it's a social network, we need a friend graph, so we decide to store a denormalized list of each squid's friends in Redis. That might not be the first solution someone proposes, but modeling the graph relationally means paying the overhead of multiple joins.
00:11:38.080
In essence, we write a method that uses the redis gem to add friends under each squid's key in Redis. Then, since this social network centers on the cephalopods' literary creations, we let users upload their novels to S3.
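A minimal sketch of those two features, assuming hypothetical in-memory stand-ins for both services so it runs without a Redis server or AWS credentials. FakeRedis borrows the names of the redis gem's sadd and smembers methods (and, like Redis, hands members back as strings); FakeBucket, the key formats, and the Squid methods are illustrative, not the code from the talk.

```ruby
require "set"

# Hypothetical in-memory stand-in for a Redis connection. Method names
# mirror the redis gem's `sadd`/`smembers`; members come back as strings.
class FakeRedis
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, member)
    @sets[key].add(member.to_s)
  end

  def smembers(key)
    @sets[key].to_a
  end
end

# Hypothetical in-memory stand-in for an S3 bucket: object key => body.
class FakeBucket
  def initialize
    @objects = {}
  end

  def put(key, body)
    @objects[key] = body.dup
  end

  def get(key)
    @objects.fetch(key)
  end
end

class Squid
  attr_reader :id

  def initialize(id, redis:, bucket:)
    @id = id
    @redis = redis
    @bucket = bucket
  end

  # Denormalized friend list: one set per squid, so no joins are needed.
  # Both sides are written because the friendship is mutual.
  def add_friend(other)
    @redis.sadd(friends_key, other.id)
    @redis.sadd(other.friends_key, id)
  end

  def friend_ids
    @redis.smembers(friends_key).map(&:to_i).sort
  end

  # Novels are large blobs, so they go to the object store, not a table.
  def upload_novel(title, text)
    @bucket.put(novel_key(title), text)
  end

  def novel(title)
    @bucket.get(novel_key(title))
  end

  protected

  # Protected so another Squid can reach it when recording a mutual friendship.
  def friends_key
    "squid:#{id}:friends"
  end

  private

  def novel_key(title)
    "novels/#{id}/#{title}"
  end
end
```

Each squid's friends are a single set lookup away; the cost of that denormalization is writing both sides of every friendship.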
00:12:55.440
Now this Squid class is getting complicated, with multiple responsibilities: it handles ActiveRecord persistence along with Redis and S3 access. Ideally, we would have a uniform interface across all these data stores.
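One way to get a uniform interface, sketched under assumptions of our own: wrap each store in a small adapter that responds to the same read/write pair, and let the model declare which attributes live in which adapter. HashAdapter, JsonFileAdapter, and the stored_in macro are hypothetical names; in a real app the adapters would wrap MySQL, Redis, or S3 clients rather than a Hash and a directory of JSON files.

```ruby
require "json"
require "tmpdir"

# Hypothetical adapter #1: a plain Hash, standing in for a key-value store.
class HashAdapter
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end
end

# Hypothetical adapter #2: same read/write interface, but one JSON file per
# key in a directory, standing in for a blob store such as S3.
class JsonFileAdapter
  def initialize(dir)
    @dir = dir
  end

  def read(key)
    path = path_for(key)
    File.exist?(path) ? JSON.parse(File.read(path))["value"] : nil
  end

  def write(key, value)
    File.write(path_for(key), JSON.generate({ "value" => value }))
  end

  private

  # "squid:1:novel" becomes "squid_1_novel.json".
  def path_for(key)
    File.join(@dir, "#{key.tr(':/', '__')}.json")
  end
end

# The model declares which adapter owns each attribute, then exposes plain
# getters and setters no matter where the data actually lives.
class Squid
  def self.attribute_stores
    @attribute_stores ||= {}
  end

  def self.stored_in(store_name, *attrs)
    attrs.each do |attr|
      attribute_stores[attr] = store_name
      define_method(attr) { adapter_for(attr).read(key_for(attr)) }
      define_method("#{attr}=") { |value| adapter_for(attr).write(key_for(attr), value) }
    end
  end

  stored_in :profile, :name
  stored_in :blobs, :novel

  def initialize(id, adapters)
    @id = id
    @adapters = adapters
  end

  private

  def adapter_for(attr)
    @adapters.fetch(self.class.attribute_stores.fetch(attr))
  end

  def key_for(attr)
    "squid:#{@id}:#{attr}"
  end
end
```

Since Squid only ever calls read and write, replacing a store means passing a different adapter in the hash rather than rewriting the model.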
00:14:02.880
At first, I thought integrating everything through ActiveRecord might be the answer, as many Rails developers associate ActiveRecord directly with models. However, it quickly became apparent that ActiveRecord is designed specifically for relational databases.
00:15:24.000
The introduction of ActiveModel in Rails 3 is a significant shift. It separates validations and lifecycle callbacks from the persistence layer, so a model can offer the same validation and callback API the rest of Rails expects while being decoupled from ActiveRecord itself.
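To make that separation concrete without pulling in Rails, here is a stdlib-only imitation. TinyValidations is hypothetical: the real ActiveModel::Validations does provide validates_presence_of, valid?, and errors, but with a much richer API than this. What matters is that the validation mixin never touches storage; Friendship decides for itself where its data goes, here a Hash standing in for Redis.

```ruby
# Hypothetical, stdlib-only imitation of the idea behind ActiveModel:
# validation logic lives in a mixin that knows nothing about storage.
module TinyValidations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def required_attributes
      @required_attributes ||= []
    end

    def validates_presence_of(*attrs)
      required_attributes.concat(attrs)
    end
  end

  def errors
    @errors ||= []
  end

  # Re-checks every required attribute and collects messages for blanks.
  def valid?
    errors.clear
    self.class.required_attributes.each do |attr|
      value = public_send(attr)
      errors << "#{attr} can't be blank" if value.nil? || value.to_s.strip.empty?
    end
    errors.empty?
  end
end

# Persistence is the model's own business; here, a Hash standing in for Redis.
class Friendship
  include TinyValidations
  attr_accessor :from_id, :to_id

  validates_presence_of :from_id, :to_id

  def initialize(from_id: nil, to_id: nil, store: {})
    @from_id = from_id
    @to_id = to_id
    @store = store
  end

  def save
    return false unless valid?
    (@store["friends:#{from_id}"] ||= []) << to_id
    true
  end
end
```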
00:16:51.600
However, there are still challenges with having multiple data stores within a single model. The question remains: how do we build a cohesive model when each piece of data is handled by a different mechanism?
00:17:36.800
In conclusion, we are beginning to see how applications can use SQL and NoSQL databases together effectively. Once you go beyond experimenting with these technologies, you'll find they can coexist within your application architecture.
00:18:29.440
If you have more than 100 users, chances are you'll end up using an assortment of data stores. The key is to get them working together smoothly, and to do that, I recommend upgrading to Rails 3.
00:19:37.760
I'm happy to take any questions you may have!
00:19:58.800
Great question about testing with multiple data stores. I usually break the functionality into modules that I can mix in, then write shared examples for each module and include them in the tests for every model that uses it.
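In RSpec this would typically be a shared_examples group pulled into each model's spec with it_behaves_like. To stay dependency-free, the sketch below uses a plain method that takes a factory lambda and checks the module's behavior, so the same checks run against every class that mixes the module in. Friendable, FriendableExamples, and the Octopus class are hypothetical names chosen for illustration.

```ruby
# Hypothetical mixin: including classes get a one-way friend list, kept in
# a module-level Hash that stands in for an external store such as Redis.
module Friendable
  STORE = Hash.new { |hash, key| hash[key] = [] }

  # Clears the stand-in store so each run of the shared examples is isolated.
  def self.reset!
    STORE.clear
  end

  def befriend(other)
    STORE[friend_key] |= [other.friend_key] # set union avoids duplicates
  end

  def friends
    STORE[friend_key]
  end

  def friend_key
    "#{self.class.name.downcase}:#{id}"
  end
end

class Squid
  include Friendable
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

class Octopus
  include Friendable
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

# Shared examples: one set of checks, run against any class that includes
# Friendable. The factory lambda builds an instance from an id.
module FriendableExamples
  def self.verify(factory)
    Friendable.reset!
    a = factory.call(1)
    b = factory.call(2)
    raise "should start with no friends" unless a.friends.empty?
    a.befriend(b)
    a.befriend(b)
    raise "should record a friend once" unless a.friends == [b.friend_key]
    raise "should not befriend the other way" unless b.friends.empty?
    true
  end
end
```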
00:20:57.280
I've seen some issues with libraries that try to force a fundamentally different data store into a relational mindset; it's important to embrace the unique aspects of each data store.
00:21:38.640
With NoSQL databases, managing consistency can be tricky: many of them don't implement two-phase commit, so you can't wrap writes to several stores in a single transaction. Ensuring consistency often requires additional code and accepting some limitations.
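One common workaround, sketched here under our own assumptions, is a compensating action: write to the primary store, then to the secondary, and if the second write fails, restore the primary to its previous state and re-raise. This is not two-phase commit: if the compensation itself fails, or the process dies between the two writes, the stores can still disagree. FlakyStore and save_across_stores are hypothetical names.

```ruby
# Hypothetical key-value store that can be configured to fail every write,
# simulating an outage in a secondary store such as Redis.
class FlakyStore
  def initialize(fail_writes: false)
    @data = {}
    @fail_writes = fail_writes
  end

  def key?(key)
    @data.key?(key)
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    raise IOError, "store unavailable" if @fail_writes
    @data[key] = value
  end

  def delete(key)
    @data.delete(key)
  end
end

# Writes to primary, then secondary. No transaction spans both, so if the
# secondary write fails we compensate by restoring the primary's previous
# state, then re-raise so the caller sees the failure.
def save_across_stores(primary, secondary, key, value)
  existed = primary.key?(key)
  previous = primary.read(key)
  primary.write(key, value)
  begin
    secondary.write(key, value)
  rescue StandardError
    # Best effort only: if this compensation also fails, the stores disagree.
    existed ? primary.write(key, previous) : primary.delete(key)
    raise
  end
  true
end
```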
00:22:32.160
As for managed third-party services for NoSQL tools: I've mostly rolled my own, but I've heard good things about those services.
00:22:58.480
I appreciate your input on using DataMapper with Rails. While it's primarily designed for relational data sources, it can work well if you use it flexibly, especially if you're still on an older version of Rails.
00:24:01.760
Thank you all for your attention, and I hope you feel more informed about using Ruby APIs with NoSQL. Let's continue to explore how best to implement these strategies in our applications.