Ruby APIs for NoSQL

by Sarah Mei

In "Ruby APIs for NoSQL," delivered at GoGaRuCo 2010, Sarah Mei discusses how Ruby developers can access and manage multiple NoSQL data stores. Noting that applications keep growing more complex, Mei introduces the concept of 'polyglot persistence', where a project incorporates several data stores beyond a traditional SQL database.

Key points covered in the video include:
- Audience Engagement: Mei starts by gauging audience experience with various data storage solutions, illustrating the growing adoption of NoSQL.
- Traditional Rails Application Diagram: The talk begins with a simple Rails architecture and argues that most real applications evolve to incorporate multiple data stores.
- Complexity of Modern Applications: As applications become more complex, developers often integrate NoSQL data stores like Solr for search, Redis for caching, and S3 for file storage, which leads to polyglot persistence.
- Encapsulation Challenges: Mei highlights the difficulty of unifying data models when data is scattered across multiple stores and the implications this has for Rails applications.
- ActiveModel Introduction: The advent of ActiveModel in Rails 3 is discussed as a means to better manage validations and lifecycle callbacks separately from data persistence, fostering a clearer architecture.
- Combining SQL and NoSQL: Mei explains how developers can use SQL and NoSQL together effectively and discusses the challenges of integrating these technologies.

Examples and Illustrations: Mei walks through the development of a 'cephalopod social network', showing how classes and data structures may evolve to incorporate multiple storage systems—like integrating Redis for a friend graph and using S3 for storing user uploads.

Main Takeaways:
- Applications often require multiple data stores to accommodate various data needs.
- Ruby developers should embrace a polyglot persistence approach, utilizing both SQL and NoSQL databases based on the project’s requirements.
- A cohesive model that allows interaction between different data stores is essential to application architecture, and upgrading to Rails 3 can help with this transition.
- Ultimately, understanding the strengths and weaknesses of different data stores, along with being open to utilizing new technologies, positions developers for success in current development environments.

00:00:11.200 Hello, everyone. My name is Sarah Mei, and I'm a Ruby developer at Pivotal Labs in San Francisco. We do a lot of Rails development, and today, we're going to see some Rails code in these slides.
00:00:15.360 I'm here to talk to you about Ruby APIs for NoSQL. I also like to call this talk 'Polyglot Persistence.' We'll discuss how to store data in Ruby when you're writing a system that uses more than just a relational database.
00:00:27.119 But I'd like to start out with a little bit of audience participation. Let's see a show of hands: who has written an application in Ruby that uses a relational database? That's great to see!
00:00:41.280 Now, who has written a Ruby app that uses a relational database along with some kind of alternative data store? For example, Memcached or a file system? And how about those who have written a Ruby app that uses only non-relational storage? Wow! That's a lot more than I thought.
00:00:58.960 It's really awesome to see that NoSQL is becoming more popular. When I started asking this question six months ago, I would get just one or two hands in the room. Now, we're looking at a much more significant number, which shows how far we've come.
00:01:33.680 So, I want to start off by showing you a diagram you’ve probably seen a hundred times before. This is a very vanilla Rails application where requests come in, go through the routes to the controller, which interacts with the models. The models talk to the database through ActiveRecord, retrieve data from SQL, and then formulate a response back through views.
00:02:15.760 A friend of mine pointed out that if you show a diagram like this, you're legally obligated to have a little diagram of a cloud somewhere. So, I added one! But let me emphasize that it's unlikely anyone in this room has written an application with real users where the system diagram looks like that.
00:02:30.960 In particular, I would be surprised if anyone here has written one where MySQL is the only means by which data is persisted. Usually, you don't set out to create a polyglot persistence system; it just kind of happens over time as you introduce more complexity.
00:03:07.920 You start with your nice little Rails app, and then the product owner suggests, for example, that we need to incorporate free-text search. Next, you may add Solr. Soon after, you might decide to add S3 for file storage, and before you know it, you've got multiple data stores, perhaps even Redis for caching.
00:03:36.880 By that point, you could find yourself with four or five different data stores, and you haven't even tackled anything overly complicated. In fact, I would argue that this setup is actually more common than the base Rails application that I initially presented.
00:04:10.960 In recent years, we've recognized that most applications deal with data that doesn't fit neatly into relational persistence. The reality is that relationships can become quite complex when modeled in SQL, and large blocks of text often need to be searched semantically instead.
00:04:51.840 In fact, many of the alternative data stores we now utilize—like Cassandra and Redis—were once considered unconventional. Until recently, many developers didn't even think of options like Memcached or S3 as legitimate data stores.
00:05:36.400 However, the NoSQL movement has reshaped our understanding of what constitutes a data store. In the 90s, you might have had someone persist data to XML text files, which really exemplifies early NoSQL thinking.
00:06:41.920 But this presents a challenge: how do we encapsulate this information into a cohesive class when the data model is scattered across different storage systems? That's one question to consider.
00:07:13.600 Furthermore, as applications grow, there’s the need to replace the primary data store with something else, which has historically not been easy in Rails. So, how do we unify a model that encompasses data from multiple stores while maintaining functionality?
00:07:56.720 To sum up this section: the integration of multiple databases is a reality in building even simple applications. This concept is referred to as 'polyglot persistence,' and it's not a new idea.
00:08:01.280 I think it's important for Ruby developers to broaden their perspective beyond just MySQL and PostgreSQL. In reality, applications will likely use an assortment of technologies, whether you plan for it or not. To illustrate, I decided my simple application would be a 'cephalopod social network,' mainly because I like the word cephalopod.
00:09:36.640 To frame our development process, we start with a basic 'Squid' class that inherits from ActiveRecord::Base for all the familiar relational functionality. However, as we expand to include free-text search, we integrate Solr using the Sunspot gem, adding a 'searchable' block that declares the indexed attributes.
00:10:45.600 Because it’s a web application, we must incorporate a friend graph. Thus, we decide to use Redis to store our denormalized list of friends. This might not be the first solution someone proposes, especially when you consider the overhead of multiple joins.
00:11:38.080 In essence, we create a method that utilizes the Redis gem to add users to a Redis key space. Then, since this social network centers around the cephalopods' literary creations, we let users upload their novels to S3.
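The friend-graph method described here can be sketched roughly as follows. This is a minimal illustration, not the code from the slides: the `FakeRedis` class, the `befriend` method, and the `squid:<id>:friends` key format are all assumptions made for the example. `FakeRedis` is an in-memory stand-in that implements only the set commands used below, so the sketch runs without a server; the redis gem's client exposes commands with the same names (`sadd`, `smembers`, `sismember`).

```ruby
require "set"

# In-memory stand-in for a Redis client (an assumption for this sketch).
# Only the three set commands used below are implemented.
class FakeRedis
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, member)
    @sets[key].add(member.to_s) # Redis stores members as strings
  end

  def smembers(key)
    @sets[key].to_a
  end

  def sismember(key, member)
    @sets[key].include?(member.to_s)
  end
end

# Hypothetical Squid model; only the friend-graph part is shown.
class Squid
  attr_reader :id

  def initialize(id, redis)
    @id = id
    @redis = redis
  end

  # The friendship is stored twice (denormalized), once per direction,
  # so reading either squid's friends is a single set lookup, not a join.
  def befriend(other)
    @redis.sadd(friends_key, other.id)
    @redis.sadd(other.friends_key, id)
  end

  def friend_ids
    @redis.smembers(friends_key).map(&:to_i)
  end

  def friends_with?(other)
    @redis.sismember(friends_key, other.id)
  end

  def friends_key
    "squid:#{id}:friends"
  end
end
```

Because the client is passed in rather than created inside the class, the same `Squid` code could be pointed at a real Redis connection in production and at the stand-in in tests.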
00:12:55.440 Now, this Squid class is becoming increasingly complicated, with multiple responsibilities: it handles ActiveRecord persistence along with Redis and S3 functionality. Ideally, we would benefit from a uniform interface across all these data stores.
00:14:02.880 At first, I thought integrating everything through ActiveRecord might be the answer, as many Rails developers associate ActiveRecord directly with models. However, it quickly became apparent that ActiveRecord is designed specifically for relational databases.
00:15:24.000 The introduction of ActiveModel in Rails 3 is a significant shift. It separated validations and lifecycle callbacks from the persistence layer, allowing for greater flexibility. This separation encourages a clean architecture where models can communicate using a consistent API while being decoupled from ActiveRecord itself.
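The separation described here, with validations kept apart from persistence, can be illustrated in plain Ruby. The sketch below is not ActiveModel's API; it only shows the idea. `NameValidation`, `MemoryStore`, `Octopus`, and the `write(key, attributes)` interface are all hypothetical names chosen for the example.

```ruby
# Validation rules live in a module that knows nothing about storage.
# Plain-Ruby illustration of the idea only; this is not ActiveModel's API.
module NameValidation
  def errors
    @errors ||= []
  end

  def valid?
    errors.clear
    errors << "name can't be blank" if name.to_s.strip.empty?
    errors.empty?
  end
end

# Hypothetical store: anything responding to write(key, attributes) would
# do, whether it wraps SQL, Redis, or a document database.
class MemoryStore
  attr_reader :rows

  def initialize
    @rows = {}
  end

  def write(key, attributes)
    @rows[key] = attributes
  end
end

class Octopus
  include NameValidation

  attr_reader :id, :name

  def initialize(id:, name:, store:)
    @id = id
    @name = name
    @store = store
  end

  # Persistence is delegated to the injected store and only runs once
  # the storage-independent validations pass.
  def save
    return false unless valid?

    @store.write("octopus:#{id}", { name: name })
    true
  end
end
```

With this shape, swapping the backing store means passing a different object to `Octopus.new`; the validation code does not change.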
00:16:51.600 However, there are still challenges with having multiple data stores within a single model. The question remains, how do we encapsulate a cohesive model when each piece of data is handled by a different mechanism?
00:17:36.800 In conclusion, we are beginning to see how applications can leverage both SQL and NoSQL databases effectively. If you do more than just experiment with these technologies, you'll notice they can coexist within your application architecture.
00:18:29.440 If you have more than 100 users, chances are, you'll end up using an assortment of data stores. The key is to find ways for them to interact harmoniously, and to do that best, I recommend upgrading to Rails 3.
00:19:37.760 I'm happy to take any questions you may have!
00:19:58.800 Great question on testing with multiple data stores. I usually break the functionality down into modules that I can mix in, then create shared examples that can be included in each test for the model using that module.
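The module-plus-shared-examples approach from this answer might look roughly like the sketch below. It uses plain Ruby rather than a test framework so that it runs on its own; in RSpec the `friend_graph_failures` helper would typically be a shared example group included in each model's spec. The module, the two classes, and the helper are all hypothetical, and a `Set` stands in for the external store.

```ruby
require "set"

# Hypothetical mixin holding the friend-graph behaviour. An in-memory Set
# stands in for the external store so the sketch runs on its own.
module FriendGraph
  def friends
    @friends ||= Set.new
  end

  def befriend(other)
    friends << other
    other.friends << self
  end

  def friends_with?(other)
    friends.include?(other)
  end
end

class Cuttlefish
  include FriendGraph
end

class Nautilus
  include FriendGraph
end

# Plays the role of a shared example group: one set of checks, run against
# every class that mixes in FriendGraph. Returns the failure messages.
def friend_graph_failures(klass)
  a = klass.new
  b = klass.new
  failures = []
  failures << "#{klass}: should start with no friends" unless a.friends.empty?
  a.befriend(b)
  mutual = a.friends_with?(b) && b.friends_with?(a)
  failures << "#{klass}: befriend should be mutual" unless mutual
  failures
end

failures = [Cuttlefish, Nautilus].flat_map { |klass| friend_graph_failures(klass) }
puts failures.empty? ? "all friend-graph checks passed" : failures
```

The checks are written once against the module's behaviour, so each additional model that includes `FriendGraph` only needs to be added to the list.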
00:20:57.280 I've seen some issues with libraries trying to adapt to a relational mindset while being fundamentally different; it's important to embrace the unique aspects of each data store.
00:21:38.640 With NoSQL databases, managing consistency can be tricky, as many of them do not implement two-phase commit. Ensuring consistency often requires additional coding work and accepting some limitations.
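One common form this additional coding work can take is a compensating action: write to one store, then the other, and undo the first write by hand if the second fails. The sketch below is an illustration of that general pattern, not something shown in the talk. `FlakyStore` and `save_to_both` are hypothetical names, and the example assumes the key is new (an update would need to restore the previous value rather than delete).

```ruby
# Hypothetical store that can be told to fail, standing in for any
# external data store (search index, key-value store, and so on).
class FlakyStore
  attr_reader :data

  def initialize(fail_writes: false)
    @data = {}
    @fail_writes = fail_writes
  end

  def write(key, value)
    raise IOError, "store unavailable" if @fail_writes

    @data[key] = value
  end

  def delete(key)
    @data.delete(key)
  end
end

# Write to the primary store first, then the secondary. If the second
# write fails, undo the first one by hand (a compensating action).
# This is not atomic: a crash between the two writes would still leave
# the stores out of sync, which is one of the limitations to accept.
def save_to_both(primary, secondary, key, value)
  primary.write(key, value)
  begin
    secondary.write(key, value)
    true
  rescue IOError
    primary.delete(key) # assumes the key was new; see the note above
    false
  end
end
```

The return value tells the caller whether both stores were updated, so the application can retry or report the failure.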
00:22:32.160 As for managed third-party services for NoSQL tools, I've primarily rolled my own solutions, but I have heard good things about those services.
00:22:58.480 I appreciate your input on using DataMapper with Rails. While it's primarily designed for relational data sources, employing it flexibly can yield fruitful results if you're looking to leverage older versions of Rails.
00:24:01.760 Thank you all for your attention, and I hope you feel more informed about using Ruby APIs with NoSQL. Let's continue to explore how best to implement these strategies in our applications.