00:00:09.360
Okay, so to start out, I just want to ask really quickly how many of you have actually used Cassandra in an app somewhere, even just playing with it?
00:00:12.480
We’ve got a couple here. How many of you have used it with Rails? One or two, okay.
00:00:15.679
Let’s see if this works. So, a little bit about me real quick: I’ve been doing Ruby and Rails since 2005.
00:00:21.119
Back then, I was doing tech support. I couldn't get my boss to buy us a tool, so I started building one and realized quickly that management wasn’t as fun as coding, so I switched.
00:00:25.679
I went freelance last year, and some of you may listen to some of the podcasts that I do, like 'Teach Me to Code,' 'Ruby Rogues,' and 'Rails Coach.' Those are a few of the things that I do, and I like to play with that kind of technology.
00:00:29.039
Before I started preparing this talk, I had a client who came to me and said, 'I want a Twitter clone.' I looked at him and said, 'You know, Twitter isn’t making any money, so this probably isn’t a great idea.'
00:00:35.920
However, he explained to me that he had a unique selling proposition; he wanted some of the functionality that Twitter offers but didn’t want Twitter itself. I figured that it was probably something that wouldn’t kill him, and he might actually be able to make it work.
00:00:43.439
He had some interesting ways of advertising on the site, so I said, 'Go ahead, I’ll do it for you.' He offered to pay me a substantial amount of money for it. A few months later, his brother-in-law, one of the founders of Dentrix, which is dental software, told me that Twitter was using this NoSQL solution to handle all of its tweets.
00:00:58.320
His brother-in-law insisted that he wanted a NoSQL solution right away, so I said, 'Okay.' I was apprehensive but agreed to go ahead with it.
00:01:10.720
As I was learning to integrate Cassandra into this Rails app, I thought, 'I might as well talk about it.' A few months ago, right after I submitted this talk, he approached me and said he wanted to get the project into beta so he could start getting feedback, and I told him that some things would need to be cut from the plan.
00:01:30.720
He suggested cutting the conversion to Cassandra, which I found amusing. This conversation made me think that I might just build my own Twitter clone. How many freelancers here have time for a large project like that? I didn’t see any hands.
00:02:08.000
I've started working on a semi-functional prototype, but it’s not complete enough to demonstrate here. Speaking of Cassandra, there are a few hands raised. Most of you know it’s a NoSQL solution. Initially, I was confused by people discussing it as a column-oriented database in contrast to row-oriented databases.
00:03:01.840
We’ll discuss the schema in a bit, but generally, 'column-oriented' describes how you think about the data, as rows that each hold their own set of named columns, rather than the structure itself. Cassandra was started at Facebook, which open-sourced it in 2008, and since then it has been an Apache Software Foundation project, which has led to rapid development.
00:03:29.360
Cassandra’s design is usually explained with the CAP theorem, which states that a distributed system can only maintain two of three guarantees: availability, consistency, and partition tolerance. Availability means your client can always connect and get a response, while consistency means that the same query sent to different nodes yields the same result each time. Partition tolerance means the cluster keeps working even when the network between machines breaks and some nodes can’t reach each other.
00:04:58.799
Cassandra typically emphasizes availability and partition tolerance rather than full consistency. Why would you use Cassandra over a relational database? While some argue relational databases are obsolete, I believe in choosing the right tool for the right problem.
00:05:16.000
Cassandra’s benefits show up in large deployments, such as Twitter, which needs to handle billions of tweets. It excels at write-heavy workloads and works well in geographically distributed setups. If your schema is constantly evolving, Cassandra is a good fit, because the columns in a row don’t have to be defined ahead of time.
00:06:50.320
In Cassandra, the top-level structure is a keyspace, which you can think of as a hash. Inside the keyspace are column families, which are also like hashes: each one maps row keys to rows. The keyspace is also where replication is configured.
00:07:01.679
Each row in a column family is a record, and its columns are name-value pairs. Queries go through the row key, just like a hash lookup, and you can only look up data by one key at a time.
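A minimal sketch of the model just described, using plain Ruby hashes in place of a real cluster. The column family, row keys, and column names are made up for illustration; nothing here talks to Cassandra.

```ruby
# Conceptual model only: nested hashes standing in for Cassandra structures.
# Every name below (Users, user-1, screen_name, ...) is hypothetical.
def example_keyspace
  {
    "Users" => {                      # column family
      "user-1" => {                   # row key
        "screen_name" => "alice",     # column name => column value
        "location"    => "somewhere"
      },
      "user-2" => {
        "screen_name" => "bob"        # rows need not share the same columns
      }
    }
  }
end

# A lookup goes through exactly one row key; unknown keys yield an empty row.
def fetch_row(keyspace, column_family, key)
  keyspace.fetch(column_family).fetch(key, {})
end

row = fetch_row(example_keyspace, "Users", "user-1")
puts row["screen_name"]
```

The point of the sketch is the shape: keyspace, then column family, then row key, then columns, with no way in except the row key.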
00:08:01.040
It’s common to create an entire column family for one query; for instance, if I want all tweets from user X, I’ll set up a column family where the key is the user identifier. Keep in mind that the sort order is fixed when the column family is defined, not chosen at query time, so it’s not uncommon to end up with a separate column family for each query you need.
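The "one column family per query" idea can be sketched with two in-memory hashes: one holding tweets by tweet id, and a second that exists only to answer "all tweets from user X". The method names, the timestamp-as-column-name layout, and the data are assumptions for this sketch, not the schema from the talk.

```ruby
# Hypothetical stand-ins for two column families:
#   tweets        : tweet_id => tweet columns
#   user_timeline : user_id  => { timestamp => tweet_id }
def new_timeline_store
  { tweets: {}, user_timeline: Hash.new { |h, user_id| h[user_id] = {} } }
end

# Write the tweet once, then write a reference to it into the column family
# that is keyed by user. The duplication is deliberate.
def post_tweet(store, tweet_id:, user_id:, body:, at:)
  store[:tweets][tweet_id] = { "user_id" => user_id, "body" => body }
  store[:user_timeline][user_id][at] = tweet_id
end

# One key lookup answers the query. Columns are returned sorted by name
# (the timestamp), imitating an ordering fixed on the column family.
def tweets_for(store, user_id)
  store[:user_timeline][user_id]
    .sort_by { |at, _tweet_id| at }
    .map { |_at, tweet_id| store[:tweets][tweet_id]["body"] }
end

store = new_timeline_store
post_tweet(store, tweet_id: "t2", user_id: "x", body: "second", at: 200)
post_tweet(store, tweet_id: "t1", user_id: "x", body: "first",  at: 100)
puts tweets_for(store, "x").inspect
```

Every write costs an extra insert, which is the trade being made: more work at write time so the read is a single key lookup.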
00:09:05.840
CRUD operations are simpler through the Cassandra gem than through the CLI. The gem takes care of serialization, so you aren’t handling raw byte arrays yourself. The operations are get, multi-get, and remove, plus insert, which performs both create and update.
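To make the "insert is both create and update" point concrete, here is a small in-memory stand-in. It is not the Cassandra gem: the class name and method signatures are assumptions that borrow the operation names mentioned above so the behavior can be run without a cluster.

```ruby
# In-memory stand-in, NOT the cassandra gem. Signatures are assumptions.
class FakeColumnStore
  def initialize
    @families = Hash.new { |h, cf| h[cf] = {} }
  end

  # Create and update in one call: new columns are merged into the row and
  # columns with the same name are overwritten.
  def insert(column_family, key, columns)
    row = (@families[column_family][key] ||= {})
    row.merge!(columns)
    nil
  end

  # Returns a copy of the row, or an empty hash when the key is unknown.
  def get(column_family, key)
    @families[column_family].fetch(key, {}).dup
  end

  # Fetches several rows in one call, keyed by row key.
  def multi_get(column_family, keys)
    keys.each_with_object({}) { |key, out| out[key] = get(column_family, key) }
  end

  # Deletes the whole row.
  def remove(column_family, key)
    @families[column_family].delete(key)
    nil
  end
end

store = FakeColumnStore.new
store.insert(:Users, "1", { "screen_name" => "alice" })   # create
store.insert(:Users, "1", { "location" => "somewhere" })  # update, same call
puts store.get(:Users, "1").inspect
```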
00:10:27.360
When scaling with Cassandra, leverage multiple machines in your cluster and set a replication factor that determines how many copies of your data should exist. This replication provides reliability, allowing continued querying even if one node goes down.
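A simplified sketch of what a replication factor buys you. It hashes a row key to a starting node and places copies on the next nodes around a ring. Real Cassandra placement uses tokens and a configurable strategy, so treat the function names and the modulo placement here as assumptions made for illustration.

```ruby
require "digest"

# Choose which nodes hold copies of a row: hash the key to a starting
# position, then take the next `replication_factor` nodes around the ring.
def replicas_for(key, nodes, replication_factor)
  if replication_factor > nodes.size
    raise ArgumentError, "replication factor exceeds cluster size"
  end
  start = Digest::MD5.hexdigest(key).to_i(16) % nodes.size
  Array.new(replication_factor) { |i| nodes[(start + i) % nodes.size] }
end

# With the weakest read requirement (one replica), a row stays readable
# as long as at least one of its replicas is still up.
def readable?(key, nodes, replication_factor, down_nodes)
  (replicas_for(key, nodes, replication_factor) - down_nodes).any?
end

nodes = %w[node-a node-b node-c node-d]
owners = replicas_for("user-1", nodes, 3)
puts readable?("user-1", nodes, 3, [owners.first])
```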
00:11:42.080
You can also tune your consistency levels based on how critical the data is. For instance, if you want strong consistency, you might require acknowledgments from three nodes before a request counts as successful, but this may slow down your reads because of the extra checks.
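The arithmetic behind that trade-off is short enough to write down. A sketch, with helper names invented here: a quorum is a majority of the replicas, and a read is guaranteed to see the latest successful write when the replicas read plus the replicas written add up to more than the replication factor.

```ruby
# Number of replicas that make up a majority for a given replication factor.
def quorum(replication_factor)
  replication_factor / 2 + 1
end

# True when every read must touch at least one replica that took the
# latest successful write.
def overlapping?(read_acks, write_acks, replication_factor)
  read_acks + write_acks > replication_factor
end

rf = 3
puts overlapping?(quorum(rf), quorum(rf), rf)  # quorum reads + quorum writes
puts overlapping?(1, 1, rf)                    # fastest, weakest setting
```

With three replicas, waiting on all three for reads is the slowest option; quorum on both sides gives the same overlap guarantee while tolerating one node being down.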
00:13:02.720
The Ruby ecosystem offers several gems to help you interact with Cassandra, including the Cassandra gem which has a somewhat complex API but is functional. I’ve built an ORM on top of it to streamline interactions.
00:14:21.680
Active Model is a clean choice when working with this structure. The ORM I created follows the Active Record pattern but operates within Cassandra’s constraints, so you get find, save, and delete operations with familiar syntax.
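As a rough idea of what an Active Record style wrapper over a key-based store can look like, here is a plain Ruby sketch. It is not the ORM from the talk and does not use Active Model; the class names, the in-memory STORE constant, and the method set are all assumptions.

```ruby
# Hypothetical base class: an Active Record style API over a hash that
# stands in for the cluster. One subclass maps to one column family.
class ColumnRecord
  STORE = Hash.new { |h, cf| h[cf] = {} }

  attr_reader :key, :attributes

  # The column family name defaults to the class name.
  def self.column_family
    name
  end

  # Lookup is by row key only; returns nil when the row does not exist.
  def self.find(key)
    columns = STORE[column_family][key]
    columns && new(key, columns)
  end

  def initialize(key, attributes = {})
    @key = key
    @attributes = attributes.dup
  end

  # Save merges columns into the row, so it serves as create and update.
  def save
    row = (STORE[self.class.column_family][key] ||= {})
    row.merge!(attributes)
    true
  end

  def destroy
    STORE[self.class.column_family].delete(key)
    true
  end
end

class Tweet < ColumnRecord; end

Tweet.new("t1", { "body" => "hello" }).save
puts Tweet.find("t1").attributes["body"]
```

Note what is missing compared with a relational ORM: there is no `where`, because the store underneath can only be asked for a row by its key.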
00:15:46.799
As for migrations, they are simple since you don’t have to handle fields like you would in a relational database. It’s primarily about creating and modifying keyspaces and column families, which can be done easily.
00:17:17.440
So far, I have found that the approach has worked well for building a clean API that resembles what you'd expect when using Rails. I also want to maintain consistency in how users interact with their data models.
00:18:23.360
Remember that due to Cassandra’s architecture, automatic detection of data types and the need for orderly entries mean careful planning is necessary when designing APIs. It’s crucial to keep a visual representation clear for when users are inputting data.
00:19:49.520
The flexibility of NoSQL lies in its schema-less advantages, which allow for dynamic changes—like adding new attributes without requiring changes to existing structures. This agility proves beneficial in quickly evolving environments.
00:20:56.480
Several projects now exist that showcase how Cassandra can be integrated within Ruby ecosystems, and I encourage exploration of those. It's an excellent learning opportunity to design code that interacts effectively with database systems.
00:22:18.200
I'm happy to answer any questions, particularly regarding Cassandra or its integration with Ruby and Rails. Your inquiries and feedback about practical experience are very welcome.
00:23:09.440
Regarding secondary indexes, an index only covers rows that actually have the indexed column. If a row doesn’t have that column, it doesn’t show up in the index. Cassandra lets you store arbitrary columns, but indexing them takes the right setup and an understanding of your data.
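The behavior described in that answer can be imitated in a few lines of Ruby. This is a conceptual sketch over plain hashes with a made-up function name, not how Cassandra builds its indexes internally.

```ruby
# Build an index from column value to row keys. Rows that lack the column
# are never added, so they cannot be found through the index.
def build_index(rows, column)
  index = Hash.new { |h, value| h[value] = [] }
  rows.each do |key, columns|
    index[columns[column]] << key if columns.key?(column)
  end
  index
end

rows = {
  "user-1" => { "state" => "UT", "screen_name" => "alice" },
  "user-2" => { "screen_name" => "bob" },   # no "state" column
  "user-3" => { "state" => "UT" }
}
puts build_index(rows, "state").inspect
```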
00:24:13.520
Your keys in Cassandra can be various types, including non-string types. This versatility allows for rich data handling, and the automatic sorting of keys simplifies data retrieval.
00:25:00.400
When setting up your architecture, consider how you define and manage your column families to maintain the integrity and performance capabilities of your application.
00:26:23.440
If you have further questions about setting up or optimizing your implementation of Cassandra, please let me know. Collaboration and discussion help us all grow stronger in our coding journeys.