
SQL to NoSQL to NewSQL and the rise of polyglot persistence

Paul Dix

The last ten years have brought many new developments in databases. Previously developers had SQL as the dominant and nearly only paradigm for databases. Then in the mid-aughts the rise of NoSQL databases like MongoDB, Redis, Cassandra, HBase and others brought new paradigms and options to developers. Over the last few years there seems to have been a swing back to NewSQL or scalable databases that support the SQL standard. In this talk we'll look at some of the new database models like document, data structure, time series, and key/value. I'll look at use cases where these different models end up being a better fit for their problem domains than SQL, the previous one true language to rule them all.

GoRuCo 2017

00:00:16.650 So my talk is titled 'SQL, NoSQL, NewSQL, and the Rise of Polyglot Persistence.' Let's turn this guy on.
00:00:27.609 Alright, I got the slot right before lunch. I'm sure you're all ready for food and you don't want to listen to me speak. But just wait if you hate me.
00:00:39.850 Before I get into the talk, I just want to give you a bit of information about me, kind of like my perspective so you can see where I'm coming from. I'm the CTO and co-founder of InfluxData. We make an open-source time series database called InfluxDB. It's written in Go, and it has a query language that's somewhat reminiscent of SQL.
00:00:52.000 I've been thinking a lot about the query language and how I can improve it lately, and that's kind of the inspiration for this talk. I'm also an author; in 2010, I wrote a book titled 'Service-Oriented Design with Ruby and Rails.' So even though I haven't been a Ruby programmer for quite a few years now, I want you to know that I'm with you; I'm one of you. So you can embrace me as a brother.
00:01:16.030 As Luke mentioned, I've spoken at GoRuCo quite a few times. This is actually me ten years ago at the very first GoRuCo, where I was extraordinarily nervous presenting something that I barely knew anything about.
00:01:41.800 Now, onto the core thesis of my talk: the dominance of SQL and the relational data model in the database world is over. SQL will stay important, and even dominant in some areas, at least for a while, but for decades the assumption was that if you were going to use a database, it would be a relational database. My thesis is that that time is gone, and we've actually been in a new era for a while.
00:02:12.730 When you think about SQL, it's a domain-specific language. It's an API for working with data. It's one API, but there are many ways to design APIs. In the past ten years, we've seen the rise of other ways to work with data, and my thesis is that this multi-paradigm approach involving NoSQL and all these different models is here to stay.
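To make the "SQL is one API among many" point concrete, here is a small Python sketch (not from the talk) that asks the same question, total order amount per customer, in two ways: once in SQL against an in-memory SQLite database, and once through an ordinary host-language function. The table, the sample data and the `total_by_customer` function are hypothetical illustrations.

```python
import sqlite3

# Hypothetical sample data shared by both approaches.
orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 20},
    {"customer": "alice", "amount": 50},
]

# 1) The question expressed declaratively in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (:customer, :amount)", orders)
sql_totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall())

# 2) The same question expressed through a plain function API.
def total_by_customer(rows):
    """Sum `amount` per `customer`, returned in customer order."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return dict(sorted(totals.items()))

api_totals = total_by_customer(orders)
print(sql_totals)                # {'alice': 80, 'bob': 20}
print(api_totals == sql_totals)  # True
```

Both produce the same answer; the difference is only in the interface used to express the question, which is the sense in which a query language is an API.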
00:02:31.310 We have many programming languages for many different kinds of tasks, so we should have many query languages—not just SQL. The thinking in the database community has gone something like this: SQL was an excellent tool, and we all relied on it for everything until around 2006-2007 when people realized it wouldn't scale. So, we introduced NoSQL to constrain our problem set and sacrifice some power for scalability.
00:02:54.200 Then came the idea of Not Only SQL, which acknowledged that while SQL is great, there are many other considerations. NewSQL emerged, suggesting that we don't have to compromise on query language or access patterns; we can layer SQL on top of distributed databases.
00:03:30.200 It’s here I find myself disagreeing with the NewSQL crowd. Many believe that NoSQL approaches are inferior and merely a temporary solution until technology catches up with SQL. I think that if you obsess over SQL alone, it will end badly for you. There are other approaches to working with data, and that is what I mean by polyglot persistence. SQL is not the end state; it’s not the only acceptable way to query and manage your data.
00:03:58.010 In the beginning of the NoSQL movement, everything was tied to scalability. People were fascinated with distributed systems and the number of requests per second they could handle. However, I contend that scalability is the least interesting aspect of NoSQL. Most people do not face scalability problems. The main focus of NoSQL is really programmer productivity. This is something that the Ruby community can connect with deeply. When you picked up Rails, it wasn't necessarily because it was the fastest framework; it was because you could build applications faster than in any other environment.
00:04:46.800 That’s why I think NoSQL will continue to grow in popularity over time. Query languages are APIs for working with data, and we need tools that are effective for different use cases and access patterns. This talk will give you a bit of database history and some thoughts on query languages and APIs. I'll make some hand-wavy arguments and show examples of what I think works better.
00:05:51.060 So, let’s start with SQL. This is where our database journey begins. Although there were databases before, like hierarchical databases, I think of the SQL journey in this timeframe: 1970 to 1986. Now, 1986 was a long time ago, but bear with me; this was a crucial period for SQL's development.
00:06:11.220 In 1970, a computer scientist at IBM, Edgar F. Codd, wrote a foundational paper titled 'A Relational Model of Data for Large Shared Data Banks.' This paper was the precursor to all relational databases. Codd laid out the relational model and established the relational algebra that all SQL databases are founded upon.
00:06:27.440 In the 70s, IBM started working on a prototype called System R. Interestingly, Codd wasn't involved in the project; it was quite political. System R was only a prototype; it went to a few companies but was never commercially available. At this point, relational databases were still largely an academic concept.
00:06:54.090 However, in 1979, a company named Relational Software released the first commercially available relational database to use a language called SQL. That language had originally been called SEQUEL, but it was renamed SQL because of trademark issues. The database itself was called Oracle V2. Some of you may recognize that name; the company later renamed itself Oracle.
00:07:18.000 Now, this is Larry Ellison, the co-founder and longtime CEO of Oracle, who has appeared on Forbes' list of the world's richest people. I did an image search for him and stumbled across a picture of him brandishing a pistol because, apparently, if you are the lord of databases, that’s what you do. Fun fact: Oracle’s headquarters is in the Bay Area, and very close to it is a small airport called San Carlos Airport.
00:07:46.350 This airport has the code SQL. Before I researched this talk, I thought it got that code because of its proximity to Oracle, but the information I found said the airport code SQL dates from 1977, before Oracle released its first SQL database in 1979.
00:08:21.020 Moving on, in 1979, IBM released System/38, the first commercially available IBM system with a relational database. Interestingly, it wasn't just software; it was a piece of hardware with a lot of software built in. It had a rather bizarre design.
00:08:41.900 In 1981, IBM released the software package SQL/DS, and then, in 1982, the DB2 name emerged. These products, and the ANSI SQL standard that followed in 1986, set the stage for the SQL we recognize today; that first standard was SQL-86.
00:09:12.789 It’s essential to note that SQL's dominance took time; it didn't happen overnight. SQL wasn’t just handed down from on high by Lord Ellison for everyone to use. One competing language at the time was QUEL, developed at Berkeley in the 70s for the Ingres database, which also influenced SQL's development.
00:10:02.200 Around 1994, Postgres switched from its QUEL-derived query language to SQL, recognizing SQL's growing popularity. Over the years, the SQL standard went through numerous revisions that added new capabilities.
00:10:36.200 Now, while SQL has evolved, it’s worth noting that although there is a standard, no two databases implement it the same way. Each SQL variant (MySQL, PostgreSQL, SQL Server, Oracle) has its own syntax differences. This mismatch between SQL dialects is part of why libraries like Active Record exist: they give you a single Ruby interface regardless of the underlying database.
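A familiar example of dialect drift is limiting a result to the first N rows: MySQL, PostgreSQL and SQLite use `LIMIT`, SQL Server uses `SELECT TOP`, and Oracle 12c and later accept the standard `FETCH FIRST ... ROWS ONLY` form. The Python sketch below shows, in miniature, the kind of translation an ORM adapter layer performs. The `first_n_rows` function and its dialect names are hypothetical and do not reflect Active Record's actual internals; only the SQLite variant is executed, using Python's bundled `sqlite3` module.

```python
import sqlite3

def first_n_rows(table: str, order_by: str, n: int, dialect: str) -> str:
    """Build a 'first n rows' query for one SQL dialect (hypothetical helper).

    Identifiers are interpolated directly, so table and column names must be
    trusted constants; never pass user input here.
    """
    if dialect in ("postgresql", "mysql", "sqlite"):
        return f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP {n} * FROM {table} ORDER BY {order_by}"
    if dialect == "oracle":
        # Standard row-limiting clause, supported by Oracle 12c and later.
        return f"SELECT * FROM {table} ORDER BY {order_by} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"unknown dialect: {dialect}")

for d in ("postgresql", "sqlserver", "oracle"):
    print(f"{d:>10}: {first_n_rows('users', 'id', 5, d)}")

# Run the SQLite flavor for real against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("a",), ("b",), ("c",)])
rows = conn.execute(first_n_rows("users", "id", 2, "sqlite")).fetchall()
print(rows)  # [(1, 'a'), (2, 'b')]
```

The application asks one question ("give me the first N users"); the adapter decides which dialect-specific SQL to send.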
00:11:40.490 Turning to NoSQL, the original draw for many NoSQL projects was scalability. Back in 2006, Google published a research paper on BigTable, which introduced a storage model that was simple and scalable but lacked a rich query language. That simple design appealed to companies operating at very large scale.
00:12:30.780 In 2007, Amazon published the Dynamo paper, describing a highly available key-value store designed to scale out incrementally by adding nodes. In 2008, Facebook open-sourced Cassandra, which combined Dynamo's distribution design with a BigTable-style data model, further popularizing the NoSQL trend.
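The Dynamo paper partitions keys with consistent hashing: nodes are placed at many points ("virtual nodes") on a hash ring, and each key is stored on the first N distinct nodes clockwise from the key's hash, its preference list. The Python sketch below illustrates that idea under stated assumptions. The `HashRing` class, the choice of MD5 truncated to 64 bits and the default of 64 virtual nodes per server are all illustrative choices, not Dynamo's or Cassandra's actual code.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=64):
        # Place each physical node at `vnodes` pseudo-random points on the ring.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s: str) -> int:
        # First 8 bytes of MD5 as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def preference_list(self, key: str, n: int = 3) -> list:
        """Return up to n distinct nodes found walking clockwise from the key."""
        start = bisect.bisect(self._points, self._hash(key))
        result = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("user:42"))  # three distinct node names
```

The design choice that matters for "scale by adding nodes" is that adding a server only claims the ring segments just before its own virtual nodes, so a key's primary owner either stays the same or moves to the new server; nothing else is reshuffled.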
00:13:01.110 NoSQL also became associated with a simpler developer experience. This led to the notion of 'Not Only SQL': the argument that while SQL was good, there were alternatives we could embrace. Two significant entries into the NoSQL space were MongoDB in 2007 and Redis in 2009, both of which showed how flexible data access could be outside the SQL paradigm.
00:14:09.110 The idea behind NewSQL emerged from the realization that NoSQL stores lacked many features SQL databases offered. Companies like NuoDB and VoltDB were created to fill this gap, aiming to maintain compatibility with SQL while being able to operate at the scale offered by NoSQL technologies.
00:15:13.320 Looking across the database community, I think there is huge potential for productivity where familiarity meets new functionality. Incremental improvements matter, but breaking out of established paradigms will be increasingly necessary to address the diverse challenges in data management.
00:16:44.100 Take time series as an example. InfluxDB is optimized for querying time series data, with built-in capabilities for analyzing continuous streams of data. Shifting from set-based queries to functional, time-oriented queries can fundamentally change how we think about working with data.
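A typical time series question is "average this measurement over fixed time windows within a time range", which InfluxQL expresses with `MEAN()` and `GROUP BY time(...)`. The Python sketch below writes the same computation as a small functional pipeline: filter by time range, group into windows, then aggregate. The function names `range_filter`, `window` and `aggregate_mean`, the `(timestamp, value)` point shape and the sample CPU data are hypothetical; this is not InfluxDB's API or implementation.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Iterator

Point = tuple[int, float]  # (unix timestamp in seconds, value)

def range_filter(points: Iterable[Point], start: int, stop: int) -> Iterator[Point]:
    """Keep points with start <= timestamp < stop."""
    return (p for p in points if start <= p[0] < stop)

def window(points: Iterable[Point], every: int) -> dict[int, list[float]]:
    """Group values into fixed windows aligned to multiples of `every` seconds."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % every].append(value)
    return buckets

def aggregate_mean(buckets: dict[int, list[float]]) -> list[Point]:
    """Average each window and return (window_start, mean) in time order."""
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]

cpu = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (130, 5.0)]
result = aggregate_mean(window(range_filter(cpu, 0, 120), every=60))
print(result)  # [(0, 15.0), (60, 40.0)]
```

Each stage takes a stream and returns a new one, so stages compose left to right in the order you think about the problem, rather than being packed into one set-based statement.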
00:18:20.890 Another example is GraphQL, which I see as an emerging form of polyglot persistence. It ties various APIs and data sources together behind a single query language, giving clients a flexible way to work with diverse data.
00:19:18.520 Reflecting on the evolution of database technology, from the relational revolution to today's immense data challenges, we must remember that every task has unique requirements. That means exploring our options beyond SQL if we want long-term success.
00:20:23.330 At its heart, polyglot persistence is about adaptability: choosing the data model and access pattern that fit each application's needs and each type of data. More and more, our challenges will revolve around manipulating and extracting insight from larger and increasingly complex sets of data.
00:21:10.950 So, I urge you all to embrace the polyglot database mindset. With the pace of change in our data landscape, recognizing these emerging paradigms will only enhance our ability to create meaningful applications. Thank you!