RailsConf 2013

Data Storage: NoSQL Toasters and a Cloud of Kitchen Sinks

By Casey Rosenthal

What is the best data storage option for your application? We have an abundance of conventional wisdom when it comes to building applications on top of a relational database in the Ruby world. Building an application on top of a NoSQL database is a different story. I will present a conceptual framework for understanding Access Patterns that jives with properties of databases, then review common NoSQL databases and propose considerations for choosing one over another. I will also review some uncommon NoSQL databases that address common use cases, and suggest that perhaps some of these should be used more often. Most importantly, I will describe the different state of mind that you should have when building applications on top of these NoSQL options, and provide visualization of non-relational concerns like: fault tolerance, availability, consistency, capacity planning, and horizontal vs vertical scaling. Whether or not you choose a NoSQL option for a future project, you won't look at data storage options in the same way after this presentation. ;-)



00:00:16.400 There's a little bit of confusion about the schedule, so my talk was moved here. This is my
00:00:22.320 talk: it's "Data Storage," subtitled "NoSQL Toasters and a Cloud of Kitchen
00:00:28.080 Sinks." I'm Casey Rosenthal,
00:00:33.920 and I work for a company called Basho. Basho makes a distributed key-value
00:00:40.480 database called Riak. It's very ops-friendly, and they have a product called Riak CS, which is kind of
00:00:45.760 like private S3. So there's a joke in the NoSQL world that
00:00:52.640 goes something like this. It's a bad joke, I'll preface it with that. "Do you have a toaster?"
00:00:58.480 "Yeah, I have a toaster." Everybody has a toaster, right? "Does your toaster run SQL?"
00:01:03.600 "No, my toaster doesn't run SQL." "Oh, so you must have one of those newfangled NoSQL toasters."
00:01:11.119 Here's a newfangled NoSQL toaster. Now, the joke's bad because it's absurd, and it's absurd
00:01:17.520 because this term NoSQL is absurd, right? We're in an industry where, literally, it's software:
00:01:24.159 we can compute anything given enough time and resources, and yet we define this part of the industry,
00:01:29.920 NoSQL, by the one small thing that it can't do, which just happens to leave everything else that's possible. So if
00:01:37.200 you're in my field, if you're in NoSQL, this is kind of a problem. But you know, this is
00:01:43.040 how we're known; we're all known under the NoSQL umbrella, so we have to embrace that. We have to
00:01:48.240 find a way to embrace the term NoSQL. How do we do that? Well, first we recognize it's just
00:01:54.000 a term, and then we imbue it with our own meaning. So for us, NoSQL means
00:02:01.360 choices. We want it to be an analog for choices in how you store your data. For the past few decades,
00:02:09.520 prior to the NoSQL emergence, as software engineers
00:02:14.879 we essentially had two ways to store data: on a file system or in a relational database.
00:02:20.319 And what's exciting about working in this world is that every year, every couple of months, a new
00:02:25.520 NoSQL database comes out, and new applications are built in ways that access data in
00:02:31.280 different ways. And so as software engineers we now have more tools to solve more problems, to build new
00:02:39.040 solutions. So why NoSQL? This is going to be the
00:02:46.560 overarching theme of my talk: why NoSQL? Why would you want to look at NoSQL for any reason other than
00:02:54.400 curiosity's sake? I'm going to break the talk up into
00:02:59.760 basically three parts: who are the NoSQL players, what kinds of problems is
00:03:05.280 NoSQL solving, and how do we build applications on top of NoSQL databases? Obviously I only have 40
00:03:13.040 minutes, so I'm just going to be scratching the surface on all of these.
00:03:21.040 And I want everyone to be aware that I do work for a NoSQL company,
00:03:26.080 Basho, which makes a NoSQL product, Riak, so obviously I have a certain bias.
00:03:33.519 It would be really easy to stand up here and start flame wars by criticizing other NoSQL databases, and since I work
00:03:39.680 with Riak, I'm very aware of criticisms about Riak. In order to avoid
00:03:45.599 those flame wars, as fun as they might be for some people, I'm going to
00:03:50.640 attempt to only say positive things about NoSQL databases. Okay, so in this primer to the NoSQL
00:03:57.040 world, I'm going to do my best to only say positive things about other databases. If you want a critical
00:04:03.920 analysis of some of the options up here, talk to me after the talk or in another forum.
00:04:12.640 This is a chart that 451 Research put out. It's the NoSQL LinkedIn Skills Index: so
00:04:18.720 people on LinkedIn have their profiles, they can say what database skills they have, and then
00:04:24.240 451 summed those up. So going from left to right we can see the most popular NoSQL
00:04:31.360 databases, self-described by the engineers who allegedly have skills in them. That's a little bit hard to see, so I'll
00:04:37.520 zoom in and we'll take the first half of the chart. So over here we have MongoDB, Redis, Cassandra,
00:04:44.479 HBase, CouchDB, MarkLogic, Neo4j, Riak, Couchbase, and DynamoDB.
00:04:53.600 I'm going to very quickly go over these ten databases. Obviously I can't get
00:05:00.720 into any depth with any of the ten, and somebody somewhere once said something
00:05:06.960 about constraints giving you freedom, or something to that effect, so I'm going to constrain myself to a single slide
00:05:12.639 taken from GitHub for a Ruby client for each of these databases, to prove to you that, yes, people are
00:05:18.080 using these databases, and I'll briefly describe some of their properties.
00:05:26.400 So if we go in order: MongoDB. Again, I don't expect you to learn how to use MongoDB here;
00:05:33.280 all you're seeing from this example is that, yes, people write Ruby apps on top of MongoDB. So MongoDB is a document
00:05:41.360 database storage engine. A document is a data structure that you can think of as
00:05:46.880 kind of like a tree, where each node is a key-value pair and the value might
00:05:53.840 just be a set of other branches, right? So you've got this branching structure.
00:05:59.280 Most of us (this is the Rails world, right) are familiar with documents via the DOM,
00:06:06.880 or regular old HTML: nested elements, all of that stuff.
00:06:12.639 That's a document. MongoDB stores its documents in a binary-
00:06:18.000 safe version of JSON, so the documents look very similar to JSON.
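To make that concrete, here's a minimal Ruby sketch using the current `mongo` gem API (not the client shown on the slide; the database, collection, and field names are made up for illustration):

```ruby
require 'mongo'

# Connect and grab a collection (names are invented for this example).
client = Mongo::Client.new(['127.0.0.1:27017'], database: 'railsconf')
talks  = client[:talks]

# A document is a nested, JSON-like structure; MongoDB stores it as BSON.
talks.insert_one(
  title:   'Data Storage',
  speaker: { name: 'Casey Rosenthal', company: 'Basho' },
  tags:    %w[nosql data-storage]
)

# Query on a nested field, just like walking down the tree.
talks.find('speaker.company' => 'Basho').each { |doc| puts doc['title'] }
```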
00:06:24.479 The next one on that list was Redis. Redis is an in-memory store; it keeps stuff just in RAM.
00:06:31.440 It's a key-value store. It has a couple of other data types, but primarily it's known as a key-value store,
00:06:36.800 and it can persist to disk, but again, it serves requests out of memory.
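A minimal sketch with the `redis` gem (the key names are made up):

```ruby
require 'redis'

redis = Redis.new(host: '127.0.0.1', port: 6379)

# Plain key-value, served out of RAM.
redis.set('talk:nosql-toasters:title', 'Data Storage')
puts redis.get('talk:nosql-toasters:title')

# A couple of the other data types Redis is known for.
redis.incr('talk:nosql-toasters:views')                 # counter
redis.sadd('talk:nosql-toasters:tags', %w[nosql ruby])  # set
```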
00:06:42.880 Cassandra. Cassandra falls in the column-family group of NoSQL databases,
00:06:50.400 inspired by Bigtable. These NoSQL databases don't have the relational
00:06:57.680 structure of tables, columns, and rows; they have a different structure of column families, columns, and
00:07:04.560 records. And the implication here is that these store the data on disk
00:07:10.960 differently than a relational database would. So in a relational database, for example,
00:07:16.319 it would be really awkward to have a data model where you have a table with a lot of columns
00:07:22.160 and only one value set for a given record, right? Sparse data like that is kind of
00:07:28.319 inefficient in most relational databases. Sparse data happens to be very efficient in a column-family
00:07:36.160 storage paradigm. HBase, the next one on here, is also a column-family
00:07:44.479 NoSQL database, although on the back end it distributes the data much differently. You can access the data
00:07:51.599 in a paradigm kind of similar to Cassandra,
00:07:57.039 and some descriptions I've seen of HBase describe it as a large hash map.
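To give a feel for that access pattern, here's a hedged sketch using the `cassandra-driver` gem; CQL puts a table-like face on the column-family storage underneath, and the `metrics.page_views` keyspace and table here are assumptions invented for the example:

```ruby
require 'cassandra'

# Assumes something like:
#   CREATE TABLE metrics.page_views (
#     page text, day text, views int, PRIMARY KEY (page, day));
cluster = Cassandra.cluster(hosts: ['127.0.0.1'])
session = cluster.connect('metrics')

# Rows only occupy space for the columns you actually set, so wide,
# sparse rows are cheap.
session.execute(
  'INSERT INTO page_views (page, day, views) VALUES (?, ?, ?)',
  arguments: ['/talks/nosql-toasters', '2013-04-29', 1]
)

rows = session.execute('SELECT day, views FROM page_views WHERE page = ?',
                       arguments: ['/talks/nosql-toasters'])
rows.each { |row| puts "#{row['day']}: #{row['views']}" }
```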
00:08:04.400 CouchDB. CouchDB is another document database. It stores
00:08:11.120 its documents in JSON, and you can use MapReduce to build different views of those documents.
00:08:18.479 MarkLogic: yet another document database. This one stores its documents in XML.
00:08:26.639 Neo4j. This is the first graph database on this list. So if you've
00:08:32.000 ever tried to model a tree structure or a hierarchy in a relational
00:08:38.000 database, something where you would want to recurse through records to get the full
00:08:44.480 structure out if it's properly normalized, you might have seen that in a relational database that's kind of a pain.
00:08:50.320 Well, graph databases are specifically designed to make those kinds of relationships easy and efficient
00:08:56.959 to navigate, right? Self-joining a table in SQL would probably not be a good idea;
00:09:02.160 you'd run into resource limits pretty quickly that way, depending on how deep you want to
00:09:08.560 pull out that relationship. But graph databases are specifically designed to traverse
00:09:14.240 those kinds of data structures efficiently. Riak is a distributed key-value
00:09:22.480 store. Like the other key-value stores, Riak has no
00:09:28.320 intrinsic knowledge about the value that it stores: you could store JSON in there, or XML, or binaries, it doesn't really
00:09:35.920 care. Couchbase you can kind of think of as a
00:09:41.600 mashup between an in-memory key-value store and a persisted JSON document store.
00:09:49.600 And Amazon DynamoDB: this is yet another
00:09:55.200 distributed key-value database. Okay, so there's the top ten.
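For the distributed key-value stores in that list, reads and writes really are just "the value at this key." A minimal sketch with the `riak-client` gem (the bucket and key names are invented; Riak doesn't care that the value here happens to be JSON):

```ruby
require 'riak'

client = Riak::Client.new(nodes: [{ host: '127.0.0.1', pb_port: 8087 }])
bucket = client.bucket('talks')

# Store an opaque value under a key.
obj = bucket.get_or_new('nosql-toasters')
obj.data = { 'title' => 'Data Storage', 'year' => 2013 }
obj.store

# Fetch it straight back by key.
puts bucket.get('nosql-toasters').data['title']
```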
00:10:00.399 So in the back of your mind you can now say you're basically familiar with
00:10:07.279 the scene. What I want you to get out of that is: if you haven't seen these names before, now you have, and maybe
00:10:12.720 when a use case comes up that could make use of one of these, you can go, "Okay, I kind of know
00:10:18.240 something about what that does." Or just go ahead and put NoSQL on your
00:10:23.600 LinkedIn profile.
00:10:29.440 The burden of being popular: so one of the reasons that NoSQL emerged is because SQL
00:10:37.360 was maybe too popular; it was being used in places where it wasn't an ideal fit. So I don't want
00:10:43.680 to leave you with just those top ten, right? There are dozens and dozens, if not
00:10:49.360 hundreds, of NoSQL databases that solve different problems in unique ways, and a lot of those benefits you kind of
00:10:56.720 just have to be exposed to to really get them. So I'm just going to pick three
00:11:01.920 that I happened to be looking at this past month, just to illustrate the depth of
00:11:07.680 solutions that NoSQL databases can address. The first is Titan. This is another graph database, like Neo4j.
00:11:14.880 What's interesting to me about this one is that it's distributed, so if you have more data than fits on
00:11:20.320 one machine, this might be an option. It's eventually
00:11:25.600 consistent. It's still in alpha; I think it's at 0.3.
00:11:31.839 To prove it's real: it's a Java app, you can install it via Maven
00:11:37.600 or whatever. And it uses Gremlin, which is a query language for
00:11:43.600 graph databases that Neo4j also supports.
00:11:50.639 Titan actually uses Cassandra or HBase
00:11:59.120 as its storage mechanism, so that's kind of interesting. Another one: eXist-db. So Titan is
00:12:05.360 relatively new; eXist-db has been around for a while, since 2001. Like MarkLogic, it stores files in XML.
00:12:12.880 What I think is cool about this is that it uses a query language called XQuery. Here's an example; it might be too
00:12:19.519 dark to see on the screen, but again, this is taken from their example page. So XQuery is a W3C standard
00:12:26.560 language, and it's kind of cool because it combines XPath with a JavaScript-like
00:12:33.360 language, with XML inside. So you can use this language to write
00:12:40.240 a query that actually outputs an HTML file, and I think that's kind of cool. So you put
00:12:46.000 the data in as XML files, and when you change those, the queries will update their views, and you can essentially serve web pages
00:12:52.160 directly from this database. Okay, so imagine you had a dynamic RSS
00:12:57.440 feed, or some sort of web page search engine that wasn't too complicated: you could use XPath to get the data out
00:13:03.920 of there and populate an RSS feed directly from the database. I thought that was kind of cool. And
00:13:10.320 Datomic. So Datomic is
00:13:16.000 really interesting. Remember, this is the Rails community, so we assume some things about our application
00:13:22.320 architecture in order to benefit from the structure of a Rails application. Datomic does something
00:13:28.160 similar: it makes a couple of assumptions, for example about your hardware and about your use case, to give you,
00:13:35.760 in my experience, a unique feature set for a database. So basically
00:13:43.279 it runs a peer inside of your application that has something like built-in caching for queries,
00:13:48.880 gives you consistent transactions, and time-based facts, as they call them, that actually
00:13:54.399 come back as native data types, like an array in Ruby. I'm not going to say too much
00:14:00.000 more about Datomic, because at 10:30 Thursday morning Yoko Harada is going to
00:14:06.720 be talking about Datomic and Ruby, so I'm looking forward to that talk. But again, with this I just wanted to
00:14:12.959 illustrate that, okay, we've got our top ten NoSQL databases, and obviously those aren't
00:14:20.240 even close to as popular as the SQL solutions we're used to, but this field is large, it's exciting,
00:14:26.720 it's got energy in it. In Rails, right, in Rails
00:14:32.959 4.0, Rails 5.0, where it's a mature framework, we're not going to get any huge surprises
00:14:38.240 from future versions of Rails. But in the NoSQL world we are going to get new solutions that
00:14:46.000 solve problems in ways that we haven't thought of before. So, getting back to why NoSQL:
00:14:56.000 one reason you might want to choose or investigate a NoSQL solution is fault tolerance. I
00:15:02.720 think of fault tolerance as the optimistic view that bad stuff is going to happen,
00:15:10.320 right? The optimistic view that bad stuff is going to happen: the bearings on your hard disk are going to break, your SSD is going to break,
00:15:16.800 the server's going to die, a network cable is going to come unplugged. This stuff happens. So if
00:15:22.959 you have an application, a business use case, where your data storage
00:15:29.040 has to be fault tolerant, then you might want to look at a NoSQL solution that is specifically designed
00:15:34.639 from the ground up to provide fault tolerance. If you try to build fault tolerance on top of
00:15:41.040 the relational databases that we're used to, it can be done to a degree, but it's kind
00:15:46.399 of a pain, right? So some databases will automatically
00:15:52.959 distribute the data for you to multiple nodes, so that you have multiple copies of it. They'll automatically route around
00:16:01.040 the drunk puppies in your cluster, and when you get a fresh node in your
00:16:07.680 cluster, they'll automatically heal the data by sending the data to that node. Fault tolerance is a really important
00:16:15.440 attribute to have in a data store, and so if your use case calls for it, you
00:16:20.639 definitely want to start with a database that's specifically designed to meet that criterion.
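None of the following is how any particular database implements it, but as a toy sketch of the idea in plain Ruby: write every value to several nodes, and route reads around whichever replicas are currently broken. A real system also hands the missed writes back to a node once it recovers (the "healing" mentioned above), which this toy leaves out.

```ruby
# Toy replicated store: `nodes` is anything responding to #put(k, v) / #get(k).
class ToyReplicatedStore
  def initialize(nodes, replicas: 3)
    @nodes    = nodes
    @replicas = replicas
  end

  def put(key, value)
    replica_nodes(key).each do |node|
      node.put(key, value) rescue nil   # a dead node just misses this write
    end
  end

  def get(key)
    replica_nodes(key).each do |node|
      begin
        return node.get(key)            # first healthy replica wins
      rescue
        next                            # route around the broken node
      end
    end
    nil
  end

  private

  # Naive placement: pick @replicas consecutive nodes for each key.
  def replica_nodes(key)
    start = key.hash.abs % @nodes.size
    Array.new(@replicas) { |i| @nodes[(start + i) % @nodes.size] }
  end
end
```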
00:16:26.639 Let's take a look at another one: high availability. In distributed databases we love to do
00:16:32.480 things in parallel, so here is a parallel cluster of napping
00:16:37.839 puppies. We'll call this the napping-puppy database. I won't trademark that, so if any of you
00:16:44.160 want to go build this, that's cool. All right, so if we store a value into
00:16:49.920 our napping-puppy cluster, say we're storing "orange," and then
00:16:55.040 we ask for that value back from a different puppy, as long as the cluster is all talking to each other we'll get
00:17:00.240 the answer that we expect: orange. But puppies don't always talk to each other. So say we have a
00:17:06.480 division there, right? This could be a network partition,
00:17:12.160 it could be just a node in your cluster that's down; it doesn't matter. All you know is that from one part of your distributed
00:17:18.079 network you can no longer communicate with the other part of your distributed network. So what do you do? On a highly available
00:17:24.640 system, in the part that you can connect to, you can still save a value. So you can
00:17:29.840 save this value as "yellow," and when you retrieve it from that side, you'll get yellow.
00:17:35.039 And you can't see this bottom part, right? But from the other side, if you can see
00:17:40.160 the bottom part but not the top part, you can go ahead and save the same value as "blue," and when you read it back out you
00:17:47.280 get that value. But notice that if this happened at the same time, depending on where you were connected, you're getting a
00:17:54.240 different answer. In a lot of use cases, a lot more use cases than I
00:18:00.799 originally thought before I was in the NoSQL arena, this is acceptable; in fact, it's the preferred
00:18:06.400 way for an application to respond if you're only connected to a subset of a distributed
00:18:12.799 network. A lot of times it's more important to get a response, any response, than the "correct"
00:18:20.799 response, whatever that means, right? So in anything web-related, this is
00:18:27.360 usually the response that we want. Usually we want a web page, even if it's outdated, even if it doesn't agree with
00:18:33.280 the same web page that somebody would view from the other side of the country.
00:18:38.320 And then of course, when that partition goes away, a highly available data store will have
00:18:43.679 some method to allow you to resolve that value. In this case
00:18:48.720 we took yellow and blue and got green. That's one property. So if
00:18:55.039 you have a use case for high availability, again, that's difficult to do with a SQL
00:19:00.320 database, with a relational database, but some of the NoSQL databases are specifically designed for this.
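What "resolve that value" looks like is up to the application. Here's a toy sketch in plain Ruby of merging divergent copies (Riak, for example, calls them siblings); the strategy shown, union for set-like values and otherwise last write wins, is just one reasonable choice, and the slide's yellow plus blue equals green is the whimsical version of the same step:

```ruby
# Each side of the partition kept its own copy, with a timestamp.
side_a = { value: 'yellow', written_at: Time.now - 5 }
side_b = { value: 'blue',   written_at: Time.now }

def resolve_siblings(siblings)
  return siblings.first if siblings.size == 1

  if siblings.all? { |s| s[:value].is_a?(Array) }
    # Mergeable values (e.g. sets): take the union of everything seen.
    { value:      siblings.flat_map { |s| s[:value] }.uniq,
      written_at: siblings.map { |s| s[:written_at] }.max }
  else
    # Otherwise fall back to the newest write.
    siblings.max_by { |s| s[:written_at] }
  end
end

puts resolve_siblings([side_a, side_b])[:value]   # => "blue"
```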
00:19:06.559 So here's another one. Here's a flock-of-penguins database, right? And these guys stick together; I
00:19:11.679 can't draw a line between them. So in some cases you want strict consistency
00:19:17.520 over a cluster, over a large cluster. Again, if you do this in a traditional
00:19:22.559 relational database, you're going to be dealing with locking or some other mechanism that's going to affect performance.
00:19:29.200 So with consistency, we write to this guy, and then when we go to fetch that data,
00:19:34.880 whatever answer they give, they will all give the same answer. The state of the data
00:19:42.000 at any given time is identical in a strongly consistent database. So some NoSQL databases were
00:19:48.400 specifically designed to do this: to have strong consistency over a distributed
00:19:54.160 data set. That's not an easy problem to solve, and it's certainly not easy to do with
00:20:00.320 the relational tools that we have. And scale, everybody's favorite NoSQL
00:20:08.080 topic: scale. So, a word about scale. You can scale throughput, like operations per second.
00:20:17.440 You can scale storage, the amount of disk space that you're storing stuff on. And latency: scaling latency is probably more
00:20:24.480 important. If you can do a million operations per second on average, but it takes
00:20:29.520 a minute from when the query starts to when the query ends, that's not going to be acceptable for a web page, right? So latency
00:20:36.240 is actually a really important thing to consider when you're looking at scaling. Scaling takes two forms.
00:20:43.520 Vertical: you put something on a bigger machine, and this was formerly the typical way, the
00:20:49.360 prototypical way, to scale a SQL database, right? You just put it on a bigger box. Horizontal means you distribute the
00:20:55.520 workload among many machines, and this is very appealing on commodity hardware.
00:21:02.159 It can be done on a relational database, you can use tricks like sharding, but
00:21:07.600 then there have to be certain things about your application
00:21:12.799 logic that can't change in order for that transition to go smoothly.
00:21:18.960 Some of the NoSQL databases that scale take the strategy of using
00:21:25.520 separate servers that handle different tasks. So, like, one will handle the metadata and another will handle the actual storage, and so you can query a
00:21:31.919 transactor or something like that and distribute the workload among many storage nodes that way.
00:21:37.600 The other kind of paradigm for having a scalable NoSQL database is to have some sort of
00:21:42.840 logical data locality. So, like, you use a consistent hash so that,
00:21:47.919 logically, when the transaction comes in, you know which server it lands on, and that way you can just add more servers
00:21:53.600 to scale out (there's a small sketch of this idea a bit further down). But again, this is the third reason you might want to look at a NoSQL
00:21:59.840 database instead of a relational database. And just to really hammer home this scaling issue,
00:22:05.520 I came across this yesterday: 90% of the data in the world today has been created in the last two years.
00:22:11.360 We're kind of at an inflection point on an exponential curve for data storage.
00:22:16.880 Last year, sources estimate that we stored about
00:22:22.720 two to two and a half zettabytes of data. So it goes gigabyte, terabyte, petabyte,
00:22:29.440 exabyte, zettabyte: it's just an inconceivably large amount of data.
00:22:36.080 And my napkin math tells me that in 2013 we're going to store more
00:22:44.159 than that, a lot more than that, more data than has ever been stored up until 2013.
00:22:51.600 Okay, that's just the data that we're storing. There's a bigger problem, which is that we
00:22:57.840 don't have the infrastructure to store all of the data that we have a business case for storing.
00:23:03.919 So people want to store data that we simply can't, and relational databases
00:23:08.960 certainly aren't going to scale to meet that demand; NoSQL databases are struggling to scale,
00:23:14.480 to get in there, to meet that demand. If you're a software engineer or a
00:23:19.600 consultant and you have insight into how NoSQL works and how these databases scale out,
00:23:26.240 that makes you really valuable, because there are very few people who
00:23:32.320 can solve that problem. So again, why NoSQL? Well, I'm here to tell you that
00:23:38.880 there is no "ToD," no universal theory of data. You may have picked up on this by now:
00:23:44.080 there's no one database that you can go to and say, okay, this is it, and there's no universal
00:23:49.440 theory above that where you can say, okay, this is the application that I have, now which NoSQL database is the right
00:23:55.520 one for me? There's no answer to that yet. It's not yet a solved problem,
00:24:01.679 so we have to go on experience and intuition. And again, as a software engineer, if you
00:24:07.760 have that, that's really valuable.
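Before moving on, here's the small sketch promised earlier of the consistent-hashing idea behind that kind of horizontal scaling, in plain Ruby: each key hashes to a predictable point on a ring of nodes, so adding a node only moves a fraction of the keys. Real databases layer replication and rebalancing on top of this.

```ruby
require 'digest'

class ToyRing
  def initialize(nodes, vnodes: 64)
    @ring = {}
    nodes.each { |node| add(node, vnodes) }
  end

  # Each node claims several points ("virtual nodes") on the ring.
  def add(node, vnodes = 64)
    vnodes.times { |i| @ring[point("#{node}/#{i}")] = node }
    @points = @ring.keys.sort
  end

  # Walk clockwise from the key's hash to the next node point.
  def node_for(key)
    h = point(key)
    owner = @points.find { |p| p >= h } || @points.first
    @ring[owner]
  end

  private

  def point(str)
    Digest::SHA1.hexdigest(str)[0, 8].to_i(16)
  end
end

ring = ToyRing.new(%w[node1 node2 node3])
puts ring.node_for('user:42')   # the same key always lands on the same node
```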
00:24:12.960 But there are access patterns. So, in trying to figure out how we work with
00:24:20.559 NoSQL databases (I'm the director of professional services at Basho, so we go on site with big clients
00:24:28.000 and we help them figure out how to install this infrastructure, to store these mounds of data that
00:24:34.640 people are creating, that businesses are creating now), we need some sort of framework
00:24:40.480 for deciding, okay, what is the best tool for the job? And we started by developing this
00:24:47.200 framework around access patterns, and I'm going to give you a little bit of a taste of access patterns. So say we have
00:24:53.760 a scale: one side is scheduled, the other spontaneous. We're looking at the query patterns
00:25:00.640 between an application and a data store. So on one side, scheduled, you can think
00:25:05.760 of something on a cron job, or a case where you're able to negotiate with the stakeholder and say, "I'm going to run this report
00:25:11.840 at 4 a.m. on Sunday." On the other side of the scale you have
00:25:17.120 spontaneous. The quintessential example here is a website, right? You don't have
00:25:22.960 explicit control over when people visit your website, and that might trigger a query, right?
00:25:29.200 Scheduled versus spontaneous. Here's another scale: static versus dynamic.
00:25:38.320 On the static side, think of a key-value store. It's like, "I want this," and the database just
00:25:44.559 gives it right back to you. If you can trace exactly the path
00:25:49.600 that the code follows to retrieve the data that you want, that's pretty static. On the other side we have dynamic:
00:25:55.200 anything that requires a query planner is dynamic. If you can't,
00:26:01.200 before the query is issued, know with certainty how it's going to go through
00:26:06.480 the stack to retrieve the data, then it's dynamic. Okay, so that's static and dynamic. Put these two together and you get a nice
00:26:12.640 grid: along the top we have static and dynamic, and on the left side we have spontaneous and scheduled. And it turns out that if we can
00:26:20.080 fit an application, its access pattern, into one of these quadrants,
00:26:25.279 then that helps us begin to think about what NoSQL solutions or relational solutions, what data
00:26:31.679 storage solutions, are appropriate for that access pattern. So if we look at the first one: well, pretty much every database worth its salt
00:26:38.720 can handle this, right? A static fetch, scheduled: it's very easy to plan for,
00:26:44.320 it's very easy to control for the resources you would need for it. Spontaneous and static: not everyone can
00:26:49.600 do this. Databases that scale well can do this better than databases that don't.
00:26:56.240 Key-value stores are static, so a key-value store that scales well is going to be really good at
00:27:01.840 spontaneous static queries. It gets a little bit more difficult with dynamic and scheduled. Okay, so some databases
00:27:08.320 have ways to dig into the data a little bit deeper: MapReduce, or
00:27:16.559 CQL, or SQL, or one of the other query languages. And if it's scheduled, again, you can
00:27:22.320 control for resources. The problem is this quadrant, where you don't know when that query is
00:27:28.159 going to be generated and you don't know how that query is going to be executed. Now, having grown up in a relational
00:27:33.520 world, we already kind of sort of have best practices ingrained, so we know we're not supposed to join the table to itself a
00:27:40.399 million times, right? That's going to use a lot of resources; stuff's going to crash.
00:27:46.559 Most databases are happy in one of those three quadrants; they have some method
00:27:52.000 of reasonably handling those three, and they're not so good down here in the spontaneous-dynamic
00:27:57.840 realm. So what do we do when an application is down in this
00:28:04.000 realm, in this lower right-hand quadrant? As application developers, that happens to be where we usually
00:28:11.200 start thinking about our application in terms of its data storage, so that's kind of an
00:28:16.640 unfortunate mismatch. So when we come in and say, okay, we want
00:28:22.159 to fix that, what do we do? Well, we try to take the use case and move it over from dynamic to static.
00:28:32.399 How do we do that? Well, we want to "evolute" how we deal with data. Right, we don't want to just evolve,
00:28:38.480 we don't want to just randomly do stuff and see what survives;
00:28:43.520 we want to proactively evolute ourselves. I totally made up that word. So
00:28:50.000 we want to actively go: okay, this is how we originally thought about the
00:28:55.919 application, in terms of, you know, it's got some dynamic query access pattern. Can we make that access
00:29:03.919 pattern static? Because if we can, then we can move it over to one of the NoSQL databases that handles scaling a
00:29:09.919 lot better, and we can still solve that problem. And it's a different mindset; it's
00:29:16.080 a change in how you view software engineering. So I'm going to describe
00:29:21.120 a prototypical relational scenario. Feel free to disagree with me, but this was how I was raised.
00:29:30.000 In SQL, if you have a relational data model and you're looking at a new application,
00:29:36.240 the first thing you figure out is: what data do I have to store? And you take that data and you break it
00:29:41.360 down into a data model. You normalize it, you make sure all of the relationships are correct,
00:29:46.640 and there's a lot of best practice for that. And as long as you model the data
00:29:52.080 properly, then when you go to build the application on top of it, the view,
00:29:57.840 you have a fair degree of confidence, a fair degree of certainty, that you'll
00:30:03.919 be able to query what you need out of that solid data model. So: data model first, and then worry about
00:30:11.600 querying later, but don't worry about it too much, because if you do the first part well, then the second part,
00:30:17.039 you have some confidence, will work out. When dealing with NoSQL, particularly
00:30:22.320 some of the simpler data structures like key-value, you want to flip that. You want to start with: what do I want this data
00:30:28.640 to look like? And if you solve that well, then the data model actually just falls out.
00:30:35.279 So you solve this part well: this is the report I want, this is the web page I want, this is the view of the data that I
00:30:41.840 want. Then how to store it in the back end just
00:30:46.960 falls out, because you want it to look exactly the way you want to fetch it.
00:30:53.760 Okay? And then you're not interested in normalizing data, because that just takes extra time.
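As a tiny sketch of that flip, in plain Ruby (a Hash stands in for whatever key-value store you would actually use, and the profile-page fields are invented): decide the shape of the page first, store exactly that shape under a predictable key, and the data model is simply that.

```ruby
require 'json'

kv = {}   # stand-in for Riak, DynamoDB, etc.

def profile_key(user_id)
  "profile-page:#{user_id}"
end

# At write time, build the view you actually want to serve...
kv[profile_key(42)] = JSON.generate(
  'name'           => 'Casey',
  'recent_talks'   => ['Data Storage'],
  'follower_count' => 1024
)

# ...so the read side is one static fetch, with no query planner involved.
page = JSON.parse(kv[profile_key(42)])
puts page['name']
```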
00:30:59.200 For most applications that I've seen, it's an exercise, but it's a reasonable
00:31:06.720 and rational one, to take what you thought was a dynamic access pattern and convert it into a
00:31:12.000 static one, and then you get the benefits of having
00:31:17.120 that spontaneous-static solution backing your formerly dynamic access
00:31:24.720 pattern. An example of this: say you have a report, say you're looking at users and
00:31:30.320 logs or something, and you want to know how many users hit this web page. Well, we all know how to do that in SQL:
00:31:35.919 you would just COUNT on a column, or average, or whatever statistic you want. Well, how would you do
00:31:42.559 that in a key-value database? You can't do it that way. So instead, every time a log entry is written
00:31:49.679 out to the solution, keep a rolling average, keep a rolling count, do it in real time.
00:31:55.519 You've just distributed your writes across time, and so, particularly in a
00:32:01.600 read-heavy application, you've just saved yourself a lot of processing time. You've saved all of the query planning,
00:32:07.440 because now you don't have any query planning; you just fetch the answer. It doesn't get any faster than that, and
00:32:13.120 that kind of architecture is really easy to scale.
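A minimal sketch of that rolling count, using the `redis` gem only because its counters keep the example short; any key-value store with an increment operation works the same way, and the key names are made up:

```ruby
require 'date'
require 'redis'

redis = Redis.new

# At write time: every page view bumps a counter (total and per-day).
def record_page_view(redis, page)
  redis.incr("views:#{page}")
  redis.incr("views:#{page}:#{Date.today}")
end

record_page_view(redis, '/pricing')

# At read time: the former COUNT(*) report is now a single static fetch.
puts redis.get('views:/pricing')
```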
00:32:19.360 If that fails, go with a hybrid solution. I've seen some really awesome hybrid
00:32:24.559 solutions, right? One of my favorite hybrid solutions is, like,
00:32:29.679 PostgreSQL handling the metadata, and it spits out a key, and then you use the key to fetch a value out of one
00:32:36.399 of the key-value stores, like Riak.
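A hedged sketch of that hybrid, assuming the `pg` and `riak-client` gems and a `videos` table and bucket invented for the example: the relational side answers the query and hands back a key, and the key-value side returns the large value.

```ruby
require 'pg'
require 'riak'

pg   = PG.connect(dbname: 'app')
riak = Riak::Client.new

# Step 1: queryable metadata lives in PostgreSQL and yields a key...
row = pg.exec_params(
  'SELECT blob_key FROM videos WHERE slug = $1 LIMIT 1',
  ['nosql-toasters']
).first

# Step 2: ...and that key fetches the actual value from the key-value store.
blob = riak.bucket('videos').get(row['blob_key']).raw_data if row
```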
00:32:43.440 Okay, but again, there is no universal theory of data right now,
00:32:49.600 so this is where experience and intuition really have to trump any heuristic,
00:32:56.000 because we don't have a heuristic. So if you want to be particularly valuable
00:33:03.120 in software engineering right now, gaining experience with these different NoSQL databases
00:33:08.799 is a really good way to do that. And even though there's no universal theory of
00:33:14.000 data, as software engineers, I promise you, we can still find harmony with all these
00:33:19.519 different species of databases. It can be a little bit confusing,
00:33:26.559 but it is certainly doable. Okay, and hopefully
00:33:31.600 I have time for some questions, although I'm not sure what the format for that is. That's my talk. I'm Casey Rosenthal.