RubyConf 2022

Building Stream Processing Applications with Ruby & Meroxa

As the world moves towards real-time, there's a growing demand for building sophisticated stream processing applications. Traditionally, building these apps has involved spinning up separate task-specific tooling, learning new and unfamiliar paradigms, and deploying and operating a constellation of complex services. In this talk, we'll take a look at how to use the Turbine framework (turbine.rb) to build and deploy real-time stream processing applications using Ruby.

00:00:00.000 ready for takeoff
00:00:18.900 so welcome my name is Ali I'm going to
00:00:22.800 talk about stream processing with Ruby
00:00:25.500 and specifically turbine.rb
00:00:29.220 so here's a quick agenda essentially
00:00:31.080 this is what we're going to cover so you
00:00:32.399 know what you're getting yourself into
00:00:36.059 let me start off with a little a little
00:00:38.700 bit about myself uh who am I why you
00:00:41.100 should trust me
00:00:43.200 so this is me I'm the CTO and one of two
00:00:46.440 co-founders uh DeVaris the other
00:00:48.420 co-founder is right there
00:00:49.920 at Meroxa and uh previously before
00:00:53.700 starting Meroxa I was a lead engineer at
00:00:56.340 Heroku specifically on the Heroku Data
00:00:58.140 team mainly working on uh Heroku's Kafka
00:01:01.260 offering where my team managed thousands
00:01:05.400 of Kafka clusters for tens of thousands
00:01:07.680 of customers uh before that I built a
00:01:10.799 system at a targeted advertising company
00:01:12.299 that queried over 2 billion user
00:01:14.580 profiles uh in real time
00:01:16.799 and then way way way before that I built
00:01:19.380 analytics pipelines for mobile apps
00:01:21.780 um processing regularly over 100,000
00:01:24.740 events per second
00:01:26.700 and so basically I've been doing this
00:01:28.619 for for quite a while working in and
00:01:30.360 around the data space
00:01:34.680 so stream processing what is it and why
00:01:38.159 you should care
00:01:40.680 so specifically in stream processing
00:01:43.979 what I mean by stream processing
00:01:45.600 is really about taking an unbounded
00:01:47.720 sequence of events uh continuous
00:01:50.460 unbounded sequence of events and
00:01:52.079 applying some sort of computation or
00:01:53.700 transformation to it
00:01:55.140 I'm intentionally avoiding the term real
00:01:57.060 time
00:01:58.259 um that's generally implied but there's
00:02:00.479 no
00:02:01.380 generally accepted agreed upon
00:02:03.600 definition for real time but essentially
00:02:06.899 not batch whatever that means to you
00:02:10.979 so some examples of stream processing in
00:02:13.080 general are filtering you've got a number of
00:02:15.300 events and you want to drop some of them
00:02:16.879 enrichment you want to take each event
00:02:18.900 and you want to augment it with some
00:02:20.160 additional information
00:02:21.739 aggregation where you want to do some
00:02:23.879 sort of processing across a number of
00:02:25.200 them maybe count them sum them do
00:02:27.060 some sort of calculation there joins
00:02:30.360 typically is similar to a SQL join you
00:02:33.660 want to take two sets of data mash them
00:02:35.400 together by some common element and then
00:02:37.980 routing is kind of another one where you
00:02:40.680 want some events to go one place and
00:02:42.180 other events to go somewhere else
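In plain Ruby, with no framework involved, those operations might look something like this toy in-memory sketch (the event shape here is invented for illustration):

```ruby
events = [
  { type: "click",    user: "a@example.com", amount: 1 },
  { type: "purchase", user: "b@example.com", amount: 42 },
  { type: "purchase", user: "c@example.com", amount: 7 }
]

# filtering: drop the events you don't want
purchases = events.select { |e| e[:type] == "purchase" }

# enrichment: augment each event with additional information
enriched = purchases.map { |e| e.merge(currency: "USD") }

# aggregation: a calculation across a number of events
total = enriched.sum { |e| e[:amount] } # => 49

# a join: mash two sets of data together by a common element
names = { "b@example.com" => "Bea" }
joined = enriched.map { |e| e.merge(name: names[e[:user]]) }

# routing: some events go one place, others go somewhere else
big, small = joined.partition { |e| e[:amount] > 10 }
```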
00:02:46.379 um some some common use cases that
00:02:48.599 should be familiar to most people
00:02:51.060 um you know analytics is probably one of
00:02:52.680 the most common uh it's one of the most
00:02:54.599 common ones that we see at least and
00:02:56.640 essentially you're taking data from a
00:02:58.800 number of different sources it could be
00:03:00.180 your operational database maybe it's a
00:03:01.920 Postgres database backing your Rails
00:03:03.480 application you're taking some data from
00:03:05.640 you know support tickets in Zendesk
00:03:07.620 and maybe some CRM data from Salesforce
00:03:09.840 and you're pulling them all into a
00:03:12.120 single data warehouse where your data
00:03:14.280 scientists run some queries and sort
00:03:16.319 of derive some insight out of it
00:03:19.560 um another common use case is
00:03:20.940 replication and disaster
00:03:22.379 recovery
00:03:23.459 and so here you're continuously and
00:03:25.739 hopefully immediately pulling data from
00:03:29.099 one place and putting it into some other
00:03:31.400 region or data center or or Cloud even
00:03:35.340 um across you know geographical
00:03:37.019 distances in order to have a uh another
00:03:40.440 place that you can recover from this
00:03:42.840 could also be different database types
00:03:45.299 so maybe you're doing Postgres on RDS
00:03:48.120 in AWS and you're copying over to SQL
00:03:51.180 Server in Azure on a different
00:03:53.760 side of the country
00:03:57.480 um enrichment is another very common one
00:03:59.280 for us so essentially you're taking some
00:04:01.860 data uh maybe it's a user sign up and
00:04:05.159 you want to add some additional
00:04:06.959 information to make that data more
00:04:08.280 useful to you so maybe you look up their
00:04:10.680 email with some third-party service that
00:04:13.560 gives you a little bit more information
00:04:14.459 about them maybe the company the role or
00:04:16.799 whatever it is and then you're taking
00:04:18.299 that sort of fatter enriched record and
00:04:20.220 then you're putting it somewhere else so
00:04:21.540 you can use it maybe it's back in your
00:04:22.740 operational database maybe it's in your
00:04:24.660 your data warehouse
00:04:26.400 and then uh I've listed integration
00:04:29.040 which is a super vague general catch-all
00:04:31.560 for like everything else and essentially
00:04:34.620 taking your data and putting it
00:04:36.419 somewhere else where it can be used by
00:04:37.919 someone else
00:04:38.960 this could be third parties it could be
00:04:41.699 other teams maybe you scrub the PII out
00:04:45.540 of your stream of data and you make it
00:04:47.340 available for a partner to use
00:04:49.919 um that's that's kind of a common
00:04:51.419 example too
00:04:55.500 and so what is what is the the problem
00:04:58.560 right with stream processing right now
00:05:02.160 essentially you know everyone here I
00:05:04.259 assume loves Java it's your favorite
00:05:05.699 language uh clearly a Ruby conference must
00:05:08.880 love Java
00:05:10.259 um nothing nothing wrong with Java but
00:05:12.240 essentially if you do enough stream
00:05:14.100 processing you're going to end up with
00:05:15.360 Java somewhere Kafka is written in Java
00:05:17.639 Kafka Connect is written in Java Kafka
00:05:19.199 Streams is written in Java Pulsar is
00:05:21.000 Java Spark is Java Flink is Java Java is
00:05:23.580 everywhere and that's great if you love
00:05:25.919 Java if you don't then that kind of
00:05:28.800 sucks
00:05:30.479 um so that's sort of one major obstacle
00:05:32.880 with stream processing especially for
00:05:34.199 everyone else
00:05:36.060 and then the other sort of major part of
00:05:38.400 it is stream processing introduces a ton
00:05:40.860 of new sort of patterns and paradigms
00:05:42.660 that aren't really common elsewhere so
00:05:44.820 if you're used to building web
00:05:46.199 applications with a regular request
00:05:47.639 response cycle now you have to worry
00:05:49.800 about delivery semantics uh is it at
00:05:51.960 least once is it at most once is it
00:05:54.240 "exactly once" with scare quotes
00:05:57.440 you know ordering guarantees what are
00:06:00.600 they is it strictly ordered is it
00:06:02.580 globally ordered is some subset of it
00:06:04.740 ordered late delivery is something you
00:06:07.320 don't typically have to deal with you
00:06:09.539 might get a message seconds later or
00:06:11.820 days later or even weeks later what do
00:06:13.740 you do with that message
00:06:15.600 and then you get duplicates that's kind
00:06:18.120 of an annoying one that's pretty common
00:06:19.380 especially when the default is at least
00:06:21.539 once in many cases
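A toy sketch of the usual defense against duplicates, making processing idempotent by tracking IDs you have already handled (a real system would keep this state in Redis or a database, and the event shape is invented):

```ruby
require "set"

seen = Set.new
incoming = [{ id: 1 }, { id: 2 }, { id: 1 }] # note the duplicate delivery

incoming.each do |event|
  next if seen.include?(event[:id]) # skip events we've already processed
  puts "processing event #{event[:id]}"
  seen << event[:id]
end
```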
00:06:23.960 then
00:06:25.500 you have to think about partitions and
00:06:27.000 topics so if you work with Kafka
00:06:29.100 partitions are the scaling unit and so
00:06:31.440 you really need to get it right the
00:06:32.940 first time around because changing it
00:06:34.319 later is painful
00:06:35.880 and so these are all things that you
00:06:38.039 don't typically have to worry about and
00:06:39.840 you don't want to worry about
00:06:41.720 it's not something that you should
00:06:43.740 really care about you should just use
00:06:45.180 the tools someone else should should
00:06:47.160 worry about these things
00:06:52.020 another major part of it is where do you
00:06:54.360 deploy this stuff so if you have a
00:06:56.160 stream processing application it does
00:06:57.960 something useful now what where do you
00:07:00.419 run it and how do you maintain it and
00:07:02.400 how do you make sure that it runs
00:07:03.660 consistently performs well all the stuff
00:07:07.199 um so the easy answer is
00:07:10.080 yeah it's easy all you need to do is set
00:07:12.780 up a VPC set up your subnets IPs
00:07:16.139 configure some security groups spin up
00:07:17.940 some ec2 instances deploy kubernetes
00:07:20.520 provision Kafka create topics create
00:07:22.259 partitions wire everything up make sure
00:07:24.660 ACLs are in place
00:07:26.720 you know configure eight million
00:07:29.580 different things wire everything up and
00:07:31.740 yeah it's good that's all you need to do
00:07:33.960 so just do this thing
00:07:38.039 um yeah so it's it's not it's not easy
00:07:40.979 um if you look at some of the AWS guides
00:07:42.419 for setting up vanilla Kubernetes
00:07:44.539 it's like 60 pages
00:07:47.880 um and 10 of those pages are like create
00:07:49.740 your VPC and configure everything
00:07:51.419 correctly the first time because if you
00:07:53.880 get the subnets wrong then it really is
00:07:56.039 super painful to fix it
00:07:58.380 um so that's a big part of that is you
00:08:00.240 know once we have this thing where do we
00:08:02.160 run it and how do we make sure that it
00:08:03.479 runs consistently and performs well and
00:08:05.940 does all the things that we needed to do
00:08:09.240 so that's kind of the the problem space
00:08:10.919 that we're trying to tackle
00:08:14.400 so for us at Meroxa our answer is
00:08:17.340 basically uh Turbine and the Meroxa
00:08:19.680 data platform and so Turbine is
00:08:22.080 sort of the tool chain and the data
00:08:24.000 platform is the platform as a service
00:08:25.440 that runs the tool chain
00:08:29.759 so I'm going to dig into turbine a
00:08:31.139 little bit
00:08:31.979 um
00:08:32.640 essentially turbine is is the framework
00:08:34.500 that we work with it's actually a family
of frameworks for various languages we
00:08:39.479 started with Go and JavaScript and
00:08:41.219 Python and at this conference we're
00:08:43.320 making Turbine available for Ruby as
00:08:45.779 well
00:08:46.440 and so each turbine framework is sort of
00:08:49.500 individually handcrafted for that
00:08:51.540 particular language to follow idiomatic
00:08:54.600 practices for that language and so that
00:08:56.640 it looks familiar and you know works in
00:08:59.399 the way that you expect it to as someone
00:09:01.200 who writes Ruby day in Day Out
00:09:04.740 the other sort of main focus for for
00:09:06.720 Turbine is we've introduced an API
00:09:10.140 that exposes a high level sort of
00:09:12.540 abstraction on top of these
00:09:14.700 common things so as long as you can
00:09:17.399 assign variables call methods then you
00:09:20.279 should be able to create sort of rich
00:09:22.380 stream processing applications
00:09:25.560 uh the other sort of key part for us is
00:09:27.600 you can write custom logic in that
00:09:29.760 language and so if you're using turbine
00:09:31.740 for Ruby you can write logic in Ruby in
00:09:35.580 familiar Ruby that looks like Ruby
00:09:37.320 doesn't introduce any weird dsls or
00:09:39.180 anything it also lets you import
00:09:40.920 rubygems that you might already have or
00:09:43.080 might already exist online so you can
00:09:44.880 import those in and use them with your
00:09:46.500 your turbine app to actually help you
00:09:48.600 process these these events
00:09:55.560 so this is what it looks like so this is
00:09:57.899 a turbine app
00:09:59.880 it's obviously a very simple example
00:10:01.740 but you can kind of expand this as you
00:10:03.720 go along but it should look very
00:10:06.240 familiar uh it's very much inspired by
00:10:08.220 the Rack API and so it should look
00:10:10.380 pretty familiar to to anyone who's been
00:10:12.180 writing Ruby for for any amount of time
00:10:15.000 um essentially we expose a number of
00:10:17.100 methods that allow you to tap into a
00:10:19.800 resource in this case it's a database
00:10:22.980 resource named demo PG
00:10:25.200 and then you pull records out of a table
00:10:28.080 called events you process them with a
00:10:31.800 process called pass-through and then you
00:10:34.200 write it to the same database in a
00:10:36.480 different collection and so what you'd
00:10:38.339 expect here is you're basically creating
00:10:40.320 a very simple pipeline that pulls data
00:10:42.240 from one place processes it with the
00:10:44.339 function pass through which is actually
00:10:45.540 written below
00:10:47.480 and then writes it out into the database
00:10:51.720 and so that's turbine itself that's
00:10:53.519 really the framework that you would
00:10:54.540 write
00:10:55.500 um these data apps in
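The slide itself isn't captured in the transcript; based on the description, a minimal turbine.rb app of that shape might look roughly like the sketch below. The method names and the demo_pg resource name are reconstructions from the talk, not verbatim turbine.rb API:

```ruby
# A rough reconstruction of the app described above; method names are
# assumptions based on the talk, not verbatim turbine.rb API.
class App
  def call(app)
    db = app.resource(name: "demo_pg")              # tap into a resource
    records = db.records(collection: "events")      # pull records from a table
    processed = app.process(records, method(:passthrough))
    db.write(processed, collection: "events_copy")  # write to another collection
  end

  def passthrough(records)
    records # custom logic goes here; this one just passes records through
  end
end
```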
00:10:57.899 the other major part of the tool chain
00:11:00.060 for us is actually the platform itself
00:11:02.700 and so this is the the platform as a
00:11:05.220 service in our case and so it's a fully
00:11:07.200 managed platform as a service that's
00:11:09.899 designed to host and run turbine apps
00:11:12.899 essentially we handle the operational
00:11:14.579 burden of running this thing wiring up
00:11:17.060 monitoring of the sort of underlying
00:11:20.519 instances and components making
00:11:21.779 sure that it's healthy and it continues
00:11:23.399 to run
00:11:24.720 um a lot of the magic around
00:11:25.640 automatically figuring out how to do
00:11:28.019 things or the heavy lifting is handled
00:11:30.660 by the platform and so it'll reach out
00:11:32.579 and look at resources and figure out how
00:11:34.500 best to get data out of them
00:11:36.600 um and sort of automatically configure
00:11:38.519 these connectors and and pipeline
00:11:41.279 components to achieve that
00:11:44.820 um
00:11:45.600 and then when you actually deploy your
00:11:48.779 turbine app this custom logic that you
00:11:50.519 wrote so the pass-through function that
00:11:52.200 gets packaged up into a container and
00:11:54.120 deployed onto the platform the platform
00:11:55.980 contains a sort of serverless functions
00:11:58.500 component that's where that function
00:12:00.480 goes and it's responsible for scaling it
00:12:02.880 independently and so as you get more
00:12:05.040 events coming in it'll scale up those
00:12:07.079 functions to process more of those
00:12:08.880 events
00:12:10.680 so that's just the managed side of it
00:12:14.579 so here's a very high level architecture-y
00:12:17.160 type view of it so essentially pulling
00:12:20.160 in data from somewhere it figures out
00:12:21.779 how best to do that it puts it into a
00:12:24.959 durable store where it can rewind and
00:12:27.240 replay and kind of act as a shock
00:12:28.800 absorber it applies your turbine
00:12:31.620 function across all those events and
00:12:34.260 then whatever the results are go back
00:12:35.700 out through some connector or many
00:12:37.740 connectors into wherever the destination
00:12:40.560 resource is and so everything in the sort
00:12:43.740 of dotted box in the middle that's the
00:12:45.540 platform itself and it just handles it
00:12:46.920 for you
00:12:50.040 so I'm going to attempt a live demo
00:12:54.300 we'll see we'll see how that goes
00:12:58.079 all right
00:13:02.040 there we go
00:13:06.139 all right
00:13:22.320 all right
00:13:23.880 so here you can see
00:13:26.760 a turbine app that I wrote previously
00:13:30.180 essentially it implements that
00:13:32.279 enrichment use case so here we're
00:13:35.040 actually requiring the existing Clearbit
00:13:37.139 gem so that's the gem that exists open
00:13:39.660 source I just pulled in
00:13:41.519 we're pulling this we're using this
00:13:43.620 database called demo PG similar to the
00:13:45.839 example I included we have two types of
00:13:48.060 APIs there's the sort of chaining-based
00:13:50.220 fluent API as well as a more sort of
00:13:52.800 traditional procedural one so that's the
00:13:54.839 one I'm using here
00:13:56.639 so basically I'm saying take the records
00:13:58.920 out of a collection called events
00:14:01.019 process them using this enrich function
00:14:04.200 which I've written below and then write
00:14:06.000 out the results into events_copy
00:14:08.399 and so this is the enrich function
00:14:11.040 it's fairly contrived but actually does
00:14:13.500 something useful if you're not familiar
00:14:15.899 with Clearbit it's one of the services
00:14:17.519 where you give it some information about
00:14:20.040 typically a user and it has a database of
00:14:23.339 users and a ton of information about
00:14:25.019 them so in this case I'm forwarding the
00:14:28.200 email of a user and then it's returning
00:14:31.620 back some information like the company's
00:14:34.139 legal name for the employer and then the
00:14:37.320 location of that person
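The on-screen code isn't in the transcript, but the enrich function he describes might look roughly like this sketch, assuming the clearbit gem's Enrichment lookup and a simple hash-like record shape (the record fields are assumptions):

```ruby
require "clearbit"

Clearbit.key = ENV["CLEARBIT_KEY"]

# Look up each record's email with Clearbit and attach the employer's
# legal name and the person's location. Record shape is assumed.
def enrich(records)
  records.map do |record|
    result = Clearbit::Enrichment.find(email: record["email"], stream: true)
    if result
      record["company"]  = result[:company] && result[:company][:legalName]
      record["location"] = result[:person] && result[:person][:location]
    end
    record
  end
end
```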
00:14:40.279 and so this is the the data that I'm
00:14:43.019 actually feeding it so
00:14:44.699 turbine ships with a sort of local
00:14:46.800 development mode where you can kind of
00:14:48.360 iterate quickly and have this very fast
00:14:50.160 feedback loop
00:14:51.600 um where you can use fixture data or
00:14:53.339 sampled records to actually run it
00:14:54.720 through your pipeline and say like does
00:14:56.760 it do what I think it does or does it do
00:14:58.740 what I need it to do and you can
00:14:59.820 probably test against it and everything
00:15:01.079 and then once you're happy with that
00:15:03.120 functionality you can deploy it onto the
00:15:04.560 platform and so this is an example
00:15:06.300 record that I created
00:15:08.339 um so the actual value of the record has
00:15:11.519 an activity which is logged in and has
00:15:13.500 my email address
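The fixture file isn't shown in the transcript; a record of the shape he describes might look like this (the field names, the email placeholder, and the demo.json layout are all assumptions):

```json
{
  "activity": "logged_in",
  "email": "ali@example.com"
}
```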
00:15:14.820 and so
00:15:16.380 what I hope happens uh is when I execute
00:15:19.740 it locally it should take my email
00:15:21.440 process it through this custom function
00:15:23.459 hit the clearbit API fetch some
00:15:25.260 additional details and say this is what
00:15:27.000 would have happened had you deployed
00:15:28.500 this live
00:15:30.899 and so
00:15:33.000 we have Meroxa's CLI
00:15:37.019 so essentially this is the local
00:15:38.459 execution command and it basically
00:15:40.800 threads your record through and it shows
00:15:42.899 you what would have happened
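The exact command isn't audible in the transcript; with the Meroxa CLI, the flow looks roughly like this (commands assumed from the CLI's documented verbs):

```
$ meroxa apps run      # execute the app locally against fixture data
$ meroxa apps deploy   # later, package and deploy it to the platform
```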
00:15:45.600 um and so here it did work so you can
00:15:48.300 see that it says it fetched this record
00:15:51.240 which I showed earlier which just had my
00:15:53.699 email address
00:15:54.720 and then it augmented that enriched that
00:15:57.300 data with the company Meroxa Inc and
00:15:59.820 location San Francisco
00:16:02.399 um and that's it so essentially it did
00:16:05.100 what I thought it did now I'm happy with
00:16:06.899 it I can deploy onto the platform and
00:16:08.339 the platform will package all these
00:16:09.480 components and deploy it into a
00:16:11.399 continuously running pipeline
00:16:14.639 So yeah thank you
00:16:23.880 all right
00:16:28.440 so
00:16:29.940 what's next for Turbine and Meroxa
00:16:34.320 essentially right now turbine RB
00:16:36.959 um or turbine for Ruby we basically
00:16:40.259 recently made it it's still in a
00:16:43.139 relatively early developer preview and
00:16:45.120 we're looking for for feedback we want
00:16:46.980 people to use it we want people to to
00:16:48.600 try it out and actually tell us how to
00:16:49.980 improve it we are super focused on
00:16:52.860 developer experience and so we want to
00:16:55.320 make it great for for developers and so
00:16:58.139 yeah we want people to sign up use it
00:16:59.940 and tell us tell us what they think and
00:17:01.920 tell us how we can improve it one of the
00:17:04.020 things that was relatively
00:17:05.760 recent for us is
00:17:07.760 Ruby 3.2 introduces the idea of a value
00:17:10.919 object the Data class which
00:17:14.280 introduces sort of an immutable struct
00:17:16.740 essentially that seems like it would be
00:17:18.720 pretty good for this kind of use case
00:17:20.100 where records come into the platform as
00:17:22.919 an immutable object and you use sort of
00:17:24.600 methods defined on it to to manipulate
00:17:26.339 this so that's something I would like to
00:17:27.480 consider but again we'd love to hear
00:17:29.280 from from users and say this is what we
00:17:31.320 want or this API sucks and you should do
00:17:33.780 something else
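For reference, a sketch of how Ruby 3.2's Data class could model an immutable record (the field names are illustrative, not a committed turbine.rb design):

```ruby
# Data.define builds an immutable value object (Ruby 3.2+).
Record = Data.define(:key, :value)

record = Record.new(key: "1", value: { "email" => "a@example.com" })

# Data objects are frozen; "manipulating" one yields a new object.
updated = record.with(value: record.value.merge("company" => "Acme"))

record.value  # => {"email"=>"a@example.com"} (unchanged)
updated.value # => {"email"=>"a@example.com", "company"=>"Acme"}
```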
00:17:35.580 um another sort of major component that
00:17:37.200 we're we're working on is a native
00:17:39.419 stateful processing so stateful
00:17:41.880 processing is is kind of a big problem
00:17:43.559 space to to solve
00:17:45.600 um right now on the platform you can
00:17:48.299 Implement stateful processing but the
00:17:51.179 burden is on you to persist data
00:17:53.340 somewhere so you might have some sort of
00:17:55.679 Redis or a database or something
00:17:57.419 like that soon we hope to have that
00:17:59.880 natively built into the platform and so
00:18:02.100 you can just
00:18:03.600 magically assume that there is some
00:18:05.580 persistence available to every function
00:18:07.140 and if you write something to that it
00:18:08.700 will just be available everywhere
00:18:10.799 um part of the part of the functionality
00:18:12.780 is joins and so being able to do stream
00:18:14.580 joins natively without relying on
00:18:16.020 anything external would be enabled by
00:18:18.360 the native stateful processing
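Since native state is described as future work, any API here is purely hypothetical; the idea might look something like this sketch, where the platform hands every function a persistent key-value store:

```ruby
# Hypothetical sketch only; this API does not exist yet. The platform
# would supply `state`, a persistent key-value store shared across
# invocations, enabling counts, joins, and other stateful operations.
def count_logins(records, state)
  records.each do |record|
    key = record["email"]
    state[key] = (state[key] || 0) + 1 # survives across invocations
  end
  records
end
```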
00:18:21.240 another major component that we're kind of
00:18:22.679 digging into is CI CD integration and so
00:18:25.500 I think
00:18:26.700 um for us to make this functionality
00:18:29.340 turbine and writing stream processing
00:18:31.200 really available to all software
00:18:33.299 Engineers it needs to play nice with you
00:18:35.880 know traditional or common CI CD
00:18:37.919 practices so you should be able to write
00:18:39.660 a stream processing application
00:18:41.400 alongside your Rails application or your
00:18:43.559 Ruby application and have them sort of
00:18:45.600 deployed in lockstep together you can
00:18:48.059 already import those objects and those
00:18:50.340 models so why not have them deployed
00:18:53.220 together right if you change something
00:18:55.080 in your app that would effectively break
00:18:57.240 your stream processing application they
00:18:59.160 should both be blocked on
00:19:01.260 successful deploys on both
00:19:03.240 so that's something that we're we're
00:19:04.980 actively digging into right now
00:19:10.080 so if you want to access the developer
00:19:12.660 preview you can take a picture of the QR
00:19:16.440 code that'll take you to a landing page
00:19:18.179 where you just show your interest you
00:19:21.059 can also win a Meta Quest 2 by filling
00:19:23.700 that out
00:19:24.799 so yeah sign up and kind of let us know
00:19:28.700 what you want to do with it and how you
00:19:31.080 you'd like to use it and we'll try to
00:19:33.539 onboard as many people as quickly as
00:19:35.160 possible
00:19:43.080 all right
00:19:45.120 um
00:19:47.280 questions
00:19:48.419 we have plenty of time for questions so
00:19:50.880 if anyone has any we can address it now
00:19:53.220 otherwise you can catch up with me
00:19:55.980 yeah so the question was what's the main
00:19:58.980 difference between our platform and
00:20:01.500 using a serverless function platform
00:20:04.100 so in the case of the serverless
00:20:06.780 functions you still have to have
00:20:08.220 infrastructure to deliver your records
00:20:09.960 to that serverless function right in the
00:20:12.720 case of the Meroxa platform you're
00:20:14.760 deploying this application that's
00:20:15.960 running continuously and so it's doing a
00:20:18.539 fair bit more than just integrating with
00:20:20.100 a serverless function so the platform
00:20:22.080 does the heavy lifting in terms of
00:20:23.340 pulling data out so I kind of glossed
00:20:25.140 over it very lightly but if you point
00:20:28.440 the platform to a postgres database it
00:20:30.720 will actually reach out and inspect the
00:20:32.100 database and look at what version it's
00:20:33.539 running what credentials you provided
00:20:35.400 whether it can set up logical
00:20:36.780 replication or not what extensions are
00:20:38.700 available and if it can it will set up a
00:20:41.220 logical replication slot with CDC so you
00:20:44.280 get very low-latency high-throughput
00:20:46.500 sort of change data capture into your
00:20:50.100 function and your function is being
00:20:51.539 triggered continuously against that so
00:20:54.240 yeah it's a lot more of the the sort of
00:20:56.220 complete pipeline rather than just that
00:20:58.500 function
00:21:00.960 related to that you could actually call
00:21:02.880 third-party functions like you could
00:21:04.740 deploy some logic or maybe you already
00:21:06.539 have logic on Lambda and from our function
00:21:08.880 you can say every time I get an event
00:21:10.200 trigger this serverless function and
00:21:12.660 then take the result and put it into
00:21:13.799 something else
00:21:15.360 sure so the question is what did we
00:21:17.520 build the CLI with and how is it
00:21:18.960 installed
00:21:20.520 um the CLI is built using Cobra which is
00:21:23.160 a Go framework for writing CLIs it's
00:21:27.000 the same one that kubectl is
00:21:28.380 written in and you can install it on Mac
00:21:32.159 using Homebrew
00:21:34.500 um
00:21:35.400 Linux also through Homebrew
00:21:38.460 which is weird because nobody uses Homebrew
00:21:40.860 on Linux
00:21:42.179 um but it's there but we also build
00:21:44.520 binaries we use GoReleaser to actually
00:21:46.500 generate
00:21:47.700 um binaries for multiple architectures
00:21:49.799 and multiple platforms
00:21:51.720 um and so yeah if you go to it's
00:21:54.000 actually open source as well so if you
00:21:55.200 go to github.com/meroxa/cli you can see all
00:21:59.280 the code for the CLI and all the tooling
00:22:01.080 and GitHub actions and everything we use
00:22:02.880 around generating it it's definitely
00:22:05.400 worth checking out we've invested a lot
00:22:07.020 of time in in a builder pattern for
00:22:09.179 creating new commands very easily I know
00:22:11.340 it's in go but it's it's worth checking
00:22:13.559 out either way
00:22:14.640 sure so the question is what was I
00:22:16.620 running locally to enable meroxa apps
00:22:19.080 run and how does it compare to what is
00:22:21.720 run on the platform when I run meroxa
00:22:23.820 apps deploy
00:22:25.220 so essentially we try to mimic the same
00:22:29.039 experience so that you have this fast
00:22:31.500 feedback loop locally and so we're
00:22:34.620 moving towards this
00:22:36.840 unified back end for enabling multiple
00:22:39.659 languages so right now we support Go
00:22:41.460 JavaScript and Python and Ruby and so
00:22:44.880 it's the same functional backend even
00:22:47.280 locally so when you execute the local
00:22:49.320 execution it threads your records
00:22:51.480 through your function and then feeds it
00:22:53.400 back into it when you run meroxa apps
00:22:55.799 deploy it does something very different
00:22:58.620 but the end result is effectively the
00:23:00.480 same and actually
00:23:02.220 ships your package it builds a container
00:23:04.740 out of your code and then ships it to
00:23:06.960 the platform and the platform wires up
00:23:08.460 all these components
00:23:10.860 it's a lot of technical stuff I'm happy
00:23:13.020 to go into much more detail with anyone
00:23:14.700 who wants to to discuss it
00:23:18.059 uh so the question is how do multiple
00:23:19.919 developers uh working locally
00:23:22.679 collaborate on the same sort of
00:23:25.080 deployment the same app
00:23:28.580 so essentially one of the things we do
00:23:31.679 is we with a local development
00:23:33.299 environment you can actually run a
00:23:35.760 command that pulls sample data from a
00:23:38.220 development database or staging database
00:23:39.840 and lets you iterate on it locally but
00:23:43.200 then we also use the typical git
00:23:45.360 workflow so you're building your stream
00:23:47.340 your data app and you're committing it
00:23:49.500 to GitHub and so you can kind of
00:23:52.799 lean on the same workflows that you
00:23:54.600 normally have around collaborating so
00:23:56.580 you are creating PRS with your stream
00:23:59.220 processing application you know getting
00:24:00.720 feedback and comments and everything at
00:24:02.159 the same time so we aren't necessarily
00:24:05.280 diverging from that our goal is actually
00:24:06.960 to map as closely as possible to what
00:24:09.240 you normally do with software
00:24:10.320 development so you follow the same
00:24:12.360 workflows that you normally have
00:24:14.120 you'd write some code you push a PR you
00:24:17.100 run some tests you get some feedback you
00:24:18.960 iterate on that and then eventually you
00:24:21.539 deploy the thing that you know works
00:24:23.520 when you're happy with it
00:24:27.320 yeah so sure so the local development
00:24:30.659 experience doesn't actually rely on any
00:24:33.299 databases it sort of simulates what
00:24:35.100 that database would be so in the example
00:24:36.840 that I used today it simulates getting a
00:24:39.900 record from Postgres
00:24:41.400 by actually sampling a record
00:24:44.280 from Postgres and says this is what the
00:24:45.900 record looks like and it stores it
00:24:47.520 locally in this demo.json file so it
00:24:50.940 includes a bunch of sample records and
00:24:52.500 then that's what you're iterating on
00:24:53.640 locally so you don't need postgres
00:24:55.860 um
00:24:56.580 the way the turbine framework is
00:24:58.559 designed it's actually entirely agnostic
00:25:01.200 of the real resource and so I can go in
00:25:03.539 and change demo PG to demo and the
00:25:07.080 code works in exactly the same way
00:25:08.280 because it's the platform that's doing
00:25:09.659 that translation as far as the turbine
00:25:11.580 function is concerned I get a record
00:25:13.500 that looks like this and I'm applying
00:25:15.539 some Transformations and I'm pushing out
00:25:17.039 a record in that format
00:25:19.320 the platform is the thing that's
00:25:20.760 responsible for pulling the record from
00:25:22.380 postgres and giving it to the turbine
00:25:24.600 function
00:25:25.860 all right I guess that's it for me
00:25:28.679 thank you very much