RubyConf 2022

Building Stream Processing Applications with Ruby & Meroxa

As the world moves towards real-time, there's a growing demand for building sophisticated stream processing applications. Traditionally, building these apps has involved spinning up separate task-specific tooling, learning new and unfamiliar paradigms, and deploying and operating a constellation of complex services. In this talk, we'll take a look at how to use the Turbine framework (turbine.rb) to build and deploy real-time stream processing applications using Ruby.

00:00:00.000 ready for takeoff
00:00:18.900 so welcome my name is Ali I'm going to
00:00:22.800 talk about stream processing with Ruby
00:00:25.500 and specifically turbine.rb
00:00:29.220 so here's a quick agenda essentially
00:00:31.080 this is what we're going to cover so you
00:00:32.399 know what you're getting yourself into
00:00:36.059 let me start off with a little a little
00:00:38.700 bit about myself uh who am I why you
00:00:41.100 should trust me
00:00:43.200 so this is me I'm the CTO and one of two
00:00:46.440 co-founders uh DeVaris the other
00:00:48.420 co-founder is right there
00:00:49.920 at Meroxa and uh previously before
00:00:53.700 starting Meroxa I was a lead engineer at
00:00:56.340 Heroku specifically on the Heroku Data
00:00:58.140 team mainly working on uh Heroku's Kafka
00:01:01.260 offering where my team managed thousands
00:01:05.400 of Kafka clusters for tens of thousands
00:01:07.680 of customers uh before that I built a
00:01:10.799 system at a targeted advertising company
00:01:12.299 that queried over 2 billion user
00:01:14.580 profiles uh in real time
00:01:16.799 and then way way way before that I built
00:01:19.380 analytics pipelines for mobile apps
00:01:21.780 um processing regularly over 100,000
00:01:24.740 events per second
00:01:26.700 and so basically I've been doing this
00:01:28.619 for for quite a while working in and
00:01:30.360 around the data space
00:01:34.680 so stream processing what is it and why
00:01:38.159 you should care
00:01:40.680 so specifically in stream processing
00:01:43.979 what I mean by stream processing
00:01:45.600 is really about taking an unbounded
00:01:47.720 sequence of events uh continuous
00:01:50.460 unbounded sequence of events and
00:01:52.079 applying some sort of computation or
00:01:53.700 transformation to it
00:01:55.140 I'm intentionally avoiding the term real
00:01:57.060 time
00:01:58.259 um that's generally implied but there's
00:02:00.479 no
00:02:01.380 generally accepted agreed upon
00:02:03.600 definition for real time but essentially
00:02:06.899 not batch whatever that means to you
00:02:10.979 so some examples of stream processing in
00:02:13.080 general are filtering you've got a number of
00:02:15.300 events and you want to drop some of them
00:02:16.879 enrichment you want to take each event
00:02:18.900 and you want to augment it with some
00:02:20.160 additional information
00:02:21.739 aggregation where you want to do some
00:02:23.879 sort of processing across a number of
00:02:25.200 them maybe count them sum them do
00:02:27.060 some sort of calculation there joins
00:02:30.360 typically is similar to a SQL join you
00:02:33.660 want to take two sets of data mash them
00:02:35.400 together by some common element and then
00:02:37.980 routing is kind of another one where you
00:02:40.680 want some events to go one place and
00:02:42.180 other events to go somewhere else
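In plain Ruby, with no framework involved, those operations might look something like this toy in-memory sketch (the event shape here is invented for illustration):

```ruby
events = [
  { type: "click",    user: "a@example.com", amount: 1 },
  { type: "purchase", user: "b@example.com", amount: 42 },
  { type: "purchase", user: "c@example.com", amount: 7 }
]

# filtering: drop the events you don't want
purchases = events.select { |e| e[:type] == "purchase" }

# enrichment: augment each event with additional information
enriched = purchases.map { |e| e.merge(currency: "USD") }

# aggregation: a calculation across a number of events
total = enriched.sum { |e| e[:amount] } # => 49

# a join: mash two sets of data together by a common element
names = { "b@example.com" => "Bea" }
joined = enriched.map { |e| e.merge(name: names[e[:user]]) }

# routing: some events go one place, others go somewhere else
big, small = joined.partition { |e| e[:amount] > 10 }
```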
00:02:46.379 um some some common use cases that
00:02:48.599 should be familiar to most people
00:02:51.060 um you know analytics is probably one of
00:02:52.680 the most common uh it's one of the most
00:02:54.599 common ones that we see at least and
00:02:56.640 essentially you're taking data from a
00:02:58.800 number of different sources it could be
00:03:00.180 your operational database maybe it's a
00:03:01.920 Postgres database backing your Rails
00:03:03.480 application you're taking some data from
00:03:05.640 you know support tickets in Zendesk
00:03:07.620 and maybe some CRM data from Salesforce
00:03:09.840 and you're pulling them all into a
00:03:12.120 single data warehouse where your data
00:03:14.280 scientists run some queries and sort
00:03:16.319 of derive some insight out of it
00:03:19.560 um another common use case is
00:03:20.940 replication and disaster
00:03:22.379 recovery
00:03:23.459 and so here you're continuously and
00:03:25.739 hopefully immediately pulling data from
00:03:29.099 one place and putting it into some other
00:03:31.400 region or data center or or Cloud even
00:03:35.340 um across you know geographical
00:03:37.019 distances in order to have a uh another
00:03:40.440 place that you can recover from this
00:03:42.840 could also be different database types
00:03:45.299 so maybe you're doing Postgres on RDS
00:03:48.120 in AWS and you're copying over to SQL
00:03:51.180 Server in Azure on a different
00:03:53.760 side of the country
00:03:57.480 um enrichment is another very common one
00:03:59.280 for us so essentially you're taking some
00:04:01.860 data uh maybe it's a user sign up and
00:04:05.159 you want to add some additional
00:04:06.959 information to make that data more
00:04:08.280 useful to you so maybe you look up their
00:04:10.680 email with some third-party service that
00:04:13.560 gives you a little bit more information
00:04:14.459 about them maybe the company the role or
00:04:16.799 whatever it is and then you're taking
00:04:18.299 that sort of fatter enriched record and
00:04:20.220 then you're putting it somewhere else so
00:04:21.540 you can use it maybe it's back in your
00:04:22.740 operational database maybe it's in your
00:04:24.660 your data warehouse
00:04:26.400 and then uh I've listed integration
00:04:29.040 which is a super vague general catch-all
00:04:31.560 for like everything else and essentially
00:04:34.620 taking your data and putting it
00:04:36.419 somewhere else where it can be used by
00:04:37.919 someone else
00:04:38.960 this could be third parties it could be
00:04:41.699 other teams maybe you scrub the PII out
00:04:45.540 of your stream of data and you make it
00:04:47.340 available for a partner to use
00:04:49.919 um that's that's kind of a common
00:04:51.419 example too
00:04:55.500 and so what is what is the the problem
00:04:58.560 right with stream processing right now
00:05:02.160 essentially you know everyone here I
00:05:04.259 assume loves Java it's your favorite
00:05:05.699 language uh clearly a Ruby conference must
00:05:08.880 love Java
00:05:10.259 um nothing nothing wrong with Java but
00:05:12.240 essentially if you do enough stream
00:05:14.100 processing you're going to end up with
00:05:15.360 Java somewhere Kafka is written in Java
00:05:17.639 Kafka Connect is written in Java Kafka
00:05:19.199 Streams is written in Java Pulsar is
00:05:21.000 Java Spark is Java Flink is Java Java is
00:05:23.580 everywhere and that's great if you love
00:05:25.919 Java if you don't then that kind of
00:05:28.800 sucks
00:05:30.479 um so that's sort of one major obstacle
00:05:32.880 with stream processing especially for
00:05:34.199 everyone else
00:05:36.060 and then the other sort of major part of
00:05:38.400 it is stream processing introduces a ton
00:05:40.860 of new sort of patterns and paradigms
00:05:42.660 that aren't really common elsewhere so
00:05:44.820 if you're used to building web
00:05:46.199 applications with a regular request
00:05:47.639 response cycle now you have to worry
00:05:49.800 about delivery semantics uh is it at
00:05:51.960 least once is it at most once is it
00:05:54.240 "exactly once" with scare quotes
00:05:57.440 you know ordering guarantees what are
00:06:00.600 they is it strictly ordered is it
00:06:02.580 globally ordered is some subset of it
00:06:04.740 ordered late delivery is something you
00:06:07.320 don't typically have to deal with you
00:06:09.539 might get a message seconds later or
00:06:11.820 days later or even weeks later what do
00:06:13.740 you do with that message
00:06:15.600 and then you get duplicates that's kind
00:06:18.120 of an annoying one that's pretty common
00:06:19.380 especially when the default is at least
00:06:21.539 once in many cases
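A toy sketch of the usual defense against duplicates, making processing idempotent by tracking IDs you have already handled (a real system would keep this state in Redis or a database, and the event shape is invented):

```ruby
require "set"

seen = Set.new
incoming = [{ id: 1 }, { id: 2 }, { id: 1 }] # note the duplicate delivery

incoming.each do |event|
  next if seen.include?(event[:id]) # skip events we've already processed
  puts "processing event #{event[:id]}"
  seen << event[:id]
end
```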
00:06:23.960 then
00:06:25.500 you have to think about partitions and
00:06:27.000 topics so if you work with Kafka
00:06:29.100 partitions are the scaling unit and so
00:06:31.440 you really need to get it right the
00:06:32.940 first time around because changing it
00:06:34.319 later is painful
00:06:35.880 and so these are all things that you
00:06:38.039 don't typically have to worry about and
00:06:39.840 you don't want to worry about
00:06:41.720 it's not something that you should
00:06:43.740 really care about you should just use
00:06:45.180 the tools someone else should should
00:06:47.160 worry about these things
00:06:52.020 another major part of it is where do you
00:06:54.360 deploy this stuff so if you have a
00:06:56.160 stream processing application it does
00:06:57.960 something useful now what where do you
00:07:00.419 run it and how do you maintain it and
00:07:02.400 how do you make sure that it runs
00:07:03.660 consistently performs well all the stuff
00:07:07.199 um so the easy answer is
00:07:10.080 yeah it's easy all you need to do is set
00:07:12.780 up a VPC set up your subnets IPs
00:07:16.139 configure some security groups spin up
00:07:17.940 some ec2 instances deploy kubernetes
00:07:20.520 provision Kafka create topics create
00:07:22.259 partitions wire everything up make sure
00:07:24.660 ACLs are in place
00:07:26.720 you know configure eight million
00:07:29.580 different things wire everything up and
00:07:31.740 yeah it's good that's all you need to do
00:07:33.960 so just do this thing
00:07:38.039 um yeah so it's it's not it's not easy
00:07:40.979 um if you look at some of the AWS guides
00:07:42.419 for setting up vanilla Kubernetes
00:07:44.539 it's like 60 pages
00:07:47.880 um and 10 of those pages are like create
00:07:49.740 your VPC and configure everything
00:07:51.419 correctly the first time because if you
00:07:53.880 get the subnets wrong then it really is
00:07:56.039 super painful to fix it
00:07:58.380 um so that's a big part of that is you
00:08:00.240 know once we have this thing where do we
00:08:02.160 run it and how do we make sure that it
00:08:03.479 runs consistently and performs well and
00:08:05.940 does all the things that we needed to do
00:08:09.240 so that's kind of the the problem space
00:08:10.919 that we're trying to tackle
00:08:14.400 so for us at Meroxa our answer is
00:08:17.340 basically uh Turbine and the Meroxa
00:08:19.680 data platform and so Turbine is
00:08:22.080 sort of the tool chain and the data
00:08:24.000 platform is the platform as a service
00:08:25.440 that runs the tool chain
00:08:29.759 so I'm going to dig into turbine a
00:08:31.139 little bit
00:08:31.979 um
00:08:32.640 essentially turbine is is the framework
00:08:34.500 that we work with it's actually a family
of frameworks for various languages we
00:08:39.479 started with Go and JavaScript and
00:08:41.219 Python and at this conference we're
00:08:43.320 making Turbine available for Ruby as
00:08:45.779 well
00:08:46.440 and so each turbine framework is sort of
00:08:49.500 individually handcrafted for that
00:08:51.540 particular language to follow idiomatic
00:08:54.600 practices for that language and so that
00:08:56.640 it looks familiar and you know works in
00:08:59.399 the way that you expect it to as someone
00:09:01.200 who writes Ruby day in Day Out
00:09:04.740 the other sort of main focus for for
00:09:06.720 Turbine is we've introduced an API
00:09:10.140 that exposes a high level sort of
00:09:12.540 abstraction on top of these
00:09:14.700 common things so as long as you can
00:09:17.399 assign variables call methods then you
00:09:20.279 should be able to create sort of rich
00:09:22.380 stream processing applications
00:09:25.560 uh the other sort of key part for us is
00:09:27.600 you can write custom logic in that
00:09:29.760 language and so if you're using turbine
00:09:31.740 for Ruby you can write logic in Ruby in
00:09:35.580 familiar Ruby that looks like Ruby
00:09:37.320 doesn't introduce any weird dsls or
00:09:39.180 anything it also lets you import
00:09:40.920 rubygems that you might already have or
00:09:43.080 might already exist online so you can
00:09:44.880 import those in and use them with your
00:09:46.500 your turbine app to actually help you
00:09:48.600 process these these events
00:09:55.560 so this is what it looks like so this is
00:09:57.899 a turbine app
00:09:59.880 it's obviously a very simple example
00:10:01.740 but you can kind of expand this as you
00:10:03.720 go along but it should look very
00:10:06.240 familiar uh it's very much inspired by
00:10:08.220 the Rack API and so it should look
00:10:10.380 pretty familiar to to anyone who's been
00:10:12.180 writing Ruby for for any amount of time
00:10:15.000 um essentially we expose a number of
00:10:17.100 methods that allow you to tap into a
00:10:19.800 resource in this case it's a database
00:10:22.980 resource named demo PG
00:10:25.200 and then you pull records out of a table
00:10:28.080 called events you process them with a
00:10:31.800 process called pass-through and then you
00:10:34.200 write it to the same database in a
00:10:36.480 different collection and so what you'd
00:10:38.339 expect here is you're basically creating
00:10:40.320 a very simple pipeline that pulls data
00:10:42.240 from one place processes it with the
00:10:44.339 function pass through which is actually
00:10:45.540 written below
00:10:47.480 and then writes it out into the database
00:10:51.720 and so that's turbine itself that's
00:10:53.519 really the framework that you would
00:10:54.540 write
00:10:55.500 um these data apps in
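The slide itself isn't captured in the transcript; based on the description, a minimal turbine.rb app of that shape might look roughly like the sketch below. The method names and the demo_pg resource name are reconstructions from the talk, not verbatim turbine.rb API:

```ruby
# A rough reconstruction of the app described above; method names are
# assumptions based on the talk, not verbatim turbine.rb API.
class App
  def call(app)
    db = app.resource(name: "demo_pg")              # tap into a resource
    records = db.records(collection: "events")      # pull records from a table
    processed = app.process(records, method(:passthrough))
    db.write(processed, collection: "events_copy")  # write to another collection
  end

  def passthrough(records)
    records # custom logic goes here; this one just passes records through
  end
end
```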
00:10:57.899 the other major part of the tool chain
00:11:00.060 for us is actually the platform itself
00:11:02.700 and so this is the the platform as a
00:11:05.220 service in our case and so it's a fully
00:11:07.200 managed platform as a service that's
00:11:09.899 designed to host and run turbine apps
00:11:12.899 essentially we handle the operational
00:11:14.579 burden of running this thing wiring up
00:11:17.060 monitoring of the sort of underlying
00:11:20.519 instances and components making
00:11:21.779 sure that it's healthy and it continues
00:11:23.399 to run
00:11:24.720 um a lot of the magic around
00:11:25.640 automatically figuring out how to do
00:11:28.019 things or the heavy lifting is handled
00:11:30.660 by the platform and so it'll reach out
00:11:32.579 and look at resources and figure out how
00:11:34.500 best to get data out of them
00:11:36.600 um and sort of automatically configure
00:11:38.519 these connectors and and pipeline
00:11:41.279 components to achieve that
00:11:44.820 um
00:11:45.600 and then when you actually deploy your
00:11:48.779 turbine app this custom logic that you
00:11:50.519 wrote so the pass-through function that
00:11:52.200 gets packaged up into a container and
00:11:54.120 deployed onto the platform the platform
00:11:55.980 contains a sort of serverless functions
00:11:58.500 component that's where that function
00:12:00.480 goes and it's responsible for scaling it
00:12:02.880 independently and so as you get more
00:12:05.040 events coming in it'll scale up those
00:12:07.079 functions to process more of those
00:12:08.880 events
00:12:10.680 so that's just the managed side of it
00:12:14.579 so here's a very high level architecture-y
00:12:17.160 type view of it so essentially pulling
00:12:20.160 in data from somewhere it figures out
00:12:21.779 how best to do that it puts it into a
00:12:24.959 durable store where it can rewind and
00:12:27.240 replay and kind of act as a shock
00:12:28.800 absorber it applies your turbine
00:12:31.620 function across all those events and
00:12:34.260 then whatever the results are go back
00:12:35.700 out through some connector or many
00:12:37.740 connectors into wherever the destination
00:12:40.560 resource is and so everything in the sort
00:12:43.740 of dotted box in the middle that's the
00:12:45.540 platform itself and it just handles it
00:12:46.920 for you
00:12:50.040 so I'm going to attempt a live demo
00:12:54.300 we'll see we'll see how that goes
00:12:58.079 all right
00:13:02.040 there we go
00:13:06.139 all right
00:13:22.320 all right
00:13:23.880 so here you can see
00:13:26.760 a turbine app that I wrote previously
00:13:30.180 essentially it implements that
00:13:32.279 enrichment use case so here we're
00:13:35.040 actually requiring the existing Clearbit
00:13:37.139 gem so that's the gem that exists open
00:13:39.660 source I just pulled in
00:13:41.519 we're pulling this we're using this
00:13:43.620 database called demo PG similar to the
00:13:45.839 example I included we have two types of
00:13:48.060 APIs there's the sort of chaining-based
00:13:50.220 fluent API as well as a more sort of
00:13:52.800 traditional procedural one so that's the
00:13:54.839 one I'm using here
00:13:56.639 so basically I'm saying take the records
00:13:58.920 out of a collection called events
00:14:01.019 process them using this enrich function
00:14:04.200 which I've written below and then write
00:14:06.000 out the results into events_copy
00:14:08.399 and so this is the enrich function
00:14:11.040 it's fairly contrived but actually does
00:14:13.500 something useful if you're not familiar
00:14:15.899 with Clearbit it's one of the services
00:14:17.519 where you give it some information about
00:14:20.040 typically a user and it has a database of
00:14:23.339 users and a ton of information about
00:14:25.019 them so in this case I'm forwarding the
00:14:28.200 email of a user and then it's returning
00:14:31.620 back some information like the company's
00:14:34.139 legal name for the employer and then the
00:14:37.320 location of that person
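The on-screen code isn't in the transcript, but the enrich function he describes might look roughly like this sketch, assuming the clearbit gem's Enrichment lookup and a simple hash-like record shape (the record fields are assumptions):

```ruby
require "clearbit"

Clearbit.key = ENV["CLEARBIT_KEY"]

# Look up each record's email with Clearbit and attach the employer's
# legal name and the person's location. Record shape is assumed.
def enrich(records)
  records.map do |record|
    result = Clearbit::Enrichment.find(email: record["email"], stream: true)
    if result
      record["company"]  = result[:company] && result[:company][:legalName]
      record["location"] = result[:person] && result[:person][:location]
    end
    record
  end
end
```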
00:14:40.279 and so this is the the data that I'm
00:14:43.019 actually feeding it so
00:14:44.699 turbine ships with a sort of local
00:14:46.800 development mode where you can kind of
00:14:48.360 iterate quickly and have this very fast
00:14:50.160 feedback loop
00:14:51.600 um where you can use fixture data or
00:14:53.339 sampled records to actually run it
00:14:54.720 through your pipeline and say like does
00:14:56.760 it do what I think it does or does it do
00:14:58.740 what I need it to do and you can
00:14:59.820 probably test against it and everything
00:15:01.079 and then once you're happy with that
00:15:03.120 functionality you can deploy it onto the
00:15:04.560 platform and so this is an example
00:15:06.300 record that I created
00:15:08.339 um so the actual value of the record has
00:15:11.519 an activity which is logged in and has
00:15:13.500 my email address
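The fixture file isn't shown in the transcript; a record of the shape he describes might look like this (the field names, the email placeholder, and the demo.json layout are all assumptions):

```json
{
  "activity": "logged_in",
  "email": "ali@example.com"
}
```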
00:15:14.820 and so
00:15:16.380 what I hope happens uh is when I execute
00:15:19.740 it locally it should take my email
00:15:21.440 process it through this custom function
00:15:23.459 hit the clearbit API fetch some
00:15:25.260 additional details and say this is what
00:15:27.000 would have happened had you deployed
00:15:28.500 this live
00:15:30.899 and so
00:15:33.000 we have Meroxa's CLI
00:15:37.019 so essentially this is the local
00:15:38.459 execution command and it basically
00:15:40.800 threads your record through and it shows
00:15:42.899 you what would have happened
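The exact command isn't audible in the transcript; with the Meroxa CLI, the flow looks roughly like this (commands assumed from the CLI's documented verbs):

```
$ meroxa apps run      # execute the app locally against fixture data
$ meroxa apps deploy   # later, package and deploy it to the platform
```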
00:15:45.600 um and so here it did work so you can
00:15:48.300 see that it says it fetched this record
00:15:51.240 which I showed earlier which just had my
00:15:53.699 email address
00:15:54.720 and then it augmented that enriched that
00:15:57.300 data with the company Meroxa Inc and
00:15:59.820 location San Francisco
00:16:02.399 um and that's it so essentially it did
00:16:05.100 what I thought it did now I'm happy with
00:16:06.899 it I can deploy onto the platform and
00:16:08.339 the platform will package all these
00:16:09.480 components and deploy it into a
00:16:11.399 continuously running pipeline
00:16:14.639 So yeah thank you
00:16:23.880 all right
00:16:28.440 so
00:16:29.940 what's next for Turbine and Meroxa
00:16:34.320 essentially right now turbine RB
00:16:36.959 um or turbine for Ruby we basically
00:16:40.259 recently made it it's still in a
00:16:43.139 relatively early developer preview and
00:16:45.120 we're looking for for feedback we want
00:16:46.980 people to use it we want people to to
00:16:48.600 try it out and actually tell us how to
00:16:49.980 improve it we are super focused on
00:16:52.860 developer experience and so we want to
00:16:55.320 make it great for for developers and so
00:16:58.139 yeah we want people to sign up use it
00:16:59.940 and tell us tell us what they think and
00:17:01.920 tell us how we can improve it one of the
00:17:04.020 things that was relatively
00:17:05.760 recent for us is
00:17:07.760 Ruby 3.2 introduces the idea of a value
00:17:10.919 object the Data class which
00:17:14.280 introduces sort of an immutable struct
00:17:16.740 essentially that seems like it would be
00:17:18.720 pretty good for this kind of use case
00:17:20.100 where records come into the platform as
00:17:22.919 an immutable object and you use sort of
00:17:24.600 methods defined on it to to manipulate
00:17:26.339 this so that's something I would like to
00:17:27.480 consider but again we'd love to hear
00:17:29.280 from from users and say this is what we
00:17:31.320 want or this API sucks and you should do
00:17:33.780 something else
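For reference, a sketch of how Ruby 3.2's Data class could model an immutable record (the field names are illustrative, not a committed turbine.rb design):

```ruby
# Data.define builds an immutable value object (Ruby 3.2+).
Record = Data.define(:key, :value)

record = Record.new(key: "1", value: { "email" => "a@example.com" })

# Data objects are frozen; "manipulating" one yields a new object.
updated = record.with(value: record.value.merge("company" => "Acme"))

record.value  # => {"email"=>"a@example.com"} (unchanged)
updated.value # => {"email"=>"a@example.com", "company"=>"Acme"}
```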
00:17:35.580 um another sort of major component that
00:17:37.200 we're we're working on is a native
00:17:39.419 stateful processing so stateful
00:17:41.880 processing is is kind of a big problem
00:17:43.559 space to to solve
00:17:45.600 um right now on the platform you can
00:17:48.299 Implement stateful processing but the
00:17:51.179 burden is on you to persist data
00:17:53.340 somewhere so you might have some sort of
00:17:55.679 Redis or a database or something
00:17:57.419 like that soon we hope to have that
00:17:59.880 natively built into the platform and so
00:18:02.100 you can just
00:18:03.600 magically assume that there is some
00:18:05.580 persistence available to every function
00:18:07.140 and if you write something to that it
00:18:08.700 will just be available everywhere
00:18:10.799 um part of the part of the functionality
00:18:12.780 is joins and so being able to do stream
00:18:14.580 joins natively without relying on
00:18:16.020 anything external would be enabled by
00:18:18.360 the native stateful processing
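Since native state is described as future work, any API here is purely hypothetical; the idea might look something like this sketch, where the platform hands every function a persistent key-value store:

```ruby
# Hypothetical sketch only; this API does not exist yet. The platform
# would supply `state`, a persistent key-value store shared across
# invocations, enabling counts, joins, and other stateful operations.
def count_logins(records, state)
  records.each do |record|
    key = record["email"]
    state[key] = (state[key] || 0) + 1 # survives across invocations
  end
  records
end
```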
00:18:21.240 another major component that we're kind of
00:18:22.679 digging into is CI CD integration and so
00:18:25.500 I think
00:18:26.700 um for us to make this functionality
00:18:29.340 turbine and writing stream processing
00:18:31.200 really available to all software
00:18:33.299 Engineers it needs to play nice with you
00:18:35.880 know traditional or common CI CD
00:18:37.919 practices so you should be able to write
00:18:39.660 a stream processing application
00:18:41.400 alongside your Rails application or your
00:18:43.559 Ruby application and have them sort of
00:18:45.600 deployed in lockstep together you can
00:18:48.059 already import those objects and those
00:18:50.340 models so why not have them deployed
00:18:53.220 together right if you change something
00:18:55.080 in your app that would effectively break
00:18:57.240 your stream processing application they
00:18:59.160 should both be blocked on
00:19:01.260 successful deploys on both
00:19:03.240 so that's something that we're we're
00:19:04.980 actively digging into right now
00:19:10.080 so if you want to access the developer
00:19:12.660 preview you can take a picture of the QR
00:19:16.440 code that'll take you to a landing page
00:19:18.179 where you just show your interest you
00:19:21.059 can also win a Meta Quest 2 by filling
00:19:23.700 that out
00:19:24.799 so yeah sign up and kind of let us know
00:19:28.700 what you want to do with it and how you
00:19:31.080 you'd like to use it and we'll try to
00:19:33.539 onboard as many people as quickly as
00:19:35.160 possible
00:19:43.080 all right
00:19:45.120 um
00:19:47.280 questions
00:19:48.419 we have plenty of time for questions so
00:19:50.880 if anyone has any we can address it now
00:19:53.220 otherwise you can catch up with me
00:19:55.980 yeah so the question was what's the main
00:19:58.980 difference between our platform and
00:20:01.500 using a serverless function platform
00:20:04.100 so in the case of the serverless
00:20:06.780 functions you still have to have
00:20:08.220 infrastructure to deliver your records
00:20:09.960 to that serverless function right in the
00:20:12.720 case of the Meroxa platform you're
00:20:14.760 deploying this application that's
00:20:15.960 running continuously and so it's doing a
00:20:18.539 fair bit more than just integrating with
00:20:20.100 a serverless function so the platform
00:20:22.080 does the heavy lifting in terms of
00:20:23.340 pulling data out so I kind of glossed
00:20:25.140 over it very lightly but if you point
00:20:28.440 the platform to a postgres database it
00:20:30.720 will actually reach out and inspect the
00:20:32.100 database and look at what version it's
00:20:33.539 running what credentials you provided
00:20:35.400 whether it can set up logical
00:20:36.780 replication or not what extensions are
00:20:38.700 available and if it can it will set up a
00:20:41.220 logical replication slot with CDC so you
00:20:44.280 get very low-latency high-throughput
00:20:46.500 sort of change data capture into your
00:20:50.100 function and your function is being
00:20:51.539 triggered continuously against that so
00:20:54.240 yeah it's a lot more of the the sort of
00:20:56.220 complete pipeline rather than just that
00:20:58.500 function
00:21:00.960 related to that you could actually call
00:21:02.880 third-party functions like you could
00:21:04.740 deploy some logic or maybe you already
00:21:06.539 have logic on Lambda and from our function
00:21:08.880 you can say every time I get an event
00:21:10.200 trigger this serverless function and
00:21:12.660 then take the result and put it into
00:21:13.799 something else
00:21:15.360 sure so the question is what did we
00:21:17.520 build the CLI with and how is it
00:21:18.960 installed
00:21:20.520 um the CLI is built using Cobra which is
00:21:23.160 a Go framework for writing CLIs it's
00:21:27.000 the same one that kubectl is
00:21:28.380 written in and you can install it on Mac
00:21:32.159 using Homebrew
00:21:34.500 um
00:21:35.400 Linux also through Homebrew
00:21:38.460 which is weird because nobody uses Homebrew
00:21:40.860 on Linux
00:21:42.179 um but it's there but we also build
00:21:44.520 binaries we use GoReleaser to actually
00:21:46.500 generate
00:21:47.700 um binaries for multiple architectures
00:21:49.799 and multiple platforms
00:21:51.720 um and so yeah if you go to it's
00:21:54.000 actually open source as well so if you
00:21:55.200 go to github.com/meroxa/cli you can see all
00:21:59.280 the code for the CLI and all the tooling
00:22:01.080 and GitHub actions and everything we use
00:22:02.880 around generating it it's definitely
00:22:05.400 worth checking out we've invested a lot
00:22:07.020 of time in in a builder pattern for
00:22:09.179 creating new commands very easily I know
00:22:11.340 it's in go but it's it's worth checking
00:22:13.559 out either way
00:22:14.640 sure so the question is what was I
00:22:16.620 running locally to enable meroxa apps
00:22:19.080 run and how does it compare to what is
00:22:21.720 run on the platform when I run meroxa
00:22:23.820 apps deploy
00:22:25.220 so essentially we try to mimic the same
00:22:29.039 experience so that you have this fast
00:22:31.500 feedback loop locally and so we're
00:22:34.620 moving towards this
00:22:36.840 unified back end for enabling multiple
00:22:39.659 languages so right now we support Go
00:22:41.460 JavaScript and Python and Ruby and so
00:22:44.880 it's the same functional backend even
00:22:47.280 locally so when you execute the local
00:22:49.320 execution it threads your records
00:22:51.480 through your function and then feeds it
00:22:53.400 back into it when you run meroxa apps
00:22:55.799 deploy it does something very different
00:22:58.620 but the end result is effectively the
00:23:00.480 same and actually
00:23:02.220 ships your package it builds a container
00:23:04.740 out of your code and then ships it to
00:23:06.960 the platform and the platform wires up
00:23:08.460 all these components
00:23:10.860 it's a lot of technical stuff I'm happy
00:23:13.020 to go into much more detail with anyone
00:23:14.700 who wants to to discuss it
00:23:18.059 uh so the question is how do multiple
00:23:19.919 developers uh working locally
00:23:22.679 collaborate on the same sort of
00:23:25.080 deployment the same app
00:23:28.580 so essentially one of the things we do
00:23:31.679 is we with a local development
00:23:33.299 environment you can actually run a
00:23:35.760 command that pulls sample data from a
00:23:38.220 development database or staging database
00:23:39.840 and lets you iterate on it locally but
00:23:43.200 then we also use the typical git
00:23:45.360 workflow so you're building your stream
00:23:47.340 your data app and you're committing it
00:23:49.500 to GitHub and so you can kind of
00:23:52.799 lean on the same workflows that you
00:23:54.600 normally have around collaborating so
00:23:56.580 you are creating PRS with your stream
00:23:59.220 processing application you know getting
00:24:00.720 feedback and comments and everything at
00:24:02.159 the same time so we aren't necessarily
00:24:05.280 diverging from that our goal is actually
00:24:06.960 to map as closely as possible to what
00:24:09.240 you normally do with software
00:24:10.320 development so you follow the same
00:24:12.360 workflows that you normally have
00:24:14.120 you'd write some code you push a PR you
00:24:17.100 run some tests you get some feedback you
00:24:18.960 iterate on that and then eventually you
00:24:21.539 deploy the thing that you know works
00:24:23.520 when you're happy with it
00:24:27.320 yeah so sure so the local development
00:24:30.659 experience doesn't actually rely on any
00:24:33.299 databases it sort of simulates what
00:24:35.100 that database would be so in the example
00:24:36.840 that I used today it simulates getting a
00:24:39.900 record from Postgres
00:24:41.400 by actually sampling a record
00:24:44.280 from Postgres and says this is what the
00:24:45.900 record looks like and it stores it
00:24:47.520 locally in this demo.json file so it
00:24:50.940 includes a bunch of sample records and
00:24:52.500 then that's what you're iterating on
00:24:53.640 locally so you don't need postgres
00:24:55.860 um
00:24:56.580 the way the turbine framework is
00:24:58.559 designed it's actually entirely agnostic
00:25:01.200 of the real resource and so I can go in
00:25:03.539 and change demo PG to demo and the
00:25:07.080 code works in exactly the same way
00:25:08.280 because it's the platform that's doing
00:25:09.659 that translation as far as the turbine
00:25:11.580 function is concerned I get a record
00:25:13.500 that looks like this and I'm applying
00:25:15.539 some Transformations and I'm pushing out
00:25:17.039 a record in that format
00:25:19.320 the platform is the thing that's
00:25:20.760 responsible for pulling the record from
00:25:22.380 postgres and giving it to the turbine
00:25:24.600 function
00:25:25.860 all right I guess that's it for me
00:25:28.679 thank you very much