Hello, my name is Jade. As Mike said, I'm a senior software engineer at Theta Lake. I've previously worked on eight different Rails apps as part of four different Rails systems, with totally different architectures and totally different products, but what they've all had in common is scaling challenges.

So, first up: have you ever had a big spike in user traffic come into your system? Has your Rails app ever started slowing down or shedding requests to the point where it looks like it's actually down? Or maybe, instead of user traffic, you've had a different kind of problem, where whatever data source you ingest grows to the point where it's going to take too long to ingest it all into your system at the regular interval you need. If any of those ring true, this talk will be useful for you.
The last time I spoke at RubyConf, in Denver, one of the questions was about load testing for user traffic, and I recommended a tool some former co-workers had written for shadowing, or replaying, real requests against a system. The following year I moved to Theta Lake; we're a fintech, more specifically regulatory compliance, and since 2022 a few engineers have been doing performance testing on a replica of our production system. Earlier this year I was asked to join in on this.

To give an overview, this is basically a cross-team project, with people from the ingestion team, the operations team, and my team, which works on the customer-facing Rails app, and it's led by our CTO, Rich. This talk is about a methodology for replicating a production system and then seeing how it performs when we push more data into it than we currently see.
You might ask yourself why. Well, the point of the performance and scalability testing we're carrying out is to demonstrate the maximum performance and throughput of our system when balanced against operating at a reasonable cost. You could always horizontally scale out further; we're just looking to balance operational cost and throughput.

There's the old quote about premature optimization, which I'm sure many of you know, and I'd agree that if you don't need to do this sort of thing, then don't. But what we're trying to do here is, given what we know, what we've been told, and where our ingestion is in terms of scale at the moment, to see a little way ahead into the future in order to find bottlenecks in our system. Then we can both demonstrate how scalable it is at the moment and also find limiting factors that we can then look to mitigate.
My hope is that if you're on a team that is starting to see scaling challenges, you'll be able to go back to your team and apply this. That could be a big hope, but the idea is that you want to do this before either your server costs spiral out of control or your entire system is about to start acting like it's down, or actually go down. So today I'm going to present our methodology for replicating our system for the purpose of load testing.
Right, so a bit of background. For this sort of thing, a lot of the tooling and resources you'll hear about are for looking at the performance of big systems, designed by and for absolutely massive companies operating at truly massive scale: Netflix, Shopify. A lot of the work, including the literal book on the subject, has been done by Brendan Gregg, who is at Netflix.

The issue is that in smaller companies, where you don't have that large team, you may have different constraints. As well as people's time being limited, you might not want to operate a fake version of your system all the time, because it would simply be too expensive; why would you do that? You may also have some, or a lot of, PII (personally identifiable information) that you have to remove before it goes into an external logging tool. And, specific to us, you cannot use production data in any form, even anonymized.
00:04:30.560
when you're operating at some kind of
00:04:32.400
scale just your rails app alone like
00:04:34.960
your single monolith isn't often going
00:04:36.800
to be your entire system um so you could
00:04:40.600
locally optimize performance in it you
00:04:42.560
could see someone's PR and say oh I
00:04:44.160
think that could be a bit faster there
00:04:46.120
but you could do the classics you could
00:04:47.960
just switch out Malo for jalo put an
00:04:50.560
index somewhere in your database where
00:04:52.039
you need it um but this might not even
00:04:54.800
be where the bottlenecks are so that is
00:04:57.199
that could in that case be wasted work
00:04:59.720
so in short we want to replicate our
00:05:02.080
entire system on real infrastructure
00:05:04.800
with real logging and then load test
00:05:07.039
against that I personally think this is
00:05:09.320
really cool and I've not actually seen
00:05:10.880
it before so I'm going to walk you
00:05:13.000
through how we do
00:05:15.720
it so um I just wanted to clarify a bit
First I just wanted to clarify a bit of terminology; this is taken very directly from Brendan Gregg's book, which I showed a few slides ago. A few important points: throughput is the rate of work performed. Workload is the input to the system, or the load you're applying to it. Response time is the time for an operation to complete, comprising both wait time and actual service time. Utilization has two definitions: for resources servicing requests, like servers, it's how busy the resource was; for resources that provide storage, it's the capacity consumed, for example memory utilization. Then, probably quite an important one for this talk: a bottleneck is a resource that limits the performance of the system, a limiting factor, and you're aiming to identify and remove systemic bottlenecks.
OK, so this is a high-level diagram of our system and what it does, and I've highlighted some fairly typical areas of a system, in my experience: data ingestion, your pipeline, and then Rails and Sidekiq, all the usual, maybe an API, maybe not. Here our data ingestion is written in Go and leads into a pipeline for content analysis, which I'm not going to go into in great detail. My team's part of the system is the Rails and Sidekiq side, the usual.

Then this is an architecture diagram. We have various integrations, like Zoom and Slack; they feed into a system called the integrator, which feeds into the ingestor, through to the pipeline, through to Portal, which is the Rails app, and then the API feeds into quite a few of those.

I've seen similar architectures, or heard about them, and a fairly common thing I've seen is for there to be differences in how you do data ingestion. I've seen a few places where the strategy is to write, or in some cases actually rewrite, the data ingestion service in a language other than Ruby, maybe Clojure, maybe Go. The other month I actually heard about a web hosting platform who, like us, had decided to write their data ingestion service in Go. It gets data into a database and eventually into a form the Rails monolith reads, but interestingly that team are actually going to move back to Ruby because of team changes, since that makes it easier for that team to maintain their data ingestion service.
So: I've covered when it will help to load test your system; let's get into the details. Firstly, we want to replicate the existing system. Assuming you want to repeat this process and perform several load tests, rather than running constantly, and also not pay to have that infrastructure up all of the time, you're going to need a way to build up and tear down that infrastructure.

We use Terraform. Many of you will be familiar with it, but in case anyone isn't, Terraform is infrastructure as code. If you've ever dug around in an AWS console looking for some kind of configuration setting, you'll understand why that's a useful tool. In my experience, some teams on Heroku or AWS may already be using it, and you might have plans to move onto it. The benefits for performance testing are, like I said, that you want to be able to build up that infrastructure in the same way each time and then tear it down between tests, so you're not leaving it idle and paying for that capacity. You want roughly the same setup as production, and you want to bring it up and tear it down in a repeatable way, and that is what Terraform is very useful for.
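To give a flavour of that build-up and tear-down loop, here's a minimal sketch of wrapping the Terraform CLI in a couple of Rake tasks so the load test environment can be created and destroyed on demand. The directory, variable file, and task names are made up for illustration, not our actual setup.

```ruby
# lib/tasks/perf_env.rake
# Hypothetical Rake wrapper around the Terraform CLI for a load test environment.
namespace :perf_env do
  TERRAFORM_DIR = "infra/perf" # assumed location of the Terraform config

  desc "Build up the performance testing environment"
  task :up do
    Dir.chdir(TERRAFORM_DIR) do
      sh "terraform init -input=false"
      # -var-file points at sizing close to production; the file name is illustrative
      sh "terraform apply -auto-approve -var-file=perf.tfvars"
    end
  end

  desc "Tear down the performance testing environment after a test"
  task :down do
    Dir.chdir(TERRAFORM_DIR) do
      sh "terraform destroy -auto-approve -var-file=perf.tfvars"
    end
  end
end
```

Then it's `rake perf_env:up` before a test and `rake perf_env:down` afterwards, so nothing sits idle between runs.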
The next thing is to think about how work arrives in your system. The systems I've worked on have had two typical ways, pretty standard: either user traffic or some kind of data ingestion service.

Then the third thing you want to think about, for the purposes of load testing, is how you can artificially push work into this performance testing system. For user traffic, I'm aware of two companies that have looked at doing request replays, or shadow requests: you capture real production requests, remove the PII so it doesn't go into your logging, and then replay those against your system. The companies I've heard of doing this have actually open-sourced their tools: one is Carwow, with Umbra, and the other is loveholidays, with a tool called Ripley, which, coincidentally, I recently saw a talk about.
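To make the shadow-request idea concrete, here's a minimal, hypothetical sketch of the pattern, rather than Umbra's or Ripley's actual API: read previously captured requests, scrub obvious PII, and replay them against the replica. The log format, scrubbing rule, and target host are all assumptions.

```ruby
# Hypothetical request-replay sketch; not any specific open-source tool's API.
require "net/http"
require "json"
require "uri"

TARGET   = URI("https://perf-test.example.internal") # replica under test (assumed)
EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/

def scrub(value)
  value.gsub(EMAIL_RE, "redacted@example.com")
end

# Each line is assumed to be JSON like {"method":"GET","path":"/...","body":null}
File.foreach("captured_requests.jsonl") do |line|
  req  = JSON.parse(line)
  path = scrub(req.fetch("path"))

  Net::HTTP.start(TARGET.host, TARGET.port, use_ssl: true) do |http|
    case req.fetch("method")
    when "GET"
      http.get(path)
    when "POST"
      http.post(path, scrub(req["body"].to_s), "Content-Type" => "application/json")
    end
  end
end
```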
In our case, because our customers are in regulated industries, it is not appropriate at all to use their real data. So instead, what we use is a Go tool that my colleague David, who's on the ingestion team, wrote to generate fake data that approximates the present-day workload, and from that we do 10x, 100x, 1,000x, and so on. The sorts of things coming into the system are emails, chats, Zoom calls, and so on, so we push fake versions of those through the ingestion service.
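The real generator is a Go tool, but the idea looks roughly like this Ruby sketch: synthesize records that approximate today's shape of traffic, then push them in at a multiple of the present-day rate. The field names, rates, and multiplier are illustrative assumptions.

```ruby
# Illustrative fake-record generator (our real tool is written in Go).
require "securerandom"
require "json"
require "time"

BASELINE_PER_HOUR = { email: 63_400, chat: 19_000 } # assumed present-day rates
MULTIPLIER = 10                                     # then 100x, 1000x, ...

def fake_record(kind)
  {
    kind: kind,
    id: SecureRandom.uuid,
    sender: "user-#{rand(10_000)}@example.test",    # synthetic, never real PII
    body: "synthetic #{kind} body #{SecureRandom.hex(8)}",
    sent_at: Time.now.utc.iso8601
  }
end

BASELINE_PER_HOUR.each do |kind, per_hour|
  (per_hour * MULTIPLIER).times do
    payload = JSON.generate(fake_record(kind))
    # In the real setup this payload would be pushed into the ingestion
    # service; printing it here is just a stand-in.
    puts payload
  end
end
```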
The foundational work on our tool for generating and pushing fake data into our system actually happened before I got involved, so I asked David about his approach to reflecting real incoming traffic. According to him, the approach he took was to push traffic at volumes corresponding to what we were already seeing and work out what we would need to process it in terms of resources: servers and so on. Then we could look at the 24-hour average rate across a work day, compare it against the peak-load demand on resources, and work out from that how many resources we needed.

So we used volumes of anticipated customer data, allowed some extra for growth, and used that to get a kind of average rate; once we hit that rate, we would have at most 24 hours of latency in processing incoming data, with all of it normalizing out. With the rate of processing per machine, or per N machines, we could look at the most heavily used production data centres, look at the effective rate of the typical busiest hour, and see how that differed from the 24-hour rate. Again, that's about deciding how many resources we need to process extra incoming workload within a satisfactory time. This diagram is just showing 24 hours of ingestion for a production server; the average rate here was about 83,000 records processed per hour. The idea is simply to get a record processed through the system in a reasonable amount of time.
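As a back-of-the-envelope version of that sizing exercise, here's a tiny sketch; the peak-to-average ratio and per-record service time are made-up numbers, not our real figures.

```ruby
# Rough capacity arithmetic for sizing workers against an ingestion rate.
# All numbers below are illustrative, not real production figures.
average_rate_per_hour = 83_000   # records/hour, 24-hour average
peak_to_average_ratio = 2.0      # assumed busiest-hour multiplier
seconds_per_record    = 2.5      # assumed service time per record

peak_rate_per_second = (average_rate_per_hour * peak_to_average_ratio) / 3600.0
workers_needed       = (peak_rate_per_second * seconds_per_record).ceil

puts "Peak rate: #{peak_rate_per_second.round(1)} records/sec"
puts "Concurrent workers needed to keep up at peak: #{workers_needed}"
# => roughly 46.1 records/sec, so ~116 concurrent workers in this made-up example
```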
So finally: measuring results. Before I studied computer science, I read biology, which covered the scientific method: you generate a hypothesis, design an experiment, test it, and then analyze your results to see whether you were correct. Pretty standard stuff. But we also had a really cool lecture from one of our immunology lecturers, Dan Davis, about how his research group was sort of flipping that by gathering tons of data and then sharing the data itself with the wider scientific community, so that both his research group and other teams could analyze it, draw conclusions, and publish from that.

We're partway there, in that we're gathering tons of data in our standard logging. Like I said, we have the same logging in the performance environment that we have in the production system, so from that we can look at CPU utilization, RAM, and individual log lines from each component in the system. I'm going to skip over looking for the absence of errors, because that tends to come early in a load test, where you might have needed to switch something external off, like sending emails, and that will be quite specific to your system.
This is an example load test. We wanted to look at the part of the system that routes records through to workflow; the idea is to assign a record to an individual for review. We were putting 240k records through and looking for no errors, the total time to process all 240k records, and how long each individual record took to be processed.

This is what it looked like over time; this is the individual time per record processed, and it was working out at about two and a half seconds per record to go through that workflow. We also have logging throughout that code path, so we can break down where the time is being spent. From most to least time spent: assigning those records to a workflow, then a lot of time preparing, then less time, 31%, to set up a new record, then 8% to enter the actual workflow process, and everything from there was 4% of the time or less, so we weren't super concerned about that.
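That kind of breakdown comes from timing each step in the code path. A minimal way to do it in a Rails app is ActiveSupport::Notifications; the event names and step methods below are hypothetical stand-ins for our actual logging.

```ruby
# Hypothetical per-step timing with ActiveSupport::Notifications.
require "active_support"
require "active_support/notifications"
require "logger"

LOGGER = Logger.new($stdout)

# Subscriber: turn every instrumented step into a structured log line.
ActiveSupport::Notifications.subscribe(/\.workflow_routing\z/) do |name, start, finish, _id, payload|
  duration_ms = ((finish - start) * 1000).round(1)
  LOGGER.info("step=#{name} record_id=#{payload[:record_id]} duration_ms=#{duration_ms}")
end

# Stubbed steps so the sketch runs standalone; the real ones do the work.
def prepare(_record); sleep 0.01; end
def assign_to_reviewer(_record); sleep 0.02; end

def route_record(record)
  ActiveSupport::Notifications.instrument("prepare.workflow_routing", record_id: record.id) { prepare(record) }
  ActiveSupport::Notifications.instrument("assign.workflow_routing", record_id: record.id) { assign_to_reviewer(record) }
end

Record = Struct.new(:id)
route_record(Record.new(42))
```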
OK, so that's the gist of one performance test. This is now a full-system performance test that I'm going to walk you through. You do not need to read all of this; the key points are that this was a full-system performance test carried out across five days, ingesting 190,000 fake chat records per hour and 634k fake email records per hour, and intentionally exercising the identity-matching part of the codebase.

On the Rails side, a very important area is how we recognize new participants from Zoom meetings, webinars, and chats, and make sure we're not constantly saying "Jade with this email address is one person, and Jade with that email address is another person". But you have to have a couple of things to match on; there are things that will identify someone as clearly the same person, like a combination of name and number, or name and employee record, for example.

So, to artificially stress test this part of the system: my team already had something to generate an arbitrary number of meeting participants, written by our lead, and I wrote a Rake task to generate pairs of participants that would be recognized as the same person, say 100,000, so that our Rails app would then work out that those are actually 50,000 individuals, just with very slightly different details.
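Here's a minimal sketch of what that kind of Rake task could look like; the model, field names, and matching rule are assumptions for illustration rather than our actual task.

```ruby
# lib/tasks/perf_data.rake
# Hypothetical generator of participant pairs that identity matching
# should collapse into a single person (same name + phone, different email).
namespace :perf_data do
  desc "Generate N pairs of participants that should match as one person"
  task :participant_pairs, [:pairs] => :environment do |_t, args|
    pairs = (args[:pairs] || 50_000).to_i

    pairs.times do |i|
      name  = "Perf Person #{i}"
      phone = format("+44 7700 %06d", i)   # same number across the pair

      # Two records, slightly different email addresses; assumed model/fields.
      Participant.create!(name: name, phone: phone, email: "perf#{i}@example.test")
      Participant.create!(name: name, phone: phone, email: "perf#{i}+alt@example.test")
    end
  end
end
```

Running it with 50,000 pairs gives the 100,000 participants that the matching code should resolve down to 50,000 individuals.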
Then, when we push fake Zoom meetings into the system in the overall system test, it hits this code path for identity matching, and we can see how it handles that increased throughput.
OK, so this was really cool. Two days into the test (I mentioned it was a five-day test), just from force of habit from doing daytime site reliability support rotations a couple of jobs ago, I checked the Sidekiq queues at about 11:00 UK time, and they were really, really backing up. I just want to emphasize: this is not a production system, this is a performance testing system, so we're fine. The workflow Sidekiq jobs were taking excessive amounts of time to complete; to be more specific, the mean time in seconds to complete was suddenly exactly the same as the p99.9. The other issue, really the same issue, was that we were looking at a latency of six and a half hours on one of our Sidekiq queues, which is nothing like our normal latency.
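Sidekiq exposes queue size and latency directly through its API, which is what makes this kind of spot check quick; a sketch, with a hypothetical queue name:

```ruby
# Quick spot check of Sidekiq queue backlog and latency.
require "sidekiq/api"

queue = Sidekiq::Queue.new("workflow")   # hypothetical queue name

puts "jobs waiting:   #{queue.size}"
puts "latency (secs): #{queue.latency.round}"  # age of the oldest job in the queue

# Anything far above your normal baseline (ours is nowhere near 6.5 hours)
# is a sign the workers can't keep up with what's being pushed in.
```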
It obviously felt a bit high. At some point in the night, the database connection count had doubled, from about 500 to 1,000, and therefore each of those workflow items was taking a lot more time to process. The actual time per record was still around two and a half seconds, but they were waiting far, far longer in the queue than usual.

Then at midday UK time, about an hour later, jobs actually started failing, and there was a massive spike in the classic ActiveRecord database connection error, which I'm sure many of you will have seen before. Reasonably obviously, if you've seen this yourself, this is not good; in a production system it could be an incident, depending. What was happening here was that the database server was getting totally exhausted as a resource, which led to jobs trying to open a database connection, getting rejected, failing, going around again, and backing the queue up further and further. That 500-database-connections number actually used to be Heroku's limit on Postgres database connections, and at the end I'll share a reference from them on the rationale behind that.
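One simple early-warning check for this kind of exhaustion, run from a Rails console, is comparing how many connections your processes could open against what Postgres will actually accept; the process counts below are illustrative assumptions.

```ruby
# Rough check of how many Postgres connections the app tier could demand.
# Process counts here are illustrative assumptions.
web_processes     = 10
sidekiq_processes = 20
pool_size         = ActiveRecord::Base.connection_pool.size  # from database.yml `pool:`

potential_connections = (web_processes + sidekiq_processes) * pool_size

# What the database server will actually allow.
max_connections = ActiveRecord::Base.connection
                    .select_value("SHOW max_connections").to_i

puts "app tier could open: #{potential_connections}"
puts "postgres allows:     #{max_connections}"
puts "headroom:            #{max_connections - potential_connections}"

# Live pool usage per process is also visible:
# ActiveRecord::Base.connection_pool.stat
```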
There are ways you can handle this; in this test we just scaled up the database, and that worked perfectly fine. What I thought was really cool about this was that we got a preview of what could be a real production incident before it ever actually happened in production, so we got to mitigate it and prevent it from ever happening for real.
So, sharing data: why bother? I've talked about the how, the why, and the sorts of things you might find when load testing; I want to go back to what I said about sharing data, and not just results. Why would you bother? You want to bring your team along with you (and this was really just an excuse to get this one on the slides again). The idea is that you're limited on time; ideally, if you can share the results from your logs, then everyone can pick up optimization work when they have time available. So what would be ideal is to share the data, in this case by sharing your logging results from each load test.
From what you learn in a load test, you can then do smaller-scale controlled experiments, and here I would basically just point you to Nate Berkopec's work, especially the DRM method: Database, Ruby, Memory. I'd say these two books are the go-to resources for Rails performance optimization. This actually doesn't require a performance testing environment like I've described; instead, you start by benchmarking locally and proving the worth and improvement of your performance PR, and then you do the same in production if you've got approval for the PR.
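The local benchmarking step usually starts with something like benchmark-ips, comparing the current implementation against the proposed change; the two lookups below are just stand-ins for whatever the PR actually touches.

```ruby
# Local micro-benchmark comparing an existing code path with a proposed change.
require "benchmark/ips"

RECORDS = Array.new(10_000) { |i| { id: i, name: "Person #{i}" } }

def current_lookup(records, name)
  records.find { |r| r[:name] == name }          # linear scan on each call
end

INDEX = RECORDS.each_with_object({}) { |r, h| h[r[:name]] = r }

def proposed_lookup(index, name)
  index[name]                                    # precomputed hash lookup
end

Benchmark.ips do |x|
  x.report("current")  { current_lookup(RECORDS, "Person 9999") }
  x.report("proposed") { proposed_lookup(INDEX, "Person 9999") }
  x.compare!
end
```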
What's kind of cool is that, once you follow that process, you can also prove how your optimization will perform in a larger load test. This is something we've done with a couple of my teammates' pieces of work, and it's been really interesting.
Then, just going back to the point about premature optimization: I'd agree with anyone who's thinking of "premature optimization is the root of all evil", but the actual full quote is that there is no doubt the grail of efficiency leads to abuse; we waste enormous amounts of time thinking about, or worrying about, the speed of non-critical parts of our programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. So we should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. That's the full quote.
So everything I've discussed in this talk is about ignoring the non-critical, and instead finding bottlenecks in the critical parts of your system so you can rectify them, preferably just before you need to.

Summing up: using the methodology I've described, you can carry out large-scale measurements and share the data and your insights from those tests with your wider engineering team, and you're using tools to anticipate problems rather than having to react to them when they happen and cause panic. The idea is to anticipate multiples of your current scale, replicate your system with Terraform, think about how the workload or traffic arrives into your system, whether that's user traffic or data ingestion, and then, from that, load test and take measurements from a performance testing environment. I've also walked you through a few example findings and, looking ahead, how you can share that with your team. So, thank you for listening; I'll be around for questions afterwards. Thanks very much.