
Summarized using AI

Observability on Rails

Caroline Salib • August 30, 2024 • online

Observability on Rails: Enhancing Confidence in Code Deployment

In the talk "Observability on Rails," Caroline Salib discusses the concept of observability in software development, particularly within Ruby on Rails applications, and how it can enhance developers' confidence when deploying code to production. The presentation serves as an introductory guide to observability for those who may not be familiar with the term or its importance in modern software development.

Key Points Discussed:

  • Definition of Observability: Observability is defined as the ability to understand the internal state of a system by examining its external signals. Key signals include:

    • Logs: Text records that provide insight into application performance.
    • Metrics: Numeric data points that track factors like CPU usage, memory consumption, and request rates.
    • Traces: Detailed records tracking the journey of requests through services, helping identify issues and delays.
    • Profiling: A newer aspect allowing monitoring of code behavior at runtime in production environments.
  • Importance of Observability: The speaker emphasizes the role of observability in:

    • Understanding the impact of code changes.
    • Quick problem detection.
    • Tracking system health and performance.
    • Improving user experience and enhancing system reliability.
  • Common Tools: Various tools and frameworks for implementing observability are mentioned, including:

    • Commercial Tools: DataDog, New Relic, Honeybadger, and Sentry.
    • Open Source Tools: Prometheus for metrics, Grafana for visualizations, and OpenTelemetry for managing telemetry data.
  • Real-world Example: Caroline shares her experience at Shopify during a significant backend migration that required enhanced visibility to avoid performance degradation. Dashboard alerts helped identify and resolve issues early in the rollout process, contributing to a successful transition.

  • Alerts and Playbooks: The necessity of alerts tied to metrics and clear documentation (Playbooks) for handling incidents is highlighted. Effective alerts must be actionable to avoid noise and ensure relevant team responses.

  • Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs): These metrics help quantify reliability and performance goals within applications.

  • Live Demo: A practical demonstration of using Prometheus alongside a Rails application illustrates how observability aids in monitoring performance and managing requests effectively.

Conclusions and Takeaways:

  • Emphasizing observability can prevent production issues and enhance the team's ability to respond quickly to incidents.
  • Implementing observability tools and creating actionable alerts can significantly improve software reliability.
  • Caution against introducing noisy alerts, which could lead to alert fatigue and neglect.
  • Developers are encouraged to assess their current observability practices and consider potential improvements to enhance their systems' reliability and performance.

Observability on Rails
Caroline Salib • August 30, 2024 • online

If you're unsure about what observability is or haven't given it much thought, this talk is perfect for you! We'll cover the basics of observability and how it can boost your confidence when shipping code to production by improving your ability to manage and monitor your Ruby on Rails applications.
https://www.wnb-rb.dev/meetups/2024/08/30

WNB.rb Meetup

00:00:00.640 well thank you so much everybody for joining uh so today we're gonna do a
00:00:06.200 little introductory chat about observability and
00:00:11.240 hopefully that will help you ship code with more
00:00:19.359 confidence. All right, let's do this. A little bit about me: I'm a senior
00:00:26.519 software developer at Shopify. I'm a pet mom, I have two cats and a doggy. In my
00:00:32.960 free time I like to dream about living in the woods, and I'm also a gamer;
00:00:39.000 I love playing video games when I have the time, and I wish I had more time than I actually do. Cool, before we start,
00:00:50.559 I actually am curious about what's your uh relationship with observability today
00:00:56.640 so I have here a few options and if you feel comfortable with and you can share share in the chat if you're number one
00:01:03.320 two or three uh so number one is my company or team has observability
00:01:08.600 dashboards and alerts and I check them periodically and it directly impacts how I do my
00:01:14.520 work uh number two my company or team has observability dashboards and alerts
00:01:20.119 but I'm not very familiar with them and it doesn't impact how I do my
00:01:25.479 work. Number three would be: my company or team doesn't have observability dashboards and alerts, or it doesn't have
00:01:33.040 the need for them or number four um if you want to share in the
00:01:39.079 comments if you are in a different
00:01:46.719 scenario. Nice, thanks for sharing. For those who posted two and three, I
00:01:52.439 think this talk will be perfect for you, and for those who posted one, I think
00:01:57.680 you can still get some value from it, so hang
00:02:03.439 tight. So why did I decide to do this talk in
00:02:08.840 the first place? I've been working in development for 13 years, but I feel
00:02:14.000 like I've been coding in the dark, because I've always been in scenario two or three, where my company or team
00:02:21.239 doesn't have observability at all, or it has it but it doesn't really impact how I do my work. I'm on the checkout
00:02:30.080 extensibility team at Shopify, and earlier this year I led the backend migration to the new checkout
00:02:37.280 editor. So basically, at Shopify we used to let merchants customize their
00:02:43.720 checkout with checkout.liquid, but for many reasons, security,
00:02:50.920 performance, we wanted them to customize in a more controlled environment, so we came up with Checkout
00:02:57.680 Extensibility, and so we had to migrate all the millions of merchants to this
00:03:03.920 new platform, and we needed better visibility to ensure that we would have
00:03:09.360 a smooth rollout. And my lights just went weird, I don't know why; anyway, our
00:03:18.200 main concern there was uh handling a massive increase in requests because we were migrating millions of
00:03:25.080 merchants, and we wanted to ensure that there would be no performance
00:03:31.560 degradation. I saw a question about Liquid: no more Liquid on checkout? I
00:03:37.000 think there's still Liquid in other areas of Shopify. Yeah, so we implemented
00:03:44.599 metrics dashboards, and we did in fact find an issue during the rollout, but
00:03:52.040 it was related to a separate team. Because we had dashboards and alerts, we were able to identify it earlier and
00:04:00.480 address it with that team; they made a few tweaks and there was no performance degradation, no issues, the release was a
00:04:07.159 success, and yeah, that's a really good thing. But maybe you're here because
00:04:15.879 you don't even know what observability means. So, if you don't know what
00:04:21.560 observability means, it's a term that we use to describe the ability to understand the internal state of a system
00:04:29.600 by examining its external signals. What are the signals? So
00:04:37.080 there's logs: logs are text records where you can see how the application is
00:04:42.479 doing; when you run rails server you immediately see logs there, but some
00:04:47.639 companies also have a production application where you can see the logs. There's metrics: metrics are
00:04:56.280 numeric data points that you can use to track things such as CPU usage
00:05:05.039 and memory consumption, request rates, and even custom things, like features and
00:05:13.400 anything, really, that you want to track when it goes to production. And we also have traces:
00:05:19.759 traces are detailed records of the entire journey of a request as it flows
00:05:26.160 through the different services and components, so it helps you figure out possible issues or delays in
00:05:32.840 specific spans of the request
00:05:38.800 journey. And profiling: profiling was only introduced to OpenTelemetry and
00:05:44.800 observability earlier this year, and it allows you to inspect code
00:05:49.880 behavior and performance at runtime in production. So, pretty cool.
00:06:00.160 A few use cases for observability in production are understanding the impact
00:06:06.759 of new code changes, quick problem detection, improving system understanding, tracking
00:06:14.120 health and performance, enhancing the user experience, which can be related to performance as well, and scalability and
00:06:23.120 reliability. So basically you're not only there coding on your development
00:06:28.840 machine fixing bugs, but you're owning the changes from development to production
00:06:34.639 and ensuring that the system is
00:06:40.919 reliable. There are here a few commercial tools: there's Datadog, New
00:06:46.759 Relic, Honeybadger, AppSignal, Rollbar, Sentry, Scout, and many, many others;
00:06:53.639 some of them are more focused on logging, or traces, or
00:07:00.800 metrics, or a combination of all these things. Yeah, I'm curious, if
00:07:07.000 you feel comfortable sharing: do you use any of those tools at work or
00:07:12.560 for personal projects? Feel free to share in the comments if
00:07:19.759 you feel comfortable. Yeah, I used New Relic
00:07:26.599 in the past, Honeybadger, Rollbar, Splunk, and Datadog at companies that I worked
00:07:33.560 for, and I used Sentry for personal projects, and I really like
00:07:40.840 Sentry
00:07:48.440 Nice, yeah, Datadog and Sentry and New Relic are very, very popular. Thanks
00:07:55.280 for sharing, that's really insightful. We also have open source tools: there's Prometheus,
00:08:02.800 more focused on metrics, Fluentd for logging, Jaeger for tracing; Grafana is
00:08:12.000 like a UI where you can see a bunch of things together, so you can create dashboards and manage alerts, and it
00:08:19.960 does a bunch of things; I don't think it's only related to observability, but it's very commonly used with
00:08:27.159 Prometheus. And OpenTelemetry, which is not a tool but an open source
00:08:32.800 framework; I have a separate slide for it. Does anyone here use one of
00:08:38.360 those open source tools? Feel free to
00:08:45.839 share. Nice, good question, I will share more about these open source tools during the
00:08:54.680 talk. Nice, yeah, I'm going to cover a lot of Prometheus and Grafana in a more
00:09:01.839 introductory way, but yeah. Cool, let's talk a little bit about OpenTelemetry. So OpenTelemetry
00:09:08.959 is an open source framework, or toolkit, that's used to manage telemetry data:
00:09:14.560 logs, metrics, traces, and now profiling. It integrates with many
00:09:19.760 tools, so Prometheus, Grafana, Datadog, New Relic, and it has been becoming very, very
00:09:25.720 popular due to the standardization that it does. So let's say your company uses Datadog,
00:09:34.160 but the bill is getting too expensive, so you want to switch to an open source
00:09:40.200 tool and own your data, and you have an infrastructure team to deal with that; it's supposed to be
00:09:47.240 easy to switch from Datadog to Prometheus and Grafana or other open source
00:09:53.839 tools, if they use this standard.
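As a rough illustration of the standardization being described, here is a minimal sketch of vendor-neutral instrumentation with the opentelemetry-ruby SDK; the service, span, and attribute names are illustrative (not from the talk), and `use_all` assumes the opentelemetry-instrumentation-all gem is installed.

```ruby
require "opentelemetry/sdk"
require "opentelemetry/instrumentation/all"

OpenTelemetry::SDK.configure do |c|
  c.service_name = "jokes-app"
  c.use_all # enable all installed auto-instrumentation (Rails, Net::HTTP, etc.)
end

# A manual span around a unit of work; the data is exported to whichever
# backend (Prometheus/Grafana stack, Datadog, New Relic, ...) is configured.
tracer = OpenTelemetry.tracer_provider.tracer("jokes-app")
tracer.in_span("fetch_joke") do |span|
  span.set_attribute("joke.source", "public_api")
  # ... call the external jokes API here ...
end
```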
00:09:59.640 Cool. I have a demo later in this presentation, so there are a few things I'm
00:10:05.600 going to mention here so that during the demo you don't feel lost; it will help you understand things a little bit
00:10:12.200 better. So we're going to talk a little bit about Prometheus; just a reminder, it's an open source tool used mostly
00:10:19.680 for metrics and also alerts. So Prometheus has four different
00:10:26.040 metric types. You have counter: a counter is to count occurrences of events;
00:10:32.399 it can be used to count the number of requests or errors. We have gauge: a gauge is more
00:10:39.519 used for measurements that fluctuate, so temperature, maybe the number of open
00:10:44.800 database connections. Histogram is a very
00:10:51.360 interesting one: it samples observations and counts them in configurable buckets;
00:10:57.120 it's usually used for request durations, response sizes. And lastly we have
00:11:03.440 summary, which is very similar to histogram but it does a little bit more:
00:11:09.360 it also provides us with a count and a sum of the observed
00:11:14.959 values, and the quantiles that we're going to see more about later; it's also used for request durations, response
00:11:22.399 sizes, things like that.
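As a hedged sketch of the four metric types, here is how they could be registered with the prometheus_exporter gem's client (the gem used in the demo later); the metric and label names are illustrative, not from the talk.

```ruby
require "prometheus_exporter/client"

client = PrometheusExporter::Client.default

requests  = client.register(:counter,   "app_requests_total",           "count of requests")
db_conns  = client.register(:gauge,     "open_db_connections",          "open database connections")
latency   = client.register(:histogram, "app_request_duration_seconds", "request duration, bucketed")
resp_size = client.register(:summary,   "app_response_size_bytes",      "response size with quantiles")

requests.observe(1, controller: "jokes")  # counters only ever go up
db_conns.observe(12)                      # gauges can go up and down
latency.observe(0.31)                     # sampled into configurable buckets
resp_size.observe(2048)                   # summary: count, sum, and quantiles
```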
00:11:30.680 Cool. Prometheus has a query language called PromQL; it allows you to query metrics, aggregate data, compute average rates, and
00:11:38.720 visualize trends. Here's an example: this is a
00:11:48.480 query using PromQL; it's getting the rate of the Ruby HTTP requests total metric on the
00:11:54.440 jokes controller, using this rate interval variable, which I'm pretty sure is from
00:12:00.160 Grafana, and it's aggregating by status. So with this query we would expect
00:12:07.360 to see something like this: we see two results, grouped by the HTTP
00:12:13.639 status, and then you can take some insights, like between 2:10 and 2:20 the number
00:12:21.000 of error responses that return 500 is bigger than the 200s, so
00:12:28.680 something must have gone wrong here, right?
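A reconstruction (hedged) of the query described above: the metric name assumed here is the default HTTP request counter exported by the prometheus_exporter middleware, and `$__rate_interval` is the Grafana variable mentioned.

```promql
sum by (status) (
  rate(ruby_http_requests_total{controller="jokes"}[$__rate_interval])
)
```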
00:12:36.240 I want to mention another thing about PromQL and Prometheus, because when I started learning about it, it was
00:12:43.760 driving me crazy that I couldn't get exact numbers on my graphs, and now I
00:12:50.160 understand why. So I want to share it with you in case you're going to work with Prometheus, so you don't go through the
00:12:55.839 same trouble that I was having. The way Prometheus works, it takes snapshots of data at regular intervals, and events
00:13:04.639 between these snapshots may not be captured exactly, because of
00:13:09.920 the intervals. So we use rates to smooth out the trends, and there's an
00:13:17.279 example of how to get the rate over there on an interval of five minutes; and then you
00:13:22.760 might not see the exact number: maybe you're testing locally and you only have a few requests, like you send a
00:13:29.600 few occurrences of a metric and you expect to see the exact number;
00:13:34.920 you'll see something very, very similar to what you expect, but not the exact number. Still pretty useful, though.
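The five-minute rate smoothing just described might look like this (the metric name is illustrative, reusing the one assumed above):

```promql
rate(ruby_http_requests_total[5m])
```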
00:13:43.000 Let's talk a little bit about SLIs, SLOs, and SLAs. They help the company or team to
00:13:52.279 understand and quantify the reliability and performance of their services. Okay, but what the hell is
00:13:58.440 that? SLIs are service level indicators, so the actual measurements of a
00:14:04.880 system. SLOs are service level objectives; they are targets for
00:14:11.759 the SLIs, so targets for the metrics. And SLAs are service level agreements;
00:14:19.079 it's a formal commitment that the company has with its customers. I'm going
00:14:25.000 to give you an example, and then it should be clearer. I found out that GitHub
00:14:30.560 actually has public SLAs, which is pretty cool. It says that GitHub commits
00:14:36.440 to maintain at least 99.9% uptime for its services, and failing this criteria
00:14:42.600 means that a credit claim can be made by the customer. I didn't know that; I thought it
00:14:48.079 was pretty cool, so you know, next time you see that angry unicorn there, maybe you can make a credit
00:14:54.360 claim. So that's a public SLA; it exists. And this is a fake SLO, just to
00:15:00.639 give you an example: maybe, in order for GitHub to ensure this SLA doesn't
00:15:07.959 fail, they could have an SLO for uptime for Actions, which is one of their
00:15:13.160 services. The SLO usually will be slightly stricter than the SLA,
00:15:21.759 because you want to be conservative; you want to be alerted before the SLA actually
00:15:29.680 fails. In order to measure this goal, maybe
00:15:36.600 you could have an SLI for total triggered executions and another one for unavailable executions, and then crossing
00:15:45.079 this data could get you an SLO that will prevent the SLA from
00:15:51.959 failing. Another example of an SLO could be 99.99% uptime for Issues and
00:15:58.480 Pull Requests as another service, and then maybe an SLI for that could be the
00:16:03.720 service availability crossed with the error rate; you cross this information, you calculate your goal, and
00:16:11.079 then you have your SLO, your SLI, and your
00:16:17.399 SLA.
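As a hedged sketch, the two SLIs described above could be combined into an availability-style number that an SLO is checked against; the metric names and the 30-day window below are illustrative, not GitHub's actual metrics.

```promql
1 - (
  sum(rate(actions_runs_unavailable_total[30d]))
  /
  sum(rate(actions_runs_triggered_total[30d]))
)
```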
00:16:24.600 Still about SLOs, let's talk a little bit about percentiles. I've always seen people talking about P90, p50, P99, and I'm like, I don't know
00:16:31.519 what that means, but I'll go with it. But let's get some clarity about
00:16:38.959 that. So p50 will be like the average of
00:16:46.279 something; let's say the p50 of a request is 1 second: it means that 50% of
00:16:53.600 the requests are actually faster than that and the other 50% are slower, so it's the
00:16:59.440 average user experience. P95, however, means that 95% of
00:17:14.199 requests are faster than that, but that's the typical worst case
00:17:19.720 scenario for users, and it means that 5% of the users are getting performance
00:17:26.039 that's worse than that; so if it's two seconds, it means that 5% of the users are having performance that's actually
00:17:32.640 slower than two seconds. And P99 is usually used
00:17:38.480 for rare performance issues; it means that 1% that's getting the worst performance ever. So if P99 is 3
00:17:46.480 seconds, 1% of the users are having really bad performance
00:17:52.480 issues in the application.
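For reference, a hedged sketch of how a p95 like the one described above can be computed from a Prometheus histogram (the demo later uses a summary metric, which exposes quantiles directly; the metric name here is illustrative):

```promql
histogram_quantile(
  0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)
```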
00:17:59.760 None of the things that we saw so far matter at all if we don't have alerts, because you can have
00:18:05.600 beautiful dashboards and metrics, but if nobody looks at them when there is a real issue, it's worth nothing. So let's
00:18:14.000 talk a little bit more about that. You can create alerts based on your SLIs and
00:18:20.280 SLOs, and use them to warn the team when the application isn't working as
00:18:26.120 expected. Most monitoring tools will let you configure alerts, but be careful
00:18:32.240 not to introduce noisy alerts, because if you have an alert that's going off
00:18:37.280 all the time and it's not a real issue, there's this weird thing that happens to our brain where it creates a pattern
00:18:44.400 and people start to ignore it, and it's nobody's fault; it's just something that
00:18:50.720 you need to be very careful about, not to create noisy alerts. Same goes for
00:18:56.919 exceptions. And lastly, alerts must be actionable; I'm going to talk more about
00:19:05.280 that. One thing that you can do in order to make the alerts more accessible
00:19:10.640 and actionable is to have a playbook. The playbook could be documentation that goes in the
00:19:19.000 company's internal documentation system, or it could go in the alert
00:19:25.360 description itself, or, even better, it could be both. So in the playbook, or
00:19:32.159 the alert description, or both, you should describe what to do when the alert is triggered, and also what to do when there
00:19:37.679 are false positives, false alerts; again, we want to avoid noisy
00:19:46.280 alerts.
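A minimal sketch of what an alert tied to an SLI could look like as a Prometheus alerting rule; the names, threshold, and labels are illustrative, and the demo later uses a Grafana alert rather than this exact mechanism.

```yaml
groups:
  - name: jokes_slo
    rules:
      - alert: JokesBackendErrorRateHigh
        expr: |
          sum(rate(ruby_http_requests_total{controller="jokes", status=~"5.."}[5m]))
            /
          sum(rate(ruby_http_requests_total{controller="jokes"}[5m])) > 0.001
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Jokes backend error rate above 0.1%"
          runbook: "Link the playbook: how the alert is calculated, likely causes, what to do, and what to do on false positives."
```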
00:19:51.760 My suggestion here for an alert description is to have a very specific title; describe how the alert is being calculated, because you
00:19:57.360 don't want to go on vacation, then the alert goes off and nobody understands why, or whether it's a real
00:20:03.039 problem or not. You want to describe why the alert may go off, and what to do
00:20:10.640 when the alert goes off, including what to do when the alert goes off by
00:20:15.919 mistake, when it's not a real issue, because, again, I'm going to be saying this many times, we don't want
00:20:21.440 noisy alerts. And another cool thing to add is the customer
00:20:26.960 impact. I have an example here; I hope it's readable, but I will also read it.
00:20:32.840 So the title here is very specific; it says: backend errors, jokes API custom error
00:20:38.280 monitor has received more errors than normal. The amount of errors on the back
00:20:44.039 end for the jokes API has exceeded the threshold of 0.1%, because it's now at
00:20:51.159 0.2%. How do we calculate this metric? We are using the p50 percentile, because
00:20:57.039 here we just care about the average impact.
00:21:02.559 This monitor may go off because... then here I added whatever I could
00:21:09.440 think of that could cause this error. So I added: we are using an
00:21:15.440 external API, and the API might be down, the version of the API might have changed, or the code that consumes data
00:21:23.240 from the API might be broken. How to fix it: the first thing is to investigate
00:21:28.840 what's going on; so if you search for the jokes API custom error on the
00:21:35.360 logs platform it might give you an idea of why the error is
00:21:40.840 happening. And then there is also an instruction here saying that if the alert is going off too often with false
00:21:49.200 positives, to consider handling it in a different way, or increasing the threshold,
00:21:55.559 because maybe there is a known issue the team doesn't have time to work on; it's better to increase the threshold from
00:22:03.039 0.1% to, let's say, 0.3%, and then get alerted when it goes to
00:22:08.799 0.4%, than to let the alert pop all the time and get ignored when things go
00:22:16.440 off the rails. And then lastly, here we have the impact: customers could have had
00:22:22.279 problems accessing the index page, which is really bad. So you put the
00:22:28.240 impact there uh maybe the person who sees the alert might uh you know take
00:22:34.279 action
00:22:41.440 faster. Awesome, now I'm going to do something wild and I'm going to try a
00:22:46.640 live demo; I hope it goes well. For this live demo I'm using
00:22:53.240 this Rails project here; I'm going to share the slides later so you can access
00:22:59.480 the links. And I'm going to be covering
00:23:05.480 metrics, and I'm going to be using Prometheus and Grafana to collect and visualize
00:23:10.679 metrics, and I'm using this gem called prometheus_exporter to export the metrics
00:23:16.440 from the Rails app. Cool, so this is how my application
00:23:24.200 looks: nothing fancy, it's fetching a joke from a public jokes
00:23:29.600 API. When I refresh, it's going to fetch a new joke, and then a new joke. This joke doesn't have a title; some of
00:23:36.679 them do have a title. Why do we care? We don't, but I have a custom metric
00:23:43.080 for that, so I'm going to show you later. We have a dashboard here, so this is how Grafana looks. In the background I
00:23:50.480 can see the jokes; are we good? I hope so. I got some bad ones that made me a
00:23:56.039 little bit concerned about refreshing the page, because who knows what's going to show up there. Cool, let me see here, so
00:24:05.960 this is how Grafana looks, and in the backend there is Prometheus,
00:24:12.799 and prometheus_exporter exporting metrics to
00:24:17.919 Prometheus. We have here a board with the average request duration; we have P99, P90,
00:24:24.320 and p50. As we can see here, p50 is around 300 milliseconds, P90 about
00:24:34.000 300 milliseconds as well, P99 almost 400 milliseconds, so it's not bad; it's
00:24:40.559 actually good performance for this purpose. I have another one
00:24:45.679 here that's the rate of HTTP requests grouped by status code. So
00:24:53.840 here we have the 500s; we've been having zero 500 requests, which is great, and here are
00:25:02.000 all the 200s, so mostly we're getting 200s, so mostly my
00:25:07.559 controller is succeeding. Cool, I have a script here that's
00:25:15.200 pinging my application every half a second and this is my
00:25:21.159 controller. I'm going to uncomment this code here, and I'm going to
00:25:26.720 simulate the API timing out, so I'm adding a sleep time of two
00:25:33.799 seconds and pretending that the API is hanging
00:25:38.840 up. And then we should see... sorry, sorry,
00:25:44.279 just real quick, I don't think we see the code, we only see the dashboard. Oh sorry, let
00:25:49.720 me... I think I need to share the screen again, sorry. Thanks for the heads up.
00:26:01.600 Oh, I should have selected the entire screen
00:26:08.679 instead. Okay, I'm going to go back to the code. So yeah, I was showing
00:26:14.000 that I have a script here that's sending a request every half a
00:26:19.520 second, and this is my controller; I'm
00:26:25.360 fetching from this public API, and there is a sleep here so that
00:26:31.399 about 20% of the time it should generate errors. As you can see here, we're
00:26:36.919 getting a couple of 500s, and now what we're going to do is see how a timeout error rate in my
00:26:44.600 controller can impact the graphs. Here we can see that the
00:26:50.520 number of 500 requests is increasing, while the number of 200 requests is
00:26:56.520 decreasing. The P99 already went all the way up, saying it's taking over two
00:27:03.880 seconds to complete requests for 1% of the users,
00:27:10.279 and the P90 should go up too, because we're raising errors for 20% of the
00:27:16.399 users, but that takes a little bit longer. Okay, the other thing I want to
00:27:22.919 show you is that I have an alert here; right now it's normal, and it says that
00:27:29.240 when my jokes page p50 is over two seconds, it will notify me on Slack,
00:27:39.480 for example. I don't really have it set up to notify me, but it will go
00:27:45.760 red. Cool, yeah, P95 went up. Let's
00:27:51.640 make it so the timeout occurs for 80% of the users, and for
00:28:00.279 this we will have to wait a minute or two. In the meantime I'm going to show
00:28:05.960 you some extra stuff; I hope I'm not going too fast. I have a
00:28:13.039 separate dashboard here, a separate graph, because I was curious how many of
00:28:18.200 those jokes have a title or not, because one time I was refreshing
00:28:25.200 it all the time and most of the jokes didn't have a title, so I thought maybe most of the jokes don't have a title, but surprisingly,
00:28:33.000 actually, it's kind of half and half, and you can see here that sometimes most of them don't have a title, but some other
00:28:39.760 times most of them do have a title. I don't know, this is an example of
00:28:44.919 something custom you can track for some feature in your
00:28:53.080 application. Okay, the 500s are going up, but the p50 is still
00:28:59.840 stable; it didn't get enough errors yet. So I'm going to show a little bit of
00:29:07.080 code in the meantime. So I have prometheus_exporter here as an initializer; I'm just adding the middleware
00:29:13.880 and it handles everything else, but I do have to run it as a separate
00:29:19.679 server. And then here I have the Prometheus YAML config, and what I'm saying is that this is the end
00:29:26.279 point for prometheus_exporter; I'm telling Prometheus to scrape metrics from
00:29:33.200 prometheus_exporter on this port.
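A hedged sketch of the prometheus.yml scrape configuration being described; 9394 is prometheus_exporter's default collector port, assumed here rather than read from the demo.

```yaml
scrape_configs:
  - job_name: "rails_app"
    static_configs:
      - targets: ["localhost:9394"]  # prometheus_exporter's collector endpoint
```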
00:29:40.240 Then, going back here, I have my custom metric; it's a counter metric, the name is jokes custom metric, and it tells
00:29:46.440 me whether a joke has a title: so if the joke title is present it's going to be true, otherwise it's going to be
00:29:52.720 false. And here, this is the endpoint where Prometheus collects the
00:29:59.440 metrics, so you can see my custom metric here, the counter for the ones that
00:30:05.200 had a title or not. We can see the summary metric for HTTP request duration;
00:30:12.080 it creates that for every controller (I only have one controller here), and it also creates the quantiles that I
00:30:18.320 can use for my p50s, P90s, etc.
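A minimal sketch of the pieces just described, based on prometheus_exporter's documented usage; the file placement, metric name, and label name are assumptions, not copied from the demo code.

```ruby
# config/initializers/prometheus_exporter.rb
require "prometheus_exporter/middleware"
require "prometheus_exporter/client"

# Rack middleware that records request counts and durations for every controller.
Rails.application.middleware.unshift PrometheusExporter::Middleware

# Custom counter: did the fetched joke have a title?
JOKES_METRIC = PrometheusExporter::Client.default.register(
  :counter, "jokes_custom_metric", "fetched jokes by title presence"
)

# In the controller action, something like:
#   JOKES_METRIC.observe(1, joke_title_present: joke["title"].present?)
```

The collector that serves the metrics endpoint runs as a separate process (for example `bundle exec prometheus_exporter`), which is the separate server mentioned above.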
00:30:27.279 If I go back here, now something really bad is going on, right? The p50 is taking over two seconds, but
00:30:32.960 luckily our team is aware of it, because we're getting an alert on Slack,
00:30:38.519 and everybody is taking action, has their eyes on it, and we
00:30:45.000 heard it from observability and not from the customers, which is the main takeaway
00:30:50.559 of this presentation. Let's go back to the
00:30:56.120 presentation; we're almost wrapping up here. I do want to invite
00:31:01.360 folks for a reflection, maybe even homework. So my question is: do you have observability implemented? If yes,
00:31:09.440 when was the last time observability helped you find a bug? Was there a
00:31:14.480 recent incident that observability could have caught? And yeah, this is a great
00:31:19.720 opportunity for you to introduce some alerts and maybe some metrics too, and
00:31:25.240 remember that the alerts have to be actionable. And if you don't have observability at
00:31:30.440 all, then why not try to implement one of those tools that I mentioned in this talk? However, if you're just a dev
00:31:37.720 and you don't want a full-time job taking care of all the infrastructure
00:31:44.080 for Prometheus and Grafana and the open source tools, I recommend going with one of the commercial tools; most of
00:31:50.960 them have free trials and free accounts that you can start with, and sell the idea to your team and your
00:31:58.399 company, and yeah, it should really pay off with time. And last question: does
00:32:04.440 your team have a playbook? If not, maybe create a playbook: list all the alerts there, all the metrics, what's the intent,
00:32:11.960 what to do when the alerts are going off too often, and things like that.
00:32:17.440 All of these things are really valuable to the company, so if you see an opportunity to work on some of these
00:32:24.159 items, it should help your career, hopefully maybe even a promotion that
00:32:30.240 you're working on. So yeah, keep that in mind, it's a great
00:32:35.960 opportunity. And that's it, we're done. I have there my email and my
00:32:43.880 website if you want to reach out; my website has all the social media. I would love to keep in touch, so feel free to
00:32:50.240 give me a follow and send a message and I have some references here too I'm
00:32:55.279 going to share the slides later so you can access the links. We're good.