RailsConf 2022

Finding the Needle in the Stack Trace: APM Logs-in-Context
New Relic - Kayla Reopelle and Mike Neville-O'Neill

00:00:12.420 good afternoon and welcome my name is
00:00:14.160 Mike Neville O'Neill I'm a senior
00:00:15.599 product manager at New Relic on the logs
00:00:18.060 team today I'm joined by my co-presenter
00:00:20.340 Kayla Reopelle who is a senior software
00:00:23.520 engineer on our open source agent team
00:00:25.500 for the Ruby agent in particular and
00:00:28.320 today we're going to be spending a bit
00:00:29.519 of time talking about APM logs in
00:00:31.679 context unsurprisingly but before we get
00:00:34.559 into the specifics of the features and
00:00:36.180 functionality that we released earlier
00:00:37.500 this month I want to give you a brief
00:00:38.880 introduction to New Relic so that you can
00:00:40.620 get just a quick understanding of who we
00:00:42.360 are and what we do next we'll move into
00:00:45.120 how it actually works and the work that
00:00:47.700 we ended up doing to actually build the
00:00:49.379 features and functionality that we'll be
00:00:51.059 demoing closer to the end of the talk
00:00:54.239 so what is New Relic I think it makes
00:00:56.460 sense to think about New Relic in terms
00:00:58.079 of the customer problems that we try to
00:01:00.059 solve where what you see here today is a
00:01:04.140 software monitoring architecture which
00:01:06.119 is reasonably common in fact this is the
00:01:08.159 observability architecture of a customer
00:01:09.600 we worked with before they migrated to
00:01:11.880 New Relic and so what caused what we're
00:01:14.580 looking at right here essentially
00:01:16.500 there's been an explosion in data over
00:01:18.240 the last couple of decades an explosion
00:01:20.460 in technologies and platforms that
00:01:22.259 developers and engineers are responsible
00:01:23.820 for maintaining and working with and
00:01:26.040 what has come along with that explosion
00:01:27.600 in data is an explosion of tools that
00:01:30.360 can be used to ingest manage and get
00:01:32.460 analytics from that data as a result
00:01:35.100 what we found is that software teams
00:01:36.659 have often been forced to adopt a
00:01:38.579 fragmented monitoring tool set for their
00:01:40.740 infrastructure versus applications
00:01:42.180 versus logs versus digital experience
00:01:44.939 AIOps and on and on and on and this
00:01:47.520 leads to a couple of interesting
00:01:48.659 consequences the first of which is
00:01:50.520 there's no real single source of truth
00:01:52.560 that exists in an environment like this
00:01:54.180 what you have to do is consult a number
00:01:56.460 of different tools to try to Cobble
00:01:58.200 together or to synthesize a picture of
00:02:00.360 what's going on in your environment
00:02:02.399 the second interesting consequence of
00:02:04.320 this is all of the time that you end up
00:02:05.880 having to invest in synthesizing that
00:02:07.860 picture it's not something that's easy
00:02:09.599 to do and there's a lot of friction as
00:02:10.920 you switch context from tool to tool to
00:02:12.780 try to get a comprehensive view of the
00:02:14.340 health and status of a given environment
00:02:16.800 the average organization at this
00:02:18.420 point has dozens if not hundreds of
00:02:20.099 disparate tools that are used to monitor
00:02:21.599 different parts of their technology
00:02:23.040 stack so what we've heard from our
00:02:25.680 customers from our developers and
00:02:26.760 Engineers is that they were looking to
00:02:28.319 put their customer and operational data
00:02:30.060 in the same place
00:02:31.920 hence we built the New Relic platform so
00:02:34.560 we spent two years working with our
00:02:35.879 customers to understand how we could
00:02:37.560 build a platform where they could
00:02:38.760 centralize that customer and operational
00:02:40.319 data and what we end up doing is taking
00:02:42.540 all of your Telemetry data and we ingest
00:02:44.280 it into a single open data platform
00:02:46.319 where metrics events logs and traces
00:02:48.599 from any Source can be observed both
00:02:50.160 separately or together
00:02:51.720 we also have machine learning baked into
00:02:53.580 every element of the platform so that
00:02:55.800 teams can detect understand and resolve
00:02:57.780 incidents faster reducing alert fatigue
00:02:59.840 being more proactive and data driven and
00:03:02.640 lastly we wanted to allow our customers
00:03:04.440 to easily visualize and troubleshoot
00:03:06.120 their entire software stack in one
00:03:08.099 connected experience so without that
00:03:10.200 need to be switching context from tool
00:03:11.640 to tool to tool to see what's going on
00:03:12.959 in your environment
00:03:16.260 so with that we can get into APM logs
00:03:19.140 and context now that we've explained a
00:03:20.760 little bit about New Relic and what it
00:03:22.019 is that we do in the observability space
00:03:24.599 and I want to start with a problem
00:03:26.220 specific to logs since we've
00:03:28.019 covered the higher level customer
00:03:29.099 problems that we're looking to solve but
00:03:30.900 I think with log data there are really
00:03:32.580 two key challenges that are always
00:03:34.620 present it's collection and correlation
00:03:37.019 how do you get that log data out of a
00:03:38.940 source for a given environment into a
00:03:40.920 destination of your choosing that we
00:03:42.239 hope is New Relic but it could be
00:03:43.500 another one
00:03:44.400 and the second question is how do you
00:03:46.200 actually ensure that that data is
00:03:47.640 correlated how do you make sure that it
00:03:49.140 is Meaningful and useful to you once
00:03:50.940 you've ingested it into a platform and how
00:03:53.640 can you ensure that it's going to be
00:03:54.840 available when you need it and where you
00:03:56.760 need it
00:03:58.319 so we really approached this entire APM
00:04:01.560 logs in context experience with those
00:04:03.239 two problems in mind and for the
00:04:06.599 correlation piece of it we ended up
00:04:08.519 building the logs in context component
00:04:11.099 that connects application Logs with
00:04:12.840 transactions and all of this is done
00:04:15.180 automatically essentially what we're
00:04:16.739 able to do is to inject span or Trace
00:04:19.680 IDs into the logs themselves so that you
00:04:23.160 never actually have to run a query to
00:04:24.660 find the logs that are associated with
00:04:26.160 these transactions or spans or traces
00:04:28.020 you'll never ever have to try and figure
00:04:30.419 that out or run a query for something
00:04:32.040 that the platform knows out of the box
00:04:33.660 so all that work is done for you
00:04:38.340 in addition to the enrichment capability
00:04:40.860 we also added the ability to forward
00:04:42.660 logs directly from the APM agent so what
00:04:46.080 this means is it doesn't require any
00:04:47.699 sort of domain expertise or access to
00:04:49.620 the underlying host or environment to
00:04:51.240 configure a dedicated log forwarder the
00:04:53.520 individual developer is empowered to
00:04:55.259 choose the data that they want to send
00:04:56.639 to make their investigations easier
00:04:58.740 we've also automated that enrichment
00:05:01.320 process so what we saw in the last slide
00:05:03.479 that enrichment and injection of span
00:05:05.160 and Trace ID into the logs there is zero
00:05:07.560 configuration required to get that up
00:05:09.180 and running as Kayla's going to show you
00:05:10.620 in just a few minutes you get up and
00:05:13.080 running very quickly and replicate that
00:05:14.880 experience and lastly we enhanced the UI
00:05:17.040 to place log hooks around various
00:05:19.740 elements and features to make sure that
00:05:22.020 the log data was always going to be
00:05:23.220 available to you when you need it for
00:05:24.660 investigating a trace or an
00:05:26.460 infrastructure issue or another issue
00:05:29.639 at the moment you can see our supported
00:05:31.800 languages and frameworks which we
00:05:33.660 started supporting earlier this month so we
00:05:34.860 support Java, .NET, and Ruby as well as Node
00:05:37.560 and coming soon will be Python, Go, and
00:05:39.960 PHP
00:05:43.560 so with that out of the way I'd like to
00:05:46.139 turn things over to Kayla who's going to
00:05:48.720 run you through a bit of how we built it
00:05:50.400 and give you a quick tour of the product
00:05:52.500 thank you Kayla thanks Mike hello
00:05:55.139 everyone so how does this feature work
00:05:59.039 to talk about that I'm going to go over
00:06:01.740 a little bit about how we run the Ruby
00:06:04.320 agent and instrument new features and
00:06:06.479 then we'll give a quick demo of the
00:06:07.680 product to see the data in the UI
00:06:10.800 so when we have a new piece of code that
00:06:14.940 we want to instrument or really a new
00:06:16.259 library we want to instrument we try to
00:06:17.820 find the common place where all of the
00:06:20.039 events that we want to capture will flow
00:06:21.600 through and all the necessary attributes
00:06:23.819 and data will be accessible hopefully
00:06:25.680 that'll be within the same method
00:06:26.940 sometimes it might need to be within a
00:06:28.380 few methods but we're really fortunate
00:06:31.080 in Ruby that the logger class that's
00:06:33.180 built into the language is actually what
00:06:35.280 the majority of the libraries will use
00:06:36.960 to log the messages that you see so
00:06:40.139 that's where we decided to put our code
00:06:42.000 so let's look closer at the Ruby logger
00:06:44.220 class
00:06:45.180 right here this is a bit of an overview
00:06:47.520 in case you haven't used the Ruby logger
00:06:49.199 before you can start out by calling
00:06:51.840 logger.new and defining where you would
00:06:55.020 like to Output your log messages
00:06:56.160 sometimes to a file sometimes to
00:06:58.080 standard out
00:06:59.419 you can also call by severity
00:07:03.080 logger.info as an example here and the
00:07:05.940 Rails logger uses this class as well and
00:07:08.220 looks very similar where you just call
00:07:09.740 Rails.logger.debug to get your logs
00:07:12.180 output
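The Logger API just described, as a runnable snippet (a StringIO stands in for a file or standard out so the output can be inspected):

```ruby
require 'logger'
require 'stringio'

# Logger.new accepts a file path, an IO, or $stdout; a StringIO here
# lets us inspect what was written
log_output = StringIO.new
logger = Logger.new(log_output)

# severity-named helpers (debug/info/warn/error/fatal) all funnel
# through the same underlying machinery
logger.info('Order created')

# a severity threshold suppresses lower-level messages
logger.level = Logger::WARN
logger.debug('this line is suppressed')

# in a Rails app the same class backs Rails.logger, so
# Rails.logger.debug(...) behaves the same way
```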
00:07:14.400 so under the hood how does this actually
00:07:16.620 work so when you call that logger.error
00:07:19.800 what is actually happening is this
00:07:23.160 method right here and it's calling
00:07:25.380 another method called add and passes
00:07:27.419 just an error constant to it as the
00:07:29.460 first argument
00:07:31.560 so let's take a look at add this is kind
00:07:33.660 of the next layer we're getting closer
00:07:36.720 inside of add it does
00:07:38.940 a lot of different checks
00:07:40.620 but we ultimately decided that this
00:07:42.180 wasn't the method that we wanted to use
00:07:44.099 to instrument but the method we
00:07:45.780 instrument is inside here
00:07:47.580 so taking a look at this code
00:07:50.039 I don't know if anyone can spot what
00:07:52.080 we ended up using
00:07:54.479 we ended up using format message which
00:07:56.759 is right at the bottom of this method
00:07:59.220 and what was unique about format message
00:08:01.919 is that it actually formats the severity
00:08:04.020 this is a little bit of a detour but the
00:08:07.020 format severity method
00:08:09.479 is taking the severity which at this
00:08:10.139 point is still an integer which is a
00:08:12.780 little easier to store and also allows
00:08:14.400 us to have severity levels where we
00:08:16.199 start with debug at level zero and
00:08:18.419 go up to fatal at level four and we can
00:08:24.479 organize things that way and that also
00:08:26.520 allows us to cap our logger by a certain
00:08:29.039 severity so you can only see messages
00:08:30.599 higher than a certain value
00:08:33.180 so now that we have format severity
00:08:35.099 covered what is calling format
00:08:38.760 severity is format message and that's
00:08:40.740 what we ultimately instrumented because
00:08:42.959 it's called every time a message is
00:08:45.300 logged it receives the severity
00:08:47.279 translated into words which was one of
00:08:49.380 our desired attributes and receives the
00:08:51.300 message in its final formatted state
00:08:53.339 which was our other desired attribute
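The call chain just described can be sketched with a stripped-down logger (a simplified model, not the stdlib source; the real Logger also handles thresholds, formatters, and log devices):

```ruby
# A stripped-down model of the stdlib Logger call chain: a severity
# helper calls add, which calls format_message with the translated
# severity label and the final message text
class MiniLogger
  SEV_LABEL = %w[DEBUG INFO WARN ERROR FATAL ANY].freeze

  def initialize(io)
    @io = io
  end

  # logger.error(msg) is sugar for add(ERROR, msg); Logger::ERROR == 3
  def error(message)
    add(3, message)
  end

  def add(severity, message)
    @io << format_message(format_severity(severity), Time.now, message)
  end

  private

  # severities are stored as integers and only translated to labels here
  def format_severity(severity)
    SEV_LABEL[severity] || 'ANY'
  end

  # format_message sees both the human-readable severity and the final
  # message text, which is why the agent hooks in at this point
  def format_message(severity_label, time, message)
    "#{severity_label} [#{time}] #{message}\n"
  end
end
```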
00:08:55.800 so now we can head into
00:08:58.800 the Ruby code that we have inside of the
00:09:01.800 agent and so this is a file that looks
00:09:04.560 really similar to the other files that
00:09:06.180 we have in our instrumentation it's a
00:09:09.240 little bit silly because Ruby's logger
00:09:11.640 class is always going to be installed
00:09:13.080 unless you've done something really
00:09:14.640 interesting to try to make it forget
00:09:16.980 about the logger class but we're
00:09:19.440 checking here to make sure that you want
00:09:21.540 the logger to be instrumented and that's
00:09:24.300 one of our configuration options and
00:09:25.980 that it is in fact defined and if both
00:09:27.660 of those things are true we're going to
00:09:29.279 move forward, log a message inside the
00:09:31.500 agent log and either use module
00:09:33.839 prepending or alias method chaining to
00:09:36.240 go ahead and add the instrumentation
00:09:38.100 code on top of Ruby's method
00:09:40.860 and so that causes us to have a little
00:10:43.019 bit of a detour into metaprogramming
00:10:45.060 which can sound really
00:10:47.220 scary I was terrified of it when I
00:10:48.959 started at New Relic
00:10:50.700 but it's actually not as bad
00:10:53.279 as you may think so I won't get
00:09:55.260 into the depths of it today there are a
00:09:56.880 lot of other great talks at RubyConf and
00:09:58.440 RailsConf in the past that have covered
00:10:00.959 this topic I would also recommend
00:10:02.640 checking out Metaprogramming Ruby 2
00:10:04.320 if you want to know more about that
00:10:05.880 it's written by Paolo Perrotta but the way
00:10:09.120 that we leverage meta programming in the
00:10:11.040 Ruby agent is to stick our
00:10:13.740 instrumentation right before customers
00:10:16.680 code or right after customer's code to
00:10:19.320 get the timing and the other attributes
00:10:20.940 that we need and kind of just float on
00:10:23.459 the outside so hopefully you won't see
00:10:25.320 any performance or other impacts from
00:10:27.540 our agent's observation
00:10:31.279 so let's see first we're going to talk
00:10:34.080 about alias method chaining and this is
00:10:35.580 an older strategy for meta programming
00:10:37.920 module pre-pending is preferred these
00:10:40.019 days we'll get into that next but if
00:10:42.300 you're working in a legacy code base
00:10:43.620 you'll probably come across something
00:10:45.000 like this
00:11:45.959 I like to think of alias method chaining
00:11:47.940 as going undercover as an agent or maybe
00:10:50.880 like body snatching you tell Ruby hey
00:10:53.640 you know this method you know about
00:10:55.019 already we're actually going to give it
00:10:57.420 a different identity you'll call it by a
00:10:59.399 different name instead and it'll have
00:11:00.600 slightly different Behavior but like on
00:11:02.579 the inside it is still the same method
00:11:04.920 it's still the same identity overall
00:11:08.459 so here we are taking the original
00:11:10.440 logger class's method format message and
00:11:14.360 re-aliasing it as format message without
00:11:16.560 New Relic and then take our format
00:11:19.079 message with tracing which is the one
00:11:20.579 that has our instrumentation in it and
00:11:22.320 wrapping it around the original method
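A minimal runnable version of that pattern, using the method names from the slide (the wrapper body is an illustrative stand-in for the agent's real instrumentation):

```ruby
require 'logger'
require 'stringio'

# Alias method chaining: give the original method a new name, then
# redefine it with a wrapper that calls the renamed copy
class Logger
  @captured = []
  class << self
    attr_reader :captured # lets us inspect what the wrapper recorded
  end

  alias_method :format_message_without_new_relic, :format_message

  def format_message(severity, datetime, progname, msg)
    # "format_message_with_tracing": record the event, then fall
    # through to the original method under its new alias
    Logger.captured << [severity, msg]
    format_message_without_new_relic(severity, datetime, progname, msg)
  end
end

logger = Logger.new(StringIO.new)
logger.error('payment failed')
```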
00:11:26.040 module pre-pending is a little different
00:11:28.700 this isn't code from the agent this is
00:11:31.200 just an example of what module
00:11:32.700 pre-pending may look like and inside of
00:11:35.339 in this case we have the logger class
00:11:37.200 and we would say hey we want to prepend
00:11:39.600 our instrumentation class into it of
00:11:42.240 course that doesn't work because we
00:11:43.500 don't necessarily have Ruby's logger
00:11:45.600 inside of our code so instead there's
00:11:47.760 other stuff that happens to allow this
00:11:49.380 prepend to work
00:11:51.360 so then in here this looks actually
00:11:53.339 pretty simple we just call the format
00:11:55.079 message method and then take the message
00:11:57.779 with tracing method which is what has
00:11:59.820 our instrumentation in it and then call
00:12:01.620 super to make sure it will return to the
00:12:03.480 original method higher up, basically the
00:12:07.200 parent class that's been defined
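A minimal runnable version of the prepending approach, with illustrative module and constant names (the wrapper body again stands in for the real instrumentation):

```ruby
require 'logger'
require 'stringio'

# Module prepending: define the wrapper in a module, prepend it onto
# the target class, and call `super` to reach the original method
module LoggerInstrumentation
  CAPTURED = []

  def format_message(severity, datetime, progname, msg)
    CAPTURED << [severity, msg] # "format_message_with_tracing" body
    super                       # falls through to Logger's original method
  end
end

Logger.prepend(LoggerInstrumentation)

logger = Logger.new(StringIO.new)
logger.warn('cache miss')
```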
00:12:10.079 so this is actually the bulk of
00:12:11.519 format message with tracing right here
00:12:13.620 we start out by checking to make sure
00:12:15.540 this is something that we actually want
00:12:16.980 to instrument that an agent is present
00:12:19.140 and if that's the case then we will call
00:12:21.420 our log event aggregator and record the
00:12:23.459 message we also have something you may
00:12:25.440 notice in here the mark skip
00:12:26.579 instrumenting we found out that if we
00:12:28.140 tried to record our own events for our
00:12:30.480 own logger we caused a stack overflow
00:12:32.399 because there's just a little too much
00:12:33.540 back and forth going on so we don't have
00:12:35.579 that yet but we may someday
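That guard against recursion can be sketched as a thread-local flag (names here are illustrative, not the agent's actual API):

```ruby
# Sketch of the re-entrancy guard described above: set a flag while
# recording so that any logging the agent itself does is not
# re-instrumented (which would otherwise recurse until the stack
# overflows)
module ReentrancyGuard
  def self.skip_instrumenting?
    Thread.current[:nr_skip_instrumenting]
  end

  def self.while_skipping
    Thread.current[:nr_skip_instrumenting] = true
    yield
  ensure
    Thread.current[:nr_skip_instrumenting] = false
  end
end

RECORDED = []

def record_log_event(message)
  return if ReentrancyGuard.skip_instrumenting?

  ReentrancyGuard.while_skipping do
    RECORDED << message
    # any agent-internal logging triggered here is ignored
    record_log_event('agent internal chatter')
  end
end

record_log_event('user message')
```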
00:12:38.160 here's the record message method this
00:12:40.320 is just the first half of it and what
00:12:42.480 we're looking at here is we're first
00:12:44.040 checking to make sure that this is a
00:12:45.480 viable message to record both by
00:12:48.120 configuration and by content and then if
00:12:52.139 you are on the new New Relic UI
00:12:54.240 you'd probably see something like a logs
00:12:59.459 graph on your APM summary home page and
00:13:02.339 that's getting populated by this if
00:13:04.019 statement that's in the middle of the
00:13:05.700 record method
00:13:08.040 the next part is where we actually link
00:13:10.380 it with the transaction so we look for
00:13:12.660 the current transaction if that's there
00:13:15.000 we grab it we also look for the log
00:13:18.240 priority which I'll get to in just a
00:13:19.740 second so if there's a transaction
00:13:21.600 present we're going to associate the log
00:13:23.399 event to that transaction if not we are
00:13:26.100 going to associate it to our separate
00:13:27.540 buffer and still create the event
00:13:30.240 so the priority
00:13:31.920 the way that we make sure that the
00:13:33.540 logs associated with the transaction
00:13:35.040 come up first is that we set the
00:13:37.079 priority for the log events to be the
00:13:39.660 same as the transaction's priority now
00:13:42.060 that brings up why aren't all of the
00:13:43.980 logs why do we need a priority at all
00:13:45.360 why aren't they all getting sent up
00:13:47.519 if you create over 10,000 log events
00:13:50.519 in a 60 second time period logs will get
00:13:52.980 sampled with this feature you can change
00:13:54.959 that up to as high as a hundred thousand
00:13:56.820 but what we've found so far for
00:13:58.800 customers is that sampling has not
00:14:01.560 happened or has happened in only really
00:14:03.360 rare cases so that's just something to
00:14:06.060 keep in mind but this makes sure that if
00:14:08.639 you have a transaction that's getting
00:14:09.899 recorded you are going to get the logs
00:14:12.240 all associated with that before you get
00:14:14.160 any logs that aren't associated with a
00:14:16.019 transaction
00:14:17.940 and then at the bottom basically we
00:14:20.459 assign a random float that is up to six
00:14:25.019 decimal places long as the priority
00:14:28.380 instead if there is no transaction
00:14:30.240 priority
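The priority scheme can be sketched like this (a simplified model of the sampling, not the agent's actual code; names and values are illustrative):

```ruby
# A log event inherits its transaction's priority when one is in
# progress; otherwise it gets a random float with up to six decimal
# places. When the buffer overflows the sampling window,
# lower-priority events are dropped first.
def log_event_priority(transaction)
  transaction ? transaction[:priority] : rand.round(6)
end

# keep only the highest-priority events when over the limit
def sample(events, limit)
  events.sort_by { |e| -e[:priority] }.first(limit)
end

txn = { priority: 1.5 } # illustrative transaction priority
events = [
  { msg: 'in transaction', priority: log_event_priority(txn) },
  { msg: 'orphan log',     priority: log_event_priority(nil) }
]
kept = sample(events, 1) # the transaction's log wins the sample slot
```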
00:14:32.100 all right so here is the rest of that
00:14:34.139 record method I don't think we needed
00:14:36.420 this slide
00:14:38.339 and now we are on to creating the actual
00:14:40.620 event so that was called by both the
00:14:44.600 transaction and the non-transaction
00:14:47.220 sections of that conditional statement
00:14:49.279 and here is where we get your linking
00:14:51.959 metadata so at this point in time we're
00:14:54.120 able to see if it's associated with the
00:14:56.339 transaction the trace ID and the span ID
00:14:58.500 that have been grouped with that
00:14:59.820 transaction
00:15:00.779 and we can also add the priority
00:15:02.519 together and so this hash, or I guess
00:15:05.699 ultimately array of hashes, is all the
00:15:08.639 log event actually is at the end of the
00:15:10.440 day
00:15:11.220 and then when we go to actually send it
00:15:13.800 up we append the linking metadata that's
00:15:16.380 related to your application as a whole
00:15:18.120 and
00:15:19.380 linking metadata that is needed to
00:15:21.899 actually send your log events up and
00:15:24.600 correlate them with your application so
00:15:26.160 we'll see here that we actually delete
00:15:28.260 an irrelevant
00:15:30.079 attribute that is usually in this
00:15:31.980 payload
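As a rough sketch, the resulting payload, one set of common application-level linking attributes plus the array of per-event hashes, could look like this (attribute names are illustrative, not the exact wire format):

```ruby
# Common attributes that apply to the whole application
common_attributes = {
  'entity.name' => 'order-composer', # hypothetical app name
  'hostname'    => 'web-1'
}

# Each log event carries its message, severity label, the trace and
# span IDs it was grouped with, and its priority
log_events = [
  { 'message'  => 'payment failed',
    'severity' => 'ERROR',
    'trace.id' => 'abc123',
    'span.id'  => 'def456',
    'priority' => 1.5 }
]

# the array of common attributes plus logs is what gets sent up
payload = [{ 'common' => { 'attributes' => common_attributes },
             'logs'   => log_events }]
```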
00:15:33.600 so then we create an array of those
00:15:36.120 common attributes and those logs and
00:15:37.980 that's what gets returned and sent up to
00:15:39.600 New Relic to decorate the UI so there's
00:15:43.320 the journey from Ruby's logger all the
00:15:45.540 way to the UI so let's take a look at
00:15:48.779 the UI next
00:15:50.579 all right so this is Demotron this is
00:15:53.100 the account that we usually use for our
00:15:54.600 demos with our customers
00:15:56.279 and this order composer is a ruby
00:15:58.620 service that has been updated not to the
00:16:01.740 latest version of the agent I see
00:16:04.320 okay we were looking at a different
00:16:05.699 one earlier but it should still have
00:16:07.079 logs let's double check
00:16:09.720 it does all right so once you've
00:16:12.540 upgraded to version 8.7 of the agent or
00:16:15.120 higher unless you have configured it not
00:16:17.100 to collect metrics you would normally see a
00:16:19.620 chart right here that breaks down your
00:16:21.240 metrics by log severity level
00:16:24.240 another place where you can take a look
00:16:25.680 at logs is inside of this triage tab
00:16:28.440 underneath logs and this will have all
00:16:31.079 of the logs for your particular app
00:16:33.660 right here from most recent starting at
00:16:37.320 the bottom of the scroll as you go up
00:16:40.500 Errors inbox is another feature that I
00:16:42.360 personally was really excited about
00:16:43.620 because now instead of just seeing your
00:16:46.620 stack traces you get the logs that are
00:16:49.019 associated with that individual stack
00:16:50.820 trace side by side so you can see what
00:16:52.620 happened before that individual
00:16:55.560 method had an error raised
00:17:00.899 another place that's pretty exciting is
00:17:02.759 inside of distributed tracing so we
00:17:04.740 talked earlier about how the trace ID
00:17:06.179 and span ID are associated with the log
00:17:08.160 event
00:17:08.939 and I believe we should be able to see
00:17:11.579 it on this one let's take a look
00:17:14.360 so if we go to this controller Sumatra
00:17:18.000 purchase there's a log tab up here and
00:17:20.880 so this is where you would normally take
00:17:22.740 a look at all of the different spans
00:17:25.740 that occurred inside of that particular
00:17:27.839 request
00:17:29.040 and you can look at your logs as well
00:17:31.440 and you can see all of the logs that
00:17:33.000 happened in this request as well
00:17:35.340 so this is just a brief overview of how
00:17:37.440 that data shows up in the UI we have
00:17:40.020 ideas for more places that we want to
00:17:41.820 integrate it and hope that you can enjoy
00:17:44.520 them soon so
00:17:46.440 I want to just tell you really quick
00:17:47.760 about a customer that was part of our
00:17:49.559 Early Access program the New York Public
00:17:52.440 Library is a pro bono customer at New
00:17:54.240 Relic and they joined us and decided to
00:17:56.760 hook up some of their rails apps to the
00:17:58.559 logger and what they found was that it
00:18:01.440 actually got them to change the way that
00:18:03.120 they thought about logging on their team
00:18:04.500 they're now revamping their logs to help
00:18:06.660 make them more meaningful especially in
00:18:08.820 the cases of common intermittent errors
00:18:10.740 so one error in particular that they've
00:18:12.720 been facing is that they have a
00:18:14.039 microservice structure and they've been
00:18:15.660 getting failed requests for certain
00:18:18.720 Services as they talk to each other but
00:18:20.520 it hasn't been consistent enough to pin
00:18:22.320 down the errors using the stack trace to
00:18:24.660 any true meaning and so now they
00:18:27.360 are adding more logs
00:18:29.580 they're adding the agent to another one
00:18:31.860 of their apps and adding more unique
00:18:34.080 log messages to the services that
00:18:36.059 are having intermittent problems and
00:18:37.559 they're hoping that this will help them
00:18:38.940 solve these issues to have a better
00:18:40.860 product for their customers we also have
00:18:43.380 another customer that's been using this
00:18:45.059 product in a very different way they
00:18:46.559 have a much larger situation if you want
00:18:49.260 me to chat about it or do you want to
00:18:50.460 chat about it Mike I can talk a little
00:18:52.320 bit about it as well the customer
00:18:54.360 that we've been working with
00:18:56.880 on the APM agent front is Chegg and if
00:18:59.520 you're not familiar with them they work
00:19:00.539 in the educational services space so
00:19:02.340 they provide textbook purchases and
00:19:04.679 rentals for both physical and digital
00:19:06.840 and there's really two use cases that
00:19:08.460 they identified where they thought that
00:19:09.600 this would be particularly helpful the
00:19:11.580 first of which is that they wanted to
00:19:12.840 deprecate a series of Fluentd sidecars
00:19:15.000 that they were running in a serverless
00:19:16.140 environment and so that was going to
00:19:17.520 reduce the amount of toil and cost on
00:19:19.860 their end since they can just handle the
00:19:21.120 log forwarding and the enrichment
00:19:22.200 through the agent itself and then the
00:19:24.480 second use case they had identified was
00:19:26.400 kind of reducing their log volume
00:19:28.440 overall in the sense that one of their
00:19:31.500 goals had always been to exclude low
00:19:33.419 value log data from collection but
00:19:35.520 that's extremely challenging in many
00:19:37.140 cases often time consuming and most
00:19:39.419 people don't have the opportunity to do
00:19:41.160 it but what's interesting is the APM
00:19:43.440 agent does have that sampling logic in
00:19:45.780 it where we're prioritizing logs that
00:19:47.820 are associated with transactions and the
00:19:49.860 amount of logs that we're sampling is
00:19:51.419 completely configurable so their plan is
00:19:54.000 to use this feature to dial down their
00:19:55.799 log data by about 25 percent which is
00:19:58.980 maybe not so good for us but ultimately
00:20:00.660 we want to make sure that our customers
00:20:02.640 are getting value out of what they're
00:20:04.020 paying for and this feature is going to
00:20:05.520 allow them to do that
00:20:07.320 thanks Mike
00:20:09.419 yeah so the last thing that I wanted
00:20:11.220 to tell you all about is the future of
00:20:13.080 the feature and that's largely dependent
00:20:15.120 on the community we've already received
00:20:17.100 one feature request to add custom
00:20:18.720 attributes to our log events and what
00:20:21.480 this particular customer would like to
00:20:23.160 do is allow their container IDs and ECS
00:20:26.820 service IDs to be appended to the log
00:20:28.620 events so that that way when there's
00:20:29.940 problems with individual containers
00:20:31.559 they're able to more easily troubleshoot
00:20:33.600 them but I bet there's a lot of other
00:20:35.280 things we can come up with maybe
00:20:37.020 filtering by severity maybe only having
00:20:39.900 particular instances of the logger class
00:20:42.000 or log files getting sent up to New
00:20:44.160 Relic and we'd love for you to
00:20:46.380 contribute to the New Relic Ruby agent
00:20:48.539 with your ideas and use cases so as kind
00:20:51.419 of mentioned at the beginning of the
00:20:52.500 talk the New Relic Ruby agent is open
00:20:54.419 sourced we accept pull requests, feature
00:20:57.900 requests, and bug reports and
00:21:00.240 would love to know about how you're
00:21:02.039 using the agent and how we can make it
00:21:03.780 better
00:21:05.700 So yeah thank you I just wanted to thank
00:21:08.220 everyone for coming to the talk today
00:21:09.600 thanks to Jason Clark whose wife made
00:21:12.179 this beautiful cape 10 years ago and
00:21:15.240 he's a major
00:21:16.880 contributor to this feature and also the
00:21:19.440 architect for the logs product and he
00:21:20.940 was very helpful in getting it built and
00:21:23.280 I'd also like to thank the other lovely
00:21:24.660 people in this audience who have helped
00:21:26.160 me from the beginning of my time as a
00:21:28.740 developer and helped me get all the way
00:21:30.360 here today
00:21:31.340 so thanks to everyone and we'd love to
00:21:34.320 open the floor up for a Q&A if you have
00:21:36.780 any questions about how to use this
00:21:38.760 feature or
00:21:40.500 what we talked about today
00:21:51.260 yeah yep so the question was we
00:21:54.659 mentioned that we needed to upgrade the
00:21:56.340 agents in order to get this feature are
00:21:58.559 there other steps that you need to do in
00:22:00.120 order to get it enabled and will you
00:22:01.620 need to pay more for this feature so the
00:22:03.659 way that this feature works is that once
00:22:05.460 you upgrade to version 8.7.0 of the agent
00:22:08.039 or higher log forwarding is enabled by
00:22:10.140 default so you don't need to change
00:22:11.640 anything in your configuration in order
00:22:13.559 to get your logs sent up to New Relic
00:22:15.360 straight away
00:22:16.799 if you want to do something like
00:22:18.179 adjust the maximum numbers of samples
00:22:20.460 stored then you can open up your
00:22:22.679 configuration file or add an environment
00:22:24.419 variable as far as cost goes, ingest rates
00:22:27.840 do apply so the same way if you're on
00:22:30.780 the new consumption model the way it
00:22:32.400 works is that you pay for users and the
00:22:34.020 data that you ingest and so the first
00:22:35.760 100 gigabytes of data are always free
00:22:37.679 and then after that it becomes more so
00:22:41.039 that's just how it applies
00:22:42.659 here as well does that answer your
00:22:44.159 question okay great
00:22:47.400 all right cool well thanks everyone for
00:22:50.580 coming if you have other questions or
00:22:52.200 just want to talk about something else
00:22:53.220 feel free to pop up to the front
00:22:54.659 afterwards and I hope you enjoy the rest
00:22:56.760 of the conference
00:22:57.900 thanks folks