Automation Engineering with Serverless Compute

by Kevin Lesht

In the RubyConf 2020 talk titled "Automation Engineering with Serverless Compute," Kevin Lesht shares his journey towards automating the process of updating his omelette blog using serverless computing and Ruby. This hands-on session explores leveraging serverless functions to automate a tedious task that many encounter—managing content updates for a blog.

Key Points Discussed:

- Automation Goals: Kevin opens with the challenge of posting omelette photos to his blog, describing the experience he wants: take a photo, and if it's an omelet, it automatically makes it onto the blog.

- Identifying Structures: To achieve this, Kevin identifies necessary components in the automation pipeline, including capturing photos, parsing and detecting if they are actually omelets, and uploading them to his blog.

- Using Google Photos API: He considers using the Google Photos API to fetch the photos synced from his phone, but highlights the complexity of implementing OAuth authorization without a web interface.

- Embracing Serverless Technology: Kevin discusses the potential benefits of serverless computing (like AWS Lambda) to abstract away infrastructure management, simplifying his focus on the code.

- Brick Walls: As he navigates challenges (or "brick walls"), Kevin emphasizes persistence and the importance of problem-solving in software development. He cites inspiration from Randy Pausch's insights on overcoming obstacles.

- Leveraging AWS Rekognition: To facilitate object detection in photos, Kevin explains his use of AWS Rekognition, starting with its pre-trained models before realizing that a custom model is needed to accurately detect omelets.

- Building a Custom Model: He elaborates on the process of creating a reliable dataset with labeled images to successfully train a model for recognizing an omelette.

- Automation Strategy: Through a series of serverless functions that fetch photos, process them concurrently with Lambdas, and store timestamps in DynamoDB to track what has been handled, Kevin walks through the creation of an automated workflow that efficiently updates the blog.

- Final Thoughts: In closing, he acknowledges that while the implemented solution might be complex, the real learning comes from embracing challenges and recognizing "brick walls" as opportunities for growth. Kevin hints at future projects, showing his enthusiasm for continuous learning in the tech space.

Ultimately, Kevin's presentation illustrates how innovative serverless technology can streamline tasks and the importance of resilience when solving technical problems.

00:00:01.280 What's up, RubyConf? Kevin Lesht here, and I wanted to bring you into the kitchen for this talk titled "Automation Engineering with Serverless Compute" or how I keep my omelette blog up to date.
00:00:04.000 Yes, as some of you may know, I eat quite a few omelets. To document those creations, I take pictures to really capture the moment.
00:00:09.760 I post those pictures to my omelette blog, fiveminutesinaflip.com, but I haven't been good about keeping the blog updated. My last post was in 2019, yet I've still been eating omelets and taking pictures, but I haven't been posting. I found that it’s really the process of bringing these photos online that’s been holding me back. It's just a lot to manage.
00:00:18.320 After I take my omelet photo, there’s a whole getting it online part. The blog is managed by Tumblr, so I have to open their app, select the photo, and post it. It's a whole thing. Some might say it’s easy, but to them, I would say it’s not easy enough. I wanted my user experience to be as simple as taking a photo. If it’s an omelet, it makes it onto the omelette blog. That sounds easy, so I started thinking about what that would even look like. This is the future to me, and I have no idea how to get there, so let’s backtrack a bit and see if we can lay down some tracks.
00:01:01.039 Assuming I have a photo and it is of an omelet, I want it uploaded to the omelette blog. If it’s not an omelet, I don’t want it on the blog. Sorry pup, but don’t worry, she’ll be back later. Mapping this out already identifies a step in our process: we’ll need to figure out how we can parse an image and detect if it’s an omelet or something else. Then, if the photo we’re looking at is of an omelet, we can patch it to the blog.
00:01:20.000 So when I take a photo, we parse and we patch—parse and patch, Parsley Patch. Yes, we have a name and a logo, credit to my friend Tom McHenry on the logo there. How great is that? There are a few of those coming up, and he crushed it on all of them, so thank you, Tom.
00:01:30.000 Now we can really get to work on this project. The only thing that was stopping us right there was the logo. In order to build a service that can parse my photos, I suppose we’re first going to need access to those photos. I’m using my phone as my camera, so my first thought is some kind of mobile app that could hook into my camera events. On photo taken, it would kick off the process, whatever that later process might be. But wait! Hang on, I already have an app that does just that. Google Photos is configured to automatically sync my pictures up to the cloud. So maybe there's something to leverage there—a Google Photos API that I could hit.
00:02:44.800 A little research, and look at that—a REST API! That’s interesting. It looks like they do have an endpoint for returning the contents of a library. This might be just what we’re looking for, but wait! Hang on, OAuth. This may actually be tricky.
00:03:23.920 See, when making an API request, some endpoints are going to need authorization. We wouldn’t want just anyone looking into my Google Photos; I mean, who knows what they might find in there? OAuth is a protocol for signing API requests to essentially say that you’re authorized to access what you’re asking for. The problem is that everything I know about OAuth with Google is that it’s usually web-based. You log into your account, and the application initiating the request is then granted an access token. That token can then be used to sign requests for access to the user’s content. Well, that’s just not going to work here.
00:04:01.840 The whole point of this project is easy automation. I can’t be bothered to go to a website and log in. This raises an interesting point: if I’m not planning on running this stuff through a website, where am I planning to run it? I’ve been hearing a lot about this serverless technology stuff and I've wanted to play around with it. Maybe there could be something here. With a serverless function, the provider manages all of the infrastructure: hardware, the operating system, hosting—all of that stuff is abstracted away, and your code is what you focus on. In something like AWS Lambda, you can write a script, wire it up to be called, and run things without being at the controls.
00:05:14.560 I like this; it’s also in the talk title! So let’s see where it takes us. Now, back to that OAuth piece, we still need to figure out how to OAuth when there is no browser. It became pretty clear that this was not going to be resolved by a quick search, and I hit a brick wall. We all hit them, and sometimes they look like this, or sometimes they may look like that. No matter how they present themselves, they come as obstacles, some resistance that we need to fight through to get to where we want to be.
00:06:06.600 Now, I have hit so many brick walls in my career that I’ve studied the concept. One framing that really stuck with me comes from Randy Pausch, a famed Carnegie Mellon professor. In his famous last lecture, he presents them as being there for a reason—they’re to let us prove how badly we want things. It would have been easy to quit right here; nothing was working, and everything I tried was coming up short. But I really wanted to automate my omelette blog, so I kept at it.
00:07:02.560 Sometimes, all it takes is a little persistence and a different approach. In an article, Martin Fowler explains that you can set up a one-time web flow to issue what's called a refresh token. This refresh token can then be stored and used to call out to Google whenever you want for an OAuth token. That OAuth token is all we need to sign our requests for access to Google Photos.
00:07:19.520 By stepping through this flow one time, we can store our refresh token in an environment variable, which we can later access from our Lambda. Then anytime we kick off our service, we can use our refresh token to authenticate with Google. From there, we can send a signed request to the Google Photos library endpoint to return my photos.
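A minimal Ruby sketch of that flow might look like the following; the environment variable names are illustrative, while the endpoints are Google's standard OAuth 2.0 token URL and Photos Library API.

```ruby
require "net/http"
require "json"
require "uri"

# Exchange the stored refresh token for a short-lived access token.
def access_token
  response = Net::HTTP.post_form(
    URI("https://oauth2.googleapis.com/token"),
    "client_id"     => ENV.fetch("GOOGLE_CLIENT_ID"),
    "client_secret" => ENV.fetch("GOOGLE_CLIENT_SECRET"),
    "refresh_token" => ENV.fetch("GOOGLE_REFRESH_TOKEN"),
    "grant_type"    => "refresh_token"
  )
  JSON.parse(response.body).fetch("access_token")
end

# Send a signed request to the Google Photos library endpoint.
def media_items(token)
  uri = URI("https://photoslibrary.googleapis.com/v1/mediaItems?pageSize=50")
  request = Net::HTTP::Get.new(uri)
  request["Authorization"] = "Bearer #{token}"

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  JSON.parse(response.body).fetch("mediaItems", [])
end

photos = media_items(access_token)
```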
00:08:00.000 With access to a photo in hand, the next step in the system is to parse it and see if we have an omelet. This part I can handle. Have you heard of a site called Caption Caddy? How often have you been about to post a photo to social media only to falter at the last minute without a caption? Caddy was a project I developed to solve just that problem.
00:08:53.760 Users upload a photo, and AWS Rekognition runs object detection and returns suggestions on what it thinks is in that photo. From there, we return a high-quality caption from my database. I’m thinking we can leverage AWS Rekognition for this project. This service takes a photo, and with deep learning technology, it identifies objects to provide a list of labels and confidence scores. So let's test this out.
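For reference, calling Rekognition's pre-trained label detection from Ruby with the aws-sdk-rekognition gem looks roughly like this; the file name, region, and thresholds are illustrative.

```ruby
require "aws-sdk-rekognition"

client = Aws::Rekognition::Client.new(region: "us-east-1")

# Run object detection over a photo and print each label along with
# the confidence score Rekognition assigns to it.
response = client.detect_labels(
  image: { bytes: File.binread("omelet.jpg") },
  max_labels: 10,
  min_confidence: 70
)

response.labels.each do |label|
  puts format("%s (%.1f%%)", label.name, label.confidence)
end
```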
00:09:32.960 In order to test, we're going to need data, so let's make an omelet. First, the pan—make sure you have a good one—and fire that thing up. Throw in some butter—like a lot of butter—and it’s time for the eggs. Four in, three yolks out, some whisking in the pan, a flip, and omelet! Throw down an avocado and dig into those right now.
00:09:51.680 We’re ready to go! Taking our photo, we can run it through Rekognition and wait. Some of these are close, and I guess some of these are technically correct, but no! This is not going to work for us. After some research, it turns out that the default Rekognition interface leverages pre-trained models, and for a lot of cases, this is great.
00:10:25.120 With a pre-trained model, you can really recognize a lot—a huge span of objects, scenes, and activities are all easily detected. You don’t have to deal with any of the setup needed for a custom model. You don’t have to gather training data, or ready your model for production or anything in between. But if you're trying to detect something that's a bit obscure, you might need to go custom and train a model specifically designed for recognizing what you're looking for.
00:10:49.200 Now, omelets are not that obscure, at least to me, but we need precision here; we can't risk missing a photo and letting our fans down. So it’s time to custom train. First, we need to create our dataset, a collection of images with the objects that we want to detect.
00:11:09.920 Luckily, I've got a few of those. For a good dataset, we want our target objects to be presented in a bunch of different ways—different backgrounds, different compositions, and different framing. The more variability we can provide, the better the chances of our model being able to detect an omelet no matter its presentation.
00:11:53.600 Because let’s be real: I'm not sure I'm always prepared here, and I can't nail it every time. So with a good set of omelet photos in hand, we upload them to Rekognition, and it’s time to label. At this step, we go through each image and classify what we’re seeing. Each of our photos is going to have an omelet, so we can apply that label across the board.
00:12:27.520 But we also might want to call out other things like avocados, toast, or sometimes chili—no joke, great omelet pairing right there! Not even kidding! All of the labels we provide are helpful, and the work can be a little tedious at times, but these definitions are critical during training to help our model learn to recall the objects that we’re looking for.
00:12:50.000 Once our dataset is labeled, it's time to train. To do this, we want to put together a test set of data. Within this set, we want some images that have our targets and some that don’t. As the model trains, it processes our data and leverages its learnings against the test set, stepping over each test image to validate if our objects are present and honing its accuracy.
00:13:20.880 Once training is complete, we now have access to a machine learning model custom trained to fit our needs. We can spin this up, and by taking our photo and running it through our custom model, we get our labels back and now know that this photo was an omelet. Because it wasn’t apparent before—well, at least not programmatically. This takes us to the patching end of the service; once we know that we have an omelet, it’s time to upload the photo to the blog.
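Before moving on to the patch, here is a sketch of querying such a custom model: it swaps the default DetectLabels call for Rekognition's DetectCustomLabels API, pointed at the trained project version. The model ARN below is a made-up placeholder.

```ruby
require "aws-sdk-rekognition"

client = Aws::Rekognition::Client.new(region: "us-east-1")

# Placeholder ARN; the real one comes from the trained Rekognition
# Custom Labels project version.
MODEL_ARN = "arn:aws:rekognition:us-east-1:123456789012:project/omelets/version/1"

response = client.detect_custom_labels(
  project_version_arn: MODEL_ARN,
  image: { bytes: File.binread("omelet.jpg") },
  min_confidence: 80
)

# The custom model only knows the labels it was trained on.
omelet = response.custom_labels.any? { |label| label.name.casecmp?("omelet") }
puts omelet ? "Omelet detected!" : "Not an omelet."
```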
00:14:18.960 But wait a minute! Where are we even sending these images? Up until now, the idea was that all of this work we’ve been describing would live inside of a serverless Lambda function, and well, the implication of serverless is that there is no server. I’m not really going to be able to store these photos in there. You can temporarily store files within a Lambda, but as soon as your function invocation returns, you lose your reference to those files.
00:15:32.880 I know this is going to be a static site just showcasing my omelet photos, and I do have a few of those floating around out there. These are all just hosted on S3. You can configure a bucket for static website hosting, and you can also create, update, or delete your files programmatically through the AWS SDK.
00:15:52.720 Once we know that we have an omelet, we can upload the photo to our S3 bucket—all from within our Lambda. So let’s go with it, and while we’re at it, let’s rebrand the new and improved fiveminutesinaflip.com. Okay, so we have a photo, and once we know that it’s an omelet, we post it to an S3 bucket that sits behind the site. After that, we know what—shoot, the site’s still going to be the same!
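The patch itself comes down to a single PutObject call through the AWS SDK; the bucket name matches the site, while the key scheme here is illustrative.

```ruby
require "aws-sdk-s3"

s3 = Aws::S3::Client.new(region: "us-east-1")

# Upload a confirmed omelet photo into the bucket behind the site.
s3.put_object(
  bucket: "fiveminutesinaflip.com",
  key: "photos/2020-11-17-omelet.jpg",
  body: File.binread("/tmp/omelet.jpg"),
  content_type: "image/jpeg"
)
```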
00:16:43.760 See, even though we’ve added our photo to the bucket, our HTML hasn’t been updated to include the new post. We need to rebuild the site. This was a tough one and where I really had to take a step back. The process as diagrammed out so far was to authenticate with Google, fetch a photo from my Google Photos library, parse it for an omelet, if found, patch it to the S3 bucket, and then rebuild the site. And this would probably be okay if we were dealing with just one photo.
00:17:16.960 But in reality, there are times when I take more than my daily omelet picture. It’s rare, but it happens. We can’t always assume that the last photo in my library will be an omelet; that’s why we built the whole parsing part of this service. When we request my library contents, we’re really going to want to grab all of the photos taken since the blog was last updated.
00:18:16.000 Then we process that whole batch for omelets, and only once we’ve made it through the entire collection of photos and have posted any omelets to our bucket can we rebuild the site. So how do we do that? How do we reconfigure this process around the idea of a batch of photos? Going about things procedurally might be an option.
00:19:12.960 We could collect all the photos taken since the blog was last updated and one by one parse them for omelets, patching any that hit to the blog’s S3 bucket. Once we process all of our photos, we can run some code to rebuild the site. Here, we’d hit our S3 bucket and collect the keys for all of our photos. With the keys we collected representing filenames and with our site design being that of just photos with headings, we have all the information we need to put things together.
00:19:50.560 We break the keys down into batches so that we can paginate—that’s a must—then for each batch, we step over each item, plugging its key into an image tag and the base name from that key into a heading. We apply this over all of our items, slap our logo on there, link up our pages, and we’ve got our site. We store our pages as temp files within the Lambda, and once all are ready to go, we upload our files to the S3 bucket.
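A rough sketch of that site generator, assuming the photos live under a photos/ prefix and each page is a plain HTML file; the page size and markup are illustrative.

```ruby
require "aws-sdk-s3"

BUCKET = "fiveminutesinaflip.com"
PER_PAGE = 12 # illustrative page size

s3 = Aws::S3::Client.new(region: "us-east-1")

# Collect the keys for all of our photos; the SDK pages through
# list_objects_v2 responses as we iterate.
keys = s3.list_objects_v2(bucket: BUCKET, prefix: "photos/")
         .flat_map { |page| page.contents.map(&:key) }

# Break the keys into batches, one batch per page of the site.
keys.each_slice(PER_PAGE).with_index(1) do |batch, page_number|
  posts = batch.map do |key|
    title = File.basename(key, ".*")
    "<h2>#{title}</h2>\n<img src=\"/#{key}\" alt=\"#{title}\">"
  end

  html = "<html><body>\n#{posts.join("\n")}\n</body></html>"

  # Stage the page as a temp file within the Lambda, then upload it.
  path = "/tmp/page-#{page_number}.html"
  File.write(path, html)
  s3.put_object(bucket: BUCKET, key: "page-#{page_number}.html",
                body: File.read(path), content_type: "text/html")
end
```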
00:20:28.160 Once that’s done, we log the time of build so that next time the service runs, we can read in that timestamp to use in our Google Photos query so we’re only pulling in the pictures taken since our last update was published. But where do we store that timestamp? With Lambda not having any reference to its last invocation, we need to pull in another service—a place of persistent storage where we can hold on to the time of the last run.
00:21:12.160 DynamoDB is another AWS service that might solve this problem. Here we can store keys with values, and it’s easily accessible from within our Lambda. After our site builds, we can write the timestamp of the last item in the batch, then read that back when we kick off our next job so we’re only picking up new photos when we query my library.
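A sketch of that read-and-write, using plain Ruby values through the DynamoDB client; the table and key names are illustrative.

```ruby
require "aws-sdk-dynamodb"
require "time"

TABLE = "omelet_blog" # illustrative table name

# After a build completes, record when we last ran.
def record_last_run(dynamo, time)
  dynamo.put_item(
    table_name: TABLE,
    item: { "id" => "last_run", "timestamp" => time.iso8601 }
  )
end

# On the next invocation, read that timestamp back so the Google
# Photos query only asks for pictures taken since the last update.
def last_run(dynamo)
  result = dynamo.get_item(table_name: TABLE, key: { "id" => "last_run" })
  result.item && Time.parse(result.item["timestamp"])
end

dynamo = Aws::DynamoDB::Client.new(region: "us-east-1")
since = last_run(dynamo) # nil on the very first run
```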
00:22:20.160 This might just work, but looking things over, something didn’t feel quite right. Taking a closer look at our parsing and patching of photos procedurally, one photo being processed doesn’t depend on another, so there’s no reason we need to wait on one photo before moving on to the next. Pulling at me was this idea of doing things concurrently—processing all our photos at the same time.
00:22:45.920 We’d have to figure that out, though. We’re so close right now to achieving what we want that nobody would fault us for reeling things in and being on our way. However, we have been on a journey here, and this pursuit of easy may have gotten away from us by just a little bit. As I was getting close to wrapping things up, something happened: I wanted the challenge of the harder path; I wanted the brick wall.
00:23:14.560 Thinking back to Randy Pausch, he framed those brick walls as being there for a reason—to prove how badly we want something. What I found was that those brick walls offer something in themselves. They can be frustrating, and something you want to power through as quickly as possible, but rather than smashing through a brick wall and never looking back at the bricks, by slowing down and taking things apart piece by piece, you learn about how it was built up.
00:24:01.600 It becomes an exercise in practicing your craft and an enjoyable experience. In the future, you'll recognize similar walls and move through them more easily once you see them for what they are. A certain calmness takes over; you start to want more, and you begin to seek out those difficult challenges. That’s what I recognized here—a real opportunity to try something new. So let’s dig in.
00:24:42.640 In order to process all of our photos at the same time, rather than keep one Lambda open and evaluate each photo one by one, we can move the parsing and patching into its own service. If we really want to have fun, how about this: DynamoDB offers what they call an event stream. The event stream is a flow of information about the changes to the items in your table.
00:25:06.080 It captures things like inserts, updates, and deletes. Even better, we can easily hook up Lambdas to fire off at any of these events. When we query Google Photos, we could insert each photo into our database as its own record and wire up a Lambda to invoke on each insert.
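A sketch of such an insert-triggered handler in a Ruby Lambda; the attribute names and the parse_and_patch and mark_processed helpers are hypothetical stand-ins for the work described here.

```ruby
# Handler for a Lambda subscribed to the table's event stream.
def handler(event:, context:)
  event["Records"].each do |record|
    # The stream also reports updates and deletes; we only care
    # about new photo records.
    next unless record["eventName"] == "INSERT"

    image = record["dynamodb"]["NewImage"]
    photo_id  = image["id"]["S"]
    photo_url = image["url"]["S"]

    # Parse the photo for an omelet and patch it to S3 if found
    # (hypothetical helper wrapping the Rekognition and S3 calls).
    parse_and_patch(photo_id, photo_url)

    # Flag the record so the site generator can tell this job is done
    # (hypothetical helper wrapping a DynamoDB update).
    mark_processed(photo_id)
  end
end
```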
00:25:54.320 Our Lambda will receive the photo item as a payload and can handle the parsing and patching of that photo before exiting out. Now we’re really cooking; we have the concurrency we want. But this does throw a curve in our rebuild step. See, under a procedural approach, with each photo being processed one after another, we can confidently say that once the last photo has been evaluated, all before it have too.
00:26:30.560 This made for a nice signal toward when to rebuild, but with all of our photos now running concurrently, they complete independent of one another, and we won’t have a marker telling us we’re clear to rebuild. We’ll need to keep track of when a job finishes, and then when all jobs have finished, we’ll know our photos have been handled and we’re safe to rebuild the site.
00:27:24.560 So let’s think on that. We could count the items in a batch and store that number in our database. For each item we process, we can increment a counter. After each job, we can check to see if our counts match, and once they do, we rebuild the site.
00:27:56.080 But what happens if we hit a race condition? We could run into a spot where two jobs kick off at the same time; both would read the same count number and each write the same value. Our counts would never match, and we’d never rebuild the site. So that won’t work. Well, we’re already planning on managing our photos as items within our database. How about adding an ‘isProcessed’ field to each record?
00:28:16.480 This attribute could start as false, and as a Parsley Patch job closes, we could flip it to true and query the batch to see if all have finished. When that’s the case, we rebuild the site. But nope, the very fact of jobs running at the same time opens this approach up to a race condition too.
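To make the hazard concrete, here is roughly how two jobs finishing at once can interleave under the counter approach; read_count and write_count are hypothetical helpers around get_item and put_item.

```ruby
# Lambda A: count = read_count(dynamo)  # => 3
# Lambda B: count = read_count(dynamo)  # => 3, reads before A writes
# Lambda A: write_count(dynamo, 4)
# Lambda B: write_count(dynamo, 4)      # one completion is lost
count = read_count(dynamo)
write_count(dynamo, count + 1) # read-modify-write is not atomic
```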
00:28:54.320 This was starting to look more difficult than I had bargained for. It was around this time that I started to question what I had gotten myself into. Managing a counter raced, and querying for all the records being processed raced too; I couldn’t close out the job. Then a thought popped into my mind: I just needed some help, so I called up a friend.
00:29:31.760 Dave Junta, VP of Engineering at Home Chef, a great guy and a very inquisitive one—he asks a lot of questions. After some explaining, we had an idea. See, before, the plan was to check for all records being processed after each individual Parsley Patch job, as a callback, and with concurrency, that opened the risk of transactions landing at the same time and not getting the right data.
00:30:06.560 But with a totally separate thread, we could query for all the records in our table on some interval, independently of the jobs, and just keep listening until that ‘isProcessed’ field comes back as true across all items in our batch. Once we hear back, we rebuild the site! Yes! This might just work.
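A sketch of that listener, polling on an interval until no unprocessed records remain; the table name, interval, and rebuild_site entry point are illustrative.

```ruby
require "aws-sdk-dynamodb"

dynamo = Aws::DynamoDB::Client.new(region: "us-east-1")

# Independently of the parse and patch jobs, keep checking the batch
# until every record reports isProcessed = true.
loop do
  result = dynamo.scan(
    table_name: "omelet_photos",
    filter_expression: "isProcessed = :false",
    expression_attribute_values: { ":false" => false }
  )

  break if result.count.zero? # nothing left unprocessed; safe to rebuild

  sleep 15 # illustrative polling interval
end

rebuild_site # hypothetical entry point into the site generator
```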
00:30:44.080 Let’s put this end to end. Every day, a job scheduler kicks off our first Lambda. We query DynamoDB for the created time stamp off of the last photo record inserted. We get an OAuth token from Google to query and get all the photos that were taken after our timestamp. Each photo is written to DynamoDB, and once all photos are written, our site generator kicks off and starts listening for all records to be processed.
00:31:11.680 Inserting into DynamoDB kicks off a Parsley Patch Lambda for each photo record. The Parsley Patch Lambda sends its photo through Rekognition to receive back labels, and if ‘omelet’ is detected, the photo is patched to the Five Minutes in a Flip S3 bucket. The record’s ‘isProcessed’ value is updated to true, and when all records have been processed, our site generator rebuilds the omelette blog.
00:31:47.760 Holy cow! We’ve made it easy! Now, on the topic of ‘easy,’ I set out on this project because I wanted it easy. It just turned out that easy wasn’t easily achieved. I may have gotten a little carried away and designed an overly complex system, and all of this probably could have been done a dozen different ways.
00:32:16.560 But sometimes you have to dev out. I learned a ton following the threads of this project. Most importantly, what wasn’t complex was breaking through those brick walls. In one case, I just reset and approached the problem from a different angle. In another, I drew from a past project.
00:32:37.280 I broke down a problem into smaller parts. I asked a friend for help. I came into this project viewing brick walls as intimidating obstacles, but they’re not obstacles; they’re opportunities. They take time and patience and sometimes a little help to break through, but they’re opportunities to connect and to learn.
00:33:05.480 And you will break through. Once you do, you can apply your work to all sorts of things— which is great news because with what I’ve learned here, I can get started on my next project. I don't know if you've heard, but I'm starting to make pizzas too! Yes, Kevin's Pizza Pies! We'll save that one for next time, but I'm sure it’s going to come with its brick walls too.
00:33:36.560 Taking what I’ve learned here, maybe there will be an easier approach, but if there’s something more out there—if there’s an opportunity—I’m going to take it. And you should too! When you get hit with one of those brick walls, take a step back and smile because you’ve just found some fun.
00:34:06.720 Thank you! Thank you for attending, thank you to our organizers, and thank you to our sponsors. The slides for this talk can be found at kevinlesht.com/rubyconf2020. If you're interested in checking out the omelette blog, that's at fiveminutesinaflip.com.
00:34:27.600 If you have any questions or compliments for me, you can find me on Twitter at @kevinlesht. If you’re interested in more content like this, I host the Day as a Dev podcast, where I think the omelette blog has come up on every episode so far, and if you'd like to check out any of my other projects, you can keep up with those at kevinlesht.com.