00:00:11
Hello, welcome to Devly, a multi-service development environment. I'm Eric Hodel, and I work at Fastly in the Developer Engineering Department on our development environment, which I'll be describing for you today. I’ve written a lot of Ruby code, some of which you use every day, and I've been writing software of one kind or another for more than 20 years.
00:00:24
We currently work within Fastly's Site Reliability Engineering organization, focusing on improving the internal engineering experience. For those who don’t know, Fastly is a content delivery network and edge cloud provider. We serve traffic for GitHub, New Relic, Spotify, and many other popular websites and services.
00:00:40
We also serve all Ruby and RubyGems downloads, and we do the same for many other open-source projects free of charge. If you're interested in using Fastly for your open-source project, please come talk to us. Our servers all over the world serve more than 14 trillion requests each month, which is more than 10% of all Internet traffic. This still kind of blows my mind because it wasn't always like this.
00:01:05
We also employ the owners of a hundred percent of the world's best dogs. We have dogs all over the world! Currently, we are hiring Ruby application engineers, so if you're interested, please find Zeke or me after the talk, and we can provide you with the details.
00:01:18
A quick note before I continue: I currently live in Portland, Oregon, but I was born and raised in a small town called Me Phil, about two hours north of here. This is my first public talk, so I'm excited to be giving it so close to home. Thank you for the opportunity!
00:01:44
Today, we would like to discuss a problem that we believe impacts organizations of all sizes. To help illustrate this problem, I'd like to tell you a story about the evolution of Fastly's API. This is a rough approximation of the service architecture that backed the Fastly API circa 2012.
00:02:04
The original Fastly development environment consisted of a copy of each component of the Fastly API running on each engineer's laptop. Soon after, the early team decided that virtual machines should be employed to provide a degree of operational uniformity and parity between development and production.
00:02:29
Another attribute of Fastly in the early days was that all the engineering work was being done by a very small group of people. Changes to the systems were easily introduced and distributed through source control, which allowed the team to rapidly develop and deploy changes. The small size of the company also made focused discussions possible, which made decisions easy to communicate.
00:02:54
As the company grew, that success opened doors to new opportunities to expand the business by adding functionality to our API. In some cases, new functionality required new supporting systems. And whenever we as software engineers, which means everyone in this room, add new functionality and dependencies to our systems, we also introduce complexity.
00:03:25
I don't mean to imply that complexity is necessarily a bad thing; on the contrary, I would argue that complexity is an unavoidable side effect of growth. The challenge is that we like to use the right tool for the job, so many of our services were written in entirely different languages with very different workflows.
00:03:57
Despite the increase in the number of languages and services, our development environment stayed much the same. Moreover, the gaps between each group's development workflows grew considerably, which became increasingly problematic as our engineering department doubled in size every six months for several years.
00:04:18
As a result, our original development environment became increasingly unreliable, and established processes for communicating changes broke down. Maintaining an individual engineer's development environment was problematic. We had engineers working on everything from code that runs in the Linux kernel to code that runs in the browser, and the needs of the teams in those different areas were dramatically different.
00:04:39
Our original development environment was unable to meet one team's needs without compromising another's. And the growth continued regardless of the state of our development environment, because growing is just what companies tend to do.
00:04:57
Writing, scaling, and maintaining software is complicated, and there are many moving pieces and things to keep in mind while you're doing it. As an industry, we have established and continue to improve upon strategies that help us direct our time and energy. I believe this is due in large part to our ability to observe software systems in isolation.
00:05:16
Organizations, on the other hand, are far more complex and much harder to observe in systematic ways. However, by introspecting on our own experiences and listening to our coworkers, we have been able to find the themes and common frustrations that many share.
00:05:36
Here are some examples of common frustrations: 'Here's your laptop, we'll see you in two weeks.' 'My development environment is running! Does anyone know why the API gateway keeps crashing in a loop?' 'I booted up my development environment, but now nothing works. What happened?'
00:05:54
And: 'I can't do my work today because I need to rebuild my development environment.' Clearly, this is untenable, and it was getting worse over time.
00:06:08
So how many people here have actually faced these kinds of issues? I’m glad to see it’s not just us!
00:06:21
Our friends and coworkers became increasingly frustrated by the situation, so what did we do? During this same period, many new tools arrived on the scene, but none met all of our needs.
00:06:34
Through observation, research, and a lot of discussion with our friends and coworkers in other companies, we arrived at a few important themes. We believe these themes embody the traits desired in developer-focused productivity tools.
00:06:50
First, the development environment must be reliable. I should be able to run a small number of commands to get what I need up and running. I should not have to know how every system works to do my job, and I should be able to easily see the local health of the systems I rely upon.
00:07:10
I should never have to spend an entire day rebuilding my development environment. The environment must be accessible, which means that teams should be allowed and encouraged to maintain their development environments collectively.
00:07:38
I should be able to build and test new changes across systems owned by different teams. The development environment must span multiple teams and workflows and should be maintainable by the community that uses it.
00:08:00
Managing changes in source control illuminates past and present ownership, even with many components. Structure and form should be encouraged through convention—documentation, tooling, and feedback loops—rather than enforced by gatekeepers.
00:08:24
We want our development environment to be able to run disparate services together as composable units. It should be easy to try new supporting systems and swap things in and out without having to worry about writing a bunch of complex code.
00:08:54
Additionally, a development environment must be reproducible. We need the ability to determine and apply the last known good state of all systems, and source control and similar mechanisms should allow us to determine how we arrived at that known good state.
00:09:09
We should be able to leverage existing tools like Git, RubyGems, Perl, Python’s pip, and Go modules to arrive here.
00:09:32
Through the rest of this talk, we hope to show you how we started to meet the needs of our coworkers at Fastly by applying these principles to a tool we have been building together for the last year. We call this tool Devly.
00:09:51
To tell you more about Devly, I'd like to hand things off to my friend, close product collaborator, and the lead engineer on the Devly project, Ezekiel Templin.
00:10:10
Thank you, Eric. I will talk about Devly and some of its components and features. Devly is designed for developers. It builds images from your repositories, uses those images to manage containers, and enables communication both within and across teams of developers.
00:10:39
Devly is distributed for macOS and Linux. We provide a standalone executable built with Ruby Packer, and we offer packages for macOS and Debian.
00:10:59
With Devly, you can configure all of your services. It helps you build images from your repositories using Dockerfiles, configure those images to run as services, and run groups of services together as a rack.
00:11:19
An image contains the files necessary to run a service. For example, the audit log image uses Ruby, so it includes a copy of our application code. This code requires some libraries like Rails, Sidekiq, and a JSON parser.
00:11:45
The image contains those installed gems. The JSON parser requires a C library, so we install that package as well. In our repository, there is a Dockerfile that contains the instructions for building this image.
00:12:10
Images can contain applications written in any language. Our stats service is written in Go, so its image contains a Go binary compiled from the stats application code. The web application our customers use is written in Ember; its image contains a copy of the application code, ready to run.
00:12:36
We share all these images across teams by uploading and downloading them from the Google Container Registry. This ensures we’re always using the latest images and the latest source code.
00:12:58
A Devly service is a runtime configuration for an image. In this case, we've created the audit log service using the audit log image. The service runs a command that provides an API for managing event data.
00:13:20
It runs a Rails server to provide the HTTP interface for events. Our audit log service needs to be accessible to other services so they can read and write events, hence we expose port 8888.
00:13:40
If you use a development framework like Rails that supports live development, you can mount your repository on top of the files in the image. This allows you to work in your favorite editor from your preferred OS and see the changes reflected in your browser.
00:14:01
This service runs the audit log API, but we also have some Sidekiq background jobs that make it easier to read our logs. We use a separate service to run those background jobs. Because the background jobs use the same models and databases as our application, they can use the same image.
00:14:28
We create the audit worker service, but it runs the Sidekiq command instead of the Rails server command. This way, the audit log service runs the Rails server for the API, while the audit worker service runs the background jobs.
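To make the image/service split concrete, here is a minimal Python model of two services sharing one image while running different commands. The talk doesn't show Devly's configuration format, so the class names, fields, and command strings below are hypothetical illustrations, not Devly's schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Hypothetical image record: a name plus the repository it is built from."""
    name: str
    repository: str


@dataclass(frozen=True)
class Service:
    """Hypothetical service: a runtime configuration (command, ports) for an image."""
    name: str
    image: Image
    command: tuple[str, ...]
    ports: tuple[int, ...] = ()


# One image, built once from the audit log repository (names are illustrative).
audit_log_image = Image(name="audit-log", repository="audit-log")

# The API service runs the Rails server and exposes port 8888 to other services.
audit_log = Service(
    name="audit-log",
    image=audit_log_image,
    command=("bundle", "exec", "rails", "server", "-p", "8888"),
    ports=(8888,),
)

# The worker reuses the same image but runs Sidekiq instead of the web server.
audit_worker = Service(
    name="audit-worker",
    image=audit_log_image,
    command=("bundle", "exec", "sidekiq"),
)
```

Because both services point at the same image, a rebuilt image updates the API and the workers together, while each keeps its own command and log stream.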
00:14:50
The separation makes development a little more accessible because each service has its own logs. We can also test our background workers in complete isolation from the API.
00:15:17
We will create a few more services for our applications, including the authentication API, the configuration API, and some databases they utilize. When working on the configuration API, we don't want to start up any services that we don’t need.
00:15:35
So, we create a rack for developing the configuration API that contains only the services it needs: a MySQL database, the audit log service, and the configuration API service.
00:15:58
A rack can customize a service, and since we want access to those services running in the rack for development, we expose ports for a few services to the host OS. This allows us to connect to those ports from our browser.
00:16:14
You can also set environment variables or mount different files to change the behavior of the service. Devly allows you to configure multiple racks. The authentication team can work on its services, which include the PostgreSQL database and the authentication API. The authentication development rack also uses the audit log service, just like the configuration team.
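As a hypothetical illustration of a rack customizing a service, this Python sketch layers a rack's environment variables, host ports, and mounts over a service's defaults without modifying the shared definition. The field names, the `AUDIT_LOG_URL` variable, and the paths are made up for the example.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ServiceConfig:
    """Hypothetical service defaults, as defined once in a shared library."""
    name: str
    image: str
    env: dict = field(default_factory=dict)
    host_ports: dict = field(default_factory=dict)  # container port -> host port
    mounts: dict = field(default_factory=dict)  # host path -> container path


def apply_rack_overrides(service, overrides):
    """Return a copy of `service` with a rack's overrides layered on top.

    Rack values win on conflicts; the shared definition is left untouched,
    so other racks that use the same service keep the library defaults.
    """
    return replace(
        service,
        env={**service.env, **overrides.get("env", {})},
        host_ports={**service.host_ports, **overrides.get("host_ports", {})},
        mounts={**service.mounts, **overrides.get("mounts", {})},
    )


config_api = ServiceConfig(
    name="config-api", image="config-api", env={"RAILS_ENV": "development"}
)

# The configuration rack exposes the API to the host browser on port 8888 and
# mounts a local checkout for live development (paths are illustrative).
dev_config_api = apply_rack_overrides(
    config_api,
    {
        "env": {"AUDIT_LOG_URL": "http://audit-log:8888"},
        "host_ports": {8888: 8888},
        "mounts": {"~/src/config-api": "/app"},
    },
)
```

Returning a copy rather than mutating the service is what lets two racks customize the same service differently.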
00:16:39
When we start these racks, they use independent containers to run their services. This allows teams to have different configurations and software versions for the audit log service that won’t collide with each other.
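One simple way to keep racks from colliding is to scope every container and network name to its rack. The naming scheme below (`<rack>_<service>` and `<rack>_net`) is a hypothetical sketch, not necessarily what Devly does.

```python
def container_name(rack: str, service: str) -> str:
    """Hypothetical naming scheme: one container per (rack, service) pair."""
    return f"{rack}_{service}"


def network_name(rack: str) -> str:
    """Hypothetical per-rack network name, isolating each rack's containers."""
    return f"{rack}_net"


def plan_containers(racks):
    """Map each rack's services to distinct container names on that rack's network."""
    plan = {}
    for rack, services in racks.items():
        for service in services:
            plan[container_name(rack, service)] = network_name(rack)
    return plan


# Both racks use the audit log service, but each gets its own container.
racks = {
    "config-dev": ["mysql", "audit-log", "config-api"],
    "auth-dev": ["postgres", "audit-log", "auth-api"],
}
plan = plan_containers(racks)
```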
00:17:06
For example, you can start both racks simultaneously and isolate bugs across services using common configurations. Replicating services across teams makes sharing your work easier.
00:17:24
The configuration for the images, services, and racks is managed in the shared Devly library repository at Fastly. We allow any developer to make changes to the Devly library and discuss those proposed changes with the developers of that service.
00:17:40
The authentication, configuration, and audit log teams all have racks that use the audit log service. When the audit log team proposes changes to the audit log service, all of those teams need to be able to discuss them.
00:18:02
By tracking the connections between teams and services through the Devly library repository, these connections become more visible, which improves the maintainability of services and communication across teams.
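Because racks list the services they use, the library can be inverted to answer "who depends on this service?" This Python sketch assumes a plain mapping of rack names to service names and a hypothetical `rack_owners` table; neither is Devly's real data format.

```python
from collections import defaultdict


def racks_by_service(racks):
    """Invert rack -> services into service -> sorted list of racks using it."""
    index = defaultdict(set)
    for rack, services in racks.items():
        for service in services:
            index[service].add(rack)
    return {service: sorted(names) for service, names in index.items()}


def teams_to_notify(service, racks, rack_owners):
    """Teams whose racks include `service` and so should review changes to it."""
    users = racks_by_service(racks).get(service, [])
    return sorted({rack_owners[rack] for rack in users})


# Hypothetical rack contents and owners, loosely following the talk's example.
racks = {
    "auth-dev": ["postgres", "audit-log", "auth-api"],
    "config-dev": ["mysql", "audit-log", "config-api"],
    "audit-dev": ["mysql", "audit-log", "audit-worker"],
}
rack_owners = {
    "auth-dev": "authentication",
    "config-dev": "configuration",
    "audit-dev": "audit log",
}
```

With this index, a proposed change to the audit log service can be routed to all three teams automatically.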
00:18:19
Now that we've had an overview of the components of Devly and how they combine, I'll show some demos of common development tasks using Devly from the perspective of developers in various teams.
00:18:36
We'll go through a few workflows: getting started with development, sharing changes within and across teams, and setting up some convenience tools to make development easier for ourselves and our coworkers.
00:18:56
Let's start by setting up Devly as a first-time user. We run `devly setup` and give Devly a Git repository to pull the Devly library from. Devly fetches the library repository and then checks out the repositories for our services.
00:19:18
The setup performs some additional checks, including your Docker version and your Google Cloud SDK version. The setup command tries to fix what it can; if it cannot fix something, it prints a message to help you fix it yourself.
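The talk doesn't say how `devly setup` performs its checks. As a hypothetical sketch of one such check, this Python snippet asks Docker for its server version with `docker version --format '{{.Server.Version}}'` and compares it with a minimum. The minimum version and the message wording are assumptions.

```python
from __future__ import annotations

import re
import subprocess

MINIMUM_DOCKER = (18, 9)  # hypothetical minimum, not Devly's real requirement


def parse_version(text: str) -> tuple[int, ...]:
    """Turn strings like '19.03.5-ce' or '20.10.7+dfsg1' into (19, 3, 5) / (20, 10, 7)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if not match:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def check_docker(version_text: str, minimum=MINIMUM_DOCKER):
    """Return None if the version is new enough, else a message explaining the fix."""
    if parse_version(version_text) < minimum:
        wanted = ".".join(map(str, minimum))
        return f"Docker {version_text.strip()} is too old; please upgrade to {wanted} or newer."
    return None


def installed_docker_version() -> str:
    """Ask the Docker daemon for its version (requires Docker to be running)."""
    result = subprocess.run(
        ["docker", "version", "--format", "{{.Server.Version}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping the parsing separate from the `subprocess` call means the check can be tested without Docker installed; a real setup command would call `check_docker(installed_docker_version())`.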
00:19:41
This step takes no more than a few minutes to fetch your repositories and perform the necessary checks. Once setup is complete, we can run `devly info` to see what racks and services are available.
00:19:56
Devly provides a list of the racks and services in our Devly library. We can retrieve information about a rack, including the services it starts, and we can also retrieve information about a service, which includes the image, the repository, and metadata for ensuring that the image is compatible with the files in the repository.
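The talk doesn't specify what compatibility metadata Devly stores. One plausible approach, sketched here under that assumption, is to fingerprint the dependency manifests (such as `Gemfile.lock`) when an image is built, record the fingerprint as an image label, and compare it with the current checkout. The file list and the `dependency-fingerprint` label name are hypothetical.

```python
import hashlib
from pathlib import Path

# Hypothetical: files whose contents determine what gets installed into an image.
DEPENDENCY_FILES = ("Gemfile.lock", "go.sum", "package-lock.json")


def dependency_fingerprint(repo: Path) -> str:
    """Hash the dependency manifests present in `repo`, in a fixed order."""
    digest = hashlib.sha256()
    for name in DEPENDENCY_FILES:
        path = repo / name
        if path.is_file():
            # Include the file name and separators so different files can't collide.
            digest.update(name.encode())
            digest.update(b"\0")
            digest.update(path.read_bytes())
            digest.update(b"\0")
    return digest.hexdigest()


def image_matches_repo(image_labels: dict, repo: Path) -> bool:
    """True if the fingerprint recorded on the image matches the checkout."""
    return image_labels.get("dependency-fingerprint") == dependency_fingerprint(repo)
```

A mismatch is a signal that mounting the repository over the image won't work and a new image needs to be built.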
00:20:19
Now that we have finished setting up Devly, let's start the rack and perform some basic development tasks: viewing logs, using our service, and making a small change.
00:20:39
The `devly up` command starts the rack. First, we ensure that we have all the necessary images downloaded from the registry. We see Devly pulling one of those images. Once all those images are downloaded, Devly creates a network to isolate this rack and starts all the containers.
00:20:59
When containers don't depend on each other, Devly can start them in parallel to speed up startup. Now let's check to see if everything is running okay. We can run `devly status` to see which racks and services are currently running.
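The talk doesn't describe Devly's scheduler, but a standard way to get this behavior is to group services into "waves": each wave holds services whose dependencies all started in earlier waves, so everything within a wave can start concurrently. This Python sketch assumes a simple `service -> dependencies` mapping and a caller-supplied `start_service` function, both hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def startup_waves(depends_on):
    """Group services into waves; every service's dependencies are in earlier waves.

    `depends_on` maps each service to the services it needs running first.
    Every dependency must also appear as a key. Services within one wave are
    independent of each other and can be started in parallel.
    """
    remaining = {svc: set(deps) for svc, deps in depends_on.items()}
    started = set()
    waves = []
    while remaining:
        ready = sorted(svc for svc, deps in remaining.items() if deps <= started)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        started.update(ready)
        for svc in ready:
            del remaining[svc]
    return waves


def start_rack(depends_on, start_service):
    """Start each wave concurrently, waiting for a wave to finish before the next."""
    for wave in startup_waves(depends_on):
        with ThreadPoolExecutor() as pool:
            # list() forces completion and re-raises any startup exception.
            list(pool.map(start_service, wave))
```

For example, with MySQL and Redis depending on nothing, the audit log depending on MySQL, and the configuration API depending on both MySQL and the audit log, the databases start together first, then the audit log, then the configuration API.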
00:21:20
We can see that the two API services and the database are running, and we can see that the two API services are accessible from the host OS on ports 8888 and 9999.
00:21:41
We can view the logs for the rack by running `devly logs`. This command will continue to follow any new logs until we exit with Control+C. Since everything seems to have started up correctly, let's test out the configuration service by switching to the browser.
00:22:10
The configuration service was running on port 8888, so when we load it, we see the main page with the configuration API. Now we are triple sure that the rack is working!
00:22:30
Let’s switch back to the terminal and view the logs from this request. The logs show our HTTP requests from viewing the configuration API main page. Everything is definitely working, so let's do some work by opening up the main page in our favorite editor.
00:22:51
From the host OS, we open the source for the page and add the text 'This is an example service.' Since no one has ever figured out how to exit Vim, we simply save the file.
00:23:13
Let’s switch back to our browser to see if our change worked. Reloading the configuration API shows the text 'This is an example service' has appeared—our change was successful!
00:23:30
Let's check the logs to make sure our request wasn't a fake. The request from the refreshed page appears in the logs, and since it has a 200 response with a different page size, we have definitely loaded new content.
00:23:46
Now that we're done with our work, let's shut down the rack by running `devly down`. This stops all the containers we had running, which might save a little bit of CPU. Normally, though, our services at Fastly are lightweight, even when a rack has a dozen services running.
00:24:07
When we work within teams, we will be pushing and pulling changes to our repositories. Furthermore, when we work across teams with Devly, other teams will push images for their services once they have a new set of features ready.
00:24:30
For this workflow, the audit log team has updated the audit log image to add a source field for events, and we need our services to utilize this new feature. First, let's see if we already have the source field by loading the audit log service in the browser.
00:24:53
We see the user ID, the timestamp, and the action fields, but no source field. We are using an old audit log image, so let's switch to the terminal and update our image.
00:25:14
Now that we have verified that we don’t have the source field, let’s pull the latest image. Sorry about the lack of output—it’s a bug. Our running audit log service is still using the old image, so we need to shut it down and start a new one with the updated image.
00:25:37
We can do this with `devly restart`, which will replace our audit log service with a new one running the latest image. Let's switch back to our browser to see the updates.
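One way a tool could decide which services need this kind of restart is to compare the image ID each container was created from (`docker inspect --format '{{.Image}}' CONTAINER`) with the ID of the newest local image (`docker image inspect --format '{{.Id}}' IMAGE`). This is a hypothetical sketch of that comparison, not Devly's implementation; the IDs in the test are placeholders.

```python
def stale_services(running, latest_ids, service_images):
    """Services whose container runs an image ID other than the newest local one.

    running:        service -> image ID its container was created from
    latest_ids:     image name -> image ID currently tagged locally (after a pull)
    service_images: service -> image name the service is configured to use
    """
    return sorted(
        service
        for service, image_id in running.items()
        # Skip images we know nothing about (None); flag any ID mismatch.
        if latest_ids.get(service_images[service]) not in (None, image_id)
    )
```

Only the audit log service would be replaced after pulling its new image; services whose images didn't change keep running.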
00:25:58
Refreshing the audit log service in the browser shows that the source field has appeared, just as we expected. Now that our audit log service is running the latest image, we can continue updating our service to incorporate the source field.
00:26:18
So far, we’ve worked outside the container. Sometimes we need to run commands from inside the container, where all of our dependencies are loaded. Let’s pretend that we are on the audit log team again.
00:26:39
We are now tasked with adding that source column to our database, and to do this, we need to run the migration we just finished writing. We can't do this from the host OS because none of our application gems are available on the host.
00:26:57
Instead, they are only installed inside the container. So, let’s run the migration from inside the audit log service. First, we'll start by viewing the audit log homepage.
00:27:14
Of course, we still don't see the source field because we haven't run the migration yet. We can use `devly exec` to run commands inside the container.
00:27:34
Since we don’t remember the exact image layout, we can start a Bash shell to explore. After the shell is open, we check to ensure we're in the right place by running `rake -T`.
00:27:52
Then we can run the migrations. The migrations show that they added the source column, so let's go back to the browser to check that.
00:28:07
Reloading the page in the browser shows that the source column migration is complete. Now that we've identified where the rake tasks run, let’s run the command directly so we can utilize our shell history in case we need to roll back.
00:28:25
Let’s switch back to the terminal and run our migration using `devly exec` along with a complete rake command line. This is much better, but it's still a bit cumbersome for this one task.
00:28:46
When we share this work with our other team members, how will they remember how to run the migrations? What we have done is not very user-friendly, and it would be nice if the migrations ran automatically when we started a rack.
00:29:03
We can automate the process of running migrations at rack startup using post-build tasks. These post-build tasks execute after the rack starts to perform additional setup tasks, like running migrations or seeding data.
00:29:23
This allows users unfamiliar with a service to get to work immediately. Post-build tasks are defined in the Devly library as rake tasks that `devly up` runs.
00:29:43
The task has the same name as the service and takes the rack as an argument, so it runs migrations against the correct copy of the service, even if multiple copies are running in multiple racks at the same time.
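Devly implements these tasks in Rake; the following is only a Python analogue of the same idea. Tasks are registered under the service's name and receive the rack name, so each one targets the right container. The registry, the `<rack>_<service>` container naming, and the use of `docker exec` are assumptions for illustration.

```python
POST_BUILD_TASKS = {}


def post_build_task(service):
    """Register a function as the post-build task for `service` (hypothetical registry)."""
    def register(func):
        POST_BUILD_TASKS[service] = func
        return func
    return register


@post_build_task("audit-log")
def migrate_audit_log(rack):
    # Target the container for *this* rack so concurrent racks don't interfere.
    container = f"{rack}_audit-log"  # hypothetical naming scheme
    return ["docker", "exec", container, "bundle", "exec", "rake", "db:migrate"]


def run_post_build_tasks(rack, services, run):
    """After a rack starts, run the task named after each service, if one exists.

    `run` executes a command argv, e.g. subprocess.run in real use.
    """
    for service in services:
        task = POST_BUILD_TASKS.get(service)
        if task is not None:
            run(task(rack))
```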
00:30:04
Our task runs `devly exec` on the audit log service to run migrations, just like we did earlier. After shutting down the rack, we start it up again with `devly up`, which goes through all the rack startup steps we saw before.
00:30:25
At the bottom, we can see that `devly exec` ran our migrations along with the output. Now, whenever someone starts our service, the migrations will run automatically, eliminating the need for users to remember or look up what to do.
00:30:45
Of course, we'll still need to run migrations during development the next time we change the database schema, and shutting down and restarting the rack takes several seconds.
00:31:09
To make this process easier, we can save the long `devly exec` migration command as an easy-to-remember custom command name. We can save commands in a Devly YAML file.
00:31:25
These commands can live in the Devly YAML file for the repository we are working from, here being the audit log repository. Each repository can have its own YAML configuration file with custom commands.
00:31:41
Let's zoom in and look closer: the saved commands are a collection of friendly names for commands we want to run. I chose 'migrate' as the name of the command, which runs the migration command on the audit log service.
00:32:01
The command line for the migrate command is the one we saw earlier, which runs the database migration tasks. We can also define a test command that runs the tests inside the service.
00:32:21
This way, anyone can run the tests right where all the dependencies are found.
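The Devly YAML schema isn't shown in the talk, so this Python sketch works on a hypothetical already-parsed structure: each friendly name maps to a service and a command line, which is split into an argument list with `shlex`. Keys and command strings are illustrative.

```python
import shlex

# What a parsed Devly YAML file *might* look like; the keys are hypothetical.
SAVED_COMMANDS = {
    "commands": {
        "migrate": {"service": "audit-log", "command": "bundle exec rake db:migrate"},
        "test": {"service": "audit-log", "command": "bundle exec rake test"},
    }
}


def resolve_saved_command(config, name):
    """Look up a friendly command name and return (service, argv)."""
    commands = config.get("commands", {})
    if name not in commands:
        available = ", ".join(sorted(commands)) or "none"
        raise KeyError(f"unknown command {name!r}; available: {available}")
    entry = commands[name]
    return entry["service"], shlex.split(entry["command"])
```

Listing the available names in the error keeps a mistyped `devly run` from being a dead end.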
00:32:41
Now we can run `devly run migrate` from the audit log directory, and we see the migrations run. Similarly, we can run the tests. Since these tests are run inside a container that is part of the rack, they can communicate with the other services.
00:32:59
You can have separate racks where one is configured to run unit tests that don't rely on other services, and a larger rack configured to run integration tests.
00:33:12
Any of these test suites could be started with the saved command. Sometimes we need to work on a service together with another team, and Devly provides a workflow for cross-team development.
00:33:32
The audit log team is working on some new high-security features. Their work isn't complete yet, but they want our feedback before they finalize something too difficult to use or integrate.
00:33:53
To provide feedback, we need to build with their work-in-progress branch. We were told that when we are using the correct code, a high-security logo appears on the audit log page.
00:34:12
We visit the page but see the same one as usual—no high-security logos anywhere. So we need to switch to their branch. The high-security branch may have new dependencies that our image doesn’t have.
00:34:35
We can’t mount the updated branch on top of our existing image because the updated gems and code won’t be there. We need to build a new image to ensure everything works.
00:34:55
To build the new image from the high-security branch, we first use the `devly link` command to tell Devly to use our local copy of the repository. This lets us build an image from the branch we have checked out.
00:35:15
Now we can change to our repository copy and check out the high-security branch. Then we use `devly build` to create a new image for the audit log service.
00:35:36
Now that our new image is built, we can restart the audit log service. We use `devly restart`, just like we did when we pulled the audit log image that had the source field.
00:36:02
Now, when we reload the browser, we can see we're using the high-security branch because the high-security logo is present. We can now begin testing integrations with the new code.
00:36:25
Now we can give the audit log team feedback on the high-security features. As adoption increases, we want to centralize image building through continuous integration, ensuring we always have up-to-date images in our registry.
00:36:48
By running your tests through Devly, you get a more consistent environment, because the image, service, and rack are all built and configured the same way. Continuous integration and local development environments both start with a regular `devly setup`.
00:37:14
In a CI environment, the regular setup may take too long because it performs more checks and retrieves the full Devly library history. The CI mode for `devly setup` fetches less history and fewer repositories to save time.
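The talk doesn't say how CI mode trims the fetch. A common technique, shown here as an assumed illustration rather than Devly's code, is a shallow clone: `git clone --depth 1 --branch <name>` downloads only the tip commit of one branch (`--depth` implies `--single-branch`). The function name and default branch are hypothetical.

```python
def clone_command(url: str, dest: str, ci: bool = False, branch: str = "master"):
    """Build a `git clone` argv; in CI mode, fetch only the tip of one branch."""
    argv = ["git", "clone"]
    if ci:
        # Shallow clone: skips the repository's full history to save time.
        # --depth implies --single-branch, so only `branch` is fetched.
        argv += ["--depth", "1", "--branch", branch]
    return argv + [url, dest]
```

The trade-off is that a shallow clone can't answer history questions (such as how a known good state was reached), which is fine for a throwaway CI checkout.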
00:37:33
The CI environment has a repository checked out to the correct commit already, so we can use `devly link` to utilize the correct source files, overwriting existing files.
00:37:51
As before, the new code we’re testing may have new dependencies, so we need to build a new image just like we did with the high-security branch.
00:38:10
Finally, we start the correct rack for testing this service, and then everything will be ready to run tests, just as in our local environment.
00:38:29
The next thing we do is run our tests using the saved command we saw earlier. If the tests pass and we’re on the master branch, we can push our new config API image to the registry to share it with the other teams.
00:38:50
Adopting Devly has given us a common way to start more and more of our services. Once sufficient teams are using Devly, you can build upon this capability.
00:39:07
All the workflow demos I showed were for Ruby applications, but the process is no different for developing a Go application, which runs a compiled binary within an image.
00:39:30
With a Go application, you add the source code, create a new image with `devly build`, and test it with the saved test command. This approach makes the development process more accessible.
00:39:47
With a common way to start services and run tests, you can go beyond testing a single service. The services you build can be composed into larger racks.
00:40:03
The more services you have in a rack, the closer to a real deployment you come, and the easier it is to run integration tests or end-to-end tests across your services.
00:40:19
By ensuring your images, services, and racks are reliable at every level, you can more easily move your containers toward deployment.
00:40:36
The image serves as the base of all your services and makes the contents of any application accessible to various security scanners. This allows you to run internal compliance processes.
00:40:53
You can also perform vulnerability scans of the libraries you're using in your images or find issues through static analysis. You can conduct enhanced testing, such as fuzzing, where bizarre inputs are sent to your service to try to break it.
00:41:12
You can get started with chaos engineering in a stable, isolated environment. Additionally, you can build separate staging environments for groups of services to run integration tests.
00:41:29
Zeke will now share with us a few lessons we’ve learned while building and collaborating on Devly with our coworkers.
00:41:47
Finding the early adopter group is key. We were fortunate to have a truly diverse group of early adopters with varying levels of experience who were willing to provide us with constructive criticism early on.
00:42:01
Some early adopters had previous container experience and were able to offer us valuable feedback on containerization and orchestration strategies, which has been really helpful.
00:42:18
We also had early adopters who were relatively new to the company and had little to no prior experience. All our early adopters contributed to making Devly a better product.
00:42:32
Because of the invaluable feedback from our early adopters, we learned the importance of building and sustaining a supportive community within the company from early on in the process.
00:42:48
We talked openly about our plans, successes, and especially our failures. Acknowledging the mistakes we made and learning from them together has been vital.
00:43:05
Currently, the Devly library repository has 30 contributors, including people from many different areas of Fastly's engineering, which is quite impressive.
00:43:23
All of this fosters a sense of shared ownership and togetherness, which has been crucial in the development and adoption of Devly.
00:43:38
Establishing feedback loops within this community has proven to be essential. The more people feel heard, the more likely they are to speak up, ask questions, and provide feedback.
00:43:56
That's ultimately what we're trying to do: encourage more communication among our teams. It's interesting that, as software engineers, we're building tools for this in a realm that feels like code and pull requests.
00:44:12
One of the primary means we’ve worked towards establishing these feedback loops is by quickly acknowledging, discussing, and ticketing bugs that our users find.
00:44:28
When we fix these issues, we notify the people who reported them, often involving them in the review process to ensure that the solutions effectively resolve their problems.
00:44:47
We emphasize the importance of documentation as well. Good documentation is essential for getting started and reducing friction, and it absolutely needs to be maintained.
00:45:05
One impactful strategy we implemented early on was providing separate targeted documentation for each operating system so that people could find accurate information specific to their platforms.
00:45:25
Combining this with efficient package management allows people to install necessary packages and get a functional environment within a relatively short timeframe.
00:45:44
Our goal was to decrease the time taken to set up a new development environment from several days to 15 minutes, and we have been achieving that for a few months now, which is quite phenomenal.
00:46:04
By building this community, we want our users to feel free to update the documentation if they wish to. We encourage participation but do not mandate it.
00:46:21
Our objective is to have shared resources that everyone can use to learn from and teach their peers. Another significant aspect of documentation involves outlining administrative tasks and release processes.
00:46:39
When everything is documented, especially regarding packaging, users benefit greatly, as these administrative tasks can be complex.
00:46:56
We learned that automating the process of releasing new versions saves a lot of time. It is also important to make our QA automation for cross-platform compatibility robust, since cross-platform support presents its own challenges.
00:47:13
Despite the Devly CLI having high test coverage, bugs and state-specific issues continually surface. Enhancing our automation, especially around the release process, has therefore been a significant time saver.
00:47:32
We have also started extending both the tooling and test harnesses to check for common issues related to rack schemas.
00:47:52
This makes it significantly easier for people to write racks and run automated tests to ensure that all the services are correctly configured and working as intended.
00:48:07
We regret to inform you that Devly is not yet open source. Supporting our users at Fastly and preparing for this conference have not left us the time to prepare Devly for an open-source release.
00:48:28
However, we are very close. Please keep an eye on Fastly's Twitter account and our blog for updates.
00:48:45
Once again, we are hiring, so if you’re interested, feel free to talk with us afterward. We’re both very nice!
00:49:00
We’d like to thank our reviewers for helping us prepare this talk, and a big thank you to the users who have provided us feedback along the way.
00:49:21
We appreciate your time, and are happy to address any questions regarding managing configuration for services. The Devly library contains information on how application code gets shared and is a reflection of how it will perform in production.
00:49:38
If there are any other questions, please feel free to reach out!