
Devly, a multi-service development environment

Writing a system alone is hard. Building many systems with many people is harder.
As our company has grown, we tried many approaches to user-friendly, shared development environments and learned what works and what doesn't. We incorporated what we learned into a tool called Devly. Devly is used to develop products at Fastly that span many services written in different languages.

We learned that the design of our tools must be guided by how teams work and communicate. To respond to these needs, Devly allows self-service, control, and safety so that developers can focus on their work.

RubyKaigi 2018 https://rubykaigi.org/2018/presentations/drbrain


00:00:02.060 Hello, and welcome to Devly, a multi-service development environment. I'm Eric Hodel, an engineer in the Engineering Experience department at Fastly.
00:00:05.700 Today, I'll be describing Devly to you. I've written lots of Ruby code, some of which you use every day.
00:00:11.880 My co-presenter, Ezekiel Templin, has been writing software of various kinds for over 20 years. We currently work within Fastly's Engineering Experience organization, focusing on improving Fastly's internal engineering tools and processes.
00:00:19.470 Fastly is a content delivery network and edge cloud provider that serves traffic for GitHub, New Relic, Spotify, and many other popular websites and services. Additionally, Fastly provides free services for all Ruby and RubyGems downloads, as well as for many other open-source projects.
00:00:50.610 We have servers located worldwide, which handle more than 14 trillion internet requests each month. This accounts for over 10% of all internet traffic today.
00:01:03.840 Today, we'd like to discuss a problem that we believe impacts organizations of all sizes. To help illustrate this problem, we want to tell you about the evolution of Fastly's API.
00:01:29.751 This is a rough approximation of the service architecture that backed the Fastly API around 2012. The original Fastly development environment consisted of a copy of each component of the Fastly API running on each engineer's laptop.
00:01:55.719 Soon after, the early team decided that virtual machines should be employed to provide a degree of operational uniformity and parity between development and production. In the early days, all engineering work was being done by a very small group of people, allowing changes to systems to be easily introduced and distributed through source control, enabling rapid development and deployment.
00:02:34.310 The small size of the company allowed for focused discussions, making decisions easy to communicate. As the company grew and became successful, new opportunities opened up to expand the business by adding additional functionality to our API.
00:02:57.209 While adding new functionalities, we also added new supporting systems. As software engineers, whenever we introduced new functionality and dependencies to our systems, we also introduced complexity. We do not imply that complexity is necessarily a bad thing; rather, it's an unavoidable side effect of growth.
00:03:18.959 Another complicating factor arises from our desire to use the right tool for the job, leading to many services being written in entirely different languages within very different workflows.
00:03:26.850 Despite the increase in the number of languages and services, our development environment remained largely unchanged. The gaps between each group's development workflows also grew, which became increasingly problematic as our engineering department doubled in size every six months.
00:03:51.194 Our original development environment then became increasingly unreliable, and the established processes to communicate changes broke down. Maintaining any single engineer's environment became more time-consuming, leading to unnecessary conflict.
00:04:11.280 With engineers working on everything from code that runs in the Linux kernel to code that runs in the browser, the needs of each team in these different areas are dramatically different.
00:04:27.509 Our original development environment couldn't meet the needs of one team without compromising the needs of another. This growth continued without regard for the challenges of scaling software, which is inherently complicated, with many moving pieces.
00:04:33.239 As an industry, we've established and continue to improve strategies that can help us direct our time and energy toward effective development, largely due to our ability to observe software systems in isolation.
00:05:03.280 Organizations, on the other hand, are far more complex and much harder to observe systematically. However, by reflecting on our experiences and listening to our coworkers, we identified common themes and frustrations.
00:05:42.060 Here are some summaries of conversations that we have had regarding the experiences of employees in our company: "Here’s your laptop, we'll see you in two weeks when your development environment is running." Or "Does anyone know why the API gateway crashes in a loop? I updated my development environment and now nothing works!"
00:06:05.220 Another example: "I can't do my work today because I need to rebuild my development environment." Does anyone in the room have experience with this kind of issue, too? Quite a few of you do!
00:06:29.960 This is a common problem, and it grows increasingly untenable over time. Moreover, our friends and coworkers were becoming frustrated, which hampered our overall productivity.
00:06:51.570 During the same period, many new tools like Docker Compose and Minikube were released, yet none met all of our developer needs. Through observation, research, and extensive discussions with our colleagues and peers, we identified important themes that contribute to an effective development environment.
00:07:16.210 These themes encapsulate desirable traits for developer-focused productivity tools. First and foremost, a development environment must be reliable—developers should be able to run a small number of commands to get what they need up and running without needing to know how every system works.
00:07:52.610 It's also crucial for developers to easily see the local health of systems they depend on; time spent rebuilding a development environment should be negligible.
00:08:25.580 The development environment must be accessible; engineers need to be allowed and encouraged to maintain their development environments collaboratively. Developers should be empowered to build and test new changes across systems owned by different teams seamlessly.
00:09:01.940 A development environment that spans multiple teams and workflows must be maintainable by the community of users. Managing changes in source control should illuminate past and present ownership, even if there are many components involved.
00:09:36.120 The structure and conventions required to manage these changes should be encouraged through documentation and feedback loops, rather than enforced by gatekeepers. We want to create a development environment that allows for running discrete services together in composable units.
00:10:09.270 Furthermore, it should be easy to try out new supporting systems. Finally, the development environment must be reproducible; we need the ability to apply the last known good state of all systems effectively.
00:10:32.520 Source control and similar mechanisms must help us understand how we achieved this good state. We should be able to leverage existing tools like Git, RubyGems, Perl, pip, and Go modules.
00:10:56.270 Throughout the rest of this talk, we hope to show you how we've started addressing the needs of our coworkers at Fastly by applying these themes to a tool we've been developing together for the past year, which we call Devly.
00:11:17.960 To tell you more about Devly, I'll hand things off to my friend, close collaborator, and lead engineer on the Devly project, Eric Hodel.
00:11:36.110 Thank you, Zeke. I will discuss Devly and some of its components and features. Devly is designed for developers and builds images from repositories. It uses those images to manage containers and facilitates communication within and across teams of developers.
00:12:05.350 Devly is distributed for macOS and Linux as a standalone executable built with Ruby Packer, with packages available for macOS and Debian. Devly helps you configure all of your services and build images from your repositories using Dockerfiles.
00:12:25.410 You can configure these images to run as services and even run groups of services together as part of a rack. Each image contains the necessary files to run a service.
00:12:43.890 For instance, the audit log image uses Ruby and includes our application code, which requires libraries like Rails, Sidekiq, and a JSON parser. The JSON parser relies on a C library, which we install via the OS package system.
00:13:16.290 In our repository, there’s a Dockerfile with the instructions for building this image. These images can contain applications in any language. For example, our Stats service, written in Go, has an image that includes a Go binary compiled from its application code.
00:13:50.620 Similarly, the web application our customers use is written in Ember and contains a copy of the application code ready to run. We share all these images across teams by uploading and downloading them from Google Container Registry, ensuring we're consistently using images built from the latest source code.
00:14:20.800 A Devly service is a runtime configuration for an image. For instance, we've created the Audit Log service using the Audit Log image. The service runs a command because the Audit Log service provides an API for managing event data.
00:14:53.920 It runs a Rails server to provide the HTTP interface for events, which must be accessible to other services for reading and writing event data. To facilitate communication, we expose port 8888.
00:15:20.900 If you use a framework like Rails that supports live development, you can mount your repository on top of the files in the image. This allows you to work in your preferred editor from your preferred OS.
00:15:46.200 You can change a file on your host OS and see the changes reflected in your browser. The service runs the Audit Log API, and we also have background jobs in Sidekiq to help process our logs.
00:16:07.480 To handle these background tasks, we can create a separate service dedicated to running Sidekiq jobs, which gives these jobs access to the same models and databases used by our application.
00:16:40.260 Thus, we have the Audit Workers service, running the Sidekiq command instead of the Rails server command, while the Audit Log service runs the Rails server. This separation allows us to manage development more effectively.
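The relationship between images, services, and racks described above can be sketched as a small data model. This is a hypothetical illustration, not Devly's actual configuration format: the class names, field names, image name, command lines, and mount paths are all assumptions. Only port 8888 and the Rails/Sidekiq split come from the talk.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Service:
    """A runtime configuration for an image (hypothetical model)."""
    name: str
    image: str                                 # image the container is created from
    command: tuple[str, ...]                   # process to run inside the container
    ports: tuple[int, ...] = ()                # ports exposed to other services
    mounts: tuple[tuple[str, str], ...] = ()   # (host path, container path) pairs


@dataclass
class Rack:
    """A named group of services that are started together."""
    name: str
    services: list[Service] = field(default_factory=list)

    def images(self) -> set[str]:
        """Distinct images that must be available before the rack can start."""
        return {service.image for service in self.services}


# Two services created from the same image, differing only in their command.
audit_log = Service(
    name="audit-log",
    image="audit-log:latest",            # assumed image name
    command=("rails", "server"),         # assumed command line
    ports=(8888,),
    mounts=(("./audit-log", "/app"),),   # assumed paths for live development
)
audit_workers = Service(
    name="audit-workers",
    image="audit-log:latest",
    command=("sidekiq",),                # assumed command line
)

rack = Rack("audit", [audit_log, audit_workers])
print(rack.images())  # {'audit-log:latest'}
```

Modeling the command as part of the service rather than the image is what lets one image back both the API and the worker.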
00:17:05.570 Next, we’ll create additional services for our applications, such as the Authentication API, the Configuration API, and their respective databases. When working on the Configuration API, we don’t want to start up unnecessary services.
00:17:29.300 Instead, we can create a rack specifically for developing the Configuration API that contains only the services it needs: a MySQL database, the Audit Log service, and the Configuration API service.
00:17:51.560 To access services running in the rack for development, we can expose ports for a few services to the host OS. This setup allows us to connect to these ports from the browser.
00:18:07.390 We can also set environment variables and mount different files to change a service’s behavior. Additionally, Devly allows the configuration of multiple racks.
00:18:29.960 The Authentication team needs to work on its services, including the Postgres database and the Authentication API, while the Configuration team works with the same Audit Log service.
00:18:56.760 When we start these racks, we use independent containers to run their services, enabling each team to have different configurations and software versions that don’t conflict with one another.
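One way to picture how two racks can run the same service without conflict is to namespace each container by its rack. The sketch below is an assumption about how such isolation could work, not a description of Devly's internals. The naming scheme and the service names are hypothetical, and it assumes both racks include the Audit Log service.

```python
from __future__ import annotations


def container_name(rack: str, service: str) -> str:
    """Namespace a container by its rack (assumed naming scheme)."""
    return f"{rack}_{service}"


def plan(racks: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each rack to the container names it would own."""
    return {
        rack: [container_name(rack, service) for service in services]
        for rack, services in racks.items()
    }


racks = {
    "configuration": ["mysql", "audit-log", "configuration-api"],
    "authentication": ["postgres", "audit-log", "authentication-api"],
}
containers = plan(racks)

# The shared audit-log service gets one container per rack, so no names collide.
all_names = [name for names in containers.values() for name in names]
assert len(all_names) == len(set(all_names))
print(containers["authentication"])
```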
00:19:15.920 For instance, we could start both racks simultaneously to isolate bugs impacting multiple services using common configurations, making sharing and work across teams easier.
00:19:41.150 The configuration for the images, services, and racks is stored in the Devly library repository at Fastly, so any developer can contribute changes and discuss proposed alterations with the relevant service teams.
00:20:10.270 All teams relying on the Audit Log service will need to collaborate whenever updates or changes are proposed, facilitating communication and improving service maintainability.
00:20:35.280 Now that we have discussed the components of Devly and their functionality, let's move on to some demonstrations of its common development tasks from the perspective of developers across various teams.
00:20:59.460 We'll start by setting up Devly as a first-time user. We run 'devly setup' and provide the repository to pull from the Devly library. This process downloads the Devly library along with other repositories required for our services.
00:21:24.570 The setup also checks our Docker version and Google Cloud SDK version. The command tries to fix any problems it finds, or prints messages that guide the user toward a resolution.
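A version check of this kind can be sketched as follows. This is a minimal illustration that parses the text printed by `docker --version`. The minimum version, the function names, and the wording of the message are assumptions and do not reflect what Devly actually requires or prints.

```python
from __future__ import annotations

import re

MINIMUM_DOCKER = (17, 12, 0)  # hypothetical minimum, not Devly's real requirement


def parse_docker_version(output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from `docker --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version output: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def check_docker(output: str) -> str:
    """Return "ok", or a message telling the user how to resolve the problem."""
    version = parse_docker_version(output)
    if version >= MINIMUM_DOCKER:
        return "ok"
    wanted = ".".join(map(str, MINIMUM_DOCKER))
    found = ".".join(map(str, version))
    return f"Docker {found} is too old; please upgrade to {wanted} or newer"


print(check_docker("Docker version 18.03.1-ce, build 9ee9f40"))
print(check_docker("Docker version 1.13.1, build 092cba3"))
```

Comparing versions as integer tuples avoids the string-comparison trap where "1.13" would sort after "1.9" incorrectly.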
00:21:51.860 Setting up usually takes just a few minutes to fetch necessary repositories and carry out essential checks. Upon completion, we can run 'devly info' to view available racks and services.
00:22:22.500 This command will yield a list of racks and services in our Devly library. Information about a specific rack, including its services, can also be retrieved easily.
00:22:48.890 Once we complete the Devly setup, we can start a rack and carry out basic development tasks, such as viewing logs and making small changes. Using the 'devly up' command starts the rack.
00:23:15.680 If certain images are not yet downloaded from the registry, we utilize 'devly pull' to fetch them. Once all images are downloaded to Docker on our host system, Devly creates a network to isolate the rack and starts all the containers.
00:23:38.920 When containers aren’t dependent on one another, Devly can start them in parallel, expediting the startup process. We can check the status of our racks and services by running 'devly status'.
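Starting independent containers in parallel amounts to grouping services into waves, where every service in a wave has all of its dependencies already running. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or newer) to compute those waves. The dependency graph is an assumption made for illustration, and this is not necessarily how Devly schedules containers.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def startup_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group services into waves that could each be started concurrently.

    `depends_on` maps a service to the services that must be running first.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything startable right now
        waves.append(ready)
        sorter.done(*ready)                 # unblock services waiting on these
    return waves


# Hypothetical dependency graph for a Configuration API rack.
depends_on = {
    "mysql": set(),
    "audit-log": {"mysql"},
    "audit-workers": {"mysql"},
    "configuration-api": {"mysql", "audit-log"},
}

for number, wave in enumerate(startup_waves(depends_on), start=1):
    print(number, wave)
```

With this graph the database starts alone, the two audit services start together, and the Configuration API starts last.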
00:24:01.560 After confirming that everything is functioning correctly, we can view the logs for the rack by running 'devly logs', which shows live log data.
00:24:29.660 Once we're sure the rack is operational, we can navigate to the Configuration service through the browser, verifying that the service is accessible on its designated port.
00:24:50.390 After ensuring our rack is set up and running smoothly, we can make changes to the main page of our application using our preferred editor, for instance by adding text that provides information about the service.
00:25:08.880 Once we save our changes, reloading the Configuration API will show the new content has been successfully loaded.
00:25:38.660 After making sure that our changes were successful, we can shut down the rack by using 'devly down', which stops all active containers that were running.
00:26:06.270 When working within our team, we’ll continuously push changes to our repository. When collaborating across teams using Devly, other teams will submit new images for their services as they develop new features.
00:26:29.460 For example, the audit team has updated the audit log image to include a source field for events. First, we can verify if the source field is present by loading the Audit Logs service in the browser. If the source field is absent, it indicates that we are still running an outdated image.
00:27:00.390 To utilize this new feature, we’ll need to pull the latest image. If the currently running Audit Log service is outdated, we can use the 'devly restart' command to replace the existing Audit Log service with an updated one.
00:27:28.840 Once we check the browser again, we should see that the source field has appeared. With the Audit Logs service now running the latest image, we can proceed to update our services to leverage this new source field.
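Deciding which services need a restart after a pull can be reduced to comparing the image each container was created from with the newest image available locally. The sketch below illustrates that comparison only. The function name and the image IDs are placeholders, and how Devly itself detects an outdated service is not described in the talk.

```python
from __future__ import annotations


def stale_services(running: dict[str, str], latest: dict[str, str]) -> list[str]:
    """Return services whose container was created from an older image.

    `running` maps a service name to the ID of the image its container uses.
    `latest` maps a service name to the ID of the newest image pulled locally.
    """
    return sorted(
        name
        for name, image_id in running.items()
        if name in latest and latest[name] != image_id
    )


# Placeholder image IDs, for illustration only.
running = {"audit-log": "sha256:aaa", "configuration-api": "sha256:ccc"}
latest = {"audit-log": "sha256:bbb", "configuration-api": "sha256:ccc"}

print(stale_services(running, latest))  # ['audit-log']
```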
00:27:50.780 So far, we’ve mostly worked from outside of the container. Sometimes, though, we need to run commands from inside the container where our dependencies are loaded.
00:28:12.300 For better clarity, let's say we’re back on the Audit Log team. We’re tasked with adding a source column to our database, which requires running a migration we've just created.
00:28:31.560 To do this, we’ll need to execute the migration command from within the Audit Log service. We can start a Bash shell inside the container to explore and execute our tasks.
00:29:03.640 After navigating to the appropriate directory, we check our setup and then run the migrations. Once we verify that the migration is successful, we can check back in the browser to see if the new source column appears.
00:29:30.240 With the migration complete, we can run tasks more efficiently and ensure that everyone remembers how to update the database consistently.
00:29:47.150 In practice, we aim to document the steps and commands that every engineer can utilize as a clear reference. By streamlining our process, we can enhance our workflows.
00:30:11.270 As we build the community around Devly, we focus on keeping our documentation thorough and accessible. We acknowledge suggestions and put forth effort to resolve any reported issues quickly.
00:30:43.370 This promotes stability, making it essential to document everything—particularly getting started guides, workflows, and any administrative tasks that need clarity.
00:31:11.810 Documentation can help reduce friction and ensures all users feel empowered to contribute updates whenever necessary. As we have seen, automation of our release processes saves time and helps maintain consistent operations.
00:31:41.080 In conclusion, building a supportive and engaged community is essential for the collective success of tools like Devly. Thank you for your time, and we hope you find Devly beneficial as you develop your applications.