wroc_love.rb 2012

Fear of adding processes

This video was recorded on http://wrocloverb.com. You should follow us at https://twitter.com/wrocloverb. See you next year!

In object-oriented programming, there is a well-known anti-pattern called 'Fear of Adding Classes'. It describes the fear of solving a problem by adding another class because of the (often wrongfully) perceived added complexity. With systems moving towards a distributed nature through the use of external services, a similar pattern can be seen: the fear of adding dedicated components, mostly independent processes, to the system because of the management overhead they are expected to bring.

00:00:13.120 As mentioned in the introduction, the topic of this talk is the 'Fear of Adding Processes'. A friend of mine misheard that as the 'Fear of Adding Process' at first. What I'm going to discuss is the fear of adding programs to your system, not the fear of adding workflows to your company.
00:00:24.000 Who am I? I've been a Ruby programmer since 2004. I founded my own company about two years ago, and I'm mainly committed to back-end work. I'm not doing any front-end development anymore, and I primarily work for my company and on Padrino. Before starting to work with Padrino, I used Rails from 0.14 to 2.0 and switched before 3.0 came out, so I don’t have an opinion on that.
00:00:53.520 Additionally, I participate in running one of the largest German Ruby bulletin boards. Large for a bulletin board nowadays is about 100 users. If anyone is interested in how we run it or in Padrino, I hope to hold an open space session about it tomorrow. Please contact me on Twitter if you would like to discuss this.
00:01:20.640 One of the ideas behind this talk is inspired by one of the greatest talks I've ever heard at a Ruby conference: 'The Fear of Programming' by Nathaniel Talbott. Has anyone seen that talk? If not, I recommend you watch it afterward. It's a long talk about not doing things because of an uneasy feeling, even though you think it's the right move.
00:01:54.799 There's another concept described in the venerable c2 wiki, known as the 'Fear of Adding Classes' in object-oriented systems. It’s basically the same phenomenon: you could solve a problem by adding another component to the system, but you hesitate because it feels like extra work. You have to add a file, and you don’t know how it works, so you refrain from doing it.
00:02:26.079 The 'Fear of Adding Processes' mirrors this sentiment. It’s the belief that introducing another process into your system will make it more complicated and harder to manage. This isn’t always the case; sometimes it might be beneficial. This talk addresses these fears, exploring when it’s appropriate to overcome them.
00:02:50.000 I often make jokes about CTOs, particularly those who follow trends blindly. Throughout this conference, there have been several proposals about distributed systems. It’s common to see people enter a discussion room, eager to implement a system using multiple small services, a database, and a caching layer without giving it due consideration.
00:03:17.680 But what they often ignore is the level of unease that can accompany such implementations. If you are creating a complex system and do not feel some measure of unease about certain components, that is itself a problem. We often fear unknown grounds, and there are valid reasons for that, but it doesn’t mean we shouldn’t explore them.
00:03:56.239 To tackle this uneasiness, we should reflect on how to resolve it. Is there a technical or other solution available? In programming, we often use code reviews for support. However, when building distributed systems, we need different tools.
00:04:10.639 Let’s delve into specific fears. The first fear I would like to discuss is: 'I think this is a good way to implement it, but I’m not sure if my reasons are valid.' To illustrate this, let me provide a small example of the type of systems we are discussing.
00:04:46.800 For instance, one of our clients imports and re-encodes videos from a constantly changing number of sources. These sources include partners delivering content daily, with contracts being broken and re-offered regularly. Consequently, they need to encode the videos to use them on their platform.
00:05:07.040 The basic implementation would be straightforward: one step handling the import and another handling the encoding, running sequentially in a single program. If we built this on JRuby, we could use the queue classes from Java to improve efficiency. Rather than a single program, we could also create three programs: one for importing, another for encoding, and a third one to manage the communication between them.
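The import/queue/encode split can be sketched in plain Ruby. This is only an illustration under stated assumptions: threads stand in for what the talk describes as separate programs, Ruby's built-in `Queue` stands in for the component that manages communication, and `import_video`, `encode_video`, and `run_pipeline` are hypothetical names that do not come from the talk.

```ruby
# Sketch only: two threads stand in for the separate importer and encoder
# programs, and Queue stands in for the communication component.

# Hypothetical placeholder: pretend to download a video from a source.
def import_video(source)
  "#{source}.raw"
end

# Hypothetical placeholder: pretend to re-encode a raw file.
def encode_video(raw_file)
  raw_file.sub(/\.raw\z/, ".mp4")
end

def run_pipeline(sources)
  queue = Queue.new # thread-safe FIFO from Ruby's core library
  encoded = []

  importer = Thread.new do
    sources.each { |source| queue << import_video(source) }
    queue << :done # sentinel telling the encoder that no more work follows
  end

  encoder = Thread.new do
    # Block until an item arrives; stop when the sentinel is seen.
    while (item = queue.pop) != :done
      encoded << encode_video(item)
    end
  end

  [importer, encoder].each(&:join)
  encoded
end

p run_pipeline(%w[partner_a/clip1 partner_b/clip2])
# prints ["partner_a/clip1.mp4", "partner_b/clip2.mp4"]
```

The importer and the encoder only share the queue, which is what would make it possible to redeploy or restart one side without touching the other once they are real processes.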
00:06:10.000 Although that would require more effort upfront, it results in significant operational benefits. The key question is: what are those benefits? First, if relatively few videos are submitted daily, let’s say around 400, each about five minutes long, it's reasonable to process them manually. However, handling failures is critical.
00:06:37.680 Over the past year, we haven’t faced any failures during the encoding process, but the importer has experienced many failures. Interestingly, despite frequent deployments of the importer, the encoding process has remained unchanged. This separation allows us to deploy only the parts of the system that need it, which is a significant operational advantage.
00:07:06.240 So, here's my first guideline: if you are considering distributed systems, do not prioritize performance initially. That's a common error. Instead, focus on the operational advantages: does the split-up system allow you to bring down or bring up individual components without harming the overall operation?
00:07:41.600 Another common example is a homepage that markets your product, managed by a designer, kept separate from the underlying product. For example, Harvest has a website that is distinct from the product itself. Essentially, whenever you have disparate software components with widely diverging needs or requirements, it's wise to consider a distributed approach.
00:08:00.800 The second fear I want to address is the notion that implementing a distributed system requires a lot of work. It’s true; it does entail significant setup. However, we can mitigate this by establishing a unified development environment.
00:08:24.240 It can be challenging to achieve unity across the team. For instance, a CTO might decide everyone should use a Mac, but hundreds of euros later, nothing is unified because developers enjoy tinkering with their individual setups. Different environments lead to inconsistencies, and selecting the correct setup ensures that everyone remains on the same page.
00:08:54.800 One effective solution is to use Vagrant, an incredibly helpful tool for managing virtual machines (VMs). With Vagrant, you can automate the creation of VMs, giving all developers a consistent environment. This lets you replicate your production setup, which is fundamental to testing and developing applications.
00:09:20.000 Automating as much as possible helps ensure that the development environment remains consistent. Additionally, integrating configuration management tools like Puppet or Chef in your Vagrant setup allows seamless operation throughout your development process.
00:09:45.760 Many teams have a repository encoding how each environment operates, including development environments and all relevant developer toolchains. When new developers join your team, they should simply be able to clone the repository and run a single command like 'vagrant up' to set up their local environment. This minimizes onboarding friction and fosters uniformity.
00:10:10.560 Once everything is properly configured, it’s important that all changes to different environments are managed through code review processes to ensure quality and consistency. Having new developers helps validate whether the system works as intended.
00:10:35.680 It's crucial to document everything. The final point related to development environments is: if anything fails during the setup process, developers should have a way to judge whether their changes have affected the environment as intended. If they haven't, the team should have practices in place to process changes smoothly.
00:10:58.560 Rarely do developers keep the same tools; IDE compatibility, plugins, and preferences vary greatly, so cultivating a shared environment can be tricky. Yet if tools fail developers consistently, it obstructs workflow. Everyone must be on the same side to ensure collaborative progress.
00:11:22.680 The third fear I want to address is that of introducing complexity on an architectural level that could become overwhelming. While having more programs aids manageability, managing their interactions becomes vital. Employ centralized logging so teams can track and configure parameters efficiently to prevent excessive complications.
00:11:37.920 Whenever possible, centralize what can be centralized for a cleaner architecture. For instance, managing logs and metrics centrally rather than hiding them in isolated configurations can improve the whole team's effectiveness.
00:11:54.960 Utilizing tools such as Syslog streamlines everything considerably. You integrate logging to catch errors without disturbing the development workflow. This way, you have a centralized method to handle logs without switching contexts.
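A minimal sketch of the idea of several programs writing to one central log, assuming Ruby's standard `Logger` as a stand-in for Syslog. The `build_logger` helper, the program names, and the log format are hypothetical; a `StringIO` plays the role of the central destination so the example runs anywhere.

```ruby
require "logger"
require "stringio"

# Hypothetical helper: every program gets a logger that tags each line with
# the program's name, so lines stay attributable once they share one sink.
def build_logger(io, program)
  logger = Logger.new(io)
  logger.progname = program
  logger.formatter = proc do |severity, _time, progname, message|
    "#{progname} #{severity}: #{message}\n"
  end
  logger
end

sink = StringIO.new # stand-in for the central log destination

importer_log = build_logger(sink, "importer")
encoder_log  = build_logger(sink, "encoder")

importer_log.warn("download from partner_a failed, retrying")
encoder_log.info("encoded clip1")

puts sink.string
# prints:
# importer WARN: download from partner_a failed, retrying
# encoder INFO: encoded clip1
```

Where a syslog daemon is available, Ruby's `Syslog::Logger` (from `syslog/logger`) exposes a similar `info`/`warn`/`error` interface, so the calling code would not need to change much.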
00:12:17.600 Invest time in ensuring your inter-process communication is well-supported within your system. It better prepares your team for any failures. When discrepancies occur, developers shouldn't have to scramble for testing; a dedicated command line tool simplifies the task.
00:12:48.640 It’s critical to provide straightforward accessible scripts to mimic backend API calls to promote effectiveness. A well-structured script can make a world of difference in minimizing frustration, enabling developers to focus on developing and solving issues.
00:13:05.760 Using a command-line parser is vital for properly managing arguments. Proper documentation ensures developers understand how to implement systems according to specifications.
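A command-line helper of this kind might look like the following sketch, which uses `OptionParser` from Ruby's standard library. The script name `enqueue_video`, the option names, the default host and port, and the `/videos` endpoint are all assumptions made for illustration; the sketch only builds the request it would send and does not perform any network call.

```ruby
require "optparse"
require "json"

# Parse the command line into an options hash (option names are hypothetical).
def parse_args(argv)
  options = { host: "localhost", port: 8080 }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: enqueue_video [options]"
    opts.on("--host HOST", "Backend host (default: localhost)") { |v| options[:host] = v }
    opts.on("--port PORT", Integer, "Backend port (default: 8080)") { |v| options[:port] = v }
    opts.on("--video-id ID", "ID of the video to enqueue") { |v| options[:video_id] = v }
  end

  parser.parse(argv) # non-destructive: leaves argv untouched
  raise OptionParser::MissingArgument, "--video-id" unless options[:video_id]
  options
end

# Build the request a developer would otherwise have to assemble by hand.
def build_request(options)
  {
    url: "http://#{options[:host]}:#{options[:port]}/videos",
    body: JSON.generate({ video_id: options[:video_id] })
  }
end

request = build_request(parse_args(%w[--video-id 42 --port 9000]))
puts request[:url]  # prints http://localhost:9000/videos
puts request[:body] # prints {"video_id":"42"}
```

In a real script the array literal would be replaced by `ARGV`, and `OptionParser` would generate the `--help` text from the descriptions given to `opts.on`, which covers part of the documentation burden.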
00:13:33.840 One last useful Ruby tool is Foreman, a process manager built around the Procfile format that Heroku also uses, which allows you to start and control all necessary services together. It merges their output into a unified log for smoother operations and less overhead when running multiple programs. This approach mitigates complexity, enhancing the developer's ability to monitor applications effectively.
00:14:02.080 As the number of services increases, code ownership becomes more critical. Most programmers may not invest in others’ projects unless given responsibility. Thus, assigning specific ownership helps ensure that all code remains clean and concise, particularly as projects scale.
00:14:39.840 Finally, let’s touch on common pitfalls when implementing a distributed architecture. The first point to consider is to avoid following trends for the sake of it. Don’t launch systems simply because they seem appealing; instead, weigh the managerial burden they introduce.
00:15:11.680 Disorganization can lead to severe complications, especially as team members rely on countless small parts. Maintaining structure across your projects is essential. Failure to do so can lead to recurring issues that undermine long-term sustainability.
00:15:40.480 Another frequent issue is rewriting directly into a distributed system. You may wish to take a monolithic application and break it apart in one go, which has never worked well. The smaller parts can turn into an incomprehensible mixture of services, resulting in the collapse of what remains.
00:16:11.680 We’ve discussed some of the fears and issues, but do you have any questions? I’d like to know more about your experiences and how you manage to tackle these challenges.
00:16:35.680 One individual interjected a law: 'Do not distribute.' This raises an interesting point on distribution; while it can be beneficial, many teams fail to assess their platform needs before taking the plunge and distributing.
00:16:58.560 Despite their drawbacks, appropriately applied distributed systems can yield positives. Workers can effectively handle their isolated tasks without worrying about the complete structure, much like the worker systems built into Rails. However, it’s worth ensuring a well-planned strategy accompanies any decision to pursue this architecture.
00:17:20.560 Ultimately, distribution can be advantageous, but it is essential to evaluate such choices critically. Thank you for your attention, and I welcome your thoughts.
00:17:33.840 Additionally, if there are any lingering questions, feel free to approach me later. Let’s aim to enhance our programming world together.