Practical Unix for Ruby & Rails

by Ylan Segal

In the talk "Practical Unix for Ruby & Rails" presented by Ylan Segal at LA RubyConf 2015, the speaker emphasizes the power of the Unix command-line interface and its applicability to enhancing productivity in Ruby and Rails development. He discusses key Unix concepts and illustrates how to utilize various Unix commands effectively to handle common programming tasks.

Key points include:

- Definition of Unix: Segal clarifies the distinction between 'Unix' (the original operating system) and 'unix' (Unix-like systems that adhere to POSIX standards).

- Unix Philosophy: He promotes the philosophy of building small, sharp tools that perform specific functions efficiently. This approach encourages composability, modular design, and clear coding practices.

- Using the Shell: Bash, the shell found on most Unix-like systems, serves not only as a command-line interpreter but also as a programming environment.

- Examples of Commands: Segal presents practical examples such as using the 'tail' command to view the end of a file efficiently and 'grep' to search files for patterns, with pipes passing the output of one command to the next.

- Understanding Processes: The talk explains how Unix processes work and the importance of standard input, output, and error, showcasing how outputs can serve as inputs for subsequent commands, resulting in powerful command chaining.

- Analyzing Logs: He walks through a scenario using Unix commands to analyze Rails logs, counting endpoint requests, and applying commands like 'sort', 'uniq', and 'sed' to normalize and summarize data.

- Search and Automation Tools: The talk highlights how tools such as 'ag' (the silver searcher) and 'xargs' can enhance productivity, especially for searching code and executing commands within Rails projects.

- Tips for Efficient Workflow: Segal suggests using tools like Spork and Zeus to speed up testing workflows, alongside shell scripting to automate repetitive tasks effectively.

Overall, Segal concludes by encouraging developers to explore and utilize Unix's capabilities to not only improve their command-line efficiency but also draw parallels to effective software design principles. He recommends further resources, including insights from other command-line experts. This talk serves as both an introduction to Unix tools for Ruby/Rails developers and a motivational piece to inspire the optimization of their development processes.

00:00:23.670 Hi, everyone. Thank you to the organizers for inviting me; I appreciate it. My name is Ylan Segal, and I'm from San Diego, but I originally hail from Mexico City. I've been using Ruby since 2009, mostly working with Java before that.
00:00:34.750 In this talk, I'm going to show a few examples of how I use the Unix environment to accomplish day-to-day programming tasks. I'll briefly discuss how Unix processes work and how they are integrated into the operating system. However, it won't be comprehensive, as I have only 35 minutes. My goal is to spark your interest and show you the kinds of things that are possible with Unix, hoping to help you be more efficient in your day-to-day programming.
00:00:53.530 A little background: there are discussions online about what Unix is, often indicating the distinction between uppercase and lowercase. Typically, when they write 'Unix' in uppercase, it refers to the original operating system developed in the 70s at Bell Labs and later by AT&T. In lowercase, 'unix' or sometimes referred to as 'Unix-like' or other monikers, is any system that implements POSIX standards and essentially behaves like Unix. POSIX stands for Portable Operating System Interface, which defines the APIs that all Unix-like systems comply with. These standards emerged in the 80s. For instance, BSD, which powers this Mac, is fully compliant, Linux is mostly compliant, but Windows is not. However, there's some hope with Windows Subsystem for Linux, which allows for a Unix-like shell environment on Windows.
00:01:40.750 The Unix philosophy has many principles associated with it, such as the idea of 'small, sharp tools,' which promotes doing one thing and doing it well. One of the aspects I appreciate the most about these principles is the expectation that the output of one program can become the input of another program, even one you don't know about yet. This expectation directly influences how you format your output. Overall, the Unix philosophy represents a set of cultural norms and a philosophical approach to developing small but capable software.
00:02:15.400 It emphasizes writing simple, clear, modular, and extensible code that can be easily maintained by the original developers, and importantly, can be reused by other developers or the same developers for greater effect. I just received a sticker stating 'Not only, but also,' which captures how the Unix philosophy favors composability over monolithic design.
00:02:43.300 How do we interact with Unix? The primary way is through a shell, specifically Bash. Bash is a shell that you will find on almost any Unix system, readily available on your Mac or Linux system. It serves not only as a shell but also as a programming environment. It's nearly universal and powerful, although a bit quirky. If you use Windows, you can work with the Windows Subsystem for Linux or set up a virtual machine to run a Unix-like environment.
00:03:05.750 Let’s start with an example. One of my first commands is 'tail,' which displays the last lines of a file. When you use tail, it outputs only the last few lines without reading the entire file. For instance, if you have a 100 megabyte file, tail will still be efficient because it does not need to read the whole file. You can specify the number of lines you'd like to see with the -n option; by default, tail prints the last 10 lines. That is far cheaper than opening a large file in an editor, which may crash.
00:03:52.320 For example, when starting a Rails server, you usually see the log output in the terminal. You can get the same output by tailing the log file and passing the -f switch. This flag makes tail wait for additional data, meaning it will not exit until you interrupt it. As more requests come into the Rails app, you will continue to see updated output. To get more information about this or any other command I discuss, you can use the Unix command 'man,' which displays the manual page for a command. Interestingly, you can call 'man' on itself, which adds a meta aspect to it.
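The two ways of calling tail described above can be sketched as follows. The file name and its contents are made up for illustration; in a Rails project the file would typically be log/development.log.

```shell
# Build a small sample log in a temporary directory (contents are illustrative).
workdir=$(mktemp -d)
log="$workdir/development.log"
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  echo "line $i"
done > "$log"

# With no options, tail prints the last 10 lines of the file.
default_count=$(tail "$log" | wc -l | tr -d ' ')
echo "default line count: $default_count"

# -n selects how many trailing lines to print.
last_three=$(tail -n 3 "$log")
echo "$last_three"

# -f keeps the file open and prints lines as they are appended.
# It never exits on its own (stop it with Ctrl-C), so it is left commented out:
# tail -f "$log"

rm -rf "$workdir"
```

Running `man tail` lists the remaining options.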
00:04:21.650 Suppose we use the Rails log to get a list of requests made to the server. The Rails output is quite verbose, giving us information about which layouts were used and how long they took, along with various other metrics. Sometimes, we might not need all that information, so we can use 'grep,' which is a file pattern searcher, to filter the output. It outputs only the lines that match a specific regular expression and excludes all other lines. The output of 'man' can also be piped to 'grep' to search the manual itself, showing that the output of one command can seamlessly feed into another.
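As a rough illustration of that filtering, the sketch below runs grep over a few lines written in the style of a Rails development log. The sample lines and the `^Started` pattern are assumptions chosen for the example, not taken from the talk's slides.

```shell
# A few lines in the style of a Rails development log (sample data, not real output).
log_lines='Started GET "/users/1" for 127.0.0.1 at 2015-10-10 10:00:00 -0700
Processing by UsersController#show as HTML
  Rendered users/show.html.erb within layouts/application (1.2ms)
Completed 200 OK in 15ms (Views: 10.0ms | ActiveRecord: 1.0ms)
Started GET "/posts" for 127.0.0.1 at 2015-10-10 10:00:05 -0700
Processing by PostsController#index as HTML
Completed 200 OK in 22ms (Views: 18.0ms | ActiveRecord: 2.0ms)'

# Keep only the lines that begin with "Started"; every other line is dropped.
requests=$(printf '%s\n' "$log_lines" | grep '^Started')
printf '%s\n' "$requests"

# On a live log, the same filter can sit behind tail -f:
# tail -f log/development.log | grep '^Started'
```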
00:05:40.220 In this case, 'grep' lets us narrow our focus by specifying the pattern we want to filter down to. For example, if we keep only the lines that start with a certain capital letter, all other lines are thrown out. Now, let’s explore Unix processes. A Unix process is, in the simplest terms, an instance of a running program. When we invoked tail and grep, each one ran as a Unix process.
00:06:55.590 Every Unix process has three standard streams: standard input, standard output, and standard error. Standard input is where data is fed into a program; by default it is connected to the keyboard, so most Unix processes read from the keyboard unless directed otherwise. Standard output is where the program writes its data; by default it is printed to the terminal where the command was started. Standard error works the same way, but it is reserved for error messages and can be redirected separately from standard output.
00:07:35.200 In Unix, everything is a file, which means both standard input and the standard outputs can be redirected. You can connect a file to standard input, standard output, or standard error, providing immense flexibility. For instance, by using the pipe operator, we can send the output from tail to grep, connecting the standard output of one command to the standard input of another. This means that we can compose commands dynamically, creating more complex operations from simple ones. Interestingly, connecting processes via pipes leads to highly efficient workflows, since the processes in a pipeline are started concurrently rather than one after another.
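A minimal sketch of the three streams and how they can be redirected. The `emit` function and the file names are hypothetical helpers invented for this example.

```shell
workdir=$(mktemp -d)

# A tiny function that writes one line to standard output and one to standard error.
emit() {
  echo "normal output"
  echo "something went wrong" >&2
}

# > connects standard output to a file; 2> does the same for standard error.
emit > "$workdir/out.txt" 2> "$workdir/err.txt"

# < connects a file to standard input; wc reads it as if it had been typed in.
out_lines=$(wc -l < "$workdir/out.txt" | tr -d ' ')

# | connects the standard output of one process to the standard input of the next.
# Standard error is not part of the pipe, so it is discarded separately here.
upper=$(emit 2>/dev/null | tr '[:lower:]' '[:upper:]')
echo "$upper"

out_text=$(cat "$workdir/out.txt")
err_text=$(cat "$workdir/err.txt")
rm -rf "$workdir"
```

Keeping error messages on their own stream is what lets a pipeline pass clean data along while problems still show up in the terminal.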
00:08:54.420 By buffering data flow under the hood, the operating system can handle various speeds of input. For example, if the first command retrieves data quickly, but the following command processes it slowly, the operating system buffers the extra data as it awaits processing, ensuring everything runs smoothly. The efficiency is enhanced because many programs process data line-by-line, ideally suited for handling text files.
00:09:34.090 Continuing with our earlier development log example, let’s say we want to know how many times each endpoint has been hit. To extract that information, we can use several command-line utilities available in Unix. Instead of tailing the log, we will filter the whole file directly. The grep command will help us extract just the part of each line we’re interested in. Regular expressions allow us to filter down even further. They can be complex, so I’ll keep it straightforward for this example.
00:10:21.230 After using grep, we usually end up with a list of endpoints, some of which may be repeated. Our next task is to count how many times each endpoint is hit. We need to sort the list first. The 'sort' command sorts its input lexicographically, meaning that zero will go before one, followed by letters and so forth. It has options for numeric sorting, but in our case, we just use the default. Once sorted, we will use another command called 'uniq,' which filters out repeated adjacent lines. This way, we will end up with a clean list of unique endpoints that we can analyze further.
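The steps so far might look like the sketch below. The log contents and the regular expression are assumptions for illustration, and `-o` (print only the matching part of a line) is a widely available grep option rather than something the talk names explicitly.

```shell
workdir=$(mktemp -d)
log="$workdir/development.log"
cat > "$log" <<'EOF'
Started GET "/posts" for 127.0.0.1 at 2015-10-10 10:00:00 -0700
Completed 200 OK in 22ms
Started GET "/users/1" for 127.0.0.1 at 2015-10-10 10:00:01 -0700
Completed 200 OK in 15ms
Started GET "/posts" for 127.0.0.1 at 2015-10-10 10:00:02 -0700
Completed 200 OK in 20ms
EOF

# -o prints only the matching part of each line, here the verb and the quoted path.
# sort puts identical lines next to each other; uniq then collapses adjacent repeats.
endpoints=$(grep -o '^Started [A-Z]* "[^"]*"' "$log" | sort | uniq)
echo "$endpoints"

rm -rf "$workdir"
```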
00:11:50.610 Now, let’s analyze these requests to glean some statistics. Let's assume that we have endpoints like /user/1 and /user/10 that we would like to count as a single category. To accomplish this, we need to normalize our output. This is where the 'sed' command comes into play; sed is a stream editor that modifies text line by line. We will use sed to substitute specific patterns in our output.
00:12:29.700 In this case, we match any endpoint with a user ID of one or more digits and replace it with '/user-ID' so that both /user/1 and /user/10 become /user-ID. Once normalized, we go back to 'sort' and 'uniq' again. It’s important to sort the input before passing it to uniq, because uniq only collapses adjacent duplicates. Then, we can add the -c option to uniq to get a count of how many times each endpoint appears.
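A small sketch of the normalize-and-count step, assuming the paths have already been extracted. The sample paths and the exact sed expression are illustrative guesses at what such a substitution could look like.

```shell
# Paths as they might come out of the earlier grep step (sample data).
paths='/user/1
/posts
/user/10
/user/1
/posts
/user/42'

# sed applies the substitution to every line: /user/ followed by one or more digits
# is replaced with /user-ID. Lines that do not match pass through unchanged.
# uniq -c prefixes each distinct line with the number of times it occurred.
counts=$(printf '%s\n' "$paths" | sed 's#^/user/[0-9][0-9]*$#/user-ID#' | sort | uniq -c)
echo "$counts"
```

The width of the padding in front of each count differs between uniq implementations, so scripts that consume this output are safer splitting on whitespace than on fixed columns.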
00:13:18.980 By following these steps, we gain insight into our endpoints and see how frequently they have been hit. Now, we can reverse sort the output numerically by hit count, showing the most accessed endpoints at the top. This gives us a clear overview of which endpoints receive the most traffic.
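The final ranking step can be sketched like this. The counts are invented sample data in the shape that uniq -c produces.

```shell
# Counted endpoints in the shape that uniq -c produces (sample data).
counted='   2 /posts
  17 /user-ID
   5 /sessions'

# -n compares the leading count as a number rather than as text,
# and -r reverses the order so the largest count comes first.
ranked=$(printf '%s\n' "$counted" | sort -rn)
echo "$ranked"
```

The `-n` option matters here: the default text comparison looks at characters rather than numeric values, so counts with different numbers of digits would not be ranked reliably.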
00:14:07.780 Through this combination of commands—grep, sort, uniq, and sed—we effectively demonstrate the powerful capability of Unix for text processing. This reminds me of MapReduce, where you start with a large log, filter the lines, manipulate them to retain only the necessary portions, and normalize and count those for a final result. This approach resonates well with Ruby as well. You could compare it to starting with an array of lines, selecting them, and then reducing them to end up with the results.
00:15:05.880 What’s fascinating about Unix processes is that they run concurrently. For example, when I’ve executed multiple commands like these, each process is initiated simultaneously by the operating system instead of sequentially waiting for one command to finish before starting the next. This allows for significant efficiency gains, especially given that the standard output of one command seamlessly becomes the input for another.
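One rough way to observe that concurrency is to time a pipeline whose stages each take about a second. This sketch relies on bash's built-in `SECONDS` counter, which only has whole-second resolution, so the measurement is approximate.

```shell
# SECONDS is a bash variable that counts whole seconds since the shell started.
start=$SECONDS

# Three stages that each take about one second. All stages of a pipeline are
# started together, so the whole line takes roughly one second rather than three.
sleep 1 | sleep 1 | sleep 1

piped=$((SECONDS - start))
echo "pipeline took about ${piped}s"
```

Running the same three `sleep 1` commands separated by semicolons instead of pipes would take about three seconds, because each would wait for the previous one to finish.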
00:15:48.430 With the above example of processes piping data, it’s important to understand that the output buffers managed by the operating system come into play as well. It allows commands fetching data from different sources, be it from the internet or a file, to work together efficiently. Each individual program doesn't need to manage the buffering, allowing for a massive improvement in performance.
00:16:24.420 Let’s proceed to additional examples. One common scenario is searching through files. When you want to refactor a method or determine which classes refer to another class, 'grep' is handy. Yet, this can be slow with large code bases, so alternatives like 'ack' or 'ag' (also known as the silver searcher) can be faster. 'Ag' is written in C and skips files listed in .gitignore by default, which tends to make it quicker than 'ack' and often quicker than a recursive 'grep' over a large project.
00:17:05.500 For instance, if we recently upgraded a gem and want to find which specs are utilizing it, we can use 'ag' to review all files in the current directory and subdirectories. 'Ag' will highlight each line of code with the relevant line number, making it straightforward to navigate back to your editor and review those lines.
00:17:56.210 But running all those specs manually is impractical. Instead, we can streamline the process by piping the results from 'ag' to 'grep' to keep only the spec files. For instance, if we only want files whose paths start with 'spec' and end with '.rb,' we can use 'grep' to filter the output. The result is a concise list of relevant specs.
00:18:57.430 To execute these specs collectively, we can take advantage of 'xargs,' which builds an argument list from its standard input and runs a command with those arguments. We’ll pass the list of spec files to 'rspec' through 'xargs,' allowing us to run all the specs in one go. This is especially useful whenever one command produces a list that another command expects as arguments.
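A dry-run sketch of that search, filter, and xargs chain. The project files and the name `FakeClient` are hypothetical, `grep -rl` stands in for `ag -l` so the example runs where ag is not installed, and `echo rspec` prints the command that would be run instead of running it.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/spec/models" "$workdir/lib"

# Hypothetical project files; FakeClient stands in for whatever the upgraded gem provides.
echo 'FakeClient.new'  > "$workdir/lib/importer.rb"
echo 'FakeClient.new'  > "$workdir/spec/models/importer_spec.rb"
echo 'FakeClient.stub' > "$workdir/spec/models/report_spec.rb"
echo 'puts 1'          > "$workdir/spec/models/other_spec.rb"

# 1. grep -rl lists the files that mention the term (ag -l does the same job).
# 2. sed strips the leading ./ so the paths start at the project root.
# 3. The second grep keeps only paths under spec/ that end in _spec.rb.
# 4. xargs turns the lines on its standard input into arguments for one command.
command_line=$(
  cd "$workdir" &&
    grep -rl 'FakeClient' . |
    sed 's#^\./##' |
    grep '^spec/.*_spec\.rb$' |
    sort |
    xargs echo rspec
)
echo "$command_line"

rm -rf "$workdir"
```

Dropping the `echo` after `xargs` would run the specs for real in a project that has rspec available.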
00:19:40.120 Now for a slightly personal example: I work on various projects that share dependencies but can differ on the Ruby version or the server framework. Sometimes a project uses MRI, and at other times JRuby. Running specs on projects with slow startup times becomes a challenge. To alleviate this, you can use tools like Spork. Spork loads your application once and keeps a warm server running so that tests start faster. It can streamline your testing process significantly.
00:20:28.790 Another option is Zeus, which operates similarly to Spork but doesn't require modifications to your Gemfile. It runs your tests without the long startup time, at the cost of some memory. There are many options to optimize your workflow with Ruby, given the diversity of methods available.
00:21:35.950 Using these tools, we can develop a script that checks for specific settings. You might decide to initiate tests based on whether Spork is running or detect specific ports. If the expected port is closed, your script can execute the appropriate test command. By scripting the setup, you can avoid cumbersome reconfiguration every time.
00:22:40.050 To accomplish this, you could write shell scripts where you check for certain conditions, validating whether specific services are running, and structuring your calls to rspec effectively. An exit code serves as a convenient check for successful executions and helps navigate the complexities involved in automating test scenarios. Over time, this minimization of context switching saves significant time.
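A minimal sketch of such a check, under several assumptions: the function names are hypothetical, the port number is arbitrary, and the probe relies on bash's `/dev/tcp` redirection, so it needs bash rather than plain sh. It only prints which kind of runner it would pick; the real commands depend on the project.

```shell
# Succeeds (exit code 0) when something accepts TCP connections on the given local port.
# The subshell keeps the test connection from leaking into the calling shell.
port_open() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Prints which kind of runner to use. "if" branches on the exit code of port_open:
# zero means success, anything else means failure.
pick_runner() {
  if port_open "$1"; then
    echo "preloaded"
  else
    echo "plain"
  fi
}

# Port 1 is assumed to be closed on a typical development machine.
runner=$(pick_runner 1)
echo "would use the $runner runner"

# $? holds the exit code of the most recent command; capture it when the probe fails.
status=0
port_open 1 || status=$?
echo "port_open exited with status $status"
```

In a real wrapper script, the two branches would call the preloader's client or a plain rspec invocation, and the script would pass the test command's own exit code back to the caller.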
00:23:50.350 As a general approach to Unix, this series of commands harnesses simple processes to automate workflows and saves considerable time while improving efficiency. When you have a mix of projects or classes, being able to quickly identify which specs to run saves you many detours. It's about making the tools work for you.
00:24:38.780 I want to emphasize the versatility of the system. Everyone employs Unix differently – some use Vim, others prefer Atom. The essence lies in leveraging the powerful operating system to serve your specific development needs.
00:25:25.030 Unix offers us a collection of small programs that we can combine to support our development tasks. This kind of combination fits the philosophy of composability in code design. By treating each piece of functionality as a small component, like those we build in Ruby and Rails, we get closer to mastering our environments.
00:26:20.320 In conclusion, from design principles built on small, reusable components to command-line efficiency, the lessons from Unix extend far beyond the command line. They mirror principles of software design and encourage healthy habits in composing code. My hope is that you might be inspired to write some scripts of your own and explore ways to optimize how you work.
00:27:43.030 Thank you so much for your time today. I hope you found the material helpful and thought-provoking. If you have any questions or thoughts you’d like to share, I’m happy to engage.
00:28:56.300 I also want to recommend the short videos by Gary Bernhardt, who offers incredible insights into command-line prowess that might wow you.
00:29:15.500 If you’re interested in staying connected, I’ll have my slides and contact information on my blog and GitHub soon. Thank you all once again!
00:30:05.970 I appreciate this opportunity to speak and look forward to hearing your feedback. Thank you!