Modern Headless Testing in XXII Century

Paris.rb Conf 2020

00:00:15 There are so many interesting gems and tools being built in Ruby by the community. Today, I am going to convince you to use one more.
00:00:21 We have numerous integration tests in our web application, and until recently, we were running them in Poltergeist, which uses PhantomJS underneath. There was a reason for that: PhantomJS gives you full control of the browser, allowing you to modify requests, responses, headers, and cookies.
00:00:32 It is headless and a perfect fit for crawling, but it uses a ten-year-old WebKit engine. You won't be able to run effective integration tests nowadays with such outdated technology.
00:00:42 As of now, you have three options: you can keep using Poltergeist, use Selenium, or employ a secret weapon that I will discuss in my presentation.
00:00:52 Poltergeist is great; believe me, I have been a maintainer for years, but it is outdated now and I urge you not to use it.
00:00:58 Selenium requires additional software installation, is slower than Poltergeist, and doesn't provide full control of the browser. However, Cuprite is a new gem that addresses all the shortcomings of previous drivers.
00:01:19 Cuprite is faster than Selenium, requires no additional software to install, and gives you complete control over the browser.
00:01:28 I want to begin by contrasting the UNIX philosophy with that of web startups. In the 1970s, when UNIX emerged, almost everything was headless; binaries you ran lacked visible UIs.
00:01:39 Debugging programs back then was incredibly challenging, and almost 50 years later, we still must test every single aspect of our applications to ensure they work correctly with any given input.
00:02:01 Today, software systems are even more complicated. The UNIX philosophy emphasizes programs that do one thing well, focusing on simplicity, whereas web startup philosophy is markedly different.
00:02:14 Startups can evolve in many ways, leading us to make choices based on trade-offs. There's no compilation, no static typing, and automatic memory management—these factors help us write readable code quickly, but they also make it easier to introduce errors.
00:02:26 We must develop and debug our integration tests swiftly, aiming for faster execution than before.
00:02:36 PhantomJS was the fastest browser for testing and was considered a truly headless browser. Its main downside was the inability to view the browser in action while tests executed.
00:02:50 However, you could still take screenshots to analyze results or check log files. If you're using a modern front-end stack, there's a high chance your site will break with PhantomJS.
00:03:02 The introduction of JavaScript ES6, Flexbox, and modern frameworks has made it necessary to adapt our tools. As for how Poltergeist works: essentially, two processes communicate over a WebSocket. Poltergeist starts PhantomJS as a server, and the Ruby side sends it commands.
00:03:24 In 2017, Google announced Chrome's support for headless mode right out of the box, which was a significant step forward for testers. Now, we can use Chrome for headless testing, write crawlers, check performance, and collect useful metrics.
00:03:55 Regardless of your feelings about Google, they control 70% of the market, and Chrome is performant and well-suited for our jobs.
00:04:03 Headless and full Chrome are the same browser; it simply runs without any visible windows when started with the headless flag.
00:04:14 By connecting to a designated port, we can begin communicating with the browser. For instance, if we make an HTTP request to the specified address, the most important line to note will be the WebSocket debugger URL.
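The lookup just described can be sketched in Ruby. This is a minimal illustration, assuming Chrome was started with `--headless --remote-debugging-port=9222` (the port is an arbitrary choice) and that the `/json/version` endpoint returns a JSON object with a `webSocketDebuggerUrl` key. The sample payload below is hypothetical and trimmed; a live response has more fields and a real identifier.

```ruby
require "json"

# Pulls the WebSocket endpoint out of the body returned by Chrome's
# /json/version HTTP endpoint.
def websocket_debugger_url(body)
  JSON.parse(body).fetch("webSocketDebuggerUrl")
end

# Hypothetical sample payload, used here so the sketch runs without a browser.
sample = <<~PAYLOAD
  {
    "Browser": "HeadlessChrome/80.0.3987.0",
    "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/00000000-0000-0000-0000-000000000000"
  }
PAYLOAD

puts websocket_debugger_url(sample)

# Against a live browser, the body would come from an HTTP request such as:
#   require "net/http"
#   body = Net::HTTP.get(URI("http://127.0.0.1:9222/json/version"))
```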
00:04:25 All communication occurs over WebSocket, a language-agnostic protocol with implementations in many programming languages. We can send JSON data to that address, but for Chrome to understand our commands,
00:04:38 we must speak a protocol on top of WebSocket known as the Chrome DevTools Protocol (CDP). This protocol facilitates communication with the Chrome instance.
00:04:47 When we run Chrome and connect to the appropriate port, we can send and receive commands. It’s important to note that the browser can have multiple WebSocket connections—one for the browser itself and others for each specific page or tab.
00:05:08 This design allows us to operate each page independently. CDP is a simple protocol with many domains grouping commands by purpose. A command is structured as a simple JSON object containing a method name, parameters, and a unique ID.
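A command of that shape might be built like this. The sketch only produces the JSON text and sends nothing; `CommandBuilder` is a hypothetical helper name, while `Page.navigate` and `Browser.getVersion` are method names from the CDP documentation.

```ruby
require "json"

# Builds CDP command payloads, giving each one a unique, increasing id so the
# matching response can be recognised later.
class CommandBuilder
  def initialize
    @next_id = 0
  end

  def build(method, params = {})
    @next_id += 1
    JSON.generate({ id: @next_id, method: method, params: params })
  end
end

builder = CommandBuilder.new
puts builder.build("Page.navigate", { url: "https://example.com" })
puts builder.build("Browser.getVersion")
```

Keeping the counter inside one object is a simple way to guarantee that no two in-flight commands on a connection share an id.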
00:05:42 Chrome's response to the command carries the same ID along with the associated data. Chrome also sends notifications on its own for things such as opened windows, dialogs, and network requests; the latter come from the Network domain, for instance.
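One way to tell the two kinds of incoming messages apart is by the presence of an `id`: responses carry the id of the command they answer, while notifications carry only a method name. The router below is a hypothetical sketch of that idea, not how any particular gem implements it; the sample messages are hand-written, although `Network.requestWillBeSent` and the `product` field of `Browser.getVersion` are taken from the CDP documentation.

```ruby
require "json"

# Routes raw CDP messages: responses go to the callback registered for their
# id, notifications (no id) are collected as events.
class MessageRouter
  attr_reader :events

  def initialize
    @pending = {}
    @events = []
  end

  # Registers a callback to run when the response for `id` arrives.
  def on_response(id, &callback)
    @pending[id] = callback
  end

  def receive(raw)
    message = JSON.parse(raw)
    if message.key?("id")
      callback = @pending.delete(message["id"])
      callback&.call(message["result"])
    else
      @events << message["method"]
    end
  end
end

router = MessageRouter.new
router.on_response(1) { |result| puts "browser: #{result["product"]}" }
router.receive('{"id":1,"result":{"product":"HeadlessChrome/80.0"}}')
router.receive('{"method":"Network.requestWillBeSent","params":{"requestId":"1"}}')
puts "events: #{router.events.inspect}"
```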
00:06:07 As we all know, Chrome's inspector can do a lot, and everything visible in the inspector can also be implemented using CDP. Now let’s switch our focus to the Ruby world.
00:06:40 Ferrum is a Ruby gem that controls Chrome over WebSocket using CDP and provides a higher-level API. It requires no additional software; just combine Ruby with Chrome to achieve great results.
00:07:05 This gem is impressive because there’s no added complexity. Familiarity with the popular JavaScript tools will help you understand Ferrum, though it's not perfectly compatible.
00:07:27 The way Ferrum operates is simple: you launch the browser, create a new page, navigate to a site like example.com, take a screenshot, and close the browser.
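That flow might look like the following sketch. It assumes a browser object that responds to `go_to`, `screenshot(path:)`, and `quit`, which is how Ferrum's README describes `Ferrum::Browser`; the helper name `capture_screenshot` is hypothetical, and the method takes the browser as an argument so the sketch itself does not need Chrome to load.

```ruby
# Navigates to a page, saves a screenshot, and always closes the browser,
# even if navigation or the screenshot raises.
def capture_screenshot(browser, url:, path:)
  browser.go_to(url)
  browser.screenshot(path: path)
  path
ensure
  browser.quit
end

# With the ferrum gem and Chrome installed, usage would be along these lines:
#   require "ferrum"
#   capture_screenshot(Ferrum::Browser.new, url: "https://example.com", path: "example.png")
```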
00:07:41 This flow is much more concise in Ruby while executing the same tasks. Ferrum already supports many features, including navigation, screenshots, intercepting network traffic, and simulating real user interactions like mouse movements and keyboard events.
00:08:04 Beyond that, you can work with headers, cookies, and JavaScript dialogs. In fact, we are already using it in production for crawling purposes, employing techniques to mimic real users to bypass certain security measures.
00:08:20 Ferrum was built from the ground up to ensure full functionality and allows working with multiple open pages simultaneously using threads. Each page can be controlled through WebSocket connections.
00:08:37 The above code creates a context, a unique feature of Chrome, akin to incognito mode. Once closed, everything is discarded. You can create several contexts and pages in each to interact with them concurrently.
00:09:07 Now that we understand Ferrum, let's discuss Cuprite. Cuprite is a Capybara driver that utilizes Ferrum underneath, making Ferrum compatible with Capybara calls.
00:09:32 For instance, while Capybara has a `visit` method, Ferrum uses `go_to`, allowing Cuprite to serve as an adapter. Cuprite fulfills this role by providing higher-level operations.
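The adapter idea can be illustrated with a tiny sketch. `SketchDriver` is a hypothetical class, far simpler than the real Cuprite driver; it only shows a Capybara-style `visit` call being translated into the `go_to` call mentioned above.

```ruby
# Exposes the method name Capybara expects and forwards it to the method name
# the underlying browser object provides.
class SketchDriver
  def initialize(browser)
    @browser = browser
  end

  # Capybara calls `visit`; the wrapped browser understands `go_to`.
  def visit(url)
    @browser.go_to(url)
  end
end
```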
00:09:50 Additionally, Ferrum operates at a higher level than what CDP provides, yet is lower level than Cuprite. Ferrum does not possess methods for filling input fields or clicking links directly.
00:10:10 These actions encompass multiple steps, as a real user would need to click or focus on an input field, clear any existing text, and then enter new data. In contrast, Capybara and Cuprite execute these tasks in just one method call like `fill_in`.
00:10:43 This functionality simplifies actions like `click_link`, which again consists of multiple steps: locating the link, checking its position, scrolling to it if necessary, and conducting the actual click event.
00:11:10 Cuprite is mostly compatible with Poltergeist, requiring minimal changes in your application. The only modifications needed are in the spec helper, though they may not be straightforward for everyone.
00:11:34 When users experience issues after switching to Cuprite, they often seek help analyzing log files. The failing tests are typically not due to the new driver itself but rather issues within the tests.
00:12:07 Cuprite is faster than Selenium, so flaky tests that a slower driver was masking tend to surface once you transition to the new driver. Now, let's discuss common mistakes people make while writing integration tests.
00:12:31 Firstly, what is special about integration tests, and why is waiting necessary? Imagine the browser going to example.com, grabbing the HTML, and parsing its tags to determine whether further requests for CSS or JavaScript are needed.
00:12:58 The moment you recognize that all resources are loaded and JavaScript idles with no pending network connections is critical. However, this moment remains vague in the modern web context.
00:13:20 Capybara and JavaScript drivers face the same issue—they struggle to definitively state when the page is ready for interaction. The only option is guessing.
00:13:52 Synchronization is crucial; it helps determine if an element is present during a specific time frame. We must continually check conditions for a set duration before giving up or succeeding.
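The "keep checking until a deadline" idea can be sketched in plain Ruby. This is a generic polling helper under assumed names (`wait_until`, `WaitTimeout`) and assumed default timings, not the implementation Capybara or Cuprite use.

```ruby
class WaitTimeout < StandardError; end

# Re-evaluates the block until it returns a truthy value or `timeout` seconds
# have passed, sleeping `interval` seconds between attempts.
def wait_until(timeout: 2.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

checks = 0
wait_until(timeout: 1.0) { (checks += 1) >= 3 }
puts "condition met after #{checks} checks"  # prints "condition met after 3 checks"
```

A monotonic clock is used for the deadline so that system clock adjustments during a test run cannot shorten or extend the wait.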
00:14:23 The key to reliable tests is using wait methods—this might come as a surprise to you. However, we must acknowledge that people frequently overlook the importance of waiting.
00:14:40 Some wait methods might not function correctly. For example, after a click, if the DOM does not change, you cannot proceed immediately because it would be premature.
00:14:55 We used to implement a wait-for-Ajax helper in our code base, using jQuery from JavaScript to check whether any connections were still active. With Cuprite, we can use Ruby to wait for the network to become idle.
00:15:22 It's essential to mitigate interference from animations within Chrome; be sure to disable them to prevent unexpected behaviors during test execution.
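A helper of the older kind typically looked something like this sketch. It assumes a page object with Capybara's `evaluate_script` and a site that loads jQuery, whose `jQuery.active` counter reports in-flight Ajax requests; the name `wait_for_ajax` and the timings are assumptions, not the speaker's actual code.

```ruby
# Polls jQuery's counter of in-flight Ajax requests until it reaches zero,
# raising if requests are still pending after `timeout` seconds.
def wait_for_ajax(page, timeout: 5.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until page.evaluate_script("jQuery.active").to_i.zero?
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "Ajax requests still pending after #{timeout}s"
    end
    sleep interval
  end
  true
end

# In a Capybara test this would be called as wait_for_ajax(page).
# Cuprite's README documents page.driver.wait_for_network_idle as the
# driver-level alternative; check that your version provides it.
```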
00:15:51 Another critical point is that too much time may be wasted on inefficient debugging of integration tests. Cuprite simplifies this process, allowing you to switch to a headful mode and add a slow mode flag.
00:16:20 By utilizing this flag, all CDP commands sent to Chrome are delayed, allowing you to witness your tests passing in real-time with adequate visual feedback.
00:16:52 Logs can appear chaotic at first glance, but becoming accustomed to them allows you to find and identify race conditions within your tests.
00:17:10 You can also place a debug statement anywhere in your tests to freeze execution and open Chrome with the inspector attached.
00:17:35 This feature enables you to view the state of your webpage at any time, enhancing your capacity to debug effectively.
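The debugging setup described over the last few paragraphs might be wired up as below. The option names `headless`, `slowmo`, and `inspector` are the ones Cuprite's README documents and should be verified against your version; the `DEBUG_BROWSER` environment variable, the helper name, and the half-second delay are hypothetical choices made for this sketch.

```ruby
# Builds driver options: a visible, slowed-down browser with the inspector
# enabled when DEBUG_BROWSER=1, and a plain headless browser otherwise.
def cuprite_options(env = ENV)
  debugging = env["DEBUG_BROWSER"] == "1"
  {
    headless: !debugging,
    slowmo: debugging ? 0.5 : nil, # seconds of delay per command
    inspector: debugging
  }.compact
end

# In a spec helper, with the cuprite gem loaded, this could be used as:
#   Capybara.register_driver(:cuprite) do |app|
#     Capybara::Cuprite::Driver.new(app, **cuprite_options)
#   end
# and a test could then pause with: page.driver.debug(binding)
```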
00:17:59 To summarize, there are two processes communicating via WebSocket: one is the Chrome process that accepts connections and the other is the Ruby process that sends commands via CDP.
00:18:30 This architecture closely resembles the Poltergeist framework I described earlier. Now, let's explore how Selenium operates.
00:18:56 Selenium employs a protocol called WebDriver, which enables browser automation but requires initiating an external process for each browser you use.
00:19:12 Each browser requires its WebDriver; for example, Chrome has ChromeDriver while Firefox uses GeckoDriver. Instead of sending commands directly to the browser, you send HTTP requests to the WebDriver.
00:19:39 The WebDriver determines the steps necessary to execute the command in the browser and returns the results to you, which adds another layer to the process.
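For contrast with the WebSocket messages earlier, here is a sketch of what a single WebDriver command looks like as an HTTP request. It builds the "navigate to" request defined by the W3C WebDriver specification (`POST /session/{session id}/url`) without sending it; the session id is a placeholder, and port 9515 is assumed because it is ChromeDriver's default.

```ruby
require "json"
require "net/http"
require "uri"

# Builds (but does not send) the HTTP request that asks a WebDriver server
# to navigate an existing session to `url`.
def navigate_request(base_url, session_id, url)
  uri = URI.join(base_url, "/session/#{session_id}/url")
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate({ url: url })
  request
end

request = navigate_request("http://127.0.0.1:9515", "hypothetical-session-id", "https://example.com")
puts "#{request.method} #{request.path}"
puts request.body
```

Every command is a separate request and response pair like this one, which is why the server has no natural way to push events back to the test process.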
00:19:58 The advantage of this approach is browser flexibility; however, it requires installing additional software for each browser you want to use.
00:20:20 This is the only benefit I can find, as the limited protocol restricts functionality across various browsers.
00:20:38 For instance, Selenium may not support specific features such as intercepting network traffic, which are available in Chrome.
00:21:01 HTTP follows a request-response model, whereas WebSocket is bidirectional, and that limits how events can be streamed back from the browser.
00:21:21 In contrast, Chrome's events can stream directly to you, allowing for efficient monitoring.
00:21:40 When comparing CDP with WebDriver, it becomes evident that CDP holds a future advantage due to its robust capabilities.
00:21:59 Not only does CDP surpass WebDriver, but with more browsers adopting Chromium as their foundation, it is likely that CDP will remain dominant going forward.
00:22:30 I understand that rewriting and changing drivers for existing projects can be a daunting task, but for new projects, I advise you to choose an appropriate driver.
00:22:45 Thank you!
00:23:31 Does anyone have questions?
00:24:00 Did you try to migrate a test suite from Selenium or WebDriver to Ferrum? What were the performance benefits?
00:24:25 Yes, we migrated from Poltergeist to Cuprite, and it was a relatively smooth switch. I also assisted a friend migrating from Selenium.
00:24:50 They encountered issues related to cookie and header handling, which is where I offered guidance. They had trouble due to insufficient wait methods.
00:25:05 Performance-wise, while my local tests with Poltergeist took about nine minutes, switching to Cuprite increased it slightly, yet only a few minutes on CI.
00:25:20 Unfortunately, I don't have exact numbers for the Selenium comparison, but a colleague noted it was notably faster by about five minutes.
00:26:00 Are there specific actions achievable in Selenium but not in Cuprite? Could you please clarify?
00:26:22 With Cuprite, we utilize it not just for headless testing, but also for web scraping, which necessitates bypassing certain security measures.
00:26:46 This requires JavaScript execution, header modification, and cookie management, all of which Selenium struggles with unless third-party solutions are added.
00:27:05 For headers and cookies, Cuprite streamlines these processes directly through CDP, enhancing efficiency.
00:27:18 Thank you.
00:27:30 No more questions? Thank you, everyone.