Automating Legacy Desktop Applications with JRuby and Sikuli

by Chris Cha

In this RubyConf 2021 presentation, Chris Cha discusses the automation of legacy desktop applications using JRuby and Sikuli. He begins by providing an overview of the complex landscape of legacy desktop applications, which often pose significant challenges for automation and modernization. The talk primarily details the journey of setting up a Windows virtual machine (VM) to automate testing of a Windows-only application through a series of practical steps and technical methods.

Key points discussed include:

Development Environment Setup: Chris shares his development environment, which combines Vim, the Sikuli IDE, xFreeRDP, and VirtualBox with a Windows 10 Pro image modified to support remote access and testing.
Utilizing Vagrant: He explains how Vagrant provisions the Windows VM and how each configuration change is applied to the Vagrantfile as a small, isolated transform, which keeps complexity under control.
Java Environment Configuration: Chris details the procedure for installing and verifying the Java Development Kit (JDK) within the VM, which is crucial for running JRuby and Sikuli.
Network Drive Mapping: Because the legacy application is retrieved from a network share, the speaker explains how he automated mapping the drive using PowerShell and scheduled tasks.
Headless Automation Strategy: He describes the push toward headless automation, including an attempt with Xvfb that was ultimately tabled, and the configuration and scripting needed to launch the application on startup and run tests against it.
Testing with Sikuli: The integration of Sikuli for UI testing is discussed, where Cha showcases screenshots being captured and tests being executed using JRuby.
Modular Testing Setup: The focus shifts to refactoring the proof of concept into modular components, using page objects and Minitest specs to build a maintainable test structure that reads in the style of behavior-driven development (BDD).
Conclusion and Takeaways: By successfully demonstrating the setup and testing of a Java Swing application on a Windows VM with integrated JRuby and Sikuli, Chris emphasizes the value of open-source tools in modernizing legacy systems while improving automation workflow.

Overall, the talk conveys the significant technical challenges and strategies involved in automating legacy applications, showcasing real methodologies that can be replicated in similar scenarios.

00:00:10.480 Okay, we'll get started.

00:00:12.320 Uh, hello everyone! Welcome to RubyConf 2021. Yeah, I'm Chris Cha. I'm a full stack developer working in test automation and DevOps at The Container Store, which specializes in organizing solutions like custom closets and the joy of tidying up your space.

00:00:17.840 Yes, we still use containers but also cloud and orchestration technologies to define a scalable, highly available future. Join us!

00:00:38.239 The title of my talk today is "Automating Legacy Desktop Applications with JRuby and Sikuli."

00:00:44.320 Here's an example of my development environment: I have Vim to edit text, Sikuli IDE to manage screenshots, xFreeRDP to remote into the Windows VM, and multiple terminals to run Vagrant and other tasks within the Windows VM. I use Command Prompt to run individual tests with JRuby.

00:01:03.680 Our goal is to automate a very battle-tested legacy application. To start, I couldn't just run Windows; I'm on a Mac. The app is Windows only, so my go-to for VMs is VirtualBox. Vagrant provides an easy way to export from VirtualBox.

00:01:12.799 I found a Windows 10 Pro image from Microsoft and made changes to help with remoting. If I could run JRuby with Sikuli, I could win. At this point, it was all just a strategy in my head. My ambition was to learn Vim, Vagrant, Sikuli, and JRuby, plus Java, all at once, and I assumed it would all integrate in the end.

00:01:39.280 Here is the transform script. It takes every ed file in the transforms directory and applies it to the Vagrantfile. I wanted to be able to start from scratch at any time, create a new folder, and get back to the current state. This forced me to keep changes isolated and, in retrospect, helped manage complexity that would otherwise have littered the Vagrantfile, including commented-out half-experiments and distractions over syntax from previous attempts.

00:01:55.439 From earlier research, I knew xFreeRDP would let me connect to Windows machines. I just needed to figure out how to do that with a Windows VM instead of a remote host. Fortunately, many others had solved this problem. Our first transform looks for the line 'config.vm.network' and appends content after it. It almost looks like a code block. RuboCop always reminds me to end files with a newline, and that is just as important with ed files.
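The transforms themselves are ed scripts. To make their two editing moves concrete, here is a plain-Ruby sketch (not the talk's script) of what they do to the Vagrantfile text: insert lines after the first line containing a pattern, or append them at the end, always leaving a trailing newline. It uses a substring match rather than ed's regular expressions, and the method name `apply_transform`, the box name, and the sample lines are hypothetical.

```ruby
# Plain-Ruby stand-in for two ed idioms used on the Vagrantfile:
#   /pattern/a ... .   append after the first line matching pattern
#   $a ... .           append after the last line
def apply_transform(text, addition, after: nil)
  # Normalize so every line, including the last, ends with a newline.
  lines = text.lines.map { |line| line.end_with?("\n") ? line : "#{line}\n" }
  addition = "#{addition}\n" unless addition.end_with?("\n")

  if after
    index = lines.index { |line| line.include?(after) }
    raise ArgumentError, "pattern not found: #{after}" unless index
    lines.insert(index + 1, addition)
  else
    lines << addition
  end
  lines.join
end

vagrantfile = <<~RUBY
  Vagrant.configure("2") do |config|
    config.vm.box = "windows-10-pro"
    config.vm.network "forwarded_port", guest: 3389, host: 3389
  end
RUBY

# Insert a line right after the network line, as the first transform does.
puts apply_transform(vagrantfile, '  config.vm.communicator = "winrm"',
                     after: "config.vm.network")
```

Raising when the pattern is missing makes a stale transform fail loudly instead of silently producing an unchanged Vagrantfile.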

00:02:27.599 I knew the app in question pulled in its own JDK. I just needed to bootstrap the app, and AdoptOpenJDK had the easiest install. So we'll call this the "Is this even going to work?" step. If nothing else, I had a Vagrantfile for a nice Windows VM, and JRuby needed Java anyway.

00:02:46.160 I could use Command Prompt to poke around inside the system. Here’s the batch script we used to set up Java: we download, extract, set a path, and done! This is the usual route for most binaries. I avoided using a package manager for now since I didn’t want any issues there to complicate attempts.

00:03:02.319 I really like PowerShell to extend batch scripting capability. Since Vagrant prints things to the console, we could invoke commands within the VM OS context and get useful results back. Here, we invoke Java and ask for its version. This verifies the path is set, and JRuby will know where to get its Java, and all will be well. More challenges await.
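The talk runs this check through PowerShell. As a sketch of the same idea in Ruby, the helper below runs `java -version`, which prints to standard error rather than standard output, and extracts the major version. It handles both the legacy `1.8.0_x` scheme and the Java 9+ scheme, which matters once the project switches between Java 11 and Java 8. The names `java_major_version` and `verify_java!` are hypothetical.

```ruby
require "open3"

# Extracts the major Java version from `java -version` output.
# Java 8 and earlier report "1.x.y_z"; Java 9 and later report "x.y.z".
def java_major_version(version_output)
  match = version_output.match(/version "(\d+)(?:\.(\d+))?/)
  return nil unless match

  major = match[1].to_i
  major == 1 ? match[2].to_i : major
end

# Runs java and fails loudly if it is missing from PATH or unrecognized.
def verify_java!
  _stdout, stderr, status = Open3.capture3("java", "-version")
  raise "java -version failed" unless status.success?

  java_major_version(stderr) or raise "unrecognized output: #{stderr}"
rescue Errno::ENOENT
  raise "java not found on PATH"
end
```

Calling `verify_java!` inside the VM would return an integer such as 8 or 11, or raise with a message that points at the broken step.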

00:03:50.399 My second transform is a sanity check to make sure .ps1 files work, so PowerShell scripts will execute if we need them later. It also shows a way to append to the end of the Vagrantfile instead of the middle. We use this approach to add on pretty much all of the remaining changes.

00:04:21.120 Here is the transform to the Vagrantfile that layers JDK installation on top. Every change, big or small, runs through the same pattern: a transform and a script. Here we just layer on the Java verification script, where we check the Java version.

00:04:39.360 Still, we need our app to test; it's not available from the VM. We have to retrieve it from somewhere. Our source is a mapped network drive, so we need to map the network drive. We figured out Java first since that was easier, but now we dig further into Windows. We need an app to run.

00:05:04.000 It might be nice to share the Vagrant dev loop we established. We retrieve a public image, apply transforms, and then start it up with Vagrant. We reboot it and then try to remote in with xFreeRDP. Now we have a Java development environment, but it can become so much more.

00:05:34.720 We need a strategy to map a network drive correctly every time. We can do this with a scheduled task. We use the XML version, exported from a basic task, because this is the only way to create a task with all conditions disabled. For example, on a laptop that is not plugged in, the scheduled task would not run because it would detect that our machine was on battery power.

00:05:58.560 The crucial piece here is that PowerShell supports here-strings, so we can paste the XML directly. Here's the rest of the scheduled task XML and script: we create the scheduled task with PowerShell using the XML content. After Vagrant reloads, we should have a mapped drive.
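The exported XML is long, but the part that matters here is small. The Ruby sketch below builds a trimmed-down task definition (not a full export) with the two battery conditions turned off, using a heredoc the same way the talk uses a PowerShell here-string. The element names follow the Task Scheduler XML schema; the logon trigger, the script path, and the method name `scheduled_task_xml` are assumptions. The resulting string could then be registered, for example with PowerShell's `Register-ScheduledTask -Xml`.

```ruby
# Builds a minimal Task Scheduler definition whose battery conditions are off,
# so the task still runs when the host laptop is unplugged.
def scheduled_task_xml(command)
  <<~XML
    <Task version="1.2" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
      <Triggers>
        <LogonTrigger>
          <Enabled>true</Enabled>
        </LogonTrigger>
      </Triggers>
      <Settings>
        <DisallowStartIfOnBatteries>false</DisallowStartIfOnBatteries>
        <StopIfGoingOnBatteries>false</StopIfGoingOnBatteries>
        <Enabled>true</Enabled>
      </Settings>
      <Actions Context="Author">
        <Exec>
          <Command>#{command.encode(xml: :text)}</Command>
        </Exec>
      </Actions>
    </Task>
  XML
end

puts scheduled_task_xml('C:\scripts\map_drive.bat')
```

Escaping the command with `encode(xml: :text)` keeps characters such as `&` from breaking the XML.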

00:06:14.080 All this to support one batch script, but it's a useful pattern for the future. There is more to it: we make sure to delete our temporary XML file. To map the drive, we need credentials. These can be collected at Vagrant build time with a user prompt.
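Because a Vagrantfile is plain Ruby, a prompt like this can live in it directly. The sketch below asks for the share credentials, hints at a DOMAIN\user format, and hides the password when running in a real terminal. The method name `prompt_credentials` and the prompt wording are hypothetical; the input and output streams are parameters so the prompt can also be exercised without a terminal.

```ruby
require "io/console"
require "stringio"

# Asks for the share credentials once, at provisioning time.
# Input is hidden only when reading from a real terminal (TTY).
def prompt_credentials(input: $stdin, output: $stdout)
  output.print "Network share user (DOMAIN\\user): "
  user = input.gets.to_s.chomp
  output.print "Password: "
  password = input.tty? ? input.noecho(&:gets) : input.gets
  output.puts
  [user, password.to_s.chomp]
end

# Non-interactive usage, e.g. a quick check without a terminal:
user, password = prompt_credentials(
  input: StringIO.new("CORP\\chris\nexample-password\n"),
  output: StringIO.new
)
```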

00:06:44.720 Now that we can acquire the app, we have to figure out how to run the darn thing. Each app is unique in its own way. In our case, the app uses Java Swing with multiple windows and a separate launcher dialog. The app was distributed as a JNLP file; opened as a text file, it gave us the startup arguments, so we could try to reproduce the launch ourselves.
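A JNLP file is XML, so the startup arguments can also be pulled out programmatically. Here is a hedged sketch using Ruby's REXML that turns a simplified JNLP file into a java command: JVM options from `<j2se java-vm-args>`, the classpath from `<jar href>`, and the main class and `<argument>` values from `<application-desc>`. Real JNLP files can also carry properties, native libraries, and jar paths relative to the `codebase` URL, all of which this ignores; the sample file contents are made up.

```ruby
require "rexml/document"

# Turns a simplified JNLP file into an argv array for java. Classpath entries
# are joined with ";" because the target machine runs Windows.
def jnlp_to_command(jnlp_xml, java: "java", separator: ";")
  doc = REXML::Document.new(jnlp_xml)
  j2se = REXML::XPath.first(doc, "//resources/j2se")
  vm_args = j2se ? j2se.attributes["java-vm-args"].to_s.split : []
  jars = REXML::XPath.match(doc, "//resources/jar").map { |jar| jar.attributes["href"] }
  app = REXML::XPath.first(doc, "//application-desc") or raise "no <application-desc> found"
  args = REXML::XPath.match(app, "argument").map(&:text)
  [java, *vm_args, "-cp", jars.join(separator), app.attributes["main-class"], *args]
end

jnlp = <<~XML
  <jnlp codebase="file:///Z:/pos">
    <resources>
      <j2se version="1.8+" java-vm-args="-Xmx512m"/>
      <jar href="pos.jar" main="true"/>
      <jar href="lib/helpers.jar"/>
    </resources>
    <application-desc main-class="com.example.pos.Launcher">
      <argument>-store</argument>
      <argument>42</argument>
    </application-desc>
  </jnlp>
XML

command = jnlp_to_command(jnlp)
# system(*command) would launch it without going through a shell.
```

Returning an array instead of one string sidesteps shell quoting when the command is finally run.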

00:07:05.920 We take the args and pass them to Java, hoping for the best. Fingers crossed. Fortunately, it worked. To reliably map the network drive on startup, we always remove the mapping and then map it again. Those magic arguments will work forever, given the right choices in the exported scheduled task XML.

00:07:20.000 We'll never have to worry about drive mapping again, at least for this VM. Oh, I forgot to mention: we are creating a script that runs at startup, not actually mapping the drive while Vagrant provisions the VM.

00:07:43.680 Here we map the drive. We also list the files to verify the drive is indeed mapped; otherwise, we can't get the app. Some minor refactoring was done here so the prompt hints to developers what format the credentials should take.
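A sketch of such a startup script, generated from Ruby: remove any stale mapping, map the share again, then list it. The drive letter, share path, account, and the method name `map_drive_batch` are placeholders, and the `net use` options shown (`/delete /y`, `/user:`, `/persistent:no`) are standard Windows options rather than the talk's exact script.

```ruby
# Renders the startup batch file: drop any stale mapping, map the share again,
# then list it so a failed mapping is obvious. All values are placeholders.
def map_drive_batch(drive:, share:, user:, password:)
  <<~BAT
    @echo off
    rem Remove the old mapping first; ignore the error if there is none.
    net use #{drive} /delete /y >nul 2>&1
    net use #{drive} #{share} "#{password}" /user:#{user} /persistent:no
    rem Listing the drive proves the share, and the app on it, is reachable.
    dir #{drive}\\
  BAT
end

puts map_drive_batch(drive: "Z:", share: "\\\\fileserver\\apps",
                     user: "CORP\\chris", password: "example-password")
```

Using `/persistent:no` stops Windows from restoring the mapping on its own, which leaves the startup script as the single place the mapping is defined.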

00:08:03.680 All of these steps have been to support eventual automation, which means we need to run things headless. We take a brief foray into Linux with Xvfb, the X virtual framebuffer. Without headless, the automated tests would always need to run on a user's machine.

00:08:36.479 Here we set up the scheduled task that starts the app on VM startup, kiosk-style, and the rest of the script has Windows launch the app when we remote in. At this point, OS provisioning is complete, with the app launching on startup. We can test with Sikuli.

00:09:14.640 Our first test is to take a screenshot, since that will prove Xvfb can see something and that Sikuli can run with our installed version of Java. JRuby is used here because running the Sikuli IDE requires a JRuby jar that resides next to it. In a future slide, we'll install JRuby from scratch and use Sikuli solely from code.

00:09:34.640 Note that we are again creating this script that will launch on startup, not actually executing the script. Remoting should launch the app and take a screenshot. Later, we have the app launch on connect, not just at startup. The JRuby used here is somehow already available.

00:09:54.320 Next are the Ruby files that take the screenshot. We use the native Java imaging library and Sikuli's capture method to save an image of the screen. We use instances of Sikuli's Screen and Region objects to get our bounding box, capture it, and write the bytes to a file.

00:10:09.760 We also fix a bug in this commit where we forgot to add a quit command to the .ed file. The Ruby script just invokes our screenshot module to send the capture message, specifying a file name.

00:10:29.919 We are finally at JRuby. The text approach allows us to define tests as text and organize a software architecture around a programming language. Here we install JRuby from the internet and specify the Java home environment variable.

00:10:55.840 The script to install JRuby continues by setting the path after finding the latest JRuby installation directory. As before, we use .ed files to transform the Vagrantfile, adding a script to execute along with the JRuby setup and verification steps.
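The talk does this in a Windows script. As an illustration of the "find the latest installation directory" step, here is a Ruby sketch that compares `jruby-<version>` folder names numerically with `Gem::Version`, so 9.3.10.0 sorts after 9.3.9.0, which a plain string sort would get wrong. The install root `C:/jruby` and the method name `latest_jruby_home` are hypothetical, and setting `ENV["PATH"]` only affects the current process, unlike a system-wide PATH change.

```ruby
# Picks the newest "jruby-<version>" directory from a list of candidate paths,
# comparing versions numerically rather than as strings.
def latest_jruby_home(candidates)
  versioned = candidates.filter_map do |path|
    match = File.basename(path).match(/\Ajruby-(\d+(?:\.\d+)*)\z/)
    [Gem::Version.new(match[1]), path] if match
  end
  versioned.max_by(&:first)&.last
end

# Hypothetical usage: put the newest install's bin directory first on PATH.
if (home = latest_jruby_home(Dir.glob("C:/jruby/jruby-*")))
  ENV["PATH"] = [File.join(home, "bin"), ENV["PATH"]].join(File::PATH_SEPARATOR)
end
```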

00:11:13.680 This commit allows java.exe to pass through Windows Firewall. We also need to fix an issue where the app starts on the second and later boots but never on the first. Here is the script to allow Java through the firewall and the transform that runs it.

00:11:50.320 We're starting to develop a workflow: we use the Sikuli IDE to manage images, the VM and JRuby to run tests, and xFreeRDP to remote in and take screenshots of UI components. Sikuli wraps OpenCV and Tesseract, which need the Visual C++ Redistributable, so we install that.

00:12:09.440 Here's the scheduled task to run on login, which can launch the app, run tests, or anything needed. Continuing from previous work, we copy to a common startup folder. Subsequent runs will trigger from a scheduled task. We use both methods to achieve the desired kiosk behavior, where the app always launches.

00:12:29.920 The Sikuli IDE renders screenshots as images, but in the script they are stored as text. Ruby lets us use strings as expressions, which allows a hierarchy where child UI elements sit beneath a parent string. The indentation is just a visual convention.

00:12:56.360 As we are still in the proof of concept phase, much of the code is procedural. The boilerplate is minimal: we import Java, point Sikuli's import statements at the jar path, and turn on verbose debugging. After that, we minimize the JRuby window and get to automation.

00:13:20.320 First, we tell Sikuli about our global app path, and then we sleep. Now we can start clicking and typing things. From this mess, we can break out the code into modules, classes, and concerns, and from there start to incorporate tests.

00:13:40.160 We start and stop screen recording as a test and then run config_pos.rb. This batch script will run on login. We begin to break out environment and config code into its own module. We want this repo to potentially host screenshots of different apps to automate, but they all live under a common bundle path.

00:14:10.480 The same applies here, but with two transforms: one to install the Visual C++ Redistributable and the other to schedule a startup task. We take a breather and try something different: screen recording.

00:14:40.800 Now that we have automation in place, we can see if a video can be recorded. First, we’ll install VLC.

00:15:10.560 Here is the invocation to start and stop screen recording. The latter is done with ncat. A lot of options are passed to VLC from the IDE file and the executable. It looks like we bundled VLC in the repo to save a download.

00:15:39.040 VLC turned out to be too laggy for headless recording, and Xvfb renders Swing buttons strangely, so we need to look for alternatives. That's a future task.

00:16:00.880 In this commit, we fix the app not launching the first time. We reuse our existing script in the startup folder. The BITSAdmin changes here supposedly make the JRuby download faster, but we later replace them with a much faster alternative.

00:16:46.120 In the screenshot below, we copy a script to the startup folder, but it takes forever for BITSAdmin to get its magic going, so we fetch the file directly over HTTP. Otherwise, downloading the Java and JRuby files can time out.

00:17:07.280 Here, we swap Java 11 for Java 8 and also use PowerShell's Invoke-WebRequest to fetch Java faster. The downgrade is because the Ruby Maven gem didn't yet support the newer version; Sikuli still works on Java 8, so no big deal.

00:17:29.920 We also simplify the process from polling BITSAdmin to fetching JRuby directly over HTTP. Evidently, the single-dash version flag was also updated to the more conformant double-dash flag syntax.

00:17:51.680 Okay, now we can address gems, which means Bundler as well as our internal config tool and a linter. Once this project graduates to having additional contributors, we'll need a consistent coding style that follows existing conventions.

00:18:13.920 Here's what the Gemfile looks like. Of course, installing the gems is a script driven by an ed transform; here are the script and the transform itself.

00:18:29.520 By now, it should be very familiar. Now that we have a Windows VM and a running app that we can interact with using JRuby and Sikuli, we can start writing tests. Minitest comes with Ruby, so we'll use that.

00:18:51.920 We install diffutils, although for code diffs I used Visual Studio Code on the local machine before making commits. We add the Minitest gem to the Gemfile, plus a minor nitpick: VLC gets an install directory with no spaces, which took a bit of research to tell VLC about.

00:19:12.560 And here's the ed script to set up the install. Next, we will refactor into modules and tests.

00:19:42.560 With the proof of concept working, we start modularizing into components. The tests are UI-centered, much like test automation in the web world with Selenium. We use the page object pattern to separate user intent from the Swing interactions that carry it out.

00:19:57.920 Here I'm just updating the docs for dev startup and setting a RUBYOPT environment variable for Minitest. There's also a minor refactoring here to more reliably select defaults in the point of sale launcher dialog.

00:20:25.120 We split the Sikuli API setup and a helper module. The boilerplate from the proof of concept to initialize Sikuli is moved into a sikuli_env.rb file, and the helper responds to the minimize window message.

00:20:46.640 In Sikuli test_environment.rb, we import useful Sikuli libraries like Debug, set the debug level, and introduce classes and methods to help initialize the screenshot image path. This is where we store pictures of buttons, similar to how Selenium locates DOM elements through XPath.

00:21:06.000 Here we specify that the images representing point of sale UI elements, like buttons and dialogs, belong in a path off of the global image root path. Each *.sikuli folder becomes a subtype, a specialization of a common parent class that represents the parent folder.

00:21:40.000 We also see the components class, which represents the generic parent of UI components. Some private methods here help with the methods in the previous slide, and we see the POS component subclass near the bottom, which is a specialization of components.

00:22:07.680 In Ruby, the less-than operator for inheritance makes it easy to think fluidly in object hierarchies and to specialize a class so it provides specific behavior. More of the POS components class in the next slide.

00:22:29.440 With all the setup code from before, POS components just inherit from components, and the constructor specifies the POS app. All the child needs to do is delegate to the parent.

00:22:52.800 Fixing the type of POS app this way is kind of hard-coding, but it seems okay since we're setting up a known environment. Next, we introduce Minitest: we want to test our setup code and learn Minitest at the same time. Despite these tests, there's a bug that gets fixed later.
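A minimal Ruby sketch of the parent/child split described above, assuming screenshots live at `<image root>/<app>.sikuli/<name>.png`. The class names mirror the talk; `IMAGE_ROOT`, `image_dir`, and `image` are hypothetical.

```ruby
# Generic parent: knows how to turn an app name and an element name into
# the path of a screenshot. IMAGE_ROOT is a hypothetical global image root.
class Components
  IMAGE_ROOT = "images"

  attr_reader :app

  def initialize(app)
    @app = app
  end

  # Folder holding this app's screenshots, e.g. images/pos.sikuli
  def image_dir
    File.join(IMAGE_ROOT, "#{app}.sikuli")
  end

  # Full path for one UI element's screenshot.
  def image(name)
    File.join(image_dir, "#{name}.png")
  end
end

# The child only says which app it is; everything else comes from the parent.
class PosComponents < Components
  def initialize
    super("pos")
  end
end

puts PosComponents.new.image("login_button")
```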

00:23:08.720 But hey, we're in a Windows VM from macOS running JRuby and Sikuli! The next bit is to connect that with testing the actual application. We introduce a spec_helper.rb as well, which imports the needed libraries. This assumes JRuby is invoked with a library path.

00:23:29.760 Here's a checkpoint commit. These are fun when you're grasping for the next set of features and end up roping in a bunch of changes that you later need to summarize. It's nebulous work like this, and I'm happy that it turned out to be just a linear sequence of added lines.

00:23:59.280 GitHub shows just addition after addition rather than switching between changes.

00:24:20.800 We're adding more screenshots to our image repository now, that is, our repository of UI elements. We're getting closer to login and the data setup it needs.

00:24:45.840 Here we have configuration starting to be stored in a YAML file, with derived values computed at the top, such as falling back to the configured default when the environment doesn't specify a value.
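The "default unless the environment says otherwise" rule can be sketched in a few lines of Ruby. This is a hedged illustration, not the talk's in-house config parser: each YAML key can be overridden by an environment variable named `POS_<KEY>`, and both that prefix and the sample keys are hypothetical. Passing `env` as a parameter keeps the lookup testable with a plain Hash.

```ruby
require "yaml"

# Reads defaults from YAML; an environment variable named POS_<KEY> overrides
# each one. The POS_ prefix and the keys below are hypothetical.
def load_config(yaml_text, env = ENV)
  defaults = YAML.safe_load(yaml_text) || {}
  defaults.to_h do |key, value|
    [key, env.fetch("POS_#{key.upcase}", value)]
  end
end

config = load_config(<<~YAML)
  app: pos
  image_root: images
YAML
puts config.inspect
```

Values taken from the environment arrive as strings, so numeric settings would need converting after the override.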

00:25:09.680 We also see a new Sikuli component module, which itself has a component class and accepts the hash from Sikuli test environment. We are now logging in.

00:25:37.760 Since we automated the mapping between image name and image path, the key now plays the role of an XPath in Selenium, and we're navigating a "DOM tree," in quotes. Except instead of a web page, it's the desktop, and we assert that the image of the UI component is present somewhere on it.

00:26:06.000 Here's the bug fix: there was an off-by-one directory level. The path wasn't supposed to end in /app; it was the base bundle path itself. That seemed to create some redundancy in the classes, but I may have let it be.
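One way to automate the name-to-path mapping, sketched in Ruby under the assumption that each UI element is a PNG inside a `.sikuli` folder: index every PNG by its base name, so a test asks for `"login_button"` much as a Selenium test asks for a locator. The method name `image_index` is hypothetical, and the temporary folder only stands in for a real `.sikuli` folder.

```ruby
require "tmpdir"
require "fileutils"

# Builds a name => path lookup from every PNG in a .sikuli folder.
def image_index(dir)
  Dir.glob(File.join(dir, "*.png")).to_h do |path|
    [File.basename(path, ".png"), path]
  end
end

# Usage with a throwaway folder standing in for pos.sikuli:
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "login_button.png"))
  images = image_index(dir)
  puts images.fetch("login_button") # raises KeyError if the screenshot is missing
end
```

Using `fetch` rather than `[]` turns a missing screenshot into an immediate error instead of a nil that fails later inside Sikuli.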

00:26:39.920 Now we're starting on the road to automating the app. The setup we saw in Minitest is realized in the 'before do' block. This turns out to be costly, starting up the app for each single test, but it's progress.

00:27:01.840 Here we see the change to use the app after all. Maybe the bug fix previously was to generalize the library as we are closer to the runtime state, and the test would know more about initialization than the library.

00:27:18.200 We bulk up spec_helper.rb with our in-house config YAML parser, too. We had a requirement early on not to use Cucumber, but the design of the classes mirrors an organization toward natural language.

00:27:40.080 We do not end up needing this Sikuli feature module, as feature classes can be inherited from Minitest. A minor refactoring here ensures that named components do not have to initialize themselves but defer to the parent.

00:28:03.000 Now we're left with simply named intent calls, plus possibly various private helper methods. Now that we can log in, we can start automating a basic workflow in point of sale.

00:28:28.720 Here’s the Sikuli feature module that we use with Minitest’s feature inheritance instead.

00:28:54.560 Here we are just starting out on the boilerplate for the initial test. Here is a test where we want to try clicking all the buttons and ensure they act as expected. The difference here is to log out from the main window instead of from a dialog.

00:29:16.320 Here we update our spec_helper.rb to include the unused feature module.

00:29:39.000 Second interlude: there are some pull requests that never got merged. Those came from trying out a sample state machine pattern, but I kept getting stack overflows in my state machines.

00:29:59.280 Getting a Ruby version of his rocket launch example is a definite to-do for me, but here we just start adding tests. The framework is set up now for us to add a lot of tests.

00:30:24.800 Some notes to self in the form of a README, such as how to exit from java.exe. We're removing the screenshot function, as by this point the app launches reliably enough to debug other things.

00:30:45.920 More work with VLC, but there are some quirks. Xvfb shows the UI with exaggerated Swing buttons, so the screen caps we took of the components are not recognized. We end up having to hack around that, and headless is tabled for the future.

00:31:16.400 If anything, at this point, we're pretty much writing page objects like in Selenium. Reused constants that would be XPaths are image names. In this example, the 'checkbox no devices' constant is used more than once, so we DRY it up and define it at the top.

00:31:41.520 Here's the proof of concept code: the messy stuff we had before has been moved into components and methods. Here we use runtime exceptions as pre- and postconditions, so methods fail fast. I've brought this into my Selenium web testing too, where those assertions live in the methods as contracts.

00:32:00.720 We find that the specs don’t always have to check for them. In Cucumber, the then steps can be shorter and reflect user capability instead of specific UI elements. Borrowing from Node.js, we use an index.rb file to define the UI components, which are themselves separate classes.

00:32:22.440 Did you all know that a separate file can reopen an existing module, so that whatever it defines lands in that module's namespace? That's nifty and keeps each file smaller.
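In code, reopening a module looks like the sketch below; the three sections stand for three separate files, and both the file names and the `Pos` namespace are hypothetical. Each file still has to be loaded, for example by requiring the others from `index.rb`. Reopening only means no file has to redeclare or merge the namespace by hand.

```ruby
# --- index.rb: declares the namespace ---
module Pos
end

# --- top_bar.rb: reopens the same module and adds a class to it ---
module Pos
  class TopBar
    LOGOUT = "logout_button"
  end
end

# --- take_order.rb: reopens it again ---
module Pos
  class TakeOrder
  end
end

puts Pos.constants.sort.inspect
```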

00:32:42.800 We have a process in point of sale called 'take', and the UI elements we used are defined as constants. Each public method uses the primitives we have built up or borrowed from Sikuli, including click, type, wait for, and click find.

00:33:05.120 Private methods for tracking state, like line item count, are managed through messages like increment line item count. The page object is 'take order'. The interface reflects intent, like adding and accepting, and the consumer never directly clicks or types anything.
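A hedged Ruby sketch of a "take order" page object in this style: screenshot names as constants, intent-level public methods, a private counter updated through a message, and a fail-fast precondition. `RecordingDriver` is a hypothetical stand-in for the Sikuli wrappers (click, type, wait for) so the example runs without a screen; the constant values and method names are also illustrative.

```ruby
# Hypothetical stand-in for the Sikuli primitives; it just records calls.
class RecordingDriver
  attr_reader :actions

  def initialize
    @actions = []
  end

  def click(image) = @actions << [:click, image]
  def type(text) = @actions << [:type, text]
  def wait_for(image) = @actions << [:wait_for, image]
end

class TakeOrder
  # Screenshot names play the role Selenium locators play on the web.
  ADD_ITEM = "button_add_item"
  SKU_FIELD = "field_sku"
  ACCEPT = "button_accept"

  attr_reader :line_item_count

  def initialize(driver)
    @driver = driver
    @line_item_count = 0
  end

  # Intent-level method: callers never click or type directly.
  def add(sku)
    @driver.click(ADD_ITEM)
    @driver.wait_for(SKU_FIELD)
    @driver.type(sku)
    increment_line_item_count
    self
  end

  def accept
    # Fail fast: accepting an empty order breaks the precondition.
    raise "cannot accept an order with no line items" if @line_item_count.zero?

    @driver.click(ACCEPT)
    self
  end

  private

  def increment_line_item_count
    @line_item_count += 1
  end
end

TakeOrder.new(RecordingDriver.new).add("12345").add("67890").accept
```

Returning `self` from each intent method lets a spec chain steps so they read close to the workflow they describe.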

00:33:31.280 Here are some more public methods from 'take order', in service of tests and expectations.

00:34:03.560 Continuing with the methods to use in tests before we look at the private methods.

00:34:21.520 We have only a couple of private methods to track line item count state, plus another component for the top bar UI, just for logging out. We just click the logout button, which is the image of the logout button. Once recognized, it sends a click.

00:34:41.760 Here are the helper methods used in components defined in Sikuli component.rb.

00:35:06.560 Here is a simpler component, at least to flesh out one of the dialogs. I remember this dialog has several more buttons.

00:35:29.920 Oh, and the screenshot capability gets moved into a helper method. Thanks to Ruby's ability to execute shell commands, we can encapsulate screen recording controls into this helper.rb module.
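A minimal sketch of such a helper, assuming the recorder is any external command. The talk drives VLC and stops it by sending a command over ncat; this hypothetical version instead stops the process with a hard kill, which is simpler but may leave a video file unfinalized. The method names are illustrative, and the usage spawns a harmless `sleep` rather than a real recorder.

```ruby
require "rbconfig"

# Wraps an external recorder process behind two messages.
module Helper
  module_function

  # Starts the command in the background and returns its PID.
  def start_recording(*command)
    Process.spawn(*command)
  end

  # Kills the process and reaps it; stopping twice is harmless.
  def stop_recording(pid)
    Process.kill("KILL", pid)
    Process.wait(pid)
  rescue Errno::ESRCH, Errno::ECHILD
    nil
  end
end

# Usage with a harmless stand-in process instead of a real recorder:
pid = Helper.start_recording(RbConfig.ruby, "-e", "sleep 30")
Helper.stop_recording(pid)
```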

00:35:55.920 In Sikuli test_environment.rb, we also add logging and some more bug fixes. We also see the uppercase -I lib flag, so our require statements don't all need to be relative.

00:36:30.440 In future work, it would be nice to provide a Rails dashboard that could adjust login.bat to run one or more tests.

00:36:52.120 Minitest has no before-all concept, but the internet had a solution: just make those calls before the 'before do' block.

00:37:02.480 We see the initial steps to start up the app are pretty expensive, so we only want to log in once. Then, every test per scenario can be done in an authenticated context.
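A sketch of that before-all workaround with Minitest specs. Code placed directly in the `describe` body runs once, when the block is evaluated, and `before do` then hands the shared result to every test through the block's closure. `FakePos` is a hypothetical stand-in for the expensive launch-and-login sequence. In a real suite you would run the file with `ruby -rminitest/autorun`; the example leaves out autorun so it can also be run programmatically.

```ruby
require "minitest"
require "minitest/spec"

# Hypothetical stand-in for the expensive app launch and login.
module FakePos
  @logins = 0

  class << self
    attr_reader :logins

    def launch_and_log_in
      @logins += 1
      "session-#{@logins}"
    end
  end
end

describe "take order" do
  # Runs once, when this describe block is evaluated: a "before all".
  session = FakePos.launch_and_log_in

  # Runs before every test, but only hands out the shared session.
  before do
    @session = session
  end

  it "starts in an authenticated context" do
    _(@session).must_equal "session-1"
  end

  it "reuses the same login" do
    _(@session).must_equal "session-1"
  end
end
```

The tradeoff is shared state: a test that logs out or leaves a dialog open affects every test after it, so the tests in this group need to leave the app where they found it.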

00:37:37.920 Here I'm replacing placeholder code with actions. In addition, I'm adding more tests.

00:37:57.600 So now we’re starting to see the benefits of the interface. The tests nearly read like step definitions.

00:38:30.840 Finally, adjustments to the spec helper to include those new components.

00:38:54.560 With that, we have successfully shown a Windows VM running our Java Swing app, integrated JRuby with Sikuli, and used Minitest to verify behavior. Not a bad run with free and open source software.

00:39:34.720 Thank you!