Euruko 2017

Lightning Talks Day 2

by Pilar Andrea Huidobro Peltier, Jake, Quentin Godfroy, Mehdi Lahmam B., and Ana María Martínez Gómez

The video "Lightning Talks Day 2" from Euruko 2017 features a series of short presentations on using Ruby in various technology applications. The talks cover Optical Character Recognition (OCR), memory and performance improvements in Ruby, European rail ticketing at Trainline, and property testing.

Key Points Discussed:

  • Optical Character Recognition (OCR) Using Ruby:

    • Vinod, the first speaker, passionately discusses OCR and its relation to computer vision, defining OCR and emphasizing its role in making digital information accessible.
    • He introduces Tesseract, an open-source OCR engine, highlighting its functionality and showing how it can be integrated with Ruby to digitize and analyze text from images.
    • Vinod encourages the Ruby community to embrace OCR, advocating its use despite some skepticism about Ruby's performance.
  • Performance Improvements in Ruby:

    • Jake follows with insights on reducing memory usage, starting with an upgrade to Ruby 2.4, which generally uses around 10% less memory.
    • He discusses technical strategies such as using alternative memory allocators, garbage collection optimizations, and future improvements aimed at optimizing Ruby’s concurrency.
  • European Rail Ticketing Systems:

    • The next speaker outlines the historical challenges of European rail ticketing and the role of Trainline in simplifying the process across multiple countries.
    • They describe Trainline's architecture, built on Ruby with a Rails app for backend operations, and how they manage workflows using middleware to handle requests efficiently.
  • Property Testing in Ruby:

    • Ana concludes the session by discussing property testing as an advanced form of testing that uses randomly generated data to validate code behavior, rather than relying solely on fixed test cases.
    • She suggests several libraries that facilitate property testing in Ruby, enhancing the thoroughness of testing practices and leading to more robust applications.

Conclusions:

The talks collectively celebrate Ruby's versatility across different areas of technology, from OCR and performance optimization to innovative testing methodologies. Each speaker provides valuable insights for developers on using Ruby effectively in modern programming environments, encouraging attendees to explore these applications further. The community aspect of events like Euruko is highlighted, fostering collaboration and shared knowledge among Ruby enthusiasts.

00:00:09.600 Hi everyone! I'm going to talk about something I'm personally very passionate about, which is doing Optical Character Recognition (OCR) and computer vision using Ruby. Last time you saw me on stage, you might have inferred a bit about who I am, but just for those who didn't get to know me earlier, my name is Vinod. So yeah, let’s get started!
00:00:36.100 To make the talk a little simpler, I'll be speaking as Fiona, my dog, and for the sake of this talk, her favorite toy, which is very much Ruby. How many of you are familiar with OCR? Are you good at it, or do you find it kind of weird?
00:00:59.949 It's good to see some familiar faces! For those of you who aren't familiar with it, OCR stands for Optical Character Recognition. Computer vision, on the other hand, is about giving machines an understanding of what you've picked up with OCR or of the analog world. To clarify, computer vision is not the same as OCR, but OCR can certainly be used in computer vision.
00:01:14.460 So, why would we want to do this? The digital world is appealing; it's eco-friendly and immensely accessible. For example, you could scan a music sheet and have a program like Sonic Pi play it back, which is both fun and cool.
00:01:40.299 Now, before you give me that skeptical look about using Ruby for OCR, let me remind you that Ruby is great! We all love Ruby, and that’s why we’re here, right? If you recall from the first day, there was an important point made about Ruby's capabilities. To get started with OCR, you need an OCR engine or program. The one I use is Tesseract, which is open-source. You run a command in the shell that looks something like this: you call Tesseract, provide the file path, set the languages, specify the output format you want, and give it the configuration.
00:02:13.800 Tesseract picks up text from images, and while it may not look impressive initially, it provides a bunch of data. You can then use this data in your Ruby codebase, save it to a variable, translate it into HTML, and parse it. This enables the program to recognize structured data, such as dates.
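The shell call described above could be wrapped from Ruby roughly as follows. This is only a minimal sketch, not the speaker's code: the helper names `tesseract_command` and `run_tesseract` and the file names are made up for illustration. It assumes the `tesseract` binary is installed, and it uses the CLI's `-l` language option, `stdout` as the output base, and the `hocr` config to get HTML-like output.

```ruby
require "open3"

# Build the argument list for the tesseract CLI.
# Layout: tesseract <image> <outputbase> [-l langs] [configfile]
def tesseract_command(image_path, languages: ["eng"], config: "hocr")
  ["tesseract", image_path, "stdout", "-l", languages.join("+"), config]
end

# Run the engine and return whatever it prints as a string.
# Requires the tesseract binary to be available on the PATH.
def run_tesseract(image_path, **options)
  output, status = Open3.capture2(*tesseract_command(image_path, **options))
  raise "tesseract failed for #{image_path}" unless status.success?
  output
end

# Example call (not executed here, since it needs tesseract and an image):
#   hocr = run_tesseract("receipt.png", languages: %w[eng deu])
puts tesseract_command("receipt.png", languages: %w[eng deu]).join(" ")
```

The returned string can then be stored in a variable and parsed like any other HTML document.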
00:02:54.930 You might be wondering, again, why do OCR with Ruby? I get asked this all the time. Some say Ruby is slow, and while that may be a debate, the core point is that it allows us to use a tool we love to give meaning to the data extracted by the OCR engine.
00:03:19.290 So, how does it work? You run the OCR engine, collect all the words, and give them meaning using Ruby. It's logic; you define that a word at the top of the page, for example, could be a date if it consists of two numbers followed by four numbers. You're essentially digitizing the world!
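The kind of rule described above, a word near the top of the page made of two digits followed by four digits, might look like this in plain Ruby. It is a sketch under assumptions: the `Word` struct, the normalized `y` position, the optional separator, and the 20% threshold are all invented for illustration and do not come from the talk.

```ruby
# A recognized word with its vertical position on the page
# (0.0 = top edge, 1.0 = bottom edge).
Word = Struct.new(:text, :y)

# Two digits, an optional separator, then four digits, e.g. "09/2017".
DATE_PATTERN = /\A\d{2}[\/.-]?\d{4}\z/

# Treat a word as a date candidate when it sits in the top part of the page
# and matches the digit pattern.
def date_candidates(words, top_fraction: 0.2)
  words.select { |word| word.y <= top_fraction && word.text.match?(DATE_PATTERN) }
end

words = [
  Word.new("Invoice", 0.05),
  Word.new("09/2017", 0.06),
  Word.new("12/3456", 0.90) # matches the pattern but is too far down the page
]

puts date_candidates(words).map(&:text).inspect
```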
00:03:57.900 I hope I've inspired some of you and maybe even left you slightly less skeptical about using OCR with Ruby. Thank you for being here! It's fantastic to be part of a community that appreciates Ruby, even when we get skeptical looks for advocating OCR with it.
00:04:31.169 Now, let's transition to another speaker. Thank you so much for listening! My name is Peter, and if you're interested in OCR or using Ruby for this kind of stuff, come find me later.
00:05:00.450 Alright, hi again! My name is Jake, and I work with Chart. Over the past year, my goal has been focused on improving performance at Chart. A significant part of this effort involves reducing memory usage. I have been working on patching Ruby’s garbage collector, but today I don't have time to dive deep into that.
00:05:40.920 Instead, I will quickly share some straightforward steps to help you reduce memory usage in your applications. First, upgrade to Ruby 2.4, as it generally uses around 10% less memory. A big part of this improvement is related to the ‘heap growth factor’, which determines how quickly Ruby grows its heap when it runs out of memory. You can either set this as an environment variable or directly upgrade to Ruby 2.4.
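To get a feel for why the growth factor matters, here is a deliberately simplified model of heap expansion. It is an illustration only: the real garbage collector applies further limits, and the method name `heap_slots_after` and the starting size are hypothetical. The environment variable mentioned above is `RUBY_GC_HEAP_GROWTH_FACTOR`, whose documented default is 1.8.

```ruby
# Simplified model: every time the heap is full, multiply the number of
# object slots by the growth factor. The real GC is more involved than this.
def heap_slots_after(expansions, initial_slots:, growth_factor:)
  slots = initial_slots
  expansions.times { slots = (slots * growth_factor).ceil }
  slots
end

# The real setting is read by Ruby at boot from the environment, e.g.
#   RUBY_GC_HEAP_GROWTH_FACTOR=1.1 ruby app.rb
[1.8, 1.1].each do |factor|
  slots = heap_slots_after(5, initial_slots: 10_000, growth_factor: factor)
  puts "factor #{factor}: #{slots} slots after 5 expansions"
end
```

A smaller factor means the heap grows in smaller steps, so less memory sits unused, at the cost of more frequent expansions.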
00:06:40.080 Another quick fix is to use jemalloc, which is an alternative implementation of malloc created by Jason Evans and developed further at Facebook. Since Ruby 2.2, there's basically no downside to using it; it's faster and uses less memory. If you're using Sidekiq, you'll find it runs significantly better with reduced memory consumption, because Sidekiq is thread-based.
00:07:59.541 Now, be aware that Ruby has historically not been very friendly to ‘copy-on-write’ forking, and while Ruby 2.0’s bitmap garbage collector is a bit better, it's still not optimal. Therefore, consider disabling ‘transparent huge pages’: this Linux kernel optimization benefits most applications but can create issues with Ruby.
00:08:45.180 Additionally, consider using tools such as Koichi Sasada’s nakayoshi_fork gem, or simply start the GC yourself in your configuration file before forking. You should be aware that Ruby 2.0’s bitmap garbage collector contained mechanisms that detrimentally affected forking.
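The "start the GC before forking" advice can be sketched in a few lines. This is not the gem's code: the method name `fork_after_gc` and the default of four GC runs are assumptions for illustration, and the example needs a platform where `fork` is available. The idea, as I understand it, is that objects which have already survived several collections are less likely to be touched by the GC in the child, so more memory pages stay shared.

```ruby
# Run the given block in a forked child after running the GC a few times
# in the parent, and return the block's result as a string.
def fork_after_gc(gc_runs: 4)
  gc_runs.times { GC.start } # let long-lived objects age before forking

  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield.to_s) # send the child's result back to the parent
    writer.close
  end

  writer.close
  result = reader.read
  reader.close
  Process.wait(pid)
  result
end

puts fork_after_gc { 1 + 1 }
```

In a real application the same GC runs would typically go in the server's pre-fork hook rather than in a helper like this.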
00:09:49.770 When it comes to the future, can we make things better? There is low-hanging fruit for improving Ruby’s performance. Aaron Patterson (tenderlove) has been discussing compacting garbage collection, which will help optimize memory use.
00:09:58.890 I am also working on adapting Ruby’s garbage collection to use multiple heaps, similar to Erlang, which will be vital for Ruby 3 as we integrate concurrent operations. I apologize, that was a lot of information to digest within five minutes, but thank you for listening!
00:10:39.510 Next speaker, please.
00:10:41.970 Now, I’m going to talk about the European rail situation over 15 years ago. Back then, there was a single carrier per country, so if you wanted to send an item from France to Germany, it became complicated.
00:11:40.750 Let’s look at Italy, where you have both Italo and Trenitalia. Unfortunately, they do not communicate with each other very well. If you want to buy a ticket, you must do so through each carrier's individual website. Cross-border ticketing has become even more complex, and fares are confusing, with companies reluctant to share their pricing.
00:12:07.930 Into this landscape enters Trainline, making rail tickets easy. We currently operate across 20 countries and have access to multiple fare vendors. We handle about 5 million pounds in daily turnover, and we use Ruby for our backend operations.
00:12:39.250 In terms of architecture, we operate mainly in mainland Europe, using a Rails app as an API, which interfaces with our rate integrator, built with EventMachine and RabbitMQ. This structure allows for seamless horizontal scaling.
00:13:30.300 Our architecture allows our API to handle multiple ticket vendors without colliding with each other, making it easy for new developers to learn and contribute to the system. You can implement a new vendor by adhering to our standardized set of APIs that ensure a seamless integration. Thank you for your attention, and I look forward to any questions.
00:14:37.959 Now, let’s dive into Ruby middleware and how we use it in production at Trainline. We have a system that processes ticket requests across multiple carriers through a unique middleware approach.
00:14:50.900 Requests come into our system and pass through a stack of specialized middleware. This architecture is heavily influenced by Rack, where each middleware handles aspects of the request before passing it on to the next.
00:16:21.150 The entire process involves middleware calling each other until reaching a response at the top of the stack. This guarantees a responsive experience and efficiently handles multiple parallel requests to various carriers.
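A Rack-style stack of the kind described above can be sketched in plain Ruby. This is an illustration and not Trainline's code: the class names `UppercaseStations` and `AddCurrency`, the `carrier_search` lambda, the `build_stack` helper, and the request and response fields are all hypothetical.

```ruby
# Each middleware wraps the next element of the stack and exposes #call,
# as in Rack. This one rewrites the request on the way in.
class UppercaseStations
  def initialize(app)
    @app = app
  end

  def call(request)
    normalized = request.merge(from: request[:from].upcase, to: request[:to].upcase)
    @app.call(normalized)
  end
end

# This one decorates the response on the way back out.
class AddCurrency
  def initialize(app, currency)
    @app = app
    @currency = currency
  end

  def call(request)
    @app.call(request).merge(currency: @currency)
  end
end

# Innermost element: a stand-in for an actual carrier lookup.
carrier_search = lambda do |request|
  { route: "#{request[:from]}-#{request[:to]}", price: 42 }
end

# Wrap the app from the inside out so the first middleware listed runs first.
def build_stack(app, *middlewares)
  middlewares.reverse.inject(app) { |inner, (klass, *args)| klass.new(inner, *args) }
end

stack = build_stack(carrier_search, [UppercaseStations], [AddCurrency, "EUR"])
response = stack.call({ from: "paris", to: "berlin" })
puts response.inspect
```

Because every element only needs to respond to `call`, a new concern can be added by inserting one more class into the list without touching the others.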
00:17:15.100 Next, I’d like to introduce Ana, who will share insights on property testing in Ruby.
00:17:54.320 Hi, I’m Ana! Today, I want to introduce you to property testing. Most of you are likely familiar with unit testing, right? But how do we know if we have a sufficient number of tests to thoroughly cover our code?
00:18:23.340 Unit tests often only check for specific cases, and this can lead to gaps in coverage or instances where the implementation might break without unit tests failing. For instance, if we’re testing a simple addition operation, we may think we’ve covered the edge cases, but there are always exceptions.
00:19:21.980 Property testing allows us to generate random examples to verify our code's behaviors. Instead of relying on a few test cases, we can validate properties like commutativity and associativity of addition by performing numerous randomized tests.
00:20:01.350 For instance, we can create tests for the addition operation that ensure it behaves correctly regardless of the input parameters. This way, we’re checking for properties rather than specific values, covering more ground with fewer tests.
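The properties mentioned above can be checked with a small hand-rolled loop. This sketch deliberately uses no property testing library, so the helper name `counterexample`, the fixed seed, and the value range are assumptions for illustration; a real library would add features such as shrinking failing inputs.

```ruby
# Run a property against many randomly generated integer triples.
# Returns the first counterexample found, or nil if every run passed.
def counterexample(runs: 1_000, seed: 2017)
  rng = Random.new(seed) # fixed seed so that failures are reproducible
  runs.times do
    a, b, c = Array.new(3) { rng.rand(-1_000_000..1_000_000) }
    return [a, b, c] unless yield(a, b, c)
  end
  nil
end

commutative = counterexample { |a, b, _c| a + b == b + a }
associative = counterexample { |a, b, c| (a + b) + c == a + (b + c) }

# Subtraction is not commutative, so a counterexample should turn up quickly.
broken = counterexample { |a, b, _c| a - b == b - a }

puts "commutativity counterexample: #{commutative.inspect}"
puts "associativity counterexample: #{associative.inspect}"
puts "subtraction counterexample found: #{!broken.nil?}"
```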
00:21:22.410 I encourage you to explore property testing libraries available in Ruby. One great library allows you to generate random data and seamlessly integrate property testing with testing frameworks like MiniTest and RSpec.
00:22:45.180 This approach has been used effectively in projects to enhance testing practices, ensuring our implementations are robust against a wider range of scenarios. Thank you all for your time and attention!