00:00:09.600
Hi everyone! I'm going to talk about something I'm personally very passionate about, which is doing Optical Character Recognition (OCR) and computer vision using Ruby. Last time you saw me on stage, you might have inferred a bit about who I am, but just for those who didn't get to know me earlier, my name is Vinod. So yeah, let’s get started!
00:00:36.100
To make the talk a little simpler, I'll be speaking as Fiona, my dog, and for the sake of this talk, her favorite toy, which is very much Ruby. How many of you are familiar with OCR? Are you good at it, or do you find it kind of weird?
00:00:59.949
It's good to see some familiar faces! For those of you who aren't familiar with it, OCR stands for Optical Character Recognition. Computer vision, on the other hand, is about giving machines an understanding of what you've picked up with OCR or of the analog world. To clarify, computer vision is not the same as OCR, but OCR can certainly be used in computer vision.
00:01:14.460
So, why would we want to do this? The digital world is appealing; it's eco-friendly and immensely accessible. For example, you could scan a music sheet and have a program like Sonic Pi play it back, which is both fun and cool.
00:01:40.299
Now, before you give me that skeptical look about using Ruby for OCR, let me remind you that Ruby is great! We all love Ruby, and that’s why we’re here, right? If you recall from the first day, there was an important point made about Ruby's capabilities. To get started with OCR, you need an OCR engine or program. The one I use is Tesseract, which is open-source. You run a command in the shell that looks something like this: you call Tesseract, provide the file path, set the languages, specify the output format you want, and give it the configuration.
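As a rough illustration of the command described above, here is a minimal Ruby sketch. The helper names `tesseract_command` and `run_ocr` are hypothetical and not from the talk; the sketch assumes the `tesseract` binary is on the PATH, uses `stdout` as the output base, and picks the `hocr` config so the result comes back as HTML.

```ruby
require "open3"

# Hypothetical helper: builds the argument list for a Tesseract call.
# Order: binary, image path, output base, languages, config (output format).
def tesseract_command(image_path, languages: ["eng"], config: "hocr")
  ["tesseract", image_path, "stdout", "-l", languages.join("+"), config]
end

# Hypothetical helper: runs Tesseract and returns whatever it prints.
# Requires the tesseract binary to be installed and on the PATH.
def run_ocr(image_path, **options)
  output, status = Open3.capture2(*tesseract_command(image_path, **options))
  raise "tesseract failed for #{image_path}" unless status.success?
  output
end
```

Passing the arguments as an array avoids going through the shell, so file paths with spaces do not need quoting.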
00:02:13.800
Tesseract picks up text from images, and while it may not look impressive initially, it provides a bunch of data. You can then use this data in your Ruby codebase, save it to a variable, translate it into HTML, and parse it. This enables the program to recognize structured data, such as dates.
00:02:54.930
You might be wondering, again, why do OCR with Ruby? I get asked this all the time. Some say Ruby is slow, and while that may be a debate, the core point is that it allows us to use a tool we love to give meaning to the data extracted by the OCR engine.
00:03:19.290
So, how does it work? You run the OCR engine, collect all the words, and give them meaning using Ruby. It's logic; you define that a word at the top of the page, for example, could be a date if it consists of two numbers followed by four numbers. You're essentially digitizing the world!
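The rule just described could be sketched like this in Ruby. The `Word` struct, the `find_date` helper, the "top 20% of the page" threshold, and the separator between the two digit groups are all assumptions made for illustration, not details from the talk.

```ruby
# A recognized word with its position on the page (y grows downwards).
Word = Struct.new(:text, :x, :y)

# Two digits, a separator, then four digits, e.g. "03/2017" (assumed format).
DATE_PATTERN = /\A\d{2}[\/.-]\d{4}\z/

# Returns the first word near the top of the page that looks like a date,
# or nil when there is none.
def find_date(words, page_height:, top_fraction: 0.2)
  words.find do |word|
    word.y <= page_height * top_fraction && word.text.match?(DATE_PATTERN)
  end
end

words = [
  Word.new("Invoice", 10, 20),
  Word.new("03/2017", 400, 30),
  Word.new("12/2016", 50, 900) # matches the pattern but sits at the bottom
]
date = find_date(words, page_height: 1000)
```

Here `date` is the word at the top of the page; the one at the bottom is ignored because of its position, not its text.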
00:03:57.900
I hope I've inspired some of you and maybe even left you slightly skeptical about using OCR with Ruby. Thank you for being here! It's fantastic to be part of a community that appreciates Ruby, even when we get skeptical looks for advocating OCR with it.
00:04:31.169
Now, let's transition to another speaker. Thank you so much for listening! My name is Peter, and if you're interested in OCR or using Ruby for this kind of stuff, come find me later.
00:05:00.450
Alright, hi again! My name is Jake, and I work with Chart. Over the past year, my focus has been on improving performance at Chart. A significant part of this effort involves reducing memory usage. I have been working on patching Ruby’s garbage collector, but today I don't have time to dive deep into that.
00:05:40.920
Instead, I will quickly share some straightforward steps to help you see reduced memory usage in your applications. First, upgrade to Ruby 2.4, as it generally uses around 10% less memory. A big part of this improvement is related to the ‘heap growth factor’, which determines how quickly Ruby grows its heap when it runs out of free slots. You can either set this yourself through an environment variable or directly upgrade to Ruby 2.4.
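To get a feel for what the growth factor does, here is a deliberately simplified Ruby sketch. The environment variable `RUBY_GC_HEAP_GROWTH_FACTOR` is real (its default is 1.8), but the `heap_slots_after` helper is hypothetical and only models plain geometric growth; the actual collector applies further limits and heuristics.

```ruby
# Set when starting the process, for example:
#   RUBY_GC_HEAP_GROWTH_FACTOR=1.1 ruby app.rb

# Hypothetical model: each time the heap runs out of slots, multiply the
# slot count by the growth factor. Rational keeps the arithmetic exact.
def heap_slots_after(initial_slots, factor, growth_steps)
  exact_factor = Rational(factor.to_s)
  growth_steps.times.inject(initial_slots) do |slots, _|
    (slots * exact_factor).floor
  end
end

default_growth = heap_slots_after(10_000, 1.8, 3)
gentle_growth  = heap_slots_after(10_000, 1.1, 3)
```

With the same starting size and the same number of growth steps, the smaller factor ends up with far fewer slots, at the cost of growing (and therefore collecting) more often.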
00:06:40.080
Another quick fix is to use jemalloc, which is an alternative implementation of malloc created by Jason Evans and developed further at Facebook. Since Ruby 2.2, there's basically no downside to using it; it's faster and uses less memory. If you're utilizing Sidekiq, you'll find it runs significantly better with reduced memory consumption because it's thread-based.
00:07:59.541
Now, be aware that Ruby has historically not been entirely friendly to 'copy-on-write' forking, and while Ruby 2.0’s bitmap garbage collector is a bit better, it's not optimally effective. Therefore, consider disabling ‘transparent huge pages’: this Linux kernel optimization benefits most applications but can create issues with Ruby.
00:08:45.180
Additionally, consider using tools such as Koichi Sasada’s nakayoshi_fork gem, or simply run the GC yourself before forking in your setup file. You should be aware that Ruby 2.0’s bitmap garbage collector contained mechanisms that detrimentally affected forking.
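The "run the GC before forking" idea can be sketched in a few lines of Ruby. This is only an approximation of the approach, not the gem's actual code; the `warm_up_gc` helper and the choice of four runs are assumptions.

```ruby
# Hypothetical helper: run a full GC several times before forking, so that
# long-lived objects settle before child processes start sharing memory.
def warm_up_gc(runs = 4)
  runs.times { GC.start }
end

collections_before = GC.count
warm_up_gc
collections_run = GC.count - collections_before
```

In a forking server this would typically go in the hook that runs in the parent process right before workers are forked.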
00:09:49.770
When it comes to the future, can we make things better? There is low-hanging fruit for improving Ruby’s performance. Aaron Patterson (tenderlove) has been discussing compacting garbage collection, which will help optimize memory use.
00:09:58.890
I am also working on adapting Ruby’s garbage collection to utilize multiple heaps similar to Erlang, which will be vital for Ruby 3 as we integrate concurrent operations. I apologize; that was a lot of information to digest within five minutes, but thank you for listening!
00:10:39.510
Next speaker, please.
00:10:41.970
Now, I’m going to talk about the European rail situation over 15 years ago. Back then, there was a single carrier per country, so if you wanted to send an item from France to Germany, it became complicated.
00:11:40.750
Let’s look at Italy, where you have both Italo and Trenitalia. Unfortunately, they do not communicate with each other very well. If you want to buy a ticket, you must do so through each carrier's individual website. Cross-border ticketing is even more complex, and fares are confusing, with companies reluctant to share their pricing.
00:12:07.930
Into this landscape enters Trainline, making rail tickets easy. We currently operate across 20 countries and have access to multiple fare vendors. We handle about 5 million pounds in daily turnover, and we use Ruby for our backend operations.
00:12:39.250
In terms of architecture, we operate mainly in mainland Europe, using a Rails app as an API, which interfaces with our rate integrator, built with EventMachine and RabbitMQ. This structure allows for seamless horizontal scaling.
00:13:30.300
Our architecture allows our API to handle multiple ticket vendors without colliding with each other, making it easy for new developers to learn and contribute to the system. You can implement a new vendor by adhering to our standardized set of APIs that ensure a seamless integration. Thank you for your attention, and I look forward to any questions.
00:14:37.959
Now, let’s dive into Ruby middleware and how we use it in production at Trainline. We have a system that processes ticket requests across multiple carriers through a unique middleware approach.
00:14:50.900
Requests come into our system and pass through a stack of specialized middleware. This architecture is heavily influenced by Rack, where each middleware handles aspects of the request before passing it on to the next.
00:16:21.150
The entire process involves middleware calling each other until reaching a response at the top of the stack. This guarantees a responsive experience and efficiently handles multiple parallel requests to various carriers.
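A Rack-style stack like the one described can be sketched in plain Ruby as follows. All class names here (`CarrierSearch`, `NormalizeCarrier`, `TagResponse`) and the request and response shapes are hypothetical; this shows the general pattern, not Trainline's actual code.

```ruby
# Innermost "app": stands in for the code that talks to a carrier.
class CarrierSearch
  def call(request)
    { carrier: request[:carrier], fares: [] }
  end
end

# Middleware that adjusts the request on the way in.
class NormalizeCarrier
  def initialize(app)
    @app = app
  end

  def call(request)
    @app.call(request.merge(carrier: request[:carrier].to_s.downcase))
  end
end

# Middleware that adjusts the response on the way out.
class TagResponse
  def initialize(app)
    @app = app
  end

  def call(request)
    @app.call(request).merge(handled_by: "middleware stack")
  end
end

# Wraps the app so that the first middleware in the list is the outermost.
def build_stack(middlewares, app)
  middlewares.reverse.inject(app) { |inner, klass| klass.new(inner) }
end

stack = build_stack([TagResponse, NormalizeCarrier], CarrierSearch.new)
response = stack.call({ carrier: "Trenitalia" })
```

Each middleware only knows about the next object responding to `call`, which is what makes it easy to add, remove, or reorder steps.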
00:17:15.100
Next, I’d like to introduce Anna, who will share insights on property testing in Ruby.
00:17:54.320
Hi, I’m Anna! Today, I want to introduce you to property testing. Most of you are likely familiar with unit testing, right? But how do we know if we have a sufficient number of tests to thoroughly cover our code?
00:18:23.340
Unit tests often only check for specific cases, and this can lead to gaps in coverage or instances where the implementation might break without unit tests failing. For instance, if we’re testing a simple addition operation, we may think we’ve covered the edge cases, but there are always exceptions.
00:19:21.980
Property testing allows us to generate random examples to verify our code's behaviors. Instead of relying on a few test cases, we can validate properties like commutativity and associativity of addition by performing numerous randomized tests.
00:20:01.350
For instance, we can create tests for the addition operation that ensure it behaves correctly regardless of the input parameters. This way, we’re checking for properties rather than specific values, covering more ground with fewer tests.
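To make the idea concrete without depending on any particular library, here is a hand-rolled sketch in plain Ruby. The `find_counterexample` helper, the fixed seed, and the input range are assumptions for illustration; a real property testing library would add features such as shrinking failing inputs.

```ruby
# Generates random integer arguments and yields them to the property block.
# Returns the first arguments for which the property fails, or nil if it
# held for every generated example.
def find_counterexample(arity:, runs: 1_000, rng: Random.new(42),
                        range: -1_000_000..1_000_000)
  runs.times do
    args = Array.new(arity) { rng.rand(range) }
    return args unless yield(*args)
  end
  nil
end

# Properties of addition: these should hold for every input.
commutative = find_counterexample(arity: 2) { |a, b| a + b == b + a }
associative = find_counterexample(arity: 3) { |a, b, c| (a + b) + c == a + (b + c) }

# Subtraction is not commutative, so a counterexample should turn up.
broken = find_counterexample(arity: 2) { |a, b| a - b == b - a }
```

A `nil` result means no failing input was found in the generated examples, which is evidence for the property rather than a proof of it.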
00:21:22.410
I encourage you to explore the property testing libraries available in Ruby. One great library allows you to generate random data and seamlessly integrate property testing with testing frameworks like Minitest and RSpec.
00:22:45.180
This approach has been used effectively in projects to enhance testing practices, ensuring our implementations are robust against a wider range of scenarios. Thank you all for your time and attention!