Talks

Orchestrating Video Transcoding in Ruby

wroc_love.rb 2019

00:00:14.670 Right, this is actually my fourth time on this stage, and this is the first time I'm on this side of the room. I have to tell you, this room looks much bigger from here. I had to slightly rename my talk; it used to be called 'Transcoding in Ruby: A Story.' People were asking me if I was going to do the actual transcoding in Ruby. No, I'm not crazy. I am going to use FFmpeg. I actually had to Google the pronunciation of that because I kept saying it wrong for the past few years.
00:00:49.190 You can follow the slides live on your phone or laptop. You can either scan the QR code or find the URL that will be posted on Twitter in a few minutes. If you use your phone to check the slides, please put it on full screen and remember that it has to be in landscape mode; otherwise, everything will look broken, and you'll claim that it's my fault. Now, I will be avoiding talking about any business side of the project, so please don't ask me about it. The reason is that I really didn't ask for permission to talk about this project, and I'm not quite sure what the legal status is right now. I will also be a bit vague about the timeframe just so you cannot easily Google it. The entire presentation is actually based on the conversation logs that I had, not on the actual code, as I no longer have access to that code. So keep in mind that the code examples may be broken or out of date.
00:01:52.260 There was once a project. The project was a media platform where users could upload both video and audio, and it featured an HTML5 video player. We had to process and transcode the videos, and we also had some extra processing because of the secret business logic that I cannot discuss. For the proof-of-concept version, we decided to run it on a dedicated server instead of in the cloud. At that time, the go-to solution for handling file uploads was CarrierWave, so we obviously went with CarrierWave.
00:02:09.160 Since we were Rails developers, we also decided to leverage some of the Rails gems. We used CarrierWave Video for processing and CarrierWave Backgrounder for running it in the background because, as someone previously mentioned, you cannot really transcode videos while they are still uploading. The configuration code read somewhat like a creative guessing game, and uploads themselves were fragile over unreliable internet connections. So, the lesson here is to avoid callbacks whenever possible; they complicate the code significantly. I believe this is obvious to those attending this conference: callbacks can make your code extremely difficult to reason about, and they usually turn everything into a tangled mess.
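To make that concrete, here is a minimal sketch of the setup, reconstructed from memory; class and column names like `VideoUploader` are illustrative, not our actual code:

```ruby
# app/uploaders/video_uploader.rb
class VideoUploader < CarrierWave::Uploader::Base
  include ::CarrierWave::Video               # provides encode_video
  include ::CarrierWave::Backgrounder::Delay # defers processing to a background job

  process encode_video: [:mp4]
end

# app/models/video.rb
class Video < ActiveRecord::Base
  mount_uploader :file, VideoUploader
  process_in_background :file # CarrierWave Backgrounder enqueues the job
end
```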
00:02:44.290 So, how was it built? We had our Video class, based on Active Record, with a CarrierWave uploader mounted on it. The uploader spawned a background process via CarrierWave Backgrounder for transcoding, and we implemented various callbacks to set the necessary states before and after processing. So the lesson here is: avoid chaining too many callbacks, as it makes code maintenance a nightmare. Eventually, we needed to transcode two different formats of the videos, namely MP4 and WebM, and we also decided to change the resolution, creating multiple quality versions. The trick here was offering a quality picker to keep the user experience consistent.
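The versioned uploader looked roughly like the following; this is a sketch, and the resolutions here are placeholders. Note that every version is derived from the original upload:

```ruby
class VideoUploader < CarrierWave::Uploader::Base
  include ::CarrierWave::Video

  # Each version runs FFmpeg against the original upload.
  version :mp4_high do
    process encode_video: [:mp4, resolution: '1280x720']
  end

  version :mp4_low do
    process encode_video: [:mp4, resolution: '640x360']
  end

  version :webm_high do
    process encode_video: [:webm, resolution: '1280x720']
  end
end
```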
00:03:41.160 In retrospect, we should have been cautious about using CarrierWave for anything even remotely complex. It's likely sufficient for tasks like resizing a thumbnail, but anything larger can lead to significant challenges. At the time, there was also a bug in CarrierWave that made it impossible to reprocess just one version of a file. So every time transcoding failed, and trust me, it failed often, we had to reprocess everything, which took an incredibly long time. Moreover, changing resolutions and creating quality versions logically implied transcoding each version from the original file.
00:04:09.160 Unfortunately, CarrierWave didn't support creating versions based on other versions, which led to performance issues as our processes struggled to keep up with the demands of video transcoding. Hence, we decided to rewrite the entire system. Our new approach was to create one database model for each file, which gave us a clearer and more efficient data structure. The original Video model served as a placeholder, with each version stored as a separate entity. We continued using CarrierWave for uploading video content, but once the processing flow spanned multiple models, state management became even more complex.
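Schematically, the new structure looked like this; the names are illustrative:

```ruby
class Video < ActiveRecord::Base
  has_many :video_versions
  mount_uploader :original, OriginalUploader # CarrierWave still handles the upload itself
end

class VideoVersion < ActiveRecord::Base
  belongs_to :video
  # Columns along the lines of:
  #   format:     'mp4' / 'webm'
  #   resolution: '720p' / '360p'
  #   state:      'pending' / 'processing' / 'done' / 'failed'
end
```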
00:04:52.240 Each version required its own uploader to initiate the background process, and then each worker had to check whether the other versions were complete to set the statuses correctly. This led us into various race conditions, leaving videos permanently stuck in the processing state: the work was complete, but the flag had never been set. Around this time, as we outgrew the initial MVP phase, we recognized that relying on dedicated servers would not scale. The constant need for manual backups and the fixed storage limits were incredibly frustrating. So we decided, once again, to rewrite everything from scratch to resolve the long-standing issues.
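To illustrate the race: each version's worker finished and then checked whether its siblings were done, and nothing stopped two workers from reading stale state at the same moment, so nobody ever flipped the final flag. A sketch of the kind of guard that fixes it, using a row lock; this is illustrative, not our exact code:

```ruby
class VideoVersion < ActiveRecord::Base
  belongs_to :video

  def finish_processing!
    update!(state: 'done')

    # Without the lock, two workers can both observe "not all done"
    # and the parent video stays in 'processing' forever.
    video.with_lock do
      unless video.video_versions.where.not(state: 'done').exists?
        video.update!(state: 'ready')
      end
    end
  end
end
```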
00:06:01.740 After the rewrite, we had a clearer system: the original video file was stored locally, and a background worker checked the existing versions and processed them in priority order. Getting something watchable to the user quickly was key; this is akin to how YouTube operates, which typically transcodes the lowest quality version first for quick access and processes the higher quality versions in the background. The processing workflow still used streamio-ffmpeg with custom arguments, which let us handle the specific codecs and tune for the best performance. This approach let us quickly create reference files and transcode in a significantly reduced timeframe.
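The heart of the new worker, again as a sketch: the encoding arguments are placeholders, and `original_path` and `output_path` are hypothetical helpers, but passing custom arguments through streamio-ffmpeg is the mechanism we relied on:

```ruby
require 'streamio-ffmpeg'

class TranscodeWorker
  include Sidekiq::Worker
  sidekiq_options queue: :transcoding

  def perform(version_id)
    version = VideoVersion.find(version_id)
    version.update!(state: 'processing')

    movie = FFMPEG::Movie.new(version.video.original_path)
    # streamio-ffmpeg passes this array straight through to ffmpeg.
    movie.transcode(version.output_path,
                    %w[-c:v libx264 -preset fast -c:a aac])

    version.finish_processing!
  end
end
```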
00:06:37.350 As video uploads grew, we made sure to segregate the Sidekiq queues for uploads and processing, which helped reduce CPU saturation. The lesson here is that simpler solutions often prove to be the best. Our initial attempts to do everything 'the Rails way' caused issues we could have avoided with straightforward implementations. If we had opted for a plain processing worker rather than chaining multiple gems together, it would have saved us considerable time and energy.
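Concretely, that meant pinning workers to dedicated queues and running them as separate Sidekiq processes, roughly:

```ruby
class UploadWorker
  include Sidekiq::Worker
  sidekiq_options queue: :uploads # light, I/O-bound jobs
end

# TranscodeWorker (above) stays on its own :transcoding queue.
# Separate processes keep CPU-heavy transcoding from starving uploads:
#   bundle exec sidekiq -q uploads -c 10
#   bundle exec sidekiq -q transcoding -c 2
```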
00:07:08.150 I'd like to address some common questions. Why didn't we consider AWS Lambda? The issue is its 15-minute execution time limit, which is insufficient for videos that run over two hours. Similarly, while there are hosted options like Zencoder or Amazon Elastic Transcoder, they are expensive and not suited to the volume of video a media platform like ours needed to process. As for Docker, we ran on bare servers, as containerization was not yet as prevalent or well supported, especially in production. Our transcoding work centered mostly on converting various codecs to MP4.
00:08:47.000 As browser support for MP4 became nearly universal, the users who still needed WebM, mostly Linux users, were not among our target demographic. So our strategy shifted towards focusing on MP4 alone, which streamlined processing and reduced the number of output versions to manage. We learned several tips while developing our transcoding pipeline. For instance, when down-converting video, copy streams whenever possible; it saves time and avoids re-encoding where it isn't needed.
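For example, when downscaling only the video, the audio stream can be copied bit-for-bit instead of re-encoded. A sketch with streamio-ffmpeg; the exact arguments depend on your inputs:

```ruby
movie = FFMPEG::Movie.new('input.mp4')
# -c:a copy reuses the existing audio stream untouched;
# only the video is rescaled and re-encoded.
movie.transcode('output_480p.mp4',
                %w[-vf scale=-2:480 -c:v libx264 -c:a copy])
```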
00:09:21.160 Also, understanding the metadata of video files is critical. The metadata holds both fixed parameters describing the file and the pointers to the actual data chunks, and it is crucial for effective streaming. Another tip is to optimize for streaming right at processing time: certain FFmpeg flags reposition the metadata towards the beginning of the file, which gives a much better experience to users who start watching before the entire file has downloaded. Check FFmpeg's documentation on this, because it significantly affects how smoothly videos play.
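For MP4 the relevant flag is `-movflags +faststart`, which makes FFmpeg rewrite the file after encoding so the moov atom (the index the player needs) sits at the front:

```ruby
movie = FFMPEG::Movie.new('input.mp4')
# +faststart moves the moov atom to the start of the file,
# letting playback begin before the whole file has downloaded.
movie.transcode('streamable.mp4',
                %w[-c:v libx264 -c:a aac -movflags +faststart])
```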
00:10:25.600 Aside from that, there are no universal solutions. We could either download a file to the server and transcode it locally, or hand FFmpeg the URL and transcode on the fly. While the latter seems efficient, we found it highly dependent on the input videos. Certain inputs led to unacceptable processing delays, with clients complaining that their videos were still processing days later. This is an ongoing challenge, because you cannot predict behavior purely from codecs or formats without more intricate troubleshooting; ultimately, you have to standardize handling to ensure smooth performance for all users.
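Both approaches side by side, as a sketch; `download_to_tmp` is a hypothetical helper, and reading HTTP inputs directly assumes an FFmpeg build with network protocol support:

```ruby
# Option 1: download first, then transcode a local file.
local_path = download_to_tmp(remote_url) # hypothetical helper
FFMPEG::Movie.new(local_path).transcode('out.mp4', %w[-c:v libx264 -c:a aac])

# Option 2: let ffmpeg read straight from the URL. No local copy,
# but throughput now depends on the remote server and the input file.
FFMPEG::Movie.new(remote_url).transcode('out.mp4', %w[-c:v libx264 -c:a aac])
```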
00:11:48.800 Also, 'trust but verify' applies when integrating external tools like FFmpeg. Reliable as it may seem, it can exit without reporting an error yet produce seemingly valid output files that are incomplete or corrupted. We ended up implementing a validation check that compares the output duration against the input duration to ensure they align, which initially produced a flurry of false positives. We kept refining and automating these checks, and such annoyances are a reminder of why rigorous testing and monitoring are vital in a production environment.
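The check itself was simple; something along these lines, where the one-second tolerance is illustrative:

```ruby
def transcode_ok?(input_path, output_path, tolerance: 1.0)
  input  = FFMPEG::Movie.new(input_path)
  output = FFMPEG::Movie.new(output_path)

  # ffmpeg can exit cleanly yet leave a truncated file behind,
  # so compare durations instead of trusting the exit status alone.
  output.valid? && (input.duration - output.duration).abs <= tolerance
end
```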
00:13:09.990 Thank you for listening, and I open the floor to any questions.