Re-interpreting Data

by Murray Steele

In Murray Steele's talk titled "Re-interpreting Data" at RubyConf 2023, he delves into the intriguing process of converting various file types into audibly and visually interpretable formats. The core discussion revolves around the ways data can be manipulated and restructured using Ruby, emphasizing the creative and playful sides of programming.

Key points discussed include:

- File Extensions and Types: Beginning with the basics, Steele explains how file extensions indicate file types, but merely renaming a file does not alter its intrinsic structure or content.
- WAV File Structure: He highlights the simplicity of WAV files, consisting of a header and data section, demonstrating how to create a WAV file from a PDF by prepending an appropriate WAV header.
  - Example: Murray illustrates the concept with Ruby code, showcasing how to manipulate bytes and pack data.
- Experiments with Audio: Incorporating live demonstrations, Steele generates audio from the Ruby interpreter, leading to surprisingly structured but unlistenable outputs.
- BMP File Format: Steele transitions to discussing BMP files, explaining the image format's header, pixel data, and the complexities in structuring pixel representation.
  - He notes the necessity for padding to meet format requirements, showcasing techniques to accommodate pixel layouts without wasting data.
- MIDI Files Overview: Finally, he addresses MIDI files, outlining their unique structure compared to WAV, focusing on compact event representation and efficient data storage.
  - Steele runs a demonstration to convert a README into a MIDI file, successfully creating an orchestral interpretation.

Steele's essential takeaway stresses the value of curiosity and creativity in programming. He encourages programmers to leverage familiar languages like Ruby to explore and discover new ideas within the realm of data manipulation. Emphasizing that programming is a magical process, he concludes by inspiring attendees to embrace their interests and pursue explorative coding endeavors.

00:00:18.960 Welcome everyone! My name is Peter, I'm with the program committee, and I have the pleasure of introducing Murray Steele today. He is an engineering manager by day and a co-organizer of the London Ruby User Group by night. Murray cares about encouraging a sense of curiosity and play while programming. Please give him a warm welcome!
00:00:44.000 Thank you! Hello, I'm Murray. Thanks for coming to my talk.
00:00:49.199 I’m an engineering manager at Cleo. We're based in the UK, but our customers are in the US. We're empowering people to build a life beyond a paycheck through an AI assistant that understands their banking information, providing personalized and relevant advice on personal finances. For a fee, we offer access to services that actively help improve your situation. However, I'm not talking about any of that today. If it sounds interesting, we are hiring and can help with visas and relocation, so come and find me later!
00:01:12.200 What I am here to discuss is files and data, so let's get started! Here’s a screenshot of a fairly standard downloads folder featuring different types of files—pictures, movies, documents, audio, etc. The file names include the title of the file and, after the dot, the file extension, which indicates the file type.
00:01:32.920 In a modern graphical operating system, you don't have to parse that extension yourself; your OS does it and provides a handy icon to identify the file type and hints about the application that will open it when you double-click. Indeed, if you rename the file to change the extension, it is likely to change the icon and the application that opens it. If the OS detects the change, it might warn you about it. When I first started using computers, I thought this was all that was involved—rename the file, and you'd be able to use it. However, as I quickly discovered, that isn't true.
00:02:05.119 I didn't have Microsoft Word, so I tried renaming a file from .doc to .txt to open it with Notepad. What I found was just a stream of nonsense. I gave it a try with a PDF, hoping that if I renamed it to a .wav file, I'd be able to magically listen to all the words contained within. Obviously, that didn't work; I received an error instead. This highlights that there is more to using files than simply changing their extensions.
00:02:49.120 On Unix systems, there’s a command called 'file' that attempts to identify the actual type of a file. Interestingly, one of the things it does is open the file and look at a portion of it to make an educated guess about its format. It doesn’t just accept the file name blindly; it checks the contents—this is especially relevant when you think about my earlier attempts to open files without the appropriate software. This is why renaming a .txt file to a .wav won’t let you hear its contents. Understanding this deeper level of file structure and content is key.
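The idea of checking a file's contents rather than its name can be sketched in a few lines of Ruby. This is only an illustration, not how the `file` command is implemented: the helper name `sniff_format` and the small prefix table are assumptions made for this example, and real detection uses far more rules.

```ruby
# Hypothetical helper: guess a format from the leading bytes, ignoring the file name.
def sniff_format(bytes)
  # A tiny, illustrative table of well-known leading byte sequences.
  prefixes = {
    "%PDF" => :pdf,
    "RIFF" => :riff, # the container format that WAV files use
    "BM"   => :bmp,
    "MThd" => :midi
  }
  head = bytes.b # work on raw bytes, not characters
  prefixes.each do |prefix, format|
    return format if head.start_with?(prefix.b)
  end
  :unknown
end

puts sniff_format("%PDF-1.7 and the rest of the document") # prints "pdf"
puts sniff_format("just some text")                        # prints "unknown"
```

For a file on disk, the first few bytes could be read with `File.binread(path, 4)` and passed to the helper.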
00:03:26.599 At some point in my early online life, I came across a website that detailed the data structure for WAV files. WAV files are simple uncompressed sound files that store a digital representation of a recording of sound. The binary data represents the sound wave, hence the name. They comprise two parts: a header and a data section. The data section holds the raw bytes that convey the sound wave, which are essentially just a stream of numbers. The header explains how to interpret that data.
00:04:02.400 The header consists of two parts. The first part serves as an introduction, declaring, 'Hi! I'm a WAV file, and I'm this long.' It’s a short section meant to inform any software that encounters it. The second part specifies how to interpret the subsequent audio data. Details include the number of channels—whether it’s mono or stereo—how many samples are taken per second, and the bit depth of each sample.
00:04:22.560 Renaming a .pdf to a .wav won’t allow you to listen to the contents of that PDF. However, since the WAV file format is straightforward, we can actually take a PDF file, place a WAV file header atop it, and then we can listen to it. The process involves just taking the entire content of the PDF and merging it with the WAV file structure to create a valid WAV file. We compute what the header requires by examining the file size and determining appropriate sample rates and bit depths.
00:05:11.960 Here’s some Ruby code that helps construct the header. The first part builds the identifier and length components of the header, stating, 'I’m a WAV file, and I'm this big.' The next section involves our arbitrary choices for sample rate and bit depth, combined with calculations based on the file size, defining how to interpret our data. There are some magic numbers inside the code which serve specific purposes, but while they are crucial, they can be a bit mundane. The third section handles the final part of how we interpret the data, which also uses the file size to define how much data there is.
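The slide code is not reproduced in this transcript, so the following is a minimal sketch of the three header sections described above, assuming the common 44-byte PCM layout. The method name `wav_header` and the default sample rate and bit depth are assumptions for illustration, not the values used in the talk.

```ruby
# Sketch of a 44-byte PCM WAV header (assumed layout; names are illustrative).
def wav_header(data_size, channels: 1, sample_rate: 8_000, bits_per_sample: 8)
  block_align = channels * bits_per_sample / 8 # bytes per sample frame
  byte_rate   = sample_rate * block_align      # bytes of audio per second

  # Section 1: "I'm a WAV file, and I'm this big."
  # 36 is the number of header bytes that follow this length field.
  riff = "RIFF".b + [36 + data_size].pack("V") + "WAVE".b

  # Section 2: how to interpret the samples.
  # In this layout, 16 is the byte length of the format block and 1 means uncompressed PCM.
  fmt = "fmt ".b +
        [16, 1, channels, sample_rate, byte_rate, block_align, bits_per_sample].pack("VvvVVvv")

  # Section 3: how many bytes of sample data follow.
  data = "data".b + [data_size].pack("V")

  riff + fmt + data
end

source = "%PDF-1.7 pretend this is an entire PDF".b
wav    = wav_header(source.bytesize) + source
puts wav.bytesize - source.bytesize # prints 44
```

Writing `wav` out with `File.binwrite` would give a file that audio players should accept, with the source bytes interpreted as 8-bit mono samples.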
00:06:33.799 After the header, the file contains the raw bytes that make up the sound data. A notable point is how the 'pack' method is utilized. I hadn’t encountered it in my usual coding routine. When called on an array of numbers with a format string, 'pack' converts those integers into byte representations. The fascinating aspect of this conversion is that numbers can be represented in various forms within bytes. With 'pack,' you can specify whether to represent a number as a 4-byte or 2-byte integer, in big-endian or little-endian order. It's vital to grasp these distinctions, especially in the context of byte manipulation.
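To make the size and byte-order distinction concrete, here is a small sketch that packs the same number four ways using standard `Array#pack` directives. The variable names are only for illustration.

```ruby
number = 258 # 0x0102, so the two low bytes are 1 and 2

big16    = [number].pack("n").bytes # 2 bytes, big-endian
little16 = [number].pack("v").bytes # 2 bytes, little-endian
big32    = [number].pack("N").bytes # 4 bytes, big-endian
little32 = [number].pack("V").bytes # 4 bytes, little-endian

p big16    # => [1, 2]
p little16 # => [2, 1]
p big32    # => [0, 0, 1, 2]
p little32 # => [2, 1, 0, 0]
```

WAV and BMP headers use the little-endian forms, while MIDI files use the big-endian ones.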
00:07:34.880 As an aside, we have a custom RuboCop rule at work which highlights magic numbers in our code. If we've used such numbers, it encourages us to define a named constant instead. However, since this code is primarily for personal enjoyment, best practices can be left behind. Returning to the code, I wondered about a specific '16’ I encountered—is it special because it relates to bytes and a multiple of eight, or is it something particular about the WAV file format? I can’t recall, and perhaps if I’d defined it with a name, it would be clearer.
00:08:29.000 We've explored the code, so let's put it to the test. First, I’ll require my library named Stegosaurus, which I thought was apt since this code could serve as a steganography tool—hiding data within other data—and, well, dinosaurs are cool! We’ll create a WAV object to build WAV files, using a README from my repository that is about 1,000 bytes long, which provides an interesting audio representation.
00:09:06.960 It'll showcase how creative we can be with this approach. However, to really generate something audibly interesting, we need a larger file. While I could potentially find bigger files around, the largest file I definitely know I have is the Ruby interpreter itself. Ruby provides a helpful module called RbConfig that exposes numerous details about how Ruby was built, including 'RbConfig.ruby', which gives us the path to the current Ruby interpreter.
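Looking up the interpreter's path takes one call. In Ruby's standard library the module is spelled `RbConfig`. The variable names below are illustrative, and the reported size will differ between installations.

```ruby
require "rbconfig"

interpreter_path = RbConfig.ruby               # absolute path to the running ruby executable
interpreter_size = File.size(interpreter_path) # size in bytes, varies by build

puts "#{interpreter_path} is #{interpreter_size} bytes"
```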
00:10:00.120 Let’s see if this generates sound. Never do live demos! Okay, get ready for this moment. What I outputted is essentially unlistenable white noise—fair enough, right? But there’s a slight interest in it. As we skip through the output, we can discern some structure. Different segments sound distinct. If you’re old enough, you might feel nostalgic as it reminds you of loading software from tapes or dialing into the internet via a phone line. While it may not be enjoyable to listen to, it indeed has some structure.
00:10:31.240 We explored the WAV file structure a bit, and there is still much more to uncover. However, rather than continue with this white noise, let’s explore the file visually. We can do this with the BMP image format, which has a header and then pixel data. This means we should be able to replicate our earlier process: calculating the header and writing the source file as an image. If we go into more detail, we can see the header stores pixel information, including color values for each pixel.
00:11:16.240 The BMP header breaks down into three primary components. The first segment features the BMP identifier and the file size, followed by the segment detailing how to interpret pixel data, such as width, height, color depth, and DPI resolution for printing. Finally, there is a segment describing colors used in the image. The BMP format uses an indexed color approach, meaning the image pixels refer to an entry in a color table rather than being expressed in RGB values. While we've encountered similar requirements with WAV files, the pixel data is slightly more complex.
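As a rough sketch of such a header, the following builds the 14-byte file header and the 40-byte info header of the common Windows BMP layout. It assumes 24 bits per pixel so that no color table is needed, which differs from the indexed-color case described above. The method name `bmp_header` and the 2835 pixels-per-metre resolution (about 72 DPI) are assumptions for illustration.

```ruby
# Sketch of a BMP header for an uncompressed 24-bit image (no color table).
def bmp_header(width, height, pixel_data_size, bits_per_pixel: 24)
  info_size    = 40             # length of the info header
  pixel_offset = 14 + info_size # where the pixel data starts in the file

  # Part 1: identifier and total file size.
  file_header = "BM".b + [pixel_offset + pixel_data_size, 0, 0, pixel_offset].pack("VvvV")

  # Part 2: how to interpret the pixel data.
  info_header = [
    info_size, width, height,
    1,               # color planes, always 1
    bits_per_pixel,
    0,               # compression: none
    pixel_data_size,
    2835, 2835,      # horizontal and vertical resolution in pixels per metre
    0, 0             # colors used and important colors: defaults
  ].pack("Vl<l<vvVVl<l<VV")

  file_header + info_header
end

header = bmp_header(4, 4, 48)
puts header.bytesize # prints 54
```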
00:12:54.520 To write a valid BMP file, we need to solve three problems. The first is color depth, meaning how many colors the image has. Our choice of color depth determines how many pixels a given amount of data produces. With a monochrome image, we can choose a one-bit color depth, where each byte represents eight whole pixels. For example, a 25-byte file would represent a 200-pixel image. If we want 256 colors, we use an 8-bit color depth, where each byte represents one pixel. For more colors, like 24-bit color (16 million colors), we need three bytes per pixel; thus, a 25-byte file would now represent 8 and 1/3 pixels. To complete that final partial pixel, we can rely on padding.
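The arithmetic for the 25-byte example can be checked with a short sketch. The helper names `pixels_for` and `pixel_padding` are hypothetical and exist only for this illustration.

```ruby
# How many pixels a file yields at a given color depth (may be fractional).
def pixels_for(byte_count, bits_per_pixel)
  Rational(byte_count * 8, bits_per_pixel)
end

# How many null bytes are needed to complete the final pixel.
def pixel_padding(byte_count, bytes_per_pixel)
  (-byte_count) % bytes_per_pixel
end

puts pixels_for(25, 1)   # prints 200/1
puts pixels_for(25, 8)   # prints 25/1
puts pixels_for(25, 24)  # prints 25/3, that is 8 and 1/3 pixels
puts pixel_padding(25, 3) # prints 2, taking 25 bytes up to 27 bytes (9 whole pixels)
```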
00:14:42.440 The second challenge is shaping our image into a rectangular format. Our pixels must fit neatly into this structure. If, for instance, we start with a 28-byte file that results in 10 total pixels, we can arrange them into a 5x2 rectangle; however, most of the time we want simple squares. To determine the minimum square that accommodates our pixels, we can follow a straightforward algorithm: take the total number of pixels (for example, 10), take the square root, round up to the nearest whole number to get the side length (here, 4), and calculate how many padding pixels are needed to fill the square (here, 16 - 10 = 6).
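The square-finding steps can be sketched as follows. The method name `smallest_square` is an assumption, and `Integer.sqrt` is used instead of a floating-point square root to avoid rounding surprises on large inputs.

```ruby
# Smallest square that holds pixel_count pixels, plus the padding pixels required.
def smallest_square(pixel_count)
  side = Integer.sqrt(pixel_count)       # floor of the square root
  side += 1 if side * side < pixel_count # round up when not a perfect square
  { side: side, padding_pixels: side * side - pixel_count }
end

result = smallest_square(10)
puts result[:side]           # prints 4
puts result[:padding_pixels] # prints 6
```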
00:15:43.440 This is where null bytes save the day. We can add extra padding bytes to maintain valid structures without wasting source file bytes. For scan lines, the BMP format mandates that they should be a multiple of four bytes long. The challenge arises if the first scan line ends up not being a multiple of four bytes. However, we can again employ padding—by rearranging the pixel data into valid scan lines that abide by the format's requirements. It may feel as though we are wasting some of our data, but it's a necessary compromise for ensuring compliance.
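The four-byte scan line rule reduces to a small calculation. This sketch uses a hypothetical helper name, `row_padding`, and returns how many null bytes must follow each row.

```ruby
# Null bytes needed after each scan line so its length is a multiple of four.
def row_padding(width_px, bits_per_pixel)
  row_bytes = (width_px * bits_per_pixel + 7) / 8 # round partial bytes up
  (-row_bytes) % 4
end

puts row_padding(4, 24) # prints 0, since 12 bytes is already a multiple of four
puts row_padding(5, 24) # prints 1, since 15 bytes needs one more
puts row_padding(3, 8)  # prints 1, since 3 bytes needs one more
```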
00:17:10.560 In summary, we have considered the padding needed to complete pixels and to fill out the rectangle. Line padding, however, is different: it has to be calculated against the final pixel layout and added at the end of each line. The code for writing BMP files is more involved than that for WAV files, so let's discuss how we read the pixel data from our source file. We pull one row's worth of pixels from the source, write those pixels to the target file, add the necessary padding at the end of the line, and repeat until we exhaust the content.
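The row-by-row loop described here might look something like the sketch below. It is not the code from the talk: the method name `bmp_rows` is an assumption, it handles only whole-byte pixels, and it leaves out the header and the extra rows needed to fill the square.

```ruby
# Copy the source into scan lines, padding each line to a multiple of four bytes.
def bmp_rows(source, width_px, bytes_per_pixel: 3)
  source    = source.b
  row_bytes = width_px * bytes_per_pixel
  line_pad  = "\x00".b * ((-row_bytes) % 4)
  out       = String.new(encoding: Encoding::BINARY)
  offset    = 0

  while offset < source.bytesize
    row = source.byteslice(offset, row_bytes)
    row += "\x00".b * (row_bytes - row.bytesize) # fill out a short final row
    out << row << line_pad
    offset += row_bytes
  end
  out
end

rows = bmp_rows("abcdefghij", 1) # one 3-byte pixel per row, 1 padding byte per row
puts rows.bytesize               # prints 16
```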
00:18:15.680 When forming the padding, we use the pack method once more, but in a clever way. In this case, we call 'pack' on an empty array with a format string that generates the desired number of null bytes, without using any array elements. It's a fascinating technique: getting something from nothing! While I've shared a lot regarding padding calculations, there is more code that handles the remaining padding and creates color tables. A critical reviewer would question why I used so many while loops rather than more idiomatic iterators; those are questions I may never answer.
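One way to get null bytes out of `pack` without supplying any values is the `x` directive, which emits a null byte and consumes no array element. Whether the talk's code uses exactly this directive is an assumption, and the helper name `null_bytes` is made up for this sketch.

```ruby
# Build a run of null bytes from an empty array using pack's "x" directive.
def null_bytes(count)
  [].pack("x#{count}")
end

padding = null_bytes(3)
p padding.bytes    # => [0, 0, 0]
p padding.bytesize # => 3
```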
00:20:04.240 Let's try a demo! First, we'll create an object from the Stegosaurus library, this time a generator that outputs BMP files, and run it on our README again. If this works correctly, it will give us a visual representation. What we see here is a bitmap image generated from our README file. It's intriguing: each pixel's color comes from the bytes of the file, and you can see where the image is cut off by the null padding bytes. Interestingly, BMP files store their rows bottom-up, so the start of the file ends up at the bottom of the image! Overall, there isn't much of a story in such a short README file.
00:21:27.680 To add more visual representation, we could generate a monochrome version to see how structure emerges more clearly. Look at this—it provides more structure, possibly allowing you to read it like a character in a movie. This could be a way to interact with computers creatively! Now, while we’re exploring these formats, don't forget about MIDI. MIDI stands for Musical Instrument Digital Interface and represents a protocol for communicating with actual hardware instruments like synthesizers. Unlike WAV files that store recorded sounds, MIDI is akin to musical scores and, as such, comprises headers and data.
00:23:08.960 Inside a MIDI file, the header contains an identifier stating that it's a MIDI file, along with details about the MIDI type and timing. The track data is made up of streams of MIDI events, which are structured distinctly. Each MIDI event consists of its timing, referred to as delta time, and the corresponding musical action. Rather than being a fixed length, delta time can vary between one and four bytes. By storing the event details efficiently, MIDI can save considerable space.
00:24:15.320 For example, if a musical piece has many events, storing each event's timing in fixed four-byte chunks could lead to unnecessary bulk. Instead, MIDI uses a variable-length encoding in which each byte carries seven bits of the value and the remaining bit indicates whether more bytes follow, giving up to 28 bits for the time across four bytes. The event type is a single byte, but here too only seven bits are available for the type value.
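The variable-length timing scheme can be sketched as a small encoder: seven value bits per byte, with the top bit set on every byte except the last. The method name `variable_length` is an assumption for this example.

```ruby
# Encode a delta time as a MIDI variable-length quantity (1 to 4 bytes).
def variable_length(value)
  unless (0..0x0FFFFFFF).cover?(value)
    raise ArgumentError, "delta time must fit in 28 bits"
  end

  bytes = [value & 0x7F] # last byte: top bit clear
  value >>= 7
  while value > 0
    bytes.unshift((value & 0x7F) | 0x80) # earlier bytes: top bit set, more to follow
    value >>= 7
  end
  bytes.pack("C*")
end

p variable_length(0).bytes           # => [0]
p variable_length(127).bytes         # => [127]
p variable_length(128).bytes         # => [129, 0]
p variable_length(0x0FFFFFFF).bytes  # => [255, 255, 255, 127]
```

Ruby's `pack` also has a `w` directive (BER-compressed integer) that appears to produce the same byte layout. That is worth verifying before relying on it.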
00:25:21.960 There are several types of events, ranging from turning notes on/off to setting tempos. Each of the notes requires specific key and velocity data, which adds another layer of interaction. To codify this efficiently, variables are read from the source file so that we can build valid MIDI data using what we already have. Essentially, we can take our source data, form bits according to the MIDI specification, and write those to a new file. Let’s give this functionality a test run!
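As a rough illustration of turning source bytes into MIDI data, the sketch below maps every byte to one note and wraps the events in a single-track file. This mapping is an assumption and is much simpler than the approach described in the talk, which reads several values per event from the source. The method name `midi_from_bytes` and its defaults are also hypothetical.

```ruby
# Turn each source byte into a note-on/note-off pair in a one-track MIDI file.
# note_length must stay below 128 so that each delta time fits in one byte.
def midi_from_bytes(source, ticks_per_beat: 96, note_length: 48)
  events = String.new(encoding: Encoding::BINARY)

  source.each_byte do |byte|
    key = byte & 0x7F                                # keys only have seven bits
    events << [0, 0x90, key, 100].pack("C4")         # delta 0: note on, channel 0, velocity 100
    events << [note_length, 0x80, key, 0].pack("C4") # after note_length ticks: note off
  end
  events << [0, 0xFF, 0x2F, 0].pack("C4")            # end-of-track meta event

  # Header chunk: 6 bytes of data, format 0, one track, timing resolution.
  header = "MThd".b + [6, 0, 1, ticks_per_beat].pack("Nnnn")
  track  = "MTrk".b + [events.bytesize].pack("N") + events
  header + track
end

midi = midi_from_bytes("Hi")
puts midi.bytesize # prints 42
```

Writing the result with `File.binwrite("readme.mid", midi)` would produce a file that MIDI players should be able to open; the file name is only an example.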
00:26:45.120 We'll again create an object from Stegosaurus, this one called 'midriffs', generating from our README file and passing the result to our helper function. If this works, I'll open the MIDI file in an application called MIDITrail. It allows me to play the MIDI file, but also shows it visually as a keyboard traveling along a road of notes. Let's see if we can listen to the orchestral score of my README file!
00:29:22.800 However, I know what you might be wondering: why this path? Programming languages like Ruby don't typically encourage serious bit manipulation—shouldn't that be reserved for something like C? While that’s true, Ruby was the language I was most comfortable with, allowing me rapid feedback and exploration.
00:29:48.960 My emphasis through this journey is on the importance of picking a familiar tool to learn something new. If you are learning about files and bytes, do not add the challenge of acquiring a new language. However, if you’re delving into a new language, consider solving a problem you already know; that’s how I ported my old WAV generator into Ruby years ago! Finally, while the toy I created may feel trivial, I made it because of my curiosity and the fun of combining aspects of programming in ways we don’t typically see.
00:31:14.320 We often get caught up in string manipulation for practical work, but we sometimes forget that programming computers is simply magical. We have potential to create anything we can imagine, and I encourage each of you to explore ideas that excite you. Take time during your programming sessions—whether they are enjoyable or otherwise—to embrace creativity and curiosity within your work. Don't hesitate to pursue an exploration of coding that aligns with your interests. Thank you all for listening!