Cats, The Musical! Algorithmic Song Meow-ification

by Beth Haubert

In her talk "Cats, The Musical! Algorithmic Song Meow-ification" at the Keep Ruby Weird 2018 conference, Beth Haubert, a software engineer, discusses the whimsical process of transforming songs into cat meows through her application, Meow Fire. She begins by expressing her excitement for the opportunity to present at the conference and introduces her background, including her recent move from Omaha to San Francisco where she will work at Thoughtbot.

Key points of her presentation include:

- Introduction to Meow Fire: Beth explains how her application takes a song's audio file and outputs it with meows instead of vocals.

- Challenges Faced: She identifies three major challenges in creating the application: extracting the melody from a song, adjusting the lengths of the meows to match the melody, and assembling a comprehensive meow library that covers the necessary musical ranges.

- Technical Decisions:

  - For melody extraction, Beth initially relied on an external tool rather than creating her own algorithm.

  - To ensure note lengths matched, she utilized 'ffmpeg' to modify the duration of the meow files.

  - She encountered the limitation of her meow library and decided to expand it by exploring online resources and ultimately recording her own meows.

- Iterative Development: Through experimentation, Beth tried replacing her first melody extractor with Melodia, found that its results varied widely across music genres, and returned to the original tool for now.

- Future Directions: She shares her aspirations for further improvements to Meow Fire, including a client-side interface and possibly a reverse melody analyzer akin to Shazam.

Beth closes her presentation by inviting the audience to connect with her on Twitter and noting that her slides are available on Speaker Deck. Throughout, she emphasizes the playful nature of the project while highlighting the real technical challenges of music processing and software development.

00:00:09.530 I'm very excited to be here! I was telling the organizers that this is the conference I always wanted to talk at, and I'm finally here. So, thank you for accepting my talk! I hope it's weird enough for you. Over the next 25-ish minutes, you're going to encounter some poor singing, too many cat GIFs, and an excessive amount of silliness.

00:00:23.010 First, I want to mention that my bio is a bit out of date. As you can see up there, it says I live in Omaha, Nebraska. However, I no longer live there; I moved to San Francisco just two weeks ago. A fun fact about Nebraska is that the tourism department recently released a new tagline: 'Honestly, it's not for everybody.' And honestly, they're right; it's not for me either. In fact, later this month, I will be starting at Thoughtbot in San Francisco, and I’m very excited about that! I know there are a few Thoughtbot employees and former employees in the audience—hi! If I haven't met you yet, I'm looking forward to it.

00:01:12.680 Now, I've been told by a few people that one of my strengths is thinking outside the box—or as I like to call it, being really good at coming up with ridiculous ideas. Some of my past ridiculous ideas include: 'feces book'—a social media website for your poo; 'Kombucha'—a super boozy fermented tea drink; and also, I considered birth announcements for new features released at work, although I still need to name that one. There's also 'Malecha,' which is a U.S.-based anti-social network for your cat that I tested out in a smaller market in Omaha.

00:01:56.280 Now, switching subjects but still very relevant, who here is familiar with Game of Thrones? Raise your hand. I'm assuming most of you are since it's the most popular show in the world right now. I'm going to give you a bit of background on why I'm talking about Game of Thrones. This is my wonderful husband; we got married in a bar to the disappointment of both of our parents—not that any of that is relevant. We have some cats: Xiao Wei and Clementine, who are both sweethearts, and this is Geeta.

00:02:31.500 Yes, I just shared a professional picture of my cats, so yes, I'm that person. This is also me and my husband, which is relevant because we like Game of Thrones. This photo was taken a few years ago after Season 1 aired, so you probably can't tell the difference. On to what really matters: you might be confused about the connection between Game of Thrones and my talk, but I'm going to play the Game of Thrones theme song.

00:03:03.870 Let's see if it should start playing. [Brief pause] You recognize this? [Music plays] I'll stop it there. It's an instrumental score with no lyrics—or are there? Well, there are if you add them yourself. So what we're going to do is something a bit unorthodox. I mentioned earlier I would be singing, and we are going to meow the Game of Thrones theme song! I'll get us started. One day, while watching Game of Thrones, we noticed it was one of those shows without anything to sing along to, so we just started yelling. Are you ready? Because it's kind of addictive once you get started.

00:04:02.950 It goes: 'Yeah, yeah, yum, yum, yum, yum, yum, yum, yum, yum, yum, yum!' See? It's fun! We've just meow-fied the Game of Thrones theme song! I probably set a world record; I'll submit that to the Guinness Book. So, Meow Fire is the name of my application. It's just one idea in a long line of silly ideas I've come up with over the years, but it also seemed like an interesting coding problem to solve. I finally said, ‘What the hell,’ and went ahead and built it!

00:05:06.080 How does it work? Well, the basics are that you upload a song's audio file, and Meow Fire outputs a new audio file with that song's melody sung by cats. I wish I could say that there was such a thing as a cat choir; I Googled it, but nothing came up. During my internet research, I came across one of my favorite musical cats of all time—Keyboard Cat. Rest in peace, Keyboard Cat! That's from the early nineties, which is almost as old as I am.

00:05:39.840 Interestingly, I discovered I'm not the first person to come up with this idea. There’s such a thing as a cat organ; a historian described a scene where it was played by a bear when King Philip II rode into Brussels. It sounds legit, right? Some psychiatrist suggested there could be some medical potential to this whole cat organ thing since it was the only thing crazy enough to get 'crazy' people to pay attention. It was likely just a hypothetical instrument until now, but here we are!

00:06:48.030 So, I was ready to do this! I knew that I would face some challenges while building Meow Fire. The good news was that I had momentum with this idea I was excited about. I decided to start building before really researching how I was going to do this and what the solution would be. I had read 'Practical Object-Oriented Design in Ruby,' one of the best Ruby books of all time, and I highly recommend it if you haven’t read it yet. Thanks to the book, I knew I needed to make my classes straightforward and easy to understand.

00:07:07.840 I also decided, based on previous experiences with legacy untested code, that this application would be 100% test-driven. I was excited about having a solid standard in place for building this application! While building Meow Fire, I identified three major challenges: 1) Finding a way to extract the melody from a song's audio file, 2) Correcting the meow length to match the length of the note in the melody, and 3) Creating a multi-octave library of meows.

00:08:01.700 Starting with the melody: it's pretty easy for a human to pick out the melody of a song, especially with any musical training. The melody is the principal part of a song. For instance, in Queen's 'Bohemian Rhapsody,' the melody is the part that Freddie Mercury sings: 'Mama, just killed a man.' However, if you were to include the harmonies, bass parts, and other musical instruments, the resulting song would sound jumbled together. I didn't want the bass line or harmony to muddy things up, so what did I do?

00:08:27.940 Well, computers aren't exactly intelligent, so someone has to write an algorithm to enable them to extract the melody from a song. Did I write an algorithm? No, I did not! I didn’t even try at first. I did what any good programmer does: I searched for something already existing. I found a tool that allegedly offered professional-grade audio technology with world-class algorithms. I figured I would give this a shot.

00:09:29.590 I had a simple song parser class. Inside this class, I created a method that allowed me to pass the necessary parameters through an HTTP call along with the song file. It was supposed to extract the melody for me. However, the response I received looked messy. Essentially, it returned hundreds of lines of data that included the first few notes of the Game of Thrones theme song. We had to take a closer look at this to see what it actually looked like.

00:10:29.630 There were four pieces of data in each collection: the MIDI pitch maps to the pitch of the note. For those unfamiliar, MIDI stands for Musical Instrument Digital Interface; it’s a standardized system that electronic instruments use so they all play the same notes. The MIDI pitch came back with a floating-point number. To map this accurately, I needed to round it to the closest whole number. For instance, if the MIDI pitch is 35.99348, I would convert it to 36.
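The rounding step described above can be sketched in a few lines of Ruby. This is only an illustration: the hash key `midi_pitch` and the sample values other than 35.99348 are assumptions, since the talk does not show the actual response format.

```ruby
# Hypothetical shape of the extracted notes. Only the MIDI pitch field is
# described in the talk; the key name and extra sample values are assumptions.
raw_notes = [
  { midi_pitch: 35.99348 },
  { midi_pitch: 38.02 },
  { midi_pitch: 40.4871 }
]

# Float#round with no arguments returns the nearest Integer.
def nearest_midi_pitch(value)
  value.round
end

rounded = raw_notes.map { |note| nearest_midi_pitch(note[:midi_pitch]) }
puts rounded.inspect
```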

00:11:43.490 Now, this MIDI 36 corresponds with the note C that is two octaves down from middle C. As a pianist, I can visualize this on the piano, which helps me understand where the notes fall. Each MIDI number denotes a specific pitch, that is, which key it corresponds to on a piano. Following this, each note in the collection of data returned after parsing the song needs to be matched to a meow at the respective pitch.
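To make the pitch mapping concrete, here is a minimal sketch that turns a whole MIDI number into a note name. It assumes the common convention in which middle C (MIDI 60) is written C4; some tools label it C3 instead. The method name `midi_to_note` is hypothetical and not taken from Meow Fire.

```ruby
NOTE_NAMES = %w[C C# D D# E F F# G G# A A# B].freeze

# Assumes the convention in which MIDI 60 (middle C) is written C4.
def midi_to_note(midi)
  octave = (midi / 12) - 1          # integer division: 12 semitones per octave
  "#{NOTE_NAMES[midi % 12]}#{octave}"
end

puts midi_to_note(60) # middle C
puts midi_to_note(36) # the C two octaves below middle C
```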

00:12:43.580 Next, I had to deal with the correct length of the meows. If you consider a melody, the notes have varying lengths. For instance, in 'Bohemian Rhapsody,' Freddie Mercury doesn't hold each note for only half a second; some notes are held longer. The problem is I couldn’t have a meow library containing every conceivable note length, as that would be an overwhelming amount of files. Instead, I had to find a tool that could either truncate or extend a meow file to fit the correct duration.

00:13:49.950 In a previous talk, the speaker discussed various tools, but I took the laziest approach and went with 'ffmpeg'—a tool that’s been around for almost 20 years. Many have used it multiple times. The challenge was the numerous settings available with ffmpeg, which required some extensive searching on Stack Overflow to find just what I needed. Ultimately, I got it working and had a somewhat messy method to achieve this, and while it works, I haven't refactored it in about two years.

00:14:51.890 The core functionality is that I pass in the parsed song; there's a library of meow files stored in my application, and I built a function to ensure I have the correct duration. For short notes, whenever the length of the extracted note was shorter than the library file, I made a copy of the library file and trimmed it down to the desired length using ffmpeg. For longer notes, I duplicated the meow file until the copies covered the duration of the extracted note and then combined them.
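The trimming of a short note might look something like the sketch below. It is not the method from the talk, which is not shown; the file paths and the name `trim_command` are made up for illustration. It relies only on ffmpeg's `-i` (input), `-t` (limit output duration, in seconds), and `-y` (overwrite output) options, and it builds the command without running it.

```ruby
# Builds (but does not run) an ffmpeg argument list that copies `source`
# to `target`, keeping only the first `seconds` of audio.
def trim_command(source, target, seconds)
  ["ffmpeg", "-y", "-i", source, "-t", seconds.to_s, target]
end

# Hypothetical paths, for illustration only.
cmd = trim_command("meows/c2.wav", "tmp/c2_short.wav", 0.25)
puts cmd.join(" ")

# To actually execute it (requires ffmpeg on the PATH):
#   system(*cmd)
```

Passing the arguments as an array to `system` avoids shell quoting problems with unusual file names.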

00:15:49.750 What about those longer notes? This was a tricky part, and there were a few ways I could have handled it. I chose the easier route: If the extracted note was longer than the meow library file, I'd keep duplicating that meow file until it was longer than the extracted note, then combine it into a single audio file and truncate it. The math involved is simple—taking the note length and dividing it by the file length—and using the ceiling method to ensure I cover the duration needed.
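The division-and-ceiling arithmetic described above can be sketched as follows. The method name `copies_needed` and the sample durations are assumptions for illustration.

```ruby
# Number of copies of a meow file needed to cover a note of the given length.
# Both lengths are in seconds.
def copies_needed(note_length, file_length)
  raise ArgumentError, "file_length must be positive" unless file_length.positive?

  (note_length / file_length.to_f).ceil
end

note_length = 2.3 # seconds the extracted note lasts (sample value)
file_length = 1.0 # seconds of audio in the library meow (sample value)

copies  = copies_needed(note_length, file_length)
surplus = (copies * file_length) - note_length # amount to truncate afterwards

puts "#{copies} copies, then trim #{surplus.round(3)}s from the end"
```

Rounding up guarantees the combined audio is at least as long as the note, so the final truncation never has to add anything.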

00:16:44.230 Now, all the meows are stored in an array, and in my song builder class, I combine all the files together into a new audio file output. This resolves the note duration problem! The next challenge was crafting the meow library. It’s difficult to find cats that meow across all ranges, let alone hire a cat choir. Therefore, I created my own custom meow library.
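The combining step mentioned above, joining the per-note meow files into one output, is not shown in the talk. One way it could be done is with ffmpeg's concat demuxer, sketched below under several assumptions: the helper names and paths are hypothetical, the clips share the same format (required for `-c copy`), and the paths contain no single quotes.

```ruby
# Text for ffmpeg's concat demuxer: one "file '<path>'" line per clip, in order.
def concat_list(paths)
  paths.map { |path| "file '#{path}'" }.join("\n") + "\n"
end

# Builds (but does not run) the ffmpeg command that joins the listed clips.
# `-safe 0` lets the list reference paths outside the list file's directory.
def concat_command(list_path, output_path)
  ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output_path]
end

meows = ["tmp/note_0.wav", "tmp/note_1.wav", "tmp/note_2.wav"] # hypothetical clips
list_text = concat_list(meows)
puts list_text
puts concat_command("tmp/meows.txt", "out/song_meowified.wav").join(" ")
```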

00:17:59.340 I started out by sitting at my piano, literally playing notes and trying to meow into a microphone. However, my piano was slightly flat, and I was flat too. That approach didn’t pan out as planned, so I began singing into my laptop's microphone using a tuner app, aiming to produce more accurate notes. Unfortunately, I only succeeded in recording five notes before it got tedious. Knowing I had limited resources, I explored online, eventually sourcing octaves of a man meowing from a free sound website. That helped bridge the gap, but the meow library I produced contained only about half the pitches available on a standard keyboard.

00:19:24.600 When examining what to do if an analyzed note fell outside the meow library's range, I opted to write a method to adjust the octave rather than create an entirely new library. I would modify the note to the nearest octave that my library accommodated. This was a less labor-intensive method. As I began adjusting my application to work with the specific library I created, I realized I still wasn't satisfied with the output.
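The octave adjustment could be sketched like this. The library range below is a made-up example, and `fit_to_library` is a hypothetical name. The idea is to shift a note by whole octaves (12 semitones) until it lands inside the range the library covers, which keeps the note name the same.

```ruby
# Hypothetical range of MIDI pitches covered by the meow library.
LIBRARY_RANGE = (48..72).freeze

# Shifts a MIDI pitch up or down by octaves until it falls inside `range`.
def fit_to_library(midi, range = LIBRARY_RANGE)
  # With less than an octave of coverage, some notes could never fit.
  raise ArgumentError, "range must span at least an octave" if range.size < 12

  midi += 12 while midi < range.min
  midi -= 12 while midi > range.max
  midi
end

puts fit_to_library(36) # too low: moved up one octave
puts fit_to_library(84) # too high: moved down one octave
puts fit_to_library(60) # already in range: unchanged
```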

00:20:02.580 Consequently, I decided to completely redesign my custom cat meow library. I expanded it to cover more pitches, recording my own meow sounds instead of relying on downloaded ones. My meow library now holds the full range of notes I need, which eliminated the earlier complications and the unnecessary adjustment methods altogether.

00:20:47.000 We've covered a lot, but just to remind everyone of our goal tonight: here's the Game of Thrones libretto that we've been aiming for. [Music plays] So, did you hear that? It doesn't quite sound right. The initial steps had complications involving the melody analyzer. I discovered it wasn’t adequate for my needs. If you venture into this domain, be prepared to pay significantly for quality solutions; free ones fall short.

00:21:28.720 In my second attempt, I found a library called Melodia, a tool that claimed to perform melody extraction better. It was developed by a brilliant researcher who published his PhD thesis in this area. The software he created did return MIDI files swiftly, and I pursued testing and evaluation before rolling out additional features. I discovered that melody extraction is challenging and that the style of music can affect extraction performance.

00:22:26.070 I was able to identify some of Melodia's strengths: it was particularly proficient with jazz music, which isn't necessarily widely appreciated. However, it struggled with pop and opera genres. Therefore, I returned to Sonic API, the first tool I employed, which at least provided some results, albeit far from perfect! Like I mentioned, it's a work in progress, and I'm not being compensated for this endeavor, so I won’t rush into it until I find something unparalleled in value.

00:23:12.500 As I move forward, I hope to incorporate all the necessary modifications to the melody analyzer and eventually integrate a client-side interface. Moreover, I'm contemplating a more innovative direction—something like a reverse melody analyzer tool akin to Shazam that could identify songs a user uploads and subsequently pull MIDI versions. That way, I could build it upon my current setup without repeating analyses, thereby enhancing the efficiency of my app with pre-saved data on songs already uploaded.

00:24:44.920 That concludes my talk! You can find my slides on Speaker Deck, and feel free to keep an eye on Twitter, as I may be releasing Meow Fire to the public soon. If you have any questions, please don't hesitate to ask! You can find me on Twitter, with the handle @HaubertSheree, or approach me in person if you prefer.

00:25:06.900 Thank you so much for listening!