Cats, The Musical! Algorithmic Song Meow-ification

Beth Haubert

In her presentation at RubyConf AU 2019, Beth Haubert showcases her whimsical application, the 'Meowifier', which transforms audio songs into cat meows. This concept is not only humorous but also a technical achievement that blends creativity with programming.

Background: Beth’s journey began in the US Air Force, where she served as an airborne cryptologic linguist, later transitioning into software development and specializing in Ruby on Rails.
The Meowifier: The application gives songs, including instrumentals, cat "lyrics" by having meows sing the melody. Beth demonstrated the idea with an interactive session in which attendees meowed along to the Game of Thrones theme song.
Technical Challenges: Throughout her development process, Beth encountered three main challenges:
- Extracting just the melody from songs, which required using a tool called Sonic API to identify melodic lines without harmonic clutter.
- Modifying the duration of meows to match the length of the melody notes. For this, she utilized FFmpeg to adjust audio lengths appropriately.
- Creating a multi-octave library of meows since cats do not naturally produce a full musical range. She experimented with different methods, eventually sourcing auto-tuned cat sounds online.
Conclusion: Beth emphasizes the excitement of creating playful applications while facing programming challenges. The Meowifier exemplifies how fun and complex coding can be, encouraging developers to embrace unconventional ideas. This presentation is a celebration of creativity in technology, showcasing Beth's unique approach to programming and her passion for both cats and coding.

00:00:00 So our last speaker is Beth. She is pleased to be here. Beth has had a really interesting journey into development and Ruby on Rails.

00:00:13 When she left high school, she joined the Air Force, and not in just a regular Air Force job: she was an airborne cryptologic linguist. Essentially, she went to Mandarin boot camp, not the orange fruit but the language, which sounds a little more stressful than eating citrus fruit all day. She became fluent in Mandarin and had to decrypt material in it, which is quite a challenge in such a hard language. After six years, she moved on to become an architect but realized that wasn't what she wanted to do.

00:00:38 Finally, she discovered development, found Ruby on Rails, and did a code school in Omaha with a Ruby on Rails focus. She graduated and found a job a week later at one of the only startups in Nebraska. Recently, she moved to San Francisco, where she is working at Thoughtbot. Please welcome Beth.

00:01:26 Hello, can you hear me? That's very important. I am the last talk of the entire conference, so no pressure. This won't leave a lasting memory or anything, right? Anyway, let's do this. I’m having a few difficulties; give me a moment to sort this out. Great, there we go! Now we can get started!

00:02:34 I've been told by a few people that one of my strengths is thinking outside the box or, as I like to call it, coming up with really stupid ideas. Some of these ideas have included "feces book," a social media website for your poo, and really boozy kombucha. You know, that fermented tea drink that's supposed to be good for you, but once I've added alcohol to it, who knows what you'll get!

00:03:04 Another idea I came up with is birth announcements but for features that you release at work. I don’t have a name for that one yet. Then, there's America, a US-based antisocial network for your cat, which I actually built and released to a smaller market in Omaha, Nebraska, called Oh Meow-ha.

00:03:48 Before we really get into things, I need to give you a little more background on me. Like Kaitlyn said, I am a developer at Thoughtbot. I have a husband, and we got married in a bar to the disappointment of both of our parents. We have cats; this is Xiao Wei, who goes by 'Kitten' because nobody else I know speaks Mandarin, and this is Geeta. Yes, we have professional pictures of this cat. We are multilingual, if you can't tell.

00:04:21 Another thing about us is that we really like Game of Thrones. This picture was taken a few years ago, right after Season 1 aired, so I think that was in 2011. You can’t even tell the difference, right? It's perfect! You're probably a little confused as to why I’m showing you all these details, and that’s okay. It's okay to be confused sometimes, especially during this talk. I am confused right now!

00:04:54 Imagine you're sitting at home on your couch with your partner and your cats. It's Sunday evening at 8 p.m. or 9 p.m. Central Time in the US, and the Game of Thrones theme song comes on.

00:05:44 So that’s what we’re actually going to do right now. And you’re going to help me. We are going to meow the Game of Thrones melody. Why now? Why not now? Every song deserves some lyrics, and yelling is pretty simple! I'm going to start it, and then you're all going to join in. Alright, let’s do this!

00:06:08 So you get the idea, right? We just meowed along to the Game of Thrones theme song. We may have set some kind of world record for the most people meowing in a room, I'm not sure!

00:06:28 Thank you! The Meowifier application is just one idea in a long line of bad ideas that I've come up with over the years, but it also seemed like a really interesting problem to solve. A couple of years ago, I finally said, 'What the hell!' and went ahead and built it.

00:06:42 How does it work? Well, you upload a song's audio file, and the Meowifier outputs a new audio file with that song's melody sung by cats. I wish I could say there's such a thing as a cat choir, but when I googled it, nothing came up. It was all up to me to figure it out.

00:07:12 Also, I discovered I was not the first person to come up with this idea. In the sixteenth century, a historian described seeing a cat organ being played by a bear when King Philip II rode into Brussels!

00:07:25 If that’s not weird, I don’t know what is! There are a few other mentions of cat organs throughout history, but scholars are unsure whether anyone ever actually built one. It was likely just a hypothetical instrument. It would have made terrible music anyway because cats do not meow on a fixed pitch. Until now!

00:08:02 I was only about six months into programming when I wrote the majority of this program, so let's revisit the questionable code aspect of my talk. As a new developer, I knew I had some big challenges ahead of me, but I also had momentum and motivation.

00:08:21 I was excited, so I decided to start building before I had figured everything out. The thing that I had going for me was that I'd already read "Practical Object-Oriented Design in Ruby" by the fairy godmother of Ruby, Sandi Metz. Thanks to her, I learned that I needed to make my classes really simple.

00:08:44 In other words, I aimed to create single responsibility classes. For example, you'll see later that I designed the application to easily swap out the melody analyzer. I didn’t want the logic for analyzers to be coupled with note converters.

00:09:06 I also decided that this application was going to be 100% test-driven development (TDD). With only six months of professional programming behind me, I'd already dealt with enough untested legacy code to last a lifetime.

00:09:24 If you were anything like me as a new developer, you might have had a difficult time determining what a single responsibility actually means. Focusing on testing helped me narrow each class's responsibility down: any time I found a class hard to test, I took it as a sign that there was too much going on in it.

00:09:53 So what I ended up with is each class only taking one or two types of inputs, like the path to a song, and returning a single type of information, such as the serialized melody.
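The talk doesn't show the class layout, so here is a minimal Ruby sketch of the idea, assuming hypothetical names (`Meowifier`, `#melody`, `#convert`, `#build`). Each collaborator is injected, so a different melody analyzer can be swapped in as long as it responds to the same method.

```ruby
# Hypothetical top-level object that wires small, single-responsibility
# collaborators together. None of these names come from the real app.
class Meowifier
  # analyzer:  responds to #melody(song_path), returning an array of note hashes
  # converter: responds to #convert(note), returning the note with a name added
  # builder:   responds to #build(notes), returning the output (e.g. a file path)
  def initialize(analyzer:, converter:, builder:)
    @analyzer  = analyzer
    @converter = converter
    @builder   = builder
  end

  # Song path in, whatever the builder produces out.
  def call(song_path)
    notes = @analyzer.melody(song_path)
    named = notes.map { |note| @converter.convert(note) }
    @builder.build(named)
  end
end
```

Because the analyzer is only reached through `#melody`, a test can pass in a stub object that returns canned notes, which keeps each class easy to test in isolation.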

00:10:03 So, here we go! I faced some big challenges.

00:10:09 Challenge number one: finding a way to obtain the notes of only the melody from a song's audio file. Challenge number two: correcting the meow length to match the length of the note in the melody. And challenge number three: creating a multi-octave library of meows!

00:10:21 So, let's talk about problem number one—the melody. It’s pretty easy for a human, especially one with musical training, to pick up the melody of a song. The melody is the principal part of a song. However, most songs you hear on the radio are polyphonic, meaning there’s more than one note going on at the same time.

00:11:01 For example, in "Bohemian Rhapsody" by Queen, the melody would be the part that Freddie Mercury sings. I want to keep the app simple, so I don't want any bass lines or harmonies muddying things up.

00:11:20 What did I do? Compared to the human brain, computers are pretty unintelligent—and the same can be said for cats much of the time! Writing an algorithm that a computer can use to extract a melody from a song is incredibly complicated. You’re probably wondering if I wrote one. The answer is no; I did not even try to write an algorithm to extract the melody.

00:11:47 I did what every good programmer does: I googled and searched for a tool until I found one that might work so I wouldn’t have to write anything from scratch. It was really hard to find, but I eventually discovered something called Sonic API. It offers professional-grade audio technology using high-quality, world-class algorithms.

00:12:13 It seemed worth a shot, and it was close to free! I created a simple song parser class, and inside this class, I have a method. All I need to do is pass the proper parameters through an HTTP call, like the song file and the API key, and it's supposed to extract the melody for me.

00:12:37 Here’s an example of what the response looks like—except imagine it being hundreds and hundreds of lines long. This is just the first few notes of the Game of Thrones theme song.
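As a rough illustration of turning such a response into Ruby objects, here is a sketch that assumes a JSON body with a `melody_result` object holding a `notes` array, where each note has `midi_pitch`, `onset_time`, `duration`, and `volume` fields. Those key names are assumptions for this example, not a confirmed Sonic API schema, and the HTTP request itself is left out.

```ruby
require "json"

# Hypothetical parser for a melody-analysis JSON response. The key names
# used here are assumptions for illustration only.
class SongParser
  Note = Struct.new(:midi_pitch, :onset_time, :duration, :volume, keyword_init: true)

  # Returns an array of Note structs, one per detected melody note.
  # Hash#fetch raises KeyError if an expected key is missing.
  def self.parse(json)
    data = JSON.parse(json)
    data.fetch("melody_result").fetch("notes").map do |n|
      Note.new(
        midi_pitch: n.fetch("midi_pitch"),
        onset_time: n.fetch("onset_time"),
        duration:   n.fetch("duration"),
        volume:     n.fetch("volume")
      )
    end
  end
end
```

With a made-up one-note response, `SongParser.parse(json).first.midi_pitch` returns the unrounded pitch, which is then rounded in a later step.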

00:13:03 Now, let's take a closer look. There are four pieces of data here, but for now, we're going to focus on the first one: MIDI pitch. The MIDI pitch maps to the pitch of a note. Only whole MIDI pitch numbers map to what we think of as standard notes on a keyboard, so I had to round each pitch up or down to the nearest whole number before I could map it.

00:13:21 For example, this MIDI pitch would be rounded up to 36 in MIDI terms. MIDI stands for Musical Instrument Digital Interface, and it’s a technical standard that allows electronic musical instruments from different manufacturers to communicate with each other.
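The rounding step and the relationship between MIDI number and pitch can be sketched in Ruby. This assumes the common tuning convention where MIDI note 69 is A4 at 440 Hz and each semitone multiplies the frequency by 2^(1/12); the method names are hypothetical.

```ruby
# Round an analyzer's fractional MIDI pitch to the nearest whole MIDI note.
# Float#round rounds halves away from zero (35.5 -> 36).
def nearest_midi(pitch)
  pitch.round
end

# Frequency in Hz for a whole MIDI note, assuming A4 (MIDI 69) = 440 Hz
# and equal temperament (12 semitones per doubling of frequency).
def midi_to_hz(midi)
  440.0 * 2**((midi - 69) / 12.0)
end
```

For a made-up analyzer value like 35.81, `nearest_midi` returns 36, and `midi_to_hz(36)` comes out at roughly 65.4 Hz, the frequency of C2.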

00:13:54 This is where 36 lands on the MIDI scale: two octaves down from middle C. For those of you who are musicians (I'm a pianist and percussionist), it helps to visualize this on a keyboard. This is the first note of the Game of Thrones theme song. Luckily, MIDI is standardized.

00:14:11 I made these constants in my note converter class, and they come into play after I parse my song. 36 corresponds to C2, and I have a method in my song parser class that uses these constants to associate the standard note name with each entry in my collection.
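Rather than a hand-written table of constants, a sketch like the following derives the note name arithmetically. It assumes the same convention as the talk (MIDI 36 is C2, so middle C, MIDI 60, is C4) and uses hypothetical method names; `annotate` adds the name to each parsed note hash as an extra key.

```ruby
# Hypothetical note converter. Pitch class is midi % 12 and the octave is
# midi / 12 - 1 (integer division), so 36 -> "C2" and 60 -> "C4".
class NoteConverter
  NAMES = %w[C C# D D# E F F# G G# A A# B].freeze

  def self.name_for(midi)
    "#{NAMES[midi % 12]}#{midi / 12 - 1}"
  end

  # Returns new hashes with a :note key added; the original data is kept.
  def self.annotate(notes)
    notes.map { |note| note.merge(note: name_for(note[:midi_pitch].round)) }
  end
end
```

The collection keeps its length; each entry just gains one more field.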

00:14:43 This way, I get a big collection of notes that's the same size as before, but each one now has this extra piece of data at the end. Now, onto problem number two: I need to correct the meow lengths to match the length of the notes in the melody.

00:15:05 A melody is going to have notes of varying lengths, obviously. For instance, thinking back to Bohemian Rhapsody, Freddie doesn’t just hold each note for half a second; some of them are only a quarter of a second, an eighth of a second, or even one and a half seconds.

00:15:40 It’s not 'Mama just killed a man.' It’s more like 'Mama just killed a man' right? So obviously, I can't have a meow file folder filled with files of every conceivable length.

00:15:58 I needed to find a tool that could either cut or extend my meow files to fill the proper lengths. I didn't want to reinvent the wheel; I knew someone had to have built something like this. I just needed to find a tool that could alter the length of multimedia files, and I'm sure most of you have used this tool.

00:16:24 It's been around for a couple of decades now: FFmpeg. The only problem with this tool was that it almost had too many options. Many of you will probably agree that it can take quite a bit of Stack Overflow searching to find exactly what you need.

00:16:38 I went through many iterations, and I figured it out. Here's my very questionable code, a really messy method that I need to refactor one day. Just look at all those conditionals; it's quite ugly, and it's hard to tell what's going on here.

00:17:12 But, I'm going to walk you through it. So I pass in the parsed song, which looks like this. This is my parsed song. At this point, I need to tell you that there is a library of meow files living in my application. We'll talk more in depth about those in a bit, but all you need to know is that there are a few dozen short audio files, each with a meow in a different pitch, like the ones you'd find on a piano keyboard.

00:17:47 This part of my code creates a meow with the correct duration. The first piece of logic in the if statement shortens a meow file, while the else statement lengthens one. My ultimate solution was pretty simple: for short notes, when the length of the extracted note was shorter than the library file, I made a copy of that file and trimmed the end.
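The shortening branch can be expressed as a single FFmpeg call. This hypothetical Ruby helper only builds the argument list (the file names are made up): `-t`, placed before the output file, limits the output's duration, and `-y` overwrites an existing output file without prompting.

```ruby
# Builds (but does not run) an FFmpeg command that copies a library meow
# and keeps only its first +seconds+ seconds.
def trim_command(input, output, seconds)
  ["ffmpeg", "-y", "-i", input, "-t", format("%.3f", seconds), output]
end
```

It could be run with `system(*trim_command("meows/C2.wav", "tmp/note_0.wav", 0.25))`; passing an array to `system` avoids shell quoting problems.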

00:18:09 If the length of the extracted note was longer than the library file, I kept duplicating that file until it exceeded the extracted note length, and then I combined them to create one audio file. Don't worry; there will be no ahead diagrams, just logic. This is how it works. I find the number of loops through simple math.

00:18:44 You divide the note length by the file length and call the 'ceil' method, which returns the smallest integer greater than or equal to a float. In this example, it's three. So, I use FFmpeg to loop my file three times, then combine the copies, cut the result down to the note length, and save it.
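The loop count is just a ceiling division. The sketch below pairs it with FFmpeg's `-stream_loop` input option, a swapped-in alternative to duplicating and concatenating copies as described in the talk. `-stream_loop N` replays the input N extra times (0 means play once), and `-t` then trims the result to the note length. Method and file names are hypothetical.

```ruby
# Number of copies needed so the looped meow is at least as long as the note.
# fdiv forces float division even if both lengths are integers.
def loop_count(note_length, file_length)
  note_length.fdiv(file_length).ceil
end

# FFmpeg command that loops the meow (loop_count - 1 extra plays) and cuts
# the output to exactly note_length seconds.
def loop_command(input, output, note_length, file_length)
  extra = loop_count(note_length, file_length) - 1
  ["ffmpeg", "-y", "-stream_loop", extra.to_s, "-i", input,
   "-t", format("%.3f", note_length), output]
end
```

For a 1.25-second note and a 0.5-second meow, `loop_count` gives 3, so FFmpeg is asked for 2 extra loops.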

00:19:14 It was my first solution, which was okay. However, there was an issue. If you have a note that's three seconds long but your meows are only one second long, ideally, you would just want one three-second meow, right?

00:19:48 But the way I had it set up originally meant there would be some duplicate meows. Ideally, I would have taken my meow file, cut it into three separate parts, duplicated the middle part, and then combined the copies with the first and last parts, trimming the result to the perfect length.
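The head/middle/tail idea can be planned with integer arithmetic before touching any audio. This is a hypothetical sketch: lengths are in milliseconds, the meow is split into roughly equal thirds, and the returned list of segment names would then be rendered (for example with FFmpeg) and trimmed to the note length.

```ruby
# Plans which pieces of a meow to join so that a long note gets one
# stretched meow instead of several repeated ones.
def stretch_segments(note_ms, file_ms)
  return [:whole] if note_ms <= file_ms  # short note: just trim the whole file

  third  = file_ms / 3                   # head and tail length (integer ms)
  middle = file_ms - 2 * third           # middle takes any leftover ms
  needed = note_ms - 2 * third           # time the middle copies must cover
  copies = (needed + middle - 1) / middle # integer ceiling division
  [:head] + [:middle] * copies + [:tail]
end
```

For a 3,000 ms note and a 900 ms meow, it plans a head, eight copies of the 300 ms middle, and a tail, which adds up to exactly 3,000 ms.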

00:20:27 Okay, I realize I've said 'meow' quite a lot during this presentation. But hey, my first solution wasn't ideal, and my proof of concept still works!

00:20:47 I just grab all those files and store them in an array. Then, in a separate song builder class, I combine all the files together, which is the next step.
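One common way to join per-note files is FFmpeg's concat demuxer, which reads a text file listing the inputs. This sketch assumes that approach (the talk doesn't say how the song builder combines files) and that every meow file shares the same codec and sample rate, which `-c copy` requires. `-safe 0` allows absolute or unusual paths, and single quotes inside a path are escaped as `'\''`.

```ruby
# Hypothetical song-builder helpers. concat_list produces the contents of a
# list file for FFmpeg's concat demuxer, one "file '...'" line per meow, in order.
def concat_list(paths)
  lines = paths.map do |path|
    escaped = path.gsub("'") { "'\\''" } # ' becomes '\''
    "file '#{escaped}'"
  end
  lines.join("\n") + "\n"
end

# Builds (but does not run) the command that joins the listed files
# without re-encoding them.
def concat_command(list_path, output)
  ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output]
end
```

The builder would write `concat_list(paths)` to a temporary file and pass that file's path to `concat_command`.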

00:21:09 Now onto my third problem: I need to create this multi-octave library of meows. That’s a lot of meows! If you hadn’t noticed, very few cats meow in the tenor, baritone, or bass ranges, among others. So, I had to create my own custom meow library.

00:21:49 So how did this begin? As you might imagine, I started out sitting at my piano, playing a note, and then trying to meow it. Unfortunately, I'm always a tiny bit flat. Plus, I really need to get my piano tuned! That method just wasn't working out for me.

00:22:34 Next, I sang into my laptop while using a tuner app on my phone. I got about half an octave's worth of notes before realizing it would take forever because I was always flat. Turns out, I only recorded five notes before losing interest! But I wanted to see if it was working, so I manually entered those notes into my test.

00:23:24 So I decided to go back to the drawing board, diving into the depths of the Internet, hopeful that I could find actual cat sounds.

00:23:43 Finally, I found a few octaves of auto-tuned meows on a free song website, and I thought, 'Let’s go for it!'

00:24:00 Here's the library it provided me with. A piano keyboard has 88 keys, so if you're thinking this doesn't look like 88 sounds, you're correct: this is only 49 notes, which is about four octaves' worth.

00:24:24 So what happens if one of the analyzed notes falls outside the range of my meow library? Instead of continuing my search for more sounds, I chose to write some code to adjust the octave number. For instance, a note at F7 would be adjusted down to F6.
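The octave adjustment can be sketched as shifting a MIDI note by 12 semitones at a time until it falls inside the library's range, which keeps the note name and changes only the octave. The range used below, MIDI 48 to 96 (C3 to C7, 49 notes), is a hypothetical stand-in, since the talk doesn't say which notes the library covers. Unlike the single-octave example in the talk, this version keeps shifting if one octave isn't enough.

```ruby
# Moves a MIDI note up or down by whole octaves until it is inside +range+.
# Requires the range to span at least 12 semitones so a match always exists.
def fit_to_library(midi, range)
  raise ArgumentError, "range must span at least an octave" if range.size < 12

  midi -= 12 while midi > range.max
  midi += 12 while midi < range.min
  midi
end
```

With that range, `fit_to_library(101, 48..96)` turns F7 (MIDI 101) into F6 (MIDI 89), while notes already in range come back unchanged.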

00:25:15 Things don’t always work out the way you plan them, actually. But hey, nobody else needs to know that unless you publicly tell hundreds of people at a time!

00:25:56 Thank you so much for listening to my meow presentation! You can find my slides on Speaker Deck. Keep an eye out; I hope to release the Meowifier to the public sometime this year. You can reach me at all of these places.

RubyConf AU 2019