00:00:15.949
My name is Beth Haubert, and this talk is called 'Cats, The Musical! Algorithmic Song Meow-ification.' I am very excited to be here today, as this is my first time speaking at RubyConf. Thank you for being here.
00:00:26.689
I want to give you a little warning before we dive in: within the next 20 to 30 minutes, you are likely to encounter some poor singing, an abundance of cat GIFs, and an excessive amount of silliness. So, be prepared! Let’s get started.
00:00:47.129
I don't remember what my bio says on the website, but it may be a little out of date due to some life changes. I used to be a software engineer at Flywheel in Omaha, Nebraska, and now I am at Thoughtbot in San Francisco, which I am very excited about! If you happen to work at Thoughtbot and we haven't met yet, please come say hi afterward.
00:01:15.210
I've been told by a few people that one of my strengths is thinking outside the box, or as I like to call it, being really good at coming up with silly ideas. Some of my ideas have included 'Feces Book,' a social media website for your poop, and 'Kombucha,' a super boozy version of the fermented tea drink that's supposed to be healthy.
00:01:35.640
Another idea I'm proud of is 'Luka,' a U.S.-based anti-social network for your cats. However, my attempt to release it in a smaller market, Omaha, which I still affectionately call 'Omaha,' failed. Now, changing the subject, but still very relevant: who here is familiar with 'Game of Thrones'? Okay, cool, most of you. It has been one of the most popular shows on television.
00:02:22.780
To give you a bit of background on me, this is a picture of my husband and me. We got married in a bar, to the disappointment of both of our parents. These are our cats: Xiao, GUI, and Clementine. Yes, this is a professional photograph of my cat. This is also a picture of me and my husband on Halloween a few years back when we really got into the 'Game of Thrones' theme.
00:02:45.810
Now, you might be wondering where I’m going with this, and I will explain shortly. To set the scene, I’m going to play you the 'Game of Thrones' theme song right now.
00:03:01.890
If we could get the volume up a bit?
00:03:41.230
Since you're all familiar with the 'Game of Thrones' theme, we're going to do something together. We are going to meow the 'Game of Thrones' theme song! I'll get us started and then you will all join in. Are you ready? Meow, meow, meow, meow, meow, meow, meow... (and so on).
00:04:01.560
Great job, everyone! We just meowified the 'Game of Thrones' theme song. I think we probably set some kind of world record for the most people meowing in a room. So, 'Meowifier,' the application I created, is just one idea in a long line of silly ideas I've had over the years, but it also seemed like a really interesting technology problem to solve.
00:04:50.410
How exactly does it work? Simply put, you upload a song's audio file, and 'Meowifier' outputs a new audio file with that song's melody sung by cats. I wish I could say there’s such a thing as a cat choir, but there isn’t. I searched online, and nothing came up. So it was all up to me to figure this out.
00:05:09.550
So let's discuss the first big challenge I faced: how to extract just the notes of the melody from a song's audio file. For a human, especially someone with any musical training, picking out the melody of a song is relatively easy.
00:05:31.710
The melody, for those of you who may not be musicians, is the principal part of the song. Nearly every song you hear on TV or the radio is polyphonic, meaning more than one note is sounding at the same time. For example, if you listen to 'Bohemian Rhapsody' by Queen, the melody is the part that Freddie Mercury sings, like the line 'Mama, just killed a man.' That's what I'm looking for: just the melody.
00:05:57.930
You may be wondering how I went about extracting the melody. Compared to a human brain, computers are pretty unintelligent; we have to tell them exactly what to do! Writing an algorithm that lets a computer extract the melody is incredibly complicated.
00:06:05.010
I did not write my own algorithm; I’m not that ambitious. Instead, I did what most programmers do: I Googled it to find something that could work. After some digging, I came across a tool called Sonic API, which offers professional-grade audio technology and high-quality algorithms.
00:06:27.440
This API is free up to a certain point, so I decided to give it a shot. In my code, I created a simple song parser class with a parse method. All I needed to do was pass the proper parameters through an HTTP call to the API, including the song file, and it was supposed to extract the melody for me.
00:06:47.200
I don't expect you to read the tiny text in the example on the screen; it's just a sample of what the API sends back. This is the response for the first few notes of the 'Game of Thrones' theme song, with several pieces of data per note.
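A minimal Ruby sketch of a parser along these lines follows. The talk doesn't give the Sonic API endpoint, parameters, or response format, so the HTTP call is hidden behind an injected `client` object, and the JSON shape (a `notes` array with keys `midi_pitch`, `onset_time`, `duration`, `volume`) is an assumption made for illustration; only the MIDI pitch is confirmed by the talk.

```ruby
require "json"

# One extracted melody note. The talk says the API returns four values per
# note but names only the MIDI pitch; the other three field names are
# hypothetical.
Note = Struct.new(:midi_pitch, :onset, :duration, :volume)

class SongParser
  # `client` is a hypothetical stand-in for the HTTP call to the API: any
  # object that responds to #call(song_path) and returns the response body
  # as a JSON String.
  def initialize(client)
    @client = client
  end

  # Returns an Array of Note structs, one per extracted melody note.
  def parse(song_path)
    response = JSON.parse(@client.call(song_path))
    # Assumed response shape: {"notes" => [{"midi_pitch" => ..., ...}, ...]}
    response.fetch("notes").map do |n|
      Note.new(n.fetch("midi_pitch"), n["onset_time"], n["duration"], n["volume"])
    end
  end
end
```

Because `client` only needs to respond to `#call`, a lambda returning canned JSON can stand in for the real API during testing, for example `SongParser.new(->(_path) { json_string }).parse("theme.mp3")`.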
00:07:17.020
So, for each extracted note, I get four pieces of data. The MIDI pitch maps to the pitch of the note. Only whole MIDI pitch numbers map to standard notes on a keyboard, so I had to round each pitch to the nearest whole number before I could map it.
00:07:40.400
MIDI, which stands for Musical Instrument Digital Interface, is a technical standard that lets electronic musical instruments from different manufacturers communicate with one another. For example, a MIDI pitch of 36 corresponds to a C, specifically the C two octaves below middle C (MIDI 60).
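The rounding and naming step can be sketched in Ruby as below. It assumes the common convention in which middle C (MIDI 60) is written C4, which makes MIDI 36 the note C2; some software numbers octaves one lower. The method name `note_for` is hypothetical.

```ruby
# Pitch-class names indexed by (MIDI number % 12); every multiple of 12 is a C.
NOTE_NAMES = %w[C C# D D# E F F# G G# A A# B].freeze

# Rounds a fractional MIDI pitch (as a melody extractor might return it) to
# the nearest whole MIDI number, then names it using the convention in which
# middle C (MIDI 60) is C4. Returns [midi_number, note_name].
def note_for(midi_pitch)
  number = midi_pitch.round
  octave = number / 12 - 1
  [number, "#{NOTE_NAMES[number % 12]}#{octave}"]
end
```

For instance, a fractional pitch of 35.7 rounds to 36 and is named C2, two octaves below C4.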
00:08:11.180
Let's pause and recap where we are. I need to make sure each note's pitch is correctly mapped to a MIDI number, which lets me reconstruct the melody correctly.
00:08:34.590
The next step is matching the length of the meows to the length of the notes in the melody. Melody notes vary in length. Think back to 'Bohemian Rhapsody': Freddie doesn't hold every note for half a second; some are shorter and some are longer.
00:09:01.750
It's important for me to control the exact length of each meow, but I can't record a meow of every conceivable length. That means I need a tool that can cut or extend a meow to fit the timing. After some searching, I found a great tool called FFmpeg.
00:09:24.590
FFmpeg has been around for about two decades, and while its plethora of options can be overwhelming, I finally got it to work! Unfortunately, the method I wrote is long and messy: I've had it for two years and never refactored it.
00:09:46.870
The method takes the parsed song, which holds the collection of notes extracted by the API, and adjusts the meow sound files to match. I have around 88 short audio files in my application, each at a different pitch.
00:10:03.140
The logic handles creating meows with the correct duration. If the extracted note is shorter than the meow file I have, I trim the end to get the correct length. If it’s longer, I duplicate the meow file until it reaches the desired length.
00:10:24.580
To illustrate: if a note lasts 0.48 seconds and the meow file is one second long, I trim the meow down to 0.48 seconds. For longer notes, I work out how many loops of the meow file are needed to cover the note's duration and combine them into one audio file.
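Both cases can be expressed with real FFmpeg options: `-stream_loop` repeats an input a given number of extra times, and `-t` placed before the output file caps the output duration. The Ruby sketch below only builds the argument list; the method name and file paths are hypothetical, and the talk does not say which FFmpeg options the original method used.

```ruby
# Builds (but does not run) an ffmpeg command that fits one meow sample to a
# note's duration. All durations are in seconds.
def meow_command(meow_path, meow_duration, note_duration, out_path)
  args = ["ffmpeg", "-y"] # -y: overwrite the output file if it already exists
  if note_duration > meow_duration
    # Number of back-to-back copies of the meow needed to cover the note.
    copies = (note_duration / meow_duration).ceil
    # -stream_loop N (an input option) plays the input N extra times.
    args += ["-stream_loop", (copies - 1).to_s]
  end
  # -t after the input acts as an output option: it cuts the (possibly
  # looped) meow to exactly the note's duration.
  args + ["-i", meow_path, "-t", format("%.3f", note_duration), out_path]
end
```

Running it would look like `system(*meow_command("meows/c4.wav", 1.0, 0.48, "tmp/note.wav"))`, which trims a one-second meow to 0.48 seconds. Passing the arguments to `system` as separate strings avoids going through a shell, so paths with spaces need no quoting.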
00:10:43.950
The next challenge was building a multi-octave library of meows, since very few cats meow in the bass or tenor range. With no existing cat choir to draw on, I had to build my own meow library, and I tried several approaches.
00:11:04.960
Initially, I attempted to record notes on my piano and sing along, but I always ended up a bit flat. Afterward, I tried a tuner app on my phone, and I managed to collect a few notes, but it wasn’t enough for a full library.
00:11:30.300
Eventually, I found some octave recordings of an auto-tuned man meowing on a free sound website. While this collection didn't span all 88 notes of a keyboard, it gave me a solid foundation. However, I soon needed a workaround for notes that fell outside its range.
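The talk doesn't describe the workaround for out-of-range notes. One possible approach, shown here purely as a hypothetical sketch, is to shift the note by whole octaves until it falls within the range of available samples, so the cats still meow the right note name, just in a different octave. The method name `fold_into_range` is invented for illustration.

```ruby
# Hypothetical fallback: move an out-of-range MIDI pitch up or down by whole
# octaves (12 semitones) until it lands inside the range of available meows.
def fold_into_range(midi_pitch, available)
  # With fewer than 12 consecutive pitches, some note names have no sample.
  if available.max - available.min < 11
    raise ArgumentError, "sample range must span at least an octave"
  end

  pitch = midi_pitch
  pitch += 12 while pitch < available.min
  pitch -= 12 while pitch > available.max
  pitch
end
```

With samples covering MIDI 48 to 72, a note at 36 would be meowed at 48, one octave up, and a note at 80 at 68, one octave down.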
00:11:56.980
Once I had my meow library, each note in the collection needed a pitch and octave designation. Some notes had sharp designators, like 'F-sharp 6.' To keep those names compatible with my constants, I wrote a method that patched the naming so I wouldn't have to rename every single file in my library.
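A sketch of that kind of translation, under invented naming schemes: the constants are assumed to use names like `F#6` and the files names like `fsharp6.wav`. Both conventions and the method name `meow_filename` are hypothetical; the talk doesn't show the actual names.

```ruby
# Hypothetical mapping from a note constant such as "F#6" to the sample file
# name used on disk, e.g. "fsharp6.wav". Translating at lookup time means the
# files themselves never have to be renamed.
def meow_filename(note_name)
  note_name.downcase.sub("#", "sharp") + ".wav"
end
```

Under these assumptions, `meow_filename("F#6")` yields `"fsharp6.wav"` and natural notes pass through unchanged apart from case, so `"C4"` becomes `"c4.wav"`.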
00:12:15.420
Despite my efforts, I wasn't fully satisfied with the library I developed. I wanted the best quality meows, so I eventually decided to auto-tune 88 more meows and create my custom library spanning almost the entire keyboard.
00:12:34.060
The moment you’ve all been waiting for is approaching. Remember the theme from 'Game of Thrones'? I’ll be using my new tools on this for a live demonstration.
00:12:54.330
However, I did encounter a few problems. It turns out that the melody analyzer I initially used wasn't of world-class quality, so I sought something more reliable.
00:13:01.400
I discovered a tool called Melodia, which is based on a Ph.D. thesis about melody extraction. Unfortunately, while it returns high-quality results, I found that it wasn't necessarily worth my time to integrate it.
00:13:19.320
Ultimately, I decided to return to Sonic API despite its imperfections. As this is a side project, I'm not constrained by deadlines or payment, so I intend to continue refining it.
00:13:45.420
Looking ahead, I'm considering building a reverse melody analyzer, similar to Shazam, that would identify which song a user is playing on my website. Once I know the song, I could scrape the internet for a MIDI version of that track and convert it into a meow version.
00:14:02.380
I haven't built that yet, but it is next on my agenda. So, the next time you see this talk, perhaps I will have completed it.
00:14:24.680
In the meantime, I hope you've enjoyed the talk and the cat GIFs I've shared. You can find my slides on Speaker Deck, and I'll be tweeting that link out soon. My Twitter handle is @aHaubart.
00:14:42.220
Thank you all so much for participating and for yelling with me today.