Digital Audio
How music works, using Ruby
Summarized using AI

by Thijs Cadier

In this talk at EuRuKo 2022, titled "How music works, using Ruby," Thijs Cadier explores the intersection of music and programming, using the Ruby programming language to understand how audio is created and manipulated. The presentation begins with an introduction to music concepts and how technology has shaped music history. Cadier explains that audio is fundamentally a waveform, which humans perceive as music through pitch, timbre, and tempo. The discussion then moves to the history of music technology, starting with early inventions like Thomas Edison's phonograph and Berliner’s disc phonograph, which led to the use of records.

Key points include:
- The evolution of recording technologies from analog to digital, highlighting significant milestones such as the introduction of CDs and digital audio.
- An explanation of digital audio as comprised of samples, how they can represent waveforms, and how Ruby can be used to manipulate audio data.
- Techniques for altering sound, including making sounds louder, mixing multiple track sources, and applying compression to control audio peaks. Cadier shows how to use Ruby to read WAV files and visualize audio waveforms, with practical examples of audio manipulation.
- Methods for sound synthesis in Ruby, including creating noise, generating sine and square waves, and mixing oscillators to achieve richer sounds.

Cadier concludes the talk by emphasizing how accessible music creation is through simple programming techniques, and how much room there is for further exploration in Ruby. He then takes questions from the audience on topics like sidechaining and synthesizing overtones. The overarching takeaway is that sound manipulation and music creation are achievable through programming, and participants are encouraged to explore these concepts further in Ruby and in tools like Sonic Pi.

00:00:07.220 Let me see if I can get sound from here. I don't know. All right, there it is. So welcome! Welcome to Helsinki, a lovely place. I'm really happy to be here today. Today, we're going to be talking about music.
00:00:19.080 What I love to do is understand the world through Ruby. You know, the world is a really big, complicated place, and since we are programmers, we can use the languages we like to understand things a little bit better. Once you can write some code for something, you sort of actually get what it's all about. That's what I did with music.
00:00:37.739 I've been making music as a hobby for a long time, and this talk is sort of the intersection between my two main interests: programming and music. This is what I like to do—understand it by figuring it out through writing code. Now I'm here showing the stuff to you all.
00:01:06.780 We're going to cover a few things today. First, we'll have a brief history of music, highlighting how technology has played a big role in shaping that history. Then, we'll look at digital audio, how we can visualize it, do a few mixing tricks, and create some actual sounds. There is a whole bunch of Ruby code for this talk, which you can find afterwards to play with if you're interested.
00:01:50.520 Let's get started by discussing what audio actually is. Audio is a waveform, and we perceive it as sound. To create music, our brain uses pitch, timbre, and tempo to create the illusion of music. Here is a sine wave, which is a waveform that doesn’t often exist in nature, but it's the simplest thing that can have a pitch. A waveform is similar to what you would see in a pool—it's waves of air molecules bouncing against each other and making their way through a space.
00:02:30.299 These bouncing air molecules are coming out of the speakers right now, creating a chain reaction of air molecules bouncing against each other in a wave, eventually reaching your ears. Inside your ears, there are tiny hair cells that vibrate in response to these waves. This creates an electrical signal that travels to your brain, where it magically creates meaning out of these vibrations.
00:03:03.180 Humans can process language and interpret audio waves as music, but scientists haven’t fully figured out how this works yet. However, somehow, some magic happens in our heads, and this music that most people love just appears. The brain does this in three different ways.
00:03:49.500 The first way is pitch. Pitch is the number of times a wave oscillates per second, also known as its frequency. This is a pitch, and it’s the foundation of the musical scales we learn in primary school. The second aspect our brains use is timbre, which is what makes a sound specific, like what makes a violin sound like a violin. Lastly, we have tempo; if you take sound sources and introduce breaks in between them, we perceive that as rhythm. Combining these elements creates a simple piece of music.
00:05:00.479 Despite its simplicity, this piece already has all the components in place. Back in the day, music was always live, often around a campfire where everyone joined in to sing together. However, this concept has been somewhat lost as we moved everything into recordings.
00:05:44.940 The process really began in the late 19th century with Thomas Edison, a name you've probably heard of. He invented many things, one of which was the phonograph. You would shout loudly into that hole, and a tiny stylus would vibrate, making an etching on a cylinder.
00:06:15.720 Unfortunately, this phonograph didn’t catch on because Edison patented it, which made it exclusive to him, and others didn't want to pay him a license fee. A man named Berliner came up with the concept of the disc phonograph, which is how we ended up with records. If Edison hadn't patented his machine, we might have been using cylinders for 100 years instead of flat discs.
00:06:41.700 As we progressed, we began to develop technologies, especially during and after the Second World War. These advancements led to the creation of high-quality microphones and multi-track recordings. For example, consider the Beatles, who used multiple microphones and mixing desks.
00:07:13.620 Then we move into the 80s when everything became digital with the advent of CDs. For the younger people in the audience, CDs are something we used to play music from. Eventually, everything has moved into computers, and this brings me to the point of my talk today. Now, people use software to make music.
00:08:07.920 I will take a few components out of a program called Ableton Live and recreate them in Ruby. To do that, we first need to understand what digital audio actually is: many samples in a row. If you look at the waveform, every little point represents where a sample is being taken. Essentially, digital audio measures the intensity of a waveform at given times, thousands of times per second.
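To make "measuring a waveform thousands of times per second" concrete, here is a minimal sketch. It is not the code from the talk: `sample_signal` is a hypothetical helper, the default rate of 44,100 samples per second (the CD rate) is an assumption, and samples are assumed to be floats between -1.0 and 1.0.

```ruby
# Measure a continuous signal at evenly spaced points in time.
# The block receives the time in seconds and returns the signal's value.
# (Hypothetical helper; the sample rate default is an assumption.)
def sample_signal(duration, sample_rate: 44_100)
  count = (duration * sample_rate).round
  Array.new(count) { |i| yield(i.to_f / sample_rate) }
end

# A 440 Hz sine wave, measured for one hundredth of a second.
samples = sample_signal(0.01) { |t| Math.sin(2 * Math::PI * 440 * t) }
puts samples.length
puts samples.first(3).inspect
```

Each entry in the resulting array is one sample: the height of the wave at one instant.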
00:08:57.060 When we recreate a waveform from these samples, the result is so close to the original that humans typically cannot hear the difference. These numeric samples can be positive or negative as the waveform oscillates around zero. The simplest format for audio is the WAV format. I'm using a Ruby gem to read WAV files, process them, and then write them back to disk, allowing us to manipulate them in Ruby.
00:10:04.620 This requires some helper code to read the WAV file and create a large array containing all the audio numbers. Once we have this big array, we can manipulate it. This could look like a second of audio; there are many numbers involved, and it's too much to grasp just by reading. Hence, we’ll create some visualizations using a gem called ChunkyPNG to generate images.
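The talk relies on a gem for this step. As a rough illustration of what such helper code does underneath, here is a standard-library-only sketch. It assumes the simplest possible file: mono, 16-bit PCM, with a plain 44-byte header. `encode_wav` and `decode_wav` are hypothetical names, and real-world WAV files can contain extra chunks that this sketch does not handle.

```ruby
# Pack float samples (-1.0..1.0) into a minimal mono 16-bit PCM WAV string.
def encode_wav(samples, sample_rate: 44_100)
  data = samples.map { |s| (s.clamp(-1.0, 1.0) * 32_767).round }.pack("s<*")
  header = [
    "RIFF", 36 + data.bytesize, "WAVE",
    "fmt ", 16, 1, 1, sample_rate, sample_rate * 2, 2, 16,
    "data", data.bytesize
  ].pack("a4Va4a4VvvVVvva4V")
  header + data
end

# Turn the bytes back into one big array of floats.
# Assumes the 44-byte header written by encode_wav above.
def decode_wav(bytes)
  magic = bytes.byteslice(0, 4) + bytes.byteslice(8, 4)
  raise ArgumentError, "not a RIFF/WAVE file" unless magic == "RIFFWAVE"

  bytes.byteslice(44, bytes.bytesize - 44).unpack("s<*").map { |n| n / 32_768.0 }
end

wav = encode_wav([0.0, 0.5, -0.5, 1.0])
p decode_wav(wav)
```

With files on disk, the same helpers would be used as `decode_wav(File.binread(path))` and `File.binwrite(path, encode_wav(samples))`.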
00:11:07.860 As we loop through all the samples, we draw a dot on a PNG image depending on whether it’s a positive or negative number. The images we create will represent real-world audio. For instance, here is an image of a noise made by a car passing by. The sine wave, on the other hand, appears more uniform, since it comes from a mathematical function. To analyze audio more thoroughly, we use methods from Ruby's Enumerable module to group the samples into manageable slices, which gives us clearer images.
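The slicing idea can be sketched without any image library. The example below swaps the PNG output for plain text bars so it runs with the standard library alone; `peaks` and `render` are hypothetical helpers, and samples are assumed to be floats between -1.0 and 1.0.

```ruby
# Reduce the samples to one peak value per slice, so that a long
# recording fits into a fixed number of rows.
def peaks(samples, columns)
  slice_size = [(samples.length / columns.to_f).ceil, 1].max
  samples.each_slice(slice_size).map { |slice| slice.map(&:abs).max }
end

# Draw each peak as a bar of '#' characters (a text stand-in for an image).
def render(samples, columns: 8, width: 20)
  peaks(samples, columns).map { |peak| "#" * (peak * width).round }
end

# A sine wave that fades in, so the bars grow over time.
samples = Array.new(800) { |i| Math.sin(2 * Math::PI * i / 100) * (i / 800.0) }
puts render(samples)
```

Taking the largest absolute value per slice keeps the outline of the waveform while discarding the detail that makes the raw numbers unreadable.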
00:12:26.760 With that foundation, we can begin manipulating audio. The first thing we’ll work on is making sounds louder, starting with a little drum loop. By comparing visual representations of the audio, we can see that raising peaks correlates to louder sound. To achieve this, we can process the samples by multiplying them by a ratio, writing the altered data back to disk for a louder output. However, if the multiplication exceeds the range, we encounter clipping, leading to distortion, which is evident in the sound.
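A minimal sketch of this step, assuming samples are floats in the range -1.0 to 1.0 (`amplify` is a hypothetical name, not the talk's code):

```ruby
# Multiply every sample by a ratio. Values pushed past the legal range
# are cut off at the limit, which is exactly what clipping is.
def amplify(samples, ratio, limit: 1.0)
  samples.map { |s| (s * ratio).clamp(-limit, limit) }
end

quiet = [0.1, -0.4, 0.6, -0.2]
p amplify(quiet, 1.5) # every value still fits in the range
p amplify(quiet, 3.0) # -0.4 and 0.6 hit the limit and are flattened
```

Once a peak is flattened, the original shape of the wave is lost, which is why the result sounds distorted rather than just louder.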
00:13:40.860 This distortion is what audio engineers refer to when talking about clipping. Next, let's discuss mixing. Mixing involves combining multiple sound sources from separate channels into a more complex waveform. We have bass, drums, and piano tracks of the same length starting at the same time, which we can mix in Ruby by summing their data.
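Summing tracks can be sketched in a couple of lines. This assumes, as in the talk, that all tracks have the same length and start at the same time; `mix` is a hypothetical helper and the sample values are made up for illustration.

```ruby
# Add the tracks together sample by sample.
# Array#transpose raises an IndexError if the tracks differ in length.
def mix(*tracks)
  tracks.transpose.map(&:sum)
end

bass   = [0.25, 0.25, -0.25, -0.25]
drums  = [0.5, 0.0, 0.5, 0.0]
piano  = [0.125, -0.125, 0.125, -0.125]
p mix(bass, drums, piano)
```

Because the values are added, a mix of loud tracks can exceed the legal range, so in practice the sum is often scaled down afterwards.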
00:15:02.279 This is how we create mixed sounds in Ruby, which results in a combination of sample tracks. Thus far, it seems to work nearly effortlessly, highlighting the magic involved in audio manipulation, as it doesn't have to be overly complicated.
00:16:09.360 Now let's delve into compression, a technique that makes sounds louder by controlling their peaks. Compression has been used since the analog era: you pick a threshold, identify the peaks above it, and reduce their volume, so that amplifying the entire signal afterwards does not cause distortion.
00:16:57.700 When applying compression in Ruby, we identify samples above a threshold, reducing their volume before amplifying the overall sound again. This two-step process enables us to achieve a compressed sound, causing the peaks to blend better with the rest of the audio.
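The two steps can be sketched as follows. This is an illustration rather than the compressor from the talk: the `threshold`, `ratio`, and `makeup` parameters and their defaults are assumptions, and the gain reduction is applied instantly to each sample, with no attack or release time.

```ruby
# Step 1: shrink only the part of each sample that rises above the threshold.
# Step 2: multiply the whole signal by a make-up gain.
def compress(samples, threshold: 0.5, ratio: 4.0, makeup: 1.5)
  samples.map do |s|
    level = s.abs
    level = threshold + (level - threshold) / ratio if level > threshold
    (s.negative? ? -level : level) * makeup
  end
end

p compress([1.0, 0.25, -1.0])
```

With these defaults a full-scale peak of 1.0 ends up slightly lower than before, while a quiet sample of 0.25 becomes 1.5 times louder, so the difference between loud and quiet parts shrinks.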
00:17:54.660 At this stage, I believe I’ve made a simple compressor in Ruby. It may not be perfect, but here’s what it sounds like after applying the compressing method. You can hear the initial hit is clear, but there’s an extended ringing on the tail, which could become a trendy sound in the future.
00:19:20.480 Next, we’re exploring sound generation techniques. Historically, analog synthesizers used electrical circuits to generate waveforms. Today, we can recreate similar effects using Ruby, focusing on four primary approaches.
00:20:00.760 The first method is to create noise using Ruby’s random function within a certain range. This randomness can be visualized and utilized in various ways. For example, by applying filters, we can shape the noise into something more controlled, like a snare drum.
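A minimal sketch of a noise generator, assuming float samples and a default rate of 44,100 samples per second. `noise` is a hypothetical helper, and the optional `random:` argument only exists so that results can be reproduced with a seed.

```ruby
# Fill an array with random values between -amplitude and +amplitude.
def noise(duration, sample_rate: 44_100, amplitude: 1.0, random: Random.new)
  count = (duration * sample_rate).round
  Array.new(count) { random.rand(-amplitude..amplitude) }
end

hiss = noise(0.1)
puts hiss.length
puts hiss.first(3).inspect
```

Filtering the noise into something like a snare drum is a separate step that this sketch does not attempt.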
00:21:27.540 The second method involves generating a square wave, which can be achieved by creating an oscillator in Ruby. This oscillator produces alternating high and low values, which you can hear when combined with other samples.
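A square wave oscillator might look like the sketch below. The function name, the default amplitude of 0.5, and the sample rate are assumptions for illustration.

```ruby
# Alternate between a high and a low value: high for the first half
# of every cycle, low for the second half.
def square_wave(frequency, duration, sample_rate: 44_100, amplitude: 0.5)
  count = (duration * sample_rate).round
  Array.new(count) do |i|
    phase = (i * frequency.to_f / sample_rate) % 1.0 # position in the cycle, 0...1
    phase < 0.5 ? amplitude : -amplitude
  end
end

# One cycle at a tiny sample rate, to make the pattern visible.
p square_wave(1, 1, sample_rate: 8)
```

At 440 Hz and the default rate, one cycle spans roughly 100 samples, so the same high/low pattern repeats 440 times per second.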
00:22:26.820 The sine wave is another essential element in sound generation. By creating a sine wave using the principles of trigonometry, we yield waves that sound pure and distinct.
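The trigonometry involved is a single call to `Math.sin`. A minimal sketch, with the function name and defaults chosen for illustration:

```ruby
# One full cycle of a sine wave spans 2 * PI radians, so the angle at
# sample i is 2 * PI * frequency * (i / sample_rate).
def sine_wave(frequency, duration, sample_rate: 44_100, amplitude: 0.5)
  count = (duration * sample_rate).round
  Array.new(count) do |i|
    amplitude * Math.sin(2 * Math::PI * frequency * i / sample_rate)
  end
end

tone = sine_wave(440, 1.0) # one second of the note A above middle C
puts tone.length
```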
00:23:15.780 Combining multiple oscillators allows us to create chords, which consist of sounds at different pitches that harmonize together. In Ruby, mixing these oscillators gives us a richer sound, just as playing multiple strings on a guitar would.
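As a sketch of mixing oscillators into a chord, the example below generates three sine waves and averages them. The frequencies are the approximate equal-temperament pitches of an A major triad; the helper names are hypothetical, and dividing by the number of voices is one simple way to keep the sum within range.

```ruby
# A plain sine oscillator at full amplitude.
def sine(frequency, duration, sample_rate)
  count = (duration * sample_rate).round
  Array.new(count) { |i| Math.sin(2 * Math::PI * frequency * i / sample_rate) }
end

# Mix one oscillator per pitch, averaging so the result stays in -1.0..1.0.
def chord(frequencies, duration, sample_rate: 44_100)
  voices = frequencies.map { |f| sine(f, duration, sample_rate) }
  voices.transpose.map { |column| column.sum / voices.length }
end

a_major = chord([440.0, 554.37, 659.25], 0.5) # A4, C#5, E5
puts a_major.length
```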
00:24:13.920 At this point, we've covered the essentials. If we apply the Fourier Transform concept, we realize that all real-world sounds can be synthesized by combining multiple sine waves. This is valuable in data analysis and signal processing.
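One way to see this idea in code is to build a square wave out of nothing but sine waves. The sketch below uses the standard Fourier series of a square wave, which sums the odd harmonics 1, 3, 5, ... with each one divided by its harmonic number. `additive_square` and the default of 25 harmonics are assumptions for illustration.

```ruby
# Approximate a square wave by adding sine waves at odd multiples of the
# base frequency. More harmonics give a closer approximation.
def additive_square(frequency, duration, harmonics: 25, sample_rate: 44_100)
  count = (duration * sample_rate).round
  Array.new(count) do |i|
    t = i.to_f / sample_rate
    series = (1..harmonics).sum do |k|
      n = 2 * k - 1 # odd harmonics only: 1, 3, 5, ...
      Math.sin(2 * Math::PI * n * frequency * t) / n
    end
    4 / Math::PI * series
  end
end

wave = additive_square(1, 1, sample_rate: 100)
puts wave[25].round(2) # middle of the high half of the cycle
puts wave[75].round(2) # middle of the low half of the cycle
```

With a single harmonic the result is a pure sine wave; adding more harmonics steepens the edges until the shape approaches a square.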
00:25:46.680 To add timbre, we can combine both sine and square oscillators, leading to an interesting mix. Even though the waveforms might not be aesthetically pleasing right now, creating these sounds using simple Ruby code is quite achievable.
00:26:17.400 To demonstrate further, I created a cover of what I believe is the best song ever made, showcasing the potential to create music with this approach. Thank you for joining my talk!
00:28:07.620 Please stay with us, and I’ll be happy to take your questions while we move on to the next segment.
00:28:19.020 The first question is about side chaining kicks to bass in Ruby. Sidechaining involves using a kick drum signal to control compression. If you have further questions, feel free to find me later.
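A very crude sketch of the idea, not something shown in the talk: whenever the kick is loud, the bass is turned down. It checks the kick's peak over short windows instead of tracking a smooth envelope, and the `threshold`, `reduction`, and `window` parameters are all assumptions.

```ruby
# Turn the bass down in every window where the kick's peak exceeds the
# threshold. A real sidechain compressor would fade the gain in and out.
def sidechain(bass, kick, threshold: 0.5, reduction: 0.25, window: 441)
  bass_slices = bass.each_slice(window).to_a
  kick_slices = kick.each_slice(window).to_a
  bass_slices.zip(kick_slices).flat_map do |bass_slice, kick_slice|
    loud = !kick_slice.nil? && kick_slice.map(&:abs).max > threshold
    bass_slice.map { |s| loud ? s * reduction : s }
  end
end

bass = [1.0, 1.0, 1.0, 1.0]
kick = [0.0, 0.9, 0.0, 0.0]
p sidechain(bass, kick, window: 2)
```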
00:29:08.100 There are common misconceptions about music, like the notion that musicians use esoteric math, while in reality, it’s quite simple. It annoys me when people complicate a straightforward process.
00:30:07.140 Another question is if it's possible to play with vocals in Ruby. Yes, the code can be applied to various audio forms. Regarding overtones, they are produced by vibrations in real-world mediums, and you can synthesize them in Ruby, although it can get complex.
00:30:49.380 Another question was about applying attack and release times to compression. These concepts involve monitoring whether the signal stays above the threshold over time. I left that out of my example, but it can certainly be added.
00:31:33.360 Lastly, someone mentioned Sonic Pi, an excellent tool for creating music in Ruby. I recommend trying it out.