EuRuKo 2019

Keynote: The Miseducation of This Machine by Laura Linda Laugwitz

Summary of the Keynote: The Miseducation of This Machine by Laura Linda Laugwitz

In her closing keynote at EuRuKo 2019, Laura Linda Laugwitz explores the intricacies of machine learning (ML) and its relationship with human knowledge and biases. She uses the title "The Miseducation of This Machine" to draw parallels between Lauryn Hill's influential album and the current challenges of understanding and teaching machines.

Key Points:

- Historical Context and Framework: Laugwitz begins by referencing the roots of education's impact on knowledge production, emphasizing the importance of understanding who educates and what truths are produced. This sets the stage for discussing how both humans and machines learn.
- Defining Learning: Laugwitz defines learning in three steps: reproduction, remixing, and reflection.
  - Reproduction: In machine learning, this is the ability to replicate existing knowledge without understanding its context, which leads to biases, as seen in examples like Amazon's hiring tool.
  - Remixing: This step concerns the ability of algorithms to create new outputs from existing data, and it illustrates how machines can misinterpret data when they focus on individual features rather than the whole.
  - Reflection: Highlighting the need for human involvement, she stresses that reflection is necessary to understand the limitations and implications of knowledge produced by machines.
- Challenges in Machine Learning:
  - Laugwitz illustrates the pitfalls of ML through humorous anecdotes and serious cases of biased algorithmic outcomes. She shares an example in which a music recommendation system misjudged a user's preferences after a short period of skewed listening data.
  - The complexity of tasks like identifying hate speech is discussed, showing that ML lacks the nuanced understanding required to interpret language in context.
- Research Insights: Laugwitz shares insights from her research on hate speech analysis, indicating how ML classifiers are often misled by surface patterns in the data and require human oversight for effective outcomes.
  - The methodology blends communication science and computer science, emphasizing the need for definitional clarity in detecting hate speech.
- The Human Role in AI Development: Throughout her talk, Laugwitz advocates for a reflective approach when engaging with machine learning tasks, urging programmers and users to be vigilant about the biases and limitations of their systems.

In her conclusion, Laugwitz asserts that the knowledge produced by machine learning is reflective of the data it is trained on, prompting a call for responsibility among those who design and utilize such systems. A closing quote from Lauryn Hill encapsulates the central message regarding awareness and accountability in the realm of machine learning.

Important Takeaways:

  • Understanding machine learning requires appreciating the roles of data, algorithms, and human reflection.
  • There is a critical need for human oversight to mitigate biases and improve machine understanding.
  • As machines learn from human-generated data, knowing the source and context of that data is vital for ethical and effective AI development.
00:00:06.080 Laura is going to come up now. Fun fact about Laura: she started out studying social and cultural anthropology, and then, as it often happens, she ended up in tech because, well, one needs to pay rent somehow, and rent is getting expensive even in Berlin. So, let's give it up for our final talk of the day! Thank you.
00:01:04.799 Thanks! I haven't even said anything of importance yet. My name is Laura and I've been studying various topics for the past 11 years, and I’m still not done. I've also helped organize Rails Girls Berlin for quite a while. If you’re involved in Rails Girls or similar projects, either as a learner, coach, or organizer, please raise your hand. Awesome! That’s a lot of you and thank you for keeping this community growing—it’s really important. Give yourselves a round of applause!
00:01:50.399 Now, let’s get this ship started. In 1998, Lauryn Hill released her debut album, 'The Miseducation of Lauryn Hill', which has a connection to today’s title. The title references several works that point out how the American educational system has indoctrinated Black communities with white supremacy instead of teaching them about Black history and the Black present. The question of who educates and who produces knowledge is vital—it addresses who holds power over a seemingly universal truth.
00:02:31.840 The Miseducation of Lauryn Hill is still one of my favorite albums today. This title will guide us through the next 30 minutes—maybe just 25 if I forget to breathe between sentences. I want you to keep in mind how not just humans, but also machines gain and produce knowledge. That is, how machines learn. I promise that I will leave out all the math, so you should all be able to follow.
00:03:05.040 So, let's take one more step back in time to 1950, when Alan Turing published a paper. He asked whether machines can think. Defining 'thinking' was really hard for Turing, so he tweaked the question to whether machines can imitate thinking. This is what you might know as the imitation game, where we test whether a human can tell the difference between interacting with another human and interacting with a machine. While it’s difficult to define thinking, defining learning seems a bit easier.
00:03:50.400 For the purpose of this talk, I define learning as a process in three steps. The first step involves reproducing knowledge, similar to learning vocabulary in a new language or programming. The second step is remixing knowledge; this is where we create something new from existing knowledge. And the final step, the most interesting part of acquiring knowledge, is reflection. Here, we learn to understand the limits of our knowledge, discerning what our knowledge can do, cannot do, and what it should do.
00:05:14.320 Now, let's discuss three key elements of machine learning: data, algorithms, and—perhaps surprisingly—humans. Data on its own can reproduce knowledge, and algorithms can remix information, but humans are necessary for the reflection part. So, now you understand what this is all about. Before we dive deeper, I want to give a quick content warning. When we get to the last part of this talk, I’ll be discussing hate speech that includes racist and sexual violence. I will let you know when it’s time to look away.
00:05:59.760 Let’s begin with the reproduction part of machine learning, or, more generally, knowledge. Machine learning is quite adept at reproducing what we as a human collective already know. It can identify spam messages, make music recommendations, or translate from one language to another. However, machines can also learn the wrong things. For example, Caroline Sinders once described how her music recommendations became skewed after she listened intensely to breakup music, which caused her Spotify to suggest Mumford & Sons constantly. Unfortunately, to fix this, she had to delete her account, create a new one, and avoid that band.
00:08:12.240 Beyond these humorous examples, there are more serious cases where reproducing human behavior can be detrimental. Machines often learn biases, as evidenced by cases like Amazon's recruiting tool, which reproduced discriminatory hiring patterns, or by biased software used in the justice system. The artist Hito Steyerl suggests that those who own past data 'colonize the future', meaning that machine learning relies on historical data to predict future outcomes. Consequently, data alone cannot take learning beyond the reproduction level.
00:09:20.240 Now, let’s touch on the remixing part of learning—though I don’t mean scratching DJ style. Before continuing my talk, I want to ascertain if you are human. Please select all the images with a bike; since I have no proper interface for this, I’ll suggest a fun agreement. Everyone knows what jazz hands are, right? Let’s use them! So, when I show you an image that contains a bike, give me your best jazz hands. If not, just stay still. Let’s begin: Does this image contain a bike? How about this one? Did someone bring their bike and hide it in the image? Now, let’s check this last picture; it contains Rotterdam architecture that you should check out.
00:11:22.159 Congratulations! You’re not robots. However, since I was the one asking the questions, we still don’t know if I’m a machine. Let's assume I am. Now that you've classified some bikes for me, I will learn how to classify bikes. First, I simplify the image because I'm not the most resourceful machine. I’ll lower the resolution and drop colors until it’s sufficient for me to identify parts based on basic shapes, like horizontal and vertical lines. Humans can identify these shapes easily, but I, as a machine, need to perform more rigorous comparisons throughout the entire image.
00:12:37.759 What I do is take one shape and compare it to the image step by step, position by position. Wherever I find a match, I record it in a new image built from those comparisons. This extraction process is the core of what’s known as a convolutional neural network. My purpose is to determine which shapes are significant for identifying a bike. Once I’ve completed this, I can decide whether a new image contains a bike or not.
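The comparison step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the tiny `IMAGE`, the `VERTICAL_LINE` shape, and the function name `match_shape` are all hypothetical and not taken from the talk, and a real convolutional neural network learns its shapes from data instead of having them written by hand.

```python
# Hypothetical 5x5 grayscale image (1 = dark pixel, 0 = light pixel)
# that contains a single vertical line in the middle column.
IMAGE = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hypothetical 3x3 "shape" that responds to vertical lines.
VERTICAL_LINE = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]


def match_shape(image, shape):
    """Slide `shape` over `image` and record how strongly each position matches."""
    size = len(shape)
    rows = len(image) - size + 1
    cols = len(image[0]) - size + 1
    feature_map = []
    for top in range(rows):
        row = []
        for left in range(cols):
            # Multiply overlapping pixels and sum them: a high score means
            # the image patch at this position looks like the shape.
            score = sum(
                image[top + i][left + j] * shape[i][j]
                for i in range(size)
                for j in range(size)
            )
            row.append(score)
        feature_map.append(row)
    return feature_map


# The "new image" built from the comparisons.
feature_map = match_shape(IMAGE, VERTICAL_LINE)
for row in feature_map:
    print(row)
```

Each printed row is `[0, 3, 0]`: the shape matches fully only where it sits exactly on the line. The score says nothing about whether that line belongs to a bike, a lamp post, or noise, which is the limitation the talk turns to next.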
00:14:11.360 However, there are issues with this approach. For instance, a study last year set out to trick machine classifiers by creating images that contain the basic shapes of a bike even though the images themselves look nonsensical to humans. So, here’s the learning point: algorithms can do well at reproducing knowledge and extracting shapes. However, they often fail to see the bigger picture, focusing too much on individual elements without recognizing their contextual significance.
00:15:29.200 Now, let’s talk about reflection. Imagine if I asked you to identify hate speech instead of bikes. Identifying hate speech is much more complex, involving nuanced understanding rather than simple object recognition. Currently, I’m assisting in a research project analyzing hateful communication on social media and in German news media concerning migration. Our goal is to find methods and software that can recognize hate speech early and suggest de-escalation strategies. Finding hate speech is significantly harder than identifying bikes.
00:17:05.200 In our project, we blend communication science with computer science to tackle this challenge. When we identify hate speech, we employ what’s known as supervised learning, which uses a labeled dataset that indicates whether statements constitute hate speech. Our data is then divided into training and testing sets. During training, we use convolutional neural networks to recognize patterns similar to hate speech statements and create a classifier based on this understanding.
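As a rough illustration of the data split mentioned above, here is a minimal sketch in plain Python. The placeholder statements, the labels, the 20% test fraction, and the function name `train_test_split` are assumptions made for this example; the talk does not say how the project's data was actually divided.

```python
import random

# Hypothetical labeled statements: (text, label), where 1 = hate speech
# and 0 = not hate speech. In a real project the labels come from human coders.
LABELED = [(f"statement {i}", i % 2) for i in range(10)]


def train_test_split(examples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the examples and split it into training and testing sets."""
    shuffled = list(examples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train_set, test_set = train_test_split(LABELED)
print(len(train_set), "training examples,", len(test_set), "testing examples")
```

The classifier only ever sees `train_set` while it learns. `test_set` is held back so that its predictions can later be checked against labels it has never seen.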
00:18:41.600 However, machine classifiers often yield unsatisfactory results, especially early on. We must tweak parameters until we achieve better accuracy. This means adjusting various elements repeatedly until we are satisfied with our classifier’s predictive capabilities. Nonetheless, it’s crucial to emphasize that machines cannot inherently recognize their failures; there must be a human component scrutinizing the output.
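The tweak-and-measure loop described above might look roughly like the following sketch. It is deliberately simplified and is not the project's method: instead of a neural network it tunes a single decision threshold, and the scores in `SCORED`, the candidate values, and the function names are all made up for illustration.

```python
# Hypothetical held-out data: (score from some model, true label).
# The scores are invented; 1 = hate speech, 0 = not hate speech.
SCORED = [(0.1, 0), (0.3, 0), (0.4, 1), (0.6, 0), (0.7, 1), (0.9, 1)]


def accuracy(threshold, scored):
    """Fraction of examples where `score >= threshold` agrees with the label."""
    correct = sum((score >= threshold) == bool(label) for score, label in scored)
    return correct / len(scored)


def tune_threshold(scored, candidates):
    """Try each candidate threshold and keep the most accurate one."""
    return max(candidates, key=lambda t: accuracy(t, scored))


CANDIDATES = [0.2, 0.35, 0.5, 0.65, 0.8]
best = tune_threshold(SCORED, CANDIDATES)
print("best threshold:", best)
print("accuracy:", accuracy(best, SCORED))
```

With this toy data no candidate classifies every example correctly. The loop can report which setting scores best, but it cannot say why the remaining examples are wrong; that still takes a person reading them.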
00:20:39.200 An example from our research project involved analyzing a massive dataset from Wikipedia and Twitter that was labeled not for hate speech but for toxicity. In one notable case, a phrase was flagged as toxic by our algorithm. The classifier strongly associated swear words with hate speech, yet the flagged phrase was merely someone apologizing. This demonstrates how classic indicators of hate speech can mislead classifiers.
00:22:34.720 Moreover, hate speech can manifest without using defamatory language; subtle rhetorical questions and ironic remarks often escape detection. Words can evolve in meaning, making it challenging for both machines and humans to keep up with what they signify. Unlike machines, humans can adapt their understanding to changing language contexts, which is crucial for understanding hate speech.
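Both failure modes described above, a harmless apology that gets flagged and a hostile statement that slips through, can be reproduced with a deliberately naive sketch. The word list, the two example sentences, and the function name `naive_is_toxic` are hypothetical and much cruder than a trained classifier; they only show why surface features alone are not enough.

```python
# Hypothetical word list; a trained classifier would learn such
# associations from labeled data instead of a hand-written set.
FLAGGED_WORDS = {"damn", "hell"}


def naive_is_toxic(text):
    """Flag a text as toxic if it contains any word from FLAGGED_WORDS."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    return bool(words & FLAGGED_WORDS)


apology = "Damn, I am so sorry, that was my mistake."
exclusion = "People like you do not belong here."

print(naive_is_toxic(apology))    # True: a false positive, it is an apology
print(naive_is_toxic(exclusion))  # False: a false negative, no flagged word
```

Neither mistake is visible from inside the classifier. Someone has to read both sentences in context to notice that the labels are the wrong way round.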
00:24:56.000 In our research, we concentrate on defining hate speech clearly: we treat it as intentionally discriminatory messages. This definition goes beyond mere emotions and is not confined to spoken words; hate can be expressed in seemingly rational ways and must be identified in context. We capture that context in a coding guide that helps human coders understand hate speech significantly better than a machine could.
00:27:07.760 During discussions, coders analyze statements, debate their meanings, and refine the codebook for clarity. This collaborative process allows for valuable exchanges, ensuring the classifiers are well-grounded in the practical applications of our definitions. It is crucial for coders to recognize hate speech through various indicators, such as legitimizing violence and making dehumanizing comparisons.
00:30:06.720 Now, let me conclude with a quote from Lauryn Hill’s album: 'Consequence is no coincidence.' The knowledge produced by machine learning is no coincidence; it reflects the data we choose to feed into the machine, how we build the algorithm, and ultimately, how we question and reflect on the outcomes. If you program machines or use them, I urge you to be the reflective human in this process.