Keynote: Consequences of an Insightful Algorithm

by Carina C. Zona

The keynote presentation "Consequences of an Insightful Algorithm" by Carina C. Zona at RubyConf 2015 emphasizes the ethical responsibilities of coders in the context of algorithm development and data use. Zona discusses the implications of utilizing algorithms to extract precise insights about individuals, raising crucial questions about consent and the potential unintended outcomes of such data-driven practices.

Key points discussed include:
- Definition of Algorithms: Zona describes algorithms as step-by-step processes designed to predict outcomes, applicable in both computer science and everyday scenarios.
- Consequences of Algorithms: The talk highlights that algorithms impose consequences on people, often without explanation to, or consent from, the individuals affected.
- Ethical Considerations: Zona argues for the necessity of balancing human needs with business specifications, as uncritical programming can lead to painful real-world effects.
- Case Studies: Zona provides several notable examples:
  - Target's Pregnancy Ads: A father's outrage over targeted ads illustrates the moral implications of data misuse when a retailer inferred personal information without explicit consent.
  - Shutterfly's Congratulatory Emails: Automated messages congratulating customers on a new baby caused distress for recipients coping with infertility or loss.
  - Facebook's Year in Review: Eric Meyer's experience of algorithmic insensitivity when the feature confronted him with painful memories of his late daughter.
- Bias in Algorithms: Zona covers instances where algorithms have perpetuated bias, such as racial profiling in Google ads, as well as abusive uses of data by companies like Uber.
- Call to Action: Zona urges coders to remain humble, recognize failure modes within their algorithms, and audit outcomes for biases. She emphasizes the need for transparency and accountability in algorithmic processes to create fairer, more ethical technology.

- Conclusion: The ultimate takeaway from Zona's talk is a call for coders to advocate for users' best interests, ensuring that their work consciously considers the implications on individuals while aiming to avoid past mistakes.

00:00:14.530 Alright, good morning to the 800 people who did that! This talk is called "Consequences of an Insightful Algorithm."

00:00:20.510 This talk serves as a toolkit for empathetic coding, and we will delve into some very specific issues and examples of uncritical programming, as well as the painful results that can arise from actions that are intended to be benign.

00:00:34.400 I want to start off with a content warning, as I will be discussing several sensitive topics, including grief, PTSD, depression, miscarriage, infertility, sexual history, consent, stalking, racial profiling, and the Holocaust. While these topics are not the main focus, they will come up as examples throughout the talk.

00:00:53.739 If anyone feels the sudden urge for coffee, please do not hesitate. I won't be offended. We've got about 10 minutes before we dive deeper into the content.

00:01:06.439 Algorithms impose consequences on people all the time, and we can extract remarkably precise insights about individuals. This raises an important question: do we have the right to know information that someone did not consent to share, even if they willingly provided the data that leads us to that conclusion? Additionally, how do we mitigate unintended consequences that may arise from this knowledge?

00:01:31.429 To begin with, let's ask a basic question: What is an algorithm? It is a step-by-step set of operations designed to predictably arrive at an outcome. This is a very generic definition. Usually, when we talk about algorithms, we refer to those in computer science or mathematics—patterns of instructions articulated in code or mathematical formulas.

00:01:56.539 However, we can also think of algorithms in everyday life; they can take the form of recipes, directions, or even crochet patterns. Currently, deep learning is the new trend in data mining; it is essentially a set of algorithms for fast, trainable artificial neural networks. This branch of machine learning has been around since the early 1980s but remained mostly locked in academia because of scalability issues.

00:02:17.690 Recently, breakthroughs have made it possible to apply deep learning in real production environments, allowing us to extract meaningful insights from big data. In essence, deep learning relies on an automatic discovery of patterns within a training data set. These patterns help draw intuitions about future inputs.

00:02:44.660 In terms of process, the inputs, which we call the training data, can include words, images, sounds, and concepts, and a training set can total several terabytes. Execution involves running a series of functions repeatedly over this data; each iteration is referred to as a layer. With each layer, precision improves, and the model can evaluate thousands of factors without anyone needing to label or categorize the data.
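The idea of running the same kind of function repeatedly, one layer after another, can be sketched in plain Python. This is a minimal, hypothetical illustration rather than how any production system works: the layer sizes, the random weights, and the ReLU nonlinearity are all assumptions, and it shows only the forward pass. A real deep learning framework would also train the weights on the training data (typically by backpropagation).

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu(x: float) -> float:
    """A common nonlinearity: negative values become zero."""
    return max(0.0, x)


def layer(inputs: Vector, weights: Matrix, biases: Vector) -> Vector:
    """One layer: each output unit takes a weighted sum of all inputs, then applies relu."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Untrained (random) weights; training would adjust these to fit the data."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs: Vector, layers: List[Tuple[Matrix, Vector]]) -> Vector:
    """Apply the same kind of function repeatedly; each application is one layer."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations


rng = random.Random(0)
sizes = [4, 8, 8, 2]  # hypothetical: 4 input features, two hidden layers, 2 outputs
network = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
print(forward([0.5, -1.2, 3.0, 0.0], network))
```

Even in this toy, the intermediate values are just numbers with no human-readable meaning, which is a small taste of the black-box problem the talk goes on to describe.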

00:03:25.850 Deep learning rests on a black-box premise: it drills down to tens or hundreds of thousands of factors that turn out to have predictive value, without our knowing how those factors were arrived at. This technology currently drives significant advances in many areas, including medical diagnostics, pharmaceuticals, predictive text, voice commands with services like Siri, fraud detection, sentiment analysis, language translation, and even self-driving cars.

00:03:46.310 Today, we will specifically discuss concrete examples, including ad targeting, behavioral predictions, image classification, and facial recognition. To illustrate some of the concepts we will discuss, I'd like to start with a fun yet simple example: an AI that teaches itself to play Super Mario World. This AI has no prior knowledge of the game or its rules and begins simply by manipulating numbers.

00:04:51.230 Through experimentation, it begins to identify patterns and use them to predict outcomes, eventually achieving the ability to play the game. Now, let's play a game ourselves. It might look a bit like bingo, but it's called "Data Mining Fail." Insightful algorithms can lead us into pitfalls, and through case studies, we can explore some of the examples shown here on the board.

00:05:13.650 For the first case, let's consider Target in the retail sector. The second trimester of pregnancy is often referred to as the 'Holy Grail' in retail, because it is one of the few times when people’s shopping habits change significantly. All the spending habits we have, including store and brand loyalty, become notably different during this time.

00:05:40.110 This period represents a great opportunity for retailers to create lifelong customers. Previously, Target's data typically identified pregnant customers only in their third trimester. One day, some marketers posed a question: could they build an algorithm to figure out whether a customer is pregnant, even if she did not want them to know?

00:06:05.569 After some experimentation, they came up with a remarkably reliable algorithm and used it to send maternity and infant care advertisements. Then a father came into a Target store, livid, demanding to know how they dared send pregnancy advertisements to his teenage daughter. Were they trying to encourage her to get pregnant?

00:06:22.169 The store manager, although not responsible for national mailers, apologized. The father returned the next day to apologize to Target because he found out his daughter was, in fact, pregnant.

00:06:40.010 This incident taught Target a lesson in manipulation; they decided to bank on deception. Instead of being transparent about their ad targeting methods, they opted to couch those pregnancy advertisements among unrelated products. For instance, a customer might see a diaper ad alongside cologne, keeping the intentions hidden.

00:07:06.189 This was great because as long as the pregnant woman felt she hadn't been spied on, it worked.

00:07:13.360 A similar situation involved Shutterfly, which sent out congratulatory emails to new parents, urging them to send thank-you notes for their birth announcements. Not everyone who received these emails had just had a baby, and while the message was merely amusing to some, it caused real distress for others.

00:07:30.120 One recipient sarcastically thanked Shutterfly for congratulating her on her 'bundle of joy' when she was in fact coping with infertility. Another had lost a baby in November whose due date fell that very week; the email felt like hitting a wall all over again. Shutterfly responded that the intent had been to target customers who had recently given birth, which was hardly an apology or an explanation.

00:08:05.940 A few months ago, Mark Zuckerberg announced that he and his wife were expecting, and he also shared that they had been through several miscarriages. He spoke about how they had begun imagining who their future children would be and dreaming about their lives, only to have those hopes dashed. It's a lonely experience.

00:08:31.880 Facebook's 'Year in Review' feature, which had been around for a few years, was primarily opt-in, letting users curate their own memories. Last year, however, Facebook algorithmically filled users' newsfeeds with the "best" moments from their past year, without considering that life circumstances change. Joyous memories can become painful reminders.

00:09:04.790 Eric Meyer coined the term "inadvertent algorithmic cruelty" to describe situations where code works well in most cases but fails to consider edge cases. Eric experienced this firsthand when his year-in-review featured memories of his deceased daughter, repeatedly rotating through celebratory backgrounds, making it feel like a perpetual celebration of her death.

00:09:34.299 Eric calls on us to increase our awareness of failure modes, edge cases, and worst-case scenarios. I hope to further that awareness here today and encourage you to carry it forward to others. With this in mind, my first recommendation is to be humble. We cannot intuitively understand emotions or personal subjectivity—at least not yet.

00:10:07.060 Eric's blog post garnered significant attention within the industry and mainstream media, raising questions about how to avoid blindsiding someone with painful memories. Facebook took these concerns to heart and addressed them three months later by introducing a feature called 'On This Day', which provides daily reminders of trivial past events.

00:10:39.600 However, this feature can still cause distress in unintended ways. It regularly resurfaces moments that were joyous at the time but may since have become painful. For example, 'On This Day' might highlight memories from high school, which can be triggering for people who had negative experiences during that time. We, as programmers, must learn from our mistakes.

00:11:06.890 We need to learn from our own experiences and from others', recognizing that harmful and harmless consequences are not equivalent. Early on, Fitbit ran into trouble when users' logs of sexual activity turned out to be publicly visible, which illustrated the need for appropriate data privacy.

00:11:52.690 This example touches on the generic definition of algorithms mentioned earlier—step by step toward a predictable outcome. Fitbit's algorithm treated all data the same, leading to public exposure without warning. Not all data is equally sensitive, and we cannot simply spill everything publicly.
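The alternative to treating all data the same can be sketched as sensitivity-aware defaults: sensitive categories start private, and an explicit user choice always wins. This is a minimal, hypothetical sketch; the category names, the three visibility levels, and the `LogEntry` type are illustrative assumptions, not Fitbit's actual data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Visibility(Enum):
    PRIVATE = "private"
    FRIENDS = "friends"
    PUBLIC = "public"


# Hypothetical list; a real product would need a reviewed, much longer one.
SENSITIVE_CATEGORIES = {"sexual_activity", "medication", "weight", "mood"}


@dataclass
class LogEntry:
    category: str
    value: float
    visibility: Optional[Visibility] = None  # None means the user never chose


def default_visibility(category: str) -> Visibility:
    """Sensitive data starts private; other data gets a conservative non-public default."""
    if category in SENSITIVE_CATEGORIES:
        return Visibility.PRIVATE
    return Visibility.FRIENDS


def effective_visibility(entry: LogEntry) -> Visibility:
    """An explicit user choice always wins; otherwise fall back to the category default."""
    if entry.visibility is not None:
        return entry.visibility
    return default_visibility(entry.category)


def publicly_visible(entries: List[LogEntry]) -> List[LogEntry]:
    """Only entries the user deliberately made public are ever exposed publicly."""
    return [e for e in entries if effective_visibility(e) is Visibility.PUBLIC]


entries = [
    LogEntry("steps", 9000, Visibility.PUBLIC),  # user opted in explicitly
    LogEntry("sexual_activity", 1),             # no choice made: stays private
]
print([e.category for e in publicly_visible(entries)])  # ['steps']
```

The design choice is that nothing becomes public by default; publishing is always an explicit act by the person the data describes.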

00:12:26.560 At Uber, access to an internal operations tool called 'God View' was not limited to administrators, and some employees abused it to track passengers for non-operational purposes. This was a significant lapse: an abuse of algorithmic tools rather than mere negligence.

00:12:55.600 In another instance, OkCupid's research team blogged about trends they observed in aggregate data, with the aim of improving the user experience. Uber, by contrast, mined its customers' data for purposes that had nothing to do with serving them, reinforcing bias instead.

00:13:31.799 Google AdWords also displayed bias: a Harvard study found that searches for names commonly associated with African Americans were 25% more likely to bring up ads implying an arrest record. This example shows how algorithms can reinforce collective biases.

00:14:17.870 Joanne McNeil has described "accidental algorithmic run-ins," which occur when a recommendation system decides that two people are similar or connected and surfaces them to each other. The resulting situations are often difficult for the people involved to manage.

00:14:52.150 If you're stalked by someone, the recommendation algorithm may connect you more closely to that individual, increasing your exposure to reminders of your past trauma. This emphasizes the need for control over algorithmic recommendations.
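One concrete form of that control is a recommender that honors a user's block list. The sketch below is hypothetical and not how any named platform works: it ranks friends-of-friends by mutual connections, and it skips anyone the user has blocked or who has blocked the user, so neither person is suggested to the other. The graph shape, the `blocked` mapping, and the names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def suggest_connections(
    user: str,
    graph: Dict[str, Set[str]],
    blocked: Dict[str, Set[str]],
    limit: int = 5,
) -> List[str]:
    """Rank friends-of-friends by mutual connections, respecting blocks in both directions."""
    friends = graph.get(user, set())
    hidden = blocked.get(user, set())
    counts: Counter = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            if candidate == user or candidate in friends:
                continue
            # Honor blocks both ways so neither person is surfaced to the other.
            if candidate in hidden or user in blocked.get(candidate, set()):
                continue
            counts[candidate] += 1
    return [name for name, _ in counts.most_common(limit)]


# Hypothetical social graph: dee shares two mutual friends with ana, eve shares one.
graph = {
    "ana": {"ben", "cam"},
    "ben": {"ana", "dee", "eve"},
    "cam": {"ana", "dee"},
    "dee": {"ben", "cam"},
    "eve": {"ben"},
}
print(suggest_connections("ana", graph, {}))                # ['dee', 'eve']
print(suggest_connections("ana", graph, {"ana": {"eve"}}))  # ['dee']
```

A block list only helps once someone knows to use it; the broader point in the talk is that people need control over how recommendation systems connect them to others.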

00:15:40.090 Facial recognition technology has faced scrutiny for misreading context and drawing inaccurate inferences from images. Products like iPhoto and Microsoft's age estimator use deep learning to analyze images, often producing false positives.

00:16:14.900 Not all algorithmic failures are innocuous. For example, Flickr's auto-tagging labeled a picture of the Dachau concentration camp as a children's play area and tagged photos of people as animals. This demonstrates the problems that arise when human understanding is minimized in algorithmic interpretation.

00:17:00.200 The same pattern occurred with Google Photos, which infamously labeled Black people as gorillas. That failure exposed a legacy problem: imaging technology has long been calibrated against a narrow range of skin tones rather than against society as a whole.

00:17:41.030 In summary, the problems we face are not unique to any one technology or company; we are all susceptible to the same pitfalls and need to remain vigilant about bias and discrimination.

00:18:17.210 The consumer lending firm Affirm uses technology that could further entrench privilege: it evaluates applicants based on their social media presence and behavioral factors, building in biases against people without traditional advantages.

00:19:01.110 They gather data with the intent of predicting creditworthiness, yet factors such as an applicant's typing speed while going through the terms of service can lead to misjudgments driven by influences unrelated to credit. This is a challenging problem that must be addressed to ensure equitable outcomes in lending.

00:19:43.440 Moreover, using social media relationships as indicators of creditworthiness is an alarming turn; deeply invasive algorithms can lurk beneath a friendly facade. Facebook's efforts to model friendship networks as lending criteria pose grave ethical questions.

00:20:27.220 The CEO's insistence that the model avoids human assumptions only adds another layer of bias that requires scrutiny. Data itself is never objective; every action taken on the data collected must account for its context and interpretation.

00:21:06.030 It’s crucial to audit outcomes and maintain transparency, employing techniques similar to those used in discrimination audits in housing and employment to identify hidden biases in algorithms.

00:21:48.509 By submitting applications that are identical except for a single attribute, such as race, gender, or income, we can detect discriminatory practices without having to inspect every underlying algorithm. Such measures cultivate accountability and ensure ethical scrutiny.
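A paired audit of this kind can be sketched in a few lines: copy each base application, change only the attribute under test, submit every copy to the black-box decision function, and compare approval rates. This is a minimal, hypothetical sketch. `toy_model` is an invented stand-in for a lending model with a deliberately planted penalty so the audit has something to find, and the applicant data is made up. The `impact_ratio` helper (lowest rate divided by highest) is one common way to summarize the gap; US employment guidelines, for example, use a four-fifths threshold as a rough flag.

```python
from typing import Any, Callable, Dict, List

Application = Dict[str, Any]


def paired_audit(
    decide: Callable[[Application], bool],
    base_applications: List[Application],
    attribute: str,
    values: List[Any],
) -> Dict[Any, float]:
    """Submit copies of each application that differ only in `attribute`;
    return the approval rate observed for each value."""
    approvals = {v: 0 for v in values}
    for base in base_applications:
        for v in values:
            probe = dict(base)    # identical application...
            probe[attribute] = v  # ...except for the one attribute under test
            if decide(probe):
                approvals[v] += 1
    n = len(base_applications)
    return {v: approvals[v] / n for v in values}


def impact_ratio(rates: Dict[Any, float]) -> float:
    """Lowest approval rate divided by the highest; 1.0 means parity."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


def toy_model(app: Application) -> bool:
    """Hypothetical black box under audit, with a planted penalty for illustration."""
    score = app["income"] / 1000
    if app["gender"] == "F":  # the hidden bias the audit should surface
        score -= 15
    return score >= 50


bases = [{"income": i, "gender": None} for i in (40_000, 55_000, 60_000, 70_000, 80_000)]
rates = paired_audit(toy_model, bases, "gender", ["F", "M"])
print(rates)                # {'F': 0.4, 'M': 0.8}
print(impact_ratio(rates))  # 0.5
```

The auditor never reads the model's code; the disparity shows up purely from outcomes, which is what makes this technique useful for proprietary systems.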

00:22:35.190 We must also hold ourselves accountable for data and algorithmic transparency. Proprietary technology may feel like a competitive advantage, but transparency yields shared insight and better products in the long run.

00:23:19.890 A commitment to transparency builds trust and understanding, resulting in better products. As Amy Hoy has pointed out, when a product can have significant consequences for its users, acting responsibly becomes urgent.

00:24:00.340 In conclusion, creating algorithms responsibly requires critical thinking, shared ownership over project outcomes, and a decision to prioritize users’ perspectives. If we are going to write code that infers people’s intimate lives, we must advocate on their behalf.

00:24:45.970 This means resisting the implementation of algorithms that impose consequences without enthusiastic consent from the individuals affected. We must refuse to repeat the mistakes of the past by critically examining the power of our code.

00:25:29.440 In summary, it is our duty as coders to ensure that our work serves the best interests of people—designing algorithms that always keep the implications for users front and center, while remaining faithful to the ethical considerations we uphold in our practice.