ArrrrCamp 2015

Summarized using AI

Consequences of an Insightful Algorithm

Carina C. Zona • October 01, 2015 • Ghent, Belgium

In the talk "Consequences of an Insightful Algorithm," Carina C. Zona discusses the ethical responsibilities that come with the creation of algorithms, particularly how these algorithms can lead to unintended and often harmful consequences for individuals. Zona emphasizes the importance of consent and the ethical implications of extracting personal insights from data that users have willingly shared.

Key points in the talk include:

- Definition and Understanding of Algorithms: Zona begins by illustrating what algorithms are—step-by-step procedures that arrive at outcomes. She presents deep learning as a contemporary approach in this space, explaining how it extracts insights from large datasets and its application in various fields, including behavioral predictions and image classification.

- Unintended Consequences: She walks through vivid case studies, such as Target's pregnancy-prediction algorithm, which accidentally revealed a teenager's pregnancy to her father, demonstrating how insightful algorithms can lead to painful scenarios. Another is Facebook's Year in Review feature becoming a source of fresh grief for a father mourning his daughter, an example of what Eric Meyer termed "inadvertent algorithmic cruelty."

- The Importance of Context and Sensitivity: The necessity for developers to be sensitive to the diverse emotions and contexts of users is emphasized. For instance, Shutterfly's congratulatory messages to users who may be experiencing infertility or loss exemplify insensitivity. Zona urges programmers to adopt humility and consider the emotional ramifications of their coding decisions.

- Bias and Discrimination: Zona discusses how algorithms can reinforce systemic biases within society, citing a study in which Google AdWords searches for black-identifying names were more likely to return ads implying criminal records. She also examines how data collection and interpretation can reflect existing prejudices if not critically examined.

- Ethical Guidelines and Practices: The talk concludes with recommendations for developers, emphasizing the need for audits on algorithms for bias, the importance of diverse teams to avoid echo chambers, and advocating for transparency in data collection and decision-making processes. Zona calls for a paradigm shift towards integrating empathy into software design, asking developers to be proactive in questioning the ethics of their code.

In conclusion, Zona highlights that as technologists, it is crucial to take our coding responsibilities seriously, ensuring that algorithms serve humanity thoughtfully. The overarching takeaway is that technology professionals must strive to create systems that respect users' rights and the intricacies of human emotion, fostering systematic empathy in coding practices.

Consequences of an Insightful Algorithm
Carina C. Zona • October 01, 2015 • Ghent, Belgium

We have ethical responsibilities when coding. We're able to extract remarkably precise intuitions about an individual. But do we have a right to know what they didn't consent to share, even when they willingly shared the data that leads us there? A major retailer's data-driven marketing accidentally revealed to a teen's family that she was pregnant. Eek. What are our obligations to people who did not expect themselves to be so intimately known without sharing directly? How do we mitigate against unintended outcomes? For instance, a social network algorithm accidentally triggering painful memories for families grieving their child's death. We design software for humans. Balancing human needs and business specs can be tough. It's crucial that we learn how to build in systematic empathy. In this talk, we'll delve into specific examples of uncritical programming, and painful results from using insightful data in ways that were benignly intended. You'll learn ways we can integrate practices for examining how our code might harm individuals. We'll look at how to flip the paradigm, netting consequences that can be better for everyone.

Help us caption & translate this video!

http://amara.org/v/H4Oc/

ArrrrCamp 2015

00:00:08.150 This talk is a toolkit for ethical coding. We'll delve into some specific examples of what that means, and of uncritical programming that can lead to painful results from actions that were intended to be benign.
00:00:14.190 I want to start with a content warning because I will be discussing some sensitive topics, including grief, post-traumatic stress disorder, depression, miscarriage, infertility, racial profiling, Holocaust surveillance, sexual history, and consent. None of these are the main topics, but I will touch on them. If this is something you're uncomfortable with, you have about 5 to 10 minutes before these topics come up, so you have a little time to prepare.
00:00:31.910 Algorithms impose consequences on people all the time. We can extract remarkably precise insights about individuals, but the real question is: do we have a right to know what they haven't consented to share, even when they willingly shared the data that leads us there? We also need to address how we can mitigate against unintended consequences.
00:00:53.120 At a basic level, an algorithm is just a step-by-step set of operations for predictably arriving at an outcome. We usually view algorithms through the lens of computer science or mathematics, as patterns of instructions expressed in code or formulas, but they also appear in everyday life: recipes, maps, and crochet patterns.
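As a rough illustration of that definition (not code from the talk; the recipe and function name are invented for this example), here is the same idea expressed in Python: explicit steps that always lead from the same inputs to the same outcome.

```python
# A recipe written as an algorithm: a fixed sequence of steps that
# predictably turns inputs (water, steeping time) into an outcome.
def brew_tea(water_ml, steep_minutes):
    steps = [
        f"1. Boil {water_ml} ml of water.",
        "2. Pour the water over the tea leaves.",
        f"3. Steep for {steep_minutes} minutes.",
        "4. Remove the leaves and serve.",
    ]
    return steps

for step in brew_tea(250, 3):
    print(step)
```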
00:01:26.040 Deep learning is currently a hot topic in machine learning. Essentially, it consists of algorithms for fast, trainable artificial neural networks. This technology has existed since the 1980s but has mostly been theoretical, locked in academia until more recent advances in the last couple of years. Now, deep learning can realistically extract insights from vast amounts of data in production.
00:01:39.720 It is a specific approach to building and training artificial neural networks, which can be thought of as decision-making black boxes. Inputs are arrays of numbers representing various things (objects, words, or even more abstract concepts), and functions applied repeatedly to these arrays refine the analysis. This process generates outputs that predict useful properties, letting us draw intuitions from future data, as long as that data resembles the training data.
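A minimal sketch of that black box, assuming nothing beyond NumPy (the layer sizes and weights here are arbitrary placeholders; a real network learns its weights from training data rather than using random ones):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers of weights; training would adjust these so that outputs
# match known examples, then generalise to similar future inputs.
W1 = rng.normal(size=(4, 8))   # input array of 4 numbers -> 8 hidden values
W2 = rng.normal(size=(8, 2))   # 8 hidden values -> 2 predicted properties

def forward(x):
    h = np.tanh(x @ W1)        # run a function over the input array...
    return np.tanh(h @ W2)     # ...and again over the result

x = np.array([0.2, -1.0, 0.5, 0.7])   # e.g. an encoded object or word
print(forward(x))                      # two predicted properties
```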
00:02:03.700 This technology drives advances in various domains, including data analysis, data visualization, and natural language processing (NLP). It is even applied in self-driving cars. Today, we will examine practical applications of deep learning like behavioral prediction, image classification, face recognition, and sentiment analysis.
00:02:30.790 If you're intrigued, I recommend a little experimentation with deep learning right in the browser; there are JavaScript demos that let you try out different models without installing anything. You won't get the speed benefits, but it's a low-friction way to explore. Frameworks and libraries are available for most ecosystems; Ruby is one of the few without robust support in this area. I hope that, if this interests you, one of you might kick that off.
00:03:36.060 Deep learning depends on artificial neural nets for the automated discovery of patterns within training data, applying those patterns to make predictions about future inputs. To understand this better, let’s look at a concrete example.
00:03:54.670 This is MarI/O, a program that teaches itself to play Super Mario World. It starts with no understanding of its world, the rules, or even gaming itself; all it does is manipulate numbers and observe the outcomes of its actions. Through continuous self-training over a 24-hour period, it identifies patterns and starts drawing insights about the game. By the end of the training, it can play the level effectively.
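The trial-and-error idea behind that kind of self-training can be sketched in a few lines (a deliberately tiny stand-in, not MarI/O's actual neural-network approach; `play` is a hypothetical scoring function standing in for running the game):

```python
import random

def play(params):
    """Stand-in for 'run the game with these numbers and return a score'."""
    target = [0.3, -0.7, 0.9]          # hypothetical 'ideal' behaviour
    return -sum((p - t) ** 2 for p, t in zip(params, target))

params = [0.0, 0.0, 0.0]               # starts knowing nothing
best = play(params)
for _ in range(10_000):                # the 'self-training' loop
    candidate = [p + random.gauss(0, 0.1) for p in params]
    score = play(candidate)
    if score > best:                   # keep only the changes that help
        params, best = candidate, score

print(params, best)                    # ends up scoring far better than it started
```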
00:04:27.080 Now let's play a game. It resembles a very unusual bingo game, but the point is to demonstrate that insightful algorithms are riddled with pitfalls. By examining case studies, we can see how those pitfalls play out in practice.
00:04:41.470 In the retail sector, the second trimester of pregnancy is sometimes referred to as the 'Holy Grail' for marketing. This is because women start to change their purchasing habits significantly during this time. Brand loyalty and store loyalty can be renegotiated, and retailers have an opportunity to influence purchasing decisions leading not just to immediate sales but potentially to lifelong consumer habits.
00:05:38.129 Target, a U.S. department store chain, developed a predictive algorithm that could reliably detect when a customer was in her second trimester based solely on her purchasing habits. This capability was powerful, as most retailers wouldn't typically become aware of this until the third trimester. However, one day, a father entered the store, furious about coupons his daughter received for pregnancy-related products.
00:06:08.010 He yelled at the manager, accusing the store of pushing his teenage daughter into getting pregnant. The manager, though not responsible for this, apologized and expressed regret for the misunderstanding. The next day, the father came back to apologize because he had a conversation with his daughter and discovered there were aspects of her life she had not yet shared with him.
00:06:40.900 This situation illustrates how Target's data-driven marketing could create unintended consequences, putting individuals in deeply uncomfortable situations. In response to customer backlash, Target modified its advertising strategy: it began placing pregnancy-related ads next to unrelated items like lawn mowers and aftershave, creating the illusion that the ads were coincidental rather than targeted.
00:07:07.800 The conclusion drawn was that as long as pregnant women believed their privacy was intact, the marketing strategies would be effective. Another example is Shutterfly, a photo-processing service, which sent congratulatory messages to customers about their 'new bundle of joy.' The email drew ridicule and caused real hurt, because many recipients had not had a baby at all, including people coping with infertility or the loss of a child.
00:07:50.840 Shutterfly later clarified that the intent of the email was to target customers who had recently welcomed a new baby. However, the false positives had a real emotional impact on people, showcasing the dangers of data misuse.
00:08:14.180 In another instance, Mark Zuckerberg announced he was going to become a father while sharing the emotional struggles of dealing with miscarriages. He explained the joy and hope that accompanies learning about a new child, followed by the deep emotional pain when those dreams vanish. Facebook's 'Year in Review' feature has also drawn criticism for surfacing memories in a celebratory frame without regard for what users actually experienced during the year.
00:08:59.990 Eric Meyer, who experienced the loss of his daughter, highlighted this issue when he described how the photos of his daughter would appear in his feed in celebratory formats, as if celebrating her death. He advocates for raising awareness of the failures and edge cases associated with algorithmic outcomes.
00:09:35.110 Meyer's recommendation is simple: be humble in our approach to coding. We cannot reliably anticipate individuals' emotional struggles, nor which subjects they consider private. For example, early Fitbit profiles let users log sexual activity, and those logs were public by default, which led people to unknowingly share deeply personal information that was never intended to be public.
00:10:13.190 People were often unaware of the public nature of this data, leading to unexpected and potentially embarrassing outcomes. This is an example of engineers not considering the different implications of data collection and sharing. While people might be willing to share fitness data, such as steps taken or calories consumed, that does not extend to more private matters.
00:10:51.160 Similarly, Uber's internal monitoring tools allowed operations staff to track cars and passengers, including a feature known as 'God View.' Access to this tool was not restricted to operational personnel, which meant employees could freely monitor any passenger's movements in real time, opening the door to abuse and harassment.
00:11:44.070 Concerns grew when it became evident that even job applicants could access those private records. This abuse of power reflects a fundamental misunderstanding of the implications of such technology, especially when it’s leveraged for non-operational purposes, like celebrity stalking.
00:12:31.010 In another case, the dating site OkCupid often shared findings from aggregate trend data that could help users better navigate the platform. Uber, in contrast, analyzed individual riders' behavior in intrusive ways and published the results without providing any meaningful context, which only served to invade users' privacy.
00:13:07.060 These kinds of approaches point to not just breaches of privacy but also inherent biases in how data is approached. For example, a study of Google AdWords found that searches for black-identifying names were 25% more likely to return ads implying a criminal record, highlighting how algorithms can reproduce and reflect biases.
00:13:51.290 Data is generated by people, which means it is not objective; it is shaped by our limited perspectives and assumptions. Algorithms can replicate our flaws and preconceptions, particularly in image recognition processes. Platforms like Flickr and Google Photos have demonstrated how deep learning models can misidentify or categorize images in harmful ways.
00:14:45.230 Egregious instances, like Flickr's auto-tagging applying racist labels to photos of people, or Google Photos misclassifying people of color, reflect the consequences of failing to account for diversity in the data used to train these systems. The history of photography and film stock development further illustrates how biases have been built into technology from the ground up.
00:15:29.640 When Kodak created film stock, the technology emphasized detail in white skin while neglecting darker skin tones. That bias has carried forward, with contemporary algorithms continuing to reflect those outdated choices, making it an ongoing challenge to represent diverse populations accurately.
00:16:16.980 A company called Affirm uses a very narrow assessment for determining creditworthiness, based on minimal factors like name and email. Affirm's algorithm fails to acknowledge the complexities of personal circumstances and behavior, effectively perpetuating privilege across societal lines.
00:17:10.050 For instance, it can treat someone who is easily distracted as a credit risk. Moreover, only around 2% of open-source contributors are women, a reminder that signals drawn from wider industry practices carry those industries' biases with them. Affirm's algorithm utilizes over 70,000 factors, raising concerns about how many of these could encode discrimination.
00:18:12.000 In 2012, a German credit rating agency considered using Facebook relationships to assess applicants, and Facebook has since defended a patent covering credit decisions based on the credit histories of a person's Facebook friends, histories that have nothing to do with the applicant's own behavior. Such approaches ignore the differences between real-life friendships and online connections, further reinforcing bias in lending.
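To make concrete why that idea is troubling, here is a hedged sketch of the mechanism such a patent describes (illustrative only; the function and threshold are invented, and this is not any company's actual implementation):

```python
# A credit decision that hinges on the average score of a person's online
# connections rather than only on their own history.
def approve_loan(applicant_score, friend_scores, threshold=650):
    avg_friend_score = sum(friend_scores) / len(friend_scores)
    # A qualified applicant can be rejected purely because of who they
    # happen to know, which is how existing inequalities get reinforced.
    return applicant_score >= threshold and avg_friend_score >= threshold

print(approve_loan(720, [580, 600, 610]))   # strong applicant, rejected anyway
```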
00:19:25.290 Algorithmic biases are often hard to detect, only revealing their implications to those capable of accessing the 'black box' of decision-making. This makes it essential for us to advocate for fairness and oversight as financial institutions and regulators begin to recognize the challenges these models pose.
00:20:07.680 Consequently, there’s a collective responsibility to ensure diverse representation within decision-making teams. The tech industry's obsession with 'culture fit' must not overshadow the need for genuine diversity, inclusive of an array of experiences and backgrounds, to effectively address algorithmic shortcomings.
00:21:04.120 We also need to prioritize informed consent, ensuring that users actively opt into data usage rather than being included passively by default. Making explicit consent the default lets users engage on their own terms.
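A minimal sketch of what opt-in-by-default can look like in code (the setting names are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class UserPreferences:
    # Defaults are False: silence is treated as "no", never as consent.
    share_purchase_history: bool = False
    include_in_year_in_review: bool = False

def can_use_for_marketing(prefs: UserPreferences) -> bool:
    return prefs.share_purchase_history

prefs = UserPreferences()                   # the user never touched the setting
assert can_use_for_marketing(prefs) is False
```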
00:21:52.360 We must constantly audit outcomes and consider potential biases, regularly testing our systems to ensure fairness. For example, this could mean sending in identical resumes differing only in names that suggest race or gender. Consistent outputs would indicate impartiality, while disparities would highlight potential bias.
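The resume experiment described above translates directly into an automated audit. This is a hedged sketch under the assumption that the system being tested can be called as a scoring function; `score_resume` and the sample names are placeholders, not a real API:

```python
from typing import Callable, Dict, List

def score_resume(resume: Dict[str, str]) -> float:
    raise NotImplementedError("replace with the system under test")

def audit_name_bias(resume: Dict[str, str], names: List[str],
                    score_fn: Callable[[Dict[str, str]], float]) -> Dict[str, float]:
    results = {}
    for name in names:
        variant = {**resume, "name": name}   # identical except for the name
        results[name] = score_fn(variant)
    return results

# Near-identical scores across names suggest impartiality; large gaps
# flag potential bias worth investigating further.
# audit_name_bias(base_resume, ["Emily Walsh", "Lakisha Washington"], score_resume)
```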
00:23:00.710 Transparency about how data is used and how algorithmic decisions are made is essential, as is committing to a holistic perspective on the systems we construct. That kind of transparency earns legitimacy and reflects the practices our industry should aspire to.
00:23:54.920 As Amy Howie poignantly states, if your product profoundly impacts people's lives, you either need to genuinely care about that or step away from such technologies. We are not simply 'code monkeys'; we are professionals with a responsibility to push back against using individuals' data without their informed consent.
00:24:38.790 In conclusion, we must focus on the ethical implications of our work, affirming the value of preserving user privacy. We bear the responsibility of avoiding unintended consequences in individuals' lives, and we should shoulder that responsibility collectively: we must refuse to operate in isolation.
00:25:23.920 Thank you.