Augmenting Human Decision Making with Data Science

00:00:08.630 I think we're going to get started. The music has kind of turned down, so I think that's my cue.

00:00:14.780 Today, I'm going to be talking about augmenting human decision-making with data science. My name is Kelsey Pedersen.

00:00:26.270 If you want to get in touch with me, my Twitter handle is @Kelsey_Pedersen. I actually have a confession to make—I rarely use Twitter. I think I signed up for it about eight years ago, but I'm trying to get back into it.

00:00:38.930 So feel free to tweet at me or share this presentation; I'd love to get in contact with you all.

00:00:51.860 I'm a software engineer at Stitch Fix, a personalized styling service for both men and women.

00:01:00.140 Out of curiosity, how many of you here today have heard of Stitch Fix? Whoa, that's awesome! And how many of you have used Stitch Fix yourself?

00:01:09.380 Okay, sweet! For those of you who haven't used Stitch Fix, here’s how it works.

00:01:15.439 You go to stitchfix.com and fill out your style profile by answering questions about your size, fit, style, and price preferences.

00:01:21.649 Then, you are matched with a personalized stylist. That stylist works within our internal software to see information about the client and is served potential inventory they can choose from.

00:01:34.069 As I mentioned, I'm a software engineer at Stitch Fix; specifically, I work on the styling engineering team.

00:01:38.930 I build and maintain the software that stylists use. When stylists make decisions about inventory, we have algorithms to help guide those decisions.

00:01:51.319 The stylist then hand-selects five items for each client; that box of items is then shipped directly to your door. You can try on these items in the comfort of your own home, keep what you want, and return the rest.

00:02:12.210 To discuss augmenting decisions, I'll break this talk into three sections.

00:02:29.190 First, we’ll talk about how humans make decisions; second, the limits of human decision-making; and third, ways to help users make decisions within our software.

00:02:35.400 The first question is: how do humans make decisions? A prominent psychological theory of human decision-making is called dual process theory.

00:02:50.370 This was popularized by Daniel Kahneman's book, "Thinking, Fast and Slow," which came out in 2011 and has been at the top of the charts ever since.

00:03:04.740 Dual process theory breaks down human decision-making into a two-system approach. System one is fast and automatic, like the hare shown in this photo.

00:03:12.269 In contrast, system two is slower and more effortful, like the tortoise. System one fuels our impressions and feelings: it's fast, automatic, and intuitive.

00:03:22.140 It is driven by associative memory and continuously creates an impression of what's going on around us.

00:03:29.340 For example, you can see a man on the screen and automatically assess his mood as happy, a woman as sad, or a dog looking excited playing catch with its owner.

00:03:47.580 We didn't intend to assess the moods of these images; it just happened—that's what system one does—it effortlessly jumps to conclusions, judgments, and decisions.

00:04:02.220 What I found surprising while working on this talk is that 95 percent of human decisions are made instantaneously within system one.

00:04:08.940 System one automatically generates our intuitive reactions and instantaneous decisions that govern the majority of our lives.

00:04:26.070 In contrast, system two involves more effortful and deliberate thinking.

00:04:31.140 It's used for complex math problems, exercising self-control, or performing physically demanding tasks. Let's try an exercise to get comfortable with what system two feels like.

00:04:44.820 You can see this multiplication problem on the board. You know you could solve it with pen and paper. You also probably know that the answer is not zero, since neither number is zero.

00:05:05.640 An answer of ten million would also be somewhat implausible, but the precise solution didn't automatically come to mind.

00:05:12.600 When interacting with system two, we sometimes feel a physical response—maybe our heart races, our pupils dilate, or our stomach tenses up a little bit because we aren't automatically able to reach a conclusion.

00:05:32.040 Carrying out that computation demanded mental energy: we had to keep track of where we were in the problem. Perhaps you even gave up and pulled out your iPhone.

00:05:50.370 So, this framework will help us think about human decision-making. What we've learned so far is that system one can be unreliable, while system two, the computational side of our brains, is effortful and limited.

00:06:09.060 System one and system two are also often discussed in terms of left brain versus right brain.

00:06:20.820 The left brain is considered more logical and analytical, while the right brain is more driven by feeling and intuition. Now that we understand the ways in which we make decisions, let's dive into the limits of decision-making.

00:06:31.660 I think it’s important to frame this conversation as a partnership: a chance for data science to help. The first opportunity to augment decisions comes from the fact that human decisions are unpredictable.

00:06:49.740 Human decisions are highly dependent on environment and mood, especially within system one. Our environment has a substantial influence on our thoughts and feelings.

00:07:08.380 Studies have shown that given the same set of information, we often make different judgments. For example, radiologists are in charge of examining X-rays to determine whether they are normal or abnormal.

00:07:22.720 Studies reveal that when radiologists are given the same X-ray twice, they contradict themselves 20% of the time.

00:07:34.300 We’ve also seen this within the legal system. Studies show that if you’re being sentenced for a crime—hopefully not—you should hope you’re being sentenced right after lunch.

00:07:47.440 Judges have been shown to be more lenient after eating, showing how our environment and physical states can influence decision-making.

00:08:04.730 The second aspect is that human decisions are driven by individual past experiences. Our decisions are based on those personal experiences, which can sometimes limit or alter our decision-making process.

00:08:24.730 Additionally, we are unable to store large data sets in our minds. Storing even one Google spreadsheet's worth of data in our brains is impossible for most people.

00:08:38.530 This limitation leads us to make decisions based on limited information. The third aspect is that human decisions are driven by personal views and preferences.

00:08:57.850 Since most decisions are made quickly and effortlessly and often outside of our awareness, even if we know we have biases in our decisions, they don't always go away.

00:09:17.620 Researchers have catalogued nearly 200 cognitive biases that distort our thinking and actions. For instance, anchoring bias is the tendency to rely too heavily on a past reference or a single piece of information when making a decision.

00:09:34.950 Another example is optimism bias, which may cause someone to believe they are at lesser risk of experiencing a negative event compared to others.

00:09:48.390 What we learn is that human judgments are often made with limited knowledge and tend to be biased and inconsistent, which makes them risky and unreliable.

00:10:09.900 Stylists are humans too. We see this within our styling organization when we examine how stylists make decisions and how those decisions can be slow or risky.

00:10:23.880 First, inconsistent judgments. If stylists see the same set of inventory twice, it's likely they will choose a different assortment each time.

00:10:29.860 We can observe the first assortment, but it could easily be the second assortment as well. We can't expect stylists to consistently make the same decision over time.

00:10:42.040 Stylists find it challenging to absorb a lot of information at once. Because they are limited to what they know, it takes time to gather context about each client they're asked to style.

00:10:55.500 This process is time-consuming and requires significant mental energy. Additionally, stylists only know the outcomes of the clients they style.

00:11:14.829 If they relied purely on gut feelings rather than data science, they would miss a wealth of data from other stylists' clients that could support outcome predictions.

00:11:36.310 Stylists can also be biased by their own experiences and preferences. While we train stylists to understand their clients, these biases don't always disappear.

00:11:50.319 Fortunately, data science can help make these decisions less risky and more predictable over time.

00:12:08.350 In what ways can data science enhance human decisions?

00:12:14.709 First, let's clarify what data science means. There’s much debate within the academic community regarding data science—what it is, how we define it, and how it differs from traditional data analytics.

00:12:27.819 For the purposes of this talk, I define data science as the use of mathematics or statistics to answer a business question.

00:12:39.039 It differs from data analysis because it's not just about analytics; it also involves collecting large data sets and building and training models on them.

00:12:52.919 We can guide decisions through computations and train decisions with feedback. We guide decisions by offloading part of the decision-making process to data science algorithms.

00:13:09.850 Algorithms can help suggest items of clothing to our stylists. Second, we can train decisions with feedback, either in the moment or after the fact.

00:13:20.930 At Stitch Fix, we utilize data science at all levels of our styling process. We use data science to suggest an individual item of clothing and also make significant business decisions.

00:13:32.600 Before the styling session even starts, the stylists are matched with clients based on predicting the likelihood that the stylist will satisfy the client.
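The talk doesn't describe how the stylist-client match is computed. As a minimal sketch, assume a hypothetical model `predict_satisfaction(client_id, stylist_name)` that returns a probability, plus a hypothetical `open_slots` capacity per stylist; the client then goes to the available stylist with the highest predicted satisfaction:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stylist:
    name: str
    open_slots: int  # hypothetical: how many more clients this stylist can take on


def match_client(
    client_id: str,
    stylists: List[Stylist],
    predict_satisfaction: Callable[[str, str], float],
) -> Optional[Stylist]:
    """Assign the client to the available stylist with the highest predicted
    probability of satisfying them, and use up one of that stylist's slots."""
    available = [s for s in stylists if s.open_slots > 0]
    if not available:
        return None
    best = max(available, key=lambda s: predict_satisfaction(client_id, s.name))
    best.open_slots -= 1
    return best


# Made-up probabilities standing in for a trained model.
probs = {("c1", "ana"): 0.6, ("c1", "ben"): 0.8, ("c1", "cy"): 0.9}
team = [Stylist("ana", 2), Stylist("ben", 1), Stylist("cy", 0)]
chosen = match_client("c1", team, lambda c, s: probs[(c, s)])
print(chosen.name)  # ben (cy scores higher but has no open slots)
```

Choosing greedily one client at a time is the simplest option; matching many clients at once could instead be framed as an assignment problem.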

00:13:45.520 First, data science helps stylists select each clothing item. Our data science team automatically filters inventory based on client preferences.

00:14:03.860 For instance, if a client indicates they do not want jeans, we automatically filter that out, which means stylists don’t have to make that initial decision.
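A sketch of that preference filter, assuming each inventory item carries a hypothetical `category` field and the client's profile lists the categories to exclude:

```python
from typing import Dict, List


def filter_inventory(inventory: List[Dict], excluded_categories: List[str]) -> List[Dict]:
    """Drop items whose category the client said they do not want (case-insensitive)."""
    excluded = {c.lower() for c in excluded_categories}
    return [item for item in inventory if item["category"].lower() not in excluded]


inventory = [
    {"sku": "A1", "category": "Jeans"},
    {"sku": "B2", "category": "Blouse"},
    {"sku": "C3", "category": "jeans"},
]
print([i["sku"] for i in filter_inventory(inventory, ["jeans"])])  # ['B2']
```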

00:14:16.790 We calculate a match score for each clothing item against the client's preferences. The higher the score, the greater the likelihood the client will like the item.
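The actual match-score model isn't described in the talk. One hedged illustration is a logistic score, where each attribute that agrees with the client's stated preference raises the logit; the attribute names, `WEIGHTS`, and `BIAS` below are made up for the example:

```python
import math
from typing import Dict

# Hypothetical weights and bias; a real model would learn these from client feedback.
WEIGHTS = {"size": 2.0, "fit": 1.5, "style": 1.0, "price": 1.0}
BIAS = -3.0


def match_score(item: Dict[str, str], client: Dict[str, str]) -> float:
    """Logistic score in (0, 1): every attribute where the item agrees with the
    client's stated preference adds its weight to the logit."""
    z = BIAS
    for attr, weight in WEIGHTS.items():
        if item.get(attr) is not None and item.get(attr) == client.get(attr):
            z += weight
    return 1.0 / (1.0 + math.exp(-z))


client = {"size": "M", "fit": "slim", "style": "classic", "price": "mid"}
perfect = dict(client)
size_only = {"size": "M", "fit": "relaxed", "style": "boho", "price": "high"}
print(round(match_score(perfect, client), 3))    # 0.924
print(round(match_score(size_only, client), 3))  # 0.269
```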

00:14:30.680 We also regulate the number of items the stylist sees, only showing the top percentage to minimize feelings of being overwhelmed.
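Showing only the top slice of scored items could look like this sketch; the `fraction` parameter is a hypothetical stand-in for whatever cutoff is actually used:

```python
import math
from typing import List, Tuple


def top_fraction(scored_items: List[Tuple[str, float]], fraction: float = 0.1) -> List[Tuple[str, float]]:
    """Keep the highest-scoring `fraction` of (item_id, score) pairs, at least one."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scored_items, key=lambda pair: pair[1], reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:k]


scored = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(top_fraction(scored, 0.5))  # [('b', 0.9), ('d', 0.7)]
```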

00:14:46.640 Because algorithms are often better than humans at predicting future outcomes, they can better identify and weigh predictors of success.

00:15:01.149 However, it’s important to note that stylists ultimately have the power to make the final decision and can override any recommendations made by the algorithms.

00:15:20.079 The second layer involves data science assisting stylists by guiding them on expected outcomes for all five items combined.

00:15:37.259 After stylists select the items for the fix, we calculate the likelihood that the client will like all those items together in real-time. If it's above a certain threshold, everything is fine; if below, a warning is shown.

00:15:55.220 Again, stylists have the final say and can choose to override our algorithms.
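The talk doesn't say how the whole-fix likelihood is computed. This sketch assumes, purely for illustration, that a client's reactions to items are independent, so the chance of liking all of them is the product of the per-item probabilities; `FIX_THRESHOLD` is a hypothetical cutoff. The function only returns a warning, leaving the stylist free to ship the fix anyway:

```python
import math
from typing import List, Optional

FIX_THRESHOLD = 0.2  # hypothetical cutoff


def fix_likelihood(item_probs: List[float]) -> float:
    """Chance the client likes every item, ASSUMING reactions are independent."""
    return math.prod(item_probs)


def check_fix(item_probs: List[float], threshold: float = FIX_THRESHOLD) -> Optional[str]:
    """Return a warning message when the fix looks risky, else None.
    Nothing is blocked; the stylist makes the final call."""
    p = fix_likelihood(item_probs)
    if p < threshold:
        return f"Warning: estimated chance the client likes all {len(item_probs)} items is {p:.0%}"
    return None


print(check_fix([0.9] * 5))                  # None
print(check_fix([0.9, 0.9, 0.5, 0.5, 0.5]))  # Warning: estimated chance the client likes all 5 items is 10%
```

Independence is a simplification: items in one fix are likely correlated, so a production model would probably score the set jointly.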

00:16:10.759 The third layer is that client feedback helps train stylists over time. Once clients receive their items, they fill out feedback on each item and the overall fix.

00:16:30.520 Stylists have access to this feedback at any time, and they are expected to spend an hour a week reviewing it. This information is invaluable for improving their future decisions.

00:16:47.710 The fourth layer trains stylists with feedback aggregated over set periods of time. Information about their performance is stored in the stats section of the app.

00:17:07.370 Each stylist can see related stats for fit, style, price, etc. If any of these metrics are too low, they have visibility into that, enabling them to alter their decision-making process.
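A minimal sketch of per-stylist stats, assuming each feedback record marks fit, style, and price as rated positively or not, and using a hypothetical cutoff of 0.7 for a metric being "too low":

```python
from statistics import mean
from typing import Dict, List

METRICS = ("fit", "style", "price")
LOW_THRESHOLD = 0.7  # hypothetical cutoff


def stylist_stats(feedback: List[Dict[str, bool]]) -> Dict[str, float]:
    """Share of items rated positively on each metric.
    Raises statistics.StatisticsError if `feedback` is empty."""
    return {m: mean(1.0 if f[m] else 0.0 for f in feedback) for m in METRICS}


def low_metrics(stats: Dict[str, float], threshold: float = LOW_THRESHOLD) -> List[str]:
    """Metrics the stylist should pay attention to."""
    return sorted(m for m, v in stats.items() if v < threshold)


feedback = [
    {"fit": True, "style": True, "price": True},
    {"fit": True, "style": False, "price": True},
    {"fit": True, "style": False, "price": False},
    {"fit": True, "style": True, "price": True},
]
stats = stylist_stats(feedback)
print(stats)               # {'fit': 1.0, 'style': 0.5, 'price': 0.75}
print(low_metrics(stats))  # ['style']
```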

00:17:20.500 Ultimately, we aim to use feedback to hone expert intuition. Malcolm Gladwell made the concept of 10,000 hours famous, referring to the need for prolonged practice to build intuition.

00:17:37.800 Stylists also get help from our styling leads: each stylist has a manager responsible for training and coaching them.

00:17:57.290 The final layer is that we have insight into feedback for all 3,300 stylists across the organization. We use these performance metrics to shape overall business decisions.

00:18:14.060 One way this information drives decisions is through stylist training. Training happens every few months and is tailored to the stylist segments that need improvement.

00:18:29.920 The second application of this feedback is to inform inventory decisions. If we keep receiving negative feedback about the quality of certain items, we may need to reassess our inventory levels.
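Aggregating per-stylist stats across the organization could group stylists into training segments by the metrics where they fall short. The input shape and the 0.7 cutoff are assumptions for illustration:

```python
from collections import defaultdict
from typing import Dict, List


def training_segments(
    stats_by_stylist: Dict[str, Dict[str, float]], threshold: float = 0.7
) -> Dict[str, List[str]]:
    """Map each metric to the (sorted) stylists whose score on it is below threshold."""
    segments: Dict[str, List[str]] = defaultdict(list)
    for stylist, stats in stats_by_stylist.items():
        for metric, value in stats.items():
            if value < threshold:
                segments[metric].append(stylist)
    return {metric: sorted(names) for metric, names in segments.items()}


org = {
    "ana": {"fit": 0.90, "style": 0.60, "price": 0.80},
    "ben": {"fit": 0.50, "style": 0.65, "price": 0.85},
    "cy": {"fit": 0.80, "style": 0.90, "price": 0.75},
}
print(training_segments(org))  # {'style': ['ana', 'ben'], 'fit': ['ben']}
```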

00:18:43.240 Now that we've discussed how algorithms can augment human decisions, it’s worthwhile to consider how humans can augment algorithms.

00:19:01.700 Today, although machines provide value in guiding and training stylists, they still lack essential human experience. For instance, computers are not great at interpreting a request for a stylish shirt to wear to a club.

00:19:19.040 While machines can suggest multiple options, specificity can sometimes be an issue. Furthermore, machines lack ethical standards.

00:19:36.500 Recently, reports showed that Facebook is hiring 10,000 people to monitor content to ensure ethical standards are met.

00:19:56.890 This highlights the necessity for human oversight in assessing ethicality and ensuring the quality of algorithms.

00:20:11.270 There's also room for improvement in modeling and training data, since no training set is perfect. We need humans to adapt the algorithms as business needs evolve.

00:20:28.380 Stylists maintain veto power within our system, enjoying creative liberty to act on their intuition. They can override computer recommendations when they feel strongly about a decision.

00:20:47.370 For example, if a stylist sees a low match score but believes a pair of green shorts meets the client's needs, they can add it to the client's box.

00:21:01.490 When stylists override the algorithms, we learn from the cases where their intuition and the algorithm's prediction disagree.

00:21:22.960 This creates a feedback loop to the data science team, allowing us to improve our algorithms over time.
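One way to sketch that feedback loop: treat any selected item whose match score was below a hypothetical `low_score` cutoff as an override, then turn each override's outcome (kept or returned) into a labeled row the data science team could retrain on. The `OverrideEvent` fields are illustrative, not a real schema:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OverrideEvent:
    client_id: str
    item_id: str
    match_score: float  # model's predicted probability when the stylist picked the item
    kept: bool          # outcome from the client's feedback


def overrides(events: List[OverrideEvent], low_score: float = 0.3) -> List[OverrideEvent]:
    """Items the stylist chose even though the model scored them low."""
    return [e for e in events if e.match_score < low_score]


def training_rows(events: List[OverrideEvent], low_score: float = 0.3) -> List[Tuple[str, str, int]]:
    """(client_id, item_id, label) rows; label 1 (kept) marks a case where the
    stylist's intuition beat the model's low prediction."""
    return [(e.client_id, e.item_id, int(e.kept)) for e in overrides(events, low_score)]


def override_success_rate(events: List[OverrideEvent], low_score: float = 0.3) -> Optional[float]:
    picked = overrides(events, low_score)
    return sum(e.kept for e in picked) / len(picked) if picked else None


events = [
    OverrideEvent("c1", "green-shorts", 0.10, kept=True),
    OverrideEvent("c2", "plaid-blazer", 0.20, kept=False),
    OverrideEvent("c3", "white-tee", 0.80, kept=True),
]
print(training_rows(events))          # [('c1', 'green-shorts', 1), ('c2', 'plaid-blazer', 0)]
print(override_success_rate(events))  # 0.5
```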

00:21:35.960 So now let’s discuss the future of data science and what that looks like.

00:21:52.580 Hollywood often glamorizes the idea that we're doomed by AI and that data science will take over the world. However, I don't subscribe to that viewpoint.

00:22:08.600 We tend to think in terms of left brain versus right brain, or system one versus system two, but this dichotomy does not match reality.

00:22:21.610 We typically view the debate as humans versus computers, but I believe this is a partnership between data science and humans.

00:22:35.440 Separately, humans and data science each have their limits; the true power lies in forming a feedback loop between the two.

00:22:54.600 We are navigating towards what I call system three—a combination of predictive algorithms and expert intuition.

00:23:04.270 This relationship is mutually beneficial, enabling more informed and nuanced decision-making. We can utilize feedback from our algorithms to enhance expert intuition.

00:23:16.120 Moreover, we continually retrain algorithms on decisions whose outcomes fall outside what we expected.

00:23:31.160 It’s essential to note that poorly trained algorithms can be just as damaging as relying solely on human decisions.

00:23:46.470 A partnership is vital to reach levels of performance that are unattainable by data or humans alone.

00:24:00.810 Earlier, I presented an image showcasing data science influencing stylists. When stylists override algorithms in favor of their intuition, they effectively train the model.

00:24:16.390 This becomes a fascinating feedback loop between data science and human intuition.

00:24:32.170 So, what does the future of data science look like?

00:24:49.350 I firmly believe that the partnership between humans and data science is fundamental.

00:25:06.210 Through this collaboration, we can hone human intuition alongside algorithms to ensure a more reliable process.

00:25:23.960 Ultimately, we aim to create a stronger synergy, becoming greater than the sum of our individual parts.

00:25:38.060 Thank you.

00:25:54.840 I also have a shameless plug— I work, as you now know, for Stitch Fix. We are hiring!

00:26:12.320 So if any of this piqued your interest or you're interested in working with an amazing group of people, please find me after—I’d love to chat.

00:26:23.800 That brings us to the Q&A section. Does anyone have any questions?

00:27:01.760 Yes?

00:27:06.350 Ah, oh, that's interesting.

00:27:10.000 The question was: when there are issues with system one, like voting against your best interests, are there ways for that to be alleviated with system three?

00:27:21.090 I think it’s interesting to consider system three beyond just technology and data science.

00:27:27.150 Can you come up after? I’d be happy to think about that further.

00:27:34.110 Yes?

00:27:39.180 So the question was: how do we handle comments from our customers?

00:27:44.130 Well, we handle it in a few ways. Are you asking how we interpret that or how we react when customers are dissatisfied?

00:27:57.290 When a client is dissatisfied with the items we send, an algorithm automatically escalates the issue to our customer support team.

00:28:06.790 Then, a customer support representative can take care of it. The relationship we build with our clients is long-term.

00:28:20.090 If clients express dissatisfaction, it's important for stylists to address that directly. Also, stylists write a note to the clients each time they ship a fix.

00:28:34.680 This note is a great opportunity to address past concerns or issues.
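The escalation rule mentioned in this answer isn't specified. As a hypothetical sketch, assume clients rate the fix and each item on a 1 to 5 scale, and escalate when the overall rating is at or below a cutoff or when more than half of the items are rated that low:

```python
from typing import List


def should_escalate(overall_rating: int, item_ratings: List[int], cutoff: int = 2) -> bool:
    """Hypothetical rule on a 1-5 scale: escalate to customer support if the
    overall fix rating is at or below `cutoff`, or most items are rated that low."""
    if overall_rating <= cutoff:
        return True
    low_items = sum(1 for r in item_ratings if r <= cutoff)
    return low_items * 2 > len(item_ratings)


print(should_escalate(1, [5, 5, 5, 5, 5]))  # True
print(should_escalate(4, [1, 2, 1, 5, 5]))  # True
print(should_escalate(4, [1, 5, 5, 5, 5]))  # False
```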

00:28:42.560 Great questions! Yes?

00:28:54.550 Yep.

00:29:01.880 So the question was: what's the balance between creativity and human touch versus algorithms and more stoic behavior?

00:29:06.600 Ultimately, that's part of the question we continually explore. There’s considerable discussion about solely relying on data science.

00:29:21.370 At Stitch Fix, our priority is putting the client first. We evaluate whether a decision will negatively impact their experience.

00:29:36.070 If it would, we likely reconsider making that decision.

00:29:45.230 Yes, I believe that balance is key to ensuring our clients feel understood and valued.

00:29:51.260 And the person in the gray shirt?

00:29:57.620 Your question was: what data points go into our algorithm?

00:30:04.460 I'm not entirely sure I can answer that, but come talk to me afterward. I don't want to share any specifics that my PR team may not approve.

00:30:20.250 Your question was: how do you overcome the hurdle of not having enough data to feed the algorithm?

00:30:30.220 That's a challenging issue in data science. Stitch Fix introduced recommendations early in our journey, allowing algorithms to grow organically.

00:30:46.560 Once a company has gained some traction, it’s vital to utilize human feedback until sufficient data is accumulated for algorithms.

00:31:01.840 It's all about balancing human intuition with algorithmic processes.

00:31:08.790 Yes, your question relates to how human biases impact algorithmic predictions.

00:31:23.630 That's a fundamental challenge for data science: how to ensure unbiased datasets.

00:31:36.570 While I don’t have all the answers, I believe that as we refine our processes, we can identify biases in data.

00:31:45.890 You can then adjust your models to account for those biases over time.

00:31:59.310 Yes.

00:32:16.050 So your question was: do we have data to measure biases?

00:32:28.280 Absolutely, though I'm not sure of the specific technical details, so feel free to approach me afterward!

00:32:45.180 I think your question ties into how data science captures predictive cues. Algorithms need to represent varying attributes important to different users.

00:32:59.700 We can make connections based on the success of similar items for different clients, which informs how we calculate match scores.
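Using the success of similar items with different clients to inform scores is the idea behind item-based collaborative filtering. The sketch below is a generic version of that standard technique, not Stitch Fix's model: ratings are assumed to be +1 (kept) or -1 (returned), items are compared by cosine similarity, and a client's predicted reaction to an item is the similarity-weighted average of their reactions to positively similar items:

```python
import math
from typing import Dict, Optional


def cosine(u: Dict[str, int], v: Dict[str, int]) -> float:
    """Cosine similarity of two sparse client->rating vectors (missing entries = 0)."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(client: str, target: str, ratings: Dict[str, Dict[str, int]]) -> Optional[float]:
    """Similarity-weighted average of the client's ratings on items positively
    similar to `target`; None if there is nothing to go on."""
    num = den = 0.0
    for item, item_ratings in ratings.items():
        if item == target or client not in item_ratings:
            continue
        sim = cosine(ratings[target], item_ratings)
        if sim > 0:
            num += sim * item_ratings[client]
            den += sim
    return num / den if den else None


ratings = {
    "shirt_a": {"u1": 1, "u2": 1, "u3": -1},
    "shirt_b": {"u1": 1, "u2": 1, "u3": -1, "u4": 1},
    "jeans_c": {"u1": -1, "u2": 1, "u4": -1},
}
print(predict("u4", "shirt_a", ratings))  # 1.0: u4 kept shirt_b, which resembles shirt_a
print(predict("u5", "shirt_a", ratings))  # None
```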

00:33:13.590 So I believe you had the last question?

00:33:23.890 For stylists who frequently override algorithms but aren't successful, is there a mechanism to guard against this?

00:33:35.000 Currently, there isn't an explicit guard against that, but it's definitely something we could rethink for the future.

00:33:41.180 Thank you!