00:00:08.630
I think we're going to get started. The music has kind of turned down, so I think that's my cue.
00:00:14.780
Today, I'm going to be talking about augmenting human decision-making with data science. My name is Kelsey Pedersen.
00:00:26.270
If you want to get in touch with me, my Twitter handle is @Kelsey_Pedersen. I actually have a confession to make—I rarely use Twitter. I think I signed up for it about eight years ago, but I'm trying to get back into it.
00:00:38.930
So feel free to tweet at me or share this presentation; I'd love to get in contact with you all.
00:00:51.860
I'm a software engineer at Stitch Fix, a personalized styling service for both men and women.
00:01:00.140
Out of curiosity, how many of you here today have heard of Stitch Fix? Whoa, that's awesome! And how many of you have used Stitch Fix yourself?
00:01:09.380
Okay, sweet! For those of you who haven't used Stitch Fix, here’s how it works.
00:01:15.439
You go to stitchfix.com, fill out your style profile by answering questions about your size, fit, style, and price preferences.
00:01:21.649
Then you're matched with a personal stylist. That stylist works in our internal software, where they can see information about the client and are served potential inventory to choose from.
00:01:34.069
As I mentioned, I'm a software engineer at Stitch Fix, and specifically I work on the styling engineering team.
00:01:38.930
I build and maintain the software that stylists use. When stylists make decisions about inventory, we have algorithms to help guide those decisions.
00:01:51.319
The stylist then hand-selects five items for each client; that box of items is then shipped directly to your door. You can try on these items in the comfort of your own home, keep what you want, and return the rest.
00:02:12.210
I'll break this talk on augmenting decisions into three sections.
00:02:29.190
First, we’ll talk about how humans make decisions; second, the limits of human decision-making; and third, ways to help users make decisions within our software.
00:02:35.400
The first question is: how do humans make decisions? The most prominent psychological theory of human decision-making is called dual process theory.
00:02:50.370
This was popularized by Daniel Kahneman's book, "Thinking, Fast and Slow," which came out in 2011 and has been a bestseller ever since.
00:03:04.740
Dual process theory breaks human decision-making down into two systems. System one is fast and automatic, like the hare shown in this photo.
00:03:12.269
In contrast, system two is slower and more effortful, like the tortoise. System one fuels our impressions and feelings: it's fast, automatic, and intuitive.
00:03:22.140
It is driven by associative memory and continuously creates an impression of what's going on around us.
00:03:29.340
For example, looking at the screen, you automatically read the man as happy, the woman as sad, and the dog as excited while it plays catch with its owner.
00:03:47.580
We didn't intend to assess the moods in these images; it just happened. That's what system one does: it effortlessly jumps to conclusions, judgments, and decisions.
00:04:02.220
What I found surprising while working on this talk is that 95 percent of human decisions are made instantaneously within system one.
00:04:08.940
System one automatically generates our intuitive reactions and instantaneous decisions that govern the majority of our lives.
00:04:26.070
In contrast, system two involves more effortful and deliberate thinking.
00:04:31.140
It's used for complex math problems, exercising self-control, or performing physically demanding tasks. Let's try an exercise to get comfortable with what system two feels like.
00:04:44.820
You can see this multiplication problem on the board. You know you could solve it with pen and paper. You also probably know that the answer isn't zero, since nothing is being multiplied by zero.
00:05:05.640
An answer of ten million would also be somewhat implausible, but the precise solution didn't automatically come to mind.
00:05:12.600
When interacting with system two, we sometimes feel a physical response—maybe our heart races, our pupils dilate, or our stomach tenses up a little bit because we aren't automatically able to reach a conclusion.
00:05:32.040
Carrying out that computation demanded mental energy: we had to keep track of where we were. Perhaps you even gave up and pulled out your iPhone.
00:05:50.370
So, this framework will help us think about human decision-making. What we've learned so far is that system one can be somewhat unreliable, while system two, our capacity for deliberate computation, is limited.
00:06:09.060
It's also worth noting that system one and system two are often discussed alongside the idea of left brain versus right brain.
00:06:20.820
The left brain is considered more logical and analytical, while the right brain is more driven by feeling and intuition. Now that we understand the ways in which we make decisions, let's dive into the limits of decision-making.
00:06:31.660
I think it's important to frame this conversation as a partnership, where each limit is a chance for data science to help. The first opportunity to augment decisions comes from the fact that human decisions are unpredictable.
00:06:49.740
Human decisions are highly dependent on environment and mood, especially within system one. Our environment has a substantial influence on our thoughts and feelings.
00:07:08.380
Studies have shown that given the same set of information, we often make different judgments. For example, radiologists are in charge of examining X-rays to determine whether they are normal or abnormal.
00:07:22.720
Studies reveal that when radiologists are given the same X-ray twice, they contradict themselves 20% of the time.
00:07:34.300
We’ve also seen this within the legal system. Studies show that if you’re being sentenced for a crime—hopefully not—you should hope you’re being sentenced right after lunch.
00:07:47.440
Judges have been shown to be more lenient after eating, showing how our environment and physical states can influence decision-making.
00:08:04.730
The second aspect is that human decisions are driven by our individual past experiences, which can narrow or skew how we decide.
00:08:24.730
Additionally, we are unable to store large data sets in our minds. Storing even one Google spreadsheet's worth of data in our brains is impossible for most people.
00:08:38.530
This limitation leads us to make decisions based on limited information. The third aspect is that human decisions are driven by personal views and preferences.
00:08:57.850
Because most decisions are made quickly, effortlessly, and often outside our awareness, knowing that we have biases doesn't make them go away.
00:09:17.620
Researchers have catalogued nearly 200 cognitive biases that distort our thinking and actions. For instance, anchoring bias is the tendency to rely too heavily on an initial reference point or a single piece of information when making a decision.
00:09:34.950
Another example is optimism bias, which may cause someone to believe they are less at risk of experiencing a negative event than others.
00:09:48.390
What we learn is that human judgments are often made with limited knowledge and are biased and inconsistent, which makes them risky and unreliable.
00:10:09.900
Stylists are humans too. We see this within our styling organization when we examine how stylists make decisions and how those decisions can be slow or risky.
00:10:23.880
First, inconsistent judgments. If stylists see the same set of inventory twice, it's likely they will choose a different assortment each time.
00:10:29.860
We see the first assortment, but it could just as easily have been the second. We can't expect stylists to make the same decision consistently over time.
00:10:42.040
Stylists find it challenging to absorb a lot of information at once. Because they're limited to what they know, it takes time to gather context about each client before styling them.
00:10:55.500
This process is time-consuming and requires significant mental energy. Additionally, stylists only know the outcomes of the clients they style.
00:11:14.829
If they relied purely on gut feeling rather than data science, they would miss a wealth of data that could help predict outcomes.
00:11:36.310
Stylists can also be biased by their own experiences and preferences. While we train stylists to understand their clients, these biases don't always disappear.
00:11:50.319
Fortunately, data science can help make these decisions less risky and more predictable over time.
00:12:08.350
In what ways can data science enhance human decisions?
00:12:14.709
First, let's clarify what data science means. There’s much debate within the academic community regarding data science—what it is, how we define it, and how it differs from traditional data analytics.
00:12:27.819
For the purposes of this talk, I define data science as the use of mathematics or statistics to answer a business question.
00:12:39.039
It differs from data analysis because it's not just about analytics; it also involves collecting large data sets and building and training models on them.
00:12:52.919
Data science can help in two ways: we can guide decisions through computation, and we can train decisions with feedback. We guide decisions by offloading part of the decision-making process to data science algorithms.
00:13:09.850
For example, algorithms can suggest items of clothing to our stylists. Second, we can train decisions with feedback, either in the moment or after the fact.
00:13:20.930
At Stitch Fix, we use data science at every level of our styling process, from suggesting an individual item of clothing to making major business decisions.
00:13:32.600
Before the styling session even starts, the stylists are matched with clients based on predicting the likelihood that the stylist will satisfy the client.
00:13:45.520
First, data science helps stylists select each clothing item. Algorithms built by our data science team automatically filter inventory based on client preferences.
00:14:03.860
For instance, if a client indicates they do not want jeans, we automatically filter that out, which means stylists don’t have to make that initial decision.
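The hard-filter step described above can be sketched in a few lines of Python. This is a minimal illustration, not Stitch Fix's code: the `Item` fields, the category strings, and the optional price cap are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    sku: str        # hypothetical inventory identifier
    category: str   # e.g. "jeans", "blouse"
    price: float

def filter_inventory(items, excluded_categories, max_price=None):
    """Remove items in categories the client opted out of, plus items over an optional price cap."""
    excluded = {c.lower() for c in excluded_categories}
    return [
        item
        for item in items
        if item.category.lower() not in excluded
        and (max_price is None or item.price <= max_price)
    ]

inventory = [
    Item("A1", "jeans", 88.0),
    Item("B2", "blouse", 45.0),
    Item("C3", "shorts", 38.0),
]
# The client said "no jeans", so the stylist never sees A1.
print([i.sku for i in filter_inventory(inventory, {"Jeans"})])  # ['B2', 'C3']
```

Because these are hard constraints the client stated, filtering happens before any scoring, so the stylist's attention is spent only on eligible items.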
00:14:16.790
We calculate a match score for each clothing item against the client's preferences. The higher the score, the more likely the client is to like the item.
00:14:30.680
We also limit the number of items the stylist sees, showing only the top-scoring percentage so they don't feel overwhelmed.
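A sketch of the rank-and-trim step, assuming match scores have already been produced by some model (the talk doesn't say how they're computed). The `top_fraction` helper, the example scores, and the 50% cutoff are hypothetical.

```python
import math

def top_fraction(scores, fraction):
    """Rank items by match score (highest first) and keep only the top `fraction` of them."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))  # always show at least one item
    return ranked[:keep]

# Hypothetical match scores; a higher score means the client is more likely to like the item.
scores = {"A1": 0.31, "B2": 0.82, "C3": 0.67, "D4": 0.12, "E5": 0.55}
print(top_fraction(scores, 0.5))  # ['B2', 'C3', 'E5']
```

Cutting by a fraction rather than a fixed count keeps the shortlist proportional to how much eligible inventory a given client has.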
00:14:46.640
Algorithms tend to be better than humans at predicting future outcomes because they can consistently identify and weigh predictors of success.
00:15:01.149
However, it’s important to note that stylists ultimately have the power to make the final decision and can override any recommendations made by the algorithms.
00:15:20.079
The second layer involves data science assisting stylists by guiding them on expected outcomes for all five items combined.
00:15:37.259
After stylists select the items for the fix, we calculate in real time the likelihood that the client will like those items together. If it's above a certain threshold, nothing changes; if it's below, the stylist sees a warning.
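The talk doesn't describe how the fix-level likelihood is computed, so the following is only a sketch of the threshold-and-warn pattern. It assumes, as a deliberate simplification, that items are liked independently, so the chance of liking all of them is the product of per-item probabilities; a production model would likely account for how items interact. `fix_check` and the 0.3 threshold are hypothetical.

```python
import math

def fix_check(item_probs, threshold):
    """Combine per-item 'will like' probabilities into one fix-level score.

    Simplifying assumption: items are liked independently, so the chance the
    client likes all of them is the product of the per-item probabilities.
    Returns (score, warn), where warn is True if the score falls below threshold.
    """
    if not item_probs:
        raise ValueError("a fix needs at least one item")
    score = math.prod(item_probs)
    return score, score < threshold

THRESHOLD = 0.3  # hypothetical cutoff

for fix in ([0.9, 0.8, 0.9, 0.85, 0.9], [0.9, 0.4, 0.9, 0.85, 0.9]):
    score, warn = fix_check(fix, THRESHOLD)
    print(f"{score:.3f}", "WARNING: low fix score" if warn else "ok")
# 0.496 ok
# 0.248 WARNING: low fix score
```

The function only returns a flag; consistent with the talk, it doesn't block the fix, leaving the stylist free to ship it anyway.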
00:15:55.220
Again, stylists have the final say and can choose to override our algorithms.
00:16:10.759
The third layer is that client feedback helps train stylists over time. Once clients receive their items, they fill out feedback on each item and the overall fix.
00:16:30.520
Stylists have access to this feedback at any time, and they are expected to spend an hour a week reviewing it. This information is invaluable for improving their future decisions.
00:16:47.710
The fourth layer trains stylists with feedback aggregated over longer periods. Information about their performance lives in the stats section of the app.
00:17:07.370
Each stylist can see stats for fit, style, price, and so on. If any of these metrics are low, they can see that and adjust their decision-making.
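A minimal sketch of how per-dimension stats like these could be aggregated and flagged. The feedback format (one True/False per dimension per item, meaning a positive or negative rating) and the 0.5 floor are assumptions for illustration, not the app's actual schema.

```python
from collections import defaultdict

def stylist_stats(feedback, floor):
    """Aggregate per-dimension success rates from client feedback and flag any below `floor`.

    feedback: iterable of dicts like {"fit": True, "style": False, "price": True},
    where True means the client rated that dimension positively (hypothetical format).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in feedback:
        for metric, ok in record.items():
            totals[metric] += 1
            hits[metric] += int(ok)
    rates = {m: hits[m] / totals[m] for m in totals}
    flagged = sorted(m for m, r in rates.items() if r < floor)
    return rates, flagged

feedback = [
    {"fit": True, "style": True, "price": False},
    {"fit": True, "style": False, "price": False},
    {"fit": False, "style": True, "price": True},
    {"fit": True, "style": True, "price": False},
]
rates, flagged = stylist_stats(feedback, floor=0.5)
print(rates)    # {'fit': 0.75, 'style': 0.75, 'price': 0.25}
print(flagged)  # ['price']
```

Here the stylist would see that price is the weak spot, which is the kind of signal the talk describes stylists using to adjust.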
00:17:20.500
Ultimately, we aim to use feedback to hone expert intuition. Malcolm Gladwell popularized the 10,000-hour rule: the idea that building expert intuition takes prolonged practice.
00:17:37.800
Stylists also get help from our styling leads: each stylist has a manager responsible for training and coaching them.
00:17:57.290
The final layer is that we have insight into feedback for all 3,300 stylists across the organization. We use these performance metrics to shape overall business decisions.
00:18:14.060
One way this information drives decisions is through stylist training. Training happens every few months and is tailored to the stylist segments that need improvement.
00:18:29.920
The second application of this feedback is to inform inventory decisions. If we consistently receive negative feedback about the quality of certain items, we may need to reassess our inventory levels.
00:18:43.240
Now that we've discussed how algorithms can augment human decisions, it’s worthwhile to consider how humans can augment algorithms.
00:19:01.700
Today, although machines provide value in guiding and training stylists, they still lack essential human experience. For instance, computers aren't great at interpreting a request like "I need a stylish shirt to wear to a club."
00:19:19.040
Machines can suggest many options, but they struggle with that kind of specificity. Furthermore, machines have no ethical standards of their own.
00:19:36.500
Recently, reports showed that Facebook is hiring 10,000 people to monitor content to ensure ethical standards are met.
00:19:56.890
This highlights the necessity for human oversight in assessing ethicality and ensuring the quality of algorithms.
00:20:11.270
There's also room for improvement in modeling and training data, since no training set is perfect. We need humans to adapt algorithms as business needs evolve.
00:20:28.380
Stylists maintain veto power within our system, enjoying creative liberty to act on their intuition. They can override computer recommendations when they feel strongly about a decision.
00:20:47.370
For example, if a stylist sees a low match score but believes a pair of green shorts meets the client's needs, they can add it to the client's box.
00:21:01.490
When stylists override the algorithm and their intuition disagrees with its prediction, we learn from the outcome.
00:21:22.960
This creates a feedback loop to the data science team, allowing us to improve our algorithms over time.
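One way such a feedback loop could be instrumented, sketched under assumptions: each override is logged with the model's predicted probability at pick time and the client's eventual outcome, and low-score picks are summarized for the data science team. The `Override` record, the `override_report` helper, and the 0.3 cutoff are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Override:
    sku: str
    predicted: float  # model's like-probability when the stylist picked the item anyway
    kept: bool        # what the client actually did

def override_report(overrides, low_score=0.3):
    """Compare the model's expectation with real outcomes for low-score items stylists chose anyway."""
    low = [o for o in overrides if o.predicted < low_score]
    if not low:
        return None
    expected = sum(o.predicted for o in low) / len(low)
    actual = sum(o.kept for o in low) / len(low)
    return {"n": len(low), "expected_rate": expected, "actual_rate": actual}

log = [
    Override("green-shorts", 0.20, True),
    Override("B2", 0.10, False),
    Override("C3", 0.25, True),
    Override("D4", 0.60, True),  # not a low-score pick, so excluded
]
report = override_report(log)
print(report["n"], f"{report['expected_rate']:.2f}", f"{report['actual_rate']:.2f}")
# 3 0.18 0.67
```

If `actual_rate` keeps running well above `expected_rate`, stylists are seeing something the model misses, and those logged outcomes are natural candidates for new training labels.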
00:21:35.960
So now let’s discuss the future of data science and what that looks like.
00:21:52.580
Hollywood often glamorizes the idea that we're doomed by AI and that data science will take over the world. However, I don't subscribe to that viewpoint.
00:22:08.600
We tend to think in terms of left brain versus right brain, or system one versus system two, but this dichotomy doesn't match reality.
00:22:21.610
We typically view the debate as humans versus computers, but I believe this is a partnership between data science and humans.
00:22:35.440
Separately, humans and data science each have their limits, but the true power lies in forming a feedback loop between the two.
00:22:54.600
We are navigating towards what I call system three—a combination of predictive algorithms and expert intuition.
00:23:04.270
This relationship is mutually beneficial, enabling more informed and nuanced decision-making. We can utilize feedback from our algorithms to enhance expert intuition.
00:23:16.120
Moreover, we continually retrain algorithms on decisions whose outcomes fall outside what we expected.
00:23:31.160
It’s essential to note that poorly trained algorithms can be just as damaging as relying solely on human decisions.
00:23:46.470
A partnership is vital to reach levels of performance that are unattainable by data or humans alone.
00:24:00.810
Earlier, I presented an image showcasing data science influencing stylists. When stylists override algorithms in favor of their intuition, they effectively train the model.
00:24:16.390
This becomes a fascinating feedback loop between data science and human intuition.
00:24:32.170
So, what does the future of data science look like?
00:24:49.350
I firmly believe that the partnership between humans and data science is fundamental.
00:25:06.210
Through this collaboration, we can hone human intuition alongside algorithms to ensure a more reliable process.
00:25:23.960
Ultimately, we aim to create a stronger synergy, becoming greater than the sum of our individual parts.
00:25:38.060
Thank you.
00:25:54.840
I also have a shameless plug: as you now know, I work for Stitch Fix, and we're hiring!
00:26:12.320
So if any of this piqued your interest, or you'd like to work with an amazing group of people, please find me afterward. I'd love to chat.
00:26:23.800
That brings us to the Q&A section. Does anyone have any questions?
00:27:01.760
Yes?
00:27:06.350
Ah, oh, that's interesting.
00:27:10.000
The question was: when there are issues with system one, like voting against your best interests, are there ways for that to be alleviated with system three?
00:27:21.090
I think it’s interesting to consider system three beyond just technology and data science.
00:27:27.150
Can you come up after? I’d be happy to think about that further.
00:27:34.110
Yes?
00:27:39.180
So the question was: how do we handle comments from our customers?
00:27:44.130
Well, we handle it in a few ways. Are you asking how we interpret that or how we react when customers are dissatisfied?
00:27:57.290
When a client is unsatisfied with the items we send, we have an algorithm to automatically escalate such issues to our customer support team.
00:28:06.790
Then, a customer support representative can take care of it. The relationship we build with our clients is long-term.
00:28:20.090
If clients express dissatisfaction, it's important for stylists to address that directly. Also, stylists write a note to the clients each time they ship a fix.
00:28:34.680
This note is a great opportunity to address past concerns or issues.
00:28:42.560
Great questions! Yes?
00:28:54.550
Yep.
00:29:01.880
So the question was: what's the balance between creativity and human touch versus algorithms and more stoic behavior?
00:29:06.600
Ultimately, that's a question we continually explore. There's a lot of discussion about whether to rely solely on data science.
00:29:21.370
At Stitch Fix, our priority is putting the client first. We evaluate whether a decision will negatively impact their experience.
00:29:36.070
If it would, we likely reconsider making that decision.
00:29:45.230
Yes, I believe that balance is key to ensuring our clients feel understood and valued.
00:29:51.260
And the person in the gray shirt?
00:29:57.620
Your question was: what data points go into our algorithm?
00:30:04.460
I'm not entirely sure I can answer that, but come talk to me afterward. I don't want to share any specifics that my PR team may not approve.
00:30:20.250
Your question is: how do you overcome the hurdle of not having enough data to feed the algorithm?
00:30:30.220
That's a challenging issue in data science. Stitch Fix introduced recommendations early in our journey, allowing algorithms to grow organically.
00:30:46.560
Once a company has gained some traction, it’s vital to utilize human feedback until sufficient data is accumulated for algorithms.
00:31:01.840
It's all about balancing human intuition with algorithmic processes.
00:31:08.790
Yes, your question relates to how human biases impact algorithmic predictions.
00:31:23.630
That's a fundamental challenge for data science: how to ensure unbiased datasets.
00:31:36.570
While I don’t have all the answers, I believe that as we refine our processes, we can identify biases in data.
00:31:45.890
You can then adjust your models to account for those biases over time.
00:31:59.310
Yes.
00:32:16.050
So your question was: do we have data to measure biases?
00:32:28.280
Absolutely, though I'm not sure of the specific technical details, so feel free to approach me afterward!
00:32:45.180
I think your question ties into how data science captures predictive cues. Algorithms need to represent varying attributes important to different users.
00:32:59.700
We can make connections based on the success of similar items for different clients, which informs how we calculate match scores.
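The talk doesn't describe the model behind this, but the idea of learning from how similar items performed with other clients resembles item-based collaborative filtering, a well-known technique. Below is a minimal version of it, offered as an illustration only and not as Stitch Fix's method; the outcome encoding (1 kept, -1 returned, 0 not sent) and the tiny matrix are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity between two outcome vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def item_based_score(outcomes, client, candidate):
    """Predict a client's response to `candidate` from their outcomes on similar items.

    outcomes maps item -> list of per-client outcomes (1 kept, -1 returned, 0 not sent).
    The prediction is a similarity-weighted average in [-1, 1]; 0.0 means no evidence.
    """
    target = outcomes[candidate]
    num = den = 0.0
    for item, vec in outcomes.items():
        if item == candidate or vec[client] == 0:
            continue  # skip the candidate itself and items this client never received
        sim = cosine(target, vec)
        num += sim * vec[client]
        den += abs(sim)
    return num / den if den else 0.0

# Columns are clients 0..3; client 3 hasn't received the trench coat yet.
outcomes = {
    "chinos": [1, 1, -1, 1],
    "hoodie": [-1, -1, 1, -1],
    "trench": [1, 1, -1, 0],
}
print(round(item_based_score(outcomes, client=3, candidate="trench"), 2))  # 1.0
```

Other clients treated the trench coat like the chinos and opposite to the hoodie; client 3 kept the chinos and returned the hoodie, so the predicted response is strongly positive.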
00:33:13.590
So I believe you had the last question?
00:33:23.890
For stylists who frequently override algorithms but aren't successful, is there a mechanism to guard against this?
00:33:35.000
Currently, there isn't an explicit guard against that, but it's definitely something we could rethink for the future.
00:33:41.180
Thank you!