Database Design

Schemas For The Real World

by Carina C. Zona

In her talk "Schemas for the Real World," Carina C. Zona explores how app developers, particularly in social media, must acknowledge and adapt to the complexity of human identities. The main theme revolves around the inadequacy of traditional database schemas to represent the multifaceted nature of relationships, gender, and sexual orientation. Zona discusses how users often push back against rigid schemas, which fail to capture their realities and identities.

Key Points Discussed:
- Overlap of Roles: Zona merges her experiences as a Ruby developer and sex educator, emphasizing how cultural changes constantly introduce new labels for relationships and identities.
- User Experience Challenges: The frustrations users face when forced to select from limited options on apps lead to feelings of alienation. This struggle exemplifies the initial assumptions embedded in database schemas.
- Case Studies: Zona references Facebook's evolution in relationship status options and Google Plus’s initial missteps in this area, illustrating how user feedback can extend but not wholly resolve the issue of categorization.
- Limitations of Traditional Schemas: She highlights the inherent problems with relational databases in modeling real-life complexities and advocates for the exploration of graph databases as an alternative.
- Schemas and UX Alignment: Zona emphasizes the importance of aligning mental schemas (individual perceptions) with database schemas (technical frameworks) to improve user experiences.
- Free-form User Inputs: Examples from MetaFilter and Diaspora demonstrate the benefits of allowing users to have free-form fields rather than restricted lists, enhancing authenticity and user engagement.
- Trade-offs in Design: She discusses the necessity of accepting trade-offs in user experience design to balance structured data and personal expression, exemplifying this with the use of auto-suggest features in gender input fields.
- Consequences of Poor Schema Design: Zona warns that restrictive schemas generate misleading data, impacting the development cycle and user trust.
- Concluding Insights: The talk wraps up with the recognition that models of reality are complex, and a flexible approach to schema design fosters trust and engagement among users. Developers are encouraged to gather diverse initial data to understand user preferences better, without coercively limiting their options.

Ultimately, Zona urges developers to start with an open stance towards user identity and experience, which can lead to richer data collection and a more engaged user base. The flexibility in design, instead of rigidity, aligns better with the evolving way people understand and describe their identities.

00:00:08.599 Um, I'm actually really excited about this talk. This is something that is going to make my life easier because I'm dealing with these issues myself. Our speaker is Carina Zona. She's a San Franciscan and has been a Ruby developer for five to six years. She does a lot of work with RailsBridge and Women Who Code, and she's also a sex educator with the San Francisco Sex Information phone line and service, which is nationally known. She's going to be speaking to us about schemas for the real world. So thank you, Carina.
00:00:53.239 Well, hello! As Josh noted, I am both a developer and a sex educator. I think a lot about how these two things overlap. Research, culture change, and self-reflection have stirred an increasing range of labels that people ascribe to their important relationships, sexual orientation, and gender. Social apps, in particular, are now being pressed to adjust. Imagine walking through the world knowing that everyone's first assumptions about how you see yourself, who you love, and what feels right for you are completely wrong. Now imagine signing up for a cool website and then being required to select an option from a drop-down menu that doesn't include anything that represents you.
00:01:34.760 You'll feel defeated. You'll want to argue that whatever they think they're learning from you is not really true. You want to tell them that they're adding to your humiliation by making you do this. You'll want to tell them that they're missing a huge part of you. Users are giving us pushback against ill-fitting assumptions, and if you feel out of depth in dealing with these issues, well, you're not the only one.
00:02:09.200 So, what would a canonical set of relationship statuses look like? Three years ago, Facebook figured this was a pretty good model and arguably even pretty progressive—right? We got stuff here like "open relationship." Users disagreed, and they disagreed strongly. So, under pressure, Facebook had to nearly double the options in just two years. When Google Plus launched last year, it largely adopted that list, although for some reason, they left out 'separated' and 'divorced.' We still don't know what a canonical list looks like.
00:02:44.560 Notice that they also added something else: the choice to opt out of being labeled at all. This allows users to identify their relationships using labels of greater personal significance. This shift was driven by people rejecting a user experience that wasn't working for them. It's not about monetization; Facebook doesn't even care, and they're not providing the option to advertisers. We're dealing with three core problems.
00:03:09.200 First, there's a premise that deeply personal stuff about humans can be reduced to lists. Second, there's the assumption that canonical lists for these things do and must exist, or at least that we can create them. And the third problem is our faith that the first two can be easily solved just by adding more list items. This is not going to work, and worse yet, it makes you look like a fool. This is what happens when we try to throw more labels at the problem instead of examining the assumptions in the database schema.
00:03:54.080 For example, an open relationship is definitionally a one-to-many join. The usual understanding is that the actual number of relationships could be zero, one, or many. But this is Facebook failing at modeling relationships in a relational database. The schema forces the user to choose, so you either look evasive or are inauthentic. And it's not just about relationship statuses; it's not just for people in open relationships. They're being made to feel like they don't exist or have to lie.
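A minimal Ruby sketch of the modeling point above, assuming made-up class and field names (this is not Facebook's actual schema). The single-valued struct can hold at most one partner, while the one-to-many version represents zero, one, or many relationships without forcing a choice:

```ruby
# Hypothetical models for illustration; neither is any real site's schema.

# Single-valued design: one status and at most one partner per profile.
SingleStatusProfile = Struct.new(:name, :status, :partner)

# One-to-many design: a profile owns a list of relationships,
# so zero, one, or many partners are all representable.
Relationship = Struct.new(:partner, :label)

class Profile
  attr_reader :name, :relationships

  def initialize(name)
    @name = name
    @relationships = []
  end

  # label is whatever wording the user chooses; it is not validated.
  def add_relationship(partner, label)
    @relationships << Relationship.new(partner, label)
    self
  end

  def partners
    @relationships.map(&:partner)
  end
end

alex = Profile.new("Alex")
alex.add_relationship("Sam", "partner").add_relationship("Jo", "dating")
```

Here `alex.partners` lists both people, and a profile with no relationships simply has an empty list rather than a forced status.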
00:04:41.359 It's the tip of the iceberg. In 2008, Sam Hughes set out to create a modern relational schema for marriage. He went through about 14 iterations in SQL before realizing that the problem was never going to be solved that way; the foundation itself was wrong. Relational databases couldn't model real life's complexity. A graph database is what would be needed. So how do we bring modern realities into data views and logic? We start at the schema.
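To illustrate the graph idea in the simplest possible terms, here is a small in-memory sketch in plain Ruby. It is not a graph database and not Sam Hughes's design; the class name, method names, and sample data are assumptions. People are nodes and each relationship is a labeled edge, so nothing limits how many relationships a person has or what they are called:

```ruby
# A tiny in-memory graph: people are nodes, relationships are labeled edges.
# All names and labels are made up for illustration.
class RelationshipGraph
  Edge = Struct.new(:from, :to, :label)

  def initialize
    @edges = []
  end

  # Any two people can be connected with any label; nothing limits the count.
  def connect(from, to, label)
    @edges << Edge.new(from, to, label)
    self
  end

  # Every edge touching the person, regardless of direction.
  def relationships_of(person)
    @edges.select { |e| e.from == person || e.to == person }
  end

  def labels_of(person)
    relationships_of(person).map(&:label).uniq
  end
end

graph = RelationshipGraph.new
graph.connect("Alex", "Sam", "married")
     .connect("Alex", "Jo", "partner")
     .connect("Sam", "Jo", "friend")
```

Adding a new kind of relationship here means adding an edge with a new label, not changing the structure.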
00:05:27.000 Developers are being tugged in two directions: keep that code base manageable, right, and design for modern complexity. We can build a foundation for great user experience and sound development. Let's look at what the approaches are. First, we have to get schemas into alignment, and I said schemas plural because we have two different types. We have mental schemas, which are an individual's set of preconceived ideas—a framework for representing some kind of aspect of the world and our personal system for organizing information, especially as new information comes in. Then we have database schemas, which are essentially the same thing, just a mental schema translated into blueprints for a database.
00:06:24.080 When we use rails generate scaffold or rails generate model, we're creating a migration. And maybe you created something like this. Those migrations ultimately get translated into a unified schema to be used by the environment's database. So, when you're looking at all this UX, these are just manifestations of mental schemas. Our schemas and our UX are leaving people behind, and we can fix it by starting with a question: What benefit will the user notice?
00:07:05.840 When developing a schema that asks a person about their experiences, feelings, sense of self, and identity, there isn't going to be a single right way for me to tell you to solve this. But what we can do is evaluate the trade-offs and ask what benefit the user will notice. That's not equivalent to asking how the user will benefit. That's a question that grants us a lot of latitude to assume that whatever we want to do will be to their benefit, right? Because we are making a product that's awesome and will work for them.
00:07:40.080 Evaluating from the user's perspective is how we keep focus. I've got this lovely little chart here. The problem is that there's absolutely no point on it where everyone is going to see maximum benefit. All we can do is really look at choices and try to work out the trade-offs.
00:08:30.320 First off, let’s agree that it feels terrible to be told, "No, you are wrong about who you are." Checkboxes, radio buttons, and select menus imply that all possible values can be represented. The message is, "Hey user, just pick the right ones and everything will be fine." But think back to those opening slides I showed you. Does that stuff look like it maps to checkboxes or radios? No way! It's the real world being rejected for not matching our mental schemas. Checking a box is a one-step action, and that's really great for usability, while entering text is not. A free-form field doesn't automatically seem like an exciting solution.
00:09:32.200 Nevertheless, that form field can deliver striking user benefits. MetaFilter has had a text field for gender for over a decade. Initially, there were programmers who shuddered at the thought, but they soon jumped on board because it allowed users to express themselves fully and authentically. That text field grew into a beloved institution at MetaFilter. What users put into that field reveals something about them. The freedom to input anything or nothing says something about what MetaFilter is meant to be.
00:10:32.120 The schema's trust in users was the foundation for users to start asking, "MetaFilter, can we please share more?" Now, MetaFilter's users are trusted with many free-form fields, including ones most developers would instinctively constrain or thoroughly validate. Even nonsensical values are fine, and field values can blatantly contradict each other without objection. The message is, "Hey user, this is your profile. Put whatever you want in it. Make it comfortable, make it personal, make it messy if you want to. Because we as developers can handle it."
00:11:30.320 It's easy to get into the habit of structuring data for easy analysis, but we need to step back and wallow in the user's perspective for a moment because data does not have to be for analysis; it can be for sheer expressiveness. It can have character, individualism, and distinctiveness. Diaspora is a project that many in this room are familiar with. A couple of years ago, Diaspora turned gender into a text field. Just like on MetaFilter, users had fun.
00:12:14.080 On the other hand, developers did not enjoy this change. One major complaint raised was about the effect on the internationalization of gendered pronouns. Here's what I have to say about that: it's a rabbit hole, and it gets messy as hell, quick. When you deal with internationalization, it comes with a level of complexity that English hasn't wrapped its head around.
00:12:48.320 Grammatical gender and what you and I would call gender are completely independent of each other. You're not going to be able to extrapolate, so internationalization based on gender is going to cause significant issues. If it's a requirement, then the best way to cope is to straight up ask. Here's Randall Munroe from XKCD who has examined this problem thoroughly in relation to English language projects. He found that asking straight up, "Which pronouns do you prefer?" is truly the best approach.
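One way to read "straight up ask" in code, as a hedged sketch: store the pronouns the user gives as their own field, and build sentences from that field only, never from the gender value. The struct names, field names, and sample users below are assumptions for illustration:

```ruby
# Pronoun sets are stored as the user's own answer, separate from any gender field.
# The field names and the sample sets are assumptions for illustration.
Pronouns = Struct.new(:subject, :object, :possessive)

User = Struct.new(:name, :gender, :pronouns)

# Builds a notification sentence from the user's stated pronouns only.
def photo_notice(user)
  "#{user.name} updated #{user.pronouns.possessive} profile photo."
end

riley = User.new("Riley", "genderqueer", Pronouns.new("they", "them", "their"))
dana  = User.new("Dana", nil, Pronouns.new("she", "her", "her"))
```

Note that `dana` has no gender value at all, and the sentence still renders correctly, because nothing is inferred from that field.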
00:13:17.040 While this matrix may look a bit crazy, it gets so much worse in any other language. If you insist, there are some Ruby gems you can check out, though I refuse to recommend them. One of them, Sex Machine, uses a database of first names from countries worldwide and tries to predict a name's gender based on associated probabilities. But it’s still just a guess, and many would be offended if you called them by the wrong pronoun, especially if you could have gotten it right easily.
00:14:19.440 The second gem is i18n-inflector, which uses Rails inflector. This operates more like a whitelist, essentially allowing you to pick out words that should be detected in a gender field and mapping them to male or female, based on your judgment. However, this method is likely to fail miserably.
00:15:11.760 As developers, we have this vision of what a good code base should be: structured, orderly, and predictable. That's why we love relational databases, with their easily grabbable and sortable qualities. We want lists to be neat and exhaustive so we can provide nice, clean analytics. But when we try to do this with personal identity data, we end up writing tons of code and validations, and throwing exceptions, because people don't fit our assumptions.
00:16:05.960 Then we have to deal with all these conditionals and partials to meet conditions, resulting in tons of code for premature optimization. We won't be able to keep up because cultures vary so much in how they use these labels, and whether they exist at all is highly subjective. It's a moving target; many of these labels were not even popularly known just 5, 10, or 20 years ago.
00:16:52.920 We're never going to stop making lists, which means all our decisions are going to be based on false premises. As engineers, we instinctively recoil from not structuring data for easy analysis. This feels wrong, and I truly understand—it's unsettling. But again, the foundational question is: what benefit will the users notice? This is not about serving our interests.
00:17:47.160 If necessary, we can find a middle ground in cases where an auto-suggest feature can help. If it's necessary to get some kind of structure, we can try minimalist suggestions rather than offering every possible value we have in the database. For instance, for gender, you could suggest two values that you think are interesting. As soon as someone types something that doesn't match, it becomes completely free form. Structured data will still be there if needed, making this a balanced solution when you’re willing to tolerate some ambiguity.
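The middle ground described above could be sketched roughly like this. The two suggested values and the method names are assumptions; the talk only says to keep the suggestions minimal and to let the field become free-form as soon as the input stops matching:

```ruby
# Hypothetical short suggestion list; the talk only says to keep it minimal.
SUGGESTIONS = ["female", "male"].freeze

# Returns the suggestions that start with what the user has typed so far.
# An empty result means the field has simply become free-form.
def suggest(typed)
  prefix = typed.to_s.strip.downcase
  return [] if prefix.empty?
  SUGGESTIONS.select { |s| s.start_with?(prefix) }
end

# Whatever the user finally enters is stored as-is; blank means "not provided".
def stored_value(input)
  value = input.to_s.strip
  value.empty? ? nil : value
end
```

Nothing is rejected here: suggestions are only a convenience, and `stored_value` keeps whatever the user typed or records that they chose not to answer.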
00:18:38.360 Of course, there are trade-offs to consider: data quantity may be lower since people often opt out of providing personal information, but data quality will likely improve when users can choose when to share and when not to share. This has been proven by MetaFilter's experience—despite the fact that people can enter anything, 40% of users choose to provide nice, simple structured data.
00:19:32.520 It's fine to mix and match these solutions to find what’s right for your app, your users, and your business objectives. Facebook, for example, makes relationship status completely optional, but it's coercive for those who choose to opt in—they have to set a value. In fact, most users do opt in, with 60% selecting some relationship status.
00:20:01.520 So the bottom line is that we want everyone to feel excited about what we build. We want users to feel passionate about their involvement and, of course, we want analytics, monetization, and all of that to be based on sound premises. Collecting data through coercive approaches carries risks—some people may lie because lying has become a requirement to get past barriers.
00:20:50.520 Conclusions drawn from this bad data misdirect our decision-making for the next stage of development. Restrictive options might not have to be marked as required. However, the way we set up the schema often embeds the assumption that they should be, and will be. A field that's not allowed to be null is essentially destined to become mandatory.
00:21:37.560 And if you have a short value set, such as indicating only male or female, this implies that transgender identities aren't reasonable values. People who identify as transgender will likely be coerced into giving an inauthentic response due to the very first line of the schema.
00:22:13.560 This moment showcases how easy it is to solve this issue. It lays a foundation for a completely different user experience. Notice the ninja move here: doing absolutely nothing. You can create flexible fields upfront, optimizing for storage and indexing later.
00:22:39.040 Ultimately, a discretionary field allows the user to decide whether to respond or not. As developers, we may consider this somewhat redundant; however, making it explicit that null is allowed communicates intent and documents a product decision that can be revisited later.
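The slides with the actual migrations are not part of this transcript, so the following is only a plain-Ruby stand-in for the contrast being described. Each hash mimics a column definition (the keys `null` and `allowed` are assumptions, not a real migration API), and `accepts?` shows which values each definition would let through:

```ruby
# Each hash mimics a column definition; this is plain Ruby, not a real migration.
RESTRICTIVE = { null: false, allowed: ["male", "female"] }.freeze
FLEXIBLE    = { null: true,  allowed: nil }.freeze # allowed: nil means any text

# Returns true when the column definition would accept the value.
def accepts?(column, value)
  return column[:null] if value.nil?
  column[:allowed].nil? || column[:allowed].include?(value)
end
```

With the restrictive definition, both a missing answer and any value outside the short list are refused, which is the coercion the talk warns about; the flexible definition accepts both.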
00:23:15.840 Here are the key takeaways I hope you’ll have today: First, modeling the real world is complex, and it’s going to be okay. Second, assuming we know who users are surrenders our opportunity to learn who they really are. Early constraints in the schema are met with misleading data, so keep constraints out of the user schema at first.
00:23:52.760 Gather enough initial responses for some raw data mining and watch that data for a while to spot trends. Right now, we're being pushed from behind. This should not become mere innovation; rather, we should adapt to what’s happening, anticipating rather than waiting for users to express frustration.
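A rough sketch of what "raw data mining" on free-form answers might start with: normalize lightly, drop blanks, and count. The sample responses and the method name are invented for illustration, and a real analysis would likely need more careful normalization:

```ruby
# Sample free-form answers, invented for illustration.
responses = ["Female", "female ", "male", nil, "genderqueer", "MALE", "", "female"]

# Normalize lightly (trim and downcase), drop blanks, then count each value,
# most common first (ties broken alphabetically).
def tally_responses(responses)
  counts = responses
    .map { |r| r.to_s.strip.downcase }
    .reject(&:empty?)
    .each_with_object(Hash.new(0)) { |value, acc| acc[value] += 1 }
  counts.sort_by { |value, count| [-count, value] }.to_h
end

counts = tally_responses(responses)
```

Watching how a tally like this shifts over time is one simple way to spot which values are common enough to offer as suggestions, without ever restricting what users may enter.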
00:24:37.240 Finally, freeform does not have to harm us. Data quality improves when lies are optional, not required. Users don’t have to be dishonest, and this yields specific data that reveals patterns that are undetectable when the data is generic. We want users to feel trust toward us because that fosters their engagement, passion, and loyalty, all foundations for a great user experience.
00:25:07.080 Any questions?
00:25:15.360 (Audience member) I think this is a really interesting topic. I was curious; there are a lot of ad-driven revenue models, like Google and Facebook. How do aspects like this play into the demographics used by advertisers?
00:25:56.040 (Carina) Well, I'm not sure. If NBC wants to advertise a show targeting 18 to 49-year-old males and the database options are limited, it could pose an issue. But MetaFilter is financed purely by Google ads and has never run analysis on gender data, despite users showing curiosity back in 2010. Clearly, Google ads don’t seem to care, which suggests it might not matter.
00:26:39.560 (Audience member) Given that many users are gender normative, would it be worth using simple checkboxes instead of text fields for a better user experience?
00:27:45.840 (Carina) I agree that these trade-offs are necessary based on your user base's and your app's needs. The ambiguity in this information must be tolerated to provide users with authentic experiences. If you handle this upfront, it's less polarizing and allows for richer interaction.
00:28:11.680 (Another audience member) In dating apps, users often seek gender-specific matches. If many users have gender normative expectations, how would you handle a freeform input for gender seeking functions?
00:29:46.720 (Carina) Using auto-suggest can be very helpful here. Users who don’t fit into gender normative categories often appreciate the ability to find matches reflecting their individuality. This provides an opportunity to enhance the experience for everyone.
00:30:19.360 (Audience member who made Diaspora's change) I can attest to the user base’s positive reception of introducing a text field for gender. Even users fitting the gender normative categories appreciated the ability to customize their input. They felt empowered to express themselves freely, which enriches the overall user experience.
00:31:00.520 Thank you all for the engaging discussion. Let's grab lunch and perhaps continue this conversation.