00:00:19.930
Alright, hi everybody! I am Coraline Ada Ehmke. I work at Apartments.com, and we're hiring. I'm a principal developer there. Like our previous speaker, I did not have a Computer Science degree; in fact, I'm a college dropout. But I'm an autodidact, which means that I know the word "autodidact." Basically, I taught myself everything that I care about. One more thing: I'm from Chicago. There are no fewer than five developers in Chicago who do Ruby, and I'm the one with the fuchsia hair.
00:00:50.270
So, what is business intelligence and what do I mean by it? This is what we're going to be talking about today: basically, the stages of adopting business intelligence, whether or not we can build it ourselves, and whether we should build it ourselves. Of course, I have to start with the definition, right?
00:01:14.390
Business intelligence means taking mission-critical knowledge inside your company, organizing it to provide a historical perspective, and using it for real-time decision-making. It sounds pretty straightforward. It's reporting, essentially. Sometimes it's called data science, sometimes it's called data warehousing, and there are all sorts of names for it. In the end, it's about taking the data that is essential to running your company, making sure it's accessible to the people who need it, and putting it in a form that will support the business in making decisions moving forward.
00:01:36.920
I like to talk about the three pillars of technology supporting business. Technology is in business, though we're not always partners in it, and we sort of take it for granted now. A software developer might say, "If you're a company, of course you have a development team," and of course you have these other things. So, I think there are three main parts of how technology supports business: the first is infrastructure, which includes everything from the hardware that the applications are running on to the accounting system, the HR system, and all the stuff that I find quite boring.
00:02:01.350
The second pillar is applications, and as developers, this is where we spend most of our time. We enjoy writing applications that deliver business value. These applications are what let the business scale and hopefully attract customers. The third pillar is data. For a developer, data is often an afterthought; we use our data stores to store objects that we need in our applications. Once we're done with those objects, we often stop thinking about the data. Maybe you have a report to build for the accounting team or something similar, but once it's built, you might not think about it again.
00:02:34.470
This is recognized as a flaw, especially as businesses grow larger. Typically, the business will want to address how to collect and use the massive amounts of data that have built up and how to actually turn it into an asset. When developers make friends with data, though, they can turn it into something useful. I want to talk about how you go about adopting business intelligence systems, and I'll do it in three acts.
00:03:02.570
The first two acts are how most companies typically approach it, and unfortunately, they often stop after the second act, throwing up their hands in despair and wandering off in search of enlightenment. The third act is the one that I hope all of you will build upon.
00:03:52.880
Now, this is a Fiji mermaid. I don't know if you're familiar with it. In the late nineteenth and early twentieth centuries, there were many traveling sideshows and circus displays. The Fiji mermaid was basically the torso of a monkey sewn onto the tail of a fish, with some papier-mâché to make the transition smoother. The legend was that it had been caught by a fisherman in Fiji and was now on display for everyone to see. A lot of people believed in it. So, if you're the kind of person who believes in the Fiji mermaid, you're probably also the type who thinks you can report straight out of your transactional database.
00:04:28.970
Everyone has done this at some point in time. Don't feel embarrassed about it; I have done it many, many times. What happens when you do this? Our transactional data stores are built for transactions. We have distinct tables for each of our objects, with complex relationships between them, possibly including join tables or models. When it comes time to report on those, you get some really gnarly SQL, which brings with it performance issues. One of the workshops is actually addressing this topic a little later, so you might want to check into that.
00:05:36.980
To get around the performance issues, you might decide that you can write better SQL than someone else. Good luck! What you'll find is that when you're doing reporting, you're impacting your production resources. Your servers will slow down, so you might conclude that you should only run these reports at 1 AM. I hope you're not a global company, but in the end, you might just throw up your hands, thinking that stakeholders just don't know what they want. You might even give them access to the database, which I consider to be problematic.
00:06:03.820
In the end, after you lose all your data because someone in accounting messed up their connection to an Excel spreadsheet, you're going to go enterprise and bring in specialists. When a company reaches a certain size, they often deploy what I call the blue-and-khaki army. I don't mean to offend anyone wearing blue and khaki today (it's not all blue-and-khaki people), but they'll bring in consultants to create a data warehousing and enterprise reporting system.
00:06:49.990
What gets built is generally an entirely separate stack from everything else, usually running on a Windows server, often using Java. Nightly background jobs load the data from your transactional database into your data warehouse. Generally, these are run with ETL scripts: Extract, Transform, and Load scripts. They have no tests, and they aren't in source control. I'm sure everything will be just fine.
00:07:27.830
The schema in a data warehouse is optimized for reporting, which is a good thing, right? But it's reporting on day-old data. Maybe that's acceptable if you're sewing, but for me? I want fresh data, and I don't know about you guys. Also, there's this old process called waterfall, and data warehousing requires a waterfall approach. Due to the complexity of everything, you need to do all of your planning upfront. If you get it wrong, you may have to start all over again and pay the consultants for another three to six months.
00:08:01.190
These consultants are not cheap. In the end, you get something really enterprise-oriented and, if Edward Tufte were dead, he'd be spinning in his grave right now. So, don’t trust these guys with your data either. While they may be nice, I'm sure, and yes, they wear khaki and blue, their sandals are a mass demonstration of individuality that I've just never seen before.
00:08:54.840
Most businesses will give up at this point. They have a data warehouse that emails reports once a week, and they forget about it. They forget about the promise of it, but there is a third option, and I call it lightweight BI. When a company realizes that their data is an asset and too important to outsource, it may be time to revisit the build versus buy decision. The promise is real-time data, put together by people who understand the data. People who wrote the structures that the data is based on can actually provide real-time decision support using your existing development team and stack.
00:09:43.830
You won't have to maintain those Java packages on the Windows servers because everything is built by your existing dev team. You also get the ability to change your mind, investing resources that will understand how it works and be able to adapt when your business needs change. You don’t have to pay the army to come back and fix things for you. You can be iterative and agile. You won’t have to design every single report and enforce an entire schema from the start.
00:10:13.050
This is what being iterative and agile is all about. That's not what business intelligence or data warehousing has traditionally been about. You can trust your developers. I realize it's hard to do because data warehousing, business intelligence, and data science sound like big, scary things. The concepts may be a little difficult to grasp at first, but in the end, it's something that any competent team of developers can wrap their heads around.
00:10:53.890
They can deliver value in this domain and ultimately enhance your company by going through that process. So maybe I’ve piqued your interest in the idea of lightweight business intelligence. How do you actually go about getting started?
00:11:25.570
You want to collaborate, and I don't mean the sort of collaboration we regularly do as agile developers, where we work with our stakeholders and agree on things, then run within an iteration. I mean sitting down, getting out of your comfort zone, stepping away from technology for a while, and figuring out what the business is really about. What are the important pieces of data? They're not your user model; they're your customer.
00:11:55.110
So, work hand in hand with the people who will be served by the data that you present to them. This is not only to understand their needs, but it also allows you to communicate back to them what sorts of things are possible. They may not know every aspect of what's being recorded. For example, they might not realize that we're automatically recording the last login time. You can actually ask how loyal your customers are by observing how many times they log in.
00:12:38.730
This data exists and may be of value to them, so this is a two-way conversation. Next, formulate your questions. I like to think data is there to answer questions. Some people see it as a historical record or an audit trail, but I think its main role is to answer questions. If you don't know what your questions are, there's no way you can find the answer.
00:13:04.930
Work with your stakeholders to determine what questions you want to answer with the data. Generally, these are the same questions managers are asking their employees every day. For instance, how many signups did we get? Or is the lifecycle of our customers getting shorter or longer for onboarding purposes? As a developer, you'll start creating a solution by thinking about the data you already have. You might consider: if I took data from one place and merged it with data from another, what conclusions could I draw?
00:13:47.310
You want to take the inferences you make and turn them into facts. A fact has a specific meaning in data warehousing and business intelligence. I think it can be somewhat convoluted. We all know what a fact is; a fact is an answer to a question—a truthful answer to a question. If you didn't state your questions, you won't be able to discern where the facts lie. Thus, you should base your database schema on these facts.
00:14:22.830
You're not storing objects and state anymore; you're storing facts. Ideally, you have a table or document collection that answers one question, which means you'll want to denormalize your data. We're afraid of denormalization because we're taught from an early age not to denormalize, but to be successful in reporting it may actually be helpful. It's fine for data to be stored multiple times in different places as long as the facts stored are necessary. Skip the clever graphs. In my experience, graphs can be pretty distractions that deliver minimal value. The more ink that's used on a graph, the less informative it usually is.
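To make "one table answers one question" concrete, here is a minimal sketch. It assumes a made-up question ("how many signups did we get per day, by plan?") and made-up data; `USERS`, `PLANS`, and `signup_facts` are hypothetical names, not anything from the talk's slides.

```ruby
# Hypothetical normalized rows, as a transactional store might hold them.
PLANS = { 1 => "free", 2 => "pro" }.freeze
USERS = [
  { id: 10, plan_id: 1, signed_up_on: "2024-01-01" },
  { id: 11, plan_id: 2, signed_up_on: "2024-01-01" },
  { id: 12, plan_id: 2, signed_up_on: "2024-01-02" },
  { id: 13, plan_id: 2, signed_up_on: "2024-01-02" }
].freeze

# Build denormalized fact rows. Each row is a complete answer to
# "how many signups did we get on this day for this plan?"
# The plan name is copied into the row so a report needs no join.
def signup_facts(users, plans)
  users
    .group_by { |user| [user[:signed_up_on], plans.fetch(user[:plan_id])] }
    .map { |(day, plan), rows| { day: day, plan: plan, signups: rows.size } }
    .sort_by { |fact| [fact[:day], fact[:plan]] }
end

signup_facts(USERS, PLANS).each do |fact|
  puts format("%-12s %-6s %d", fact[:day], fact[:plan], fact[:signups])
end
```

The plan name is stored again in every fact row on purpose. That duplication is the denormalization being described: the report reads one flat table instead of joining users to plans.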
00:15:45.270
I prefer columns of data side by side if you’re doing a comparison. When not comparing, grouping data logically can communicate your ideas; it invites you to look at them and draw conclusions. With graphs, you can easily overlook the importance of the data itself.
00:16:25.040
So, here's a recipe for success in business intelligence. Your mileage may vary. Many people will have their opinions on what technologies you should use, but the important thing is that you choose technology you're comfortable with—something you know inside and out. This does not need to be the only way. I love Ruby! I think you all do too; Ruby is awesome because it's built for Test-Driven Development (TDD). Remember when I mentioned ETL jobs and that they typically have no tests? With Ruby, you can test the assumptions about how the data is working and how the facts are being collected.
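As a rough illustration of testing the assumptions in an ETL step, the sketch below writes a transform as a plain Ruby method with inputs and outputs. The row shape, the field names, and the `transform_signup` method are all assumptions made up for this example.

```ruby
require "time"

# Hypothetical "transform" step of an ETL job: turn a raw signup row
# from the transactional store into a reporting-friendly fact.
def transform_signup(row)
  # Assumption: a blank or missing referrer means the customer came direct.
  source = row["referrer"].to_s.strip.downcase
  source = "direct" if source.empty?

  {
    customer_id: Integer(row.fetch("id")),
    # Normalize to a UTC calendar day so reports agree across time zones.
    signup_day: Time.parse(row.fetch("created_at")).utc.strftime("%Y-%m-%d"),
    source: source
  }
end

raw = { "id" => "42", "created_at" => "2024-03-05T23:30:00-05:00", "referrer" => "  Newsletter " }
p transform_signup(raw)
```

Because the transform is a pure function, a Minitest or RSpec case can pin down each assumption (time zone handling, blank referrers, malformed ids) without touching a database.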
00:17:02.640
Ruby is straightforward to deploy thanks to platforms like Heroku, which the previous speaker discussed. Ruby excels at data munging, and I often say that my experience as a Perl hacker makes me appreciate Ruby's data manipulation capabilities. Great visualization libraries exist for Ruby, though many these days are on the JavaScript side. Nevertheless, they are straightforward to integrate, allowing you to create aesthetically pleasing charts that communicate information effectively.
00:17:47.200
I want to talk briefly about something called a statistical model. Essentially, with this model, a calculation is run every time data changes—not on demand. This means the system will be very fast, anticipating the user's question and having the answer available almost instantly. We're not discussing 24-hour delays; at worst, you might have a 30-second delay. Stop thinking about data storage as a way to throw around objects. Instead, think about it as storing information that answers questions, focusing on the denormalized schema, which optimizes for reporting.
00:18:39.740
ETL processes can also be a bit fragile. They typically run every 24 hours, but now there's no reason they can't be performed on demand or streamed in real time. One of the workshop topics I'll cover involves actual implementations of those elements. Let's say you have a service-oriented architecture, which I call a small app ecosystem.
00:19:10.270
In this ecosystem, you maintain a messaging server in the middle. You can have your data collection application listen to every message, recording it and performing calculations on data changes as a result of incoming data. This ensures that when someone checks a report, it's up-to-date; relevant calculations have already been run and updated.
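The listen-to-everything idea can be sketched in a few lines. This is only an in-process stand-in under stated assumptions: `MessageBus` and `Collector` are hypothetical classes, and a real ecosystem would put an actual message broker in the middle.

```ruby
# A tiny in-process stand-in for the messaging server (hypothetical).
class MessageBus
  def initialize
    @listeners = []
  end

  # Register a block to be called with every published message.
  def subscribe(&listener)
    @listeners << listener
  end

  def publish(message)
    @listeners.each { |listener| listener.call(message) }
  end
end

# The data collection application: records every message and updates
# running totals at write time, so a report never recalculates on demand.
class Collector
  attr_reader :log, :totals

  def initialize(bus)
    @log = []
    @totals = Hash.new(0)
    bus.subscribe { |message| record(message) }
  end

  private

  def record(message)
    @log << message
    @totals[message.fetch(:type)] += 1
  end
end

bus = MessageBus.new
collector = Collector.new(bus)
bus.publish({ type: "signup", user_id: 1 })
bus.publish({ type: "purchase", user_id: 1 })
bus.publish({ type: "signup", user_id: 2 })
p collector.totals
```

The calculation happens inside `record`, when the data arrives. Reading `totals` later is a lookup, which is why the report is already up to date when someone opens it.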
00:19:50.220
These calculations are easily testable as you have inputs and outputs. This eliminates the need for someone to analyze terabytes of data and declare it "okay." The calculations run in the background, making data ready when you need it. You've probably heard that in SQL, you store the recipe, not the cake; I personally find happiness in storing both the recipe and the cake since you can't eat a recipe book.
00:20:32.740
In this model, you'll want to build APIs for everything. You never know where the data might be used. Don’t just develop a collection application and storage database for your reporting application. You might discover a need to showcase real-time data on your website, such as how many transactions per minute occur on your site.
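For the transactions-per-minute example, an API endpoint could return something like the payload sketched below. The method name, the JSON field names, and the 60-second window are assumptions chosen for illustration; the talk does not specify a payload format.

```ruby
require "json"

# Hypothetical API payload: the number of transactions recorded in the
# 60 seconds up to and including `now`.
# `timestamps` is an array of Time objects, one per recorded transaction.
def transactions_per_minute(timestamps, now:)
  window_start = now - 60
  count = timestamps.count { |time| time > window_start && time <= now }
  payload = {
    metric: "transactions_per_minute",
    value: count,
    as_of: now.utc.strftime("%Y-%m-%dT%H:%M:%SZ")
  }
  JSON.generate(payload)
end

now = Time.utc(2024, 1, 1, 12, 0, 0)
puts transactions_per_minute([now - 90, now - 30, now - 5, now], now: now)
```

Returning plain JSON keeps the number reusable: the reporting application, the public website, and a wall-mounted dashboard can all consume the same endpoint.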
00:21:15.790
It's crucial to ensure your APIs lend themselves to novel uses of the data, which is a significant aspect of software development—finding innovative ways to leverage code. Experiment at home, as it's not as daunting as it may seem.
00:21:39.670
Now, let me walk you through an example of a lightweight BI system. I like to give my projects mythological names because when I was in school as an English major, I was fascinated by comparative mythology and religion. I don't find naming a difficult problem, but I know others might question the significance of the names I choose.
00:21:44.970
This is a typical small app ecosystem. On the left side, we have Rails applications, possibly some non-Rails applications. In the middle, we have APIs, a messaging queue, and on the far right, our transactional data store, using PostgreSQL, and an event data store that records everything that happens.
00:22:28.540
Whenever an event is triggered, these APIs send messages to the messaging queue. The event API captures a copy, writing it to the event data store for events like user sign-ups, purchases, account closures, or preference changes. Ultimately, every action in your ecosystem needs to be recorded since data storage is inexpensive.
00:23:05.550
This event model emphasizes storing events rather than business objects. I gave this talk previously at Ruby Midwest and Windy City Rails, where someone approached me saying they attempted to create a similar system but failed. When I probed about their approach, they mentioned trying to alter an existing event, suggesting they were trying to go back in time and change history, a situation famous for causing problems in science fiction.
00:23:41.650
So again, don’t think in terms of business objects; think in terms of events, which do not change. Here’s an example of an event model. In this model, the schema is clearly declared, making it easier to traverse and analyze data.
00:24:18.660
Let’s say you create an event labeled as the customer signing up. The application within the ecosystem responsible for triggering this action will label it along with details encapsulated as a hash. You can input any kind of information as key-value pairs, enabling you to search for records with extensive versatility. For example, you could search by user ID, first name, or invoke behaviors based on arbitrary details.
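A labeled event with a details hash might look roughly like the following. This is a sketch, not the schema shown in the talk: the `Event` and `EventStore` classes, the `customer_signed_up` label, and the in-memory array are all hypothetical.

```ruby
# A hypothetical immutable event: a label plus a free-form details hash.
class Event
  attr_reader :label, :details, :occurred_at

  def initialize(label, details = {}, occurred_at = Time.now)
    @label = label.to_s.dup.freeze
    @details = details.dup.freeze
    @occurred_at = occurred_at
    freeze # events record history, so they never change after creation
  end

  # True when every key-value pair in `criteria` appears in the details.
  def matches?(criteria)
    criteria.all? { |key, value| details[key] == value }
  end
end

# An append-only store; there is deliberately no update or delete method.
class EventStore
  def initialize
    @events = []
  end

  def record(label, details = {})
    event = Event.new(label, details)
    @events << event
    event
  end

  # Search by any combination of detail key-value pairs.
  def where(criteria)
    @events.select { |event| event.matches?(criteria) }
  end
end

store = EventStore.new
store.record("customer_signed_up", { user_id: 42, first_name: "Ada" })
store.record("customer_signed_up", { user_id: 43, first_name: "Grace" })
puts store.where({ first_name: "Ada" }).map(&:label)
```

Freezing each event enforces the "events do not change" rule in code: a correction is recorded as a new event rather than an edit to an old one.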
00:25:07.420
One of the winning features of this event model is its ability to provide exploratory filters using user-friendly UI elements. On the left side, you can find a variety of filtering options—like searching by first name—allowing you to interact with data, regardless of how the schema is structured.
00:25:52.660
You can save these filters, representing a collection of criteria, and create groups of users based on those matching criteria. This formula applies to events to showcase trends or behaviors, which can be analyzed side by side without introducing any confusing graphs.
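Saved filters and the groups they define can be sketched as below. Events are plain hashes here for brevity, and `SavedFilter`, `compare`, and the `plan` detail are hypothetical names invented for the example.

```ruby
# Hypothetical saved filter: a named collection of criteria that can be
# applied to any list of events.
class SavedFilter
  attr_reader :name, :criteria

  def initialize(name, criteria)
    @name = name
    @criteria = criteria.dup.freeze
  end

  # Events whose details match every criterion.
  def apply(events)
    events.select do |event|
      criteria.all? { |key, value| event[:details][key] == value }
    end
  end

  # The group of users whose events match this filter.
  def user_ids(events)
    apply(events).map { |event| event[:details][:user_id] }.uniq
  end
end

# Side-by-side comparison: one row per event label, one column per filter.
def compare(events, filters)
  labels = events.map { |event| event[:label] }.uniq.sort
  labels.map do |label|
    counts = filters.map do |filter|
      filter.apply(events).count { |event| event[:label] == label }
    end
    [label, *counts]
  end
end

events = [
  { label: "purchase", details: { user_id: 1, plan: "pro" } },
  { label: "purchase", details: { user_id: 2, plan: "free" } },
  { label: "signup",   details: { user_id: 1, plan: "pro" } },
  { label: "signup",   details: { user_id: 3, plan: "pro" } }
]
filters = [SavedFilter.new("Pro", { plan: "pro" }), SavedFilter.new("Free", { plan: "free" })]
compare(events, filters).each { |row| puts row.join("\t") }
```

The output is columns of numbers side by side, one column per saved filter, which matches the preference for tables over graphs expressed earlier.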
00:26:30.000
To conclude, business intelligence is emphatically not rocket science, but it does require significant endurance and possibly multiple attempts to master it. If someone suggests to you that it's impossible, consider my experience—I've built such systems successfully.
00:27:06.310
If you are struggling, don't discredit the process; keep persevering. Just like with any programming challenge, you will encounter failures along the way. Use the tools you’re familiar with to extract maximum value, and maintain genuine collaboration with your team throughout this journey.
00:27:31.060
Don’t overlook the potential of iterative development, advocating for small progress steps, questioning all assumptions, and ensuring accuracy in every endeavor. Remember: make friends with your data—the better your understanding, the more value you can deliver to your organization.
00:27:54.530
Thank you very much. You can follow me on Twitter @CoralineAda and on GitHub. Thank you all for your attention!