00:00:12.349
All right, let's get this party started! Whoo! So, this is Reporting on Rails—ActiveRecord and ROLAP working together.
00:00:21.029
My name is Tony, and I am a senior developer at Mobe in Indiana. We manage corporate cellphone accounts for Fortune 500 companies, everything from bill optimization to device procurement, including tier one tech support.
00:00:33.930
I work specifically on the billing and reporting team, where I handle the over 1 million devices that we have under management, along with all the billing data that goes back several years.
00:00:46.500
What I essentially do is take information from various sources—carrier billing, tickets to a call center, and orders placed from carriers—shove them into Postgres, sprinkle some magic on top, and out pops pretty charts and graphs that our clients love. Yes!
00:01:00.829
To better explain what reporting is, I can use my current experience with Mobe. Way back, around 2012, ownership came to the dev team and uttered the dreaded "D" word—they wanted a bunch of dashboards to report on all the data we have.
00:01:23.460
Not just one dashboard, but dashboards for our client administrators, dashboards for our internal support staff, and dashboards for ownership to actually see how the company is doing. Oh, and they wanted user-defined filtering—being able to slice and dice data any way they wanted—along with other bits of scope creep that arose over the years.
00:01:46.200
So, where do we begin on this? The dev team laid out a couple of notes: we didn't want bulk and bloat in our result set. This means we didn't want actual ActiveRecord objects coming back from the database, as they can be quite large, and we didn't really need all that data.
00:02:00.829
What we really wanted was generic, uniform data coming back from whatever system we built. This means a plain old bunch of arrays with rows of hashes containing the information we want, so we can call JSON on it, shove it into the flavor of the month’s JavaScript front-end framework for charts, and we'd be on our way.
00:02:19.260
So, what is reporting, anyway? I'm going to assume that most of our apps run against a relational database. A relational database holds the data, but data is completely worthless to you and me—humans can't really work with data; computers do.
00:02:40.200
What humans can work with is information. Reporting and analytics take data and convert it into information so humans can make decisions from it. More importantly, reporting answers questions, and this is probably the most important step in preparing your app to generate dashboards: figuring out upfront what questions you want to answer.
00:03:17.849
This means going to every primary user and asking, "What do you want to know?" or "What information do you need?" Because if you don't do that, you'll start randomly throwing queries against Postgres or MySQL, and you'll just end up throwing spaghetti at the wall without yielding a usable result.
00:03:54.260
Here's some examples of questions that we want to answer for each of the three stakeholder groups at Mobe. We want to answer all of these questions and more effectively.
00:04:07.169
Fortunately, there is an industry standard term called OLAP (Online Analytical Processing) that is built for data warehousing and analytics. In traditional OLAP products, everything is rolled up into memory in the form of data cubes, where all the information is pre-sliced and diced, pre-grouped together, allowing for very fast queries.
00:04:40.530
OLAP commonly deals with aggregates such as counts, maximums, minimums, and averages. Typically you don't deal with individual rows from your data; you care about the grand picture. However, traditional OLAP is often tied into Oracle products and MySQL products, which are enterprise-level solutions, and when you think enterprise, you think money. Unfortunately, we didn't have money, so we needed a better way: ROLAP to the rescue.
00:05:13.440
ROLAP (Relational Online Analytical Processing) runs with SQL, which is the language our normal databases talk. It also allows for dynamic queries to be generated on the fly, letting us get any information we want very quickly with just a little setup.
00:05:31.770
A nice feature of ROLAP is that it can work with both historical data (like billing information) and transactional data (like support tickets that come through our support center), which changes daily. ROLAP can handle both types of data, while traditional OLAP is more suited for historical information that doesn’t change.
00:06:03.650
When working with reporting, you want to think in OLAP terms, not just simple SQL. Let’s go through some terminology, starting with the fact model, also known as a fact table. This is the primary table that holds the information you want to extract from the database.
00:06:55.030
In Rails, the fact model is typically a standard model. By looking at the questions our users want to answer, we can define what a fact model is. For our purposes, the support tickets table and the lines table represent our fact models.
00:07:22.839
A dimension is a way to take your data and slice and dice it into various chunks, commonly through relationships with other tables in the database. These often correspond to foreign keys linking to other tables, but can also be columns on the fact model itself.
00:07:46.460
For example, in Mobe, we have a cost center living on our lines table, so we can group by this cost center. We also have a carriers table, allowing us to join it to get additional information.
00:08:01.720
When focusing on reporting questions, I always look for the word "by." For instance, if I want to sum up support tickets by type, I would group by type. If I have active lines of service, I would group by carrier, joining against the carriers' table.
00:08:32.859
Next, we have dimension hierarchy. This allows for navigating up and down a hierarchy of data. A common example is dates; if I want to group orders by day, I can group by day. However, I might want to go up a level and request everything grouped by month, quarter, or year.
00:09:05.010
In Mobe, for example, devices comprise various hierarchies, including model number, manufacturer, operating system, and wireless technology, allowing for different levels of grouping.
00:09:30.080
Dimension members, or dimension labels, are actual pieces of information that you display in a report. For instance, if a line has a carrier ID, we can group by that. However, to present it in a table or a pie chart, we should use human-readable labels like the name of the carrier.
00:09:56.250
Another concept is filters. While OLAP typically is set up for static queries, ROLAP allows us to use a 'where' clause in SQL to further refine our data set. In SQL, filters correspond to the 'where' clause, and in Rails, this is achieved through the 'where' method, standard ActiveRecord scopes, or the Ransack gem.
00:10:48.580
Now, a measure in OLAP terms refers to any aggregate function, such as average, sum, max, min, or count. A measure could be the sum of total charges for an order; it is the column that you plug into the aggregate function.
00:11:32.260
On the other hand, the metric is essentially the report itself. The metric represents the result of the question you want answered. In SQL, this typically consists of the entirety of the query, while in Rails, it encompasses everything built with ActiveRecord to execute against the database.
00:12:28.280
I know that’s a lot of information, but here's how everything ties together. In this case, the entire metric is the question we want to answer. The sum for mobile charges could represent a fact model, with the last billing period as a filter and grouping by cost center as a dimension.
00:12:56.560
Another thing to consider is that the more filters and dimensions you add, the more complex your queries become. This is essentially the gist of ROLAP, paired with its SQL and Rails equivalents.
00:13:36.480
So, how do you implement this? First, your data must be organized in a way conducive to reporting. The most common schema is called a star schema, which is simpler and easier to visualize than the snowflake schema.
00:14:00.340
Essentially, when you chart all your tables, you will have fact models in the center, branching out into each possible dimension.
00:14:07.639
However, one important note is that reporting on "has many" relationships can be quite difficult. While it's technically possible, using the 'distinct' keyword can lead to invalid SQL in databases like Postgres due to how aggregations affect row duplication.
00:14:30.860
To illustrate this with Mobe as an example, we identified three fact tables: support tickets, line of service, and a row on a billing statement, along with various dimensions they can relate to. Notice that in the lines table we have a carrier field that can relate to another carrier table.
00:15:17.720
To avoid multiple joins whenever possible, in ROLAP, we often want to denormalize things by putting data directly on the fact model instead of relying on multiple joins. This approach makes it easier to filter and group data.
00:15:56.790
Now, creating a date dimension is essential. Instead of making a date column on support tickets, we create a created_at ID that links to a date dimension. This allows us to execute queries such as, 'Give me all support tickets broken down by quarter,' by joining the date dimension and grouping by the quarter.
00:16:30.140
Since each row in the date dimension includes several dates broken down into parts, you can navigate through hierarchies as you see fit. ActiveRecord can indeed handle all these functionalities.
00:16:59.420
However, the underlying reason I'm here discussing this is that, while ActiveRecord provides the basis for constructing ROLAP queries, it does have limitations. For example, there's no effective way to programmatically group by aggregate columns without manually specifying them.
00:17:43.910
Moreover, ActiveRecord isn't particularly adept at describing models in ROLAP terms, like stating that these fields are dimensions, nor does it have an efficient method for storing pre-made queries.
00:18:02.480
What could we do instead? One option would be to hard-code all of our queries, but this would make it difficult to define custom 'where' clauses and joins. This brings us to the solution we implemented.
00:18:35.170
We ended up defining our own reporting library, which I recently adapted into an open-source gem called Active Reporting. This provides a DSL-like system that allows you to define queries using ROLAP terms.
00:19:00.830
This lightweight DSL interacts with ActiveRecord to acquire information about the database and tables, helps build the necessary queries, and directly executes them. Instead of returning ActiveRecord objects, you receive simple arrays of hashes containing the data you need.
00:19:24.030
This keeps the integration with ActiveRecord lightweight, with only a couple of new methods added to the model. While I haven't yet built a demo app, the gem's version 0.1.1 is released, and the supporting DSL is largely operational.
00:19:41.760
Every model in your app that you want to report on should have a corresponding fact model. By naming the fact model appropriately, you ensure that it links to the correct model and separates reporting functionality from standard model logic.
00:20:03.530
Each fact model defines its own dimensions, allowing for a whitelist of dimensions for user input. This provides flexibility in changing how we group and filter reports without cluttering the main model.
00:20:25.750
This user interface can allow for dynamic reports; for example, if I want my lines grouped by carrier account instead of carrier, I can change that easily in the reporting layer.
00:20:50.380
Dimension filters work as where clauses, and the fact model contains a whitelist of acceptable filters so users don’t call unsafe methods. This precaution ensures control over what can be modified, maintaining data integrity.
00:21:29.560
The execution within the gem involves creating a metric that defines the question we want answered. The metric is associated with a fact model and can include dimensions and filters applied dynamically as needed.
00:22:01.190
This allows us to transform a simple metric into a more complex one as user input can change it. Then, you take that metric and shove it into a report object, generating the SQL query and executing it.
00:22:28.790
The report object allows merging in user input for dynamic reports, while the SQL generated targets those specifications. The power of ROLAP shines here because it supports defining where clauses on the fly.
00:22:54.300
Here's an example of the SQL we built, which includes selecting specific columns, aggregating total charges, and joining on dimensions while applying filters. At the end of the process, we get back simple arrays of hashes.
00:23:19.150
Ultimately, reporting is about acquiring the aggregates of information we care about, not merely returning rows of data. In conclusion, I'd like to share some pro tips for database reporting.
00:23:42.750
First, avoid 'double jumps' in queries as much as possible; try to normalize only when necessary. For instance, if you want active lines by carrier, keep the ID directly on the table to reduce complexity.
00:24:15.580
Another method to counteract challenges with 'has many' relationships is through counter caches—whether via a built-in plugin or pre-computed values handled by background jobs.
00:24:37.500
Make sure to index wisely—if you plan to dimension by something, index the foreign key. Use efficient filtering columns based on user input for faster retrieval.
00:25:00.760
As your application grows, consider read-only replication for reporting databases to offload read operations from the master. In particular, Postgres or sharding can help manage workload efficiently.
00:25:31.080
Thank you for listening! Feel free to check out the gem on GitHub, copy the slides if you're interested, and reach out to me on Twitter if you have any questions.