Open Source

BI Tooling with Rails

BI Tooling with Rails

by Solomon Kahn

The video titled 'BI Tooling with Rails' features speaker Solomon Kahn at the GoRuCo 2014 event, where he discusses how non-technical employees in organizations struggle to access necessary data. At Paperless Post, a Ruby on Rails application was developed to empower these individuals, transforming their data access experience. Kahn outlines the common challenges companies face regarding data retrieval by non-technical staff, such as reliance on developers for SQL queries and inadequate tools. The main themes of the talk include:

  • Identifying the Problem: Many employees do not know SQL, making it difficult to access data for informed decision-making.
  • Traditional Solutions: Most companies resort to inefficient methods like cron jobs sending overwhelming emails or developers running routine queries, which do not scale well as organizations grow.
  • Rails Insights: Kahn highlights two critical insights from Rails: ActiveRecord’s ability to connect to various data sources and the use of ERB for SQL templates, making data queries more accessible and flexible for users.
  • The BI Application: The application developed at Paperless Post includes four sections—questions, KPIs, reports, and visualizations. This allows users to execute queries dynamically with user-defined parameters.
  • User-Friendly Interface: Non-technical users can easily run predefined queries, making timely data access straightforward and intuitive.
  • KPIs and Visualizations: The application facilitates tracking of key performance indicators and offers visual dashboard features for complex reporting needs.
  • Deployment Insights: Kahn explains the simple setup process for the app and emphasizes the necessity of authentication, although that feature was not yet implemented at the time of the talk.
  • Cultural Benefits: Addressing the data access needs of non-technical individuals promotes better relationships between teams and enhances overall workplace morale.

The video concludes by encouraging attendees to implement this BI tool within their companies to foster a more data-driven culture and improve collaboration among staff. Kahn illustrates that by providing access to data, organizations can empower non-technical users, leading to more efficient decision-making and a positive working environment.

00:00:16.560 Okay, so here's the problem. Lots of people in your organization need data, but they can't get it. Generally, this is because the data they need is in a database and they don't know SQL. At Paperless Post, we built an application to solve this problem, which has completely transformed the way we provide data to non-technical people. Like most companies, we found this to be a hard problem, and Paperless Post hasn't always done a good job solving it.
00:00:40.000 Initially, we had a hodgepodge of different tools that we built and cobbled together to get data to people, but it wasn't really working well. We had a cron job that ran over a thousand queries every night against a replica database and sent emails to people, overwhelming them with data that they didn't care about. Meanwhile, the crucial data that they needed was still inaccessible. We also had external GUI tools that weren't effective due to our database schema being too complicated for simple drag-and-drop operations.
00:01:12.000 Developers were periodically running important but routine queries every month, perhaps many of you do this as well—just changing the date, putting it into a CSV, and sending it via email. Most importantly, people often didn’t ask for data because it was really difficult to get. The non-technical individuals we wanted to empower to make data-driven decisions weren’t able to do that.
00:01:40.000 We needed to solve this problem, and there were two key insights from Rails that allowed us to build an application, which we have just open-sourced, to address this issue. The first insight is that ActiveRecord can connect to any data store from any application. If I define a class that inherits from ActiveRecord Base, I can set the abstract class to true and establish a connection from any Rails application in the world to any database.
00:02:01.000 The second key insight is that SQL queries are just text, and we can use ERB (Embedded Ruby) with them. Instead of treating them as very specific SQL queries, we can think of them as SQL templates, which opens up a huge range of possibilities for how people can actually use this data.
00:02:30.000 For example, when I write a query, I don't need to specify a date directly. I could say something like 'created_at equals goruco_date' and then the next year, whoever runs this query just has to change the goruco date without needing to rewrite the entire query. This allows for arbitrary inputs, enabling users to run the same query for any start or end date they want.
00:02:57.000 Optional parameters can also be included, allowing filters for specific conditions if wanted. The same query can be structured to group results by various time periods, such as day, week, month, quarter, or year. By taking the same data, we open up new possibilities for delivering information to non-technical users.
00:03:22.000 The beauty of this approach is that it only takes one line of code to combine the SQL template with user-input options, resulting in a fully formed SQL string that can then be executed. This creates incredible leverage because a developer only needs to write a query once, and then anyone in the company can run that query at any time with any parameters they like.
00:03:43.000 No longer will someone have to calculate sales tax every month for all the different states, which I used to do regularly. We have open-sourced this application, named BI, which you can find on GitHub at Paperless Post slash BI. It is a Rails application divided into four main sections: questions, KPIs, reports, and visualizations.
00:04:07.000 Questions act as the workhorse of the app, where most users spend their time. They involve execution of ad hoc SQL templates at runtime depending on the parameters people provide. For instance, one example could be a question about campaign finance: 'How much money do politicians raise?’. Each question includes fields for name, description, and tagging.
00:04:27.000 The BI app can connect to multiple data sources. So if you have your production replica database, a clickstream database, or others, you can easily configure it with just a couple of lines of code to query from these sources. You can also connect the app to HTTP-based data stores, integrating it with tools like Graphite, Mixpanel, and Elasticsearch.
00:04:46.000 Another aspect to note is that inputs within the ERB tags will be shown to the user, who can then enter their desired parameters. The non-technical user interface is designed to be straightforward; for example, a user can input 'John Boehner' and request to see how much money he raised during each election cycle. With just a click, they can see the results.
00:05:06.000 This query is executed on the database, and if they wish to download the results to a CSV file, it’s only a click away. There are individuals in newspaper organizations who have executed this query a thousand times for various reporters wanting that data, so it becomes easily accessible.
00:05:27.000 This tool saves a substantial amount of time. The next component of the application is KPIs. KPIs are the metrics you want to track daily—like daily sales, new users, and cancellations—that are stored in a specific format in the database. The system tracks performance and change metrics, facilitating quick graphing and analysis of important trends.
00:05:46.000 All headers on the graph are clickable, making it straightforward to access a seven-day rolling average of any metrics from the beginning of the year to the present. Although KPI query formats are somewhat unique, they are manageable once you become familiar with them.
00:06:06.000 Reports and visualizations are designed for complex requirements. Visualizations serve as dashboards, for example, providing live purchase maps or interactive performance reports. For reports involving file uploads and external API connections, these can be handled as well.
00:06:25.000 In summary, deploying this application was slightly rushed, and we launched it without authentication. Paperless Post employs a specific authentication mechanism, which will be finalized soon. So for now, it's advised not to deploy this in production, but getting it set up on your laptop will take about 10 minutes if you already have a Ruby environment.
00:06:45.000 Normally, deploying business intelligence infrastructure is a challenging systematic process, but this is very simple. You can test it out, show it to people in your company, and ask for their interest. The whole setup costs virtually nothing in terms of time—just around 10 minutes for what it would take to write one query.
00:07:03.000 Lastly, I want to emphasize that non-technical people in tech companies often are not treated as first-class citizens. When you prioritize the needs of the individuals making significant decisions, it fosters better relationships between technical and non-technical staff. Even if we didn’t face significant issues related to this, it has enhanced collaboration and positivity within the organization when you genuinely address the needs of those who require support.
00:07:34.000 Thank you very much.