Data indexing with RGB (Ruby, Graphs and Bitmaps)

In this talk delivered at RubyConf 2022, Benjamin Lewis from Zappi presents the development of a custom data indexing system called RGB (Ruby, Graphs, and Bitmaps), built to make survey data easier and faster to query. The presentation traces the journey from a disconnected data set stored as serialized data frames to an in-memory index that supports real-time querying.

Key Points Discussed:

- Background Context: Benjamin motivates the need for a custom measure store by outlining the limitations of the existing system, which stored serialized data frames in SQL and so hindered real-time analysis and connections between data points.
- Main Challenges:
  - Context: Ensuring semantic accuracy when querying specific measures.
  - Storage: Inefficiencies caused by storing the same data repeatedly across multiple surveys, leading to slow data retrieval.
  - Harmonization: The need to establish equivalences between different measures and stimuli so that comparisons are meaningful.
- The Measure Store: The solution introduced, which supports straightforward queries over context, measures, and dimensions. The measure store is built to handle large volumes of data and exposes a user-friendly API.
- Demonstrations: Benjamin demonstrates the measure store through live examples that show its speed on large datasets (e.g., querying 800,000 respondents in 17 milliseconds).
- Storage Optimization: Roaring bitmaps are used for efficient data storage and retrieval, significantly reducing the size of the stored data compared to traditional SQL databases.
- Graph Database Utilization: RedisGraph is used to manage semantic relationships in the data, enabling complex queries and connections between data points.
- Performance Metrics: Benjamin reports queries completing in a fraction of the time previously required, including a 180x speed increase on some queries.
- Future Directions: Zappi plans to scale the system, conduct further testing, and eventually open-source the measure store to share the harmonization capabilities with the wider community.
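
The bitmap-based querying described in the points above can be illustrated with a small sketch. This is not Zappi's measure store or its API: the class `BitmapIndex`, its methods, and the sample measures are all hypothetical, and plain Ruby Integers stand in for Roaring bitmaps (which add compression on top of the same idea). Each bit position represents one respondent, so combining filters becomes a bitwise AND.

```ruby
# Minimal bitmap index sketch (hypothetical, not the talk's implementation).
# A plain Ruby Integer stands in for a Roaring bitmap: bit i is set when
# respondent i gave that answer for that measure.
class BitmapIndex
  def initialize
    # Key: [measure, answer]; value: Integer used as a bitset.
    @bitmaps = Hash.new(0)
  end

  # Record that respondent_id gave `answer` for `measure`.
  def add(measure, answer, respondent_id)
    @bitmaps[[measure, answer]] |= (1 << respondent_id)
  end

  # Bitset of respondents matching every [measure, answer] pair.
  # Intersection of sets is a bitwise AND of their bitsets.
  def matching(*pairs)
    pairs.map { |pair| @bitmaps[pair] }.reduce { |acc, bits| acc & bits } || 0
  end

  # Number of respondents in a bitset (population count).
  def self.count(bits)
    bits.to_s(2).count("1")
  end

  # Respondent ids contained in a bitset, in ascending order.
  def self.ids(bits)
    (0...bits.bit_length).select { |i| bits[i] == 1 }
  end
end

# Hypothetical survey answers for three respondents (ids 0, 1, 2).
index = BitmapIndex.new
index.add(:age_group, "18-24", 0)
index.add(:age_group, "18-24", 2)
index.add(:age_group, "25-34", 1)
index.add(:purchase_intent, "likely", 0)
index.add(:purchase_intent, "likely", 1)
index.add(:purchase_intent, "unlikely", 2)

young_and_likely = index.matching([:age_group, "18-24"], [:purchase_intent, "likely"])
puts BitmapIndex.count(young_and_likely)        # number of matching respondents
puts BitmapIndex.ids(young_and_likely).inspect  # their ids
```

Because each answer is stored once as a set of respondent ids rather than once per survey row, a filter over many respondents reduces to a handful of bitwise operations, which is the property the talk attributes to its bitmap store.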

Conclusions:

Benjamin concludes by emphasizing the value of a robust data indexing system that combines Ruby, graphs, and bitmaps to overcome the limitations of traditional data querying. He invites attendees to continue the discussion at the conference.

With this system, Zappi aims to make full use of its datasets, enabling rapid and contextually aware insights.

Benji Lewis • November 13, 2022 • Houston, TX • Talk

In this talk, we will go on a journey through Zappi’s data history and how we are using Ruby, a graph database, and a bitmap store to build a unique data engine. A journey that starts with the problem of a disconnected data set and serialised data frames, and ends with the solution of an in-memory index. We will explore how we used RedisGraph to model the relationships in our data, connecting semantically equal nodes. We will then delve into how a query layer was used to index a bitmap store, which in turn let us interrogate our entire dataset orders of magnitude faster than before.
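
As a rough illustration of connecting semantically equal nodes and then using the result to query a bitmap store, the sketch below uses a tiny in-memory graph in place of RedisGraph. Everything here is an assumption made for illustration: the `EquivalenceGraph` class, the measure names, and the per-measure bitsets are hypothetical and are not taken from the talk.

```ruby
require "set"

# Hypothetical stand-in for the graph layer: an undirected graph in which an
# edge means "these two measures are semantically equivalent".
class EquivalenceGraph
  def initialize
    @edges = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Declare two measures equivalent.
  def link(a, b)
    @edges[a] << b
    @edges[b] << a
  end

  # All measures reachable from `start` through equivalence edges,
  # found with a breadth-first search. Includes `start` itself.
  def equivalents(start)
    seen = Set[start]
    queue = [start]
    until queue.empty?
      node = queue.shift
      neighbours = @edges.key?(node) ? @edges[node] : []
      neighbours.each do |neighbour|
        # Set#add? returns nil when the element was already present.
        queue << neighbour if seen.add?(neighbour)
      end
    end
    seen
  end
end

graph = EquivalenceGraph.new
graph.link("survey_a:purchase_intent", "survey_b:likelihood_to_buy")
graph.link("survey_b:likelihood_to_buy", "survey_c:buy_intent")
graph.link("survey_a:age", "survey_b:age_band")

# Hypothetical bitsets: bit i is set when respondent i answered that measure.
bitmaps = {
  "survey_a:purchase_intent"   => 0b0011,
  "survey_b:likelihood_to_buy" => 0b0100,
  "survey_c:buy_intent"        => 0b1000,
  "survey_a:age"               => 0b1111
}

# Step 1: ask the graph which measures mean the same thing.
equivalent_measures = graph.equivalents("survey_a:purchase_intent")

# Step 2: union (bitwise OR) the bitsets of all equivalent measures.
combined = equivalent_measures.reduce(0) { |acc, m| acc | bitmaps.fetch(m, 0) }

puts equivalent_measures.sort.inspect
puts combined.to_s(2)
```

The split mirrors the division of labour the abstract describes: the graph answers "which measures are comparable?", and the bitmap store answers "which respondents match?".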

RubyConf 2022