Norikra: SQL Stream Processing in Ruby

by Tagomori Satoshi

The video titled "Norikra: SQL Stream Processing in Ruby," presented by Tagomori Satoshi at RubyConf 2014, focuses on Norikra, an open-source server software designed for processing data streams using SQL within a Ruby environment. The presentation underscores the increasing significance of real-time analytics and the challenges posed by large, complex data streams. Norikra operates on JRuby and utilizes the Java Virtual Machine (JVM), enabling users to execute and manage queries seamlessly without reliance on heavy development tools or processes.

Key points discussed include:
- Objective of Norikra: Norikra facilitates query writing and management for non-programmers, making it accessible in the era of big data.
- Use at LINE Corporation: Tagomori highlights how LINE Corporation handles a substantial volume of logs and metrics as it supports a messaging application with approximately 130 million users. This involves monitoring various sub-services and ensuring effective data management.
- Data Processing Needs: The company’s analytics platform relies on collecting data from numerous servers and storing it in distributed systems like Hadoop's HDFS, which necessitates efficient and timely data processing to address operational issues.
- Comparison to Other Tools: The speaker critiques existing solutions like Fluentd, which require cumbersome configuration changes and restarts when handling complex datasets. Norikra is presented as a more flexible alternative that supports dynamic query adjustments.
- Interactive Features: Demonstration of how to install and utilize Norikra via command line, emphasizing its ability to handle JSON data inputs flexibly and process them using SQL queries. Integration with different output formats facilitates customizable data visualization.
- Real-time Application: Norikra is employed in summarizing error logs effectively, enabling LINE Corporation to manage API interactions without overwhelming partners with too many notifications.
- Scheduled Analytics: The platform also supports generating reports based on larger datasets, providing insights crucial for decision-making.

In conclusion, Norikra is presented as a robust tool for stream processing, adaptable to real-time analytics demands and capable of integrating with other data platforms. The speaker encourages viewers to explore Norikra via its GitHub documentation, citing its potential to enhance data processing capabilities significantly.

00:00:17.940 Hello, everyone. It's time for my talk. Today, I will discuss Norikra, an open-source server software for processing data streams using SQL.
00:00:22.680 Norikra is written in JRuby and runs on the JVM, allowing for easy addition and management of stream queries without the need for extensive editors, compilers, or deployments.
00:00:26.619 I will cover several topics today, including the implementation of Norikra and its applications at LINE Corporation. First, it’s important to understand what Norikra is and how it works.
00:00:41.260 My name is Satoshi Tagomori, and I am based in Tokyo, Japan. I work at LINE Corporation, an internet service company that provides a messaging application similar to WhatsApp or Facebook Messenger.
00:01:11.290 LINE has about 130 million users worldwide, primarily in Asia and South America, with some presence in Europe. Besides messaging, we also host many sub-services on our platform, such as Japanese manga, e-publishing, Q&A services, news, and games.
00:01:36.790 As a result, we handle a huge volume of logs and metrics. Our data analytics platform, while simple, is vital for monitoring and analytics. We need to collect and cleanse data from many servers and store it in distributed storage solutions like Hadoop's HDFS.
00:02:22.360 Once data is stored, we process it and visualize the results in various formats, such as graphs and charts. I am not currently a committer for the project, as I was involved in a project called Kyoto Tycoon related to Fluentd log management.
00:02:50.040 Roughly speaking, Fluentd is a log management system that aggregates logs into remote storage. We use it alongside Hadoop to maintain a seamless data processing pipeline.
00:03:16.120 This is crucial as we need real-time processing of web service traffic to quickly identify any issues or changes in our service. This includes monitoring HTTP response codes, request rates per second, and response times.
00:03:31.660 Currently, we are generating these graphics using several tools, which have various plugins that allow us to customize our streaming data-processing capabilities.
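As a rough illustration of the traffic metrics mentioned above (response codes, request rate, and response time), the following Python sketch computes them for one window of requests. The field names `status` and `response_time` and the function name `traffic_metrics` are assumptions made for this example; they are not taken from the talk or from any tool it mentions.

```python
from collections import Counter


def traffic_metrics(requests, window_seconds):
    """Summarize one window of requests.

    `requests` is a list of dicts with hypothetical keys:
    'status' (HTTP status code, int) and 'response_time' (seconds, float).
    """
    total = len(requests)
    # Bucket status codes into classes such as "2xx" or "5xx".
    status_classes = Counter(f"{r['status'] // 100}xx" for r in requests)
    mean_time = sum(r["response_time"] for r in requests) / total if total else 0.0
    return {
        "requests_per_second": total / window_seconds,
        "status_classes": dict(status_classes),
        "mean_response_time": mean_time,
    }


if __name__ == "__main__":
    sample = [
        {"status": 200, "response_time": 0.1},
        {"status": 200, "response_time": 0.3},
        {"status": 500, "response_time": 0.2},
    ]
    print(traffic_metrics(sample, window_seconds=3))
```

A sudden rise in the `5xx` share or in the mean response time from one window to the next is the kind of change such monitoring is meant to surface.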
00:04:02.920 While Fluentd offers simple data processing and visualization, it becomes cumbersome for complex scenarios. We often need to adjust its configuration, and each change requires a restart that can interrupt processing.
00:04:35.000 Fluentd is not ideal for managing complex datasets or environments subject to frequent schema changes. We need tools that let us adjust processing queries dynamically.
00:05:02.240 Application engineers may not be software engineers, yet they understand what metrics are important for our services. Therefore, we need a system that enables these stakeholders to construct queries themselves.
00:05:38.020 This is where Norikra can help. Norikra is a stream processing middleware that enables processing using SQL, allowing for flexibility and responsiveness in adjusting to business needs.
00:06:03.079 It is distributed as a Ruby gem and can be installed easily through RubyGems. Moreover, it launches a server that can be controlled through various client interfaces and a web UI.
00:06:22.500 Let me show you a demo of how to set up and use Norikra. First, you can install it using the command line, and once installed, you can use the JSON interface to feed data into it.
00:06:43.580 Norikra can process JSON objects with specified fields like name and quantity. We can select specific fields from the event streams with simple SQL queries, which allows us to efficiently visualize the results in the console.
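To make the field selection described here concrete, the Python sketch below mimics what a query such as `SELECT name, quantity FROM events` does to a stream of JSON events. It does not use Norikra or its client; the sample events, the target name `events`, and the function `select_fields` are assumptions for illustration only.

```python
import json

# Hypothetical JSON events, one object per string, like those fed in the demo.
RAW_EVENTS = [
    '{"name": "apple", "quantity": 3}',
    '{"name": "banana", "quantity": 5}',
]


def select_fields(raw_events, fields):
    """Mimic `SELECT <fields> FROM events` over JSON-encoded events."""
    rows = []
    for line in raw_events:
        event = json.loads(line)
        # Keep only the requested fields; a missing field becomes None.
        rows.append({field: event.get(field) for field in fields})
    return rows


if __name__ == "__main__":
    for row in select_fields(RAW_EVENTS, ["name", "quantity"]):
        print(json.dumps(row))
```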
00:07:20.890 This flexibility is significant. If we change the input data schema, Norikra can still handle the new input effectively without necessitating significant adjustments in the query structure.
00:08:04.360 For example, if we receive additional fields in our input, we can dynamically manage these changes within Norikra. This adaptability simplifies managing multiple input schemas.
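One way to picture this schema flexibility is a rule where each query only looks at the fields it refers to, so events carrying extra fields still match existing queries. The Python sketch below implements that rule under stated assumptions: the rule itself, the `store` field, and the function `events_for_query` are illustrative and are not confirmed details of Norikra's internals.

```python
# Hypothetical events: the second one carries an additional "store" field.
SAMPLE_EVENTS = [
    {"name": "apple", "quantity": 3},
    {"name": "banana", "quantity": 5, "store": "tokyo"},
]


def events_for_query(events, required_fields):
    """Return the events that carry every field the query refers to."""
    return [
        event
        for event in events
        if all(field in event for field in required_fields)
    ]


if __name__ == "__main__":
    # An existing query on name and quantity keeps working for both events.
    print(events_for_query(SAMPLE_EVENTS, ["name", "quantity"]))
    # A newer query that needs "store" only sees events that include it.
    print(events_for_query(SAMPLE_EVENTS, ["name", "store"]))
```

Under this rule, adding a field to the input never breaks an older query, and a new query can start using the field without redefining the input.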
00:08:47.640 Moreover, we can aggregate data using various SQL commands. Norikra supports counting and summing operations, making it straightforward to extract useful insights from real-time data.
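The counting and summing mentioned here can be sketched as a grouped aggregation over fixed time windows, roughly what `SELECT name, COUNT(*), SUM(quantity) ... GROUP BY name` would yield for each window. This Python sketch is not Norikra code; the `time` field (seconds), the 60-second window, and the function `aggregate_window` are assumptions for the example.

```python
from collections import defaultdict


def aggregate_window(events, window_seconds=60):
    """Compute COUNT(*) and SUM(quantity) per name for each fixed window.

    Each event is a dict with hypothetical keys 'time' (seconds, int),
    'name', and 'quantity'.
    """
    windows = defaultdict(lambda: defaultdict(lambda: {"count": 0, "total": 0}))
    for event in events:
        # Align the event to the start of its window.
        start = (event["time"] // window_seconds) * window_seconds
        slot = windows[start][event["name"]]
        slot["count"] += 1
        slot["total"] += event["quantity"]
    # Convert to plain dicts, ordered by window start.
    return {start: dict(per_name) for start, per_name in sorted(windows.items())}


if __name__ == "__main__":
    sample = [
        {"time": 0, "name": "apple", "quantity": 3},
        {"time": 30, "name": "apple", "quantity": 2},
        {"time": 61, "name": "banana", "quantity": 5},
    ]
    for start, per_name in aggregate_window(sample).items():
        print(start, per_name)
```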
00:09:16.340 In addition, we can customize our queries and push the results directly to different output formats, creating a flexible environment for data processing.
00:09:46.620 The tools we use with Norikra allow for varied and complex data manipulation while ensuring that output can be delivered quickly and effectively.
00:10:25.439 In our production environment, we utilize Norikra to summarize error logs. When messages are sent via our API, we monitor them and aggregate the errors to avoid flooding our partners with error messages.
00:10:49.279 This summarization helps us manage our interactions with partners effectively while ensuring critical issues are highlighted through various means, such as email notifications.
00:11:30.290 Our partnership relies on timely responses to issues, and Norikra's data handling capabilities have allowed us to streamline this process.
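The error summarization described above can be sketched as follows: rather than sending one notification per error, errors are counted per partner and error type within a time window, and one summary row is produced per group. This is a minimal Python sketch, not the production setup; the field names `time`, `partner`, and `error`, the five-minute window, and the function `summarize_errors` are all assumptions.

```python
from collections import Counter


def summarize_errors(error_events, window_seconds=300):
    """Collapse individual error events into one summary row per
    (window, partner, error type)."""
    counts = Counter()
    for event in error_events:
        window_start = (event["time"] // window_seconds) * window_seconds
        counts[(window_start, event["partner"], event["error"])] += 1
    return [
        {"window_start": start, "partner": partner, "error": error, "count": n}
        for (start, partner, error), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    sample = [
        {"time": 10, "partner": "shop-a", "error": "timeout"},
        {"time": 20, "partner": "shop-a", "error": "timeout"},
        {"time": 25, "partner": "shop-a", "error": "timeout"},
        {"time": 40, "partner": "shop-b", "error": "bad_request"},
    ]
    # Four raw errors become two summary rows, one per partner.
    for row in summarize_errors(sample):
        print(row)
```

Each summary row could then feed a single notification, such as one email per partner per window.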
00:12:11.560 We’re also using Hadoop to analyze larger datasets concurrently, creating reports and insights across services while maintaining high performance.
00:12:54.150 These reports are generated on a scheduled basis, providing visibility into key metrics and performance measures essential for our team's decision-making processes.
00:13:34.120 In summary, Norikra is a versatile tool for managing real-time data streams effectively. It supports the rapid development of analytics solutions tailored to specific business requirements.
00:14:04.509 The architecture we've developed using Norikra complements other platforms like Google BigQuery, allowing us to analyze and visualize data efficiently.
00:14:51.310 Moreover, the integration of processing tools with a focus on SQL capabilities provides a robust framework for both stream and batch data analytics, aligning with current industry practices.
00:15:33.020 To sum up, if you are interested in Norikra or stream processing, I highly encourage you to check the documentation available on GitHub and give Norikra a try. It can significantly enhance your data processing capabilities.