Catching Waves with Time-Series Data

In the presentation "Catching Waves with Time-Series Data" by Liz Heym, the intricacies of managing time-series data are explored, drawing parallels between surfing and data storage, access, and representation.

Key Points Discussed:

- Understanding Time-Series Data:

- Time-series data comprises observations recorded at consistent time intervals, differing from traditional relational data.

- An example includes usage rates for devices tracked every five minutes at Cisco Moroi.

Selecting Appropriate Tools:
- Just as surfers choose boards based on conditions, database selection depends on data quantity and access needs.
- Options for managing time-series data include:
- Utilizing existing databases with proper techniques.
- Adding extensions to existing databases (e.g., Postgres with PG TimeSeries, TimescaleDB).
- Adopting entirely new database solutions (e.g., ClickHouse).
Organizing and Querying Data:
- Stress on the importance of structuring data by timestamp to enable efficient queries.
- Employing composite keys for access optimization, where data is grouped for speed.
- Performance varies significantly between different database technologies based on data arrangement and access methods.
Data Retention and Compression:
- Challenges include managing storage and providing extensive data access.
- Aggregation and compression are key strategies, utilizing time-to-live (TTL) policies for data retention.
- TimescaleDB's compression converts raw data into a compact format for reduced memory usage.
Designing API Endpoints:
- Structure of the API should reflect the database's organization, providing valid querying formats for users.
- Important to establish documentation for request structures, acceptable time spans, and accessible aggregate data intervals corresponding to TTLs.

Significant Examples & Anecdotes:

- The journey of Liz, a fictional surfer tracking her performance over time, highlights how proper data management leads to insightful trends.

- The creation of "Little Table,” a proprietary relational database optimized for time-series data used internally at Cisco Moroi.

Conclusions and Takeaways:

- Understanding specific time-series data management techniques is crucial for performance and usability.

- Choosing the right tools and optimizing data storage, organization, and access is akin to selecting the right surfboard for the waves, allowing users to derive meaningful insights from vast data sets.

- The evolving landscape of time-series databases offers solutions that enhance performance and facilitate deeper data analysis.

Liz's surfing data story illustrates the practical application of these concepts, paving the way for more effective data strategies in varied use cases.