
Generate Anonymised Databases with MasKING

Chikahiro Tokoro • June 09, 2024 • Hamburg, Germany • Talk

The video titled Generate Anonymised Databases with MasKING, presented by Chikahiro Tokoro at Ruby Unconf 2024, focuses on the importance and methodology of creating anonymised databases using a technique called masking. The speaker discusses various scenarios in which anonymising data is essential, particularly emphasizing the need to protect sensitive information in development environments. Key points from the presentation include:

  • Reasons for Anonymisation: Tokoro shares a story in which the main website went down because a database migration locked a main table, taking far longer in production than in local or staging environments. He highlights various reasons for needing anonymised data, such as complying with regulations like GDPR, optimizing SQL performance, and identifying bugs caused by production data.

  • Strategies for Implementing Anonymisation: The speaker outlines multiple strategies for generating anonymised datasets, including:

    • Copying production databases and updating records.
    • Utilizing database triggers to anonymise data during replication.
    • Using a proxy connection to modify data on-the-fly.
    • Generating dumps of anonymised databases directly.
  • Evaluation of Strategies: Tokoro evaluates these methods based on effort, data-leak risk, configuration cost, performance impact, and time gap, concluding that generating an anonymised dump suits his needs best: it requires little effort and configuration and carries a low risk of data leakage, at the cost of not producing real-time data.

  • Introduction to MasKING Tool: The video presents MasKING, Tokoro’s open-source tool designed for database anonymisation. He explains its configuration and ease of use, demonstrating how to set it up with YAML files to define how sensitive data, such as email addresses and phone numbers, should be anonymised.

  • Design Principles: Tokoro elaborates on the design philosophies guiding the development of MasKING, including keeping it simple (the Unix idea of doing one thing and doing it well) and avoiding external dependencies in favour of Ruby's standard library. He emphasizes the importance of Test-Driven Development (TDD) in ensuring code quality and reliability.

  • Challenges and Solutions: The speaker discusses challenges faced when parsing SQL data for anonymisation, describing a complex issue he encountered with production data and how community resources helped him find a solution.

In conclusion, Tokoro underlines the successful implementation of MasKING, now stable for production use. He invites feedback and support as he continues to enhance his project. The session not only provides insights into practical anonymisation techniques but also reflects on the code development philosophy that can bolster software quality.


Ruby Unconf 2024

00:00:00.199 Chikahiro Tokoro, who will enlighten us, I hope that was right, Tokoro, I'm sorry, who will enlighten us about how to generate anonymised databases with MasKING. So please give him a warm applause.
00:00:21.039 (Applause) Great. Now, okay, can you hear me?
00:00:27.599 Great. Then thank you for the voting. I will present to you how to generate anonymised databases with MasKING. MasKING is the name of my open source project.
00:00:42.039 It's funny, this is my second talk as an unconference talk: at EuRuKo 2023 I was also chosen, but as an unconference talk, so it's only unconference talks so far. Okay. But why do we want to anonymise a database?
00:01:04.479 Let me share a real story. One day you are doing some release, and then you notice, or rather everybody notices, that your main website is timing out.
00:01:18.240 The metrics show something taking really long, but you don't know what is happening. After the investigation you see a bunch of waiting queries in the database.
00:01:33.159 What happened is that your main table was locked during a database migration. In the post-mortem you check the delivery process to see how it happened.
00:01:46.479 The process itself is quite normal, I think: local, then staging/QA, then production. The point is that each environment has a different quantity of data.
00:02:01.680 On local the migration takes maybe 200 milliseconds; on staging, okay, you check it, 3 seconds; in production, 1 hour. So then it crashes, right?
00:02:23.599 And I think this is not a one-time thing; it is common, as far as I can see. But what if you had a data set similar to production in your development process? Then of course you would notice already on your local machine that something is wrong.
00:02:43.480 But okay, "then let's just copy the production database." Please don't. There are a bunch of reasons, but I will pick just one: GDPR. Emails, phone numbers, please don't copy them to your local machine, or even to staging or testing environments.
00:03:03.799 But the problem is still there. So instead, you can use an anonymised data set.
00:03:09.959 Now the question is how to generate it. But first, a word about the use cases. Database migration is for sure one use case, as in the story I showed you, but it is not the only one.
00:03:30.640 Sometimes a migration fails because it depends on the data. Let's say you want to add a foreign key, and in production there are some violations; then the migration can fail. Another typical use case is SQL performance optimization: you have a slow query log in production, but to optimize it you need a really accurate execution plan, and for that you need a data set similar to production.
00:03:56.000 Sometimes bugs come from the data. For instance, you have a product table and you don't know why a price is negative. Or you have order and payment records and an order record is missing: why did that happen? Or there is a duplicate payment. These kinds of bugs are sometimes harder to find or reproduce.
00:04:30.199 Another use case might be a feature preview for your stakeholders, or a BI use case; it is sometimes useful for analysis purposes only, or for a stress test. There might be a bunch of other use cases; your own idea fits here too.
00:04:55.479 Actually, there is a startup doing the same kind of thing, founded in 2020, but for PostgreSQL. The catch is that when I started MasKING it was 2019, so I was faster than them.
00:05:17.000 So, okay, but how do we implement anonymisation? One strategy is to copy the database and update the records: you have the production database, you copy it, connect to the copy, and update it. When I look at similar tools, this is the most common way. But it is not the only way I can imagine.
00:05:44.479 You can set up a replica and database triggers: production is here, the replica is here, and you set up database triggers on the replica that overwrite your sensitive data.
00:06:05.120 Or a proxy connection: your production database is here, you set up a connection proxy, and the SQL coming from the user or application goes to production as it is, but the proxy intercepts the SQL on the fly and masks the sensitive data.
00:06:32.039 Another strategy is similar to the first one, but you read the data through a database client and then generate an anonymised database from it. This is possible, and it is the second most common way among the similar products I have seen. And a little bit different, but similar: you don't even need a client, you just take a dump, anonymise it, and generate the result. That is also possible.
00:07:05.479 Okay, now we have four strategies. Which one do you want to use? Let's evaluate them. This is my evaluation.
00:07:16.199 Effort: for copy-and-update it is obvious to me that the effort is low; for the others I am not so sure, and the database proxy may be more tricky, because you have to find a good, solid one, or write it yourself.
00:07:34.319 Then think about the data leak risk. The first one is the worst, because you have to keep a full copy of your production data, and during the update you still have the problem that it can leak. The database trigger also holds the data in a replica, and a trigger can fail, so there is a slight risk there too.
00:08:01.720 Configuration: the first and the fourth are easier, since you are just connecting to a database or making a dump; the proxy needs only one setup, but the cost is very high, because you have to implement the masking logic there.
00:08:14.840 A performance regression can happen with the database proxy, because your production database now sits behind a proxy; think about that.
00:08:26.720 And then the time gap: the first one is a two-step copy-and-update, so it is going to take longer; the trigger and the proxy are almost real time, which is quite beneficial; and the last one, I would say, is medium.
00:08:47.120 So which one do you want to choose? For me, I don't need real-time data, so I chose the relatively low-risk one.
00:09:03.440 I chose this one, and then I tried to do it.
00:09:11.160 Here is the result, my tool: MasKING. It is written in Ruby and compatible with MySQL and MariaDB.
00:09:16.800 So how do you use it?
00:09:22.480 It is quite simple. You just define, in a YAML file, which database table and column you want to mask and how you want to anonymise that value.
00:09:36.399 The tricky part is email, because it often needs to be unique. There is a special placeholder for this: if you put a percent sign with n in curly braces, it will be replaced by a sequential number.
00:09:54.800 Then you pipe mysqldump into it, and it produces a dump file just like the normal mysqldump procedure; afterwards you restore it. If you are lazy you can do it in one line, as I will show you now.
00:10:21.040 For the demo I loaded a MySQL sample database, a pretty normal one with a customers table. Let's look at it. Sorry, it's a bit small, but here is the customers data: customer name, last name, first name, phone number, address. There is a bunch of sensitive data, right?
00:10:55.079 And this is the configuration: customers is the table name, and under it are the columns. For the customer name I can put the sequence number, like 1, 2, 3, 4, 5, 6, 7; the phone number can be a static value; then address line one and address line two.
00:11:20.079 There is one small point here: address line two is actually nullable. If you look at the data, address line two can be NULL. For a nullable column like this you can specify a question mark, and then the NULL will be respected.
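As a rough illustration of the kind of configuration being described here (the column names come from the MySQL classicmodels sample database; the exact key layout, the %{n} placeholder, and the question-mark marker for nullable columns are written as assumptions based on the talk, not as the gem's documented syntax):

```yaml
# masking.yml -- illustrative sketch only
customers:
  customerName: "Anonymised Customer %{n}"   # %{n} becomes a sequential number, keeping values unique
  contactLastName: "Lastname %{n}"
  contactFirstName: "Firstname"
  phone: "000-0000-0000"                     # static replacement value
  addressLine1: "Anonymised street 1"
  addressLine2: "?Anonymised street 2"       # '?' marks a nullable column so NULL rows stay NULL
```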
00:11:47.440 Now let's try it out. Okay, here you have mysqldump; I need the --complete-insert option, which I will explain later; then the database name, and MasKING piped after it, and it simply anonymises it. Okay, it works.
00:12:19.399 Now let's restore the anonymised classicmodels database and try it out. Ah, I forgot "classicmodels". Ah, typos, right? Okay, so now I reload it.
00:12:41.880 Can you see? The customers are here, last name, first name, phone numbers, and they are anonymised. This is a small data set, that's why it is super fast, but this is how it works. Is it clear?
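The one-liner being demonstrated looks roughly like the following sketch, assuming the CLI executable is called masking, it reads the dump on standard input, and it picks up masking.yml from the current directory (flags and database names are illustrative):

```sh
# Dump with --complete-insert, anonymise, and restore into a scratch database in one pipeline.
mysqldump --complete-insert classicmodels \
  | masking \
  | mysql anonymized_classicmodels
```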
00:13:01.720 Okay, so I showed you the demo. Now you can make an anonymised database. But now I come to the second part: how is this implemented internally? After you hear this talk you can be a lazier developer.
00:13:19.959 There are design concepts that I intentionally put in when I developed this tool.
00:13:30.760 The first one is KISS: keep it simple, stupid. In other words, do one thing and do it well; this is from the Unix philosophy. What do I mean here?
00:13:48.519 Again, this is the interface of the tool: mysqldump, then masking, then a file. There is no database connection, because mysqldump does that for you, and no file handling; it deals only with standard input and output. That is one of the big ideas of the Unix philosophy.
00:14:09.440 It is also safe, because the processing happens in memory, so you never have a copy of the production-sensitive data sitting around. In other words, maybe this is laziness in development.
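To make the standard-input/standard-output idea concrete, a Unix-style filter of this shape can be reduced to a few lines of Ruby; this is a minimal sketch, the mask_line helper is hypothetical, and the real masking internals are organised differently:

```ruby
#!/usr/bin/env ruby
# Minimal filter: read the dump from STDIN, write the masked dump to STDOUT.
# Nothing touches the filesystem and nothing opens a database connection.

def mask_line(line)
  line # hypothetical placeholder for the real masking logic
end

STDIN.each_line do |line|
  STDOUT.write(mask_line(line))
end
```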
00:14:31.399 So: avoid reinventing the wheel. Alex explained very similar things in yesterday's nice talk. Handling files and writing output goes through standard output, and that is exactly what is needed here: parsing data and processing data. The same thing applies in my case to the database connection, which is much more complicated than a file, so this really lets me simplify the implementation.
00:15:04.279 No external dependencies; in other words, no gem dependencies, only the standard library. Rails, for instance, lists a bunch of dependencies here; to be fair, they are internal dependencies. And this is masking: nothing, right?
00:15:35.120 Again, this is laziness: avoid third-party dependencies. It is quite beneficial, especially when upgrading. From Ruby 3.0.1 to the current Ruby 3.3 there was almost nothing to worry about; I just point the CI at the next Ruby 3.x version and the acceptance tests pass, because it depends only on the standard library. And the Ruby standard library is quite rich; you can actually do many things with it.
00:16:15.120 Another concept is quality. I applied a lot of TDD technique; this is more of an internal process, but I found it very useful.
00:16:26.199 For me the most important thing about TDD is not testing first as such, but that you write the test and get very early feedback: if something feels wrong while writing the test, that is a signal that something in the design is often bad. That is what I feel is the core of TDD, and I tried to apply it strictly myself.
00:16:48.759 As for metrics: if you apply TDD strictly, 100% coverage comes automatically, but don't be dogmatic about it; you don't have to be strict, I just tried it out for myself.
00:17:03.079 Then there is the quality of the code. It is arguable what good code quality is, but there are tools that report code healthiness, and I am the person most surprised about this result, because I added this analysis after I had built the tool, and I was so surprised: okay, nothing to fix.
00:17:34.440 And for acceptance tests we are lucky, because we are in the era of containers, so we can use the real process, in my case MySQL or MariaDB. I can write tests against a Docker container, and the acceptance test is a shell script. It looks hard, but it is simple: it just sets things up as I showed you in the demo, imports the data, runs the real mysqldump command, anonymises, and then compares the result in an assertion.
00:18:07.880 When I put this on CI, I can test every version automatically, and this makes me more confident.
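A rough sketch of what such a container-based acceptance test could look like, assuming Docker and the mariadb image; the file names, credentials, and masking invocation are illustrative rather than the project's actual test script:

```sh
#!/bin/sh
set -e

# Start a throwaway MariaDB container and load the sample data (names are illustrative).
docker run -d --name masking-test -e MARIADB_ROOT_PASSWORD=secret -p 3306:3306 mariadb:latest
sleep 20   # crude wait for the server to come up
mysql -h 127.0.0.1 -u root -psecret < fixtures/classicmodels.sql

# Run the real pipeline and compare against an expected, already-anonymised dump.
mysqldump -h 127.0.0.1 -u root -psecret --complete-insert classicmodels \
  | masking > actual.sql
diff expected.sql actual.sql && echo "acceptance test passed"
```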
00:18:21.440 So how do I implement it? Again, this is MasKING as an overview: it needs the configuration file, takes an SQL dump file on standard input, and produces an anonymised dump. That is the overview of the tool.
00:18:45.039 When I started implementing it there was some uncertainty for me. Mapping the configuration was fairly obvious, but how to parse SQL was not, and I was also not sure how to build the whole architecture. And again TDD came to my rescue; I found it very helpful.
00:19:07.280 TDD actually has two styles, classic TDD and mockist TDD, or inside-out and outside-in, let's say. I won't go into much detail, but classic TDD uses real objects, like layered unit tests, while outside-in, as the name says, mocks the external dependencies.
00:19:42.440 So when I worked on the architecture I tried to do it with TDD. The input and output were clear to me, so I started outside-in, with just standard input and standard output. Next I worked on the modelling, the config mapping and the parse/rebuild parts, using inside-out.
00:20:05.159 After I had made these parts, I made some kind of connection between them; I called it the processor, but you might call it a controller, a service, or a use case, whatever. It is very thin and has no logic. That last step is easy, because everything is already mocked; it is just connecting the parts. This is how I built the internal application.
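As a hedged sketch of that shape, the thin processor only wires the parts together; the class and method names below are invented for illustration and are not the gem's actual API:

```ruby
# Illustrative wiring only; the real masking gem's class names differ.
class Processor
  def initialize(config:, parser:, builder:)
    @config  = config   # table/column -> replacement rules from the YAML file
    @parser  = parser   # turns an INSERT line into table, columns, rows
    @builder = builder  # turns masked rows back into an INSERT line
  end

  # Read the dump line by line, mask INSERT statements, pass everything else through.
  def run(input: $stdin, output: $stdout)
    input.each_line do |line|
      statement = @parser.parse(line)
      output.write(statement ? @builder.build(statement, @config) : line)
    end
  end
end
```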
00:20:38.039 Now, parsing and rebuilding SQL. Rebuilding is not hard, but parsing, I was not sure how to do that either, and it was a bit harder: how do you parse a dump?
00:20:50.799 This is a dump file, and if you analyse it, it is actually quite structured. The first part sets up the database, the MySQL options; the second part defines the table schema, types and constraints; then comes the data insertion; and that loops for however many tables you have. This is the basic structure of a dump.
00:21:23.159 And here is the trick. For my use case I need to know what is inside the table, but the data-inserting part carries no information about the table columns. That means I would have to parse the table schema. Yes, I could do that, but if possible I wanted to avoid it.
00:21:43.799 And there is a mysqldump option called --complete-insert. If you turn this option on, the structure does not change, but the INSERT lines now carry the column information, so I don't need to parse the schema any more. The problem becomes very simple: you parse this one INSERT line and then rebuild it, which makes things much, much easier. This is what I found out about how to parse the SQL.
00:22:31.440 But you still have to parse this line. What I tried to do is use regular expressions. Basically the line has three parts: the table information, the column information, and the actual data. You simply prepare a regular expression for each of them.
00:22:54.360 The values are still a bit tricky, because there can be many, many columns. But at this point you already know how many columns there are, because you have parsed them: one, two, three. So you prepare a regular expression for a single value and then multiply it on the fly; you build the regular expression dynamically, depending on how many columns there are. And with that it actually becomes possible to parse it.
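The dynamic-regex idea can be sketched roughly like this in Ruby; the sample INSERT line uses the classicmodels customers table, and the patterns are illustrative rather than the gem's actual implementation, ignoring several edge cases the talk goes on to describe:

```ruby
line = "INSERT INTO `customers` (`customerNumber`, `customerName`, `phone`) " \
       "VALUES (103,'Atelier graphique','40.32.2555'),(112,'Signal Gift Stores','7025551838');"

header = /\AINSERT INTO `(?<table>[^`]+)` \((?<columns>[^)]+)\) VALUES /
match  = line.match(header) or raise "not a --complete-insert INSERT line"

table   = match[:table]                              # => "customers"
columns = match[:columns].scan(/`([^`]+)`/).flatten  # => ["customerNumber", "customerName", "phone"]

# One sub-pattern per value (quoted string or bare token), repeated once per column.
value = /(?:'(?:[^'\\]|\\.)*'|[^,)]+)/
row   = /\((#{([value.source] * columns.size).join(',')})\)/

rows = match.post_match.scan(row).flatten
# => ["103,'Atelier graphique','40.32.2555'", "112,'Signal Gift Stores','7025551838'"]
```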
00:23:34.799 Okay, let's try it with production data. The reality is that just being able to parse is not enough, because production data is often much harder: you have many values that are not exactly invalid, but unexpected. So let's see the journey to get ready for production data.
00:24:03.600 Okay, here is the SQL, and... now, empty strings. Okay, I just forgot about those, so fix it. And next, okay, here there is already some problem, the same thing with splitting of strings. Fair enough, fair enough, let's fix it.
00:24:34.200 And the next one, do you see? This is called scientific notation; I didn't even know the name at the time, but okay, let's fix it. I think it is much better now. And then... come on, don't do that. But okay, it is reality: please don't put binary data in the database, but as a library creator I cannot complain about it. Okay, fix it.
00:25:07.760 And then I could parse the production data. Great, same thing, let's go for the restore, and somehow it complained that the SQL is invalid, and I was really not sure why. I really needed to investigate what was wrong.
00:25:33.159 After a while I realised the generated SQL itself already had the problem. Okay, here is the strategy I used for parsing the values. How do you distinguish the records? I used this closing-parenthesis, comma, opening-parenthesis pattern as a separator, so I split there.
00:25:54.919 And you can see it here, do you see? The problem is that a string value can contain this exact pattern, and that is why it was being parsed wrongly.
00:26:09.640 It was really hard for me; it kept crashing, and for me it was like the final boss. Oh my god, I was almost stuck on it; I really didn't know how to solve this issue, and for almost one or two weeks I had no idea how to fix it.
00:26:29.760 But after a while I noticed: if you move the parentheses around, this looks like CSV. This parsing problem, this pattern, is actually the same pattern as in CSV. So I tried to see how the standard library handles it.
00:26:51.240 Fortunately... sorry, it is in Japanese, but you can Google-translate it: a maintainer of the standard CSV library wrote a blog post, and he explains exactly the same problem in CSV, the same final boss, and he explains how to solve it. So let's pick his solution. Thank you, community.
00:27:39.919 Okay, now I have the problematic SQL, and after splitting there are still three pieces here. Now count the single quotes in each piece: two single quotes means the string, and therefore the record, is complete. If the count is even, this piece is okay.
00:28:04.720 But if the string is broken, the count is not even, it is an odd number, and that is how you can detect that something is wrong. And what do you do then? You concatenate the piece with the next element, like this, and count again: four single quotes, an even number, so now it is valid. And the next one also has four single quotes, also valid. Okay, now we can parse it.
00:28:39.720 So this is how I implemented it. It is quite complex, and actually this presentation documentation is also for me, because it helps me remember this very complex problem; it is very hard to explain in a code comment, so preparing it helps me too. But this is how it is implemented internally.
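That concatenate-and-recount step can be sketched in a few lines of Ruby; assume the row text has already been split on the "),(" separator, and note that this sketch ignores backslash-escaped quotes, which a real implementation has to account for (names are illustrative, not the gem's internals):

```ruby
# Re-join pieces that were split in the middle of a quoted string.
# A piece is complete only when it contains an even number of single quotes.
def rejoin_broken_rows(pieces)
  rows = []
  buffer = nil
  pieces.each do |piece|
    buffer = buffer ? "#{buffer}),(#{piece}" : piece  # restore the separator we split on
    if buffer.count("'").even?
      rows << buffer   # quotes are balanced: this is a complete row
      buffer = nil
    end
  end
  rows
end

pieces = "1,'plain'),(2,'tricky ),( value'),(3,'ok'".split("),(")
# => ["1,'plain'", "2,'tricky ", " value'", "3,'ok'"]
rejoin_broken_rows(pieces)
# => ["1,'plain'", "2,'tricky ),( value'", "3,'ok'"]
```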
00:29:06.440 But now the great thing is that I can parse production data and rebuild it. I am happy to say that it is now version 1.0; this already happened in 2019, and I am pretty confident that you can use it for production.
00:29:26.960 What's next? Randomized fake data: right now the values are static, with only the placeholders, so proper fake or randomized data is one idea. Parsing the table schema is another, for the types: if I do that, I can validate the types in the config, because right now it is type-agnostic and you only notice problems at restore time, and we also would not need the --complete-insert option any more.
00:30:02.120 And performance optimization: roughly it is now about 20% slower than a plain mysqldump, which is quite okay for me, but there is still room to improve the performance. Also PostgreSQL: I plan to do that, I actually tried, but I have not worked on it much yet.
00:30:23.159 If you want to support me, GitHub Sponsors is open, or just talk to me. Okay, then that's it. Thank you so much; any feedback is welcome.
00:30:46.360 Thank you, Chikahiro Tokoro, now I said it right, hopefully. Thanks a lot for enlightening us on your journey and your experience with that.