Five Sharding Data Models and Which is Right by Craig Kerstiens

Sharding is a heated topic, and many who have tried it have come away with a bad taste in their mouth. But it's also well proven that sharding your database is the true way to scale the data layer. From Instagram to Google to Salesforce, with large-enough data and with sufficiently demanding performance requirements, you need to shard in order to scale out. Using the lens of the Postgres open source database, I'll walk you through things to keep in mind to be successful with sharding, and which data model is the right approach for you. This is a sponsored talk by Citus Data.
Summary
In this session at RailsConf 2018, Craig Kerstiens from Citus Data addresses the challenging but essential topic of database sharding, using Postgres as the example system. Sharding, the practice of splitting a database into smaller parts (shards) spread across machines, is a key way to scale applications, as seen at major platforms such as Google and Instagram. Craig walks through five sharding data models and stresses that choosing the right data model is what makes sharding succeed.

Key points include:

- **Understanding Sharding**: Distributing data across shards spreads the load, so both read and write capacity can grow as nodes are added.
- **Key Considerations**: Decide the shard count up front, and choose many more shards than you need today, so that growth can be handled by moving existing shards rather than by a painful data migration later.
- **Five Data Models**: The models discussed include:
  - **Hash-based Sharding**: Applies a hash function to IDs to spread rows evenly across shards, which minimizes data skew.
  - **Range-based Sharding**: Works well for time series data, which is split into preset ranges (e.g., daily or weekly) that are easy to manage.
  - **Geographical Sharding**: Applicable when clear geographic boundaries exist, though care is needed with data that spans those boundaries.
  - **Multi-Tenant Sharding**: Ideal for SaaS applications, where each customer's data is kept together and separate from other customers' data, which helps both isolation and performance.
  - **Hierarchical Sharding**: Structures data so that queries can be processed in parallel; suitable for applications with heavy data processing needs.
- **Practical Recommendations**: Plan shard distribution carefully and get the structure right in the initial design phase to avoid complications when scaling.
- **Conclusion**: With the right approach and early groundwork, sharding becomes a manageable and essential tactic for scaling database applications, leaving room for future growth without the constant fear of hitting limits.

Overall, this session offers practical guidance for developers and database administrators looking to implement sharding successfully in their applications.
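To make the hash-based, multi-tenant, and range-based models, along with the advice to pick many shards up front, a bit more concrete, here is a minimal Python sketch. It is not code from the talk and not how Citus or Postgres computes shard placement. The shard count, node names, the `events_` partition naming, and every function name (`hash_shard`, `initial_placement`, `add_node`, `node_for`, `time_partition`) are hypothetical. It uses a simple modulo over a SHA-256 hash, and its rebalancing simply moves the lowest-numbered shards to the new node.

```python
import hashlib
from datetime import date

# Hypothetical numbers: many more logical shards than physical nodes,
# fixed once at the start so that the key -> shard mapping never changes.
SHARD_COUNT = 32
NODES = ["node-a", "node-b", "node-c", "node-d"]


def hash_shard(key: str) -> int:
    """Hash-based sharding: a stable hash of the key selects a logical shard.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings and would route the same key differently after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT


def initial_placement() -> dict[int, str]:
    """Spread the logical shards round-robin across the physical nodes."""
    return {shard: NODES[shard % len(NODES)] for shard in range(SHARD_COUNT)}


def add_node(placement: dict[int, str], new_node: str) -> dict[int, str]:
    """Scale out by moving whole shards to a new node.

    Keys never change shard; only the shard -> node table is updated,
    so no row has to be rehashed.
    """
    node_count = len(set(placement.values())) + 1
    quota = len(placement) // node_count  # shards the new node should receive
    updated = dict(placement)
    for shard in sorted(placement)[:quota]:
        updated[shard] = new_node
    return updated


def node_for(key: str, placement: dict[int, str]) -> str:
    """Route a key (for example a tenant id) to the node that holds its shard."""
    return placement[hash_shard(key)]


def time_partition(day: date) -> str:
    """Range-based sharding for time series: one partition per day."""
    return f"events_{day:%Y_%m_%d}"


if __name__ == "__main__":
    placement = initial_placement()
    # Multi-tenant model: shard on the tenant id so one customer's rows stay together.
    before = node_for("tenant-42", placement)
    grown = add_node(placement, "node-e")
    after = node_for("tenant-42", grown)
    print(f"tenant-42: shard {hash_shard('tenant-42')}, {before} -> {after}")
    print(time_partition(date(2018, 4, 17)))  # events_2018_04_17
```

Because every key maps to one of a fixed set of logical shards, adding capacity only rewrites the shard-to-node table. Starting with too few shards is what would force rows to be rehashed and migrated later.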