Data Unleashed: A Developer's Perspective on Navigating the Architecture Maze

Summarized using AI


Bronwen Zande • April 11, 2024 • Sydney, Australia

In the presentation titled "Data Unleashed: A Developer's Perspective on Navigating the Architecture Maze," Bronwen Zande discusses the crucial aspects of data architecture from a developer's viewpoint. The talk emphasizes the value of data as a strategic asset in organizations, especially in making informed decisions and enhancing operational efficiency. Zande outlines various data architectures including databases, data warehouses, data lakes, lake houses, and data meshes, providing insights into their strengths and weaknesses. The key points discussed include:

  • Importance of Data: Data holds immense value but must be used effectively to drive insights and informed decision-making.
  • Critical Factors for Data Value:
    • The necessity for accurate and timely data to support operational decisions.
    • Speed in accessing data is crucial; slow data can become irrelevant.
    • Trust in the data's reliability to engage users and facilitate usage.
  • Data Architecture Principles: Zande highlights that actionable insights derived from data are what truly add value, focusing on return on investment (ROI) when re-architecting data systems.
  • Overview of Data Architectures:
    • Databases: Common starting point; require maintenance and organization.
    • Data Warehouses: Used for aggregated data reporting but increase complexity and cost.
    • Data Lakes: Handle unstructured data but can create operational complexities.
    • Lake Houses: Integrate aspects of lakes and warehouses to minimize operational overhead.
    • Data Mesh: Allows for decentralized data ownership, promoting agility across teams.
  • Case Study of Jenny: Zande illustrates a scenario in a rental car company where a basic database setup evolved to utilize a data mesh effectively, highlighting the journey of integration and efficiencies achieved within the organization.
  • Key Considerations for Data Strategy Evolution: Simplicity is fundamental; organizations should start with what they have before migrating to complex architectures. Cloud solutions can be economical, but maintaining control over costs is essential.
  • Final Insights: Value comes from quick and responsive delivery of data insights. Organizations should anticipate the need for centralization as they grow while focusing on low latency and robust monitoring of their data architecture.

Zande concludes by stressing the transformative potential of AI in data management and how organizations must adapt their architectures to meet evolving demands in decision-making processes. This talk, presented at RubyConf AU 2024, aims to equip professionals with the knowledge to navigate the complex landscape of data architecture effectively.

Data Unleashed: A Developer's Perspective on Navigating the Architecture Maze
Bronwen Zande • April 11, 2024 • Sydney, Australia

In today's data-driven world, organisations recognise the immense value of data as a strategic asset. With the potential to revolutionize decision-making, enhance operational efficiency, and provide a competitive edge, effective data management has become paramount. However, the ever-expanding range of data architectures and philosophies presents a challenge when determining the most suitable approach.
In this talk, we will look at the landscape of data architectures and philosophies from the perspective of a developer. We delve into the key players: databases, data warehouses, data factories, data lakes, and data meshes. We aim to illuminate the strengths and weaknesses of each architecture, enabling you to make informed choices for your organisation's data strategy.
Join us as we compare and contrast these architectures, unveiling the unique capabilities they offer. Choosing the right data architecture and philosophy is no easy feat. That's why we'll equip you with the necessary insights and learnings to navigate this complex decision-making process. Learn how to align your organisation's unique needs as we discuss the factors to consider, such as scalability, practicality, data retention, integration requirements, and latency.

RubyConf AU 2024

00:00:03.480 Good morning, everyone. My name is Bronwen. I wear a lot of hats, much like many of you. I'm a daughter, a sister, a business owner, and an employee. But deep down, at heart, I am a developer. I have been a developer for longer than I haven't, which means, among other things, that I'm old. Over the years, I've seen a lot of things, and I love digging through data. It fascinates me to navigate businesses, find a spreadsheet here, a little database there, and discover what insights I can extract.
00:00:18.480 Today’s talk comes from my perspective as a developer. Many of you may not be developers; some might be data professionals, business analysts, or report developers. Regardless, I hope to provide insights that you can take back to your teams and converse with the developers about these different types of data architectures.
00:00:35.680 It is now almost indisputable that data holds immense value—so much so that many organizations are starting to include it on their balance sheets. But what does that actually entail? What are the key factors that drive the value of data? Simply having data doesn’t mean it’s worth anything. Here are a few critical factors that I believe drive data value.
00:01:02.480 First, the ability to make informed decisions based on data is crucial. For instance, while in Sydney, I asked people the best way to get back to the airport. Who would say to catch the train? How many would suggest jumping in an Uber? That decision is influenced by context—what if I told you I was a terrible overpacker with three giant suitcases? That would likely sway your recommendation, right? Conversely, if I had just carry-on luggage and three friends with me, you might suggest the train. It's essential to consider different scenarios, like if the trains were delayed due to flooding. In such cases, our decision might shift to alternatives like taking an Uber or even a helicopter—if the budget allows. Hence, making decisions based on accurate data is vital.
00:01:51.960 Second, speed is significant. It's all well and good to have data, but if you receive it too late, it becomes useless. For example, in stock trading, if your data is a week old, it's not valuable. You want to make well-timed decisions—buying when prices are low and selling high. Third, it's important to trust that the insights derived from data are accurate. What's worse than having no data is having data that isn't reliable. If you’re unsure whether your garage door is closed or not because the button doesn’t seem to work, that uncertainty can cause worry.
00:02:42.040 Data accuracy and user confidence are fundamental. If a user doesn’t trust your data, they won't use it. Additionally, if the process to access that data is convoluted, it further devalues it. Many clients I've worked with have faced difficulties accessing data due to excessive forms or restrictions, rendering the data almost pointless.
00:03:02.240 Next, let's look at the principles to consider regarding different architectures. Data alone does not create value; it’s how we turn that data into actionable insights that empower decision-makers that truly adds value. In assessing whether to re-architect a platform, I focus on return on investment (ROI), generally preferring to see a tenfold return, whether through increased revenue or reduced costs. Exceptions may arise when addressing situations like impending support expirations or significant security issues.
00:03:30.639 For instance, if you are undertaking green computing initiatives aimed at bettering the world long term, your ROI measurement shifts. You might invest in something more costly for a greater good that isn’t immediately profitable. However, in routine circumstances, I recommend using the existing architecture until it no longer meets your needs.
00:04:01.200 To summarize, we need data, insights from that data, and actions taken by the right people in a timely manner. If actions aren’t taken quickly, even the most valuable data loses its worth. Moving on to the landscape overview, I will outline several key data architectures.
00:04:34.320 Today, I will cover databases, data warehouses, data lakes, and lake houses, identifying the key components within these architectures. Each architecture involves ecosystems that play various roles, including data factories for ingestion and streaming hubs for facilitating data flow. I will also touch on the growing trend of data meshes, as well as data governance and AI integration, which are especially relevant in larger architectures.
00:06:06.199 Now, in evaluating data architectures, we must consider certain metrics: speed, integration complexity, skill requirements, cost, scalability, latency, and AI integration. Speed reflects how quickly data travels from the source system to users. Integration complexity addresses how intricate the architecture is.
00:06:38.360 Additionally, we must analyze the types of skills required: what talent is needed to manage these systems? Cost encompasses expenses for operation, maintenance, and personnel. Scalability involves how easily a system can expand to handle increasing data volume. Latency is a crucial consideration: lower latency is preferable, because delayed data loses much of its utility.
00:07:31.920 Lastly, AI integration is vital, particularly where specialized skills are necessary to leverage AI technology effectively. With this context, let’s explore each architecture more closely.
00:08:10.440 First, let’s start with databases. Most of us begin our journey here. Who has used a database? Hands up!
00:08:30.480 Many databases remain operational for years, with some users still utilizing databases that are over a decade old. How many have engaged with databases that are 15, 20, or even 25 years old? These databases often outlast their original environments. They typically perform well initially but may require maintenance over time as data and usage patterns shift.
00:09:01.920 Before anything else, it is imperative to use the existing database efficiently. I often see clients transition too quickly to new systems without addressing the existing database's issues, such as data quality. Ensuring that data is properly organized, indexed, and maintained makes every later step easier.
00:09:43.320 Addressing database architecture fundamentals is vital. A simple example is having a web application communicating with business logic that interfaces with the database. This foundational layer is crucial for taking the next steps.
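To make that layering concrete, here is a minimal sketch (not from the talk) in Ruby with SQLite, using a hypothetical bookings table for a rental-car scenario: the web application talks only to a small business-logic object, which is the only thing that touches the database, and the table is organized and indexed up front.

```ruby
# Minimal sketch of the web app -> business logic -> database layering,
# using SQLite and hypothetical table/column names for illustration.
require "sqlite3"

db = SQLite3::Database.new("rentals.db")

# Keep the schema organised and indexed up front so later reporting stays easy.
db.execute <<~SQL
  CREATE TABLE IF NOT EXISTS bookings (
    id         INTEGER PRIMARY KEY,
    car_id     INTEGER NOT NULL,
    start_date TEXT    NOT NULL,
    end_date   TEXT    NOT NULL
  )
SQL
db.execute "CREATE INDEX IF NOT EXISTS idx_bookings_car_id ON bookings (car_id)"

# Business-logic layer: the web app calls this object, never raw SQL.
class BookingRepository
  def initialize(db)
    @db = db
  end

  def create(car_id:, start_date:, end_date:)
    @db.execute(
      "INSERT INTO bookings (car_id, start_date, end_date) VALUES (?, ?, ?)",
      [car_id, start_date, end_date]
    )
  end

  def bookings_for_car(car_id)
    @db.execute("SELECT * FROM bookings WHERE car_id = ?", [car_id])
  end
end

repo = BookingRepository.new(db)
repo.create(car_id: 42, start_date: "2024-04-01", end_date: "2024-04-05")
puts repo.bookings_for_car(42).inspect
```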
00:10:01.400 As we progress, organizations typically deploy data warehouses. This usually happens once they are aggregating data from multiple sources for reporting purposes, tracking historical data, and running analytics. A key distinction is that organizations stop performing CRUD operations directly against these stores; data is instead pushed into the warehouse model.
00:10:30.200 Moving from a database to a data warehouse often entails a slight increase in skills due to the need for structured querying while analyzing historical data. The complexity also rises, meaning costs increase since a larger database requires more storage.
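As a rough illustration of that step (again hypothetical, reusing the bookings table from the earlier sketch), a periodic job aggregates operational rows into a separate reporting store, so reports never run CRUD against production.

```ruby
# Hedged sketch of the "database -> data warehouse" step: a periodic job copies
# aggregated history out of the operational database into a separate reporting
# store. Table and column names are hypothetical.
require "sqlite3"
require "time"

operational = SQLite3::Database.new("rentals.db")
warehouse   = SQLite3::Database.new("warehouse.db")

warehouse.execute <<~SQL
  CREATE TABLE IF NOT EXISTS daily_utilisation (
    day         TEXT PRIMARY KEY,
    cars_booked INTEGER NOT NULL,
    loaded_at   TEXT    NOT NULL
  )
SQL

# Aggregate in the source, then load only the summary rows. Reports query the
# warehouse, so they never add load to the production database.
rows = operational.execute(<<~SQL)
  SELECT start_date AS day, COUNT(DISTINCT car_id) AS cars_booked
  FROM bookings
  GROUP BY start_date
SQL

rows.each do |day, cars_booked|
  warehouse.execute(
    "INSERT OR REPLACE INTO daily_utilisation (day, cars_booked, loaded_at) VALUES (?, ?, ?)",
    [day, cars_booked, Time.now.utc.iso8601]
  )
end
```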
00:10:59.040 In many cases, organizations will then move to data lakes, which accommodate unstructured data beyond what a relational model handles well. This allows more data to be integrated from a wider range of sources, enhancing the insights available from both reporting and analytical perspectives.
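A minimal, illustrative version of a lake's "landing zone", assuming a local folder standing in for an object store and made-up event shapes: raw events are written as-is, partitioned by source and date, and structure is imposed later at read time.

```ruby
# Illustrative data-lake landing zone: raw, unmodelled events are dumped as-is
# into date-partitioned storage (a local folder standing in for an object
# store). Paths and event shapes are hypothetical.
require "json"
require "fileutils"
require "time"

def land_raw_event(source, event, root: "lake/raw")
  day = Time.now.utc.strftime("%Y-%m-%d")
  dir = File.join(root, source, "date=#{day}")
  FileUtils.mkdir_p(dir)

  # Append newline-delimited JSON; schema is applied later, at read time.
  File.open(File.join(dir, "events.jsonl"), "a") do |f|
    f.puts(JSON.generate(event))
  end
end

land_raw_event("telematics", { car_id: 42, odometer_km: 18_345, ts: Time.now.utc.iso8601 })
land_raw_event("support_emails", { subject: "Flat tyre", body: "..." })
```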
00:11:29.760 Yet many organizations then face the challenge of running two systems side by side, which adds complexity. A data lake gives you access to a vast range of data sources, but you often still require warehouse-like processing to run accurate reports. Costs tend to escalate as organizations manage multiple systems, with inefficiencies creeping in.
00:12:22.880 A newer pattern emerging is the lake house model. This integrates the principles of both warehouses and lakes, designed to streamline operations without needing multiple systems. The approach still requires upskilling for users, but products are increasingly user-friendly. This standardization reduces complexity across architectures.
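The toy below is only meant to convey the lake-house idea, not any particular product: the same open-format files written to the lake in the earlier sketch are queried directly with warehouse-style aggregation, rather than being copied into a second system.

```ruby
# Toy illustration of the lake-house idea: query the lake's open-format files
# directly with warehouse-style aggregation instead of maintaining a second,
# separate system. Paths and field names follow the earlier hypothetical sketch.
require "json"

def average_odometer_by_car(root: "lake/raw/telematics")
  totals = Hash.new { |h, k| h[k] = { sum: 0, count: 0 } }

  # Scan every partition of raw telematics events and aggregate on the fly.
  Dir.glob(File.join(root, "**", "*.jsonl")).each do |path|
    File.foreach(path) do |line|
      event = JSON.parse(line)
      car   = event["car_id"]
      totals[car][:sum]   += event["odometer_km"]
      totals[car][:count] += 1
    end
  end

  totals.transform_values { |t| t[:sum].to_f / t[:count] }
end

puts average_odometer_by_car.inspect
```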
00:13:08.320 As we discuss these architectures, we cannot overlook the concept of the data mesh. Organizations might operate multiple lake houses across different teams or domains, allowing for tailored data ownership and control over their datasets. This flexibility supports agile operation in large environments, reinforcing data sharing and scalability.
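As a conceptual sketch of the mesh's "data as a product" idea (team names, products, and fields are invented for illustration), each domain team publishes a product with an owner, a schema contract, and its own loading logic, and other teams consume it through that contract rather than reaching into the owning team's storage.

```ruby
# Conceptual sketch of a data-mesh contract: each domain team owns a "data
# product" with a declared owner and schema; consumers go through the catalog,
# not through the owning team's internal storage. All names are illustrative.
DataProduct = Struct.new(:name, :owning_team, :schema, :loader, keyword_init: true)

CATALOG = {}

def publish(product)
  CATALOG[product.name] = product
end

# The fleet team publishes utilisation; the maintenance team publishes service
# history. Each team controls its own pipeline behind `loader`.
publish DataProduct.new(
  name: "vehicle_utilisation",
  owning_team: "fleet",
  schema: { car_id: Integer, day: String, hours_booked: Float },
  loader: -> { [{ car_id: 42, day: "2024-04-01", hours_booked: 6.5 }] }
)

publish DataProduct.new(
  name: "service_history",
  owning_team: "maintenance",
  schema: { car_id: Integer, serviced_on: String, cost: Float },
  loader: -> { [{ car_id: 42, serviced_on: "2024-03-20", cost: 180.0 }] }
)

# Any team can discover and combine products without a central data team.
utilisation = CATALOG.fetch("vehicle_utilisation").loader.call
services    = CATALOG.fetch("service_history").loader.call
combined    = utilisation + services
puts combined.select { |row| row[:car_id] == 42 }.inspect
```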
00:13:49.560 In thinking about how to effectively evolve your data systems, we can consider a case study involving Jenny, a software developer at a rental car company. This example will highlight how an organization transitioned from a basic setup to utilizing a data mesh effectively.
00:14:34.160 Jenny started with a basic web application communicating with her company’s database to manage daily requests. Over time, as they grew, she identified synergies with maintenance data from her colleague, Toby. Utilizing common formats such as Excel for importing data allowed her to integrate insights effectively into their operations.
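A plausible version of that import, assuming a CSV export of Toby's maintenance spreadsheet with made-up column names, might look like this:

```ruby
# Sketch of the spreadsheet import described here: maintenance data exported to
# CSV and loaded alongside the booking data. File name and columns are hypothetical.
require "csv"
require "sqlite3"

db = SQLite3::Database.new("rentals.db")
db.execute <<~SQL
  CREATE TABLE IF NOT EXISTS maintenance_records (
    car_id      INTEGER NOT NULL,
    serviced_on TEXT    NOT NULL,
    description TEXT
  )
SQL

CSV.foreach("maintenance_export.csv", headers: true) do |row|
  db.execute(
    "INSERT INTO maintenance_records (car_id, serviced_on, description) VALUES (?, ?, ?)",
    [row["car_id"].to_i, row["serviced_on"], row["description"]]
  )
end
```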
00:15:32.880 With collaboration from Debbie, who needed reporting without affecting production databases, Jenny and her team created a data warehouse so reports could run without putting production at risk. The managers appreciated the insights about car utilization strategies that the integrated system provided.
00:16:04.440 As they saw improvements in operational efficiencies, they decided to scale further by integrating data science capabilities, providing predictive insights into vehicle maintenance and usage trends. By leveraging existing patterns, they transitioned towards a lake house architecture.
00:16:45.960 Managing telemetry data and real-time updates allowed further optimization around car use and maintenance schedules. They became advocates for adopting these new structures across the organization, leading to the implementation of a data mesh with team autonomy to manage differing datasets across each department effectively.
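A hedged sketch of that telemetry step, with an in-memory queue standing in for a real event stream and an invented service interval: events are consumed as they arrive, and cars are flagged for maintenance once they pass the interval.

```ruby
# Sketch of stream-style telemetry processing: consume vehicle events as they
# arrive and flag cars once they pass a service interval. The in-memory queue
# stands in for a real event stream; numbers and fields are invented.
SERVICE_INTERVAL_KM = 10_000

telemetry = Queue.new
telemetry << { car_id: 42, odometer_km: 19_900, last_service_km: 10_200 }
telemetry << { car_id: 7,  odometer_km: 4_300,  last_service_km: 0 }
telemetry.close

until telemetry.empty?
  event = telemetry.pop
  km_since_service = event[:odometer_km] - event[:last_service_km]

  if km_since_service >= SERVICE_INTERVAL_KM
    # In a real system this would schedule a maintenance window around bookings.
    puts "Car #{event[:car_id]} due for service (#{km_since_service} km since last)"
  end
end
```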
00:17:37.040 When evolving data strategies, several key factors should be considered: simplicity is essential. Begin with a basic database architecture. Many strive for advanced systems prematurely, often due to 'resume-driven development,' chasing trends rather than solving core problems.
00:18:24.759 Cloud solutions can be cost-effective, but developers should have access to production cloud data so they can keep costs under control and adapt to integration challenges without running into unexpected bills. It also pays to deliver data solutions promptly, so delays do not undermine their implementation and usefulness.
00:19:25.799 Latency is indeed a killer of effective solutions; strong responsiveness in data operations keeps users engaged and informed. Thus, aim to deliver actionable systems to meet organizational needs and prepare for future technological challenges.
00:20:05.200 In conclusion, quick delivery of data insights can yield a greater return on investment, preventing technologies from lagging in relevance. It is perfectly acceptable to start with a database, replicate data for early-stage needs, and anticipate a need for centralization as operational requirements grow. A focus on low latency is essential, as is actively monitoring your cloud architecture to maintain cost control.
00:20:13.880 AI will undoubtedly revolutionize data management and decision-making processes. As we observe how organizations manage data today, insights will emerge, demanding that organizations make appropriate investments in advancing their architectures. Thank you for your time. It has been a pleasure speaking at RubyConf!