00:00:03.480
Good morning, everyone. My name is Bronwen. I wear a lot of hats, much like many of you. I'm a daughter, a sister, a business owner, and an employee. But deep down, at heart, I am a developer. I have been a developer for longer than I haven't, which means, among other things, that I'm old. Over the years, I've seen a lot of things, and I love digging through data. It fascinates me to navigate businesses, find a spreadsheet here, a little database there, and discover what insights I can extract.
00:00:18.480
Today’s talk comes from my perspective as a developer. Many of you may not be developers; some might be data professionals, business analysts, or report developers. Regardless, I hope to give you insights you can take back to your teams and use to start conversations with your developers about these different types of data architectures.
00:00:35.680
It is now almost indisputable that data holds immense value—so much so that many organizations are starting to include it on their balance sheets. But what does that actually entail? What are the key factors that drive the value of data? Simply having data doesn’t mean it’s worth anything. Here are a few critical factors that I believe drive data value.
00:01:02.480
First, the ability to make informed decisions based on data is crucial. For instance, while in Sydney, I asked people the best way to get back to the airport. Who would say to catch the train? How many would suggest jumping in an Uber? That decision is influenced by context—what if I told you I was a terrible overpacker with three giant suitcases? That would likely sway your recommendation, right? Conversely, if I had just carry-on luggage and three friends with me, you might suggest the train. It's essential to consider different scenarios, like if the trains were delayed due to flooding. In such cases, our decision might shift to alternatives like taking an Uber or even a helicopter—if the budget allows. Hence, making decisions based on accurate data is vital.
00:01:51.960
Second, speed is significant. It's all well and good to have data, but if you receive it too late, it becomes useless. For example, in stock trading, if your data is a week old, it's not valuable. You want to make well-timed decisions—buying when prices are low and selling high. Third, it's important to trust that the insights derived from data are accurate. What's worse than having no data is having data that isn't reliable. If you’re unsure whether your garage door is closed or not because the button doesn’t seem to work, that uncertainty can cause worry.
00:02:42.040
Data accuracy and user confidence are fundamental. If a user doesn’t trust your data, they won't use it. Additionally, if the process to access that data is convoluted, it further devalues it. Many clients I've worked with have faced difficulties accessing data due to excessive forms or restrictions, rendering the data almost pointless.
00:03:02.240
Next, let's look at the principles to consider regarding different architectures. Data alone does not create value; it’s how we turn that data into actionable insights that empower decision-makers that truly adds value. In assessing whether to re-architect a platform, I focus on return on investment (ROI), generally preferring to see a tenfold return, whether through increased revenue or reduced costs. Exceptions may arise when addressing situations like impending support expirations or significant security issues.
00:03:30.639
For instance, if you are undertaking green computing initiatives aimed at bettering the world long term, your ROI measurement shifts. You might invest in something more costly for a greater good that isn’t immediately profitable. However, in routine circumstances, I recommend using the existing architecture until it no longer meets your needs.
00:04:01.200
To summarize, we need data, insights from that data, and actions taken by the right people in a timely manner. If actions aren’t taken quickly, even the most valuable data loses its worth. Moving on to the landscape overview, I will outline several key data architectures.
00:04:34.320
Today, I will cover databases, data warehouses, data lakes, and lake houses, identifying key components within these architectures. Each architecture involves an ecosystem of supporting pieces, including data factories for ingestion and streaming hubs that facilitate data flow. I will also touch on the growing trend of data meshes, as well as data governance and AI integration, which are especially relevant in larger architectures.
00:06:06.199
Now, in evaluating data architectures, we must consider certain metrics: speed, integration complexity, skill requirements, cost, scalability, latency, and AI integration. Speed reflects how quickly data travels from the source system to users. Integration complexity addresses how intricate the architecture is.
00:06:38.360
Additionally, we must analyze the types of skills required: what talent is needed to manage these systems? Cost encompasses expenses for operation, maintenance, and personnel. Scalability involves how easily a system can expand to handle increasing data volume. Latency is a crucial consideration, and lower is better: the longer data is delayed, the less useful it becomes.
00:07:31.920
Lastly, AI integration is vital, particularly where specialized skills are necessary to leverage AI technology effectively. With this context, let’s explore each architecture more closely.
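One lightweight way to apply these metrics is a simple weighted scorecard. The weights and scores below are purely hypothetical; the point is to make trade-offs explicit, not to crown a universal winner.

```ruby
# Hypothetical scorecard for comparing architectures on the metrics above
# (1 = unfavorable, 5 = favorable for your context). Weights reflect one
# team's priorities and are invented for illustration.
WEIGHTS = { speed: 3, integration: 2, skills: 2, cost: 3,
            scalability: 2, latency: 3, ai: 1 }

candidates = {
  "database"   => { speed: 4, integration: 5, skills: 5, cost: 5,
                    scalability: 2, latency: 4, ai: 2 },
  "lake house" => { speed: 3, integration: 3, skills: 2, cost: 3,
                    scalability: 5, latency: 3, ai: 5 }
}

candidates.each do |name, scores|
  total = WEIGHTS.sum { |metric, weight| weight * scores[metric] }
  puts format("%-10s %d", name, total)
end
```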
00:08:10.440
First, let’s start with databases. Most of us begin our journey here. Who has used a database? Hands up!
00:08:30.480
Many databases remain operational for years, with some users still utilizing databases that are over a decade old. How many have engaged with databases that are 15, 20, or even 25 years old? These databases often outlast their original environments. They typically perform well initially but may require maintenance over time as data and usage patterns shift.
00:09:01.920
Before moving on, it is imperative to use the existing database efficiently. I often see clients transition to new systems too quickly without addressing the existing database's issues, such as data quality. Ensuring that data is properly organized, indexed, and maintained makes any future transition far easier.
00:09:43.320
Getting the database architecture fundamentals right is vital. A simple example is a web application that communicates with a business-logic layer, which in turn interfaces with the database. This foundational layering is crucial before taking the next steps.
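To make this concrete, here is a minimal Ruby sketch of that layering, assuming the sqlite3 gem; the class and table names are hypothetical. The web layer never touches SQL directly: it calls a business-logic object, which owns the data access.

```ruby
require "sqlite3"

# Data layer: the only place that knows SQL.
class CarRepository
  def initialize(db_path = "rentals.db")
    @db = SQLite3::Database.new(db_path)
    @db.results_as_hash = true
  end

  def available_cars
    @db.execute("SELECT id, model FROM cars WHERE status = ?", ["available"])
  end
end

# Business-logic layer: rules live here, not in the web handler.
class RentalService
  def initialize(repo = CarRepository.new)
    @repo = repo
  end

  def cars_for_booking
    @repo.available_cars.reject { |car| car["model"].nil? }
  end
end

# Web layer (e.g., a Sinatra or Rails action) would only ever call:
# RentalService.new.cars_for_booking
```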
00:10:01.400
As organizations progress, they typically deploy data warehouses, usually once they need to aggregate data from multiple sources for reporting, track historical data, and run analytics. A key distinction is that they stop performing CRUD operations directly against these stores and instead push data into warehouse models.
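A toy illustration of that shift, again assuming the sqlite3 gem (the schemas here are hypothetical): instead of updating rows in place, a periodic job appends operational data to a fact table that reports can query.

```ruby
require "sqlite3"
require "date"
require "time"   # for Time#iso8601

source    = SQLite3::Database.new("rentals.db")    # operational DB (CRUD happens here)
warehouse = SQLite3::Database.new("warehouse.db")  # reporting DB (append-only)

warehouse.execute(<<~SQL)
  CREATE TABLE IF NOT EXISTS fact_rentals (
    rental_id INTEGER, car_id INTEGER, rented_on TEXT, loaded_at TEXT
  )
SQL

# Nightly batch: pull yesterday's rentals and append them. No UPDATE or
# DELETE ever runs against the warehouse, so history is preserved.
yesterday = (Date.today - 1).iso8601
source.execute(
  "SELECT id, car_id, rented_on FROM rentals WHERE rented_on = ?", [yesterday]
).each do |id, car_id, rented_on|
  warehouse.execute(
    "INSERT INTO fact_rentals VALUES (?, ?, ?, ?)",
    [id, car_id, rented_on, Time.now.utc.iso8601]
  )
end
```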
00:10:30.200
Moving from a database to a data warehouse often entails a modest step up in skills, since analyzing historical data requires structured querying. Complexity also rises, and costs rise with it: a warehouse holds more history, which means more storage and more to maintain.
00:10:59.040
In many cases, organizations then move to data lakes, which accommodate unstructured data beyond what relational systems handle well. This allows more data to be integrated from various sources, enhancing the insights available from both reporting and analytical perspectives.
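Under the hood, a data lake is often little more than raw files organized by source and date, with structure applied at read time. Here is a minimal sketch using only the Ruby standard library; the paths and payload are hypothetical, and in practice the files would live in object storage such as S3.

```ruby
require "json"
require "fileutils"
require "date"

LAKE_ROOT = "lake"  # stand-in for object storage

# Land a raw event exactly as received, partitioned by source and date.
# Schema is applied later, when the data is read ("schema on read").
def land_event(source, payload)
  dir = File.join(LAKE_ROOT, source, Date.today.iso8601)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{Time.now.to_f}.json")
  File.write(path, JSON.generate(payload))
  path
end

land_event("telematics", { car_id: 42, odometer_km: 18_204, fuel_pct: 63 })
```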
00:11:29.760
Yet many organizations end up running two systems side by side, and that brings complexity. You may choose a data lake for its breadth of sources, but you often still need warehouse-style processing to run accurate reports. Costs tend to escalate as organizations maintain multiple systems with overlapping responsibilities and possible inefficiencies.
00:12:22.880
A newer pattern emerging is the lake house model. This integrates the principles of both warehouses and lakes, designed to streamline operations without needing multiple systems. The approach still requires upskilling for users, but products are increasingly user-friendly. This standardization reduces complexity across architectures.
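One way to picture the lake house idea in code: the data stays in open files in the lake, but a small table layer (a manifest of files plus an expected schema) lets reporting and data science read the same copy. This is only a toy stand-in for real table formats such as Delta Lake or Apache Iceberg.

```ruby
require "json"
require "csv"
require "fileutils"

FileUtils.mkdir_p("lake/rentals")

# A toy "table" over lake files: a manifest listing the data files that
# make up the table, plus the schema readers should expect.
manifest = {
  "table"  => "rentals",
  "schema" => %w[rental_id car_id rented_on],
  "files"  => Dir.glob("lake/rentals/**/*.csv")
}
File.write("lake/rentals/_manifest.json", JSON.generate(manifest))

# Any consumer (a report or a notebook) reads through the manifest, so
# both workloads see one consistent table over the same files.
table = JSON.parse(File.read("lake/rentals/_manifest.json"))
table["files"].each do |file|
  CSV.foreach(file, headers: true) { |row| p row.to_h }
end
```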
00:13:08.320
As we discuss these architectures, we cannot overlook the concept of the data mesh. Organizations might operate multiple lake houses across different teams or domains, allowing for tailored data ownership and control over their datasets. This flexibility supports agile operation in large environments, reinforcing data sharing and scalability.
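In code terms, a mesh is less a technology than a contract: each domain team owns its datasets and publishes them as discoverable products. A hypothetical sketch of such a registry in plain Ruby (all names and locations are invented):

```ruby
# Hypothetical data-product registry: each domain team registers the
# datasets it owns, with an owner and an address where consumers read it.
DataProduct = Struct.new(:domain, :name, :owner, :location, keyword_init: true)

REGISTRY = [
  DataProduct.new(domain: "rentals",     name: "daily_bookings",
                  owner: "jenny@example.com", location: "lake/rentals/"),
  DataProduct.new(domain: "maintenance", name: "service_events",
                  owner: "toby@example.com",  location: "lake/maintenance/")
]

# Discovery: any team can find another domain's product, but only the
# owning domain writes to it.
def find_product(domain, name)
  REGISTRY.find { |p| p.domain == domain && p.name == name }
end

p find_product("maintenance", "service_events")
```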
00:13:49.560
In thinking about how to effectively evolve your data systems, we can consider a case study involving Jenny, a software developer at a rental car company. This example will highlight how an organization transitioned from a basic setup to utilizing a data mesh effectively.
00:14:34.160
Jenny started with a basic web application communicating with her company’s database to manage daily requests. Over time, as the company grew, she identified synergies with maintenance data kept by her colleague, Toby. Using common interchange formats, such as CSV exports from his Excel spreadsheets, she imported his data and folded those insights into their operations.
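A minimal sketch of that kind of import, assuming Toby's spreadsheet has been exported as a hypothetical maintenance.csv and using the sqlite3 gem:

```ruby
require "csv"
require "sqlite3"

db = SQLite3::Database.new("rentals.db")
db.execute(<<~SQL)
  CREATE TABLE IF NOT EXISTS maintenance (
    car_id INTEGER, serviced_on TEXT, work_done TEXT
  )
SQL

# Load Toby's exported spreadsheet row by row into the shared database.
CSV.foreach("maintenance.csv", headers: true) do |row|
  db.execute(
    "INSERT INTO maintenance VALUES (?, ?, ?)",
    [row["car_id"], row["serviced_on"], row["work_done"]]
  )
end
```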
00:15:32.880
Collaborating with Debbie, who needed reporting without touching the production databases, Jenny and her team built a data warehouse so reports could run without putting production at risk. The managers appreciated the insights about car utilization that came out of the integrated system.
00:16:04.440
As they saw improvements in operational efficiencies, they decided to scale further by integrating data science capabilities, providing predictive insights into vehicle maintenance and usage trends. By leveraging existing patterns, they transitioned towards a lake house architecture.
00:16:45.960
Managing telemetry data and real-time updates allowed further optimization around car use and maintenance schedules. They became advocates for adopting these new structures across the organization, leading to the implementation of a data mesh with team autonomy to manage differing datasets across each department effectively.
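As a rough sketch of that real-time flow, using only the Ruby standard library: a thread-safe queue stands in for a streaming hub such as Kafka, and the event shape is hypothetical. The point is that the consumer reacts as telemetry arrives instead of waiting for a nightly batch.

```ruby
require "json"

# Stand-in for a streaming hub: a thread-safe in-process queue.
events = Queue.new

# Consumer: flags cars due for service the moment telemetry crosses a
# threshold, rather than discovering it in tomorrow's report.
consumer = Thread.new do
  while (raw = events.pop)
    event = JSON.parse(raw)
    if event["odometer_km"] >= 20_000
      puts "Car #{event['car_id']} due for maintenance"
    end
  end
end

# Producer: in production, telemetry would stream in from the vehicles.
events << JSON.generate(car_id: 7, odometer_km: 20_150)
events << nil   # sentinel to stop the consumer in this demo
consumer.join
```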
00:17:37.040
When evolving data strategies, several key factors should be considered: simplicity is essential. Begin with a basic database architecture. Many strive for advanced systems prematurely, often due to 'resume-driven development,' chasing trends rather than solving core problems.
00:18:24.759
Cloud solutions can be cost-effective, but developers should have visibility into the production cloud environment so they can keep costs under control and adapt to integration challenges without running into unexpected bills. Furthermore, emphasize delivering data solutions promptly: delays in implementation are delays in value.
00:19:25.799
Latency is indeed a killer of effective solutions; responsiveness in data operations keeps users engaged and informed. So aim to deliver actionable systems that meet organizational needs today while leaving room for the technological challenges ahead.
00:20:05.200
In conclusion, quick delivery of data insights can yield a greater return on investment, preventing technologies from lagging in relevance. It is perfectly acceptable to start with a database, replicate data for early-stage needs, and anticipate a need for centralization as operational requirements grow. A focus on low latency is essential, as is actively monitoring your cloud architecture to maintain cost control.
00:20:13.880
AI will undoubtedly revolutionize data management and decision-making processes. As we observe how organizations manage data today, insights will emerge, demanding that organizations make appropriate investments in advancing their architectures. Thank you for your time. It has been a pleasure speaking at RubyConf!