Data Integrity

Summarized using AI

The Babel Fish is Data: A Case Study

Norbert Wójtowicz • March 17, 2017 • Wrocław, Poland

The video "The Babel Fish is Data: A Case Study" presented by Norbert Wójtowicz at the wroc_love.rb 2017 event explores the significance of data in programming and how neglecting it can hinder software development. Drawing an analogy between data and the Babel Fish from 'The Hitchhiker's Guide to the Galaxy,' Wójtowicz argues that just as the Babel Fish facilitates communication among different alien languages, data allows programmers to interact with various systems effectively. The presentation addresses several key themes:

  • Historical Context of Scurvy as a Metaphor: Wójtowicz introduces the concept of scurvy and its historical implications, using it as a metaphor for the programming industry’s collective forgetfulness about the importance of data. Just as scurvy was preventable with vitamin C but resurfaced due to negligence, he suggests the programming world suffers from a lack of focus on data processing.

  • The Evolution of Programming: The talk highlights how programming has evolved since the 1930s, pointing out that while breakthroughs occurred in various aspects of software, many modern practices fail to address fundamental issues, particularly those surrounding data.

  • Data Types and Structures: Wójtowicz categorizes data into three main types: streams, trees, and meshes. Streams represent ordered data, trees illustrate interface structures, and meshes depict interconnected business data. Understanding these fundamental structures is vital for building efficient systems.

  • Core Business Objects: The speaker emphasizes that most systems are built around a few core business objects. For instance, in Spotify, listeners, artists, and songs form the basis of all functionalities, demonstrating how new relationships arise within these structures.

  • Microservices and Complexity: He discusses the complexities introduced by microservices and how they might inadvertently increase overhead due to fragmented data. The historical perspective reminds the audience that clarity in relationships within data can lead to simpler and more maintainable systems.

  • Importance of Data Characteristics: Wójtowicz stresses three key characteristics of effective data: it should be immutable, semantic, and recursive. These properties enable clear data definition and manipulation across systems.

  • Clojure and Functional Programming: He explores Clojure as a functional programming language, emphasizing its ability to handle data efficiently and, through clojure.spec, to describe data with specifications that support data-driven development and testing. This approach highlights the need for data processing and structured testing frameworks to maintain system integrity.

In conclusion, Wójtowicz advocates for a shift in how we approach programming by recognizing that data should be at the forefront of software development. By treating data as a first-class citizen, programmers can create robust systems that effectively tackle the complexities of modern applications. He encourages ongoing conversations around these ideas to foster innovation and collaboration in software development.

The Babel Fish is Data: A Case Study
Norbert Wójtowicz • March 17, 2017 • Wrocław, Poland

wroclove.rb 2017

00:00:11.500 Thanks, everyone, for coming so early to the after-party. I'm Norbert, and this is Babel Fish. I guess I have a cold, so I apologize.
00:00:18.970 Who knows what a Babel Fish is? Some hands go up. In 'The Hitchhiker's Guide to the Galaxy,' we learned two things: you must always bring a towel with you, and a Babel Fish is a really useful thing. You put it in your ear, and it translates all the alien languages spoken to you, so you can understand them.
00:00:25.990 My hypothesis is that data is the Babel Fish of programming because if you just use data, you can talk to systems that you otherwise would not be able to communicate with. However, we need to take a step backward and talk about scurvy. Who knows what scurvy is? More hands go up.
00:00:53.920 Scurvy is a terrible disease where your teeth fall out, your skin turns pale, and basically, your body is slowly disintegrating until you die. It's interesting because it is solely caused by a deficiency of vitamin C. Scurvy has plagued civilization for centuries because people would initially learn to eat fresh fruits and vegetables to avoid it, but many generations would pass, causing them to forget the reason and start dying of scurvy again.
00:01:11.710 This cycle of learning and forgetting repeated itself. The most recent relevant occurrence began in the 15th century, when people started traveling long distances over open waters without access to fresh fruits and vegetables. By the 18th century, captains realized that packing lemons for sailors would prevent scurvy. However, in the 19th century, due to geopolitical reasons, limes were used instead of lemons, and scurvy returned because limes contain far less vitamin C than lemons.
00:01:41.829 People were not aware of the cause-and-effect relationship because they were not at sea long enough to experience symptoms. It wasn't until we started long-term expeditions in the late 19th century, such as to Antarctica, that we saw scurvy resurfacing. Despite it being a theoretically solved problem, during this time, the medical community was focused on bacteria, and many nutrients, including vitamins, were lost during processing of food. It wasn’t until 1932 that someone identified vitamin C as the key to preventing scurvy.
00:02:11.710 I find this historical tidbit intriguing, but why are we discussing this at a programming conference? I believe our industry is suffering from a form of scurvy. In the 1930s, we got lambda calculus, and by the 70s, we had functional programming, logic programming, constraint programming, and we understood relational databases and garbage collection. By the 80s, however, we experienced the AI winter, and everything was broken.
00:02:47.790 In the 90s, object-oriented programming was introduced, which catered to a large market, driving the need for programmers who may not be skilled enough in functional programming. We introduced scurvy into our projects and created a culture where software never shipped on time and was constantly buggy. People became accustomed to the idea that software could not be built to work correctly.
00:03:01.390 The irony is that we could land people on the moon, but now we struggle to deploy a web server without it crashing every few minutes. The worst part is that while trying to solve these non-shippable software issues, the hype has shifted to practices like TDD, Agile, Scrum, and so forth. These are helpful, but they fail to address the root issue: what we need is 'vitamin C,' which in this case is data.
00:03:47.520 Functional programming isn't important simply because functions are first-class citizens; it is important because it lets us treat data as a first-class citizen. The bottom line is that in many projects, data processing is at the core, yet we often neglect it. Nowadays, nobody calls themselves a data processor, because it sounds less lucrative than software developer or architect.
00:04:06.830 However, it's crucial to remember that we are fundamentally data processors. The sooner we recognize this, the better our systems will be because we should all be focused on building information systems. An information system is essentially a collection of all stored data. Outside our systems, there is chaos generated from all kinds of data, and it is vital to secure this data in our systems because losing it is irretrievable.
00:04:32.350 At some point, users will come to us for data, and we need to aggregate it, draw conclusions, and provide the results. In effect, we build systems that are akin to chickens: they take data in and produce output from it. If you accept this analogy, you will realize that there are simpler ways to build systems. If an information system is just a large black box that stores data, there are essentially three types of data that exist: streams, trees, and meshes.
00:05:05.660 A stream is just ordered data; as soon as you process that data, you add an implicit order to your system. Trees represent how we construct interfaces, regardless of whether they are web, mobile, or desktop applications; in the end, all user interfaces are just trees. The sooner you understand this and stop designing systems without this context, the more manageable your systems will be. The third type of data is the mesh, or graph, which we often overlook. Most business data is structured as a graph, but if we begin with a relational database and project it onto a tree structure, we run into problems, because our business models are indeed graphs.
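To make the three shapes concrete, here is a minimal Ruby sketch (all data is invented for illustration): a stream as an ordered sequence of events, a tree as a nested interface structure, and a mesh as a graph of nodes plus edges.

```ruby
# A stream: ordered data -- processing it implies an order.
stream = [
  { at: 1, event: "play" },
  { at: 2, event: "pause" },
  { at: 3, event: "play" }
]

# A tree: how interfaces are built -- nested nodes, like a UI hierarchy.
tree = {
  tag: :page,
  children: [
    { tag: :header, children: [] },
    { tag: :body, children: [{ tag: :list, children: [] }] }
  ]
}

# A mesh (graph): business data as nodes plus labeled edges between them.
nodes = { 1 => "listener", 2 => "artist", 3 => "song" }
edges = [[2, :performs, 3], [1, :listens_to, 3]]

puts stream.map { |e| e[:event] }.join(",")
puts tree[:children].length
puts edges.count { |(_, rel, _)| rel == :listens_to }
```

The point is not the concrete representation but that each shape carries different semantics: order, nesting, and interconnection.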
00:05:54.380 Everything is interconnected, creating a mesh. I believe that in all the systems I've worked on, there are usually only three to five core business objects that define the domain. Everything else is merely new relationships between these existing objects in the graph. For instance, in a system like Spotify, the core business domain consists of listeners, artists, and songs.
00:06:20.430 An album is simply a representation of a new relationship between an artist and multiple songs. A playlist is a new edge connecting a user to specific songs. When you create a playlist and allow others to subscribe to it, you're forming a new relationship in the graph. As you add features to your system, the connections multiply, causing the complexity to grow. However, the core objects remain constant. This is often why microservices fail over time; it's not that microservices themselves are flawed, but often the separation of data leads to increased complexity, making it challenging to maintain.
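The Spotify example can be sketched as a graph in Ruby (names and fields are hypothetical): the core objects stay fixed, and each new feature is just a new kind of edge between them.

```ruby
# Core business objects: listeners, artists, songs.
listeners = ["alice", "bob"]
artists   = ["the_band"]
songs     = ["song_a", "song_b"]

# An album is a relationship between an artist and several songs.
album = { artist: "the_band", songs: ["song_a", "song_b"] }

# A playlist is a new edge connecting a listener to specific songs.
playlist = { owner: "alice", songs: ["song_b"] }

# Subscribing to a playlist adds yet another edge -- no new core objects.
subscriptions = [{ listener: "bob", playlist: playlist }]

puts listeners.size + artists.size + songs.size  # core objects stay constant
puts album[:songs].size
puts subscriptions.size
```

Features multiply the edges, not the nodes, which is why splitting the nodes across microservices tends to multiply cross-service calls.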
00:06:56.320 When splitting microservices, every added feature creates new relationships between them, which can cause systems to break apart. Many implementations transition from a monolith to microservices only to find they've created more overhead than efficiency, leading to increased HTTP calls instead of database calls. It is essential to remember that business data will always form a mesh.
00:07:18.300 Once you internalize this idea, the programming language or architecture you choose to build your system becomes less relevant. For a sustainable system, you must recognize that your data consists of streams, trees, and meshes, and your architecture should facilitate easy transitions between these data representations.
00:07:43.020 For practical applications, look into the lambda architecture as an example of how this idea is implemented in larger systems where various microservices come together. Datomic illustrates how to use these concepts to build a real database. Frameworks like React, Redux, and Elm demonstrate how to incorporate these ideas into user interfaces. Now, regarding distributed systems, which this conference has discussed a lot: it's crucial to remember that good communication relies heavily on data.
00:08:08.580 Some things work well over the wire, such as data, while others, like objects, do not transfer well because they require context. Data needs to meet three key criteria: it should be immutable, semantic, and recursive. Immutability means that once data is created, it cannot be changed. Semantic means the data carries its meaning with it, giving us foundational building blocks that scale into larger systems. Recursive means we can compose complex structures out of simple, smaller ones.
00:08:39.660 To break it down into practical terms, data types include scalars like numbers, strings, and booleans, accompanied by forms of identity that are useful in context. Collections boil down to providing structure and meaning to your data: maps provide the semantics of association, sets provide mathematical membership, and lists give an ordered sequence. Collectively, these concepts set the foundation for more extensive data structures.
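A hedged Ruby sketch of these building blocks: scalars plus the three collection semantics (association, membership, order), with `freeze` standing in for immutability.

```ruby
require 'set'

# Scalars: numbers, strings, booleans, plus a form of identity (a symbol here).
id    = :user_42
count = 3
name  = "Norbert".freeze   # freeze approximates immutability in Ruby

# Map: the semantics of association (key -> value).
user = { id: id, name: name, talks: count }.freeze

# Set: mathematical membership -- no duplicates, no order.
tags = Set.new([:data, :clojure, :data])

# List (array): an ordered sequence.
steps = ["collect", "aggregate", "report"]

puts tags.size        # the duplicate :data collapses
puts user[:talks]
puts steps.first
```

Note the recursive property: the map holds scalars, and it could just as easily hold other maps, sets, or lists.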
00:09:05.090 Now that we've established that, let's discuss programming languages. While this is a Ruby conference, you'll notice similarities with Clojure. Clojure provides scalars like numbers, strings, and booleans, just as Ruby does. There are distinctions, however: what Ruby calls symbols, Clojure calls keywords, which can be confusing when moving between the two languages.
00:09:27.640 In terms of collections, we have lists for ordered data, maps for associative data, and sets for unique membership. Each programming language offers various data structures that map back to these semantic themes, but our focus should be on the underlying principles. With these properties in hand, let's explore how to leverage them.
00:09:53.120 Lisp is often viewed as an abstract language, but if you can get past the parentheses, you can learn it quickly. Defining things in Lisp contrasts with Ruby's style, but the uniformity is liberating: everything is an expression built from lists, so you develop a deep understanding of the language without much syntactic overhead.
00:10:24.630 Of course, Clojure brings more to the table. Its philosophy centers on developer experience, making it easy to adopt concepts from other programming communities, and you can access libraries that pull from various languages and methodologies. ClojureScript retains these features while compiling to JavaScript. As a back-end Ruby developer transitioning to ClojureScript, I've found it simplifies frontend logic immensely.
00:11:01.470 clojure.spec offers an intriguing area of study, allowing you to specify and structure data contracts dynamically. Its utility lies in modeling contracts without enforcing the strict static typing we see in statically typed languages. It opens a pathway for managing how we approach our data: having specifications makes validating and generating data much more seamless, while preserving flexibility.
00:11:39.520 When you define a specification for what constitutes valid data, you can then generate that data based on those expectations. This vastly improves your ability to create useful tests for functions and interactions, making it easier to spot errors early on. Specifications encapsulate the essence of what you're trying to achieve, thereby separating contract definition from implementation.
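The workflow described here, specify, then generate, then test, comes from clojure.spec, but the core idea can be sketched by hand in Ruby. Everything below is an invented illustration of the technique, not a real library API: a "spec" is a predicate, a generator produces conforming data, and a property is checked over many generated cases.

```ruby
# A "spec" is just a predicate describing what valid data looks like.
valid_age = ->(n) { n.is_a?(Integer) && n.between?(0, 130) }

# A generator produces data that satisfies the spec.
gen_age = -> { rand(0..130) }

# The function under test, and the property we expect of it.
def next_birthday_age(age)
  age + 1
end

# Generative testing: run the property over many generated inputs.
100.times do
  age = gen_age.call
  raise "generator broke its spec" unless valid_age.call(age)
  result = next_birthday_age(age)
  raise "property failed for #{age}" unless result == age + 1
end
puts "100 generated cases passed"
```

Because the contract (the predicate) is separate from the implementation, the same spec can drive validation at runtime and data generation in tests.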
00:12:09.840 Understanding how to navigate this notion equips you with valuable skills for problem-solving. Spec gives you instant feedback on the properties of your code, such as whether values match specified criteria, and lets you define expected behaviors for given values. In Clojure, you can combine predicates to tailor exactly what counts as valid data.
00:12:49.560 In testing, this allows for assertions in an efficient manner. It raises the value of testing by establishing clear guidelines and objectives, so that failures report precisely which validation was violated. You can categorize data appropriately to ensure the output matches expected behavior. By treating data contracts as first-class elements, you empower the system to make sense of its broader structure.
00:13:26.960 One of these guidelines relates to generating data from specific parameters. In integration testing, for example, you can produce deterministic outcomes from a given seed while still exercising the allowed variations of input, which leads to effective testing. Being explicit about these generative processes remains vital, especially as dependencies shift or evolve.
00:14:05.020 Clojure lets you define specifications for both homogeneous and heterogeneous collections, capturing the shapes of data a user may actually encounter. Understanding these dynamics enables effective strategies for implementation and validation: you can establish a baseline of expected behavior and assess whether new implementations still conform to it.
00:14:41.320 The ultimate goal is to merge both worlds, with data acting as the bridge between them. A layered understanding allows you not only to validate what works but also to challenge existing structures and methodologies. Test code should enhance readability without sacrificing rigor; each specification contributes to the larger framework, so organizing them deliberately leads to improved coherence.
00:15:09.550 Data-driven decisions create better pathways to predictability across systems. When managing components across distributed platforms, shared data contracts make maintenance easier: every change is checked against the contract, and each check incrementally informs the overall architecture.
00:15:50.020 Grasping Clojure's nuances takes practice, but it pays off. Rather than navigating volatility ad hoc, you create space for systematic approaches to uncover the way forward, and the interconnectivity of components can reveal patterns that are often overshadowed by deadline pressure.
00:16:35.740 Structured testing regimes flow from a shared understanding of both theory and practice, and they fill educational gaps that many teams face as they develop functions across diverse needs. As critical processes converge, robust test frameworks are essential for maintaining structure, with specifications guarding the integrity of data-driven checks.
00:17:10.680 You can create robust services by developing contracts with unified structures, allowing modular additions and scalable improvements without destabilizing core functionality. It's essential to iterate on solid frameworks that govern development cycles consistently; profiling your systems this way gives valuable insight into how coherent the application remains as it grows.
00:17:48.840 As we integrate these tools into the engineering process, aligning practices with ongoing work eliminates some of the white-knuckled pressure to prioritize deadlines over real goals. Clojure's emphasis on data opens doors to clearer communication practices, with specifications encapsulating the expectations inherent to user experiences.
00:18:21.640 Overall, Clojure presents enriching opportunities to reformulate data-centered paradigms, aligning conventional methods with contemporary expectations. Each specification added to a system narrows the margin of error while increasing confidence in its tests, and the cumulative effect is more consistent execution without compromising the original design.
00:19:00.450 Consolidating around data is not only forward-looking; it shifts focus from fragmented views to cohesive whole systems. By structuring relationships among core objects with functional flexibility, you create a framework where data processing comes first and stability follows.
00:19:30.110 Acknowledging the nuances of functional programming exposes latent opportunities that better inform strategic decisions about data integration. The fabric of programming continues to evolve, and a shared understanding of data gives us a formidable baseline for continuous innovation.
00:19:54.450 In summary, putting data first fosters environments that support exploration without compromising efficiency. Bridging the gap across platforms becomes integral to steadily improving performance, and as we share our insights and knowledge, the whole ecosystem benefits.
00:20:26.790 Thank you for your attention. I hope some of these concepts resonate with you, and I encourage further dialogue on navigating our digital landscapes with awareness and intention. I'm happy to take any questions.
00:21:04.700 So, grab coffee or ask your questions, and let's discuss.
00:21:13.200 Audience question: Is there any logic behind how clojure.spec generates example test cases? Does it have custom implementations for particular functions, or does it just generate random data?
00:21:35.800 Great question! clojure.spec provides default generators for the basic types. You can also create custom generators tailored to complex conditions or structures; Clojure offers an API that lets you define them, making the whole testing environment highly adaptable.
00:22:11.400 It's vital to provide such guidance in some cases. When generating primes, for example, purely random number generation can become sluggish, so you give the system hints that help it create results more effectively. Overall, there's a level of flexibility to suit varied needs.
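The primes example can be sketched in Ruby (a hand-rolled illustration; clojure.spec's actual generator API works differently): instead of filtering random integers, which rejects most candidates, a guided generator draws directly from a precomputed pool.

```ruby
require 'prime'

# Naive approach: generate random integers and keep only the primes.
# Most candidates are rejected, so this can be very slow.
naive_gen = -> { n = rand(2..10_000); Prime.prime?(n) ? n : nil }

# Guided approach: draw directly from a pool of known primes.
PRIME_POOL = Prime.first(100)
guided_gen = -> { PRIME_POOL.sample }

samples = Array.new(20) { guided_gen.call }
puts samples.all? { |p| Prime.prime?(p) }
```

The guided generator trades a one-time setup cost for the guarantee that every sample already satisfies the spec.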