ETL

Summarized using AI

What We Can Learn From COBOL

Andrew Turley • June 21, 2014 • Earth

In the video "What We Can Learn From COBOL," Andrew Turley, a junior lead software engineer at The Ladders, discusses the historical context, design principles, and lasting impact of the COBOL programming language. Originally conceived in 1959, COBOL was developed to address the need for a common programming language for business applications specifically focused on data processing. Here are the key points covered in his presentation:

  • Origins of COBOL: COBOL was created in response to a need identified by Mary Hawes at a conference in 1959 for a programming language that could be universally applied across different hardware platforms, breaking away from machine-specific languages.
  • Development Structure: A long-range committee was established to guide the process, with a short-term committee that ultimately designed the language, culminating in the release of the first COBOL specification in December 1959.
  • Language Design Philosophy: The primary goal was to create a language accessible to non-programmers, resembling English, promoting readability, and ensuring ease of transcription on the early teletypes.
  • Separation of Concerns: COBOL’s structure mandates a clear separation of concerns which includes divisions for identification, environment, data, and procedures, fostering better organization than many modern programming languages.
  • Transformation of Programming Constructs: Initially, COBOL used simple verbs to operate, avoiding complex functions that were deemed too challenging for non-mathematicians, leading to a language that, while easy to read, often became verbose.
  • Extension Limitations: The initial ambition for COBOL to be extensible was rolled back due to a lack of implementation; the expansion only came much later with the introduction of built-in and user-defined functions.
  • Contemporary Relevancy: Turley highlights that the core concepts of ETL (extract, transform, load) that COBOL was designed to manage are still vital today, particularly with the rise of big data and event processing.
  • Comparison with Modern Languages: He draws parallels between COBOL’s structure and modern programming frameworks, showcasing how its principles remain relevant.
  • Learning from COBOL: The video emphasizes the importance of understanding COBOL’s legacy to appreciate ongoing developments in programming languages and data processing methodologies.

In conclusion, Turley encourages viewers to explore COBOL as a domain-specific language rather than a general-purpose one, understanding its contributions to data manipulation and its lessons for current programming practices. He also provides a link to resources for further reading on COBOL's history and significance.

COBOL was originally conceived as a programming language for building business applications. At the time this primarily meant processing large amounts of data and transforming it into useful information (commonly known as ETL). Interest in this kind of programming waned as the personal computing revolution swept through the industry, but it is waxing with the new focus on data science and "big data".

GORUCO 2014

00:00:14.960 Hi, my name is Andrew Turley. I'm a junior lead software engineer at The Ladders. Today, I'm going to talk to you a bit about COBOL.
00:00:20.400 So, first of all, who here knows COBOL? All right, so this should be pretty easy to put one past you guys. I'd like to talk a little bit about what COBOL is and get into a little bit of the history of the language itself.
00:00:29.920 Back in 1959, the landscape in computer programming was that you bought a computer from someone like IBM or Honeywell and you got the programming language with it. There really wasn't a cross-platform language for writing code, which meant you were tied to whatever piece of hardware you bought. This made it difficult for customers, as they often faced the issue of not liking the language of the machine they purchased. In 1959, a woman named Mary Hawes was at a conference and started cornering people, saying there was a need on the business side for some sort of common programming language that could be used to write business applications.
00:01:04.000 She began assembling a team of people interested in creating this language. By the middle of 1959, they had started an organization to develop COBOL. They established a long-range committee responsible for strategic long-term planning, a medium-range committee for planning the next few years, and a short-term committee tasked with designing the language itself. The short-term committee was the only one that actually ended up doing anything. In December of 1959, they produced the first COBOL specification.
00:01:38.640 According to Jean Sammet, who was on the short-term committee, the driving ideas behind COBOL were to create a language that was natural and could be read like English, making it accessible to business users. They wanted ease of transcription to the medium, which was critical at a time when code was often entered on teletypes using five-bit character sets. There were academics who proposed languages that specified characters that were practically impossible to type on contemporary machines. The goal was to ensure that COBOL’s language specification matched what users could actually type and print out.
00:02:09.840 The third requirement was for a structure that allowed problem specification within the language, and finally, they aimed for implementability. The initial idea was that the first word of every sentence would be a verb so that you would effectively be programming in an imperative style with a limited number of verbs that had many options. The 'GO TO' command would be permitted after every statement, allowing easy jumps within the code. This was before Dijkstra's famous 'Go To Statement Considered Harmful,' and back then, many believed the 'GO TO' command was a good idea.
00:02:59.680 It was also thought that new verbs could be added at any time, so the language was initially expected to be extensible. However, by 1965, this feature was removed from the specification because no one had actually implemented it. A quick word about functions: at that time, there was significant academic thought regarding computers, with many complex papers published that confused the business community. The prevailing belief was that functions were too complicated for non-mathematicians to grasp.
00:03:25.440 So rather than integrate functions, COBOL developed its own system of verbs that could be chained together, making the language easier to understand for non-programmers. This reliance on verbs continued, and by 1989, built-in functions were added, while user-defined functions finally appeared in the specification in 2002. Before that, manufacturers had created their own functions that were often ad-hoc.
00:04:02.560 Now, I want to discuss some of the things that we gained from COBOL, both good and bad. One of the first positive aspects was the enforced separation of concerns, which can be quite beneficial. COBOL programs are divided into four divisions: the identification division, the environment division, the data division, and the procedure division. If you look at most modern programming languages and environments, you'll find that few enforce such separation.
00:04:34.560 Unlike COBOL, which mandated this structure, many programming environments leave you to handle identification and configuration details on your own. There is also the inclusion of an identifiable section for declaring data, which is crucial for readability and maintainability in programming.
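The four-division layout described above can be loosely mirrored in a modern language. The following is a minimal sketch in Python, not COBOL itself and not code from the talk; the program name, the `Environment` and `CustomerRecord` classes, and the fixed-width field positions are all hypothetical, chosen only to show identification, environment, data, and procedure kept in separate, clearly labeled sections.

```python
import io
from dataclasses import dataclass

# IDENTIFICATION: what the program is (hypothetical names).
PROGRAM_ID = "COPY-CUSTOMERS"
AUTHOR = "EXAMPLE"


# ENVIRONMENT: how the program's logical files map to the outside world.
@dataclass(frozen=True)
class Environment:
    input_name: str
    output_name: str


# DATA: the record layout, declared up front as fixed-width fields.
@dataclass(frozen=True)
class CustomerRecord:
    customer_id: str  # characters 0-4 of the line (assumed layout)
    name: str         # characters 5-24 of the line (assumed layout)

    @classmethod
    def parse(cls, line: str) -> "CustomerRecord":
        return cls(customer_id=line[0:5].strip(), name=line[5:25].strip())


# PROCEDURE: the logic, which only refers to things declared above.
def procedure(source, sink) -> int:
    """Copy each record from source to sink and return the record count."""
    count = 0
    for line in source:
        record = CustomerRecord.parse(line.rstrip("\n"))
        sink.write(f"{record.customer_id} {record.name}\n")
        count += 1
    return count
```

Calling `procedure(io.StringIO("00001Ada Lovelace\n"), sink)` with an `io.StringIO()` sink writes one formatted line. Nothing in Python forces this ordering, which is the contrast the talk draws: in COBOL the divisions are required, here they are only a convention.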
00:05:08.960 I've got an example from Storm, which is an event processing system. Storm has a Clojure DSL for performing similar types of work as COBOL—processing data. A definition of a bolt in Clojure bears a significant resemblance to COBOL’s structure, with clear identifiers and a logical layout for output and input processing.
00:05:48.400 Systems we use today offer various levels of this organization and clarity, but COBOL was the first significant language to encourage this kind of structured thinking in programming languages and systems.
00:06:07.760 Another point to consider is that naturalness in language design is subjective and not always a reliable measure of a language's quality. Let's explore some real COBOL code to illustrate this point. When you see a COBOL program, hopefully, you grasp what is happening—it reads simple records from a file and copies them to another file for printing. If you have even a loose understanding of its function, you can identify the program's parts quickly. In fact, I'd argue that, even if you weren't a programmer, you could read COBOL code and deduce its purpose.
00:07:02.560 However, the simplicity can lead to verbosity, particularly when dealing with more abstract concepts. As those who've worked with COBOL can attest, the insistence on using natural language can sometimes hinder more advanced programming tasks, making it difficult to work with mathematical operations or string manipulations due to the resulting verbose syntax. Part of the problem with COBOL is how hard it is to write compact code, which contributes to the vast number of lines of COBOL in existence.
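To make the verbosity point concrete, here is a small sketch in Python that contrasts a verb-per-step style with a single arithmetic expression. The helpers `multiply` and `add` are hypothetical and only imitate the shape of COBOL statements such as `MULTIPLY PRICE BY QUANTITY GIVING SUBTOTAL`; this is an illustration, not a translation of any program shown in the talk.

```python
from decimal import Decimal


# Verb-style helpers (hypothetical): each performs one named operation.
def multiply(value: Decimal, by: Decimal) -> Decimal:
    return value * by


def add(value: Decimal, to: Decimal) -> Decimal:
    return to + value


def total_verb_style(price: Decimal, quantity: Decimal, tax_rate: Decimal) -> Decimal:
    """One verb per step, each with a named intermediate result."""
    subtotal = multiply(price, by=quantity)   # MULTIPLY ... BY ... GIVING SUBTOTAL
    tax = multiply(subtotal, by=tax_rate)     # MULTIPLY ... BY ... GIVING TAX
    total = add(tax, to=subtotal)             # ADD TAX TO SUBTOTAL GIVING TOTAL
    return total


def total_expression_style(price: Decimal, quantity: Decimal, tax_rate: Decimal) -> Decimal:
    """The same calculation as a single expression."""
    return price * quantity * (1 + tax_rate)
```

Both functions return the same value for the same inputs. The verb style reads step by step and needs no knowledge of operator precedence, but it takes three statements and two intermediate names where the expression takes one line.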
00:08:14.960 When discussing programming languages today, you might hear someone express admiration for Ruby because it's intuitive or natural. What they typically mean is that Ruby's syntax resembles something familiar and accessible. The same was true in COBOL's time, when the familiar reference point was English. Now, most programmers are accustomed to languages that incorporate functions, and later languages tend to strike a balance between accessibility and complexity.
00:08:49.920 Finally, a critical takeaway from COBOL is the reminder that ETL—extract, transform, and load—will always be relevant. This process essentially defines the interactions we have with databases and mirrors what COBOL was designed to achieve. COBOL should not be seen as a general-purpose programming language; instead, it is better viewed as a domain-specific language (DSL) and a framework for ETL tasks, indicating how data is pulled from one location, manipulated, and then written back to another.
00:09:13.760 I have a visual from a COBOL book that illustrates the ETL process beside a diagram from Apache Pig documentation, a framework built on Hadoop. These visuals highlight that the tasks we perform with modern tools aren't much different from those established by COBOL. The same principles apply today as we continue to seek languages that express these ideas effectively across big data and event processing challenges.
00:09:48.000 In conclusion, you can find the references I used for this discussion at a shortened URL: bit.ly/gorucco-cobol, where you can access documents and articles related to COBOL. Unfortunately, many COBOL resources are not freely available online, but this URL links to some ACM articles and books that can help you further explore COBOL's history and relevance. Understanding our past is crucial, and I encourage everyone to delve into it.
00:10:16.800 Thank you, and go learn about COBOL!