Baltic Ruby 2024

Summarized using AI

Going back to the BASICs

Jan Krutisch • June 14, 2024 • Malmö, Sweden

In the presentation "Going Back to the BASICs," Jan Krutisch explores the history and significance of the BASIC programming language, particularly in the context of early home computers. He begins by reminiscing about the era of the Altair 8800, one of the first home computers, and Microsoft's role in delivering BASIC software to microcomputers. The session highlights how BASIC emerged as a crucial educational tool to teach programming to non-computer science students due to its simple design and ease of use. Krutisch discusses the early evolution of home computers like the Commodore C64 and the Amiga, emphasizing their architectural simplicity compared to modern systems.

Key Points Discussed:
- Background of BASIC: BASIC originated in 1963, aimed at making programming accessible and was later adapted for smaller home computers, leading to its widespread adoption.
- BASIC on Home Computers: Most first-generation home computers used some variation of Microsoft BASIC, positioning it as a cornerstone of early programming.
- Programming Experience: Jan demonstrates programming in BASIC using a Commodore 64 emulator, showcasing its immediate execution model and the use of line numbers for code structure.
- Technical Insights: He discusses the limitations of BASIC, such as its reliance on global variables and lack of modern programming concepts like local variables and structured error handling.
- Interfacing with Hardware: The talk also delves into assembly language and the essential workings of the 6502 processor used in many early computers, explaining how BASIC programs were managed in memory.
- Modern Implications: Krutisch draws parallels between BASIC’s simplicity and modern programming practices, stressing the importance of understanding foundational concepts in programming and the evolution from BASIC to contemporary languages like Ruby.
- Reflection: The presentation concludes with insights on the importance of learning from historical programming languages and their impact on current practices, urging developers to consider the ongoing complexity of modern software systems.

The overarching takeaway from Jan’s talk is a respectful appreciation for BASIC as an influential programming language that provided a gateway to computing for many, along with a reminder of the lessons learned in resourcefulness and simplicity that still apply today.

BASIC was once the most important programming language on home computers. Let's re-implement it in Ruby, learn some history on how computers worked back then and a few tricks along the way.

00:00:08 I'm very excited. We're going to hear what Jan is going to tell us about going back to the basics.
00:00:15 So, hi, my name is Jan, and I'm old.
00:00:20 There's no way around it. Because I'm old, I like to talk about old stuff.
00:00:31 Let's go back in time. Just out of curiosity, the hands are probably going to be the reverse of what Aurora just asked.
00:00:43 How many of you were born after 1985? Yeah, I kind of expected that.
00:00:51 I'll come back to 1985, but let's go back ten years earlier to 1975. This is the Altair 8800, which is sort of considered to be one of the first home computers.
00:01:07 It means that you can buy it as a person, bring it home, and do computing of sorts.
00:01:20 We don't really use the term 'home computer' anymore. Another term for that would be 'microcomputer'. My phone, which sits here, would like to argue with this guy about being a microcomputer. Maybe we should call them nanocomputers.
00:01:34 Anyway, the term 'home computer' stems from the contrast with what computers were at that time: minicomputers, which were fridge-size machines, or actual computers that were room-size.
00:01:48 Large rooms filled with equipment to run corporate tasks. The Altair is relevant in one specific way because it represents one way of delivering software.
00:02:03 This particular piece of punch tape actually contains the first version of Microsoft BASIC, which was written by Bill Gates, Paul Allen, and another person named Monte Davidoff who helped with the floating-point implementation.
00:02:17 Microsoft BASIC was sort of the first product of Microsoft, and they sold it to MITS, the company behind the Altair. The Altair didn't simply come with Microsoft BASIC; you actually had to buy it for several hundred dollars, and you would get this roll of punch tape.
00:02:36 Throughout the end of the 70s and the beginning of the 80s, we saw a wave of home computers, like the C64, BBC Micro, and later, the ZX Spectrum for the British.
00:02:50 That's when the home computer market really took off. Those were commercial machines. There's a reason why I like these first-generation home computers.
00:03:11 They're really simple, and if you have enough time and energy, you can gain a complete understanding of these computers.
00:03:22 I mean complete in the sense that you understand how the hardware works, how the software works, how all the extra chips work, the timing of the instructions, and everything.
00:03:43 I am not that person, but I know people who have internalized every single aspect of, for example, the Commodore C64.
00:03:55 Coming back to 1985, this was my second computer. The Commodore Amiga was launched in 1985.
00:04:01 The Atari ST was also launched in 1985, while the Macintosh had launched in 1984 and the IBM PC debuted in 1981, gaining traction a few years later.
00:04:13 All of these were 16-bit machines. The operating systems became more complicated as the hardware became more advanced.
00:04:24 With extra power came extra complexity. For the Commodore Amiga 500, there are people who know a lot about it, but it’s kind of the cutoff point.
00:04:39 From that point on, it becomes impossible for one human being to fully understand what this system does at all times.
00:04:46 My first computer was actually a Commodore C64, and it came with BASIC.
00:04:55 That's what we're going to talk about today. BASIC stands for Beginner's All-purpose Symbolic Instruction Code.
00:05:03 It rolls right off the tongue, right?
00:05:09 Some basic facts: it was created in 1963 at Dartmouth College by John Kemeny and Thomas Kurtz. Now, in '63, computers looked very different.
00:05:30 BASIC was designed as a compiled language for multi-user systems: one large machine shared through numerous terminals.
00:05:45 It was meant to replace Fortran, which was briefly covered in Matt's talk yesterday. Fortran was a very clunky language; it's not easy to write or read.
00:06:03 They wanted to create something that could actually be given to students, including students who were not computer science majors, to build simple software.
00:06:15 Since BASIC is very simple in its design, it was then easy to port it to smaller systems and write interpreters for them.
00:06:30 Microsoft BASIC is actually one of the first interpreters for home computers—specifically, these small computers I’ve been talking about.
00:06:48 Most home computers from that era used some version of Microsoft BASIC. There were a couple of alternatives, but most of them used Microsoft BASIC.
00:07:01 So did the Commodore C64, even though it doesn’t say 'Microsoft' anywhere here—that’s the effect of licensing agreements.
00:07:14 This is the screen that comes up—we're going to see it in a second on an emulator—that appears when you start the machine.
00:07:30 The machine is in BASIC when it starts; you can start programming BASIC from here. There’s no in-between.
00:07:44 That’s what I like about these computers; they scream 'program me' the moment you turn them on.
00:08:01 Now, a quick example for people who have never seen BASIC.
00:08:06 How many of you here have actually programmed in BASIC at some point?
00:08:18 Well, that's a lot of hands! But I expect that for many of you it was something like Visual Basic, which is very different from what you're seeing on the screen here.
00:08:31 There are similarities, but also elements that are very different. I'm not going to explain what this program does; I think if you put your mind to it, you can understand it.
00:08:49 It’s going to be on the slides for a while. Let’s talk about the things that actually stand out from a modern perspective.
00:09:01 The first thing is that BASIC is loud. It’s all uppercase on the C64. There’s actually a trick you can hit, which makes everything appear in lowercase, but it’s just a visual modification; the actual data is still uppercase.
00:09:17 Also, we have very short variable names. I know this is a bad example because we’re still using the 'I' for loops—at least some of us do, at times—and I'm guilty in that regard.
00:09:34 Variable names cannot be much longer than that. On the C64, the only thing you get is two letters or one letter and a number, along with a suffix for the type.
00:09:50 You run out of variable space very quickly, especially since all variables are global. There’s nothing like local state in BASIC, at least not in this version.
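A rough sketch of what that naming limit means in practice. It assumes the commonly documented C64 behaviour that longer names are accepted but only the first two characters, plus a `$` or `%` type suffix, are significant. `significant_name` is a hypothetical helper for illustration, not part of any interpreter.

```ruby
# Hypothetical helper: reduce a variable name to the part that
# C64 BASIC is assumed to treat as significant.
def significant_name(name)
  suffix = name[/[%$]\z/].to_s       # "$" for strings, "%" for integers, "" otherwise
  base = name.delete_suffix(suffix)  # name without the type suffix
  base[0, 2] + suffix                # only the first two characters count
end

puts significant_name("COUNTER")  # CO
puts significant_name("COLOR")    # CO (collides with COUNTER)
puts significant_name("NAME$")    # NA$
```

Because every variable is global, such collisions silently share one storage slot across the whole program.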
00:10:02 There are functions, but they're weird. They resemble mathematical functions with one argument, and it can only be a mathematical expression.
00:10:16 Also, subroutines take no arguments, so you see the 'GOSUB' on line 20 calling a subroutine on line 100, and the subroutine just returns.
00:10:30 However, there’s no data pushed between those; you have to use global variables.
00:10:44 The last thing that might be bothering you the most is explicit line numbers. There are a few reasons for this.
00:11:01 The boring reason is that we use it for addressing; so 'GOTO 100' would be how you write that, as there are no labels or jump marks.
00:11:13 The real reason is that when BASIC was created, the computer interfaces looked very different from what we have today.
00:11:24 Even the C64 doesn’t have what we would now consider a proper editor. Everything was line-based.
00:11:38 You could type a line and then hit enter, and it would get entered into memory. There was no scrolling through your code or anything.
00:11:54 Back in those days, your computer interface probably looked like this—a teletype.
00:12:04 It has a keyboard and looks like a typewriter, if you’re old enough to know what a typewriter is.
00:12:17 You type stuff into the computer, it gets sent via a serial line, and if there are any results, those get printed on paper.
00:12:32 Screens were not generally available at that time.
00:12:46 Using line numbers allowed you to quickly change your code by just typing the line number again and entering the new line.
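The editing model described here can be sketched in a few lines of Ruby. This is only an illustration under simple assumptions: `ProgramStore` is a hypothetical class, lines are kept as plain text in a hash, and entering a bare line number deletes that line.

```ruby
# Hypothetical in-memory program store, keyed by line number.
class ProgramStore
  def initialize
    @lines = {}
  end

  # Entering '10 PRINT 1' stores or replaces line 10;
  # entering just '10' deletes it.
  def enter(input)
    number, text = input.strip.split(" ", 2)
    number = Integer(number, 10)
    if text.nil? || text.strip.empty?
      @lines.delete(number)
    else
      @lines[number] = text.strip
    end
  end

  # LIST shows the lines in numeric order, whatever order they were typed in.
  def list
    @lines.sort.map { |number, text| "#{number} #{text}" }
  end
end

store = ProgramStore.new
store.enter('20 GOTO 10')
store.enter('10 PRINT "HELLO"')
store.enter('10 PRINT "HI"')  # retyping line 10 replaces it
puts store.list
```

No cursor movement or scrolling is needed, which is why this works just as well on a teletype as on a screen.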
00:13:03 Let's have a quick look at that.
00:13:15 Here is VICE, a C64 emulator. Let me try to write a quick program.
00:13:26 (Demonstrating the programming process on the emulator.)
00:13:45 Okay, this line is now entered into memory. Let's do the obvious.
00:13:57 I can print it out again with 'LIST', and then I can run it.
00:14:04 Woo! That’s exciting.
00:14:07 Especially if you did it in an electronics store in the '80s, it felt like you had hacked the system or something silly.
00:14:20 The cool thing about BASIC is that it has immediate mode. 'LIST' and 'RUN' are already two immediate commands.
00:14:38 I can do something like 'PRINT 2 + 2'. Writing in BASIC is similar to writing all your Ruby programs in IRB or Pry.
00:14:52 To me, that feels awesome because there's no disconnect.
00:15:05 The interesting thing about this emulator is that it has a machine monitor, allowing me to take a look at what’s happening inside the machine.
00:15:19 There were similar tools available for the C64 itself, but here it's very comfortable because I can run it in a different window.
00:15:35 I know that the BASIC program sits at address 0800; that's all in hex.
00:15:52 Let's look at the program. It looks odd because somehow the print and GOTO are missing, but the data is there.
00:16:03 You can see the string and the line number, so let's explore what that actually means.
00:16:16 In the data, the first number highlighted is a zero; there’s a reason for that—it’s kind of dodgy.
00:16:30 Every BASIC program starts with a zero; that’s just how it is.
00:16:46 The next two bytes are in reverse order. I keep forgetting if it's Big Endian or Little Endian.
00:16:59 This is one form of endianness. The address it's pointing to is the location of the next line of the BASIC program.
00:17:14 The address is 081C, which is in the second row of the dump; behind the zero there, we find the next line.
00:17:26 If you know your data structures, it’s a linked list that represents how the BASIC program sits in memory.
00:17:40 Next, we have the actual line number, and that’s also in reverse order. For example, 0A represents 10.
00:17:56 99 is the token for the PRINT command. Something happened to that line: it has been tokenized, as it's called.
00:18:08 There are a few reasons for that: one is memory consumption, and the other is that we can easily turn that into a jump table for the BASIC commands.
00:18:21 Next, we have a 20 (32 in decimal), which is a space, followed by 22, which represents the opening quote.
00:18:37 Then we have our string, which ends with 21, the exclamation mark, followed by the closing quote. The line then ends with a zero, which marks the end of the line.
00:18:51 After that is the next line, which begins with the address for the following BASIC line, but since there’s no next line, it points to a couple of zeros.
00:19:06 The interpreter recognizes this as the end of the program.
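The linked-list layout just described can be walked with a short Ruby sketch. The bytes below are a hand-built example, not the dump shown in the talk: a two-line program starting at $0801, with low-byte-first (little-endian) pointers and line numbers. `TOKENS` contains only the two tokens this example needs, and all names are hypothetical.

```ruby
BASE = 0x0801 # assumed start address of the first line link
TOKENS = { 0x99 => "PRINT", 0x89 => "GOTO" }.freeze

# 10 PRINT "HI!" followed by 20 GOTO 10, then the end-of-program marker.
MEMORY = [
  0x0D, 0x08, 0x0A, 0x00, 0x99, 0x20, 0x22, 0x48, 0x49, 0x21, 0x22, 0x00,
  0x16, 0x08, 0x14, 0x00, 0x89, 0x20, 0x31, 0x30, 0x00,
  0x00, 0x00
].freeze

# Read a 16-bit little-endian value: low byte first, then high byte.
def word(bytes, offset)
  bytes[offset] | (bytes[offset + 1] << 8)
end

# Follow the next-line pointers until one of them is zero.
def list_program(bytes, base = BASE)
  lines = []
  addr = base
  loop do
    offset = addr - base
    next_addr = word(bytes, offset)
    break if next_addr.zero?

    number = word(bytes, offset + 2)
    text = String.new
    i = offset + 4
    until bytes[i].zero?
      text << (TOKENS[bytes[i]] || bytes[i].chr) # expand tokens, copy the rest
      i += 1
    end
    lines << "#{number} #{text}"
    addr = next_addr
  end
  lines
end

puts list_program(MEMORY)
```

The sketch ignores PETSCII details and treats every byte that is not a known token as a plain ASCII character.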
00:19:20 This makes me think about a modern development workflow.
00:19:34 When you think about it, you write your Ruby code in an editor.
00:19:49 This could be an example setting; you have a textual representation of the Ruby program along with a few dozen other representations.
00:20:04 After writing it, you save it, and the Ruby interpreter loads this into memory.
00:20:19 I’m not an expert in how Ruby internals work, but I’ve heard that there are experts here who probably know more about this.
00:20:33 Meanwhile, the C64 is very straightforward; you enter the code, it's in memory, and it's executed directly.
00:20:45 The journey to find out what the C64 does has taken me several weeks.
00:20:58 So the first source I used is this book from 1988, titled 'The Commodore 64 Programmer’s Reference Guide.'
00:21:10 It contains a complete disassembled listing of the ROM of the C64, which includes both the KERNAL and the BASIC interpreter.
00:21:23 It’s about 300 lines of disassembled code. I think I might still have the original edition.
00:21:38 There are more modern variants available. A gentleman named Lee Davison created an English version, available online with cross-references.
00:21:54 Unfortunately, he passed away a couple of years ago, which was a tragic loss for the retro computing community.
00:22:10 Then there's the work of Michael Steil, who centralized this into a website.
00:22:24 It contains not only the disassembly on the left side but also the original source code from Microsoft, which has been leaked.
00:22:38 The website also incorporates various commentary in English and German.
00:22:55 I found it helpful to understand what’s actually going on.
00:23:08 We’re going to take a quick look at a piece of 6502 assembler.
00:23:23 Don't be afraid. Maybe you've done some form of assembler at university or even in a job.
00:23:38 6502 assembler is odd but relatively easy due to its small instruction set.
00:23:51 What we will examine is a very small piece of code that represents a BASIC warm start.
00:24:05 If you’ve entered a BASIC line, it jumps back to the BASIC warm start and everything starts from scratch.
00:24:18 The format is as follows: the red column is the actual memory addresses, followed by the bytes the code consists of in memory.
00:24:31 After that, we have the disassembled instructions and comments.
00:24:46 We start off with 'JSR', which denotes jumping to a subroutine, putting the current address onto the stack.
00:25:00 This is somewhat similar to 'GOTO' but relies on the stack to return to the original address.
00:25:13 While this jumps to an input location, it also keeps track of the return address.
00:25:26 While this occurs, it copies entered characters into a buffer, setting X and Y to the buffer address.
00:25:41 The program returns with the RTS instruction, which pops the address from the stack.
00:25:54 A stack can have arbitrary values, making it difficult to read code.
00:26:09 The X and Y registers are something like single variables on the CPU, and the program counter holds the current address.
00:26:24 The stack pointer points to the current top of the stack. The stack on the 6502 is limited to 256 bytes.
00:26:39 If too many items are pushed onto it, the stack will overflow, leading to loss of data.
00:26:53 The accumulator is used for most calculations, while X and Y are more general-purpose registers.
00:27:07 The status register tracks flags for operations, including whether it’s zero or negative.
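To make the JSR/RTS mechanics and the 256-byte stack more concrete, here is a toy model in Ruby. It is a sketch, not an emulator: `TinyCpu` is a hypothetical class, the addresses in the usage lines are only illustrative, and the model relies on the documented 6502 behaviour that JSR pushes the address of its own last byte and RTS adds one after pulling it.

```ruby
# Toy model of the 6502 program counter, stack pointer and stack page.
class TinyCpu
  attr_accessor :pc
  attr_reader :sp

  def initialize
    @stack_page = Array.new(256, 0) # stands in for $0100-$01FF
    @sp = 0xFF                      # the stack grows downwards from the top
    @pc = 0
  end

  def push(byte)
    @stack_page[@sp] = byte & 0xFF
    @sp = (@sp - 1) & 0xFF          # wraps around silently after 256 pushes
  end

  def pull
    @sp = (@sp + 1) & 0xFF
    @stack_page[@sp]
  end

  # JSR is three bytes long; it pushes the address of its last byte.
  def jsr(target)
    return_address = @pc + 2
    push(return_address >> 8)       # high byte first
    push(return_address & 0xFF)     # then low byte
    @pc = target
  end

  # RTS pulls the address back and continues at the following byte.
  def rts
    low = pull
    high = pull
    @pc = ((high << 8) | low) + 1
  end
end

cpu = TinyCpu.new
cpu.pc = 0xA480
cpu.jsr(0xA560)
cpu.rts
puts cpu.pc.to_s(16) # a483, the instruction right after the JSR
```

The wrap-around in `push` is the overflow mentioned above: nothing stops the 257th push from overwriting the oldest entry.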
00:27:25 Next two instructions, STX and STY, store the contents of the X and Y registers into a memory address.
00:27:38 The odd memory address format is due to the zero page optimization, which enables addressing the first 256 bytes efficiently.
00:27:56 We’re using the zero page RAM, allowing faster access to the routine that resides there.
00:28:10 We jump to a very important routine called CHRGET, which increments the text pointer and fetches the next character.
00:28:26 We transfer the accumulator's contents into the X register; the copy itself is not crucial, but it sets the zero flag depending on the copied value.
00:28:43 If it's zero, the input line was empty, and we branch back to the BASIC warm start.
00:28:57 Branch if equal translates to Branch if the zero flag is set.
00:29:13 Next, we’re doing an immediate load, where instead of loading from a zero-page address, we’re loading an actual value.
00:29:26 This value is -1, indicating no line number set and thus entering immediate mode.
00:29:41 The branch-if-carry-clear instruction checks the carry flag, which tells us whether the first character is a digit.
00:29:57 If it is a digit, execution proceeds to the part that tokenizes and stores the line.
00:30:10 Otherwise, this is the interpreter's starting point for executing the entered code directly.
00:30:25 Let’s transform that into Ruby. The code isn’t very complex. We read a line and parse the line's first character.
00:30:50 If it’s a digit, we store it; otherwise, we process it directly and run it.
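A minimal sketch of that decision in Ruby, mirroring the three outcomes of the assembler routine: empty input, a line that starts with a digit, and everything else. `classify` and the symbols it returns are hypothetical names chosen for illustration, not taken from the speaker's implementation.

```ruby
# Decide what to do with one line of input, like the warm start loop does.
def classify(input)
  line = input.strip
  return [:ignore] if line.empty?          # empty input: back to the prompt

  if line.match?(/\A\d/)                   # starts with a digit: program line
    number = line[/\A\d+/]
    [:store, number.to_i, line[number.length..-1].strip]
  else                                     # anything else: immediate mode
    [:immediate, line]
  end
end

p classify('10 PRINT "HI"')  # [:store, 10, "PRINT \"HI\""]
p classify("LIST")           # [:immediate, "LIST"]
p classify("")               # [:ignore]
```

A real interpreter would tokenize the text before storing or executing it; the sketch leaves it as plain text to keep the branching visible.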
00:31:08 One key concept in the BASIC interpreter is the tokenizer.
00:31:23 This function compares the current character against a table of keywords.
00:31:41 When a match occurs, it continues forward; if not, it sequentially checks for the next keyword.
00:32:01 The tokenizer handles a few special cases in a similar manner.
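The matching loop described here can be sketched in Ruby. This is a simplified illustration, not the ROM routine: `KEYWORDS` holds only four of the C64 keywords with their token bytes, `crunch` is a hypothetical name, and text inside double quotes is copied without tokenizing.

```ruby
# Small subset of the keyword table, mapping each keyword to its token byte.
KEYWORDS = {
  "PRINT" => 0x99, "GOTO" => 0x89, "GOSUB" => 0x8D, "RETURN" => 0x8E
}.freeze

# Turn one line of text into bytes, replacing keywords by single tokens.
def crunch(line)
  bytes = []
  in_quotes = false
  i = 0
  while i < line.length
    char = line[i]
    in_quotes = !in_quotes if char == '"'
    # Try every keyword at the current position, one after the other.
    keyword = in_quotes ? nil : KEYWORDS.keys.find { |k| line[i, k.length] == k }
    if keyword
      bytes << KEYWORDS[keyword]
      i += keyword.length
    else
      bytes << char.ord # plain character (ASCII here, PETSCII on the C64)
      i += 1
    end
  end
  bytes
end

puts crunch('PRINT "HI!"').map { |b| format("%02X", b) }.join(" ")
# 99 20 22 48 49 21 22
```

The five letters of PRINT shrink to one byte, which is the memory saving mentioned earlier, and the token value can later serve as an index into a table of routines.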
00:32:20 The interpreter appears deceptively simple, calling routines based on tokenized input.
00:32:38 One catch is there’s no handling of arguments right now, as they are delegated to routines.
00:32:55 For instance, the print function requests the formula evaluation, which acquires values from expressions.
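As a sketch of that structure: a table maps each token to a routine, and each routine parses its own arguments from the rest of the line. Everything here is an assumption for illustration. `HANDLERS`, `execute` and the `state` hash are hypothetical, and `evaluate_formula` is a stand-in that only understands string literals and integers.

```ruby
# Stand-in for the formula evaluation: string literals and integers only.
def evaluate_formula(text)
  text = text.strip
  text.start_with?('"') ? text.delete('"') : text.to_i
end

# Jump table: token byte => routine. Each routine handles its own arguments.
HANDLERS = {
  0x99 => ->(args, state) { state[:output] << evaluate_formula(args).to_s }, # PRINT
  0x89 => ->(args, state) { state[:next_line] = evaluate_formula(args) }     # GOTO
}.freeze

# The dispatcher itself knows nothing about arguments.
def execute(token, args, state)
  handler = HANDLERS.fetch(token) { raise ArgumentError, "?SYNTAX ERROR" }
  handler.call(args, state)
  state
end

state = { output: [], next_line: nil }
execute(0x99, ' "HI!"', state)
execute(0x89, " 10", state)
p state # {:output=>["HI!"], :next_line=>10} (exact formatting depends on the Ruby version)
```

Keeping argument parsing inside the routines is what lets the dispatcher stay this small.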
00:33:11 Explaining this process relates to modern interpreters, as precedence and multiple variables come into play.
00:33:28 When processing parentheses, expressions with numerical values indicate another round of parsing.
00:33:45 When tokens assemble within parentheses, it creates a hierarchy of operations, showcasing the thread of complexity.
00:34:02 The BASIC interpreter demonstrates the journey of arduous expressions, an insightful way to depict programming fundamentals.
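Operator precedence and nested parentheses can be illustrated with a small recursive evaluator. This is a generic precedence-climbing sketch, not the ROM's routine and not the speaker's code. It assumes integers only, the four basic operators, and well-formed input; all names are hypothetical.

```ruby
PRECEDENCE = { "+" => 1, "-" => 1, "*" => 2, "/" => 2 }.freeze

# Split the source into numbers, operators and parentheses.
def tokenize_expression(source)
  source.scan(/\d+|[-+*\/()]/)
end

# A primary is a number or a parenthesised sub-expression.
def parse_primary(tokens)
  token = tokens.shift
  if token == "("
    value = parse_expression(tokens, 0) # another round of parsing
    tokens.shift                        # consume the closing ")"
    value
  else
    token.to_i
  end
end

# Keep consuming operators that bind tighter than min_precedence.
def parse_expression(tokens, min_precedence)
  left = parse_primary(tokens)
  while (op = tokens.first) && PRECEDENCE.key?(op) && PRECEDENCE[op] > min_precedence
    tokens.shift
    right = parse_expression(tokens, PRECEDENCE[op])
    left = left.public_send(op, right)  # integer arithmetic, unlike BASIC's floats
  end
  left
end

def evaluate(source)
  parse_expression(tokenize_expression(source), 0)
end

puts evaluate("2+3*4")    # 14
puts evaluate("(2+3)*4")  # 20
```

Each opening parenthesis starts a nested call, so the call stack mirrors the hierarchy of operations in the expression.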
00:34:18 My BASIC is roughly 1,000 to 1,500 lines of Ruby code that I've written, and, quite surprisingly, it works.
00:34:35 I have it set up in a terminal, a normal Z shell, and I can run it with Ruby without any memory issues.
00:34:51 I can simply type my code and list it, and it successfully runs.
00:35:05 Seeing it in action is incredibly rewarding!
00:35:16 What did I learn from this experience? Microsoft BASIC is a work of art. While it has many drawbacks, it illustrates amazing resourcefulness. Understanding it has been quite humbling.
00:35:44 Even today, with abundant resources, we sometimes overlook the past's constraints. The current software landscape is laden with abstraction layers that can weigh down performance.
00:36:00 There’s a necessity to reflect on our choices, especially as systems scale in complexity.
00:36:16 I find it particularly fascinating that companies often prioritize expansion over more efficient solutions.
00:36:36 It’s vital to remember the consequences of our development practices, and to assess their environmental impact.
00:36:55 Last acknowledgments: thanks to Michael Steil, Lee Davison, and the VICE team for their emulator development.
00:37:14 It allows us to experience these computers without needing to repair ancient machines.
00:37:27 I also want to thank my beta testers who have provided invaluable feedback, making this talk significantly better.
00:37:43 As I finish: I'm Jan, co-founder of Depfu, where we automate dependency updates. I'm also a freelance web developer and digital artist.
00:38:06 If you’re interested in connecting, I'm looking for opportunities. You can find me on the Fediverse and other platforms.
00:38:30 Thank you all very much for listening!