LA RubyConf 2015

Data Migrations with MagLev

by JohnnyT

The video titled "Data Migrations with MagLev," presented by JohnnyT at LA RubyConf 2015, focuses on the features and capabilities of MagLev, a Ruby implementation built on the GemStone/S virtual machine. The presentation begins with an introduction to MagLev, which lets you persist objects without a complex object-relational mapping layer, simplifying data handling. The following key points are discussed:

  • Introduction to MagLev: MagLev is a Ruby implementation running on the GemStone/S VM, a platform with a long track record and a good fit for large-scale data operations.
  • Key Features: MagLev offers significant features, including:
    • Simplified persistence of native Ruby objects.
    • Management of caching and cache invalidation.
    • Transactional memory, ensuring that operations are safely committed or aborted.
  • Data Persistence: Through live demonstrations, the speaker shows how persistence works in practice, including persisting different Ruby objects such as procs. Persistence works by reachability: any object referenced by a persisted object is persisted too, which makes it convenient to share data across VMs.
  • Migrations: The presentation walks through practical migration examples, such as adding a published date to blog posts. It shows how to batch-update existing objects that lack the new attribute so that the data fits the updated model and the application keeps working.
  • Code Migrations: Restructuring the blog post class shows how to migrate persisted objects to a new class while keeping existing references to them intact.
  • Current Updates on MagLev: The speaker describes the state of the project: development stalled after the company behind it was acquired, but current work is aimed at production deployment, including improving compatibility with Ruby 1.9.3. The speaker also stresses the importance of community involvement in improving documentation and onboarding for new users.

The video concludes with a Q&A session covering ease of use, commit conflict resolution, and object IDs, and reiterates that Ruby developers can use MagLev without prior Smalltalk experience. Overall, the audience gains insight into using MagLev for data migrations, as well as into the project's ongoing development and community support.

00:00:24.039 Can you hear me? Is the mic good? Great. Okay, so data migrations with MagLev. I'm excited, super excited to be here talking about MagLev. I love Smalltalk.
00:00:30.240 MagLev is built on Smalltalk, which we'll get to in a second. Let's see... why is that there? There we go.
00:00:44.399 Okay, so quick outline: I'm going to introduce MagLev, kind of what it is, and then go through some data migration examples and concepts. Finally, I'll end with a quick status update and some FAQs about the project.
00:01:10.720 So, first, what is MagLev? It is an implementation of Ruby that runs on top of the GemStone/S VM. Now, for most people, that doesn't mean much, so what is GemStone/S? GemStone/S is a Smalltalk platform that has been around for quite some time.
00:01:30.390 One of its notable customers has been J.P. Morgan, which uses GemStone for derivatives trading. It is a platform that can scale, handle huge datasets, and operate on them very quickly.
00:01:42.159 GemStone/S is a virtual machine built from the ground up with object persistence in mind. Caching is also built in, and its transactions provide ACID properties.
00:01:57.440 The first version of GemStone came out in 1986, about ten years before Ruby or Java. Interestingly, Ruby and Java are the same age: both first appeared in the mid-1990s.
00:02:07.880 So GemStone/S has been around for a long time. Now, a quick blurb on why you'd want to use MagLev. We've seen Matz's quote a few times, 'Computers are our slaves,' which emphasizes writing code and software for people, not machines.
00:02:36.280 One of the things that MagLev allows you to do is to free yourself from thinking like a distributed system or like a computer. Notably, you don't need an object-relational mapping layer; you can simply persist your native objects.
00:02:50.879 MagLev also manages caching and cache invalidation, which are notoriously difficult problems. That lets you focus on your application and what you care about, rather than on boilerplate code and layers of abstraction.
00:03:15.319 To illustrate, take a typical Rails stack. On the server, or possibly on other servers, there's a data store and often a memory cache, typically something like Memcached.
00:03:40.200 Then you spin up your VM, which could be MRI or JRuby. As soon as the VM is loaded, it connects to the persistence layer and gets a connection back, and then you can start operating on that data.
00:04:16.280 In MagLev, we use different terms. A key concept is the 'stone,' which is akin to your data store: it's where all your data resides. Surrounding the stone is a shared page cache. When your VM logs in, it attaches to this shared cache, and your Ruby objects are persisted directly in the stone.
00:04:50.039 Another notable feature is that your code always runs inside a transaction, using transactional memory. To get a fresh view of the data, you either commit your changes successfully or abort the transaction. Aborting isn't a bad thing; it's how you refresh your view.
00:05:27.400 Now, I'm going to try some live demos to show how to persist a basic string. MagLev allows you to persist almost anything, though some things, like I/O handles, mutexes, and semaphores, don't really make sense to persist.
00:06:06.360 Let’s fire up an IRB shell. MagLev persistence works by reachability; anything that a persistent object references will also get persisted. MagLev has a persistent root, which is a Ruby hash that serves as a convenient place to store objects.
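Persistence by reachability can be pictured as a graph walk that starts at the persistent root. The sketch below is a plain-Ruby illustration, not MagLev's implementation: reachable_from is a hypothetical helper that follows hash keys and values, array elements, and instance variables (it does not look inside proc closures), and an ordinary Hash stands in for the persistent root. What it returns is roughly the set of objects a reachability-based store would persist on commit; the unreferenced stray object is left out. The Hat and Rabbit classes are hypothetical, echoing the hat-and-rabbit demo.

```ruby
# Returns every object reachable from `root` by following hash keys and
# values, array elements, and instance variables. Identity (equal?), not
# ==, decides whether an object was already visited, so shared and
# cyclic references are handled. Illustration only, not MagLev code.
def reachable_from(root)
  seen = {}.compare_by_identity
  stack = [root]
  until stack.empty?
    obj = stack.pop
    next if seen.key?(obj)
    seen[obj] = true
    children =
      case obj
      when Hash then obj.keys + obj.values
      when Array then obj
      else obj.instance_variables.map { |name| obj.instance_variable_get(name) }
      end
    stack.concat(children)
  end
  seen.keys
end

# Hypothetical classes echoing the hat-and-rabbit demo.
class Rabbit; end

class Hat
  def initialize(contents)
    @contents = contents
  end
end

rabbit = Rabbit.new
stray = Rabbit.new                 # never referenced from the root
root = { hat: Hat.new(rabbit) }    # stand-in for the persistent root

reachable = reachable_from(root)
puts reachable.include?(rabbit)    # prints "true"
puts reachable.include?(stray)     # prints "false"
```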
00:06:50.080 Now we’ll persist a string, 'Hello, world.' If we check the root from another VM right now, it’s empty, because we haven't committed our changes. Let’s commit and check again. If that VM still doesn't see the string, it's because it hasn't received a fresh view of the changes made by other processes; we can abort there and check the root again.
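The commit/abort visibility rules in this demo can be modeled with a small toy that runs on any Ruby, because it does not use MagLev at all. ToyStone and Session are hypothetical names; in MagLev itself the root is Maglev::PERSISTENT_ROOT and the operations are Maglev.commit_transaction and Maglev.abort_transaction. The toy only captures what the talk describes: uncommitted writes stay private, and another session sees committed data only once it commits or aborts to get a fresh view. It does no conflict detection.

```ruby
# Illustration-only toy model; ToyStone and Session are hypothetical names.
# The stone holds the committed state. Each session works on its own
# shallow copy (its "view"): commit publishes that session's pending
# writes, and both commit and abort replace the view with a fresh copy
# of whatever is committed at that moment.
class ToyStone
  attr_reader :committed

  def initialize
    @committed = {}
  end

  def publish(changes)
    @committed = @committed.merge(changes)
  end
end

class Session
  def initialize(stone)
    @stone = stone
    refresh
  end

  def [](key)
    @view[key]
  end

  def []=(key, value)
    @view[key] = value
    @pending[key] = value
  end

  def commit_transaction
    @stone.publish(@pending)
    refresh
  end

  # Aborting discards uncommitted writes and picks up other sessions' commits.
  def abort_transaction
    refresh
  end

  private

  def refresh
    @view = @stone.committed.dup
    @pending = {}
  end
end

stone = ToyStone.new
vm1 = Session.new(stone)
vm2 = Session.new(stone)

vm1[:greeting] = "Hello, world."
p vm2[:greeting]     # prints nil: vm1 has not committed
vm1.commit_transaction
p vm2[:greeting]     # prints nil: vm2 still has its old view
vm2.abort_transaction
p vm2[:greeting]     # prints "Hello, world."
```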
00:07:50.159 We can also create a proc and persist it. Once we commit, we can call the proc from another VM, which shows how conveniently MagLev lets you share data.
00:08:39.160 So how does MagLev know to persist the objects that a persisted object references? Through reachability. Let’s look at a short example where we create and persist a proc that refers to some messages.
00:09:03.160 Next, let’s look at the hat trick demo that Avi Bryant showed previously. Here, we create two classes: a hat and a rabbit. With MagLev, whenever you define or change code, you need to tell the VM that those changes should be persisted.
00:09:58.280 We'll run the code to create the hat and the rabbit and commit them to the stone. By inspecting the data, we can confirm the persistence of our new instances.
00:11:11.040 Now that we have a grasp on the MagLev persistence, let's shift our focus to migrations. With MagLev, you are able to commit plain Ruby objects to the stone, making them persistent.
00:11:32.600 We're accustomed to working with transient objects; however, with MagLev, objects are always accessible. Let's consider an example of a blog where each blog post consists of a title and body text.
00:12:34.360 We'll define a first version of the blog post class and use a simple command-line client to show the data. The client prints out our saved posts.
00:13:01.840 After creating some posts, we decide to add a published date. To do that, we modify the existing blog post class and add a date instance variable.
00:13:59.679 Next, we update our previous posts to include a published date and make sure the client displays them correctly.
00:15:00.480 When the client hits an error because some posts lack the date variable, we perform a migration to add that data to those instances.
00:15:45.240 This involves looping through all blog posts and setting a default date on every post whose date is nil. After the migration runs, every existing post has a date.
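A minimal sketch of that backfill migration, assuming a BlogPost class with title, text, and a new published_on attribute. The talk's exact code isn't shown, so these names, and the backfill_published_dates helper, are assumptions. It runs on plain Ruby; in MagLev, the posts would come from the persistent store and you would commit the transaction after the loop.

```ruby
require "date"

# Hypothetical post class after the date attribute was added. The
# constructor does not set @published_on, mimicking posts created before
# the attribute existed; reading it returns nil until it is assigned.
class BlogPost
  attr_accessor :title, :text, :published_on

  def initialize(title, text)
    @title = title
    @text = text
  end
end

# The migration: loop over all posts and set a default date wherever the
# date is nil. Returns how many posts were changed.
def backfill_published_dates(posts, default_date)
  missing = posts.select { |post| post.published_on.nil? }
  missing.each { |post| post.published_on = default_date }
  missing.size
end

old_post = BlogPost.new("First post", "Written before dates existed")
new_post = BlogPost.new("Second post", "Has a date")
new_post.published_on = Date.new(2015, 2, 1)

changed = backfill_published_dates([old_post, new_post], Date.new(2015, 1, 1))
puts changed                 # prints 1
puts old_post.published_on   # prints 2015-01-01
puts new_post.published_on   # prints 2015-02-01
```

Because the loop only touches posts whose date is nil, running it a second time changes nothing, which makes it safe to re-run.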
00:16:26.640 Next, we'll use a Smalltalk method called 'become.' In Smalltalk, become: swaps the identities of two objects, so every existing reference to one now points to the other. It's very powerful, but like Ruby's method_missing, it can be risky if you're not careful.
00:17:56.000 To illustrate, say we want to move the blog post class into a module called 'Blog' and rename its attributes: instead of title and text, we'll use subject and content.
00:18:40.960 We then migrate our persisted objects to the new class, making sure that every existing reference to those objects stays intact.
00:19:20.200 Afterward, we can confirm that the instances now belong to the new class, so everything keeps working from here on.
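MRI has no become, so this plain-Ruby sketch does by hand what become does automatically: it builds a Blog::Post for each old BlogPost (mapping title to subject and text to content) and rewrites the references it can find in a Hash standing in for the persistent root. migrate_post and migrate_root! are hypothetical helpers, and the class shapes are a reconstruction, not the talk's code. The identity-keyed mapping keeps a post that appears in two collections shared after migration; become, by swapping identities, would also fix references that this walk doesn't know about.

```ruby
# Old shape (hypothetical reconstruction of the talk's class).
class BlogPost
  attr_reader :title, :text

  def initialize(title, text)
    @title = title
    @text = text
  end
end

# New shape: moved into a Blog module, with renamed attributes.
module Blog
  class Post
    attr_reader :subject, :content

    def initialize(subject, content)
      @subject = subject
      @content = content
    end
  end
end

# Convert one old post: title -> subject, text -> content.
def migrate_post(old)
  Blog::Post.new(old.title, old.text)
end

# Replace every old post in each collection of `root` (a stand-in for the
# persistent root). Without become, each reference must be rewritten
# explicitly; the identity-keyed mapping ensures one old post always
# maps to the same new post, so shared references stay shared.
def migrate_root!(root)
  mapping = {}.compare_by_identity
  root.each_value do |collection|
    collection.map! do |obj|
      next obj unless obj.is_a?(BlogPost)
      mapping[obj] ||= migrate_post(obj)
    end
  end
  root
end

post = BlogPost.new("Hello", "First post")
root = { all_posts: [post], featured: [post] }   # two references to one post
migrate_root!(root)

migrated = root[:all_posts].first
puts migrated.class                           # prints Blog::Post
puts migrated.subject                         # prints Hello
puts migrated.equal?(root[:featured].first)   # prints true
```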
00:20:05.520 Before concluding, I'd like to share some updates about MagLev. Development largely stalled after GemStone Systems, the company behind it, was acquired by VMware.
00:20:54.800 We're not running MagLev in production yet, but at CredTera we're aiming to have it in production by the end of the year.
00:21:35.600 Currently, MagLev is nearly compatible with Ruby 1.9.3, and we've been fixing various parser issues. MagLev itself is open source, but GemStone/S is a commercial product.
00:22:14.200 MagLev comes with a community-friendly license that can support medium-scale applications, but that license limits the number of connected sessions and the amount of shared memory, so scalability can become a concern.
00:23:20.000 The community is working on improving documentation, fixing problems in the build process, and helping developers get a smoother onboarding experience.
00:24:06.000 Finally, I'd like to open the floor to questions. You don’t need prior Smalltalk knowledge to get started with MagLev; new users can get familiar with the environment quickly.
00:25:18.720 As for commit conflicts: MagLev raises an exception when a commit conflicts, which gives you the information you need to resolve it and more hands-on control over how objects get persisted.
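One common way to handle a commit that raises on conflict is to refresh and retry. The sketch below assumes optimistic, version-checked commits; ConflictError, VersionedStore, and commit_with_retry are hypothetical names, not MagLev's API, and the talk does not name MagLev's actual exception class.

```ruby
# Hypothetical names throughout; this is not MagLev's API.
class ConflictError < StandardError; end

class VersionedStore
  attr_reader :version, :value

  def initialize(value)
    @value = value
    @version = 0
  end

  # Optimistic commit: fails if someone else committed since `seen_version`.
  def commit(seen_version, new_value)
    raise ConflictError, "store changed since version #{seen_version}" if seen_version != @version
    @value = new_value
    @version += 1
  end
end

# Read, compute, and try to commit; on conflict, re-read the latest state
# (as an abort would) and try again, up to `attempts` times.
# Returns true on success, false if every attempt conflicted.
def commit_with_retry(store, attempts: 3)
  attempts.times do
    seen_version = store.version
    new_value = yield(store.value)
    begin
      store.commit(seen_version, new_value)
      return true
    rescue ConflictError
      next
    end
  end
  false
end

store = VersionedStore.new(10)
interfered = false
ok = commit_with_retry(store) do |value|
  unless interfered
    interfered = true
    store.commit(store.version, value + 100) # simulate another session committing first
  end
  value + 1
end
puts ok            # prints true
puts store.value   # prints 111
```

In the usage above, the first attempt conflicts because the simulated writer commits 110 first; the retry reads 110 and commits 111.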
00:25:55.160 When you load persisted objects, they keep their original object IDs, which lets them be managed efficiently in memory.
00:26:30.560 Thank you for your time. I'm glad to be here, and I'm happy to answer any remaining questions!