Frederico Linhares

High-performance real-time 3D graphics with Vulkan

RubyKaigi 2023

00:00:02.639 Hello, my name is Frederico Linhares, and I will be speaking about how to use Ruby to create 3D graphics in real-time using Vulkan.
00:00:06.180 Here, I have an example of a functioning game running in Ruby. This software you see here is built on Ruby, and in the backend, it uses Vulkan to handle graphics. As you can see, there is a texture on the top with two polygons, some text rendering, and also some 3D cubes featuring dynamic lighting. Additionally, it captures input from my keyboard for interaction.
00:00:17.940 So, how does it work? As Yukihiro Matsumoto said, Ruby is simple in appearance but very complex inside, just like the human body. When I was developing this engine, I tried to follow the same pattern: an interface that is simple on the outside, yet complex on the inside. This approach hides all the complexity of how the engine operates. Here’s an example of code using the Candy Gear game engine. It’s so simple that if you are familiar with Ruby, you could probably figure out what it’s doing just by looking at it. The first line creates a texture from an image; textures provide color to 3D graphics. The second line creates a mesh, which is the shape of a 3D object. Then, we establish an association between the mesh and the texture. If you've ever played a game with several guns or characters that have different colors but are otherwise the same object, you know how this association works. The next line creates an instance, which is how we represent multiple copies of the same object in different positions, like grass, where you have lots of grass scattered in various places.
00:01:03.780 Views are regions of the screen that we use to handle different aspects of the game. For example, if you are playing a game with multiple players, each player has their own view, and there is a camera position for each view. This determines where in the game world the action is occurring. There is also the tick method, which handles one object at a time. I will explain more about the tick method in another slide. So, why am I building Candy Gear, and why is it beneficial for developers? We currently have many game engines available in the industry, such as Unreal and Unity, but I find them bloated. They incorporate many features that may make development easier for newcomers but can constrain experienced developers, sometimes slowing them down instead of speeding them up.
00:01:45.060 Many developers prefer a thinner interface. This is why, in Ruby, we have frameworks like Rails that are more extensive, but also thinner alternatives like Sinatra. Ruby is a multi-purpose programming language, while many of these game engines use engine-specific languages, like Blueprint visual scripting in Unreal or GDScript in Godot. If you learn those languages, you can only apply that knowledge within those engines. However, if you learn Ruby, you can apply your skills in a variety of contexts. Therefore, I believe it is preferable to work with a language that has various applications rather than spending time learning something that you can only use in one place. Also, Unreal is a well-known engine, but its Blueprint visual scripting comes with a caveat: the scripts are stored as binary files, which makes them hard to manage with version control systems like Git.
00:02:02.940 By using a text-based language like Ruby, you get better control over versioning. Furthermore, Candy Gear is good for our community, as Ruby is primarily associated with web development. Whenever I see someone working with Ruby, it's usually in the context of web-related projects. I believe we should expand our language's applicability to other areas, including gaming. When we test the language, we mostly focus on web applications, since that is what we primarily do. Having Ruby used in diverse fields allows us to test its effectiveness in more situations. Additionally, if there are successful engines built with Ruby, we can attract game developers to our community and increase the popularity of our language.
00:02:30.060 When I started developing this engine, I initially tried to build it on YARV, as it's the most popular Ruby implementation. However, I encountered a significant problem: the Ruby code had to call into my C++ engine, which led to numerous issues with fine-tuning Ruby's garbage collector. To resolve this, I moved the engine to mruby, which suits this design better because it is embedded in the engine: the engine starts first, then provides an API to the Ruby code, and only invokes Ruby when necessary. This way, I only need to share resources between the Ruby code and the engine, while the engine performs all the heavy lifting without Ruby needing to be aware of the underlying processes.
00:03:01.380 The engine follows the Hollywood principle, which posits that instead of the game calling the API of the game engine, it is the engine that calls the game. Here is an example illustrating how this API works: first, there is a config method that the game must provide to the engine. This call occurs before the engine is fully initialized and allows the game to specify configurations like the desired frames per second, the game's resolution, and its name. After calling the config method, the engine completes the initialization. Then, there is an init method, where we can create items such as textures and models that cannot be established in the config phase since the engine isn't fully initialized at that point.
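As a rough illustration of the inversion of control described here, the following is a minimal sketch in plain Ruby. `ToyEngine`, `MyGame`, and the settings keys are hypothetical names invented for this example; they are not the engine's actual API. The point is only the call order: the engine owns the loop and calls a config hook, then an init hook, then a per-frame hook on the game.

```ruby
# Hypothetical stand-in for the engine. In the real engine this role is
# played by native code; the names here are assumptions for illustration.
class ToyEngine
  attr_reader :log

  def initialize(game)
    @game = game
    @log = []
  end

  def run(frames:)
    settings = {}
    @game.config(settings) # called before the engine is fully initialized
    @log << "initialized #{settings[:name]} at " \
            "#{settings[:width]}x#{settings[:height]}, #{settings[:fps]} fps"
    @game.init             # engine is ready, so resources can be created now
    frames.times { |frame| @game.tick(frame) } # engine drives every frame
  end
end

# The game never calls the engine's loop; it only provides hooks.
class MyGame
  attr_reader :events

  def initialize
    @events = []
  end

  def config(settings)
    settings[:name] = "Demo"
    settings[:width] = 1280
    settings[:height] = 720
    settings[:fps] = 60
    @events << :config
  end

  def init
    @events << :init
  end

  def tick(frame)
    @events << [:tick, frame]
  end
end

game = MyGame.new
engine = ToyEngine.new(game)
engine.run(frames: 2)
```

After the run, `game.events` records the order in which the engine invoked the hooks, which makes the "don't call us, we'll call you" flow easy to inspect.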
00:03:51.100 Lastly, there is the tick method, which is called every single frame—this is where the main game logic operates. There are three callback methods: key down, key up, and quit, which are triggered when the game receives input from the player—such as a key press or release. On another note, thread management is a crucial aspect of game development. Starting and stopping threads can be resource-intensive, so one technique we employ in video games is to initiate all threads at the beginning and only shut them down at the end. This means we only handle jobs through these threads, keeping the number of threads equal to or less than the number of processors available; otherwise, we risk slowing the game down.
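The thread technique described above is commonly called a worker pool. The sketch below shows the idea in plain Ruby, assuming a hypothetical `WorkerPool` class; the engine itself does this in native code, so this is an illustration of the pattern rather than its implementation. Threads are created once, jobs are pushed onto a queue, and the threads are only stopped at shutdown.

```ruby
require "etc"

# Hypothetical worker pool: all threads start up front and stay alive
# until shutdown, so no thread is created or destroyed per job.
class WorkerPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Each worker blocks on the queue and runs jobs until it sees :stop.
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  def submit(&job)
    @jobs << job
  end

  def shutdown
    @workers.size.times { @jobs << :stop } # one stop marker per worker
    @workers.each(&:join)
  end
end

# Keep the thread count at or below the number of processors.
pool = WorkerPool.new([Etc.nprocessors, 4].min)
results = Queue.new
10.times { |i| pool.submit { results << i * i } }
pool.shutdown

squares = []
squares << results.pop until results.empty?
```

Because jobs finish in whatever order the workers pick them up, `squares` is not guaranteed to be sorted; sort it if order matters.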
00:04:20.960 Internally, the engine needs to initialize Vulkan first. This starts with a Vulkan instance, which exposes one or more physical devices. These physical devices correspond to the graphics cards available on your computer. If your system has multiple graphics cards, the interface provides multiple physical devices, and you select which one to use by creating a logical device. Each logical device has its own set of queues, which are how the graphics card receives jobs. Before we can send a job to the graphics card, we must create command buffers. Command buffers are lists of instructions for the graphics card, describing what it needs to do. To improve performance, the engine caches every draw command issued by the game and, at the end of the tick method, generates several command buffers to send to the graphics card, so they can be processed in parallel across multiple queues.
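To make the caching step more concrete, here is a minimal sketch in plain Ruby. It does not call Vulkan at all: `DrawRecorder`, `draw`, and `flush` are hypothetical names, and the "command buffers" are plain arrays. It only shows the shape of the idea, which is to record draws during the tick and split them into one batch per queue at the end.

```ruby
# Hypothetical recorder: draw calls are cached during the tick and only
# turned into per-queue batches when the tick ends.
class DrawRecorder
  def initialize(queue_count)
    @queue_count = queue_count
    @pending = []
  end

  # Called by game code; nothing is sent anywhere yet.
  def draw(model, x:, y:, z:)
    @pending << { model: model, position: [x, y, z] }
  end

  # Called once at the end of the tick. Returns at most one batch per
  # queue, each standing in for a command buffer.
  def flush
    return [] if @pending.empty?

    per_buffer = (@pending.size.to_f / @queue_count).ceil
    buffers = @pending.each_slice(per_buffer).to_a
    @pending = []
    buffers
  end
end

recorder = DrawRecorder.new(2)
5.times { |i| recorder.draw(:cube, x: i, y: 0, z: 0) }
buffers = recorder.flush
```

With five draws and two queues, the recorder produces two batches of three and two commands. A real engine would also have to respect ordering and dependencies between draws, which this sketch ignores.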
00:05:15.360 When we handle 3D graphics, it is essential to employ graphics pipelines. Graphics pipelines serve to optimize performance, enabling more efficient handling of rendering tasks. Each type of image we want to manage in the game utilizes a different graphics pipeline. For instance, some graphics pipelines are dedicated to rendering 3D models, while others handle 2D models, such as sprites. Various pipelines are also designated for more complex objects, like vast bodies of water, including lakes and rivers. Let me explain how the 3D graphics pipeline operates. When dealing with a textured model, each vertex represents a point in 3D space and includes UV coordinates for texturing. These coordinates allow images to map correctly onto 3D objects.
00:06:15.360 If textures aren't required, we can simply draw wireframes, which do not need UV coordinates, just spatial coordinates. Each polygon in 3D space is typically a triangle, requiring three vertices, and a mesh is composed of a group of polygons. To save memory, several polygons can share the same vertices, with each polygon referring to its vertices by index. Furthermore, 3D graphics cards can also handle 2D objects. These objects are simple sprites, which are rectangular regions taken from images. For example, you might have three distinct images at the top, with the corresponding three sprites used to represent those images in the game.
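The vertex sharing described above can be sketched with a single textured quad. The `Vertex` struct, the constants, and the `triangles` helper are hypothetical names for this example. The quad is drawn as two triangles, but because the triangles share two corners, only four vertices are stored and six indices refer to them.

```ruby
# Each vertex carries a position in 3D space and a UV texture coordinate.
Vertex = Struct.new(:position, :uv)

# Four unique corners of a unit quad.
QUAD_VERTICES = [
  Vertex.new([0.0, 0.0, 0.0], [0.0, 0.0]),
  Vertex.new([1.0, 0.0, 0.0], [1.0, 0.0]),
  Vertex.new([1.0, 1.0, 0.0], [1.0, 1.0]),
  Vertex.new([0.0, 1.0, 0.0], [0.0, 1.0])
].freeze

# Two triangles; vertices 0 and 2 are shared between them.
QUAD_INDICES = [0, 1, 2, 2, 3, 0].freeze

# Expand the index list into triangles of three vertices each.
def triangles(vertices, indices)
  indices.each_slice(3).map { |tri| tri.map { |i| vertices[i] } }
end

quad_triangles = triangles(QUAD_VERTICES, QUAD_INDICES)
```

Without indices the same quad would need six full vertices; the saving grows with larger meshes, where most vertices are shared by several triangles.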
00:06:55.500 Additionally, there are cube maps, which are used for backgrounds in games, usually for rendering the sky. Since we can look at the sky from any direction, it must be covered in every direction. To accomplish this, six images, one for each face of a cube, are sampled as if they were one cohesive image. This ensures that when you look toward a corner, it doesn't seem disjointed; rather, it renders smoothly, creating an immersive experience. The cube maps employ what is known as the 'smooth effect.' Now, let's look at the structure of a graphics pipeline. As mentioned earlier, the inputs are indirect buffers, index buffers, and vertex buffers, which feed data into the graphics pipeline. The descriptors attached to the graphics pipeline provide optional data that can enhance the rendering process. Within the pipeline, there are various stages; some are mandatory, while others are customizable, particularly regarding color outputs.
00:07:56.940 Before any data is attached to the graphics pipelines, it must first be loaded. Assets are often stored compressed to save space, so loading usually involves decompression. The game engine converts the data to a format compatible with the graphics card and transfers it there. Importantly, the CPU and GPU usually have separate memory spaces, but on certain platforms, like the PlayStation 5, they share the same memory, which streamlines operations. Ultimately, once the data has been loaded and processed, it is returned as a Ruby object that the game can use. Every graphics pipeline has a layout that specifies which kinds of data will be associated with it. This layout is divided into descriptor sets that allow data to be attached as needed. For instance, the 'world' descriptor contains ambient light data, which is light that illuminates from all directions, as well as directional light, which comes from a single direction.
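One common way to combine ambient and directional light is the Lambertian diffuse model, sketched below in plain Ruby. This is an assumption for illustration, not necessarily what the engine's shaders compute, and `face_brightness` and its parameters are hypothetical names. The ambient term is added everywhere, while the directional term depends on the angle between the surface normal and the light.

```ruby
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def normalize(v)
  length = Math.sqrt(dot(v, v))
  v.map { |c| c / length }
end

# light_direction points from the light toward the scene, so it is
# negated to get the direction from the surface toward the light.
def face_brightness(normal, ambient:, light_direction:, light_intensity:)
  n = normalize(normal)
  to_light = normalize(light_direction.map { |c| -c })
  # Faces turned away from the light get no directional contribution.
  diffuse = [dot(n, to_light), 0.0].max * light_intensity
  [ambient + diffuse, 1.0].min
end

light = { ambient: 0.2, light_direction: [0.0, -1.0, 0.0], light_intensity: 0.7 }
top_face    = face_brightness([0.0, 1.0, 0.0], **light)  # faces the light
side_face   = face_brightness([1.0, 0.0, 0.0], **light)  # perpendicular to it
bottom_face = face_brightness([0.0, -1.0, 0.0], **light) # faces away
```

The top face receives both terms, while the side and bottom faces fall back to the ambient level. As a face rotates toward or away from the light, its brightness moves between those two values.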
00:09:26.640 The lighting effects, such as the brightness of a cube face during rotation, are influenced by this data. We also have the view descriptors that define the viewpoints used for rendering, including both 2D and 3D images. The camera configuration includes its position, rotation, and projection—a projection dictates how the scene is viewed, whether in a standard perspective or an isometric view where depth is minimized.
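The difference between the two kinds of projection can be shown with a deliberately simplified sketch. The function names and the `focal_length` and `scale` parameters are hypothetical, the camera is assumed to sit at the origin looking along the positive z axis, and an isometric view is treated here as an orthographic (parallel) projection. Real projections are expressed as 4x4 matrices; this only shows how depth affects the result.

```ruby
# Perspective: screen position is divided by depth, so distant points
# move toward the center and objects look smaller.
def perspective(point, focal_length: 1.0)
  x, y, z = point
  [x * focal_length / z, y * focal_length / z]
end

# Orthographic: depth is ignored, so size does not change with distance.
def orthographic(point, scale: 1.0)
  x, y, _z = point
  [x * scale, y * scale]
end

near = [2.0, 1.0, 2.0]
far  = [2.0, 1.0, 4.0] # same x and y, twice as far from the camera

perspective_near  = perspective(near)
perspective_far   = perspective(far)
orthographic_near = orthographic(near)
orthographic_far  = orthographic(far)
```

Doubling the distance halves the projected coordinates under perspective, while the orthographic result stays the same.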
00:10:09.660 Model instances let the same model appear as multiple distinct objects in the same world. For each object you intend to render, an instance must be created and passed to the command buffers. Here's an illustration of the graphics pipeline and how each stage functions: the first stage is the draw command, which specifies the work to be executed. The next stage is the input assembly, which collects all relevant data, such as vertices and indices, and assembles them into the triangles that make up the desired objects.
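The instancing idea mentioned at the start of this passage can be sketched as follows. `Instance`, `TRIANGLE_MESH`, and `world_vertices` are hypothetical names, and only translation is handled (no rotation or scale). The mesh data exists once; each instance only stores where its copy should be placed.

```ruby
# An instance refers to a shared mesh and adds its own position.
Instance = Struct.new(:mesh, :position)

# One small mesh, stored a single time.
TRIANGLE_MESH = [
  [0.0, 0.0, 0.0],
  [1.0, 0.0, 0.0],
  [0.0, 1.0, 0.0]
].freeze

# Move every vertex of the shared mesh by the instance's position.
def world_vertices(instance)
  instance.mesh.map do |vertex|
    vertex.zip(instance.position).map { |coord, offset| coord + offset }
  end
end

instances = [
  Instance.new(TRIANGLE_MESH, [0.0, 0.0, 0.0]),
  Instance.new(TRIANGLE_MESH, [5.0, 0.0, -2.0])
]
placed = instances.map { |instance| world_vertices(instance) }
```

On a GPU this per-vertex offset would typically be applied in the vertex shader rather than on the CPU, but the data layout is the same: one mesh, many lightweight instances.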
00:11:34.680 In the initial stage, we receive the object, and in the second stage, the graphics pipeline transforms it into the appropriate position as defined by the camera perspective. This transformation occurs in the vertex shader. The tessellation stage—although I am consolidating a more complex series of operations into one for simplification—involves subdividing polygons into smaller ones to apply transformations, resulting in smoother renditions of objects. This technique is advantageous when rendering details on closer views while optimizing performance by keeping distant objects simpler.
00:13:19.560 The geometry shader provides an additional layer of versatility, allowing for more advanced handling of objects, though it tends to be more resource-intensive. After this stage, all transformations are complete, leading into vertex post-processing, which finalizes the assembly of objects once every transformation has been applied. This stage removes any portions of objects that fall outside the edges of the screen, saving what would otherwise be wasted processing time.
00:14:54.960 Thus, up until now, objects are strictly geometric coordinates in 3D space. In this stage, the graphics pipeline will convert them into actual images by applying rasterization, which is the process of translating those coordinates into pixel formats. The fragment shader subsequently colors the generated images and establishes depth values for each object, which enables the system to determine which object is visually in front of the others during rendering. Finally, colors are blended at the last stage, allowing the rendering engine to depict objects correctly based on their spatial orientation.
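The role of the depth values can be illustrated with a tiny depth buffer in plain Ruby. `DepthBuffer` and `write` are hypothetical names, and the "closer wins" rule used here (smaller depth is nearer) is the conventional default rather than something stated in the talk. Each pixel remembers the depth of the nearest fragment seen so far and rejects anything farther away.

```ruby
# Hypothetical depth buffer: one depth and one color per pixel.
class DepthBuffer
  attr_reader :color

  def initialize(width, height)
    @depth = Array.new(height) { Array.new(width, Float::INFINITY) }
    @color = Array.new(height) { Array.new(width, :background) }
  end

  # Keeps the fragment only if it is closer than what is already stored.
  # Returns true when the pixel was updated.
  def write(x, y, depth, color)
    return false unless depth < @depth[y][x]

    @depth[y][x] = depth
    @color[y][x] = color
    true
  end
end

buffer = DepthBuffer.new(2, 1)
first  = buffer.write(0, 0, 5.0, :red)   # nothing there yet, accepted
second = buffer.write(0, 0, 2.0, :blue)  # closer than red, replaces it
third  = buffer.write(0, 0, 9.0, :green) # behind blue, rejected
```

The draw order does not matter for opaque objects: the pixel ends up blue because blue is nearest. Blending of transparent colors is a separate step that this sketch leaves out.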
00:16:32.640 There is still a considerable amount of work left to be done on the game engine; I am far from completion. For example, I still need to finalize the threading system, ensuring that tasks can be delivered in parallel to the Vulkan back end efficiently. Moreover, I aim to implement multi-GPU support, so the graphics processing can leverage multiple graphics cards effectively, enhancing rendering performance.
00:18:32.440 I also need to create a standardized file format for the engine so that it can reliably load and process resources. Currently, this aspect remains unstable. Additionally, I plan to build an audio engine from scratch, although for the moment, I'm using the SDL library to handle all audio functionalities. Going further, I plan to incorporate MIDI synthesizers to streamline sound creation for games. As of now, the game is running solely on Linux using Vulkan; however, I intend to adapt it for Windows as well and potentially expand support for Nintendo and PlayStation systems.
00:19:47.020 It's worth mentioning that while SDL has made certain elements simpler previously, I now find the need to remove it to optimize overall engine performance. If you're interested in learning more about this engine, or understanding how it operates, the overview I've presented today is just the surface. I have included references for further reading—the Vulkan tutorial is perfect for beginners with no background in 3D graphics while the Vulkan programming guide is suited for those with some foundational understanding.
00:21:05.160 There is also the Coffee game engine, a series of videos that provide a comprehensive explanation of everything I discussed here, but at a slower pace. If you are new to game development or 3D graphics, you can follow along there, and I highly recommend checking out my engine provided in the links below if you're interested in reviewing the source code or considering contributing, as there’s still much to accomplish. Thank you.
00:22:15.940 Does anyone have any questions?
00:22:20.720 (Applause) Thank you.