00:00:28.960
Hello everyone, I hope you're all back. We're going to start with the next speaker right now. He is Mark Chao from RubyConf Taiwan. He will share about translating XML and EPUB using ChatGPT. Welcome!
00:01:12.000
Hi everyone! So, this talk is not AI-centric. It's just a usage of GPT, the technology we use in Taiwan. Sorry, my name is Mark, and I go by the name 'lulalala' on the internet, specifically on GitHub and Twitter.
00:01:36.079
I'm a backend engineer at GitHub, and I'm looking for someone to review my pull request that has been sitting there for five weeks. If you're a contributor, please let me know.
00:02:01.439
I belong to a hobby group called Saku, which is an otaku group. We create things related to anime and earlier this year, we decided to do some research on Masaaki Yuasa, an anime director. I recommend his film called 'Mind Game'; it's really trippy.
00:02:15.280
While doing this research, I discovered a magazine called Eureka that published a special issue on Yuasa, which contains a lot of valuable interviews. Of course, my group wanted to read it, but we couldn't read Japanese, so we needed to rely on machine translation.
00:02:43.120
The ebook is a collection of XHTML pages, and since XHTML is a form of XML, we needed a translation service that can handle XML. I found two services: one is DeepL, which everyone is familiar with, but it costs money and doesn't accept this credit card, which is a shame.
00:03:09.319
The other one I found is a free Japanese translation service called 'Auto Translation.' If you want to translate Japanese content, I suggest trying it out. I tried writing a draft using both services, one for DeepL and the other for Auto Translation.
00:03:27.319
As you can see from the code, there are lots of simple functions, one set for DeepL and one for the other service. There are no tests; I just copied and pasted from one function to the other without any object-oriented design. But I was able to translate the content to Chinese, and we were happy.
00:04:07.239
However, I thought I could turn this spaghetti code into a modular design. My idea was to create one gem that focuses on translating XML and another to handle translation of dynamic content.
00:04:34.000
Now, I will be talking about my XML translation gem. The name came from my visit to Arashiyama, Kyoto in 2017, where I had a dessert called 'Natsukantou.' It's a grapefruit that has the center removed, turned into jelly, and then reinserted. This process is quite similar to XML translation: you pull out text, translate it, and then reinsert it into the XML structure.
00:05:40.680
That's why I chose this name for my gem, even though there was a slight misunderstanding during a presentation when my Japanese friend pointed out they had no idea what 'Natsukantou' meant without context. So, the moral of the story is, if you want to use a foreign language to name something, it's crucial to consult an expert in that language.
00:06:45.320
Let’s start with the first topic I want to discuss, which is the API. My goal is to maximize the customization of the gems to enhance translation.
00:07:00.400
I want to allow users to choose different translation engines and different parsers. Your input enters at the top and is passed down, layer by layer, through the filter layers. The filters can modify the text if they wish. Once the bottom layer is reached, the text is translated by the language model.
00:07:38.120
The translated text is returned layer by layer and the filters can also modify the resulting translated output. In code, this will look like the middleware pattern, which you may be familiar with if you've written Rails applications.
00:08:01.440
In this case, I've created a layer called 'Filter' and once that's set up, I can submit the source text and the desired language for translation. It then returns the translated XML to me.
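To make the layering concrete, here is a minimal sketch of the middleware pattern in plain Ruby. All class and method names (`FakeTranslator`, `WhitespaceFilter`, `FooterFilter`, `build_stack`) are hypothetical and are not the gem's actual API; the "translator" only tags the text so the flow is visible.

```ruby
# Hypothetical names throughout; this is not the gem's real API.

# Bottom layer: stands in for a real translation engine.
class FakeTranslator
  def call(env)
    # Pretend to translate by tagging the text with the target language.
    "[#{env[:lang]}] #{env[:text]}"
  end
end

# A filter that modifies the text on the way down.
class WhitespaceFilter
  def initialize(app)
    @app = app
  end

  def call(env)
    cleaned = env.merge(text: env[:text].strip.squeeze(" "))
    @app.call(cleaned)
  end
end

# A filter that modifies the translated result on the way back up.
class FooterFilter
  def initialize(app, note:)
    @app = app
    @note = note
  end

  def call(env)
    translated = @app.call(env)
    "#{translated} #{@note}"
  end
end

# Build the stack from the bottom up: the last middleware listed ends up
# closest to the translator.
def build_stack(translator, middlewares)
  middlewares.reverse.reduce(translator) do |app, (klass, options)|
    klass.new(app, **(options || {}))
  end
end

stack = build_stack(
  FakeTranslator.new,
  [[FooterFilter, { note: "(machine translated)" }], [WhitespaceFilter, nil]]
)

puts stack.call({ text: "  <p>Hello   world</p> ", lang: "fr" })
```

Each layer only knows about the next one through `@app`, which is what lets engines and filters be swapped independently.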
00:08:30.000
As an example, I want to show a middleware for ruby annotations. Ruby annotation is not related to Ruby the programming language; it is markup that lets you place small text on top of other text.
00:09:02.000
For example, I can place text above a Japanese word to show its pronunciation. However, this can be problematic for translation engines, as they may treat each annotated character as something to translate separately rather than as part of a phrase.
00:09:59.520
To improve the translation, I want the middleware to join the annotated pieces and present them as one single unit. The middleware starts with an `initialize` method that takes the app (the next layer) and any parameters for the middleware's configuration.
00:10:40.560
Then, I need to access every single `<ruby>` tag. For each `<ruby>` tag, I first remove all the unnecessary tags, which we do not want here. The result after this process is well-structured XML that we can pass to the translation service.
00:11:11.920
Once we pass this XML to DeepL, we receive a better translation, demonstrating the effectiveness of middleware in translation.
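The ruby-annotation step described above could be sketched roughly as follows. This is a simplified stand-in with a hypothetical class name; a real implementation would walk the document with an XML parser, and a regular expression is used here only to keep the example dependency-free.

```ruby
# Simplified, hypothetical stand-in for a ruby-annotation middleware.
# Assumption: annotations use plain <ruby>, <rt> and <rp> elements.
class RubyAnnotationFilter
  # Matches pronunciation (<rt>) and fallback parenthesis (<rp>) elements.
  RT_OR_RP = %r{<(rt|rp)\b[^>]*>.*?</\1>}m
  # Matches the opening and closing <ruby> wrapper itself.
  RUBY_WRAPPER = %r{</?ruby\b[^>]*>}

  def initialize(app)
    @app = app
  end

  def call(env)
    # Drop the annotations, then unwrap <ruby> so the base text reads as one phrase.
    flattened = env[:text].gsub(RT_OR_RP, "").gsub(RUBY_WRAPPER, "")
    @app.call(env.merge(text: flattened))
  end
end

# Echo layer so the effect of the filter is visible without a real engine.
echo = ->(env) { env[:text] }

source = "<p><ruby>湯<rt>ゆ</rt>浅<rt>あさ</rt></ruby>監督</p>"
puts RubyAnnotationFilter.new(echo).call({ text: source, lang: "zh" })
```

After the filter runs, the engine sees the base characters as one continuous phrase instead of fragments interleaved with readings.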
00:11:31.560
Now, let's talk about the user interface design for my gem. When we think about command line parameters, we usually think about flags and arguments. They work fine for small tasks, but once your program grows, like ImageMagick, it can be difficult to remember which arguments do what. My gem is similar; there are too many dynamic states, and it becomes hard to drive with simple arguments alone.
00:12:51.159
Therefore, I think a wizard interface is more appropriate. It would essentially help generate my middleware and then utilize it through a Ruby file for translation.
00:13:01.120
My program is divided into two phases. The first phase asks the user to choose which middleware and translation engines they want. For each middleware, we go through its `initialize` arguments and ask the user to provide them.
00:13:30.120
Once I have collected these settings, I can use ERB to generate the necessary Ruby file. In the second phase, I perform the actual translation and ask for the target language.
00:13:59.240
I'll then call the middleware with the XML or HTML file. If you are familiar with Ruby, you know how this complicates things. I enter the access token and a password to select the translation service.
00:14:18.959
You can select two options, or just one if you prefer. Then, the program will ask you which language to translate into, and, finally, what file you want to save the translations to.
00:14:38.899
This is a brief outline of my program, showcasing the user interface design.
00:15:06.959
Implementing this user interface starts with knowing which middlewares are available. I could scan a directory to find all the middlewares, but I believe developers would prefer to state explicitly which middlewares are accessible.
00:15:23.720
To achieve this, I used an auto-loading mechanism that allows developers to register their middleware by using a method that marks middleware for usage.
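One way explicit registration might look is sketched below. The module and method names (`MiddlewareRegistry`, `Registerable`, `register_middleware`) are assumptions for illustration, and the sketch covers only the opt-in registration part, not auto-loading.

```ruby
# Hypothetical registry; module and method names are illustrative only.
module MiddlewareRegistry
  @registered = []

  class << self
    attr_reader :registered

    # Record a class once, in the order it was registered.
    def register(klass)
      @registered << klass unless @registered.include?(klass)
    end
  end

  # Middleware classes extend this module and call `register_middleware`
  # in their class body to opt in.
  module Registerable
    def register_middleware
      MiddlewareRegistry.register(self)
    end
  end
end

class GlossaryFilter
  extend MiddlewareRegistry::Registerable
  register_middleware
end

class DraftFilter
  # Not registered, so a wizard reading the registry would never offer it.
end

puts MiddlewareRegistry.registered.map(&:name).inspect
```

Compared with scanning a directory, this keeps the list of selectable middlewares under the developer's control.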
00:15:49.760
Once we know which middleware we have, I need to ask for the necessary arguments that users must input. For example, they might need to enter an API token for DeepL.
00:16:05.840
Each middleware should have an `initialize` method that specifies what arguments it needs. I initially thought I would have to use RBS or similar for better type checking, but I discovered I could just use documentation to provide the necessary instructions to users.
00:16:36.320
The documentation is crucial; if you're using a wizard to assist users, you need to clearly explain what they need to provide.
00:17:00.600
First, they would provide a file path, and from there I can look into the YARD registry for information on the DeepL gem.
00:17:40.680
Later, I would look through its methods for an 'initialize' method. Once I have the 'initialize' method, I will call text, which grabs the comments and necessary texts to produce a clear instruction for the users.
00:18:02.680
This method ignores parameters that users do not need to enter and leaves us with the essential arguments to capture.
00:18:34.160
My wizard needs to be user-friendly, so I decided to use the tty-prompt gem for a pleasing, interactive user interface. It lets users choose from predefined options or type free-form input.
00:19:15.279
For example, I ask the user to choose a translation engine from a list of available candidates.
00:19:59.400
Once I've confirmed a selection, I will call a method that shows all available parameters and their documentation.
00:20:01.919
Next, I do the same for middleware layers, allowing multiple selections. For each middleware, I also ask for its initialization parameters so that everything is clear.
00:20:44.560
Before asking users about initialization parameters, I specify the defaults for each to give them clarity.
00:21:05.560
For each parameter, I check if it is optional or required, and if it has a default value. These insights guide users on what information they need to input.
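The required-versus-optional check can be done with Ruby's built-in reflection. The sketch below is an assumption about how such a check might look, not the gem's code; `DemoEngine` and `wizard_questions` are hypothetical names. Note that `parameters` reveals whether a default exists but not its value, so the value itself would have to come from somewhere else, such as the documentation.

```ruby
# Hypothetical engine class, used only to have a signature to inspect.
class DemoEngine
  def initialize(app, auth_key:, formality: "default", **options)
    @app = app
    @auth_key = auth_key
    @formality = formality
    @options = options
  end
end

# Turn an initialize signature into a list of questions for the wizard.
# `skip` lists parameters the user never enters, such as the next layer.
def wizard_questions(klass, skip: [:app])
  klass.instance_method(:initialize).parameters.filter_map do |kind, name|
    next if skip.include?(name)

    case kind
    when :req, :keyreq then { name: name, required: true }
    when :opt, :key    then { name: name, required: false }
    end
    # Splat parameters (:rest, :keyrest, :block) fall through as nil and are dropped.
  end
end

wizard_questions(DemoEngine).each do |question|
  label = question[:required] ? "required" : "optional"
  puts "#{question[:name]} (#{label})"
end
```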
00:21:32.360
Once I gather all the necessary inputs, I compile everything into a parameters configuration file. This includes the middleware details along with the users' inputs for each of the arguments.
00:22:48.560
The output from this ERB file establishes the middleware configuration file so that users do not have to re-enter the same information each time they wish to translate something.
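Generating that configuration file with ERB might look roughly like this. The template layout, the `MIDDLEWARES` constant, and the `render_config` helper are assumptions made for illustration; the gem's real generated file may look quite different.

```ruby
require "erb"

# Hypothetical template; the layout of the generated file is an assumption.
CONFIG_TEMPLATE = <<~'TEMPLATE'
  # Generated by the wizard. Re-run the wizard to change it.
  MIDDLEWARES = [
  <% middlewares.each do |m| -%>
    [<%= m[:class_name] %>, <%= m[:options].inspect %>],
  <% end -%>
  ]
TEMPLATE

# Render the chosen middlewares and the user's answers into Ruby source.
def render_config(middlewares)
  ERB.new(CONFIG_TEMPLATE, trim_mode: "-").result_with_hash(middlewares: middlewares)
end

puts render_config(
  [
    { class_name: "RubyAnnotationFilter", options: {} },
    { class_name: "DemoEngine", options: { auth_key: "YOUR-KEY" } }
  ]
)
```

Because the output is ordinary Ruby, users can also edit the file by hand instead of re-running the wizard.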
00:23:20.048
Instead, they can simply specify the middleware file moving forward and initiate translations without hassle.
00:24:10.640
Now, regarding the integration of ChatGPT in translating XML, I thought of using the structured output but ended up opting for a more straightforward approach.
00:24:56.679
My main challenge was to instruct the model to handle XML structures accurately. I had to create very precise prompts to ensure it only returns XML.
00:25:12.919
I found that if I instructed it to translate XML from English to French without additional constraints, it sometimes responded inaccurately. I had to create a robust prompt indicating that the return must only be the translated XML.
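A prompt of that kind could be assembled as below. The wording is illustrative rather than the prompt used in the talk, the helper names are hypothetical, and the `role`/`content` message shape follows the common chat-completion convention; no API call is made here.

```ruby
# Hypothetical helper; the prompt wording is illustrative only.
def build_messages(xml, from:, to:)
  [
    { role: "system",
      content: "You translate XML from #{from} to #{to}. " \
               "Keep every tag and attribute unchanged. " \
               "Reply with the translated XML only, with no commentary." },
    { role: "user", content: xml }
  ]
end

# Cheap guard: reject replies that wrap the XML in extra prose.
# It does not prove the reply is well-formed XML.
def xml_only?(reply)
  text = reply.strip
  text.start_with?("<") && text.end_with?(">")
end

messages = build_messages("<p>Hello</p>", from: "English", to: "French")
puts messages.first[:content]
puts xml_only?("Sure! Here is the translation: <p>Bonjour</p>")
```

A guard like `xml_only?` gives the caller a chance to retry when the model adds chatter despite the instructions.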
00:26:19.920
Also, I want to discuss the idea of a glossary file. A glossary file is essentially a dictionary provided by the user for specific translations, sometimes following a limited format.
00:27:14.000
The approach I have in mind is to substitute text in the document with user-provided alternatives before sending it for translation.
00:27:55.919
This allows me to manage what portions I want to translate effectively. However, note that this method becomes tricky with models that don't have nice handling for exclusions, such as skipping certain texts.
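The substitution step could be sketched like this, assuming the glossary is a simple hash from source term to preferred translation. `apply_glossary` is a hypothetical helper, not the gem's API.

```ruby
# Hypothetical helper: replace glossary terms before the text is sent
# to the translation engine. Assumes glossary is { source_term => replacement }.
def apply_glossary(text, glossary)
  return text if glossary.empty?

  # Try longer terms first so a full name wins over a shorter entry
  # that shares the same prefix.
  terms = glossary.keys.sort_by { |term| -term.length }
  pattern = Regexp.union(terms)
  text.gsub(pattern) { |match| glossary[match] }
end

glossary = {
  "Yuasa" => "湯淺",
  "Yuasa Masaaki" => "湯淺政明"
}

puts apply_glossary("<p>Yuasa Masaaki directed it. Yuasa said so.</p>", glossary)
```

This naive version matches anywhere in the string, so a term that also appears in a tag or attribute name would be replaced too; restricting it to text nodes would need an XML parser.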
00:28:48.160
Another issue I've noted is integration with large models like GPT-4 for complex translation tasks. While I'm still employing version 3.5, it has its limitations and does not always read the previous context, leading to potentially inaccurate translations.
00:29:48.200
To provide a solution for this, I plan to implement a more contextual architecture to integrate better with prior text structures.
00:30:40.000
In the concluding part, I want to borrow some ideas from Java's Option Data Library. They have excellent functionality for combining translations side by side.
00:32:45.919
As for EPUB processing, I evaluated various libraries and considered their current maintenance status. For XML processing, I believe the main libraries to focus on are Nokogiri and REXML.
00:33:19.000
I chose certain paths for managing dependencies while ensuring functionality across platforms, which informs my decision-making.
00:34:14.560
To conclude my talk, I’d like to focus on my future plans, including support for alternative models and addressing integration challenges.
00:34:56.040
I appreciate your attention, and if you have any questions or thoughts, please feel free to ask!
00:35:41.560
Thank you! Does anyone have questions for Mark?
00:36:07.000
Participant: For the gem to work properly, does the source material need to be in a specific format? Mark: Yes, it requires a proper structure. It only supports text-based formats, not images.
00:37:07.760
Mark responds to other audience inquiries. Thank you for your participation. We are now moving to take a tea break and will reconvene before 3:30 PM.