00:00:03.480
Good day, everyone! It's awesome to be here. Just a bit of housekeeping before we start: I do have a bit of a tremor, but that's not because I'm nervous; I just have a slight neurological condition. I'm actually really excited! I've been working on this project for the whole week, building demos every day. I even came up with a new idea for an AI demo, but I had to stop because it was getting out of hand.
00:00:14.280
Today's talk begins with my wife and the situation we've been through, but what I really want to talk about is the AI ecosystem in relation to Ruby. There are things we can do now that we couldn't do a year ago, and I want to show that it's actually pretty easy.
00:00:38.760
Before we get into that: we had our game out back, which some of you took part in. You got your ducks, and some of you received gift cards. We had some gift cards left over, which I wanted to present to the people who deserved them most over the last two days. I would like to present those gift cards to the Ruby Australia team.
00:00:56.440
As they come up, I’d like to play a quick song. It's the last song recorded by my wife before she lost her voice. It’s a rough recording, but it's very special to me. While they come up, I would like Errol to present them with the gift cards. Thank you all; it’s been an amazing two days, and you all did a fantastic job!
00:02:47.720
So, I thought, 'I need to do something about this,' which led me to the technology I’ll discuss today. I've always been interested in AI, but this situation pushed me down the rabbit hole to find solutions. I want to play that last bit again because it truly captures her talent.
00:03:41.659
I found a one-minute interview she did many years ago. Other than that and some old, low-quality voice messages, that was all I had. I'd like to play a segment of the interview so you can hear her voice.
"Hi, I’m Peggy, and I’m a mom of three girls. I look after my daughters full time. Most of the time, I'll be helping them do their homework or they'll help me bake; they love baking cookies, cupcakes—all sorts of things."
00:04:07.600
I took that one-minute clip and used a service called ElevenLabs, a leader in voice cloning technology. Let's hear how it compares:
"Hello Ruby Australia! I hope you are all having an amazing time this week. I'd like to introduce to you my husband, Kane. He has been working very hard on artificial intelligence for the past year. I hope you enjoy his talk today."
00:05:08.560
There's a slight British twang in the AI's voice. I've noticed that AI models tend to take an Australian accent and British-ize it a little. With more audio, I could refine it further. Luckily, I've also spoken with the ElevenLabs crew, and they have the original recordings: about an hour's worth of audio.
00:05:35.080
With this professional cloning service, we can train the AI extensively, and I believe that the results from an hour's worth of quality recordings will be highly impressive.
00:06:00.880
My philosophy through all of this has been: "When life gives you lemons, make an app." The next piece of AI I’m working on is just a phone call away because one of the biggest challenges my wife faces is communicating with call centers.
00:06:27.600
I’m currently developing something for that purpose, and it's still in progress. Caitlyn, I think you have a phone call coming in now.
I'm breaking the cardinal rule of live presentations by doing a live demo.
00:06:43.760
If you can come up here, I'll run my script and show you something special.
"Hello, CN! It's Matt calling all the way from Japan. Hello to all my Ruby friends from Australia! I hope you have been having an amazing time the last two days. Caitlyn, I heard you and Toby did a great job MC-ing the event, and the audience should give you a big round of applause!"
00:07:00.560
So, Caitlyn, there's a gift card for you. Toby, you already got one; otherwise, I would have given you one as well.
00:07:44.360
Now, we do need to discuss the dark side of AI. Think about how easy it is to clone someone's voice; that raises a lot of questions about where this technology can lead us. AI is a double-edged sword, particularly with deepfakes, voice cloning, and the lip-sync AI now emerging. In six months, we may not be able to tell a real person online from a fake.
00:08:16.880
This is why it's important to stay vigilant and validate information from multiple sources, not just what we see in a single video. To illustrate this, I cloned my own voice and created two messages: one for my daughter, and one that I sent to my accounts team.
00:08:57.360
Here's an example of the message intended for my daughter: "Hi Emily, there has been an emergency. Mom is in hospital, and I can't pick you up from school. My friend Tom will pick you up, so please go to the IGA near the school. He'll get you in his white van. Something has happened to Mom."
00:09:22.120
My daughter said she would have known the difference had she heard that as a voicemail. I also sent a message to my accounts team about an urgent bill payment; we'll see how they respond to that.
00:09:54.160
This technology has real-world implications. A company in Hong Kong recently lost around US$25 million (HK$200 million) after an employee in its finance team joined a deepfake video call with someone posing as the company's CFO. The AI-generated impersonation convinced them that they needed to transfer funds immediately.
00:10:36.560
This highlights the need for companies to adapt quickly, because the technology is outpacing traditional verification methods. As for the architecture of what I built, I used Ruby for the backend, ElevenLabs for voice cloning, and Twilio for the phone calls. It's quite simple.
00:11:01.040
With just an API call to ElevenLabs and some input text, it quickly returns the audio as a binary file, which is then passed to Twilio for the phone call. One hurdle I'm still trying to solve is real-time voice input through the web.
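A minimal sketch of that pipeline in Ruby, assuming ElevenLabs' text-to-speech endpoint (`POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header) and Twilio's TwiML `<Play>` verb. The method names, the placeholder voice ID, and the step of hosting the returned MP3 at a public URL for Twilio to fetch are assumptions, not the speaker's code.

```ruby
require "cgi"
require "json"
require "net/http"
require "uri"

# Assumption: ElevenLabs' text-to-speech endpoint and "xi-api-key" header.
# The voice ID passed in is a hypothetical placeholder for your cloned voice.
TTS_URL_TEMPLATE = "https://api.elevenlabs.io/v1/text-to-speech/%s"

# Build (but don't send) the request that turns text into speech.
def build_tts_request(text, voice_id:, api_key:)
  request = Net::HTTP::Post.new(URI(format(TTS_URL_TEMPLATE, voice_id)))
  request["xi-api-key"] = api_key
  request["Content-Type"] = "application/json"
  request["Accept"] = "audio/mpeg"
  request.body = JSON.generate({ text: text })
  request
end

# Send the request and return the raw audio bytes (MP3).
def synthesize(text, voice_id:, api_key:)
  request = build_tts_request(text, voice_id: voice_id, api_key: api_key)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
  raise "Text-to-speech failed with HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end

# TwiML telling Twilio to play a hosted copy of that audio on the call.
def play_twiml(audio_url)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <Response><Play>#{CGI.escapeHTML(audio_url)}</Play></Response>
  XML
end
```

Keeping request building separate from sending makes the request easy to inspect or test without network access.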
00:11:26.560
I'd like to achieve something that works in a Google Meet call, where my wife types and her words are spoken in real time, but getting the audio through the browser remains a challenge.
00:12:06.320
Now, I want to show you the code behind this technology, not necessarily for you to understand the specifics, but to prove that voice cloning is accessible. Once you’ve trained the model, you just make an API call and send the necessary data.
00:12:57.720
This Ruby code is straightforward: AI is not just a phone call away, it's simply an API call away. Tasks that used to require machine learning experts and extensive training can now be done with ease by Ruby developers.
00:13:37.440
One area I started exploring was computer vision, which has traditionally been complex. Using a vision API, I built a laundromat quality-control demo in only an hour. The idea is to photograph garments before and after cleaning and let the AI check the quality.
00:14:51.720
In my demo, the AI determines whether a stain is gone or still present and gives the cleaning a quality score. I didn't train a model at all, and I still got good results in under an hour!
00:15:43.360
The architecture of the AI system for this demo involves a Ruby backend sending an image URL and prompt to the Vision API, which then returns a response. This process is straightforward and doesn't require extensive machine learning knowledge.
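A sketch of the request body such a Ruby backend might send, assuming an OpenAI-style chat completions endpoint that accepts image URLs (the talk doesn't name the provider). The model name, the prompt wording, and the 1-to-10 score are hypothetical.

```ruby
require "json"

# Assumption: an OpenAI-style chat completions API that accepts image URLs.
# The model name is an example; the prompt wording is hypothetical.
VISION_MODEL = "gpt-4o"

QC_PROMPT = <<~PROMPT
  You are a laundromat quality inspector. The first image shows a garment
  before cleaning and the second shows it after. Say whether the stain is
  gone or still present, then give the cleaning a quality score from 1 to 10.
PROMPT

# One user message holding the prompt plus the before and after image URLs.
def vision_payload(before_url, after_url, prompt: QC_PROMPT)
  {
    model: VISION_MODEL,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: before_url } },
          { type: "image_url", image_url: { url: after_url } }
        ]
      }
    ]
  }
end

request_body = JSON.generate(
  vision_payload("https://example.com/shirt-before.jpg", "https://example.com/shirt-after.jpg")
)
# POST request_body to https://api.openai.com/v1/chat/completions with an
# "Authorization: Bearer <API key>" header; the verdict comes back as text
# in choices[0].message.content.
```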
00:16:44.680
Now, I want to outline some basics about neural networks. While I won't go too deep, understanding how these models work is essential when discussing current AI technologies. My journey into AI started when I was around 15, working with a clunky chatbot that struggled to engage in meaningful conversation.
00:17:33.800
We also created a fintech model years ago that required thousands of lines of code. Today, you can accomplish the same results in about 20 lines. We've also been collaborating with the New South Wales State Library to translate oral histories using AI, achieving around 80% accuracy through community corrections.
00:18:31.720
Regarding AI models, you might hear about parameters. GPT-3 has around 175 billion parameters, and while GPT-4's size hasn't been disclosed, it is rumored to be over a trillion. Parameters are the weights and biases attached to the network's nodes; the more of them a model has, the more complex the patterns it can learn.
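To make "weights and biases" concrete, here is a toy fully connected layer in plain Ruby, written for illustration rather than taken from the talk. Each node multiplies its inputs by its weights, adds its bias, and applies an activation (ReLU, chosen here as an example); the layer's parameters are simply every weight plus every bias.

```ruby
# A single fully connected layer: each output node computes a weighted sum of
# the inputs, adds its bias, and applies ReLU (max of the sum and zero).
class DenseLayer
  attr_reader :weights, :biases

  def initialize(weights, biases)
    @weights = weights # one row of input weights per output node
    @biases = biases   # one bias per output node
  end

  def forward(inputs)
    weights.each_with_index.map do |row, i|
      sum = row.zip(inputs).sum { |w, x| w * x } + biases[i]
      [sum, 0.0].max
    end
  end

  # Parameters are every weight plus every bias.
  def parameter_count
    weights.sum(&:length) + biases.length
  end
end

layer = DenseLayer.new([[0.5, -1.0, 2.0], [1.0, 1.0, 1.0]], [0.1, -3.0])
layer.forward([1.0, 2.0, 3.0]) # => [4.6, 3.0]
layer.parameter_count          # => 8
```

The count grows quickly: one layer mapping 4,096 inputs to 4,096 outputs already has 4,096 × 4,096 + 4,096 = 16,781,312 parameters, and large language models stack many such layers.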
00:19:15.120
Simple models can classify images in just a few lines of code, while more complex tasks require deep learning models. The power of AI lies in these models' ability to find patterns in large datasets.
00:19:47.960
As AI continues to develop, it raises concerns, such as the uniformity of AI-generated content. Because AI reproduces patterns from its training data, unique voices get lost in areas like blogging. Since chatbots took off, AI-generated blogs all sound alike, which is disheartening.
00:20:49.520
So, what I want to cover in the remaining time is how to get started with AI: prompting, knowledge retrieval, and open-source options. Prompting is more of an art than a science and requires trial and error. Precise instructions yield precise responses, while vague instructions lead to vague outputs.
00:21:40.720
When working on production-grade applications, take time to fine-tune your prompts—change words, try different phrasings, and optimize until you get the desired output.
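One way to keep instructions precise while iterating is to give each requirement its own slot in a prompt template. This is a hypothetical helper, not from the talk; the field names and example values are assumptions.

```ruby
# Hypothetical helper: each keyword argument becomes an explicit section of the
# prompt, so nothing is left for the model to guess.
def build_prompt(role:, task:, output_format:, constraints: [])
  <<~PROMPT
    You are #{role}.

    Task: #{task}

    Constraints:
    #{constraints.map { |c| "- #{c}" }.join("\n")}

    Respond in this format: #{output_format}
  PROMPT
end

# Compare with a vague prompt such as "Summarise this ticket."
precise = build_prompt(
  role: "a support analyst at a telco",
  task: "Summarise the customer ticket below.",
  constraints: ["At most three sentences", "Name the product involved", "Leave out personal details"],
  output_format: "plain text"
)
```

Changing one slot at a time, such as tightening a single constraint, makes each round of trial and error easier to compare.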
00:22:44.120
JSON Schema lets you define the structure the AI's response must follow, which is particularly useful when you need precise, machine-readable data. Prompting has also evolved: models now handle much larger context windows, so prompts can include far more detail.
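A sketch of JSON Schema in practice, assuming OpenAI's Structured Outputs request format (`response_format` of type `json_schema`); other providers use different field names. The ticket schema is hypothetical, and `valid_ticket?` is a small hand-rolled check, not a full JSON Schema validator.

```ruby
require "json"

# Hypothetical schema for classifying a support ticket.
TICKET_SCHEMA = {
  type: "object",
  properties: {
    category: { type: "string", enum: %w[mobile internet billing] },
    urgent: { type: "boolean" },
    summary: { type: "string" }
  },
  required: %w[category urgent summary],
  additionalProperties: false
}.freeze

# Request body in the shape of OpenAI's Structured Outputs (an assumption;
# other providers name these fields differently).
def structured_request(model, user_text)
  {
    model: model,
    messages: [{ role: "user", content: user_text }],
    response_format: {
      type: "json_schema",
      json_schema: { name: "ticket", strict: true, schema: TICKET_SCHEMA }
    }
  }
end

# Small client-side check, not a full JSON Schema validator: exact keys,
# allowed category, boolean flag, string summary.
def valid_ticket?(json_text)
  data = JSON.parse(json_text)
  return false unless data.is_a?(Hash) && data.keys.sort == TICKET_SCHEMA[:required].sort
  return false unless TICKET_SCHEMA[:properties][:category][:enum].include?(data["category"])
  return false unless [true, false].include?(data["urgent"])

  data["summary"].is_a?(String)
rescue JSON::ParserError
  false
end
```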
00:23:42.800
In terms of knowledge retrieval, AI systems can pull data from your knowledge base. For instance, querying API documentation or customer info becomes seamless with AI assistance. This includes applications like Zendesk where you can quickly access specific tickets or past client interactions.
00:24:44.440
Overall, these models capture the meaning behind data, not just its wording, which makes them more intuitive to work with. Knowledge retrieval systems improve on traditional keyword search by using that semantic understanding, so they can give better answers to the questions users actually ask.
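The weakness of plain keyword search is easy to demonstrate: a naive matcher that requires a shared word finds nothing for "cat" in a document about kittens. The documents here are made up for illustration.

```ruby
# Naive keyword search: a document matches only if it shares at least one
# word with the query, regardless of meaning.
def keyword_search(query, docs)
  terms = query.downcase.scan(/\w+/)
  docs.select { |doc| (doc.downcase.scan(/\w+/) & terms).any? }
end

docs = [
  "How to care for a new kitten",
  "Dog training basics",
  "Quarterly invoice template"
]
keyword_search("cat", docs)         # => []
keyword_search("kitten care", docs) # => ["How to care for a new kitten"]
```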
00:25:39.760
Today, I quickly set up a retrieval-augmented generation (RAG) system over our blog content: asking it about running AI models returns accurate sources. Similar systems are emerging on platforms like AWS, where you can query endpoints directly to save time.
00:26:53.760
Vector embeddings are what make searching by meaning and context possible. An embedding model places concepts in a shared space where related meanings sit close together, so the AI can tell that differently worded phrases mean similar things. For example, it can recognize the relationships between 'cat', 'kitten', and 'dog', which is useful for determining user intent and retrieving relevant documents.
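Similarity between embeddings is commonly measured with cosine similarity. The vectors below are hand-made three-dimensional stand-ins, not output from a real embedding model (real ones have hundreds or thousands of dimensions), but the ranking logic is the same.

```ruby
# Hand-made vectors for illustration only.
EMBEDDINGS = {
  "cat"     => [0.9, 0.8, 0.1],
  "kitten"  => [0.85, 0.9, 0.15],
  "dog"     => [0.7, 0.2, 0.1],
  "invoice" => [0.0, 0.1, 0.95]
}.freeze

# Cosine of the angle between two vectors: 1.0 means same direction.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.(a) * norm.(b))
end

# Rank every other word by how close its vector is to the given word's vector.
def nearest(word)
  query = EMBEDDINGS.fetch(word)
  (EMBEDDINGS.keys - [word]).sort_by { |w| -cosine_similarity(query, EMBEDDINGS[w]) }
end

nearest("cat") # => ["kitten", "dog", "invoice"]
```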
00:28:15.880
Across large datasets, we can measure similarity this way and retrieve specific information quickly. AI is just a prompt and a knowledge retrieval system away: it can query your data efficiently based on contextual understanding.
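Putting retrieval and prompting together gives the retrieval-augmented generation pattern: embed the question, pull the most similar chunks, and pass them to the model as context. This sketch assumes the chunks were already embedded by some embedding model; the vectors, texts, and source names are made up.

```ruby
# Minimal RAG sketch. Assumption: every chunk was embedded ahead of time by some
# embedding model; these tiny vectors are made up for illustration.
Chunk = Struct.new(:text, :source, :vector)

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# The k chunks most similar to the query vector, best match first.
def retrieve(query_vector, chunks, k: 2)
  chunks.max_by(k) { |chunk| cosine(query_vector, chunk.vector) }
end

# Ground the model: answer only from the retrieved context and cite sources.
def rag_prompt(question, context_chunks)
  context = context_chunks.map { |chunk| "[#{chunk.source}] #{chunk.text}" }.join("\n")
  <<~PROMPT
    Answer using only the context below and cite sources in brackets.

    Context:
    #{context}

    Question: #{question}
  PROMPT
end

chunks = [
  Chunk.new("You can run open-source models on your own machine.", "blog/local-models", [0.9, 0.1, 0.0]),
  Chunk.new("Our office is closed over the holidays.", "blog/holidays", [0.0, 0.2, 0.9]),
  Chunk.new("Hosted APIs let you use large models without managing GPUs.", "blog/hosted-apis", [0.6, 0.5, 0.1])
]
question_vector = [0.85, 0.2, 0.05] # would come from embedding the question
top = retrieve(question_vector, chunks)
prompt = rag_prompt("How do I run AI models?", top)
```

The resulting `prompt` is what gets sent to the language model; asking for bracketed sources is one simple way to get answers that point back to the original documents.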
00:29:36.640
As I conclude, I'll stress again that AI is just a prompt, knowledge retrieval, fine-tuning, and an API call away. We're exploring exciting territory that could pave the way for AI experiences tailored to each user.
00:30:19.480
The final point I'd like to touch on is the importance of open-source AI models. Open models like Meta's Llama let anyone with proprietary data experiment with AI locally, without exposing sensitive information. These models are improving rapidly, which leads me to believe open source will dominate in the near future.
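Running a Llama model locally can be as simple as an HTTP call to localhost. This sketch assumes Ollama (one common local runner, not mentioned in the talk) is installed and serving its default `/api/generate` endpoint on port 11434, with a model pulled under the name `llama3`.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: Ollama's default local endpoint. Requests never leave the machine.
OLLAMA_GENERATE_URL = URI("http://localhost:11434/api/generate")

# Build a non-streaming generation request for a locally served model.
def local_generate_request(prompt, model: "llama3")
  request = Net::HTTP::Post.new(OLLAMA_GENERATE_URL, "Content-Type" => "application/json")
  request.body = JSON.generate({ model: model, prompt: prompt, stream: false })
  request
end

# Send it and return the generated text from the "response" field.
def local_generate(prompt, model: "llama3")
  request = local_generate_request(prompt, model: model)
  response = Net::HTTP.start(OLLAMA_GENERATE_URL.host, OLLAMA_GENERATE_URL.port) do |http|
    http.request(request)
  end
  JSON.parse(response.body).fetch("response")
end
```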
00:31:01.680
Platforms like AWS Bedrock streamline access to AI models, letting you switch between different models with minimal effort. That makes it easy to experiment and to integrate AI into all kinds of applications.
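Bedrock's Converse API uses one request shape for every model, so switching models can be a one-string change. This sketch assumes the `aws-sdk-bedrockruntime` gem and configured AWS credentials; the model IDs are examples that may not be enabled in your account or region.

```ruby
# Example model IDs (assumptions; check what is enabled in your account/region).
MODELS = {
  claude: "anthropic.claude-3-haiku-20240307-v1:0",
  llama: "meta.llama3-8b-instruct-v1:0"
}.freeze

# Same message structure regardless of which model is chosen.
def converse_params(model, prompt, max_tokens: 512)
  {
    model_id: MODELS.fetch(model),
    messages: [{ role: "user", content: [{ text: prompt }] }],
    inference_config: { max_tokens: max_tokens }
  }
end

def ask(model, prompt, region: "us-east-1")
  require "aws-sdk-bedrockruntime" # loaded only when actually calling AWS
  client = Aws::BedrockRuntime::Client.new(region: region)
  response = client.converse(**converse_params(model, prompt))
  response.output.message.content.first.text
end
```

Calling `ask(:claude, "...")` versus `ask(:llama, "...")` changes only the `model_id` in the request.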
00:31:56.960
Now for the demonstration: I built a mini call center in just a few hours to show how AI can assist callers. Toby will call in and select an issue, and the AI will route him to the correct agent.
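The routing step can be sketched as a lookup from the issue category the AI identifies to an agent, followed by TwiML that announces the hand-off (`<Say>`) and bridges the call (`<Dial>`). Only Beverly comes from the demo; the other agents, the placeholder phone numbers, and the assumption that the model returns a bare category string are hypothetical.

```ruby
require "cgi"

# Hypothetical routing table; all phone numbers are placeholders.
ROUTES = {
  "mobile"   => { agent: "Beverly, our mobile team specialist", number: "+61400000001" },
  "internet" => { agent: "Sam, our internet team specialist", number: "+61400000002" },
  "billing"  => { agent: "Priya, our billing team specialist", number: "+61400000003" }
}.freeze
FALLBACK = { agent: "our general support team", number: "+61400000000" }.freeze

# `category` is assumed to come from the model (e.g. a structured-output
# field); unknown or missing categories go to the general queue.
def route_for(category)
  ROUTES.fetch(category.to_s.strip.downcase, FALLBACK)
end

# TwiML: announce the hand-off, then connect the caller to the agent's number.
def transfer_twiml(category)
  route = route_for(category)
  message = "I will connect you with #{route[:agent]}."
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Say>#{CGI.escapeHTML(message)}</Say>
      <Dial>#{CGI.escapeHTML(route[:number])}</Dial>
    </Response>
  XML
end
```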
00:33:31.600
[Failed live demo recording] Welcome to Telra. Before we proceed, can I check your address? Thanks! How can I assist you today? I have a mobile issue.
Perfect! I will connect you with Beverly, our mobile team specialist.
00:36:03.680
[Demo successful] The key takeaway is how we could revolutionize call centers with AI, providing targeted information to the right expert at the right time and freeing them up to focus on urgent human interactions where they'll have the highest impact.
00:36:49.760
To conclude, AI is waiting for Ruby to embrace it as extensively as Python has. We have an opportunity here within our community to innovate and contribute positively to society, enhancing user experiences and personal interactions.
00:37:08.320
If anyone is interested, I'm offering consultations over the next couple of weeks to anyone who wants to integrate AI into their projects. This won't be a sales pitch; I'm simply here to help.
00:37:40.880
Thank you all for this brilliant opportunity. It has been an amazing experience interacting with such a fantastic community!