Summarized using AI

Syntax Isn't Everything: NLP For Rubyists

Aja Hammerly • October 12, 2017 • Selangor, Malaysia • Talk

Overview

In the talk "Syntax Isn't Everything: NLP For Rubyists," presented by Aja Hammerly at RubyConf MY 2017, the speaker discusses Natural Language Processing (NLP) and its significance, especially for Ruby developers. Hammerly emphasizes that NLP is about teaching computers to understand human language and highlights the potential of NLP to improve user experiences and accessibility.

Key Points

  • Definition of NLP: Natural Language Processing is defined as programming computers to process and respond to human languages like English and Japanese.
  • Importance of NLP: NLP has been in use for decades, helping enhance user experiences (e.g., voice-activated menus, chatbots) and making human communication with machines more natural.
  • Examples of NLP Applications:
    • Voice Interfaces: Help those with disabilities interact more easily with technology, though they might also create new challenges for those who cannot speak.
    • Understanding Large Texts: NLP can simplify the comprehension of large documents and feedback.
    • Email Analysis Tool: A tool that alerts users when their emails might come off as hostile.
  • Language Complexity: Hammerly points out that human languages are inherently complex and nuanced, presenting challenges in NLP, such as:
    • Ambiguity in words and meanings (e.g., homophones like "their" and "there").
    • The evolution of language and how meanings change over time (e.g., the evolving meaning of the word "unique").


Speaker: Aja Hammerly (@the_thagomizer)

Website: http://rubyconf.my

Produced by Engineers.SG

RubyConf MY 2017

00:00:06.090 Hello, I'm Aja Hammerly. I am from Seattle, like Aaron, who is talking
00:00:11.460 tomorrow. I am thagomizer on GitHub, the_thagomizer on Twitter, and I blog at
00:00:18.539 thagomizer.com, and I like dinosaurs a lot. A thagomizer is that spiky part at
00:00:25.679 the end of a Stegosaurus. I was gifted the domain name by a partner at the time,
00:00:30.980 almost ten years ago, and it just kind of exploded from there. I even have
00:00:36.540 thagomizer earrings now. So, I work at Google on
00:00:42.210 Google Cloud Platform as a developer advocate, and I have many answers and questions and opinions about how you can
00:00:49.110 run your stuff on Google Cloud Platform. Come find me, I'm happy to chat. And because I work at a really big company, I
00:00:54.900 have to have this slide: the lawyer cat says that any code in my slides is copyright Google and licensed Apache v2.
00:01:01.340 So, NLP. This talk is about NLP, and you
00:01:08.580 might be wondering what that is, if you didn't just hear the answer. Specifically, it's natural language
00:01:13.590 processing. That gets us a little bit closer to understanding what this talk is about, but it's still a little bit fuzzy, so
00:01:20.219 let's see what Wikipedia has to say: natural language processing is a field
00:01:27.450 of computer science, artificial intelligence, and computational linguistics concerned with the
00:01:32.609 interactions between computers and human natural languages, and in particular it's
00:01:39.149 concerned with programming computers to fruitfully process large natural language
00:01:44.609 corpora. That definition is really long and has a lot of big words, so
00:01:50.850 here's the definition I actually use: natural language processing is teaching computers to understand, and ideally
00:01:56.939 respond to, human languages. Human languages are things like English, Japanese, Chinese, American Sign Language,
00:02:04.469 British Sign Language: basically any languages that humans use to communicate with each other. So why should I care?
00:02:11.340 This is, you know, echoing thousands and thousands of children everywhere in school having to learn things: when
00:02:17.220 am I ever going to use this? The big reason is: it's already here. NLP is here, and has been here
00:02:24.000 for decades. When you call up an airline and try to use a voice-activated phone tree, and you're sitting there in a
00:02:30.390 parking lot screaming "reservations, reservations, reservations," and it's only
00:02:36.270 sort of understanding you, that's really bad NLP. Whenever you
00:02:42.150 chat with an agent in a chat window on a website (I had to this morning, working on some flight reservations), where there isn't
00:02:49.260 actually a human involved, that is also NLP. This particular website said that the
00:02:54.360 agent's name was Jennifer. No, there wasn't actually a person there. One of the big
00:03:00.030 promises of NLP is better user experiences: instead of having to teach ourselves how to interact with a
00:03:05.790 computer, we can teach the computer to interact with us in ways we already know. I don't know how many of you have
00:03:11.580 had the pleasure, or pain, of trying to teach someone who's not particularly computer literate how to do something,
00:03:16.860 and it's clear that if it could just work the way their brain works, you
00:03:22.020 wouldn't have to sit there and play tech support. But it doesn't, and so NLP is potentially a way for us to get there.
00:03:29.610 One of my favorite examples (it isn't good NLP, but it's a great example of NLP that's been around for a while) is if
00:03:35.760 you say something like "Computer: tea, Earl Grey, hot." No one gets my Star Trek joke; that makes
00:03:42.750 me sad. I also have one of these at home, and how many virtual assistants are
00:03:49.980 there out there? Like, I moved in June, and I don't actually know where all the light switches in my new place are,
00:03:55.890 because I just shout at the computer to turn the lights on all the time. It's fantastic. These are becoming more
00:04:02.550 and more and more popular. And then there's one I already mentioned, which is tech support and phone trees. So,
00:04:09.510 hopefully at this point you believe me that NLP can improve user experiences, but there's another way that NLP is
00:04:15.750 helpful that you may not have thought of, which is accessibility. Voice interfaces are important
00:04:21.060 because they can be more accessible than text for some people, for example those who can't write because of disabilities,
00:04:27.300 whether physical or cognitive. They can also be helpful if your hands are busy: right now both of my
00:04:33.600 hands are busy; I could not write anything. But the other
00:04:39.540 thing to know is that voice interfaces can also reduce accessibility for other users, people who can't speak. So NLP can
00:04:46.200 be good: it can help us, it can make computers easier to use, and it can make our lives better. That isn't the only benefit.
00:04:52.160 NLP also helps us improve our understanding, especially of large, huge
00:04:57.480 piles of text or speech. Maybe you work on a website where you get feedback from users. When I was at a startup we
00:05:04.230 had a feedback button on every page, and it was a startup making software for children, so we got feedback from five-year-olds, and it was fantastic and
00:05:11.570 hilarious. Five-year-olds do not hide their opinions about your software at
00:05:16.860 all, and they have some of the best insults ever. But you can also use it for
00:05:22.950 things like reading through investor briefings, or all sorts of those really long, crazy documents that companies use
00:05:29.130 to try to hide important information. We can use things like NLP to help us understand those faster, so we don't have
00:05:34.290 to read them ourselves. And it can be used to assist us in other ways. One of my
00:05:39.480 co-workers made a tool called Deep Breath that analyzes your emails as you're writing them in Gmail, and if what
00:05:45.690 you're writing comes off as hostile, it tells you to take a deep breath before you send it. It's just super handy.
00:05:52.080 Imagine how many GitHub flame wars could have died before they even got started if everyone had a tool like this set
00:05:58.710 up. So NLP is useful, but we don't have it yet as widely available as we'd
00:06:03.930 like, and what we do have isn't great. The number of times that I have screamed at the computer: "No, the bedroom lights.
00:06:09.300 No, the bedroom lights. Turn on the bedroom lights!" And it's just like, okay,
00:06:15.120 and then it turns something else on in a completely different way. That's
00:06:21.000 because NLP is hard. Well, why is it hard? Largely because English is
00:06:27.450 horrible. It's a horrible, horrible language, and to prove this to you I have
00:06:32.910 this wonderful word right here. The word is "steel." Everyone imagine what that word is; it's a noun, I'll give you that.
00:06:40.750 How many of you thought of something like this? Yeah. I mostly
00:06:45.790 included this because I really, really liked this picture. How many thought of
00:06:51.070 something like this? How many thought of something completely different that I don't have a picture of?
00:06:56.790 Yeah, there are a lot of different meanings for that one word, and without context,
00:07:02.170 and even with context sometimes, it's hard to figure out what you're talking about. Another great example is homophones: their, there, and they're.
00:07:09.520 Yeah, those all mean different things. And then there are words that
00:07:15.010 can be multiple parts of speech. "Love" is a great example: "he loves his wife" and
00:07:20.760 "love lasts forever." In the first one, love is a verb, and in the second one, love is a
00:07:26.350 noun. But it's the same word; how can it be multiple parts of speech? There are many other languages that don't let
00:07:32.110 this happen: by adding things to the ends of words or changing how vowels work, the part of speech is indicated.
00:07:38.560 But English is really, really bad, really horrible, at this. So English is horrible,
00:07:43.630 and I really didn't even get into things like irregular verbs, slang, idioms, and all the other bits of language that make
00:07:49.990 a human language a human language. But it turns out that English isn't alone, because all human languages are horrible.
00:07:55.930 They're horrible in these ways or in different ways; every language has weird, interesting things that you only
00:08:01.660 understand once you learn the language fluently. If nothing else, every language has idioms. One of the big things
00:08:10.030 that makes human languages hard for computers is that there are no formal, closed grammars for human languages. Who
00:08:15.700 knows the phrase "formal closed grammar"? Raise your hand. Okay, so that means that I basically can't make a flowchart for
00:08:22.270 how to make a valid sentence; there are lots of ways to throw words together. It isn't as bad as some other languages,
00:08:28.180 where there's no word order, just, you know, wherever they feel good, but there's no flowchart
00:08:35.680 that I can make to produce valid sentences in the English language, or in many, many human languages.
00:08:44.340 Human languages are hard, and they're much harder than computer languages; it's really easy for me to make a flowchart for how to make a valid phrase in Ruby,
00:08:51.130 for example. One of the other big challenges is that NLP is hard because humans are really bad at precision. For
00:08:59.380 example, I could say "I'm starving." It might be true, but probably not. I could
00:09:08.140 say "you look freezing." Also maybe, but
00:09:13.839 most likely not. And then I was reading the other day, I believe it was in the New York Times, that the
00:09:20.529 word "unique" is getting less unique. Over the last thirty years, there's a good
00:09:26.710 probability now that when someone says "unique" what they actually mean is "unusual," whereas thirty years ago when they
00:09:32.710 said "unique" they actually meant unique. Its use has gone up something like sixfold in the printed word, in English newspapers
00:09:40.330 and other publications, in the last thirty years. So language is constantly evolving, which also makes it hard for us to write
00:09:46.210 programs that understand human language. And my last example: computers are really bad at sarcasm. I could say
00:09:55.870 "sure, I'd love to help you out with that." Sounds pretty sincere; I'm not being a
00:10:02.350 jerk in this case. But if I said "sure, I'd *love* to help with that," you can tell that
00:10:07.570 I'm being sarcastic. But there's no actual difference in the words; the only difference is in my tone
00:10:13.270 of voice, in my pitch. The meaning of the sentence changes based on how I say it, or on the surrounding context. And
00:10:20.830 despite what we learned from The Hitchhiker's Guide to the Galaxy, computers are really bad at sarcasm. So why is this hard? This
00:10:30.459 is hard because humans use language in weird ways, like sarcasm and exaggeration. It's hard because English
00:10:36.130 is complicated, and all languages are complicated, and they're always changing. But since humans created human languages,
00:10:42.010 we can just simplify all this and say: NLP is hard because humans.
00:10:48.339 So I'm going to dive a little bit into the history of NLP right now. We've been doing this for a really, really long time.
00:10:55.300 Who recognizes those two names, possibly from a calculus class near you? Yeah, both Leibniz and Descartes
00:11:03.140 proposed ways to do algorithmic translation between languages. When I travel, one of my absolute favorite
00:11:09.320 things in the world is the Google Translate app on my phone, where I can take a picture and it tells me what things say. It was fantastic
00:11:15.260 the last time I was in Japan; it is so handy. But we've been working on that very idea since the time of
00:11:22.190 Descartes, so this is not new. It's taken us a really long time to get to a point
00:11:27.350 where it even sort of works. Another good example of NLP is the Turing test. Who thinks they know what the Turing test is?
00:11:33.500 I thought I did too; I was wrong. This is from the 1950s. The way this works
00:11:41.330 is you're testing to see if a machine is intelligent. You have a judge, and they're watching a conversation between
00:11:47.089 a human and a machine. They know that there is one human and one machine, but they don't know which one is which. If
00:11:52.250 the judge can't tell which one is the human and which one is the machine, the machine passes the Turing test. I always
00:11:59.089 was under the impression that it was a human conversing with a machine and not realizing they're talking to a machine, but the actual version proposed
00:12:06.290 by Alan Turing was that the judge, a third party, is trying to figure out which of the two parties in a conversation is a
00:12:11.990 machine. And then there's the other famous example of ELIZA. Who
00:12:17.120 uses Emacs? I use Emacs. Emacs is awesome; you should use Emacs. If you do M-x
00:12:24.380 doctor, it will let you play with it in real time. ELIZA is the computer psychologist, and this is from the early-to-mid 1960s.
00:12:30.910 It's surprisingly good considering how little code is actually there, and the
00:12:36.050 fact that it's written in Lisp. ELIZA was really the first of the chat bots. Who has written a chat bot, or a
00:12:42.740 Twitter bot, or a Slack bot, or something? I wrote one; it went and pulled the cute
00:12:47.959 stuff from /r/aww on Reddit and posted it into the channel. It was fantastic.
00:12:53.320 So, I have talked a lot about history, I've talked a lot about theory, so now I'm actually going to talk about
00:12:59.139 some code. One of the things about me when I get on stage: I really hate giving big practical and
00:13:05.800 useful examples, because I'm afraid that people are going to go use the stuff that I crammed onto a slide and think it's
00:13:10.959 production worthy. It's not. It's never production worthy; production-worthy code doesn't fit on slides. So I'm going to
00:13:17.949 give you impractical examples. The first one is Twitter. At RubyConf in the US in 2015, I gave one of the keynotes, and the
00:13:25.930 talk was called "Stupid Ideas for Many Computers." That talk was actually
00:13:31.389 based on a talk that Aaron and Ryan did in 2011, something like that, when they were
00:13:38.889 writing the fail bot together, and I wanted to take all of those ideas and try to do stupid computing at scale. So
00:13:46.810 in the talk I demonstrated how I could do sentiment analysis of tweets by scoring the emoji that they contained. It
00:13:53.920 was a really bad idea. Let me make this clear: this is a horrible idea. So, sentiment analysis
00:13:59.230 is the process of computationally identifying and categorizing opinions expressed in a piece of text,
00:14:06.819 especially in order to determine whether the writer's attitude toward a particular topic, product, etc. is positive,
00:14:12.069 negative, or neutral. This is something that a lot of brands use on their social media streams to figure out how people
00:14:17.860 are feeling about their product, either in response to an ad campaign, or, you know, airlines in response to storms
00:14:24.069 that are causing delays, things like that. I decided to use emoji because
00:14:29.709 real sentiment analysis is really hard. So I made a scale: on one end you have the
00:14:35.709 purple angry guy, and he's negative 30, and then there's the poop emoji, which is negative 15, and the smiley face is
00:14:43.149 positive 30, and so on. I used emoji because it's much, much easier than
00:14:48.610 actually doing the analysis. But it turns out that since I gave that talk, we've gotten access to better
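The emoji scale described here can be sketched as a lookup table plus a summing scorer. The exact table is an assumption; the talk only names three of the scores:

```ruby
# A toy emoji-based "sentiment" scorer in the spirit of the talk. The score
# table is a guess beyond the three values named in the talk
# (angry = -30, poop = -15, smiley = +30).
EMOJI_SCORES = {
  "😠" => -30,
  "💩" => -15,
  "😀" => 30,
  "🎉" => 20,
}.freeze

# Sum the scores of every known emoji character that appears in the tweet.
def emoji_sentiment(tweet)
  tweet.each_char.sum { |c| EMOJI_SCORES.fetch(c, 0) }
end

emoji_sentiment("RubyConf MY is great 😀🎉") # => 50
emoji_sentiment("my build is broken 💩")     # => -15
```

As the talk says, this is a horrible idea for real sentiment work: it ignores every word in the tweet, but it is trivially cheap and fast.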
00:14:54.370 tools. One of the things is, I work at Google, and we released the Natural Language API, so I'm going to show you how
00:15:00.519 this would work with that instead of my really horrible emoji-based system.
00:15:05.769 You can get access to this: gem install google-cloud-language. It's currently in alpha; it's going beta
00:15:11.290 shortly. Yes, I have been told it was actually supposed to be beta by today, but we found something we didn't like, so
00:15:17.620 we're fixing it. The code's actually pretty straightforward: you require it, you create a new language
00:15:23.560 object, and then I've got this analyze method that takes in the text of a tweet, and I'm like, okay, language, your document
00:15:31.030 is the text for this tweet. And then I'm like, hey, go give me the sentiment, and it goes off to the server, and the server is like, here's your sentiment. The
00:15:37.360 sentiment has two things: it has score and magnitude. We're going to just
00:15:43.030 look at score, so that's what I care about, and it gives me a number between negative 1 and 1. It's fantastic.
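The shape of that analyze method can be sketched as below. Since the real call needs the google-cloud-language gem plus credentials, a stub client stands in for the API here; with the real gem you would `require "google/cloud/language"` and build the client with `Google::Cloud::Language.new` instead (the gem was in alpha at the time of the talk, so treat the exact method names as assumptions and check the current docs):

```ruby
require "ostruct"

# Stand-in for the Google Cloud Natural Language client so this sketch runs
# without credentials. The real client's document(...).sentiment call has
# roughly this shape.
class StubLanguageClient
  def document(text)
    OpenStruct.new(sentiment: OpenStruct.new(score: 0.8, magnitude: 1.2))
  end
end

# The "three lines" from the talk: wrap the tweet text in a document, ask
# the service for sentiment, and read the score, a float in -1.0 .. 1.0.
def analyze(language, tweet_text)
  document  = language.document(tweet_text)
  sentiment = document.sentiment
  sentiment.score
end

puts analyze(StubLanguageClient.new, "I love RubyConf!") # => 0.8 (stubbed)
```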
00:15:49.450 I'm going to vigorously hand-wave over how to do this at scale. There is a small distributed system that involves a thing
00:15:55.120 called Rinda; it's set up using Kubernetes. If you want to talk to me about it, come find me afterwards and I'll walk you through it. You can also watch the talk; the
00:16:01.840 video's up on Confreaks, and I actually walk through the whole architecture, and the source code's all online. If you want to try to analyze a
00:16:08.590 bunch of tweets, sure, do that, awesome. But
00:16:13.810 the big thing is that I took what used to be about 30 lines of code, and I've managed to take it down to three
00:16:19.150 lines of code, by using a model that's been trained to do sentiment analysis by
00:16:24.190 someone else. Yeah, I could write sentiment analysis code in Ruby, but I'm lazy, I'm fundamentally lazy, and
00:16:31.750 I like to keep things easy, so I'm using someone else's library for it. The other example I have for you today is sentence
00:16:37.180 diagramming. When I was in school, I had to do a lot of things that looked like this. I had to figure out what the
00:16:43.750 subject and the verb of a sentence were; I had to separate them with a big line. If there was a direct object, that was half a
00:16:49.510 line, and other words went at crazy angles. This was something that I did in
00:16:55.630 grade 7, all year, no matter how much I hated it. I don't know if it was technically
00:17:01.480 useful; it was just, you know, part of school where I went to school. One of my friends described all of
00:17:09.339 this as kind of doing an abstract syntax tree of English
00:17:15.550 grammar. So, I was talking to some folks about how excited I was about giving this
00:17:21.130 talk, and they're like, I don't actually remember any grammar at all. So I'm going to, you know, do a quick side quest.
00:17:27.909 My guess is that this will be a refresher for some of you, and for the rest
00:17:33.279 of you who are like, what, why did your friends not know this stuff? Well, the answer in some cases is because my
00:17:39.010 friends are monolingual, and you don't learn grammar as well unless you know multiple languages, in my experience. So,
00:17:44.380 real quick: parts of speech. This is one of the ways we understand words; we label
00:17:49.809 them with what they do. So we have verbs. Verbs are the most important part
00:17:55.389 of a sentence; you can't have a sentence without a verb. One type of verb is an action, like jump. You can also have a state
00:18:02.710 of being, like think or thinking. Nouns: nouns are
00:18:08.409 a person, like Matz or Alan Turing; a place, like Malaysia, or a bathroom, or a
00:18:16.450 building; or a thing, like a bird or a goat.
00:18:22.330 I met this goat when I went hiking; it was pretty cool. But nouns can also be
00:18:27.340 ideas. You may have heard the phrase "abstract noun"; some examples are democracy, freedom, love.
00:18:34.600 Those are all abstract nouns. You also have adjectives; they describe or
00:18:40.480 modify other words, usually nouns (it gets complicated). There are things like attributes, blue or small, and they also help
00:18:47.620 us compare, like near and far. You also have articles, a, an, and the, which
00:18:54.850 are sort of adjectives and sort of not. And they no longer call them articles; they now call them determiners,
00:19:01.299 for reasons that I don't understand. Determiners also include this and that: all articles are determiners, but not all
00:19:08.080 determiners are articles. So, yeah, those were all the parts of speech that I care about for today. We also have the parts
00:19:13.120 of a sentence. The root: this is the only required part of a sentence, which means it's the verb. Then we have the subject,
00:19:20.139 which is the thing that does the verb: the gorilla thinks, or Matz speaks. And then
00:19:27.130 there's the direct object, which is the thing that the verb happens to. So: the cat eats fish. Cat is the subject, the
00:19:35.559 root is eats, and the direct object is fish. Side quest complete! So, back to
00:19:42.910 sentence diagramming. I promised this was actually important. This is basically
00:19:48.910 how a rough sentence diagram works, for the kind I used. In order to draw these diagrams, I need to figure out which part
00:19:55.210 of speech or part of the sentence each word is. To do that, I need to use syntax. The Natural Language API has a method
00:20:02.350 called syntax. So: my normal boilerplate, and then I have a document. I'm going to
00:20:08.020 tell it that we're going to work on the sentence "the cat ate fish," and I go, hey, document, give me the syntax of that, and
00:20:14.410 then I have it print out the tokens, and I get this crazy pile of stuff. This is the token for the word "cat," and there's
00:20:21.429 way more stuff here than matters. Like, there's the idea of grammatical gender; English doesn't have grammatical gender,
00:20:26.980 so that's kind of irrelevant here, because this of course works on multiple languages. But the important thing is
00:20:32.559 that here's the text itself, "cat"; it is four characters into the string, that is its offset. Here's the part of speech: it
00:20:39.820 is a noun, and it is singular. I can also have case, if the language has case, but
00:20:46.270 we're not doing German or other languages with case, so we're not going to do that. And then this is the most important part: this token is labeled
00:20:52.900 nsubj, for nominal subject, which is basically just saying that "cat" is the subject of the sentence. Which is good,
00:20:58.750 because the cat is the thing that eats the fish. So I was able to write some code that
00:21:04.570 created ASCII-art versions of the sentence diagrams, and it's really simple. I'm going to find the token that's marked
00:21:10.059 as the nominal subject and save that as subj; I'm going to find the token that's marked as the root and save that as verb; and then I'm going to do some crazy
00:21:17.380 ASCII art with puts and, you know, some math, and it's awesome. One
00:21:24.100 thing: the last line there is a
00:21:29.380 cool trick I learned. You can multiply a string by an integer, in order to make
00:21:34.900 sure that everything is spaced properly, in this case. So, rocking ASCII art.
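The talk's slide code isn't in the transcript, so here is a minimal sketch of that subject-and-verb ASCII art, including the string-times-integer trick on the last line (the layout is a guess):

```ruby
# Print a bare-bones sentence diagram: subject | verb, with a rule under it.
def diagram(subj, verb)
  top = " #{subj} | #{verb}"
  # Multiplying a string by an integer repeats it, which makes it easy to
  # draw a line exactly as wide as the text above it.
  bottom = "-" * top.length
  [top, bottom]
end

puts diagram("cat", "ate")
# prints:
#  cat | ate
# ----------
```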
00:21:40.929 But that's kind of boring, because I don't have all the words yet. So now I handle direct objects: I go find the direct object in the tokens,
00:21:47.890 save that off, and change my ASCII art up a little bit, and then I get "cat ate fish." Oh wait, I'm missing the "the." So how
00:21:55.630 do I include "the"? I have to actually look at the results from the
00:22:00.790 Natural Language API, where I get a head token index for each word. This is the index of the parent of the current token.
00:22:07.870 So this is the token for "the"; its head token index is one. If I look at the
00:22:14.230 array of tokens (the, cat, ate, fish, period), the thing at index one is "cat." This is
00:22:22.690 telling us that "the" refers to "cat." So I wrote some really bad code:
00:22:28.950 take the tokens, go through them all, and if a token's head index is the subject's index, I'm going to
00:22:36.340 print the text of that token. And with that, I can make this diagram, with
00:22:43.360 all of my words. Yay! So, let's make this a little more challenging.
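The head-index walk just described can be sketched with plain hashes standing in for the API's token objects (the field names here are simplified assumptions for illustration, not the gem's real accessors):

```ruby
# Tokens for "The cat ate fish." with each token's head index, mimicking the
# head token index the API returns: the index of the token's parent.
TOKENS = [
  { text: "The",  head: 1 }, # "The" attaches to "cat"
  { text: "cat",  head: 2 }, # "cat" is the subject of "ate"
  { text: "ate",  head: 2 }, # the root's head is its own index
  { text: "fish", head: 2 }, # "fish" is the direct object of "ate"
  { text: ".",    head: 2 },
].freeze

# Walk all tokens and collect the words whose head is the token at `index`,
# skipping the token itself (so the root doesn't report itself).
def attached_to(tokens, index)
  tokens.each_with_index
        .select { |tok, i| tok[:head] == index && i != index }
        .map { |tok, _| tok[:text] }
end

attached_to(TOKENS, 1) # => ["The"]              (what hangs off the subject)
attached_to(TOKENS, 2) # => ["cat", "fish", "."] (what hangs off the root)
```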
00:22:48.580 Now I have "the cat ate fish with a side of milk." Yeah, my code doesn't work at all
00:22:54.070 on that, at all, even a little bit. So at this point I jumped ship and switched to
00:22:59.230 my old friend graph. This was actually the gem I gave my very first conference talk about, in 2007 or something
00:23:05.770 like that. The graph gem is a gem that makes creating node-and-edge graphs easy,
00:23:12.610 graphs like graph theory, not graphs like bar charts. It provides a DSL in Ruby to
00:23:19.059 create dot files; dot files are the file format that Graphviz uses to render graphs. Some simple graph stuff: you
00:23:27.460 create nodes by calling a node method, passing an ID and a label; you create edges by calling the edge method with a
00:23:34.120 from and a to, and it draws the arrows. So here's the code; it's actually relatively simple, this is all the code you need.
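The slide code isn't reproduced in the transcript, so here is a dependency-free sketch that emits the same kind of dot text directly rather than using the graph gem's DSL; the token data and names are illustrative assumptions:

```ruby
# Build Graphviz dot text for "the cat ate fish.": one node per token, and
# an edge from each token to its head, except for the root (whose head
# index is its own index).
tokens = [
  { text: "the",  head: 1 },
  { text: "cat",  head: 2 },
  { text: "ate",  head: 2 }, # root: points at itself, so no edge is drawn
  { text: "fish", head: 2 },
  { text: ".",    head: 2 },
]

lines = ["digraph sentence {"]
tokens.each_with_index do |tok, i|
  lines << %(  #{i} [label="#{tok[:text]}"];)
  lines << "  #{i} -> #{tok[:head]};" unless tok[:head] == i
end
lines << "}"
dot = lines.join("\n")
puts dot # feed this text to Graphviz (e.g. `dot -Tpng`) to render the graph
```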
00:23:39.460 Oddly enough, this is some graph boilerplate; this drops me into the graph
00:23:44.860 DSL. I'm going to go through each of the tokens with index; I'm going to make a
00:23:50.110 node, specifying the index and using the token's text span text as the label. And unless its head token index is
00:23:57.630 itself, so unless it is referring to itself (the only thing that does that is the root), I'm going to draw an edge from my
00:24:03.960 node at index i to its head token. And that
00:24:08.970 gives me this: "the" refers to "cat," "cat" refers to "ate," "fish" refers to "ate," and the period refers to "ate" because it's
00:24:15.960 the root. But I can also use my more complicated example of "the cat ate fish with a side of milk," and
00:24:22.769 it works as well, and you can even see that "of milk," the prepositional phrase there, is labeled correctly. Everything is
00:24:30.750 all laid out exactly the way you would expect it to be. So, I showed you some really silly examples today, but there
00:24:37.799 are lots of practical uses for NLP: handling customer feedback, better understanding language, summarizing
00:24:42.840 things for humans to read. I'm sure that some of you have your own ideas on how you could use this at home. If you
00:24:50.880 just want to dip your toe into NLP, the Google Natural Language API is a good place to start. You don't have to go learn all the algorithms; you
00:24:56.880 don't have to build separate models for different languages, since many, many languages are already included, and the
00:25:04.710 first 5,000 requests a month are free. Getting started is easy: just install the gem and you can experiment. I
00:25:11.309 actually ran the Jabberwocky through it, just to see what it would do, because I'm at heart a tester, and the
00:25:18.659 syntax analysis I got out is exactly correct. So it's not based on just vocabulary;
00:25:23.909 it's based on the structure of the language, and endings, and all sorts of other things like that. So, thank you very
00:25:30.179 much for having me. I like dinosaurs, so I have a ton of dinosaur stickers and
00:25:36.659 a bunch of Google stickers of various kinds in my bag. I don't want to take them back home to the US, so come
00:25:43.380 get them from me. Thank you.
00:25:48.390 Thank you so much, Aja. All right, we're going to take some questions.
00:25:53.410 All right, come on, what questions do you have? The first two
00:25:58.840 speakers didn't get a lot of questions, so I'm going to promise you that I will show you a picture of a cat if you ask a question. Hard to say no to that. There's
00:26:09.190 an obvious question, mm-hmm, come on. We've got Aaron first. "Can I see a cat?" Thank
00:26:20.290 you, Aaron. There you go. Do you have a real question?
00:26:25.900 Thanks, Aaron. So, you said that the English
00:26:31.660 language is bad, right, it's horrible. But if a super
00:26:37.420 intelligent AI is able to create a language that it will use to communicate, how
00:26:44.380 do you think that would look? So the question is: English is hard, so what would a super
00:26:52.300 intelligent AI that could actually create language look like? Because you'll notice that everything I talked about today was analyzing human language; it
00:27:00.100 wasn't creating spontaneous language. How would that super intelligent AI look? I don't know, actually. I know that we're
00:27:06.820 getting closer and closer to having really complicated AI, but at the end of the day (and I've been working on a blog
00:27:13.090 series on the basics of machine learning), we are still at this point, for the
00:27:19.480 most part, blocked by our datasets and blocked by our ability to create algorithms. There's a really interesting
00:27:25.150 field that's coming up in algorithmic bias, and how we're limited by the data that we have access to. So I actually
00:27:31.810 have no idea what that would look like, and I'm a little bit afraid, because there have been so many horror movies written about
00:27:37.000 that.
00:27:45.380 At a certain point they just stopped conversing in
00:27:50.460 English, and instead of saying "give me three pieces," they would say "me, me,
00:27:55.650 me," repeated three times, to the opponent bot. You
00:28:01.440 said you watched that in a movie? Okay. All right, okey-dokey. One
00:28:13.140 last question. Oh, sorry, yeah, the person in blue; he was
00:28:18.810 pointing at you.
00:28:31.990 So the question was: am I using Ruby itself for doing any of the
00:28:38.000 natural language processing, or am I just using it to consume an API? I've played a little bit with implementing things.
00:28:49.750 Ruby doesn't have all the libraries that something like Python or Java has built in, and I don't have a PhD in machine
00:28:56.540 learning, which makes me a little sad sometimes, so I don't necessarily know the right set
00:29:02.660 of tools. So I chose to use the API because it was faster and easier, and the
00:29:07.940 best part is it's getting better. All the machine learning APIs that we've released are getting better and
00:29:14.030 better and better. Like, the Vision one can identify breeds of cats and dogs. Why?
00:29:19.040 Because people wanted that, so, okay, we added it. And that means that I'm not responsible for maintaining it as
00:29:25.190 language advances and as technology advances. I do want to play with it; I actually just did a blog post on some
00:29:31.940 basic machine learning techniques in Ruby. I've got k-nearest neighbors, I've got basic linear regression, and there are
00:29:37.010 going to be a couple more, showing that you can actually use Ruby for this stuff. But if there's something that already exists, coming
00:29:43.700 back to my being fundamentally lazy, I'm going to use the lazy thing. That's my cat Emma. More cats!
00:29:52.450 Do we have any final question? One last one, yes sir. So, in English you can say
00:29:59.930 the same thing with different structures, right? So can this Google Cloud
00:30:05.420 Language API understand intent, if you tell it what you are saying in different ways?
00:30:12.340 So, the Google Cloud Language API just takes text. We also have a Speech API, and
00:30:20.210 there are probably ways to hook those two together to try to understand intent, but most of what it can use is context. So
00:30:26.690 you can have it understand a single sentence, and it may get it wrong. One of my favorite examples is I had it do "bunnies hop" and it didn't understand
00:30:34.640 that "bunnies" was a noun in that context, but if I said "the bunnies hop" and put a period on the end, it all of a sudden
00:30:40.670 understood it, because I had additional context. Languages are hard, and computers are
00:30:46.410 actually really, really dumb; they're just really, really fast at doing dumb things over and over and over again. So, okay, my
00:30:52.950 last kitten, there you go. Thank you so much, everyone. A hand for the cats and dinosaurs!