Summarized using AI

Syntax Isn't Everything: NLP For Rubyists

Aja Hammerly • October 12, 2017 • Selangor, Malaysia • Talk

Overview

In the talk "Syntax Isn't Everything: NLP For Rubyists," presented by Aja Hammerly at RubyConf MY 2017, the speaker discusses Natural Language Processing (NLP) and its significance, especially for Ruby developers. Hammerly emphasizes that NLP is about teaching computers to understand human language and highlights the potential of NLP to improve user experiences and accessibility.

Key Points

  • Definition of NLP: Natural Language Processing is defined as programming computers to process and respond to human languages like English and Japanese.
  • Importance of NLP: NLP has been in use for decades, helping enhance user experiences (e.g., voice-activated menus, chatbots) and making human communication with machines more natural.
  • Examples of NLP Applications:
    • Voice Interfaces: Help those with disabilities interact more easily with technology, though they might also create new challenges for those who cannot speak.
    • Understanding Large Texts: NLP can simplify the comprehension of large documents and feedback.
    • Email Analysis Tool: A tool that alerts users when their emails might come off as hostile.
  • Language Complexity: Hammerly points out that human languages are inherently complex and nuanced, presenting challenges in NLP, such as:
    • Ambiguity in words and meanings (e.g., homophones like "their" and "there").
    • The evolution of language and how meanings change over time (e.g., the evolving meaning of the word "unique").


Speaker: Aja Hammerly (@the_thagomizer)

Website: http://rubyconf.my

Produced by Engineers.SG

RubyConf MY 2017

00:00:06.090 Hello, I'm Aja Hammerly. I am from Seattle, like Aaron, who is talking
00:00:11.460 tomorrow. I am thagomizer on GitHub, the_thagomizer on Twitter, and I blog at
00:00:18.539 thagomizer.com, and I like dinosaurs a lot. A thagomizer is that spiky part at
00:00:25.679 the end of a Stegosaurus. I was gifted the domain name by a partner at the time,
00:00:30.980 almost ten years ago, and it just kind of exploded from there. I even have
00:00:36.540 thagomizer earrings now. So, I work at Google on
00:00:42.210 Google Cloud Platform as a developer advocate, and I have many answers and questions and opinions about how you can
00:00:49.110 run your stuff on Google Cloud Platform. Come find me, I'm happy to chat. And because I work at a really big company, I
00:00:54.900 have to have this slide: the lawyer cat says that any code in my slides is copyright Google and licensed Apache v2.
00:01:01.340 So, NLP. This talk is about NLP, and you
00:01:08.580 might be wondering what that is, if you didn't just hear the answer. Specifically, it's natural language
00:01:13.590 processing. That gets us a little bit closer to understanding what this talk is about, but it's still a little bit fuzzy, so
00:01:20.219 let's see what Wikipedia has to say: natural language processing is a field
00:01:27.450 of computer science, artificial intelligence, and computational linguistics concerned with the
00:01:32.609 interactions between computers and human natural languages, and in particular it's
00:01:39.149 concerned with programming computers to fruitfully process large natural language
00:01:44.609 corpora. That definition is really long and has a lot of big words, so
00:01:50.850 here's the definition I actually use: natural language processing is teaching computers to understand, and ideally
00:01:56.939 respond to, human languages. Human languages are things like English, Japanese, Chinese, American Sign Language,
00:02:04.469 British Sign Language: basically any languages that humans use to communicate with each other. So why should I care?
00:02:11.340 This is, you know, echoing thousands and thousands of children everywhere in school having to learn things: when
00:02:17.220 am I ever going to use this? The big reason is: it's already here. NLP is here, and has been here
00:02:24.000 for decades. When you call up an airline and try to use a voice-activated phone tree, and you're sitting there in a
00:02:30.390 parking lot screaming "reservations, reservations, reservations," and it's only
00:02:36.270 sort of understanding you, that's really bad NLP. Whenever you
00:02:42.150 chat with an agent in a chat window on a website (I had to this morning, working on some flight reservations), where there isn't
00:02:49.260 actually a human involved, that is also NLP. This particular website said that the
00:02:54.360 agent's name was Jennifer. No, there wasn't actually a person there. One of the big
00:03:00.030 promises of NLP is better user experiences: instead of having to teach ourselves how to interact with a
00:03:05.790 computer, we can teach the computer to interact with us in ways we already know. I don't know how many of you have
00:03:11.580 had the pleasure, or pain, of trying to teach someone who's not particularly computer literate how to do something,
00:03:16.860 and it's clear that if it could just work the way their brain works, you
00:03:22.020 wouldn't have to sit there and play tech support. But it doesn't, and so NLP is potentially a way for us to get there.
00:03:29.610 One of my favorite examples (it isn't good NLP, but it's a great example of NLP that's been around for a while) is if
00:03:35.760 you say something like "Computer: tea, Earl Grey, hot." No one gets my Star Trek joke; that makes
00:03:42.750 me sad. I also have one of these at home, and how many virtual assistants are
00:03:49.980 there out there? Like, I moved in June, and I don't actually know where all the light switches in my new place are,
00:03:55.890 because I just shout at the computer to turn the lights on all the time. It's fantastic. These are becoming more
00:04:02.550 and more and more popular. And then there's one I already mentioned, which is tech support and phone trees. So,
00:04:09.510 hopefully at this point you believe me that NLP can improve user experiences, but there's another way that NLP is
00:04:15.750 helpful that you may not have thought of, which is accessibility. Voice interfaces are important
00:04:21.060 because they can be more accessible than text for some people, for example those who can't write because of disabilities,
00:04:27.300 whether physical or cognitive. They can also be helpful if your hands are busy: right now both of my
00:04:33.600 hands are busy; I could not write anything. But the other
00:04:39.540 thing to know is that voice interfaces can also reduce accessibility for other users, people who can't speak. So NLP can
00:04:46.200 be good: it can help us, it can make computers easier to use, and it can make our lives better. That isn't the only benefit.
00:04:52.160 NLP also helps us improve our understanding, especially of large, huge
00:04:57.480 piles of text or speech. Maybe you work on a website where you get feedback from users. When I was at a startup we
00:05:04.230 had a feedback button on every page, and it was a startup making software for children, so we got feedback from five-year-olds, and it was fantastic and
00:05:11.570 hilarious. Five-year-olds do not hide their opinions about your software at
00:05:16.860 all, and they have some of the best insults ever. But you can also use it for
00:05:22.950 things like reading through investor briefings, or all sorts of those really long, crazy documents that companies use
00:05:29.130 to try to hide important information. We can use things like NLP to help us understand those faster, so we don't have
00:05:34.290 to read them ourselves. And it can be used to assist us in other ways. One of my
00:05:39.480 co-workers made a tool called Deep Breath that analyzes your emails as you're writing them in Gmail, and if what
00:05:45.690 you're writing comes off as hostile, it tells you to take a deep breath before you send it. It's just super handy.
00:05:52.080 Imagine how many GitHub flame wars could have died before they even got started if everyone had a tool like this set
00:05:58.710 up. So NLP is useful, but we don't have it yet as widely available as we'd
00:06:03.930 like, and what we do have isn't great. The number of times that I have screamed at the computer: "No, the bedroom lights.
00:06:09.300 No, the bedroom lights. Turn on the bedroom lights!" And it's just like, okay,
00:06:15.120 and then it turns something else on in a completely different way. That's
00:06:21.000 because NLP is hard. Well, why is it hard? Largely because English is
00:06:27.450 horrible. It's a horrible, horrible language, and to prove this to you I have
00:06:32.910 this wonderful word right here. The word is "steel." Everyone imagine what that word is; it's a noun, I'll give you that.
00:06:40.750 How many of you thought of something like this? Yeah. I mostly
00:06:45.790 included this because I really, really liked this picture. How many thought of
00:06:51.070 something like this? How many thought of something completely different that I don't have a picture of?
00:06:56.790 Yeah, there are a lot of different meanings for that one word, and without context,
00:07:02.170 and even with context sometimes, it's hard to figure out what you're talking about. Another great example is homophones: their, there, and they're.
00:07:09.520 Yeah, those all mean different things. And then there are words that
00:07:15.010 can be multiple parts of speech. "Love" is a great example: "he loves his wife" and
00:07:20.760 "love lasts forever." In the first one, love is a verb, and in the second one, love is a
00:07:26.350 noun. But it's the same word; how can it be multiple parts of speech? There are many other languages that don't let
00:07:32.110 this happen: by adding things to the ends of words or changing how vowels work, the part of speech is indicated.
00:07:38.560 But English is really, really bad, really horrible, at this. So English is horrible,
00:07:43.630 and I really didn't even get into things like irregular verbs, slang, idioms, and all the other bits of language that make
00:07:49.990 a human language a human language. But it turns out that English isn't alone, because all human languages are horrible.
00:07:55.930 They're horrible in these ways or in different ways; every language has weird, interesting things that you only
00:08:01.660 understand once you learn the language fluently. If nothing else, every language has idioms. One of the big things
00:08:10.030 that makes human languages hard for computers is that there are no formal, closed grammars for human languages. Who
00:08:15.700 knows the phrase "formal closed grammar"? Raise your hand. Okay, so that means that I basically can't make a flowchart for
00:08:22.270 how to make a valid sentence; there are lots of ways to throw words together. It isn't as bad as some other languages,
00:08:28.180 where there's no word order, just, you know, wherever they feel good, but there's no flowchart
00:08:35.680 that I can make to produce valid sentences in the English language, or in many, many human languages.
00:08:44.340 Human languages are hard, and they're much harder than computer languages; it's really easy for me to make a flowchart for how to make a valid phrase in Ruby,
00:08:51.130 for example. One of the other big challenges is that NLP is hard because humans are really bad at precision. For
00:08:59.380 example, I could say "I'm starving." It might be true, but probably not. I could
00:09:08.140 say "you look freezing." Also maybe, but
00:09:13.839 most likely not. And then I was reading the other day, I believe it was in the New York Times, that the
00:09:20.529 word "unique" is getting less unique. Over the last thirty years, there's a good
00:09:26.710 probability now that when someone says "unique" what they actually mean is "unusual," whereas thirty years ago when they
00:09:32.710 said "unique" they actually meant unique. Its use has gone up something like sixfold in the printed word, in English newspapers
00:09:40.330 and other publications, in the last thirty years. So language is constantly evolving, which also makes it hard for us to write
00:09:46.210 programs that understand human language. And my last example: computers are really bad at sarcasm. I could say
00:09:55.870 "sure, I'd love to help you out with that." Sounds pretty sincere; I'm not being a
00:10:02.350 jerk in this case. But if I said "sure, I'd *love* to help with that," you can tell that
00:10:07.570 I'm being sarcastic. But there's no actual difference in the words; the only difference is in my tone
00:10:13.270 of voice, in my pitch. The meaning of the sentence changes based on how I say it, or on the surrounding context. And
00:10:20.830 despite what we learned from The Hitchhiker's Guide to the Galaxy, computers are really bad at sarcasm. So why is this hard? This
00:10:30.459 is hard because humans use language in weird ways, like sarcasm and exaggeration. It's hard because English
00:10:36.130 is complicated, and all languages are complicated, and they're always changing. But since humans created human languages,
00:10:42.010 we can just simplify all this and say: NLP is hard because humans.
00:10:48.339 So I'm going to dive a little bit into the history of NLP right now. We've been doing this for a really, really long time.
00:10:55.300 Who recognizes those two names, possibly from a calculus class near you? Yeah, both Leibniz and Descartes
00:11:03.140 proposed ways to do algorithmic translation between languages. When I travel, one of my absolute favorite
00:11:09.320 things in the world is the Google Translate app on my phone, where I can take a picture and it tells me what things say. It was fantastic
00:11:15.260 the last time I was in Japan; it is so handy. But we've been working on that very idea since the time of
00:11:22.190 Descartes, so this is not new. It's taken us a really long time to get to a point
00:11:27.350 where it even sort of works. Another good example of NLP is the Turing test. Who thinks they know what the Turing test is?
00:11:33.500 I thought I did too; I was wrong. This is from the 1950s. The way this works
00:11:41.330 is you're testing to see if a machine is intelligent. You have a judge, and they're watching a conversation between
00:11:47.089 a human and a machine. They know that there is one human and one machine, but they don't know which one is which. If
00:11:52.250 the judge can't tell which one is the human and which one is the machine, the machine passes the Turing test. I always
00:11:59.089 was under the impression that it was a human conversing with a machine and not realizing they're talking to a machine, but the actual version proposed
00:12:06.290 by Alan Turing was that the judge, a third party, is trying to figure out which of the two parties in a conversation is a
00:12:11.990 machine. And then there's the other famous example of ELIZA. Who
00:12:17.120 uses Emacs? I use Emacs. Emacs is awesome; you should use Emacs. If you do M-x
00:12:24.380 doctor, it will let you play with it in real time. ELIZA is the computer psychologist, and this is from the early-to-mid 1960s.
00:12:30.910 It's surprisingly good considering how little code is actually there, and the
00:12:36.050 fact that it's written in Lisp. ELIZA was really the first of the chat bots. Who has written a chat bot, or a
00:12:42.740 Twitter bot, or a Slack bot, or something? I wrote one; it went and pulled the cute
00:12:47.959 stuff from /r/aww on Reddit and posted it into the channel. It was fantastic.
00:12:53.320 So, I have talked a lot about history, I've talked a lot about theory, so now I'm actually going to talk about
00:12:59.139 some code. One of the things about me when I get on stage: I really hate giving big practical and
00:13:05.800 useful examples, because I'm afraid that people are going to go use the stuff that I crammed onto a slide and think it's
00:13:10.959 production worthy. It's not. It's never production worthy; production-worthy code doesn't fit on slides. So I'm going to
00:13:17.949 give you impractical examples. The first one is Twitter. At RubyConf in the US in 2015, I gave one of the keynotes, and the
00:13:25.930 talk was called "Stupid Ideas for Many Computers." That talk was actually
00:13:31.389 based on a talk that Aaron and Ryan did in 2011, something like that, when they were
00:13:38.889 writing the fail bot together, and I wanted to take all of those ideas and try to do stupid computing at scale. So
00:13:46.810 in the talk I demonstrated how I could do sentiment analysis of tweets by scoring the emoji that they contained. It
00:13:53.920 was a really bad idea. Let me make this clear: this is a horrible idea. So, sentiment analysis
00:13:59.230 is the process of computationally identifying and categorizing opinions expressed in a piece of text,
00:14:06.819 especially in order to determine whether the writer's attitude toward a particular topic, product, etc. is positive,
00:14:12.069 negative, or neutral. This is something that a lot of brands use on their social media streams to figure out how people
00:14:17.860 are feeling about their product, either in response to an ad campaign, or, you know, airlines in response to storms
00:14:24.069 that are causing delays, things like that. I decided to use emoji because
00:14:29.709 real sentiment analysis is really hard. So I made a scale: on one end you have the
00:14:35.709 purple angry guy, and he's negative 30, and then there's the poop emoji, which is negative 15, and the smiley face is
00:14:43.149 positive 30, and so on. I used emoji because it's much, much easier than
00:14:48.610 actually doing the analysis. But it turns out that since I gave that talk, we've gotten access to better
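The emoji scale described here can be sketched as a lookup table plus a summing scorer. The exact table is an assumption; the talk only names three of the scores:

```ruby
# A toy emoji-based "sentiment" scorer in the spirit of the talk. The score
# table is a guess beyond the three values named in the talk
# (angry = -30, poop = -15, smiley = +30).
EMOJI_SCORES = {
  "😠" => -30,
  "💩" => -15,
  "😀" => 30,
  "🎉" => 20,
}.freeze

# Sum the scores of every known emoji character that appears in the tweet.
def emoji_sentiment(tweet)
  tweet.each_char.sum { |c| EMOJI_SCORES.fetch(c, 0) }
end

emoji_sentiment("RubyConf MY is great 😀🎉") # => 50
emoji_sentiment("my build is broken 💩")     # => -15
```

As the talk says, this is a horrible idea for real sentiment work: it ignores every word in the tweet, but it is trivially cheap and fast.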
00:14:54.370 tools. One of the things is, I work at Google, and we released the Natural Language API, so I'm going to show you how
00:15:00.519 this would work with that instead of my really horrible emoji-based system.
00:15:05.769 You can get access to this: gem install google-cloud-language. It's currently in alpha; it's going beta
00:15:11.290 shortly. Yes, I have been told it was actually supposed to be beta by today, but we found something we didn't like, so
00:15:17.620 we're fixing it. The code's actually pretty straightforward: you require it, you create a new language
00:15:23.560 object, and then I've got this analyze method that takes in the text of a tweet, and I'm like, okay, language, your document
00:15:31.030 is the text for this tweet. And then I'm like, hey, go give me the sentiment, and it goes off to the server, and the server is like, here's your sentiment. The
00:15:37.360 sentiment has two things: it has score and magnitude. We're going to just
00:15:43.030 look at score, so that's what I care about, and it gives me a number between negative 1 and 1. It's fantastic.
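The shape of that analyze method can be sketched as below. Since the real call needs the google-cloud-language gem plus credentials, a stub client stands in for the API here; with the real gem you would `require "google/cloud/language"` and build the client with `Google::Cloud::Language.new` instead (the gem was in alpha at the time of the talk, so treat the exact method names as assumptions and check the current docs):

```ruby
require "ostruct"

# Stand-in for the Google Cloud Natural Language client so this sketch runs
# without credentials. The real client's document(...).sentiment call has
# roughly this shape.
class StubLanguageClient
  def document(text)
    OpenStruct.new(sentiment: OpenStruct.new(score: 0.8, magnitude: 1.2))
  end
end

# The "three lines" from the talk: wrap the tweet text in a document, ask
# the service for sentiment, and read the score, a float in -1.0 .. 1.0.
def analyze(language, tweet_text)
  document  = language.document(tweet_text)
  sentiment = document.sentiment
  sentiment.score
end

puts analyze(StubLanguageClient.new, "I love RubyConf!") # => 0.8 (stubbed)
```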
00:15:49.450 I'm going to vigorously hand-wave over how to do this at scale. There is a small distributed system that involves a thing
00:15:55.120 called Rinda; it's set up using Kubernetes. If you want to talk to me about it, come find me afterwards and I'll walk you through it. You can also watch the talk; the
00:16:01.840 video's up on Confreaks, and I actually walk through the whole architecture, and the source code's all online. If you want to try to analyze a
00:16:08.590 bunch of tweets, sure, do that, awesome. But
00:16:13.810 the big thing is that I took what used to be about 30 lines of code, and I've managed to take it down to three
00:16:19.150 lines of code, by using a model that's been trained to do sentiment analysis by
00:16:24.190 someone else. Yeah, I could write sentiment analysis code in Ruby, but I'm lazy, I'm fundamentally lazy, and
00:16:31.750 I like to keep things easy, so I'm using someone else's library for it. The other example I have for you today is sentence
00:16:37.180 diagramming. When I was in school, I had to do a lot of things that looked like this. I had to figure out what the
00:16:43.750 subject and the verb of a sentence were; I had to separate them with a big line. If there was a direct object, that was half a
00:16:49.510 line, and other words went at crazy angles. This was something that I did in
00:16:55.630 grade 7, all year, no matter how much I hated it. I don't know if it was technically
00:17:01.480 useful; it was just, you know, part of school where I went to school. One of my friends described all of
00:17:09.339 this as kind of doing an abstract syntax tree of English
00:17:15.550 grammar. So, I was talking to some folks about how excited I was about giving this
00:17:21.130 talk, and they're like, I don't actually remember any grammar at all. So I'm going to, you know, do a quick side quest.
00:17:27.909 My guess is that this will be a refresher for some of you, and for the rest
00:17:33.279 of you who are like, what, why did your friends not know this stuff? Well, the answer in some cases is because my
00:17:39.010 friends are monolingual, and you don't learn grammar as well unless you know multiple languages, in my experience. So,
00:17:44.380 real quick: parts of speech. This is one of the ways we understand words; we label
00:17:49.809 them with what they do. So we have verbs. Verbs are the most important part
00:17:55.389 of a sentence; you can't have a sentence without a verb. One type of verb is an action, like jump. You can also have a state
00:18:02.710 of being, like think or thinking. Nouns: nouns are
00:18:08.409 a person, like Matz or Alan Turing; a place, like Malaysia, or a bathroom, or a
00:18:16.450 building; or a thing, like a bird or a goat.
00:18:22.330 I met this goat when I went hiking; it was pretty cool. But nouns can also be
00:18:27.340 ideas. You may have heard the phrase "abstract noun"; some examples are democracy, freedom, love.
00:18:34.600 Those are all abstract nouns. You also have adjectives; they describe or
00:18:40.480 modify other words, usually nouns (it gets complicated). There are things like attributes, blue or small, and they also help
00:18:47.620 us compare, like near and far. You also have articles, a, an, and the, which
00:18:54.850 are sort of adjectives and sort of not. And they no longer call them articles; they now call them determiners,
00:19:01.299 for reasons that I don't understand. Determiners also include this and that: all articles are determiners, but not all
00:19:08.080 determiners are articles. So, yeah, those were all the parts of speech that I care about for today. We also have the parts
00:19:13.120 of a sentence. The root: this is the only required part of a sentence, which means it's the verb. Then we have the subject,
00:19:20.139 which is the thing that does the verb: the gorilla thinks, or Matz speaks. And then
00:19:27.130 there's the direct object, which is the thing that the verb happens to. So: the cat eats fish. Cat is the subject, the
00:19:35.559 root is eats, and the direct object is fish. Side quest complete! So, back to
00:19:42.910 sentence diagramming. I promised this was actually important. This is basically
00:19:48.910 how a rough sentence diagram works, for the kind I used. In order to draw these diagrams, I need to figure out which part
00:19:55.210 of speech or part of the sentence each word is. To do that, I need to use syntax. The Natural Language API has a method
00:20:02.350 called syntax. So: my normal boilerplate, and then I have a document. I'm going to
00:20:08.020 tell it that we're going to work on the sentence "the cat ate fish," and I go, hey, document, give me the syntax of that, and
00:20:14.410 then I have it print out the tokens, and I get this crazy pile of stuff. This is the token for the word "cat," and there's
00:20:21.429 way more stuff here than matters. Like, there's the idea of grammatical gender; English doesn't have grammatical gender,
00:20:26.980 so that's kind of irrelevant here, because this of course works on multiple languages. But the important thing is
00:20:32.559 that here's the text itself, "cat"; it is four characters into the string, that is its offset. Here's the part of speech: it
00:20:39.820 is a noun, and it is singular. I can also have case, if the language has case, but
00:20:46.270 we're not doing German or other languages with case, so we're not going to do that. And then this is the most important part: this token is labeled
00:20:52.900 nsubj, for nominal subject, which is basically just saying that "cat" is the subject of the sentence. Which is good,
00:20:58.750 because the cat is the thing that eats the fish. So I was able to write some code that
00:21:04.570 created ASCII-art versions of the sentence diagrams, and it's really simple. I'm going to find the token that's marked
00:21:10.059 as the nominal subject and save that as subj; I'm going to find the token that's marked as the root and save that as verb; and then I'm going to do some crazy
00:21:17.380 ASCII art with puts and, you know, some math, and it's awesome. One
00:21:24.100 thing: the last line there is a
00:21:29.380 cool trick I learned. You can multiply a string by an integer, in order to make
00:21:34.900 sure that everything is spaced properly, in this case. So, rocking ASCII art.
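The talk's slide code isn't in the transcript, so here is a minimal sketch of that subject-and-verb ASCII art, including the string-times-integer trick on the last line (the layout is a guess):

```ruby
# Print a bare-bones sentence diagram: subject | verb, with a rule under it.
def diagram(subj, verb)
  top = " #{subj} | #{verb}"
  # Multiplying a string by an integer repeats it, which makes it easy to
  # draw a line exactly as wide as the text above it.
  bottom = "-" * top.length
  [top, bottom]
end

puts diagram("cat", "ate")
# prints:
#  cat | ate
# ----------
```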
00:21:40.929 But that's kind of boring, because I don't have all the words yet. So now I handle direct objects: I go find the direct object in the tokens,
00:21:47.890 save that off, and change my ASCII art up a little bit, and then I get "cat ate fish." Oh wait, I'm missing the "the." So how
00:21:55.630 do I include "the"? I have to actually look at the results from the
00:22:00.790 Natural Language API, where I get a head token index for each word. This is the index of the parent of the current token.
00:22:07.870 So this is the token for "the"; its head token index is one. If I look at the
00:22:14.230 array of tokens (the, cat, ate, fish, period), the thing at index one is "cat." This is
00:22:22.690 telling us that "the" refers to "cat." So I wrote some really bad code:
00:22:28.950 take the tokens, go through them all, and if a token's head index is the subject's index, I'm going to
00:22:36.340 print the text of that token. And with that, I can make this diagram, with
00:22:43.360 all of my words. Yay! So, let's make this a little more challenging.
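The head-index walk just described can be sketched with plain hashes standing in for the API's token objects (the field names here are simplified assumptions for illustration, not the gem's real accessors):

```ruby
# Tokens for "The cat ate fish." with each token's head index, mimicking the
# head token index the API returns: the index of the token's parent.
TOKENS = [
  { text: "The",  head: 1 }, # "The" attaches to "cat"
  { text: "cat",  head: 2 }, # "cat" is the subject of "ate"
  { text: "ate",  head: 2 }, # the root's head is its own index
  { text: "fish", head: 2 }, # "fish" is the direct object of "ate"
  { text: ".",    head: 2 },
].freeze

# Walk all tokens and collect the words whose head is the token at `index`,
# skipping the token itself (so the root doesn't report itself).
def attached_to(tokens, index)
  tokens.each_with_index
        .select { |tok, i| tok[:head] == index && i != index }
        .map { |tok, _| tok[:text] }
end

attached_to(TOKENS, 1) # => ["The"]              (what hangs off the subject)
attached_to(TOKENS, 2) # => ["cat", "fish", "."] (what hangs off the root)
```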
00:22:48.580 Now I have "the cat ate fish with a side of milk." Yeah, my code doesn't work at all
00:22:54.070 on that, at all, even a little bit. So at this point I jumped ship and switched to
00:22:59.230 my old friend graph. This was actually the gem I gave my very first conference talk about, in 2007 or something
00:23:05.770 like that. The graph gem is a gem that makes creating node-and-edge graphs easy,
00:23:12.610 graphs like graph theory, not graphs like bar charts. It provides a DSL in Ruby to
00:23:19.059 create dot files; dot files are the file format that Graphviz uses to render graphs. Some simple graph stuff: you
00:23:27.460 create nodes by calling a node method, passing an ID and a label; you create edges by calling the edge method with a
00:23:34.120 from and a to, and it draws the arrows. So here's the code; it's actually relatively simple, this is all the code you need.
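The slide code isn't reproduced in the transcript, so here is a dependency-free sketch that emits the same kind of dot text directly rather than using the graph gem's DSL; the token data and names are illustrative assumptions:

```ruby
# Build Graphviz dot text for "the cat ate fish.": one node per token, and
# an edge from each token to its head, except for the root (whose head
# index is its own index).
tokens = [
  { text: "the",  head: 1 },
  { text: "cat",  head: 2 },
  { text: "ate",  head: 2 }, # root: points at itself, so no edge is drawn
  { text: "fish", head: 2 },
  { text: ".",    head: 2 },
]

lines = ["digraph sentence {"]
tokens.each_with_index do |tok, i|
  lines << %(  #{i} [label="#{tok[:text]}"];)
  lines << "  #{i} -> #{tok[:head]};" unless tok[:head] == i
end
lines << "}"
dot = lines.join("\n")
puts dot # feed this text to Graphviz (e.g. `dot -Tpng`) to render the graph
```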
00:23:39.460 Oddly enough, this is some graph boilerplate; this drops me into the graph
00:23:44.860 DSL. I'm going to go through each of the tokens with index; I'm going to make a
00:23:50.110 node, specifying the index and using the token's text span text as the label. And unless its head token index is
00:23:57.630 itself, so unless it is referring to itself (the only thing that does that is the root), I'm going to draw an edge from my
00:24:03.960 node at index i to its head token. And that
00:24:08.970 gives me this: "the" refers to "cat," "cat" refers to "ate," "fish" refers to "ate," and the period refers to "ate" because it's
00:24:15.960 the root. But I can also use my more complicated example of "the cat ate fish with a side of milk," and
00:24:22.769 it works as well, and you can even see that "of milk," the prepositional phrase there, is labeled correctly. Everything is
00:24:30.750 all laid out exactly the way you would expect it to be. So, I showed you some really silly examples today, but there
00:24:37.799 are lots of practical uses for NLP: handling customer feedback, better understanding language, summarizing
00:24:42.840 things for humans to read. I'm sure that some of you have your own ideas on how you could use this at home. If you
00:24:50.880 just want to dip your toe into NLP, the Google Natural Language API is a good place to start. You don't have to go learn all the algorithms; you
00:24:56.880 don't have to build separate models for different languages, since many, many languages are already included, and the
00:25:04.710 first 5,000 requests a month are free. Getting started is easy: just install the gem and you can experiment. I
00:25:11.309 actually ran the Jabberwocky through it, just to see what it would do, because I'm at heart a tester, and the
00:25:18.659 syntax analysis I got out is exactly correct. So it's not based on just vocabulary;
00:25:23.909 it's based on the structure of the language, and endings, and all sorts of other things like that. So, thank you very
00:25:30.179 much for having me. I like dinosaurs, so I have a ton of dinosaur stickers and
00:25:36.659 a bunch of Google stickers of various kinds in my bag. I don't want to take them back home to the US, so come
00:25:43.380 get them from me. Thank you.
00:25:48.390 Thank you so much, Aja. All right, we're going to take some questions.
00:25:53.410 All right, come on, what questions do you have? The first two
00:25:58.840 speakers didn't get a lot of questions, so I'm going to promise you that I will show you a picture of a cat if you ask a question. Hard to say no to that. There's
00:26:09.190 an obvious question, mm-hmm, come on. We've got Aaron first. "Can I see a cat?" Thank
00:26:20.290 you, Aaron. There you go. Do you have a real question?
00:26:25.900 Thanks, Aaron. So, you said that the English
00:26:31.660 language is bad, right, it's horrible. But if a super
00:26:37.420 intelligent AI is able to create a language that it will use to communicate, how
00:26:44.380 do you think that would look? So the question is: English is hard, so what would a super
00:26:52.300 intelligent AI that could actually create language look like? Because you'll notice that everything I talked about today was analyzing human language; it
00:27:00.100 wasn't creating spontaneous language. How would that super intelligent AI look? I don't know, actually. I know that we're
00:27:06.820 getting closer and closer to having really complicated AI, but at the end of the day (and I've been working on a blog
00:27:13.090 series on the basics of machine learning), we are still at this point, for the
00:27:19.480 most part, blocked by our datasets and blocked by our ability to create algorithms. There's a really interesting
00:27:25.150 field that's coming up in algorithmic bias, and how we're limited by the data that we have access to. So I actually
00:27:31.810 have no idea what that would look like, and I'm a little bit afraid, because there have been so many horror movies written about
00:27:37.000 that.
00:27:45.380 At a certain point they just stopped conversing in
00:27:50.460 English, and instead of saying "give me three pieces," they would say "me, me,
00:27:55.650 me," repeated three times, to the opponent bot. You
00:28:01.440 said you watched that in a movie? Okay. All right, okey-dokey. One
00:28:13.140 last question. Oh, sorry, yeah, the person in blue; he was
00:28:18.810 pointing at you.
00:28:31.990 So the question was: am I using Ruby itself for doing any of the
00:28:38.000 natural language processing, or am I just using it to consume an API? I've played a little bit with implementing things.
00:28:49.750 Ruby doesn't have all the libraries that something like Python or Java has built in, and I don't have a PhD in machine
00:28:56.540 learning, which makes me a little sad sometimes, so I don't necessarily know the right set
00:29:02.660 of tools. So I chose to use the API because it was faster and easier, and the
00:29:07.940 best part is it's getting better. All the machine learning APIs that we've released are getting better and
00:29:14.030 better and better. Like, the Vision one can identify breeds of cats and dogs. Why?
00:29:19.040 Because people wanted that, so, okay, we added it. And that means that I'm not responsible for maintaining it as
00:29:25.190 language advances and as technology advances. I do want to play with it; I actually just did a blog post on some
00:29:31.940 basic machine learning techniques in Ruby. I've got k-nearest neighbors, I've got basic linear regression, and there are
00:29:37.010 going to be a couple more, showing that you can actually use Ruby for this stuff. But if there's something that already exists, coming
00:29:43.700 back to my being fundamentally lazy, I'm going to use the lazy thing. That's my cat Emma. More cats!
00:29:52.450 Do we have any final question? One last one, yes sir. So, in English you can say
00:29:59.930 the same thing with different structures, right? So can this Google Cloud
00:30:05.420 Language API understand intent, if you tell it what you are saying in different ways?
00:30:12.340 So, the Google Cloud Language API just takes text. We also have a Speech API, and
00:30:20.210 there are probably ways to hook those two together to try to understand intent, but most of what it can use is context. So
00:30:26.690 you can have it understand a single sentence, and it may get it wrong. One of my favorite examples is I had it do "bunnies hop" and it didn't understand
00:30:34.640 that "bunnies" was a noun in that context, but if I said "the bunnies hop" and put a period on the end, it all of a sudden
00:30:40.670 understood it, because I had additional context. Languages are hard, and computers are
00:30:46.410 actually really, really dumb; they're just really, really fast at doing dumb things over and over and over again. So, okay, my
00:30:52.950 last kitten, there you go. Thank you so much, everyone. A hand for the cats and dinosaurs!