Summarized using AI

Detecting and classifying object images using Ruby

Fabio Leandro Janiszevski • November 15, 2024 • Chicago, IL • Talk

Introduction

In the talk "Detecting and Classifying Object Images Using Ruby," Fabio Leandro Janiszevski shows how to do digital image processing from Ruby rather than from the Python tooling that dominates the field. The presentation, given at RubyConf 2024, demonstrates how Ruby can drive object detection and deep-learning classification by calling existing Python libraries.

Key Points

  • Presenter Background: Fabio introduces himself as a software engineer from Brazil who works at Codeminer42, a software boutique focused mainly on Ruby on Rails web applications. The image processing work comes from his master's degree research.

  • Digital Image Processing Overview: The discussion begins with an overview of digital image processing essentials, including how images can be represented using arrays for different color models (grayscale and RGB).

  • Choice of Tools: Janiszevski compares two major libraries, ImageMagick and OpenCV. A search of academic publications turned up about 17 papers using ImageMagick against more than 6,000 using OpenCV, so he chose OpenCV. Because the Ruby wrapper for OpenCV had not seen a commit in roughly seven or eight years, he turned to PyCall, which lets Ruby call Python libraries directly.

  • Research Application: Fabio describes his master’s research, which involves capturing soil images using drones, and processing these to create heat maps of chemical data. He explains the transformation of raw images into usable data to classify soil features using deep learning.

  • Implementation Examples: The presenter outlines practical examples using OpenCV through Ruby with PyCall, detailing how to read, modify, and analyze images, including conversion to grayscale and handling multi-band images for vegetation indexes.

  • Deep Learning Techniques: The discussion shifts to using deep learning frameworks like Caffe for classification tasks. Fabio explains how to set up the model, define layers, evaluate results, and the importance of preparing datasets for training models.

  • Research Findings: He reports his best results so far: roughly 64% for potassium, 74 to 75% for phosphorus, and 74% for organic matter in soil.

  • Future Directions: Moving forward, Fabio suggests trying TensorFlow as an alternative deep learning framework and experimenting with Ruby Julia to call the Julia language, and expresses a desire to advance Ruby's capabilities in AI and machine learning.
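The array representation and grayscale conversion described in the key points can be sketched in plain Ruby, without OpenCV. This is only an illustration: the `to_grayscale` helper, the sample pixels, and the ITU-R BT.601 luma weights are assumptions, not code from the talk.

```ruby
# A 2x2 RGB image as nested arrays: rows -> pixels -> [r, g, b], each 0..255.
rgb_image = [
  [[255, 0, 0], [0, 255, 0]],
  [[0, 0, 255], [255, 255, 255]]
]

# Collapse each [r, g, b] pixel into one value.
# The weights are the ITU-R BT.601 luma coefficients (an assumption here).
def to_grayscale(image)
  image.map do |row|
    row.map { |(r, g, b)| (0.299 * r + 0.587 * g + 0.114 * b).round }
  end
end

gray_image = to_grayscale(rgb_image)
# gray_image is [[76, 150], [29, 255]]: one value per pixel instead of three.
```

A grayscale image keeps the same row and column layout; only the innermost array of three channel values becomes a single number.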

Conclusion

Fabio concludes that while Ruby has great potential, it currently lags behind other languages like Python in the field of AI. He encourages the community to adopt and adapt existing solutions to enhance Ruby's functionality in image processing and machine learning applications, fostering collaboration and innovation in the Ruby ecosystem.


When we discuss Digital Image Processing, we always encounter programming
languages other than Ruby. Today, Ruby will rise in this topic! I'll discuss
the Ruby implementations that helped me write code to detect objects in a tiled
image and then classify them using deep learning techniques. AI is right there!

Repo: https://github.com/fabiosammy/rubyconf2024

RubyConf 2024

00:00:15.400 so good morning uh everyone so let's
00:00:18.039 talk about detecting and classifying object
00:00:20.119 images using Ruby uh about myself so my
00:00:24.240 name is Fabio Leandro uh you can find me
00:00:27.160 using uh my nickname uh fabiosammy uh I'm
00:00:31.720 from Brazil so English is not my first
00:00:35.040 uh language so uh feedback is welcome
00:00:38.800 please uh and also I am a software
00:00:41.239 engineer at Codeminer42 the company uh
00:00:44.520 at Codeminer42 we are a software boutique uh
00:00:48.399 company that can handle and help you
00:00:51.719 with any kind of web application mainly
00:00:55.840 uh Ruby on Rails applications but feel
00:00:58.440 free to reach us and contact us if
00:01:01.320 you need some help in your team or in
00:01:03.920 your web application uh all the
00:01:07.400 examples that I'm using today um are
00:01:10.560 available at
00:01:12.680 my uh GitHub page uh in the repo called
00:01:16.720 rubyconf2024
00:01:18.320 uh the QR code points to the
00:01:22.200 repo uh so I will talk about my
00:01:25.079 research overview uh about the digital
00:01:28.520 image processing Ruby options that we
00:01:31.159 had at the time uh also object detection
00:01:34.920 supervised classification and what we're
00:01:38.000 going to try to do and what else
00:01:40.680 you can also try okay so my master's
00:01:44.439 degree research is basically um we
00:01:49.000 collect some soil samples in the field
00:01:51.600 in the farms or whatever the place is we
00:01:54.600 send them to the lab they send us uh the
00:01:57.479 chemical information about this soil
00:01:59.520 sample
00:02:00.920 and with that we also launch a drone
00:02:04.039 with multiple cameras and sensors to
00:02:06.799 capture uh the soil
00:02:09.319 images uh we collect the images apply
00:02:13.120 some formulas to them and put them into a
00:02:17.200 CNN and try to get a heat map of the
00:02:20.680 chemicals that are in the
00:02:22.920 soil so basically we have
00:02:25.640 like 10,000 images we try to glue
00:02:30.280 those images which is called image stitching
00:02:34.239 to have a mosaic like a map of the field
00:02:38.560 and after that we crop the image and
00:02:41.720 create some images from that and this is
00:02:44.400 the process that I'll talk about today so
00:02:48.120 digital image processing
00:02:50.440 everything that we do in programming is
00:02:53.480 not about the language it is about the
00:02:55.640 process so for digital image processing
00:02:58.920 basically we have
00:03:00.840 a way to represent the image uh and
00:03:05.360 most of the time we are using arrays
00:03:07.680 to do that um like uh a grayscale
00:03:13.640 image can be represented by a single
00:03:15.799 value uh for each pixel and
00:03:19.599 for an RGB image we have like for each
00:03:23.799 color band we have a different value uh
00:03:26.680 and also those values can depend on
00:03:29.000 the color space and beyond so basically
00:03:32.760 the pattern is uh to handle
00:03:36.480 digital images in any programming
00:03:40.080 language we can use this kind of
00:03:43.720 process uh and today we have like
00:03:47.400 two major projects that are focused on
00:03:50.480 digital image processing that are called
00:03:52.879 ImageMagick and OpenCV uh ImageMagick
00:03:56.560 is more focused on the uh image per se
00:04:01.239 like cropping doing some blur and this
00:04:04.599 kind of stuff and OpenCV has the
00:04:07.360 digital image processing there but also
00:04:09.879 is used for uh computer vision like
00:04:13.680 detecting uh sub-images and
00:04:17.120 this kind of stuff and since I'm from
00:04:20.280 academia and a master's degree student uh
00:04:23.919 we need to look at uh publications
00:04:29.560 and see what is going on in the
00:04:32.039 academic field uh if you look at Elsevier and
00:04:37.759 IEEE uh publications uh you are going to
00:04:41.639 get like uh 17 publications
00:04:46.800 in the last four years using ImageMagick
00:04:49.560 but with OpenCV it is more than 6,000
00:04:53.919 publications in the last uh five
00:04:57.000 years uh so we chose to use OpenCV
00:05:04.160 for our
00:05:07.520 research um okay and OpenCV has a
00:05:11.919 bunch of wrappers um that you can use
00:05:15.759 from C uh Python um Java and also Ruby but
00:05:22.520 sadly the Ruby wrapper is too old
00:05:26.440 it's like the last commit was I don't
00:05:29.360 know eight years ago maybe seven uh so
00:05:34.160 we had some choices to make uh we needed to
00:05:37.600 decide if we were going to keep up with the
00:05:39.800 Ruby wrapper or I don't know go to
00:05:43.560 ImageMagick uh the ImageMagick gem is
00:05:47.479 kept up to date uh maybe create our
00:05:51.039 own implementation since it is just a
00:05:53.199 process or use other OpenCV implementations
00:05:57.240 from
00:05:58.160 Ruby uh so we had to make a choice the
00:06:01.560 other three were not uh the
00:06:05.560 fastest way for us uh to reproduce some
00:06:09.039 articles
00:06:11.280 so we found
00:06:13.800 PyCall please uh don't leave yet I
00:06:19.280 know uh but yeah uh PyCall is to call Python
00:06:24.120 from Ruby yes uh the author is Kenta Murata
00:06:27.759 and the first release was in 2016
00:06:33.160 and the latest release is from May of
00:06:37.000 this
00:06:37.919 year okay so our problems are
00:06:43.039 resolved okay so to use PyCall is as simple as
00:06:46.960 that uh you can uh load pycall
00:06:50.360 import include the methods and use a
00:06:53.360 method called pyimport uh pyimport
00:06:57.039 will uh import the package from Python
00:06:59.759 so you need to uh set up the entire
00:07:03.160 environment the entire Python
00:07:05.160 environment and install the Python
00:07:07.199 packages that you're going
00:07:08.879 to use in my repo uh I put a Dockerfile
00:07:12.800 there that creates the whole
00:07:16.319 Python environment and installs like
00:07:18.520 OpenCV and other tools that uh I will
00:07:22.160 show you
00:07:23.319 today okay uh this is an example uh on
00:07:28.720 how you can
00:07:30.160 uh do a hello world using PyCall
00:07:33.120 so the print is executed by
00:07:37.160 Python and you have the object since the
00:07:40.400 print is not like a value it is just uh a
00:07:44.520 stdout print uh the value is nil on our
00:07:50.720 side okay so we have Python now
00:07:54.919 let's try OpenCV uh to load
00:07:58.159 OpenCV uh in Python it is called
00:08:01.080 cv2 and to load the images is as simple as
00:08:05.960 that uh you can call the imread uh
00:08:10.520 method from Python and as I said before it is
00:08:14.520 represented by
00:08:16.440 arrays um so this is an example of an RGB
00:08:20.919 image um well in OpenCV it is not RGB uh
00:08:27.560 OpenCV loads it as BGR so it swaps the um the
00:08:33.760 color channels long story short it is about
00:08:36.880 history and legacy from the early
00:08:40.719 2000s okay so like the first pixel
00:08:46.519 is represented by an array with those values
00:08:49.760 uh and uh the range of values uh by default
00:08:53.920 is 0 to 255 and it is BGR so this
00:08:59.440 first position uh represents the the
00:09:02.519 blue color uh green color and red color
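The BGR ordering just described can be illustrated in plain Ruby. This is a minimal sketch, not the talk's code: the pixel values and the `bgr_to_rgb` helper are hypothetical, and the image is modeled as nested Ruby arrays rather than the array object that OpenCV returns through PyCall.

```ruby
# One pixel in OpenCV's default channel order: [blue, green, red], each 0..255.
bgr_pixel = [200, 120, 30]

# Reversing a three-channel pixel swaps blue and red, giving [red, green, blue].
def bgr_to_rgb(image)
  image.map { |row| row.map(&:reverse) }
end

# A 1x2 image: the sample pixel followed by a pure red pixel in BGR order.
bgr_image = [[bgr_pixel, [0, 0, 255]]]
rgb_image = bgr_to_rgb(bgr_image)
# rgb_image is [[[30, 120, 200], [255, 0, 0]]]
```

Forgetting this swap is a common source of images whose reds and blues look exchanged when they are passed to a tool that expects RGB.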
00:09:06.399 and for you to convert the BGR image
00:09:10.800 to a grayscale image the cv2 cvtColor
00:09:15.000 is there so it's a method from
00:09:17.640 OpenCV you can call that and all of
00:09:20.959 this is executed by Python but you can
00:09:24.079 have the um the value in Ruby so this is
00:09:28.720 uh an
00:09:31.200 IRB uh shell uh that is calling it
00:09:35.839 through Ruby so the value at the end is
00:09:40.519 161 so it's just one value for
00:09:43.360 the gray pixel and there you have a
00:09:47.079 grayscale image as simple as
00:09:50.320 that uh because we have OpenCV
00:09:53.519 working for
00:09:54.720 us okay so this is another example that
00:09:58.480 you can just resize the image draw a
00:10:01.399 rectangle in there and write the new
00:10:04.000 image uh using the object from from
00:10:08.399 Ruby okay so now we have a new image
00:10:11.800 there that's
00:10:13.959 great and this is
00:10:18.920 the the the option that we have that uh
00:10:23.959 we are TR about it because we we can
00:10:27.720 separate all the pixels
00:10:31.680 uh by their values like split the image
00:10:35.320 uh so I have an image that represents the
00:10:38.200 blue value the green value and the red
00:10:40.240 value uh for a BGR image it is as simple
00:10:43.600 as that uh but for our research we are using
00:10:47.200 like multispectral images with 10 color bands in
00:10:50.839 a single image like
00:10:53.880 infrared near-infrared and this kind
00:10:55.920 of stuff and with that uh sorry with
00:11:00.079 that we can apply some uh some formulas
00:11:03.519 that we call vegetation
00:11:05.720 indexes uh that are basically some
00:11:08.000 formulas that the academic field uh
00:11:11.399 provides for us uh
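As an illustration of a vegetation index computed from separate bands, here is a minimal pure-Ruby sketch. The talk does not say which indexes it uses at this point, so NDVI, defined as (NIR - Red) / (NIR + Red), is an assumption chosen because it is widely used; the band values and the `ndvi` helper are hypothetical.

```ruby
# Hypothetical 2x2 reflectance bands (values in 0.0..1.0).
nir = [[0.8, 0.6], [0.5, 0.0]]
red = [[0.2, 0.2], [0.5, 0.0]]

# NDVI per pixel: (nir - red) / (nir + red), in the range -1.0..1.0.
# Pixels where both bands are zero are set to 0.0 to avoid dividing by zero.
def ndvi(nir_band, red_band)
  nir_band.zip(red_band).map do |nir_row, red_row|
    nir_row.zip(red_row).map do |n, r|
      sum = n + r
      sum.zero? ? 0.0 : (n - r) / sum
    end
  end
end

index = ndvi(nir, red)
```

The result has the same shape as each input band, so it can be treated as a new single-band image and passed to later steps such as thresholding or color mapping.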
00:11:15.079 and man we can create some some just
00:11:19.440 doing this kind of stuff using Ruby this
00:11:22.959 this is awesome for us uh so basically
00:11:26.240 uh oh and at the end we can apply a
00:11:28.800 color map so uh the raw image uh is
00:11:32.920 represented by that like a binary image
00:11:35.720 just zeros and ones but with a color
00:11:39.920 map we can have a human uh view to
00:11:46.959 identify whatever we want to identify in
00:11:49.760 the image uh
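The idea of a color map can be sketched in plain Ruby. This is not OpenCV's color map implementation: the `colorize` and `apply_color_map` helpers and the blue-to-red ramp are hypothetical, chosen only to show how single values become colors a person can read.

```ruby
# Map a value in 0.0..1.0 to an [r, g, b] color: 0.0 is blue, 1.0 is red.
# Values outside the range are clamped first.
def colorize(value)
  v = value.clamp(0.0, 1.0)
  [(255 * v).round, 0, (255 * (1.0 - v)).round]
end

# Replace every single-value pixel with its color.
def apply_color_map(image)
  image.map { |row| row.map { |value| colorize(value) } }
end

raw = [[0.0, 0.5], [1.0, 0.25]]
colored = apply_color_map(raw)
# colored is [[[0, 0, 255], [128, 0, 128]], [[255, 0, 0], [64, 0, 191]]]
```

The underlying values do not change; the color map only affects how they are displayed.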
00:11:54.120 for like for the math and this
00:11:58.440 kind of stuff they are the same values but
00:12:01.320 it is more like for a human being to see the
00:12:04.800 image and we can see whatever we want to
00:12:08.800 see and this is the kind of stuff that we
00:12:12.440 can grab uh apply multiple uh vegetation
00:12:15.760 index formulas so this is the same
00:12:18.199 image and we can have like um multiple
00:12:22.880 information based on a single
00:12:25.279 part of the image applying
00:12:28.560 multiple formulas in there all the
00:12:30.920 formulas that I use in these examples
00:12:33.040 are also in the
00:12:35.800 repo uh so that is great we can use PyCall
00:12:39.760 to just load OpenCV but not uh
00:12:43.839 just OpenCV uh we can use uh
00:12:48.519 some algorithms to detect
00:12:52.079 objects uh we usually
00:12:57.800 use supervised
00:12:59.920 um AI you know to classify the images uh
00:13:04.240 it is not like fully automatic AI but it is
00:13:06.920 more like oh I need you
00:13:11.279 to focus on this and this kind of
00:13:14.160 classes so to do that we need to
00:13:17.360 generate some objects there and to
00:13:20.240 generate objects we can also use OpenCV
00:13:23.720 like finding contours uh you can
00:13:26.720 apply a threshold that will
00:13:29.839 uh find like some edges of the pixels
00:13:33.360 like the color space and this kind of
00:13:36.040 stuff so as I said before OpenCV
00:13:39.480 provides all of this for us and we can
00:13:43.279 have like an image like this where
00:13:46.480 all the green stuff are objects
00:13:49.600 in the
00:13:50.800 image uh I applied this example to the
00:13:54.279 RGB image but if I use like the um the
00:13:58.759 vegetation index image I can have this
00:14:02.920 kind of objects so that's why we can
00:14:06.839 apply some kind of stuff like
00:14:09.519 that and also we
00:14:11.920 have other algorithms that can uh that
00:14:16.399 can be used like for example sliding window is
00:14:18.720 another algorithm from academia that is
00:14:21.600 just a slide that will crop multiple
00:14:24.519 images from the image that you
00:14:27.720 have and selective search is also
00:14:31.079 another way to generate your
00:14:33.880 objects uh all of those algorithms
00:14:38.440 are available in my
00:14:42.440 repo okay
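The sliding window idea mentioned above (slide a fixed-size window over the image and crop at each position) can be sketched in plain Ruby. This is a hedged illustration, not the implementation from the repo: the `sliding_windows` helper, its `size` and `stride` parameters, and the 4x4 sample image are all hypothetical.

```ruby
# Collect every size x size crop of the image, moving stride pixels at a time.
# Each entry records the top-left corner (x, y) and the cropped pixels.
def sliding_windows(image, size:, stride:)
  height = image.length
  width = image.first.length
  windows = []
  (0..height - size).step(stride) do |y|
    (0..width - size).step(stride) do |x|
      crop = image[y, size].map { |row| row[x, size] }
      windows << { x: x, y: y, pixels: crop }
    end
  end
  windows
end

# A 4x4 single-band image whose pixel values are simply 0..15.
image = Array.new(4) { |y| Array.new(4) { |x| y * 4 + x } }
windows = sliding_windows(image, size: 2, stride: 2)
```

With a stride equal to the window size the crops tile the image without overlap; a smaller stride produces overlapping candidates, which gives more objects to classify at the cost of more work.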
00:14:44.639 good right we have an object I can load it
00:14:48.279 in Ruby I can extract information from
00:14:50.639 there but where is the AI well you
00:14:56.000 can use so many approaches to
00:14:59.800 classify uh
00:15:02.279 images and like you need to see what
00:15:06.440 academia is using uh go check
00:15:10.680 Elsevier and
00:15:12.240 IEEE but today the hype is using deep
00:15:16.240 learning CNN transfer learning
00:15:18.160 Transformers and this kind of stuff uh
00:15:21.000 it's basically the same
00:15:23.639 uh the same approach that LLMs use to
00:15:27.880 achieve uh
00:15:29.680 their results it's not exactly the same
00:15:33.279 way but it's similar uh also for images
00:15:37.600 you can uh check the challenges or
00:15:39.959 benchmarks called ImageNet
00:15:42.720 CIFAR uh and the Kaggle competitions
00:15:47.000 they are like the benchmarks for results
00:15:50.440 of uh classifying
00:15:53.040 images uh
00:15:55.040 okay so uh for this example I'm using
00:15:59.600 the Caffe uh deep learning framework uh it was
00:16:04.240 built by Berkeley
00:16:05.839 University and it's as simple as that uh
00:16:09.680 you load the deploy prototxt this is
00:16:13.639 uh this is a JSON-like file that has the
00:16:16.519 information about the layers and your
00:16:19.920 model the model is basically a huge
00:16:22.079 array with a lot of random values that
00:16:26.560 uh put the weights on the network
00:16:31.000 to get the results and then you can
00:16:34.120 reshape the first layer for this
00:16:37.519 example the data layer is the input
00:16:41.240 uh to insert the image there and
00:16:43.759 basically you can do whatever you want
00:16:46.120 but uh you need to
00:16:49.440 define how many channels you have
00:16:53.480 and the window size of each
00:16:55.720 image and the Caffe framework has a
00:16:58.519 transformer um to like make a pattern
00:17:03.519 for your image like what
00:17:07.039 will be the channels what is the
00:17:09.480 sequence of the color channels uh
00:17:13.959 what will be the values what will be the
00:17:16.079 size and this kind of stuff and you can
00:17:18.839 just call it and preprocess the image
00:17:21.319 right
00:17:22.000 there uh and to perform the
00:17:25.360 evaluation uh of the image you call
00:17:29.200 forward and you're going to have a
00:17:31.400 result uh each model will have its
00:17:35.440 own layers uh to evaluate
00:17:38.880 the results uh for this specific case uh
00:17:42.919 the layer is named softmax and basically
00:17:45.799 we are going to have like uh the label um
00:17:49.919 the label names and
00:17:52.799 the like the accuracy for that label uh for
00:17:57.559 this example um I am just using like a high
00:18:00.840 level of a chemical and a low level of
00:18:03.559 a chemical for this example it is 99% that it
00:18:07.360 is a low level of the chemical that
00:18:10.400 I'm evaluating uh
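To make the softmax output concrete, here is a minimal pure-Ruby sketch of how raw scores become per-label probabilities. It does not call Caffe: the `softmax` helper, the label names, and the raw scores are hypothetical, picked so that the result lands near the 99% mentioned above.

```ruby
# Convert raw scores into probabilities that sum to 1.
# Subtracting the maximum first keeps Math.exp from overflowing.
def softmax(scores)
  max = scores.max
  exps = scores.map { |score| Math.exp(score - max) }
  total = exps.sum
  exps.map { |e| e / total }
end

labels = ["low", "high"] # hypothetical label names
scores = [4.6, 0.0]      # hypothetical raw outputs of the last layer
probabilities = softmax(scores)

# Pair each label with its probability and keep the most likely one.
best_label, best_probability = labels.zip(probabilities).max_by { |_, prob| prob }
```

Here `best_label` is "low" with a probability of roughly 0.99, which is the kind of label-plus-confidence pair the softmax layer reports.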
00:18:13.720 I yeah I have so for this kind of stuff
00:18:17.919 I'm using uh aluminum uh values so it's
00:18:21.480 basically low level and high level of
00:18:25.080 aluminum
00:18:26.799 um if you see the image it seems okay great
00:18:31.720 like you have a roof and maybe uh there is
00:18:34.720 high aluminum there um and you have a
00:18:39.039 road that is low aluminum uh but I
00:18:42.760 grabbed the best examples of that but uh
00:18:47.200 as uh as I said uh sometimes the
00:18:53.400 AI hallucinates those values so yeah
00:18:58.799 we can have like all those scripts using
00:19:01.720 Python where academia provides the
00:19:04.600 implementation for that and you can use them
00:19:08.159 from Ruby and this is great okay I showed
00:19:12.600 you how to use a model but how can I
00:19:15.480 train or build my own model to do that
00:19:18.679 is not a simple task because you need to
00:19:21.480 evaluate that and it's more like you
00:19:25.360 need uh to build your own database and
00:19:29.880 separate the images uh visualize the
00:19:32.760 results and try again try again try
00:19:35.320 again try again try again try again
00:19:37.520 it is a huge loop on that so for such a
00:19:42.760 task we use NVIDIA DIGITS it's a
00:19:46.240 platform built by NVIDIA and it is not uh
00:19:52.320 it's not only for Caffe you can use TensorFlow
00:19:54.960 and other uh other projects
00:19:59.200 to do
00:20:00.159 that
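The "build your own database and separate the images" step can be sketched as a reproducible train and validation split in plain Ruby. This is an assumption about what the separation looks like, not how DIGITS does it internally: the `split_dataset` helper, the 25% validation ratio, the seed, and the file names are all hypothetical.

```ruby
# Shuffle deterministically, then split into training and validation sets.
# A fixed seed keeps the split identical between runs, which matters when
# results from repeated training attempts are compared.
def split_dataset(samples, validation_ratio: 0.25, seed: 42)
  shuffled = samples.shuffle(random: Random.new(seed))
  validation_size = (samples.length * validation_ratio).round
  {
    validation: shuffled.first(validation_size),
    train: shuffled.drop(validation_size)
  }
end

# Hypothetical labeled tiles cropped from the field mosaic.
samples = (1..8).map do |i|
  { path: "tile_#{i}.png", label: i.even? ? "high" : "low" }
end

sets = split_dataset(samples)
```

Every sample lands in exactly one of the two sets, so the validation images are never seen during training.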
00:20:02.679 um okay so I showed you deep
00:20:05.679 learning and some algorithms that you
00:20:08.559 can use for that but also in academia we
00:20:12.200 have a bunch of other algorithms to do
00:20:15.000 that like K SIFT SVM YOLO is another
00:20:19.640 kind of deep learning uh also
00:20:22.679 something as basic as Euclidean distance uh that like
00:20:27.280 it is more like trying them on your uh your
00:20:30.760 problem uh see the results try again
00:20:34.039 change the thing try again see the
00:20:36.360 results and going on going on going on
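Of the simpler approaches listed, Euclidean distance is easy to show in plain Ruby. The sketch below pairs it with a nearest-reference rule, which is an assumption made for illustration: the talk only names the distance, and the `nearest_label` helper and the feature vectors are hypothetical.

```ruby
# Straight-line distance between two feature vectors of equal length.
def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Return the label of the reference whose features are closest to the sample.
def nearest_label(sample, references)
  references.min_by { |ref| euclidean_distance(sample, ref[:features]) }[:label]
end

# Hypothetical reference vectors, for example mean band values per class.
references = [
  { label: "low", features: [0.1, 0.2, 0.1] },
  { label: "high", features: [0.8, 0.9, 0.7] }
]

prediction = nearest_label([0.7, 0.8, 0.9], references)
```

A baseline like this is cheap to run, which makes it a useful first result to compare a deep learning model against.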
00:20:40.159 so for us Caffe uh we have great
00:20:43.720 results using Caffe for images uh but
00:20:48.200 sometimes you are going to use like I don't
00:20:51.159 know uh see to to use for audio some
00:20:56.520 something like that uh um okay and the
00:21:01.280 results for my dissertation just to
00:21:04.159 show you uh so the best uh
00:21:09.840 results that I have until now are for
00:21:12.679 potassium phosphorus and organic matter
00:21:15.880 like I have 64% for potassium 74 or 75%
00:21:21.080 for uh phosphorus and 74% for organic
00:21:25.080 matter like so this is the result that
00:21:28.600 we are trying to achieve uh creating
00:21:31.279 some kind of heat map uh
00:21:35.000 but this is in some way fake because the
00:21:40.480 result is like an image with 10 color
00:21:43.760 bands and we cannot represent that for
00:21:46.799 human
00:21:47.679 view uh so I just applied the color map
00:21:51.080 that I showed
00:21:52.880 before uh also to correlate the image
00:21:56.520 and its position we are using two other
00:21:59.080 packages that are called GDAL and JY uh
00:22:02.919 we have both
00:22:05.400 uh we have similar uh packages uh similar
00:22:09.360 gems that can be used but it's not like what
00:22:14.799 we have in the Python world uh and also a
00:22:18.520 great package to stack these multiple color band
00:22:22.720 images uh we use rasterio
00:22:25.919 uh future steps
00:22:29.279 you can try using TensorFlow uh that is
00:22:32.240 another deep learning framework and is made
00:22:35.039 by Google and has a lot of uh research and
00:22:40.520 has more tools than Caffe provides for you
00:22:44.440 uh also we want to try to use Ruby Julia
00:22:47.679 Julia is another programming language
00:22:49.880 and Ruby Julia is something like PyCall
00:22:52.320 you can call uh the Julia language using
00:22:55.640 python uh but we we need a if work maybe
00:22:59.679 in two years we're going to try that um
00:23:03.799 also we are testing for other
00:23:07.200 elements so I'm a
00:23:11.080 liar uh the message that I can put
00:23:16.559 on the table is more like great Ruby is
00:23:20.679 a great community we know how to build
00:23:23.240 software but we cannot keep
00:23:26.279 uh competing
00:23:29.200 with other fields like uh AI
00:23:34.000 today is in the baby steps of AI uh we
00:23:38.960 have been in baby steps for the last 15 years
00:23:42.080 but okay and it's more like academia
00:23:46.600 is building that and they are using
00:23:50.039 Python uh and we like Ruby you know it's
00:23:55.320 beautiful
00:23:57.200 and I don't want to compare the
00:24:00.880 languages but more like I like to be on
00:24:05.039 Ruby I like to use Ruby uh so PyCall provides
00:24:09.720 a lot of uh opportunities to us uh to
00:24:13.240 bring what they are building
00:24:16.120 to uh our world
00:24:20.640 okay
00:24:22.919 um I think that we have some time
00:24:25.440 for Q&A any questions yes please
00:24:29.440 no no for us it is uh more agriculture but
00:24:33.240 we have some teams that are using it for the
00:24:35.440 environment like they need to uh
00:24:39.320 see where we have some rivers and
00:24:43.360 like determine how many trees we need
00:24:50.080 uh beside the rivers you know uh on
00:24:54.279 the field uh but like if
00:24:57.720 you send the drone to the field you
00:25:00.600 cannot capture the rivers because
00:25:03.520 the trees will cover the
00:25:11.880 view
00:25:13.799 please special drones or like an off the
00:25:16.240 shelf drone no we we usually build our
00:25:19.399 own like sometimes we use uh a Phantom
00:25:23.679 4 and we just uh have a mount to
00:25:27.760 put some cameras in
00:25:30.080 there but usually uh they build our own
00:25:35.679 drones
00:25:37.480 using I don't remember
00:25:39.880 the the framework to do
00:25:44.320 that that's
00:25:46.480 it okay thank you guys feel free to reach
00:25:51.279 me