00:00:11.420
All right, how are we doing, RailsConf? Thank you all for coming.
00:00:18.720
I know there are a lot of other great sessions right now, so I appreciate you being here with me.
00:00:25.380
My name is Andy Glass, and I'm a Brooklyn-based Rubyist.
00:00:32.070
I spend a third of my time as an entrepreneur, I'm a nomad, I'm a maker, and I'm also a Guinness World Record adjudicator.
00:00:38.160
So, I feel I’m perfectly suited for this unusual Rails app track.
00:00:44.219
I’m here to talk about how to create a human-powered API with Ruby on Rails and Mechanical Turk.
00:00:49.350
So first, why are we talking about MTurk? Who has heard of MTurk? Show of hands.
00:00:55.170
Nice! A lot of people. And who’s actually using MTurk?
00:01:01.440
Okay, a few people. Come on in!
00:01:06.990
So yeah, I briefly ran a company that built custom APIs off of Turk to clean data, but it failed miserably.
00:01:12.299
However, I found it to be an interesting enough experience that I wanted to create a talk about it.
00:01:18.360
The gist is that you can integrate with MTurk, which provides a scalable, 24/7, always-available workforce.
00:01:24.479
Though there is some controversy around Mechanical Turk, which we’ll get to.
00:01:31.439
First, I want to express my gratitude to Rails.
00:01:36.750
I owe everything in my career to you all: not to any one of you personally, but to the Rails community at large.
00:01:42.270
I’m so thankful to be a part of this community. It’s an honor to speak at RailsConf for the first time.
00:01:49.799
Being a programmer has given me the financial freedom to live an unusual life.
00:01:57.479
More importantly, it has given me the confidence to pursue unusual pursuits.
00:02:03.869
I believe that's what being a Rails developer taught me—that anything can be accomplished.
00:02:10.080
So what do I owe you? I think I owe you something unusual.
00:02:16.860
You are spending your valuable time with me in this room.
00:02:24.660
After I finish this, you probably won't leave, although feel free to leave if you want.
00:02:30.810
Maybe this talk isn’t really about Mechanical Turk, yet it still is a talk about Mechanical Turk.
00:02:38.010
Maybe it’s not a talk about crowdsourcing, though it kind of is.
00:02:44.280
Or maybe it is just a talk about being an impostor.
00:02:49.470
I know impostor syndrome has been a topic at RailsConf so far.
00:02:54.860
I struggle with it. Some days, I don’t feel like a good enough developer.
00:03:00.840
I often feel like a bad Rails developer who doesn’t deserve to share this stage.
00:03:06.330
I tried to print my company logo on a t-shirt, but I didn’t wear it.
00:03:12.510
But I think we’re in good company.
00:03:19.380
According to Wikipedia, many people suffer from impostor syndrome.
00:03:26.579
The talks about it often explore whether it’s bad or good.
00:03:32.070
I think it’s good.
00:03:38.010
There’s a pretty cool article I checked out that explains why impostor syndrome is beneficial.
00:03:44.340
It states that if you’re interested in personal growth, you’ll continuously push yourself into new experiences.
00:03:52.109
When we are in unfamiliar territory, we often feel less comfortable than when performing familiar tasks.
00:03:58.709
It's about growth, and I believe you need to fake it.
00:04:04.019
What does that mean?
00:04:09.329
The article suggests it’s about the impostor experience, not impostor syndrome.
00:04:15.150
This is something we should expect in our lives as we push our limits.
00:04:22.169
It should be embraced, and we should consider our personal responses to it.
00:04:28.500
We need to realize that we belong, and our successes are not accidents.
00:04:35.580
So, it’s not just about Mechanical Turk. It’s about learning to fake it.
00:04:41.670
We’re going to build a human-powered API.
00:04:47.430
At the end of the day, it’s really just an API, and Mechanical Turk is, at its core, about faking.
00:04:53.040
Now let’s talk about Turk.
00:04:59.970
How did it start? It followed the path of a few other AWS products.
00:05:06.600
It began as an internal tool and then became open to the world.
00:05:13.590
Basically, it’s a marketplace for online micro-jobs where requesters can post tasks at different price points, and workers can complete them.
00:05:22.260
Does anyone know how Mechanical Turk got its name?
00:05:29.729
It is named after the Turk, an 18th-century chess-playing robot.
00:05:35.220
Of course, this was not an actual robot. Inside that Turk, there was a person controlling it.
00:05:41.670
So, it was a hoax! I told you this was about faking from the beginning.
00:05:48.600
Now, what can we use MTurk for? There are four main use cases.
00:05:55.830
They are image and video processing, data verification and cleanup, information gathering, and data processing.
00:06:01.290
When I was using Turk, I did a lot of analyzing or dividing up video data.
00:06:06.600
I also conducted data validation for leads and provided business information.
00:06:13.590
For instance, I would give a worker the address of a salon and ask them to verify if the salon offers a specific service.
00:06:22.260
Workers would also figure out the cost for that particular service.
00:06:29.100
Turk is widely used nowadays for a lot of random tasks.
00:06:35.220
It is cited in hundreds of academic journals, and it has been used to analyze satellite data where human input is needed.
00:06:43.020
Additionally, it is often used for training machine learning algorithms.
00:06:50.180
Interestingly, much of the work done on Turk is also something that can be accomplished using AI.
00:06:55.790
So, in some ways, it can be considered a regressive technology.
00:07:02.520
Let’s discuss how someone could potentially make money as a Turk worker.
00:07:07.150
Here’s a rough estimate of the Turk environment.
00:07:12.240
There were about 1,500 groups of HITs (human intelligence tasks), which came to about 300,000 individual assignments.
00:07:20.000
Each group is a broad-stroke question, and each individual assignment applies that question to something specific.
00:07:27.250
An example could be: 'Does this salon offer microdermabrasion?'—the individual assignment being a specific salon.
00:07:34.400
The most HITs in a single group was 15,000, which means one person requested input on 15,000 salons.
00:07:41.670
The lowest reward given for completing a task was a penny, which is frankly crazy.
00:07:46.800
The highest reward was $150 for an eligible worker to transcribe two hours of audio.
00:07:53.040
I took some screenshots of work I saw on Turk.
00:07:59.970
For example, one task involved tracking fingerspelling.
00:08:05.160
They used a widget to start and stop a timer when fingerspelling occurred.
00:08:11.100
I thought this was quite interesting.
00:08:17.250
I also came across some humorous spam tasks, like a request to sign up with a Robinhood referral link.
00:08:22.080
Another task involved identifying fashion items in images.
00:08:27.130
One common task is extracting data from shopping receipts.
00:08:34.350
If you can develop a method for doing this using AI, it could be a lucrative opportunity.
00:08:41.160
Let’s consider a use case for today.
00:08:47.820
I spent considerable time contemplating what we should do for this talk.
00:08:54.090
I wanted it to be challenging enough to intrigue you, hoping it wouldn't be oversimplified.
00:09:03.200
I decided on a social media content scraper since we are in Pittsburgh.
00:09:09.800
Pittsburgh is famously known for having French fries on their sandwiches.
00:09:15.980
Quick show of hands: who is against this practice?
00:09:21.870
Only one person? That’s cool!
00:09:26.450
Everyone else seems to be okay with it. Awesome.
00:09:31.070
I chose this topic because it might be more fun than identifying different types of bridges.
00:09:37.160
Let's first walk through the process of developing this without Ruby or Rails.
00:09:44.130
We'll assemble some sample data using Instagram posts and create a new Mechanical Turk project.
00:09:50.890
We'll load up a batch and review the results.
00:09:57.300
So, we started by using the Turk GUI.
00:10:02.760
The title is: 'Look at a picture of a delicious sandwich and determine if there are French fries in it.'
00:10:09.180
Next, we set various properties, including how much to pay per assignment.
00:10:16.560
We also set the number of assignments per HIT to two, which means each sandwich goes to two different people.
00:10:24.060
This way, their results would be validated and corroborated for accuracy.
00:10:30.470
We also allocated some time for completion.
00:10:36.530
We decided to use a GUI and chose their categorization template.
00:10:42.830
If any of you decide to test this out, I recommend using several templates available on Turk for ease.
00:10:50.210
The layout included explicit instructions like: 'Do not count if fries are on the side; we only care if they are in the sandwich.'
00:10:57.880
I also instructed workers to take extra care if the sandwich was cut in half.
00:11:04.000
We crafted the template with embedded Instagram posts.
00:11:10.330
After uploading a CSV with 21 sandwiches and offering 15 cents per task, I loaded everything in.
00:11:17.960
Turk does take a fee on top of that, but this was a minimal cost.
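To put rough numbers on that batch: 21 sandwiches, two assignments each, at 15 cents per assignment comes to $6.30 in rewards before fees. Here is a minimal sketch of that arithmetic. The helper name and the fee rate are assumptions for illustration, not figures from the talk, so check Amazon's current pricing before relying on the total.

```ruby
# Hypothetical helper: estimates what a batch costs the requester, in cents.
# fee_rate is an assumed placeholder, not Amazon's actual commission.
def batch_cost(items:, assignments_per_item:, reward_cents:, fee_rate: Rational(20, 100))
  rewards = items * assignments_per_item * reward_cents
  fee = (rewards * fee_rate).ceil # round the fee up to a whole cent
  { rewards_cents: rewards, fee_cents: fee, total_cents: rewards + fee }
end

cost = batch_cost(items: 21, assignments_per_item: 2, reward_cents: 15)
puts format("rewards: $%.2f, total with assumed fee: $%.2f",
            cost[:rewards_cents] / 100.0, cost[:total_cents] / 100.0)
```

Working in whole cents with a Rational fee rate avoids floating-point drift when the numbers get larger.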
00:11:25.320
I then played the new Carly Rae Jepsen album for about 23 minutes—highly recommend it!
00:11:31.060
After those 23 minutes, my tasks were complete.
00:11:36.980
The output included several columns, including HIT ID and assignment ID.
00:11:43.300
Each sandwich corresponds to a HIT ID, and each worker's pass over it gets its own assignment ID.
00:11:49.670
The worker ID indicates which worker completed the task, and another column shows how long they took.
00:11:56.200
We also saw the input and the answer for each task.
00:12:02.450
As for time per assignment, it ranged from 4 seconds to 700 seconds.
00:12:09.960
The median time taken was 42 seconds.
00:12:16.670
Interestingly, many workers completed multiple assignments.
00:12:23.290
Those who found simple tasks often cranked through several.
00:12:28.800
The maximum number completed by one worker was 21, since a worker can't take two assignments for the same sandwich.
00:12:35.200
We achieved great consensus on the results, where both workers agreed on whether there were or weren't fries in the sandwich.
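If you wanted to compute those same figures from the results file, a minimal sketch might look like this. The rows below are invented sample data shaped like the columns just described, and the method names are hypothetical.

```ruby
# Invented sample rows; a real run would parse them from the downloaded CSV.
rows = [
  { hit_id: "H1", worker_id: "W1", seconds: 4,   answer: "yes" },
  { hit_id: "H1", worker_id: "W2", seconds: 30,  answer: "yes" },
  { hit_id: "H2", worker_id: "W1", seconds: 60,  answer: "no" },
  { hit_id: "H2", worker_id: "W3", seconds: 700, answer: "yes" },
]

# Maps each HIT ID to true when every worker gave the same answer.
def consensus_by_hit(rows)
  rows.group_by { |r| r[:hit_id] }.transform_values do |assignments|
    assignments.map { |r| r[:answer] }.uniq.size == 1
  end
end

# Median of a non-empty list of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

p consensus_by_hit(rows)
p median(rows.map { |r| r[:seconds] })
```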
00:12:41.830
We also measured accuracy, with a few edge cases I'm going to show you later.
00:12:48.290
These cases correctly identified sandwiches without fries.
00:12:56.200
One sandwich that was marked as having no fries did slip through the cracks.
00:13:04.400
On a closer look, there were indeed some fries in that sandwich.
00:13:11.000
I'll share some tips and tricks for getting accurate results on Turk at the end.
00:13:17.400
We’re software developers; we’re not here to just upload a CSV file.
00:13:24.040
Let’s automate this process.
00:13:30.320
Let’s assume we already have a scraper for Twitter and Instagram.
00:13:36.720
We would push any posts tagged #sandwich or #Pittsburgh to our API.
00:13:43.700
Then our application will run those posts through Turk and expose the results through another API for reading.
00:13:50.500
Now, how are we going to approach this?
00:13:59.160
First, we’ll create the Ruby on Rails service for MTurk.
00:14:07.100
We'll create processes for loading the task, approving results, and re-inputting tasks as needed.
00:14:13.560
And we will also serve our results via API.
00:14:21.580
I want to give credit to the two gems I used to build this project, which were really helpful.
00:14:36.170
The first is Turkee by Jim Jones; it’s built on top of RTurk.
00:14:44.200
It simplifies a lot of database models and makes creating forms easy.
00:14:51.550
The second gem is RTurk by Ryan Tate, which is a simpler Ruby layer over Mechanical Turk.
00:14:57.750
These are not optimized for Rails 5; my old app was on Rails 4.
00:15:03.000
RTurk was built on top of an Amazon gem that has since been deprecated.
00:15:09.530
But they're still really great for this project.
00:15:14.500
So here’s our basic data model.
00:15:21.900
A batch is an overall task, like determining if a sandwich has fries.
00:15:28.500
Output field names will include categories and selectable options.
00:15:35.050
Each sandwich would be a batch item, and each one would have a result.
00:15:44.280
This part is a bit condensed, but we’ll create the batch with its title, description, and instructions.
00:15:52.050
We’ll specify the output field name and options, then input our post IDs.
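As a rough illustration of that data model, here is a plain-Ruby sketch using Structs in place of the ActiveRecord models. All field names and the `build_batch` helper are assumptions made for this example; the real app's schema may differ.

```ruby
# Plain Structs standing in for the Rails models (hypothetical field names).
Batch = Struct.new(:title, :description, :instructions,
                   :output_field_name, :output_options, :items,
                   keyword_init: true)
BatchItem = Struct.new(:post_id, :status, :results, keyword_init: true)
Result = Struct.new(:worker_id, :answer, keyword_init: true)

# Builds the sandwich batch with one incomplete batch item per post ID.
def build_batch(post_ids)
  Batch.new(
    title: "Are there French fries in this sandwich?",
    description: "Look at a picture of a sandwich and decide if fries are in it.",
    instructions: "Do not count fries on the side.",
    output_field_name: "has_fries",
    output_options: %w[yes no],
    items: post_ids.map { |id| BatchItem.new(post_id: id, status: :incomplete, results: []) }
  )
end

batch = build_batch(%w[post_1 post_2])
batch.items.first.results << Result.new(worker_id: "W1", answer: "yes")
puts batch.items.map(&:post_id).inspect
```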
00:15:59.400
Next, we’ll bring in Turkee, which integrates seamlessly.
00:16:05.930
The first step creates a basic configuration file with AWS credentials.
00:16:12.320
The second thing it does is create the database models.
00:16:18.340
The first model is a Turkee task, which corresponds to each task put into Turk.
00:16:25.420
We call each of the assignments imported from Turk an imported assignment.
00:16:31.510
This model correlates to the results we specify in the batch item.
00:16:38.800
It's important to understand that the Turkee imported assignment does not directly store result data.
00:16:45.700
It instead connects to the results within our batch item.
00:16:53.000
Let’s move on to launching the batch.
00:16:59.600
We're going to set some fairly simple variables to be sent to Turk.
00:17:05.450
One important requirement is specifying the model that will be created based on input data from the form.
00:17:12.360
We’ll set the number of assignments and only allow workers with an approval rating greater than 95%.
00:17:19.850
Don’t forget to specify the form URL, with the batch item's ID embedded in it.
00:17:27.310
Turkee allows us to post forms to Turk and handles the arguments seamlessly.
00:17:34.330
We’ll need to import the results afterwards, as we won’t receive data directly into our server.
00:17:40.940
Let's take a look at the process of importing the result data.
00:17:49.530
We will create a Turkee process that generates imported assignment records.
00:17:55.960
This process will also create our batch response records for handling the results.
00:18:03.500
The Turkee imported assignment will include IDs correlating to Turk as well as the worker ID.
00:18:11.440
Additionally, it contains the task ID associated with our batch model.
00:18:17.810
The results are stored in a JSON hash format.
00:18:25.160
Let’s reference our original schema.
00:18:31.360
We began with batch, batch item, and result models.
00:18:37.960
Each batch item corresponds with a Turkee task that inputs into Turk.
00:18:43.930
Each Turkee task has multiple assignments, typically two or more.
00:18:52.410
Now, we also consider reprocessing and validating the output.
00:19:00.960
This means each batch item needs to have completed results.
00:19:07.830
We can send the results to our adjudicator model, confirming completion or reprocessing as necessary.
00:19:13.020
If the adjudicator approves, we’ll mark the batch item as complete.
00:19:20.090
If there's disagreement, we can reprocess it in Turk for additional input.
00:19:26.060
The adjudicator model examines the results and makes decisions based on a histogram of outcomes.
00:19:32.360
If more than 50% of responses agree, we approve the result.
00:19:40.200
If not, we disapprove.
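A minimal sketch of that majority rule, written as a plain Ruby class rather than the actual model from the app. The class shape, method names, and return format are assumptions for illustration.

```ruby
# Hypothetical adjudicator: approves an answer only when a strict majority
# (more than 50%) of the responses agree on it.
class Adjudicator
  def initialize(answers)
    @answers = answers
  end

  # Counts how many workers gave each answer, e.g. {"yes" => 2, "no" => 1}.
  def histogram
    @answers.each_with_object(Hash.new(0)) { |answer, counts| counts[answer] += 1 }
  end

  def decision
    answer, count = histogram.max_by { |_answer, n| n }
    if count && count * 2 > @answers.size
      { status: :approved, answer: answer }
    else
      { status: :rejected, answer: nil } # no majority: a candidate for reprocessing
    end
  end
end

p Adjudicator.new(%w[yes yes]).decision
p Adjudicator.new(%w[yes no]).decision
```

With only two assignments per sandwich, a split vote is exactly 50%, so it falls below the threshold and would go back to Turk for more input.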
00:19:45.320
We also included a rake task to process everything.
00:19:50.740
This task imports HITs from Turk and evaluates all incomplete batch items.
00:19:57.470
Then we determine whether to approve or reject the results.
00:20:05.780
We can establish a cron job to run this rake task every five minutes.
00:20:12.640
Thus, we’re ready to serve our batch items via an API!
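For the API side, here is a hedged sketch of how a batch item might be serialized to JSON. In a Rails app this would live in a controller or serializer; the hash keys and method names below are hypothetical.

```ruby
require "json"

# Turns one batch item (a plain hash here) into the shape served to clients.
# has_fries stays nil until the item has been adjudicated as complete.
def serialize_item(item)
  complete = item[:status] == "complete"
  {
    post_id: item[:post_id],
    status: item[:status],
    has_fries: complete ? item[:answer] == "yes" : nil
  }
end

# Builds the JSON body for a list of batch items.
def items_payload(items)
  JSON.generate(items: items.map { |item| serialize_item(item) })
end

puts items_payload([
  { post_id: "post_1", status: "complete", answer: "yes" },
  { post_id: "post_2", status: "incomplete", answer: nil }
])
```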
00:20:18.660
The initial use case we discussed was efficient for our sandwich task.
00:20:23.430
However, we need to ensure extensibility for other potential use cases.
00:20:30.870
A crucial aspect of the Turk app I created was having multiple batch items in a single task.
00:20:36.670
For example, instead of analyzing one sandwich, we could analyze three.
00:20:43.360
Grouping items this way makes each task easier to price and increases the volume each worker gets through.
00:20:51.440
Additionally, we can work on complex reprocessing flows, where certain data is reconfirmed.
00:20:57.240
For example, we might need name and address confirmed again after previously collecting phone number and email.
00:21:03.110
That’s a more sophisticated flow.
00:21:10.340
Currently, I want to focus on having different inputs and outputs for batch items.
00:21:17.750
We may want diverse outputs per model.
00:21:25.130
For example, while asking if fries are present, we could also ask how delicious the sandwich looks.
00:21:32.800
Having multiple input types is also key. We might gather names, emails, and website information along with photos.
00:21:39.180
Furthermore, we could have a numerical output, like counting fries!
00:21:46.970
We can determine the success of results based on diverse outputs.
00:21:54.460
For instance, say one worker claims there are nine fries, another says eight, and another says seven.
00:22:00.780
We could likely accept eight fries as the result.
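One way to arrive at eight in that example is to take the median and accept it only when every answer is close to it. That rule is an assumption for illustration; the talk doesn't spell out the exact logic, and `accepted_count` and its `tolerance` parameter are hypothetical names.

```ruby
# Hypothetical rule for numeric outputs: accept the median count when all
# workers' answers fall within `tolerance` of it, otherwise return nil so the
# item can be reprocessed.
def accepted_count(counts, tolerance: 1)
  return nil if counts.empty?

  sorted = counts.sort
  median = sorted[(sorted.size - 1) / 2] # lower middle for even sizes
  sorted.all? { |c| (c - median).abs <= tolerance } ? median : nil
end

p accepted_count([9, 8, 7])
p accepted_count([9, 8, 2])
```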
00:22:06.430
In terms of setup, we need to create input and output methods for the batch items.
00:22:12.480
For instance, an Instagram post will be our batch input, and its key will be its sandwich ID.
00:22:21.750
We then make sure to correlate this batch input to the corresponding task.
00:22:28.190
Next, the batch output may be framed around categories.
00:22:34.700
In this case, we specify yes/no labels with display settings.
00:22:41.220
We will also need adjudicator criteria for determining acceptance.
00:22:48.470
Think of cases where we want to confirm what we display based on input gathered.
00:22:55.070
This can involve identifying counts or categories.
00:23:01.860
I designed various input formats and output formats for my Turk app.
00:23:09.560
Business listings were valuable for collecting addresses, emails, and other essential details.
00:23:16.600
We could also work with images, social media posts, and video.
00:23:24.000
Regarding output formats, we can work with text, numbers, and multi-select categories.
00:23:30.130
When handling multiple text outputs, it's crucial to clarify the logic used in processing results.
00:23:37.020
Next, let's discuss tips for achieving accuracy on Turk:
00:23:45.050
These are straightforward UX practices, such as providing clear instructions.
00:23:52.130
Keep tasks straightforward; if someone would need to Google information, provide the URL.
00:23:59.870
One technique is to incorporate gold data into your tasks.
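Gold data means mixing in items whose correct answer you already know, then checking each worker against them. A minimal sketch, with invented post IDs and hypothetical names throughout:

```ruby
# Known-correct answers for a few seeded items (invented for this example).
GOLD_ANSWERS = { "post_1" => "yes", "post_2" => "no" }.freeze

# Returns each worker's accuracy on the gold items as a Rational between 0 and 1.
# Workers who never saw a gold item are left out of the result.
def gold_accuracy(responses, gold = GOLD_ANSWERS)
  scored = responses.select { |r| gold.key?(r[:post_id]) }
  scored.group_by { |r| r[:worker_id] }.transform_values do |rs|
    correct = rs.count { |r| r[:answer] == gold[r[:post_id]] }
    Rational(correct, rs.size)
  end
end

responses = [
  { worker_id: "W1", post_id: "post_1", answer: "yes" },
  { worker_id: "W1", post_id: "post_2", answer: "no" },
  { worker_id: "W2", post_id: "post_1", answer: "no" },
  { worker_id: "W2", post_id: "post_2", answer: "no" },
  { worker_id: "W2", post_id: "post_9", answer: "yes" } # not gold, ignored
]
p gold_accuracy(responses)
```

A worker who scores poorly on the gold items is a signal to discount or recheck their other answers.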
00:24:07.490
You can also screen workers using criteria for particular qualifications.
00:24:14.410
This includes approval rates and the tasks they have completed in the past.
00:24:21.950
A key consideration involves setting prices to encourage task completion.
00:24:28.210
Remember that a market exists; the more tasks available, the better the completion rates.
00:24:35.240
Higher HIT counts attract workers who are looking for volume.
00:24:43.160
Now, let’s discuss the ethics of MTurk.
00:24:50.040
There are two notable articles highlighting its implications.
00:24:57.300
One studied the use of Turk and the prevalence of low wages.
00:25:05.500
The other is a letter-writing campaign requesting better wages.
00:25:12.230
Surprisingly, a large portion of Turk’s worker pool is made up of U.S. workers.
00:25:20.010
Yet, many are from countries with lower wages.
00:25:25.780
This raises the important question of fairness and exploitation.
00:25:32.310
Another issue stems from requester dishonesty and rejection of work.
00:25:39.950
This situation can lead to workers being unpaid for their efforts.
00:25:47.220
Another controversy involved Cambridge Analytica.
00:25:54.050
They paid Turk workers to take Facebook quizzes that collected their data, and many were unaware of what they were really being paid to do.
00:26:01.830
In total, 240,000 people took those quizzes, and the requester was subsequently banned by Amazon.
00:26:09.890
It’s argued that MTurk played a role in influencing the election.
00:26:16.640
Thank you for your attention!
00:26:23.210
I hope you gained technical insights on Turk and crowdsourcing.
00:26:30.470
I enjoyed my experiences with Turk.
00:26:37.110
My goal was to coordinate people to process data more quickly.
00:26:43.670
Remember, what we do as developers is based on practice.
00:26:50.310
But it’s also about pushing our boundaries.
00:26:56.520
Thanks for listening!
00:27:02.800
By the way, I started a landing page service in New York.
00:27:09.110
We help high-growth companies create dynamic landing pages and run experiments.
00:27:16.240
I’m looking to hire Rails developers in New York or remotely.
00:27:23.590
Thank you for your time, and are there any questions?