Human Powered Rails: Automated Crowdsourcing In Your RoR App

by Andy Glass

In this talk at RailsConf 2018, Andy Glass discusses how to effectively create a human-powered API using Ruby on Rails in conjunction with Amazon Mechanical Turk (MTurk). The presentation begins with Andy's background and his experiences with programming that shaped his career. He then introduces the concept of MTurk as a marketplace for micro-tasks, linking requesters who need tasks completed with workers who perform those tasks. The talk covers several key points regarding integrating human input into applications quickly and efficiently:

MTurk Overview: MTurk provides a scalable and always-available workforce for various tasks such as image processing, data verification, and information gathering.
Historical Background: The name 'Mechanical Turk' comes from an 18th-century chess-playing machine that was revealed to be controlled by a human hidden inside.
Use Cases: Common applications for MTurk include data validation, price research, and training machine learning models.
Practical Application: Andy illustrates the process of using MTurk with a real-world example, designing a project to determine if certain Pittsburgh sandwiches contain fries. He discusses the setup process for requesting work from MTurk workers, including task descriptions and payment structures.
Automation in Rails: The importance of automating the MTurk process within a Ruby on Rails application is emphasized, explaining how background jobs can manage task submissions and data retrieval effectively.
Adjudication Process: To ensure accurate results, a system for adjudicating responses from workers is discussed, where conflicting answers can trigger further review from additional workers.
Ethical Considerations: The presentation addresses important discussions about the ethics of using MTurk, highlighting prevalent concerns regarding fair wages and the potential exploitation of low-cost labor from both U.S. and global workers.

By the end of the session, Andy Glass encourages attendees to explore the possibilities of integrating human input into their applications through MTurk, inspiring curiosity about leveraging crowdsourced labor effectively in software development. He leaves the audience with a sense of achievement that through collaboration and effort, software developers can create impactful solutions that push their boundaries.

00:00:11.420 All right, how are we doing, RailsConf? Thank you all for coming.

00:00:18.720 I know there are a lot of other great sessions right now, so I appreciate you being here with me.

00:00:25.380 My name is Andy Glass, and I'm a Brooklyn-based Rubyist.

00:00:32.070 I spend a third of my time as an entrepreneur, I'm a nomad, I'm a maker, and I'm also a Guinness World Record adjudicator.

00:00:38.160 So, I feel I’m perfectly suited for this unusual Rails app track.

00:00:44.219 I’m here to talk about how to create a human-powered API with Ruby on Rails and Mechanical Turk.

00:00:49.350 So first, why are we talking about MTurk? Who has heard of MTurk? Show of hands.

00:00:55.170 Nice! A lot of people. And who’s actually using MTurk?

00:01:01.440 Okay, a few people. Come on in!

00:01:06.990 So yeah, I briefly ran a company that built custom APIs off of Turk to clean data, but it failed miserably.

00:01:12.299 However, I found it to be an interesting enough experience that I wanted to create a talk about it.

00:01:18.360 The gist is that you can integrate with MTurk, which provides a scalable, 24/7, always-available workforce.

00:01:24.479 Though there is some controversy around Mechanical Turk, which we’ll get to.

00:01:31.439 First, I want to express my gratitude to Rails.

00:01:36.750 I owe everything in my career to you all, not just any of you personally, but to the Rails community at large.

00:01:42.270 I’m so thankful to be a part of this community. It’s an honor to speak at RailsConf for the first time.

00:01:49.799 Being a programmer has given me the financial freedom to live an unusual life.

00:01:57.479 More importantly, it has given me the confidence to pursue unusual pursuits.

00:02:03.869 I believe that's what being a Rails developer taught me—that anything can be accomplished.

00:02:10.080 So what do I owe you? I think I owe you something unusual.

00:02:16.860 You are spending your valuable time with me in this room.

00:02:24.660 After I finish this, you probably won't leave, although feel free to leave if you want.

00:02:30.810 Maybe this talk isn’t really about Mechanical Turk, and yet it still is a talk about Mechanical Turk.

00:02:38.010 Maybe it’s not a talk about crowdsourcing, though it kind of is.

00:02:44.280 Or maybe it is just a talk about being an impostor.

00:02:49.470 I know impostor syndrome has been a topic at RailsConf so far.

00:02:54.860 I struggle with it. Some days, I don’t feel like a good enough developer.

00:03:00.840 I often feel like a bad Rails developer who doesn’t deserve to share this stage.

00:03:06.330 I tried to print my company logo on a t-shirt, but I didn’t wear it.

00:03:12.510 But I think we’re in good company.

00:03:19.380 According to Wikipedia, many people suffer from impostor syndrome.

00:03:26.579 The talks about it often explore whether it’s bad or good.

00:03:32.070 I think it’s good.

00:03:38.010 There’s a pretty cool article I checked out that explains why impostor syndrome is beneficial.

00:03:44.340 It states that if you’re interested in personal growth, you’ll continuously push yourself into new experiences.

00:03:52.109 When we are in unfamiliar territory, we often feel less comfortable than when performing familiar tasks.

00:03:58.709 It's about growth, and I believe you need to fake it.

00:04:04.019 What does that mean?

00:04:09.329 The article suggests it’s about the impostor experience, not impostor syndrome.

00:04:15.150 This is something we should expect in our lives as we push our limits.

00:04:22.169 It should be embraced, and we should consider our personal responses to it.

00:04:28.500 We need to realize that we belong, and our successes are not accidents.

00:04:35.580 So, it’s not just about Mechanical Turk. It’s about learning to fake it.

00:04:41.670 We’re going to build a human-powered API.

00:04:47.430 At the end of the day, it’s really just an API, and Mechanical Turk is, at its core, about faking.

00:04:53.040 Now let’s talk about Turk.

00:04:59.970 How did it start? It followed the path of a few other AWS products.

00:05:06.600 It began as an internal tool and then became open to the world.

00:05:13.590 Basically, it’s a marketplace for online micro-jobs where requesters can post tasks at different price points, and workers can complete them.

00:05:22.260 Does anyone know how Mechanical Turk got its name?

00:05:29.729 It is named after the Turk, an 18th-century chess-playing robot.

00:05:35.220 Of course, this was not an actual robot. Inside that Turk, there was a person controlling it.

00:05:41.670 So, it was a hoax! I told you this was about faking from the beginning.

00:05:48.600 Now, what can we use MTurk for? There are four main use cases.

00:05:55.830 These can be organized into four buckets: image and video processing, data verification and cleanup, information gathering, and data processing.

00:06:01.290 When I was using Turk, I did a lot of analyzing or dividing up video data.

00:06:06.600 I also conducted data validation for leads and provided business information.

00:06:13.590 For instance, I would give a worker the address of a salon and ask them to verify if the salon offers a specific service.

00:06:22.260 Workers would also figure out the cost for that particular service.

00:06:29.100 Turk is widely used nowadays for a lot of random tasks.

00:06:35.220 It is cited in hundreds of academic journals, and it has been used to analyze satellite data where human input is needed.

00:06:43.020 Additionally, it is often used for training machine learning algorithms.

00:06:50.180 Interestingly, much of the work done on Turk is also something that can be accomplished using AI.

00:06:55.790 So, in some ways, it can be considered a regressive technology.

00:07:02.520 Let’s discuss how someone could potentially make money as a Turk worker.

00:07:07.150 Here’s a rough estimate of the Turk environment.

00:07:12.240 There are about 1,500 groups of HITs (human intelligence tasks), which adds up to about 300,000 individual assignments.

00:07:20.000 Each broad-stroke question might have an accompanying specific assignment.

00:07:27.250 An example could be: 'Does this salon offer microdermabrasion?'—the individual assignment being a specific salon.

00:07:34.400 The most HITs in a single group was 15,000, which means one person requested input on 15,000 salons.

00:07:41.670 The lowest reward given for completing a task was a penny, which is frankly crazy.

00:07:46.800 The highest reward was $150 for an eligible worker to transcribe two hours of audio.

00:07:53.040 I took some screenshots of work I saw on Turk.

00:07:59.970 For example, one task involved tracking fingerspelling.

00:08:05.160 They used a widget to start and stop a timer when fingerspelling occurred.

00:08:11.100 I thought this was quite interesting.

00:08:17.250 I also came across some humorous spam tasks, like a request to sign up with a Robinhood referral link.

00:08:22.080 Another task involved identifying fashion items in images.

00:08:27.130 One common task is extracting data from shopping receipts.

00:08:34.350 If you can develop a method for doing this using AI, it could be a lucrative opportunity.

00:08:41.160 Let’s consider a use case for today.

00:08:47.820 I spent considerable time contemplating what we should do for this talk.

00:08:54.090 I wanted it to be challenging enough to intrigue you, hoping it wouldn't be oversimplified.

00:09:03.200 I decided on a social media content scraper since we are in Pittsburgh.

00:09:09.800 Pittsburgh is famously known for having French fries on their sandwiches.

00:09:15.980 Quick show of hands: who is against this practice?

00:09:21.870 Only one person? That’s cool!

00:09:26.450 Everyone else seems to be okay with it. Awesome.

00:09:31.070 I chose this topic because it might be more fun than identifying different types of bridges.

00:09:37.160 Let's first walk through the process of developing this without Ruby or Rails.

00:09:44.130 We'll assemble some sample data using Instagram posts and create a new Mechanical Turk project.

00:09:50.890 We'll load up a batch and review the results.

00:09:57.300 So, we started by using the Turk GUI.

00:10:02.760 The title is: 'Look at a picture of a delicious sandwich and determine if there are French fries in it.'

00:10:09.180 Next, we set various properties, including how much to pay per assignment.

00:10:16.560 We also decided the number of assignments per HIT, which means we want each sandwich to go to two different people.

00:10:24.060 This way, their results would be validated and corroborated for accuracy.

00:10:30.470 We also allocated some time for completion.

00:10:36.530 We decided to use a GUI and chose their categorization template.

00:10:42.830 If any of you decide to test this out, I recommend starting from one of the several templates available on Turk, which makes setup easier.

00:10:50.210 The layout included explicit instructions like: 'Do not count if fries are on the side; we only care if they are in the sandwich.'

00:10:57.880 I also instructed workers to take extra care if the sandwich was cut in half.

00:11:04.000 We crafted the template with embedded Instagram posts.

00:11:10.330 After uploading a CSV with 21 sandwiches and offering 15 cents per task, I loaded everything in.

00:11:17.960 Turk does take a fee on top of that, but this was a minimal cost.
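
To make the cost concrete, here is a rough sketch of the arithmetic in Ruby. The 21 sandwiches, two assignments per sandwich, and 15 cents per assignment come from the talk. The fee rate is a placeholder assumption, not a figure from the talk, so check the current MTurk pricing before relying on it.

```ruby
# Rough cost estimate for the sandwich batch, kept in whole cents.
SANDWICHES          = 21 # rows in the uploaded CSV (from the talk)
ASSIGNMENTS_PER_HIT = 2  # each sandwich goes to two workers (from the talk)
REWARD_CENTS        = 15 # reward per assignment (from the talk)
ASSUMED_FEE_PERCENT = 20 # ASSUMPTION: placeholder fee rate, not from the talk

# Returns the worker rewards, the marketplace fee, and the total, all in cents.
def batch_cost_cents(items:, assignments:, reward_cents:, fee_percent:)
  rewards = items * assignments * reward_cents
  fee     = (rewards * fee_percent / 100.0).ceil
  { rewards: rewards, fee: fee, total: rewards + fee }
end

cost = batch_cost_cents(
  items: SANDWICHES,
  assignments: ASSIGNMENTS_PER_HIT,
  reward_cents: REWARD_CENTS,
  fee_percent: ASSUMED_FEE_PERCENT
)

puts format('Rewards: $%.2f, fee: $%.2f, total: $%.2f',
            cost[:rewards] / 100.0, cost[:fee] / 100.0, cost[:total] / 100.0)
```

Under that assumed fee rate the whole batch costs well under ten dollars, which matches the point that this was a minimal cost.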

00:11:25.320 I then played the new Carly Rae Jepsen album for about 23 minutes—highly recommend it!

00:11:31.060 After those 23 minutes, my tasks were complete.

00:11:36.980 The output included several columns, including HIT ID and assignment ID.

00:11:43.300 Each sandwich corresponds to an assignment ID for its individual task.

00:11:49.670 The worker ID indicates which worker completed the task and how long they took.

00:11:56.200 We also saw the input and the answer for each task.

00:12:02.450 As for time figures, they ranged from 4 seconds to 700 seconds.

00:12:09.960 The median time taken was 42 seconds.

00:12:16.670 Interestingly, many workers completed multiple assignments.

00:12:23.290 Those who found simple tasks often cranked through several.

00:12:28.800 The maximum number completed by one worker was 21, as they wouldn’t have been able to do two of the same task.

00:12:35.200 We achieved great consensus on the results, where both workers agreed on whether there were or weren't fries in the sandwich.
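
The kind of summary described here (median time, assignments per worker, and whether both workers agreed) can be sketched in a few lines of Ruby. The rows below are invented sample data shaped like a parsed results export; the key names and values are assumptions for illustration, not the actual results from the talk.

```ruby
# Hypothetical rows, shaped like a parsed results export.
# Keys and values are illustrative assumptions, not data from the talk.
rows = [
  { hit_id: 'HIT_A', worker_id: 'W1', seconds: 12, answer: 'yes' },
  { hit_id: 'HIT_A', worker_id: 'W2', seconds: 40, answer: 'yes' },
  { hit_id: 'HIT_B', worker_id: 'W1', seconds: 9,  answer: 'no'  },
  { hit_id: 'HIT_B', worker_id: 'W3', seconds: 65, answer: 'yes' },
  { hit_id: 'HIT_C', worker_id: 'W2', seconds: 44, answer: 'no'  },
  { hit_id: 'HIT_C', worker_id: 'W3', seconds: 30, answer: 'no'  }
]

# Median of a list of numbers (mean of the two middle values for even sizes).
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

median_seconds = median(rows.map { |r| r[:seconds] })

# How many assignments each worker completed.
assignments_per_worker = rows.map { |r| r[:worker_id] }.tally

# For each HIT, true when every worker gave the same answer.
consensus = rows.group_by { |r| r[:hit_id] }
                .transform_values { |rs| rs.map { |r| r[:answer] }.uniq.length == 1 }

puts "Median time: #{median_seconds}s"
puts "Assignments per worker: #{assignments_per_worker}"
puts "Consensus by HIT: #{consensus}"
```

Any HIT whose consensus flag is false is a candidate for a closer look or for another round of workers.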

00:12:41.830 We also measured accuracy, with a few edge cases I'm going to show you later.

00:12:48.290 These cases correctly identified sandwiches without fries.

00:12:56.200 A sandwich that apparently had no fries did slip through the cracks.

00:13:04.400 There were, indeed, some fries in the sandwich after a closer look.

00:13:11.000 I'll share some tips and tricks for getting accurate results on Turk at the end.

00:13:17.400 We’re software developers; we’re not here to just upload a CSV file.

00:13:24.040 Let’s automate this process.

00:13:30.320 Let’s assume we already have a scraper for Twitter and Instagram.

00:13:36.720 We would push any posts with hashtag sandwich or hashtag Pittsburgh to our API.

00:13:43.700 Then, another application will run those posts through Turk and post the results to another API for reading.

00:13:50.500 Now, how are we going to approach this?

00:13:59.160 First, we’ll create the Ruby on Rails service for MTurk.

00:14:07.100 We'll create processes for loading the task, approving results, and re-inputting tasks as needed.

00:14:13.560 And we will also serve our results via API.

00:14:21.580 I want to give credit to the two gems I used to build this project, which were really helpful.

00:14:36.170 The first is Turkee by Jim Jones, which is built on top of rturk.

00:14:44.200 It simplifies a lot of database models and makes creating forms easy.

00:14:51.550 The second gem is rturk by Ryan Tate, which is a simpler Mechanical Turk Ruby layer.

00:14:57.750 These are not optimized for Rails 5; my old app was on Rails 4.

00:15:03.000 rturk was built on top of an Amazon gem that has since been deprecated.

00:15:09.530 But they're still really great for this project.

00:15:14.500 So here’s our basic data model.

00:15:21.900 A batch is an overall task, like determining if a sandwich has fries.

00:15:28.500 Output field names will include categories and selectable options.

00:15:35.050 Each sandwich would be a batch item, and each one would have a result.

00:15:44.280 This part is a bit condensed, but we’ll create the batch with its title, description, and instructions.

00:15:52.050 We’ll specify the output field name and options, then input our post IDs.
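
As a minimal sketch of that data model, the three records can be written as plain Ruby structs. In the Rails app these would be ActiveRecord models; the attribute names below are assumptions chosen to match the description, not the schema from the talk.

```ruby
# Plain-Ruby stand-ins for the three models. In Rails these would be
# ActiveRecord models; the attribute names are assumptions.
Batch     = Struct.new(:title, :description, :instructions,
                       :output_field_name, :options, :items, keyword_init: true)
BatchItem = Struct.new(:post_id, :results, :status, keyword_init: true)
Result    = Struct.new(:worker_id, :value, keyword_init: true)

# The batch is the overall question being asked.
batch = Batch.new(
  title: 'Does this sandwich have fries in it?',
  description: 'Look at a picture of a sandwich and decide.',
  instructions: 'Do not count fries on the side.',
  output_field_name: 'has_fries',
  options: %w[yes no],
  items: []
)

# Each sandwich (here, a hypothetical post ID) becomes one batch item.
%w[post_1 post_2].each do |post_id|
  batch.items << BatchItem.new(post_id: post_id, results: [], status: :incomplete)
end

# A worker's answer is stored as a result on the item.
batch.items.first.results << Result.new(worker_id: 'W1', value: 'yes')

puts "#{batch.items.length} items, first item has #{batch.items.first.results.length} result(s)"
```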

00:15:59.400 Next, we’ll bring in Turkee, which integrates seamlessly.

00:16:05.930 The first step creates a basic configuration file with AWS credentials.

00:16:12.320 The second thing it does is create the database models.

00:16:18.340 The first model is a Turkee task, which corresponds to each task put into Turk.

00:16:25.420 We call each of the assignments imported from Turk an imported assignment.

00:16:31.510 This model correlates to the results we specify in the batch item.

00:16:38.800 It's important to understand that the Turkee imported assignment does not directly store result data.

00:16:45.700 It instead connects to the results within our batch item.

00:16:53.000 Let’s move on to launching the batch.

00:16:59.600 We're going to set some fairly simple variables to be sent to Turk.

00:17:05.450 One important requirement is specifying the model that will be created based on input data from the form.

00:17:12.360 We’ll set the number of assignments, only allowing workers with an approval rating greater than 95%.

00:17:19.850 Don’t forget to specify the form URL, embedding the ID into the batch item.
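
The settings described here can be collected in one place before handing them to whatever MTurk client you use. This is only a sketch: every key name, the URL path, and the helper itself are hypothetical and are not the API of Turkee or rturk. The 95% approval threshold and the idea of embedding the batch item ID in the form URL come from the talk.

```ruby
# Builds the settings for one HIT. Every key name and the URL path are
# hypothetical stand-ins; map them onto what your MTurk client expects.
def hit_settings(batch_item_id:, base_url:, num_assignments: 2, reward_cents: 15)
  {
    title: 'Are there French fries in this sandwich?',
    reward_cents: reward_cents,
    num_assignments: num_assignments,
    # Only allow workers whose approval rating is above this percentage.
    min_approval_percent: 95,
    # The model that should be created from the submitted form data.
    result_model: 'Result',
    # The form URL carries the batch item ID so answers can be matched back.
    form_url: "#{base_url}/batch_items/#{batch_item_id}/results/new"
  }
end

settings = hit_settings(batch_item_id: 42, base_url: 'https://example.com')
puts settings[:form_url]
```

Keeping the settings in a plain hash makes it easy to vary the reward or the number of assignments per batch without touching the code that talks to Turk.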

00:17:27.310 Turkee allows us to post forms to Turk and handles the arguments seamlessly.

00:17:34.330 We’ll need to import the results afterwards, as we won’t receive data directly into our server.

00:17:40.940 Let's take a look at the process of importing the result data.

00:17:49.530 We will create a Turkee process that generates imported assignment records.

00:17:55.960 This process will also create our batch response records for handling the results.

00:18:03.500 The Turkee imported assignment will include IDs correlating to Turk as well as the worker ID.

00:18:11.440 Additionally, it contains the task ID associated with our batch model.

00:18:17.810 The results are stored in a JSON hash format.
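
Assuming the raw answers arrive as a JSON string alongside the assignment and worker IDs, turning one imported assignment into a result for a batch item might look like the sketch below. The record layout and field names are assumptions for illustration, not the gem's actual columns.

```ruby
require 'json'

# Hypothetical imported-assignment record; the field names are assumptions.
imported = {
  assignment_id: 'ASSIGNMENT_1',
  worker_id: 'W1',
  task_id: 42,
  answers_json: '{"has_fries":"yes"}'
}

# Pulls one named answer out of the JSON payload and pairs it with the
# task and worker it belongs to. Raises KeyError if the field is missing.
def extract_result(imported, field)
  answers = JSON.parse(imported[:answers_json])
  {
    task_id: imported[:task_id],
    worker_id: imported[:worker_id],
    value: answers.fetch(field)
  }
end

result = extract_result(imported, 'has_fries')
puts result
```

Using `fetch` instead of `[]` means a form that was submitted without the expected field fails loudly rather than silently storing a nil answer.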

00:18:25.160 Let’s reference our original schema.

00:18:31.360 We began with batch, batch item, and result models.

00:18:37.960 Each batch item corresponds with a Turkee task that inputs into Turk.

00:18:43.930 Each Turkee task has multiple assignments, typically two or more.

00:18:52.410 Now, we also consider reprocessing and validating the output.

00:19:00.960 This means each batch item needs to have completed results.

00:19:07.830 We can send the results to our adjudicator model, confirming completion or reprocessing as necessary.

00:19:13.020 If the adjudicator approves, we’ll update the attributes to complete.

00:19:20.090 If there's disagreement, we can reprocess it in Turk for additional input.

00:19:26.060 The adjudicator model examines the results and makes decisions based on a histogram of outcomes.

00:19:32.360 If more than 50% of responses agree, we approve the result.

00:19:40.200 If not, we disapprove.
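
The majority rule described above can be sketched as a small function: build a histogram of the answers and approve the most common one only if it has more than half of the responses. The method name and the shape of the return value are assumptions for illustration.

```ruby
# Tallies the answers and approves the most common one only when it holds
# a strict majority (more than 50% of responses).
def adjudicate(answers)
  histogram = answers.tally
  return { status: :rejected, value: nil, histogram: histogram } if answers.empty?

  value, count = histogram.max_by { |_answer, n| n }
  if count * 2 > answers.length
    { status: :approved, value: value, histogram: histogram }
  else
    { status: :rejected, value: nil, histogram: histogram }
  end
end

agree    = adjudicate(%w[yes yes])
split    = adjudicate(%w[yes no])
resolved = adjudicate(%w[yes no yes])

puts agree[:status], split[:status], resolved[:status]
```

With two assignments per sandwich, a split vote can never reach a majority, so a rejected item needs at least one more worker before it can be resolved.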

00:19:45.320 We also included a rake task to process everything.

00:19:50.740 This task imports HITs from Turk and evaluates all incomplete batch items.

00:19:57.470 Then we determine whether to approve or reject the results.

00:20:05.780 We can establish a cron job to run this rake task every five minutes.
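
One pass of that processing loop can be sketched in plain Ruby. The struct, method names, and statuses below are assumptions; in the real app the body would live inside a rake task, and a crontab schedule of `*/5 * * * *` would run it every five minutes.

```ruby
# Hypothetical in-memory stand-in for a batch item with its worker answers.
Item = Struct.new(:id, :answers, :status, keyword_init: true)

# Returns the answer given by more than half of the workers, or nil.
def majority_answer(answers)
  value, count = answers.tally.max_by { |_answer, n| n }
  count && count * 2 > answers.length ? value : nil
end

# Evaluates every incomplete item once. Items with a clear majority are
# marked complete; the IDs of the rest are returned for reprocessing.
def run_processing_pass(items)
  to_reprocess = []
  items.select { |item| item.status == :incomplete }.each do |item|
    if majority_answer(item.answers)
      item.status = :complete
    else
      to_reprocess << item.id
    end
  end
  to_reprocess
end

items = [
  Item.new(id: 'a', answers: %w[yes yes], status: :incomplete),
  Item.new(id: 'b', answers: %w[yes no],  status: :incomplete),
  Item.new(id: 'c', answers: %w[no no],   status: :complete)
]

to_reprocess = run_processing_pass(items)
puts "Reprocess: #{to_reprocess.inspect}"
```

Because the pass only looks at incomplete items, running it repeatedly on a schedule is safe: items that were already resolved are left alone.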

00:20:12.640 Thus, we’re ready to serve our batch items via an API!

00:20:18.660 The initial use case we discussed was efficient for our sandwich task.

00:20:23.430 However, we need to ensure extensibility for other potential use cases.

00:20:30.870 A crucial aspect of the Turk app I created was having multiple batch items in a single task.

00:20:36.670 For example, instead of analyzing one sandwich, we could analyze three.

00:20:43.360 This reinforces pricing and improves volume statistics.
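
Grouping several batch items into one task is a small change on the submission side. A minimal sketch, assuming a list of hypothetical post IDs and three items per task:

```ruby
# Hypothetical post IDs waiting to be sent to Turk.
post_ids = %w[p1 p2 p3 p4 p5 p6 p7]

# ASSUMPTION: three sandwiches per task, as in the example from the talk.
ITEMS_PER_TASK = 3

# Split the items into groups; the last group may be smaller.
tasks = post_ids.each_slice(ITEMS_PER_TASK).map.with_index(1) do |group, number|
  { task_number: number, post_ids: group }
end

tasks.each { |task| puts "Task #{task[:task_number]}: #{task[:post_ids].join(', ')}" }
```

The results still need to be split back out per item when they are imported, since adjudication happens per sandwich rather than per task.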

00:20:51.440 Additionally, we can work on complex reprocessing flows, where certain data is reconfirmed.

00:20:57.240 For example, we might need name and address confirmed again after previously collecting phone number and email.

00:21:03.110 That’s a more sophisticated flow.

00:21:10.340 Currently, I want to focus on having different inputs and outputs for batch items.

00:21:17.750 We may want diverse outputs per model.

00:21:25.130 For example, while asking if fries are present, we could also ask how delicious the sandwich looks.

00:21:32.800 Having multiple input types is also key. We might gather names, emails, and website information along with photos.

00:21:39.180 Furthermore, we could have a numerical output, like counting fries!

00:21:46.970 We can determine the success of results based on diverse outputs.

00:21:54.460 For instance, say one worker claims there are nine fries, another says eight, and another claims seven.

00:22:00.780 We could likely accept eight fries as the result.
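
The talk does not spell out the rule for numeric answers, so the following is one possible approach rather than the method used in the app: accept the median when the answers are close together, and otherwise send the item back for more input. The `max_spread` tolerance is an assumption.

```ruby
# One possible rule for numeric outputs: if the answers are within
# max_spread of each other, accept the median; otherwise return nil so the
# item can be reprocessed. ASSUMPTION: max_spread is an arbitrary tolerance.
def adjudicate_count(counts, max_spread: 2)
  return nil if counts.empty?

  sorted = counts.sort
  return nil if sorted.last - sorted.first > max_spread

  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : ((sorted[mid - 1] + sorted[mid]) / 2.0).round
end

puts adjudicate_count([9, 8, 7]).inspect
puts adjudicate_count([3, 12, 8]).inspect
```

For the nine, eight, and seven example, the answers are within two of each other, so the median of eight is accepted; widely scattered counts are not averaged into a misleading result.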

00:22:06.430 In terms of setup, we need to create input and output methods for the batch items.

00:22:12.480 For instance, an Instagram post will be our batch input, and its key will be its sandwich ID.

00:22:21.750 We then make sure to correlate this batch input to the corresponding task.

00:22:28.190 Next, the batch output may be framed around categories.

00:22:34.700 In this case, we specify yes/no labels with display settings.

00:22:41.220 We will also need adjudicator criteria for determining acceptance.

00:22:48.470 Think of cases where we want to confirm what we display based on input gathered.

00:22:55.070 This can involve identifying counts or categories.

00:23:01.860 I designed various input formats and output formats for my Turk app.

00:23:09.560 Business listings were valuable for collecting addresses, emails, and other essential details.

00:23:16.600 We could also work with images, social media posts, and video.

00:23:24.000 Regarding output formats, we can work with text, numbers, and multi-select categories.

00:23:30.130 When handling multiple text outputs, it's crucial to clarify the logic used in processing results.

00:23:37.020 Next, let's discuss tips for achieving accuracy on Turk:

00:23:45.050 These are straightforward UX practices, such as providing clear instructions.

00:23:52.130 Ensure straightforward tasks; if someone needs to Google information, provide the URL.

00:23:59.870 One technique is to incorporate gold data into your tasks.

00:24:07.490 You can also screen workers using criteria for particular qualifications.

00:24:14.410 This includes approval rates and background tasks.

00:24:21.950 A key consideration involves setting prices to encourage task completion.

00:24:28.210 Remember that a market exists; the more tasks available, the better the completion rates.

00:24:35.240 Higher HIT counts attract workers who are looking for volume.
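
The gold data tip mentioned above means mixing in items whose correct answer you already know, then checking how each worker did on them. A minimal sketch, with invented item IDs, answers, and field names:

```ruby
# Items with known correct answers. IDs and answers are invented examples.
GOLD_ANSWERS = { 'gold_1' => 'yes', 'gold_2' => 'no' }.freeze

# Hypothetical worker responses, including one on a non-gold item.
responses = [
  { worker_id: 'W1', item_id: 'gold_1', answer: 'yes' },
  { worker_id: 'W1', item_id: 'gold_2', answer: 'no'  },
  { worker_id: 'W2', item_id: 'gold_1', answer: 'no'  },
  { worker_id: 'W2', item_id: 'gold_2', answer: 'no'  },
  { worker_id: 'W2', item_id: 'post_9', answer: 'yes' } # not gold, ignored
]

# Returns each worker's share of correct answers on gold items only.
def gold_accuracy(responses, gold)
  responses
    .select { |r| gold.key?(r[:item_id]) }
    .group_by { |r| r[:worker_id] }
    .transform_values do |rs|
      correct = rs.count { |r| r[:answer] == gold[r[:item_id]] }
      correct.to_f / rs.length
    end
end

accuracy = gold_accuracy(responses, GOLD_ANSWERS)
puts accuracy
```

Workers who score poorly on gold items can have their other answers weighted down or sent for another round of review.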

00:24:43.160 Now, let’s discuss the ethics of MTurk.

00:24:50.040 There are two notable articles highlighting its implications.

00:24:57.300 One studied the use of Turk and the prevalence of low wages.

00:25:05.500 The other is a letter-writing campaign requesting better wages.

00:25:12.230 Surprisingly, a large portion of Turk’s worker pool is made up of U.S. workers.

00:25:20.010 Yet, many are from countries with lower wages.

00:25:25.780 This raises the important question of fairness and exploitation.

00:25:32.310 Another issue stems from requester dishonesty and rejection of work.

00:25:39.950 This situation can lead to workers being unpaid for their efforts.

00:25:47.220 Another controversy involved Cambridge Analytica.

00:25:54.050 They used Facebook quizzes to collect data, and many of the people who were paid to take them were unaware of it.

00:26:01.830 In total, 240,000 people took those quizzes, and they were subsequently banned by Amazon.

00:26:09.890 It’s argued that MTurk played a role in influencing the election.

00:26:16.640 Thank you for your attention!

00:26:23.210 I hope you gained technical insights on Turk and crowdsourcing.

00:26:30.470 I enjoyed my experiences with Turk.

00:26:37.110 My goal was to coordinate people to process data more quickly.

00:26:43.670 Remember, what we do as developers is based on practice.

00:26:50.310 But it’s also about pushing our boundaries.

00:26:56.520 Thanks for listening!

00:27:02.800 By the way, I started a landing page service in New York.

00:27:09.110 We help high-growth companies create dynamic landing pages and run experiments.

00:27:16.240 I’m looking to hire Rails developers in New York or remotely.

00:27:23.590 Thank you for your time, and are there any questions?