Cyborg Method

What is the Cyborg Method for crowdsourced data?

The M Turk Cyborg Method is a strategy to improve the quality of data collected on Mechanical Turk (M Turk). 

The Cyborg Method proposes that high quality data can be collected from M Turk when using both an automated evaluation (a machine check of each participant's IP address) and a human evaluation (a review of each participant's written response).

Using only one of these methods will catch some "bad actors," but it will miss others and ultimately lower the quality of your data. Using both methods significantly increases the quality of your data and protects your budget. That is, you will only pay for responses that are of high quality.

Pre-Print Available

The article is currently under review for publication. We recommend waiting until a peer-reviewed publication is available before citing this work as it is likely to be refined by the peer-review process. We believe the methodology described below, however, is valid and helpful to those interested in using this approach. 

A pre-print of the initial study that demonstrates the utility of the Cyborg Method can be found here: 

Why You Should Use the Cyborg Method

We conducted a study to determine whether the Cyborg Method really improves data quality. In this study, we collected two samples using the same methods. In the first sample, Sample 1, we collected all of the data needed for the Cyborg Method, but we did not enforce it. That is, we let everyone complete the study and, most importantly, paid everyone who got to the end. In the second sample, Sample 2, we stopped people whom the Cyborg Method identified as invalid, which allowed us to avoid paying those invalid participants.

Here is a flow chart of how recruitment worked and how the economics of the study played out:

The big takeaway is the cost per participant. We budgeted $4.80 per participant.

When using the Cyborg Method, we paid $4.86 per participant. The extra $0.06 is due to 32 participants who provided valid responses but did not meet the inclusion criterion of endorsing a Criterion A traumatic event.

When we did not use the Cyborg Method, we paid $19.53 per participant. That is an extra $14.73 per participant!

Furthermore, we compared scores on a measure of PTSD (the PCL-5) and a measure of depression (the PHQ-9) between participants who were valid and those who were not. Valid participants scored significantly lower than invalid participants.

Here is a picture of these findings:

Figure Below: Comparisons of scores on the PCL-5 and PHQ-9 across the different categories of validation. Panel A: PCL-5 scores. Panel B: PHQ-9 scores. Panel C: Number of trauma types endorsed on the Life Events Checklist. Cyborg N = 232. IP Evaluation N = 167. Short Answer N = 256. Invalid N = 324. *** = p < .001. ** = p < .01.

Figure Below: Comparisons of scores on the PCL-5 and PHQ-9 across the Cyborg and Attention Check methods. Panel A: PCL-5 scores. Panel B: PHQ-9 scores. Panel C: Number of trauma types endorsed on the Life Events Checklist. Cyborg & Attention N = 235. Cyborg Only N = 18. Attention Only N = 655. Invalid N = 235. *** = p < .001. * = p < .05.

We think the reason for these differences is how valid and invalid participants respond. Valid participants answer the measures honestly. In a community sample, we would expect most people to have no or few symptoms, some to have mild symptoms, fewer to have moderate symptoms, and fewer still to have severe symptoms. This results in a positively skewed distribution like the one shown above in blue. The shape of this distribution makes us confident that these responses are valid.

But what about the invalid participants? We hypothesize that they are picking responses to questions at random. Every such participant will have some items scored higher, some scored lower, and some in the middle. This pattern results in most participants having a total score close to the midpoint of the measure's score range (40 for the PCL-5 and about 13 for the PHQ-9). If you look at the bars for the invalid responses above, the peak of the PCL-5 distribution is close to 40, and for the PHQ-9 the red bar is around 13.
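This midpoint argument can be checked with a quick simulation (a sketch we added, not an analysis from the study): the PCL-5 has 20 items scored 0 to 4, so a purely random responder's expected item score is 2 and expected total is 20 × 2 = 40, the scale midpoint.

```python
import random

# Sketch (our assumption, not data from the study): simulate purely
# random responding on the PCL-5, which has 20 items scored 0-4
# (total range 0-80). Random totals should cluster near the midpoint, 40.

def random_pcl5_total(rng: random.Random) -> int:
    """Total score for one random responder across 20 items scored 0-4."""
    return sum(rng.randint(0, 4) for _ in range(20))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
totals = [random_pcl5_total(rng) for _ in range(10_000)]
mean_total = sum(totals) / len(totals)
print(round(mean_total, 1))  # clusters near 40, the scale midpoint
```

The same logic applies to the PHQ-9 (9 items scored 0 to 3, midpoint about 13.5), matching the peak near 13 described above.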

This method is also replicable. Here are the distributions from two samples collected at separate times using this method, showing the overlap of scores on the PCL-5, the PHQ-9, and LEC trauma types for both samples. The high degree of overlap suggests that this method should consistently result in a valid sample.

So with all that in mind, we hope that you will consider using this method!

Here is how to set it up! 

Setting Up the Cyborg Method

Automated Evaluation

Integrate an Automated IP Check 

We have found IPHub.Info and IPQualityscore to perform well. 

These services will determine whether a user is suspected of using a VPN/VPS or of being a bot.

They can be integrated into Qualtrics via their APIs.

In order to use these services, you will need to create an account with them and get your unique API key. At the time of this writing, accounts were free to create and allowed a limited number of IP checks per day; however, we found the free accounts sufficient for a modest research study.

Step 1

We recommend using a text "question" to warn participants that you are going to evaluate their IP address.

This gives them a chance to turn off any VPN services they might be using.

Step 2

Next, navigate to Survey Flow. Find your warning question, add a new element after it, and select Web Service.

Step 3 (IPHub)

In the URL box, enter the IPHub API endpoint followed by the participant's IP address: ${loc://IPAddress}

For Method, use GET.

Select Add a Custom Header to Send to Web Service.

Name the header X-Key.

For Set a Value Now, enter your API key.

Then, set your Embedded Data: IP_Block = block and IP_Country = countryName.
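The mapping this step sets up can be sketched in code (an illustration we added, not a replacement for the Qualtrics web-service element). The field names "block" and "countryName" come from IPHub's JSON response; per IPHub's documentation, a block value of 1 flags a non-residential (VPN/proxy/hosting) IP.

```python
# Sketch of what Qualtrics does with the IPHub response: parse the JSON
# body and store the "block" and "countryName" fields as embedded data.
# The example values below are made up.

def extract_embedded_data(iphub_json: dict) -> dict:
    """Map an IPHub response onto the embedded data fields named above."""
    return {
        "IP_Block": iphub_json["block"],
        "IP_Country": iphub_json["countryName"],
    }

def fails_ip_check(embedded: dict) -> bool:
    """A block value of 1 marks a suspected VPN/VPS or bot."""
    return embedded["IP_Block"] == 1

# Abridged example of the response shape (values are made up)
example = {"ip": "203.0.113.7", "countryName": "United States", "block": 1}
print(fails_ip_check(extract_embedded_data(example)))  # True
```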

It should look like the following:

Step 3 (IPQualityScore)

For IPQualityScore:

In the URL box, enter the IPQualityScore API URL ending with ${loc://IPAddress}

For Method, use GET.

Then you must set the Embedded Data, that is, the information you want Qualtrics to pull from IPQualityScore. This service returns a number of elements, but we have found the best indicator to be the fraud score.
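The fraud-score check can be sketched as follows (the cutoff value here is our assumption, not a recommendation from the study). IPQualityScore's JSON response includes a "fraud_score" field from 0 to 100, with higher scores indicating a riskier IP.

```python
# Sketch: decide pass/fail from the fraud_score field of an
# IPQualityScore response. The cutoff is hypothetical; tune it for
# your own study.

FRAUD_SCORE_CUTOFF = 75  # hypothetical cutoff, our assumption

def fails_fraud_check(ipqs_json: dict, cutoff: int = FRAUD_SCORE_CUTOFF) -> bool:
    """True when the reported fraud_score meets or exceeds the cutoff."""
    return ipqs_json["fraud_score"] >= cutoff

# Abridged examples of the response shape (values are made up)
print(fails_fraud_check({"fraud_score": 92}))  # True
print(fails_fraud_check({"fraud_score": 10}))  # False
```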

It should look like the following: 

Step 4 

Finally, add branching logic below the IP check to block participants who fail the check from progressing.

Participants who meet these criteria should be shown a message telling them they have been blocked, and the survey should then end.

Human Evaluation

Include a self-report measure that requires a written response. We have found the Life Events Checklist - Part 2 to be an excellent measure. It requires participants to describe a traumatic event that is relevant to them. 

Upon completion of the survey, but prior to providing compensation, review the written response and determine whether it is valid.
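The human review itself cannot be automated, but a light pre-screen can flag responses worth a closer look (this helper is our addition, not part of the published method): very short answers and verbatim duplicates are common signs of invalid responding.

```python
# Sketch (our assumption): flag written responses that are too short to
# describe an event, or that exactly duplicate another participant's
# answer, so a human reviewer can inspect them first.

def flag_responses(responses: dict, min_words: int = 5) -> set:
    """Return participant IDs whose written answers look suspect."""
    flagged = set()
    seen = {}  # normalized text -> first participant ID that used it
    for pid, text in responses.items():
        cleaned = text.strip().lower()
        if len(cleaned.split()) < min_words:
            flagged.add(pid)            # too short to describe an event
        elif cleaned in seen:
            flagged.add(pid)            # verbatim duplicate
            flagged.add(seen[cleaned])  # and its earlier twin
        else:
            seen[cleaned] = pid
    return flagged

example = {
    "p1": "A car accident on the highway in 2019 where I was injured.",
    "p2": "bad thing",
    "p3": "A car accident on the highway in 2019 where I was injured.",
}
print(sorted(flag_responses(example)))  # ['p1', 'p2', 'p3']
```

Flagged responses still get a human read; the pre-screen only orders the queue.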

Unique Key for Compensation

At the conclusion of the survey, generate a unique key that the participant will enter into M Turk. This allows you to know which individuals should receive compensation in M Turk. 

We use a combination of four random words produced by a random word generator; the service we use is Wordnik. Wordnik has an API that can be integrated into Qualtrics to generate a per-participant key of several words.

Here is an example of how to integrate this using a web service in Qualtrics. This will create a key of four words (0.word through 3.word). More or fewer words can be added via embedded data.
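The idea behind the key can be illustrated offline (a sketch we added: the word list and hyphen separator below are our assumptions, not Wordnik output; in the actual setup the words come from the Wordnik API inside Qualtrics):

```python
import random

# Sketch: build a four-word compensation key from a small local word
# list, standing in for the words Wordnik would return.

WORDS = ["maple", "orbit", "velvet", "harbor", "lantern",
         "quartz", "meadow", "cobalt", "ember", "willow"]

def make_key(rng: random.Random, n_words: int = 4) -> str:
    """Join n distinct random words into one compensation key."""
    return "-".join(rng.sample(WORDS, n_words))

rng = random.Random()  # do not fix the seed in practice: keys must be unpredictable
key = make_key(rng)
print(key)  # e.g. a key like "maple-orbit-velvet-harbor"
```

The participant pastes this key into M Turk, and you compensate only the workers whose keys match a validly completed survey.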