
Intelligent crowdsourcing for the age of Big Data

by Dr. Reynold Cheng, Computer Science
Dec 29, 2015

In the “Era of Big Data”, an extremely large amount of information is created every day, revolutionizing science and technology, governments, economies, and international development. One important source of Big Data is human users, who provide various kinds of information (e.g., microblogs, movie comments, and photo tags) through Internet browsers and mobile devices. Crowdsourcing systems, such as Amazon Mechanical Turk (AMT), CrowdFlower, and Wikipedia, have recently been proposed to leverage the intelligence of Internet users. A requester submits to these systems a large number of “Human Intelligence Tasks” (HITs), which are hard for a computer but relatively easy for a human. Typical HITs include labeling images and translating sentences. Internet users, or workers, can then perform these HITs in order to get rewards (e.g., money) from the requester. Figure 1 shows i-Tag, our group’s recently developed crowdsourcing system that allows workers to give labels to photos.


Figure 1: i-Tag, a photo tagging system.


Our research group has examined several fundamental problems related to crowdsourcing.


·  Task Assignment: Given a set of HITs, which HIT should be assigned to a worker? To solve this problem, we have studied online strategies that enable optimal task assignment to be performed in linear time [1]. We implemented this algorithm in the QASCA system (Figure 2), which enables requesters to submit their HITs to the system and get back the results. It is also connected to other crowdsourcing platforms, such as AMT. In [3], we study how to determine the number of workers (called the plurality) that should perform a HIT, in order to obtain the best overall answer quality, and we propose dynamic programming solutions for this problem. A minimal sketch of the assignment idea appears after this list.

·  Social Tagging: We examine the task assignment problem in social tagging applications (e.g., Delicious and Flickr) [4], where we study the impact of different assignment strategies on tagging quality. We also developed the i-Tag system, whose architecture is shown in Figure 3. We presented a demo of this system in [5]; its screenshot is shown in Figure 1.
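
To make the assignment idea concrete, the following is a minimal sketch in Python. It is not the QASCA implementation: it assumes binary labeling HITs, a worker with a known accuracy, and a running probability (“belief”) that each HIT’s true label is 1; the names and the quality measure (confidence of the majority label) are illustrative assumptions only.

# A minimal sketch (not the QASCA implementation) of quality-aware online
# task assignment. Assumptions: each HIT asks for a binary label, the
# current worker has a known accuracy, and each HIT carries a running
# probability ("belief") that its true label is 1.

def posterior(belief, answer, accuracy):
    # Bayesian update of P(label = 1) after observing one worker answer.
    like1 = accuracy if answer == 1 else 1 - accuracy
    like0 = 1 - accuracy if answer == 1 else accuracy
    return like1 * belief / (like1 * belief + like0 * (1 - belief))

def expected_gain(belief, accuracy):
    # Expected increase in the confidence of the majority label if this
    # worker answers the HIT, averaged over the worker's possible answers.
    current = max(belief, 1 - belief)
    p_ans1 = accuracy * belief + (1 - accuracy) * (1 - belief)
    b1, b0 = posterior(belief, 1, accuracy), posterior(belief, 0, accuracy)
    expected = p_ans1 * max(b1, 1 - b1) + (1 - p_ans1) * max(b0, 1 - b0)
    return expected - current

def assign(open_hits, accuracy):
    # One linear pass over the open HITs per assignment decision.
    return max(open_hits, key=lambda h: expected_gain(h["belief"], accuracy))

hits = [{"id": 1, "belief": 0.55}, {"id": 2, "belief": 0.90}, {"id": 3, "belief": 0.70}]
print(assign(hits, accuracy=0.8)["id"])  # prints 1: the least certain HIT

Because the expected posterior equals the prior whenever the worker’s answer cannot flip the majority label, the expected gain concentrates on the most uncertain HITs, and each assignment decision costs one linear pass over the open HITs. QASCA’s actual quality metrics and assignment strategies are described in [1].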


Figure 2: The QASCA system architecture.


Figure 3: The i-Tag system architecture.


· Worker Selection: Given a monetary budget and a set of workers, how should workers be chosen so that a HIT can be accomplished successfully and economically? In [2], we examine how to select the optimal set of workers, so that tasks are accomplished with the highest expected quality under a limited budget. We developed a system called OptJS (Optimal Jury Selection System) (Figure 4). Using our budget-quality table, a requester can decide which workers to employ, based on their performance and cost. A brute-force sketch of the underlying selection problem follows Figure 4.


Figure 4: The OptJS system.
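
To illustrate the problem OptJS addresses, here is a minimal brute-force sketch in Python. It is not the OptJS algorithm: it assumes each worker has a known accuracy and cost, that workers answer independently, and that a binary HIT is decided by majority vote; enumerating all juries is exponential and only feasible for small worker pools.

# A minimal brute-force sketch (not the OptJS algorithm) of jury selection
# under a budget. Assumptions: each worker is an (accuracy, cost) pair,
# answers are independent, and the jury decides by majority vote.
from itertools import combinations

def majority_quality(accuracies):
    # Probability that the majority vote is correct, via dynamic
    # programming over the number of correct votes.
    probs = [1.0]  # probs[k] = P(exactly k correct votes so far)
    for p in accuracies:
        nxt = [0.0] * (len(probs) + 1)
        for k, q in enumerate(probs):
            nxt[k + 1] += q * p        # this juror votes correctly
            nxt[k] += q * (1 - p)      # this juror votes incorrectly
        probs = nxt
    return sum(q for k, q in enumerate(probs) if k > len(accuracies) // 2)

def select_jury(workers, budget):
    # Enumerate all odd-sized juries affordable within the budget and
    # keep the one with the highest majority-vote quality.
    best, best_quality = (), 0.0
    for size in range(1, len(workers) + 1, 2):   # odd sizes avoid ties
        for jury in combinations(workers, size):
            if sum(cost for _, cost in jury) <= budget:
                quality = majority_quality([acc for acc, _ in jury])
                if quality > best_quality:
                    best, best_quality = jury, quality
    return best, best_quality

workers = [(0.9, 5.0), (0.8, 2.0), (0.7, 1.0), (0.6, 0.5), (0.6, 0.5)]
print(select_jury(workers, budget=4.0))  # ((0.8, 2.0),) with quality 0.8

In this example the best affordable jury is the single 0.8-accuracy worker: extending it to a three-person jury within the same budget actually lowers the majority-vote quality. Choosing juries optimally, and doing so efficiently, is the subject of [2].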


Due to the popularity of the Internet, workers from all over the world participate in accomplishing HITs. In August 2012, AMT officially reported that more than 500,000 workers from 190 countries had worked on HITs. Such “crowdsourced” data can be used to support a variety of applications, such as product recommendation and image search. We have studied several key problems in this area and developed crowdsourcing systems. We believe that crowdsourcing will continue to attract interest from the research and industry communities. For more details about our work, please refer to our project homepages:

·  QASCA: https://i.cs.hku.hk/~ydzheng2/QASCA/

·  OptJS: https://i.cs.hku.hk/~ydzheng2/Jury/


Research postgraduate students supervised by Dr. Reynold Cheng. From left: Ms. Caihua Shan, Mr. Siqiang Luo, Mr. Yudian Zheng, Dr. Reynold Cheng, Mr. Yixiang Fang, Mr. Jiafeng Hu, Mr. Haiqi Sun, Mr. Zhipeng Huang

References:
[1] Yudian Zheng, Jiannan Wang, Guoliang Li, Reynold Cheng, Jianhua Feng: QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications. SIGMOD 2015:1031-1046
[2] Yudian Zheng, Reynold Cheng, Silviu Maniu, Luyi Mo: On Optimality of Jury Selection in Crowdsourcing. EDBT 2015:193-204
[3] Luyi Mo, Reynold Cheng, Ben Kao, Xuan S. Yang, Chenghui Ren, Siyu Lei, David W. Cheung, Eric Lo: Optimizing plurality for human intelligence tasks. CIKM 2013:1929-1938
[4] Xuan S. Yang, Reynold Cheng, Luyi Mo, Ben Kao, David W. Cheung: On incentive-based tagging. ICDE 2013:685-696
[5] Siyu Lei, Xuan S. Yang, Luyi Mo, Silviu Maniu, Reynold Cheng: iTag: Incentive-based tagging. ICDE 2014:1186-1189