
Practice of Efficient Data Collection via Crowdsourcing at Large-Scale
Course URL: http://videolectures.net/kdd2019_drusta_fedorova_zerminova/
Lecturer: Alexey Drutsa
Institution: Yandex
Course date: 2020-03-02
Course language: English
Course description: In this tutorial, we present a portion of the unique practical industrial experience in efficient data labeling via crowdsourcing, shared by leading researchers and engineers from Yandex. The majority of ML projects require training data, and often this data can only be obtained through human labeling. Moreover, as more applications of AI appear, more nontrivial tasks for collecting human-labeled data arise. Producing such data at large scale requires building a technological pipeline, which includes solving issues related to quality control and smart distribution of tasks among workers. We will introduce data labeling via public crowdsourcing marketplaces and present the key components of efficient label collection. This will be followed by a practical session in which participants will choose one of several real label collection tasks, experiment with settings for the labeling process, and launch their label collection project on Yandex.Toloka, one of the largest crowdsourcing marketplaces. The projects will be run on real crowds during the tutorial session. Finally, participants will receive feedback on their projects and practical advice on making them more efficient. We invite beginners, advanced specialists, and researchers to learn how to collect labeled data of good quality and to do so efficiently.
Keywords: crowdsourcing at large scale; efficient data collection; practice of data collection
Course source: VideoLectures.NET
Data collected: 2022-09-14:cyh
Last reviewed: 2022-09-19:cyh
Views: 34