Large Scale Labelling Methodology

Yoav Kantor


Machine learning is the science of getting computers to act without being explicitly programmed. While traditional software relies on sets of rules that an algorithm executes to solve a problem, machine-learning-based systems rely on given examples, known as labeled data, from which the algorithm "learns" how to correctly identify new instances. The quality and quantity of the labeled data significantly affect the precision and recall of the resulting solution. Collecting large amounts of high-quality labeled data is a well-known challenge that raises several questions. For example: What labeled data is required to solve a specific task? How can we measure the quality of the generated labeled data? How can we measure the quality of a specific labeler's output? In this lecture we will examine why high-quality labeled data is important, compare two main data labeling procedures (exhaustive and retrospective labeling), and discuss the unique issues that arise when outsourcing large-scale labeling tasks to the crowd, e.g. on a crowdsourcing platform such as Crowdflower.
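One common way to approach the last two questions is to collect several crowd judgments per item, aggregate them by majority vote, and score each labeler against a small set of items whose correct ("gold") answers are already known. The sketch below is illustrative only and is not necessarily the procedure covered in the lecture. The judgments, gold answers, label names, and the helper functions `majority_labels` and `labeler_accuracy` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical crowd judgments: (labeler_id, item_id, label)
judgments = [
    ("w1", "s1", "relevant"), ("w2", "s1", "relevant"), ("w3", "s1", "irrelevant"),
    ("w1", "s2", "irrelevant"), ("w2", "s2", "irrelevant"), ("w3", "s2", "irrelevant"),
    ("w1", "s3", "relevant"), ("w2", "s3", "irrelevant"), ("w3", "s3", "irrelevant"),
]

# Hypothetical gold items with known answers, used only to score labelers
gold = {"s1": "relevant", "s2": "irrelevant"}


def majority_labels(judgments):
    """Aggregate each item's labels by simple majority vote.

    Ties are broken in favor of the label seen first for that item.
    """
    votes = defaultdict(Counter)
    for _, item, label in judgments:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def labeler_accuracy(judgments, gold):
    """Fraction of each labeler's answers on gold items that match the known answer."""
    correct, total = Counter(), Counter()
    for labeler, item, label in judgments:
        if item in gold:
            total[labeler] += 1
            correct[labeler] += int(label == gold[item])
    return {labeler: correct[labeler] / total[labeler] for labeler in total}


print(majority_labels(judgments))
print(labeler_accuracy(judgments, gold))
```

Plain majority voting treats every labeler equally; a natural refinement is to weight each vote by that labeler's accuracy on gold items, or to discard judgments from labelers whose accuracy falls below a chosen threshold.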



Yoav Kantor completed his M.Sc. degree at the Technion in 2013 and has worked at the IBM Research - Haifa lab since then.

He is a member of the Debating Technologies team, which develops a machine-learning-based system that, given a controversial topic, can automatically generate relevant persuasive arguments by scanning massive text corpora. He will share the experience and insights he gained from implementing an automatic large-scale labeling mechanism that the project team uses on a daily basis.

Lecture languages



AI / Automation

Duration options

1 hour

Travel/delivery options

In-country
Outside of country: Open for discussion
Remote via video conference


