An overview of the data programming paradigm: easing the bottleneck in supervised machine learning

Time: 09:00 to 11:00, June 20, 2019

Venue/Location: C2-714, VIASM

Speaker: Dr. Trần Việt Trung, School of Information and Communication Technology, Hanoi University of Science and Technology

Content:

Labelled training data is increasingly the key development bottleneck in machine learning systems. Deep learning methods obviate feature engineering, which used to be the most time-consuming development task, but they carry a major upfront cost: they need training sets of millions of labelled examples to reach peak performance. In this talk, we discuss the new data programming paradigm, which builds on weak supervision: noisier or higher-level supervision drawn from domain expertise, such as external knowledge bases, patterns, or rules, is used to generate training data programmatically. We present our first experiments, conducted on Vietnamese legal data, which achieved promising results.
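The abstract does not describe the speaker's implementation, so the following is only a minimal sketch of the general idea, under stated assumptions. It imagines a hypothetical task (does a legal sentence mention a penalty?) and writes three hypothetical labeling functions, one per supervision source the abstract names: a tiny term set standing in for an external knowledge base, a regular-expression pattern, and a hand-written rule. Each function either votes for a label or abstains. Data programming systems such as Snorkel combine these noisy votes with a learned generative model that estimates each source's accuracy; the sketch swaps in plain majority vote as a simpler stand-in.

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical label space for an illustrative task; None means "abstain".
ABSTAIN = None
PENALTY, NO_PENALTY = 1, 0

# Hypothetical stand-in for an external knowledge base of penalty terms.
PENALTY_TERMS = {"fine", "imprisonment", "penalty"}


def lf_keyword_kb(text: str) -> Optional[int]:
    """Knowledge-base source: vote PENALTY if any known term appears as a word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return PENALTY if words & PENALTY_TERMS else ABSTAIN


def lf_amount_pattern(text: str) -> Optional[int]:
    """Pattern source: vote PENALTY if a monetary amount such as '5,000,000 VND' appears."""
    return PENALTY if re.search(r"\d[\d,.]*\s*VND", text) else ABSTAIN


def lf_definition_rule(text: str) -> Optional[int]:
    """Rule source: definitional sentences are assumed not to state penalties."""
    return NO_PENALTY if re.search(r"\bmeans\b|\bis defined as\b", text, re.I) else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], Optional[int]]] = [
    lf_keyword_kb,
    lf_amount_pattern,
    lf_definition_rule,
]


def weak_label(text: str, lfs=LABELING_FUNCTIONS) -> Optional[int]:
    """Combine non-abstaining votes by majority; abstain on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting sources with equal support
    return ranked[0][0]


def build_training_set(texts: List[str]) -> List[Tuple[str, int]]:
    """Programmatically label unlabelled texts, keeping only non-abstained ones."""
    labelled = [(t, weak_label(t)) for t in texts]
    return [(t, y) for t, y in labelled if y is not ABSTAIN]
```

For example, `weak_label("Violators shall pay a fine of 5,000,000 VND.")` gets two agreeing PENALTY votes and returns `1`, while a sentence that no source covers is left unlabelled. The resulting (text, label) pairs would then train an ordinary supervised model, which is where the saving in manual annotation comes from.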