Publications by Braden Hancock



Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale.

Stephen H Bach, Daniel Rodriguez, Yintao Liu, Chong Luo, Haidong Shao, Braden Hancock

Proc ACM SIGMOD Int Conf Manag Data

January 2019

Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications. We present a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision in order to bring development time and cost down by an order of magnitude, and introduce Snorkel DryBell, a new weak supervision management system for this setting. Snorkel DryBell builds on the Snorkel framework, extending it in three critical aspects: flexible, template-based ingestion of diverse organizational knowledge, cross-feature production serving, and scalable, sampling-free execution.


Training Complex Models with Multi-Task Weak Supervision.

Alexander Ratner, Braden Hancock, Jared Dunnmon, Frederic Sala, Shreyash Pandey

Proc AAAI Conf Artif Intell

January 2019

As machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels are often used. However, these weak supervision sources have diverse and unknown accuracies, may output correlated labels, and may label different tasks or apply at different levels of granularity.


A machine-compiled database of genome-wide association studies.

Volodymyr Kuleshov, Jialin Ding, Christopher Vo, Braden Hancock, Alexander Ratner

Nat Commun

July 2019

Tens of thousands of genotype-phenotype associations have been discovered to date, yet not all of them are easily accessible to scientists. Here, we describe GWASkb, a machine-compiled knowledge base of genetic associations collected from the scientific literature using automated information extraction algorithms. Our information extraction system helps curators by automatically collecting over 6,000 associations from open-access publications with an estimated recall of 60-80% and with an estimated precision of 78-94% (measured relative to existing manually curated knowledge bases).
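To make the precision and recall figures concrete, here is a minimal sketch of how such metrics can be computed when a set of extracted associations is compared against a curated reference set. The function name `precision_recall` and the variant/trait pairs are made up for illustration, and the paper's actual evaluation protocol may differ.

```python
def precision_recall(extracted, curated):
    """Compare extracted associations against a curated reference set.

    Both arguments are iterables of hashable items, here hypothetical
    (variant, trait) pairs. Returns (precision, recall) as floats.
    """
    extracted, curated = set(extracted), set(curated)
    true_positives = extracted & curated
    # Precision: share of extracted associations that the reference confirms.
    precision = len(true_positives) / len(extracted) if extracted else 0.0
    # Recall: share of reference associations that the extractor recovered.
    recall = len(true_positives) / len(curated) if curated else 0.0
    return precision, recall


# Illustrative data only; these identifiers are not from the paper.
extracted_pairs = {
    ("rs1", "diabetes"),
    ("rs2", "asthma"),
    ("rs3", "height"),
    ("rs4", "bmi"),
}
curated_pairs = {
    ("rs1", "diabetes"),
    ("rs2", "asthma"),
    ("rs5", "height"),
}

precision, recall = precision_recall(extracted_pairs, curated_pairs)
```

In this toy case two of the four extracted pairs appear in the reference set, and two of the three reference pairs are recovered.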


Training Classifiers with Natural Language Explanations.

Braden Hancock, Martin Bringmann, Paroma Varma, Percy Liang, Stephanie Wang

Proc Conf Assoc Comput Linguist Meet

July 2018

Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification). In this work, we propose BabbleLabble, a framework for training classifiers in which an annotator provides a natural language explanation for each labeling decision. A semantic parser converts these explanations into programmatic labeling functions that generate noisy labels for an arbitrary amount of unlabeled data, which is used to train a classifier.
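The idea of programmatic labeling functions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the BabbleLabble implementation: the three labeling functions are hand-written stand-ins for what a semantic parser might produce from explanations, and the simple majority vote is a swapped-in aggregation step chosen for brevity. All names and example sentences are hypothetical.

```python
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


# Hypothetical labeling functions, as if derived from explanations such as
# "I labeled this positive because the words 'his wife' appear".
def lf_his_wife(sentence):
    return POSITIVE if "his wife" in sentence.lower() else ABSTAIN


def lf_brother(sentence):
    return NEGATIVE if "brother" in sentence.lower() else ABSTAIN


def lf_married(sentence):
    return POSITIVE if "married" in sentence.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_his_wife, lf_brother, lf_married]


def apply_labeling_functions(sentences, lfs):
    """Return one row of votes per sentence, one column per labeling function."""
    return [[lf(sentence) for lf in lfs] for sentence in sentences]


def majority_vote(votes):
    """Combine one row of votes; abstain when there are no votes or a tie."""
    counts = Counter(vote for vote in votes if vote != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


unlabeled = [
    "Tom and his wife Ann married in 1990.",
    "Tom visited his brother Bob.",
    "Tom went home.",
    "Tom saw his wife and his brother.",
]
vote_matrix = apply_labeling_functions(unlabeled, LABELING_FUNCTIONS)
noisy_labels = [majority_vote(row) for row in vote_matrix]
```

The resulting `noisy_labels` could then serve as training targets for a downstream classifier, with abstained examples dropped.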


Snorkel MeTaL: Weak Supervision for Multi-Task Learning.

Alex Ratner, Braden Hancock, Jared Dunnmon, Roger Goldman, Christopher Ré

Proc Second Workshop Data Manag End End Mach Learn (2018)

June 2018

Many real-world machine learning problems are challenging to tackle for two reasons: (i) they involve multiple sub-tasks at different levels of granularity; and (ii) they require large volumes of labeled training data. We propose Snorkel MeTaL, an end-to-end system for multi-task learning that leverages weak supervision provided at multiple levels of granularity by domain expert users. In MeTaL, a user specifies a problem consisting of multiple, hierarchically-related sub-tasks (for example, classifying a document at multiple levels of granularity) and then provides labeling functions for each sub-task as weak supervision.


Fonduer: Knowledge Base Construction from Richly Formatted Data.

Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas

Proc ACM SIGMOD Int Conf Manag Data

June 2018

We focus on knowledge base construction (KBC) from richly formatted data. In contrast to KBC from text or tabular data, KBC from richly formatted data aims to extract relations conveyed jointly via textual, structural, tabular, and visual expressions. We introduce Fonduer, a machine-learning-based KBC system for richly formatted data.
