
Active learning describes a protocol in which the learning algorithm is seeded with a small set of labeled training data, can view additional unlabeled training data, and may request a limited number of labels for that data, in an effort to achieve maximum performance with minimal labeled data. Many empirical studies, along with recent theoretical work, demonstrate a significant reduction in labeled data requirements when using the active learning protocol.
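As an illustration of the protocol described above, here is a minimal sketch of pool-based active learning on a toy problem. It is not the method used in our work: the one-dimensional threshold learner, the closest-to-boundary query rule (a simple form of uncertainty sampling), the synthetic pool, and all names (`fit_threshold`, `active_learn`, `oracle`, `budget`) are assumptions chosen only to keep the example short.

```python
import random


def fit_threshold(labeled):
    """Fit a 1-D threshold classifier: put the boundary midway between
    the largest labeled negative and the smallest labeled positive.
    Assumes `labeled` contains at least one example of each class."""
    largest_neg = max(x for x, y in labeled if y == 0)
    smallest_pos = min(x for x, y in labeled if y == 1)
    return (largest_neg + smallest_pos) / 2.0


def active_learn(pool, oracle, seed, budget):
    """Pool-based active learning loop.

    pool   -- unlabeled points the learner may inspect
    oracle -- hypothetical stand-in for the human annotator
    seed   -- small initial set of (x, label) pairs
    budget -- maximum number of label requests
    """
    labeled = list(seed)
    unlabeled = list(pool)
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # Query the point the current model is least sure about:
        # the unlabeled point closest to the decision boundary.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), labeled


def accuracy(threshold, points, oracle):
    """Fraction of points on which the threshold rule agrees with the oracle."""
    hits = sum((x >= threshold) == bool(oracle(x)) for x in points)
    return hits / len(points)


# Toy setup (all values are illustrative assumptions).
rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]
oracle = lambda x: int(x >= 0.6)      # true concept, hidden from the learner
seed = [(0.0, 0), (1.0, 1)]           # one labeled example per class

threshold, labeled = active_learn(pool, oracle, seed, budget=12)
print("labels used:", len(labeled))
print("learned threshold:", round(threshold, 4))
print("pool accuracy:", accuracy(threshold, pool, oracle))
```

In this toy setting each query roughly halves the interval that can contain the true boundary, so a handful of label requests typically suffices where labeling the pool at random would need many more. Realistic settings replace the threshold learner and the query rule with a model and a selection criterion suited to the task.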
In addition to some theoretical work using the machinery of coresets, our active learning work looks to extend previous results to more complex settings, emphasizing natural language processing problems. In this vein, we have examined active learning in structured output spaces (showing good empirical results for semantic role labeling) and active learning for pipeline models (showing good empirical results for relation extraction). We have also done some work on active sample selection for named entity transliteration.
Our current research direction is to explore other ways of interacting with a domain expert, including asking for additional information about features and for structural information.