Overview:
Information extraction (IE) is the task of extracting structured information from machine-readable documents. Machine learning methods for IE have demonstrated superior performance and are now the dominant approach. We have developed machine learning and inference techniques both for frame-like information extraction, which attempts to map free-form text to a database with a given schema, and for the extraction of specific relations and entities from free-form text.
Information extraction aims at pulling useful information out of free-form text so that it can be accessed later as if the data were stored in a database.
Typical IE tasks include:
- Extracting from free-text job postings the job description,
requirements, location and salary. The extracted fields can be stored
in a structured database and accessed with SQL queries.
- Extracting from Craigslist.com postings the number of bedrooms,
bathrooms, the price, the availability, and the location of a
real-estate property.
- Extracting from email seminar announcements the speaker name, organizational affiliation, talk title, date, time, and location, and then putting this data
into one's calendar.
- Identifying events such as political assassinations and extracting relevant information such as the victim, perpetrators, location, and causes from the text. This can be used as input for quantitative social science research.
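To make the frame-like setting concrete, the job-posting task above can be sketched as a small program that fills a fixed schema from free text, stores the result in a database, and queries it with SQL. This is only an illustration under stated assumptions: the posting text, the `jobs` table, and the helper names `extract_frame` and `store_and_query` are hypothetical, and hand-written regular expressions stand in for the learned extractors that the methods described here rely on.

```python
import re
import sqlite3

# Hypothetical posting text, invented for illustration.
POSTING = """Senior Data Engineer
Location: Chicago, IL
Salary: $120,000 per year
Requirements: 5 years of Python and SQL experience."""

# Hand-written patterns standing in for a learned extractor (an assumption;
# a real IE system would learn these decisions from annotated data).
PATTERNS = {
    "location": re.compile(r"^Location:\s*(.+)$", re.MULTILINE),
    "salary": re.compile(r"^Salary:\s*\$([\d,]+)", re.MULTILINE),
    "requirements": re.compile(r"^Requirements:\s*(.+)$", re.MULTILINE),
}


def extract_frame(text):
    """Fill one slot per schema field; a slot stays None when nothing matches."""
    # Assume the first non-empty line of the posting is the job title.
    frame = {"title": text.strip().splitlines()[0].strip()}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        frame[field] = match.group(1).strip() if match else None
    if frame["salary"] is not None:
        # Normalize "120,000" to an integer so SQL can compare it numerically.
        frame["salary"] = int(frame["salary"].replace(",", ""))
    return frame


def store_and_query(frames, min_salary):
    """Store extracted frames in an in-memory table and query them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (title TEXT, location TEXT, salary INTEGER, requirements TEXT)"
    )
    conn.executemany(
        "INSERT INTO jobs VALUES (:title, :location, :salary, :requirements)", frames
    )
    rows = conn.execute(
        "SELECT title, location FROM jobs WHERE salary >= ? ORDER BY title",
        (min_salary,),
    ).fetchall()
    conn.close()
    return rows


frame = extract_frame(POSTING)
rows = store_and_query([frame], 100000)
print(frame)
print(rows)
```

The schema is fixed in advance and each extracted field becomes a column, which is what allows the free text to be queried afterwards as if it had been entered into a database by hand. The other tasks in the list follow the same pattern with a different schema.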
We have developed several methods for IE, focusing on accurate information extraction while requiring minimal annotation effort.
Relevant Publications:
- L. Ratinov and D. Roth, Design Challenges and Misconceptions in Named Entity Recognition. Proc. of the Annual Conference on Computational Natural Language Learning (CoNLL) (2009)
- M. Chang, L. Ratinov, and D. Roth, Constraints as Prior Knowledge. ICML Workshop on Prior Knowledge for Text and Language Processing (2008) pp. 32-39
- V. Punyakanok, D. Roth, and W. Yih, The Importance of Syntactic Parsing and Inference in Semantic Role Labeling. Computational Linguistics (2008)
- M. Chang, L. Ratinov, and D. Roth, Guiding Semi-Supervision with Constraint-Driven Learning. Proc. of the Annual Meeting of the ACL (2007) pp. 280--287
- S. Har-Peled, D. Roth, and D. Zimak, Maximum Margin Coresets for Active and Noise Tolerant Learning. Proc. of the International Joint Conference on Artificial Intelligence (IJCAI) (2007)
- V. Punyakanok, D. Roth, and W. Yih, The Necessity of Syntactic Parsing for Semantic Role Labeling. Proc. of the International Joint Conference on Artificial Intelligence (IJCAI) (2005) pp. 1117--1123
- V. Punyakanok and D. Roth, Inference with Classifiers: The Phrase Identification Problem. Computational Linguistics (2005)
- D. Roth and W. Yih, A Linear Programming Formulation for Global Inference in Natural Language Tasks. Proc. of the Annual Conference on Computational Natural Language Learning (CoNLL) (2004) pp. 1--8
- D. Roth and W. Yih, A Linear Programming Formulation for Global Inference in Natural Language Tasks. Proceedings of AI & Math (2004) pp. 1--8
- D. Roth and W. Yih, Probabilistic Reasoning for Entity and Relation Recognition. Proc. of the International Conference on Computational Linguistics (COLING) (2002) pp. 835--841
- X. Carreras, L. Màrquez, V. Punyakanok, and D. Roth, Learning and Inference for Clause Identification. Proc. of the European Conference on Machine Learning (ECML) (2002) pp. 35--47
- V. Punyakanok and D. Roth, The Use of Classifiers in Sequential Inference. The Conference on Advances in Neural Information Processing Systems (NIPS) (2001) pp. 995--1001
- D. Roth and W. Yih, Relational Learning via Propositional Algorithms: An Information Extraction Case Study. Proc. of the International Joint Conference on Artificial Intelligence (IJCAI) (2001) pp. 1257--1263
- D. Roth, Reasoning with Classifiers. Proc. of the European Conference on Machine Learning (ECML) (2001) pp. 506--510
- X. Li and D. Roth, Exploring Evidence for Shallow Parsing. Proc. of the Annual Conference on Computational Natural Language Learning (CoNLL) (2001) pp. 107--110
- D. Roth and W. Yih, Relational Learning via Propositional Algorithms: An Information Extraction Case Study. (2001)