Tutorial 2: New Frontiers in Statistical Machine Translation
Philipp Koehn
The tutorial reviews the current state of the art of statistical machine translation and points to the open questions in the field. At this point, the main thrusts of work in the field are:
- developing linguistically more motivated models such as grammar-based translation models and handling of morphology;
- applying advances in machine learning to the hard structured prediction problem (discriminative training, Bayesian models, decision rules);
- integrating machine translation into wider applications such as speech translation or translation tools for human translators.
The tutorial will also feature a short session that introduces the widely used Moses toolkit.
Biography
Philipp Koehn is a reader at the University of Edinburgh. He received his PhD from the University of Southern California, where he was a research assistant at the Information Sciences Institute (ISI) from 1997 to 2003. He was a postdoctoral research associate at the Massachusetts Institute of Technology (MIT) in 2004, and joined the University of Edinburgh as a lecturer in 2005. His research centers on statistical machine translation, but he has also worked on speech, text classification and information extraction.
Besides his research, his major contributions to the machine translation community are the preparation and release of the Europarl corpus and of the Pharaoh and Moses decoders, all of which are widely used. The statistical machine translation system developed under his leadership in recent years is one of the top performers in recent DARPA, IWSLT and WMT competitions. He has been organising a series of workshops on statistical machine translation at the ACL conferences, each with a shared task on translation between European languages. He is president of the ACL Special Interest Group on Machine Translation and author of the textbook Statistical Machine Translation. His research is funded by DARPA (GALE project) and the European Commission (EuroMatrix, EuroMatrixPlus, and LetsMT projects).