Springer-Verlag Berlin Heidelberg, 2013. — 333 p. — ISBN 978-3-642-20127-1, ISBN 978-3-642-20128-8 (eBook)
This book grew out of a series of annual BUCC (Building and Using Comparable Corpora) workshops. The first workshop of this kind was held in 2008 at LREC in Marrakech, organised by Pierre Zweigenbaum, Éric Gaussier and Pascale Fung. Since then the workshops have moved between continents (Singapore in 2009, Malta in 2010, Portland, Oregon in 2011, Istanbul in 2012), and the organising committee has come to include Reinhard Rapp, Serge Sharoff and Marko Tadic, but the main topic has remained the same: the need to use comparable corpora as training data for linguistic research and NLP applications. The chapters for this volume were collected mostly from the best submissions to the workshops at the end of 2011 or through specific requests to the most prominent authors in this field. With the editorial process complete, we present the resulting collection of chapters to the reader.
The volume starts with a chapter that surveys the state of the art. It discusses the rationale behind the use of comparable corpora, as well as the issues involved in their collection, annotation and use. The rest of the volume consists of two parts. Part I is devoted to methods of compiling comparable corpora and measuring the degree of comparability between their documents. Part II covers applications that use comparable corpora in various contexts, such as Machine Translation or computer-assisted human translation.