
Delmonte R. Computational Linguistic Text Processing: Lexicon, Grammar, Parsing, and Anaphora Resolution

  • djvu file, 3.27 MB
Nova Science Publishers, 2008, 401 pp.
GETARUNS (General Text And Reference UNderstander System) is a system for text understanding. The aim of the system is to build a model world in which the relations and entities introduced and referred to in the text are asserted, searched for and ranked according to their relevance. In addition, the system is able to generate text, in the form of answers to queries and in the form of short paraphrases or summaries of the input text(s). In some cases, it can also generate stories and Questions and Answers randomly from a plan and a Discourse Model.
GETARUNS is a general multilingual text and reference understander which represents a linguistically based approach to text understanding and embodies a number of general strategies on how to implement linguistic principles in a running system. The system addresses one main issue: the need to restrict access to extralinguistic knowledge of the world by contextual reasoning, i.e. reasoning from linguistically available cues.
Another important issue addressed by the system is multilinguality. In GETARUNS the user may switch from one language to another simply by unloading the current lexicon and uploading the lexicon for the new language; at present Italian, German and English are implemented. Multilinguality has been implemented to support the theoretical linguistic subdivision of Universal Grammar into a Core and a Peripheral set of rules. The system is organized around another fundamental assumption: the architecture of such a system must be modular, requiring a pipeline of sequentially feeding processes, each module providing one chunk of knowledge, with backtracking barred at the intermodular level and allowed only within each single module. The architecture of the system is organized in such a way as to allow feedback into the parser from Anaphoric Binding; however, once pronominals have finally been bound or left free, no further changes are allowed on the f-structure output of the parser.
Thus we can think of the system as being subdivided into two main meta-modules or levels: Low Level System, containing all modules that operate at Sentence Level; High Level System, containing all the modules that operate at Discourse and Text Level by updating the Discourse Model.
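The two-level pipeline described above can be sketched schematically as follows. This is a minimal illustrative sketch, not GETARUNS' actual code (the system is not open in this form): all class, method and lexicon names here are hypothetical stand-ins for the modular design the preface describes - a swappable lexicon for multilinguality, sentence-level modules feeding forward, and a discourse-level module updating the Discourse Model.

```python
# Hypothetical sketch of the modular two-level architecture described in the text.
# Names and data structures are illustrative assumptions, not GETARUNS' API.

# Multilinguality: one lexicon per language, swapped in and out as a whole.
LEXICONS = {
    "english": {"dog": "noun", "runs": "verb"},
    "italian": {"cane": "noun", "corre": "verb"},
}

class Pipeline:
    def __init__(self, language):
        self.lexicon = LEXICONS[language]

    def switch_language(self, language):
        # "Unload the current lexicon and upload the lexicon for the new language."
        self.lexicon = LEXICONS[language]

    # --- Low Level System: sentence-level modules, strictly feed-forward ---
    def tag(self, tokens):
        return [(t, self.lexicon.get(t, "unknown")) for t in tokens]

    def parse(self, tagged):
        # Stand-in for f-structure building; no backtracking across modules.
        return {"f_structure": tagged}

    # --- High Level System: discourse-level module updates the Discourse Model ---
    def update_discourse_model(self, f_structure, model):
        model.append(f_structure)
        return model

pipe = Pipeline("english")
model = []
f = pipe.parse(pipe.tag("dog runs".split()))
model = pipe.update_discourse_model(f, model)
```

The key design point the sketch tries to capture is that each module hands a finished chunk of knowledge to the next: once the sentence-level output is fixed, only the Discourse Model changes.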
The books are organized as an experimental exercise: they contain both the theoretical background and the output of the system, GETARUNS, which enacts and applies the theory. The architecture of the system is strictly related to the structure of the books. To better describe it, we decided to dedicate one book to the lower-level part of the system and another book to the higher-level system components. In this way, each component or module is presented in at least one chapter of the book.
Thus, we can think of the book as being organized around two scientifically distinct but in fact strictly interrelated fields of research:
- sentence level linguistic phenomena
- text or discourse level linguistic phenomena
the former to be described by means of grammatical theories, the latter requiring the intervention of extralinguistic knowledge, i.e. knowledge of the world. This distinction is usually drawn for scientific purposes and is obviously an artificial one: the sentence is at the same time the smallest domain to which rigorous linguistic analysis can hopefully be applied, and also the basic complete semantic unit whereby meaning can be conveyed, depending on the text/discourse context. We are aware that this subdivision is mainly worked out for scientific reasons and does not really imply that such a neat division of tasks can actually be envisaged in real text processing. As will be discussed in detail in the books, semantic issues need to be tackled from the very beginning. This notwithstanding, the separation has its own "raison d'être" and we will try to validate it in the books.
Book 1 - the current book - addresses sentence grammar, or what is usually referred to as such by theoretical linguists. It does so by dividing up - somewhat ideally and sometimes arbitrarily - what must or needs to be computed at sentence level from what need not or cannot be computed at that level, and consequently belongs to discourse grammar. In that sense, the subdivision is not a totally arbitrary one, even though overlaps are normal and will be discussed where needed.
The book also indirectly draws another (un)intended subdivision: the one between syntax and semantics. Again, it would be impossible not to deal with semantically related issues when talking about syntax or the lexicon. However, Semantics with an uppercase S is only treated in Book 2 - already published - where discourse- and text-level grammar is tackled.
This book, then, deals with the lexicon, morphology, tagging, treebanks, parsing, quantifiers and anaphoric or pronominal binding - in other words, everything that concerns the level of sentence grammar in a computational environment.
An important contribution the books make is the argument against the simplistic idea that texts are a "bag of words", or that they can be processed satisfactorily using treebank-derived statistical approaches. This is not to say that treebanks are useless as sources of grammatical information; rather, as will be discussed in a chapter of the book, it does not follow that all the grammar there is to learn is contained in a single treebank.
Conversely, it cannot be proven that statistics and "bag of words" approaches are useless for NLP tasks. On the contrary, in some cases they constitute the only appropriate and sensible approach - and more than one chapter will discuss the pros and cons at length. The question is simply wrongly posed: statistics cannot be treated as a panacea for all the problems raised by half a century of linguistic studies and represented by a(ny) text.
Sentence-level parsing covers, in our perspective, all the issues tackled in this book. In this sense, it speaks against those approaches - the majority in present-day computational linguistics - that reduce sentence-level parsing to a phrase-structure parenthesized representation problem, with word tags and constituency labels in the style proposed and made into a de facto standard by the Penn Treebank initiative. Nor can it be represented by Dependency Structure, with or without grammatical relation labels.
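For readers unfamiliar with the two representations at issue, the contrast can be shown on a toy sentence. This is an illustrative sketch only: the tags and relation labels below follow common Penn-Treebank and dependency-grammar conventions, not the output of any particular parser, and certainly not GETARUNS' own richer f-structure representation.

```python
# Toy example: the same sentence in the two representations discussed above.
sentence = "John sees Mary"

# Phrase-structure (constituency) bracketing, Penn-Treebank style:
constituency = "(S (NP (NNP John)) (VP (VBZ sees) (NP (NNP Mary))))"

# Dependency structure with grammatical-relation labels,
# as (head, relation, dependent) triples:
dependencies = [
    ("sees", "nsubj", "John"),  # John is the subject of "sees"
    ("sees", "obj", "Mary"),    # Mary is its object
]

# A dependency structure maps each dependent to exactly one head:
heads = {dep: head for head, rel, dep in dependencies}
```

The author's point is precisely that neither of these shallow encodings, by itself, exhausts what sentence-level grammar has to account for.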
Sentence-level Grammar - as posited in linguistic theories - takes care of all the grammatical and linguistic relations that belong to that level. Knowledge of the world and semantic disambiguation do not interfere with the rules of sentence grammar and can be thought of as a separate level of computation, provided that the lexicon is structured in such a way as to allow such a subdivision of tasks.
Inducing Fully Specified Lexical Representations
Treebanking: From Phrase Structure to Dependency Representation
Parsing 1: The Partial Parser for Arguments and Adjuncts
Parsing 2: Deep Linguistically-Based Parsing
Parsing 3: Deep Parser between Grammar and Structure
Anaphoric Binding
Quantifiers and Anaphora
Discourse Anaphora Resolution
Linguistic Information Extraction for Text Correction and Summarization