Fellbaum C. (ed.) WordNet: An Electronic Lexical Database

File format: DjVu, 5.14 MB

Publisher: MIT Press, 1998. 447 pp.

It crystallized in 1985. Latent and inchoate for twenty years, in 1985 it became explicit, a project we could mention when asked what we were working on. But in 1985 WordNet was very different from what it became ten years later.
One of the project's original presuppositions was the separability hypothesis: that the lexical component of language can be isolated and studied in its own right. The history of lexicography suggests strongly that useful contributions can be made at the level of words. The lexicon is not independent of other components of language, of course, but it does seem to be separable from them. For example, whereas phonology and grammar are mastered once and for all in the early years of life, vocabulary continues to grow as long as a person stays intellectually active. Different cognitive processes seem to be involved.
Another presupposition was the patterning hypothesis: that people could not master and have readily available all the lexical knowledge needed to use a natural language unless they could take advantage of systematic patterns and relations among the meanings that words can be used to express. Those systematic mental patterns have been a subject for speculation at least since the time of Plato, and modern linguistic studies are beginning to suggest ways of identifying them in the semantic structures of natural languages. But much otherwise excellent work along these lines runs aground on the magnitude of the problem. An author might propose a semantic theory and illustrate it with some 20 or 50 English words (usually nouns), leaving the other 100,000 words of English as an exercise for the reader.
So a third presupposition was the comprehensiveness hypothesis: that computational linguistics, if it were ever to process natural languages as people do, would need to have available a store of lexical knowledge as extensive as people have. This observation was simply a language corollary of the growing interest at that time in knowledge-based systems in the field of artificial intelligence. Roger Schank and his colleagues were building language-processing systems having small vocabularies for well-defined topics, where word meanings were represented by a few hundred LISP programs, but it was becoming clear even in 1985 that this approach would have trouble scaling up. There seemed to be a need for a comprehensive lexical database that would include word meanings as well as word forms and that could be used under computer control.
Analyzing a word's meaning into semantic components that can be captured in LISP code is a form of componential lexical semantics. That is to say, componential semantics approaches the meaning of a word in much the same way it approaches the meaning of a sentence: the meaning of a sentence should be decomposable into the meanings of its constituents, and the meaning of a word should be similarly decomposable into certain semantic primitives, or conceptual atoms. Philip N. Johnson-Laird and I had explored componential semantics with much enthusiasm in our 1976 book, Language and Perception, but in 1985 we still did not have a definitive list of the conceptual atoms and it was beginning to look as if, whatever other virtues componential lexical semantics might have, it was not the best theory for natural language processing by computers.
Was there an alternative? In 1985 many cognitive psychologists and computational linguists were formulating word meanings in terms of networks, diagrams with nodes to represent meanings and arcs to represent relations between the meanings. For example, table and furniture would label two nodes and an arc between them would represent the proposition that "a table is a kind of furniture." Is-a-kind-of is a semantic relation; no claim is made that the meaning of furniture is a component of the meaning of table. As workers became more self-conscious about the assumptions that are involved in these network representations, it became increasingly obvious that relational lexical semantics is one possible alternative to componential lexical semantics. And Jerry Fodor pointed out that many years earlier Rudolf Carnap had proposed a similar type of relational semantics.
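
The network idea described above can be made concrete with a small sketch. It is only an illustration, not WordNet's actual data model: the class name SemanticNetwork, its methods, and the arc from furniture to artifact are all hypothetical, and treating is-a-kind-of as transitive is an assumption the text does not state.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy relational lexicon: nodes are word forms, labeled arcs are relations."""

    def __init__(self):
        # relation name -> source node -> set of target nodes
        self._arcs = defaultdict(lambda: defaultdict(set))

    def add_arc(self, source, relation, target):
        """Record one labeled arc, e.g. table --is-a-kind-of--> furniture."""
        self._arcs[relation][source].add(target)

    def targets(self, source, relation):
        """Return the nodes reached from source by a single arc of this relation."""
        return set(self._arcs[relation].get(source, set()))

    def is_a_kind_of(self, word, ancestor):
        """Follow is-a-kind-of arcs upward from word, looking for ancestor.

        Assumption: the relation is treated as transitive.
        """
        seen, frontier = set(), [word]
        while frontier:
            node = frontier.pop()
            for parent in self.targets(node, "is-a-kind-of"):
                if parent == ancestor:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    frontier.append(parent)
        return False


net = SemanticNetwork()
net.add_arc("table", "is-a-kind-of", "furniture")      # the example from the text
net.add_arc("furniture", "is-a-kind-of", "artifact")   # hypothetical extra arc

print(net.is_a_kind_of("table", "furniture"))
print(net.is_a_kind_of("table", "artifact"))
print(net.is_a_kind_of("furniture", "table"))
```

Nothing in this structure decomposes table into components. Its meaning is characterized only by the arcs that connect it to other nodes, which is the contrast with componential semantics drawn in the text.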
In the early days of WordNet, therefore, we thought that we were testing whether or not a relational lexical semantics could be extended to a larger vocabulary than the toy illustrations of the day. By the time we had convinced ourselves that relational theories could scale up, we had created something that seemed to have intrinsic merit of its own. Thereafter, WordNet grew by the applications we made of it, each application showing the need for a new and better system. It is most appropriate, therefore, that the description of WordNet (version 1.5) in the first part of this book should be followed by an account of some uses that people have found for it.
In those early days, however, we had no plan to construct a complete lexicon. The initial idea was to identify the most important lexical nodes by character strings and to explore the patterns of semantic relations among them. If you wanted the definitions (or pronunciations or etymologies or usages) of a word form that labeled one of those nodes, you should look it up in an on-line, machine-readable dictionary. The theory we were testing assumed that, if you got the pattern of semantic relations right, a definition could be inferred from that—it seemed redundant to include definitions along with the network of semantic relations.

Part I The Lexical Database
Nouns in WordNet
Modifiers in WordNet
A Semantic Network of English Verbs
Design and Implementation of the WordNet Lexical Database and Searching Software
Part II Extensions, Enhancements, and New Perspectives on WordNet
Automated Discovery of WordNet Relations
Representing Verb Alternations in WordNet
The Formalization of WordNet by Methods of Relational Concept Analysis
Part III Applications of WordNet
Building Semantic Concordances
Performance and Confidence in a Semantic Annotation Task
WordNet and Class-Based Probabilities
Combining Local Context and WordNet Similarity for Word Sense Identification
Using WordNet for Text Retrieval
Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms
Temporal Indexing through Lexical Chaining
COLOR-X: Using Knowledge from WordNet for Conceptual Modeling
Knowledge Processing on an Extended WordNet
Obtaining and Using WordNet