This thesis concerns the COMMIX system, which automatically extracts
information on what a text is about, and generates that information in the highly
compacted form of compound nominal expressions. The generated expressions
are complex and may include novel terms that do not themselves appear in
the input text.
From a practical point of view, the work is driven by the need for better
representations of content: representations which are shorter and more
concise than an abstract, yet more informative and more representative of
the actual aboutness than indexing expressions and key terms commonly are.
This additional layer of representation is referred to in this work as
pertaining to the essence of a particular text.
From a theoretical standpoint, the thesis shows how the compound
nominal as a construct can be successfully employed in these highly informative
representations. It involves an exploration of the claim that there is sufficient
semantic information contained within the standard dictionary glosses for
individual words to enable the construction of useful and highly representative
novel compound nominal expressions, without recourse to standard syntactic
and statistical methods. It shows how a shallow semantic approach to content
identification which is based on lexical overlap can produce some very
encouraging results.
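As a rough illustration of what content identification by lexical overlap between dictionary glosses can look like, the following Python sketch counts the content words shared by the glosses of two words and ranks word pairs by that count. It is not the COMMIX algorithm, which is not specified in this abstract. The glosses, the stopword list, and the function names `content_words`, `gloss_overlap`, and `rank_pairs` are all assumptions made up for the example.

```python
import re
from itertools import combinations

# Hypothetical toy glosses, invented for illustration only;
# they are not taken from any real dictionary or from COMMIX.
GLOSSES = {
    "bank": "an institution that keeps and lends money",
    "loan": "money that is lent and must be paid back",
    "river": "a large natural stream of water",
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "the", "that", "is", "of", "be", "must", "to"}


def content_words(gloss):
    """Lowercase the gloss, split it into alphabetic tokens, drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", gloss.lower()) if w not in STOPWORDS}


def gloss_overlap(word_a, word_b, glosses=GLOSSES):
    """Return the set of content words that the two glosses share."""
    return content_words(glosses[word_a]) & content_words(glosses[word_b])


def rank_pairs(words, glosses=GLOSSES):
    """Rank all word pairs by gloss overlap size, largest first."""
    scored = [
        (len(gloss_overlap(a, b, glosses)), a, b)
        for a, b in combinations(words, 2)
    ]
    # Sort by descending overlap, then alphabetically for a stable order.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


if __name__ == "__main__":
    for score, a, b in rank_pairs(["bank", "loan", "river"]):
        print(score, a, b)
```

In this toy setting, a pair whose glosses share content words would be a stronger candidate for combination into a compound nominal than a pair with no shared words. The sketch does no stemming, so "lends" and "lent" are not matched.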
The methodology employed, and described herein, is domain-independent,
and does not require the specification of templates with which the
input text must comply. In these two respects, the methodology developed in this
work avoids two of the most common problems associated with information
extraction.
As regards the evaluation of this type of work, the thesis introduces and
utilises the notion of percentage attainment value, which is used in conjunction
with subjects' opinions about the degree to which the aboutness terms succeed in
indicating the subject matter of the texts for which they were generated.
| Field | Value |
|---|---|
| Date of Award | 1998 |
| Original language | English |
| Awarding Institution | |
The Generation of Compound Nominals to Represent the Essence of Text: The COMMIX System
Norris, J. V. (Author). 1998
Student thesis: PhD