Abstract
Retrieval of document fragments has great potential for application in engineering information management. Engineers frequently have neither the time nor the inclination to sift through long documents for small pieces of useful information, yet the information they seek is often presented in the form of one or more long documents. Supporting the delivery of the right information, in the right format and in the right quantity, motivates the search for better ways of handling document sub-components, or fragments. Modern computational technologies can facilitate document fragment retrieval. This paper proposes a novel framework for information access that uses state-of-the-art computational technologies and introduces multiple views of document structure through decomposition schemes. The framework integrates the study of document structure, mark-up technologies, automated fragment extraction, faceted classification and a document navigation mechanism to retrieve specific document fragments using precise, complex queries. These disparate elements have been brought together in an exploratory Engineering Document Content Management System (EDCMS). Investigations with this system on representative engineering documents have shown that information users can access and retrieve document content at fragment level rather than at document level: through both the data in a document and document metadata, from different perspectives and at different granularities, and simultaneously across multiple documents as well as within a single document.
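As a rough illustration of the kind of fragment-level query the abstract describes, the sketch below filters marked-up fragments by facet values, document metadata and content across more than one document. It is not the EDCMS implementation: the element and attribute names (`document`, `fragment`, `topic`, `kind`, `discipline`), the sample documents and the `retrieve_fragments` helper are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical marked-up engineering documents. The element and
# attribute names are invented for this sketch, not taken from the paper.
DOCS = [
    """<document id="pump-manual" discipline="mechanical">
         <fragment topic="maintenance" kind="procedure">Replace the shaft seal every 2000 hours.</fragment>
         <fragment topic="safety" kind="warning">Isolate power before opening the casing.</fragment>
       </document>""",
    """<document id="valve-spec" discipline="mechanical">
         <fragment topic="maintenance" kind="table">Seal kit part numbers by valve size.</fragment>
         <fragment topic="design" kind="text">Body material is ductile iron.</fragment>
       </document>""",
]


def retrieve_fragments(docs, facets=None, doc_facets=None, keyword=None):
    """Return (document id, fragment text) pairs matching every criterion given."""
    facets = facets or {}
    doc_facets = doc_facets or {}
    hits = []
    for source in docs:
        root = ET.fromstring(source)
        # Document-level metadata filter.
        if any(root.get(k) != v for k, v in doc_facets.items()):
            continue
        for frag in root.iter("fragment"):
            # Fragment-level facet filter.
            if any(frag.get(k) != v for k, v in facets.items()):
                continue
            text = (frag.text or "").strip()
            # Content filter on the data inside the fragment.
            if keyword and keyword.lower() not in text.lower():
                continue
            hits.append((root.get("id"), text))
    return hits


# Query across both documents: maintenance fragments that mention "seal".
results = retrieve_fragments(DOCS, facets={"topic": "maintenance"}, keyword="seal")
```

Here `results` holds one matching fragment from each document, so the query returns content at fragment level and spans multiple documents. A fuller system would presumably use a richer query language over the mark-up; the flat attribute matching above is only the simplest stand-in.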
Original language | English |
---|---|
Pages (from-to) | 401-413 |
Number of pages | 13 |
Journal | Advanced Engineering Informatics |
Volume | 20 |
Issue number | 4 |
Publication status | Published - Oct 2006 |
ASJC Scopus subject areas
- Information Systems
- Artificial Intelligence
Keywords
- Computational framework
- Decomposition schemes
- Document mark-up
- Faceted classification
- Fragment retrieval
- XML