A Framework to Support Developers in the Integration and Application of Linked and Open Data

  • Timm Heuss

Student thesis: PhD

Abstract

In the last years, the number of freely available Linked and Open Data datasets has multiplied into tens of thousands. The numbers of applications taking advantage of it, however, have not. Thus, large portions of potentially valuable data remain unexploited and are inaccessible for lay users. Therefore the upfront investment in releasing data in the first place is hard to justify. The lack of applications needs to be addressed in order not to undermine efforts put into Linked and Open Data. In existing research, strong indicators can be found that the dearth of applications is due to a lack of pragmatic, working architectures supporting these applications and guiding developers. In this thesis, a new architecture for the integration and application of Linked and Open Data is presented. Fundamental design decisions are backed up by two studies: firstly, based on real-world Linked and Open Data samples, characteristic properties are identified. A key finding is the fact that large amounts of structured data display tabular structures, do not use clear licensing and involve multiple different file formats. Secondly, following on from that study, a comparison of storage choices in relevant query scenarios is made. It includes the de-facto standard storage choice in this domain, Triples Stores, as well as relational and NoSQL approaches. Results show significant performance deficiencies of some technologies in certain scenarios. Consequently, when integrating Linked and Open Data in scenarios with application-specific entities, the first choice of storage is relational databases. Combining these findings and related best practices of existing research, a prototype framework is implemented using Java 8 and Hibernate. As a proof-of-concept it is employed in an existing Linked and Open Data integration project. Thereby, it is shown that a best practice architectural component is introduced successfully, while development effort to implement specific program code can be simplified. Thus, the present work provides an important foundation for the development of semantic applications based on Linked and Open Data and potentially leads to a broader adoption of such applications.
Date of Award2016
Original languageEnglish
Awarding Institution
  • University of Plymouth
SupervisorBettina Harriehausen-Mühlbauer (Other Supervisor)

Keywords

  • Linked Data
  • Open Data
  • Integration Framework
  • Technical Architecture
  • Data Warehouse
  • Storage evaluation
  • CKAN field study

Cite this

'