NEXTWRAP - Next generation web wrapper technologies

  • Slany, Wolfgang (Co-Investigator (CoI))
  • Schindler, Christian (Co-Investigator (CoI))
  • Neuhold, Karl (Co-Investigator (CoI))
  • Wotawa, Franz (Principal Investigator (PI))

Project: Research project

Project Details

Description

Ontology Engineering in the Context of Data Extraction
This part investigates approaches to ontology engineering, with the aim of building a basic framework designed for further reuse.

Ontology-based Intelligent Extraction
New methods are studied for extracting data from non-HTML documents, in particular from unstructured formats. The research focuses mainly on two formats, PDF and plain text, the latter mainly in the context of 3270 terminal applications.

Novel Semantic Technologies in Wrapping
The main goal of this part is to map data instances extracted from sources such as HTML documents to ontologies expressed in RDF Schema or OWL. Elog, the declarative logic-based language of the Lixto Visual Wrapper, is ideally suited for tight integration with ontology repositories. Existing RDF repositories such as Jena, Sesame and KAON, together with various RDF query languages, are analyzed, and the APIs of these libraries are studied to find ways to connect the Lixto Visual Wrapper to them.

Wrapper Adaptation
The goal of this part is to study automatic and semi-automatic repair techniques that adapt a wrapper when the underlying Web site undergoes major structural changes.

Human-Machine Communication: htmlButler
htmlButler is intended to be a commodity client-server tool with which ordinary Web users can visually select an area of interest on a Web page and be notified by email when that area changes.
Status: Finished
Effective start/end date: 1/01/05 to 31/03/07
