Demonstration: Accessing Multiple Databases using Semantic Web technology

Project acronym: QUESTION-HOW
Project Full Title:Quality Engineering Solutions via Tools, Information and Outreach for the New Highly-enriched Offerings from W3C: Evolving the Web in Europe
Project/Contract No. IST-2000-28767
Workpackage 2, Deliverable D2.3

Project Manager: Daniel Dardailler <danield@w3.org>
Author of this document: Ian Johnson and Michael Wilson<m.d.wilson@rl.ac.uk>

Created: 26 May 2003. Last updated: 26 May 2003.


Table of Content:


Introduction

This report describes the production of the demonstrator of semantic web and web services technology to address the classic problem of accessing multiple heterogeneous databases. A public page presenting the demonstrator for interactive use is available on-line.

This deliverable follows from the original plan for this part of the project, and the deliverable on the initial theoretical background to the problem, and is complementary to the public presentation of the demonstration. The issues addressed in those three documents will not be repeated here unless required.

Problem definition

We tackle the problem of submitting a database query to a number of databases that have different schema. We apply semantic web techniques to the problem of converting the user query into a form suitable for submitting to multiple heterogeneous databases. In our simple case, the database tables are of the same ‘shape’ (number of columns), but the table and column names are different in each database.

We represent equivalent column names used in the different database schema by a thesaurus containing a Concept and its Preferred Terms and Used For Terms – the Used For Terms relate the different column names used in each of the databases. We use Thesaurus classes developed as part of the SWAD-E project to assist in the task of translating the query supplied by the user into a set of queries, one for each database.

Description of system

The system comprises three components:

Query Application, which takes a user query and applies it to multiple databases;

Thesaurus Service, a Web Service that provides the query translation interface required by the query application;

Translator – this uses the Thesaurus classes to access the thesaurus describing the equivalent terms in the databases.

Query Application

Design:

The outline operation of the Query Application is to translate the user search term into a set of SQL queries suitable for the schema used in each of the databases. The nature of this translation is kept at a fairly high level, so that an alternative mechanism other than a Thesaurus could be substituted.

Impl:

The Query Application makes a simple call on a Thesaurus Service interface object to determine whether a candidate database contains a column that is equivalent to the search term supplied in the user query.

Thesaurus Service

Design:

This is a Web Service interface to classes providing the Translator functionality. Thesaurus Service provides a SOAP endpoint, called from QueryInterface.

Impl:

A SOAP endpoint created with the Apache Axis SOAP toolkit. From Java interface Translator defining the containsTerm() method, use the Axis java2wsdl tool to create SOAP artefacts, namely the client-side stubs and server-side skeleton, along with the Axis WSDD deployment descriptor. Class WebSvcTranslator implements Translator; constructor creates client-side bindings to ThesaurusService SOAP class. containsTerm() method merely forwards to SOAP class containsTerm() method.

Translator

Design:

FileTranslator implements Translator – reads Thesaurus Schema and Thesaurus definition. Gathers all terms (PT and UFT) relating to a concept and allows match of input with these terms.

Impl:

Simply defines a method ‘boolean containsTerm(String term)’. Two classes implement this interface:

FileTranslator is parameterised by the location of the RDF Schema file and the Thesaurus file, and WebSvcTranslator, which interacts with the client-side of the Apache SOAP toolkit. The FileTranslator constructor creates a static instance of the Thesaurus object - all calls on method containsTerm() use this static variable.

Use of Apache Axis

Creating of Web Service endpoint was by using the Axis java2wsdl tool, which creates the client-side stub and server-side skeleton. An alternative approach would have been to write the WSDL relating to the containsTerm() method, and run wsdl2java to create the Java interface. This approach allows the programmer finer control over the types and encodings used in the generated classes. For a simple interface such as this, with only types boolean and string involved, creating WSDL by hand was not justified.

Other design possibilities

The design described here utilises a familiar object-oriented approach to the task of querying a database. It would be possible to solve this programming problem by viewing the challenge as controlling a series of XML document transformations. Frameworks such as Apache Cocoon (http://cocoon.apache.org/2.0/index.html) or Xbeans (http://www.xbeans.org) allow a higher-level, modular approach to be employed.

Cocoon is designed to allow ‘the separation of content, style, logic and management functions in an XML content based web site’. Cocoon provides the following basic components for processing XML documents:

Generation of XML documents (from content, logic, relational database, objects or any combination) through Generators;

Dispatching based on Matchers;

Transformation (to another XML document, objects or any combination) of XML documents through Transformers;

Aggregation of XML documents through Aggregators;

Rendering XML through Serializers.

A pipeline of such components may be assembled according to a central ‘sitemap’. The operation of each component may be controlled through XSL files.

To implement the multiple database query demonstrator with Cocoon, one could utilise the XmlHttp Transformer component described in ‘Creating SOAP Services with Cocoon’ (http://www.xml.com/pub/a/ws/2003/03/18/cocoon.html) to access the Thesaurus Service via SOAP. Querying the databases would be accomplished via the Cocoon SqlTransformer component. The attraction of implementing the multiple database query demonstrator in Apache Cocoon would be to clearly express the ‘business logic’ of the application in XSL files controlling ready-made components, as opposed to the traditional object-oriented approach, where the business logic is expressed in several places in the program code.

(NB it is not clear whether the SqlTransformer allows more than one JDBC connection to be used per pipeline, i.e. whether more than one database can be accessed.)

Xbeans are Java Beans that manipulate XML data and aim to allow the construction of distributed applications with little or no programming. Xbeans can be composed with any Java Bean design tool to form an XML processing pipeline. Standard Xbeans exists to performing the following tasks:

                Acting as a source of data;

                Translating and XML document into another via XSLT;

                Accessing an SQL database;

                Sending and receiving XML data over a network using a variety of protocols including SOAP.

The operation of Xbeans in an application is controlled by setting their Java Bean properties in whichever Bean design tool is being used. The attraction of implementing the multiple database query demonstrator in Xbeans would be to allow the problem to be solved via the composition of re-usable components in a Java Bean design tool. This allows the well-established procedures for describing and creating Java Beans applications to be applied this problem area.

(NB The Accessor Xbean for querying SQL databases has not been implemented in the current release of Xbeans.)


Deviations from plan

None.

The QH project has provided the necessary resources to bring together previous experience with the classic problem of heterogeneous database access, with the thesaurus development underway in the SWAD-E project, in order to develop a web service and an application that constitute this small demonstrator.