ActiveXML documentation [under development]

Authors: Evaldas Taroza, Anca Ghitescu
More about the technology can be found on the website.

Introduction

This document reports on the development of the ActiveXML software. It is intended to be useful for both end users and developers.

ActiveXML is based on XML and Web services. An active document is an XML document with embedded calls to Web services, which enrich the document upon activation. In a peer-to-peer setting, this technology relies on the fact that peers communicate solely by means of Web services. ActiveXML provides every peer with a predefined set of Web services, which enables distributed data management. Any peer can also provide other services, which effectively turns the network into a distributed database.

We can distinguish among several kinds of service calls:

Calling a Web service is not a problem these days. There are libraries that allow one to call a Web service synchronously, asynchronously, one-way or two-way. However, the promise of ActiveXML is to call Web services declaratively. For instance, instead of writing software that calls some SOAP Web service, one could simply create an active document with a service call to that Web service and pass it to the engine for evaluation. Moreover, instead of writing software that pipes Web services together, one could create an active document where one service call has another service call as a parameter. Hence the intended usage of ActiveXML lies in creating, transforming, sending and receiving active documents, and in materializing the service calls inside them.

The following sections present the current architecture of ActiveXML and give details about installation and usage. Developers will also find more in-depth information about specific implementation points and the organization of the code in the repository.

Architecture

In this section a high level architecture of ActiveXML is presented.

Organization of Components

There are 4 software components that make up an ActiveXML peer:

  • Web server. We use Tomcat 5.5
  • XML database. Currently we use eXist
  • Axis2 Web service engine
  • ActiveXML service calls execution engine

Axis2 and ActiveXML make up a Web application that is deployed in the Tomcat 5.5 Web server. The Web application is then configured to connect to an eXist database, where each peer has its own repository of documents (a separate collection).

The components can actually be organized in several ways. In the simplest (and most expensive) case, everything runs inside the Web container (Tomcat) as Web applications; hence the peer has its own Web server and its own database instance. It is, nevertheless, possible for several peers to share one Web server and the same database instance. Possible configurations are depicted below (see how to create new peers). As the picture shows, several peers can reside in one Web server, while the database can be inside or outside the Web server. The peer-to-peer network is a network of such configurations where, as already mentioned, the peers communicate only by means of Web services.

Deeper Look into a Peer

First of all, any XML database can be used for a peer's repository. However, there is still no standard way of accessing, querying or updating XML databases. Relational databases have JDBC, which is a standard way to connect to SQL data sources and submit queries. XQJ, the analog of JDBC for XML data sources, is on its way, as is a standard for XML updates.

Given this situation, we decided to create our own API. Therefore, above the XML database there is a DB access API layer (see the picture). This API can be seen as a very modest XQJ with updates expressed in XUpdate. We currently have an implementation of this API for eXist; other databases will need their own implementations (drivers). It is important that other databases support XUpdate, since the core ActiveXML services use this language for updating active documents. It is also worth mentioning that it is relatively easy to write a driver for XML databases that support the XML:DB interface, as eXist does.
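The layer can be pictured with a minimal sketch, assuming a hypothetical `XmlRepository` interface; the real interface in fr.inria.gemo.axml.db (such as IDataSource, referenced in context.xml) may look different. The helpers build an XUpdate `xupdate:modifications` document that sets the activation status of a service call, the kind of update the core services send through this layer. With an XML:DB driver, a string like this could be handed to a collection's XUpdateQueryService.

```java
import java.util.List;

/** Sketch only: XmlRepository and the helper names are hypothetical. */
public final class XUpdateSketch {

    static final String XUPDATE_NS = "http://www.xmldb.org/xupdate";
    static final String AXML_NS = "http://futurs.inria.fr/gemo/axml/";

    /** Hypothetical driver-neutral API; each XML database needs its own implementation. */
    public interface XmlRepository {
        /** Evaluates an XQuery/XPath expression in a collection and returns serialized results. */
        List<String> query(String collection, String xquery);

        /** Applies an xupdate:modifications document and returns the number of modified nodes. */
        long update(String collection, String xupdateModifications);
    }

    /** Wraps XUpdate commands into one xupdate:modifications document. */
    static String modifications(String... commands) {
        StringBuilder sb = new StringBuilder()
                .append("<xupdate:modifications version=\"1.0\"")
                .append(" xmlns:xupdate=\"").append(XUPDATE_NS).append('"')
                .append(" xmlns:axml=\"").append(AXML_NS).append("\">");
        for (String command : commands) {
            sb.append(command);
        }
        return sb.append("</xupdate:modifications>").toString();
    }

    /** XUpdate command setting the activation status of the service call with the given axml:id. */
    static String setActivationStatus(String scId, String status) {
        // Assumes scId contains no apostrophe; a real driver would escape it.
        return "<xupdate:update select=\"//axml:sc[@axml:id='" + scId
                + "']/axml:activation/@status\">" + status + "</xupdate:update>";
    }

    public static void main(String[] args) {
        String xupdate = modifications(setActivationStatus("version", "TERMINATED"));
        System.out.println(xupdate);
        // A driver would then apply it, e.g. repository.update("/db/MyPeer", xupdate);
    }
}
```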

Above the DB access API is the document management layer. A peer has a set of documents in its repository, and those documents may contain service calls. This roughly describes the model part of the software (in MVC terms). The controller, on the other hand, is the piece of software that manages the model. It sits between the repository and the ActiveXML clients (see the picture) and updates the state of the model. For instance, the document management layer is responsible for marking service calls as active or terminated, for executing a scheduled service call, etc. In the previous version of ActiveXML this layer also kept the whole repository in memory as DOM objects. Now it only keeps references to the interesting parts of active documents, using XPath and XQuery.
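Keeping references instead of a DOM copy of the repository can be illustrated with the standard javax.xml.xpath API. In this sketch the class name, the method name and the "NONE" label for never-activated calls are hypothetical: it finds every axml:sc in a document and reads the status attribute of its axml:activation child, as that element appears in the AFTER example of the Service Calls and Active Documents section.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: locate service calls with XPath instead of keeping the repository as DOM. */
public final class ServiceCallIndex {

    static final String AXML_NS = "http://futurs.inria.fr/gemo/axml/";

    /** Binds the prefix "axml" for XPath evaluation. */
    private static final NamespaceContext AXML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "axml".equals(prefix) ? AXML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return AXML_NS.equals(uri) ? "axml" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return AXML_NS.equals(uri)
                    ? Collections.singletonList("axml").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    /** Maps each axml:id to its activation status, or "NONE" if the call was never activated. */
    public static Map<String, String> statuses(String activeDocument) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // prefixed XPath steps only match a namespace-aware DOM
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(activeDocument)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(AXML_CONTEXT);

        Map<String, String> result = new LinkedHashMap<>();
        NodeList calls = (NodeList) xpath.evaluate("//axml:sc", doc, XPathConstants.NODESET);
        for (int i = 0; i < calls.getLength(); i++) {
            Element sc = (Element) calls.item(i);
            String status = xpath.evaluate("axml:activation/@status", sc); // "" when absent
            result.put(sc.getAttributeNS(AXML_NS, "id"), status.isEmpty() ? "NONE" : status);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<example xmlns:axml=\"" + AXML_NS + "\">"
                + "<axml:sc axml:id=\"version\"><axml:activation status=\"TERMINATED\"/></axml:sc>"
                + "<axml:sc axml:id=\"other\"/></example>";
        System.out.println(statuses(doc)); // {version=TERMINATED, other=NONE}
    }
}
```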

A peer is visible to the outside world only through its exposed interface, i.e. its Web services. This can be treated as the view part of the MVC architecture. So when someone pushes a button (invokes one of the peer's Web services), the state of the peer changes, and the model (some document) is updated by the controller.

Web services in ActiveXML are handled by Axis2. Its modular design makes it quite easy to plug in new Web services. This is essential, as the functionality of a peer is exposed only through Web services. For instance, if a peer is able to answer a query over streams, it exposes this functionality as a Web service (let's call it GenericQueryService), so that other peers can call this Web service from within an active document. Several Web services are inherent to ActiveXML (shown as buttons in the picture):

The basic interface listed above can be extended and is still under discussion. For example, do we need a Web service like InstallDocument, or should we add an input parameter to NewNodeOperator that would install a document without evaluating it? Do we need an evaluateData operation for MaterializationService that would accept an active document as input and evaluate it? All the other Web services are extensions of the peer's functionality and can simply be considered plug-ins. Besides the core services, the following services are currently available:

  1. GenericQueryService, a relatively simple implementation of a stream processing engine. As input it takes a query declaration and parameter streams. It can also be used to answer XQuery queries over the database
  2. DummyStreamService, useful for testing. When called, it streams back the result of a specified query
  3. OptimaxService, a distributed query optimizer. Given an active document, the Web service transforms it into a distributed plan. Documents that contain service calls to GenericQueryService are candidates for optimization with this Web service

For more details on a peer's Web services, see the examples and schemas presented later.

What is a Stream?

Now some clarification about streams. A stream is a channel of data between a service call and the Web service it is calling. The important point is that data keeps arriving at the service call continuously (and asynchronously) until the end of the stream. The procedure is as follows (see the picture):

  1. A request comes to materialize some service call
  2. The service call invokes the Web service and together with the message attaches a header with information about itself
  3. The Web service responds with a message marked as STREAM (if it has data to stream and recognizes the header)
  4. The service call stays active if the Web service response was marked as STREAM
  5. From then on, the Web service decides when to send data to the requesting service call.

In the picture above, SEND is shown as a separate service; still, the streaming service can call RECEIVE directly, in which case it is said to behave like SEND. Put simply, a stream is a channel between SEND and RECEIVE: a streaming service is responsible for SEND, whereas a service call is the receiver (RECEIVE).
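The SEND/RECEIVE channel can be sketched in-process with a blocking queue, assuming a single producer (the streaming service) and a single consumer (the service call). The class and method names are hypothetical, and in ActiveXML the two sides talk through Web service calls rather than a shared queue; the sketch only shows the protocol: data arrives whenever the sender decides, and RECEIVE stops at END_OF_STREAM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** In-process sketch of a stream as a channel between SEND and RECEIVE (names hypothetical). */
public final class StreamChannel {

    /** Unique sentinel object playing the role of the END_OF_STREAM marker. */
    private static final Object END_OF_STREAM = new Object();

    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    /** SEND side: the streaming service pushes data whenever it decides to. */
    public void send(String item) {
        queue.add(item);
    }

    /** SEND side: signals that no more data will follow. */
    public void end() {
        queue.add(END_OF_STREAM);
    }

    /** RECEIVE side: hands each item to the service call until END_OF_STREAM; returns the count. */
    public int receive(Consumer<String> serviceCall) throws InterruptedException {
        int count = 0;
        while (true) {
            Object next = queue.take(); // blocks until the sender produces something
            if (next == END_OF_STREAM) {
                return count;
            }
            serviceCall.accept((String) next);
            count++;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StreamChannel channel = new StreamChannel();
        Thread streamingService = new Thread(() -> {
            for (int i = 1; i <= 3; i++) {
                channel.send("<item>" + i + "</item>");
            }
            channel.end();
        });
        streamingService.start();

        List<String> appended = new ArrayList<>();
        int n = channel.receive(appended::add); // e.g. append each item to the active document
        streamingService.join();
        System.out.println(n + " items: " + appended);
    }
}
```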

It is worth noting that a stream can be seen as a communication protocol, nothing else. The current implementation works as described (using markers); however, other implementations or extensions are possible. For example, it looks natural to use WS-Addressing for addressing the service call. A session-handling mechanism could also be put to good use, because a service call is simply a client of a specific Web service.

The concept of streams also covers non-streaming communication between service calls and Web services. In this case the stream consists of one item that is sent directly as the response to the service call and is implicitly marked with END_OF_STREAM. Therefore, materializing a service call results either in 1) termination of the service call, if the Web service responds with END_OF_STREAM (implicitly assumed for a plain response), or 2) the service call staying active, if the Web service responds with the STREAM marker; in that case it terminates only when its RECEIVE detects END_OF_STREAM.
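The two outcomes above amount to a small state machine. The following sketch is hypothetical: the INACTIVE state, the class name, and the idea that every data message delivered through RECEIVE carries a marker are assumptions. A null marker stands for a plain response, for which END_OF_STREAM is implied.

```java
/** Sketch of the marker-driven life cycle of a service call (names are illustrative only). */
public final class ServiceCallState {

    enum Marker { STREAM, END_OF_STREAM }

    /** INACTIVE is a hypothetical initial state; ACTIVE and TERMINATED follow the text. */
    enum Status { INACTIVE, ACTIVE, TERMINATED }

    private Status status = Status.INACTIVE;
    private int itemsReceived;

    /** Materialization request: the call invokes its Web service and becomes active. */
    void activate() {
        if (status != Status.INACTIVE) {
            throw new IllegalStateException("already " + status);
        }
        status = Status.ACTIVE;
    }

    /**
     * Handles the direct response or any later message delivered through RECEIVE.
     * A null marker models a plain (non-streaming) response: END_OF_STREAM is implied.
     */
    void onMessage(String payload, Marker marker) {
        if (status != Status.ACTIVE) {
            throw new IllegalStateException("not active: " + status);
        }
        if (payload != null) {
            itemsReceived++; // the engine would append the payload to the document here
        }
        if (marker != Marker.STREAM) {
            status = Status.TERMINATED; // explicit or implicit END_OF_STREAM
        }
    }

    Status status() {
        return status;
    }

    int itemsReceived() {
        return itemsReceived;
    }

    public static void main(String[] args) {
        ServiceCallState plain = new ServiceCallState();
        plain.activate();
        plain.onMessage("<version>1.3</version>", null);    // ordinary request/response

        ServiceCallState stream = new ServiceCallState();
        stream.activate();
        stream.onMessage(null, Marker.STREAM);               // the service agrees to stream
        stream.onMessage("<item>1</item>", Marker.STREAM);   // data arriving through RECEIVE
        stream.onMessage("<item>2</item>", Marker.END_OF_STREAM);

        System.out.println(plain.status() + " " + stream.status() + " " + stream.itemsReceived());
    }
}
```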

A General Picture

Putting things together, we obtain the architecture shown in the picture. First, there is a peer-to-peer network of ActiveXML peers. The peers are extensible through plugged-in Web services. Note that Web services are only the interface to the underlying software, which can range from a very complex system to a simple function. Although the plug-in architecture of a peer closely resembles an ESB (Enterprise Service Bus), its purpose is quite different from merely orchestrating Web services. In particular, the plugged-in systems are supposed to make use of ActiveXML. This means that the exposed Web services:

In essence, using ActiveXML means 1) using a special syntax to declaratively specify calls to Web services and 2) using the engine to materialize those service calls. Hence, ideally, service consumers would write active documents and materialize the service calls inside them, while service providers would develop systems and expose them as Web services, potentially also using ActiveXML internally.

Beyond the peer-to-peer network, XML and Web services allow ActiveXML to reach the Web in general. As shown in the picture, RECEIVE can get data from anywhere; that is, an active document can contain service calls to any Web service, be it on another peer or somewhere on the Web. Similarly, external systems are free to call Web services inside the peer-to-peer network if it makes sense for them (because the returned data may be active). Moreover, since streaming is just a communication protocol, nothing stops Web services outside the peer-to-peer network from streaming data back to a requesting service call through its RECEIVE.

User Guide

In this section you will find information about configuration, creation of active documents and materialization of service calls. It will be shown what is visible to the user and what is happening behind the scenes.

Installation and Configuration

It is quite easy to start using ActiveXML once you know what it is made of. The distribution is created in such a way that it includes everything needed to get started (except the Java Runtime Environment), namely the 4 components: Tomcat, eXist, Axis2 and ActiveXML (see the components representation).

The first thing to know is that the distribution is simply the Tomcat Web server with several Web applications included, specifically eXist and MyPeer. The latter is actually integrated into the Axis2 Web application in order to make use of its Web service repository management.

From the file structure you can easily see what was changed in Tomcat:

  • the root folder was renamed to 'ActiveXML'
  • unneeded Web applications under webapps/ were removed, and 'exist' and 'MyPeer' were added
  • some libraries were put on the shared classpath under shared/. The most important ones are the Axis2 libraries and the database libraries

The following configuration changes were performed for Tomcat:

  • in conf/catalina.properties, paths to the new shared folders were added (shared/db/*.jar, shared/axis2/*.jar)
  • in conf/server.xml, the ports were changed to 6969 (for HTTP requests) and 6960 (for the Web server shutdown request)

So, generally, not much was changed in Tomcat; for tasks like changing a port, starting, stopping or otherwise configuring the Web server, please follow Tomcat's documentation.

It may also be worth considering another Web container instead of Tomcat, for instance Jetty, to make the distribution smaller.

Running

You only need to know 3 things:

  1. The JAVA_HOME environment variable must be set and must point to the JDK or JRE root directory. Setting JRE_HOME instead is also fine (e.g. JRE_HOME=C:\Program Files\Java\jre1.5.0_09). Usually the set command can be used for this purpose. This, of course, requires a JVM to be installed on your computer.
  2. use bin/startup.bat or bin/startup.sh to start the Web server
  3. use bin/shutdown.bat or bin/shutdown.sh to stop the Web server

That's it. The 'ActiveXML' folder can be copied to any writable medium, such as a hard disk or a USB key, and ActiveXML can then be considered installed.

NOTE: on Linux be sure to make bin/*.sh files executable
NOTE: if you have another instance of Tomcat running on your machine, you will need additional steps, for example, setting the CATALINA_BASE variable.

Web Application: eXist [ ActiveXML/webapps/exist ]

In the full distribution, eXist is included as downloaded (*.war ~23 MB, ~50 MB unzipped). Hence, using an external database instance could considerably reduce the weight of the distribution, and not only in size: think, for instance, of carrying the ActiveXML peer on a PDA or a USB key. Therefore, one should consider the (3) type of component organization.

You can browse the eXist database at http://localhost:6969/exist/admin/admin.xql (login: admin/exist). Alternatively, you can try the Java interface at http://demo.exist-db.org/webstart/exist.jnlp, which, however, does not always work.

Web Application: MyPeer [ ActiveXML/webapps/MyPeer ]

As mentioned, ActiveXML is merged with the Axis2 Web application. This choice was made in order to use the internals of Axis2 when dealing with Web services, which is crucial for ActiveXML. For more in-depth information, please read the Axis2 documentation.

For now there are 4 important things to know about Axis2 in the ActiveXML distribution:

  1. MyPeer is an extension of axis2.war
  2. it has a Web service repository, configured by default under WEB-INF/
    • conf/axis2.xml, the repository configuration file. Here one can specify in/out phases in order to engage modules for a message flow, also one can configure transports, ports and everything else related to Web service mechanics
    • modules/*.mar, a module is a collection of handlers that are used to process an incoming/outgoing message. The modules are pluggable and engageable during a specified phase of a message flow. *.mar (module archive) is simply a zipped collection of resources, like classfiles, libraries and module.xml. There are 2 tiny modules that belong to ActiveXML:
      • ServiceCallWrapper.mar, responsible for creating a valid message for the Web service referred to in the service call. For instance, the user does not write SOAP messages; this is left to the module. It also attaches information about the service call in headers
      • ActiveXMLServiceEntry.mar, makes the ActiveXML context available to Web services. Without it, neither the engine nor the repository would be easily accessible
    • services/*.aar, *.aar (Axis archive) is a zipped collection of resources that make up a Web service. It can also be kept unzipped (the same holds for modules), as in the case of services/OptimaxService/. So a Web service is indeed an independent piece of software that has its own implementation, libraries and configuration, which enables the pluggable architecture of ActiveXML
  3. it has a user interface for managing Web services under axis2-web/; normally it will be accessible at http://localhost:6969/MyPeer/axis2-web (login: admin/axis2). Consult it for service endpoint, WSDL, XSD and other information about web services of the peer
  4. all the Axis2 related libraries (from WEB-INF/lib/) are shared
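To illustrate the kind of work ServiceCallWrapper.mar takes off the user's hands, here is a standalone sketch that uses only the JDK's DOM and transformer APIs instead of Axis2. It wraps the payload written inside axml:ws-soap into a SOAP 1.1 envelope and puts the service call's axml:id into a header. The class name and the header element axml:serviceCall are hypothetical choices, not the module's actual message format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Sketch of what a ServiceCallWrapper-like module does; not the module's real code. */
public final class SoapWrapSketch {

    static final String SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String AXML_NS = "http://futurs.inria.fr/gemo/axml/";

    /** Wraps the payload written inside axml:ws-soap into a SOAP 1.1 envelope. */
    public static String wrap(String scId, String payloadXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document msg = db.newDocument();

        Element envelope = msg.createElementNS(SOAP11_NS, "soapenv:Envelope");
        envelope.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:soapenv", SOAP11_NS);
        msg.appendChild(envelope);

        // Header: identifies the calling service call (element name is hypothetical).
        Element header = msg.createElementNS(SOAP11_NS, "soapenv:Header");
        Element info = msg.createElementNS(AXML_NS, "axml:serviceCall");
        info.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:axml", AXML_NS);
        info.setAttributeNS(AXML_NS, "axml:id", scId);
        header.appendChild(info);
        envelope.appendChild(header);

        // Body: the user's payload, copied unchanged.
        Element body = msg.createElementNS(SOAP11_NS, "soapenv:Body");
        Document payload = db.parse(new InputSource(new StringReader(payloadXml)));
        body.appendChild(msg.importNode(payload.getDocumentElement(), true));
        envelope.appendChild(body);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(msg), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(wrap("version", "<v:getVersion xmlns:v=\"urn:example:version\"/>"));
    }
}
```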

The important files related only to ActiveXML are:

As one might guess, the simple user interface of MyPeer is available at http://localhost:6969/MyPeer (after starting the Web server, of course).

context.xml: pay attention to this file when using another Web container

logs

Shared Libraries [ ActiveXML/shared ]

Besides MyPeer, more peers can be configured to live in the same Web container (possibly using the same database instance). It therefore makes sense to share the heaviest libraries among all of them.

Creating a New Peer

A peer consists of a collection of (active) documents and a set of Web services. The installation and configuration section showed how the pieces are put together to have MyPeer up and running. Adding more peers to a Web container is rather easy (the procedure could even be automated):

  1. make sure the Web server is stopped (use ActiveXML/bin/shutdown.*)
  2. copy MyPeer inside ActiveXML/webapps with a given name (e.g. AnotherPeer)
  3. change the following resources:
    • ActiveXML/webapps/AnotherPeer/META-INF/context.xml: as mentioned, this is where the connection to the database is configured. In the case of eXist it looks as follows:
      <Context>
        <Resource name="axml/repository"
                  factory="fr.inria.gemo.axml.db.DataSourceFactory"
                  type="fr.inria.gemo.axml.db.IDataSource"
                  driverClassName="fr.inria.gemo.axml.db.exist.DataSourceImpl"
                  url="xmldb:exist://localhost:6969/exist/xmlrpc/db/MyPeer"
                  userName="admin"
                  password="" />
      </Context>
      It is enough to change the URL of the peer's repository to: url="xmldb:exist://localhost:6969/exist/xmlrpc/db/AnotherPeer". As a result, MyPeer and AnotherPeer have their own document repositories (collections) on the same instance of eXist (see the picture)
    • ActiveXML/webapps/AnotherPeer/WEB-INF/classes/log4j.xml: replace all occurrences of 'MyPeer' with 'AnotherPeer' in order to get separate log files for AnotherPeer
    • ActiveXML/webapps/AnotherPeer/WEB-INF/web.xml: replace
      <display-name>MyPeer</display-name> with <display-name>AnotherPeer</display-name>
      because this is currently the only way to identify a peer. Usually, in a peer-to-peer environment, peers have unique ids. Although this is not implemented, a peer should normally obtain an id when it first joins the network. This is crucial for cataloging the available Web services in the network
  4. start the Web server (use ActiveXML/bin/startup.*). Now AnotherPeer should be accessible at http://localhost:6969/AnotherPeer
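Since the steps above are mechanical, they can be scripted. The following is a hypothetical helper, not part of the distribution: it copies webapps/MyPeer under a new name and performs the three edits with plain string replacement. It assumes the files look like the snippets above and that Tomcat is stopped.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical helper automating the "Creating a New Peer" steps (run with Tomcat stopped). */
public final class PeerCloner {

    /** Step 2: copy webapps/source to webapps/target; step 3: adjust the three files. */
    public static Path clonePeer(Path webapps, String source, String target) throws IOException {
        Path src = webapps.resolve(source);
        Path dst = webapps.resolve(target);
        if (Files.exists(dst)) {
            throw new IOException(dst + " already exists");
        }
        copyTree(src, dst);

        // Own collection on the shared eXist instance: .../db/MyPeer" -> .../db/AnotherPeer"
        replaceInFile(dst.resolve("META-INF/context.xml"), "/db/" + source + "\"", "/db/" + target + "\"");
        // Separate log files.
        replaceInFile(dst.resolve("WEB-INF/classes/log4j.xml"), source, target);
        // Peer identification (display-name is currently the only id).
        replaceInFile(dst.resolve("WEB-INF/web.xml"),
                "<display-name>" + source + "</display-name>",
                "<display-name>" + target + "</display-name>");
        return dst;
    }

    private static void copyTree(Path src, Path dst) throws IOException {
        try (Stream<Path> paths = Files.walk(src)) { // parents are visited before children
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path out = dst.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(out);
                } else {
                    Files.copy(p, out, StandardCopyOption.COPY_ATTRIBUTES);
                }
            }
        }
    }

    /** Replaces every occurrence of 'from'; fails loudly if the expected text is missing. */
    private static void replaceInFile(Path file, String from, String to) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        if (!text.contains(from)) {
            throw new IOException("'" + from + "' not found in " + file);
        }
        Files.write(file, text.replace(from, to).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // e.g. java PeerCloner ActiveXML/webapps MyPeer AnotherPeer
        System.out.println("Created " + clonePeer(Paths.get(args[0]), args[1], args[2]));
    }
}
```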

It is important to understand that, once created, AnotherPeer lives entirely in its own context: it has its own repository of documents and its own repository of Web services.

Service Calls and Active Documents

ActiveXML documents (or simply active documents) are XML documents with embedded service calls. Embedding a service call is just a matter of following the syntax developed for service calls, so that the ActiveXML engine can recognize them and apply their semantics. Without an engine, active documents are ordinary XML documents. The syntax for service calls is given in axml.xsd. For example, a simple active document looks like this:

BEFORE:
<example xmlns:axml="http://futurs.inria.fr/gemo/axml/">
  <axml:sc axml:id="version">
    <axml:return>
      <axml:append />
    </axml:return>
    <axml:ws-soap endpoint="http://localhost:6969/MyPeer/services/Version">
      <v:getVersion xmlns:v="unknown" />
    </axml:ws-soap>
  </axml:sc>
</example>

As shown in the example, a service call is an element in a special namespace, and it has the following features:

Besides the listed features, a service call can be set up to execute after other service call(s). Periodic execution can also be imposed; however, this is not advised, because the feature is very expensive, still needs considerable thought, and may later be removed.

AFTER:
<example xmlns:axml="http://futurs.inria.fr/gemo/axml/">
  <axml:sc axml:id="version" activated="2007-10-22T19:17:33.921+02:00">
    <axml:activation status="TERMINATED"/>
    <axml:return>
      <axml:append/>
    </axml:return>
    <axml:ws-soap endpoint="http://localhost:6969/MyPeer/services/Version">
      <v:getVersion xmlns:v="unknown"/>
    </axml:ws-soap>
  </axml:sc>
  <ns:return xmlns:ns="http://axisversion.sample" axml:origin="version" axml:timestamp="2007-10-22T19:17:35.484+02:00"> Hello I am Axis2 version service , My version is 1.3 </ns:return>
</example>

After the service call in the example was materialized, the document changed in the following way:

For information on where to put a document, how to activate a service call, how to evaluate a whole document, etc., please see the demo. Note that all of this is done by calling the peer's Web services.
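Comparing BEFORE and AFTER, materializing a one-shot call in append mode adds an activated attribute and an axml:activation element with status TERMINATED to the service call, and inserts the result right after it with axml:origin and axml:timestamp. The following DOM sketch reproduces just that change. It assumes the result arrives as a single XML element, and the class and method names are hypothetical; the real engine works through the repository rather than on a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.time.OffsetDateTime;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch of the BEFORE -> AFTER change for a one-shot call in append mode (not engine code). */
public final class AppendResultSketch {

    static final String AXML_NS = "http://futurs.inria.fr/gemo/axml/";

    public static String applyAppend(String activeDoc, String scId, String resultXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(activeDoc)));
        Document res = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(resultXml)));
        String now = OffsetDateTime.now().toString(); // ISO-8601, like the timestamps in the example

        NodeList calls = doc.getElementsByTagNameNS(AXML_NS, "sc");
        for (int i = 0; i < calls.getLength(); i++) {
            Element sc = (Element) calls.item(i);
            if (!scId.equals(sc.getAttributeNS(AXML_NS, "id"))) {
                continue;
            }
            // 1) record activation and termination on the service call itself
            sc.setAttribute("activated", now);
            Element activation = doc.createElementNS(AXML_NS, "axml:activation");
            activation.setAttribute("status", "TERMINATED");
            sc.insertBefore(activation, sc.getFirstChild());

            // 2) append the result right after the service call, tagged with its origin
            Element result = (Element) doc.importNode(res.getDocumentElement(), true);
            result.setAttributeNS(AXML_NS, "axml:origin", scId);
            result.setAttributeNS(AXML_NS, "axml:timestamp", now);
            sc.getParentNode().insertBefore(result, sc.getNextSibling());
            return serialize(doc);
        }
        throw new IllegalArgumentException("no service call with axml:id=" + scId);
    }

    private static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String before = "<example xmlns:axml=\"" + AXML_NS + "\"><axml:sc axml:id=\"version\">"
                + "<axml:return><axml:append/></axml:return></axml:sc></example>";
        System.out.println(applyAppend(before, "version",
                "<ns:return xmlns:ns=\"http://axisversion.sample\">Hello</ns:return>"));
    }
}
```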

Internals of Service Call Materialization [TODO]

Demo [TODO]

Interface of a Peer [TODO]

Developer Guide

Developers who want to dive into the code and better understand the architecture of ActiveXML should find useful information here. Before going into details, it is worth mentioning that converting the project into a Maven2 project would be a good idea, as it would simplify downloading dependencies and building the project.

Organization of Sources

The project is developed using Eclipse; therefore, to start contributing to the code, one needs to check out the whole project from the CVS (or SVN) repository and configure the libraries it uses. Currently (2007-11-27) the new version of ActiveXML is branched (almost forked) into the branch 'eXist' and also tagged as 'V2'. In other words, to get the latest changes use the branch 'eXist', and to get the last working version use the tag 'V2'.

ActiveXML depends on the following libraries:

Once the code is downloaded, the libraries are configured and the project compiles in Eclipse without errors, the sources can be analysed in more depth. The code (and the whole software) can be divided into the following parts:

Every listed item has an associated build.xml in its folder. There are separate output folders (where the .class files go) for every component, service, module, and the core. So development goes like this:

  1. Decide where the source should go. In most of the cases a new component or a service will need to be developed
  2. Create a folder in the respective place and mark it as a source folder (Eclipse: [right click]->Build Path...->Use as Source Folder)
  3. Configure the output folder for the new sources (Eclipse: [right click]->Build Path...->Configure Output Folder). Normally it should be a bin folder inside the new folder
  4. Update the respective build.xml to get a .jar, .aar or .mar out of the sources (see how to build)

Inside ActiveXML/resources are all the .xsd files that define the syntax for different things. The schemas are well documented, so they will not be covered here:

Since axml.xsd is the most complex schema, it is advisable to use it when creating active documents from scratch. For instance, Eclipse shows autocompletion and documentation for elements and types when the .xsd is referenced using the xsi:schemaLocation attribute. Any other XML editing tool should provide that as well.

Building and Creating a Distribution

Even when everything compiles in Eclipse, a specific build procedure must be followed in order to deploy ActiveXML properly. To understand the build process, it is enough to go through all the build.xml files in the project. Before doing any builds or bindings, ActiveXML/build.properties must be updated. Comments on the build files are given below:

Development scripts [TODO]

For development use ant file buildDev.xml and buildDev.properties.

Static Architecture [TODO]

Functionality of ActiveXML

An active XML document contains service calls. They can be calls to any Web service or to specific ActiveXML Web services (MaterializationService, OptimaxService, GenericQueryService, DummyStreamService, and the algebra services ReceiveOperator, SendOperator, NewNodeOperator). In this section we explain the functionality of each Web service, and the chain of execution from the moment a service call is activated until the results are received and appended to the active document.
We identify two parties:

AXML syntax

This section explains the syntax of the XML schema files (.xsd) described in the section Organization of Sources.

axml.xsd
The activation status of a service call has multiple possible values:

Materializers' description

A service call is materialized (the Web service is called and the result consumed) by materializers. For a service call (sc), one can set the materializer class; this class is instantiated when the service call object is created. All the materializer-related classes are in fr.inria.gemo.axml.model.sc.materialization:

MaterializerUsingConstraints and DeepTreeMaterializerUsingConstraints are helpers for MaterializationService. These materializers are not designed for individual service calls.

Q: What happens if the materializer is not specified in the activexml document?

There is an IMaterializer interface, which has one method: materialize(). One class implementing this interface deals with a batch of materializers: given a batch of IMaterializer instances, it calls their materialize() methods in parallel and returns when all the threads have joined. This way, simple and batched materializers can be combined to obtain any order of materialization.
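The batch materializer can be sketched as a composite over the one-method interface. The IMaterializer below mirrors the description; BatchMaterializer and SequenceMaterializer are hypothetical names, and in this simplified version an exception thrown by a child thread is not propagated to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Sketch of composing materializers; only IMaterializer's shape comes from the text. */
public final class MaterializerSketch {

    /** Mirrors the single-method interface described above. */
    public interface IMaterializer {
        void materialize();
    }

    /** Hypothetical batch: runs every child in its own thread and returns when all have joined. */
    public static final class BatchMaterializer implements IMaterializer {
        private final List<IMaterializer> batch;

        public BatchMaterializer(IMaterializer... batch) {
            this.batch = Arrays.asList(batch);
        }

        @Override
        public void materialize() {
            List<Thread> threads = new ArrayList<>();
            for (IMaterializer m : batch) {
                Thread t = new Thread(m::materialize);
                threads.add(t);
                t.start();
            }
            for (Thread t : threads) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                    return;
                }
            }
        }
    }

    /** Hypothetical sequence: runs children one after another, so batches can be ordered. */
    public static final class SequenceMaterializer implements IMaterializer {
        private final List<IMaterializer> steps;

        public SequenceMaterializer(IMaterializer... steps) {
            this.steps = Arrays.asList(steps);
        }

        @Override
        public void materialize() {
            for (IMaterializer m : steps) {
                m.materialize();
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new CopyOnWriteArrayList<>();
        IMaterializer a = () -> log.add("a");
        IMaterializer b = () -> log.add("b");
        IMaterializer c = () -> log.add("c");
        // a and b in parallel, then c once both have finished
        new SequenceMaterializer(new BatchMaterializer(a, b), c).materialize();
        System.out.println(log); // "c" is always last; the order of "a" and "b" may vary
    }
}
```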
There is also another flavor of batch materializer: the materializer with constraints. It keeps a buffer of materializers whose execution is constrained by afterActivated and afterTerminated, so every time a service call is activated or terminated, it tries to execute the buffered materializers, which otherwise wait until their constraints are satisfied.
The materializers with constraints are used by MaterializationService. When you want to evaluate a document such a materializer is created by collecting the afterActivated and afterTerminated constraints, then materialize() is called and all the service calls are materialized with the given constraints.
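The constraint-driven variant can be approximated with CompletableFuture, where each service call starts only after the calls listed in its afterTerminated set have finished. This is a hypothetical sketch, not the MaterializerUsingConstraints implementation: it models afterTerminated only (not afterActivated), represents service calls as plain Runnable objects, and rejects cyclic constraints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: materialize calls in parallel, honouring afterTerminated constraints only. */
public final class ConstrainedMaterialization {

    /** Runs every call and returns the service call ids in the order they terminated. */
    public static List<String> run(Map<String, Runnable> calls, Map<String, Set<String>> afterTerminated) {
        List<String> terminated = new CopyOnWriteArrayList<>();
        Map<String, CompletableFuture<Void>> scheduled = new HashMap<>();
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (String id : calls.keySet()) {
                schedule(id, calls, afterTerminated, scheduled, terminated, pool, new HashSet<>());
            }
            CompletableFuture.allOf(scheduled.values().toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown();
        }
        return terminated;
    }

    private static CompletableFuture<Void> schedule(String id, Map<String, Runnable> calls,
            Map<String, Set<String>> afterTerminated, Map<String, CompletableFuture<Void>> scheduled,
            List<String> terminated, ExecutorService pool, Set<String> visiting) {
        CompletableFuture<Void> existing = scheduled.get(id);
        if (existing != null) {
            return existing;
        }
        Runnable call = calls.get(id);
        if (call == null) {
            throw new IllegalArgumentException("unknown service call: " + id);
        }
        if (!visiting.add(id)) {
            throw new IllegalArgumentException("cyclic constraint involving: " + id);
        }
        List<CompletableFuture<Void>> prerequisites = new ArrayList<>();
        for (String before : afterTerminated.getOrDefault(id, Collections.emptySet())) {
            prerequisites.add(schedule(before, calls, afterTerminated, scheduled, terminated, pool, visiting));
        }
        // Start this call only once every prerequisite has terminated.
        CompletableFuture<Void> future = CompletableFuture
                .allOf(prerequisites.toArray(new CompletableFuture<?>[0]))
                .thenRunAsync(() -> {
                    call.run();
                    terminated.add(id);
                }, pool);
        scheduled.put(id, future);
        visiting.remove(id);
        return future;
    }

    public static void main(String[] args) {
        Map<String, Runnable> calls = new HashMap<>();
        for (String id : new String[] {"a", "b", "c"}) {
            calls.put(id, () -> { /* call the Web service and append the result */ });
        }
        Map<String, Set<String>> afterTerminated = new HashMap<>();
        afterTerminated.put("c", new HashSet<>(Arrays.asList("a", "b")));
        System.out.println(run(calls, afterTerminated)); // "c" always comes last
    }
}
```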
So to materialize service calls you simply create materializer objects for them. The materializers can extend functionality of a service call, as it is with SEND and RECEIVE (therefore you need to specify the special materializers for them), but in general there is a default materializer which looks at the service call, creates the SOAP message, calls the Web service and deals with the result.

ActiveXML Web services' implementation

GenericQueryService

Activation: Chain of execution

Creating a Plug-In Web Service [TODO]

How to Develop Without Interfering With the Existing Code

ActiveXML currently requires two kinds of development: 1) debugging or optimization and 2) plug-ins or applications. The former implies changes to the current code, so one should know exactly what one is doing. The latter is the intended way of contributing code to ActiveXML. Good systems are pluggable platforms, and ActiveXML, given its SOA nature, must become one of them.

As mentioned earlier, plug-ins in ActiveXML correspond to Web services. Every Web service can use the internals of ActiveXML (activation of service calls) and bring new functionality. For example, Optimax makes heavy use of the engine for optimizing active documents in a distributed environment (please refer to the Optimax documentation for details). Another example could be a catalog service that crawls the distributed network and catalogs all the Web services it finds.

It is worth recalling here that ActiveXML is also extensible at the service call level. Axis2 allows plugging in modules that intercept a SOAP message in transit. This is one way of decorating the behavior of a Web service (and therefore of a service call). On the other hand, Axis2 handlers are not always appropriate, because the chain-of-execution design pattern does not always fit, as is the case, for instance, with the algebra services. Therefore, it is possible to attach a specific implementation of AbstractPrimitiveMaterializer that captures the logic of materializing a service call. For instance, MaterializerForSEND, MaterializerForReceive and MaterializerForContinuousQuery use this feature.
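Attaching a custom materializer by class name can be sketched with plain reflection: the service call names a subclass of an abstract materializer, and a default one is used when nothing is specified. Everything below, including this simplified AbstractPrimitiveMaterializer, the ServiceCall holder and EchoMaterializer, is a hypothetical stand-in for the real classes in fr.inria.gemo.axml.model.sc.materialization.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins showing per-service-call materializer selection by class name. */
public final class PluggableMaterializerSketch {

    /** Minimal stand-in for a parsed axml:sc element. */
    public static final class ServiceCall {
        final String id;
        final String materializerClass; // null when the document names no materializer
        final List<String> results = new ArrayList<>();

        ServiceCall(String id, String materializerClass) {
            this.id = id;
            this.materializerClass = materializerClass;
        }
    }

    /** Template: subclasses decide how one service call is materialized. */
    public abstract static class AbstractPrimitiveMaterializer {
        protected ServiceCall sc;

        void bind(ServiceCall sc) {
            this.sc = sc;
        }

        public abstract void materialize();
    }

    /** Default behaviour: build the request, call the Web service, append the result (stubbed). */
    public static class DefaultMaterializer extends AbstractPrimitiveMaterializer {
        @Override
        public void materialize() {
            sc.results.add("default result for " + sc.id);
        }
    }

    /** A plug-in materializer with its own logic (hypothetical). */
    public static class EchoMaterializer extends AbstractPrimitiveMaterializer {
        @Override
        public void materialize() {
            sc.results.add("custom result for " + sc.id);
        }
    }

    /** Instantiates the class named by the service call, falling back to the default one. */
    static AbstractPrimitiveMaterializer forCall(ServiceCall sc) throws ReflectiveOperationException {
        AbstractPrimitiveMaterializer m = (sc.materializerClass == null)
                ? new DefaultMaterializer()
                : Class.forName(sc.materializerClass)
                        .asSubclass(AbstractPrimitiveMaterializer.class)
                        .getDeclaredConstructor()
                        .newInstance();
        m.bind(sc);
        return m;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        ServiceCall plain = new ServiceCall("version", null);
        ServiceCall custom = new ServiceCall("q1", EchoMaterializer.class.getName());
        forCall(plain).materialize();
        forCall(custom).materialize();
        System.out.println(plain.results + " " + custom.results);
    }
}
```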