Running Multiple XSLT Engines in Ant

What is Ant?

Ant is a build utility produced as part of the Apache Jakarta project. It is broadly equivalent in function to make under Linux/Unix, or nmake under Windows. These make utilities work by comparing the date of an output file to the date of the input files required to build it. If any of the input files is newer than the output file, the output file needs to be rebuilt. This is a simple rule, but one that generally produces the right results.

Unlike traditional make utilities, Ant is written in Java, so Ant is a good cross-platform solution for controlling the automatic building of files. That is good news for anyone developing cross-platform XSLT scripts, because you only need to target the one build environment. Anyone who has tried writing and maintaining equivalent Windows & Linux/Unix batch scripts knows how hard it is to get the same behaviour across different platforms.

Ant & XSLT

So why would you use Ant & XSLT together? If all you are doing is applying a single XSLT stylesheet to a single XML input file, using a single XSLT engine, then there is probably nothing to be gained. However, if your workflow involves several input files, several stylesheets applied in sequence, or more than one XSLT engine, then Ant is a good, quick way to implement the workflow you need to transform your input(s) into your output(s).

A Simple Example

Using Ant for a simple "1 input, 1 stylesheet, 1 output" transformation may be overkill, but it is a good way to learn how to use Ant. Assume that the input is input.xml, the stylesheet is transform.xsl, and the output is output.html. Then a matching Ant 1.5 project file build.xml is

<project default="do-it">
  <target name="do-it">
    <xslt
      processor="trax"
      in="input.xml"
      style="transform.xsl"
      out="output.html"/>
  </target>
</project>

The root element of an Ant build file is project. It can contain a number of target elements. Its default attribute contains the name of the target to build if no targets are given on the command line. Since the example project file defaults to building the target do-it, the output file could be built equally using any of the following command lines:

$ ant
$ ant do-it
$ ant -buildfile build.xml
$ ant -buildfile build.xml do-it

Unlike the Unix make and its clones, which can use filenames for targets, Ant only uses target names defined in the build file. So every target must have a unique name. Within a target, any number of tasks can be performed. The xslt task comes by default with Ant 1.5. With the processor attribute set to trax, the xslt task uses the default JAXP/TraX XSLT engine on your computer to perform the transformation.
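
To make the relationship between targets and tasks concrete, here is a minimal sketch that extends the simple build file with a second target. The clean target and its echo message are illustrative additions, not part of the original example; only the do-it target comes from the build file shown earlier.

```xml
<project default="do-it">
  <!-- Default target: the same transformation as in the simple example. -->
  <target name="do-it">
    <xslt
      processor="trax"
      in="input.xml"
      style="transform.xsl"
      out="output.html"/>
  </target>
  <!-- Illustrative extra target: a target may contain any number of tasks. -->
  <target name="clean">
    <echo message="Removing output.html"/>
    <delete file="output.html"/>
  </target>
</project>
```

With this file, running ant clean do-it executes the two named targets in the order given, while a bare ant still builds only do-it.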

A Complex Example

Now a more complicated XSLT workflow. There are 3 input files (in1.xml, in2.xml and in3.xml). Each of these has the same kind of information, but the formats are different. So, they are normalized to a common format by 3 separate stylesheets (norm1.xsl, norm2.xsl and norm3.xsl respectively). A standard merging stylesheet exists, merge.xsl, but it only merges two inputs (the usual input plus a filename passed as a parameter to the stylesheet). So it has to be used twice in order to merge the 3 normalized files. The merged sum of the 3 is sorted to produce the final output file, out.xml.
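
The article does not show merge.xsl itself, so the following is only a sketch of how a two-input merging stylesheet of this kind might look. The parameter name source2 is taken from the build file that follows; everything else, including the assumption that merging means appending the children of the second document's root element to those of the first, is hypothetical.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Filename of the second input, supplied by the caller as a parameter. -->
  <xsl:param name="source2"/>
  <!-- Copy the root element of the main input, then append the children
       of the second document's root element (an assumed merge rule). -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="node()"/>
      <!-- A relative filename passed as a string is resolved against the
           location of this stylesheet. -->
      <xsl:copy-of select="document($source2)/*/node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because the second file is reached through document() rather than through the task's in attribute, the build tool cannot see it as an input unless it is told about it separately.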

Visualization of the Complex Workflow Example

A matching Ant build file (explained in detail afterwards) is

<project default="sort">
  <target name="normalize">
    <xslt
      processor="trax"
      in="in1.xml"
      style="norm1.xsl"
      out="nm1.xml"/>
    <xslt
      processor="trax"
      in="in2.xml"
      style="norm2.xsl"
      out="nm2.xml"/>
    <xslt
      processor="trax"
      in="in3.xml"
      style="norm3.xsl"
      out="nm3.xml"/>
  </target>
  <target name="check12">
    <uptodate
      property="skip.merge12"
      targetfile="m12.xml">
      <srcfiles dir=".">
        <include name="nm1.xml"/>
        <include name="nm2.xml"/>
        <include name="merge.xsl"/>
      </srcfiles>
    </uptodate>
  </target>
  <target
    name="merge12"
    depends="normalize,check12"
    unless="skip.merge12">
    <xslt
      processor="trax"
      in="nm1.xml"
      style="merge.xsl"
      out="m12.xml"
      force="true">
      <param
        name="source2"
        expression="nm2.xml"/>
    </xslt>
  </target>
  <target name="check123">
    <uptodate
      property="skip.merge123"
      targetfile="123.xml">
      <srcfiles dir=".">
        <include name="m12.xml"/>
        <include name="nm3.xml"/>
        <include name="merge.xsl"/>
      </srcfiles>
    </uptodate>
  </target>
  <target
    name="merge123"
    depends="normalize,merge12,check123"
    unless="skip.merge123">
    <xslt
      processor="trax"
      in="m12.xml"
      style="merge.xsl"
      out="123.xml"
      force="true">
      <param
        name="source2"
        expression="nm3.xml"/>
    </xslt>
  </target>
  <target
    name="sort"
    depends="merge123">
    <xslt
      processor="trax"
      in="123.xml"
      style="sort.xsl"
      out="out.xml"/>
  </target>
  <target name="clean">
    <delete>
      <fileset dir=".">
        <include name="output.html"/>
        <include name="nm*.xml"/>
        <include name="m12.xml"/>
        <include name="123.xml"/>
        <include name="out.xml"/>
      </fileset>
    </delete>
  </target>
</project>

Note that Ant takes account of timestamps on files, just like the Unix make. It will not re-run a transformation unless either the input file or the stylesheet is newer than the output file (which usually means that the input file or the stylesheet has been modified since the last build). So if in1.xml is modified, nm2.xml and nm3.xml will not be rebuilt. Likewise, if in3.xml is modified, m12.xml will not be rebuilt. This can save a lot of development time in situations where one of the transformations takes much longer than the others.
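
The merge steps need extra care, because the second input is passed as a stylesheet parameter and, as far as I know, the xslt task compares only its in and style files against out. The sketch below isolates the pattern the build file uses to cope with that, with hypothetical file and property names (main.xml, extra.xml, style.xsl, result.xml, skip.build): an uptodate check sets a property when nothing has changed, and the transforming target is skipped via unless, otherwise it is forced to run.

```xml
<project default="build">
  <!-- Sets skip.build only if result.xml is newer than every listed source,
       including the file that is passed to the stylesheet as a parameter. -->
  <target name="check">
    <uptodate
      property="skip.build"
      targetfile="result.xml">
      <srcfiles dir=".">
        <include name="main.xml"/>
        <include name="extra.xml"/>
        <include name="style.xsl"/>
      </srcfiles>
    </uptodate>
  </target>
  <!-- Runs only when skip.build is unset; force="true" bypasses the task's
       own timestamp test, since the uptodate check has already decided. -->
  <target
    name="build"
    depends="check"
    unless="skip.build">
    <xslt
      processor="trax"
      in="main.xml"
      style="style.xsl"
      out="result.xml"
      force="true">
      <param
        name="source2"
        expression="extra.xml"/>
    </xslt>
  </target>
</project>
```

Ant runs the targets named in depends before it evaluates unless, which is why the check target can set the property in time.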

The workings of this Ant project file are as follows:

The files for this project are provided with the zipped examples. You now know everything you need to start using the standard Ant xslt task in your own projects. However, you should also take the time to read the full description of this task in the Ant documentation.

The Difficulty With Multiple XSLT Engines

XSLT stylesheets can provide a good cross-platform solution for manipulating XML, but your different platforms may use different XSLT engines. Sites that are using the Apache Web server often use Apache Xalan. Sites that are using PHP are likely to use Sablotron. Oracle sites often use the Oracle XDK (as this may be the only XSLT engine that the operations people will allow). A lot of XML consultants use & recommend Saxon. Microsoft sites generally use MSXML. Although these XSLT engines behave similarly, there are still some differences, so you need to plan to test with all of the XSLT engines that are likely to be used with your XSLT stylesheets. For this article, we will focus on the Java XSLT engines, since they are the ones supported natively by the Ant xslt task.

When testing with multiple engines, it is useful to be able to run the same test using each XSLT engine from within the one Ant build file. However, there is a problem. The JAXP/TraX interface uses the Java system property javax.xml.transform.TransformerFactory to define which class should be instantiated as a factory for creating XSLT engines. So, in order to use the XSLT engine of your choice, this property needs to be set appropriately. However, there is no easy way to do that within Ant, and hence no easy way to change XSLT engines within a single Ant build file (the best you can do is to launch a separate Java process and then call Ant from within that new process). To overcome this problem, the best solution is to create a new XSLT task for Ant, one which makes it easy to select the desired XSLT TransformerFactory.
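
For comparison, the separate-process workaround might look something like the following sketch. It assumes a wrapper build file kept apart from build.xml, a hypothetical JAR location lib/saxon.jar, and that ${java.class.path} contains Ant's own JARs (which depends on how Ant was launched). The factory class name is the one used by Saxon 6.

```xml
<project default="saxon-run">
  <!-- Hypothetical location of the Saxon 6 JAR; adjust for your machine. -->
  <property name="saxon.jar" location="lib/saxon.jar"/>
  <target name="saxon-run">
    <!-- Fork a new JVM, select the TransformerFactory through a system
         property, and run Ant's main class inside that JVM. -->
    <java
      classname="org.apache.tools.ant.Main"
      fork="true"
      failonerror="true">
      <classpath>
        <pathelement path="${java.class.path}"/>
        <pathelement location="${saxon.jar}"/>
      </classpath>
      <sysproperty
        key="javax.xml.transform.TransformerFactory"
        value="com.icl.saxon.TransformerFactoryImpl"/>
      <arg value="-buildfile"/>
      <arg value="build.xml"/>
      <arg value="do-it"/>
    </java>
  </target>
</project>
```

Each additional engine needs another forked JVM with its own classpath and factory name, which is the overhead a dedicated task avoids.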

mtxslt - The Solution

mtxslt (short for “multi-XSLT”) is an Ant task that makes it easy to select the Java XSLT engine(s) of choice within an Ant build file. mtxslt extends the standard Ant xslt task so that it maintains full compatibility with the standard task. Anything that works with the xslt task also works with mtxslt.

With mtxslt, it is possible to ignore the value of the Java javax.xml.transform.TransformerFactory property and simply load a particular XSLT engine directly. At the time of writing, mtxslt supports Xalan 2, Saxon 6/7 and Oracle XDK 9. The easiest way to explain how to use mtxslt is via an example.

A Multiple XSLT Engine Example

This example uses a few new Ant elements. A taskdef is required to associate the task name mtxslt with the Java class which implements it. Actually, you can call mtxslt anything you want just by changing the name in the taskdef. The choice is yours.

The property definitions are used to define values that can be retrieved by name throughout the build file. This is no different from defining a string variable in a programming language. Here, property definitions are used to define short names for qualified Java class names and for file paths, since both of these tend to be long, and both reduce the readability and maintainability of the build file if repeated throughout it.
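
As a small, self-contained illustration of how properties are defined and then referenced with the ${name} syntax, consider the sketch below. The relative path lib/saxon.jar and the show target are placeholders chosen for this illustration.

```xml
<project default="show">
  <!-- Short names for a long class name and a (placeholder) file path. -->
  <property
    name="saxon6"
    value="org.xmLP.ant.taskdefs.optional.Saxon6Liaison"/>
  <property
    name="saxon6.classpath"
    value="lib/saxon.jar"/>
  <target name="show">
    <!-- ${...} is replaced by the property's value when the task runs. -->
    <echo message="processor = ${saxon6}"/>
    <echo message="classpath = ${saxon6.classpath}"/>
  </target>
</project>
```

Ant properties cannot be changed once set, and a value given on the command line takes precedence over one in the build file, so something like ant -Dsaxon6.classpath=/opt/saxon/saxon.jar show lets each machine supply its own path without editing the file.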

In this example, different XSLT engines are used to apply the same stylesheet transform.xsl to the same input input.xml. The resultant HTML files can then be compared. The targets in this example build file are explained in detail afterwards.

<project
  name="test"
  default="all">
  <taskdef
    name="mtxslt"
    classname="org.xmLP.ant.taskdefs.xslt.XSLTProcess"/>
  <property
    name="trax"
    value="org.xmLP.ant.taskdefs.optional.TraXLiaison"/>
  <property
    name="xalan2"
    value="org.xmLP.ant.taskdefs.optional.Xalan2Liaison"/>
  <property
    name="xalan2.classpath"
    value="D:\home\tony\XSLT\xalan-j_2_4_0\bin\xalan.jar"/>
  <property
    name="saxon6"
    value="org.xmLP.ant.taskdefs.optional.Saxon6Liaison"/>
  <property
    name="saxon6.classpath"
    value="D:\home\tony\XSLT\Saxon-6.5.2\saxon.jar"/>
  <property
    name="saxon7"
    value="org.xmLP.ant.taskdefs.optional.Saxon7Liaison"/>
  <property
    name="saxon7.classpath"
    value="D:\home\tony\XSLT\Saxon-7.1\saxon7.jar"/>
  <property
    name="oracle9"
    value="org.xmLP.ant.taskdefs.optional.Oracle9Liaison"/>
  <property
    name="oracle9.classpath"
    value="D:\home\tony\XSLT\xdk_java_9_2_0_3_0\lib\xmlparserv2.jar"/>
  <target
    name="all"
    depends="trax1,trax2,trax3,trax4,xalan2,saxon6,saxon7,oracle9"/>
  <target name="trax1">
    <xslt
      processor="trax"
      in="input.xml"
      style="transform.xsl"
      out="trax1.html">
      <param
        name="target"
        expression="trax1"/>
    </xslt>
  </target>
  <target name="trax2">
    <mtxslt
      processor="trax"
      in="input.xml"
      style="transform.xsl"
      out="trax2.html">
      <param
        name="target"
        expression="trax2"/>
    </mtxslt>
  </target>
  <target name="trax3">
    <xslt
      processor="${trax}"
      in="input.xml"
      style="transform.xsl"
      out="trax3.html">
      <param
        name="target"
        expression="trax3"/>
    </xslt>
  </target>
  <target name="trax4">
    <mtxslt
      processor="${trax}"
      in="input.xml"
      style="transform.xsl"
      out="trax4.html">
      <param
        name="target"
        expression="trax4"/>
    </mtxslt>
  </target>
  <target name="xalan2">
    <mtxslt
      processor="${xalan2}"
      in="input.xml"
      style="transform.xsl"
      out="xalan2.html"
      classpath="${xalan2.classpath}">
      <param
        name="target"
        expression="xalan2"/>
    </mtxslt>
  </target>
  <target name="saxon6">
    <mtxslt
      processor="${saxon6}"
      in="input.xml"
      style="transform.xsl"
      out="saxon6.html"
      classpath="${saxon6.classpath}">
      <param
        name="target"
        expression="saxon6"/>
    </mtxslt>
  </target>
  <target name="saxon7">
    <mtxslt
      processor="${saxon7}"
      in="input.xml"
      style="transform.xsl"
      out="saxon7.html"
      classpath="${saxon7.classpath}">
      <param
        name="target"
        expression="saxon7"/>
    </mtxslt>
  </target>
  <target name="oracle9">
    <mtxslt
      processor="${oracle9}"
      in="input.xml"
      style="transform.xsl"
      out="oracle9.html"
      classpath="${oracle9.classpath}">
      <param
        name="target"
        expression="oracle9"/>
    </mtxslt>
  </target>
  <target name="clean">
    <delete>
      <fileset
        dir="."
        includes="*.html"/>
    </delete>
  </target>
</project>

Note that the target parameter that is passed to the stylesheet is purely so that the Ant target name can be embedded in each HTML product file, to make identification of the files easier. It serves no other purpose.
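
The article's transform.xsl is not reproduced here, so the following is only a guess at how a stylesheet could pick up that parameter. The parameter name target matches the build file; the HTML layout, the default value and the use of the standard xsl:vendor system property are assumptions added for illustration.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!-- Set by the param element in each Ant target; the default value
       applies when the stylesheet is run without it. -->
  <xsl:param name="target" select="'unknown'"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Output of target <xsl:value-of select="$target"/></title>
      </head>
      <body>
        <p>Built by Ant target: <xsl:value-of select="$target"/></p>
        <!-- Standard XSLT 1.0 way to ask the engine to identify itself. -->
        <p>XSLT engine vendor: <xsl:value-of select="system-property('xsl:vendor')"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Printing the vendor string alongside the target name gives a quick check that each target really ran on the engine it was meant to use.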

That is all there is to it. You now not only know how to use Ant to control XSLT, you also know how to use mtxslt to control which XSLT engines are used within an Ant build. Note that all of the example files from this article can be downloaded as a ZIP archive.

Conclusion

Ant is a powerful cross-platform tool for controlling build processes, and is ideal for controlling multi-file builds involving XSLT stylesheets. Using mtxslt, you can go further and invoke multiple Java XSLT engines during a single build, which is ideal for portability testing.

People expect authors of technical articles to “eat their own dog food”. So it is worth noting that this article was written using an extended version of DocBook 4.2, and then converted to XHTML (the preferred format of the XML.com editors) using an XSLT stylesheet, with the process controlled by an Ant build file. As well as building the article, Ant controlled the extraction of the Ant build file code out of the DocBook source and into the example build files, and also the regression testing of the examples. It really works!
