The Apache FOP Project

Apache™ FOP Design: Input Parsing

Introduction

Parsing is the process of reading the XSL-FO input and making the information in it available to Apache™ FOP.

SAX for Input

The two standard ways of dealing with XML input are SAX and DOM. SAX generates events as it parses an XML document serially; a program using SAX (and not storing anything internally) sees only a small window of the document at any point in time, and can never look ahead in the document. DOM creates and stores a tree representation of the document, allowing the entire document to be viewed as an integrated whole. One point that may seem counter-intuitive to new FOP developers, and which has at times been contentious, is that FOP uses SAX for input. (DOM can be used as input as well, but it is converted into SAX events before entering FOP, which negates its advantages.)
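The difference between the two models can be seen with the standard JAXP APIs alone. The following is a minimal sketch, not FOP code: the class name SaxVersusDomDemo and its methods are hypothetical. It records the element events delivered by a SAX parser, then builds a DOM tree from the same text and replays that tree as SAX events through an identity transformer, which is one way a DOM can be turned into the event stream described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class; not part of FOP. */
class SaxVersusDomDemo {

    /** Records each element event in the order it is delivered. */
    static class EventRecorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    /** Streams the XML text through a SAX parser; no tree is built. */
    static List<String> eventsFromSax(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            EventRecorder recorder = new EventRecorder();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), recorder);
            return recorder.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a complete DOM tree first, then replays it as SAX events. */
    static List<String> eventsFromDom(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            EventRecorder recorder = new EventRecorder();
            // An identity transformer walks the DOM and emits SAX events.
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new SAXResult(recorder));
            return recorder.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b/><c/></a>";
        System.out.println(eventsFromSax(xml));
        System.out.println(eventsFromDom(xml));
    }
}
```

Both paths hand the handler the same sequence of events, which illustrates why a consumer written against SAX gains nothing from the DOM having existed upstream.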

Since FOP essentially needs a tree representation of the FO input, at first glance it seems to make sense to use DOM. Instead, FOP takes SAX events and builds its own tree-like structure. Why?

See the Input Section of the User Embedding Document for a discussion of input usage patterns and some implementation details.

FOP's FO Tree Mechanism is responsible for catching the SAX events and processing them.
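As an illustration of the general idea of catching SAX events and building a tree-like structure from them, here is a minimal sketch. It is not FOP's FO Tree implementation: the classes TreeFromSaxDemo, Node, and TreeBuilder are hypothetical, and the nodes keep only an element name, a parent link, and a list of children.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch; FOP's real FO Tree classes are richer than this. */
class TreeFromSaxDemo {

    /** A minimal tree node: element name, parent link, ordered children. */
    static class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Grows the tree as events arrive; 'current' is the innermost open element. */
    static class TreeBuilder extends DefaultHandler {
        Node root;
        private Node current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            Node node = new Node(localName, current);
            if (current == null) {
                root = node;
            } else {
                current.children.add(node);
            }
            current = node;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Closing an element returns control to its parent.
            current = current.parent;
        }
    }

    static Node build(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            TreeBuilder builder = new TreeBuilder();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), builder);
            return builder.root;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node root = build("<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
                + "<fo:layout-master-set/><fo:page-sequence/></fo:root>");
        System.out.println(root.name + " has " + root.children.size() + " children");
    }
}
```

Because the handler decides what each node stores, a structure built this way can be as light or as specialized as the consumer needs.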

Validation

If the input XML is not well-formed, the XML parser reports the error.
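A well-formedness failure surfaces from a JAXP SAX parser as a SAXParseException. The sketch below shows that behaviour with the standard API only; the class WellFormednessDemo and its method are hypothetical and do not reflect how FOP itself presents the error.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class; not part of FOP. */
class WellFormednessDemo {

    /** Returns a description of the first well-formedness error, if any. */
    static Optional<String> firstError(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so parse() fails fast.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return Optional.empty();
        } catch (SAXParseException e) {
            return Optional.of("line " + e.getLineNumber() + ": " + e.getMessage());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError("<a><b/></a>").isPresent());  // well-formed input
        System.out.println(firstError("<a><b></a>").isPresent());   // <b> is never closed
    }
}
```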

There is no DTD for XSL-FO, so no formal validation is possible at the parser level.

The SAX handler will report an error for unrecognized namespaces.
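A namespace check of this kind can be sketched as a SAX handler that compares each element's namespace URI against a set of known URIs. This is an assumption-laden illustration rather than FOP's handler: the classes NamespaceCheckDemo and NamespaceGuard are hypothetical, and only the XSL-FO namespace URI is taken from the XSL specification.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class; not part of FOP. */
class NamespaceCheckDemo {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Rejects any element whose namespace URI is not in the known set. */
    static class NamespaceGuard extends DefaultHandler {
        private final Set<String> known;

        NamespaceGuard(Set<String> known) {
            this.known = known;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (!known.contains(uri)) {
                throw new SAXException("Unrecognized namespace '" + uri + "' on element " + qName);
            }
        }
    }

    /** True if the document parses and every element is in a known namespace. */
    static boolean accepts(String xml, Set<String> known) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new NamespaceGuard(known));
            return true;
        } catch (SAXException e) {
            // Covers both the guard's error and well-formedness errors.
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> known = Set.of(FO_NS);
        System.out.println(accepts("<fo:root xmlns:fo='" + FO_NS + "'/>", known));
        System.out.println(accepts("<x:thing xmlns:x='urn:example:unknown'/>", known));
    }
}
```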

Namespaces

To allow for extensions to the XSL-FO language, FOP provides a mechanism for handling foreign namespaces.
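One common shape for such a mechanism is a registry keyed by namespace URI, where each entry knows how to create objects for the elements of its namespace. The sketch below illustrates only that general pattern; the class ElementRegistryDemo, its methods, and the urn:example:ext namespace are all hypothetical and are not FOP's extension API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical registry sketch; not FOP's actual extension mechanism. */
class ElementRegistryDemo {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Namespace URI -> factory that maps a local element name to an object.
    // Plain strings stand in for real element objects to keep the sketch short.
    private final Map<String, Function<String, String>> factories = new HashMap<>();

    /** Tells the registry how to handle elements from one namespace. */
    void register(String namespaceUri, Function<String, String> factory) {
        factories.put(namespaceUri, factory);
    }

    /** Creates an object for the element, or empty if the namespace is unknown. */
    Optional<String> create(String namespaceUri, String localName) {
        Function<String, String> factory = factories.get(namespaceUri);
        return factory == null ? Optional.empty() : Optional.of(factory.apply(localName));
    }

    /** Builds a registry with the FO namespace plus one made-up extension. */
    static ElementRegistryDemo sample() {
        ElementRegistryDemo registry = new ElementRegistryDemo();
        registry.register(FO_NS, name -> "fo-object:" + name);
        registry.register("urn:example:ext", name -> "extension-object:" + name);
        return registry;
    }

    public static void main(String[] args) {
        ElementRegistryDemo registry = sample();
        System.out.println(registry.create(FO_NS, "block"));
        System.out.println(registry.create("urn:example:ext", "outline"));
        System.out.println(registry.create("urn:example:other", "thing"));
    }
}
```

With this design, supporting a foreign namespace means adding one registry entry, and elements from unregistered namespaces can be detected and reported at the point of lookup.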

See User Extensions for a discussion of standard extensions shipped with FOP, and their related namespaces.

See Developer Extensions for a discussion of the mechanisms in place to allow developers to add their own extensions, including how to tell FOP about the foreign namespace.

Status

To Do

Work In Progress

Completed