Apache™ FOP: PDF/A (ISO 19005)
Overview
PDF/A is a standard which turns PDF into an "electronic document file format for long-term preservation". PDF/A-1 is the first part of the standard and is documented in ISO 19005-1:2005(E). Work on PDF/A-2 is in progress at AIIM.
Design documentation on PDF/A can be found on FOP's Wiki on the PDFAConformanceNotes page.
Implementation Status
PDF/A-1b is implemented to the degree that FOP supports the creation of the elements described in ISO 19005-1.
Tests have been performed against jHove and Adobe Acrobat 7.0.7 (Preflight function). FOP does not validate completely against Apago's PDF Appraiser. Reasons unknown due to lack of a full license to get a detailed error protocol.
PDF/A-1a is based on PDF-A-1b and adds accessibility features (such as Tagged PDF). This format is available within the limitation described on the Accessibility page.
PDF/A-2 supports new features added with PDF 1.5, 1.6 and 1.7
PDF/A-3 allows embedding of arbitrary file formats
PDF/A-1b, PDF/A-2b and PDF/A-3b does not require accessibility to be enabled
PDF/A-1a, PDF/A-2a and PDF/A-3a require accessibility to be enabled
PDF/A-2u and PDF/A-3u require unicode to be used and accessibility to be enabled
Modes
- PDF/A-1a
- PDF/A-1b
- PDF/A-2a
- PDF/A-2b
- PDF/A-2u
- PDF/A-3a
- PDF/A-3b
- PDF/A-3u
Usage (fop.xconf)
Add section to pdf renderer with pdfa mode and pdf version.
<fop version="1.0">
<accessibility>true</accessibility>
<renderers>
<renderer mime="application/pdf">
<pdf-a-mode>PDF/A-1a</pdf-a-mode>
<version>1.4</version>
</renderer>
</renderers>
</fop>
Usage (command line)
To activate PDF/A-1b from the command-line, specify "-pdfprofile PDF/A-1b" as a parameter. If there is a violation of one of the validation rules for PDF/A, an error message is presented and the processing stops.
PDF/A-1a is enabled by specifying "-pdfprofile PDF/A-1a".
Usage (embedded)
When FOP is embedded in another Java application you can set a special option on the renderer options in the user agent to activate the PDF/A-1b profile. Here's an example:
userAgent.getRendererOptions().put("pdf-a-mode", "PDF/A-1b");
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, userAgent);
[..]
If one of the validation rules of PDF/A is violated, an PDFConformanceException (descendant of RuntimeException) is thrown.
For PDF/A-1a, just use the string "PDF/A-1a" instead of "PDF/A-1b".
PDF/A in Action
There are a number of things that must be looked after if you activate a PDF/A profile. If you receive a PDFConformanceException, have a look at the following list (not necessarily comprehensive):
-
All fonts are required to be embedded. You should set the font-family attribute on the fo:root element, so all text is by default mapped to a font defined in your fop.xconf.
-
Don't use PDF encryption. PDF/A doesn't allow it.
-
Don't use CMYK images without an ICC color profile. PDF/A doesn't allow mixing color spaces and FOP currently only properly supports the sRGB color space. Please note that FOP embeds a standard sRGB ICC profile (sRGB IEC61966-2.1) as the primary output intent for the PDF if no other output intent has been specified in the configuration.
-
Don't use non-RGB colors in SVG images. Same issue as with CMYK images.
-
Don't use EPS graphics with fo:external-graphic. Embedding EPS graphics in PDF is deprecated since PDF 1.4 and prohibited by PDF/A.
-
PDF is forced to version 1.4 if PDF/A-1 is activated.
-
No filter must be specified explicitely for metadata objects. Metadata must be embedded in clear text so non-PDF-aware applications can extract the XMP metadata.
PDF profile compatibility
The PDF profiles "PDF/X-3:2003" and "PDF/A-1b" (or "PDF/A-1a") are compatible and can both be activated at the same time.
Interoperability
There has been some confusion about the namespace for the PDF/A indicator in the XMP metadata. At least three variants have been seen in the wild:
| http://www.aiim.org/pdfa/ns/id.html | obsolete, from an early draft of ISO-19005-1, used by Adobe Acrobat 7.x | | http://www.aiim.org/pdfa/ns/id | obsolete, found in the original ISO 19005-1:2005 document | | http://www.aiim.org/pdfa/ns/id/ | correct, found in the technical corrigendum 1 of ISO 19005-1:2005 |
If you get an error validating a PDF/A file in Adobe Acrobat 7.x it doesn't mean that FOP did something wrong. It's Acrobat that is at fault. This is fixed in Adobe Acrobat 8.x which uses the correct namespace as described in the technical corrigendum 1.
Metadata example
See this page for more info
[..]
</fo:layout-master-set>
<fo:declarations>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title><rdf:Alt><rdf:li xml:lang="x-default">title</rdf:li></rdf:Alt></dc:title>
<dc:creator><rdf:Seq><rdf:li>Document author</rdf:li></rdf:Seq></dc:creator>
<dc:description><rdf:Alt><rdf:li xml:lang="x-default">Document subject</rdf:li></rdf:Alt></dc:description>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
</fo:declarations>