PD4ML Web is an alternative HTML-to-PDF conversion approach based
on the PhantomJS (Qt + WebKit) runtime. It implements the well-known PD4ML Java
API (with minor differences) and lets you switch to another
converter engine with minimal effort if your application runs into
inherent limitations of PD4ML Java.
The PhantomJS/WebKit HTML renderer of PD4ML Web has a number of advantages over
regular PD4ML: it supports JavaScript, offers better coverage of the HTML/CSS
standards, and performs better when converting very large HTML documents. In general, it is suited to website capture, a domain where PD4ML Java is
not strong enough, primarily because it lacks JavaScript support.
PhantomJS is open source, BSD-licensed software; it can be
compiled for a variety of platforms, including headless Linux/UNIX environments.
Unfortunately, the PD4ML Web/PhantomJS conversion approach also has some drawbacks.
For instance, each conversion request forces the JVM to start a new PhantomJS
process, which is expensive in terms of time and CPU usage. More
pros and cons are summarized in the comparison table below.
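Because every conversion launches its own PhantomJS process, an application that serves many requests at once may want to cap how many conversions run in parallel. The following is a minimal sketch of that idea using the standard java.util.concurrent.Semaphore. The ConversionLimiter class and its method names are hypothetical and not part of the PD4ML Web API; the actual conversion call would go inside the task passed to run().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical helper, not part of the PD4ML Web API: caps the number of
// conversions (and therefore PhantomJS processes) running at the same time.
final class ConversionLimiter {
    private final Semaphore permits;

    ConversionLimiter(int maxParallelConversions) {
        if (maxParallelConversions < 1) {
            throw new IllegalArgumentException("maxParallelConversions must be >= 1");
        }
        this.permits = new Semaphore(maxParallelConversions);
    }

    // Waits for a free permit, runs the task, and always returns the permit,
    // even if the task throws.
    <T> T run(Callable<T> conversion) throws Exception {
        permits.acquire();
        try {
            return conversion.call();
        } finally {
            permits.release();
        }
    }

    // Number of conversions that could start right now without waiting.
    int availableSlots() {
        return permits.availablePermits();
    }
}
```

A suitable limit depends on the number of CPU cores and on how heavy the converted pages are, so it is best determined by measurement.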
How to switch to PD4ML Web
The PD4ML Web API mirrors the regular PD4ML API, but its PD4ML class is located in the org.zefer.pd4ml.web package.
The first step is to add the PD4ML Web JAR to the classpath and change the import directive from:
import org.zefer.pd4ml.PD4ML;
to
import org.zefer.pd4ml.web.PD4ML;
The PD4ML Web class constructor expects the path to the PhantomJS executable (e.g. phantomjs.exe)
as a parameter:
PD4ML pd4ml = new PD4ML( "tools/phantomjs.exe" );
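The executable is typically named phantomjs.exe on Windows and phantomjs on other platforms, so an application deployed on more than one platform may need to build the constructor argument at run time. Below is a minimal sketch of one way to do that. The PhantomJsLocator class and the "tools" directory layout are assumptions made for illustration and are not part of PD4ML Web.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not part of PD4ML Web: builds the PhantomJS path
// to hand to the PD4ML constructor, depending on the operating system.
final class PhantomJsLocator {
    private PhantomJsLocator() {}

    // osName is normally System.getProperty("os.name"); it is a parameter
    // here so the logic can be exercised without changing platforms.
    static String executableName(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? "phantomjs.exe" : "phantomjs";
    }

    // Resolves the executable inside the given directory (e.g. "tools").
    static Path resolve(String toolsDir, String osName) {
        return Paths.get(toolsDir).resolve(executableName(osName));
    }

    public static void main(String[] args) {
        Path exe = resolve("tools", System.getProperty("os.name"));
        System.out.println(exe);
        // With the PD4ML Web JAR on the classpath, the result would be passed on:
        // PD4ML pd4ml = new PD4ML(exe.toString());
    }
}
```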
That is all. The application should now generate PDFs using the alternative converter engine.
After the change, PDF layouts will most probably differ slightly (or sometimes
seriously) from the previous ones, and some API calls will have no effect. The
comparison table explains why.
PD4ML vs. PD4ML Web
The PhantomJS PDF generator provides only basic PDF output. PD4ML Web adds some
of the missing features in a post-processing phase. The post-processing
also fixes a number of PhantomJS PDF generation issues (such as corrupted
font kerning). The table summarizes the differences and comments on some of
them.
| Feature | PD4ML | PD4ML Web | PD4ML Web comments |
|---|---|---|---|
| HTML 3 | * | * | In general, PD4ML Web provides better HTML/CSS standards compliance, which makes it a good choice for tasks such as website capture. |
| HTML 4 | * (with some limitations) | * | |
| HTML 5 | - (only selected tags supported) | * (with some limitations) | |
| CSS 1 | * | * | |
| CSS 2 | * (with some limitations) | * | |
| CSS 3 | - (some properties/selector types supported) | * (with some limitations) | |
| JavaScript/DOM | - | * | |
| SVG | * (with some limitations) | * (with some limitations) | |
| HTML to Image | * (PNG, multipage TIFF) | * (PNG) | More image types to be supported |
| HTML to RTF | * | - | To be supported soon |
| Table of contents | * | - | To be supported soon |
| Hyperlinks | * | - | |
| PDF bookmarks | * | - | |
| PDF headers / footers with images | * | * | |
| Variable height PDF headers / footers | * | - | No workaround |
| Secured PDF | * | - | To be supported soon |
| PDF metadata (author, title, keywords etc) | * | * | |
| Multicolumn PDF layout | * | - | |
| Conditional page breaks | * | - | To be supported soon |
| PDF/A output | * (with PD4ML DMS edition) | - | |
| PDF merge | * | * | |
| Dynamic page orientation and format change | * | - | No workaround |
| Output page range | * | - | To be supported soon |
| PDF forms | * | - | |
| Footnotes | * | ? | |
| PDF attachments | * | - | To be supported soon |
| Fit document to a single page | * | - | |
| Custom resource loaders | * | - | |
| PDF action handlers | * | - | To be supported soon |
| Conversion progress monitoring | * | - | |
"To be supported soon" marks issues we are currently working on. "No workaround"
marks issues for which we currently see no solution or workaround.
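The comparison can be turned into a simple rule of thumb for choosing an engine per conversion job. The sketch below encodes only a handful of rows from the table. The EngineChooser class, its enum constants, and the preference for regular PD4ML when both engines qualify are assumptions made for illustration, not something either product provides.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper: picks a converter engine from a job's required
// features, using a subset of the rows in the comparison table.
final class EngineChooser {
    enum Feature {
        JAVASCRIPT, HYPERLINKS, BOOKMARKS, TABLE_OF_CONTENTS,
        SECURED_PDF, PDF_MERGE, PDF_METADATA
    }

    enum Engine { PD4ML, PD4ML_WEB }

    // Features marked "-" for PD4ML Web in the table.
    private static final Set<Feature> MISSING_IN_WEB = EnumSet.of(
            Feature.HYPERLINKS, Feature.BOOKMARKS,
            Feature.TABLE_OF_CONTENTS, Feature.SECURED_PDF);

    // Features marked "-" for regular PD4ML in the table.
    private static final Set<Feature> MISSING_IN_JAVA = EnumSet.of(Feature.JAVASCRIPT);

    // Returns an engine that supports every required feature,
    // preferring regular PD4ML when both engines qualify.
    static Engine choose(Set<Feature> required) {
        boolean javaOk = required.stream().noneMatch(MISSING_IN_JAVA::contains);
        boolean webOk = required.stream().noneMatch(MISSING_IN_WEB::contains);
        if (javaOk) return Engine.PD4ML;
        if (webOk) return Engine.PD4ML_WEB;
        throw new IllegalArgumentException("No single engine supports: " + required);
    }
}
```

A job that needs both JavaScript and hyperlinks has no matching engine in this sketch, which reflects the trade-off shown in the table.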
Patched version of PhantomJS for PD4ML Web
You can always download a pre-built version of PhantomJS from
the official site. However, an important patch has not yet been applied
to the current official version (at the time of writing).
The patch disables automatic content scaling and makes a workaround for the "font
kerning problem" possible. The PD4ML Web distribution includes patched
PhantomJS binaries. If you prefer to use an "official" version of
PhantomJS, set the patchedPhantomJS parameter of the PD4ML constructor to false:
PD4ML pd4ml = new PD4ML( "tools/phantomjs.exe", false );
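Whether the patched or the official binary is used is a deployment decision, so it can be convenient to keep both constructor arguments in configuration rather than in code. The following is a minimal sketch based on java.util.Properties. The PhantomJsSettings class and the property keys phantomjs.path and phantomjs.patched are invented for this example, and defaulting to the patched binary is an assumption.

```java
import java.util.Properties;

// Hypothetical configuration holder; the property keys are invented for
// this sketch and are not defined by PD4ML Web.
final class PhantomJsSettings {
    final String executablePath;
    final boolean patched;

    private PhantomJsSettings(String executablePath, boolean patched) {
        this.executablePath = executablePath;
        this.patched = patched;
    }

    static PhantomJsSettings from(Properties props) {
        String path = props.getProperty("phantomjs.path");
        if (path == null || path.trim().isEmpty()) {
            throw new IllegalArgumentException("phantomjs.path is not set");
        }
        // Assumption: the bundled, patched binary is the default choice.
        boolean patched = Boolean.parseBoolean(
                props.getProperty("phantomjs.patched", "true"));
        return new PhantomJsSettings(path.trim(), patched);
    }
}
```

With the PD4ML Web JAR on the classpath, the two values would then be passed on as new PD4ML( settings.executablePath, settings.patched ).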
In the future
we plan to patch some more issues that PhantomJS inherits from WebKit/Qt,
primarily the missing hyperlink support.