PD4ML: HTML to raster image conversion
PD4ML, as an HTML-to-PDF converter, consists of two relatively separate modules: an HTML
rendering engine and a PDF output pseudo-device derived from java.awt.Graphics.
That makes outputting a rendered HTML document to an image (or to any other
Graphics device) a fairly trivial task. To make the task even easier,
we added an image output mode to the PD4ML API. With a simple output format switch
you can produce a PNG or a multipage TIFF:
pd4ml.outputFormat(PD4Constants.PNG8);
// or
pd4ml.outputFormat(PD4Constants.PNG24);
// or
pd4ml.outputFormat(PD4Constants.TIFF);
The equivalents in the JSP taglib:
<pd4ml:transform ... outputFormat="png8"> ... </pd4ml:transform>
<pd4ml:transform ... outputFormat="png24"> ... </pd4ml:transform>
<pd4ml:transform ... outputFormat="tiff"> ... </pd4ml:transform>
(In this case the transform tag automatically sets the corresponding Content-Type
HTTP header, "image/png" or "image/tiff".)
In the command line tool:
java -Xmx512m -Djava.awt.headless=true -cp ./pd4ml.jar Pd4Cmd <URL> 1200 -out thumbnail.png -outformat png8
java -Xmx512m -Djava.awt.headless=true -cp ./pd4ml.jar Pd4Cmd <URL> 1200 -out thumbnail.png -outformat png24
java -Xmx512m -Djava.awt.headless=true -cp ./pd4ml.jar Pd4Cmd <URL> 1200 -out thumbnail.tiff -outformat tiff
For PNG image output PD4ML ignores page breaks, but it respects them when
generating a multipage TIFF.
There are some other limitations in the HTML-to-image conversion mode:
- No headers/footers supported
- No footnotes
- No hyperlinks (of course)
- Generated TOC has no page numbering
- No page insets applied (however document body margins are there)
- etc
Also, in case you need further processing of the image data, the PD4ML API
provides a couple of specialized renderAsImages() methods, which return
an array of BufferedImage objects representing the document pages.
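As an illustration of such post-processing, the sketch below saves an array of BufferedImage pages as numbered PNG files using only the standard javax.imageio API. It does not call PD4ML itself: the class PageImageWriter, the method writePages() and the file naming scheme are hypothetical names chosen for this example, and the pages array is assumed to be what one of the renderAsImages() methods returned.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of the PD4ML API.
final class PageImageWriter {

    /**
     * Writes each page to <prefix>-<pageNumber>.png inside dir
     * and returns the files in page order.
     */
    static List<File> writePages(BufferedImage[] pages, File dir, String prefix)
            throws IOException {
        List<File> written = new ArrayList<>();
        for (int i = 0; i < pages.length; i++) {
            File target = new File(dir, prefix + "-" + (i + 1) + ".png");
            // ImageIO.write returns false when no suitable writer is found.
            if (!ImageIO.write(pages[i], "png", target)) {
                throw new IOException("No PNG writer available for page " + (i + 1));
            }
            written.add(target);
        }
        return written;
    }
}
```

A call such as PageImageWriter.writePages(pages, new File("out"), "page") would then produce out/page-1.png, out/page-2.png and so on, provided the out directory already exists.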
The biggest source of trouble with image output is memory allocation. Even a
relatively small HTML layout of 1000x5000 px requires allocating at least 20 MB for
the image bytes (plus BufferedImage class infrastructure overhead).
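The 20 MB figure can be reproduced with a quick calculation. The sketch below assumes 4 bytes per pixel (32-bit color), which matches the number quoted above; the actual pixel format PD4ML uses internally is not stated here. ImageMemoryEstimate and its methods are hypothetical names for this example only.

```java
// Hypothetical helper, not part of the PD4ML API.
final class ImageMemoryEstimate {

    /** Raw pixel data size in bytes; long arithmetic avoids int overflow. */
    static long pixelBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    /** Rough guard: does the pixel data fit below the JVM's maximum heap (-Xmx)? */
    static boolean fitsInMaxHeap(long bytes) {
        return bytes < Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        // Assumption: 4 bytes per pixel (32-bit color).
        long bytes = pixelBytes(1000, 5000, 4);
        System.out.println(bytes + " bytes"); // prints "20000000 bytes"
        System.out.println("fits in max heap: " + fitsInMaxHeap(bytes));
    }
}
```

The check is only a lower bound: it ignores the BufferedImage overhead and whatever else the application has already allocated, so a layout that barely passes may still fail with an OutOfMemoryError.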