HTML to PDF converter for Java and .NET


See also: PD4ML v4 - Pd4Cmd: multi-purpose command-line converter

 

Pd4Cmd: HTML-to-PDF command-line converter

Pd4Cmd is included in the PD4ML library starting with v3.6.0
(Tools mode is available since v3.9.3)

 

Pd4Cmd is a Java command-line tool built on top of the PD4ML HTML-to-PDF converter library. The tool offers access to virtually all PD4ML API functionality and makes it possible to use the PD4ML converter as a standalone application or as part of non-Java environments/applications.

Pd4Cmd.class is included in pd4ml(_demo).jar. The source code of the tool can be found here.

 

 

Basic command lines

Pd4Cmd execution requires pd4ml.jar and ss_css2.jar to be in the classpath. Since ss_css2.jar is listed among the pd4ml.jar dependencies, it is sufficient to reference pd4ml.jar in the classpath, provided ss_css2.jar is in the same directory.

Evaluation versions of PD4ML include pd4ml_demo.jar instead of pd4ml.jar. If needed, substitute the name accordingly in the command-line examples below.

A synopsis of Pd4Cmd parameters:

Usage 1: java -Xmx512m [-Djava.awt.headless=true] Pd4Cmd '<HTML url>' <htmlWidth> [pageFormatName|WxH] [-pdfa] [-smarttablesplit] [-permissions <NUMBER>] [-bookmarks <HEADINGS|ANCHORS>] [-orientation <PORTRAIT|LANDSCAPE>] [-insets <T,L,B,R,><mm|pt>] [-bgcolor <#RGB>] [-bgimage '<url>'] [-ttf <ttf_fonts_dir>] [-ttfrefsonly] [-addstyle <CSS code>] [-pdfforms] [-multicolumn <nr,gap>] [-protectpud] [-adjustwidth] [-fitapage] [-nohyperlinks] [-noimagesplit] [-author <author name>] [-title <title override>] [-cookie <name> <value>] [-param <name> <value>] [-header '<header HTML code>'] [-footer '<footer HTML code>'] [-pagerange <page>] [-encoding <HTML encoding>] [-outformat <pdf|rtf|rtfwmf|png8|png24|tiff>] [-out <output_file_path>] [-password <password>] [-merge <path> <after|before>] [-debug] [-watermark imageurl,x,y,width,height,opacity]
 
Usage 2: java -Xmx512m [-Djava.awt.headless=true] Pd4Cmd -tools '<PDF url>' [-readpassword <password>] [-permissions <NUMBER>] [-author <author name>] [-title <title override>] [-out <output_file_path>] [-password <password>] [-merge <path> <after|before>] [-mergepassword <password>]
 
Usage 3: java -Xmx512m [-Djava.awt.headless=true] Pd4Cmd -tools '<PDF url>' [-readpassword <password>] [-printpermissions] [-printauthor] [-printtitle] [-printpagenum]

  1. HTML-to-PDF conversion with the absolute minimum of parameters

    Win32:
    java -Xmx512m -cp .\pd4ml.jar Pd4Cmd "http://old.pd4ml.com" 1200

    UNIX-derived operating systems:
    java -Xmx512m -Djava.awt.headless=true -cp ./pd4ml.jar Pd4Cmd 'http://old.pd4ml.com' 1200

    The command line overrides the default Java heap size limit with -Xmx512m, setting it to 512 MB.

    On UNIX platforms, -Djava.awt.headless=true allows the application to run on servers without a graphics environment or from remote ssh/telnet sessions.

    "http://old.pd4ml.com" and 1200 are the HTML source URL and htmlWidth (virtual "browser" frame width) parameters.
    Please note: on Win32 the URL is enclosed, if needed, in double quotes; on UNIX, in single quotes.

    The default PDF document format: A4 / PORTRAIT

    In the example, the 1200px width of the rendered document is mapped to the 595pt width of the A4 page format.
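
    The mapping is a plain ratio. A minimal sketch of the arithmetic, assuming (as the sentence above does) that the full 595pt page width is used and page insets are ignored. The class and method names are illustrative and not part of PD4ML.

```java
class ScaleSketch {
    /** Ratio that maps HTML pixels to PDF points (page insets ignored). */
    static double scale(double pageWidthPt, double htmlWidthPx) {
        return pageWidthPt / htmlWidthPx;
    }

    public static void main(String[] args) {
        // A4 width in points and the htmlWidth from the example command line.
        double s = scale(595, 1200);
        System.out.printf("scale factor: %.4f%n", s);
        // Every pixel dimension of the layout shrinks by the same factor,
        // e.g. a 300px-wide image (a hypothetical element) on the PDF page:
        System.out.printf("300px -> %.2fpt%n", 300 * s);
    }
}
```

    A larger htmlWidth therefore gives a smaller scale factor, so the same content appears smaller on the page.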

    If the output file path is omitted, the output is sent to STDOUT and can be piped to another application.
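
    When the caller is itself a Java program, the STDOUT stream can be captured with the standard ProcessBuilder class. The following is only a sketch: the jar location and the target file are assumptions, and the helper names are hypothetical. ProcessBuilder hands each argument to the child process as-is, so the shell quoting discussed above is not needed here.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

class Pd4CmdLauncherSketch {
    /** Builds the minimal command line from example 1 (jar path is an assumption). */
    static List<String> minimalCommand(String jarPath, String url, int htmlWidth) {
        return List.of("java", "-Xmx512m", "-Djava.awt.headless=true",
                "-cp", jarPath, "Pd4Cmd", url, String.valueOf(htmlWidth));
    }

    /** Runs the command and writes whatever it sends to STDOUT into the target file. */
    static int runToFile(List<String> command, File target)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectOutput(target);                          // STDOUT -> file
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);  // keep error messages visible
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = minimalCommand("./pd4ml.jar", "http://old.pd4ml.com", 1200);
        System.out.println(String.join(" ", cmd));
        // The conversion is started only if an output path is given as an argument.
        if (args.length == 1) {
            int exitCode = runToFile(cmd, new File(args[0]));
            System.out.println("exit code: " + exitCode);
        }
    }
}
```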
     

  2. Customized HTML-to-PDF conversion 

    Win32:
    java -Xmx512m -cp .\pd4ml.jar Pd4Cmd "http://old.pd4ml.com" 1200 LETTER -bookmarks HEADINGS -pdfforms -debug -out c:\pd4ml.pdf

    UNIX-derived operating systems:
    java -Xmx512m  -Djava.awt.headless=true -cp ./pd4ml.jar Pd4Cmd 'http://old.pd4ml.com' 1200 LETTER -bookmarks HEADINGS -pdfforms -debug -out /tmp/pd4ml.pdf

    In the examples the generated PDF is written to the file defined with the -out parameter. That makes it possible to use STDOUT for debug output (-debug parameter).

    The examples also force PD4ML to produce PDF outlines (bookmarks) from the <h1>-<h6> structure of the document (-bookmarks HEADINGS) and to convert HTML forms to interactive PDF forms (-pdfforms). Below is a list of all supported parameters with brief descriptions.
     

  3. PDF meta data reporting

    java -Xmx512m -cp .\pd4ml.jar Pd4Cmd -tools file:c:/docs/test.pdf -printpermissions -printauthor -printtitle -printpagenum

    The call prints basic PDF info to STDOUT: document permissions (as a hex number), document author, document title, and number of document pages (as a decimal number).

    The same info can also be requested during HTML-to-PDF conversion, PDF document merge, or PDF page removal.
     

  4. PDF page removal  (Tools mode)

    java -Xmx512m -cp .\pd4ml.jar Pd4Cmd -tools file:c:/docs/test.pdf -pagerange 2-3,5+ -out c:/docs/newdoc.pdf

    The call reduces the document to the given page range.
     

  5. PDF documents merge  (Tools mode)

    java -Xmx512m -cp .\pd4ml.jar Pd4Cmd -tools file:c:/docs/test.pdf -merge file:c:/docs/tomerge.pdf after -out c:/docs/newdoc.pdf

    Note: the -pagerange option is not available for a PDF merge
     

  6. PDF permissions update  (Tools mode)

    java -Xmx512m -cp .\pd4ml.jar Pd4Cmd -tools file:c:/docs/test.pdf -permissions 28 -out c:/docs/newdoc.pdf

    -permissions 28 is a sum of permissions: AllowDegradedPrint = 4, AllowModify = 8 and AllowCopy = 16. See API reference for more details.
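
    The sum can be reproduced with a few lines of Java. This is a sketch that uses only the values quoted above; the constant and method names are made up for illustration and are not PD4ML API names.

```java
class PermissionSumSketch {
    // Values as quoted in this document.
    static final int ALLOW_DEGRADED_PRINT = 4;
    static final int ALLOW_MODIFY = 8;
    static final int ALLOW_COPY = 16;

    /** Combines permission values; for distinct bits, bitwise OR equals the sum. */
    static int combine(int... flags) {
        int result = 0;
        for (int flag : flags) {
            result |= flag;
        }
        return result;
    }

    public static void main(String[] args) {
        int permissions = combine(ALLOW_DEGRADED_PRINT, ALLOW_MODIFY, ALLOW_COPY);
        System.out.println("-permissions " + permissions);
    }
}
```

    Here combine(...) evaluates to 28, the value used in the example. The sketch uses bitwise OR rather than plain addition so that a bit shared by two values is not counted twice.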
     

 

Command-line parameters

Pd4Cmd parameter Description
'<url>'  (mandatory) URL of HTML source.
  • Supported protocols: file, http and https (https may not work under some JDKs) 
  • If needed, enclose the URL in single quotes on UNIX-derived platforms, in double quotes on Windows.
  • Due to specifics of Java, the file protocol requires fewer slashes than usual when addressing absolute paths on Windows: "file:c:/path/file.html"

Examples:

'http://old.pd4ml.com'
'http://host/doc.htm;jsessionid=873465837'
'file:c:/path/file.htm'
'file:docs/doc1.htm' (relative to the current directory)

(on Windows platform use double quotes)
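
A small helper can produce the short file: form shown in the examples from an ordinary path. This is a hedged sketch: the method name is hypothetical, and it only swaps backslashes for forward slashes, without any percent-encoding of spaces or other special characters.

```java
class FileUrlSketch {
    /** Turns a local path into the short "file:" form used in the examples above. */
    static String toFileUrl(String path) {
        // Windows backslashes become forward slashes; nothing else is changed.
        return "file:" + path.replace('\\', '/');
    }

    public static void main(String[] args) {
        System.out.println(toFileUrl("c:\\path\\file.htm"));
        System.out.println(toFileUrl("docs/doc1.htm"));
    }
}
```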

<htmlWidth>  (mandatory) Width of "virtual browser" frame. Base for relative width calculations.
pageFormatName|WxH Target page format. Either one of predefined names or WIDTHxHEIGHT dimensions, given in typographical points. Default value: A4

Predefined page formats:

  • A0 - 2384x3370 points
  • A1 - 1684x2384 points
  • A2 - 1190x1684 points
  • A3 - 842x1190 points
  • A4 - 595x842 points
  • A5 - 421x595 points
  • A6 - 297x421 points
  • A7 - 210x297 points
  • A8 - 148x210 points
  • A9 - 105x148 points
  • A10 - 74x105 points
  • HALFLETTER - 396x612 points
  • ISOB0 - 2836x4008 points
  • ISOB1 - 2004x2836 points
  • ISOB2 - 1418x2004 points
  • ISOB3 - 1002x1418 points
  • ISOB4 - 709x1002 points
  • ISOB5 - 501x709 points
  • LEDGER - 1224x792 points
  • LEGAL - 612x1008 points
  • LETTER - 612x792 points
  • NOTE - 540x720 points
  • TABLOID - 792x1224 points

Examples:

A3
400x400
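
To make the two accepted forms concrete, here is a sketch of how such an argument could be resolved to a width and height in points. It is not Pd4Cmd's actual parser: the class is hypothetical, only a few of the predefined names are included, and name matching is assumed to be case-sensitive.

```java
import java.util.HashMap;
import java.util.Map;

class PageFormatSketch {
    // A subset of the predefined formats listed above, in points.
    static final Map<String, int[]> PREDEFINED = new HashMap<>();
    static {
        PREDEFINED.put("A3", new int[] {842, 1190});
        PREDEFINED.put("A4", new int[] {595, 842});
        PREDEFINED.put("A5", new int[] {421, 595});
        PREDEFINED.put("LETTER", new int[] {612, 792});
        PREDEFINED.put("LEGAL", new int[] {612, 1008});
    }

    /** Returns {width, height} in points for a format name or a WxH string. */
    static int[] resolve(String arg) {
        int[] known = PREDEFINED.get(arg);
        if (known != null) {
            return known.clone();
        }
        String[] parts = arg.split("x");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected a format name or WxH: " + arg);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    public static void main(String[] args) {
        int[] a3 = resolve("A3");
        int[] custom = resolve("400x400");
        System.out.println("A3: " + a3[0] + "x" + a3[1] + " points");
        System.out.println("400x400: " + custom[0] + "x" + custom[1] + " points");
    }
}
```
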
-addstyle <CSS code> Applies additional styles to the source document. The parameter may occur multiple times in a Pd4Cmd command line.

Example:

-addstyle 'TH {background-color: tomato} TR {page-break-inside: avoid}'

(on Windows platform use double quotes)

-adjustwidth Sets htmlWidth to the rightmost edge of the HTML block content. With the parameter set, PD4ML first builds the HTML layout using the given htmlWidth to determine the rightmost edge of the rendered content, and then uses that value for PDF mapping (in other words, it virtually cuts off any blank area on the right side).

Notes:

  • To use the parameter efficiently, set htmlWidth to a value greater than the expected maximal right edge offset.
  • If the source document has HTML objects whose width is set to 100%, the parameter has no useful effect.
  • Since htmlWidth affects the HTML-to-PDF scale factor, using the parameter makes font/object sizes in the resulting PDF vary from document to document.
-author <author name> Defines document author in PDF properties

Example:

-author 'Max Mustermann'

(on Windows platform use double quotes)

-bgcolor '<#RGB>' Defines background color for PDF pages

Examples:

-bgcolor '#FFFCFE'
-bgcolor 0xFFFCFE

(on Windows platform use double quotes)

-bgimage '<url>' Defines background image for PDF pages. The image will be stretched to cover the entire page, so it makes sense to choose images with dimensions, proportional to the target page format.

Examples:

-bgimage 'http://old.pd4ml.com/i/blank.jpg'
-bgimage 'file:/resources/images/blank.jpg'

(on Windows platform use double quotes)

-bookmarks <HEADINGS|ANCHORS> Forces PD4ML to generate PDF bookmarks (aka outlines).
  • If set to ANCHORS, PD4ML creates PDF bookmarks from <a name="destination">Label</a> tags. If such a tag is empty (Label is not defined), it uses the destination string as the visible label.
  • If set to HEADINGS, PD4ML creates a PDF bookmark tree derived from the <H1>-<H6> structure.

Examples:

-bookmarks HEADINGS
-bookmarks ANCHORS
-cookie <name> <value> Defines a cookie to be sent with the source HTML HTTP request (and all subsequent resource requests). The parameter may occur multiple times in a Pd4Cmd command line.

Example:

-cookie JSESSIONID '9034657927465;path=/'

(on Windows platform use double quotes)

-debug Enables PD4ML debug output to STDOUT. The parameter has no effect if the -out parameter is omitted.
-encoding <HTML encoding> Document encoding override
-fitapage Forces PD4ML to downscale the entire HTML layout, if needed, to fit a single PDF page vertically
-footer '<footer HTML code>' (PD4ML Pro only) Defines PDF page footer in HTML. $[page], $[total] and $[title] placeholders are supported.

Example:

-footer '<div width=100% align=right>$[page] of $[total]</div>'

(on Windows platform use double quotes)

-header '<header HTML code>' (PD4ML Pro only) Defines PDF page header in HTML. $[page], $[total] and $[title] placeholders are supported.

Example:

-header '<div width=100% align=right>$[page] of $[total]</div>'

(on Windows platform use double quotes)

-insets <T,L,B,R,><mm|pt> Defines page margins (Top,Left,Bottom,Right). Defaults: 25,50,25,25,pt

Examples:

-insets 10,20,10,10,mm
-insets 20,40,20,20,pt
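
The two units are related by 1 inch = 25.4 mm = 72 pt. The sketch below parses the argument format shown above and converts millimetre values to points. The class is hypothetical, and how PD4ML itself rounds the values is not stated here, so none is applied.

```java
class InsetsSketch {
    static final double PT_PER_MM = 72.0 / 25.4;

    /** Parses "T,L,B,R,unit" and returns {top, left, bottom, right} in points. */
    static double[] toPoints(String arg) {
        String[] parts = arg.split(",");
        if (parts.length != 5) {
            throw new IllegalArgumentException("Expected T,L,B,R,unit but got: " + arg);
        }
        String unit = parts[4].trim();
        double factor;
        if (unit.equals("mm")) {
            factor = PT_PER_MM;
        } else if (unit.equals("pt")) {
            factor = 1.0;
        } else {
            throw new IllegalArgumentException("Unknown unit: " + unit);
        }
        double[] insets = new double[4];
        for (int i = 0; i < 4; i++) {
            insets[i] = Double.parseDouble(parts[i].trim()) * factor;
        }
        return insets;
    }

    public static void main(String[] args) {
        double[] pt = toPoints("10,20,10,10,mm");
        System.out.printf("top=%.2fpt left=%.2fpt bottom=%.2fpt right=%.2fpt%n",
                pt[0], pt[1], pt[2], pt[3]);
    }
}
```
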
-merge <path> <after|before> (PD4ML Pro only) Merges the conversion result with an existing PDF document. after - append the existing document to the conversion result, before - prepend the document
-multicolumn <nr,gap> (PD4ML Pro only) Outputs a multicolumn PDF document. nr - number of columns, gap - column padding
-nohyperlinks Disables conversion of external HTML hyperlinks into PDF hyperlinks
-noimagesplit Disables image splitting at page breaks. By default, splitting is enabled. If the parameter is set, PD4ML tries to place page breaks so that images stay intact. If an image's height (in screen pixels) exceeds the computed page height (in screen pixels), the image is split regardless of the option.

Similar behavior may be achieved with IMG{page-break-inside: avoid} CSS style

-orientation <PORTRAIT|LANDSCAPE> LANDSCAPE rotates the target page format (A4 by default) by 90°.

Examples:

-orientation PORTRAIT
-orientation LANDSCAPE
-out <output_file_path> Defines target file path/name. Pd4Cmd must have permissions to write the file.

Examples:

-out c:\tmp\out.pdf
-out /tmp/out.pdf
-outformat <pdf|pdfa|rtf|rtfwmf> (PD4ML Pro only) Specifies output file format. pdfa duplicates -pdfa parameter. rtf forces PD4ML to output RTF instead of PDF. rtfwmf outputs RTF and converts images to WMF file format for a better viewer compatibility.
-pagerange <page> Limits the range of generated pages. Examples: "2+" - skip the first page, "1-2" - output only the first and the second pages, "even" or "odd" - output only even or only odd pages. The rules may be combined: "3-7,odd"

Example:

-pagerange '2-3,7+'

(on Windows platform use double quotes)
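
To illustrate the numeric forms, here is a sketch of a page selector. It rests on several assumptions that the description does not spell out: pages are counted from 1, comma-separated parts form a union, and a bare number selects a single page. The "even"/"odd" keywords are deliberately left out, because how they combine with numeric ranges is not stated. The class is hypothetical and is not Pd4Cmd's parser.

```java
class PageRangeSketch {
    /** Returns true if the 1-based page number matches any part of the spec. */
    static boolean selects(String spec, int page) {
        for (String token : spec.split(",")) {
            String t = token.trim();
            if (t.equals("even") || t.equals("odd")) {
                throw new UnsupportedOperationException("even/odd are not covered by this sketch");
            } else if (t.endsWith("+")) {
                // "N+" means page N and everything after it.
                int from = Integer.parseInt(t.substring(0, t.length() - 1));
                if (page >= from) return true;
            } else if (t.contains("-")) {
                // "A-B" means pages A through B inclusive.
                String[] bounds = t.split("-");
                int from = Integer.parseInt(bounds[0]);
                int to = Integer.parseInt(bounds[1]);
                if (page >= from && page <= to) return true;
            } else if (page == Integer.parseInt(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StringBuilder kept = new StringBuilder();
        for (int page = 1; page <= 8; page++) {
            if (selects("2-3,7+", page)) kept.append(page).append(' ');
        }
        System.out.println("kept pages: " + kept.toString().trim());
    }
}
```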

-param <name> <value> Sets a key/value pair to dynamically substitute placeholders in the HTML template (like $[key]). Key names "page", "total" and "title" are reserved for PDF headers and footers. The parameter can also pass PD4ML tweaking parameters. It may occur multiple times in a Pd4Cmd command line.

Examples:

-param date 'Feb 18, 2010'
-param pd4ml.basic.authentication usr:pwd

(on Windows platform use double quotes)
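
The effect of placeholder substitution can be pictured with plain string processing. This sketch is not PD4ML's implementation: the class is hypothetical, and leaving unknown keys untouched is an assumption made for the example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSketch {
    // Matches $[key] and captures the key name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\[([^\\]]+)\\]");

    /** Replaces each $[key] that has a value in params; other placeholders stay as they are. */
    static String substitute(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Generated on $[date], page $[page]</p>";
        System.out.println(substitute(html, Map.of("date", "Feb 18, 2010")));
    }
}
```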

-password <password> Protects the resulting document with a password.

Example:

-password geheim
-pdfa (PD4ML Volume DMS edition only) Forces PD4ML to output PDF compliant with the PDF/A specification. The PDF/A specification requires all used fonts to be embedded in the resulting document, so the parameter alone cannot guarantee that the result is PDF/A compliant, for example, if TTF embedding (-ttf) is disabled or not configured.

Place pd4ml_rc.jar in the same directory as pd4ml.jar - it will help to avoid most of the font embedding problems.

-pdfforms Forces PD4ML to convert HTML forms into interactive PDF forms
-permissions <NUMBER> Defines document access permissions. NUMBER is a sum of permission values:
  • AllowAnnotate - (bit 6, value = 32)
  • AllowAssembly - (bit 11, value = 1024)
  • AllowContentExtraction - (bit 10, value = 512)
  • AllowCopy - (bit 5, value = 16)
  • AllowDegradedPrint - (bit 3, value = 4)
  • AllowFillingForms - (bit 9, value = 256)
  • AllowModify - (bit 4, value = 8)
  • AllowPrint - (bit 12 + bit 3, value = 2052)

Examples:

-permissions 2068  - allows copying and printing of the resulting document
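
Going the other way, a permissions number can be broken down into the names listed above. The sketch below uses only the values from that list; the class itself is hypothetical and not part of PD4ML.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PermissionDecodeSketch {
    // Names and values as listed above, in the same order.
    static final Map<String, Integer> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("AllowAnnotate", 32);
        FLAGS.put("AllowAssembly", 1024);
        FLAGS.put("AllowContentExtraction", 512);
        FLAGS.put("AllowCopy", 16);
        FLAGS.put("AllowDegradedPrint", 4);
        FLAGS.put("AllowFillingForms", 256);
        FLAGS.put("AllowModify", 8);
        FLAGS.put("AllowPrint", 2052);
    }

    /** Lists every permission whose bits are all present in the given number. */
    static List<String> decode(int permissions) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : FLAGS.entrySet()) {
            int value = entry.getValue();
            if ((permissions & value) == value) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(2068));
    }
}
```

For 2068 the sketch reports AllowCopy, AllowDegradedPrint and AllowPrint; AllowDegradedPrint appears because the AllowPrint value (2052) already contains bit 3. A value reported in hex form would first need converting, for instance with Integer.parseInt(text, 16), assuming the text carries no 0x prefix.
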
-protectpud Makes PD4ML output PDF objects respecting dimensions/font sizes given in "in", "pt", "cm" etc. By default the physical sizes are converted to pixel equivalents (using 72dpi) and scaled up or down with the entire document layout.

Use the feature carefully: when it is switched on, there is no single HTML-to-PDF scale factor for all HTML objects. The resulting PDF layout may appear visually corrupted.

-smarttablesplit Inserts page breaks between table rows to make the table portions fit the PDF page height. If the table has a header (the first rows with <th> cells only), it replicates the header in each table section.

Similar behavior (excluding the header replication) may be achieved with the TR, TABLE {page-break-inside: avoid} CSS style

-title <title override> Defines (or overrides) the document title

Example:

-title 'New title'

(on Windows platform use double quotes)

-ttf <ttf_fonts_dir> (PD4ML Pro only) Specifies TTF fonts directory. See reference

Examples:

-ttf c:\windows\fonts
-ttf fonts/
   (relative to the current dir)
-tools Switches Pd4Cmd to Tools mode. In this mode it expects a PDF rather than HTML as input, and some HTML conversion-specific features have no effect.

Examples:

-tools file:/docs/test.pdf
-tools file:c:\docs\test.pdf
-tools file:c:/docs/test.pdf
-tools c:\docs\test.pdf
-tools http://pdfcloud.com/test.pdf
-readpassword <password> Specifies the password of the input PDF document in case the document is password-protected (Tools mode)

Examples:

-readpassword segretto
-mergepassword <password> Specifies the password of the merged PDF document in case the document is password-protected (Tools mode)

Examples:

-mergepassword segretto
-printpermissions Reads and prints the numeric value of the PDF document permissions in hex form to STDOUT
-printpagenum Reads and prints the number of pages in the PDF document to STDOUT
-printauthor Reads and prints PDF document author
-printtitle Reads and prints PDF document title
Additional information
  1. HTML tags supported by PD4ML
  2. CSS properties supported by PD4ML
  3. PD4ML API reference

 

Copyright ©2004-24 zefer|org. All rights reserved.