PDF

The PDF step lets you extract content from PDF documents and sign them using the SignDoc functionality.

The PDF extract feature is not supported on CentOS/Red Hat Enterprise Linux 7.x operating systems.

The Recorder View shows a single page of the PDF document tree together with the extracted text. To move through the document, the robot uses the Next Page, Previous Page, and Goto Page actions on the Application Action menu. To open this menu, right-click the application tab in the Recorder View.

Text extraction results depend on the internal data and structure of the PDF document. The text is split according to the document's formatting and its underlying accessibility data. The result might include text that lies outside the page boundaries or that is hidden by overlapping elements. Some PDF documents, usually older ones, lack the required accessibility data. For those documents, you might need to use the Extract Text From Image step to extract the text with OCR.

The Extract text application action and the Extract text component action can be used to extract structured text from a specific area of the page.

Properties

Action

Select Open to load a PDF file.

Document Source

Local File: Specify the path to the file in the local file system in the File path field.
Robot File System: Specify the path to the file in the robot file system in the File path field.
Binary: Specify a variable or expression containing a PDF document in binary form.

Password

Select this option to specify the password required to open a password-protected PDF document.

Page number

Optionally specify the physical page to show after opening the document. If this property is not specified, the first page is shown.

Application actions

Action	Description
Goto Page	Navigates to a page you specify.
Next Page	Navigates to the next page.
Previous Page	Navigates to the previous page.
Extract text	Extracts text from an area of the page into the selected variable. Specify the following options when extracting the text: X: the horizontal offset from the leftmost edge of the page; Y: the vertical offset from the topmost edge of the page; Width: the width of the area; Height: the height of the area; Inclusive: select to also include characters whose bounding boxes extend beyond the specified area. By default, this option is not selected, so only characters whose bounding boxes lie completely within the specified area are included. All values are in Device Tree coordinates.
Insert Image	Inserts a JPEG or PNG image from a local folder into the selected page of the document. Robot File System (RFS) folders are not supported. The image is positioned by the X and Y coordinates of its upper-left corner, relative to the upper-left corner of the page. Supported units are Device Tree coordinates, inches or centimeters, and a percentage relative to the page size. The following rules apply: negative values are measured from the opposite margin, so an X value of -1 inch places the image 1 inch from the right margin; if one dimension (width or height) is 0, the image is placed based on the other dimension and the image's aspect ratio; an image that is resized based on its aspect ratio is centered in the specified area; the final X, Y, Width, and Height values are not checked against the page dimensions or other page transformations, so parts of the image can fall outside the page. Specify the following options when inserting an image: Image path: the full path to the image; Units: the units for the image coordinates and size; X: the horizontal offset from the leftmost edge of the page; Y: the vertical offset from the topmost edge of the page; Width: the image width; Height: the image height; Keep Aspect Ratio: keeps the image's original aspect ratio while fitting it within the specified coordinates and size.
Insert Image (variable)	Inserts an image from a binary variable into the selected page of the document. Specify the same options as for the Insert Image action, except that instead of the image path, you specify the name of the binary variable that contains the image.
Save As	Inserts a step to save a copy of the document. Specify the full path of the PDF file to save.
Save to Variable	Saves a copy of the document in a binary variable.
Close	Closes the document.
SignDoc Actions
Sign with SignDoc	Creates a SignDoc session and submits the PDF document immediately.
Sign with SignDoc (Template)	Creates a SignDoc session based on a SignDoc template and submits the PDF document immediately.
Insert SignDoc Signature Field	Inserts a Signature Field on the current page of the PDF document. The fields are not visible on the page, but they appear in the application tree as SignDoc fields.
Get SignDoc Property	Queries properties of the SignDoc session.
Complete SignDoc Request	Closes the SignDoc session and determines how the SignDoc package is processed.

See Sign Documents for information on using SignDoc to sign a document.
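The positioning rules for the Insert Image action are easier to follow when modeled step by step, so the following Python sketch illustrates them. It is not the product's implementation. The `Rect` class and `place_image` function are hypothetical. The sketch assumes that all values are already in one page unit, with inches or centimeters converted beforehand, or, when `percent` is set, that they are percentages of the page width (X, Width) and page height (Y, Height). It also assumes the following: "1 inch from the right margin" means the image's right edge ends 1 inch from the page's right edge, and a negative Y works the same way from the bottom edge.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Final image rectangle; x and y are its upper-left corner."""
    x: float
    y: float
    width: float
    height: float


def place_image(x, y, width, height, image_size, page_size,
                keep_aspect_ratio=False, percent=False) -> Rect:
    """Hypothetical model of the Insert Image positioning rules."""
    img_w, img_h = image_size
    page_w, page_h = page_size

    if percent:
        # Assumption: horizontal values scale with page width, vertical with page height.
        x, width = x * page_w / 100, width * page_w / 100
        y, height = y * page_h / 100, height * page_h / 100

    # A zero dimension is derived from the other one and the image's aspect ratio.
    if width == 0 and height == 0:
        raise ValueError("width and height cannot both be 0 in this sketch")
    if width == 0:
        width = height * img_w / img_h
    elif height == 0:
        height = width * img_h / img_w

    # Negative offsets are measured from the opposite edge (assumed: the image's
    # right/bottom edge ends that far from the page's right/bottom edge).
    if x < 0:
        x = page_w + x - width
    if y < 0:
        y = page_h + y - height

    if keep_aspect_ratio:
        # Fit the image inside the specified area, then center it there.
        scale = min(width / img_w, height / img_h)
        fit_w, fit_h = img_w * scale, img_h * scale
        x += (width - fit_w) / 2
        y += (height - fit_h) / 2
        width, height = fit_w, fit_h

    # As documented, the result is not checked against the page bounds.
    return Rect(x, y, width, height)
```

For example, with a 612 by 792 page and a 200 by 100 image, `place_image(-72, 72, 144, 0, (200, 100), (612, 792))` derives a height of 72 from the aspect ratio. It then places the image's left edge at 396, so the right edge sits 72 units from the right edge of the page.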

Component actions

Action	Description
Extract text	Extracts text from the selected component of the PDF document into a variable. Specify the following options when extracting the text: Width (optional): the width of the area to extract text from; Height (optional): the height of the area to extract text from; Inclusive: select to also include characters whose bounding boxes extend beyond the specified area. By default, this option is not selected, so only characters whose bounding boxes lie completely within the specified area are included. All values are in Device Tree coordinates.
SignDoc Actions
Insert SignDoc Signature Field	Inserts a Signature Field based on a Component Finder. The fields are not visible on the page, but they appear in the application tree as SignDoc fields.
Update Field	Updates attributes of a form field in the PDF document that is identified by SignDoc as supported. Supported fields are listed in the application tree under the SignDoc node.
Assign SignDoc Signer	Assigns a signer to a Signature Field that is already present in the PDF file. These fields are not visible on the page, but they appear in the application tree as SignDoc fields.
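Both Extract text actions decide which characters to keep by comparing each character's bounding box with the target area. The following Python sketch illustrates that selection. It is not the product's implementation. The `Box` class and the `is_inside`, `overlaps`, and `select_text` functions are hypothetical. The sketch makes three assumptions: characters arrive in reading order as (text, box) pairs, the y axis grows downward from the top of the page, and Inclusive keeps any character whose box overlaps the area at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; y grows downward from the top of the page."""
    x: float
    y: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height


def is_inside(area: Box, box: Box) -> bool:
    """True if the box lies completely within the area (the default behavior)."""
    return (area.x <= box.x and box.right <= area.right
            and area.y <= box.y and box.bottom <= area.bottom)


def overlaps(area: Box, box: Box) -> bool:
    """True if the box shares any area with the target (assumed Inclusive behavior)."""
    return (box.x < area.right and area.x < box.right
            and box.y < area.bottom and area.y < box.bottom)


def select_text(chars: list[tuple[str, Box]], area: Box, inclusive: bool = False) -> str:
    """Concatenate the characters whose boxes pass the selected test."""
    test = overlaps if inclusive else is_inside
    return "".join(text for text, box in chars if test(area, box))


chars = [
    ("A", Box(10, 10, 5, 10)),  # fully inside the area
    ("B", Box(18, 10, 5, 10)),  # crosses the right edge of the area
    ("C", Box(40, 10, 5, 10)),  # entirely outside the area
]
area = Box(0, 0, 20, 30)
print(select_text(chars, area))                  # A
print(select_text(chars, area, inclusive=True))  # AB
```

The default test is stricter, which is why "B" appears only when Inclusive is selected.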