Create fixed-form projects with RecoStar Design Studio

Transact supports fixed-form extraction to capture handprint, constrained text, and check boxes, and to detect signatures. This topic explains how to use RecoStar Design Studio to create a fixed-form project for use with a Transact batch class and document type. The following is a general outline:

  1. Create the RecoStar Design Studio project.
  2. Set up individual fields for extraction.
  3. Create a Transact batch class and document type.
  4. Integrate the RecoStar Design Studio project file into Transact.

Create and configure a RecoStar project

If you need assistance using RecoStar Design Studio, download the user manual from the Customer Support Portal Downloads page.

  1. Navigate to [Ephesoft_Directory]\Application\native\RecostarPlugin\RecoStarDesignStudio and double-click RecoStarDesignStudio.exe to launch the RecoStar Design Studio application.
  2. Under File > New Project, select Single Form and click Next.

    The New RecoStar Project window appears.

  3. Enter a name for the project and click Browse to select a location where the project is saved.

    The system automatically builds out the project directory based on the project name. For this tutorial, we are using the IRS 2018 1040 schedule 1 tax form.

  4. Click Next.

    The Working Image Files screen appears.

  5. Right-click the panel on the right to choose the Add Files command or click the Add Files button.

    Select a sample file of good quality that is populated with data and click Next.

  6. On the next screen, verify that USA is selected as the country and click Next.
  7. Verify the displayed information is correct and click Finish.

    The project opens inside the application with your sample file centered in the top panel.

  8. Right-click the form name and click Rename to rename the form.

    View the renamed form.

  9. Right-click IcrField and rename this operator "SSN".

    This name must match the name of the index field in your Transact batch class where the extracted value is displayed.

  10. Resize the dotted box on the preview so that it roughly surrounds the SSN field.

    Focus SSN Field

  11. Press Ctrl+W or right-click the image and choose Fit Width to zoom in on the document.

    Now that you can see the details of the document more clearly, resize the dotted box so that it more closely surrounds the SSN value.

  12. Right-click BinaryImageSequence and select Add > Remove Line System.

    Confirm that BinaryImageSequence has been added.

  13. Select File > Save Project to save your work.
  14. To verify that the SSN value has been extracted correctly, click Run Selected Image.

Configuration steps in Transact

  1. Create a project in Transact and configure it to use this RecoStar Design Studio project file.

    Launch Transact and log in as a batch class administrator.

  2. Click Add to create a new batch class.

    The batch class name does not need to match the RecoStar Design Studio project.

  3. Confirm that the new batch class has been created in the Batch Class Management screen.
  4. Select the new batch class and click Open to begin editing.
  5. Click Doc Type and select Create New.
  6. Select the Description field and edit the description of the new document type accordingly.
  7. Select the Minimum Confidence field and reduce the Minimum Confidence value to 8. Click Apply.
  8. Select the new document type and click Upload Learn Files.
  9. In the Open window, select a blank template of the document and click Open.

    This will train Transact to recognize this document type.

  10. From the Document Types folder, navigate to [New Document Type] > Index Fields.
  11. Click Add to create a new index field.

    Name the index field SSN. This name must match the name of the field you defined in the RecoStar project. Click Apply.

  12. In the Additional Configuration column, click the down arrow to expand the drop-down list and select the Force Validation check box.

    This will force all batches to stop in the Validation module so we can verify the results of our extraction rules.

  13. Click Apply to save your changes.
  14. Navigate to Modules > Extraction > RECOSTAR_EXTRACTION and set RecoStar Extraction Switch to ON.

    Click Apply to save your changes.

  15. Open File Explorer (previously Windows Explorer) and navigate to the location of your RecoStar project.
  16. Copy (CTRL + C) the RecoStar (.rsp) file from that folder.
  17. Navigate to the location of your batch class in Transact.

    Continue to fixed-form-extraction and open the folder for your new document type.

  18. Paste (CTRL + V) the .rsp file into the document type folder.
  19. In Transact, navigate to New Document Type > Index Fields > Fixed Form Extraction.

    This is where you can map specific pages of a document type to a RecoStar project file.

  20. Click Add and then set the Page Number field to 1.

    The File Name field should auto-populate because you only have a single .rsp file in that folder. Click Apply to save your changes.

  21. Navigate to Operator > Upload Batch, click the Batch Class drop-down list and select your new batch class from the provided list.
  22. Under Upload Files, click Select Files and select a populated version of your document to be processed.

    The file is displayed in the Upload Batch screen.

  23. Click Start Batch.
  24. Navigate to the Batch Instance Management screen and wait for the batch to stop in Validation.

    When the batch stops in Validation, open it for editing. The highlighting on the preview shows where the value was extracted.

  25. Repeat these steps to add new index fields in RecoStar Design Studio, along with matching index fields in Transact.

    Each time you make changes to the RecoStar project file, you must copy the .rsp file from the RecoStar project area into your batch class's fixed-form folder for the changes to be recognized by Transact.
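
Because the .rsp file has to be copied again after every change, the copy step is a natural candidate for a small helper. The following is a minimal sketch, not part of Transact or RecoStar: the function name `deploy_rsp` and both folder paths in the usage comment are hypothetical, and it assumes the project folder contains exactly one .rsp file.

```python
import shutil
from pathlib import Path


def deploy_rsp(project_dir: Path, doc_type_dir: Path) -> Path:
    """Copy the single .rsp file in project_dir into doc_type_dir.

    Both folders are supplied by the caller; nothing here knows where
    Transact or RecoStar Design Studio are installed.
    """
    rsp_files = sorted(project_dir.glob("*.rsp"))
    if len(rsp_files) != 1:
        raise ValueError(f"expected exactly one .rsp file, found {len(rsp_files)}")
    if not doc_type_dir.is_dir():
        raise FileNotFoundError(f"document type folder not found: {doc_type_dir}")
    # copy2 overwrites an existing file with the same name.
    target = doc_type_dir / rsp_files[0].name
    return Path(shutil.copy2(rsp_files[0], target))


# Hypothetical usage; replace both paths with your own locations:
# deploy_rsp(Path(r"C:\RecoStarProjects\MyForm"),
#            Path(r"C:\path\to\batch-class\fixed-form-extraction\MyDocType"))
```

The helper refuses to guess when several .rsp files are present, so an outdated copy is not deployed by accident.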

Configure the RecoStar Design Studio check box

Perform the following procedure to process check boxes using RecoStar Design Studio and Transact. The following figure is an example of a form with check boxes.

Sample Form for Checkbox Extraction

  1. Create a subform to help locate the desired text on the page for data extraction.

    Right-click the RecoOperators node in the form and select Add > Subform. Keep the default name.

  2. Expand the Subform node, then right-click FieldRegistration and select Insert > Regular Expression Search Field.

    A blue overlay is drawn over the document.

  3. Right-click an area inside the dashed blue overlay and select Draw Geometry.
  4. Click and drag to draw a box around a word near the fields you want to work with. In the following example, a box is drawn around Divorced to extract the Sex and Marital Status check boxes.

    Draw Geometry

  5. Right-click outside of the drawn box and select Draw Zone.
  6. Click and draw a larger box surrounding the general area around the blue box.

    This creates a red dashed box with a green line connecting it to the upper-left corner of the document.

  7. Resize the blue box around the text to reduce the amount of white space as shown in the following figure.

    For better results, match the size of the blue box to the text size in the pattern.

  8. Click the FieldRegistration node and in the Pattern field, enter Divorced.

    When RecoStar processes the document, it will use the top-left corner of the document as its first reference point, then look to the exact location of the red dashed area (relative to the top-left corner) and try to find something with a specific pattern inside the blue box. The red dashed area needs to be larger to allow the system to better account for shrinking or stretching when paper documents are printed, faxed, or scanned.

  9. Click the green arrow to run this project.

    RecoStar locates the value in the document.

  10. Create the check box definitions. Right-click the RecoOperators node inside the Subform node and click Add > Check Box Field.
  11. Change the name of the field to Sex.

    This field name needs to match the name of the index field in Transact.

  12. Resize the red dashed overlay to surround the Female and Male check boxes.
  13. Expand the Sex node, right-click HorizontalDistances and select Add > Checkbox Distance Description.
  14. Click in the preview pane and type M to open the measuring tool.
  15. Click the left edge of the Female check box, then drag to the left edge of the Male check box.

    The distance between the two left edges will be displayed at the top of the screen.

  16. Note the distance in millimeters. Click the CheckBoxDistanceDescription field and enter that value in the Distance field.

    Since we are only looking for one additional check box (after the first one), leave the Count field set to 1.

  17. Run the project again and view the results.

    In the following example, the result OX indicates that the first check box is cleared, and the second check box is selected.

    Test results

    Selected check boxes are represented with a capital X, cleared check boxes are represented with a capital O. This is the value that is passed to Transact once this project is integrated into your batch class.
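
The X/O result string described above can be read as one flag per check box, in order. The following is a minimal sketch of that reading; the function name `decode_checkboxes` is hypothetical and is not part of RecoStar or Transact.

```python
def decode_checkboxes(result: str) -> list[bool]:
    """Turn an X/O result string into one boolean per check box.

    "X" means selected and "O" means cleared, in the order the check
    boxes were read.
    """
    states = {"X": True, "O": False}
    decoded = []
    for ch in result:
        if ch not in states:
            raise ValueError(f"unexpected character {ch!r} in {result!r}")
        decoded.append(states[ch])
    return decoded


print(decode_checkboxes("OX"))  # [False, True]
```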

When RecoStar processes this document, it will first locate the subform based on the reference point in the top-left corner and the FieldRegistration RegularExpressionSearchField operator, then it will look for the first check box.

If that first check box is found, it will then look for the second check box 20.57 mm to the right.

To extract a group of check boxes with the same distance apart, use the Count field to indicate how many additional check boxes should be read after the first one.

If the check boxes are not spaced evenly, use multiple CheckBoxDistanceDescription operators to define the distances between the different check boxes. The following example shows the Sex check box group we just defined, plus the Marital Status check box group.

Larger Checkbox Group Results

Note how four CheckBoxDistanceDescription fields were needed to capture those check boxes because the distances between each check box were inconsistent.
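
The way Distance and Count combine can be checked with a little arithmetic. The sketch below assumes that each distance is measured from the previous check box and that Count is the number of additional check boxes at that spacing; this is an interpretation of the text above, not a statement about RecoStar internals. Only the 20.57 mm value comes from this topic; the other distances and the function name `expected_offsets` are hypothetical.

```python
from itertools import accumulate


def expected_offsets(descriptions):
    """Return offsets (mm) of every check box from the first one.

    descriptions is a list of (distance_mm, count) pairs, one pair per
    CheckBoxDistanceDescription, in the order they are defined.
    """
    # Repeat each distance once per additional check box it describes.
    steps = [distance for distance, count in descriptions for _ in range(count)]
    # The first check box is the origin; the rest accumulate from it.
    return [0.0] + [round(total, 2) for total in accumulate(steps)]


# One evenly spaced pair, as in the Sex example:
print(expected_offsets([(20.57, 1)]))
# [0.0, 20.57]

# A hypothetical unevenly spaced group of five check boxes:
print(expected_offsets([(20.57, 1), (18.0, 2), (25.25, 1)]))
# [0.0, 20.57, 38.57, 56.57, 81.82]
```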

This document has focused only on horizontal check boxes so far, but vertical check box groups can be processed in the same way using the CheckBoxDistanceDescription field under the VerticalDistances node. When measuring distances between vertical check boxes, measure from the top border of the top check box to the top border of the bottom check box.

Deploy the new project file to Transact

  1. Save your changes and copy the *.rsp project file to your batch class and document type folder as described earlier.
  2. Ensure that the RecoStar Extraction Switch option is set to ON inside the RECOSTAR_EXTRACTION plugin inside the Extraction module.
  3. To apply a RecoStar project to this document type for the first time, map the new project to the document type inside the Document Types > [Document Type Name] > Fixed Form Extraction interface.

Configure Transact check box

  1. Create new index fields in Transact with the same names (including capitalization) as the fields you created in your RecoStar Design Studio project.

    Check boxes are often best represented in Transact using the COMBO field type. Use semicolons to separate the different values in the Field Option Values List column.

  2. Create a batch instance with sample documents.

    Initial Checkbox Extraction Results

    The "XO" and "OXOOO" values represent the selected/cleared values coming from RecoStar. Use the Format Conversion feature or write an extraction script to convert X/O combinations into real values.

  3. Ensure the FORMAT_CONVERSION_PLUGIN plugin is added to the Extraction module and turned on.
  4. Navigate into your document type and into the Format Conversion interface.
  5. Select the Replace check box.
  6. Enter "XO" in the Replace text field and Female in the With field.
  7. Click the Validate Regex button.
  8. Click the plus sign button to add a second row. Repeat the previous steps to add "OX" and "Male".
  9. Click the Apply button to save your changes.

    Repeat the previous instructions for any other check box fields that you added in your RecoStar project.

  10. Restart your test batch at the Extraction module to view the new results.

    The RecoStar project also transfers the coordinates of the check box group so that the check boxes are highlighted when the user moves through the fields in the Validation screen.
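
The replace rules entered above behave like an ordered lookup from X/O patterns to display values. The following sketch illustrates the idea only and is not the Format Conversion plugin or a Transact extraction script. The names `SEX_RULES` and `convert` are hypothetical, and matching the whole value rather than a substring is an assumption made for this sketch.

```python
import re

# Ordered (pattern, replacement) pairs, mirroring the rows entered in
# the Format Conversion interface for the Sex field.
SEX_RULES = [("XO", "Female"), ("OX", "Male")]


def convert(value: str, rules) -> str:
    """Return the replacement for the first rule that matches the whole value."""
    for pattern, replacement in rules:
        if re.fullmatch(pattern, value):
            return replacement
    # No rule matched: pass the raw X/O string through unchanged.
    return value


print(convert("XO", SEX_RULES))  # Female
print(convert("OX", SEX_RULES))  # Male
print(convert("OO", SEX_RULES))  # OO
```

Whole-value matching is used here because a short pattern can occur inside a longer result: "XO" appears inside "OXOOO", for example. Keeping the rules separate for each field avoids that kind of accidental match.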

Signature detection

Signature detection is handled in much the same way as check box extraction. A pixel count field is used in RecoStar Design Studio to measure the percentage of non-white pixels in a given area. A configurable threshold is used to determine what percentage is needed to result in a positive response.
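
The idea of a pixel count with a threshold can be sketched in a few lines. This is an illustration under stated assumptions, not RecoStar's implementation: the zone is modeled as rows of grayscale values where 255 is white, the function names are hypothetical, and treating a ratio equal to the threshold as positive is a choice made for this sketch.

```python
def non_white_percentage(zone, white=255):
    """Percentage of pixels in the zone that are not white.

    zone is a list of rows, each row a list of grayscale values.
    """
    total = sum(len(row) for row in zone)
    if total == 0:
        raise ValueError("zone contains no pixels")
    dark = sum(1 for row in zone for px in row if px != white)
    return 100.0 * dark / total


def signature_present(zone, min_ratio_percent=2.0):
    """True when the non-white percentage reaches the threshold."""
    return non_white_percentage(zone) >= min_ratio_percent


# A 10 x 10 zone with three dark pixels: 3% non-white.
zone = [[255] * 10 for _ in range(10)]
zone[4][2] = zone[4][3] = zone[5][3] = 0

print(non_white_percentage(zone))  # 3.0
print(signature_present(zone))     # True
```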

Signature detection in RecoStar Design Studio

The sample patient registration form used in the previous steps of this tutorial has a signature field at the bottom. The first step to determine if a signature is present is to create a new subform to help locate the signature area.

  1. Follow the steps listed above to create a new subform with a Regular Expression Search Field operator for the Field Registration field.
  2. Click Set Reference Corner to select the location of the subform.

    In the figure below, the subform is located at the bottom of the document, so the selected reference corner is Lower Left.

  3. Note how the thin green line now connects the red and blue overlay to the bottom left corner of the document.

    Reference Corner Set to Lower Left

  4. Run the project and ensure that your new search field is found.
  5. Right-click the RecoOperators node in your new subform node and select Add > Pixel Count Field.
  6. Resize the red dashed box to surround the signature area.
  7. Rename the Pixel Count field with the same name that you want to use for the index field in Transact.

    Change the PixelMinRatio to a small value, such as 2. For example, if 2% of the pixels are black, a signature is present.

  8. Run the project and see what percentage is given for the new field.
  9. Run tests with a variety of sample documents.

    Adjust the PixelMinRatio property accordingly so that a positive value is returned when a signature is present.

    Larger search areas actually result in lower percentages. It is common to use a value of 2% or 3% to indicate that a signature is present. Fuzzy or dirty scans may introduce noise into signature areas that result in false positives.

  10. Depending on your use case, you may want to adjust the following additional settings for best extraction results:
    • SyntaxMode: Specifies the expected content in the extraction zone, such as Alphanumeric, Numeric, or Amounts.

    • Font: Specifies whether the content in the extraction zone is Machine Print or Handprint.

    • HandprintHeight: The expected height of the handwritten characters.

    • HandprintPitch: The angle (slant) that a person would write at.

    • LogicalContext: A post-processing feature on the recognition results. It helps improve the extracted values by looking at the preceding values and making decisions based on context. For example, SMITH instead of SM1TH.

    • Patterns: The ability to use regular expressions to help extract the correct values.

  11. Save your changes and copy the updated *.rsp file into your batch class.
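
The remark in step 9 that larger search areas give lower percentages follows from the arithmetic: the same amount of ink is divided by a larger pixel count. The numbers below are hypothetical and only illustrate the effect.

```python
def ink_percentage(ink_pixels: int, width: int, height: int) -> float:
    """Percentage of a width x height zone covered by ink pixels."""
    return 100.0 * ink_pixels / (width * height)


# The same hypothetical signature (1200 ink pixels) in two zone sizes.
tight = ink_percentage(1200, 400, 100)
loose = ink_percentage(1200, 800, 200)

print(tight)  # 3.0
print(loose)  # 0.75
```

With a threshold of 2, the tight zone would report a signature and the loose zone would not, which is why the zone size and the PixelMinRatio value are best tuned together.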

Configure the Signature Detection field in Transact

  1. Create a new index field with the same name as the pixel count operator in RecoStar Design Studio.

    Then, create a sample batch instance.

  2. When you run a test batch instance with the new pixel count field, the signature field displays as True or False in Transact.

    The highlighted area shows where the signature was detected in the document.

  3. To edit how the fields display, change the signature field to a drop-down list with the COMBO index field control.

    Refer to the Configure Transact check box section to use the Format Conversion feature.
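
Because field names must match between the RecoStar project and the Transact index fields, including capitalization, a quick comparison of the two name lists can catch mismatches before a test batch is run. The sketch below assumes the names are typed in by hand; it does not read the .rsp file or the batch class. The function name `name_mismatches` and the field name "Signature" are hypothetical.

```python
def name_mismatches(recostar_fields, transact_fields):
    """Compare two lists of field names, case-sensitively."""
    recostar = set(recostar_fields)
    transact = set(transact_fields)
    return {
        "only_in_recostar": sorted(recostar - transact),
        "only_in_transact": sorted(transact - recostar),
    }


print(name_mismatches(["SSN", "Sex", "Signature"], ["SSN", "sex", "Signature"]))
# {'only_in_recostar': ['Sex'], 'only_in_transact': ['sex']}
```

A name that appears only on the Transact side is not necessarily an error, because index fields can also be filled by other extraction methods.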