File Boundary Classification

This feature defines how the document assembler plugin adjusts its separation results at input file boundaries. The DA File Name Boundary Classification property selects the file boundary classification strategy, which operates on the separation results generated by the DA algorithm.

The three File Boundary Classification strategies are:

  • UseDAGeneratedDocument: The classification results generated by the DA algorithm are used without any changes.
  • MergeDocumentsBelongingToSameFile: All pages belonging to a single multi-page input file must be part of the same document. If a document generated by the DA algorithm contains pages belonging to two different input files, a new document is generated at the file boundary. If pages belonging to a single input file are classified into two or more documents, those documents are merged so that all pages from the source file are part of one document only.

    For example:

    Original file name   Broken file name   Page ID   Doc ID as generated by DA   Result
    ------------------   ----------------   -------   -------------------------   ------
    File1.tiff           File1-0001.tiff    PG0       DOC1                        DOC1
                         File1-0002.tiff    PG1       DOC1                        DOC1
                         File1-0003.tiff    PG2       DOC2                        DOC1
                         File1-0004.tiff    PG3       DOC2                        DOC1
    File2.tiff           File2.tiff         PG4       DOC2                        DOC2
    File3.tiff           File3-0001.tiff    PG5       DOC3                        DOC3
                         File3-0002.tiff    PG6       DOC3                        DOC3

    Original file name   Broken file name   Page ID   Doc ID as generated by DA   Result
    ------------------   ----------------   -------   -------------------------   ------
    File1.tiff           File1-0001.tiff    PG0       DOC1                        DOC1
                         File1-0002.tiff    PG1       DOC2                        DOC1
                         File1-0003.tiff    PG2       DOC3                        DOC1
                         File1-0004.tiff    PG3       DOC4                        DOC1
    File2.tiff           File2.tiff         PG4       DOC5                        DOC2
    File3.tiff           File3-0001.tiff    PG5       DOC6                        DOC3
                         File3-0002.tiff    PG6       DOC7                        DOC3

  • CreateNewDocumentForDifferentFile: If a document consists of pages belonging to two different input files, a new document is created at the file boundary. Pages belonging to a single source file may still be separated into multiple documents, but no document may span more than one input file.

    For example:

    Original file name   Broken file name   Page ID   Doc ID as generated by DA   Result
    ------------------   ----------------   -------   -------------------------   ------
    File1.tiff           File1-0001.tiff    PG0       DOC1                        DOC1
                         File1-0002.tiff    PG1       DOC1                        DOC1
                         File1-0003.tiff    PG2       DOC2                        DOC2
                         File1-0004.tiff    PG3       DOC2                        DOC2
    File2.tiff           File2.tiff         PG4       DOC2                        DOC3
    File3.tiff           File3-0001.tiff    PG5       DOC3                        DOC4
                         File3-0002.tiff    PG6       DOC3                        DOC4

    Original file name   Broken file name   Page ID   Doc ID as generated by DA   Result
    ------------------   ----------------   -------   -------------------------   ------
    File1.tiff           File1-0001.tiff    PG0       DOC1                        DOC1
                         File1-0002.tiff    PG1       DOC2                        DOC2
                         File1-0003.tiff    PG2       DOC3                        DOC3
                         File1-0004.tiff    PG3       DOC4                        DOC4
    File2.tiff           File2.tiff         PG4       DOC5                        DOC5
    File3.tiff           File3-0001.tiff    PG5       DOC6                        DOC6
                         File3-0002.tiff    PG6       DOC7                        DOC7
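
As a rough illustration of the three strategies, the following Python sketch derives a Result column from the Original file name and Doc ID as generated by DA values, using the first example of each strategy as input. It is not the plugin's implementation. The function name classify_file_boundaries, the (file name, DA doc ID) tuple input, and the sequential renumbering of result documents are assumptions made for this example. It also assumes that pages arrive in order and that the pages of each input file are contiguous.

```python
from typing import List, Tuple

# Hypothetical page representation: (original file name, Doc ID as generated by DA).
Page = Tuple[str, str]


def classify_file_boundaries(pages: List[Page], strategy: str) -> List[str]:
    """Return one result Doc ID per page for the given strategy (illustrative only)."""
    if strategy == "UseDAGeneratedDocument":
        # DA results are used without any changes.
        return [da_doc for _, da_doc in pages]

    if strategy not in ("MergeDocumentsBelongingToSameFile",
                        "CreateNewDocumentForDifferentFile"):
        raise ValueError(f"Unknown strategy: {strategy}")

    results: List[str] = []
    doc_number = 0
    prev_file = None
    prev_da_doc = None
    for source_file, da_doc in pages:
        file_changed = source_file != prev_file
        da_doc_changed = da_doc != prev_da_doc
        if strategy == "MergeDocumentsBelongingToSameFile":
            # One document per input file: only a file boundary starts a document.
            start_new = file_changed
        else:
            # Keep the DA splits, and additionally split at every file boundary.
            start_new = file_changed or da_doc_changed
        if start_new:
            doc_number += 1
        # Assumption: result documents are renumbered sequentially as DOC1, DOC2, ...
        results.append(f"DOC{doc_number}")
        prev_file, prev_da_doc = source_file, da_doc
    return results


# Input taken from the first example of each strategy (PG0 to PG6).
pages = [
    ("File1.tiff", "DOC1"), ("File1.tiff", "DOC1"),
    ("File1.tiff", "DOC2"), ("File1.tiff", "DOC2"),
    ("File2.tiff", "DOC2"),
    ("File3.tiff", "DOC3"), ("File3.tiff", "DOC3"),
]
print(classify_file_boundaries(pages, "MergeDocumentsBelongingToSameFile"))
# ['DOC1', 'DOC1', 'DOC1', 'DOC1', 'DOC2', 'DOC3', 'DOC3']
print(classify_file_boundaries(pages, "CreateNewDocumentForDifferentFile"))
# ['DOC1', 'DOC1', 'DOC2', 'DOC2', 'DOC3', 'DOC4', 'DOC4']
```

In this sketch the two non-trivial strategies differ only in what starts a new document: a change of input file alone, or a change of either the input file or the DA-assigned document.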