File Boundary Classification
This feature lets you control how the document assembler (DA) plugin adjusts its separation results based on input file boundaries. The DA File Name Boundary Classification property selects the file boundary classification strategy, which is applied to the separation results generated by the DA algorithm.
There are three File Boundary Classification strategies:
- UseDAGeneratedDocument: The classification results generated by the DA algorithm are used without any changes.
- MergeDocumentsBelongingToSameFile: All pages belonging to a single multi-page input file must be part of the same document. If a document generated by the DA algorithm contains pages from two different input files, a new document is started at the file boundary. If pages from a single input file are classified into two or more documents, those documents are merged so that all pages from the source file end up in one document.
For example:
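| Original file name | Broken file name | Page ID | Doc ID as generated by DA | Result |
|---|---|---|---|---|
| File1.tiff | File1-0001.tiff | PG0 | DOC1 | DOC1 |
| File1.tiff | File1-0002.tiff | PG1 | DOC1 | DOC1 |
| File1.tiff | File1-0003.tiff | PG2 | DOC2 | DOC1 |
| File1.tiff | File1-0004.tiff | PG3 | DOC2 | DOC1 |
| File2.tiff | File2.tiff | PG4 | DOC2 | DOC2 |
| File3.tiff | File3-0001.tiff | PG5 | DOC3 | DOC3 |
| File3.tiff | File3-0002.tiff | PG6 | DOC3 | DOC3 |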
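
| Original file name | Broken file name | Page ID | Doc ID as generated by DA | Result |
|---|---|---|---|---|
| File1.tiff | File1-0001.tiff | PG0 | DOC1 | DOC1 |
| File1.tiff | File1-0002.tiff | PG1 | DOC2 | DOC1 |
| File1.tiff | File1-0003.tiff | PG2 | DOC3 | DOC1 |
| File1.tiff | File1-0004.tiff | PG3 | DOC4 | DOC1 |
| File2.tiff | File2.tiff | PG4 | DOC5 | DOC2 |
| File3.tiff | File3-0001.tiff | PG5 | DOC6 | DOC3 |
| File3.tiff | File3-0002.tiff | PG6 | DOC7 | DOC3 |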
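
The MergeDocumentsBelongingToSameFile examples can be reproduced with a short Python sketch. It is not the plugin's implementation; it assumes pages arrive in page order, that each page carries its original file name and DA doc ID, and that the strategy yields one document per input file, which is what the description and both examples suggest. The function and variable names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical page record: (original_file_name, doc_id_from_da), in page order.
Page = Tuple[str, str]


def merge_documents_belonging_to_same_file(pages: List[Page]) -> List[str]:
    """Return the resulting doc ID for each page (assumed: one document per input file)."""
    result = []
    doc_number = 0
    previous_file = None
    for original_file, _da_doc_id in pages:
        # A change of input file is the only place a new document may start;
        # DA boundaries inside a file are ignored, so its documents are merged.
        if original_file != previous_file:
            doc_number += 1
            previous_file = original_file
        result.append(f"DOC{doc_number}")
    return result


# Input taken from the first example table for this strategy.
pages = [
    ("File1.tiff", "DOC1"), ("File1.tiff", "DOC1"),
    ("File1.tiff", "DOC2"), ("File1.tiff", "DOC2"),
    ("File2.tiff", "DOC2"),
    ("File3.tiff", "DOC3"), ("File3.tiff", "DOC3"),
]
print(merge_documents_belonging_to_same_file(pages))
# ['DOC1', 'DOC1', 'DOC1', 'DOC1', 'DOC2', 'DOC3', 'DOC3']
```

The output matches the Result column: the DA document that spanned File1.tiff and File2.tiff is split at the file boundary, and the two File1.tiff documents are merged.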
- CreateNewDocumentForDifferentFile: If a document contains pages belonging to two different input files, a new document is created at the file boundary. Pages belonging to one source file may still be separated into multiple documents, but no document may span more than one input file.
For example:
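| Original file name | Broken file name | Page ID | Doc ID as generated by DA | Result |
|---|---|---|---|---|
| File1.tiff | File1-0001.tiff | PG0 | DOC1 | DOC1 |
| File1.tiff | File1-0002.tiff | PG1 | DOC1 | DOC1 |
| File1.tiff | File1-0003.tiff | PG2 | DOC2 | DOC2 |
| File1.tiff | File1-0004.tiff | PG3 | DOC2 | DOC2 |
| File2.tiff | File2.tiff | PG4 | DOC2 | DOC3 |
| File3.tiff | File3-0001.tiff | PG5 | DOC3 | DOC4 |
| File3.tiff | File3-0002.tiff | PG6 | DOC3 | DOC4 |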
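
| Original file name | Broken file name | Page ID | Doc ID as generated by DA | Result |
|---|---|---|---|---|
| File1.tiff | File1-0001.tiff | PG0 | DOC1 | DOC1 |
| File1.tiff | File1-0002.tiff | PG1 | DOC2 | DOC2 |
| File1.tiff | File1-0003.tiff | PG2 | DOC3 | DOC3 |
| File1.tiff | File1-0004.tiff | PG3 | DOC4 | DOC4 |
| File2.tiff | File2.tiff | PG4 | DOC5 | DOC5 |
| File3.tiff | File3-0001.tiff | PG5 | DOC6 | DOC6 |
| File3.tiff | File3-0002.tiff | PG6 | DOC7 | DOC7 |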
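
The CreateNewDocumentForDifferentFile examples can likewise be reproduced with a short Python sketch. It is an illustration, not the plugin's implementation: it assumes pages arrive in page order, that pages of one DA document are contiguous, and that result documents are renumbered sequentially as in the tables. The function and variable names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical page record: (original_file_name, doc_id_from_da), in page order.
Page = Tuple[str, str]


def create_new_document_for_different_file(pages: List[Page]) -> List[str]:
    """Return the resulting doc ID for each page, keeping DA boundaries and adding file boundaries."""
    result = []
    doc_number = 0
    previous_page = None
    for page in pages:
        # Start a new document when either the input file or the DA doc ID
        # differs from the previous page, so no document spans two files.
        if page != previous_page:
            doc_number += 1
            previous_page = page
        result.append(f"DOC{doc_number}")
    return result


# Input taken from the first example table for this strategy.
pages = [
    ("File1.tiff", "DOC1"), ("File1.tiff", "DOC1"),
    ("File1.tiff", "DOC2"), ("File1.tiff", "DOC2"),
    ("File2.tiff", "DOC2"),
    ("File3.tiff", "DOC3"), ("File3.tiff", "DOC3"),
]
print(create_new_document_for_different_file(pages))
# ['DOC1', 'DOC1', 'DOC2', 'DOC2', 'DOC3', 'DOC4', 'DOC4']
```

The DA boundary inside File1.tiff is preserved, while the DA document that ran from File1.tiff into File2.tiff is split at the file boundary, which shifts the later doc IDs up by one.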