Configure chunking settings for documents
You can configure the chunking settings for TotalAgility (Capture data) and non-TotalAgility documents. For TotalAgility documents, you can set chunking for specific document types.
- Navigate to .
  The Chunking settings dialog box is displayed.
- Configure the settings for the following:
  - TotalAgility documents
  - Non-TotalAgility documents
  - Specific document type
- Click Save.
- When you execute the Add to knowledge base activity, the chunking settings configured for TotalAgility documents and non-TotalAgility documents are applied.
  - When you add a TotalAgility document to the AI knowledge base, the chunking settings specific to that document type are applied. If no specific settings are configured for the document type, the default settings configured for TotalAgility documents are applied.
  - When you add a non-TotalAgility document to the AI knowledge base, the default chunking settings configured for non-TotalAgility documents are applied.
See also Add to knowledge base activity.
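The order in which settings are applied can be sketched in a few lines of Python. This is only an illustration of the fallback rule described above, not the product's implementation: the names `ChunkSettings`, `resolve_chunk_settings`, and `DOCUMENT_TYPE_OVERRIDES` are hypothetical, and the TotalAgility default and the "Invoice" override are placeholder values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkSettings:
    chunk_type: str
    chunk_size: Optional[int]  # only meaningful for the "Fixed size" chunk type
    overlap_percent: int


# Placeholder default for TotalAgility documents (assumed for illustration).
TOTALAGILITY_DEFAULT = ChunkSettings("Section", None, 20)
# Defaults for non-TotalAgility documents, as stated in this topic.
NON_TOTALAGILITY_DEFAULT = ChunkSettings("Fixed size", 2000, 10)

# Hypothetical per-document-type overrides.
DOCUMENT_TYPE_OVERRIDES = {
    "Invoice": ChunkSettings("Fixed size", 1000, 10),
}


def resolve_chunk_settings(is_totalagility: bool,
                           document_type: Optional[str] = None) -> ChunkSettings:
    """Pick the settings for a document using the fallback order described above."""
    if not is_totalagility:
        # Non-TotalAgility documents always use their own defaults.
        return NON_TOTALAGILITY_DEFAULT
    # Use document type settings if configured, else the TotalAgility defaults.
    return DOCUMENT_TYPE_OVERRIDES.get(document_type, TOTALAGILITY_DEFAULT)
```

In this sketch, an "Invoice" resolves to its own override, while a document type without an override falls back to the TotalAgility defaults.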
Chunking settings for TotalAgility documents
The following are the settings for the TotalAgility documents.
| Setting | Description |
|---|---|
| Chunk type | The method used to divide a document into smaller sections, known as chunks, when it is added to the AI knowledge base. Different chunking methods allow the system to handle content effectively based on its structure and requirements. Available chunking types are: |
| Chunk size | Indicates the size, in characters or words, of the chunks into which a document is divided. This setting determines how much of the document is processed at one time when it is added to a knowledge base. Breaking large documents into smaller, manageable segments ("chunks") improves processing and information retrieval. (Default: 2000 characters; minimum: 200.) The Chunk size setting is only available for the "Fixed size" chunk type for TotalAgility documents. |
| Overlap chunk | Specifies the amount of content that is repeated between chunks to maintain context. Overlapping helps preserve context, particularly when the end of one chunk does not capture the beginning of important information in the next chunk. This is useful for tasks like text segmentation, where splitting text into chunks without overlap might lose important context at the boundaries. (Default for Section and Page: 20% of the number of characters in the chunk; for Fixed size: 10% of the number of characters in the chunk. An overlap value of 0 means that no content is repeated between chunks.) |
| Microsoft Word | |
| Chunk type | The chunk type for the Microsoft Word document. Available options are: Section (default) and Fixed size. |
| Chunk size | The Chunk size setting is only available for the "Fixed size" chunk type for non-TotalAgility Microsoft Word documents. (Default: 2000 characters; minimum: 200.) |
| Overlap chunk | The overlap chunk percentage. (Default for Section: 20% of the number of characters in the chunk; for Fixed size: 10% of the number of characters in the chunk.) |
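To make the relationship between chunk size and overlap concrete, here is a minimal Python sketch of fixed-size chunking with a percentage overlap. It assumes sizes are counted in characters and that the overlap is rounded down to a whole number of characters. The function name `chunk_fixed_size` is hypothetical, and the way the product actually splits text may differ.

```python
from typing import List


def chunk_fixed_size(text: str, chunk_size: int = 2000,
                     overlap_percent: int = 10) -> List[str]:
    """Split text into fixed-size chunks that share some trailing content."""
    if chunk_size < 200:
        raise ValueError("chunk size must be at least 200 characters")
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap must be between 0 and 99 percent")

    # Number of characters repeated at the start of the next chunk
    # (rounding down is an assumption of this sketch).
    overlap = chunk_size * overlap_percent // 100
    step = chunk_size - overlap

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk reached the end of the text
        start += step
    return chunks
```

Under these assumptions, a 5000-character text with the default values produces three chunks, and each chunk begins with the last 200 characters of the previous one. With an overlap of 0, the chunks do not share any content.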
Chunking settings for non-TotalAgility documents
Non-TotalAgility documents support only the "Fixed size" chunk type.
- The default chunk size is 2000 characters, and the minimum is 200 characters.
- The default overlap chunk is 10% of the number of characters in the chunk.
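With the defaults, 10% of 2000 characters means 200 characters are repeated between neighboring chunks. The constraints above can also be expressed as a small validation sketch. The function name `validate_non_totalagility_settings` and the constant names are hypothetical, and the check that the overlap is not negative is an assumption rather than a documented rule.

```python
from typing import List

MIN_CHUNK_SIZE = 200          # minimum stated in this topic
DEFAULT_CHUNK_SIZE = 2000     # default stated in this topic
DEFAULT_OVERLAP_PERCENT = 10  # default stated in this topic


def validate_non_totalagility_settings(
        chunk_type: str = "Fixed size",
        chunk_size: int = DEFAULT_CHUNK_SIZE,
        overlap_percent: int = DEFAULT_OVERLAP_PERCENT) -> List[str]:
    """Return a list of problems; an empty list means the settings are acceptable."""
    problems = []
    if chunk_type != "Fixed size":
        problems.append('only the "Fixed size" chunk type is supported')
    if chunk_size < MIN_CHUNK_SIZE:
        problems.append(f"chunk size must be at least {MIN_CHUNK_SIZE} characters")
    if overlap_percent < 0:
        # Assumed rule: a negative overlap has no meaning.
        problems.append("overlap cannot be negative")
    return problems
```

Calling the function with no arguments checks the default values and returns an empty list.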
Chunking settings for a document type
You can configure the chunk settings for each type of Capture document. For example, you may want an invoice to be chunked differently than a property valuation.
- Click .
  The Add document type chunk settings dialog box is displayed.
- On the Document type list, select a Capture document type to override the default options for Capture data and Microsoft Word documents.
  The document types defined in the extraction group appear on the Document type list.
- Configure the settings as needed. See Chunking settings for TotalAgility documents.
- Click Save.
  The document type is listed in the table. You can modify or delete the configured document type chunking settings.