Document Classifier API.
typedef struct RECDCSTRUCT * | DCHANDLE | Handle of a Document Classifier object. |
RECERR RECAPIKRN | kRecOpenDCProject (int sid, LPCTSTR pDCProjectFile, DCHANDLE *phDCProject) | Opening Document Classifier Project File. |
RECERR RECAPIKRN | kRecCloseDCProject (DCHANDLE hDCProject) | Closing a Document Classifier Project. |
RECERR RECAPIKRN | kRecGetFirstDCClass (DCHANDLE hDCProject, DCHANDLE *phDCClass) | Starting enumeration of Document Classes. |
RECERR RECAPIKRN | kRecGetNextDCClass (DCHANDLE hDCPrevClass, DCHANDLE *phDCClass) | Performing enumeration of Document Classes. |
RECERR RECAPIKRN | kRecClassifyPage (int sid, DCHANDLE hDCProject, HPAGE hPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident) | Classifying a page. |
RECERR RECAPIKRN | kRecClassifyText (int sid, DCHANDLE hDCProject, LPCTSTR pText, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident) | Classifying text. |
RECERR RECAPIKRN | kRecClassifyDocument (int sid, DCHANDLE hDCProject, LPCTSTR pFileName, int iPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident) | Classifying the given page of a document. |
RECERR RECAPIKRN | kRecGetDCClassName (DCHANDLE hDCClass, LPTSTR *ppName) | Returning the name of a Document Class. |
RECERR RECAPIKRN | kRecSetDCConfidenceThreshold (DCHANDLE hDCProject, int ConfidenceThreshold) | Set the confidence threshold of a Document Classifier Project. |
RECERR RECAPIKRN | kRecGetDCConfidenceThreshold (DCHANDLE hDCProject, int *pConfidenceThreshold) | Get the confidence threshold of a Document Classifier Project. |
For a detailed description of this module, see its separate documentation: https://docshield.tungstenautomation.com/OmniPageCaptureSDK/en_US/2025.1.0-m7NwYtqyAo/help/OmniPageCapture_SDKdocumentclassificationassistant/c_Welcome.html
◆ kRecClassifyDocument()
RECERR RECAPIKRN kRecClassifyDocument (int sid, DCHANDLE hDCProject, LPCTSTR pFileName, int iPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
Classifying the given page of a document.
This function classifies a document or the given page of the document. The document can contain scanned pages, one page from a PDF file or plain text.
- Parameters
-
[in] | sid | Settings Collection ID. |
[in] | hDCProject | Handle of the Document Classifier Project returned by kRecOpenDCProject. |
[in] | pFileName | Name of the file containing the document. It can be an image file, a PDF, or a text file. |
[in] | iPage | The page number of the page to be processed. This parameter is not used if the input file is a text file. |
[out] | phDCPredictedClass | Address of a variable to store the handle of the predicted Document Class. The returned handle can be NULL. |
[out] | pConfidenceLevel | Address of a variable to store the confidence of the prediction. The returned value is between 0 and 100. |
[out] | pClassifyInfo | Address of a variable to store the array of CLASSIFY_INFO structures that holds the classification information for each class. |
[out] | pLength | Address of a variable to store the length of the array returned in pClassifyInfo. This is equal to the number of classes. |
[out] | pIsConfident | Address of a variable to return if the classification is confident. The returned value is TRUE if the confidence of the prediction is greater than or equal to the preset confidence threshold. |
- Return values
-
- Note
- Use kRecOpenDCProject to open a Document Classifier Project File and obtain a handle.
-
This function decides if the input is an image file, PDF or text file, based on the filename extension. DC_UNKNOWNEXTENSION_ERR is returned if the extension is unknown.
-
If the input is an image file or a PDF, the function loads and preprocesses it. If text-based classification is enabled, the image is recognized as well.
-
If the input is a text file (that is, the filename extension is .txt), only text-based classification is possible. The following text encodings are supported: Unicode (both UTF-16 and UTF-8, with or without a Byte Order Mark) and non-Unicode text encoded with the Windows default code page (as set in Control Panel > Region and Language > Administrative pane > Change system locale).
-
The function returns the handle of the predicted class, and the confidence of the prediction. You can query the name of the class with kRecGetDCClassName. The function returns an array of CLASSIFY_INFO structures (pClassifyInfo). The length of the array is equal to the number of defined classes, and returned in pLength. The array contains the confidence levels for each class. The confidence threshold can be defined with Document Classifier Assistant. It is stored in the Document Classifier Project File, and can be queried (kRecGetDCConfidenceThreshold) and changed (kRecSetDCConfidenceThreshold) after the project is loaded.
-
The array returned in pClassifyInfo should be released using kRecFree.
-
The specification of this function in C# is:
RECERR RECAPIKRN kRecClassifyDocument(int sid, DCHANDLE hDCProject, LPCTSTR pFileName, int iPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
-
The specification of this function in Java is:
int kRecClassifyDocument(
int sid,
DCHANDLE hDCProject, String pFileName,
int iPage,
DCHANDLE phDCPredictedClass,
long[] pConfidenceLevel, ClassifyInfoArray pClassifyInfo,
int[] pIsConfident)
-
The specification of this function in Python is:
def kRecClassifyDocument(sid: int, hDCProject: "DCHANDLE", pFileName: str, iPage: int) -> Tuple[int, "DCHANDLE", int, "ClassifyInfoArray", bool]
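The extension-based dispatch described in the notes above can be sketched roughly as follows. This is only an illustration, not the SDK's implementation: the helper name detect_input_kind, the list of image extensions, the .pdf mapping, and the case-insensitive comparison are all assumptions. Only the .txt rule and the existence of an unknown-extension error come from this page. The sketch raises ValueError where the real function returns DC_UNKNOWNEXTENSION_ERR.

```python
import os

# Hypothetical set of image extensions; the real list accepted by the SDK
# is not given in this reference.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".bmp"}


def detect_input_kind(file_name: str) -> str:
    """Return "text", "pdf" or "image", looking only at the filename extension."""
    # Assumption: the comparison ignores letter case.
    ext = os.path.splitext(file_name)[1].lower()
    if ext == ".txt":
        return "text"   # only text-based classification is possible
    if ext == ".pdf":
        return "pdf"    # iPage selects the page to classify
    if ext in IMAGE_EXTENSIONS:
        return "image"  # loaded and preprocessed before classification
    # The SDK reports DC_UNKNOWNEXTENSION_ERR here; the sketch raises instead.
    raise ValueError(f"unknown extension: {ext!r}")
```

A caller could use such a check up front to reject unsupported files before invoking kRecClassifyDocument.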
◆ kRecClassifyPage()
RECERR RECAPIKRN kRecClassifyPage (int sid, DCHANDLE hDCProject, HPAGE hPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
Classifying a page.
This function classifies the given HPAGE.
- Parameters
-
[in] | sid | Settings Collection ID. |
[in] | hDCProject | Handle of the Document Classifier Project returned by kRecOpenDCProject. |
[in] | hPage | Handle of the page to be classified. |
[out] | phDCPredictedClass | Address of a variable to store the handle of the predicted Document Class. The returned handle can be NULL. |
[out] | pConfidenceLevel | Address of a variable to store the confidence of the prediction. The returned value is between 0 and 100. |
[out] | pClassifyInfo | Address of a variable to store the array of CLASSIFY_INFO structures that holds the classification information for each class. |
[out] | pLength | Address of a variable to store the length of the array returned in pClassifyInfo. This is equal to the number of classes. |
[out] | pIsConfident | Address of a variable to return if the classification is confident. The returned value is TRUE if the confidence of the prediction is greater than or equal to the preset confidence threshold. |
- Return values
-
- Note
- If the classifier method (defined in the Document Classifier Project) is Text or Combined, the function recognizes the image unless hPage already contains letters on entry. The language of the recognition is defined in the Document Classifier Project. On return, hPage contains the result of recognition (OCR zones, letters).
-
The function returns the handle of the predicted class, and the confidence of the prediction. The function returns an array of CLASSIFY_INFO structures (pClassifyInfo). The length of the array is equal to the number of defined classes, and returned in pLength. The array contains the confidence levels for each class. The confidence threshold can be defined with Document Classifier Assistant. It is stored in the Document Classifier Project File, and can be queried (kRecGetDCConfidenceThreshold) and changed (kRecSetDCConfidenceThreshold) after the project is loaded.
-
The array returned in pClassifyInfo should be released using kRecFree.
-
The specification of this function in C# is:
RECERR RECAPIKRN kRecClassifyPage(int sid, DCHANDLE hDCProject, HPAGE hPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
-
The specification of this function in Java is:
-
The specification of this function in Python is:
def kRecClassifyPage(sid: int, hDCProject: "DCHANDLE", hPage: "HPAGE") -> Tuple[int, "DCHANDLE", int, "ClassifyInfoArray", bool]
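The relationship between the per-class confidence array, the confidence level, and pIsConfident can be illustrated with a small Python sketch. The field layout of CLASSIFY_INFO is not given on this page, so the (class name, confidence) tuples and the helper name summarize are hypothetical, and treating the predicted class as the entry with the highest confidence is an assumption. Only the rule "confident if confidence is greater than or equal to the threshold" is taken from the parameter description above.

```python
from typing import List, Optional, Tuple


def summarize(per_class: List[Tuple[str, int]],
              threshold: int) -> Tuple[Optional[str], int, bool]:
    """Return (predicted class name, confidence, is_confident)."""
    if not per_class:
        # No classes defined: nothing to predict (mirrors a NULL class handle).
        return None, 0, False
    # Assumption: the prediction is the class with the highest confidence.
    name, confidence = max(per_class, key=lambda item: item[1])
    # Documented rule: confident when confidence >= preset threshold.
    return name, confidence, confidence >= threshold
```

With a threshold of 60, an array such as [("invoice", 82), ("letter", 11)] would be reported as a confident prediction of "invoice" under these assumptions.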
◆ kRecClassifyText()
RECERR RECAPIKRN kRecClassifyText (int sid, DCHANDLE hDCProject, LPCTSTR pText, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
Classifying text.
This function classifies the given text.
- Parameters
-
[in] | sid | Settings Collection ID. |
[in] | hDCProject | Handle of the Document Classifier Project returned by kRecOpenDCProject. |
[in] | pText | NULL terminated text to be classified. |
[out] | phDCPredictedClass | Address of a variable to store the handle of the predicted Document Class. The returned handle can be NULL. |
[out] | pConfidenceLevel | Address of a variable to store the confidence of the prediction. The returned value is between 0 and 100. |
[out] | pClassifyInfo | Address of a variable to store the array of CLASSIFY_INFO structures that holds the classification information for each class. |
[out] | pLength | Address of a variable to store the length of the array returned in pClassifyInfo. This is equal to the number of classes. |
[out] | pIsConfident | Address of a variable to return if the classification is confident. The returned value is TRUE if the confidence of the prediction is greater than or equal to the preset confidence threshold. |
- Return values
-
- Note
- Use kRecOpenDCProject to open a Document Classifier Project File and obtain a handle.
-
The function returns the handle of the predicted class, and the confidence of the prediction. The function returns an array of CLASSIFY_INFO structures (pClassifyInfo). The length of the array is equal to the number of defined classes, and returned in pLength. The array contains the confidence levels for each class. The confidence threshold can be defined with Document Classifier Assistant. It is stored in the Document Classifier Project File, and can be queried (kRecGetDCConfidenceThreshold) and changed (kRecSetDCConfidenceThreshold) after the project is loaded.
-
The array returned in pClassifyInfo should be released using kRecFree.
-
The specification of this function in C# is:
RECERR RECAPIKRN kRecClassifyText(int sid, DCHANDLE hDCProject, LPCTSTR pText, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
-
The specification of this function in Java is:
int kRecClassifyText(
int sid,
DCHANDLE hDCProject, String pText,
DCHANDLE phDCPredictedClass,
long[] pConfidenceLevel, ClassifyInfoArray pClassifyInfo,
int[] pIsConfident)
-
The specification of this function in Python is:
def kRecClassifyText(sid: int, hDCProject: "DCHANDLE", pText: str) -> Tuple[int, "DCHANDLE", int, "ClassifyInfoArray", bool]
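The Python signature above returns a five-element tuple. A thin wrapper that unpacks it might look like the sketch below. The classify function is passed in as a parameter so the example runs without the SDK. Two points are assumptions rather than documented facts: that the first tuple element is the RECERR return code, and that success is reported as zero. The names Classification and classify_text are hypothetical.

```python
from typing import Any, Callable, NamedTuple

REC_OK = 0  # assumption: the success code is zero


class Classification(NamedTuple):
    predicted_class: Any   # DCHANDLE of the predicted class, may be None
    confidence: int        # 0..100
    class_info: Any        # per-class information (ClassifyInfoArray)
    is_confident: bool


def classify_text(classify_fn: Callable[..., tuple], sid: int,
                  project: Any, text: str) -> Classification:
    """Call a kRecClassifyText-like function and unpack its result tuple."""
    # Assumption: the tuple is (return code, class, confidence, info, flag).
    rc, predicted, confidence, info, is_confident = classify_fn(sid, project, text)
    if rc != REC_OK:
        raise RuntimeError(f"kRecClassifyText failed with code {rc}")
    return Classification(predicted, confidence, info, bool(is_confident))
```

In real use, classify_fn would be the binding's kRecClassifyText and project the handle obtained from kRecOpenDCProject.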
◆ kRecCloseDCProject()
Closing a Document Classifier Project.
This function closes a Document Classifier Project opened by kRecOpenDCProject.
- Parameters
-
[in] | hDCProject | Handle of the Document Classifier Project. |
- Return values
-
- Note
- The specification of this function in C# is:
RECERR RECAPIKRN kRecCloseDCProject(DCHANDLE hDCProject)
-
The specification of this function in Java is:
-
The specification of this function in Python is:
◆ kRecGetDCClassName()
RECERR RECAPIKRN kRecGetDCClassName (DCHANDLE hDCClass, LPTSTR *ppName)
Returning the name of a Document Class.
This function returns the name of a Document Class.
- Parameters
-
[in] | hDCClass | Handle of the Document Class. |
[out] | ppName | Address of a variable to store the name of the Document Class. |
- Return values
-
- Note
- Use this function to obtain the name of the Document Class.
-
The specification of this function in C# is:
RECERR RECAPIKRN kRecGetDCClassName(DCHANDLE hDCClass, LPTSTR *ppName)
-
The specification of this function in Java is:
-
The specification of this function in Python is:
◆ kRecGetDCConfidenceThreshold()
RECERR RECAPIKRN kRecGetDCConfidenceThreshold (DCHANDLE hDCProject, int *pConfidenceThreshold)
Get the confidence threshold of a Document Classifier Project.
This function returns the confidence threshold of the given Document Classifier Project.
- Parameters
-
[in] | hDCProject | Handle of the Document Classifier Project returned by kRecOpenDCProject. |
[out] | pConfidenceThreshold | Address of an integer variable to receive the confidence threshold. |
- Note
- The confidence threshold is a number between 0 and 100. It can be set with Document Classifier Assistant during the Training and Testing Process, and is stored in the Document Classifier Project File. The threshold can be queried and changed after the Document Classifier Project File is loaded.
-
The specification of this function in C# is:
RECERR RECAPIKRN kRecGetDCConfidenceThreshold(DCHANDLE hDCProject, int *pConfidenceThreshold)
-
The specification of this function in Java is:
-
The specification of this function in Python is:
◆ kRecGetFirstDCClass()
Starting enumeration of Document Classes.
This function returns the handle of the first Document Class of the given project.
- Parameters
-
[in] | hDCProject | Handle of the Document Classifier Project. |
[out] | phDCClass | Address of a variable to store the handle of the first Document Class. |
- Return values
-
- Note
- The Document Classes can be queried using the kRecGetFirstDCClass and kRecGetNextDCClass function-pair.
-
The name of the class can be queried by kRecGetDCClassName().
-
The specification of this function in C# is:
RECERR RECAPIKRN kRecGetFirstDCClass(DCHANDLE hDCProject, DCHANDLE *phDCClass)
-
The specification of this function in Java is:
-
The specification of this function in Python is:
◆ kRecGetNextDCClass()
Performing enumeration of Document Classes.
This function returns the handle of the next Document Class of the given project.
- Parameters
-
[in] | hDCPrevClass | Handle of the previous Document Class. |
[out] | phDCClass | Address of a variable to store the handle of the next Document Class. |
- Return values
-
- Note
- The Document Classes can be queried using the kRecGetFirstDCClass and kRecGetNextDCClass function-pair.
-
The name of the class can be queried by kRecGetDCClassName().
-
The specification of this function in C# is:
RECERR RECAPIKRN kRecGetNextDCClass(DCHANDLE hDCPrevClass, DCHANDLE *phDCClass)
-
The specification of this function in Java is:
-
The specification of this function in Python is:
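The first/next enumeration pattern can be wrapped in a Python generator, sketched below with the two functions injected as parameters so that it runs without the SDK. The Python signatures of these functions are not shown on this page, so the (return code, handle) result shape is an assumption, as is the way the end of the enumeration is signalled: the sketch stops on a non-zero code or a missing handle. The name iter_dc_classes is hypothetical.

```python
from typing import Any, Callable, Iterator, Tuple

REC_OK = 0  # assumption: the success code is zero


def iter_dc_classes(get_first: Callable[[Any], Tuple[int, Any]],
                    get_next: Callable[[Any], Tuple[int, Any]],
                    project: Any) -> Iterator[Any]:
    """Yield every Document Class handle of a project, in enumeration order."""
    # Start the enumeration from the project handle.
    rc, handle = get_first(project)
    # Assumption: a non-zero code or an empty handle ends the enumeration.
    while rc == REC_OK and handle is not None:
        yield handle
        # Continue from the previously returned class handle.
        rc, handle = get_next(handle)
```

Each yielded handle could then be passed to kRecGetDCClassName to obtain the class name.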
◆ kRecOpenDCProject()
RECERR RECAPIKRN kRecOpenDCProject (int sid, LPCTSTR pDCProjectFile, DCHANDLE *phDCProject)
Opening Document Classifier Project File.
This function opens a Document Classifier Project File (*.dcp).
- Parameters
-
[in] | sid | Settings Collection ID. |
[in] | pDCProjectFile | Path to the Project File. |
[out] | phDCProject | Address of a variable to store the handle of the Document Classifier Project. |
- Return values
-
- Note
- Use the Document Classifier Assistant to create, train and test a Document Classifier Project. Document Classifier Assistant lets you define classes, add training and test documents to the classes, and train and test the document classifier. After the Training and Testing Process you can export a Document Classifier Project File, which contains all the information necessary to perform classification. CSDK provides an API (the Document Classifier API) for loading the Document Classifier Project File and classifying documents.
-
When the project is no longer needed, close it by calling the kRecCloseDCProject function.
-
The specification of this function in C# is:
RECERR RECAPIKRN kRecOpenDCProject(int sid, LPCTSTR pDCProjectFile, DCHANDLE *phDCProject)
-
The specification of this function in Java is:
-
The specification of this function in Python is:
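Because every opened project should eventually be closed with kRecCloseDCProject, the open/close pair maps naturally onto a Python context manager. The sketch below takes the two functions as parameters so it can run without the SDK. Their Python signatures are not shown on this page, so the (return code, handle) result of the open call and the zero success code are assumptions, and the name open_dc_project is hypothetical.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Tuple

REC_OK = 0  # assumption: the success code is zero


@contextmanager
def open_dc_project(open_fn: Callable[[int, str], Tuple[int, Any]],
                    close_fn: Callable[[Any], int],
                    sid: int, project_file: str) -> Iterator[Any]:
    """Open a Document Classifier Project and close it when the block exits."""
    # Assumption: the open call returns (return code, project handle).
    rc, handle = open_fn(sid, project_file)
    if rc != REC_OK:
        raise RuntimeError(f"kRecOpenDCProject failed with code {rc}")
    try:
        yield handle
    finally:
        # Runs on normal exit and when the body raises.
        close_fn(handle)
```

The classification calls would then go inside the with block, and the project is closed even if one of them raises.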
◆ kRecSetDCConfidenceThreshold()
RECERR RECAPIKRN kRecSetDCConfidenceThreshold (DCHANDLE hDCProject, int ConfidenceThreshold)
Set the confidence threshold of a Document Classifier Project.
This function sets the confidence threshold of the given Document Classifier Project.
- Parameters
-
[in] | hDCProject | Handle of the Document Classifier Project returned by kRecOpenDCProject. |
[in] | ConfidenceThreshold | The confidence threshold value to set. |
- Note
- The confidence threshold is a number between 0 and 100. It can be set with Document Classifier Assistant during the Training and Testing Process, and is stored in the Document Classifier Project File. The threshold can be queried and changed after the Document Classifier Project File is loaded.
-
The specification of this function in C# is:
RECERR RECAPIKRN kRecSetDCConfidenceThreshold(DCHANDLE hDCProject, int ConfidenceThreshold)
-
The specification of this function in Java is:
-
The specification of this function in Python is:
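Since the threshold can be both queried and changed after loading, one possible usage pattern is to override it temporarily and restore the stored value afterwards. The Python sketch below illustrates this with the getter and setter injected as parameters. Their Python signatures are not shown on this page, so the (return code, value) result of the getter, the plain return code of the setter, and the zero success code are assumptions. The name temporary_threshold is hypothetical, and the range check simply mirrors the documented 0 to 100 range; whether the SDK itself rejects other values is not stated here.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Tuple

REC_OK = 0  # assumption: the success code is zero


@contextmanager
def temporary_threshold(get_fn: Callable[[Any], Tuple[int, int]],
                        set_fn: Callable[[Any, int], int],
                        project: Any, value: int) -> Iterator[int]:
    """Set a confidence threshold for the duration of a with block."""
    # The documented range of the threshold is 0..100.
    if not 0 <= value <= 100:
        raise ValueError("confidence threshold must be between 0 and 100")
    rc, previous = get_fn(project)
    if rc != REC_OK:
        raise RuntimeError(f"kRecGetDCConfidenceThreshold failed with code {rc}")
    rc = set_fn(project, value)
    if rc != REC_OK:
        raise RuntimeError(f"kRecSetDCConfidenceThreshold failed with code {rc}")
    try:
        yield previous
    finally:
        # Restore the threshold that was in effect before the block.
        set_fn(project, previous)
```

This keeps the value loaded from the Document Classifier Project File intact for later classifications in the same session.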