RecAPI
Document Classifier Module
KernelAPI

Document Classifier API.

Classes

struct   CLASSIFY_INFO
  Structure for information about classification.

Typedefs

typedef struct RECDCSTRUCT *  DCHANDLE
  Handle of a Document Classifier object.

Functions

RECERR RECAPIKRN  kRecOpenDCProject (int sid, LPCTSTR pDCProjectFile, DCHANDLE *phDCProject)
  Opening a Document Classifier Project File.
RECERR RECAPIKRN  kRecCloseDCProject (DCHANDLE hDCProject)
  Closing a Document Classifier Project.
RECERR RECAPIKRN  kRecGetFirstDCClass (DCHANDLE hDCProject, DCHANDLE *phDCClass)
  Starting enumeration of Document Classes.
RECERR RECAPIKRN  kRecGetNextDCClass (DCHANDLE hDCPrevClass, DCHANDLE *phDCClass)
  Performing enumeration of Document Classes.
RECERR RECAPIKRN  kRecClassifyPage (int sid, DCHANDLE hDCProject, HPAGE hPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
  Classifying a page.
RECERR RECAPIKRN  kRecClassifyText (int sid, DCHANDLE hDCProject, LPCTSTR pText, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
  Classifying text.
RECERR RECAPIKRN  kRecClassifyDocument (int sid, DCHANDLE hDCProject, LPCTSTR pFileName, int iPage, DCHANDLE *phDCPredictedClass, unsigned *pConfidenceLevel, CLASSIFY_INFO **pClassifyInfo, LPLONG pLength, INTBOOL *pIsConfident)
  Classifying the given page of a document.
RECERR RECAPIKRN  kRecGetDCClassName (DCHANDLE hDCClass, LPTSTR *ppName)
  Returning the name of a Document Class.
RECERR RECAPIKRN  kRecSetDCConfidenceThreshold (DCHANDLE hDCProject, int ConfidenceThreshold)
  Set the confidence threshold of a Document Classifier Project.
RECERR RECAPIKRN  kRecGetDCConfidenceThreshold (DCHANDLE hDCProject, int *pConfidenceThreshold)
  Get the confidence threshold of a Document Classifier Project.

Detailed Description

Document Classifier API.

For a detailed description of this module, see its separate documentation: https://docshield.kofax.com/OmniPageCaptureSDK/en_US/21.1.0-6ZFWM6PYMZ/help/OmniPageCapture_SDKdocumentclassificationassistant/c_Welcome.html.


Function Documentation

RECERR RECAPIKRN kRecClassifyDocument (int sid,
    DCHANDLE hDCProject,
    LPCTSTR pFileName,
    int iPage,
    DCHANDLE *phDCPredictedClass,
    unsigned *pConfidenceLevel,
    CLASSIFY_INFO **pClassifyInfo,
    LPLONG pLength,
    INTBOOL *pIsConfident)

Classifying the given page of a document.

This function classifies a document or a given page of it. The input can be an image file with scanned pages, a PDF file (of which one page is classified) or a plain text file.

Parameters:
[in] sid Settings Collection ID.
[in] hDCProject Handle of the Document Classifier Project returned by kRecOpenDCProject.
[in] pFileName Name of the file containing the document. It can be an image file, a PDF file or a text file.
[in] iPage The number of the page to be processed. This parameter is not used if the input is a text file.
[out] phDCPredictedClass Address of a variable to store the handle of the predicted Document Class. The returned handle can be NULL.
[out] pConfidenceLevel Address of a variable to store the confidence of the prediction. The returned value is between 0 and 100.
[out] pClassifyInfo Address of a variable to store the array of per-class classification results.
[out] pLength Address of a variable to store the length of the array returned in pClassifyInfo. This is equal to the number of classes.
[out] pIsConfident Address of a variable that receives whether the classification is confident. The returned value is TRUE if the confidence of the prediction is greater than or equal to the preset confidence threshold.
Return values:
RECERR
Note:
Use kRecOpenDCProject to open a Document Classifier Project File and obtain a handle.
This function determines from the filename extension whether the input is an image file, a PDF file or a text file. If the extension is unknown, DC_UNKNOWNEXTENSION_ERR is returned.
If the input is an image file or a PDF file, the function loads and preprocesses it. If text-based classification is enabled, the image is also recognized.
If the input is a text file (i.e. the filename extension is .txt), only text-based classification is possible. The following text encodings are supported: Unicode (UTF-16 or UTF-8, with or without a Byte Order Mark) and non-Unicode text encoded with the Windows default codepage (as set in Control Panel > Region and Language > Administrative > Change system locale).
The function returns the handle of the predicted class and the confidence of the prediction. You can query the name of the class with kRecGetDCClassName. The function also returns an array of CLASSIFY_INFO structures in pClassifyInfo, which contains the confidence level for each class; its length equals the number of defined classes and is returned in pLength. The confidence threshold can be defined with the Document Classifier Assistant. It is stored in the Document Classifier Project File and can be queried (kRecGetDCConfidenceThreshold) and changed (kRecSetDCConfidenceThreshold) after the project is loaded.
The array returned in pClassifyInfo should be released using kRecFree.
The specification of this function in C# is:
 RECERR kRecClassifyDocument(int sid, IntPtr hDCProject, string pFileName, int nPage, out IntPtr phDCBestClass, out UInt32 confidence, out CLASSIFY_INFO[] pClassifyInfo, out bool isConfident); 
The specification of this function in Java is:
 int kRecClassifyDocument(int sid, DCHANDLE hDCProject, String pFileName, int iPage, DCHANDLE phDCPredictedClass, long[] pConfidenceLevel, ClassifyInfoArray pClassifyInfo, int[] pIsConfident) 
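The extension-based dispatch described in the notes above can be sketched in C as follows. This is a hedged illustration, not the SDK's code: the helper names (`file_extension`, `ieq`, `guess_input_kind`), the `input_kind` enum, the list of image extensions and the case-insensitive matching are all assumptions. Only the `.txt` rule and the fact that unknown extensions yield DC_UNKNOWNEXTENSION_ERR come from this page.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical input kinds mirroring the three cases in the notes. */
typedef enum { INPUT_UNKNOWN, INPUT_TEXT, INPUT_IMAGE_OR_PDF } input_kind;

/* Case-insensitive ASCII comparison (ASSUMPTION: the SDK ignores case). */
static int ieq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns the text after the last '.' of the file name, or NULL if none.
   A dot inside a directory name does not count. */
static const char *file_extension(const char *path)
{
    const char *ext = NULL;
    for (const char *p = path; *p; p++) {
        if (*p == '.')
            ext = p + 1;
        else if (*p == '/' || *p == '\\')
            ext = NULL;
    }
    return ext;
}

/* Classifies the input by extension only.
   ASSUMPTION: this extension list is illustrative; the SDK's real list of
   supported image formats is not given on this page. */
static input_kind guess_input_kind(const char *path)
{
    static const char *const image_or_pdf_exts[] = {
        "pdf", "tif", "tiff", "png", "jpg", "jpeg", "bmp"
    };
    const char *ext = file_extension(path);
    if (ext == NULL)
        return INPUT_UNKNOWN;
    if (ieq(ext, "txt"))
        return INPUT_TEXT;  /* only text-based classification applies */
    for (size_t i = 0; i < sizeof image_or_pdf_exts / sizeof image_or_pdf_exts[0]; i++)
        if (ieq(ext, image_or_pdf_exts[i]))
            return INPUT_IMAGE_OR_PDF;  /* loaded, preprocessed, maybe OCRed */
    return INPUT_UNKNOWN;  /* the SDK would return DC_UNKNOWNEXTENSION_ERR */
}
```

In this scheme, INPUT_UNKNOWN corresponds to the case where kRecClassifyDocument returns DC_UNKNOWNEXTENSION_ERR. A pre-check like this only helps a batch job skip unsupported files if its extension list is kept in sync with the formats the SDK actually accepts.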
RECERR RECAPIKRN kRecClassifyPage (int sid,
    DCHANDLE hDCProject,
    HPAGE hPage,
    DCHANDLE *phDCPredictedClass,
    unsigned *pConfidenceLevel,
    CLASSIFY_INFO **pClassifyInfo,
    LPLONG pLength,
    INTBOOL *pIsConfident)

Classifying a page.

This function classifies the given HPAGE.

Parameters:
[in] sid Settings Collection ID.
[in] hDCProject Handle of the Document Classifier Project returned by kRecOpenDCProject.
[in] hPage Handle of the page to be classified.
[out] phDCPredictedClass Address of a variable to store the handle of the predicted Document Class. The returned handle can be NULL.
[out] pConfidenceLevel Address of a variable to store the confidence of the prediction. The returned value is between 0 and 100.
[out] pClassifyInfo Address of a variable to store the array of per-class classification results.
[out] pLength Address of a variable to store the length of the array returned in pClassifyInfo. This is equal to the number of classes.
[out] pIsConfident Address of a variable that receives whether the classification is confident. The returned value is TRUE if the confidence of the prediction is greater than or equal to the preset confidence threshold.
Return values:
RECERR
Note:
If the classifier method (defined in the Document Classifier Project) is Text or Combined, the function recognizes the image unless hPage already contains letters on entry. The recognition language is defined in the Document Classifier Project. On return, hPage contains the result of the recognition (OCR zones, letters).
The function returns the handle of the predicted class and the confidence of the prediction. The function also returns an array of CLASSIFY_INFO structures in pClassifyInfo, which contains the confidence level for each class; its length equals the number of defined classes and is returned in pLength. The confidence threshold can be defined with the Document Classifier Assistant. It is stored in the Document Classifier Project File and can be queried (kRecGetDCConfidenceThreshold) and changed (kRecSetDCConfidenceThreshold) after the project is loaded.
The array returned in pClassifyInfo should be released using kRecFree.
The specification of this function in C# is:
 RECERR kRecClassifyPage(int sid, IntPtr hDCProject, IntPtr hPage, out IntPtr bestClass, out UInt32 confidence, out CLASSIFY_INFO[] pClassifyInfos, out bool isClassfied); 
The specification of this function in Java is:
 int kRecClassifyPage(int sid, DCHANDLE hDCProject, HPAGE hPage, DCHANDLE phDCPredictedClass, long[] pConfidenceLevel, ClassifyInfoArray pClassifyInfo, int[] pIsConfident) 
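The recognition rule in the first note above can be expressed as a small predicate. This is a sketch that does not call the SDK: the `dc_method` enum and its values, including `DC_METHOD_IMAGE`, are hypothetical names (this page only names the Text and Combined methods), and `page_needs_recognition` is an illustrative helper.

```c
#include <stdbool.h>

/* Hypothetical names for a project's classifier method; only Text and
   Combined are named on this page. */
typedef enum { DC_METHOD_IMAGE, DC_METHOD_TEXT, DC_METHOD_COMBINED } dc_method;

/* Returns true when, per the note, kRecClassifyPage would recognize the
   page: the method needs text and the page holds no letters on entry. */
static bool page_needs_recognition(dc_method method, bool page_has_letters)
{
    bool method_uses_text = (method == DC_METHOD_TEXT || method == DC_METHOD_COMBINED);
    return method_uses_text && !page_has_letters;
}
```

A practical consequence of the documented behavior: after a Text or Combined classification, hPage already holds the OCR result, so a later call on the same page finds letters on entry and, by this rule, does not recognize the page again.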
RECERR RECAPIKRN kRecClassifyText (int sid,
    DCHANDLE hDCProject,
    LPCTSTR pText,
    DCHANDLE *phDCPredictedClass,
    unsigned *pConfidenceLevel,
    CLASSIFY_INFO **pClassifyInfo,
    LPLONG pLength,
    INTBOOL *pIsConfident)

Classifying text.

This function classifies the given text.

Parameters:
[in] sid Settings Collection ID.
[in] hDCProject Handle of the Document Classifier Project returned by kRecOpenDCProject.
[in] pText Null-terminated text to be classified.
[out] phDCPredictedClass Address of a variable to store the handle of the predicted Document Class. The returned handle can be NULL.
[out] pConfidenceLevel Address of a variable to store the confidence of the prediction. The returned value is between 0 and 100.
[out] pClassifyInfo Address of a variable to store the array of per-class classification results.
[out] pLength Address of a variable to store the length of the array returned in pClassifyInfo. This is equal to the number of classes.
[out] pIsConfident Address of a variable that receives whether the classification is confident. The returned value is TRUE if the confidence of the prediction is greater than or equal to the preset confidence threshold.
Return values:
RECERR
Note:
Use kRecOpenDCProject to open a Document Classifier Project File and obtain a handle.
The function returns the handle of the predicted class and the confidence of the prediction. The function also returns an array of CLASSIFY_INFO structures in pClassifyInfo, which contains the confidence level for each class; its length equals the number of defined classes and is returned in pLength. The confidence threshold can be defined with the Document Classifier Assistant. It is stored in the Document Classifier Project File and can be queried (kRecGetDCConfidenceThreshold) and changed (kRecSetDCConfidenceThreshold) after the project is loaded.
The array returned in pClassifyInfo should be released using kRecFree.
The specification of this function in C# is:
 RECERR kRecClassifyText(int sid, IntPtr hDCProject, string pText, out IntPtr phDCBestClass, out UInt32 confidence, out CLASSIFY_INFO[] pClassifyInfo, out bool isConfident); 
The specification of this function in Java is:
 int kRecClassifyText(int sid, DCHANDLE hDCProject, String pText, DCHANDLE phDCPredictedClass, long[] pConfidenceLevel, ClassifyInfoArray pClassifyInfo, int[] pIsConfident) 
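To apply the threshold rule yourself, for example a stricter application-side threshold without changing the project, you can scan the per-class confidences. This sketch uses a hypothetical `class_score` struct instead of CLASSIFY_INFO, whose fields are not listed on this page, and assumes that the predicted class is the one with the highest per-class confidence. It reproduces only the documented pIsConfident rule: confidence greater than or equal to the threshold.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one CLASSIFY_INFO entry; the real structure's
   fields are not documented on this page. */
typedef struct {
    const char *name;     /* class name */
    unsigned confidence;  /* per-class confidence, 0..100 */
} class_score;

/* Returns the index of the highest-confidence entry (the first one wins on
   a tie), or -1 if length is 0. *is_confident is true when that confidence
   is greater than or equal to threshold. */
static long best_class(const class_score *scores, long length,
                       unsigned threshold, bool *is_confident)
{
    long best = -1;
    for (long i = 0; i < length; i++) {
        if (best < 0 || scores[i].confidence > scores[best].confidence)
            best = i;
    }
    *is_confident = (best >= 0 && scores[best].confidence >= threshold);
    return best;
}
```

The `long` length matches the LPLONG type of pLength. With the real API, the array would still have to be released with kRecFree after use.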
RECERR RECAPIKRN kRecCloseDCProject (DCHANDLE hDCProject)

Closing a Document Classifier Project.

This function closes a Document Classifier Project opened by kRecOpenDCProject.

Parameters:
[in] hDCProject Handle of the Document Classifier Project.
Return values:
RECERR
Note:
The specification of this function in C# is:
 RECERR kRecCloseDCProject(IntPtr hDCProject); 
The specification of this function in Java is:
 int kRecCloseDCProject(DCHANDLE hDCProject) 
RECERR RECAPIKRN kRecGetDCClassName (DCHANDLE hDCClass,
    LPTSTR *ppName)

Returning the name of a Document Class.

This function returns the name of a Document Class.

Parameters:
[in] hDCClass Handle of the Document Class.
[out] ppName Address of a variable to store the name of the Document Class.
Return values:
RECERR
Note:
Use this function to obtain the name of the Document Class.
The specification of this function in C# is:
 RECERR kRecGetDCClassName(IntPtr hDCClass, out string ppName) 
The specification of this function in Java is:
 int kRecGetDCClassName(DCHANDLE hDCClass, String[] ppName) 
RECERR RECAPIKRN kRecGetDCConfidenceThreshold (DCHANDLE hDCProject,
    int *pConfidenceThreshold)

Get the confidence threshold of a Document Classifier Project.

kRecGetDCConfidenceThreshold returns the confidence threshold of the given Document Classifier Project.

Parameters:
[in] hDCProject Handle of the Document Classifier Project returned by kRecOpenDCProject.
[out] pConfidenceThreshold Address of an integer variable to receive the confidence threshold.
Return values:
RECERR
Note:
The confidence threshold is a number between 0 and 100. It can be set with the Document Classifier Assistant during the Training and Testing Process and is stored in the Document Classifier Project File. After the Document Classifier Project File is loaded, the threshold can be queried and changed.
The specification of this function in C# is:
 RECERR kRecGetDCConfidenceThreshold(IntPtr hDCProject, out int ConfidenceThreshold); 
The specification of this function in Java is:
 int kRecGetDCConfidenceThreshold(DCHANDLE hDCProject, int[] pConfidenceThreshold) 
RECERR RECAPIKRN kRecGetFirstDCClass (DCHANDLE hDCProject,
    DCHANDLE *phDCClass)

Starting enumeration of Document Classes.

This function returns the handle of the first Document Class of the given project.

Parameters:
[in] hDCProject Handle of the Document Classifier Project.
[out] phDCClass Address of a variable to store the handle of the first Document Class.
Return values:
RECERR
Note:
The Document Classes can be enumerated using the kRecGetFirstDCClass and kRecGetNextDCClass function pair.
The name of a class can be queried with kRecGetDCClassName().
The specification of this function in C# is:
 RECERR kRecGetFirstDCClass(IntPtr hDCProject, out IntPtr hDCClass); 
The specification of this function in Java is:
 int kRecGetFirstDCClass(DCHANDLE hDCProject, DCHANDLE phDCClass) 
RECERR RECAPIKRN kRecGetNextDCClass (DCHANDLE hDCPrevClass,
    DCHANDLE *phDCClass)

Performing enumeration of Document Classes.

This function returns the handle of the Document Class that follows hDCPrevClass in the same project.

Parameters:
[in] hDCPrevClass Handle of the previous Document Class.
[out] phDCClass Address of a variable to store the handle of the next Document Class.
Return values:
RECERR
Note:
The Document Classes can be enumerated using the kRecGetFirstDCClass and kRecGetNextDCClass function pair.
The name of a class can be queried with kRecGetDCClassName().
The specification of this function in C# is:
 RECERR kRecGetNextDCClass(IntPtr hDCPrevClass, out IntPtr hDCClass); 
The specification of this function in Java is:
 int kRecGetNextDCClass(DCHANDLE hDCPrevClass, DCHANDLE phDCClass) 
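The first/next enumeration pattern can be sketched with a mock class list so that the loop compiles without the SDK. `mock_class`, `mock_get_first`, `mock_get_next` and `collect_class_names` are hypothetical. In particular, this page does not state how kRecGetNextDCClass reports the end of the list; modeling it as a nonzero status is an assumption to verify against the RECERR documentation.

```c
#include <stddef.h>

/* Hypothetical mock of a project's class list (not an SDK type). */
typedef struct mock_class {
    const char *name;
    const struct mock_class *next;
} mock_class;

/* Mock of the "first" call: 0 on success, nonzero if there is no class.
   ASSUMPTION: the end of the enumeration is signaled by a nonzero status. */
static int mock_get_first(const mock_class *head, const mock_class **out)
{
    *out = head;
    return head != NULL ? 0 : 1;
}

/* Mock of the "next" call: 0 on success, nonzero after the last class. */
static int mock_get_next(const mock_class *prev, const mock_class **out)
{
    *out = prev->next;
    return prev->next != NULL ? 0 : 1;
}

/* Walks the list with the first/next pair, storing up to max names.
   Returns the total number of classes visited. */
static int collect_class_names(const mock_class *head, const char **names, int max)
{
    const mock_class *cls = NULL;
    int count = 0;
    for (int rc = mock_get_first(head, &cls); rc == 0; rc = mock_get_next(cls, &cls)) {
        if (count < max)
            names[count] = cls->name;
        count++;
    }
    return count;
}
```

With the real API, each visited class's name would be fetched with kRecGetDCClassName, and the number of visited classes should equal the length returned in pLength by the classification functions.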
RECERR RECAPIKRN kRecOpenDCProject (int sid,
    LPCTSTR pDCProjectFile,
    DCHANDLE *phDCProject)

Opening a Document Classifier Project File.

kRecOpenDCProject opens a Document Classifier Project File (*.dcp).

Parameters:
[in] sid Settings Collection ID.
[in] pDCProjectFile Path to the Project File.
[out] phDCProject Address of a variable to store the handle of the Document Classifier Project.
Return values:
RECERR
Note:
Use the Document Classifier Assistant to create, train and test a Document Classifier Project. The Document Classifier Assistant lets you define classes, add training and test documents to the classes, and train and test the document classifier. After the Training and Testing Process, you can export a Document Classifier Project File, which contains all the information necessary to perform classification. CSDK provides an API (the Document Classifier API) for loading the Document Classifier Project File and classifying documents.
When the project is no longer needed, close it by calling kRecCloseDCProject.
The specification of this function in C# is:
 RECERR kRecOpenDCProject(int sid, string pDCProjectFile, out IntPtr hDCProject); 
The specification of this function in Java is:
 int kRecOpenDCProject(int sid, String pDCProjectFile, DCHANDLE hDCProject) 
RECERR RECAPIKRN kRecSetDCConfidenceThreshold (DCHANDLE hDCProject,
    int ConfidenceThreshold)

Set the confidence threshold of a Document Classifier Project.

kRecSetDCConfidenceThreshold sets the confidence threshold of the given Document Classifier Project.

Parameters:
[in] hDCProject Handle of the Document Classifier Project returned by kRecOpenDCProject.
[in] ConfidenceThreshold The new value of the confidence threshold (between 0 and 100).
Return values:
RECERR
Note:
The confidence threshold is a number between 0 and 100. It can be set with the Document Classifier Assistant during the Training and Testing Process and is stored in the Document Classifier Project File. After the Document Classifier Project File is loaded, the threshold can be queried and changed.
The specification of this function in C# is:
 RECERR kRecSetDCConfidenceThreshold(IntPtr hDCProject, int ConfidenceThreshold); 
The specification of this function in Java is:
 int kRecSetDCConfidenceThreshold(DCHANDLE hDCProject, int ConfidenceThreshold)