Classification by graphical lines
Some documents cannot be classified by layout or content because they have no typical layout or content. This script sample shows how to classify a document by its graphical lines instead, which can be useful for scanned charts or printed diagrams that have a grid pattern as a background. The script counts the vertical and horizontal lines on the page image and makes the classification decision by comparing both counts against thresholds (in this sample, more than 8 lines in each direction, each at least 40 mm long).
The function DetectGraphicLines examines at most the first 3 pages of a document and returns True as soon as one of them qualifies. Internally it calls DetectGraphicLinesOnPage, which works on the bitonal image of a single page.
DetectGraphicLines can be called in the AfterClassifyXDoc event. The sample assumes that "Charts" is a valid class name in the current project and that the reclassification is certain, so the confidence is set to 1.0.
A reference to Kofax Cascade Forms Processing 2.0 must be added to the script sheet where that function is implemented.
' Project Script
Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
   If DetectGraphicLines(pXDoc) = True Then
      pXDoc.Reclassify("Charts", 1.0)
      Exit Sub
   End If
   '...
End Sub

Private Function DetectGraphicLines(pXDoc As CASCADELib.CscXDocument) As Boolean
   Dim i As Long
   Dim count As Long
   Dim bResult As Boolean

   ' search for horizontal and vertical lines on the first 3 pages only
   count = pXDoc.CDoc.Pages.Count
   If count > 3 Then
      count = 3
   End If

   For i = 0 To count - 1
      ' if we detect enough graphic lines on any of the first 3 pages, return TRUE
      bResult = DetectGraphicLinesOnPage(pXDoc.CDoc.Pages(i).GetBitonalImage(Project.ColorConversion))
      If bResult = True Then
         DetectGraphicLines = True
         Exit Function
      End If
   Next i

   DetectGraphicLines = False
End Function

Private Function DetectGraphicLinesOnPage(pImage As CscImage) As Boolean
   ' counts horizontal and vertical lines on a page
   ' this is used to detect the class "Charts"
   Dim pLinesDetection As CscLinesDetection
   Dim xLeft As Long
   Dim xWidth As Long
   Dim yTop As Long
   Dim yHeight As Long

   ' check color format: only bitonal images (1 bit per sample, 1 sample per pixel) are analyzed
   If pImage.BitsPerSample <> 1 Or pImage.SamplesPerPixel <> 1 Then
      DetectGraphicLinesOnPage = False
      Exit Function
   End If

   Set pLinesDetection = New CscLinesDetection

   ' set up parameters for lines detection:
   ' solid horizontal and vertical lines of at least 40 mm, no combs or dotted lines
   pLinesDetection.DetectHorCombs = False
   pLinesDetection.DetectHorDotLines = False
   pLinesDetection.DetectHorLines = True
   pLinesDetection.DetectVerLines = True
   pLinesDetection.MinHorLineLenMM = 40
   pLinesDetection.MinVerLineLenMM = 40

   ' start lines detection, skip a border of 5% on each side
   xLeft = pImage.Width * 0.05
   xWidth = pImage.Width * 0.9
   yTop = pImage.Height * 0.05
   yHeight = pImage.Height * 0.9
   pLinesDetection.DetectLines pImage, xLeft, yTop, xWidth, yHeight

   ' we require more than 8 horizontal and more than 8 vertical lines to return TRUE
   If (pLinesDetection.HorLineCount > 8 And pLinesDetection.VerLineCount > 8) Then
      DetectGraphicLinesOnPage = True
   Else
      DetectGraphicLinesOnPage = False
   End If
End Function
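
The thresholds and the skipped border are hard-coded in the sample above, and they usually need tuning per project. The following is a minimal sketch, not part of the original sample, of how these values could be moved into named constants and small helper functions. The names MIN_HOR_LINES, MIN_VER_LINES, BORDER_FRACTION, HasEnoughLines, BorderOffset and InnerExtent are hypothetical and chosen for illustration only. The helpers use plain Basic and no Kofax objects.

```vb
' Hypothetical tuning constants (illustrative names, values taken from the sample above)
Private Const MIN_HOR_LINES As Long = 8        ' more than this many horizontal lines required
Private Const MIN_VER_LINES As Long = 8        ' more than this many vertical lines required
Private Const BORDER_FRACTION As Double = 0.05 ' share of the page skipped on each side

' Returns True if both line counts exceed their thresholds
Private Function HasEnoughLines(ByVal horCount As Long, ByVal verCount As Long) As Boolean
   HasEnoughLines = (horCount > MIN_HOR_LINES) And (verCount > MIN_VER_LINES)
End Function

' Returns the size of the skipped border, in pixels, for a page extent (width or height)
Private Function BorderOffset(ByVal extent As Long) As Long
   BorderOffset = CLng(extent * BORDER_FRACTION)
End Function

' Returns the extent that remains after skipping the border on both sides
Private Function InnerExtent(ByVal extent As Long) As Long
   InnerExtent = extent - 2 * BorderOffset(extent)
End Function
```

With these helpers, DetectGraphicLinesOnPage could compute xLeft as BorderOffset(pImage.Width) and xWidth as InnerExtent(pImage.Width), do the same for yTop and yHeight with pImage.Height, and end with DetectGraphicLinesOnPage = HasEnoughLines(pLinesDetection.HorLineCount, pLinesDetection.VerLineCount). Deriving the inner extent from the offset, rather than multiplying by 0.9 separately, keeps the search region inside the page even when rounding applies.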