PdfTextDocument Class

Represents a PDF document for text processing operations.
Inheritance Hierarchy
System.Object
  Atalasoft.Pdf.TextExtract.PdfTextDocument

Namespace:  Atalasoft.Pdf.TextExtract
Assembly:  Atalasoft.dotImage.PdfReader (in Atalasoft.dotImage.PdfReader.dll) Version: 11.4.0.9.0.377 (.NET 4.5.2, x86)
Syntax
public class PdfTextDocument : IDisposable, ITextDocument

The PdfTextDocument type exposes the following members.

Constructors
Name    Description
Public method: PdfTextDocument(Stream)
Initializes a new instance of the PdfTextDocument class.
Public method: PdfTextDocument(String)
Initializes a new instance of the PdfTextDocument class.
Public method: PdfTextDocument(Stream, String)
Initializes a new instance of the PdfTextDocument class.
Public method: PdfTextDocument(String, String)
Initializes a new instance of the PdfTextDocument class.
Properties
Name    Description
Public property: ExtractionGranularity
Gets the extraction granularity provided by this document.
Public property: OutputLineEnd
Gets or sets a flag indicating whether line end symbols should be returned by PdfTextReader.
Public property: OutputPageEnd
Gets or sets a flag indicating whether a page end symbol should be returned by PdfTextReader.
Public property: PageCount
Gets the document page count.
Methods
Name    Description
Public method: Dispose
Releases all resources used by the PdfTextDocument.
Protected method: Dispose(Boolean)
Releases the unmanaged resources used by the PdfTextDocument and optionally releases the managed resources.
Protected method: DisposePages
Disposes all pages in the cache.
Public method: Equals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected method: Finalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Public method: GetHashCode
Serves as the default hash function.
(Inherited from Object.)
Public method: GetPage
Retrieves a PdfTextPage from the document.
Public method: GetPdfTextReader
Creates a new PdfTextReader for all pages in the document.
Public method: GetPdfTextReader(Int32)
Creates a new PdfTextReader for the specified page.
Public method: GetPdfTextReader(Int32, Int32)
Creates a new PdfTextReader for the specified page range.
Public method: GetTextPage
Gets the ITextPage at the specified index.
Public method: GetType
Gets the Type of the current instance.
(Inherited from Object.)
Protected method, static member: Initialize
Obsolete.
Protected method: MakePages
Initializes the pages cache.
Protected method: MemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Public method: ToString
Returns a string that represents the current object.
(Inherited from Object.)
Examples
Extracting Text from a PDF (C#)
using Atalasoft.Pdf.TextExtract;
using System.Drawing;
using System.IO;

// Some examples of using the classes in the
// Atalasoft.Pdf.TextExtract namespace

// get the number of pages in a PDF
public int GetPageCount(Stream s)
{
    using (PdfTextDocument doc = new PdfTextDocument(s)) 
    {
        return doc.PageCount;
    }
}

// get the number of characters on a page in a PDF
public int GetCharCount(Stream s, int pageNum)
{
    using (PdfTextDocument doc = new PdfTextDocument(s)) 
    {
        PdfTextPage textPage = doc.GetPage(pageNum);                    
        return textPage.CharCount;
    }
}

// Extract Text from a PDF
public String GetText(Stream s, int pageNum, int index, int count)
{
    using (PdfTextDocument doc = new PdfTextDocument(s)) 
    {
        PdfTextPage textPage = doc.GetPage(pageNum);                    
        return textPage.GetText(index, count);
    }
}

// Find out where a character is in a PDF
public PointF GetCharPos(Stream s, int pageNum, int index)
{
    using (PdfTextDocument doc = new PdfTextDocument(s)) 
    {
        PdfTextPage textPage = doc.GetPage(pageNum);                    
        return textPage.CharOrigin(index);
    }
}
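
The helpers above can be combined to collect the text of a whole document. The following is a minimal sketch, not part of the original samples. It uses only members shown on this page (PageCount, GetPage, CharCount, GetText). It assumes that page indexes are zero-based and that GetText(0, CharCount) returns all of the text on a page; neither is confirmed here. The PdfTextDump class and GetAllText method names are hypothetical.

```csharp
using System.IO;
using System.Text;
using Atalasoft.Pdf.TextExtract;

public static class PdfTextDump
{
    // Concatenates the text of every page, one page per line group.
    // Assumption: page indexes run from 0 to PageCount - 1.
    public static string GetAllText(Stream s)
    {
        StringBuilder sb = new StringBuilder();
        using (PdfTextDocument doc = new PdfTextDocument(s))
        {
            for (int i = 0; i < doc.PageCount; i++)
            {
                PdfTextPage textPage = doc.GetPage(i);

                // Skip pages with no extractable characters
                // (for example, scanned images without a text layer).
                if (textPage.CharCount > 0)
                {
                    // Assumption: index 0 with count CharCount covers the whole page.
                    sb.AppendLine(textPage.GetText(0, textPage.CharCount));
                }
            }
        }
        return sb.ToString();
    }
}
```

The document is opened once and disposed by the using block. The pages are not disposed individually here, on the assumption that the document releases its cached pages itself, as DisposePages suggests.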
Extracting Text from a PDF (Visual Basic)
Imports System.Drawing
Imports System.IO
Imports Atalasoft.Pdf.TextExtract

' Some examples of using the classes in the
' Atalasoft.Pdf.TextExtract namespace

' get the number of pages in a PDF
Public Function GetPageCount(ByVal s As Stream) As Integer
    Using doc As New PdfTextDocument(s)
        Return doc.PageCount
    End Using
End Function

' get the number of characters on a page in a PDF
Public Function GetCharCount(ByVal s As Stream, ByVal pageNum As Integer) As Integer
    Using doc As New PdfTextDocument(s)
        Dim textPage As PdfTextPage = doc.GetPage(pageNum)
        Return textPage.CharCount
    End Using
End Function

' Extract Text from a PDF
Public Function GetText(ByVal s As Stream, ByVal pageNum As Integer, ByVal index As Integer, ByVal count As Integer) As String
    Using doc As New PdfTextDocument(s)
        Dim textPage As PdfTextPage = doc.GetPage(pageNum)
        Return textPage.GetText(index, count)
    End Using
End Function

' Find out where a character is in a PDF
Public Function GetCharPos(ByVal s As Stream, ByVal pageNum As Integer, ByVal index As Integer) As PointF
    Using doc As New PdfTextDocument(s)
        Dim textPage As PdfTextPage = doc.GetPage(pageNum)
        Return textPage.CharOrigin(index)
    End Using
End Function
See Also