Tungsten Extract Service Documentation

Tungsten Extract Service

The Tungsten Extract Service recognizes text on images and PDF documents using Optical Character Recognition (OCR).

OCR is a machine-learning technique used to recognize machine-printed text in images of business documents such as forms, contracts, and invoices. The text is extracted as words with coordinates relative to the page, which creates a digital version of the text that is searchable and selectable.

With the Tungsten Extract Service APIs, you can eliminate or significantly reduce the need for manual data entry.

Getting Started

Start with the Getting Started Guide, which explains how to use the Extract Service and provides example code for reference.

Release Notes

See the Tungsten Extract Service Release Notes to learn about the product features.

API Documentation

The Tungsten Extract Service provides several RESTful APIs that perform OCR on submitted documents and handle related tasks.

For details on the APIs summarized below, see Tungsten Extract Service API Documentation.

Main APIs

Both of the main APIs take a document image as input and return JSON-formatted output containing the recognized text:

performSync: Recommended for smaller documents where the caller can wait for the OCR to complete.
performAsync: Recommended for larger documents where OCR needs to be asynchronous. The API immediately returns an operation ID that can be used with the getResult API to retrieve the result once it is ready.
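
The asynchronous flow described above (submit with performAsync, then call getResult until the result is ready) can be sketched as a polling loop. This is a minimal illustration, not the service's client: the `status` field and its values `"succeeded"` and `"failed"` are assumptions, and `fetch_result` is a hypothetical caller-supplied function that wraps the actual getResult request. Consult the API documentation for the real request and response formats.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(
    fetch_result: Callable[[str], Dict[str, Any]],
    operation_id: str,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a getResult-style call until the operation finishes.

    fetch_result is a hypothetical callable that takes the operation ID
    returned by performAsync and returns the decoded JSON response.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = fetch_result(operation_id)
        # ASSUMPTION: the response carries a "status" field with these values.
        status = response.get("status")
        if status == "succeeded":
            return response
        if status == "failed":
            raise RuntimeError(f"OCR operation {operation_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"OCR operation {operation_id} not ready in time")
        # Wait before asking again so the service is not flooded with requests.
        sleep(poll_interval)
```

Passing the transport in as `fetch_result` keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub.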

Other APIs

languages: This API retrieves a list of supported languages so the caller can build a user interface (if needed) where the language for OCR can be selected.
licenseVolume: This API returns the remaining license volumes and the expiry date of the licenses.
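
As a rough illustration of how a caller might use these two responses, the sketch below turns a languages response into sorted menu entries and checks a licenseVolume response before submitting a job. The response shapes and every field name here (`languages`, `code`, `name`, `remainingVolume`, `expiryDate`) are assumptions made for the example, not the documented format.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Tuple


def language_choices(languages_response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (code, display name) pairs sorted by display name for a UI menu."""
    # ASSUMPTION: {"languages": [{"code": "en", "name": "English"}, ...]}
    entries = languages_response.get("languages", [])
    pairs = [(entry["code"], entry["name"]) for entry in entries]
    return sorted(pairs, key=lambda pair: pair[1])


def license_allows(
    license_response: Dict[str, Any],
    pages_needed: int,
    today: Optional[date] = None,
) -> bool:
    """Check that enough volume remains and the license has not expired."""
    # ASSUMPTION: {"remainingVolume": 1200, "expiryDate": "2030-01-01"}
    current_day = today or date.today()
    expires = date.fromisoformat(license_response["expiryDate"])
    has_volume = license_response["remainingVolume"] >= pages_needed
    return has_volume and current_day <= expires
```

A check like `license_allows` would typically run before a large batch, so that the caller can warn the user instead of failing midway.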

Tungsten Extract Service version 1.0.0