extracttable.com

The ExtractTable API extracts structured tabular data from images or PDF files. The goal is to free developers from worrying about the table area, column or row coordinates, rotation, and similar details of the input.

If you are a Python developer, check out the official pip library ExtractTable-py; it takes only 3 lines of code to get the output.

python
from ExtractTable import ExtractTable

et_sess = ExtractTable(api_key=YOUR_API_KEY)    # Replace YOUR_API_KEY with your valid API key (a string)
print(et_sess.check_usage())                    # Validates the API key and shows credit usage
table_data = et_sess.process_file(filepath=Location_of_Image_with_Tables, output_format="df")  # Replace with the path to your file

Prerequisite

The service is authenticated with an API key. Grab some FREE credits here for a trial, or BUY credits here. ExtractTable.com offers the best accuracy, the lowest $/credit with the longest validity, and credit refunds on bad outputs (literally, no one else does this). For any information, assistance, or troubleshooting, feel free to email me at saradhi@extracttable.com. A detailed competitive comparison is available here.

Authentication

How/Where to use the API Key?

  • For API usage - must be passed through headers as "x-api-key" in every request
  • For Web usage - go to Web-PRO and provide the API Key when prompted
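
For the API-usage case, a minimal sketch of attaching the key as the "x-api-key" header is shown below, using only Python's standard library. The helper names `build_validation_request` and `validate_api_key` are hypothetical, the URL is the validator endpoint listed in the endpoints table of this page, and the format of the response body is not specified here, so it is returned as raw text.

```python
import urllib.request

# Validator endpoint as listed in this documentation.
VALIDATOR_URL = "https://validator.extracttable.com/"


def build_validation_request(api_key: str) -> urllib.request.Request:
    """Build (without sending) a GET request that carries the key in the x-api-key header."""
    return urllib.request.Request(
        VALIDATOR_URL,
        headers={"x-api-key": api_key},
        method="GET",
    )


def validate_api_key(api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body (assumed to be UTF-8 text)."""
    request = build_validation_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `validate_api_key("YOUR_API_KEY")` performs the network request; `urllib` raises `urllib.error.HTTPError` for 4xx responses, which is how a missing or invalid key would be expected to surface.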

API Request - Common mistakes

First, note that there are 3 different API endpoints. Every endpoint requires a valid API Key passed in the header as "x-api-key".

| Usage | Endpoint | Request Method | Must contain | Common Mistakes |
| --- | --- | --- | --- | --- |
| Validate API Key | https://validator.extracttable.com/ | GET | a valid API Key passed in the request header as "x-api-key" | missing API key in request headers |
| Trigger a request | https://trigger.extracttable.com/ | POST | file data (not file location) in body as "input" of multipart/form-data | requesting as x-www-form-urlencoded |
| Retrieve the result | https://getresult.extracttable.com/ | GET | JobId, received in the trigger response | Invalid JobId / POST instead of GET |
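
To make the "file data, not file location" and "multipart/form-data" requirements concrete, here is a hedged sketch that builds the trigger and retrieve requests with the standard library. The function names are hypothetical, the multipart body is assembled by hand for illustration, and sending "JobId" as a query parameter is an assumption based on the 400 error description (the page only says the request must contain the JobId).

```python
import mimetypes
import urllib.parse
import urllib.request
import uuid

TRIGGER_URL = "https://trigger.extracttable.com/"
RESULT_URL = "https://getresult.extracttable.com/"


def build_trigger_request(api_key: str, filename: str, file_bytes: bytes) -> urllib.request.Request:
    """Build a POST request whose multipart/form-data body carries the file bytes as "input"."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="input"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        file_bytes,  # the file contents themselves, not a path
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "x-api-key": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(TRIGGER_URL, data=body, headers=headers, method="POST")


def build_result_request(api_key: str, job_id: str) -> urllib.request.Request:
    """Build a GET request for a job's result (JobId as a query parameter is an assumption)."""
    query = urllib.parse.urlencode({"JobId": job_id})
    return urllib.request.Request(
        f"{RESULT_URL}?{query}", headers={"x-api-key": api_key}, method="GET"
    )
```

Either request can then be sent with `urllib.request.urlopen(request)`; read the file with `open(path, "rb").read()` before passing it to `build_trigger_request`.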

What are the common error codes & How to resolve?

| Code | Description | ProTip |
| --- | --- | --- |
| 400 | Bad Request - Requirements not matched | Endpoint is expecting query params or body, but not sent in request |
| 401 | Invalid "x-api-key" found in headers | Check 'x-api-key' value in the headers |
| 403 | "x-api-key" is not found in headers | Attach APIKey as 'x-api-key' in the headers |
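
A client can turn these codes into actionable messages. The sketch below is only an illustration: the mapping restates the three codes documented above, and the names `ERROR_HINTS` and `explain_error` are hypothetical.

```python
from typing import Optional

# Hints restated from the error-code table in this documentation.
ERROR_HINTS = {
    400: "Bad Request: the endpoint expects query params or a body that was not sent.",
    401: "Invalid x-api-key: check the x-api-key value in the headers.",
    403: "x-api-key not found: attach the API key as x-api-key in the headers.",
}


def explain_error(status_code: int) -> Optional[str]:
    """Return the documented hint for an HTTP status code, or None if it is not listed."""
    return ERROR_HINTS.get(status_code)
```

When using `urllib`, the status code is available as `err.code` on a caught `urllib.error.HTTPError`.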

API response - Explained

What are all the possible responses?

Every triggered job has one of the 4 statuses below.

| JobStatus | Description |
| --- | --- |
| Success | Process completed. Check the response for tables |
| Failed | Process failed. No credits used |
| Processing | Still in process. Use "JobId" to retrieve the output later |
| Incomplete | Process finished, but not all pages were processed. Partial output |
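
Because a job may still be "Processing" when first queried, a client typically polls until the status changes. The following is a minimal sketch, not an official client: `wait_for_result` is a hypothetical helper, `fetch_result` is any caller-supplied function that returns the parsed JSON response for the job, and the attempt count and interval are arbitrary defaults.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(
    fetch_result: Callable[[], Dict[str, Any]],
    max_attempts: int = 10,
    interval_seconds: float = 5.0,
) -> Dict[str, Any]:
    """Call fetch_result until JobStatus is no longer "Processing", then return the response."""
    for attempt in range(1, max_attempts + 1):
        response = fetch_result()
        if response.get("JobStatus") != "Processing":
            # "Success", "Failed" and "Incomplete" are all final states.
            return response
        if attempt < max_attempts:
            time.sleep(interval_seconds)
    raise TimeoutError(f"Job still processing after {max_attempts} attempts")
```

The caller should still inspect the returned "JobStatus": "Incomplete" carries partial output and "Failed" carries none.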

What is the output of the process?

The output response will be in the below format, based on the API Key Plan type.

javascript
{
    "JobStatus": <string>,                              # Status of the triggered Process  @ JOB-LEVEL
    "Pages": <integer>,                                 # Number of pages processed in this request @ PAGE-LEVEL
    "Tables": [<list of key-value objects of table>     # List of all tables found @ TABLE-LEVEL
        {
            "Page": <integer>,                              ## Page number in which this table is found
            "CharacterConfidence": <float>,                 ## Accuracy of Characters recognized from the input-page
            "LayoutConfidence": <float>,                    ## Accuracy of table layout's design decision
            "TableJson": <dict>,                            ## Table Cell Text in key-value format with index orientation - {row#: {col#: <str>}}
            "TableCoordinates": <dict>,                     ## Top-left & Bottom-right Cell Coordinates - {row#: {col#: <list(x1,y1,x2,y2)>}}
            "TableConfidence": <dict>                       ## Cell level accuracy of detected characters - {row#: {col#: <float>}}
        },
    {...}                                               ## ... more "Tables" objects
    ],
    "Lines": [<list of key-value objects>               # Pagewise Line details @ PAGE-LEVEL
        {
            "Page": <integer>,                          # Page number in which the lines are found
            "CharacterConfidence": <float>,             # Average Accuracy of all Characters recognized from the input-page
            "LinesArray": [
                <list of key-value objects of line>     # Ordered list of lines in this page @ LINE-LEVEL
                {
                    "Line": <str>,                          ## Detected text of the complete line
                    "WordsArray": [
                        <list of key-value objects>         ## Word level details in this line @ WORD-LEVEL
                        {
                            "Conf": <float>,                    ### Accuracy of recognized characters of the word
                            "Word": <str>,                      ### Detected text of the word
                            "Loc": [x1, y1, x2, y2]             ### Top-left & Bottom-right coordinates, w.r.t the input-page width-height dimensions
                        },
                    {...}                                   ### More "WordsArray" objects
                    ]
                },
            {...}                                       ## More "LinesArray" objects
            ]
        },
    {...}                                               # More Pagewise "Lines" details
    ]
}

The objects present in the output depend on the API Key plan type. The available plan types are:

Purchased Plans

  • "LITE" - only table data in the output
  • "FULL" - table and text data in the output
  • "EXTRA" - table and text data, along with cell & word coordinates and character detection accuracy

Promotional Plans: Any plan other than a Purchased plan is promotional

  • "free_trial", "camelotpro" - these are promotional API Keys that give only table data, equivalent to the "LITE" plan type

The table below explains each object in the output.

| Key Name | Parent | Type | Description | Availability |
| --- | --- | --- | --- | --- |
| JobStatus | Job | String | Status of the triggered process | ALL Plans |
| Pages | Job | Integer | Number of pages processed in the request | ALL Plans |
| Tables | Job | Array | List of all tables found | ALL Plans |
| Tables[0].Page | Table | Integer | Page number in which the table is found | ALL Plans |
| Tables[0].CharacterConfidence | Table | Decimal | Accuracy of Characters recognized from the image | ALL Plans |
| Tables[0].LayoutConfidence | Table | Decimal | Accuracy of table layout's design decision | ALL Plans |
| Tables[0].TableJson | Table | Json/dict | Table Cell Text in key-value format with index orientation - {row#: {col#: <str>}} | ALL Plans |
| Tables[0].TableCoordinates | Table | Json/dict | Top-left & Bottom-right Cell Coordinates - {row#: {col#: <list(x1,y1,x2,y2)>}} | EXTRA Plan |
| Tables[0].TableConfidence | Table | Json/dict | Cell level accuracy of detected characters - {row#: {col#: <float>}} | EXTRA Plan |
| Lines | Job | Array | List of page-wise lines text | FULL, EXTRA |
| Lines[0].Page | Page | Integer | Page number in which the lines are found | FULL Plan |
| Lines[0].CharacterConfidence | Page | Decimal | Average Accuracy of all Characters recognized from the input-page | FULL Plan |
| Lines[0].LinesArray | Page | Array | Ordered list of lines of the page | |
| Lines[0].LinesArray[0].Line | Line | String | Detected text of the complete line | FULL Plan |
| Lines[0].LinesArray[0].WordsArray | Line | Array | Word level details in this line | EXTRA Plan |
| Lines[0].LinesArray[0].WordsArray[0].Conf | Word | Decimal | Accuracy of recognized characters of the word | EXTRA Plan |
| Lines[0].LinesArray[0].WordsArray[0].Word | Word | String | Detected text of the word | EXTRA Plan |
| Lines[0].LinesArray[0].WordsArray[0].Loc | Word | Array | Top-left & Bottom-right coordinates, w.r.t the input-page width-height dimensions | EXTRA Plan |
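
As an illustration of walking this nested structure, the sketch below collects words whose "Conf" falls under a caller-chosen threshold, which can help flag cells for manual review. It assumes an EXTRA-plan response, uses the "LinesArray" and "WordsArray" key names from the annotated response format above, and makes no assumption about the scale of "Conf"; `low_confidence_words` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def low_confidence_words(
    response: Dict[str, Any], threshold: float
) -> List[Tuple[int, str, float]]:
    """Return (page, word, confidence) for every word with Conf below the threshold."""
    flagged = []
    # "Lines" is absent on plans without text data, so default to an empty list.
    for page in response.get("Lines", []):
        for line in page.get("LinesArray", []):
            # "WordsArray" is only present when word-level details are available.
            for word in line.get("WordsArray", []):
                if word["Conf"] < threshold:
                    flagged.append((page["Page"], word["Word"], word["Conf"]))
    return flagged
```

On a response without word-level details the function simply returns an empty list.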

Additional resources - transform response to a CSV

How to transform the "TableJson" to csv?

ResponseJSON["Tables"] is an array that contains the tabular JSON data of all tables extracted from the file.

ResponseJSON["Tables"][0].TableJson is a dictionary with the row number as key and the column data as value.

The resources below make it easier to convert the JSON response into a CSV file.

| Language | Library | Helpful Link |
| --- | --- | --- |
| Javascript | json2csv - Open Source library | https://github.com/zemirco/json2csv |
| Python | standard csv library | https://docs.python.org/3/library/csv.html#csv.DictReader |
| Python | pandas - orient="index" | https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html |
| Node | jsonexport - Open Source library | https://github.com/kauegimenes/jsonexport |
| Ruby | json2csv - Open Source library | https://github.com/korczis/json2csv |
| PHP | developer hack | https://stackoverflow.com/a/20667637/6041169 |
| GO | standard csv library | usage - https://stackoverflow.com/a/16482056/6041169 |
| R | rjson library to convert json to csv | usage - https://www.tutorialspoint.com/r/r_json_files.htm |
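
For Python, the conversion can also be sketched with the standard csv library alone. The example assumes "TableJson" has the {row#: {col#: <str>}} shape described above, with row and column keys that parse as integers (JSON object keys arrive as strings); `table_json_to_csv` is a hypothetical helper, and cells missing from a row are written as empty strings.

```python
import csv
import io
from typing import Dict


def table_json_to_csv(table_json: Dict[str, Dict[str, str]]) -> str:
    """Convert one TableJson dict ({row#: {col#: text}}) into CSV text."""
    # Sort numerically so that row "10" comes after row "2".
    row_keys = sorted(table_json, key=int)
    col_keys = sorted({col for row in table_json.values() for col in row}, key=int)

    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    for row_key in row_keys:
        row = table_json[row_key]
        writer.writerow([row.get(col, "") for col in col_keys])
    return buffer.getvalue()
```

To save the first table of a response, write `table_json_to_csv(response["Tables"][0]["TableJson"])` to a file opened with `newline=""`.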

Do you have website recommendations to VIEW the received JSON response as a CSV?

https://json-csv.com

Let's START this