extracttable.com

ExtractTable API extracts structured tabular data from images or PDF files. The motivation is to free developers from worrying about the table area, column or row coordinates, rotation, and similar details of the input.

If you are a Python developer, check out the official pip library ExtractTable-py - it takes only 3 lines of code to get the output.

python

from ExtractTable import ExtractTable
et_sess = ExtractTable(api_key=YOUR_API_KEY)    # Replace with your VALID API Key
print(et_sess.check_usage())                    # Validates the API Key & shows credits usage
table_data = et_sess.process_file(filepath=Location_of_Image_with_Tables, output_format="df")

Prerequisite

The service is authenticated with an API key. Grab some FREE credits here for a trial, or you are most welcome to BUY credits here. ExtractTable.com offers the best accuracy, the lowest $/credit with the longest validity, and a credit refund on bad outputs (literally, no one else does this). For any information, assistance, or troubleshooting, feel free to email me at saradhi@extracttable.com. A detailed competitive comparison is available here.

Authentication

How/Where to use the API Key?

For API usage - the key must be passed in the request headers as "x-api-key" in every request

For Web usage - go to Web-PRO and provide the API Key when prompted
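As a minimal sketch of the API usage rule above, the snippet below attaches the key as the "x-api-key" header using only Python's standard library. The helper name `with_api_key` is hypothetical and nothing is sent over the network; it only builds the request object.

```python
import urllib.request

API_KEY_HEADER = "x-api-key"  # header name required on every request


def with_api_key(url: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request that carries the API key in its headers."""
    if not api_key:
        raise ValueError("an API key is required for every request")
    return urllib.request.Request(url, headers={API_KEY_HEADER: api_key}, method=method)


# "YOUR_API_KEY" is a placeholder, replace it with a valid key.
req = with_api_key("https://validator.extracttable.com/", "YOUR_API_KEY")
print(req.get_method(), req.full_url, req.header_items())
```

Passing the request to `urllib.request.urlopen` would actually send it.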

API Request - Common mistakes

Firstly, note that there are 3 different API endpoints - every endpoint needs a valid API Key passed in the header as “x-api-key”

Usage	Endpoint	Request Methods	Must contain	Common Mistakes
Validate API Key	https://validator.extracttable.com/	GET	a valid API Key passed in the request header as “x-api-key”	missing API key in request headers
Trigger a request	https://trigger.extracttable.com/	POST	file data (not file location) in body as “input” of multipart/form-data	requesting as x-www-form-urlencoded
Retrieve the result	https://getresult.extracttable.com/	GET	JobId, received in the trigger response	Invalid JobId / POST instead of GET
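The trigger and retrieve rows of the table can be sketched with the standard library as below. This is an illustration under assumptions, not the official client: the function names are hypothetical, the file part is labelled `application/octet-stream`, and the JobId is assumed to travel as a query parameter named "JobId". The requests are only built here; `send` is defined but never called.

```python
import json
import urllib.parse
import urllib.request
import uuid

TRIGGER_URL = "https://trigger.extracttable.com/"
RESULT_URL = "https://getresult.extracttable.com/"


def build_trigger_request(api_key: str, filename: str, file_bytes: bytes) -> urllib.request.Request:
    """POST the file *data* as the multipart/form-data field named "input"."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="input"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",  # assumed part type
        file_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "x-api-key": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(TRIGGER_URL, data=body, headers=headers, method="POST")


def build_result_request(api_key: str, job_id: str) -> urllib.request.Request:
    """GET the result; passing JobId as a query parameter is an assumption."""
    query = urllib.parse.urlencode({"JobId": job_id})
    return urllib.request.Request(f"{RESULT_URL}?{query}", headers={"x-api-key": api_key}, method="GET")


def send(request: urllib.request.Request) -> dict:
    """Send a built request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholders only: the key, file name, bytes, and job id are not real values.
trigger = build_trigger_request("YOUR_API_KEY", "invoice.png", b"raw image bytes")
result = build_result_request("YOUR_API_KEY", "hypothetical-job-id")
print(trigger.get_method(), trigger.full_url)
print(result.get_method(), result.full_url)
```

Building the multipart body by hand avoids the two mistakes listed in the table: sending a file location instead of file data, and sending the body as x-www-form-urlencoded.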

What are the common error codes & How to resolve?

Code	Description	ProTip
400	Bad Request - Requirements not matched	Endpoint is expecting query params or body, but not sent in request
401	Invalid "x-api-key" found in headers	Check 'x-api-key' value in the headers
403	"x-api-key" is not found in headers	Attach APIKey as 'x-api-key' in the headers
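One way to surface these hints in client code is a small lookup keyed by status code. The sketch below only restates the table; the names `ERROR_HINTS` and `explain_status` are hypothetical.

```python
# Hints paraphrased from the error-code table; other codes are not documented here.
ERROR_HINTS = {
    400: "Bad Request: the endpoint expected query params or a body that was not sent.",
    401: 'Invalid "x-api-key": check the value sent in the headers.',
    403: 'Missing "x-api-key": attach the API key to the request headers.',
}


def explain_status(code: int) -> str:
    """Return a troubleshooting hint for an HTTP status code."""
    return ERROR_HINTS.get(code, f"Status {code} is not covered by the documented error codes.")


for status in (400, 401, 403, 500):
    print(status, "->", explain_status(status))
```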

API response - Explained

What are all the possible responses?

Every triggered job has one of the 4 statuses below

JobStatus	Description
`Success`	Process completed. Check the response for tables
`Failed`	Process Failed, No Credits used
`Processing`	Still in process, use "JobId" to retrieve the output later
`Incomplete`	Process finished, but not all pages were processed. Partial output
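Because a job can come back as `Processing`, a client usually retries the retrieve call until it sees one of the other three statuses. Below is a minimal polling sketch. The `fetch` callable, retry count, and delay are assumptions rather than documented recommendations, and the demo uses a fake `fetch` so nothing touches the network.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"Success", "Failed", "Incomplete"}


def wait_for_result(
    fetch: Callable[[str], Dict[str, Any]],
    job_id: str,
    max_tries: int = 5,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch(job_id) until the job leaves "Processing" or tries run out."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    response: Dict[str, Any] = {}
    for attempt in range(max_tries):
        response = fetch(job_id)
        if response.get("JobStatus") in TERMINAL_STATUSES:
            return response
        if attempt < max_tries - 1:
            sleep(delay_seconds)
    return response  # still "Processing" after max_tries attempts


# Fake retrieve call: first reply is still processing, second one is done.
_replies = iter([{"JobStatus": "Processing"}, {"JobStatus": "Success", "Tables": []}])


def fake_fetch(job_id: str) -> Dict[str, Any]:
    return next(_replies)


final = wait_for_result(fake_fetch, "hypothetical-job-id", delay_seconds=0)
print(final["JobStatus"])
```

In real use, `fetch` would wrap the GET call to the retrieve endpoint with the JobId from the trigger response.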

What is the output of the process?

The output response follows the format below; which objects are present depends on the API Key Plan type.

javascript

{
    "JobStatus": <string>,                              # Status of the triggered Process  @ JOB-LEVEL
    "Pages": <integer>,                                 # Number of pages processed in this request @ PAGE-LEVEL
    "Tables": [<list of key-value objects of table>     # List of all tables found @ TABLE-LEVEL
        {
            "Page": <integer>,                              ## Page number in which this table is found
            "CharacterConfidence": <float>,                 ## Accuracy of Characters recognized from the input-page
            "LayoutConfidence": <float>,                    ## Accuracy of table layout's design decision
            "TableJson": <dict>,                            ## Table Cell Text in key-value format with index orientation - {row#: {col#: <str>}}
            "TableCoordinates": <dict>,                     ## Top-left & Bottom-right Cell Coordinates - {row#: {col#: <list(x1,y1,x2,y2)>}}
            "TableConfidence": <dict>                       ## Cell level accuracy of detected characters - {row#: {col#: <float>}}
        },
    {...}                                               ## ... more "Tables" objects
    ],
    "Lines": [<list of key-value objects>               # Pagewise Line details @ PAGE-LEVEL
        {
            "Page": <integer>,                          # Page number in which the lines are found
            "CharacterConfidence": <float>,             # Average Accuracy of all Characters recognized from the input-page
            "LinesArray": [
                <list of key-value objects of line>     # Ordered list of lines in this page @ LINE-LEVEL
                {
                    "Line": <str>,                          ## Detected text of the complete line
                    "WordsArray": [
                        <list of key-value objects>         ## Word level details in this line @ WORD-LEVEL
                        {
                            "Conf": <float>,                    ### Accuracy of recognized characters of the word
                            "Word": <str>,                      ### Detected text of the word
                            "Loc": [x1, y1, x2, y2]             ### Top-left & Bottom-right coordinates, w.r.t the input-page width-height dimensions
                        },
                    {...}                                   ### More "WordsArray" objects
                    ]
                },
            {...}                                       ## More "LinesArray" objects
            ]
        },
    {...}                                               # More Pagewise "Lines" details
    ]
}
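To make the nesting concrete, here is a small sketch that walks a response shaped like the format above: it counts tables per page and joins the detected lines of each page. The sample values are invented for illustration, and the helper names are hypothetical. Keys such as "Lines" are read defensively because they may be absent from a response.

```python
from typing import Any, Dict

# Invented sample that follows the documented shape; the values are not real output.
sample_response: Dict[str, Any] = {
    "JobStatus": "Success",
    "Pages": 1,
    "Tables": [
        {
            "Page": 1,
            "CharacterConfidence": 0.98,
            "LayoutConfidence": 0.91,
            "TableJson": {"0": {"0": "Item", "1": "Qty"}, "1": {"0": "Pen", "1": "2"}},
        }
    ],
    "Lines": [
        {
            "Page": 1,
            "CharacterConfidence": 0.97,
            "LinesArray": [{"Line": "Invoice 001"}, {"Line": "Item Qty"}],
        }
    ],
}


def tables_per_page(response: Dict[str, Any]) -> Dict[int, int]:
    """Count how many tables were found on each page."""
    counts: Dict[int, int] = {}
    for table in response.get("Tables", []):
        counts[table["Page"]] = counts.get(table["Page"], 0) + 1
    return counts


def text_per_page(response: Dict[str, Any]) -> Dict[int, str]:
    """Join the detected lines of each page; empty when "Lines" is absent."""
    return {
        page["Page"]: "\n".join(line["Line"] for line in page.get("LinesArray", []))
        for page in response.get("Lines", [])
    }


print(tables_per_page(sample_response))
print(text_per_page(sample_response))
```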

Output objects depend on the API Key Plan type. The available plan types are:

Purchased Plans

"LITE" - only table data in the output

"FULL" - table and text data in the output

"EXTRA" - table and text data, along with cell & word coordinates and character detection accuracy

Promotional Plans: any plan other than the Purchased Plans is promotional

"free_trial", "camelotpro" - these are promotional API Keys; they give only table data, equivalent to the "LITE" plan type

The object-level details of the output are explained below

Key Name	Parent	Type	Description	Availability
JobStatus	Job	String	Status of the triggered process	ALL Plans
Pages	Job	Integer	Number of pages processed in the request	ALL Plans
Tables	Job	Array	List of all tables found	ALL Plans
Tables[0].Page	Table	Integer	Page number in which the table is found	ALL Plans
Tables[0].CharacterConfidence	Table	Decimal	Accuracy of Characters recognized from the image	ALL Plans
Tables[0].LayoutConfidence	Table	Decimal	Accuracy of table layout's design decision	ALL Plans
Tables[0].TableJson	Table	Json/dict	Table Cell Text in key-value format with index orientation - {row#: {col#: <str>}}	ALL Plans
Tables[0].TableCoordinates	Table	Json/dict	Top-left & Bottom-right Cell Coordinates - {row#: {col#: <list(x1,y1,x2,y2)>}}	EXTRA Plan
Tables[0].TableConfidence	Table	Json/dict	Cell level accuracy of detected characters - {row#: {col#: <float>}}	EXTRA Plan
Lines	Job	Array	List of page-wise lines text	FULL, EXTRA
Lines[0].Page	Page	Integer	Page number in which the lines are found	FULL, EXTRA
Lines[0].CharacterConfidence	Page	Decimal	Average Accuracy of all Characters recognized from the input-page	FULL, EXTRA
Lines[0].LinesArray	Page	Array	Ordered list of lines of the page	FULL, EXTRA
Lines[0].LinesArray[0].Line	Line	String	Detected text of the complete line	FULL, EXTRA
Lines[0].LinesArray[0].WordsArray	Line	Array	Word level details in this line	EXTRA Plan
Lines[0].LinesArray[0].WordsArray[0].Conf	Word	Decimal	Accuracy of recognized characters of the word	EXTRA Plan
Lines[0].LinesArray[0].WordsArray[0].Word	Word	String	Detected text of the word	EXTRA Plan
Lines[0].LinesArray[0].WordsArray[0].Loc	Word	Array	Top-left & Bottom-right coordinates, w.r.t the input-page width-height dimensions	EXTRA Plan
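Since some keys are only present on the EXTRA plan, client code should not assume they exist. The sketch below flags low-confidence cells using "TableConfidence" when it is available and quietly returns nothing otherwise. The helper name, the sample values, and the threshold are assumptions; the scale of the confidence values is not stated here, so pick a threshold that matches what the API returns for you.

```python
from typing import Any, Dict, List, Tuple


def low_confidence_cells(table: Dict[str, Any], threshold: float) -> List[Tuple[str, str, str]]:
    """Return (row, col, text) for cells whose confidence is below threshold.

    Returns an empty list when "TableConfidence" is absent (non-EXTRA plans).
    """
    confidences = table.get("TableConfidence")
    if not confidences:
        return []
    text = table.get("TableJson", {})
    flagged = []
    for row, cols in confidences.items():
        for col, conf in cols.items():
            if conf < threshold:
                flagged.append((row, col, text.get(row, {}).get(col, "")))
    return flagged


# Invented tables: one shaped like EXTRA output, one like LITE output.
extra_table = {
    "Page": 1,
    "TableJson": {"0": {"0": "Item", "1": "Qty"}},
    "TableConfidence": {"0": {"0": 0.99, "1": 0.42}},
}
lite_table = {"Page": 1, "TableJson": {"0": {"0": "Item", "1": "Qty"}}}

print(low_confidence_cells(extra_table, threshold=0.5))
print(low_confidence_cells(lite_table, threshold=0.5))
```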

Additional resources - transform response to a CSV

How to transform the "TableJson" to csv?

ResponseJSON["Tables"], an array, contains the tabular json data of all tables extracted from the file.

ResponseJSON["Tables"][0].TableJson is a dictionary with row number as key and column data as values.

Below are some helpful resources to make it easier for you to convert the JSON response into a CSV file.

Language	Library	Helpful Link
Javascript	json2csv - Open Source library	https://github.com/zemirco/json2csv
Python	standard csv library	https://docs.python.org/3/library/csv.html#csv.DictReader
Python	pandas - orient="index"	https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html
Node	jsonexport - Open Source library	https://github.com/kauegimenes/jsonexport
Ruby	json2csv - Open Source library	https://github.com/korczis/json2csv
PHP	developer hack	https://stackoverflow.com/a/20667637/6041169
GO	standard csv library	usage - https://stackoverflow.com/a/16482056/6041169
R	rjson library to convert json to csv	usage - https://www.tutorialspoint.com/r/r_json_files.htm
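For Python, the conversion can also be done with the standard csv library alone. The sketch below assumes the row and column keys of "TableJson" are integer-like strings (as they would be after JSON decoding) and fills any missing cell with an empty string. The function name and the sample table are hypothetical.

```python
import csv
import io
from typing import Dict


def table_json_to_csv(table_json: Dict[str, Dict[str, str]]) -> str:
    """Convert {row#: {col#: text}} into CSV text, ordered by numeric index."""
    # Union of column keys, so ragged rows still line up.
    columns = sorted({col for row in table_json.values() for col in row}, key=int)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row_key in sorted(table_json, key=int):
        row = table_json[row_key]
        writer.writerow([row.get(col, "") for col in columns])
    return buffer.getvalue()


# Invented sample in the documented orientation.
sample_table_json = {
    "0": {"0": "Item", "1": "Qty"},
    "1": {"0": "Pen, blue", "1": "2"},
}
csv_text = table_json_to_csv(sample_table_json)
print(csv_text)
```

Sorting with `key=int` keeps row "10" after row "9", which a plain string sort would not do. To write a file, open it with `newline=""` and pass the handle to `csv.writer` instead of the in-memory buffer.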

Do you have website recommendations to VIEW the received JSON response as a CSV?