The ExtractTable API extracts structured tabular data from images and PDF files. The goal is to free developers from worrying about the table area, column or row coordinates, rotation, and similar details in the input.
If you are a Python developer, check out the official pip library, ExtractTable-py, which needs only 3 lines of code to get the output.
python
from ExtractTable import ExtractTable

et_sess = ExtractTable(api_key=YOUR_API_KEY)  # Replace YOUR_API_KEY with your valid API key
print(et_sess.check_usage())  # Validates the API key and shows credit usage
table_data = et_sess.process_file(filepath=Location_of_Image_with_Tables, output_format="df")
Prerequisite
The service is authenticated with an API key. Grab some FREE credits here for a trial, or BUY credits here. ExtractTable.com offers the best accuracy, the lowest $/credit with the longest validity, and a credit refund on bad outputs (literally, no one else does this). For any information, assistance, or troubleshooting, feel free to email me at saradhi@extracttable.com. A detailed competitive comparison is available here.
Authentication
How/Where to use the API Key?
For API usage - pass the key in the "x-api-key" header of every request
For Web usage - go to Web-PRO and provide the API Key when prompted
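As a minimal sketch of the API case, the helper below builds the header dictionary that would accompany each request. `auth_headers` is a hypothetical name used for illustration and is not part of any ExtractTable library; only the "x-api-key" header name comes from the text above.

```python
def auth_headers(api_key: str) -> dict:
    """Build the headers to send with every API request."""
    if not api_key or not api_key.strip():
        raise ValueError("API key is empty; every request needs an 'x-api-key' header")
    # The header name is fixed by the service; the value is your own key.
    return {"x-api-key": api_key.strip()}


headers = auth_headers("YOUR_API_KEY")
print(headers)
```

The resulting dictionary can be handed to whichever HTTP client you use, for example through the `headers=` argument of `requests.post`.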
API Request - Common mistakes
First, note that there are 3 different API endpoints, and every endpoint requires a valid API key passed in the "x-api-key" header.
The endpoint expects query params or a body, but none were sent in the request
401 - Invalid "x-api-key" found in headers. Check the "x-api-key" value in the headers
403 - "x-api-key" is not found in headers. Attach the API key as "x-api-key" in the headers
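The two authentication failures above could be turned into readable hints on the client side. This is only a sketch: `explain_auth_error` and the message wording are assumptions, and only the 401 and 403 meanings are taken from this section.

```python
# Hints for the two documented authentication failures; other codes are not covered.
AUTH_ERROR_HINTS = {
    401: 'Invalid "x-api-key" in headers: check the key value.',
    403: '"x-api-key" missing from headers: attach your API key.',
}


def explain_auth_error(status_code: int) -> str:
    """Return a hint for a documented auth failure, or a generic fallback."""
    return AUTH_ERROR_HINTS.get(
        status_code, f"No authentication hint for HTTP {status_code}."
    )


for code in (401, 403):
    print(code, explain_auth_error(code))
```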
API response - Explained
What are all the possible responses?
Every triggered job has one of the following 4 statuses:
| JobStatus | Description |
| --- | --- |
| Success | Process completed. Check the response for tables |
| Failed | Process failed. No credits used |
| Processing | Still in process. Use "JobId" to retrieve the output later |
| Incomplete | Process finished, but not all pages were processed. Partial output |
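One way a client might branch on these four values is sketched below. The function name and the returned action labels are hypothetical, and the sketch assumes "JobId" sits at the top level of the response, which this section does not confirm.

```python
def next_step(response: dict) -> str:
    """Map a response's JobStatus to a suggested client action (labels are illustrative)."""
    status = response.get("JobStatus")
    if status == "Success":
        return "read all tables"
    if status == "Incomplete":
        return "read partial tables"
    if status == "Processing":
        # Assumption: the job identifier is exposed as a top-level "JobId" key.
        job_id = response.get("JobId", "<unknown>")
        return f"retry later with JobId {job_id}"
    if status == "Failed":
        return "report failure (no credits used)"
    raise ValueError(f"unexpected JobStatus: {status!r}")


print(next_step({"JobStatus": "Processing", "JobId": "abc123"}))
```

Raising on an unknown status keeps a typo or a new status value from being silently treated as success.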
What is the output of the process?
The response follows the format below; which objects are included depends on the API key's plan type.
javascript
{
  "JobStatus": <string>,   # Status of the triggered Process @ JOB-LEVEL
  "Pages": <integer>,      # Number of pages processed in this request @ PAGE-LEVEL
  "Tables": [<list of key-value objects of table>   # List of all tables found @ TABLE-LEVEL
    {
      "Page": <integer>,                ## Page number in which this table is found
      "CharacterConfidence": <float>,   ## Accuracy of Characters recognized from the input-page
      "LayoutConfidence": <float>,      ## Accuracy of table layout's design decision
      "TableJson": <dict>,              ## Table Cell Text in key-value format with index orientation - {row#: {col#: <str>}}
      "TableCoordinates": <dict>,       ## Top-left & Bottom-right Cell Coordinates - {row#: {col#: <list(x1,y1,x2,y2)>}}
      "TableConfidence": <dict>         ## Cell level accuracy of detected characters - {row#: {col#: <float>}}
    },
    {...}                               ## ... more "Tables" objects
  ],
  "Lines": [<list of key-value objects>   # Pagewise Line details @ PAGE-LEVEL
    {
      "Page": <integer>,                # Page number in which the lines are found
      "CharacterConfidence": <float>,   # Average Accuracy of all Characters recognized from the input-page
      "LinesArray": [<list of key-value objects of line>   # Ordered list of lines in this page @ LINE-LEVEL
        {
          "Line": <str>,                ## Detected text of the complete line
          "WordsArray": [<list of key-value objects>   ## Word level details in this line @ WORD-LEVEL
            {
              "Conf": <float>,          ### Accuracy of recognized characters of the word
              "Word": <str>,            ### Detected text of the word
              "Loc": [x1, y1, x2, y2]   ### Top-left & Bottom-right coordinates, w.r.t the input-page width-height dimensions
            },
            {...}                       ### More "WordsArray" objects
          ]
        },
        {...}                           ## More "LinesArray" objects
      ]
    },
    {...}                               # More Pagewise "Lines" details
  ]
}
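To make the nesting concrete, here is a small sketch that reads a response shaped like the format above. The helper names and the sample payload are made up for illustration. The sketch assumes row and column keys are integer-like strings (as JSON object keys would be) and treats "Lines" as optional, since text data is not part of every plan.

```python
def table_json_to_rows(table_json: dict) -> list:
    """Turn {row#: {col#: text}} into a list of rows, ordered by numeric index."""
    rows = []
    for row_key in sorted(table_json, key=int):  # numeric sort so "10" follows "9"
        cols = table_json[row_key]
        rows.append([cols[col_key] for col_key in sorted(cols, key=int)])
    return rows


def line_texts(response: dict) -> list:
    """Collect the detected text of every line, page by page; empty if no "Lines"."""
    texts = []
    for page in response.get("Lines", []):
        for line in page.get("LinesArray", []):
            texts.append(line["Line"])
    return texts


# Made-up payload that follows the documented shape (not real API output).
sample = {
    "JobStatus": "Success",
    "Pages": 1,
    "Tables": [
        {
            "Page": 1,
            "TableJson": {"0": {"0": "Item", "1": "Qty"}, "1": {"0": "Pen", "1": "2"}},
        }
    ],
    "Lines": [
        {"Page": 1, "LinesArray": [{"Line": "Item Qty"}, {"Line": "Pen 2"}]}
    ],
}

for table in sample["Tables"]:
    print(table["Page"], table_json_to_rows(table["TableJson"]))
print(line_texts(sample))
```

Missing cells are skipped rather than padded, so rows can differ in length if the table is ragged.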
The objects included in the output depend on the API key's plan type. The available plan types are:
Purchased Plans
"LITE" - only table data in the output
"FULL" - table and text data in the output
"EXTRA" - table and text data, along with cell & word coordinates and character detection accuracy
Promotional Plans: any plan other than a Purchased plan is promotional
"free_trial", "camelotpro" - these are promotional API keys and give only table data, equivalent to the "LITE" plan type
The object-level details of the output are explained below.
| Key Name | Parent | Type | Description | Availability |
| --- | --- | --- | --- | --- |
| JobStatus | Job | String | Status of the triggered process | ALL Plans |
| Pages | Job | Integer | Number of pages processed in the request | ALL Plans |
| Tables | Job | Array | List of all tables found | ALL Plans |
| Tables[0].Page | Table | Integer | Page number in which the table is found | ALL Plans |
| Tables[0].CharacterConfidence | Table | Decimal | Accuracy of Characters recognized from the image | ALL Plans |
| Tables[0].LayoutConfidence | Table | Decimal | Accuracy of table layout's design decision | ALL Plans |
Tables[0].TableJson
Table
Json/dict
Table Cell Text in key-value format with index orientation - {row#: {col#: <str>}}