Scraping information from a PDF

Peter · April 19, 2021, 11:37am

Hi all,

Does anyone know of a widget/Pipe/API we can use to scrape a PDF (in this case a CV) for information like address, telephone number, etc.?
Has anyone in the Tadabase user community added this feature to a Tadabase application?

Thanks and kind regards
Peter

dtellogaete · April 19, 2021, 4:50pm

Hi Peter,
I think that is possible if you use the Tadabase API. I have used Python libraries for scraping data.
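As a rough illustration of the Python route, here is a minimal sketch. It assumes the CV is a text-based PDF (a scanned CV would need OCR first) and that the third-party `pypdf` package is installed. The regex patterns are deliberately simple, hypothetical examples: they will miss some formats and can produce false positives (a date range like "2015 - 2019" can look like a phone number). Postal addresses are much harder to pick out reliably and are not attempted here. Sending the results on to Tadabase is also not shown.

```python
import re


def extract_pdf_text(path: str) -> str:
    """Extract plain text from a text-based PDF (scanned PDFs need OCR instead).

    Assumes the third-party `pypdf` package is installed (pip install pypdf).
    """
    from pypdf import PdfReader  # imported lazily so the parser below works without pypdf

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Hypothetical, deliberately simple patterns; real CVs vary widely in layout.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Optional leading '+', then at least 9 digits/spaces/brackets/dots/hyphens, ending in a digit.
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def parse_contact_fields(text: str) -> dict:
    """Return the first email address and phone-like number found in the text."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0).strip() if phone else None,
    }
```

Usage would look like `parse_contact_fields(extract_pdf_text("cv.pdf"))`, where `cv.pdf` is a placeholder path.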

tim.young · April 21, 2021, 4:06am

There are a lot of OCR & text parsing services out there.

https://docparser.com/ is great. It’s super simple to use (I’ve used it with Integromat), but it’s expensive.

Other options like AWS Textract or Google Cloud Vision are nice, but it’s much more difficult to get them to return specific pieces of a document.
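To show why that is, here is a minimal sketch using AWS Textract from Python. It assumes `boto3` is installed and AWS credentials and a region are configured. The synchronous `detect_document_text` call handles single-page documents; multi-page PDFs go through the asynchronous `start_document_text_detection` flow via S3, which is not shown. Textract hands back raw lines of text, so picking out a field such as a phone number is still up to you; the `find_line` helper below is a hypothetical example that only works when the CV uses "Label: value" lines. Google Cloud Vision is not covered.

```python
import re


def textract_lines(response: dict) -> list:
    """Collect the text of LINE blocks from a Textract DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_lines(document_bytes: bytes) -> list:
    """Run Textract text detection on a single-page document passed as bytes.

    Assumes boto3 is installed and AWS credentials/region are configured.
    """
    import boto3  # imported lazily so the parsing helpers work without boto3

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": document_bytes})
    return textract_lines(response)


def find_line(lines: list, label: str):
    """Hypothetical helper: return the value after 'label:' on the first matching line."""
    pattern = re.compile(rf"^{re.escape(label)}\s*:\s*(.+)$", re.IGNORECASE)
    for line in lines:
        match = pattern.match(line)
        if match:
            return match.group(1).strip()
    return None
```

A call such as `find_line(detect_lines(data), "Phone")` returns `None` whenever the CV labels the field differently, which is exactly the kind of per-document parsing that a tool like Docparser handles for you.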