1. # apt install tesseract-ocr
2. $ tesseract input.jpg outputfile   (the input can be any supported image format, not just .jpg)
3. $ cat outputfile.txt
You have now extracted the text from an image into a .txt file.
The .txt-extension is added by tesseract.
Yes, it IS that easy.
To OCR a PDF file, first convert it with ImageMagick (convert input.pdf output.tiff), then feed the .tiff file to tesseract.
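For scanned PDFs, the two steps can be wrapped in a small shell function. This is only a rough sketch: ocr_pdf is a made-up helper name, and the -density 300 setting is my own assumption (higher-resolution rasters tend to OCR better), not something tesseract requires.

```shell
#!/bin/sh
# Sketch: OCR a scanned PDF by converting it to TIFF, then running tesseract.
# ocr_pdf is a hypothetical helper name, not part of tesseract or ImageMagick.
ocr_pdf() {
    pdf="$1"
    base="${pdf%.pdf}"   # strip the .pdf suffix to get the output base name

    # Bail out early if the input file does not exist.
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }

    # Rasterize the PDF at 300 DPI (assumed value) into a TIFF.
    convert -density 300 "$pdf" "$base.tiff" || return 1

    # tesseract adds the .txt extension to the output base name itself.
    tesseract "$base.tiff" "$base" || return 1

    cat "$base.txt"
}
```

Calling ocr_pdf newsletter.pdf (file name made up) would leave newsletter.tiff and newsletter.txt next to the PDF and print the recognized text.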
@vi If I've got a PDF, but yeah. That's the solution for some of our internal newsletters that are printed, signed, and then scanned to be sent around via email. :)