Scanned PDF cannot be viewed as text after OCR conversion

sumit.shrivastava · October 27, 2014, 7:09am

Hi,

I have a scanned image saved as a PDF, and I upload it to my application with the following steps:

1) Upload the scanned image.
2) The scanned image is processed with OCR and the document is saved in the database.
3) I preview the document in GroupDocs, but it displays the document as an image instead of text.

But when I download the same document, it has been processed with OCR and I can search the text in other software such as Adobe Reader.

Also, please find the document attached below.
Please respond ASAP.

denisgvardionov · October 27, 2014, 7:39am

Hello Sumit,

We’ve downloaded and investigated the document that you attached to the forum post. The “scannedfiles.zip” archive contains one document, “Scanned-OCRDoc(1).pdf” (62 067 bytes). This document doesn’t have a text layer: in Adobe Reader, Adobe Acrobat and all the other PDF viewers that we tried, it was not possible to select text, search it and so on.

Please send us the document that you obtain after the OCR process, the one that has a text layer.
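
For anyone who wants to check a file before uploading it, below is a minimal sketch of a text-layer test in Python, using only the standard library. It is a rough heuristic, not how GroupDocs.Viewer or Adobe Reader detect text: it decompresses Flate-encoded streams and looks for the PDF text-showing operators `Tj` / `TJ` inside a `BT` block. The function name `has_text_layer` is hypothetical, and the check ignores object streams, encryption and filters other than Flate, so it can give wrong answers on some files.

```python
import re
import zlib

# Body of every "stream ... endstream" section in the raw PDF bytes.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A string or array operand followed by a text-showing operator (Tj or TJ).
TEXT_OP_RE = re.compile(rb"[)\]>]\s*(?:Tj|TJ)\b")


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic (hypothetical helper): True if any stream shows text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # Most content streams are Flate-compressed.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            # Not Flate data: fall back to inspecting the raw bytes.
            pass
        if b"BT" in data and TEXT_OP_RE.search(data):
            return True
    return False


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        found = has_text_layer(handle.read())
    print("text layer found" if found else "no text layer found")
```

Run it as `python check_text_layer.py Scanned-OCRDoc.pdf` (the script name is only an example). A file that contains nothing but a scanned page image should report no text layer, while the output of an OCR step that adds searchable text should report one.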

Thanks. We look forward to receiving the document.

sumit.shrivastava · October 27, 2014, 7:47am

Hi,

Please find both documents attached: the scanned one and the one with searchable text.

denisgvardionov · October 28, 2014, 3:39pm

Hello Sumit,

Thank you again for the uploaded documents. Yes, there is a text layer in the Scanned-OCRDoc.pdf file. As for GroupDocs.Viewer, the situation is very interesting. In the new HTML‑based rendering mode, GroupDocs.Viewer cannot extract the text: you cannot select it, and search does not work. In the image‑based rendering mode, however, both of these functions work, as you can see in the screenshot.

[Screenshot: GroupDocs.Viewer rendering comparison]

For now, we suggest that you use the image‑based mode. On our side, our developers have begun investigating the document. We will notify you in this forum thread when there is new information.

Thanks.