Hello,
we updated GroupDocs.Parser from version 22.6.0 to 23.8.0. After this update, images extracted from PDF files have worse quality. This is a problem for us because we use third-party OCR technology to extract text from these images, and we now get worse text extraction results.
I observed this problem on several files. Here is one so you can test it yourself.
TestFile.pdf (2.5 MB)
I created a GroupDocs.Parser object from that document, extracted the PageImageArea with a GroupDocsParser.GetImages() call, and then saved it using PageImageArea.Save(). The code is the same in both cases, but the resulting .jpeg images differ: the one from version 22.6.0 is 2.6 MB, while the one from version 23.8.0 is 462 KB.
ExtractedImages.zip (2.9 MB)
The difference does not arise when saving the images, because the extracted PageImageArea itself is already different (its size in bytes differs between the two versions).
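In case it helps with reproducing this, here is a minimal sketch of how the two saved files can be compared byte for byte. It is plain Python and does not use GroupDocs.Parser at all; the function names and the file names in the usage note are hypothetical and only serve as an illustration.

```python
import hashlib
from pathlib import Path


def describe(path):
    """Return (size_in_bytes, sha256_hex) for one file."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def compare(old_path, new_path):
    """Compare two extracted images by size and content hash."""
    old_size, old_hash = describe(old_path)
    new_size, new_hash = describe(new_path)
    return {
        "old_size": old_size,
        "new_size": new_size,
        # True only if both files contain exactly the same bytes
        "identical": old_hash == new_hash,
        # Size of the new file relative to the old one (None for an empty old file)
        "size_ratio": new_size / old_size if old_size else None,
    }
```

Calling compare("image_22.6.0.jpeg", "image_23.8.0.jpeg") (hypothetical file names for the two images in the attached archive) returns both sizes, their ratio, and whether the contents are identical.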
Why was this changed? I could not find anything about it in the changelogs. Is it a bug?
Thank you