Parse a large PDF file using Java

Hi, I tried GroupDocs Parser 20.5 for Java and found it significantly slower than the free Tika parser.
For example, my PDF file is 110 MB with more than 7000 pages.
GroupDocs Parser took more than 500 s to parse this file, compared to only 48 s for Tika.
Also, why does the text returned by GroupDocs Parser include the title page again at the end (duplicated two or three times)?
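
One way to confirm the duplicated title page is to count how often a distinctive line from it appears in the extracted string. The following is only a minimal sketch using plain Java; the class name `DuplicateCheck`, the method `countOccurrences`, and the sample text are hypothetical and not part of GroupDocs or Tika.

```java
public class DuplicateCheck {

    /** Counts non-overlapping occurrences of needle in haystack. */
    public static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0; // an empty needle would otherwise match forever
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length(); // continue searching after this match
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the parser output; use the real extracted text here.
        String extracted = "Title\nbody text\nTitle\nTitle\n";
        System.out.println(countOccurrences(extracted, "Title")); // prints 3
    }
}
```

A count above 1 for a line that occurs only once in the PDF would suggest the duplication comes from the extraction step.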

Example Code with GroupDocs
Parser parser = new Parser("sample");
TextReader reader = parser.getText();
String text = reader.readToEnd();
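
To make timings like the ones above reproducible, the extraction call can be wrapped in a small timing helper. This is a rough sketch under assumptions: `ExtractionTimer`, `Timed`, and `time` are hypothetical names, and the lambda is a stand-in where the actual GroupDocs or Tika call would go. It measures wall-clock time for a single run only.

```java
import java.util.function.Supplier;

public class ExtractionTimer {

    /** Holds the extracted text and the wall-clock time spent producing it. */
    public static final class Timed {
        public final String text;
        public final long millis;

        Timed(String text, long millis) {
            this.text = text;
            this.millis = millis;
        }
    }

    /** Runs the extractor once and measures elapsed time with System.nanoTime(). */
    public static Timed time(Supplier<String> extractor) {
        long start = System.nanoTime();
        String text = extractor.get();
        long millis = (System.nanoTime() - start) / 1_000_000;
        return new Timed(text, millis);
    }

    public static void main(String[] args) {
        // Stand-in extractor (assumption): replace the lambda body with the real parser call.
        Timed result = time(() -> "placeholder text");
        System.out.println(result.text.length() + " chars in " + result.millis + " ms");
    }
}
```

Running each parser through the same helper on the same file, ideally several times, gives numbers that are easier to compare than one-off measurements.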

I hope you can improve Parser’s performance.

The sample file mentioned above: data-mining.pdf (Google Drive)


@shockvip1331,

We’re investigating this scenario. Your investigation ticket ID is PARSERJAVA-152. You’ll be notified as soon as there is an update.

The issues you have found earlier (filed as PARSERJAVA-152) have been fixed in this update. This message was posted using Bugs notification tool by Atir_Tahir