Parse a large PDF file using Java

shockvip1331 · May 17, 2020, 3:35am

Hi, I'm trying GroupDocs.Parser for Java 20.5 and I've found it significantly slower than the free Tika parser.
For example, my PDF file is 110 MB with more than 7000 pages.
GroupDocs.Parser took more than 500 s to parse this file, compared to only 48 s for Tika.
Also, the string returned by GroupDocs.Parser includes the title page again at the end (duplicated two or three times). Is that expected?

Example Code with GroupDocs
Parser parser = new Parser("sample");
TextReader reader = parser.getText();
String text = reader.readToEnd();
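
To make the timing and the duplicated title page easier to reproduce, here is a minimal sketch of how both could be checked. It uses only the Java standard library. The class name ExtractionCheck, the helper names, and the "Data Mining" title string are hypothetical; the lambda in main is a placeholder to be replaced by the actual GroupDocs or Tika extraction call.

```java
import java.util.function.Supplier;

// Hypothetical helper for comparing text extractors; not part of GroupDocs or Tika.
final class ExtractionCheck {

    /** Result of one timed run: the extracted text and the elapsed wall-clock time. */
    static final class TimedResult {
        final String text;
        final long elapsedMillis;

        TimedResult(String text, long elapsedMillis) {
            this.text = text;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the given extractor once and measures how long it takes. */
    static TimedResult time(Supplier<String> extractor) {
        long start = System.nanoTime();
        String text = extractor.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new TimedResult(text, elapsedMillis);
    }

    /** Counts non-overlapping occurrences of needle in text. */
    static int countOccurrences(String text, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Placeholder extractor (assumption): swap in the real parser call here.
        TimedResult result = time(() -> "Data Mining\nchapter text\nData Mining\n");
        System.out.println("elapsed ms: " + result.elapsedMillis);
        System.out.println("title occurrences: "
                + countOccurrences(result.text, "Data Mining"));
    }
}
```

A single run like this only gives a rough number; repeating the measurement a few times and discarding the first run would reduce JVM warm-up noise.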

I hope you can improve the Parser's performance.

The sample file mentioned above: data-mining.pdf (Google Drive)

atir.tahir · May 17, 2020, 9:32am

@shockvip1331,

We’re investigating this scenario. Your investigation ticket ID is PARSERJAVA-152. As soon as there’s any update, you’ll be notified.

aspose.notifier · February 28, 2021, 8:54am

The issues you found earlier (filed as PARSERJAVA-152) have been fixed in this update. This message was posted using the Bugs notification tool by Atir_Tahir.