Free Support Forum - groupdocs.com

Parse a large PDF file using Java

Hi, I tried GroupDocs Parser 20.5 for Java and found it significantly slower than the free Tika parser.
For example, my PDF file is 110 MB with more than 7,000 pages.
GroupDocs Parser took more than 500 s to parse this file, compared with only 48 s for Tika.
Also, GroupDocs Parser appends the title page to the end of the parsed string (duplicated two or three times).
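
To put a number on the duplicated title page described above, one option is to count how often the title text occurs in the extracted string. This is only a minimal sketch using plain Java string methods; `DuplicateCheck`, `countOccurrences`, and the sample strings are hypothetical names and data, not part of GroupDocs or Tika.

```java
final class DuplicateCheck {
    private DuplicateCheck() {}

    /** Counts non-overlapping occurrences of needle in text. */
    static int countOccurrences(String text, String needle) {
        if (text == null || needle == null || needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        // indexOf returns -1 once no further match exists
        while ((from = text.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length(); // skip past this match
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the extracted text; in practice pass the
        // string returned by the parser and the title line of the PDF.
        String text = "Annual Report\nbody text\nAnnual Report\nAnnual Report";
        System.out.println(countOccurrences(text, "Annual Report"));
    }
}
```

A count above 1 for a title that appears only once in the PDF would confirm the duplication and show how many extra copies were appended.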

Example code with GroupDocs:
Parser parser = new Parser("sample");
TextReader reader = parser.getText();
String text = reader.readToEnd();
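
For anyone who wants to reproduce the timings, the snippet above can be wrapped in a small stopwatch helper. The following is a rough sketch under stated assumptions: `ExtractionTimer` and `Timing` are hypothetical names, and the lambda in `main` is a placeholder that should be replaced with the GroupDocs or Tika call being measured (wrapping any checked exceptions, since `Supplier` does not allow them).

```java
import java.util.function.Supplier;

final class ExtractionTimer {
    private ExtractionTimer() {}

    /** Result of one timed run: extracted character count and elapsed seconds. */
    static final class Timing {
        final int chars;
        final double seconds;

        Timing(int chars, double seconds) {
            this.chars = chars;
            this.seconds = seconds;
        }
    }

    /** Runs the extraction once and measures wall-clock time with System.nanoTime. */
    static Timing time(Supplier<String> extraction) {
        long start = System.nanoTime();
        String text = extraction.get();
        long elapsedNanos = System.nanoTime() - start;
        int chars = (text == null) ? 0 : text.length();
        return new Timing(chars, elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        // Placeholder extraction (assumption): swap in the parser call under test.
        Timing t = time(() -> "stand-in text");
        System.out.printf("%d chars in %.3f s%n", t.chars, t.seconds);
    }
}
```

Comparing the character counts of both parsers alongside the times also helps spot extra content such as a repeated title page. A single run includes JVM warm-up, so repeating the measurement a few times gives a fairer comparison.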

I hope you can improve Parser’s performance.

The sample file mentioned above: https://drive.google.com/file/d/1imv3Z95HOAESblRHrqL9scUOdfR3xWwB/view?usp=sharing

@shockvip1331,

We’re investigating this scenario. Your investigation ticket ID is PARSERJAVA-152. You’ll be notified as soon as there’s an update.