Free Support Forum - groupdocs.com

Parse a large PDF to HTML using Java

Hi, I used GroupDocs.Parser version 20.1 to parse a large PDF file (over 50 MB), but it parsed only the first 10-14 pages and missed most of the PDF's content. Please check this issue (version 18.12 does not have this problem).

Example PDF file (500 MB): https://ia800304.us.archive.org/19/items/nasa_techdoc_19880069935/19880069935.pdf

@shockvip1331,

Please share the sample code or application with which the issue can be reproduced at our end.

I just used basic code like the examples on GitHub:

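import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.TextReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// filePDFPath and fileOutputPath are String paths defined elsewhere.
try (Parser parser = new Parser(filePDFPath);
     TextReader reader = parser.getText();
     OutputStream outputStream = new FileOutputStream(fileOutputPath)) {
    // Extract all text in one call, then write it out as UTF-8.
    String read = reader.readToEnd();
    outputStream.write(read.getBytes(StandardCharsets.UTF_8));
} catch (IOException e) {
    e.printStackTrace();
}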

@shockvip1331,

We have reproduced this issue at our end and logged it in our internal issue tracking system with ID PARSERJAVA-110. You'll be notified as soon as there is any further update.

Until this is fixed, I have to downgrade to version 18.12 to make sure my app works fine.

@shockvip1331,

Alright.