Hi, I used GroupDocs.Parser version 20.1 to parse a large PDF file (over 50 MB), but it only parsed roughly the first 10-14 pages and missed much of the PDF content. Please check this issue (version 18.12 does not have this problem).
Example PDF file (500 MB): https://ia800304.us.archive.org/19/items/nasa_techdoc_19880069935/19880069935.pdf
@shockvip1331,
Please share the sample code or application with which the issue can be reproduced at our end.
I just used basic code like the examples on GitHub:
try (Parser parser = new Parser(filePDFPath)) {
    // Extract the text of the whole document and write it out as UTF-8.
    // Both the reader and the output stream are closed automatically.
    try (TextReader reader = parser.getText();
         OutputStream outputStream = new FileOutputStream(fileOutputPath)) {
        String read = reader.readToEnd();
        outputStream.write(read.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
@shockvip1331,
This issue is reproduced at our end. Hence, it has been logged in our internal issue tracking system with ID PARSERJAVA-110. As soon as there is any further update, you'll be notified.
Until this is fixed, I have to downgrade to version 18.12 to make sure my app works correctly.
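In case it helps anyone quantify how much text is missing, here is a minimal sketch that compares the size of the text output saved from both versions. It uses only the Java standard library and does not call GroupDocs.Parser at all. The class name `ExtractionCompare` and the file names `output-18.12.txt` and `output-20.1.txt` are hypothetical; it assumes the extracted text from each version was already written to those files as UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExtractionCompare {

    // Length of a UTF-8 text file, measured in Java chars (UTF-16 code units).
    static long charCount(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).length();
    }

    // Ratio of the candidate output's length to the reference output's length.
    // An empty reference is treated as fully covered to avoid dividing by zero.
    static double coverage(Path reference, Path candidate) throws IOException {
        long ref = charCount(reference);
        return ref == 0 ? 1.0 : (double) charCount(candidate) / ref;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names: text saved from version 18.12 and from version 20.1.
        Path reference = Paths.get(args.length > 0 ? args[0] : "output-18.12.txt");
        Path candidate = Paths.get(args.length > 1 ? args[1] : "output-20.1.txt");
        System.out.printf("20.1 output is %.1f%% of the 18.12 output length%n",
                100.0 * coverage(reference, candidate));
    }
}
```

A ratio far below 100% for the same input PDF would be consistent with the truncation described above, though a length comparison alone cannot show which pages were skipped.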