OutOfMemoryError when viewing a PDF file

jorgeeflorez · April 28, 2021, 5:05pm

Hello,
I am using GroupDocs Viewer Java 19.11 to render a PDF file as HTML. I am getting the following error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at com.groupdocs.viewer.internal.c.a.pd.internal.ms.System.Collections.Generic.l0t.setCapacity(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.ms.System.Collections.Generic.l0t.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.ms.System.Collections.Generic.l0t.addItem(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4y.lI$lf.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4y.lI$2.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4j.lj.le(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4j.lj.lb(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4y.lI.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l5n.l1f.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l5n.l1f.lu(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4y.lI.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4y.lk.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4y.lI.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l3h.lI.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l3h.lj.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2y.lu.lj(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2y.lu.lb(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2y.lu.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2y.lu.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.ApsUsingConverter.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.l6h.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.ADocument.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.ADocument.save(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.Document.save(Unknown Source)
at com.groupdocs.viewer.converter.a.q.a(Unknown Source)
at com.groupdocs.viewer.converter.a.j.aS(Unknown Source)
at com.groupdocs.viewer.converter.a.j.d(Unknown Source)
at com.groupdocs.viewer.converter.a.j.aU(Unknown Source)
at com.groupdocs.viewer.converter.a.j.bU(Unknown Source)
at com.groupdocs.viewer.converter.a.bY(Unknown Source)
at com.groupdocs.viewer.handler.ViewerHandler.a(Unknown Source)

This is the code I am using:
File file = new File(path);
HtmlOptions options = new HtmlOptions();
options.setEmbedResources(true);
options.setPageNumbersToRender(Collections.singletonList(1));
List pages = htmlHandler.getPages(file.getAbsolutePath(), options);

I have assigned 8 GB of memory to my program (-Xms512M -Xmx8192M -XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=512m), and the file’s size is only 2.32 MB, so that should be enough, right?
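
One thing worth ruling out is that the -Xmx flag is not actually reaching the JVM, for example when the program is started from an IDE or a wrapper script. A minimal sketch of such a check, using only the standard Runtime API (the HeapCheck class name is made up for illustration):

```java
// Hypothetical helper, not part of GroupDocs: reports the heap limit the JVM is running with.
class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in bytes (driven by -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Human-readable form of the limit, in megabytes. */
    static String describe() {
        return "JVM max heap: " + (maxHeapBytes() / (1024L * 1024L)) + " MB";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

With -Xmx8192M the reported value should be close to 8192 MB; depending on the garbage collector it may be slightly below that. A much smaller number would suggest the flags are not being applied.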

Unfortunately, I am not allowed to share the file.

I also tried GroupDocs 20.7:

File file = new File(path);
int pageNumber = 1;
HtmlViewOptions options = HtmlViewOptions.forEmbeddedResources();
try (Viewer viewer = new Viewer(file.getAbsolutePath())) {
    viewer.view(options, pageNumber);
}

and I get:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at com.groupdocs.viewer.internal.c.a.pd.internal.ms.System.Collections.Generic.l0t.setCapacity(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.ms.System.Collections.Generic.l0t.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.ms.System.Collections.Generic.l0t.addItem(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lI$lf.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lI$2.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4u.lj.le(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4u.lj.lb(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lI.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l5y.l2if.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l5y.l2if.lu(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lI.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lk.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lI.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l3j.lI.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l3j.lj.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2h.lu.lj(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2h.lu.lb(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2h.lu.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2h.lu.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.l12h.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.l12h.lb(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.ApsUsingConverter.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.l7if.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.ADocument.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.ADocument.save(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.Document.save(Unknown Source)
at com.groupdocs.viewer.a.a.a.c.c.o.a(Unknown Source)
at com.groupdocs.viewer.a.a.a.a.a(Unknown Source)
at com.groupdocs.viewer.a.e.b.a(Unknown Source)
at com.groupdocs.viewer.a.e.b.a(Unknown Source)
at com.groupdocs.viewer.Viewer.view(Unknown Source)

I tried using GroupDocs 21.2 and I get a similar stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at com.groupdocs.viewer.internal.c.a.pd.internal.ms.System.Collections.Generic.l0t.setCapacity(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.ms.System.Collections.Generic.l0t.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.ms.System.Collections.Generic.l0t.addItem(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lI$lf.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lI$2.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4u.lj.f(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4u.lj.lb(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lI.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l5y.l2if.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l5y.l2if.lu(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lI.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lk.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l4h.lI.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l3j.lI.a(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l3j.lj.lI(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2h.lu.lj(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2h.lu.c(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2h.lu.b(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.internal.l2h.lu.lf(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.fo.a(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.fo.eij(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.j.a(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.kQ.a(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.a.c(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.a.b(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.Y.b(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.a.a(Unknown Source)
at com.groupdocs.viewer.internal.c.a.pd.Y.a(Unknown Source)
at com.groupdocs.viewer.domain.documents.a.c.c.p.a(Unknown Source)
at com.groupdocs.viewer.domain.documents.a.a.a(Unknown Source)
at com.groupdocs.viewer.domain.e.b.a(Unknown Source)

When I try to use Viewer.getViewInfo in versions 20 and 21, I get:
java.lang.ClassCastException: com.groupdocs.viewer.internal.c.a.ms.core.System.IO.JavaInputStream cannot be cast to com.groupdocs.viewer.results.ViewInfo

I hope all I have written may be of help…

Regards.

Jorge Flórez

vladimir.litvinchik · April 28, 2021, 5:56pm

Hi @jorgeeflorez

Unfortunately, we’ve failed to reproduce these issues with 21.2 and sample_app.zip (3.7 MB). Could you please update the app or share the code you’re using? Please also try reproducing these issues with another file that you can share, to make sure the issue is not specific to the file.

jorgeeflorez · April 28, 2021, 6:12pm

Hi Vladimir, thank you for your reply.
I have used the project you provided, commented out the setLicense call (we only have licences for versions 19 and 20) and modified the path to point to my file. Here is the output: output.zip (1.8 KB)

I will check again whether it is possible to share the file, although I doubt it.

I think maybe this is the case, because the PDF pages are scanned paper pages; the file was probably generated by scanner software.
file properties.png (15.6 KB)
Anyway, what is odd is that all memory is consumed trying to render the first page.
Is it possible to check that the file will be renderable before attempting it, to avoid the OutOfMemoryError?

Thanks.

vladimir.litvinchik · April 28, 2021, 7:03pm

@jorgeeflorez

Anyway, what is odd is that all memory is consumed trying to render the first page.
Is it possible to check that the file will be renderable before attempting it, to avoid the OutOfMemoryError?

I do agree with you that such behaviour is unexpected, but there is no way to check or predict how much memory rendering a file will require before rendering it.
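
Since memory use cannot be predicted up front, one possible fallback is to contain the failure rather than prevent it. The following is only a sketch under that assumption: RenderGuard and RenderTask are hypothetical names, not part of the GroupDocs API, and the rendering call itself is passed in as a lambda. Catching OutOfMemoryError is best-effort, because other threads in the same JVM can still be affected; running the render in a separate process would isolate it more reliably.

```java
// Hypothetical helper, not part of GroupDocs: runs a render attempt and reports heap exhaustion.
class RenderGuard {
    /** A rendering attempt that may throw checked exceptions. */
    interface RenderTask {
        void run() throws Exception;
    }

    /**
     * Runs the task and returns true if it completed,
     * or false if it failed with OutOfMemoryError.
     * Checked exceptions are rethrown wrapped in RuntimeException.
     */
    static boolean tryRender(RenderTask task) {
        try {
            task.run();
            return true;
        } catch (OutOfMemoryError e) {
            // Objects allocated by the failed task become unreachable once the
            // stack unwinds, so the caller can usually continue with other files.
            System.err.println("Rendering aborted: " + e.getMessage());
            return false;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

With the 20.7 code from the first post, the try-with-resources block around viewer.view(options, pageNumber) would be the body of the lambda passed to tryRender, and a false result would mark the file as not renderable.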

I think maybe this is the case, because the PDF pages are scanned paper pages; the file was probably generated by scanner software.

Thank you for sharing the details about the file. We’ll try creating a similar file that contains scans to reproduce the issue. In case of any updates we’ll notify you.

jorgeeflorez · May 1, 2021, 5:07pm

Hi,
I just want to add some things that may help…
The file has the following fonts, according to Adobe Acrobat Pro X: fonts.png (7.2 KB)

This is some of the text contained in the PDF file. The images have Spanish text, although it seems the software that created the PDF and performed OCR “saw” some Asian symbols: text sample.png (7.0 KB)

三洋）＿小二C言、 r」∴FIT』JIA01461 L〕FL 9／7m19 UN（DADRESTiTUCIONDETiERRASDE
－＝＋．lL ）
半＼＝∴∴∴∵忙［、i⊃二一二・qUTELAR O4006PROTFCCiONJURiDICADELPRED10
日NASQUE川iT－RVIENENENELACTO（X－Tituiardederechorea－de－d。mI…01I－TItuiardedom一nl0incompieto）
OFICINA DE REGISTRO DEINSTRUMENTOS PUBLiCOS

I have been instructed to try other libraries (iText, PDFBox, Aspose) to check whether it is possible to detect the flaw in the PDF file and so avoid the memory consumption. We have an Aspose Total 17 license, and there I also saw the OutOfMemoryError (in this case when trying to extract the text): error aspose.png (8.9 KB)

As far as I know, GroupDocs uses libraries from Aspose, right?
Maybe the problem is related to the fonts that were used when creating the pdf…

vladimir.litvinchik · May 1, 2021, 7:15pm

@jorgeeflorez

As far as I know, GroupDocs uses libraries from Aspose, right?

Yes, GroupDocs uses Aspose libraries.

Maybe the problem is related to the fonts that were used when creating the pdf…

It is possible that this is a font-related issue. It would be much simpler to reproduce the issue and find the real root cause with the file, but we do understand that some files can’t be shared, as in your case.

vladimir.litvinchik · May 6, 2021, 11:13am

@jorgeeflorez

Unfortunately, we’ve failed to reproduce this issue on our side, both with the files we have and with the files we’ve created for this purpose. For further investigation we need a sample file that can be used to reproduce this issue.