Viewer slowness

Hi guys, I am evaluating the viewer and have a temporary license.
I am starting with a Java application and use this Maven dependency:
    <dependency>
        <groupId>com.groupdocs</groupId>
        <artifactId>groupdocs-viewer</artifactId>
        <version>25.12</version>
    </dependency>
I am testing PDFs and generating single-file HTML output (one HTML file per page). These files are “self-contained” (everything is embedded in the HTML itself). I am also using high image quality so that the output stays as close to the original as possible.
I have noticed that this is quite slow, even on a relatively good machine:

file-sample_150kB.pdf - 5.794 sec
tracemonkey.pdf - 63.05 sec
file-example_PDF_500_kB.pdf - 4.364 sec
file-example_PDF_500_kB.pdf (458.5 KB)

file-sample_150kB.pdf (139.4 KB)

These files are not very big, but converting them takes a relatively long time. I am afraid that it will not perform well with really big files in production use cases… Can I improve it somehow?

compressed.tracemonkey-pldi-09.pdf (992.5 KB)

Hello @anonymousP ,

Thank you for your interest in our product.

First of all, we would like to note that the performance of the Viewer when rendering documents to HTML is directly related to the content and structure of the input document. We are continuously working on improving the overall performance of our product.

To better assist you, could you please share the sample code you are using to convert PDF to HTML? We will analyze it along with the files you provided and do our best to offer recommendations for performance optimization, if possible.

We look forward to your response.

Sure,
here is my code:

    // `file` is the java.io.File of the input PDF.
    // try-with-resources closes both the stream and the Viewer once rendering is done.
    try (InputStream stream = new FileInputStream(file);
         Viewer viewer = new Viewer(stream)) {

        // Note: split("\\.")[0] keeps only the part of the file name before the first dot.
        Path outputFolder = Path.of("").toAbsolutePath()
                .resolve("output")
                .resolve(file.getName().split("\\.")[0]);

        HtmlViewOptions viewOptions = HtmlViewOptions.forEmbeddedResources(outputFolder.resolve("page_{0}.html"));

        // pdfOptions are only taken into consideration when the input file is a PDF
        PdfOptions pdfOptions = new PdfOptions();
        pdfOptions.setImageQuality(ImageQuality.HIGH);
        viewOptions.setPdfOptions(pdfOptions);

        // generic options
        viewOptions.setMinify(false); // minification can break some layout fidelity (e.g., whitespace-sensitive glyph alignments)
        viewOptions.setForPrinting(false); // prevents layout changes
        viewOptions.setRemoveJavaScript(false);

        viewer.view(viewOptions);
    }

Hello @anonymousP ,

Thank you for the information provided.

We will get back to you with the results once our investigation is complete.

Hello @anonymousP ,

Thank you for your patience.

After conducting some investigation on our side, we have prepared several recommendations that may help improve the rendering performance when converting your PDF document to HTML.

1. Use HtmlViewOptions.forExternalResources(...) instead of HtmlViewOptions.forEmbeddedResources(...)

Whenever possible, we recommend using HtmlViewOptions.forExternalResources(...) to store page resources (CSS, fonts, images) as separate files instead of embedding them directly into the HTML.

This approach often improves rendering and page loading performance, especially for large documents. During our testing with your file, this change improved rendering performance by approximately 25%.
Additionally, using forExternalResources requires significantly less Java heap space.

2. Use a lower image quality setting

You may also consider using a lower image quality value, such as:

ImageQuality.LOW

We understand that this may affect image quality, but if it is acceptable for your use case, it can reduce rendering time by up to 5%.

3. Enable PdfOptions.setWrapImagesInSvg(true)

This option wraps raster images from the PDF page into an SVG container, which helps preserve the precise positioning of elements. It is also particularly useful for scanned documents, technical drawings, and PDFs containing a large number of graphical elements. Using this option may improve rendering performance by up to 10%.

Based on these recommendations, we updated your code example and attached the modified version below. Please try using it and let us know your feedback.

    // `file` is the java.io.File of the input PDF.
    // try-with-resources closes both the stream and the Viewer once rendering is done.
    try (InputStream stream = new FileInputStream(file);
         Viewer viewer = new Viewer(stream)) {

        Path outputFolder = Path.of("").toAbsolutePath()
                .resolve("output")
                .resolve(file.getName().split("\\.")[0]);
        // where each HTML page is written
        String pageExternalFilePathFormat = outputFolder.resolve("page_{0}.html").toString();
        // where the resource files of each page (CSS, fonts, images) are written
        String resourceFilePathFormat = outputFolder.resolve("page_{0}_{1}").toString();
        // how the HTML pages reference those resource files
        String resourceUrlFormat = outputFolder.resolve("page_{0}_{1}").toString();
        HtmlViewOptions viewOptions = HtmlViewOptions.forExternalResources(pageExternalFilePathFormat,
                resourceFilePathFormat, resourceUrlFormat);

        // pdfOptions are only taken into consideration when the input file is a PDF
        PdfOptions pdfOptions = new PdfOptions();
        pdfOptions.setImageQuality(ImageQuality.LOW);
        pdfOptions.setWrapImagesInSvg(true);
        viewOptions.setPdfOptions(pdfOptions);

        // generic options
        viewOptions.setMinify(false); // minification can break some layout fidelity (e.g., whitespace-sensitive glyph alignments)
        viewOptions.setForPrinting(false); // prevents layout changes
        viewOptions.setRemoveJavaScript(false);

        viewer.view(viewOptions);
    }
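
A side note on the two resource formats in the code above, as a rough sketch and not a description of how the library works internally. The file path format controls where resource files are written, while the URL format is what ends up referenced inside the HTML pages. The sketch assumes that {0} stands for the page number and {1} for the resource file name, and it uses java.text.MessageFormat only to illustrate the expansion. The class name, the helper method, and the sample folder and resource names are made up for illustration.

```java
import java.text.MessageFormat;

final class ResourceFormatSketch {

    // Assumption: {0} is the page number and {1} is the resource file name.
    // The page number is converted to a String first so that MessageFormat
    // does not apply locale-specific number grouping to it.
    public static String expand(String format, int pageNumber, String resourceName) {
        return MessageFormat.format(format, String.valueOf(pageNumber), resourceName);
    }

    public static void main(String[] args) {
        // Hypothetical formats: a location on disk and a URL relative to the HTML page.
        String resourceFilePathFormat = "output/tracemonkey/page_{0}_{1}";
        String resourceUrlFormat = "page_{0}_{1}";

        // Hypothetical resource name for page 13.
        System.out.println(expand(resourceFilePathFormat, 13, "img0.png"));
        System.out.println(expand(resourceUrlFormat, 13, "img0.png"));
    }
}
```

If the generated pages are served by a web server instead of being opened from the local disk, a relative or server-based URL format like the second one is probably what you want, since an absolute file system path would not resolve in the browser.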

Finally, during our investigation and testing with your document, we noticed that page 13 requires significantly more processing time than the other pages. Rendering this page alone takes more than 50% of the total rendering time for the entire document.

Therefore, we have scheduled a more detailed investigation of this case. If possible, we will try to further improve PDF-to-HTML rendering performance in future versions of our product.
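
To check on your side whether a single page dominates the rendering time of other documents as well, the pages can be timed one by one. Below is a minimal sketch using only the JDK. The class and method names are hypothetical, and the lambda in main is a placeholder workload; in a real test it would be replaced by a call that renders exactly one page (for example an overload of view that accepts page numbers, assuming your version provides one).

```java
import java.util.function.IntConsumer;

final class PageTimingSketch {

    /** Times renderPage for pages 1..pageCount; result[i] is the time for page i + 1 in nanoseconds. */
    public static long[] timePages(int pageCount, IntConsumer renderPage) {
        long[] nanos = new long[pageCount];
        for (int page = 1; page <= pageCount; page++) {
            long start = System.nanoTime();
            renderPage.accept(page);
            nanos[page - 1] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Returns the 1-based number of the slowest page. */
    public static int slowestPage(long[] nanos) {
        int slowest = 0;
        for (int i = 1; i < nanos.length; i++) {
            if (nanos[i] > nanos[slowest]) {
                slowest = i;
            }
        }
        return slowest + 1;
    }

    /** Returns the fraction (0..1) of the total time spent on the given 1-based page. */
    public static double shareOfTotal(long[] nanos, int page) {
        long total = 0;
        for (long n : nanos) {
            total += n;
        }
        return total == 0 ? 0.0 : (double) nanos[page - 1] / total;
    }

    public static void main(String[] args) {
        // Placeholder workload: sleeps longer for higher page numbers.
        // Replace the lambda body with a real single-page render call.
        long[] nanos = timePages(3, page -> {
            try {
                Thread.sleep(page * 10L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        int slowest = slowestPage(nanos);
        System.out.printf("slowest page: %d (%.0f%% of total)%n",
                slowest, 100 * shareOfTotal(nanos, slowest));
    }
}
```

Rendering pages in separate calls may add some per-call overhead, so the numbers are only indicative, but a page that takes more than half of the total time should still stand out clearly.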