Rendering a large multi-page Word file to HTML takes a lot of memory in .NET

Hello,

I’m currently trialing GroupDocs.Viewer and I’m observing some pretty strange memory behavior. With a 20 MB file, I’m seeing memory usage increase by about 3 GB when rendering a single page. I’m wondering if this is typical behavior.

I’m currently using the Viewer library to pre-generate HTML files so that there is less delay between a user’s request and page retrieval. But if this memory usage is normal, the HTML-generation utility will be a resource concern. Thanks.


@rramsahoye,

Please note that two documents of the same file size may take different amounts of time and memory to render, because this also depends on the document’s content. A document with many clip-arts, graphs, charts, or images may take more memory and time to render than a simple text-based document.
However, you can share the problematic DOCX file and a sample (console-based) application with which the issue can be reproduced. We’ll then investigate this scenario further. Also, please specify the API version (e.g. 19.10, 20.1) that you integrated into the application and the API variant (Java or .NET).
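
For instance, a minimal console application that reproduces and measures the issue could look like the sketch below. It assumes GroupDocs.Viewer for .NET, uses the process working set as a rough memory measure, and the output file name pattern "page-{0}.html" is just an example:

```csharp
using System;
using System.Diagnostics;
using GroupDocs.Viewer;
using GroupDocs.Viewer.Options;

class Repro
{
    static void Main(string[] args)
    {
        string sourcePath = args[0];   // path to the problematic DOCX

        long before = Process.GetCurrentProcess().WorkingSet64;

        using (Viewer viewer = new Viewer(sourcePath))
        {
            HtmlViewOptions options = HtmlViewOptions.ForEmbeddedResources("page-{0}.html");
            viewer.View(options, 1);   // render just the first page
        }

        Process proc = Process.GetCurrentProcess();
        proc.Refresh();
        Console.WriteLine($"Working set before: {before / (1024 * 1024)} MB");
        Console.WriteLine($"Working set after:  {proc.WorkingSet64 / (1024 * 1024)} MB");
        Console.WriteLine($"Peak working set:   {proc.PeakWorkingSet64 / (1024 * 1024)} MB");
    }
}
```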

Here is the document: 9_Random Text-18901.docx - Google Drive

What are the recommended specifications for running GroupDocs.Viewer on a server? If this is typical behavior, I would like more information on how many resources to plan for. Thanks.


@rramsahoye,

Please also share the sample application/code that you used to render the source file to HTML. Also specify whether you are using the Java or .NET variant of the API.

I am using .NET with GroupDocs.Viewer 20.1.0.
Here is the sample code:

```csharp
// rootPath, attachmentFileName, attachmentPath, numOfPages, and startingPage
// are defined elsewhere in the application.
string filePathFormat = string.Format(@"{0}{1}{1}", rootPath, attachmentFileName) + "-{0}.html";

// Build the list of pages that have not been rendered to HTML yet.
List<int> pagesNotProcessed = new List<int>();
int[] pageNumbers = new int[numOfPages];
for (int i = 0; i < pageNumbers.Length; i++)
{
    if (File.Exists(string.Format(filePathFormat, i + startingPage)) == false)
    {
        pagesNotProcessed.Add(i + startingPage);
    }
    pageNumbers[i] = i + startingPage;
}

// Render only the missing pages.
if (pagesNotProcessed.Count > 0)
{
    int[] pagesToBeProcessed = pagesNotProcessed.ToArray();

    using (Viewer viewer = new Viewer(attachmentPath))
    {
        HtmlViewOptions options = HtmlViewOptions.ForEmbeddedResources(filePathFormat);
        options.Minify = true;
        viewer.View(options, pagesToBeProcessed);
    }
}
```


@rramsahoye,

Thanks for the details. We have logged this scenario in our internal issue tracking system with ID VIEWERNET-2382. We’ll further investigate it and let you know if there’s any update or a work-around.

@rramsahoye,

Let us share our findings. This is the expected behavior in terms of memory consumption. When a document is opened, a model of the document is built behind the scenes, so the more pages a document has, the more memory is required to open and process it.
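
If the peak is a problem in practice, one possible (unofficial) mitigation is to render the document in small batches of pages, disposing of the Viewer between batches so that per-batch memory can be collected. Below is a minimal sketch under that assumption; `sourcePath`, `pageFileFormat`, and `batchSize` are placeholder names, and whether this actually lowers the peak depends on where the memory goes.

```csharp
using System;
using System.Linq;
using GroupDocs.Viewer;
using GroupDocs.Viewer.Options;
using GroupDocs.Viewer.Results;

static class BatchedRendering
{
    // Render an entire document to HTML, batchSize pages at a time.
    public static void Render(string sourcePath, string pageFileFormat, int batchSize)
    {
        // First pass: only ask for the page count.
        int totalPages;
        using (Viewer viewer = new Viewer(sourcePath))
        {
            ViewInfo info = viewer.GetViewInfo(ViewInfoOptions.ForHtmlView());
            totalPages = info.Pages.Count;
        }

        // Render in batches; a fresh Viewer per batch trades repeated document
        // loading for a chance to release memory between batches.
        for (int first = 1; first <= totalPages; first += batchSize)
        {
            int count = Math.Min(batchSize, totalPages - first + 1);
            int[] batch = Enumerable.Range(first, count).ToArray();

            using (Viewer viewer = new Viewer(sourcePath))
            {
                HtmlViewOptions options = HtmlViewOptions.ForEmbeddedResources(pageFileFormat);
                options.Minify = true;
                viewer.View(options, batch);
            }
        }
    }
}
```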

The best configuration for a server is one that is backed by measurements. First, define your requirements, e.g. “the application is expected to process N files of this type in T time.” Then, using some server configuration X, check whether the requirements are satisfied. If they are not, add resources to the server and measure again. Also, don’t forget about resource utilization on a multi-core server: it may be worth processing files in multiple threads and putting incoming files in a queue (see the sketch below). So, as you can see, there is no single solution for all cases; server and resource requirements depend on your needs.
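
As a rough illustration of the queue idea (not GroupDocs-specific guidance), the sketch below caps concurrency at the core count so that at most one document is being processed per worker at a time; `RenderToHtml` is a hypothetical wrapper around the Viewer call shown earlier in this thread, and the output file name pattern is just an example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using GroupDocs.Viewer;
using GroupDocs.Viewer.Options;

static class RenderQueue
{
    // Bounded queue: producers block instead of piling up work (and memory).
    static readonly BlockingCollection<string> Queue =
        new BlockingCollection<string>(boundedCapacity: 16);

    // Hypothetical wrapper around the Viewer call from earlier in the thread.
    static void RenderToHtml(string filePath)
    {
        using (Viewer viewer = new Viewer(filePath))
        {
            HtmlViewOptions options =
                HtmlViewOptions.ForEmbeddedResources(filePath + "-page-{0}.html");
            viewer.View(options);
        }
    }

    static void Main(string[] args)
    {
        // One worker per core; peak memory is then roughly
        // (per-document memory) x (worker count).
        Task[] workers = new Task[Environment.ProcessorCount];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = Task.Run(() =>
            {
                foreach (string path in Queue.GetConsumingEnumerable())
                    RenderToHtml(path);
            });
        }

        foreach (string path in args)   // files to render, e.g. from the command line
            Queue.Add(path);
        Queue.CompleteAdding();

        Task.WaitAll(workers);
    }
}
```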