Partial rendering of a document in .NET using C#

wolfgang.gogg · October 14, 2019, 2:41pm

Hello,
What is the suggested strategy for optimizing rendering performance for very large files, e.g. 300-page Word documents? Is it generally recommended to render the document in blocks, e.g. 5-page blocks, so that the first pages can be viewed quickly while the remaining pages are rendered in the background? Does the GroupDocs Viewer object cache or reuse previously generated rendering information, or does it fully re-evaluate the whole document for each rendering block (each separate call to View())? Would it be possible to split the background processing across multiple background threads? If so, would it be safe to share the same Viewer object, or would every background thread require its own Viewer object?

Thanks!

usman.aziz · October 14, 2019, 6:07pm

@wolfgang.gogg,

Yes, partial rendering in the case of large documents is recommended so that the end-user doesn’t have to wait for too long to view the document. You can render the first few pages when the document is loaded and display them. The remaining pages can be rendered when the user scrolls or navigates to the next page. For more details on how to render the selected pages, please visit this documentation article.

In case you have enabled the cache, the API avoids processing the document again and again and fetches the already rendered pages from the cache. See how to use the cache here.

You can split up the rendering process using single or multiple Viewer objects depending upon your application’s needs as well as your scenario.

wolfgang.gogg · October 14, 2019, 6:33pm

Hello,

In case you have enabled the cache, the API avoids processing the document again and again and fetches the already rendered pages from the cache. See how to use the cache here.

→ Understood, but I was referring more to the internal structure of the document once it has been analyzed, not to the already converted pages themselves. I ask because rendering the whole document in blocks of, e.g., 5 pages per call seemed to be significantly slower than rendering all pages in one call. It gave me the impression that some heavy processing is repeated on each separate call to View().

You can split up the rendering process using single or multiple Viewer objects depending upon your application’s needs as well as your scenario.

→ So can we consider the Viewer object to be fully thread-safe?

Thanks again!

usman.aziz · October 15, 2019, 5:03am

@wolfgang.gogg,

Your observation is correct and it is expected because each time when you render a document, it is first processed into the memory before the rendering starts. The cache feature helps to minimize this time when already cached HTML pages or images are used. This way, the time required for the rendering of the document pages can be saved.

In order to provide you the details, we have created an investigation ticket for this scenario as VIEWERNET-2170. We shall share the information with you very soon.

wolfgang.gogg · October 15, 2019, 6:41am

Hello,

Your observation is correct and it is expected because each time when you render a document, it is first processed into the memory before the rendering starts. The cache feature helps to minimize this time when already cached HTML pages or images are used. This way, the time required for the rendering of the document pages can be saved.

OK, so far this is clear. What I do not really understand is why the same Viewer object cannot cache or reuse some of this on subsequent calls. The document was already loaded into memory when the first 5 pages were rendered, so why does it need to be loaded into memory again when I render the next 5 pages?

Sample code:
private void RenderFile(string sFile)
{
    // The {0} and {1} patterns will be replaced with the current page number and resource name respectively.
    string pageFilePathFormat = $"{sOutputFolder}/page{{0}}.html";
    string resourceFilePathFormat = $"{sOutputFolder}/page{{0}}{{1}}";
    string resourceUrlFormat = $"{sOutputFolder}/page{{0}}{{1}}";

    using (Viewer viewer = new Viewer(sFile))
    {
        HtmlViewOptions options = HtmlViewOptions
            .ForExternalResources(pageFilePathFormat, resourceFilePathFormat, resourceUrlFormat);

        //HtmlViewOptions options = HtmlViewOptions
        //   .ForEmbeddedResources(pageFilePathFormat);

        ViewInfoOptions infoOpt = ViewInfoOptions.FromHtmlViewOptions(options);
        var viewInfo = viewer.GetViewInfo(infoOpt);

        _nPages = viewInfo.Pages.Count;
        _nCurPage = 1;

        _nCurRender = 1;
        int nBlockSize = 5;
        while (_nCurRender <= _nPages)
        {
            // Shrink the last block so it does not run past the final page.
            if (_nCurRender + nBlockSize - 1 > _nPages)
                nBlockSize = _nPages - _nCurRender + 1;

            viewer.View(options, Enumerable.Range(_nCurRender, nBlockSize).ToArray());

            _nCurRender += nBlockSize;
            UpdateData();
        }
    }
}

As I already load the full document into memory in GetViewInfo, I would expect the same Viewer to cache all gathered information and reuse it on each View call, so that it does not need to reload and regather all that information over and over again. I guess using a cache would not help here, right? Is there any (internal) reason why this is not done or not possible? I would expect that with this, rendering blockwise could be almost as fast as rendering the whole document at once.
Is there any chance this could be enhanced?

Thanks!
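
The last-block arithmetic in a chunked loop like the one above is easy to get wrong. As a minimal sketch, the page numbers for each block can be computed separately from the rendering call. PageChunks and Split are hypothetical names introduced here for illustration; they are not part of GroupDocs.Viewer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of GroupDocs.Viewer.
public static class PageChunks
{
    // Yields arrays of 1-based page numbers covering 1..pageCount,
    // each holding at most blockSize pages; only the last may be shorter.
    public static IEnumerable<int[]> Split(int pageCount, int blockSize)
    {
        if (blockSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(blockSize));

        for (int first = 1; first <= pageCount; first += blockSize)
        {
            // Remaining pages from 'first' to the end, capped at blockSize.
            int count = Math.Min(blockSize, pageCount - first + 1);
            yield return Enumerable.Range(first, count).ToArray();
        }
    }
}
```

Each array could then be passed to a call such as viewer.View(options, pages), as in the sample above.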

usman.aziz · October 15, 2019, 5:39pm

@wolfgang.gogg,

In your provided scenario, the document is loaded into the memory only once as all the subsequent calls to Viewer.View() method are being done within the scope of a single Viewer object. This means that the document will be loaded in the memory only on the first call and the API will use the already loaded document for the subsequent calls of Viewer.View() method.

We tested rendering a document at once as well as in chunks (5 pages per call), just as you have done in your provided code. The results that we obtained using BenchmarkDotNet showed that rendering the document at once is a bit faster than rendering it in chunks (see results). Rendering the document in chunks is slower because we’re re-applying the options to the document each time the View method is called. However, we plan to improve the performance in future releases (logged as VIEWERNET-2174).

The Viewer class and its methods are not thread-safe. This means that using a single Viewer object from multiple threads is likely to fail. What you can do instead is create a new instance of Viewer in each thread. The following code is expected to work well, but please keep in mind that the document will be loaded into memory once per thread, which may slow down the rendering process.

new Thread(() =>
{
    using (Viewer viewer = new Viewer(path))
    {
        HtmlViewOptions options = HtmlViewOptions.ForExternalResources();
        viewer.View(options, 1, 2, 3);
    }
}).Start();

new Thread(() =>
{
    using (Viewer viewer = new Viewer(path))
    {
        HtmlViewOptions options = HtmlViewOptions.ForExternalResources();
        viewer.View(options, 4, 5, 6);
    }
}).Start();
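
The two-thread pattern above can be generalized to any number of workers. The following is only a sketch under stated assumptions: ParallelChunks, Run, and the renderRange delegate are hypothetical names, and the delegate stands in for code that creates its own Viewer and calls View for the pages it receives, since a Viewer must not be shared between threads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of GroupDocs.Viewer.
public static class ParallelChunks
{
    // Splits pages 1..pageCount into at most workerCount contiguous ranges
    // and invokes renderRange once per range, each on its own task.
    public static void Run(int pageCount, int workerCount, Action<int[]> renderRange)
    {
        if (workerCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(workerCount));

        int perWorker = (pageCount + workerCount - 1) / workerCount; // ceiling division
        var tasks = new List<Task>();

        for (int first = 1; first <= pageCount; first += perWorker)
        {
            // Declared inside the loop so each task captures its own array.
            int[] pages = Enumerable
                .Range(first, Math.Min(perWorker, pageCount - first + 1))
                .ToArray();
            tasks.Add(Task.Run(() => renderRange(pages)));
        }

        // Block until every range has been processed.
        Task.WaitAll(tasks.ToArray());
    }
}
```

In this setup, renderRange would wrap a using (Viewer viewer = new Viewer(path)) block like the ones above, so the document is still loaded once per worker.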

wolfgang.gogg · October 22, 2019, 6:54am

Hello,

Unfortunately I cannot access the results you attached to the post. Can you make them available to me?
Anyway, I ran my own tests again and found a huge performance difference. I tested a 300-page Word document. Rendering it in one call took about 35 seconds, which is actually impressive. Rendering it in chunks of 10 (using the code above, which you tested yourself) took 128 seconds, so roughly 3.5 times longer. I can share the document if you want to test on your end.
Is this the expected performance gap when rendering in chunks, an error in my code, or a bug?
Thanks!

usman.aziz · October 22, 2019, 7:30am

@wolfgang.gogg,

In case you are unable to access the attached image, you can view the results here.

Yes, it is the expected behavior of the API when rendering the document in chunks. As I mentioned in the previous reply, rendering the document in chunks is slower because we’re re-applying the options to the document each time the View method is called. However, improving the performance in this case is on our roadmap.

wolfgang.gogg · October 22, 2019, 7:33am

Hello,

OK - I was just asking again because you said rendering at once is “a bit” faster. This confused me a little, as I would not consider 3.5 times “a bit”.
Do you have any idea how substantial the performance improvements might be in one of the next releases?
Thanks!

usman.aziz · October 22, 2019, 7:47am

@wolfgang.gogg,

I apologize for the inconvenience caused. This observation was particular for the shared results.

I am afraid that, at the moment, we are unable to share the expected performance improvements.

wolfgang.gogg · October 22, 2019, 8:13am

Hello,
No need to apologize - thanks for the great support so far!

usman.aziz · October 22, 2019, 9:48am

@wolfgang.gogg,

You’re welcome.

atir.tahir · January 17, 2020, 9:48am

@wolfgang.gogg,

We’ve improved performance when rendering documents in chunks. Please download v19.12 and share your feedback.