Performance: loading time of documents with a lot of pages

Hello,


We have found that when you try to view a document (for example, 16 MB, which is not very big) with a lot of pages (around 1200), the viewer takes a very long time to load it; in some cases the viewer only shows “loading content”.

We have also noticed that when you try to print a document with a lot of pages, it takes a very long time for the print menu to appear.

We have sent an example document by email.

It's very important for us and our clients to solve this problem.


Thank you so much.

Alexis Rosano.

Hello Alexis,

We are sorry to hear that you have this issue. We have obtained and investigated your files:
1. PC8000-6 OMM ESP 12089-ES-GB-0.pdf - 17 MiB - 506 pages
2. HD785-7 SM 7001 ESP GSN01274-03.pdf - 42 MiB - 1274 pages

By default, when you open a web page with the GroupDocs.Viewer widget and the target document is requested for display, GroupDocs.Viewer converts the document on the server side to images and text (for image-based rendering) or to HTML, images, fonts, and CSS (for HTML-based rendering). Only when this conversion has finished does GroupDocs.Viewer begin to transmit data to the client side, and only then can the end user see the document content in the browser. That’s why it takes so long: GroupDocs.Viewer needs to convert the whole document, and these documents are really big. The conversion speed depends on the performance of the server: CPU, memory, and storage.

However, there is a “PreloadPagesCount” method. When you use it, for example “.PreloadPagesCount(1)”, GroupDocs.Viewer begins to transmit data to the client side as soon as the first page of the document has been converted. After applying this method to the 1st document (17 MiB), GroupDocs.Viewer shows its first page almost instantly. All other pages are shown on demand, when you try to open them.

There are some other ways to increase performance:
1. Try the HTML-based rendering mode (“UseHtmlBasedEngine(true)” method) if it is suitable for you and the documents are rendered without distortion.
2. Use the “PreloadPagesCount(1)” method. With this method, GroupDocs.Viewer begins to send the document to the client side as soon as the first page has been converted. Without it, GroupDocs.Viewer converts all pages of a document and only then sends it to the client side.
3. Use the “MinimumImageWidth(value)” method. If it is set, the Viewer loads page images with the specified width from the server at startup. It will not load page images from the server again after zooming if the current page image size is smaller than the specified value and the original (not scaled) page image size is smaller than the specified value. This means that in most cases GroupDocs.Viewer loads page images only once and does not reload them after zooming in or out.
4. If your business logic allows it, disable thumbnails (“ShowThumbnails(false)”).
5. If your business logic allows it, disable text selection by using the “SupportTextSelection(false)” method.
6. Use the “ShowViewerStyleControl(false)” method. It disables the “double page flip” option and increases performance.
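
The options listed above could be combined in a single widget declaration, roughly as follows. This is only a sketch: the method names are the ones listed above, but chaining all of them on one call, the file name and the width value of 800 are assumptions, so please check them against your installed version.

```csharp
// Sketch only: assumes the fluent Viewer.ClientCode() builder of GroupDocs.Viewer for .NET 2.x
// and that every option below can be chained on it.
var widget = Viewer.ClientCode()
    .FilePath("document.pdf")        // placeholder file name
    .UseHtmlBasedEngine(true)        // 1. HTML-based rendering
    .PreloadPagesCount(1)            // 2. start sending once the first page is converted
    .MinimumImageWidth(800)          // 3. hypothetical value, tune it for your layout
    .ShowThumbnails(false)           // 4. no thumbnails
    .SupportTextSelection(false)     // 5. no text selection
    .ShowViewerStyleControl(false);  // 6. no "double page flip" control
```

Options 4 to 6 remove functionality, so leave out any line whose feature your users need.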

About printing - this feature requires the target document to be converted entirely, so you cannot use it instantly.

We also noticed that GroupDocs.Viewer displays the 2nd document (42 MiB) incorrectly when “.PreloadPagesCount(1)” is used. Our developers are investigating this issue. We will notify you when there is any update.

Sorry for the inconvenience.

The issues you have found earlier (filed as WEB-1099) have been fixed in this update.



Denis,
We’ve already made all the changes you recommended, but losing functionality such as text selection and search is not a good option for us.
Is there anything else we can do to improve the performance?
Are these performance issues on your roadmap?
These issues are very important for us and our customers.

We would appreciate your comments so that we can give our customers an answer about the problem.

Best regards,
Juan

Hello Juan, Alexis,

The document “HD785-7 SM 7001 ESP GSN01274-03.pdf” is displayed incorrectly in GroupDocs.Viewer. We have reproduced this issue, and our developers are already working on it.

Now, about performance in general. When the “.PreloadPagesCount(1)” method is used, the first page of a document is displayed very fast, almost instantly. This is the main way to improve responsiveness. Of course, the actual conversion speed is the same, but with this method GroupDocs.Viewer transmits the document content to the client side without waiting for the whole conversion process to complete.

The ultimate way to increase overall performance is ahead-of-time (AOT) caching. With this method you generate the cache for the document with the “DocumentCache” class before the end user opens the document for display. Then, when the document is opened, GroupDocs.Viewer uses the previously generated cache.

If you have more questions, please feel free to contact us.

Hi Denis,
As you said, “.PreloadPagesCount(1)” improves the load time of the first page. But if the user scrolls down, the loading time increases because of the conversion process. So this improvement is temporary and only apparent, if I may say so.

About document caching, do you have a code example? Maybe a C# project? If so, that would be great.

Thanks for the support.
Juan

Hello Juan,

Well, using “DocumentCache” is so simple that it does not require a dedicated project.

DocumentCache dc = new DocumentCache(
    @"\license.lic",                // license file
    Server.MapPath("~/testfiles/")  // root storage path
);
dc.GenerateImages("document.pdf");            // for image-based rendering
dc.GenerateHtml("document.pdf", "/", false);  // for HTML-based rendering


Usually this operation is performed when the document is not being displayed, for example, when it is uploaded to the website. GroupDocs.Viewer generates a cache for the specified file (“document.pdf”) and stores it in the “\temp” subfolder. When this file is later requested in the widget (Viewer.ClientCode().FilePath("document.pdf")…), GroupDocs.Viewer uses the already generated cache and shows the document content instantly.

So, in conclusion, with AOT caching you still need to spend the same amount of time converting the target document to HTML. However, you spend this time not at the moment when the document has to be displayed to the end user, but earlier. The only scenario where this method is not applicable is when your end user uploads a document and needs to display it right after uploading; in that case you simply have no time gap in which to perform AOT caching.
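
As an illustration of generating the cache at upload time, here is a minimal sketch. The “UploadCacheWarmer” class and its “OnUploaded” method are hypothetical helpers, not part of GroupDocs.Viewer; the delegate passed to the constructor is assumed to wrap the DocumentCache calls shown earlier in this reply.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a GroupDocs class): generates the viewer cache
// once per uploaded file, at upload time rather than at display time.
public sealed class UploadCacheWarmer
{
    private readonly Action<string> _generateCache;
    // File names are compared case-insensitively (assumes a Windows file system).
    private readonly HashSet<string> _warmed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private readonly object _gate = new object();

    // generateCache is assumed to wrap the DocumentCache calls, for example:
    //   name => dc.GenerateHtml(name, "/", false)
    public UploadCacheWarmer(Action<string> generateCache)
    {
        if (generateCache == null) throw new ArgumentNullException("generateCache");
        _generateCache = generateCache;
    }

    // Call this from the upload handler after the file has been saved under
    // the storage root. Returns true if the cache was generated by this call,
    // false if the file had already been handled.
    public bool OnUploaded(string fileName)
    {
        lock (_gate)
        {
            if (!_warmed.Add(fileName)) return false;
        }
        try
        {
            // The conversion runs outside the lock, so different files
            // can be converted in parallel.
            _generateCache(fileName);
            return true;
        }
        catch
        {
            // Forget the file so that a later upload can retry the conversion.
            lock (_gate) { _warmed.Remove(fileName); }
            throw;
        }
    }
}
```

In an upload handler this might be used as “warmer.OnUploaded(savedFileName)”, where “warmer” is created once with a delegate such as “name => dc.GenerateImages(name)”. For large documents you would probably run the call on a background thread so that the upload request itself returns quickly.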

If you have more questions, please feel free to contact us.

Hi Denis,
I believe that example is enough for now. We will try it.
Thanks for your help.
Best regards,
Juan

Is it possible to use DocumentCache to generate the cache with the documents' watermark?


About printing, you said: “…this feature requires the target document to be converted entirely, so you cannot use it instantly.” Given this, is it possible to display a message to the user explaining what is happening?

Thank you.
About using DocumentCache: GroupDocs.Viewer generates a cache for a specified file in the temp folder. As you said, to get the document you have to invoke (Viewer.ClientCode().FilePath("document.pdf")...), so that works when the file is cached. But consider the possible scenario where not all the documents are cached in that path: then Viewer.ClientCode().FilePath("document.pdf")... will not work. Isn't it possible for the Viewer.ClientCode().Stream(...) method to recognize whether a document is entirely cached? That method receives the file name, a key and the extension as parameters. As we understand it, with those parameters the Stream(...) method recognizes whether the file is cached or not.

We have also seen this behavior with documents that have a lot of pages:

  • Navigating page 1: OK.
  • Navigating page 100: OK.
  • Navigating page 1000: OK.
  • Back to page 200: Wrong

Hello Alexis,

Our developers have prepared a new version of GroupDocs.Viewer for .NET, 2.5.5436.23292, in which a simplified printing progress message is implemented: GroupDocs.Viewer shows “Getting a printable version of the document” and then “Preparing the pages”. Our developers are now implementing a more sophisticated progress message for every page, but for now you can use this one.

We’ve sent this new version of the GroupDocs.Viewer to your email address.

Hello Alexis,

We believe that this post answers most of your questions. Additionally, we highly recommend that you read the article “How to Use GroupDocs.Viewer with Streams in ASP.NET MVC or WebForms Projects”, especially its sections “How GroupDocs.Viewer Works with Streams”, “Streams and Cache”, “Streams and Databases” and “StreamCreator Parameter”.

Thanks.

Video sample: https://dl.dropboxusercontent.com/u/69142139/videoMuestra.mp4


The document has been sent by email.

Hi,


Thank you for that new version of the GroupDocs.Viewer.

By the way, here are the translations into Spanish:
  • “Getting a printable version of the document” → “Preparando documento para imprimir”
  • “Preparing the pages” → “Preparando las páginas”
  • “Printing” → “Imprimiendo”
Is it possible to add a loading GIF icon to the “Preparing the pages” message?

Thanks

I can’t view this document (1758 pages; shared through a Dropbox link, file since deleted), either with a previously generated cache or without one.


Can you reproduce that?

Thank you.

Hello Alexis,

1. The DocumentCache class converts the original document to a web-compatible format and saves this data in the “temp” subfolder. This data has nothing to do with watermarks. Watermarks are text labels which GroupDocs.Viewer places over the document only while the page is being displayed. So the answer is no, it is not possible to generate the cache with the documents' watermark at the moment.

2. About displaying a message while the document is being prepared for printing: this feature has been added to the roadmap, and we will notify you when it is ready.

3. About the navigation issues: thanks for the screencast and for the file, we are investigating this issue right now.

Thanks.

Hello Alexis,

We’ve just sent you a new version of GroupDocs.Viewer, 2.5.5437.31272. It contains the advanced progress display mechanism, which shows the number of processed pages out of the total, like “page 1/10”, “2/10”, etc. Unfortunately, it doesn’t include the translated phrases which you posted yesterday; they will be added to the library in the next release. As usual, we’ve sent the link via email.

About loading icon - yes, our designers will improve the overall representation of this process.

Thanks.

Hello Alexis,

We’ve prepared a new version of GroupDocs.Viewer for .NET, 2.5.5443.31314, which includes the Spanish translations you sent us. The link was sent, as usual, via email.

Thanks.

Hello Alexis,

Yes, we also cannot open this huge document: GroupDocs.Viewer hangs when trying to open it. The file has been sent to our developers for investigation, and we will notify you when we have news about it.

Thanks.