Performance difference between SupportTextSelection(true) and SupportTextSelection(false) in image mode

Hello.


There is a large difference in loading time when the SupportTextSelection property is set to true (with image mode selected).

For our clients it is essential to search in their documents, so disabling SupportTextSelection (and therefore disabling search in the document) is not an option.

Is there any way to support text selection without hurting loading time? For example, enabling SupportTextSelection for the first 5 pages and then loading text selection for the rest of the document afterwards?

Thanks.

Video sample with SupportTextSelection(true): Dropbox link (file since deleted)

Video sample with SupportTextSelection(false): Dropbox link (file since deleted)

Document sample sent via mail.

Hello Alexis,

Thank you for the sample document and for the screencasts. Our developers are investigating your suggestion and requirement. We will notify you about any update on this issue. You didn’t show the source code of the GroupDocs.Viewer widget, but from the screencast we assume that you are using the “.PreloadPagesCount(1)” method, is that correct?

Also, what about ahead-of-time caching, which was discussed earlier in another topic? Is this tool not an option?

Thanks.

The issues you have found earlier (filed as WEB-1179) have been fixed in this update.


This message was posted using Notification2Forum from Downloads module by groupdocs.notifier.

"you are using the “.PreloadPagesCount(1)” method, is that correct?"

Yes, it is.

Now I'm trying to generate the cache for the document using the “DocumentCache” class.


I have sent you the aspx and the aspx.vb.

The cache is generated OK (see the attachment in the email), but the viewer freezes at "Loading document".

Thanks for your help.


Hello Alexis,

We are sorry to hear that you have this issue. We’ve reviewed the screenshots and the source code, thanks.

From your code and screenshots we tried to work out what you want to do. We believe that you want to generate a cache for a specific document, which exists as a stream, and then use that cache in GroupDocs.Viewer.

We can also see why there is an error. In the “.FilePath” method you specify an absolute path to a file located in the “temp” folder (temporalVisor\temp\Cache\temp\S\413_3.pptx). This is not correct. Here is how it works.

When displaying a document in image-based mode, GroupDocs.Viewer requires rasterized page images in three sizes: original width, 852 pixels wide, and 150 pixels wide for thumbnails. If you want all AOT cache generation to be finished before the end-user opens the document, you should pre-generate the cache for all three sizes.

If you create the cache from a stream, then in the GroupDocs.Viewer widget you should also use a stream, with the “useCachedStreamContentsIfPossible” value set to “true”.

Below are two larger pieces of source code: the code-behind of the ASPX page and its code-front. They are trimmed as much as possible to show the essence without unimportant details. It is C#, but the code should be understandable.

Code-behind:

public partial class page10Alexis2 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        String root_storage = Server.MapPath("~/testfiles/"); // your root storage path
        String license_path = @"\GroupDocsViewer.lic"; // your license path

        Groupdocs.Web.UI.Viewer.SetRootStoragePath(root_storage);
        Groupdocs.Web.UI.Viewer.SetLicensePath(license_path);

        String unique_document_name = "uniquedocument.pdf"; // obtain it from somewhere or generate it dynamically

        System.IO.Stream file_content = System.IO.File.OpenRead(root_storage + "Article.pdf"); // obtain it from a DB or from somewhere else

        Groupdocs.Web.UI.DocumentCache dc = new Groupdocs.Web.UI.DocumentCache(license_path, root_storage);

        // original (native) width
        dc.GenerateImages(file_content,
            unique_document_name, "pdf", true,
            null, null, null, null);

        // 852 pixel width
        dc.GenerateImages(file_content,
            unique_document_name, "pdf", true,
            null, null, 852, null);

        // 150 pixel width (thumbnails)
        dc.GenerateImages(file_content,
            unique_document_name, "pdf", true,
            null, null, 150, null);
    }
}


Code-front:

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="page10Alexis2.aspx.cs" Inherits="ViewingFromStream.page10Alexis2" %>

<html>
<head>
    <title>Page10 - Alexis2</title>
    <%= Groupdocs.Web.UI.Viewer.CreateScriptLoadBlock().LoadJquery().LoadJqueryUi().UseHttpHandlers()%>
    <style type="text/css">
        body
        {
            overflow: hidden;
        }
    </style>
</head>
<body>
    <div id="visordiv"></div>
    <%= Groupdocs.Web.UI.Viewer.ClientCode()
        .TargetElementSelector("#visordiv")
        .Stream(System.IO.File.OpenRead(Server.MapPath("~/testfiles/") + "Article.pdf"), "uniquedocument.pdf", "pdf", "Unique Document.pdf", true)
        .UseHtmlBasedEngine(false)
    %>
</body>
</html>





If your document exists as a file, you can use the “.IgnoreDocumentAbsence(true)” method: it allows the viewer to display a document when the file itself is absent and only its cache exists.

If you have more questions, please feel free to contact us.

Thanks for your explanation.


Can I use New System.Net.WebClient().OpenRead(string URL) instead of System.IO.File.OpenRead(…)?

I ask because, when using WebClient().OpenRead(URL), I see a lot of folders generated for the same document under temp\cache\temp\S\unique_name_document. Does that mean it is being cached more than once?

Thanks for your help.

By the way, will ahead-of-time caching improve the loading time of a document if SupportTextSelection is set to true?


Thanks.

Hello Alexis,

1. Yes, you can use “System.Net.WebClient” or any other approach that returns a System.IO.Stream or a class derived from System.IO.Stream, such as MemoryStream or FileStream.

2. If you see that there is a “temp\cache\temp\S\unique_name_document” folder and there are subfolders inside it, then it means that the cache was generated.

3. Yes, when ahead-of-time caching is used, GroupDocs.Viewer parses and extracts the text from the document (if the document has a text layer).

4. An important point, which I have already mentioned but will repeat. AOT caching is useful if there is a time gap between the moment the document appears on the server and the moment it has to be displayed in GroupDocs.Viewer. For example, you are the administrator of a web site and you upload a document to the server. You can start the AOT caching operation as soon as the upload is finished. GroupDocs.Viewer then generates the cache, which may take, say, 2 minutes if the document is really big. Later, when an end-user wants to display that document, the cache will be ready and the document will be displayed immediately.

In your case, in the web page you sent us, you perform the AOT caching in the code-behind of the page, in the “Page_Load” event. This is not an error, but such a scenario is useless in real life, because you generate the cache at the moment the document is already requested for display.

This is OK for testing purposes, but AOT caching is intended for cases where you can separate cache generation and document display in time.
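
To illustrate the separation described above, here is a minimal sketch in C#, not a definitive implementation. It buffers any incoming stream (for example, one returned by WebClient().OpenRead) into memory once and then calls a generation step once per image width, rewinding the buffer before each call. The class name AotCacheSketch and both helper methods are hypothetical. Whether DocumentCache rewinds the stream by itself is not stated in this thread, so the rewind is a defensive assumption, and the GenerateImages call in the comment simply mirrors the arguments used in the code-behind example earlier in this topic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for this thread; not part of GroupDocs.Viewer.
public static class AotCacheSketch
{
    // Copies any stream (file, network, database) into a seekable in-memory buffer,
    // so the same bytes can be read several times.
    public static MemoryStream BufferToMemory(Stream source)
    {
        var buffer = new MemoryStream();
        source.CopyTo(buffer);
        buffer.Position = 0;
        return buffer;
    }

    // Calls "generate" once per width and rewinds the stream before each call.
    // A null width stands for the original (native) width, as in the earlier example.
    // Returns the number of generation calls made.
    public static int GenerateAllWidths(Stream content, IEnumerable<int?> widths,
                                        Action<Stream, int?> generate)
    {
        if (!content.CanSeek)
            throw new ArgumentException("Stream must be seekable; buffer it first.", "content");

        int calls = 0;
        foreach (int? width in widths)
        {
            content.Position = 0;
            generate(content, width);
            calls++;
        }
        return calls;
    }
}

// Possible use from an upload handler (run when the upload finishes, not in Page_Load).
// "uploadedStream", "dc" and "unique_document_name" are assumed to exist as in the
// code-behind example earlier in this topic:
//
// using (MemoryStream buffered = AotCacheSketch.BufferToMemory(uploadedStream))
// {
//     AotCacheSketch.GenerateAllWidths(buffered, new int?[] { null, 852, 150 },
//         (s, w) => dc.GenerateImages(s, unique_document_name, "pdf", true, null, null, w, null));
// }
```

The generation step is passed in as a delegate so that the sketch compiles without the GroupDocs assembly. In a real page you would pass the DocumentCache call shown in the comment.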

If you have more questions, please feel free to contact us.

Ok, thanks for your answers.


I have some other questions:
  1. Why are the folders in the temp content sometimes named like “100@x.Pdf” or “@x.Pdf”?
  2. Is it necessary to rasterize page images at the original width? (We want to reduce loading times.)
  3. If I generate images at the original width, the files remain in the path “…\temp\Processing\temp\S\660_1.pdf\2014-12-03T16_29_48@x.Pdf” after the process finishes. Is this a bug or the expected behavior?
  4. With AOT caching, will the loading time of the print menu improve?
Thank you.


Hi,


I have implemented AOT caching following your recommendations.

The problem is that when I try to open the cached file (850 pages), the browser crashes. I have sent you the document via email. Maybe you can reproduce it.

It is very important to us that AOT caching works well, because our roadmap is based largely on that implementation.

Thank you.

Any news about that?


Thank you.

Hello Alexis,

Some time ago I gave you a pattern for using DocumentCache with streams.
This time, please try to open the “MH.pdf” file with the following GroupDocs.Viewer widget.

Instead of the
<%= Groupdocs.Web.UI.Viewer.ClientCode()
    .TargetElementSelector("#visordiv")
    .Stream(System.IO.File.OpenRead(Server.MapPath("~/testfiles/") + "Article.pdf"), "uniquedocument.pdf", "pdf", "Unique Document.pdf", true)
    .UseHtmlBasedEngine(false)
%>
pattern, please use the following:
<%= Groupdocs.Web.UI.Viewer.ClientCode()
    .TargetElementSelector("#visordiv")
    .Stream(null, "uniquedocument.pdf", "pdf", "Unique Document.pdf", true, delegate() { System.IO.Stream s = System.IO.File.OpenRead(Server.MapPath("~/testfiles/") + "Article.pdf"); return s; })
    .UseHtmlBasedEngine(false)
%>

Then please check whether the exception still occurs.

Thanks, and we are waiting for your reply.

Hello, you can also look at this topic. The main problem there is similar to yours, so maybe you will find some ideas in the discussion…

Hello Alexis,

Sorry for the delay. Here are answers:

1. These folders (named like "100@x.Pdf" or "@x.Pdf") contain rasterized page images in native size.
2. At this moment, yes: it is necessary to rasterize the pages to images with the native (original) width (that is how the GroupDocs.Viewer mechanism works).
3. It is the expected behaviour: GroupDocs.Viewer keeps these native-width page images for possible future use.
4. Yes, the printing mechanism uses cached page images, so if the cache already exists, printing will use it without creating a new one.
5. We checked the file "MH.pdf" that you sent us, thanks. We can reproduce some strange errors (an exception when displaying). At the same time, when the file is displayed using “.FilePath” and the cache is absent, it works well. Our developers are investigating this issue right now; it may be a bug or something else, we don't know yet. We will notify you when we have more information.


Thanks and sorry for the inconvenience.

I have changed the code, but the browser still crashes when I try to open this file.


I have sent you the new code via mail.

By the way, is the cache directory correct? (See the attachment.)

Thanks.

Sorry, which topic?

Hello,


Do you have any news about that?

Hello Alexis,

We are sorry to hear that you have this issue. Please tell us a bit more about the browser crash. Which browser crashes while displaying the “MH.pdf” document? Are there any error messages? Can you reproduce the crash in another browser?

We cannot reproduce the issue on our side. We use Mozilla Firefox.

Regarding the “Cache” folder structure: it is acceptable.

After a discussion with the developers, we want to suggest a new approach that should deliver the maximum performance.

First, determine the native width of the specific document. To find it, use the “.ShowImageWidth(true)” method in the GroupDocs.Viewer widget. For example, for your document “MH.pdf” the image width is 1224 pixels. This value depends not only on the document but also on the browser, device, resolution, etc., so on your machine it may be different. Then generate the cache in the code-behind:
Groupdocs.Web.UI.DocumentCache dc = new Groupdocs.Web.UI.DocumentCache(license_path, root_storage);
dc.GenerateImages(file_content,
    unique_document_name, "pdf", true,
    null, null, 1224, 100);
dc.GenerateImages(file_content,
    unique_document_name, "pdf", true,
    null, null, 150, 100);
Only these two calls are required. Then, in the GroupDocs.Viewer widget, use the “.MinimumImageWidth(1224)” method.
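
For the widget side, a sketch of how this might look is given below. It is the same stream-based widget declaration used earlier in this topic with “.MinimumImageWidth(1224)” added. The position of the call in the chain is an assumption, and 1224 is only the width measured for “MH.pdf” on our side, so replace it with the value that “.ShowImageWidth(true)” reports in your environment.

```aspx
<%-- Sketch only: the widget from earlier in this topic plus .MinimumImageWidth.
     The order of the chained calls is assumed; 1224 is an example value. --%>
<%= Groupdocs.Web.UI.Viewer.ClientCode()
    .TargetElementSelector("#visordiv")
    .Stream(System.IO.File.OpenRead(Server.MapPath("~/testfiles/") + "Article.pdf"), "uniquedocument.pdf", "pdf", "Unique Document.pdf", true)
    .UseHtmlBasedEngine(false)
    .MinimumImageWidth(1224)
%>
```

The width passed to “.MinimumImageWidth” should match the width used in the first GenerateImages call, so that the viewer requests the page images that were already cached.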

Thanks and waiting for your reply.