Viewer caching

Bipo · October 8, 2024, 9:09am

Hi,

We’d like to know more about the caching mechanism. We use GroupDocs Viewer in a Linux container. When we deploy to multiple containers and scroll down in the viewer to the following pages, an error may occur. We understand this is because we use the InMemory cache; is this correct? The first document preview was served by container A, but the next pages are loaded by container B, which doesn’t have the cache.

So, what is the recommended caching strategy if we need High Availability (multiple machines running the app)? Is using a custom cache (Redis) a good solution?

Another, probably related, scenario: if the document contains a lot of pages, how does the caching mechanism help? We’re currently trying a 400-page document (1 MB) with a preload size of 10, and every subsequent load of 10 pages takes a long time. How can we optimize this?

Thank you.

Bipo · October 8, 2024, 9:11am

Forgot to mention, the error message for the multi container issue is:

Value cannot be null. (Parameter ‘s’)

Professionalize.Discourse · October 8, 2024, 10:08am

@Bipo

To address your concerns regarding the caching mechanism in GroupDocs Viewer, especially in a high-availability environment with multiple containers, here are some insights and recommendations:

Caching Mechanism Overview

When using an in-memory cache, each container maintains its own cache. This means that if a document is processed in one container (e.g., Container A), and then subsequent pages are requested in another container (e.g., Container B), the second container will not have access to the cached data from the first. This can lead to errors or delays in rendering as the second container has to process the document from scratch.

Recommended Caching Strategy

For a high-availability setup, it is advisable to implement a distributed caching solution. Using a custom cache, such as Redis, is a good approach. Redis allows multiple containers to share cached data, ensuring that once a document is processed in one container, other containers can access the cached data without needing to reprocess the document. This can significantly improve performance and reduce loading times.
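To make the difference concrete, here is a minimal, library-independent sketch in plain Java. It does not use the GroupDocs API; `PageCache`, `MapCache`, and `Worker` are hypothetical names introduced only for illustration, and the `HashMap` stands in for whatever shared store (Redis, S3, and so on) is actually used. It only shows why a page cached by one worker is invisible to another unless both point at the same cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class SharedCacheSketch {
    /** Hypothetical cache abstraction; a real one would wrap Redis, S3, etc. */
    interface PageCache {
        void put(String key, String html);
        Optional<String> get(String key);
    }

    /** In-process map standing in for a cache store. */
    static class MapCache implements PageCache {
        private final Map<String, String> store = new HashMap<>();
        public void put(String key, String html) { store.put(key, html); }
        public Optional<String> get(String key) { return Optional.ofNullable(store.get(key)); }
    }

    /** Hypothetical worker ("container"): caches pages on open, serves them from its cache. */
    static class Worker {
        private final PageCache cache;
        Worker(PageCache cache) { this.cache = cache; }

        void openDocument(String fileId, int pageCount) {
            for (int p = 1; p <= pageCount; p++) {
                cache.put(fileId + ":" + p, "<html>page " + p + "</html>");
            }
        }

        Optional<String> servePage(String fileId, int page) {
            return cache.get(fileId + ":" + page);
        }
    }

    public static void main(String[] args) {
        // Each worker has its own cache: B cannot see what A cached.
        Worker a = new Worker(new MapCache());
        Worker b = new Worker(new MapCache());
        a.openDocument("report.pdf", 3);
        System.out.println("separate caches, B has page 2: " + b.servePage("report.pdf", 2).isPresent());

        // Both workers use one shared cache: B sees what A cached.
        PageCache shared = new MapCache();
        Worker c = new Worker(shared);
        Worker d = new Worker(shared);
        c.openDocument("report.pdf", 3);
        System.out.println("shared cache, D has page 2: " + d.servePage("report.pdf", 2).isPresent());
    }
}
```

In a real deployment the shared instance would live outside the worker processes, so it survives workers being scaled up or down.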

Handling Large Documents

When dealing with large documents, such as a 400-page document, caching can help optimize performance. However, the preload size and the number of pages loaded at once can impact the loading time. Here are some strategies to optimize this:

Adjust Preload Size: Experiment with different preload sizes to find a balance between performance and resource usage. Loading too many pages at once can lead to increased memory consumption and slower performance.
Incremental Loading: Instead of loading a fixed number of pages at once, consider implementing a strategy that loads pages incrementally based on user interaction (e.g., when the user scrolls down).
Use Efficient Caching: Ensure that your caching mechanism is optimized for large documents. This includes using a cache that can handle large data efficiently and ensuring that the cache is thread-safe if multiple users are accessing it simultaneously.
Profile Performance: Monitor and profile the performance of your document loading to identify bottlenecks. This can help you make informed decisions about caching strategies and preload sizes.
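As a rough illustration of how preload size interacts with page position, the following plain-Java sketch computes which batch a requested page falls into and how many sequential batches separate the last loaded page from a target page. It assumes fixed-size batches aligned from page 1, which may not match how the viewer actually schedules requests; `batchRange` and `batchesToReach` are hypothetical helper names.

```java
public class BatchSketch {
    /** Returns {firstPage, lastPage} of the fixed-size batch containing the given 1-based page. */
    static int[] batchRange(int page, int batchSize, int totalPages) {
        int first = ((page - 1) / batchSize) * batchSize + 1;
        int last = Math.min(first + batchSize - 1, totalPages);
        return new int[] { first, last };
    }

    /** Number of sequential batches needed to get from the last loaded page to the target page. */
    static int batchesToReach(int lastLoadedPage, int targetPage, int batchSize) {
        if (targetPage <= lastLoadedPage) {
            return 0;
        }
        return (targetPage - lastLoadedPage + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        int[] range = batchRange(95, 10, 400);
        System.out.println(range[0] + "-" + range[1]);   // prints 91-100
        System.out.println(batchesToReach(20, 100, 10)); // prints 8
    }
}
```

Under these assumptions, a larger batch means fewer round trips but more rendering work per request, which is the trade-off to measure when profiling.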

By implementing a distributed caching solution and optimizing your loading strategy, you should be able to enhance the performance of the GroupDocs Viewer in your application.

Example Code for Custom Cache Implementation

Here’s a simple example of how to set up a custom cache using Redis in Java:

import com.groupdocs.viewer.Viewer;
import com.groupdocs.viewer.ViewerSettings;
import com.groupdocs.viewer.caching.Cache;
import com.groupdocs.viewer.caching.redis.RedisCache;
import com.groupdocs.viewer.options.HtmlViewOptions;

public class UseRedisCache {
    public static void run() {
        Cache cache = new RedisCache("localhost", 6379); // Connect to Redis server
        ViewerSettings settings = new ViewerSettings(cache);
        
        try (Viewer viewer = new Viewer("path/to/document.docx", settings)) {
            HtmlViewOptions options = HtmlViewOptions.forEmbeddedResources("output/page_{0}.html");
            viewer.view(options);
        }
    }
}

This code snippet demonstrates how to configure the GroupDocs Viewer to use Redis as a caching mechanism, which can help in a multi-container setup.

By following these recommendations, you should be able to improve the performance and reliability of your document viewing application.

vladimir.litvinchik · October 9, 2024, 6:54pm

@Bipo

We use GroupDocs Viewer in a Linux container. When we deploy to multiple containers and scroll down in the viewer to the following pages, an error may occur. We understand this is because we use the InMemory cache; is this correct?

I can’t say for sure. Does it happen for any or some specific files?

So, what is the recommended caching strategy if we need High Availability (multiple machines running the app)? Is using a custom cache (Redis) a good solution?

Redis stores files in memory. Redis may provide some benefits, but it is not much different from MemoryCache. As with Redis, you can run a single instance with MemoryCache, or use Redis if you want more freedom in configuring the cache.

When using GroupDocs.Viewer.UI, you can specify your own cache provider. I have added the example.

Typically, you would want to use a single instance of the cache provider. For example, our online application uses Amazon S3 storage, while a number of worker instances convert documents to HTML.

Another, probably related, scenario: if the document contains a lot of pages, how does the caching mechanism help? We’re currently trying a 400-page document (1 MB) with a preload size of 10, and every subsequent load of 10 pages takes a long time. How can we optimize this?

Since GroupDocs.Viewer.UI has internal caching, where the instance of a document is stored for a short period of time, you can use session persistence based on file ID or file path, so that users who try to view the same file are forwarded to the same worker instance.
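One simple way to picture routing by file ID is hashing the ID onto the list of workers. The plain-Java sketch below is only an illustration of the idea, not how any particular load balancer implements it; `workerFor` is a hypothetical helper, and real setups usually configure this in the load balancer rather than in application code.

```java
public class FileAffinitySketch {
    /** Maps a file ID to a worker index in [0, workerCount) so the same file always hits the same worker. */
    static int workerFor(String fileId, int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(fileId.hashCode(), workerCount);
    }

    public static void main(String[] args) {
        String fileId = "reports/2024/summary.pdf"; // hypothetical file ID
        System.out.println("4 workers -> worker " + workerFor(fileId, 4));
        System.out.println("4 workers -> worker " + workerFor(fileId, 4)); // same worker again
        System.out.println("5 workers -> worker " + workerFor(fileId, 5)); // may differ after scaling
    }
}
```

Note that with this naive modulo scheme the mapping can change whenever the number of workers changes, so it fits best when the worker pool is stable.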

As a second option, when you have a list of files beforehand you can generate the pages manually. See the example here.

I hope this gives you some ideas on how you can organize caching.

Can you please share more details about your use case? What file types do you process, and what other difficulties do you experience while integrating GroupDocs.Viewer?

Bipo · October 16, 2024, 6:01am

Hi Vladimir,

Unfortunately, we can’t use sticky sessions, because our worker instances can be scaled up or down at any time.

Our current use case is report previewing. We generate the report in PDF format, store it in S3 storage (accessible to any worker instance), and implemented IFileStorage.ReadFileAsync to read from S3.
In the screenshot of our log below, after scrolling past page 100, each loadDocumentPages call took more than 10 minutes. In the browser we receive an HTTP 504 error because the request takes too long. We can see that the requests went to different worker instances (different IPs), and it seems to work, but it takes a very long time to load subsequent pages.
Screenshot 2024-10-16 135053.png (33.0 KB)

I will try to implement the Redis caching and see if it helps.

vladimir.litvinchik · October 16, 2024, 6:37pm

@Bipo

Thank you for the update. Please let us know whether the Redis cache works for you.

Bipo · October 17, 2024, 10:12am

Hi Vladimir,

I just implemented the Redis cache, but it only helps when the user refreshes the preview screen.

When a user first opens the preview screen, the system still struggles to load the pages in the second half of the document. I can see the cache being saved to Redis.
redis-cache.png (24.0 KB)
And if the user refreshes the preview screen, the pages are loaded from the Redis cache.
redis-cache2.png (20.7 KB)

And it seems that if I jump from page 20 to page 100, it loads incrementally in batches (10 pages per batch?). Is there a way to improve the rendering?

Bipo · October 17, 2024, 10:17am

Is there a way to force page caching in advance, without waiting for the pages to be accessed from the preview screen?

vladimir.litvinchik · October 17, 2024, 11:28am

@Bipo

And it seems that if I jump from page 20 to page 100, it loads incrementally in batches (10 pages per batch?). Is there a way to improve the rendering?

That’s correct. You can change the batch size:

builder.Services
    .AddGroupDocsViewerUI((config) =>
    {
        const int countPagesInBatch = 10; // 0 - all the pages
        config.SetPreloadPageCount(countPagesInBatch);
    });

Also, you can configure page loading based on your requirements directly in the Angular application. In this case you would need to build your own custom GroupDocs.Viewer.UI package or use the assemblies.

The code that handles how pages are loaded on the client side can be found in app.component.ts#L68. Let me know if you need guidance on how to build the Angular app or the NuGet package with your changes.

Is there a way to force page caching in advance, without waiting for the pages to be accessed from the preview screen?

Yes, you can find the code that builds up the cache in this Program.cs. The code instantiates all the required services and renders documents.
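For readers who just want the general shape of cache warm-up, here is a small language-neutral sketch written in plain Java. It is not the linked Program.cs and does not call GroupDocs; the renderer is a hypothetical function passed in by the caller, and the `ConcurrentHashMap` stands in for the shared cache. The idea is simply to render every page once ahead of time so that later page requests are cache hits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

public class CacheWarmupSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for a shared cache
    private final String fileId;
    private final IntFunction<String> renderer; // hypothetical page renderer supplied by the caller
    final AtomicInteger renderCount = new AtomicInteger(); // counts actual renders, for illustration

    CacheWarmupSketch(String fileId, IntFunction<String> renderer) {
        this.fileId = fileId;
        this.renderer = renderer;
    }

    /** Returns the page from the cache, rendering and storing it only on a miss. */
    String getPage(int page) {
        return cache.computeIfAbsent(fileId + ":" + page, key -> {
            renderCount.incrementAndGet();
            return renderer.apply(page);
        });
    }

    /** Renders all pages up front so later requests never wait on rendering. */
    void warmUp(int pageCount) {
        for (int page = 1; page <= pageCount; page++) {
            getPage(page);
        }
    }

    public static void main(String[] args) {
        CacheWarmupSketch viewer = new CacheWarmupSketch("report.pdf", p -> "<html>page " + p + "</html>");
        viewer.warmUp(400);        // e.g. run right after the report is generated
        viewer.getPage(350);       // served from the cache, no extra render
        System.out.println(viewer.renderCount.get()); // prints 400
    }
}
```

A natural place to trigger such a warm-up is right after the report is generated and stored, before any user opens the preview.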