Clarification Needed: Cloud vs On-Prem Evaluation, Performance & Security Questions

Hello,

We are currently evaluating GroupDocs.Comparison and related document processing products for integration into our platform, and we would appreciate clarification on several technical points.

To clarify upfront:
We are evaluating both the Cloud API and the on-premises SDK. We have not yet decided which deployment model best fits our requirements, so answers covering both options are important to us.

In previous discussions, we were mainly pointed to documentation links. While helpful, the documentation does not address the concrete production-level questions we need answered in order to make a deployment decision.

Below are the specific areas where we need clarification:


1. Performance & File Size Limits

We need clear guidance on:

  • Maximum supported file sizes (Cloud vs SDK)
  • Practical limits for large DOCX/XLSX/PDF comparison workloads
  • Execution time limits
  • Whether Cloud timeouts are configurable
  • Recommended architecture for processing very large regulatory-style documents
  • Whether async/background job processing is supported to avoid request timeouts

In testing similar solutions, we observed timeouts at around 20 minutes for large document comparisons. We need to understand what realistic expectations are with GroupDocs.


2. SDK vs Cloud Differences

Since we are evaluating both deployment models, we need clarity on:

  • Feature parity between Cloud API and SDK
  • Performance differences
  • Concurrency limitations
  • Throughput considerations
  • Any architectural constraints specific to either option

We are open to any self-hosted deployment approach as long as it runs on Linux.


3. Security & Safety Considerations (SDK)

We will be processing user-uploaded documents, so security is critical.

Specifically, we would like to understand:

  • Does the SDK include safeguards against malicious or malformed documents?
  • How are embedded macros, external references, malformed XML, or zip bombs handled?
  • Are there configurable limits for:
    • Memory usage
    • CPU usage
    • Maximum document structure depth
    • Embedded object size
  • Is document processing fully in-memory or is streaming supported?
  • Do you provide recommended sandboxing practices for Linux deployments?

If a maliciously crafted DOCX/XLSX/PDF is uploaded, what protections exist within the SDK itself?


4. Field Updates / TOC Recalculation

We generate a TOC using a proper Word TOC field (XML/field-code based template).

We need confirmation that:

  • The SDK and/or Cloud service can recalculate Word field codes
  • TOC fields are properly populated during processing (e.g., during conversion to PDF)
  • No manual opening in Microsoft Word is required to update the TOC

This is a strict requirement for us.


We are trying to make an informed architectural decision between Cloud and on-premises deployment. Concrete answers regarding limits, performance characteristics, and security safeguards are essential for that evaluation.

Thank you in advance for your detailed clarification.

@ristomattip

Thanks for the report. I have not confirmed this yet and can only confirm it after reproducing it on our side. I need a bit more information to give a precise answer for GroupDocs.Comparison.

  • What is the maximum document size (in MB/GB) you anticipate comparing in production, and which file formats (e.g., DOCX, XLSX, PDF) are most critical for you?
  • How many concurrent comparison operations do you expect to run, and what target response time (or maximum acceptable latency) do you need for typical and “large” documents?
  • Are there any specific security or compliance standards (e.g., ISO 27001, GDPR, PCI‑DSS) that your deployment must satisfy, or particular sandboxing constraints you plan to enforce on the Linux host?

A support engineer will review your answers and follow up with detailed information.

Once I have the missing details, I’ll try to reproduce this. If I can confirm it, I’ll log it internally and follow up here.

Hello,

Thank you for the follow-up questions — please see our clarifications below.


1. Typical and Maximum File Sizes

During testing, I compared two DOCX files of approximately 170 MB each. This led to timeouts when I used the Cloud API, both for comparison (the two 170 MB DOCX files) and for conversion (e.g., DOCX → PDF).

In our real-world usage:

  • Typical file sizes may range from a few megabytes up to several hundred megabytes
  • We may occasionally handle documents around 200 MB
  • Larger documents are possible, and the higher the supported limits, the better for us

These are typically large regulatory-style documents with complex structure, embedded content, and tracked revisions.

Could you clarify what the practical maximum supported file sizes are for:

  • GroupDocs Cloud comparison
  • Self-hosted SDK comparison

And whether there are recommended configurations for handling files in the ~150–250 MB range?


2. Self-Hosted Deployment Setup

We have not yet set up a dedicated comparison server, so we are flexible regarding:

  • Programming language (.NET, Java, etc.)
  • Linux distribution
  • CPU/RAM allocation

Our key question is:

Which setup would you recommend for optimal performance and stability when handling very large document comparisons?

For example:

  • Is .NET preferred over Java for memory efficiency?
  • Are there recommended minimum RAM/CPU specifications per comparison job?
  • Do you have reference architectures for high-load or large-document workloads?

3. Asynchronous Processing & Time Expectations

Yes, asynchronous/background processing would be preferred if needed to avoid HTTP timeouts.

For very large documents, processing may take significant time — that is acceptable to us, provided it is reliable and predictable.

That said:

  • Faster processing is obviously preferred
  • We would like guidance on realistic processing time expectations for ~150–200 MB DOCX comparisons
  • Is there a recommended maximum processing time per document?

4. Security & Compliance Requirements

Security and compliance are important for us.

We need clarity on:

  • Whether GroupDocs Cloud meets specific compliance standards (e.g., GDPR alignment, ISO 27001, SOC 2, etc.)
  • Data handling and retention policies
  • Whether documents are stored temporarily and for how long
  • Encryption in transit and at rest

Additionally, for the self-hosted SDK (this part was not fully addressed previously):

  • What safeguards exist against malicious or malformed documents?
  • How are zip bombs, malformed XML, embedded macros, or resource exhaustion attacks handled?
  • Are there configurable resource limits (memory usage, recursion depth, object size)?
  • Do you provide sandboxing or hardening recommendations for Linux environments?

Since we will be processing user-uploaded documents, this is a critical decision factor for us.


5. Outstanding Clarification

In previous responses, some of these SDK-related security and performance questions were redirected without concrete answers.

Because we are evaluating both Cloud and on-prem options, it is important for us to receive clear technical guidance for both models before making an architectural decision.

We would appreciate more detailed clarification on:

  • Practical performance limits
  • Recommended production setup
  • Built-in security mechanisms in the SDK

Thank you again — we are looking forward to your detailed guidance so we can proceed with the evaluation.

Best regards,
Risto-Matti

Hi @ristomattip,
Thank you for your detailed questions.

As I understand it, you already have a trial license, so you can run tests with your real documents. Please note that the trial license has no performance limitations in the SDK and fully matches the commercial license: the results you get with the trial version will be the same as with a paid license.

In this message I will address only the self-hosted SDK aspects, following the structure of your questions.

Performance & File Size Limits

Technically, the GroupDocs.Comparison SDK does not have strict file size limitations. The comparison time mainly depends on the specific document content and complexity. The library works synchronously, and currently it does not support cancellation tokens.
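
Since the comparison call blocks and cannot be cancelled from the inside, one common pattern is to run each comparison in a separate worker process and enforce a wall-clock timeout from the outside. The following is a minimal Python sketch under that assumption; it uses only the standard library, and the worker command you pass in (for example a hypothetical `compare_worker.py` that performs one comparison) is not part of the SDK.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobResult:
    ok: bool                    # True if the worker exited with status 0
    timed_out: bool             # True if the wall-clock timeout was hit
    returncode: Optional[int]   # None when the worker was killed on timeout


def run_with_timeout(cmd: list[str], timeout_s: float) -> JobResult:
    """Run one comparison job in a child process, killing it after timeout_s seconds."""
    try:
        # On timeout, subprocess.run kills the child, waits for it,
        # and then raises TimeoutExpired.
        completed = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return JobResult(ok=False, timed_out=True, returncode=None)
    return JobResult(ok=completed.returncode == 0, timed_out=False,
                     returncode=completed.returncode)
```

For example, `run_with_timeout([sys.executable, "compare_worker.py", "source.docx", "target.docx", "result.docx"], timeout_s=30 * 60)` would stop a comparison that runs longer than 30 minutes. A separate process is used rather than a thread because a thread stuck in a blocking call cannot be forcibly stopped, while killing the worker process also releases all the memory it held.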

We don’t frequently receive requests regarding very long processing times for large documents. However, you mentioned ~200 MB DOCX files — we already have an internal ticket related to long comparison times for large Word documents, and our team is working on performance improvements in this area.

If possible, could you please provide sample files? This would help us investigate your scenario more precisely.

Also, the CompareOptions class has a DetalisationLevel option (set to DetalisationLevel.Middle by default).
You may try setting it to DetalisationLevel.Low and check whether the comparison time and output better suit your needs.

Security & Safety Considerations (SDK)

The GroupDocs.Comparison library itself does not include built-in safeguards against malicious documents. It is a document processing component rather than a security or firewall solution, so we assume the hosting environment is responsible for input sanitization and security controls.
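
Since input sanitization is left to the hosting environment, one option is a pre-flight check before an upload ever reaches the SDK. DOCX and XLSX files are ZIP (OPC) containers, so Python's standard `zipfile` module can inspect them without extracting anything. This is a minimal sketch, not a GroupDocs feature: the thresholds are illustrative assumptions, it only looks for three signals (extreme compression ratios or declared sizes that suggest a zip bomb, `vbaProject.bin` macro parts, and relationships marked `TargetMode="External"`), it does not cover PDF, and an empty result is not proof that a file is safe.

```python
import io
import re
import zipfile

# Illustrative thresholds only; tune them for your own documents.
MAX_ENTRIES = 10_000
MAX_TOTAL_UNCOMPRESSED = 2 * 1024**3   # declared uncompressed bytes, all entries
MAX_RATIO = 100                         # uncompressed / compressed, per entry
MAX_RELS_BYTES = 10 * 1024**2           # never read more than this from a .rels part

_EXTERNAL = re.compile(rb"""TargetMode\s*=\s*["']External["']""")


def inspect_ooxml(data: bytes) -> list[str]:
    """Return findings for a DOCX/XLSX payload; an empty list means nothing was flagged."""
    try:
        zf = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a valid ZIP/OPC container"]
    findings: list[str] = []
    with zf:
        infos = zf.infolist()
        if len(infos) > MAX_ENTRIES:
            findings.append(f"too many entries: {len(infos)}")
        total = sum(info.file_size for info in infos)
        if total > MAX_TOTAL_UNCOMPRESSED:
            findings.append(f"declared uncompressed size too large: {total}")
        for info in infos:
            name = info.filename.lower()
            # Zip-bomb heuristic: an extreme compression ratio on a single entry.
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                findings.append(f"suspicious compression ratio: {info.filename}")
            # Macro-enabled Word/Excel files store VBA code in a vbaProject.bin part.
            if name.endswith("vbaproject.bin"):
                findings.append(f"VBA macro project: {info.filename}")
            # Relationship parts declare external targets (links, templates, OLE).
            if name.endswith(".rels"):
                try:
                    with zf.open(info) as fh:
                        chunk = fh.read(MAX_RELS_BYTES + 1)
                except zipfile.BadZipFile:
                    findings.append(f"corrupt entry: {info.filename}")
                    continue
                if len(chunk) > MAX_RELS_BYTES:
                    findings.append(f"oversized relationships part: {info.filename}")
                elif _EXTERNAL.search(chunk):
                    findings.append(f"external relationship: {info.filename}")
    return findings
```

Rejecting (or quarantining) any upload with findings keeps the obviously hostile inputs away from the comparison workers; the resource limits on the worker process remain the backstop for anything this check misses.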

The SDK does not work with archive formats directly.
Documents can be provided as file paths or streams, and the SDK loads them into memory to parse the full content before comparison.
The SDK does not provide settings for configurable resource limits (memory usage, recursion depth, object size).
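
Because the SDK loads full documents into memory and exposes no resource-limit settings, limits have to come from the operating system. On Linux, one minimal approach is to apply `setrlimit` in the child process right before it starts the comparison, so a runaway document hits a kernel limit instead of exhausting the host. This sketch assumes a Linux host, a single-threaded parent (`preexec_fn` is not safe in multi-threaded programs), and the same hypothetical worker command as any other job; the limit values are placeholders. Note that `RLIMIT_AS` caps virtual address space, which managed runtimes such as .NET and the JVM may reserve generously, so for those runtimes a cgroup memory limit (for example a container's `--memory` setting) is often the more practical choice.

```python
import resource
import subprocess
from typing import Callable


def limits_preexec(mem_bytes: int, cpu_seconds: int) -> Callable[[], None]:
    """Return a preexec_fn that applies rlimits in the child just before exec."""
    def apply() -> None:
        # Cap the virtual address space: allocations beyond this fail in the child.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        # The soft CPU limit sends SIGXCPU (default action: terminate); the hard
        # limit one second later sends SIGKILL if the child is still running.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
    return apply


def run_limited(cmd: list[str], mem_bytes: int, cpu_seconds: int,
                timeout_s: float) -> int:
    """Run cmd under memory/CPU rlimits plus a wall-clock timeout; return its exit code.

    A negative exit code means the child was killed by that signal number.
    Raises subprocess.TimeoutExpired if the wall-clock timeout elapses.
    """
    completed = subprocess.run(
        cmd,
        preexec_fn=limits_preexec(mem_bytes, cpu_seconds),
        timeout=timeout_s,
        capture_output=True,
    )
    return completed.returncode
```

Combining the kernel limits with a wall-clock timeout covers both failure modes: a document that makes the parser allocate without bound, and one that keeps it busy for far longer than expected.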

Field Updates / TOC Recalculation

GroupDocs.Comparison recognizes Word fields and can compare them (field formula). However, we currently have an open issue related to a specific field-code comparison case:

Regarding recalculating field values during conversion to PDF — this scenario should generally work as expected. I would suggest testing it with your documents using the trial license. If possible, you can also share sample files containing fields, and we will verify the behavior on our side.

Self-Hosted Deployment Setup

GroupDocs.Comparison SDK is available for .NET, Java, Python, and Node.js. The base implementation is developed in .NET, which means releases there are more frequent, and new features or fixes usually appear in the .NET version first.

All SDKs are cross-platform and support Linux environments.
Some OS-level details and requirements can be found here:


Please let us know if you have any additional questions or need further clarification. We will also share additional information about the Cloud API.

Hi alexndr,

Thank you for the detailed response regarding the self-hosted SDK.

However, a significant part of our evaluation depends on the Cloud API model, and those aspects were not addressed in your reply. Since we are actively deciding between Cloud vs. on-prem, we would appreciate clearer technical guidance on the missing points below.

1. GroupDocs Cloud – Practical File Size Limits

Our original question was specifically about Cloud comparison and conversion:

  • What is the maximum supported upload size per document for GroupDocs Cloud comparison?
  • Are there enforced limits for DOCX/PDF conversion jobs?
  • Are there recommended approaches for handling documents in the 150–250 MB range without HTTP timeouts?

In our testing, comparing two ~170 MB DOCX files via Cloud resulted in timeouts, so we need to understand:

  • Whether this is an expected limitation
  • Whether larger documents are officially supported
  • Whether asynchronous job handling exists in Cloud

2. Cloud Processing Model & Async Support

For large regulatory documents, synchronous HTTP processing is not realistic.

Could you clarify:

  • Does GroupDocs Cloud provide asynchronous/background comparison jobs?
  • Are there recommended timeout configurations or polling workflows?
  • What are typical processing times for ~150–200 MB DOCX comparisons in Cloud?

3. Cloud Security, Compliance & Data Retention

Security and compliance are critical decision factors for us.

Could you provide concrete information on the Cloud service regarding:

  • GDPR alignment
  • ISO 27001 / SOC 2 or other certifications
  • Data retention policies (are documents stored temporarily, and for how long?)
  • Encryption in transit and at rest

We would appreciate detailed clarification on these remaining points so we can continue the evaluation.

Best regards,
Risto-Matti

Hi @ristomattip

Thank you for your patience. Below is a brief clarification focused specifically on the GroupDocs Cloud aspects of your evaluation.

Some metrics are currently being verified internally to provide accurate figures, so where exact numbers are pending, I will clearly state that.

1. GroupDocs Cloud – Practical File Size Limits

This is not a straightforward question, as GroupDocs Cloud services operate in a microservices-based architecture, and practical limits depend on multiple runtime factors, including container memory constraints, execution node memory allocation, current infrastructure load, and the internal processing pipeline used for specific document formats.

We are currently performing a metrics review to determine realistic and supportable operating ranges for large-document workloads. We will follow up with details once they are verified.

2. Cloud Processing Model & Asynchronous Support

For GroupDocs.Conversion Cloud, asynchronous processing is available and documented:

https://docs.groupdocs.cloud/conversion/convert-document-async/
https://reference.groupdocs.cloud/conversion/

For GroupDocs.Comparison Cloud, this feature is not yet available; if there is demand, we can prioritize it in our short-term planning.
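
Wherever an asynchronous start-then-poll workflow is available (as documented for GroupDocs.Conversion Cloud), the client side usually reduces to a polling loop with a deadline and exponential back-off. The sketch below is deliberately API-agnostic: `get_status` is a hypothetical callable you would implement against the real endpoints in the references above, and the status strings `"Completed"` and `"Failed"` are placeholders, not documented values.

```python
import time
from typing import Callable


class JobFailed(Exception):
    """Raised when the remote job reports a failure status."""


def wait_for_job(get_status: Callable[[], str], timeout_s: float,
                 initial_delay_s: float = 1.0, max_delay_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> None:
    """Poll get_status() until it reports completion, failure, or the deadline passes.

    Delays double after each poll (1 s, 2 s, 4 s, ...) up to max_delay_s, so a
    long-running job does not flood the API with status requests.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status == "Completed":   # placeholder status value
            return
        if status == "Failed":      # placeholder status value
            raise JobFailed("job reported failure")
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"job not finished after {timeout_s} s")
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting; in production the defaults are used.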

3. Cloud Security, Compliance & Data Handling

Data Storage Model

Uploaded files are stored in GroupDocs Cloud Storage, which:

  • Is isolated per application (App SID / App Key)
  • Is accessible only to the authenticated application owner
  • Requires API authentication for all operations

Files are not publicly accessible.

Encryption

  • Data in transit: HTTPS/TLS encrypted
  • Data at rest: Stored within secured cloud infrastructure

Data Retention

Files remain in storage until explicitly deleted by the user/application.

If strict retention control is required, recommended practice is:

  • Upload
  • Process
  • Download result
  • Explicitly delete source and output files via API
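
The upload / process / download / delete sequence can be made robust by putting the deletions in a `finally` block, so files are removed even when processing fails. This sketch uses a hypothetical `StorageClient` protocol with `upload`, `download` and `delete` methods; it is not the GroupDocs Cloud SDK, and you would map these calls onto the real storage API.

```python
from typing import Callable, Protocol


class StorageClient(Protocol):
    """Hypothetical minimal storage interface; map it onto the real Cloud SDK calls."""
    def upload(self, path: str, data: bytes) -> None: ...
    def download(self, path: str) -> bytes: ...
    def delete(self, path: str) -> None: ...


def process_and_purge(storage: StorageClient, inputs: dict[str, bytes],
                      output_path: str,
                      process: Callable[[list[str], str], None]) -> bytes:
    """Upload inputs, run process(), download the result, and always delete every file."""
    uploaded: list[str] = []
    try:
        for path, data in inputs.items():
            storage.upload(path, data)
            uploaded.append(path)
        process(uploaded, output_path)   # e.g. a comparison or conversion request
        return storage.download(output_path)
    finally:
        # Runs on success and on failure, so no source or output file is left behind.
        for path in uploaded + [output_path]:
            try:
                storage.delete(path)
            except Exception:
                # Ignored so a missing output (processing failed early) does not mask
                # the original error; a production version should log and retry.
                pass
```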

4. Alternative Option: Self-Hosted Cloud

We also provide a self-hosted GroupDocs Cloud deployment via Docker:

https://hub.docker.com/r/groupdocs/comparison-cloud

This allows:

  • Full control over CPU and memory allocation
  • Deployment inside your own Linux environment
  • Custom timeout policies
  • Alignment with strict internal security controls

5. Current Status

We are:

  • Reviewing current infrastructure-level processing ceilings and validating practical performance ranges for large DOCX comparisons
  • Confirming formal compliance certifications

We will follow up with concrete technical data once verified internally.

If you can share anonymized samples, that would allow us to benchmark your exact scenario and provide deterministic guidance.

Hi @ristomattip,
I also wanted to share one more tip about the GroupDocs.Comparison SDK that may be more useful for your Word document comparison results.

By default, when the GroupDocs.Comparison SDK compares Word documents, the result document is generated as a new file (since the library applies its own styling for inserted/removed content, formatting changes, etc.). In such cases, if the source documents are large, rendering the resulting document may take a noticeable amount of time.

There is an option to output the comparison result using Microsoft Word Track Changes mode instead. In this case, all changes are highlighted directly in the document text and are also visible in the Word revisions panel. This approach works much faster, especially for large Word documents.

You can try this with your trial license by enabling the following comparison option: