Clarification Needed: Cloud vs On-Prem Evaluation, Performance & Security Questions

Hello,

We are currently evaluating GroupDocs.Comparison and related document processing products for integration into our platform, and we would appreciate clarification on several technical points.

To clarify upfront:
We are evaluating both Cloud API and on-premises SDK options. We have not yet decided which deployment model fits our requirements best, so answers covering both options are important for us.

In previous discussions, we were mainly pointed to documentation links. While helpful, the documentation does not address the concrete production-level questions we need answered in order to make a deployment decision.

Below are the specific areas where we need clarification:


1. Performance & File Size Limits

We need clear guidance on:

  • Maximum supported file sizes (Cloud vs SDK)
  • Practical limits for large DOCX/XLSX/PDF comparison workloads
  • Execution time limits
  • Whether Cloud timeouts are configurable
  • Recommended architecture for processing very large regulatory-style documents
  • Whether async/background job processing is supported to avoid request timeouts

In testing similar solutions, we observed timeouts at around 20 minutes for large document comparisons. We need to understand what the realistic expectations are with GroupDocs.


2. SDK vs Cloud Differences

Since we are evaluating both deployment models, we need clarity on:

  • Feature parity between Cloud API and SDK
  • Performance differences
  • Concurrency limitations
  • Throughput considerations
  • Any architectural constraints specific to either option

We are open to any self-hosted deployment approach as long as it can run on Linux.


3. Security & Safety Considerations (SDK)

We will be processing user-uploaded documents, so security is critical.

Specifically, we would like to understand:

  • Does the SDK include safeguards against malicious or malformed documents?
  • How are embedded macros, external references, malformed XML, or zip bombs handled?
  • Are there configurable limits for:
    • Memory usage
    • CPU usage
    • Maximum document structure depth
    • Embedded object size
  • Is document processing fully in-memory or is streaming supported?
  • Do you provide recommended sandboxing practices for Linux deployments?

If a maliciously crafted DOCX/XLSX/PDF is uploaded, what protections exist within the SDK itself?


4. Field Updates / TOC Recalculation

We generate a TOC using a proper Word TOC field (XML/field-code based template).

We need confirmation that:

  • The SDK and/or Cloud service can recalculate Word field codes
  • TOC fields are properly populated during processing (e.g., during conversion to PDF)
  • No manual opening in Microsoft Word is required to update the TOC

This is a strict requirement for us.


We are trying to make an informed architectural decision between Cloud and on-premises deployment. Concrete answers regarding limits, performance characteristics, and security safeguards are essential for that evaluation.

Thank you in advance for your detailed clarification.

@ristomattip

Thanks for the report. I have not confirmed this yet and can only confirm it after reproducing on our side. I need a bit more information to answer precisely for GroupDocs.Comparison.

  • What is the maximum document size (in MB/GB) you anticipate comparing in production, and which file formats (e.g., DOCX, XLSX, PDF) are most critical for you?
  • How many concurrent comparison operations do you expect to run, and what target response time (or maximum acceptable latency) do you need for typical and “large” documents?
  • Are there any specific security or compliance standards (e.g., ISO 27001, GDPR, PCI‑DSS) that your deployment must satisfy, or particular sandboxing constraints you plan to enforce on the Linux host?

A support engineer will review your answers and follow up with detailed information.

Once I have the missing details, I’ll try to reproduce this. If I can confirm it, I’ll log it internally and follow up here.

A support engineer will follow up in this thread if needed.

Hello,

Thank you for the follow-up questions — please see our clarifications below.


1. Typical and Maximum File Sizes

During testing, I ran a comparison of two DOCX files of approximately 170 MB each. The Cloud API timed out both on that comparison and on conversion (e.g. DOCX → PDF).

In our real-world usage:

  • Typical file sizes may range from a few megabytes up to several hundred megabytes
  • We may occasionally handle documents around 200 MB
  • Larger documents are possible, and the higher the supported limits, the better for us

These are typically large regulatory-style documents with complex structure, embedded content, and tracked revisions.

Could you clarify what the practical maximum supported file sizes are for:

  • GroupDocs Cloud comparison
  • Self-hosted SDK comparison

And whether there are recommended configurations for handling files in the ~150–250 MB range?


2. Self-Hosted Deployment Setup

We have not yet set up a dedicated comparison server, so we are flexible regarding:

  • Programming language (.NET, Java, etc.)
  • Linux distribution
  • CPU/RAM allocation

Our key question is:

Which setup would you recommend for optimal performance and stability when handling very large document comparisons?

For example:

  • Is .NET preferred over Java for memory efficiency?
  • Are there recommended minimum RAM/CPU specifications per comparison job?
  • Do you have reference architectures for high-load or large-document workloads?

3. Asynchronous Processing & Time Expectations

Yes, asynchronous/background processing would be preferred if needed to avoid HTTP timeouts.

For very large documents, processing may take significant time — that is acceptable to us, provided it is reliable and predictable.

That said:

  • Faster processing is obviously preferred
  • We would like guidance on realistic processing time expectations for ~150–200 MB DOCX comparisons
  • Is there a recommended maximum processing time per document?

4. Security & Compliance Requirements

Security and compliance are important for us.

We need clarity on:

  • Whether GroupDocs Cloud meets specific compliance standards (e.g., GDPR alignment, ISO 27001, SOC 2, etc.)
  • Data handling and retention policies
  • Whether documents are stored temporarily and for how long
  • Encryption in transit and at rest

Additionally — and this part was not fully addressed previously — for the self-hosted SDK:

  • What safeguards exist against malicious or malformed documents?
  • How are zip bombs, malformed XML, embedded macros, or resource exhaustion attacks handled?
  • Are there configurable resource limits (memory usage, recursion depth, object size)?
  • Do you provide sandboxing or hardening recommendations for Linux environments?

Since we will be processing user-uploaded documents, this is a critical decision factor for us.


5. Outstanding Clarification

In previous responses, some of these SDK-related security and performance questions were redirected without concrete answers.

Because we are evaluating both Cloud and on-prem options, it is important for us to receive clear technical guidance for both models before making an architectural decision.

We would appreciate more detailed clarification on:

  • Practical performance limits
  • Recommended production setup
  • Built-in security mechanisms in the SDK

Thank you again — we are looking forward to your detailed guidance so we can proceed with the evaluation.

Best regards,
Risto-Matti

Hi @ristomattip,
thank you for your detailed questions.

As I understand, you already have a trial license, so you are able to perform tests using your real documents. Please note that the trial license does not have any performance limitations in the SDK and fully matches the commercial license. The results you get while using the trial version will be the same as with a paid license.

In this message I will address only the self-hosted SDK aspects, following the structure of your questions.

Performance & File Size Limits

Technically, the GroupDocs.Comparison SDK does not have strict file size limitations. The comparison time mainly depends on the specific document content and complexity. The library works synchronously, and currently it does not support cancellation tokens.
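
Because the call is synchronous, a long comparison would hold an HTTP request open for its full duration. One common way around that is to accept the job, return an id straight away, and let the client poll for the result. The sketch below shows that pattern in plain Python. It is not part of GroupDocs.Comparison: `JobQueue` and its methods are hypothetical names, and the lambda in the usage note stands in for whatever blocking function wraps the real comparison.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import wait as futures_wait
from typing import Any, Callable, Dict


class JobQueue:
    """Run blocking calls in the background and report their status by job id."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, func: Callable[..., Any], *args: Any) -> str:
        """Queue func(*args) and return immediately with a job id."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id: str) -> Dict[str, Any]:
        """Non-blocking status lookup, suitable for a polling endpoint."""
        future = self._jobs[job_id]
        if not future.done():
            return {"state": "pending"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": str(error)}
        return {"state": "done", "result": future.result()}

    def wait(self, job_id: str, timeout: float) -> Dict[str, Any]:
        """Block for up to `timeout` seconds, then return the current status."""
        futures_wait([self._jobs[job_id]], timeout=timeout)
        return self.status(job_id)
```

A caller would do something like `job_id = queue.submit(compare_files, "old.docx", "new.docx")`, where `compare_files` is a placeholder, and then poll `queue.status(job_id)`. Note that a worker thread cannot be killed once it has started, so with no cancellation support in the library, a job that must be stoppable would need to run in a separate process instead.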

We don’t frequently receive requests regarding very long processing times for large documents. However, you mentioned ~200 MB DOCX files — we already have an internal ticket related to long comparison times for large Word documents, and our team is working on performance improvements in this area.

If possible, could you please provide sample files? This would help us investigate your scenario more precisely.

Also, the CompareOptions class has the DetalisationLevel option (by default it is set to DetalisationLevel.Middle).
You may try setting it to DetalisationLevel.Low and check whether the comparison time and output are more appropriate for your needs:

Security & Safety Considerations (SDK)

The GroupDocs.Comparison library itself does not include built-in safeguards against malicious documents. It is a document processing component rather than a security or firewall solution, so we assume the hosting environment is responsible for input sanitization and security controls.
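
As an illustration of what such input sanitization could look like on the hosting side: DOCX and XLSX files are ZIP containers, so an upload can be screened for zip-bomb characteristics before it ever reaches the comparison code. The sketch below uses only Python's standard `zipfile` module. The function name and all three thresholds are assumptions chosen for the example, not GroupDocs recommendations, and the sizes it reads are the ones declared in the archive's directory, which a crafted file can misstate, so this is a first filter rather than a guarantee.

```python
import zipfile

# Assumed thresholds for illustration only; tune them to your own documents.
MAX_ENTRIES = 10_000
MAX_TOTAL_UNCOMPRESSED = 2 * 1024**3  # 2 GiB
MAX_COMPRESSION_RATIO = 200


def check_ooxml_container(source, max_entries=MAX_ENTRIES,
                          max_total=MAX_TOTAL_UNCOMPRESSED,
                          max_ratio=MAX_COMPRESSION_RATIO):
    """Raise ValueError if a DOCX/XLSX upload looks like a zip bomb.

    `source` is a file path or a seekable binary file object.
    Only the ZIP directory is read; no entry is decompressed.
    """
    try:
        archive = zipfile.ZipFile(source)
    except zipfile.BadZipFile as exc:
        raise ValueError("not a valid ZIP container") from exc

    with archive:
        entries = archive.infolist()
        if len(entries) > max_entries:
            raise ValueError(f"too many entries: {len(entries)}")

        total = sum(entry.file_size for entry in entries)
        if total > max_total:
            raise ValueError(f"declared uncompressed size too large: {total}")

        compressed = sum(entry.compress_size for entry in entries)
        if compressed > 0 and total / compressed > max_ratio:
            raise ValueError("suspicious compression ratio")
```

The check would run on the uploaded file first, and the file would be passed on for comparison only if no exception is raised. PDF is not a ZIP container and would need a different pre-check.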

The SDK does not work with archive formats directly.
Documents can be provided as file paths or streams, and the SDK loads them into memory to parse the full content before comparison.
The SDK does not have configurable resource limits (memory usage, recursion depth, object size).
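
Since the limits cannot be set inside the library, on Linux they can be imposed from outside by running each comparison in its own child process. The sketch below is one way to do that with Python's standard `resource` and `subprocess` modules. `run_limited` is a hypothetical helper, the command passed to it would be your own worker executable, and the limit values are up to you. It is an assumption that address-space limits suit your runtime: managed runtimes may reserve large amounts of virtual memory, in which case container or cgroup memory limits are likely the better tool.

```python
import resource
import subprocess
import sys


def run_limited(argv, timeout_s, max_memory_bytes, max_cpu_s):
    """Run a command in a child process with memory, CPU and wall-clock limits.

    Raises subprocess.TimeoutExpired if the wall-clock timeout is hit;
    subprocess.run kills the child before raising.
    """

    def apply_limits():
        # Runs in the child just before exec (POSIX only).
        # Caps the virtual address space and the CPU seconds of the child.
        resource.setrlimit(resource.RLIMIT_AS,
                           (max_memory_bytes, max_memory_bytes))
        resource.setrlimit(resource.RLIMIT_CPU, (max_cpu_s, max_cpu_s))

    return subprocess.run(
        argv,
        preexec_fn=apply_limits,
        timeout=timeout_s,
        capture_output=True,
        check=False,
    )


if __name__ == "__main__":
    # Stand-in for a real comparison worker command.
    done = run_limited([sys.executable, "-c", "print('compared')"],
                       timeout_s=10, max_memory_bytes=1024**3, max_cpu_s=5)
    print(done.returncode, done.stdout)
```

This also works around the missing cancellation support, because a job that runs too long is stopped by killing its process. Python documents `preexec_fn` as unsafe when the parent process uses threads, so a multi-threaded service would be better served by a small launcher process or container-level limits.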

Field Updates / TOC Recalculation

GroupDocs.Comparison recognizes Word fields and can compare them (field formula). However, we currently have an open issue related to a specific field-code comparison case:

Regarding recalculating field values during conversion to PDF — this scenario should generally work as expected. I would suggest testing it with your documents using the trial license. If possible, you can also share sample files containing fields, and we will verify the behavior on our side.

Self-Hosted Deployment Setup

GroupDocs.Comparison SDK is available for .NET, Java, Python, and Node.js. The base implementation is developed in .NET, which means releases there are more frequent, and new features or fixes usually appear in the .NET version first.

All SDKs are cross-platform and support Linux environments.
Some OS-level details and requirements can be found here:


Please let us know if you have any additional questions or need further clarification. We will also share additional information about Cloud.