Hello,
We are currently evaluating GroupDocs.Comparison and related document processing products for integration into our platform, and we would appreciate clarification on several technical points.
To clarify upfront:
We are evaluating both the Cloud API and the on-premises SDK. We have not yet decided which deployment model best fits our requirements, so answers covering both options are important to us.
In previous discussions, we were mainly pointed to documentation links. While helpful, the documentation does not address the concrete production-level questions we need answered in order to make a deployment decision.
Below are the specific areas where we need clarification:
1. Performance & File Size Limits
We need clear guidance on:
- Maximum supported file sizes (Cloud vs SDK)
- Practical limits for large DOCX/XLSX/PDF comparison workloads
- Execution time limits
- Whether Cloud timeouts are configurable
- Recommended architecture for processing very large regulatory-style documents
- Whether async/background job processing is supported to avoid request timeouts
In testing similar solutions, we observed timeouts at roughly 20 minutes for large document comparisons. We need to understand what realistic expectations are with GroupDocs.
2. SDK vs Cloud Differences
Since we are evaluating both deployment models, we need clarity on:
- Feature parity between Cloud API and SDK
- Performance differences
- Concurrency limitations
- Throughput considerations
- Any architectural constraints specific to either option
We are open to any self-hosted deployment approach, as long as it can run on Linux.
3. Security & Safety Considerations (SDK)
We will be processing user-uploaded documents, so security is critical.
Specifically, we would like to understand:
- Does the SDK include safeguards against malicious or malformed documents?
- How are embedded macros, external references, malformed XML, or zip bombs handled?
- Are there configurable limits for:
  - Memory usage
  - CPU usage
  - Maximum document structure depth
  - Embedded object size
- Is document processing fully in-memory or is streaming supported?
- Do you provide recommended sandboxing practices for Linux deployments?
If a maliciously crafted DOCX/XLSX/PDF is uploaded, what protections exist within the SDK itself?
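To illustrate the kind of safeguard we are asking about, here is a minimal sketch of a zip-bomb pre-check for OOXML containers (DOCX/XLSX are ZIP archives). It uses only the Python standard library and does not call any GroupDocs API. The function name and all thresholds are hypothetical values chosen for illustration, not GroupDocs defaults, and the check relies on sizes declared in the archive headers, so it is a first filter rather than a complete defence.

```python
import io
import zipfile

# Hypothetical thresholds, chosen for illustration only.
MAX_TOTAL_UNCOMPRESSED = 200 * 1024 * 1024  # 200 MiB across all entries
MAX_COMPRESSION_RATIO = 100                 # per-entry uncompressed/compressed
MAX_ENTRIES = 5000


def check_ooxml_container(data: bytes) -> list[str]:
    """Return reasons to reject the upload; an empty list means it passed."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a valid ZIP container"]

    problems = []
    with archive:
        infos = archive.infolist()
        if len(infos) > MAX_ENTRIES:
            problems.append(f"too many entries: {len(infos)}")

        # file_size is the size declared in the header; nothing is extracted here.
        total = sum(info.file_size for info in infos)
        if total > MAX_TOTAL_UNCOMPRESSED:
            problems.append(f"declared uncompressed size too large: {total}")

        for info in infos:
            if info.compress_size > 0:
                ratio = info.file_size / info.compress_size
                if ratio > MAX_COMPRESSION_RATIO:
                    problems.append(
                        f"suspicious compression ratio in {info.filename}: {ratio:.0f}"
                    )
    return problems
```

A check like this would run before the document reaches the comparison engine. We would like to know whether the SDK performs equivalent checks internally or expects the host application to do so.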
4. Field Updates / TOC Recalculation
We generate a TOC using a proper Word TOC field (XML/field-code based template).
We need confirmation that:
- The SDK and/or Cloud service can recalculate Word field codes
- TOC fields are properly populated during processing (e.g., during conversion to PDF)
- No manual opening in Microsoft Word is required to update the TOC
This is a strict requirement for us.
We are trying to make an informed architectural decision between Cloud and on-premises deployment. Concrete answers regarding limits, performance characteristics, and security safeguards are essential for that evaluation.
Thank you in advance for your detailed clarification.