Large PDF files comparison takes a long time in .NET

I am trying to compare two large PDF documents that are approximately 35 pages and 600KB. I tried performing my comparison using the .NET package as well as the web application that is provided free on the web, and both comparisons took 5+ minutes. Is there any way to speed the process up?

Thanks

@echeyne

The performance of document comparison depends on multiple factors. For example, comparing two PDF files (e.g. 500KB+ each) that contain only text may take less time and fewer resources than comparing PDF files of the same size that contain more tables, images, clip art, formatting styles, charts, or graphs.
However, if you share the following details, we can look into this scenario further:

  • API version that you are using (e.g. 19.10, 20.5)
  • Sample code
  • Source and target PDF files

@atirtahir3 thanks for getting back to me.

I think it’s to do with the size of our files. We are okay with the comparison taking a long time, but now we are encountering issues with the quality of comparison.

We are using API version 20.4.2.0 and have the SensitivityOfComparison set to 100.

Our input documents are:
dummy doc 1.pdf (92.2 KB)
dummy doc 2.pdf (93.0 KB)

These produce the following comparison result:
dummy doc 1 v1.0 - compared to - dummy doc 1 v1.1.pdf (171.5 KB)

Sample Code:
image.png (46.0 KB)
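
Since the sample code is attached only as an image, here is a minimal sketch of how a comparison with this sensitivity setting is typically configured in GroupDocs.Comparison for .NET. It is an assumption about the setup rather than a transcription of the attached code, and the file paths are placeholders.

```csharp
using GroupDocs.Comparison;
using GroupDocs.Comparison.Options;

class PdfComparisonSketch
{
    static void Main()
    {
        // Placeholder paths (hypothetical); substitute the real source and target files.
        string sourcePath = "source.pdf";
        string targetPath = "target.pdf";
        string resultPath = "result.pdf";

        // SensitivityOfComparison = 100 is the value mentioned in this post.
        CompareOptions options = new CompareOptions
        {
            SensitivityOfComparison = 100
        };

        // Load the source document, add the target, and write the comparison result.
        using (Comparer comparer = new Comparer(sourcePath))
        {
            comparer.Add(targetPath);
            comparer.Compare(resultPath, options);
        }
    }
}
```

Running the same files with a different SensitivityOfComparison value may help show whether the table problems are tied to that setting or occur regardless of it.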

We really like the GroupDocs comparison API, but most of the files our tool will need to compare include tables, so we are hoping there is a way to improve the comparison result.

Thanks

@echeyne

Your source and target files have tabular content, and that content is missing or degraded in the output document (the tables are missing; the content is there, but it is either trimmed or scattered). We have logged this issue and are investigating it. Your investigation ticket ID is COMPARISONNET-2370. You'll be notified as soon as there is an update.