Multiple unexpected changes between 25.11.0 and 26.6.0

Hello,

Our unit tests revealed at least two unexpected differences between the current and previous GroupDocs versions:
repro.zip (3.3 MB)

Here is a repro project. Running it against the current and the previous GroupDocs version shows the difference in the output:
repro-csharp-project.zip (8.7 KB)

  1. The attached DOCX file can no longer be indexed. The event handler reports it as corrupted, but it worked with the previous version.

  2. The attached CHM file is still indexable, but the search we previously ran against it (see the attached repro C# project) no longer returns a result.

We have also seen several larger HTML files whose text can no longer be extracted (one is attached). They fail with the exception below.
large.zip (317.0 KB)

GroupDocs.Parser.Exceptions.GroupDocsParserException: The document appears to be corrupted and cannot be loaded.
   at \u0008\u001B\u0003.\u0008()
   at \u0008\u001B\u0003.\u000E\u0010\u001A\u0003\u0016\u0002(Nullable`1 \u0002, \u000F\u0017\u0002 \u0008)
   at GroupDocs.Parser.Parser.GetText(TextOptions options)
   at \u000F\u000F\u0002.\u0002(Parser \u0002, Boolean \u0008, FileFormat \u0005)
   at \u000E\u000F\u0002.\u0003\u0003\u001B\u0003\u0016\u0002(String \u0002, Boolean \u0008)
   at \u0002\u001A\u001B.\u0002(Document \u0002, \u0006\u0003\u001B \u0008, String \u0005)

Hi, @jamsharp !

Thank you for the feedback! We have reproduced the issue and started working on a fix.
It’s good to have such files, because none of the files in our regression tests raised similar exceptions or errors. I will update you tomorrow with the results.


Hi @jamsharp !

Please note that the hotfix has been published on NuGet.

Thanks. I can confirm that the 2 files I tried work now.

Please feel free to close this topic.

It’s good to have such files because none of our test ones in regression raised similar exceptions and errors. I will keep you updated tomorrow with the results.

BTW: Do you have a larger set of test data, e.g. Govdocs1 (Digital Corpora)? We are currently building a larger folder of test files so that we can detect when any of them stop being indexable.
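
A check like this can be sketched independently of the indexing library. The following is a minimal sketch in Python, not part of the repro project: `try_index` is a hypothetical callback standing in for whatever call indexes a single file (it is not a GroupDocs API), and `scan_corpus` and `find_regressions` are illustrative names.

```python
from pathlib import Path
from typing import Callable, Dict, List


def scan_corpus(root: Path, try_index: Callable[[Path], bool]) -> Dict[str, bool]:
    """Run try_index on every file under root and record whether it succeeded.

    try_index is a hypothetical stand-in for the real indexing call; an
    exception raised by it is recorded as a failure for that file.
    """
    results: Dict[str, bool] = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        try:
            ok = bool(try_index(path))
        except Exception:
            ok = False
        # Store paths relative to the corpus root so runs are comparable.
        results[path.relative_to(root).as_posix()] = ok
    return results


def find_regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return files that were indexable in the baseline run but are not now."""
    return sorted(
        name
        for name, was_ok in baseline.items()
        if was_ok and name in current and not current[name]
    )
```

Running `scan_corpus` once per library version and passing both results to `find_regressions` lists exactly the files that stopped being indexable after the upgrade.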

Hi @jamsharp !

Running such stress tests on varied content and building a large-scale indexing scenario across different kinds of documents would be very helpful for us.

I assume some of the latest forum topics you created were based on similar documents, as mentioned in the post you shared.

We have added this topic to our backlog for investigation and implementation analysis.

Thank you!
