Hello,
Through our unit tests, I identified at least two unexpected differences between the current and previous GroupDocs versions.
repro.zip (3.3 MB)
This is a repro project. When you run it with the current and the previous GroupDocs version, you can see the difference in the output:
repro-csharp-project.zip (8.7 KB)
- The attached DOCX file is no longer indexable. According to the event handler, it is corrupted, but it worked with the previous version.
- The attached CHM file is still indexable, but the search we executed previously (see the attached repro C# project) no longer returns a result.
We have also seen multiple “larger” HTML files that can no longer be extracted (I attached one).
large.zip (317.0 KB)
GroupDocs.Parser.Exceptions.GroupDocsParserException: The document appears to be corrupted and cannot be loaded.
   at \u0008\u001B\u0003.\u0008()
   at \u0008\u001B\u0003.\u000E\u0010\u001A\u0003\u0016\u0002(Nullable`1 \u0002, \u000F\u0017\u0002 \u0008)
   at GroupDocs.Parser.Parser.GetText(TextOptions options)
   at \u000F\u000F\u0002.\u0002(Parser \u0002, Boolean \u0008, FileFormat \u0005)
   at \u000E\u000F\u0002.\u0003\u0003\u001B\u0003\u0016\u0002(String \u0002, Boolean \u0008)
   at \u0002\u001A\u001B.\u0002(Document \u0002, \u0006\u0003\u001B \u0008, String \u0005)