We are seeing the error below fairly often when we upload PDFs. (I’d love to upload examples, but the information in them is sensitive, and the only non-sensitive example is 5 MB, which is over your 4 MB upload size limit.)
Error during text extraction from [path_to_pdf]
GroupDocs.Parser.Exceptions.UnsupportedDocumentFormatException: Exception of type 'GroupDocs.Parser.Exceptions.UnsupportedDocumentFormatException' was thrown.
The issue is that from then on, whenever we upload a new document to the same folder, we get the same errors again because the folder is reindexed, even though the new document itself has been indexed successfully.
We don’t want to simply ignore errors coming from GroupDocs.Search, but we also don’t want to keep reporting issues that are unrelated to the current upload.
Can you advise us on the best course of action?
If you are interested in the code we are using for the indexing, I’ve included it below.
public string AddOrUpdateIndex(string indexFolderLocation, string filesToIndexFolderLocation)
{
    var settings = GetStandardIndexSettings();
    var indexIsNew = Directory.GetFiles(indexFolderLocation).Length < 1;
    var index = new GroupDocs.Search.Index(indexFolderLocation, settings);
    var errorMessage = string.Empty;
    index.Events.ErrorOccurred += (sender, args) =>
    {
        errorMessage = args.Message;
    };
    if (indexIsNew)
    {
        index.Add(filesToIndexFolderLocation);
    }
    else
    {
        UpdateOptions options = new UpdateOptions();
        options.Threads = 2;
        index.Update(options);
        index.Optimize();
    }
    if (string.IsNullOrEmpty(errorMessage))
    {
        return "Success";
    }
    return errorMessage;
}

private IndexSettings GetStandardIndexSettings()
{
    var settings = new IndexSettings();
    settings.UseRawTextExtraction = false;
    return settings;
}
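
For illustration, one possible way to separate errors for the current upload from stale ones is to collect every message and keep only those that mention the uploaded file. This is only a sketch: `UploadErrorFilter` and `ForCurrentUpload` are hypothetical names, the sample paths are made up, and it assumes every error message contains the path of the failing document, as the extraction error quoted above does. It does not call GroupDocs.Search at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UploadErrorFilter
{
    // Hypothetical helper: keeps only the messages that mention the file
    // that was just uploaded. Assumes each error message contains the
    // path of the document that failed.
    public static List<string> ForCurrentUpload(
        IEnumerable<string> errorMessages, string uploadedFilePath)
    {
        return errorMessages
            .Where(m => m != null &&
                        m.IndexOf(uploadedFilePath, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // In AddOrUpdateIndex, the ErrorOccurred handler would append
        // args.Message to a list like this one instead of overwriting
        // a single string. The paths here are made-up sample values.
        var errors = new List<string>
        {
            @"Error during text extraction from C:\docs\old-scan.pdf",
            @"Error during text extraction from C:\docs\new-upload.pdf",
        };

        var relevant = UploadErrorFilter.ForCurrentUpload(
            errors, @"C:\docs\new-upload.pdf");

        // Only the message about the new upload remains.
        Console.WriteLine(relevant.Count);
    }
}
```

With this approach the stale errors are still collected, so they could be logged separately rather than returned to the uploader.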