Adding new documents gets slower and slower

Hello,
I have a folder that is updated every second, and I want to index certain new files at time intervals.
When the index is empty, adding the first 1K files takes about 500 ms, but at, for example, 20K indexed documents, adding 1K files takes about 12 seconds.
I am using GroupDocs.Search 21.8.1.0.

My index settings are:
IndexSettings settings = new IndexSettings();
DocumentFilter fileExtensionFilter = DocumentFilter.CreateFileExtension(".geo", ".txt", ".fans", ".info", ".eml", ".isms");
settings.DocumentFilter = fileExtensionFilter;
settings.AutoDetectEncoding = false;
settings.CustomExtractors.Add(new GeoExtractor());
settings.CustomExtractors.Add(new ismsExtractor());
settings.CustomExtractors.Add(new TextExtractor());
settings.CustomExtractors.Add(new FansExtractor());
settings.CustomExtractors.Add(new InfoExtractor());
settings.CustomExtractors.Add(new EmlExtractor());
settings.IndexType = IndexType.NormalIndex;
settings.UseStopWords = false;
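For context, the index itself is then created from these settings roughly like this (a sketch; `indexPath` is a placeholder for my actual index folder, and the `Index` constructor overload follows the GroupDocs.Search .NET API):

```csharp
using GroupDocs.Search;

// Create (or load) an on-disk index using the settings above.
// indexPath is a placeholder for the actual index folder.
Index index = new Index(indexPath, settings);
```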

Update 01:
I emptied the alphabet; indexing is still slow.
Update 02:
I created 100K txt files with a file size of 0 and changed IndexType to Compact; indexing still slows down after 10K records.
Are there any settings to improve performance?

We are investigating this issue. Your investigation ticket ID is SEARCHNET-2688.

@ariasohrab

We have tested the indexing process and, based on the results, built a graph of the indexing speed for 30K+ text documents in various languages - Result.zip (36.0 KB).

As can be seen from the graph, the speed does not change much from the beginning to the end of the process.
We can also provide the following recommendations:

  1. There is no need to delete the alphabet; doing so will result in nothing being indexed.
  2. To increase the efficiency of the index, you can optimize it from time to time with the Index.Optimize() method. This is a rather lengthy process; its duration depends on the number of documents in the index.
  3. Windows caches files after the first read, so subsequent reads are faster. If you disable file caching in the properties of the disk, the read speed will always be about the same.
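Recommendation 2 can be sketched as follows. This is a minimal illustration, not a prescription: `indexFolder`, the batch counter, and the optimization interval are hypothetical names chosen for the example, and the `Add`/`Optimize` calls follow the GroupDocs.Search .NET API as used elsewhere in this thread.

```csharp
using GroupDocs.Search;
using GroupDocs.Search.Options;

class PeriodicOptimizer
{
    private readonly Index index;
    private int batchesSinceOptimize;
    private const int OptimizeEveryNBatches = 10; // tune to your workload

    public PeriodicOptimizer(string indexFolder)
    {
        index = new Index(indexFolder);
    }

    public void AddBatch(string[] filePaths)
    {
        index.Add(filePaths, new IndexingOptions());

        // Optimize only occasionally: merging index segments is a lengthy
        // operation whose cost grows with the number of indexed documents.
        if (++batchesSinceOptimize >= OptimizeEveryNBatches)
        {
            index.Optimize();
            batchesSinceOptimize = 0;
        }
    }
}
```

Optimizing after every batch would add the cost of a full segment merge to each cycle, so spacing it out amortizes that cost.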

Thanks for the response.
I think I didn’t describe my conditions clearly. I am trying to add 10K+ files to the indexer in a loop: in each iteration I pick 1K files and add them to the indexer. The first couple of iterations are fast, but each iteration takes about 300 ms longer than the previous one for every additional 1K files added.
I will try disabling the disk cache as soon as possible to check the result and will share it with you.

@ariasohrab

Yes, please follow this and let us know if the issue persists.

I disabled disk caching, but nothing changed.
The disk is an SSD and the OS is Windows Server 2016.
My code is as follows (this part runs every minute and indexes 1K items):
if (unindexedlist.Count > 0)
{
    // Wait until the index is ready before adding the next batch.
    while (indexer.Indexer.IndexInfo.IndexStatus != IndexStatus.Ready)
    {
        Thread.Sleep(1);
    }

    Stopwatch indexstopwatch = Stopwatch.StartNew();
    indexer.Indexer.Add(unindexedlist.ToArray(), indexingOptions);

    // Wait for the add operation to complete before stopping the timer.
    while (indexer.Indexer.IndexInfo.IndexStatus != IndexStatus.Ready)
    {
        Thread.Sleep(1);
    }

    indexstopwatch.Stop();
    $@"INDEX STOP TIME = {indexstopwatch.ElapsedMilliseconds}".DebugLog();
    $@"Count of Indexed Items = {indexer.Indexer.GetIndexedDocuments().Length}".DebugLog();

    // Wait until the index is ready again before optimizing.
    while (indexer.Indexer.IndexInfo.IndexStatus != IndexStatus.Ready)
    {
        Thread.Sleep(1);
    }

    OptimizeIndexes(indexer);
}

@ariasohrab

We are further investigating this scenario. We’ll notify you in case of any progress.

I’ve prepared a test solution (the GroupDocs license and DLL are excluded).
I am using GroupDocs.Search 21.8.1.
Solution : Indexer_PerformanceTester_Solution.zip (5.2 KB)
Binary: Exe.zip (7.8 KB)
My Log after running it on Server 2016: Log.zip (585 Bytes)
Please note that you should uncomment CreateTempFolder(); on first use to create the temp folders used by the solution.


@ariasohrab

Thanks for the details. You’ll be notified as soon as there’s an update.


Hello again.
Did your team get the same results? Are there any updates?
Thanks

@ariasohrab

Yes, the issue is reproduced at our end. We are working on the fix.


Hello,
Thanks in advance for your support.
Is there any estimate of when the issue will be fixed and a new DLL will be available?

@ariasohrab

This ticket is still under investigation. We’ll notify you as soon as there’s an ETA.


@ariasohrab

The pre-release version is published on NuGet. The official version with .NET 6 support will be ready in September.
Please try this pre-release and let us know if you see any improvement.

Hello,
Although the jumps between delays are much smaller, the issue unfortunately persists on my end with 22.7.0-alpha-20220815080235: after 200 loops, the time to add 100 items exceeds 2 seconds. Also, my license key has been disabled with 22.7.0. Is there any way to upgrade my 21.8.1.0 license?
Thank you in advance.


@ariasohrab

All purchase and license renewal related queries are handled on the purchase forum.

It is impossible to get rid of this effect completely, since the index always contains data that is shared across all portions of indexed documents. As the index grows, the amount of shared data increases, so adding even one document to a large index will take quite a long time.
However, we are still working on a fix. You’ll be notified as there’s any further update.

The index contains only 100 items (because of the license limit), yet the time still increases after each loop.

@ariasohrab

We will further investigate this ticket.

@ariasohrab

The issue SEARCHNET-2688 has been fixed in API version 22.10.