Document search API index browsing in .NET

Dear Colleagues

We have the following questions:

  1. Is GroupDocs.Search developed entirely by Aspose, or is there another search engine behind it (for example Lucene)?
  2. What are the advantages of GroupDocs.Search over the open source project Lucene?
  3. Can GroupDocs.Search completely replace Lucene? We are thinking of replacing Lucene with GroupDocs.Search in our application.

We have developed a knowledge management software. We need to index our content in the wiki as html5, docx, xlsx, txt, … and also custom metafields from our application. We are now indexing these files with aspose and for searching the content we are using lucene.net 3.3 at the moment. Is it possible to replace lucene by GroupDocs Search?

Thanks a lot for your help.

Kind regards
Incite GmbH
Marc Huber

@marchuber

Aspose is our sister company and yes, GroupDocs.Search uses some Aspose components at the back end. Unlike Lucene, however, it is a commercial API.

Please have a look at some of the important API features:

If you can share the main Lucene features that your application uses, we can guide you accordingly. We also recommend taking a look at this developer guide.

  1. Which version of lucene are you using, 8.5 or 3.0.3?
  2. In the .NET version of GroupDocs.Search, are you using Lucene.NET or the Java version?
  3. Is the performance of your API similar to that of Lucene?

Thanks.

@marchuber

GroupDocs.Search doesn’t use lucene.

High indexing and search performance can be achieved by unique algorithms and data structures, optimizations and multi-threaded execution.

Dear Atir

Our web application incite is a knowledge management solution. As database we use the Microsoft SQL Server. There are the following 3 types of objects:

  1. incite object with a file attached like Microsoft Word
  2. incite object with a wiki text deposited (html5)
  3. incite object with customer-specific attributes such as tags, description, category, …

The Word file in point 1 and the wiki text in point 2 are stored in SQL Server as a filestream; the attributes in point 3 are stored as database fields in SQL Server.

Question A:
In your examples we have seen that GroupDocs.Search always assumes that the files are located in a folder. Could GroupDocs.Search also build the index by streaming the files from SQL Server, without a file path? We do not want to keep all files in a folder in addition to SQL Server. Could you perhaps send us a .NET example?

Question B:
Is there a tool for viewing a GroupDocs.Search index, so that we can quickly check the values in it visually? Your competitor Lucene has a tool called Luke.exe with which you can view an index.

Thanks a lot.

Kind regards.
Incite GmbH
Marc Huber

@marchuber

Currently the API doesn’t support SQL Server management. This use case is already under investigation. Your investigation ticket ID is SEARCHNET-2111.

We are investigating this scenario under ticket ID SEARCHNET-2339. You’ll be notified as soon as there is an update.

We are not talking about storing the index in SQL Server. Would it be possible to save the files from SQL Server to a folder (c:\temp\TemporaryFileExportFromSql), index them with GroupDocs.Search from this folder, and delete the files afterwards? That way, no SQL Server management is necessary on the GroupDocs side.

Can we then use the GroupDocs index and search without the locally stored files in the folder c:\temp\TemporaryFileExportFromSql?

Where can we find the information for SEARCHNET-2111 and SEARCHNET-2339? Could you please send us a login or the text of these two investigation tickets?

Thanks.

@marchuber

If you download the files to a folder and index that folder using the GroupDocs.Search API, you can then delete the storage folder. Searching will keep working as long as you do not update the index.
So you only have to index the storage folder once. After that, you can keep searching the documents without updating the index or adding the storage folder to it again.

Example:

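using GroupDocs.Search;
using GroupDocs.Search.Results;

string searchQuery = "test";
string indexFolder = @"D:\SearchTester\IndexStore";
string documentsFolder = @"D:\SearchTester\Storage";
// Create (or open) the index in indexFolder and index the storage folder once.
Index index = new Index(indexFolder);
index.Add(documentsFolder);
// Search the index; the storage folder is not needed for this step.
SearchResult searchResult = index.Search(searchQuery);
TraceResult(searchQuery, searchResult); // TraceResult is a helper that prints the result (not shown here)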

This code simply creates an index for the storage folder. After you delete the folder, you only need to run searches; there is no need to call Add or Update on the index. You can implement your own business logic around this.
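
The export, index, delete workflow discussed above could be sketched roughly as follows. This is only a minimal sketch, not an official sample: it assumes the GroupDocs.Search for .NET package is referenced, the SQL Server export step is left as a placeholder (a single text file stands in for the exported documents), and the folder paths are taken from this thread purely for illustration.

```csharp
using System;
using System.IO;
using GroupDocs.Search;
using GroupDocs.Search.Results;

class TempFolderIndexing
{
    static void Main()
    {
        // Illustrative paths; adjust them to your environment.
        string indexFolder = @"D:\SearchTester\IndexStore";
        string exportFolder = @"C:\temp\TemporaryFileExportFromSql";

        Directory.CreateDirectory(exportFolder);

        // Placeholder for the real export: your own code would write each
        // SQL Server filestream to a file in exportFolder at this point.
        File.WriteAllText(Path.Combine(exportFolder, "sample.txt"), "test");

        // Index the exported files once.
        Index index = new Index(indexFolder);
        index.Add(exportFolder);

        // The exported copies are no longer needed for searching.
        Directory.Delete(exportFolder, recursive: true);

        // Search the index only. Per the answer above, do not update the
        // index after the source files have been deleted.
        SearchResult result = index.Search("test");
        Console.WriteLine("Documents found: " + result.DocumentCount);
    }
}
```

The design point is that the export folder is purely temporary: after indexing, only the index folder has to stay on disk.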

Would this scenario be possible:

  1. Our windows service does the code as you have written and indexes for example a file with the content “test”
  2. Employee A makes a search with the word “test”
  3. Employee B updates an incite entry and adds the words “hello world”
  4. Our windows service makes an update on groupdocs search index with that code you have written
  5. Employee A makes another search with the term “hello world” at the same time as the update in point 4 is running. Of course he can’t find the entry from employee B at this time. But is it possible to update the index while a search is running?
  6. After updating is finished the employee A makes a search and he will find the search term “hello world”

Thanks.

@marchuber

Yes, this is possible. You can update an index and search it at the same time.
How does it work?
Simultaneous indexing (updating) and searching are possible only when both are done through the same instance of the Index class. They are not possible when you create multiple instances of the Index class for the same index on the hard drive.
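
To illustrate the single-instance rule, here is a rough sketch in which one Index object is shared by an indexing task and a search task. It assumes the GroupDocs.Search for .NET package is referenced; the class name, the paths and the use of Task.Run are illustrative choices rather than part of the API guidance above, and in a Windows service the shared instance would typically live for the lifetime of the service.

```csharp
using System;
using System.Threading.Tasks;
using GroupDocs.Search;
using GroupDocs.Search.Results;

class SharedIndexExample
{
    // A single Index instance for the whole process. Indexing and searching
    // both go through this object instead of opening the index folder twice.
    static readonly Index SharedIndex = new Index(@"D:\SearchTester\IndexStore");

    static void Main()
    {
        // Indexing task: adds the documents from an illustrative storage folder.
        Task indexing = Task.Run(() => SharedIndex.Add(@"D:\SearchTester\Storage"));

        // Search task: runs while indexing may still be in progress, so it
        // may not yet see the documents that are being added.
        Task searching = Task.Run(() =>
        {
            SearchResult result = SharedIndex.Search("test");
            Console.WriteLine("Documents found: " + result.DocumentCount);
        });

        Task.WaitAll(indexing, searching);
    }
}
```

A search issued after the indexing task has completed should then also find the newly added content, which matches step 6 of the scenario.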

@marchuber

For your ease, we have implemented an Index Browser application to quickly view a GroupDocs.Search index. Please download/clone it here.