Searching thousands of Word documents

I’m new to your products and curious about GroupDocs.Search. Can I use it to search the text of more than 150,000 Word documents stored in a folder on the web server? The site runs on ASP.NET on .NET 8.

If this is possible, how long will the initial creation of the index take?
When new Word documents are added to the folder, can they be added to the index, or does the index for the entire folder need to be rebuilt?

Do you have sample code for searching through Word documents using C#?

@pmarangoni

Yes, GroupDocs.Search for .NET can search text in thousands of Word documents.
Please take a look at the system requirements.

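The time for initial indexing depends on factors such as document size, server performance, and indexing settings. For a dataset of this size (150,000+ documents), the initial run can take a significant amount of time, so it is worth measuring on a representative sample first. Indexing with multiple threads can speed up the process.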
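
As a rough sketch of the multi-threading point, and of one way to estimate the indexing time yourself, the snippet below indexes a sample folder with several threads and times the run. It assumes the Threads property on IndexingOptions as described in the GroupDocs.Search documentation; the folder paths and the thread count are placeholders, so please verify against the version you install.

```csharp
using System;
using System.Diagnostics;
using GroupDocs.Search.Options;
// Alias avoids a name clash with System.Index on modern .NET
using Index = GroupDocs.Search.Index;

string indexFolder = @"D:/index-sample";   // Hypothetical folder for a trial index
string sampleFolder = @"D:/test-sample";   // Hypothetical folder holding a sample of the documents

// Create the trial index
Index index = new Index(indexFolder);

// Use several indexing threads (assumed option; the count here is only a starting point)
var options = new IndexingOptions();
options.Threads = Math.Max(1, Environment.ProcessorCount / 2);

// Time the indexing of the sample
Stopwatch stopwatch = Stopwatch.StartNew();
index.Add(sampleFolder, options);
stopwatch.Stop();

Console.WriteLine("Indexed sample in " + stopwatch.Elapsed);
```

Scaling the sample time up to the full set gives only a rough estimate, since document sizes and disk speed vary.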

You don’t need to re-index the entire folder when adding new documents. The index.Add(documentsFolder) method allows you to incrementally update the index with new or modified files.
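
A minimal sketch of such an incremental update might look like the following. It assumes that opening an Index on an existing index folder loads that index, and that Index.Update with UpdateOptions is available as described in the GroupDocs.Search documentation; the file name is hypothetical.

```csharp
using System;
using GroupDocs.Search.Options;
// Alias avoids a name clash with System.Index on modern .NET
using Index = GroupDocs.Search.Index;

string indexFolder = @"D:/index";                   // Folder of the existing index
string newDocument = @"D:/test/new-report.docx";    // Hypothetical newly uploaded file

// Assumed to open the existing index rather than rebuild it
Index index = new Index(indexFolder);

// Option 1: add just the new file to the index
index.Add(newDocument);

// Option 2: ask the index to pick up changes in the folders it already tracks
var updateOptions = new UpdateOptions { Threads = 2 };
index.Update(updateOptions);

Console.WriteLine("Index updated.");
```

In practice you would likely use one of the two options, not both, depending on whether your upload code knows the path of each new file.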

Please explore our GitHub example project. Below is a simple search example:

using System;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;
// Alias avoids a name clash with System.Index on modern .NET
using Index = GroupDocs.Search.Index;

string indexFolder = @"D:/index";  // Folder where the index will be created or stored
string documentsFolder = @"D:/test";  // Folder containing the Word documents to index

// Create the index in the specified folder
Index index = new Index(indexFolder);

// Indexing options, with UseRawTextExtraction set to false
var options = new IndexingOptions() { UseRawTextExtraction = false };

// Add the documents from the folder to the index using these options
index.Add(documentsFolder, options);

// Simple search for documents containing the word "document"
string query = "document";
SearchResult result = index.Search(query);

// Output the query and the result details
Console.WriteLine("Query: " + query);
Console.WriteLine("Documents: " + result.DocumentCount);  // Number of documents containing the query word
Console.WriteLine("Occurrences: " + result.OccurrenceCount);  // Total occurrences of the query word across all documents

Thanks! By the way, I wanted to view your live demos for Search, and went to https://demos.groupdocs.com/ but got this error:

# 504 ERROR

## The request could not be satisfied.


CloudFront attempted to establish a connection with the origin, but either the attempt failed or the origin closed the connection. We can’t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.


Generated by cloudfront (CloudFront) Request ID: 3tORp0sjd9scajxfE9HIbvGikkqVok4v16Phz1HJqiBkjCnh873d9w==

This doesn’t seem to work either. It just displays “Preparing the document” but never displays the highlighted results.

@pmarangoni

This is not our live demo app.

We successfully reproduced this issue on our end. We have moved this query to the GroupDocs app forum, and we’ve started investigating the issue.

However, please check out our highlight in HTML example to evaluate your use case.


I accessed that link from here. It clearly says “Live Demos”… 🙃

@pmarangoni

We are already fixing this issue. In the meantime, please access the live applications here.