Free Support Forum - groupdocs.com

Best product to search text in documents


#1

Hello,
We need to search for text inside a document and get a list of all hits with some context information. Ideally we would like to get the page number, some position information on the page, and some surrounding text. It seems that, within the GroupDocs.Total family, GroupDocs.Parser comes closest to offering such functionality. Am I correct? Is this the best product for our use case?
Note: we only want to search in memory and do some basic text search, so building an index as in GroupDocs.Search is not required. Hence I think GroupDocs.Search would not be a fit here.

Thanks!


#2

@wolfgang.gogg,

Using GroupDocs.Search for .NET, you can search text and get a list of hits with some context information. However, creating an index is mandatory. There are two types of index:

  • Index created in memory: it cannot be saved after your program exits.
  • Index created on disk: it can be loaded later to continue working with it.

For further details visit this article.

GroupDocs.Parser also supports searching for keywords in the document’s text. However, a search result only provides the following:

  • The position of the keyword in the document text.
  • The found text.
  • The left highlight.
  • The right highlight.

For details, please visit this documentation article.


#3

Hello,

Feature-wise, GroupDocs.Parser would be perfect. The only feature gap for us is getting the correct page information, not just the position (what is this exactly? A word index? A character index?).
Is there any chance to get this from Parser, or to use Parser in a paged manner, i.e. parse one page after the other?
Thanks!


#4

@wolfgang.gogg,

We have logged it in our issue tracking system (ID: PARSERNET-1292) to check whether it is feasible to include the page information in the search results. As for the position, it is the index of the first character of the found term in the document text.

Yes, you can also parse a document and extract text page by page. Please have a look at this documentation article for more details. You may also have a look at this blog article, which shows how to count words and the occurrences of each word in a document. You can modify or extend the code sample given there to parse the document page by page.
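
To illustrate the page-by-page idea, here is a minimal, library-agnostic sketch in Python. It does not use the GroupDocs.Parser API: `Hit` and `search_pages` are hypothetical names, and the sketch assumes the text of each page has already been extracted into a list of strings (one entry per page) by whatever parser you use. It searches each page separately, so every hit carries a page number, a character position within that page, and some text to the left and right of the match.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One search hit (hypothetical structure, not a GroupDocs type)."""
    page_number: int  # 1-based page number
    position: int     # 0-based character index within the page text
    text: str         # matched text exactly as it appears on the page
    left: str         # up to `context` characters before the match
    right: str        # up to `context` characters after the match


def search_pages(pages: List[str], keyword: str, context: int = 20) -> List[Hit]:
    """Case-insensitive, non-overlapping keyword search over per-page text.

    `pages` is assumed to hold the already extracted text of each page.
    """
    hits: List[Hit] = []
    if not keyword:
        return hits  # an empty keyword would match everywhere, so return nothing

    # Escape the keyword so it is treated as literal text, not as a regex.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)

    for page_number, page_text in enumerate(pages, start=1):
        for match in pattern.finditer(page_text):
            start, end = match.span()
            hits.append(Hit(
                page_number=page_number,
                position=start,
                text=match.group(0),
                left=page_text[max(0, start - context):start],
                right=page_text[end:end + context],
            ))
    return hits


if __name__ == "__main__":
    sample_pages = ["The quick brown fox.", "A fox and another Fox."]
    for hit in search_pages(sample_pages, "fox", context=6):
        print(hit)
```

Because each page is searched on its own, a keyword that spans a page break would not be found by this sketch; whether that matters depends on the documents involved.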


#5

Hello,

Thanks. We are waiting for feedback on your feasibility check for getting the page number in search results.


#6

@wolfgang.gogg,

Sure, we’ll let you know about the outcomes as soon as possible.