Hello team,
I raised a ticket a few months back: "Documents are not getting highlighted".
I want to highlight specific text. For example, take the following phrase:
“Hello team my name is Ibrahim shaikh and I am from India”
I want to search for “Ibr”, “Ibrahim shaikh”, “Ind”, or “India”, and the specific searched text should get highlighted.
As mentioned in that earlier ticket, I implemented a workaround, but it is not satisfactory: it takes a lot of time, which is a problem for our clients.
Is there a feature yet that lets us search for a term instead of a complete word?
Also, the documents uploaded by the clients are large, e.g. about 400 to 500 pages.
Some of the documents are scanned PDFs, and I want the search functionality to work on them too.
@Niteen_Jadhav
Can you please specify which GroupDocs product you are using for highlighting text in documents?
GroupDocs.Search.
I already described this in detail in the linked post.
@Niteen_Jadhav
We are investigating the possibility of supporting this feature. Your investigation ticket ID is SEARCHNET-3370. As soon as we have a workaround or a permanent fix, you’ll be notified.
@Niteen_Jadhav
To search for a portion of a word, you can utilize wildcards. For more details, refer to the Wildcard Search Documentation. You can also see examples of highlighting in the GroupDocs Search GitHub Repository.
For indexing text within images, including PDF documents, GroupDocs.Search provides OCR support. You can find more information in the OCR Support Documentation.
If you still encounter any issues, please share your complete use case along with an example document, code, search query, and the expected output.
Thank you for the update.
I would like to know a few things here. How can I use wildcards? For example, I just want to highlight "card" in "cardiff", and if I use the following code
SearchResult result2 = index.Search("card?(1~6)");
will it highlight "cardiff" or just "card", and what does the 1~6 part stand for?
@Niteen_Jadhav
Using the above approach, it highlights the complete word, not just ‘card’. Please let us investigate this further. You’ll be notified of any update.
@Niteen_Jadhav
According to the documentation, there are two types of wildcard characters used in text form search queries:
? represents a single character.
?(n~m) denotes a range of characters, where n and m are numbers from 0 to 255, with n being less than or equal to m.
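To make the ?(n~m) definition concrete, here is a rough sketch that translates a wildcard query into a .NET regular expression. It only illustrates what such a pattern matches by the definition above; it is not how GroupDocs.Search evaluates queries internally, and the WildcardDemo class and ToRegex method are hypothetical names used for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Hypothetical helper: converts a query such as "card?(1~6)" into an
    // anchored, case-insensitive regular expression.
    //   ?       -> exactly one character
    //   ?(n~m)  -> between n and m characters
    public static Regex ToRegex(string query)
    {
        string body = Regex.Replace(query, @"\?\((\d+)~(\d+)\)|\?|[^?]+", m =>
        {
            if (m.Groups[1].Success)
                return ".{" + m.Groups[1].Value + "," + m.Groups[2].Value + "}";
            if (m.Value == "?")
                return ".";
            return Regex.Escape(m.Value); // literal text between wildcards
        });
        return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
    }
}
```

By this reading, card?(1~6) means "card" followed by one to six further characters, so it matches "cardiff" (three extra characters) but not "card" itself, while card?(0~6) would match both.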
Currently, there is no option to highlight only a specific part of a word. However, you can perform post-processing on the output HTML file to emphasize just the portion of the highlighted word that you need. Words are highlighted in the text using the following tags:
<span style='background-color: yellow;'>found-word</span>
These tags are configured through HighlightOptions, as in the following code:
FoundDocument document = result.GetFoundDocument(0); // Retrieve the first found document
FileOutputAdapter outputAdapter = new FileOutputAdapter(OutputFormat.Html, @"c:\DocumentText.html"); // Create the output adapter
Highlighter highlighter = new DocumentHighlighter(outputAdapter); // Instantiate the highlighter
HighlightOptions options = new HighlightOptions(); // Create highlight options
options.TermHighlightStartTag = "<span style='background-color: yellow;'>"; // Start tag for highlighting
options.TermHighlightEndTag = "</span>"; // End tag for highlighting
index.Highlight(document, highlighter, options); // Generate text with highlighted occurrences
This code highlights the specified terms in the document. We have created a new ticket (SEARCHNET-3374) to explore the possibility of adding this feature to the API.
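Until then, the post-processing mentioned above could look roughly like the following. This is a minimal sketch, not part of the GroupDocs.Search API: the PartialHighlight class and Narrow method are hypothetical names, and it assumes the output HTML contains the start and end tags exactly as configured in HighlightOptions, with no nested tags or HTML entities inside a highlighted word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PartialHighlight
{
    // Must match the tags set via TermHighlightStartTag / TermHighlightEndTag.
    private const string StartTag = "<span style='background-color: yellow;'>";
    private const string EndTag = "</span>";

    // Hypothetical helper: shrinks each highlighted word so that only the
    // first occurrence of `fragment` (case-insensitive) stays inside the tags.
    public static string Narrow(string html, string fragment)
    {
        if (string.IsNullOrEmpty(fragment)) return html;

        string pattern = Regex.Escape(StartTag) + "(.*?)" + Regex.Escape(EndTag);
        return Regex.Replace(html, pattern, match =>
        {
            string word = match.Groups[1].Value;
            int at = word.IndexOf(fragment, StringComparison.OrdinalIgnoreCase);
            if (at < 0) return match.Value; // fragment not in this word: keep as is

            return word.Substring(0, at)
                 + StartTag + word.Substring(at, fragment.Length) + EndTag
                 + word.Substring(at + fragment.Length);
        }, RegexOptions.Singleline);
    }
}
```

You would read the generated file (for example with File.ReadAllText), pass its contents and the searched fragment such as "card" to Narrow, and write the result back. A highlighted "Cardiff" then becomes a highlighted "Card" followed by plain "iff".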