I was wondering if there is any way to highlight certain words of a document when converting it to HTML with GroupDocs.Viewer.
I tried using GroupDocs.Search after the conversion with GroupDocs.Viewer, but the resulting HTML had a completely new format. We need to keep the formatting produced by the Viewer.
I also wanted to give GroupDocs.Annotation a try to highlight some text, but it seems to require an absolute position, which I don’t have. Unfortunately, I only know which words I want to highlight.
What we are ultimately trying to do is highlight search results that were found by GroupDocs.Parser.
GroupDocs.Parser is searching on the output of GroupDocs.Viewer and gives us good results.
The only problem is that we don’t get any context about where a hit was found in the HTML.
There is a position in the search results, but it does not reflect a usable position in the original HTML, as it seems to be based on some intermediate markup from GroupDocs.Parser.
I hope you have some idea of how we could tackle this problem.
Could you please provide more details on how you are currently obtaining the search results from GroupDocs.Parser and the specific format of the HTML output from GroupDocs.Viewer?
using (Parser parser = new Parser(documentPath))
{
    // The same HighlightOptions instance (length 40) is passed for both the left and right context of a hit.
    HighlightOptions options = new HighlightOptions(40);
    IEnumerable<SearchResult> sr = parser.Search("deforestation. China", new SearchOptions(false, false, false, true, options, options));
}
Here are a few output files from Viewer that we were using: output.zip (107.7 KB)
Can you please describe in more detail how you are going to use the highlighted text? As you wrote about context, it seems to me that you would like to know whether any entries were found and where they are located. Is that correct?
Yes, if we’d have the exact location of a hit, we could do the highlighting ourselves.
That might be the ideal scenario, as we would have more control over the highlight.
If that’s not possible, we’d also be fine if Viewer did the highlighting for us and we could compare against the original to see where the highlight was applied.
If there are any more questions or my description wasn’t clear, please let me know.
Good question.
We could either do the highlighting on the original file, which could be in any format, or on the already converted file, which would always be HTML.
I have never actually tried to run output from GroupDocs.Viewer through GroupDocs.Viewer again.
Maybe it would make more sense to get a “correct” position from GroupDocs.Parser.
Searching in the previously attached “output_viewer26.html” for “deforestation. China” gives me position 898: image.png (23.3 KB)
If I copy and paste all characters up to “deforestation. China” into Notepad, I can see that this matches exactly: image.png (27.2 KB)
But there are two problems with that:
1. It does not always match. It can be completely off when the document has hyperlinks.
2. The position is hard to determine programmatically, as the page source is far more complex.
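To illustrate the second problem, here is a minimal C# sketch of how a hit in the visible text could be mapped back to offsets in the page source. It assumes a naive tag skipper is good enough: it does not decode HTML entities, does not normalize whitespace, and treats the content of script and style elements as text. HtmlTextLocator and Locate are hypothetical names for this illustration and are not part of any GroupDocs API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of GroupDocs.
public static class HtmlTextLocator
{
    // Returns the [Start, End) offsets in the HTML source that cover the first
    // occurrence of `phrase` in the visible text, or null if there is no hit.
    public static (int Start, int End)? Locate(string html, string phrase)
    {
        if (string.IsNullOrEmpty(phrase)) return null;

        var text = new StringBuilder();
        var map = new List<int>();   // map[i] = HTML offset of visible character i
        bool inTag = false;

        for (int i = 0; i < html.Length; i++)
        {
            char c = html[i];
            if (inTag)
            {
                // Skip everything up to and including the closing '>'.
                if (c == '>') inTag = false;
                continue;
            }
            if (c == '<') { inTag = true; continue; }

            text.Append(c);
            map.Add(i);
        }

        int hit = text.ToString().IndexOf(phrase, StringComparison.Ordinal);
        if (hit < 0) return null;

        // End is exclusive: one past the HTML offset of the last matched character.
        return (map[hit], map[hit + phrase.Length - 1] + 1);
    }
}
```

For a source such as <p>Stop <a href="#">deforestation</a>. China</p>, the returned range covers deforestation</a>. China, so it crosses a tag boundary. Wrapping that range in a single highlight element would produce invalid nesting; a real implementation would have to split the highlight per text node, which is part of why doing this on the page source is hard.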
I’ve got a response from the developer. Unfortunately, he can’t provide a correct position, as it is a different context.
We are considering adding a feature to Viewer that will highlight text in the source document and then convert it to HTML. This approach has the advantage that search is expected to work like the native one, as in MS Word, whereas in HTML the text can in some cases be split across different blocks, which makes it hard to find the entry.
To perform the analysis for this feature we would need the list of file formats which you’re processing.
We have implemented the requested feature. It is described in a separate article: Search and highlight text in the loaded document. This feature is included in GroupDocs.Viewer version 25.4, which was released today.
Sorry for the late reply, but I had very limited time lately.
I was able to give the new feature a try, following this guide.
I tried a few search terms that we had been struggling to highlight, and they worked perfectly.
So as far as my testing goes, everything is looking great!
Configurable colors and regex are also really good options!