Hi,
I was wondering if there is any way to highlight certain words in a document when converting it to HTML with GroupDocs.Viewer.
I tried using GroupDocs.Search after the conversion with GroupDocs.Viewer, but the resulting HTML had a completely new format. We need to keep the formatting produced by Viewer.
I also wanted to give GroupDocs.Annotation a try to highlight some text, but it seems to require an absolute position, which I don’t have. Unfortunately, I only know which words I want to highlight.
What we are ultimately trying to do is highlight search results that were found by GroupDocs.Parser.
GroupDocs.Parser searches the output of GroupDocs.Viewer and gives us good results.
The only problem is that we don’t get any context about where a hit was found in the HTML.
There is a position in the search results, but it does not reflect a usable position in the original HTML, as it seems to be based on some intermediate markup from GroupDocs.Parser.
I hope you have an idea of how we could tackle this problem.
@Clemens_Pestuka
Could you please provide more details on how you are currently obtaining the search results from GroupDocs.Parser and the specific format of the HTML output from GroupDocs.Viewer?
This is how we’re using GroupDocs.Parser:
using (Parser parser = new Parser(documentPath))
{
    // Request up to 40 characters of surrounding text on each side of a hit.
    HighlightOptions options = new HighlightOptions(40);
    IEnumerable<SearchResult> sr = parser.Search("deforestation. China", new SearchOptions(false, false, false, true, options, options));
}
Here are a few output files from Viewer that we were using:
output.zip (107.7 KB)
@Clemens_Pestuka
Can you please describe in more detail how you are going to use the highlighted text? Since you wrote about context, it seems to me that you would like to know whether any entries were found and where they are located. Is that correct?
@vladimir.litvinchik
Yes, if we had the exact location of a hit, we could do the highlighting ourselves.
That might be the ideal scenario, as we would have more control over the highlight.
If that’s not possible, we’d also be fine if Viewer did the highlighting for us and we could compare against the original to see where the highlight was applied.
If there are any more questions or my description wasn’t clear, please let me know.
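To illustrate what "doing the highlighting ourselves" could look like once a location is known, here is a minimal sketch. It assumes the hit is given as a start/end index pair into the HTML string and that both indexes fall on text rather than inside a tag. `MarkWrapper` and `Wrap` are hypothetical names for this example and are not part of any GroupDocs API.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of any GroupDocs API.
public static class MarkWrapper
{
    // Wraps every text run inside html[start..end) in <mark>, leaving the tags
    // in that range untouched so the element nesting stays well-formed.
    public static string Wrap(string html, int start, int end)
    {
        var sb = new StringBuilder(html.Length + 32);
        sb.Append(html, 0, start);
        bool open = false;
        int i = start;
        while (i < end)
        {
            if (html[i] == '<')
            {
                // Close the current highlight before the tag begins.
                if (open) { sb.Append("</mark>"); open = false; }
                int close = html.IndexOf('>', i);
                if (close < 0 || close >= end)
                    throw new ArgumentException("The range ends inside a tag.");
                sb.Append(html, i, close - i + 1); // copy the tag unchanged
                i = close + 1;
            }
            else
            {
                // Open a highlight at the first character of a text run.
                if (!open) { sb.Append("<mark>"); open = true; }
                sb.Append(html[i]);
                i++;
            }
        }
        if (open) sb.Append("</mark>");
        sb.Append(html, end, html.Length - end);
        return sb.ToString();
    }
}
```

Closing and reopening `<mark>` at every tag boundary means a hit that starts inside a hyperlink and ends after it still produces valid markup. Whitespace between two block elements inside the range would also be wrapped, which a real implementation would probably want to skip.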
@Clemens_Pestuka
Thank you for the details. We’ll look into whether we can do it in Viewer. For which file types do you need the highlight in the first place?
@vladimir.litvinchik
Good question.
We could either highlight in the original file, which could be any format,
or highlight in the already converted file, which would always be HTML.
I have never actually tried running GroupDocs.Viewer output through GroupDocs.Viewer again.
@vladimir.litvinchik
Maybe it would make more sense to get a “correct” position from GroupDocs.Parser.
Searching the previously attached “output_viewer26.html” for “deforestation. China” gives me position 898:
image.png (23.3 KB)
If I copy and paste all characters up to “deforestation. China” into Notepad, I can see that this matches exactly:
image.png (27.2 KB)
But there are two problems with that:
- It does not always match. It can be completely off when the document has hyperlinks.
- The position is hard to determine programmatically, as the page source is far more complex than the visible text.
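One possible way around the second problem is to extract the visible text from the HTML ourselves and record, for every character, where it sits in the page source. The sketch below does not reproduce the positions reported by GroupDocs.Parser; it runs its own plain-text search instead. `HtmlTextMap`, `Build`, and `Locate` are hypothetical names, and the code assumes simple markup: it does not skip `<script>`/`<style>` content or collapse whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper for illustration; not part of any GroupDocs API.
public static class HtmlTextMap
{
    // Returns the visible text of the HTML plus, for every character of that
    // text, the [Start, End) range it occupies in the HTML source.
    public static (string Text, List<(int Start, int End)> Spans) Build(string html)
    {
        var text = new StringBuilder();
        var spans = new List<(int Start, int End)>();
        int i = 0;
        while (i < html.Length)
        {
            if (html[i] == '<')
            {
                int close = html.IndexOf('>', i);
                if (close < 0) break; // unterminated tag: stop scanning
                i = close + 1;        // tags contribute no visible text
                continue;
            }
            int next = i + 1;
            string piece = html[i].ToString();
            if (html[i] == '&')
            {
                // Decode a short entity such as &amp; into its visible form.
                int semi = html.IndexOf(';', i);
                if (semi > i && semi - i <= 10 && html.IndexOf('<', i, semi - i) < 0)
                {
                    piece = WebUtility.HtmlDecode(html.Substring(i, semi - i + 1));
                    next = semi + 1;
                }
            }
            foreach (char c in piece)
            {
                text.Append(c);
                spans.Add((i, next));
            }
            i = next;
        }
        return (text.ToString(), spans);
    }

    // Finds the first occurrence of phrase in the visible text and returns the
    // matching [Start, End) range in the HTML source, or null if not found.
    public static (int Start, int End)? Locate(string html, string phrase)
    {
        if (string.IsNullOrEmpty(phrase)) return null;
        var (text, spans) = Build(html);
        int hit = text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase);
        if (hit < 0) return null;
        return (spans[hit].Start, spans[hit + phrase.Length - 1].End);
    }
}
```

Because the map is built from the same HTML that gets displayed, hyperlinks and other inline tags between the words no longer shift the result. The returned range can still contain tags, so it cannot simply be wrapped in a single element.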
@Clemens_Pestuka
I’ve got a response from the developer. Unfortunately, he can’t provide a correct position, as it comes from a different context.
We are considering adding a feature to Viewer that highlights text in the source document and then converts it to HTML. The advantage of this approach is that the search is expected to behave like the native one, for example in MS Word. In HTML, by contrast, the text can be split across different blocks, which makes an entry hard to find.
To analyze this feature, we need the list of file formats that you are processing.
@vladimir.litvinchik
Thank you, that would be great!
We would need this feature for all MS Office formats, PDF, emails, and text-based files (txt, log).
@Clemens_Pestuka
Got it, thanks. We’re going to analyze this feature and schedule its implementation. We’ll update you when we have news.