Highlight search results and save in HTML using .NET

bharathiGK · April 17, 2020, 1:13pm

Hi, after searching, I'm saving the document with the highlighted text as an .html file. Will documents that contain screenshots and images be converted as well?
After converting to HTML I need to show it in a browser, but the formatting (alignment) of the document changes.

atir.tahir · April 17, 2020, 7:06pm

@bharathiGK,

Can you please elaborate on your scenario?

bharathiGK · April 24, 2020, 5:51am

Hi, I have attached an image of the document that is kept in the document store.

image.png (398.4 KB)

My document contains a screenshot. After it is saved as HTML, will the output keep the same format, including the screenshot, or not?

image.png (338.2 KB)

It's a PDF attachment. After highlighting, it is saved as HTML, and I need to show that HTML in an iframe of the viewer. The output still comes out in the same format as in the screenshot; I would like it to keep my PDF's formatting.

atir.tahir · April 24, 2020, 8:30am

@bharathiGK,

Please share the sample PDF file, a console application, and the output HTML. Looking at the evaluation mark in your screenshots, we noticed that you are also using GroupDocs.Parser for .NET. Is that so?
Secondly, according to your screenshots you are searching over Word documents, but your query mentions that a PDF is being used, highlighted, and saved as HTML.

bharathiGK · April 24, 2020, 8:57am

Hi, I will attach the original documents.

Venkatesh.pdf (251.3 KB)
public static void UpdateIndexedDocuments()
{

        //path settings
        string indexFolder = "C:\\Store\\IndexStore\\";
        string documentsFolder = "C:\\DocumentStorage1\\";

        // Create the index folder if it does not exist yet
        if (!Directory.Exists(indexFolder))
        {
            Directory.CreateDirectory(indexFolder);
        }

        //index settings AutoDetectEncoding
        IndexSettings settings = new IndexSettings();
        settings.AutoDetectEncoding = true;

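        // Index instance. Note: the async options below are created but never
        // passed to index.Add, so they have no effect in this method.
        Index index = new Index(indexFolder, settings);
        IndexingOptions options = new IndexingOptions();
        options.IsAsync = true;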

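        // First run (no *.body files in the index folder yet): index the documents.
        // Otherwise: update and optimize the existing index.
        string[] files = System.IO.Directory.GetFiles(indexFolder, "*.body");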
        if (files.Length == 0)
        {
            index.Add(documentsFolder);
        }
        else
        {

            UpdateOptions updateoptions = new UpdateOptions();
            updateoptions.Threads = 2; // Setting the number of indexing threads
            index.Update(updateoptions);

            MergeOptions mergeoptions = new MergeOptions();
            mergeoptions.Cancellation = new Cancellation(); // Creating cancellation object to be able to cancel the operation
            mergeoptions.Cancellation.CancelAfter(1200000); // Setting maximum duration of the operation to 20 minutes (1,200,000 ms)

            index.Optimize(mergeoptions);

        }
        // Searching in index
        string query = "Contact";
        SearchResult result = index.Search(query);
        if (result.DocumentCount > 0)
        {
            FoundDocument document = result.GetFoundDocument(0); // Getting the first found document
            string path = @".\BasicUsage\Highlighted.html";
            OutputAdapter outputAdapter = new FileOutputAdapter(path); // Creating the output adapter to a file
            HtmlHighlighter highlighter = new HtmlHighlighter(outputAdapter); // Creating the highlighter object
            index.Highlight(document, highlighter); // Generating output HTML formatted document with highlighted search results

            Console.WriteLine();
            Console.WriteLine("Generated HTML file can be opened with Internet browser.");
            Console.WriteLine("The file can be found by the following path:");
            Console.WriteLine(Path.GetFullPath(path));
        }
        Utils.TraceResult(query, result);

    }

I was not able to attach the converted HTML here.

atir.tahir · April 24, 2020, 3:15pm

@bharathiGK,

Please compress the HTML file to ZIP format and then try uploading it again.

bharathiGK · April 24, 2020, 3:34pm

New folder.zip (1.4 KB)

atir.tahir · April 24, 2020, 6:05pm

@bharathiGK,

If I am not wrong, you want to search for a keyword/text in the source document and get an HTML output with the highlighted text, where the output HTML follows the formatting of the source file.
We are investigating this scenario. Your investigation ticket ID is SEARCHNET-2270. You'll be notified as soon as there is an update.

bharathiGK · July 6, 2020, 10:46am

Hi, any update on this? When can I expect a solution?

atir.tahir · July 6, 2020, 7:18pm

@bharathiGK

We've improved how text extraction preserves formatting in API version 20.6. Have a look at the code below:

string indexFolder = @"c:\MyIndex";
string documentFolder = @"c:\MyDocuments";
IndexSettings settings = new IndexSettings();
settings.UseRawTextExtraction = false;
// Creating an index
Index index = new Index(indexFolder, settings);
// Indexing documents in the document folder
index.Add(documentFolder);
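
To connect this with the highlighting code posted earlier in the thread, here is a minimal sketch that builds the index with UseRawTextExtraction turned off and then writes the highlighted HTML for the first found document. It only reuses calls that already appear in this thread (Index, IndexSettings, Search, GetFoundDocument, FileOutputAdapter, HtmlHighlighter, Highlight). The namespaces in the using lines, the output path, and the class name are assumptions for illustration, and constructor signatures may differ between API versions, so please check them against the version you have installed.

```csharp
using System;
using System.IO;
using GroupDocs.Search;              // assumed namespace for Index
using GroupDocs.Search.Common;       // assumed namespace for OutputAdapter, FileOutputAdapter
using GroupDocs.Search.Highlighters; // assumed namespace for HtmlHighlighter
using GroupDocs.Search.Options;      // assumed namespace for IndexSettings
using GroupDocs.Search.Results;      // assumed namespace for SearchResult, FoundDocument

class HighlightSketch // hypothetical class name
{
    static void Main()
    {
        string indexFolder = @"c:\MyIndex";
        string documentFolder = @"c:\MyDocuments";
        string outputPath = @"c:\MyOutput\Highlighted.html"; // hypothetical output location

        // Turn off raw text extraction so that extracted text keeps more of its formatting
        IndexSettings settings = new IndexSettings();
        settings.UseRawTextExtraction = false;

        // Create the index and add the documents
        Index index = new Index(indexFolder, settings);
        index.Add(documentFolder);

        // Search and stop early if nothing matched
        SearchResult result = index.Search("Contact");
        if (result.DocumentCount == 0)
        {
            Console.WriteLine("No documents matched the query.");
            return;
        }

        // Make sure the output folder exists before writing the file
        Directory.CreateDirectory(Path.GetDirectoryName(outputPath));

        // Write the first found document as HTML with the hits highlighted
        FoundDocument document = result.GetFoundDocument(0);
        OutputAdapter outputAdapter = new FileOutputAdapter(outputPath);
        HtmlHighlighter highlighter = new HtmlHighlighter(outputAdapter);
        index.Highlight(document, highlighter);

        Console.WriteLine(Path.GetFullPath(outputPath));
    }
}
```

Since the setting is passed in when the index is created, an index that was built earlier with different settings may need to be rebuilt in a fresh folder before the change shows up in the output.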