Hi, after searching, I am saving the document with highlighted text as an .html file. Will it convert documents that contain screenshots and images?
After converting to HTML I need to show it in a browser, but the formatting (alignment) of the document changes.
Hi, I have attached an image of the document that is used in the document store.
image.png (398.4 KB)
My document contains a screenshot. After it is saved as HTML, will the document keep the same format, including the screenshot, or not?
image.png (338.2 KB)
It's a PDF attachment. After highlighting, it is saved as HTML, and I need to show that HTML in an iframe of the viewer. The formatting problem still remains; I want the output to keep my PDF's formatting.
Please share the sample PDF file, a console application, and the output HTML. In your screenshots, looking at the evaluation mark, we noticed that you are also using GroupDocs.Parser for .NET. Is that so?
Secondly, as per your screenshots, you are performing a search over Word documents, but you mentioned in your query that a PDF is being used, highlighted, and saved as HTML.
Hi, I will attach the original documents.
Venkatesh.pdf (251.3 KB)
public static void UpdateIndexedDocuments()
{
    // Path settings
    string indexFolder = "C:\\Store\\IndexStore\\";
    string documentsFolder = "C:\\DocumentStorage1\\";
    // Create the index folder if it does not exist yet
    if (!Directory.Exists(indexFolder))
    {
        Directory.CreateDirectory(indexFolder);
    }
    // Index settings with automatic encoding detection
    IndexSettings settings = new IndexSettings();
    settings.AutoDetectEncoding = true;
    // Open or create the index
    Index index = new Index(indexFolder, settings);
    // Note: these options are never passed to index.Add below, so IsAsync has no effect here
    IndexingOptions options = new IndexingOptions();
    options.IsAsync = true;
    // Add documents on the first run; otherwise update and optimize the existing index
    string[] files = System.IO.Directory.GetFiles(indexFolder, "*.body");
    if (files.Length == 0)
    {
        index.Add(documentsFolder);
    }
    else
    {
        UpdateOptions updateoptions = new UpdateOptions();
        updateoptions.Threads = 2; // Setting the number of indexing threads
        index.Update(updateoptions);
        MergeOptions mergeoptions = new MergeOptions();
        mergeoptions.Cancellation = new Cancellation(); // Creating cancellation object to be able to cancel the operation
        mergeoptions.Cancellation.CancelAfter(1200000); // Setting maximum duration of the operation to 20 minutes (1,200,000 ms)
        index.Optimize(mergeoptions);
    }
    // Searching in index
    string query = "Contact";
    SearchResult result = index.Search(query);
    if (result.DocumentCount > 0)
    {
        FoundDocument document = result.GetFoundDocument(0); // Getting the first found document
        string path = @".\BasicUsage\Highlighted.html";
        OutputAdapter outputAdapter = new FileOutputAdapter(path); // Creating the output adapter to a file
        HtmlHighlighter highlighter = new HtmlHighlighter(outputAdapter); // Creating the highlighter object
        index.Highlight(document, highlighter); // Generating output HTML formatted document with highlighted search results
        Console.WriteLine();
        Console.WriteLine("Generated HTML file can be opened with Internet browser.");
        Console.WriteLine("The file can be found by the following path:");
        Console.WriteLine(Path.GetFullPath(path));
    }
    Utils.TraceResult(query, result);
}
I was not able to attach the converted HTML here.
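A side note on the cancellation timeout in the code above: the value is given in milliseconds, which is easy to misread, since 1200000 ms is 20 minutes. A minimal sketch using only the .NET standard library shows one way to make the intended duration explicit; the class and variable names are illustrative and not part of the original code:

```csharp
using System;

class TimeoutExample
{
    static void Main()
    {
        // Build the duration from a readable unit, then convert to milliseconds
        int twentyMinutesMs = (int)TimeSpan.FromMinutes(20).TotalMilliseconds;
        int thirtySecondsMs = (int)TimeSpan.FromSeconds(30).TotalMilliseconds;

        Console.WriteLine(twentyMinutesMs); // prints 1200000
        Console.WriteLine(thirtySecondsMs); // prints 30000
    }
}
```

Either value could then be passed wherever a millisecond count is expected, so the number and its comment cannot drift apart.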
If I am not wrong, you want to search for a keyword/text in the source document and get an HTML output with highlighted text, but the output HTML must follow the source file's formatting.
We are investigating this scenario. The investigation ticket ID is SEARCHNET-2270. You'll be notified as soon as there is an update.
Hi, any update on this? When can I expect a solution?
We’ve improved how text extraction preserves formatting in API version 20.6. Have a look at the code below:
string indexFolder = @"c:\MyIndex";
string documentFolder = @"c:\MyDocuments";
IndexSettings settings = new IndexSettings();
settings.UseRawTextExtraction = false;
// Creating an index
Index index = new Index(indexFolder, settings);
// Indexing documents in the document folder
index.Add(documentFolder);
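To check whether this setting helps with the original scenario, the indexing code above can be followed by the same search-and-highlight steps posted earlier in this thread. This is only a sketch: it assumes the `Search`, `GetFoundDocument`, `FileOutputAdapter`, `HtmlHighlighter` and `Highlight` calls from that earlier code keep the same signatures in version 20.6, which has not been verified here. The `using` namespaces, the class name and the output path are also assumptions.

```csharp
using System;
using System.IO;
using GroupDocs.Search;               // Index (assumed namespace)
using GroupDocs.Search.Common;        // OutputAdapter, FileOutputAdapter (assumed namespace)
using GroupDocs.Search.Highlighters;  // HtmlHighlighter (assumed namespace)
using GroupDocs.Search.Options;       // IndexSettings (assumed namespace)
using GroupDocs.Search.Results;       // SearchResult, FoundDocument (assumed namespace)

class HighlightWithFormatting
{
    static void Main()
    {
        string indexFolder = @"c:\MyIndex";
        string documentFolder = @"c:\MyDocuments";
        string outputPath = @"c:\Highlighted.html"; // hypothetical output location

        // Same settings as in the reply above
        IndexSettings settings = new IndexSettings();
        settings.UseRawTextExtraction = false;

        // Create the index and add the documents
        Index index = new Index(indexFolder, settings);
        index.Add(documentFolder);

        // Search and highlight, reusing the calls from the earlier post
        string query = "Contact";
        SearchResult result = index.Search(query);
        if (result.DocumentCount > 0)
        {
            FoundDocument document = result.GetFoundDocument(0); // first found document
            // Constructor signature taken from the earlier post; it may differ between versions
            OutputAdapter outputAdapter = new FileOutputAdapter(outputPath);
            HtmlHighlighter highlighter = new HtmlHighlighter(outputAdapter);
            index.Highlight(document, highlighter);
            Console.WriteLine(Path.GetFullPath(outputPath));
        }
    }
}
```

Documents indexed before the setting was changed would presumably need to be indexed again in a fresh index folder before the generated HTML reflects the new extraction mode.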