How should GetDocumentInfo be used?

I’m interested in getting information about the rendering performed by the viewer. I have no trouble calling ViewerHtmlHandler.GetPages to see HTML content. However, when I call ViewerHtmlHandler.GetDocumentInfo I get basic information, but no content in the Rows collection under Pages. What should I do to get information populated in that collection?

Hi Dwaldo,


Thank you for your query.

The Rows list contains text coordinate information. This collection is used in image mode only, to place text over the image so that the text is selectable.

Please note the following points.

  • GetPages - returns the collection of rendered pages.
  • GetDocumentInfo - returns document and page information only. This method doesn’t render anything.
The main purpose of DocumentInfoContainer is to provide the document’s information before any page is rendered.
For example, if you want to render page number 10, you need to know whether that page exists in the document. You may also need to display the total page count, and so on.
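
As a rough sketch of the page check described above, something like the following could be used. It assumes the namespace layout of the GroupDocs.Viewer 3.x releases and the DocumentInfoOptions overload of GetDocumentInfo that appears later in this thread; the file name, storage path, and wantedPage value are placeholders, not values from a real project.

```csharp
using System;
using GroupDocs.Viewer.Config;
using GroupDocs.Viewer.Domain.Containers;
using GroupDocs.Viewer.Domain.Options;
using GroupDocs.Viewer.Handler;

class PageCheckExample
{
    static void Main()
    {
        // Placeholder storage path and file name (assumptions for illustration).
        ViewerConfig config = new ViewerConfig();
        config.StoragePath = @"C:\storage";
        ViewerHtmlHandler htmlHandler = new ViewerHtmlHandler(config);
        string guid = "word.doc";

        // Hypothetical page we would like to render.
        int wantedPage = 10;

        // GetDocumentInfo does not render anything; it only describes the document.
        DocumentInfoContainer docInfo =
            htmlHandler.GetDocumentInfo(new DocumentInfoOptions(guid));
        int pageCount = docInfo.Pages.Count;

        Console.WriteLine("Total pages: {0}", pageCount);
        if (wantedPage >= 1 && wantedPage <= pageCount)
        {
            Console.WriteLine("Page {0} exists and can be rendered.", wantedPage);
        }
        else
        {
            Console.WriteLine("Page {0} does not exist in this document.", wantedPage);
        }
    }
}
```

Checking the count first avoids asking the handler to render a page that is not there.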

Please feel free to ask any other question.

Happy Coding!

Thank you. That helps.

My overall goal is to be able to influence the rendered HTML in some way. I'd like to be able to identify text as the page is rendered and insert hyperlinks or other markup. Is there an API available that would give me that capability?

Hi Dwaldo,


Thank you for your response.

You can fetch the HTML content of each page of the document. Here is an example that may help you.

// Set up the GroupDocs.Viewer config
ViewerConfig config = new ViewerConfig();
config.StoragePath = @"C:\storage";
// Create the HTML handler
ViewerHtmlHandler htmlHandler = new ViewerHtmlHandler(config);
string guid = "word.doc";
// Render the document; each PageHtml holds one rendered page
List<PageHtml> pages = htmlHandler.GetPages(guid);
foreach (PageHtml page in pages)
{
    Console.WriteLine("Page number: {0}", page.PageNumber);
    // Here you can get the HTML content/text
    Console.WriteLine("Html content: {0}", page.HtmlContent);
}

Happy Coding!

I can get the HTML content and images without any trouble. I still haven't found any combination of calls that will cause the Rows collection in a PageData object to be populated. My goal is similar to what the older version of the viewer offered. I want the viewer UI I'm writing to be able to search for text in the preview. I can generate and read the HTML for a preview, but I'd also like to get the mapping of text to image that the older viewer used. Here's the latest version of my code:

ViewerImageHandler imageHandler = new ViewerImageHandler(viewerConfig, dataHandler);
DocumentInfoContainer docInfo2 = imageHandler.GetDocumentInfo(new DocumentInfoOptions(tempFn));
List<PageImage> images = imageHandler.GetPages(stream3, tempFn, GetImageOptions());

I've tried calling GetDocumentInfo before and after GetPages, but have not yet gotten any information in the Rows collection of the PageData object.

What sequence of calls should I be using?

Hi Dwaldo,


Thank you for explaining what you are trying to achieve.

As far as the mapping of text to image is concerned, you can get the HTML text even when rendering to images. Please use code like the following.

PrintableHtmlOptions options = new PrintableHtmlOptions(@"word.doc");
PrintableHtmlContainer container = imageHandler.GetPrintableHtml(options);


Warm Regards

Hi Dwaldo!


Make sure the ViewerConfig object’s UsePdf property is set to true. The Rows collection of each PageData object should then be populated when you call GetDocumentInfo.
Note that this works only for ViewerImageHandler, not for ViewerHtmlHandler.
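
A minimal sketch of that setup might look like the following. It assumes the GroupDocs.Viewer 3.x namespace layout and the single-argument ViewerImageHandler constructor; the storage path and file name are placeholders. It only counts the rows on each page, since the exact properties exposed by each row are not shown in this thread.

```csharp
using System;
using GroupDocs.Viewer.Config;
using GroupDocs.Viewer.Domain.Containers;
using GroupDocs.Viewer.Domain.Options;
using GroupDocs.Viewer.Handler;

class RowsExample
{
    static void Main()
    {
        ViewerConfig config = new ViewerConfig();
        config.StoragePath = @"C:\storage"; // placeholder path
        // Required for the Rows collection to be filled in.
        config.UsePdf = true;

        // Rows are populated for the image handler only, not the HTML handler.
        ViewerImageHandler imageHandler = new ViewerImageHandler(config);

        DocumentInfoContainer docInfo =
            imageHandler.GetDocumentInfo(new DocumentInfoOptions("word.doc"));

        // Report how many text rows were found on each page.
        int pageIndex = 0;
        foreach (var page in docInfo.Pages)
        {
            pageIndex++;
            Console.WriteLine("Page {0}: {1} rows", pageIndex, page.Rows.Count);
        }
    }
}
```

If every page still reports zero rows, it is worth confirming that the same config instance with UsePdf enabled is the one passed to the handler.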