Converted HTML file is too large with GroupDocs.Viewer 3.x

Hi,

We are planning to migrate from GroupDocs.Viewer 2.x to the latest version (3.x). We can generate the HTML file successfully with the code below, but the file is too large: the original file is 110 KB and the converted HTML file is 800 KB.

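```csharp
ViewerConfig config = GetConfigurations();   // our own helper that builds the ViewerConfig
ViewerHtmlHandler htmlHandler = new ViewerHtmlHandler(config);

HtmlOptions options = new HtmlOptions();
options.IsResourcesEmbedded = true;          // resources (images, styles) are embedded in each page's HTML

List<PageHtml> pages = htmlHandler.GetPages(inputFile, options);
foreach (PageHtml page in pages)
{
    // Append every page's HTML to the same output file
    File.AppendAllText(@outputfilepath, page.HtmlContent);
}
```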
Is there any way to reduce the size of the converted file? It takes a long time to retrieve the HTML body on callouts.
I'm attaching the original file and the converted output for your reference. Thanks

Hi,

We are getting this license error on the page even though we apply the license:
```html
<span class="awspan awtext2" style="position: absolute; white-space: nowrap; color: rgb(255, 0, 0); font-family: 'Bookman Old Style'; left: 56.02pt; top: 0.51pt;">
```
http://screencast.com/t/GpJvNxSD

Hi sysadmin12,

Thanks for your interest in the next-generation GroupDocs.Viewer for .NET API.

We are investigating the HTML file size issue and will share our findings with you shortly.

For the license issue, please make sure that the license is applied properly and that the path to the license file is valid. Also, please remove all files the API has created in the storage/temp folder and regenerate the HTML files. I hope this helps.

If you run into any further issues, please let us know.

Warm Regards

Hi sysadmin12,

Thanks for your patience.

I have investigated the issue with the large HTML file. The file grows because you are embedding resources in the HTML content: the image in the header of the document pages is converted to a base64 string and added to the output HTML once for each page. Other resources, such as styles, are also included in the HTML content of each page, which increases the file size further.
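To put rough numbers on this: base64 encodes every 3 bytes as 4 characters, so an embedded image takes about a third more space than the image file itself, and that cost is paid again on every page that embeds it. The C# sketch below is only a back-of-the-envelope helper; `EmbeddedSizeEstimate` and its methods are illustrative names, not part of GroupDocs.Viewer.

```csharp
using System;

// Back-of-the-envelope estimate of what embedding one image in every page costs.
public static class EmbeddedSizeEstimate
{
    // Base64 turns every 3 input bytes into 4 output characters (the last group is padded).
    public static long Base64Length(long byteCount) => 4 * ((byteCount + 2) / 3);

    // Characters added when the same image is embedded as base64 once per page,
    // instead of being stored once as a separate resource file.
    public static long EmbeddedImageCost(long imageBytes, int pageCount) =>
        Base64Length(imageBytes) * pageCount;

    // Cross-checks the formula against the framework's own base64 encoder.
    public static bool MatchesFramework(byte[] data) =>
        Convert.ToBase64String(data).Length == Base64Length(data.Length);
}
```

For example, a hypothetical 30,000-byte header image gives `Base64Length(30000)` = 40,000 characters, so `EmbeddedImageCost(30000, 6)` = 240,000 characters (roughly 240 KB of ASCII text) for a six-page document, compared with 30 KB stored once when the image is kept as a separate file. These figures are illustrative only and were not measured from the attached document.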

As a solution, please set HtmlOptions.IsResourcesEmbedded = false. This keeps all resource files (images, styles, etc.) in a separate folder named resources and references them from the HTML content, which reduces the size of the output HTML files.

Please try the solution above and share your feedback.

Warm Regards

Hi,
Thanks for your reply. We tried this option, but the converted HTML does not display properly, and the size only dropped from 868 KB to 586 KB (still larger than the original file). Below is the link to the file converted with HtmlOptions.IsResourcesEmbedded = false:
https://s3.amazonaws.com/talentrover/00D36000000pFvi/00P36000002egi9/6bbd688e-7aa6-421b-91b0-ceb36c2ee72c.html
Can you please let us know what we are missing, and how to avoid including the resources in every single page?
Is there an option to convert a 3 to 6 page document into one HTML file instead of multiple single-page files?

Hi sysadmin,

Thanks for sharing your experience.

You are facing this issue because the resource files are not kept alongside the converted HTML file. When the HTML file is loaded in a browser, it cannot find the resource files it references (styles, fonts, images, etc.), so the output is not rendered correctly. The API also lets you get the resource files used by each converted HTML page, so you can save them in the directory where you save the output HTML files to solve this issue.
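Saving a resource next to the HTML file is plain file I/O. The sketch below assumes the converted HTML refers to its resources as `resources/<name>` relative to the HTML file (the folder name comes from the previous reply; please check the `src`/`href` values in your output and adjust the folder if they differ). `ResourceWriter` is a hypothetical helper, not part of the GroupDocs.Viewer API. The commented loop shows how it might be fed from the 3.x API using `PageHtml.HtmlResources` and `ViewerHtmlHandler.GetResource` as I recall them; please verify those members against the API reference for your version.

```csharp
using System;
using System.IO;

// Hypothetical usage with GroupDocs.Viewer 3.x (member names unverified, check the API reference):
//   foreach (PageHtml page in pages)
//       foreach (HtmlResource resource in page.HtmlResources)
//           using (Stream s = htmlHandler.GetResource(inputFile, resource))
//               ResourceWriter.Save(outputDir, resource.ResourceName, s);
public static class ResourceWriter
{
    // Copies one resource stream to <outputDir>/resources/<resourceName> and returns the path.
    public static string Save(string outputDir, string resourceName, Stream content)
    {
        if (content == null) throw new ArgumentNullException(nameof(content));

        string resourcesDir = Path.Combine(outputDir, "resources");
        Directory.CreateDirectory(resourcesDir); // does nothing if the folder already exists

        // Keep only the file-name part so a name cannot point outside the resources folder.
        string target = Path.Combine(resourcesDir, Path.GetFileName(resourceName));
        using (FileStream file = File.Create(target))
        {
            content.CopyTo(file);
        }
        return target;
    }
}
```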

For your convenience, I am attaching sample code that converts the document pages to HTML and keeps the resource files alongside them. It also merges the HTML content of all pages into a single HTML file, which considerably reduces the size. Please try the attached sample code and share your feedback with us.

Note: The sample code stores output files in the "D:\storage" directory; you can change this as needed. Also, please clean the cache/temp directory in the storage folder before rendering the document.
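The merge step can be sketched roughly as follows. This is not the attached sample; it is a minimal C# illustration that assumes each page's `HtmlContent` is a complete HTML document with one `<head>` and one `<body>`. It copies each distinct `<link>`/`<style>` element from the page heads only once and wraps each page's body content in a `div`. `HtmlPageMerger` is a hypothetical name, and the regular expressions are a simplification; an HTML parser such as HtmlAgilityPack is more robust for arbitrary markup.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Merges per-page HTML documents into one document so that shared stylesheet
// links and style blocks from the <head> appear only once.
public static class HtmlPageMerger
{
    private const RegexOptions Opts = RegexOptions.Singleline | RegexOptions.IgnoreCase;

    // \b stops "<head" from also matching "<header".
    private static readonly Regex HeadRegex = new Regex(@"<head\b[^>]*>(.*?)</head>", Opts);
    private static readonly Regex BodyRegex = new Regex(@"<body\b[^>]*>(.*?)</body>", Opts);
    private static readonly Regex HeadAssetRegex =
        new Regex(@"<link\b[^>]*>|<style\b[^>]*>.*?</style>", Opts);

    public static string Merge(IList<string> pageHtmls)
    {
        if (pageHtmls == null || pageHtmls.Count == 0)
            throw new ArgumentException("At least one page is required.", nameof(pageHtmls));

        var seenAssets = new HashSet<string>(StringComparer.Ordinal);
        var head = new StringBuilder();
        var body = new StringBuilder();

        for (int i = 0; i < pageHtmls.Count; i++)
        {
            string html = pageHtmls[i];

            // Copy each distinct <link>/<style> element from this page's head only once.
            Match headMatch = HeadRegex.Match(html);
            if (headMatch.Success)
            {
                foreach (Match asset in HeadAssetRegex.Matches(headMatch.Groups[1].Value))
                {
                    if (seenAssets.Add(asset.Value))
                        head.Append(asset.Value).Append('\n');
                }
            }

            // Wrap the page's body content; fall back to the whole string if there is no <body>.
            Match bodyMatch = BodyRegex.Match(html);
            string content = bodyMatch.Success ? bodyMatch.Groups[1].Value : html;
            body.Append("<div class=\"page\" id=\"page-").Append(i + 1).Append("\">")
                .Append(content).Append("</div>\n");
        }

        return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
            + head.ToString() + "</head>\n<body>\n" + body.ToString() + "</body>\n</html>\n";
    }
}
```

With the pages returned by `GetPages`, a call such as `File.WriteAllText(outputPath, HtmlPageMerger.Merge(pages.Select(p => p.HtmlContent).ToList()));` (with `using System.Linq;`, and `outputPath` being your own path) writes a single file. It only displays correctly if the resource files are saved next to it, as described above.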

Have a nice day.

Warm Regards