I am using GroupDocs.Conversion in a .NET Core application to convert document files (e.g., Word, Excel) to HTML for preview purposes. However, I have noticed that the converted HTML files are significantly larger than the original documents.
For example:
- A 10 MB Excel file is converted into a 45 MB HTML file.
- A 7 MB Word document is converted into a 16 MB HTML file.
I would like to know:
1. Is there any way to reduce the file size when converting documents to HTML?
2. Does GroupDocs provide any built-in compression options for HTML conversion?
Here is the code snippet I am currently using for the conversion:
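```csharp
using GroupDocs.Conversion;
using GroupDocs.Conversion.Options.Convert;

// Convert the source document (Word, Excel, ...) to HTML for preview
using (Converter converter = new Converter(documentPath))
{
    var options = new WebConvertOptions();
    converter.Convert(outputFilePath, options);
}
```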
Details:
- GroupDocs.Total version: 25.2.0
- .NET version: 8.0
Could you please provide guidance on how to optimize or compress the HTML output?
We couldn’t reproduce this issue on our end using a sample 32 MB Word file: the resulting HTML is only 9 MB. Please take a look at this screenshot.
Please share the following details so we can investigate further:
- Does this issue occur with every large source file, or only with specific files?
- The problematic source files and the corresponding HTML output (upload them to a cloud storage service such as Google Drive and share the link here).
There is no built-in compression feature or option. However, if you provide the files in question, we can look into it further.
@koc-it-support
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in the Free Support Policies.
Issue ID(s): TOTALNET-205
You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.
We investigated this ticket. This is the expected behavior of the API, for the following reasons:
The DOCX file contains 30 pages, each with a single image. When converted to HTML, the images are base64-encoded and embedded directly in `<img>` tags, which increases the overall size.
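As a rough illustration of the base64 point: base64 encodes every 3 bytes as 4 ASCII characters, so each embedded image takes about one third more space in the HTML than its raw bytes do. The snippet below is generic .NET code using `System.Convert.ToBase64String`, not GroupDocs' internal implementation, and the 300,000-byte "image" is a made-up stand-in.

```csharp
using System;

// Base64 turns every 3 input bytes into 4 ASCII characters (padding the
// last group with '='), so n bytes become 4 * ceil(n / 3) characters.
static long Base64Length(long rawBytes) => 4 * ((rawBytes + 2) / 3);

// Made-up stand-in for one embedded page image: 300,000 bytes of data
// (zero-filled here; the encoded length depends only on the byte count).
byte[] image = new byte[300_000];
string encoded = Convert.ToBase64String(image);

// In a UTF-8 HTML file each base64 character occupies one byte.
Console.WriteLine($"raw: {image.Length:N0} bytes, base64: {encoded.Length:N0} chars");
Console.WriteLine($"overhead: {(double)encoded.Length / image.Length - 1:P0}");
```

Multiplied over 30 full-page images, that one-third growth alone accounts for a noticeable part of the larger HTML file.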
The XLSX file has 35,001 rows of text. During HTML conversion, additional HTML elements are added for each row to ensure proper rendering, which also adds to the size.
Finally, both DOCX and XLSX are ZIP-compressed formats, so their internal data is significantly smaller on disk. HTML, on the other hand, is stored as plain text and is not compressed by default, which results in noticeably larger output files.
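Since there is no built-in compression option, one possible workaround (an assumption about your deployment, not a GroupDocs feature) is to compress the generated HTML yourself after `converter.Convert(...)` returns. HTML is repetitive plain text, so gzip usually shrinks it substantially. The sketch below uses .NET's `System.IO.Compression.GZipStream`; the file name `preview.html` and the synthetic 35,001-row table are hypothetical stand-ins for real converter output.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Gzip-compress inputPath into outputPath and return the compressed size.
static long GzipFile(string inputPath, string outputPath)
{
    using (FileStream input = File.OpenRead(inputPath))
    using (FileStream output = File.Create(outputPath))
    using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
    {
        input.CopyTo(gzip);
    } // disposing the GZipStream flushes the final compressed block
    return new FileInfo(outputPath).Length;
}

// Read a .gz file back into a string (e.g. to verify the round trip).
static string GunzipToString(string gzPath)
{
    using var input = File.OpenRead(gzPath);
    using var gzip = new GZipStream(input, CompressionMode.Decompress);
    using var reader = new StreamReader(gzip, Encoding.UTF8);
    return reader.ReadToEnd();
}

// Hypothetical stand-in for a converted spreadsheet: 35,001 near-identical
// table rows. In a real app, htmlPath would be the outputFilePath passed
// to converter.Convert(...).
var sb = new StringBuilder("<html><body><table>");
for (int i = 0; i < 35_001; i++)
{
    sb.Append($"<tr><td class=\"c1\">Row {i}</td><td class=\"c2\">Sample text</td></tr>");
}
sb.Append("</table></body></html>");
string html = sb.ToString();

string htmlPath = Path.Combine(Path.GetTempPath(), "preview.html");
File.WriteAllText(htmlPath, html); // UTF-8 without BOM
long originalSize = new FileInfo(htmlPath).Length;
long gzipSize = GzipFile(htmlPath, htmlPath + ".gz");

Console.WriteLine($"HTML: {originalSize:N0} bytes, gzip: {gzipSize:N0} bytes");
```

If the previews are served from ASP.NET Core, the response compression middleware (`builder.Services.AddResponseCompression()` plus `app.UseResponseCompression()`) can compress the HTML on the fly instead; compression over HTTPS must be enabled explicitly via `ResponseCompressionOptions.EnableForHttps`. Either way, the browser still works with the full HTML after decompression, and base64-embedded JPEG/PNG data shrinks far less than the text markup.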