Problems with Thai characters when converting certain PDF to HTML in .NET

Clemens_Pestuka · May 7, 2025, 8:05am

Hi,

A problem was reported to us regarding a PDF that contains Thai characters: some of them are rendered incorrectly after conversion to HTML.
I’ve taken a screenshot and highlighted a few of the differences between the PDF and the HTML:
image.png (224.9 KB)

The Viewer from “GroupDocs.Total for .NET” 25.3 was used with “HtmlViewOptions.ForEmbeddedResources” and no other options set.
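
For reference, the setup described above would look roughly like the following sketch. The file names are placeholders, not the names of the attached files.

```csharp
using GroupDocs.Viewer;
using GroupDocs.Viewer.Options;

class Program
{
    static void Main()
    {
        // "thai-sample.pdf" is a placeholder for the affected PDF.
        using (Viewer viewer = new Viewer("thai-sample.pdf"))
        {
            // One HTML file per page, with fonts, images and styles embedded.
            // {0} is replaced with the page number.
            HtmlViewOptions options = HtmlViewOptions.ForEmbeddedResources("page_{0}.html");

            // No other options are set, matching the report.
            viewer.View(options);
        }
    }
}
```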

Attaching the original and the conversion output:
Thai characters not correctly converted to HTML.zip (3.1 MB)

Can you please take a look at the problem?

Professionalize.Discourse · May 7, 2025, 8:05am

@Clemens_Pestuka

It appears that the issue you are experiencing with the conversion of a PDF containing Thai characters to HTML may be related to the handling of specific fonts, such as AngsanaUPC. Users have reported similar problems where certain decorative characters are lost during the conversion process.

To address this issue, I recommend the following steps:

1. Check Font Embedding: Ensure that the PDF file has the necessary fonts embedded. If the fonts are not embedded, the conversion process may not render the characters correctly.
2. Update GroupDocs.Viewer: If you are using an older version of GroupDocs.Viewer, consider updating to the latest version. Newer versions may include fixes and improvements related to font handling and character rendering.
3. Use Alternative Options: Experiment with different options in the HtmlViewOptions class. While you mentioned using HtmlViewOptions.ForEmbeddedResources, there may be other settings that could improve the output.
4. Contact Support: Since this seems to be a known issue with specific fonts, it may be beneficial to reach out to GroupDocs support for further assistance. They may have additional insights or workarounds for this particular problem.
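
As an illustration of experimenting with alternative options, the sketch below renders the same document three ways so the outputs can be compared side by side. It assumes that PdfOptions.RenderTextAsImage and PdfOptions.DisableCharsGrouping are available in the installed version of GroupDocs.Viewer, which should be checked against the API reference. The file names are placeholders, and none of these variants is confirmed to resolve the Thai character issue.

```csharp
using GroupDocs.Viewer;
using GroupDocs.Viewer.Options;

class Program
{
    static void Main()
    {
        // "thai-sample.pdf" is a placeholder for the affected PDF.
        using (Viewer viewer = new Viewer("thai-sample.pdf"))
        {
            // Variant 1 (assumed option): render text as an image, so glyph
            // shapes come from the PDF rendering rather than from HTML text.
            HtmlViewOptions textAsImage =
                HtmlViewOptions.ForEmbeddedResources("text-as-image_{0}.html");
            textAsImage.PdfOptions.RenderTextAsImage = true;
            viewer.View(textAsImage);

            // Variant 2 (assumed option): position characters individually
            // instead of grouping them into runs.
            HtmlViewOptions noGrouping =
                HtmlViewOptions.ForEmbeddedResources("no-grouping_{0}.html");
            noGrouping.PdfOptions.DisableCharsGrouping = true;
            viewer.View(noGrouping);

            // Reference output: PNG pages to compare against the HTML results.
            PngViewOptions png = new PngViewOptions("reference_{0}.png");
            viewer.View(png);
        }
    }
}
```

If the PNG pages look correct while the HTML pages do not, that would suggest the problem lies in how text is emitted to HTML rather than in how the PDF is read.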

If you have any further details or specific examples, sharing them could help in diagnosing the issue more effectively.

Feel free to check back for updates or solutions from the support team regarding this font handling issue.

Sources:
[1]: Converting certain PDF with Thai font to HTML, leads to incorrect results in .NET

vladimir.litvinchik · May 8, 2025, 7:34am

@Clemens_Pestuka

Thank you for attaching the source and output files. I can reproduce the issue. The issue ID for reference is VIEWERNET-5350.