Font information gets lost for Unicode characters when converting from HTML to RTF

When converting text that contains certain Unicode characters (like ą or ł), a default font is used for them instead of the one from the source HTML. The rest of the text also ends up in a different font than the source.

Source HTML and converted RTF:
files.zip (5.4 KB)

@piterpid
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CONVERSIONNET-7221

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@piterpid

We can look at this (CONVERSIONNET-7221) as two separate issues:

  1. The main text font changes to “Times New Roman”
  2. Special Unicode characters (like ą or ł) switch to “Calibri”

Regarding the main text font:
The issue arises because the “Book Antiqua” font isn’t available on the machine where the document is being converted. Since this is a Microsoft Office font, it requires Office to be installed.

  • On Windows: With Office installed, the font should already be in the default “C:\Windows\Fonts” folder, so no additional steps are needed.
  • On macOS: You’ll find the font in the Microsoft Word application’s folder at /Applications/Microsoft Word.app/Contents/Resources/DFonts. To use this font in the conversion, add this directory in your ConverterSettings, like so:
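// Namespaces these snippets rely on (assumed from the GroupDocs.Conversion for .NET package)
using GroupDocs.Conversion;
using GroupDocs.Conversion.FileTypes;
using GroupDocs.Conversion.Options.Convert;
// Create a new instance of ConverterSettings
var converterSettings = new ConverterSettings();
// Add the directory containing Microsoft Word fonts on macOS to the font directories
converterSettings.FontDirectories.Add("/Applications/Microsoft Word.app/Contents/Resources/DFonts");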

Then, apply these settings in the converter, as shown below:

// Begin the fluent conversion process
FluentConverter
    // Apply the converter settings to use the custom font directory
    .WithSettings(() => converterSettings)

    // Load the source HTML file
    .Load("source.html")

    // Specify the output file path; the RTF format itself is set in the options below
    .ConvertTo("source.html.rtf")
    .WithOptions(new WordProcessingConvertOptions
    {
        // Set the target file type to RTF
        Format = WordProcessingFileType.Rtf
    })

    // Perform the actual conversion
    .Convert();

If Office is not installed, you’ll need to install the “Book Antiqua” font manually. If you place it somewhere other than the system’s default font folder, add that folder to FontDirectories in your ConverterSettings as well.
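
If you are unsure which font folders exist on the conversion machine, you can check before registering them. The following is only a minimal sketch: FontDirectoryProbe and ExistingDirectories are hypothetical names, not part of GroupDocs.Conversion, and the helper only filters a list of candidate paths down to those that exist on disk. It does not verify that “Book Antiqua” is actually inside them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FontDirectoryProbe
{
    // Hypothetical helper (not a GroupDocs.Conversion API): keeps only the
    // candidate paths that are non-empty and exist as directories on this machine.
    public static List<string> ExistingDirectories(IEnumerable<string> candidates)
    {
        return candidates
            .Where(path => !string.IsNullOrWhiteSpace(path) && Directory.Exists(path))
            .ToList();
    }
}
```

Each path it returns could then be passed to converterSettings.FontDirectories.Add(...), in the same way as the macOS directory above. The candidate list itself (for example the Word DFonts folder, or the folder where you installed the font manually) is something you would supply.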

Regarding the Unicode characters issue:
We are further investigating the issue with the Unicode characters and will update you as soon as we have any new information.