DOCX to PDF conversion, unwanted page break

Hi,

We have what seems to be a new issue with unwanted page breaks when converting a DOCX to PDF. We are using GroupDocs.Conversion 21.11.0 (latest stable).
However, when I convert the same document through the online PDF converter (Online file conversion | Free GroupDocs Apps), the output looks correct.

So my question is actually: what are the default LoadOptions/PdfConvertOptions that are used on your online converter?

In our application we use:

converter.Convert(new SaveDocumentStream(() => output), new PdfConvertOptions()
{
    MarginBottom = 0,
    MarginLeft = 0,
    MarginRight = 0,
    MarginTop = 0,

    PdfOptions = {
        PdfFormat = PdfFormats.v1_7,
        OptimizationOptions = {
            CompressImages = false,
            ImageQuality = 100
        },
        Zoom = 100
    }
});

I already tried playing with the margin settings and other PdfOptions (I even removed them all), but I can't get the same conversion output as your online tool.

Thanks for the information,
Attached are the related document (DOCX) and the converted PDF, which has an extra page break on page 2.

https://easyupload.io/m/0d9eve
(using easyupload, as the initial file was too large)

@Verthosa

Please take a look at this output.pdf (1.2 MB). Does it look good/expected to you?

Yes indeed, this is the same output your online converter gives, and it is completely correct.

But I cannot achieve the same result with my current converter, 21.11.0. Could it be down to some LoadOptions/ConversionSettings?

@Verthosa

Please try the following code:

using (Converter converter = new GroupDocs.Conversion.Converter(@"D:/HiveTracker - offer BizzMine Premium Cloud.docx"))
{
    converter.Convert(@"D:/output.pdf", new PdfConvertOptions());
}

Please ignore/skip load options.

I'm getting the same (wrong) result: an extra page is created for the sentence 'In attachment: Bizzmine Saas Agreement'…

@Verthosa

Could you please share a sample application with which the issue can be reproduced?

That will not be so easy, as we run our PDF converter as an Azure WebJob. I'll try to reproduce it in a console app and will come back to you.

Hello,
please find a zip at this link: https://we.tl/t-ZvLTn1lyen

I cleared the contents of the license file and added the DOCX file that shows the problem. Change inPath and outPath to match your environment.

The resulting PDF will put the line 'In attachment…' on a second page.

@Verthosa

It's still on the first page. Please have a look at this screenshot.jpg (132.7 KB) and this output.pdf (1.2 MB).
Could you please highlight the issue with a screenshot?

Hi, here is another WeTransfer link with a screen recording that shows the original DOCX, the conversion through the app I sent previously, and the resulting output.pdf being opened:
https://we.tl/t-zXugBciAUA

As a follow-up: the conversion in the animated GIF was done on Windows Server 2016 (in production, the conversion also runs on Windows Server 2016).

I just tested the console app on my local Windows 10 environment, and there the output is correct! This is the only difference I can see right now. Would you be able to test this console app on Windows Server 2016 to try to reproduce the behaviour?

EDIT:

  • tested on Windows Server 2012R2: incorrect conversion
  • tested on another Windows 10: incorrect conversion
  • tested on Windows 11: incorrect conversion

Are there any specific system dependencies that GroupDocs relies on?

Another EDIT: please note that I have the same issue when using Free Online DOCX Converter | Conholdate Apps.

@Verthosa

We are investigating this issue with ticket ID CONVERSIONNET-5054.

Meanwhile, please share a list of installed fonts on the above environments.
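
One possible way to produce such a list on each machine is sketched below. It is not part of GroupDocs: it uses the standard System.Drawing.Text.InstalledFontCollection class (on .NET Core and later this needs the System.Drawing.Common package and Windows). The names FontReport, InstalledFamilies and Missing are hypothetical, and whether the converter resolves fonts from exactly the same set that GDI+ reports is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing.Text;
using System.Linq;

public static class FontReport
{
    // Returns the family names of all fonts GDI+ reports as installed, sorted by name.
    public static List<string> InstalledFamilies()
    {
        using (var collection = new InstalledFontCollection())
        {
            return collection.Families
                .Select(family => family.Name)
                .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
                .ToList();
        }
    }

    // Returns the names in 'required' that do not occur in 'installed' (case-insensitive).
    public static List<string> Missing(IEnumerable<string> required, IEnumerable<string> installed)
    {
        var known = new HashSet<string>(installed, StringComparer.OrdinalIgnoreCase);
        return required.Where(name => !known.Contains(name)).ToList();
    }
}
```

Writing the result of FontReport.InstalledFamilies() to a text file on a machine that converts correctly and on one that does not, and then comparing the two files, should show which font families differ between the environments.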

Hello, great thinking. I'll check the differences in installed fonts and try to extract the fonts used in the two different PDF documents.

Thanks a lot already for your assistance.

EDIT:
I noticed this difference when comparing the incorrectly converted and the correctly converted output: a DejaVu font seems to be missing. image.png (12.0 KB)

However, I cannot find this font being used in the DOCX (I analysed the contents of the DOCX as a zip and cannot see the DejaVu font used anywhere).
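
For a more systematic check than reading the unpacked XML by eye, a scan along the following lines could be used. This is a minimal sketch and not part of GroupDocs: it opens the DOCX as a zip, loads every XML part under word/, and collects the names from w:font declarations and w:rFonts references. The names DocxFontScanner, FontsInPart and FontsInDocx are hypothetical. Fonts that enter through other routes (for example theme fonts in DrawingML, or embedded objects) are not covered, so an empty result for a given font does not prove it is unused.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxFontScanner
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Collects font names from <w:font w:name="..."/> and <w:rFonts .../> in one XML part.
    public static IEnumerable<string> FontsInPart(XDocument part)
    {
        var declared = part.Descendants(W + "font")
            .Select(font => (string)font.Attribute(W + "name"));

        var referenced = part.Descendants(W + "rFonts")
            .SelectMany(run => new[] { "ascii", "hAnsi", "cs", "eastAsia" }
                .Select(slot => (string)run.Attribute(W + slot)));

        return declared.Concat(referenced)
            .Where(name => !string.IsNullOrEmpty(name))
            .Distinct();
    }

    // Scans every XML part under word/ in the DOCX and returns the distinct font names found.
    public static SortedSet<string> FontsInDocx(string docxPath)
    {
        var fonts = new SortedSet<string>(StringComparer.OrdinalIgnoreCase);
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                bool isWordXml =
                    entry.FullName.StartsWith("word/", StringComparison.OrdinalIgnoreCase) &&
                    entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase);
                if (!isWordXml) continue;

                using (var stream = entry.Open())
                {
                    fonts.UnionWith(FontsInPart(XDocument.Load(stream)));
                }
            }
        }
        return fonts;
    }
}
```

Calling DocxFontScanner.FontsInDocx with the path to the problem document returns the font names the document declares or references, which can then be compared against the fonts installed on the converting machine.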

I can confirm that after adding the missing font, the conversion works perfectly. But this DejaVu font is non-standard, isn't it? I have no idea where it comes from.

@Verthosa

Glad to know that the issue is resolved.

We are investigating this scenario.

Is there any update on this? We are planning to release in a couple of weeks.
Thanks

@Verthosa

This issue is still under investigation. Please note that all free support tickets are handled on a first come, first served basis. However, we'll let you know as soon as there is an update.

@Verthosa

It seems that the "Roboto" font is missing on your system. Please try the following code:

const string source = "HiveTracker - offer BizzMine Premium Cloud.docx";
var loadOptions = new WordProcessingLoadOptions
{
    UseTextShaper = true,
    AutoFontSubstitution = true,
    FontSubstitutes = new List<FontSubstitute>
    {
        FontSubstitute.Create("Roboto", "Arial")
    }
};
using (var converter = new Converter(source, () => loadOptions))
{
    var options = new PdfConvertOptions();
    converter.Convert("converted.pdf", options);
}

Sorry for the late reply. Using FontSubstitutes works; however, the UseTextShaper property throws an error here. Leaving it out fixes that, so no problem.

Thanks a lot!

Good to know that the issue is fixed.

Please share sample code and exception stack trace.