DOCX to PDF conversion, unwanted page break

Verthosa · January 7, 2022, 8:14am

Hi,

We have what seems to be a new issue with unwanted page breaks that occur when converting a DOCX file to PDF. We are using GroupDocs.Conversion 21.11.0 (latest stable).
However, when I convert the same document through the online PDF converter (Online file conversion | Free GroupDocs Apps), the output looks fine.

So my question is actually: what are the default LoadOptions/PdfConvertOptions that are used on your online converter?

In our application we use:

converter.Convert(new SaveDocumentStream(() => output), new PdfConvertOptions()
{
    MarginBottom = 0,
    MarginLeft = 0,
    MarginRight = 0,
    MarginTop = 0,
    PdfOptions = {
        PdfFormat = PdfFormats.v1_7,
        OptimizationOptions = {
            CompressImages = false,
            ImageQuality = 100
        },
        Zoom = 100
    }
});

I already tried playing with the margin settings and other PdfOptions (I even removed them all), but I can’t get the same conversion output that your online tool produces.

Thanks for the information.
Attached are the related document (DOCX) and the converted PDF with an extra page break on page 2:

https://easyupload.io/m/0d9eve
(using easyupload, as the initial file was too large)

atir.tahir · January 7, 2022, 1:36pm

@Verthosa

Please take a look at this output.pdf (1.2 MB). Does it look good/expected to you?

Verthosa · January 7, 2022, 1:40pm

Yes indeed, this is the same output your online converter gives and is completely correct.

But I cannot achieve the same result with my current converter, 21.11.0. Could it be down to some LoadOptions/conversion settings?

atir.tahir · January 7, 2022, 1:41pm

@Verthosa

Please try the following code:

using (Converter converter = new GroupDocs.Conversion.Converter(@"D:/HiveTracker - offer BizzMine Premium Cloud.docx"))
{
    converter.Convert(@"D:/output.pdf", new PdfConvertOptions());
}

Please ignore/skip load options.

Verthosa · January 7, 2022, 3:07pm

I’m getting the same (wrong) result: an extra page is created for the sentence ‘In attachment: Bizzmine Saas Agreement’…

atir.tahir · January 8, 2022, 3:20pm

@Verthosa

Could you please share a sample application with which the issue can be reproduced?

Verthosa · January 10, 2022, 7:54am

That will not be so easy, as we run our PDF converter as an Azure WebJob. I’ll try to reproduce it in a console app and get back to you.

Verthosa · January 10, 2022, 8:36am

Hello,
please find a zip at this link: https://we.tl/t-ZvLTn1lyen

I cleared the contents of the license and added the DOCX file that shows the problem. Change inPath and outPath to match your environment.

The resulting PDF puts the line ‘In attachment…’ on a second page.

atir.tahir · January 10, 2022, 11:08am

@Verthosa

It’s still on the first page. Please have a look at this screenshot.jpg (132.7 KB) and this output.pdf (1.2 MB).
Could you please highlight the issue with a screenshot?

Verthosa · January 10, 2022, 12:20pm

Hi, here is another WeTransfer link with a screen recording that shows the original DOCX, the conversion through the app I sent previously, and the resulting output.pdf being opened:
https://we.tl/t-zXugBciAUA

Verthosa · January 10, 2022, 1:05pm

As a follow-up: the conversion in the animated GIF was done on Windows Server 2016 (in production, the conversion also runs on Windows Server 2016).

I just tested the console app on my local Windows 10 environment, and there the output seems correct! This is in fact the only difference I can see right now. Would you be able to test this console app on Windows Server 2016 to see whether you can reproduce my behaviour?

EDIT:

tested on Windows Server 2012R2: incorrect conversion
tested on another Windows 10: incorrect conversion
tested on Windows 11: incorrect conversion

Are there any specific system resources that GroupDocs depends on?

Another EDIT: please note, I have the same issue when using Free Online DOCX Converter | Conholdate Apps

atir.tahir · January 10, 2022, 4:30pm

@Verthosa

We are investigating this issue with ticket ID CONVERSIONNET-5054.

Meanwhile, please share a list of installed fonts on the above environments.
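
One possible way to produce such a font list is sketched below. It is an illustration only, not a GroupDocs utility: it assumes a Windows machine and the `System.Drawing` assembly (the `System.Drawing.Common` package on newer .NET), and the class name `ListInstalledFonts` is made up for this example. It prints the font families that GDI+ reports, which may not match exactly what the conversion engine resolves at run time.

```csharp
using System;
using System.Drawing.Text;
using System.Linq;

// Hypothetical helper: prints one installed font family name per line.
static class ListInstalledFonts
{
    static void Main()
    {
        // InstalledFontCollection enumerates the font families known to GDI+.
        using (var fonts = new InstalledFontCollection())
        {
            var names = fonts.Families
                .Select(family => family.Name)
                .OrderBy(name => name, StringComparer.OrdinalIgnoreCase);

            foreach (var name in names)
            {
                Console.WriteLine(name);
            }
        }
    }
}
```

Redirecting the output to a text file on each machine gives one list per environment that can be attached to the thread.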

Verthosa · January 11, 2022, 7:09am

Hello, great thinking. I’ll check for differences in the installed fonts and try to extract the fonts used in the two different PDF documents.

already big thanks for your assistance
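
For comparing font lists from two machines, a small set difference is usually enough. The sketch below assumes two plain-text files with one font family name per line; the file format, the class name `FontListDiff` and the method name `MissingFrom` are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: reports fonts present on a "good" machine
// but absent on a "bad" one.
static class FontListDiff
{
    // Returns the names found in reference but not in candidate (case-insensitive).
    public static List<string> MissingFrom(IEnumerable<string> reference, IEnumerable<string> candidate)
    {
        var present = new HashSet<string>(
            candidate.Select(name => name.Trim()),
            StringComparer.OrdinalIgnoreCase);

        return reference
            .Select(name => name.Trim())
            .Where(name => name.Length > 0 && !present.Contains(name))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: FontListDiff <good-machine.txt> <bad-machine.txt>");
            return;
        }

        foreach (var name in MissingFrom(File.ReadLines(args[0]), File.ReadLines(args[1])))
        {
            Console.WriteLine(name);
        }
    }
}
```

Any name printed is a candidate for the font whose absence changes the text layout, and with it the page break.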

EDIT:
I noticed this difference when comparing the fonts in the wrongly converted and the correctly converted PDF: the DejaVu font seems to be missing. image.png (12.0 KB)

However, I cannot find this font being used in the DOCX (I analysed the contents of the DOCX as a zip and cannot see the DejaVu font used anywhere).

I can confirm that after installing the missing font, the conversion works perfectly. But this DejaVu font is non-standard, isn’t it? I have no idea where it comes from.
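
The manual zip inspection described above could be automated roughly as follows. This is a sketch under assumptions: it only looks at the WordprocessingML parts under `word/`, collecting `w:font` names (as in `fontTable.xml`) and the `w:rFonts` attributes used in runs and styles. Fonts referenced from the theme part or from embedded graphics are not covered, so a font can still be requested without appearing in this output. The class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper: lists font names referenced in a .docx package.
static class DocxFontScanner
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static SortedSet<string> FindFontNames(string docxPath)
    {
        var names = new SortedSet<string>(StringComparer.OrdinalIgnoreCase);

        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                bool isWordXml =
                    entry.FullName.StartsWith("word/", StringComparison.OrdinalIgnoreCase) &&
                    entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase);
                if (!isWordXml) continue;

                using (Stream stream = entry.Open())
                {
                    XDocument part = XDocument.Load(stream);

                    // Declared fonts: <w:font w:name="...">
                    foreach (XElement font in part.Descendants(W + "font"))
                    {
                        AddIfPresent(names, (string)font.Attribute(W + "name"));
                    }

                    // Fonts applied to runs and styles: <w:rFonts w:ascii="..." ...>
                    foreach (XElement rFonts in part.Descendants(W + "rFonts"))
                    {
                        foreach (string attr in new[] { "ascii", "hAnsi", "cs", "eastAsia" })
                        {
                            AddIfPresent(names, (string)rFonts.Attribute(W + attr));
                        }
                    }
                }
            }
        }

        return names;
    }

    static void AddIfPresent(ISet<string> names, string value)
    {
        if (!string.IsNullOrWhiteSpace(value)) names.Add(value);
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: DocxFontScanner <file.docx>");
            return;
        }

        foreach (string name in FindFontNames(args[0]))
        {
            Console.WriteLine(name);
        }
    }
}
```

Comparing this output with the list of installed fonts shows which requested fonts would have to be substituted on a given machine.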

atir.tahir · January 11, 2022, 9:09am

@Verthosa

Glad to know that the issue is resolved.

We are investigating this scenario.

Verthosa · January 20, 2022, 7:29am

Is there any update on this? We are planning to release in a couple of weeks.
Thanks

atir.tahir · January 20, 2022, 8:10am

@Verthosa

This issue is still under investigation. Please note that all free support tickets are handled on a first-come, first-served basis. However, we’ll let you know as soon as there is an update.

atir.tahir · February 22, 2022, 2:44pm

@Verthosa

It seems that the “Roboto” font is missing on your system. Please try the following code:

const string source = "HiveTracker - offer BizzMine Premium Cloud.docx";
var loadOptions = new WordProcessingLoadOptions
{
    UseTextShaper = true,
    AutoFontSubstitution = true,
    FontSubstitutes = new List<FontSubstitute>
    {
        FontSubstitute.Create("Roboto", "Arial")
    }
};
using (var converter = new Converter(source, () => loadOptions))
{
    var options = new PdfConvertOptions();
    converter.Convert("converted.pdf", options);
}

Verthosa · April 11, 2022, 3:06pm

Sorry for the late reply. Using the font substitution works. However, the UseTextShaper property throws an error here. Leaving it out fixes that, so no problem.

thanks a lot !!

atir.tahir · April 11, 2022, 3:14pm

Good to know that the issue is fixed.

Please share sample code and exception stack trace.