DOCX (43 MB) conversion to PDF results in a whopping 2.59 GB

Hi, see the following document on WeTransfer: converting it to PDF with the latest GroupDocs converter results in a very large file. I agree there are a lot of images in the DOCX, but the increase in file size is still stunning 🙂

Can you suggest something?

@Verthosa

Can you please share more details on this issue (development environment details, sample code, steps to reproduce the issue)? We couldn’t reproduce it using the latest API version. Take a look at the resultant PDF.

Sure. We are running the GroupDocs converter on .NET 7 as a Windows service, in combination with Azure Service Bus. The conversion is done with the following code:

using (MemoryStream sourceStream = new MemoryStream(file))
{
    LoadOptions loadOptions = null;

    if (extension.Equals(".xls") || extension.Equals(".xlsx"))
    {
        loadOptions = new SpreadsheetLoadOptions();
        ((SpreadsheetLoadOptions)loadOptions).SkipEmptyRowsAndColumns = false;
        ((SpreadsheetLoadOptions)loadOptions).OnePagePerSheet = false;
    }
    else if (extension.Equals(".doc") || extension.Equals(".docx"))
    {
        // Load options, to hide the track changes & comments
        loadOptions = new WordProcessingLoadOptions();
        ((WordProcessingLoadOptions)loadOptions).HideWordTrackedChanges = true;
        ((WordProcessingLoadOptions)loadOptions).HideComments = true;
        // When fonts are not found, use substitutes
        ((WordProcessingLoadOptions)loadOptions).AutoFontSubstitution = true;
        ((WordProcessingLoadOptions)loadOptions).FontSubstitutes = new List<FontSubstitute>
        {
            FontSubstitute.Create("Roboto", "Arial")
        };

        if (properties.EnableWordHeaderConversionToPdfBookmarks)
        {
            ((WordProcessingLoadOptions)loadOptions).BookmarkOptions.HeadingsOutlineLevels = 9;
        }
    }
    else if (extension.Equals(".ppt") || extension.Equals(".pptx"))
    {
        loadOptions = new PresentationLoadOptions();
        ((PresentationLoadOptions)loadOptions).ShowHiddenSlides = false;
    }

    using (var converter = new Converter(() => sourceStream, () => loadOptions))
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Convert the document with GroupDocs.Conversion
            converter.Convert(() => output, new PdfConvertOptions()
            {
                MarginBottom = 0,
                MarginLeft = 0,
                MarginRight = 0,
                MarginTop = 0,

                PdfOptions = {
                    PdfFormat = properties.SetPdfACompliant ? PdfFormats.PdfA_1A : PdfFormats.Default,

                    OptimizationOptions = {
                        CompressImages = false,
                        ImageQuality = 100
                    },
                    Zoom = 100
                }
            });
            using (MemoryStream inputforaspose = new MemoryStream(output.ToArray()))
            {
                using (Document document = new Document(inputforaspose))
                {
                    // On .NET Core / .NET 5+, Encoding.Default is always UTF-8, so
                    // re-encoding the file name through it would be a no-op
                    document.Info.Author = "Bizzmine";
                    document.Info.Title = filenameWithoutExtension;

                    DocumentPrivilege documentPrivilege = DocumentPrivilege.AllowAll;
                    // Start from AllowAll; screen readers are always allowed
                    documentPrivilege.AllowScreenReaders = true;

                    // Document assembly
                    documentPrivilege.AllowAssembly = properties.PdfPermissions
                        .First(x => x.Permission == Bizzmine.Shared.Enums.PdfPermission.AllowDocumentAssembly).Allow;

                    // Extract content
                    documentPrivilege.AllowCopy = properties.PdfPermissions
                        .First(x => x.Permission == Bizzmine.Shared.Enums.PdfPermission.AllowContentCopying).Allow;
                    if (!documentPrivilege.AllowCopy)
                    {
                        documentPrivilege.CopyAllowLevel = 0;
                    }

                    // Change document
                    documentPrivilege.AllowModifyContents = properties.PdfPermissions
                        .First(x => x.Permission == Bizzmine.Shared.Enums.PdfPermission.AllowChangeDocument).Allow;

                    // Printing
                    documentPrivilege.AllowPrint = properties.PdfPermissions
                        .First(x => x.Permission == Bizzmine.Shared.Enums.PdfPermission.AllowPrinting).Allow;
                    documentPrivilege.AllowDegradedPrinting = documentPrivilege.AllowPrint;
                    if (!documentPrivilege.AllowPrint)
                    {
                        documentPrivilege.PrintAllowLevel = 0;
                    }

                    if (properties.PdfPermissions.Any(x => !x.Allow))
                    {
                        document.Encrypt(string.Empty, "vivaldipdf2020", documentPrivilege, CryptoAlgorithm.AESx256,
                        false);
                    }

                    // Save updated document
                    using (MemoryStream saveStream = new MemoryStream())
                    {
                        document.Save(saveStream);
                        return saveStream.ToArray();
                    }
                }
            }
        }
    }
}

We use GroupDocs for the conversion and Aspose.PDF for setting the permissions. We also tried omitting the Aspose part, but it made no difference.

VM details:
Windows Server 2016, 2.8 GHz CPU and 4 GB of RAM…

Note that we had similar performance issues before (when running in a WebJob; we now run on a VM), so I will first try a stripped-down version of our converter to identify the problem.

@Verthosa
Thanks for sharing these details. We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CONVERSIONNET-6471

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

Hi Atir, you can cancel this ticket. It seems our chunked-upload code had some bugs that caused the large file size, so there is no issue with converting this document.

Sorry for the inconvenience.
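The thread does not say what the bug in the chunked upload was. As a purely hypothetical illustration (the `ChunkAssembler` class, its parameters, and the fixed-chunk-size protocol are assumptions, not part of GroupDocs, Aspose, or the poster's code), here is a minimal C# sketch that reassembles numbered chunks by offset and checks the result, which guards against one common way a retried chunk gets appended twice and inflates a file. It keeps the whole file in a `byte[]`, mirroring the `new MemoryStream(file)` input above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a GroupDocs or Aspose API): rebuilds an uploaded
// file from fixed-size, numbered chunks. Assumes the client sends the total
// length up front and every chunk except the last has exactly chunkSize bytes.
public sealed class ChunkAssembler
{
    private readonly long _expectedLength;
    private readonly int _chunkSize;
    private readonly byte[] _buffer;
    private readonly HashSet<int> _received = new HashSet<int>();

    public ChunkAssembler(long expectedLength, int chunkSize)
    {
        // A single byte[] is limited to int.MaxValue elements
        if (expectedLength < 0 || expectedLength > int.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(expectedLength));
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        _expectedLength = expectedLength;
        _chunkSize = chunkSize;
        _buffer = new byte[expectedLength];
    }

    // Number of chunks needed to cover the file (the last one may be shorter)
    public int ChunkCount => (int)((_expectedLength + _chunkSize - 1) / _chunkSize);

    // Copies the chunk to its own offset: a retried chunk overwrites the same
    // bytes instead of being appended again, so the file cannot grow.
    public void AddChunk(int index, byte[] data)
    {
        if (index < 0 || index >= ChunkCount)
            throw new ArgumentOutOfRangeException(nameof(index));

        long offset = (long)index * _chunkSize;
        long expectedSize = Math.Min(_chunkSize, _expectedLength - offset);
        if (data.Length != expectedSize)
            throw new ArgumentException(
                $"Chunk {index} has {data.Length} bytes, expected {expectedSize}.", nameof(data));

        Buffer.BlockCopy(data, 0, _buffer, (int)offset, data.Length);
        _received.Add(index);
    }

    // Returns the reassembled file only when every chunk has arrived
    public byte[] Complete()
    {
        if (_received.Count != ChunkCount)
            throw new InvalidOperationException(
                $"Received {_received.Count} of {ChunkCount} chunks.");
        return _buffer;
    }
}
```

Because chunks are placed by index rather than appended, a duplicate upload cannot change the final size, and `Complete()` refuses to hand a file with missing chunks to the converter.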


@Verthosa

Glad to know that the issue is fixed.