Hi, please see the following document on WeTransfer: converting it to PDF with the latest GroupDocs.Conversion produces a very large file. I agree there are a lot of images in the DOCX, but the increase in file size is still surprising.
Can you suggest something?
@Verthosa
Can you please share more details on this issue (development environment details, sample code, steps to reproduce the issue)? We couldn’t reproduce it using the latest API version. Take a look at the resultant PDF.
Sure. We are running GroupDocs.Conversion on .NET 7 as a Windows service in combination with Azure Service Bus. The conversion is done with the following code:
using (MemoryStream sourceStream = new MemoryStream(file))
{
    LoadOptions loadOptions = null;
    if (extension.Equals(".xls") || extension.Equals(".xlsx"))
    {
        loadOptions = new SpreadsheetLoadOptions();
        ((SpreadsheetLoadOptions)loadOptions).SkipEmptyRowsAndColumns = false;
        ((SpreadsheetLoadOptions)loadOptions).OnePagePerSheet = false;
    }
    else if (extension.Equals(".doc") || extension.Equals(".docx"))
    {
        // Load options, to hide the track changes & comments
        loadOptions = new WordProcessingLoadOptions();
        ((WordProcessingLoadOptions)loadOptions).HideWordTrackedChanges = true;
        ((WordProcessingLoadOptions)loadOptions).HideComments = true;
        // When fonts are not found, use substitutes
        ((WordProcessingLoadOptions)loadOptions).AutoFontSubstitution = true;
        ((WordProcessingLoadOptions)loadOptions).FontSubstitutes = new List<FontSubstitute>
        {
            FontSubstitute.Create("Roboto", "Arial")
        };
        if (properties.EnableWordHeaderConversionToPdfBookmarks)
        {
            ((WordProcessingLoadOptions)loadOptions).BookmarkOptions.HeadingsOutlineLevels = 9;
        }
    }
    else if (extension.Equals(".ppt") || extension.Equals(".pptx"))
    {
        loadOptions = new PresentationLoadOptions();
        ((PresentationLoadOptions)loadOptions).ShowHiddenSlides = false;
    }
    using (var converter = new Converter(() => sourceStream, () => loadOptions))
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Convert document through GroupDocs.Conversion
            converter.Convert(() => output, new PdfConvertOptions()
            {
                MarginBottom = 0,
                MarginLeft = 0,
                MarginRight = 0,
                MarginTop = 0,
                PdfOptions = {
                    PdfFormat = properties.SetPdfACompliant ? PdfFormats.PdfA_1A : PdfFormats.Default,
                    OptimizationOptions = {
                        CompressImages = false,
                        ImageQuality = 100
                    },
                    Zoom = 100
                }
            });
            // Reopen the converted PDF with Aspose to set metadata and permissions
            using (MemoryStream inputforaspose = new MemoryStream(output.ToArray()))
            {
                using (Document document = new Document(inputforaspose))
                {
                    byte[] bytes = Encoding.Default.GetBytes(filenameWithoutExtension);
                    var encodedFilename = Encoding.UTF8.GetString(bytes);
                    document.Info.Author = "Bizzmine";
                    document.Info.Title = encodedFilename;
                    DocumentPrivilege documentPrivilege = DocumentPrivilege.AllowAll;
                    // Always allow screen readers
                    documentPrivilege.AllowScreenReaders = true;
                    // Document assembly
                    documentPrivilege.AllowAssembly = properties.PdfPermissions
                        .First(x => x.Permission == Bizzmine.Shared.Enums.PdfPermission.AllowDocumentAssembly).Allow;
                    // Extract content
                    documentPrivilege.AllowCopy = properties.PdfPermissions
                        .First(x => x.Permission == Bizzmine.Shared.Enums.PdfPermission.AllowContentCopying).Allow;
                    if (!documentPrivilege.AllowCopy)
                    {
                        documentPrivilege.CopyAllowLevel = 0;
                    }
                    // Change document
                    documentPrivilege.AllowModifyContents = properties.PdfPermissions
                        .First(x => x.Permission == Bizzmine.Shared.Enums.PdfPermission.AllowChangeDocument).Allow;
                    // Printing
                    documentPrivilege.AllowPrint = properties.PdfPermissions
                        .First(x => x.Permission == Bizzmine.Shared.Enums.PdfPermission.AllowPrinting).Allow;
                    documentPrivilege.AllowDegradedPrinting = properties.PdfPermissions
                        .First(x => x.Permission == Bizzmine.Shared.Enums.PdfPermission.AllowPrinting).Allow;
                    if (!documentPrivilege.AllowPrint)
                    {
                        documentPrivilege.PrintAllowLevel = 0;
                    }
                    // Encrypt only when at least one permission is denied
                    if (properties.PdfPermissions.Any(x => !x.Allow))
                    {
                        document.Encrypt(string.Empty, "vivaldipdf2020", documentPrivilege, CryptoAlgorithm.AESx256,
                            false);
                    }
                    // Save updated document
                    using (MemoryStream saveStream = new MemoryStream())
                    {
                        document.Save(saveStream);
                        return saveStream.ToArray();
                    }
                }
            }
        }
    }
}
We use GroupDocs for the conversion and Aspose for setting the permissions. We also tried omitting the Aspose part, but it makes no difference.
VM details:
Windows Server 2016 - 2.8 GHz CPU and 4 GB of RAM…
Note that we had similar performance issues before (running in a WebJob then, but now running on a VM), so I will first try a stripped-down version of our converter in order to identify the problem.
@Verthosa
Thanks for sharing these details. We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): CONVERSIONNET-6471
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
Hi Atir, you can cancel this ticket. It turns out our chunk uploading code had some bugs that caused the large file size, so there is no issue with converting this document.
Sorry for the inconvenience
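For anyone who hits something similar: the uploading code is not shown in this thread, so the following is only a minimal sketch of one way to catch a chunk upload bug before blaming the converter. It compares the length and SHA-256 hash of the reassembled file against the values of the original. The class `ChunkUploadCheck` and its methods `Reassemble` and `Matches` are hypothetical names made up for illustration; they are not part of GroupDocs or Aspose.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class ChunkUploadCheck
{
    // Hypothetical helper: concatenates uploaded chunks in index order.
    public static byte[] Reassemble(IEnumerable<(int Index, byte[] Data)> chunks)
    {
        using var buffer = new MemoryStream();
        foreach (var chunk in chunks.OrderBy(c => c.Index))
        {
            buffer.Write(chunk.Data, 0, chunk.Data.Length);
        }
        return buffer.ToArray();
    }

    // Returns true only if both the length and the SHA-256 hash match the original file.
    public static bool Matches(byte[] reassembled, long expectedLength, string expectedSha256Hex)
    {
        if (reassembled.LongLength != expectedLength)
        {
            return false;
        }
        string actual = Convert.ToHexString(SHA256.HashData(reassembled));
        return string.Equals(actual, expectedSha256Hex, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Stand-in for the original file on the client side.
        byte[] original = Enumerable.Range(0, 10_000).Select(i => (byte)(i % 251)).ToArray();
        string expectedHash = Convert.ToHexString(SHA256.HashData(original));

        // Split into 4096-byte chunks, as an uploader might do.
        var chunks = original
            .Chunk(4096)
            .Select((data, index) => (Index: index, Data: data))
            .ToList();

        // Correct upload: every chunk arrives exactly once.
        byte[] good = Reassemble(chunks);
        Console.WriteLine(Matches(good, original.LongLength, expectedHash));

        // Simulated bug: the first chunk is appended twice, so the file grows.
        byte[] bad = Reassemble(chunks.Append(chunks[0]));
        Console.WriteLine(Matches(bad, original.LongLength, expectedHash));
    }
}
```

In this sketch the first check should pass and the second should fail, because the duplicated chunk changes the length. Running a check like this on the bytes before and after upload would show whether the size increase comes from the transfer or from the conversion.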
@Verthosa
Glad to know that the issue is fixed.