We have an existing solution that converts several types of files into TIFF files using GroupDocs.Conversion. We now have a new requirement to convert incoming files into PDF instead of TIFF, and have been experimenting with it. We are using GroupDocs.Conversion version 19.10.0.0 and have encountered the following problems:
1. Converting normal PDF files into PDFA_1 format mostly works fine, but in some cases the resulting file is ten times larger than the original. It doesn’t seem to be related to the PDF contents. E.g. a PDF with three rows of text is originally 87 kB and is 719 kB after PDFA_1 conversion (the same happens with pdfa_2 and pdfa_3).
2. When trying to save disk space using grayscale conversion (PdfOptions.Grayscale=true), the conversion seems to use gigabytes of memory if the PDF has raster images in it. The resulting PDF files are also much larger than without grayscale conversion.
3. The same problem as in item 2 arises when setting PdfOptions.OptimizationOptions.CompressImages to true and ImageQuality to a number smaller than 100.
Are there any settings we could try in order to save a PDF as PDFA_1, 2 or 3 with a file size as close to the original as possible? Grayscale is not actually needed unless it can be used to reduce the file size.
Here’s the code we are using in all of the above cases:
using (GroupDocs.Conversion.Converter converter = new GroupDocs.Conversion.Converter(inputfileAndPath))
{
    // Default to plain PDF 1.7; switch to a PDF/A flavour when requested.
    PdfFormats targetFormat = PdfFormats.v1_7;
    if (pdfa == "pdfa_1") targetFormat = PdfFormats.PdfA_1A; // PdfA_n seems to increase file size in some cases!!
    if (pdfa == "pdfa_2") targetFormat = PdfFormats.PdfA_2A;
    if (pdfa == "pdfa_3") targetFormat = PdfFormats.PdfA_3A;

    PdfConvertOptions pdfoptions = new PdfConvertOptions
    {
        Dpi = 96,
        PdfOptions =
        {
            //Grayscale = true, // Caution, excess CPU load + large PDF size!
            PdfFormat = targetFormat,
            OptimizationOptions =
            {
                //CompressImages = true,
                //ImageQuality = 35,
                RemoveUnusedObjects = true,
                RemoveUnusedStreams = true,
                LinkDuplicateStreams = true,
            },
            Linearize = true,
        },
        Format = GroupDocs.Conversion.FileTypes.PdfFileType.Pdf,
    };

    // Write the converted document directly to the output file.
    using (FileStream saveFileStream = new FileStream(outputfilePath, FileMode.Create))
    {
        GroupDocs.Conversion.Contracts.SaveDocumentStream getDocumentStream = delegate () { return saveFileStream; };
        converter.Convert(getDocumentStream, pdfoptions);
    }
}
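
To make the size growth easier to reproduce per file, a small helper along the following lines could be called after each conversion to log how much larger the output is than the input. This is only a sketch: `ConversionSizeCheck` and `SizeRatio` are hypothetical names that are not part of GroupDocs.Conversion, and the helper uses nothing but `System.IO`.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of GroupDocs.Conversion.
static class ConversionSizeCheck
{
    // Returns (output file size) / (input file size), e.g. 2.0 when the
    // converted file is twice as large as the original.
    public static double SizeRatio(string inputPath, string outputPath)
    {
        long inputBytes = new FileInfo(inputPath).Length;
        long outputBytes = new FileInfo(outputPath).Length;

        // Guard against division by zero for empty input files.
        if (inputBytes == 0)
            throw new ArgumentException("Input file is empty.", nameof(inputPath));

        return (double)outputBytes / inputBytes;
    }
}
```

It would be called as `ConversionSizeCheck.SizeRatio(inputfileAndPath, outputfilePath)` once the output stream has been closed. For the three-row example above (87 kB in, 719 kB out) the ratio would be about 8.3.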