PDF conversion issues, file size, performance in .NET

We have an existing solution that converts several types of files into TIFF files using GroupDocs.Conversion. We are now experimenting with a new requirement: converting incoming files into PDF instead of TIFF. We are using GroupDocs.Conversion version 19.10.0.0 and have encountered the following problems:

  1. Converting normal PDF files into PDFA_1 format mostly works fine, but in some cases the resulting file is up to ten times larger than the original. The increase doesn’t seem to be related to the PDF contents. E.g. a PDF with three rows of text is originally 87 KB and 719 KB after PDFA_1 conversion (the same happens with pdfa_2 and _3).

  2. When trying to save disk space using grayscale conversion (PdfOptions.Grayscale=true), the conversion seems to use gigabytes of memory if the PDF contains raster images. The resulting PDF files are also much larger than without grayscale conversion.

  3. The same problem as in no. 2 arises when setting PdfOptions.OptimizationOptions.CompressImages to true and ImageQuality to a number smaller than 100.

So are there any settings we could try in order to save a PDF as PDFA_1, 2 or 3 with a file size as close to the original as possible? Grayscale is actually not needed unless it helps reduce the file size.

Here’s the code we are using in all of the above cases:

using (GroupDocs.Conversion.Converter converter = new GroupDocs.Conversion.Converter(inputfileAndPath))
{
    // Default to plain PDF 1.7 unless a PDF/A variant is requested.
    PdfFormats targetFormat = PdfFormats.v1_7;
    if (pdfa == "pdfa_1") targetFormat = PdfFormats.PdfA_1A; // PdfA_n seems to increase file size in some cases!!
    if (pdfa == "pdfa_2") targetFormat = PdfFormats.PdfA_2A;
    if (pdfa == "pdfa_3") targetFormat = PdfFormats.PdfA_3A;

    PdfConvertOptions pdfoptions = new PdfConvertOptions
    {
        Dpi = 96,
        PdfOptions =
        {
            //Grayscale = true, // Caution, excess CPU load + large PDF size!
            PdfFormat = targetFormat,
            OptimizationOptions =
            {
                //CompressImages = true,
                //ImageQuality = 35,
                RemoveUnusedObjects = true,
                RemoveUnusedStreams = true,
                LinkDuplicateStreams = true,
            },
            Linearize = true,
        },
        Format = GroupDocs.Conversion.FileTypes.PdfFileType.Pdf,
    };

    // Write the converted document to the output file.
    using (FileStream saveFileStream = new FileStream(outputfilePath, FileMode.Create))
    {
        GroupDocs.Conversion.Contracts.SaveDocumentStream getDocumentStream = delegate () { return saveFileStream; };
        converter.Convert(getDocumentStream, pdfoptions);
    }
}

@Tero,

We tried to reproduce this issue, but the output file size did not increase that much on our end. To investigate further, we need the problematic file(s) for all three of your scenarios (please share one or more sample PDFs).

Hello!
Thank you for the reply, and sorry for the delay. Here is an example file set for problem no. 1. The file with the longer filename is the problematic one (we also have several others that behave the same way); the other is a normally behaving file included for reference. The files in the result folder were converted with the code above using the different PDF formats. examples.zip (3.2 MB)

If it’s OK, we could concentrate on this problem and ignore the other two.

@Tero,

Thanks for sharing the details. We have reproduced this issue at our end using the PDF file with the longer name and are now investigating it. Your investigation ticket ID is CONVERSIONNET-3523. You will be notified as soon as there is any further update.

The issues you have found earlier (filed as CONVERSIONNET-3523) have been fixed in this update. This message was posted using Bugs notification tool by nikola.yankov