Document redaction is slow and the output file size becomes huge in C# .NET

ramanantechnocit · April 18, 2021, 9:58pm

I would like to share a few issues I ran into while using the API and get some clarification:

Processing takes a very long time for each file to finish redaction.
The output file is much larger than the original file.
When processing multiple files (more than five), the first file throws a “System Out of Memory” exception.

atir.tahir · April 19, 2021, 6:55am

@ramanantechnocit

Please share the following details and we’ll investigate these issues:

API version that you are using (e.g. 21.1, 21.3)
Sample code
Problematic files
Are you evaluating the API in trial mode (without a license)?

ramanantechnocit · April 19, 2021, 7:56am

Hi Atir,
Thanks for the support.
Please find our responses inline:

API version that you are using (e.g. 21.1, 21.3)
We are using the “GroupDocs.Redaction” DLL with a trial license.
Sample code
Redactor redactor = new Redactor(filename);
// Verbatim string (@"...") so the regex backslashes are not read as C# escape sequences
redactor.Apply(new RegexRedaction(@"\d{2}\s*\d{2}[^\d]*\d{6}", new ReplacementOptions(System.Drawing.Color.Blue)));
redactor.Save();
redactor.Dispose();
Problematic files
File 1: Test Sample 01.pdf (2.7 MB)
File 2: Test Sample 02.pdf (725.7 KB)
Are you evaluating the API in trial mode (without a license)?
We are using the DLL with a 30-day trial license: GroupDocsLicense.PNG (21.7 KB)

AlexanderObraztsov · April 19, 2021, 3:05pm

@ramanantechnocit

Please note that by default the Save() method rasterizes the document, i.e. it renders each page into a raster image and replaces the page’s searchable content with it. This takes additional time and makes the file size grow. To disable it, you can pass an instance of the SaveOptions class, e.g.

redactor.Save(new SaveOptions(false, SaveOptions.SaveSuffix));

This will run much faster with minimal change to the file size. For more details about saving options, please review this article in the public documentation: Saving documents
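
As a minimal sketch of how the non-rasterized save could fit into a multi-file loop: the folder name and class name below are hypothetical, and the namespaces are assumed to match the GroupDocs.Redaction for .NET package in use. Wrapping each Redactor in a using block releases its resources before the next file is opened, which may help with the out-of-memory error reported above, though that has not been verified against the sample files.

```csharp
using System;
using System.IO;
using GroupDocs.Redaction;
using GroupDocs.Redaction.Options;
using GroupDocs.Redaction.Redactions;

class BatchRedactionSketch
{
    static void Main(string[] args)
    {
        // Hypothetical input folder; pass your own path as the first argument.
        string inputFolder = args.Length > 0 ? args[0] : "input";

        foreach (string filename in Directory.GetFiles(inputFolder, "*.pdf"))
        {
            // The using block disposes the Redactor even if Apply or Save throws,
            // so one document is released before the next one is loaded.
            using (Redactor redactor = new Redactor(filename))
            {
                redactor.Apply(new RegexRedaction(
                    @"\d{2}\s*\d{2}[^\d]*\d{6}",
                    new ReplacementOptions(System.Drawing.Color.Blue)));

                // First argument false = skip rasterization, as suggested above.
                redactor.Save(new SaveOptions(false, SaveOptions.SaveSuffix));
            }

            Console.WriteLine("Processed: " + filename);
        }
    }
}
```

Because the redacted copies are written next to the originals, running the loop twice over the same folder would also pick up the previously saved output files; a separate output folder or a file-name filter avoids that.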

augustinechristo · April 20, 2021, 6:42am

Hi Alexander,
If we use the above code, the redaction is not applied to the PDF.

For example, the code below redacts the XMP Manifest:

redactor.Apply(new RegexRedaction(@"\d{2}\s*\d{2}[^\d]*\d{6}", new ReplacementOptions(System.Drawing.Color.Blue)));

If we use Save(), the XMP Manifest is redacted. But if we use the code you suggested, the XMP Manifest is not removed.

Please advise

AlexanderObraztsov · April 20, 2021, 1:18pm

@augustinechristo

The rasterized file does not import any metadata from the original file, so the XMP is not actually “redacted”; it is simply missing. Also, please note that RegexRedaction and the coloring options apply only to the document’s body, not to the metadata. To redact XMP headers, you will need one of the metadata redactions. For instance, you can try this instead:

redactor.Apply(new MetadataSearchRedaction(@"\d{2}\s*\d{2}[^\d]*\d{6}", "removed"));
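
A minimal sketch of combining both redactions with a non-rasterized save, so that body text and metadata are each handled explicitly. The file name and class name are hypothetical, the namespaces are assumed to match the installed package, and whether a given XMP packet is actually covered may depend on the library version.

```csharp
using System;
using GroupDocs.Redaction;
using GroupDocs.Redaction.Options;
using GroupDocs.Redaction.Redactions;

class BodyAndMetadataSketch
{
    static void Main()
    {
        // Hypothetical file name, used only for illustration.
        using (Redactor redactor = new Redactor("sample.pdf"))
        {
            string pattern = @"\d{2}\s*\d{2}[^\d]*\d{6}";

            // Body text: RegexRedaction with a color option affects page content only.
            RedactorChangeLog body = redactor.Apply(
                new RegexRedaction(pattern, new ReplacementOptions(System.Drawing.Color.Blue)));

            // Metadata: matching values are replaced with the text "removed".
            RedactorChangeLog meta = redactor.Apply(
                new MetadataSearchRedaction(pattern, "removed"));

            Console.WriteLine("Body: {0}, metadata: {1}", body.Status, meta.Status);

            // Save without rasterization only if neither redaction failed.
            if (body.Status != RedactionStatus.Failed && meta.Status != RedactionStatus.Failed)
            {
                redactor.Save(new SaveOptions(false, SaveOptions.SaveSuffix));
            }
        }
    }
}
```

Printing the status of each Apply call makes it easier to tell a redaction that was skipped or failed from one that ran but found nothing to change.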

augustinechristo · April 21, 2021, 6:47am

Thanks Alex,
Unfortunately, MetadataSearchRedaction is not redacting the XMP Manifest.

   <xmpMM:Manifest>
        <rdf:Seq>
           <rdf:li rdf:parseType="Resource">
              <stMfs:linkForm>EmbedByReference</stMfs:linkForm>
              <stMfs:reference rdf:parseType="Resource">
                 <stRef:filePath>/Users/name/Desktop/Jobs/Subfolder/filename.psd</stRef:filePath>
              </stMfs:reference>
           </rdf:li>
        </rdf:Seq>
     </xmpMM:Manifest>

We even tried to remove all metadata with the code below, but the manifest is still not removed. Any idea?
redactor.Apply(new EraseMetadataRedaction(MetadataFilters.All));

AlexanderObraztsov · April 21, 2021, 12:39pm

@augustinechristo

We could reproduce this issue on our end. It has been logged in our internal issue tracking system with ID REDACTIONNET-383. You will be notified as soon as there is an update.

augustinechristo · April 21, 2021, 12:50pm

Thanks Alex,
Looking forward to the updated version.

AlexanderObraztsov · September 22, 2021, 5:19pm

@augustinechristo

GroupDocs.Redaction for .NET v21.9, which includes a fix for this issue, has been published. You can find the new version at

Have a nice day!