Free Support Forum - groupdocs.com

Get text with coordinates and redact in .NET

I have a scanned PDF file. I want to extract all of its text along with the coordinates of that text, and then redact that text using this library. How can I achieve this?

@NaveedHakim786

We are investigating the possibility of redacting scanned PDF files. The investigation ticket ID is REDACTIONNET-310. We'll notify you as soon as there's any update.

@NaveedHakim786

We are currently working on OCR support, and it will most likely be provided in one of the upcoming releases. We'll update you on any progress.

Yes, you can redact scanned PDF files if you know the coordinates of the areas to redact. Unfortunately, you shared only the binaries of your project, so we cannot check its code. The following example demonstrates how to remove the account information block and the transaction descriptions block on all pages except the second:

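using System;
using System.Drawing;
using GroupDocs.Redaction;
using GroupDocs.Redaction.Options;
using GroupDocs.Redaction.Redactions;

class RedactionExamples
{
    private static void RedactNotSearchable()
    {
        // The settings attach a callback that decides which redactions are applied
        using (Redactor r = new Redactor(".\\NotSearchable.pdf", new LoadOptions(), new RedactorSettings(new RedactionCallback())))
        {
            Redaction[] redactions = new Redaction[]
            {
                // Account information block: top-left corner (738, 169), 414 x 110, filled with blue
                new ImageAreaRedaction(new Point(738, 169), new RegionReplacementOptions(Color.Blue, new Size(414, 110))),
                // Transaction descriptions block: top-left corner (140, 375), 550 x 1180, filled with black
                new ImageAreaRedaction(new Point(140, 375), new RegionReplacementOptions(Color.Black, new Size(550, 1180)))
            };
            var log = r.Apply(redactions);
            Console.WriteLine(log.Status);
            // Save a copy with "Example" appended to the file name, without rasterizing it to PDF
            r.Save(new SaveOptions(false, "Example"));
        }
    }

    private class RedactionCallback : IRedactionCallback
    {
        // Reject redactions on the second page image ("Image 2"); accept all others
        public bool AcceptRedaction(RedactionDescription description)
        {
            return description.Details != "Image 2";
        }
    }
}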

The resultant file is available here.
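To make the Point/Size pairs in the example above easier to read, here is a small sketch that uses only System.Drawing.Rectangle from .NET (no GroupDocs calls) to print the area each redaction covers. It assumes that each Point is the top-left corner of the area in the scanned page image's pixel coordinates, with y growing downward; please verify this against your own page images before relying on it.

```csharp
using System;
using System.Drawing;

// Not part of GroupDocs.Redaction: the same top-left points and sizes that the
// ImageAreaRedaction example passes in, assuming Point is the top-left corner
var accountInfo = new Rectangle(new Point(738, 169), new Size(414, 110));
var transactions = new Rectangle(new Point(140, 375), new Size(550, 1180));

// Right = X + Width, Bottom = Y + Height
Console.WriteLine($"Account info: ({accountInfo.Left}, {accountInfo.Top}) to ({accountInfo.Right}, {accountInfo.Bottom})");
Console.WriteLine($"Transactions: ({transactions.Left}, {transactions.Top}) to ({transactions.Right}, {transactions.Bottom})");

// The account block ends at y = 279 and the transaction block starts at y = 375,
// so the two areas do not overlap
Console.WriteLine($"Overlap: {accountInfo.IntersectsWith(transactions)}");
```

Running this prints the account information area as (738, 169) to (1152, 279) and the transaction area as (140, 375) to (690, 1555), which is a quick way to sanity-check coordinates before passing them to ImageAreaRedaction.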
As for the searchable file: since it contains an image of the original page in addition to the recognized text, you have to use color box replacement. The following example demonstrates how to remove all occurrences of account numbers in this document:

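// Account numbers: 3 digits, an optional hyphen, then 7 digits (e.g. 123-4567890)
using (Redactor r = new Redactor(".\\Searchable.pdf"))
{
    // A color box is used instead of replacement text, because the page image
    // beneath the recognized text would otherwise still show the number
    var log = r.Apply(new RegexRedaction("\\d{3}-?\\d{7}", new ReplacementOptions(Color.Green)));
    Console.WriteLine(log.Status);
    r.Save(new SaveOptions(false, "Example"));
}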

The resultant file is here.
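Before running the redaction, it can help to preview which strings the account-number pattern will hit. The sketch below uses plain System.Text.RegularExpressions, not GroupDocs, and assumes that RegexRedaction interprets its pattern with standard .NET regular expression syntax; the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as in the RegexRedaction example: 3 digits, optional hyphen, 7 digits
var accountNumber = new Regex(@"\d{3}-?\d{7}");

// Hypothetical sample strings, not taken from the actual document
string[] samples = { "123-4567890", "1234567890", "12-3456789", "Acct 987-6543210 closed" };
foreach (var sample in samples)
{
    Match m = accountNumber.Match(sample);
    Console.WriteLine(m.Success ? $"\"{sample}\" -> redacts \"{m.Value}\"" : $"\"{sample}\" -> no match");
}
```

Note that the pattern is not anchored, so a longer run of digits (for example an 11-digit number) would still have a 10-digit portion redacted. Wrapping it in word boundaries, as in @"\b\d{3}-?\d{7}\b", limits matches to standalone numbers.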