Problems with watermark removal in Python

carled · January 7, 2025, 9:37am

One of my engineers is trying to use the Python version of the product to remove watermarks. We couldn’t find any syntax for this online, so we put together code that appears to run without error but does nothing. Are we on the right lines with this, or have we got the code wrong?

def run():
    # Update with the path to your output directory
    document_path = test_files.sample_pdf
    output_directory = utils.get_output_directory_path()
    output_document_path = join(output_directory, os.path.basename(document_path))

    if not os.path.exists(output_directory):
        os.makedirs(output_directory)

    with gw.Watermarker(document_path) as watermarker:
        watermarks = watermarker.search()
        for i in range(len(watermarks) - 1, -1, -1):
            print(watermarks[i].text)
            watermarks.remove(watermarks[i])
        watermarker.save(join(output_directory, output_document_path))

    # Indicate the successful rendering of the source document and specify where to find the output in the specified directory
    print(f"\nWatermark removed successfully.\nCheck output in {output_directory}.")

Professionalize.Discourse · January 7, 2025, 9:38am

@carled

It looks like you’re on the right track with your code for removing watermarks using the GroupDocs.Watermark library in Python. However, there are a few things to consider that might help ensure the watermark removal works as expected.

1. Check Watermark Types: Make sure the library actually recognizes the watermarks you want to remove. The search() method returns the possible watermarks it found in the document. If that collection is empty, nothing was found, the loop never runs, and the document is saved with nothing removed.
2. Removing Watermarks: Your removal loop is fine. Iterating from the last index to the first keeps the remaining indices valid while items are removed. If a watermark is found but still appears in the output, check the documentation for whether that watermark type can be removed.
3. Saving the Document: Note that output_document_path already includes output_directory, so joining the two again in save() is redundant and, if the directory is a relative path, produces a nested path. Pass output_document_path to save() directly, and make sure you have write permission on the output directory.
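
On the path point, the behaviour of join can be checked with the standard library alone. This is a minimal sketch, assuming POSIX-style paths; the directory names `out` and `/tmp/out` and the helper `joined_twice` are placeholders for illustration, not values from the original code:

```python
import posixpath  # POSIX flavour of os.path, so results are the same on every OS

def joined_twice(output_directory, document_path):
    """Mimic the snippet: build the output path, then join the directory onto it again."""
    output_document_path = posixpath.join(
        output_directory, posixpath.basename(document_path)
    )
    return posixpath.join(output_directory, output_document_path)

# Relative directory: the directory name is duplicated in the final path.
relative_result = joined_twice("out", "samples/sample.pdf")
# Absolute directory: join discards everything before an absolute component.
absolute_result = joined_twice("/tmp/out", "samples/sample.pdf")

print(relative_result)  # out/out/sample.pdf
print(absolute_result)  # /tmp/out/sample.pdf
```

So whether the double join matters depends on whether utils.get_output_directory_path() returns an absolute or a relative path, which is not shown in the thread.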

Here’s a slightly modified version of your code with added checks:

def run():
    # Assumes the imports from the original example: os, join (from os.path),
    # the GroupDocs.Watermark module as gw, and the test_files and utils helpers
    document_path = test_files.sample_pdf
    output_directory = utils.get_output_directory_path()
    output_document_path = join(output_directory, os.path.basename(document_path))

    if not os.path.exists(output_directory):
        os.makedirs(output_directory)

    with gw.Watermarker(document_path) as watermarker:
        watermarks = watermarker.search()
        # Stop early if the search found nothing to remove
        if len(watermarks) == 0:
            print("No watermarks found in the document.")
            return

        # Walk backwards so removals do not shift the indices still to be visited
        for i in range(len(watermarks) - 1, -1, -1):
            print(f"Removing watermark: {watermarks[i].text}")
            watermarks.remove(watermarks[i])

        # output_document_path already contains output_directory, so do not join again
        watermarker.save(output_document_path)

    print(f"\nWatermark removed successfully.\nCheck output in {output_directory}.")

Additional Tips:

Debugging: Add print statements to check the contents of watermarks after the search() call to ensure that you are indeed finding the watermarks.
Documentation: Refer to the official documentation for any specific requirements or limitations regarding watermark removal.
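
For the debugging tip, a small helper can summarize whatever search() returned before anything is removed. This is only a sketch: `describe_watermarks` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real search results. The only assumption about the real objects is that they support len(), indexing, and a text attribute, as the code in this thread already uses.

```python
from types import SimpleNamespace

def describe_watermarks(watermarks):
    """Return a header line with the total count, then one line per item."""
    lines = [f"Found {len(watermarks)} possible watermark(s)"]
    for i in range(len(watermarks)):
        # getattr with a default, in case an item exposes no text attribute
        text = getattr(watermarks[i], "text", None)
        lines.append(f"  [{i}] {type(watermarks[i]).__name__}: text={text!r}")
    return lines

# Stand-in data; with the real library, pass the result of watermarker.search()
fake_results = [SimpleNamespace(text="top secret"), SimpleNamespace()]
summary = describe_watermarks(fake_results)
print("\n".join(summary))
```

A count of zero here would explain a run that finishes without error but leaves the document unchanged.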

If you continue to face issues, consider checking the documentation for any additional parameters or methods that might be necessary for your specific use case.

alexndr · January 21, 2025, 8:57am

Hi @carled,
Thank you for sharing your code snippet. I tested it on the attached sample PDF, which contains the text watermark “top secret”, and it removed the watermark successfully. The approach itself looks correct. If this code isn’t working for you, could you please provide a sample PDF that shows the issue? We’ll be happy to investigate further.
sample.pdf (739.6 KB)

It’s also possible to remove a watermark by index, for example:
    del watermarks[i]
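
Whichever form is used, the loop in the original snippet walks the indices from last to first, which matters when items are deleted during iteration. The sketch below uses a plain Python list as a stand-in for the watermark collection (an assumption; the real collection may behave differently), and both function names are made up for illustration:

```python
def remove_all_reverse(items):
    """Delete every item by index, walking from the last index down to 0."""
    for i in range(len(items) - 1, -1, -1):
        del items[i]  # indices below i are unaffected by this deletion
    return items

def remove_all_forward(items):
    """Naive forward loop: each deletion shifts later items down by one."""
    i = 0
    while i < len(items):
        del items[i]
        i += 1  # skips the item that just moved into position i
    return items

reverse_left = remove_all_reverse(["a", "b", "c", "d"])
forward_left = remove_all_forward(["a", "b", "c", "d"])
print(reverse_left)  # []
print(forward_left)  # ['b', 'd']
```

The forward version leaves every second item behind, which is why the reverse range in the original code is the safer pattern.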