PDF recognized as MP3, PNG/JPG recognized as Unknown file type

Hi!
We are using GroupDocs.Metadata (License Type: Site OEM, paid version). We are able to clean metadata for most file types (docx, pptx, mp3, txt, csv, xlsx), but cleanup fails for PDF, PNG, and JPG files.

The errors for each file type are listed below:

  • PNG/JPG - Unexpected file format type: Unknown
  • PDF - Unexpected file type: Mp3

Note:

  1. We are doing this metadata cleanup inside a Docker container (Linux container, Ubuntu bionic image).
  2. Separately, for testing purposes, we created a console utility on a Windows machine; there, metadata cleanup of PDF/PNG/JPG files works fine. (We also tried the SkiaSharp.NativeAssets.Linux 2.88.3 package; it did not help.)
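
Since the same files clean correctly on Windows, one check that may be worth running inside the container is whether the bytes reaching the library still begin with the expected file signatures. The following is only a minimal sketch in Python, independent of GroupDocs.Metadata; the helper names `sniff_format` and `sniff_file` are hypothetical, and only a few well-known signatures are covered.

```python
# Hypothetical diagnostic helper; not part of GroupDocs.Metadata.
# It compares the first bytes of a file against well-known signatures.

SIGNATURES = {
    "pdf": b"%PDF-",              # PDF header (normally at offset 0)
    "png": b"\x89PNG\r\n\x1a\n",  # 8-byte PNG signature
    "jpg": b"\xff\xd8\xff",       # JPEG start-of-image marker
    "mp3": b"ID3",                # MP3 with a leading ID3v2 tag
}


def sniff_format(data: bytes) -> str:
    """Return a format name based on leading bytes, or 'unknown'."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(16))
```

If `sniff_file` reports `unknown` for a PNG or JPG inside the container but not on Windows, the files are probably being altered before they reach the library (for example during download or copy); if the signatures look correct in both places, the problem more likely lies in format detection itself.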

Versions:

  • Docker container - Linux - Ubuntu Bionic image
  • .NET Framework - 4.7.2
  • GroupDocs version - Latest stable 23.1.0
  • Xamarin - XamarinBionic 18.04

Attaching docker file and PDF/Image cleaner code snippets as well.

Any help or pointers would be really appreciated.

Code files.zip (1.9 KB)

@Gunilla_Hadders
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): METADATANET-3992

You can obtain Paid Support services if you need support on a priority basis, along with direct access to our Paid Support management team.

@atir.tahir

[Update]

  • We logged the assembly version because we suspected a cache issue might be causing an older GroupDocs DLL to load, but the correct version, 23.1.0, is in use.
  • We also tried upgrading the Ubuntu version from bionic (Ubuntu 18.04.6 LTS) to focal (Ubuntu 20.04.5 LTS).

image.png (7.6 KB)

@Gunilla_Hadders

Thanks for the details. We’ll continue the investigation.

Hello @atir.tahir - I could use some advice on this. Are there any alternatives we can try on our end while we wait for your response? Thanks!

@Gunilla_Hadders

Your ticket is still under investigation. We’ll notify you as soon as we have any fix or workaround.

Any new information on this topic, @atir.tahir? Do you require any more information from us? This problem has us bogged down. We need to reprocess some old file uploads that are in Azure Blobs format. Could you provide an update on this?

@Gunilla_Hadders

We are still investigating this issue for .NET Framework (e.g. 4.7.2). However, if you use .NET 5, .NET 6, or .NET Core 3.1, the error is not reproduced.

@atir.tahir - sharing a few more findings with two different versions of the GroupDocs.Metadata DLL on .NET 6:

image.png (15.3 KB)

@Gunilla_Hadders

We’ll continue our investigation and let you know in case of any update.

Hi @atir.tahir - Just wanted to check whether there has been any update on this thread since 9th Mar?

@Gunilla_Hadders

Please try .NET 6 with API version 22.9. Let us know if the issue persists on your end. Note, however, that we do not support .txt and .csv files.

@atir.tahir
Thanks for the response. .NET 6 with version 22.9 is not working for images. Please refer to the attached screenshot. It is recognizing the file as the UNKNOWN file type.

GroupDocs22.9_NET6_Unknown_File_Type.png (157.1 KB)

@Gunilla_Hadders

Does this happen for all images? We’ll continue our investigation for this case.

@atir.tahir -

Observations with GroupDocs.Metadata versions.

Version 22.9 -

  1. Most PNG and JPG files fail with the “Unknown” file type error.
  2. While cleaning a PDF file’s metadata (using the metadata.Sanitize() method call), we get the error below.
    Error: 'Object reference not set to an instance of an object.'
    Stack trace:
    at #=zlHsjAMLpWrUiX3xwvXF_u1VaqzDqxSm1ElSQPZupOTKjCxIux1GJ6f4=.#=zzLKMWHA=(Operator #=zcXIYVbA=)
    at #=zlHsjAMLpWrUiX3xwvXF_u1VaqzDqxSm1ElSQPZupOTKjCxIux1GJ6f4=.#=zVNpXMw4=(Page #=zXfh$$m0=)
    at #=zxO40XsL557cnUOLZuO0QYtaa6vdDCF9M6mYj2i3UnucC_6JWVjBR63MygKnK.#=zbGS77_UALvnN(BaseOperatorCollection #=zrpi_8Q4=, Resources #=zKclq818=, Page #=zXfh$$m0=, Rectangle #=zBz$aJLqizqlI)
    at #=zxO40XsL557cnUOLZuO0QYtaa6vdDCF9M6mYj2i3UnucC_6JWVjBR63MygKnK.#=zbGS77_UALvnN(BaseOperatorCollection #=zrpi_8Q4=, Resources #=zKclq818=, Rectangle #=zBz$aJLqizqlI)
    at #=zxO40XsL557cnUOLZuO0QYtaa6vdDCF9M6mYj2i3UnucC_6JWVjBR63MygKnK.#=zgbY7UEM=()
    at #=zxO40XsL557cnUOLZuO0QYtaa6vdDCF9M6mYj2i3UnucC_6JWVjBR63MygKnK…ctor(Page #=zXfh$$m0=, TextSearchOptions #=z526oi9s10DwV, Boolean #=zycvtuEaJPBNX)
    at Aspose.Pdf.Text.TextAbsorber.Visit(Page page)
    at Aspose.Pdf.Devices.TextDevice.Process(Page page, Stream output)
    at .()
    at .()
    at GroupDocs.Metadata.Formats.Document.PdfRootPackage.()
    at GroupDocs.Metadata.Common.RootMetadataPackage..()
    at 2..()
    at 2..MoveNext()
    at 2...MoveNext()
    at GroupDocs.Metadata.Common.MetadataPackage.(Func2 )
    at GroupDocs.Metadata.Common.MetadataPackage.Sanitize()
    at GroupDocs.Metadata.Common.RootMetadataPackage.Sanitize()
    at GroupDocs.Metadata.Metadata.Sanitize()

Version 23.5 -

  1. PNG and JPG files fail with the “Unknown” file type error.

@Gunilla_Hadders

Thank you for providing your feedback. We will thoroughly investigate the matter and keep you informed of any findings or outcomes.