Issue with Extracting Attachments and PDF Conversion

Hello Team,

I have purchased a full license of GroupDocs and I am currently facing issues while extracting attachments from documents and converting them to PDF.

My requirement is:

  • First, extract all original attachments from the document
  • Then, convert those original attachments into PDF separately

However, I am facing the following problems:

Case 1: MSG / EML files

  • GroupDocs directly converts both the main document and attachments into PDF
  • I lose access to the original attachment files

Case 2: DOCX / XLS / PPT files

  • Attachments are embedded as OLE objects
  • GroupDocs only converts the main document
  • Attachments are neither extracted nor converted
  • I lose both original attachments and their PDF versions

Case 3: PDF files

  • Attachments are embedded inside the PDF
  • GroupDocs does not extract or convert these attachments
  • Original attachments are not accessible

Supported file types in my use case:
pdf, doc, msg, docx, xls, rtf, xlsx, dot, eml, ppt, pptx, dotx, ics

I would like to know:

  1. Is there any way in GroupDocs to extract original attachments from all these file types?
  2. Can we programmatically access attachments before conversion?
  3. Do you provide any solution or API for handling OLE embedded objects and PDF attachments?

Additionally, I found that Aspose provides better support for attachment extraction across formats.

Could you please suggest:

  • Whether this can be handled in GroupDocs
  • Or if switching to another solution is recommended

Looking forward to your guidance.

Thank you.

Best regards,
Sanket

@sanket8753

Welcome to the Free Support Forum!

Thank you for reporting this issue and sharing the details of your use case. The issue has already been forwarded to the relevant team. We will update you as soon as it has been analyzed.

I can see that you have already shared a couple of output files in PDF/A. Could you also share the source test files? They would help us diagnose the issue faster.

@sanket8753

GroupDocs.Conversion fully supports all three scenarios you described. The key is using the LoadContext parameter in the Converter constructor — it provides access to the original attachment stream (SourceStream) for every document being processed, including embedded attachments.

Case 1: MSG / EML — Extract original attachments + convert to PDF

string source = "input.msg"; // or .eml
int fileIndex = 0;
Directory.CreateDirectory("originals"); // File.Create throws if the folder does not exist

using (var converter = new Converter(source, (LoadContext ctx) =>
{
    // ctx.SourceStream contains the ORIGINAL attachment data
    // ctx.HierarchyLevel = 0 for the main email, > 0 for attachments
    if (ctx.HierarchyLevel > 0)
    {
        // Save the original attachment in its native format
        ctx.SourceStream.Position = 0;
        using (var file = File.Create($"originals/{ctx.SourceFileName}"))
        {
            ctx.SourceStream.CopyTo(file);
        }
        ctx.SourceStream.Position = 0;
        return null; // use default load options for the attachment
    }

    return new EmailLoadOptions
    {
        ConvertOwner = true,  // convert the email body
        ConvertOwned = true   // convert attachments
    };
}))
{
    converter.Convert(
        (SaveContext c) => File.Create($"converted-{++fileIndex}.pdf"),
        c => new PdfConvertOptions());
}

If you only want the attachments (skip the email body), set ConvertOwner = false:

return new EmailLoadOptions
{
    ConvertOwner = false,  // skip email body
    ConvertOwned = true    // convert attachments only
};

Case 2: DOCX / XLS / PPT — Extract embedded OLE objects + convert to PDF

string source = "input.docx"; // or .xlsx, .pptx
int fileIndex = 0;
Directory.CreateDirectory("originals"); // File.Create throws if the folder does not exist

using (var converter = new Converter(source, (LoadContext ctx) =>
{
    if (ctx.HierarchyLevel > 0)
    {
        ctx.SourceStream.Position = 0;
        using (var file = File.Create($"originals/{ctx.SourceFileName}"))
        {
            ctx.SourceStream.CopyTo(file);
        }
        ctx.SourceStream.Position = 0;
        return null; // use default load options for the embedded object
    }

    return new WordProcessingLoadOptions  // or SpreadsheetLoadOptions, PresentationLoadOptions
    {
        ConvertOwner = true,
        ConvertOwned = true
    };
}))
{
    converter.Convert(
        (SaveContext c) => File.Create($"converted-{++fileIndex}.pdf"),
        c => new PdfConvertOptions());
}

Use the matching load options for your source format:

  • .docx / .doc / .rtf / .dot / .dotx → WordProcessingLoadOptions
  • .xls / .xlsx → SpreadsheetLoadOptions
  • .ppt / .pptx → PresentationLoadOptions
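
When one code path has to handle every extension in the original list, the mapping above can be kept in a single lookup. The following is a minimal sketch, not part of GroupDocs.Conversion: the LoadOptionsFamily enum and the LoadOptionsSelector class are hypothetical names introduced here for illustration. The .msg/.eml and .pdf entries follow the Case 1 and Case 3 snippets, and the .ics entry assumes the EmailLoadOptions usage shown for ICS later in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical enum: each value names the load options class to construct
// for the root document (e.g. WordProcessing -> WordProcessingLoadOptions).
public enum LoadOptionsFamily { WordProcessing, Spreadsheet, Presentation, Pdf, Email, AutoDetect }

public static class LoadOptionsSelector
{
    // Case-insensitive lookup keyed by file extension, including the dot.
    private static readonly Dictionary<string, LoadOptionsFamily> ByExtension =
        new Dictionary<string, LoadOptionsFamily>(StringComparer.OrdinalIgnoreCase)
        {
            [".doc"] = LoadOptionsFamily.WordProcessing,
            [".docx"] = LoadOptionsFamily.WordProcessing,
            [".rtf"] = LoadOptionsFamily.WordProcessing,
            [".dot"] = LoadOptionsFamily.WordProcessing,
            [".dotx"] = LoadOptionsFamily.WordProcessing,
            [".xls"] = LoadOptionsFamily.Spreadsheet,
            [".xlsx"] = LoadOptionsFamily.Spreadsheet,
            [".ppt"] = LoadOptionsFamily.Presentation,
            [".pptx"] = LoadOptionsFamily.Presentation,
            [".pdf"] = LoadOptionsFamily.Pdf,
            [".msg"] = LoadOptionsFamily.Email,
            [".eml"] = LoadOptionsFamily.Email,
            [".ics"] = LoadOptionsFamily.Email, // assumption: see the ICS reply in this thread
        };

    // Returns AutoDetect for unknown or missing extensions; in the callback
    // that would correspond to returning null.
    public static LoadOptionsFamily ForFile(string fileName)
    {
        string extension = Path.GetExtension(fileName ?? string.Empty);
        return ByExtension.TryGetValue(extension, out var family)
            ? family
            : LoadOptionsFamily.AutoDetect;
    }
}
```

Inside the LoadContext callback, the root branch (HierarchyLevel == 0) could switch on LoadOptionsSelector.ForFile(source) and construct the matching load options with ConvertOwner and ConvertOwned set, while the attachment branch keeps returning null.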

Case 3: PDF — Extract embedded file attachments + convert to PDF

string source = "input.pdf";
int fileIndex = 0;
Directory.CreateDirectory("originals"); // File.Create throws if the folder does not exist

using (var converter = new Converter(source, (LoadContext ctx) =>
{
    if (ctx.HierarchyLevel > 0)
    {
        ctx.SourceStream.Position = 0;
        using (var file = File.Create($"originals/{ctx.SourceFileName}"))
        {
            ctx.SourceStream.CopyTo(file);
        }
        ctx.SourceStream.Position = 0;
        return null; // use default load options for the embedded file
    }

    return new PdfLoadOptions
    {
        ConvertOwner = true,
        ConvertOwned = true
    };
}))
{
    converter.Convert(
        (SaveContext c) => File.Create($"converted-{++fileIndex}.pdf"),
        c => new PdfConvertOptions());
}

How it works

The LoadContext parameter in the constructor callback is called for every document being processed — the main document and each embedded attachment/OLE object. It provides:

Property       | Description
---------------|------------------------------------------------------------------
SourceFileName | The name of the file being loaded (e.g., report.docx, image.png)
SourceFormat   | The detected file format
SourceStream   | The original stream of the document/attachment in its native format
HierarchyLevel | 0 for the main document, 1 for direct attachments, 2+ for nested

Return the appropriate LoadOptions (with ConvertOwner/ConvertOwned flags) only for the root document (HierarchyLevel == 0). For attachments, return null to let GroupDocs.Conversion auto-detect the correct options based on the file format.

The ConvertOwner / ConvertOwned flags control what gets converted:

ConvertOwner | ConvertOwned | Result
-------------|--------------|------------------------------------------------
true         | true         | Convert everything (main doc + all attachments)
true         | false        | Convert main document only
false        | true         | Convert attachments only

This gives you full control: extract originals via LoadContext.SourceStream, and convert to PDF in the same pass.
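
The snippets above write each original to originals/{ctx.SourceFileName} directly. That throws if the folder is missing and silently overwrites a file when two attachments share a name. The following is a minimal sketch of a more defensive save step using only System.IO. OriginalAttachmentStore is a hypothetical helper, not a GroupDocs API, and the "attachment.bin" fallback name is an assumption for the case where no file name is available.

```csharp
using System;
using System.IO;

public static class OriginalAttachmentStore
{
    // Copies `source` into `directory` under a safe, non-colliding name and
    // returns the path written. Seekable streams are rewound before and
    // after the copy so a later reader still sees the full content.
    public static string Save(Stream source, string fileName, string directory)
    {
        Directory.CreateDirectory(directory); // no-op if it already exists

        // Keep only the file-name part and replace characters the OS rejects.
        string safeName = Path.GetFileName(fileName ?? string.Empty);
        foreach (char c in Path.GetInvalidFileNameChars())
            safeName = safeName.Replace(c, '_');
        if (string.IsNullOrWhiteSpace(safeName))
            safeName = "attachment.bin"; // assumed fallback name

        // Append -1, -2, ... until the name is unused.
        string baseName = Path.GetFileNameWithoutExtension(safeName);
        string extension = Path.GetExtension(safeName);
        string target = Path.Combine(directory, safeName);
        for (int n = 1; File.Exists(target); n++)
            target = Path.Combine(directory, $"{baseName}-{n}{extension}");

        if (source.CanSeek) source.Position = 0;
        using (var file = File.Create(target))
            source.CopyTo(file);
        if (source.CanSeek) source.Position = 0;

        return target;
    }
}
```

In the callback, the attachment branch would then reduce to calling OriginalAttachmentStore.Save(ctx.SourceStream, ctx.SourceFileName, "originals") before returning null.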

Useful documentation links

The ConvertOwner/ConvertOwned properties are available on WordProcessingLoadOptions, SpreadsheetLoadOptions, PresentationLoadOptions, and PdfLoadOptions via the IDocumentsContainerLoadOptions interface.

Hello Team,

Thank you for the detailed explanation — the suggested approach using LoadContext is working well for formats like DOCX, XLSX, and PPTX.

I have a follow-up question regarding .ics (iCalendar) files.

I noticed that .ics is not mentioned in the supported load options (WordProcessingLoadOptions, SpreadsheetLoadOptions, PresentationLoadOptions, etc.).

Could you please clarify:

  1. Is .ics format supported for PDF conversion in GroupDocs.Conversion?
  2. If supported, what is the recommended way to handle it (since there is no specific LoadOptions)?
  3. If not supported, is there any recommended workaround?

Additionally, I would like to understand attachment handling for .ics files:

  4. Does GroupDocs support extracting attachments from .ics files using LoadContext (similar to MSG/EML)?
  5. If yes, how can we access those attachments?
  6. If not, is there any alternative approach to extract attachments from .ics files?

I can share sample files if needed.

Thank you.

Hi Team,

I am working on implementing document-to-PDF conversion using GroupDocs, including extraction of embedded attachments (both original format and converted PDF). I followed the approach using LoadContext and ConvertOwned = true, and I tested with multiple file types.

Here are the issues I am facing:


1. XLS / PPT with OLE Objects

I embedded files using Insert → Object (embedded, not linked).

  • Only the main document is converted to PDF
  • Embedded OLE attachments are not extracted

However, for other formats like PPTX/XLSX/RTF/DOCX/MSG/EML/PDF, the same approach works perfectly:

  • Attachments are detected
  • Original files are saved
  • PDF conversion of attachments also works

2. DOC Issue (Exception while saving original attachments)

For DOC files:

  • Main document converts correctly
  • Attachments are extracted and converted
  • Original attachments are also saved successfully

But after processing, I get this exception:

GroupDocsConversionException: Object reference not set to an instance of an object

Even though all files are saved correctly, this exception is thrown at the end, after the documents have been stored in their original format.


3. ICS File Issue

For .ics files:

  • Main PDF is generated, but content looks incorrect (binary/mixed format)
  • Embedded attachments are not extracted
  • Output PDF contains mixed or unreadable data

4. dot/dotx

Some features are restricted for these template formats, including:

  • OLE embedding, so I did not test these formats.

5. Additional Info

I have also attached sample files with embedded objects for your reference.

attachement_documents_1.zip (2.9 MB)

attachement_documents_2.zip (2.3 MB)

Questions

  1. Why are embedded OLE objects in XLS/PPT not detected, while they work in DOC/DOCX?
  2. What could be causing the NullReferenceException after successful processing in DOC?
  3. Is .ics fully supported for conversion and attachment extraction?

Any guidance or clarification would be really helpful.

@sanket8753

Thank you for the detailed report and sample files — they were very helpful in reproducing the issues.

1. XLS / PPT — OLE Objects Not Detected

We identified and fixed a bug where OLE objects with incomplete metadata in binary XLS/PPT formats caused a NullReferenceException, which aborted the entire conversion. This prevented valid attachments from being extracted as well.

The fix is included in v26.3 (releasing tomorrow). After the update, your existing code will work: valid OLE objects are extracted, and invalid ones are silently skipped.

2. DOC — NullReferenceException After Successful Processing

This was caused by a .bin OLE object (unrecognized format) in your DOC file. The converter attempted to convert it to PDF, which failed.

The fix is in the convertOptionsProvider callback — return NoConvertOptions for unknown formats so they pass through as-is instead of failing:

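string source = "input.doc";
int fileIndex = 0;
Directory.CreateDirectory("originals"); // File.Create throws if the folder does not exist

using (var converter = new Converter(source, (LoadContext ctx) =>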
{
    if (ctx.HierarchyLevel > 0)
    {
        // Save original attachment
        ctx.SourceStream.Position = 0;
        using (var file = File.Create($"originals/{ctx.SourceFileName}"))
            ctx.SourceStream.CopyTo(file);
        ctx.SourceStream.Position = 0;

        // Return null for attachments — auto-detects the correct load options
        // based on the attachment's actual format (PDF, PPTX, XLSX, etc.)
        return null;
    }

    // Root document — specify format-specific load options
    return new WordProcessingLoadOptions
    {
        ConvertOwner = true,
        ConvertOwned = true
    };
}))
{
    converter.Convert(
        (SaveContext c) => File.Create($"converted-{++fileIndex}.pdf"),
        c =>
        {
            // Skip conversion for unrecognized formats (e.g. .bin OLE objects)
            if (c.SourceFormat == null || c.SourceFormat == FileType.Unknown)
                return new NoConvertOptions();

            return new PdfConvertOptions();
        });
}

Two important points:

  • LoadContext callback: Return format-specific load options (e.g. WordProcessingLoadOptions) only for the root document (HierarchyLevel == 0). For attachments, return null — the converter will auto-detect the correct format and load options based on the attachment’s actual content (PDF, PPTX, XLSX, etc.).
  • convertOptionsProvider callback: The ConvertContext.SourceFormat tells you what format each document is. When it is Unknown, returning NoConvertOptions passes the original stream through without conversion. This way valid attachments are converted to PDF, while unrecognized ones (like .bin OLE objects) are output as-is — no exception thrown.

3. ICS File Support

ICS (iCalendar) was not supported in v26.1 — the file was misidentified as HTML.

We have added full ICS support in v26.3 (releasing tomorrow):

  • ICS files are correctly detected and loaded
  • Calendar event content is converted to PDF, DOCX, HTML, etc.
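  • Embedded ICS attachments are correctly handled as owned documents

int fileIndex = 0;
Directory.CreateDirectory("originals"); // File.Create throws if the folder does not exist

using (var converter = new Converter("meeting.ics", (LoadContext ctx) =>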
{
    if (ctx.HierarchyLevel > 0)
    {
        // Save original ICS attachment (e.g. embedded PDF)
        ctx.SourceStream.Position = 0;
        using (var file = File.Create($"originals/{ctx.SourceFileName}"))
            ctx.SourceStream.CopyTo(file);
        ctx.SourceStream.Position = 0;

        // Return null — auto-detects load options for the attachment
        return null;
    }

    // Root document — use EmailLoadOptions for ICS
    return new EmailLoadOptions
    {
        ConvertOwner = true,
        ConvertOwned = true
    };
}))
{
    converter.Convert(
        (SaveContext c) => File.Create($"converted-{++fileIndex}.pdf"),
        c => new PdfConvertOptions());
}

With your sample file (ics_pdf_attachment.ics), this produces:

  • converted-1.pdf — the calendar event rendered as PDF
  • converted-2.pdf — the embedded PDF attachment converted
  • originals/Naukri_SanikaPatil[1y_9m].pdf — the original attachment in its native format

Please let us know if you have any further questions.