Hi,
I have a .pdf document that is only 1 page long. When I call GetDocumentInfo() on a parser created from the stream of this document, the call takes at least 5 minutes. This makes the whole implementation very inefficient.
I have added the .pdf file in question.
23.020 Regiodepot - Aar - Grondplan - Nieuw - r13 - 2025 05 14.pdf (2.1 MB)
@Tender_Experts
Summary
GroupDocs.Parser.GetDocumentInfo() can be slow for some PDFs because the method parses the whole file to collect metadata, page dimensions and text‑coordinate information. When the document contains large resources (e.g., high‑resolution images, embedded fonts) the parsing work increases, which may lead to minutes‑long execution even for a single‑page file.
Why it happens (general reasons that are typical for PDF parsing; the specific evidence for this method is not available in the provided documentation)
| Reason | Explanation |
| --- | --- |
| Full document scan | GetDocumentInfo walks through every object in the PDF to build a complete model, which can be CPU‑intensive for complex resources. |
| Large embedded resources | High‑resolution images or full‑size fonts must be decoded, increasing processing time. |
| Older library version | Earlier releases of GroupDocs.Parser were less optimized and could spend extra time on tasks such as page rendering. |
What to try next (these steps are common practice for improving performance, but they are not directly confirmed by the supplied evidence)
- Check the library version – make sure you are using the latest GroupDocs.Parser NuGet package, as newer versions contain performance improvements.
- Upgrade if needed – run `dotnet add package GroupDocs.Parser --version <latest>` to get the most recent build.
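- Load the PDF with minimal options – if the SDK provides load options that skip page rendering or font substitution, set them. The type and property names in the example (PdfLoadOptions, RenderPageLimit, EnableFontSubstitution) are placeholders, not confirmed GroupDocs.Parser members; replace them with whatever your version actually exposes:
// Placeholder names: check the API reference for your version before using.
var loadOptions = new PdfLoadOptions {
    RenderPageLimit = 0,            // hypothetical: do not render pages while loading
    EnableFontSubstitution = false  // hypothetical: skip font substitution
};
// Dispose the parser when done so the underlying file is released.
using (var parser = new Parser("path/to/file.pdf", loadOptions))
{
    var info = parser.GetDocumentInfo();
}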
- Request only the needed information – if the SDK offers a `DocumentInfoOptions` object, disable parts you do not need (e.g., text coordinates) to reduce work.
- Inspect the PDF – use tools like `pdfinfo` or `mutool` to see whether the file contains unusually large images or fonts. Reducing their size can help.
- Enable SDK logging – turn on detailed logging (if available) to see which stage of the parsing takes the most time.
- Provide more context if the problem persists – share the exact GroupDocs.Parser version, a minimal reproducible code snippet, and any log output showing the delay.
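To collect a concrete timing figure for such a report, a small wrapper around the call may help. The following is only a sketch: CallTimer and Measure are hypothetical helper names introduced for illustration, not part of GroupDocs.Parser, and the code relies solely on System.Diagnostics.Stopwatch from .NET. The commented usage assumes a parser created from your own PDF stream, as described in the original post.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of GroupDocs.Parser): runs any call and
// returns both its result and the elapsed wall-clock time.
public static class CallTimer
{
    public static (T Result, TimeSpan Elapsed) Measure<T>(Func<T> call)
    {
        if (call == null) throw new ArgumentNullException(nameof(call));

        var stopwatch = Stopwatch.StartNew();
        T result = call();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed);
    }
}

// Assumed usage with the call from this thread (stream is your own PDF stream):
// using (var parser = new GroupDocs.Parser.Parser(stream))
// {
//     var (info, elapsed) = CallTimer.Measure(() => parser.GetDocumentInfo());
//     Console.WriteLine($"GetDocumentInfo took {elapsed.TotalSeconds:F1} s");
// }
```

Posting the measured time together with the package version and license mode makes it easier to compare your result with timings reported by others.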
If after trying these steps the call is still taking several minutes, the issue may be specific to the PDF structure or a bug in the version you are using. In that case, consider opening a support ticket with GroupDocs, attaching the PDF file, and including the details above.
Note: The above recommendations are based on typical behavior of PDF parsers and general best practices, as the supplied documentation does not contain explicit guidance for GetDocumentInfo() performance.
The SDK version in use is unclear, so check the official release notes before applying this guidance.
We were unable to reproduce this issue on our end. Using the provided PDF, the GetDocumentInfo() call completed in 36 seconds.
To investigate further, could you please share the following details?
- The API version you are using
- Whether you are running the API in evaluation mode or with a valid license
- A sample code snippet demonstrating how you are calling the method
We’ll review the information and continue looking into the matter.