Many PDF files result in System.ArgumentOutOfRangeException: The added or subtracted value results in an un-representable DateTime

Hello,

We are getting the following exception…
It seems to be caused by the extraction of metadata…

System.ArgumentOutOfRangeException: The added or subtracted value results in an un-representable DateTime. (Parameter 't')\r\n   at System.DateTime.ThrowDateArithmetic(Int32 param)\r\n   at System.DateTime.op_Subtraction(DateTime d, TimeSpan t)\r\n   at \u0006\u0018\u0005.\u0002(String \u0002)\r\n   at \u0006\u0017\u0005.\u0008\u0017\u0003\u0017\u0005\u0017\u0015\u0002()\r\n   at GroupDocs.Parser.Parser.GetMetadata()\r\n   at \u000F\u0002\u0010.\u000E\u001A\u000F\u0016\u0005\u0003()\r\n   at \u0003\u0016\u000F.\u0002(Document \u0002, \u0006\u001A\u000F \u0003, String \u0005)

This is the unescaped version of the exception:

System.ArgumentOutOfRangeException: The added or subtracted value results in an un-representable DateTime. (Parameter 't')
   at System.DateTime.ThrowDateArithmetic(Int32 param)
   at System.DateTime.op_Subtraction(DateTime d, TimeSpan t)
   at .(String )
   at .()
   at GroupDocs.Parser.Parser.GetMetadata()
   at .()
   at .(Document ,  , String )

Can you tell us what it’s about? Could this be fixed on your end?

@jamsharp

The System.ArgumentOutOfRangeException you are seeing is thrown when a date calculation produces a value outside the range that the .NET DateTime structure can represent (DateTime.MinValue to DateTime.MaxValue). Your stack trace shows it being raised by a DateTime minus TimeSpan subtraction inside Parser.GetMetadata(), so the likely trigger is a date in the PDF metadata that is invalid or too extreme to adjust. This cannot be confirmed without a sample file.
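To illustrate how a subtraction can leave the representable range, here is a minimal C# sketch. It does not use GroupDocs at all; the class and method names are hypothetical, and the idea that the library subtracts a time zone offset from a parsed metadata date is only an assumption based on the stack trace.

```csharp
using System;

static class DateUnderflowDemo
{
    // Hypothetical helper: returns true when (value - offset) cannot be
    // represented as a DateTime, which is the condition reported in the trace.
    public static bool SubtractionThrows(DateTime value, TimeSpan offset)
    {
        try
        {
            // DateTime.op_Subtraction(DateTime, TimeSpan) throws when the
            // result would fall below DateTime.MinValue or above MaxValue.
            _ = value - offset;
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

For example, DateUnderflowDemo.SubtractionThrows(new DateTime(1, 1, 1), TimeSpan.FromHours(2)) returns true, because 0001-01-01 is already DateTime.MinValue and nothing earlier can be represented.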

To address this issue, consider the following steps:

  1. Check the PDF Metadata: Inspect the metadata of the PDF files that are causing the exception. Look for any date fields that may contain invalid or extreme values (e.g., dates far in the past or future).

  2. Update GroupDocs.Parser: Ensure that you are using the latest version of the GroupDocs.Parser library. Updates often include bug fixes and improvements that may resolve such exceptions.

  3. Error Handling: Implement error handling in your code to catch this specific exception. You can log the problematic PDF file and the metadata being processed to identify patterns or specific files that consistently cause the issue.

  4. Contact Support: If the problem persists, consider reaching out to GroupDocs support with details about the specific PDF files and the context in which the error occurs. They may be able to provide a fix or workaround.
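For step 1, one rough way to look at the date fields is to search the raw file for /CreationDate and /ModDate entries. The sketch below is an assumption-heavy illustration: the class and method names are hypothetical, it only finds entries stored as plain literal strings, and it will miss metadata that is compressed, encrypted, hex-encoded, or stored only in XMP.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class PdfDateScan
{
    // Matches plain entries such as: /CreationDate (D:00010101000000+02'00')
    static readonly Regex DateEntry =
        new Regex(@"/(CreationDate|ModDate)\s*\(([^)]*)\)");

    // Returns every (key, raw value) pair found in the given text.
    public static IEnumerable<(string Key, string Value)> FindDates(string rawPdfText)
    {
        foreach (Match m in DateEntry.Matches(rawPdfText))
            yield return (m.Groups[1].Value, m.Groups[2].Value);
    }

    // Reads the file as ISO-8859-1 so every byte maps to exactly one char
    // and binary sections cannot break decoding.
    public static IEnumerable<(string Key, string Value)> FindDatesInFile(string path)
    {
        var latin1 = Encoding.GetEncoding("ISO-8859-1");
        return FindDates(latin1.GetString(File.ReadAllBytes(path)));
    }
}
```

Values with a year near 0001 or 9999, or with an unusual offset, would be the first candidates to check against the exception.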

If you need further assistance or specific code examples for error handling, feel free to ask!
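As a starting point for the error handling in step 3, here is a small generic sketch. It deliberately avoids calling any GroupDocs API: SafeExtraction, TryExtract, and the extract and log delegates are hypothetical names, and you would pass in whatever call performs the extraction in your code. It only helps when the exception actually propagates to the caller.

```csharp
using System;

static class SafeExtraction
{
    // Runs the supplied extraction delegate for one file. If the date-range
    // exception is thrown, the file path is logged and false is returned
    // instead of letting the exception propagate.
    public static bool TryExtract<T>(
        string filePath,
        Func<string, T> extract,
        Action<string> log,
        out T result)
    {
        try
        {
            result = extract(filePath);
            return true;
        }
        catch (ArgumentOutOfRangeException ex)
        {
            // Record which file failed so that patterns can be identified later.
            log($"Metadata date out of range in '{filePath}': {ex.Message}");
            result = default;
            return false;
        }
    }
}
```

Catching only ArgumentOutOfRangeException keeps unrelated failures visible, so other problems are not silently swallowed.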

@jamsharp

Could you please share a problematic PDF and the sample code to reproduce the issue?

I cannot promise to send you the file, because it belongs to one of our customers, who ran into the problem with it. We’ll ask them…

Repro code:

var extractionOptions = new ExtractionOptions { UseRawTextExtraction = false, };
var document = Document.CreateFromFile(filePath);
m_Extractor.ErrorOccurred += OnExtractionErrorOccurred;

extractedData = m_Extractor.Extract(document, extractionOptions);

We received the error via the ErrorOccurred event.

@jamsharp
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): SEARCHNET-3520

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.