
Free Support Forum - groupdocs.com

PDF: text before table is missing

file.pdf (54.8 KB)

I am trying to extract the word “confidential” from the attached file.pdf, but all the words before the table are missing.
.NET 6
GroupDocs.Parser version 22.6.0

Output from GroupDocs.Parser:
“\0\0\0\u0001\0\u0002\0\u0003\0\u0004\0\u0005\0\u0006\0\0\0\u0003\0\a\0\b\0\t\0\u0003\0\r\n\0\v\0\u0003\0\f\0\a\0\r\n\0\u000e\0\u000f\0\u0010\0\u0011\0\u0003\0\u0012\0\u0013\0\u0014\0\u0015\0\u0016\0\u0017\0\u0018\0\u0019\0\u001a\0\u0003\0\u0012\0\u001b\0\u001c\0\u0018\0\u001d\0\u001e\0\u0003\0\u0012\0\u001f\0\u001d\0 \0\0\0\u0001\0\u0002\0\u0003\0\u0004\0\u0005\0\u0006\0\u0002\0\a\0\u0004\0\b\0\t00051614-18610-K-000300051614-18610-K-000400051614-18610-K-000500051614-18610-K-000600051614-18610-K-001100051614-18630-K-001400051614-18650-K-000900051614-18650-K-022000051614-18650-K-022100051614-18650-K-023000051614-18650-K-023100051614-18650-K-024100051614-18720-K-011400051614-50150-K-861000051614-57210-K-000100051614-58120-K-001000051614-64400-K-001400051614-64400-K-102100051614-76300-K-021100051614-76300-K-0421DDDCHIGDHFJFDEEADJBC1861018610186101861018610186301865018650186501865018650186501872050150572105812064400644007630076300\0\0\0\u0001\0\u0002\0\u0003\0\u0004\0\u0005\0\u0006\0\a\0\b\0\t\0\r\n\0\v\0\f\0\r\n\0\u000e\0\b\0\u000f\r\n”
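The raw-mode output above begins and ends with runs of NUL (`\0`) and other control characters instead of readable text. Below is a minimal sketch, assuming a simple ratio heuristic, of how a caller could flag output like this. `RawTextCheck`, `LooksGarbled`, and the 10% default threshold are hypothetical and not part of GroupDocs.Parser. The threshold would need tuning against your own documents.

```csharp
using System;

// Hypothetical helper (not part of GroupDocs.Parser): flags extracted text
// that is dominated by control characters, as in the raw-mode output above.
public static class RawTextCheck
{
    // Returns true when the share of control characters, excluding ordinary
    // line breaks and tabs, exceeds the threshold (assumed default: 10%).
    public static bool LooksGarbled(string text, double threshold = 0.10)
    {
        if (string.IsNullOrEmpty(text))
            return false;

        int control = 0;
        foreach (char c in text)
        {
            if (char.IsControl(c) && c != '\r' && c != '\n' && c != '\t')
                control++;
        }
        return (double)control / text.Length > threshold;
    }
}
```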

The missing text is:
confidential
شركة ترسانة الاسكندرية (Alexandria Shipyard Company)
the last REV TO DRAWING

@safetica.rad

Could you please also share your sample code or application? Please specify your OS details as well.

OS: Windows 10
Code sample:

using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

using var fileStream = File.Open("file.pdf", FileMode.Open, FileAccess.Read, FileShare.Delete | FileShare.ReadWrite);
var gdExtFormat = FileType.FromExtension(".pdf").Format;
using var gdParser = new Parser(fileStream, new LoadOptions(gdExtFormat));
// TextOptions(true) selects raw mode; GetText returns null if text extraction is unsupported
using var textReader = gdParser.GetText(new TextOptions(true));
string text = textReader?.ReadToEnd() ?? string.Empty;

When I change the TextOptions argument to false, which uses accurate mode instead of raw mode, the missing text appears to be extracted correctly:
var textReader = gdParser.GetText(new GroupDocs.Parser.Options.TextOptions(false));

We use raw mode for its performance benefits.
Do you have any idea why raw mode produces incorrect text output for this file?
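Until the raw-mode issue is resolved, one possible workaround that keeps raw-mode speed in the common case is to extract in raw mode first and re-extract in accurate mode only when the result looks wrong. This is a sketch under the assumption that an occasional second extraction is acceptable for your workload. `TextExtraction`, `ExtractWithFallback`, and the `isGarbled` predicate are hypothetical names, not GroupDocs.Parser APIs.

```csharp
using System;

// Hypothetical helper: run the fast extraction first and fall back to the
// slower one only when the fast result is empty or fails the caller's check.
public static class TextExtraction
{
    public static string ExtractWithFallback(
        Func<string> fast,             // e.g. raw mode: TextOptions(true)
        Func<string> accurate,         // e.g. accurate mode: TextOptions(false)
        Func<string, bool> isGarbled)  // caller-supplied heuristic
    {
        string text = fast();
        // Note: a document with genuinely no text also triggers the fallback
        if (string.IsNullOrEmpty(text) || isGarbled(text))
            return accurate() ?? string.Empty;
        return text;
    }
}
```

With GroupDocs.Parser, `fast` could wrap `gdParser.GetText(new GroupDocs.Parser.Options.TextOptions(true))?.ReadToEnd()` and `accurate` the same call with `TextOptions(false)`, ideally disposing each returned `TextReader`. This assumes one `Parser` instance can serve both calls; re-creating the parser for the fallback is the conservative alternative. A crude predicate such as `s => s.Contains('\0')` would catch the output shown above, since ordinary extracted text should not contain NUL characters.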


@safetica.rad

We are investigating this issue. Your investigation ticket ID is PARSERNET-1962.