Arabic text getting reversed Groupdocs Parser 19.5

Niteen_Jadhav · January 19, 2023, 2:12pm

Hello all,

I have a pdf with arabic text (here)

now I want to extract the text from the pdf using groupdocs.parser V(19.5)

the pdf is stored in my database as a byte[], below is my code for parsing the pdf with arabic text and storing the text in my database.

StringBuilder strTextData = new StringBuilder();
System.IO.MemoryStream mTestStream = new MemoryStream(fileBytes);
{
	strTextData.Append((ExtractTextAll(mTestStream, false) ?? "").Trim();
}

private string ExtractTextAll(MemoryStream stream, bool formatted)
{
    stream.Seek(0, SeekOrigin.Begin);
    GroupDocs.Parser.ExtractorFactory factory = new GroupDocs.Parser.ExtractorFactory();
    GroupDocs.Parser.Extractors.Text.TextExtractor extractor = formatted
        ? factory.CreateFormattedTextExtractor(stream)
        : factory.CreateTextExtractor(stream);
    if (extractor == null)
    {
        return null;
    }
    try
    {
        return extractor.ExtractAll();
    }
    finally
    {
        extractor.Dispose();
    }
}

using (SqlConnection conn = new SqlConnection(Program.GetConnectionString()))
{
    conn.Open();
    string QUERY = @"UPDATE [dbo].[FileTextData] SET [TextData]=@TextData, [ModifiedDateTime]=GETDATE() WHERE FileVersionId=@FileVersionId AND [PageIndex]=0
						IF @@ROWCOUNT <= 0
						BEGIN
							INSERT INTO [dbo].[FileTextData] ([FileVersionId], [PageIndex], [TextData], [CreatedDateTime], [ModifiedDateTime])
							VALUES(@FileVersionId, 0, @TextData, GETDATE(), GETDATE())
						END
						UPDATE [dbo].[FileVersion] SET IsTextDataExtracted='Y' WHERE Id=@FileVersionId
					";
    using (SqlCommand cmd = new SqlCommand(QUERY, conn))
    {
        cmd.Parameters.Add("@FileVersionId", SqlDbType.BigInt, 0).Value = fileVersionId;
        cmd.Parameters.Add("@TextData", SqlDbType.NVarChar, -1).Value = strTextData.ToString();
        WriteToLog("Updating database ...");
        cmd.ExecuteNonQuery();
        WriteToLog("Updating database Completed...");
    }
}

but the problem is arabic text is stored in my database in a reverse order,

for e.g (دائرة بلدية أبوظضبي) this text is converted into (يبضظوبأ ةيدلب ةرئاد) text and I have to use this website whenever I want to search content of my pdf’s.

how can I fix this?

NOTE: If I download the pdf or view the pdf(using groupdocs viewer) from my db(byte[]) like below

return File(fileBytes, System.Net.Mime.MediaTypeNames.Application.Octet, Path.GetFileName(data.FileName));//for downloading

the text in the pdf is appearing correctly.

atir.tahir · January 19, 2023, 6:08pm

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

  Issue ID(s): PARSERNET-1986

You can obtain Paid Support services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Niteen_Jadhav · January 30, 2023, 7:56am

Hello,

Is there any updates.

atir.tahir · January 30, 2023, 10:06am

@Niteen_Jadhav

This ticket is still under investigation.

Niteen_Jadhav · February 3, 2023, 10:27am

When can I expect a response from your side?

atir.tahir · February 3, 2023, 10:29am

@Niteen_Jadhav

Please note that all free support tickets are assisted on first come first serve basis. We’ll notify you as there’s any progress update on your ticket.

Niteen_Jadhav · May 5, 2023, 7:04am

hello, I am still unable to get any response on this.

atir.tahir · May 5, 2023, 12:14pm

@Niteen_Jadhav

Your ticket is still under investigation.