Reading order issues

  1. The reading order of each table section is incorrect: the screen reader reads an entire row at once because the row is identified as a single element. The table is not identified as a table, and cells are not associated with their table header cells.
  2. Heading level 1 is not identified as H1.
  3. Normal content is identified as a heading.
  4. List items are not identified as list items, and sub-list items are skipped by the screen reader (SR).

Can you please help here? We were testing on Windows using JAWS.
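The symptoms above correspond to standard PDF structure types (ISO 32000): a Word Heading 1 would normally be expected to become an H1 element, normal paragraphs P, lists an L element whose LI children hold Lbl and LBody, and tables a Table element with TR rows containing TH (header) and TD (data) cells. A minimal sketch of turning that expectation into a check, assuming you have already listed the top-level tags of the converted file (for example from Acrobat's Tags panel). The class `TagSequenceCheck` and its position-by-position comparison are hypothetical illustrations, not part of GroupDocs, Acrobat or JAWS:

```java
import java.util.ArrayList;
import java.util.List;

class TagSequenceCheck {
    /**
     * Compares expected structure types with the ones actually found, position by
     * position, and returns one message per position where they differ.
     */
    static List<String> mismatches(List<String> expected, List<String> actual) {
        List<String> problems = new ArrayList<>();
        int n = Math.max(expected.size(), actual.size());
        for (int i = 0; i < n; i++) {
            String want = i < expected.size() ? expected.get(i) : "(none)";
            String got = i < actual.size() ? actual.get(i) : "(none)";
            if (!want.equals(got)) {
                problems.add("element " + i + ": expected " + want + ", got " + got);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Top-level tags expected for a test document containing Heading 1, normal text,
        // Heading 2, normal text, a bulleted list and a table with a header row
        List<String> expected = List.of("H1", "P", "H2", "P", "L", "Table");
        // Hypothetical observed tags where Heading 1 and normal text both come out as H2
        List<String> observed = List.of("H2", "H2", "H2", "P", "L", "Table");
        mismatches(expected, observed).forEach(System.out::println);
    }
}
```

Comparing position by position is deliberately simple: a single missing or extra element shifts every later comparison, so a more robust check would align the two sequences first.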

@sonamsinha

Can you please share which product you’re using and attach a sample application with steps so we could reproduce the issue on our side?

@vladimir.litvinchik
I am using GroupDocs.Total for Java. The license type is Site OEM.
You can create a Word file with H1 and H2 headings, normal text, a list and a table, convert it to PDF using GroupDocs.Total for Java, and then verify the result.

@sonamsinha

Unfortunately, I can’t reproduce the issue. I tested converting a DOCX document to PDF and it worked well; see source-docx-and-output-pdf.png (104.2 KB).

Then I tried reading the text from the PDF using the PDFBox package, and it was able to read the text:

Heading 1
Normal text
Heading 2
Normal text
? List item 1
? List item 2
? List item 3
Table heading 1 Table heading 2 Table heading 3 Table heading 4
Table row 1
Table row 2
Table row 3
Table row 4
Table row 5

Here is the file headers-text-list-table.docx (14.0 KB).
Sample application: sample-app.zip (47.1 KB)

Based on the details you’ve shared, it seems you may be using a different library to extract content from the PDF file.

Can you please modify the sample application so we could reproduce the issue?

How do I run this sample application?

I am seeing Heading 1 identified as Heading 2 rather than Heading 1. Also, normal paragraph content is read out as Heading 2.

Here is a recording you can refer to.
Also, after converting the same Word document to PDF using GroupDocs.Total for Java, the list bullets in the PDF are not the same as the bullets in Word.
Test Accessibility Document For GroupDocs.pdf (36.2 KB)

headers-text-list-table-link.docx (15.4 KB)

The PDF file you shared is not accessible at all. The PDF was not auto-tagged with accessibility tags when opened in Adobe Acrobat.
Screenshot 2024-09-16 at 10.57.55 AM.jpg (68.2 KB)

@sonamsinha

What application do you use to validate PDF file?

You can run the sample application by executing the mvn compile exec:java command in the folder containing the pom.xml file.

We are using groupdocs-conversion 24.6 to convert Word to PDF.
We use Adobe Acrobat to verify the PDF file, and VoiceOver on Mac and JAWS on Windows to verify screen reader output.

@sonamsinha

Thank you for sharing the video and details. To keep the document structure while converting, you can set PreserveDocumentStructure to true:

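 import com.groupdocs.conversion.Converter;
 import com.groupdocs.conversion.options.convert.PdfConvertOptions;
 import com.groupdocs.conversion.options.load.WordProcessingLoadOptions;
 import java.io.FileOutputStream;

 // Preserve the Word document structure when converting to PDF
 WordProcessingLoadOptions loadOptions = new WordProcessingLoadOptions();
 loadOptions.setPreserveDocumentStructure(true);

 // Convert DOCX to PDF; the output stream and the converter are always closed
 Converter converter = new Converter(fileName, () -> loadOptions);
 try (FileOutputStream fileOutputStream = new FileOutputStream(fileName + ".pdf")) {
     PdfConvertOptions pdfConvertOptions = new PdfConvertOptions();
     converter.convert(() -> fileOutputStream, pdfConvertOptions);
 } finally {
     converter.close();
 }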

The file that I’ve got on the output: headers-text-list-table.docx.pdf (37.2 KB).

We have already added
WordProcessingLoadOptions loadOptions = new WordProcessingLoadOptions();
loadOptions.setPreserveDocumentStructure(true);

while converting Word to PDF. We are still facing the issues above.

@sonamsinha

Can you please check this file - headers-text-list-table.docx.pdf (37.2 KB)?

To check the structure, I used a tool I found online (see the attachment), and it seems to work properly.
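For a quick, dependency-free cross-check of whether a converted file is tagged at all (the concern raised about Acrobat not showing tags), the PDF bytes can be scanned for the catalog entries a tagged PDF carries (/MarkInfo with /Marked true, and /StructTreeRoot) and for the structure types screen readers rely on. This is a heuristic sketch, not a PDF parser: it assumes an unencrypted file, only inflates FlateDecode streams, ignores role-mapped custom tag names, and the class `PdfTagScan` is hypothetical. A structure viewer or Acrobat's Tags panel remains the authoritative check.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

class PdfTagScan {
    // "stream" keyword followed by CRLF or LF, excluding the "endstream" keyword
    private static final Pattern STREAM_START = Pattern.compile("(?<!end)stream\\r?\\n");
    // Standard structure types for headings, paragraphs, lists and tables (heuristic)
    private static final Pattern STRUCT_TYPE =
            Pattern.compile("/S\\s*/(H[1-6]?|P|L|LI|Lbl|LBody|Table|TR|TH|TD)\\b");
    private static final Pattern MARKED_TRUE = Pattern.compile("/Marked\\s+true");

    /** Raw file text plus the inflated text of every Flate-compressed stream. */
    static String readableText(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so string offsets equal byte offsets
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        StringBuilder text = new StringBuilder(raw);
        Matcher m = STREAM_START.matcher(raw);
        while (m.find()) {
            int start = m.end();
            int end = raw.indexOf("endstream", start);
            if (end < 0) {
                break;
            }
            Inflater inflater = new Inflater();
            inflater.setInput(pdf, start, end - start);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(chunk);
                    if (n == 0) {
                        break; // needs more input or a preset dictionary: stop here
                    }
                    out.write(chunk, 0, n);
                }
                text.append('\n').append(out.toString(StandardCharsets.ISO_8859_1));
            } catch (DataFormatException e) {
                // Not Flate data (for example JPEG images): skip this stream
            } finally {
                inflater.end();
            }
        }
        return text.toString();
    }

    /** True if the file appears to declare itself tagged. */
    static boolean looksTagged(byte[] pdf) {
        String text = readableText(pdf);
        return text.contains("/StructTreeRoot") && MARKED_TRUE.matcher(text).find();
    }

    /** Counts occurrences of each structure type name, keyed alphabetically. */
    static Map<String, Integer> countTags(byte[] pdf) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = STRUCT_TYPE.matcher(readableText(pdf));
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Path.of(args[0]));
        System.out.println("Looks tagged: " + looksTagged(pdf));
        System.out.println("Structure types: " + countTags(pdf));
    }
}
```

Counts only show which tags exist, not their order or nesting, so a file can pass this check and still be announced incorrectly by a screen reader.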

@vladimir.litvinchik
The structure is fine. I am facing issues when using a screen reader: it does not read the tags properly. Heading 1 and normal text are read out as Heading 2.
On Mac, you can open the file in a browser and read the text using VoiceOver. On Windows, you can use JAWS to verify whether the text is read out properly.

@sonamsinha

Ok, thanks for the clarification. What application do you use to open a PDF file on Windows and read it with JAWS?

You can open it in a browser as well. Yes, try to read it with JAWS.

@sonamsinha

Got it. Can you please also attach a PDF file that is properly read by JAWS so I could compare them?

@vladimir.litvinchik

I don’t have any pdf file with me.

@sonamsinha

Got it. Thank you for the feedback. I’ll share the details with the development team. As soon as we have any updates, we’ll let you know.

Thank you, @vladimir.litvinchik. When can I expect your reply?

@sonamsinha

Unfortunately, I can’t share any ETA at the moment. Issues are processed according to their priority.

We provide Free Support here on this forum and paid support through Paid Support Helpdesk. The paid support issues have a higher priority in comparison to the free support requests.