The accessibility-related tags should not be stripped from headers during PDF conversion

We are seeing that the header tag is getting stripped after PDF conversion.

@sonamsinha

Could you please share the source and output files? It would be great if you highlight the paragraph that has this issue (maybe share a screenshot). It will help us reproduce and investigate the issue.

header footer test.pdf (87.2 KB)

header footer test.docx (21.5 KB)

I am using these DOCX and PDF files. VoiceOver is not reading the header and footer.
Screenshot 2024-11-21 at 5.25.47 PM.jpg (84.8 KB)

The header tag is not present in the screenshot.

The same thing is happening for the footer.

It depends on the tool that you are using. Please take a look at this screenshot and details.

Hi @atir.tahir, can you please also confirm whether “The accessibility-related tags should not be stripped from headers during PDF conversion” is dependent on the tool as well?

Shouldn't we at least see the header tag in the tool? I tested in Adobe using VoiceOver on a Mac, and I did not see the header or footer tags there. Is this the expected behaviour?

Yes, it could be.
Could you please try the following tools and share the results/screenshots:

Hi @atir.tahir, can you please confirm this on your end for both Windows and Mac?

@sonamsinha

Please take a look at this image (25.7 KB). It reads the header/footer. We used the ReadAloud extension for this. Moreover, see this one (93.0 KB), taken from PAVE, which does not consider the header/footer.

Therefore, we think that it depends on the tool you are using.

@atir.tahir
TestDocument (7).pdf (20.6 KB)

Are you able to read the image alt text and the dash after Request Id in this document as well?

@atir.tahir
Can you suggest a screen reader tool for Mac and Windows that we can use for our testing?

@sonamsinha

We get these results using PAVE, and these results using PAC. You can try PAVE, which works on both Windows and Mac.

Screenshot 2024-11-26 at 5.12.09 PM.jpg (104.3 KB)

test header footer (1).docx (85.0 KB)

Hi @atir.tahir, as the screenshots above show, the section is present for the header and footer. I am testing this using Adobe. GroupDocs was not used for the Word to PDF conversion here. Can you please take a look and confirm why this does not happen with GroupDocs?

@sonamsinha

Could you clarify this for us? Are you converting Word documents to PDF without using GroupDocs and then checking for artifacts in the resulting PDF? Does the PDF generated without conversion show all artifacts, while the output from GroupDocs does not display everything? Are you saying that when converting Word to PDF with GroupDocs, not all artifacts are being captured in the PDF?
Secondly, do you have a GroupDocs license? Could you please also share the conversion code (when you use GroupDocs for the conversion)?

pdftron test.pdf (36.2 KB)

Here is the converted PDF. We expect the header and footer tags to be present in the same way after converting Word to PDF using GroupDocs. However, when converting Word to PDF using GroupDocs, these header and footer tags are missing, and that is why we are not able to read the header and footer.

Hi @atir.tahir @vladimir.litvinchik
Yes, we have a GroupDocs license.
Product → GroupDocs.Total for Java
LicenseVersion → 3.0
Yes, this is exactly what I meant: “the PDF generated without conversion show all artifacts, while the output from GroupDocs does not display everything”

Conversion Code

 public byte[] generatePdfUsingGroupDocs(byte[] contentInBytes, boolean showDocTitle, String title) throws IOException {
        long startTime = System.currentTimeMillis();
        WordProcessingLoadOptions loadOptions = new WordProcessingLoadOptions();
        // Required so that tags (document structure) are added to the output PDF document
        loadOptions.setPreserveDocumentStructure(true);
        logger.info("Office document size in bytes: {}", contentInBytes.length);
        InputStream is = new ByteArrayInputStream(contentInBytes);
        title = isBase64Encoded(title) ? new String(Base64.decodeBase64(title), StandardCharsets.UTF_8) : title;
        try (Converter converter = new Converter(() -> is, () -> loadOptions)) {
            try (ByteArrayOutputStream ms = new ByteArrayOutputStream()) {
                PdfConvertOptions convertOptions = new PdfConvertOptions();
                PdfOptions pdfOptions = convertOptions.getPdfOptions();
                pdfOptions.getFormattingOptions().setDisplayDocTitle(showDocTitle);
                PdfDocumentInfo pdfDocumentInfo = pdfOptions.getDocumentInfo();
                pdfDocumentInfo.setTitle(title);
                converter.convert(() -> ms, convertOptions);
                byte[] outputPdfBytes = ms.toByteArray();
                long endTime = System.currentTimeMillis();
                logger.info("Generated PDF content size in bytes: {}", outputPdfBytes.length);
                logger.info(
                        "Office document to PDF conversion using GroupDocs completed in {}ms", (endTime - startTime));
                return outputPdfBytes;
            }
        } catch (IOException e) {
            logger.error("Office document to PDF conversion using GroupDocs failed", e);
            throw e;
        }
    }
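
Since the results above differ from tool to tool, one tool-independent check is to look for the structure markers in the PDF file itself. The following is a rough sketch of that idea and is not part of GroupDocs: the class `TaggedPdfProbe` and its method names are hypothetical. It only scans the raw bytes for `/StructTreeRoot` and `/MarkInfo`, so it is a heuristic. It can miss the markers when the document catalog is stored in a compressed object stream, and it says nothing about whether headers and footers in particular are tagged.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only: a raw-byte heuristic, not a PDF parser.
final class TaggedPdfProbe {
    private TaggedPdfProbe() {}

    // Returns true if the marker text occurs anywhere in the raw PDF bytes.
    static boolean containsMarker(byte[] pdfBytes, String marker) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is not mangled.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains(marker);
    }

    // A tagged PDF references its tag tree from the catalog via /StructTreeRoot.
    static boolean hasStructTreeRoot(byte[] pdfBytes) {
        return containsMarker(pdfBytes, "/StructTreeRoot");
    }

    // The /MarkInfo dictionary is where a PDF declares itself as marked (tagged).
    static boolean hasMarkInfo(byte[] pdfBytes) {
        return containsMarker(pdfBytes, "/MarkInfo");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TaggedPdfProbe.java <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("/StructTreeRoot found: " + hasStructTreeRoot(bytes));
        System.out.println("/MarkInfo found: " + hasMarkInfo(bytes));
    }
}
```

Running this against both the GroupDocs output and the PDF produced by the other converter would show whether the two files differ at the tag-tree level at all, before comparing how individual screen readers or checkers treat them.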

@sonamsinha
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): TOTALJAVA-236

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

Hi @atir.tahir, can you please share an update here?

@sonamsinha

This ticket is still under investigation.