Accessibility-related tags should not be stripped from headers during PDF conversion

sonamsinha · November 21, 2024, 11:49am

We are seeing that the header is stripped after PDF conversion.

atir.tahir · November 21, 2024, 10:13pm

Could you please share the source and output files? It would be great if you highlight the paragraph that has this issue (maybe share a screenshot). It will help us reproduce and investigate the issue.

sonamsinha · November 22, 2024, 5:18am

header footer test.pdf (87.2 KB)

header footer test.docx (21.5 KB)

I am using these DOCX and PDF files. VoiceOver is not reading the header and footer.
Screenshot 2024-11-21 at 5.25.47 PM.jpg (84.8 KB)

The header tag is not present in the screenshot.

sonamsinha · November 22, 2024, 5:18am

The same thing is happening for the footer.

atir.tahir · November 22, 2024, 7:47pm

It depends on the tool that you are using. Please take a look at this screenshot and details.

sonamsinha · November 25, 2024, 4:00am

Hi @atir.tahir, can you please also confirm whether “The accessibility-related tags should not be stripped from headers during PDF conversion” also depends on the tool?

Shouldn’t we at least see the heading tag in the tool? I tested in Adobe using VoiceOver on a Mac, and I did not see header or footer tags there. Is this the expected behaviour?

atir.tahir · November 25, 2024, 11:19am

Yes, it could be.
Could you please try the following tools and share the results/screenshots:

sonamsinha · November 25, 2024, 11:44am

Hi @atir.tahir, can you please confirm from your end for both Windows and Mac?

atir.tahir · November 25, 2024, 12:19pm

@sonamsinha

Please take a look at this image (25.7 KB). The tool reads the header/footer; we used the ReadAloud extension for this. Compare it with this one (93.0 KB), taken from PAVE, which doesn’t consider the header/footer.

Therefore, we think that it depends on the tool you are using.

sonamsinha · November 26, 2024, 2:40am

@atir.tahir
TestDocument (7).pdf (20.6 KB)

Are you also able to read the image alt text and the dash after Request Id in this document?

sonamsinha · November 26, 2024, 5:56am

@atir.tahir
Can you suggest a screen reader tool for Mac and Windows that we can use for our testing?

atir.tahir · November 26, 2024, 6:23am

@sonamsinha

We get these results using PAVE, and these results using PAC. You can try PAVE, which works on both Windows and Mac.

sonamsinha · November 27, 2024, 6:33am

Screenshot 2024-11-26 at 5.12.09 PM.jpg (104.3 KB)

test header footer (1).docx (85.0 KB)

Hi @atir.tahir, as you can see in the screenshot above, a section is present for the header and footer. I am testing this in Adobe. We did not use GroupDocs for the Word-to-PDF conversion here. Can you please take a look and confirm why this is not happening with GroupDocs?

atir.tahir · November 27, 2024, 12:56pm

@sonamsinha

Could you clarify this for us? Are you converting Word documents to PDF without using GroupDocs and then checking for artifacts in the resulting PDF? Does the PDF generated without conversion show all artifacts, while the output from GroupDocs does not display everything? Are you saying that when converting Word to PDF with GroupDocs, not all artifacts are being captured in the PDF?
Secondly, do you have a GroupDocs license? Could you please also share the conversion code (when you use GroupDocs for the conversion)?

sonamsinha · November 28, 2024, 3:26am

pdftron test.pdf (36.2 KB)

Here is the converted PDF. We expect the header and footer tags to be present in the same way after converting Word to PDF with GroupDocs. However, when we convert with GroupDocs, these header and footer tags are missing, which is why we cannot read the header and footer.
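
One quick way to check whether a converted file carries any tag structure at all, before opening it in a screen reader, is to look for the `/StructTreeRoot` entry that a tagged PDF normally declares in its document catalog. The sketch below is only a rough heuristic, not a validator: the class and method names are made up for illustration, it scans raw bytes only, and it can report a false negative when the catalog is stored inside a compressed object stream. A checker such as PAC or PAVE remains the more reliable test.

```java
import java.nio.charset.StandardCharsets;

public class TaggedPdfHeuristic {

    /** Returns true if the raw PDF bytes contain the given name token, e.g. "/StructTreeRoot". */
    static boolean containsToken(byte[] pdfBytes, String token) {
        // ISO-8859-1 maps every byte to exactly one char, so binary content is not altered.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains(token);
    }

    /** Heuristic (assumption): a tagged PDF declares a structure tree root in its catalog. */
    static boolean looksTagged(byte[] pdfBytes) {
        return containsToken(pdfBytes, "/StructTreeRoot");
    }

    public static void main(String[] args) {
        // Tiny hand-written fragments, used only to exercise the heuristic.
        byte[] tagged = "%PDF-1.7\n1 0 obj\n<< /Type /Catalog /StructTreeRoot 5 0 R >>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] untagged = "%PDF-1.7\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(looksTagged(tagged));   // true
        System.out.println(looksTagged(untagged)); // false
    }
}
```

Running the same check on the bytes of both converted files shows whether one of them lacks a structure tree entirely, or whether both are tagged and differ only in how the header and footer content is marked.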

sonamsinha · November 28, 2024, 3:30am

Hi @atir.tahir @vladimir.litvinchik
Yes, we have a GroupDocs license:
Product → GroupDocs.Total for Java
LicenseVersion → 3.0
Yes, this is exactly what I meant: “the PDF generated without conversion show all artifacts, while the output from GroupDocs does not display everything”

Conversion Code

    public byte[] generatePdfUsingGroupDocs(byte[] contentInBytes, boolean showDocTitle, String title) throws IOException {
        long startTime = System.currentTimeMillis();
        WordProcessingLoadOptions loadOptions = new WordProcessingLoadOptions();
        // Required for adding tags to the output PDF document
        loadOptions.setPreserveDocumentStructure(true);
        logger.info("Office document size in bytes: {}", contentInBytes.length);
        InputStream is = new ByteArrayInputStream(contentInBytes);
        title = isBase64Encoded(title) ? new String(Base64.decodeBase64(title), StandardCharsets.UTF_8) : title;
        try (Converter converter = new Converter(() -> is, () -> loadOptions)) {
            try (ByteArrayOutputStream ms = new ByteArrayOutputStream()) {
                PdfConvertOptions convertOptions = new PdfConvertOptions();
                PdfOptions pdfOptions = convertOptions.getPdfOptions();
                pdfOptions.getFormattingOptions().setDisplayDocTitle(showDocTitle);
                PdfDocumentInfo pdfDocumentInfo = pdfOptions.getDocumentInfo();
                pdfDocumentInfo.setTitle(title);
                converter.convert(() -> ms, convertOptions);
                byte[] outputPdfBytes = ms.toByteArray();
                long endTime = System.currentTimeMillis();
                logger.info("Generated PDF content size in bytes: {}", outputPdfBytes.length);
                logger.info(
                        "Office document to PDF conversion using GroupDocs completed in {}ms", (endTime - startTime));
                return outputPdfBytes;
            }
        } catch (IOException e) {
            logger.error("Office document to PDF conversion using GroupDocs failed", e);
            throw e;
        }
    }
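
The method above calls an `isBase64Encoded` helper that is not shown in the post. A minimal sketch of what such a helper might look like follows. It uses only `java.util.Base64` from the JDK instead of the `Base64.decodeBase64` call in the original, and the class name and the `decodeTitle` wrapper are hypothetical. This kind of check is a heuristic: a plain title such as "Test" is also valid Base64 and would be decoded by mistake.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TitleDecoding {

    /** Heuristic check (assumption): non-empty, padded length, and decodable as standard Base64. */
    static boolean isBase64Encoded(String value) {
        if (value == null || value.isEmpty() || value.length() % 4 != 0) {
            return false;
        }
        try {
            Base64.getDecoder().decode(value);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when the input contains characters outside the Base64 alphabet.
            return false;
        }
    }

    /** Decodes the title when it looks like Base64; otherwise returns it unchanged. */
    static String decodeTitle(String title) {
        return isBase64Encoded(title)
                ? new String(Base64.getDecoder().decode(title), StandardCharsets.UTF_8)
                : title;
    }

    public static void main(String[] args) {
        System.out.println(decodeTitle("SGVhZGVyIHRlc3Q="));  // Base64 for "Header test"
        System.out.println(decodeTitle("Quarterly report")); // contains a space, so left unchanged
    }
}
```

If the caller already knows whether the title is encoded, passing that as an explicit flag would avoid the ambiguity altogether.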

atir.tahir · November 28, 2024, 9:28am

@sonamsinha
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): TOTALJAVA-236

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

sonamsinha · December 13, 2024, 7:23am

Hi @atir.tahir, can you please share an update here?

atir.tahir · December 13, 2024, 9:59am

@sonamsinha

This ticket is still under investigation.