Unwanted text indentation when converting from docx to png

nirm · February 14, 2022, 1:21pm

Hi,
Input file: 2101TLy.docx (31.0 KB)
Output file: 2101TLy.png (238.5 KB)
Comparison screenshot (bug): COMPARISON.jpg (203.8 KB)

CONVERSION CODE:

import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.filetypes.FileType;
import com.groupdocs.conversion.options.convert.ImageConvertOptions;

import java.io.InputStream;

public class FileConvertor2 {

private static final Logger logger = Logger.getLogger(FileConvertor2.class);

public static int convert(InputStream inputStream, FileFormat inputFormat, FileFormat outputFormat) throws Exception {
    try {
        final Converter converter = new Converter(inputStream);
        final ImageConvertOptions options = new ImageConvertOptions();
        tryToEnhanceQuality(options, inputFormat);
        options.setFormat(FileType.fromExtension(outputFormat.getExtension()));
        options.setPagesCount(1); // currently we convert just 1 page separately
        final int totalDocumentPages = converter.getDocumentInfo().getPagesCount();
        for (int i = 1; i <= totalDocumentPages; i++) {
            try (ByteBufferOutputStream outputStream = new ByteBufferOutputStream()) {
                options.setPageNumber(i); // The page index to be converted
                converter.convert(outputStream, options);
                // Save the output stream to an external source....
            } catch (Exception e) {
                logger.error(e, "Failed to convert page #%s to %s", i, options.getFormat().getExtension());
                throw e;
            }
        }
        return totalDocumentPages;
    } catch (Exception e) {
        logger.error(e, "Failed to convert file to %s", outputFormat.getExtension());
        throw e;
    }
}

private static void tryToEnhanceQuality(ImageConvertOptions options, FileFormat inputFormat) {
    int resolutionDPIVal = -1;
    // Approximate print size of a PDF page: 8.27 x 11.69 inches (width x height)
    switch (inputFormat) {
        case DOC:
        case DOCX:
        case JPG:
        case JPEG:
            // Why 300?
            // Because the maximum resolution we aim for is 2200 x 3000 pixels, which is about 220-300 DPI.
            // Also, the default DPI value of a Microsoft Word doc is 220 DPI (Windows 10), and we want to maintain or enhance that quality when converting to an image.
            resolutionDPIVal = 300;
            break;
        default:
            break;
    }
    if (resolutionDPIVal > 0) {
        options.setHorizontalResolution(resolutionDPIVal);
        options.setVerticalResolution(resolutionDPIVal);
    }
}

}
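
For reference, the DPI arithmetic in the comments above can be checked with a short sketch. It assumes the rendered image is simply the page size in inches multiplied by the DPI, which is an assumption about how the converter sizes its output, not something confirmed here. The class and method names (`DpiCheck`, `pixels`, `dpiFor`) are made up for illustration.

```java
public class DpiCheck {
    // Page size in inches, taken from the comment in tryToEnhanceQuality.
    static final double PAGE_WIDTH_IN = 8.27;
    static final double PAGE_HEIGHT_IN = 11.69;

    /** Pixels along an edge of the given length (inches) rendered at the given DPI. */
    static long pixels(double inches, int dpi) {
        return Math.round(inches * dpi);
    }

    /** DPI needed to fit the given pixel count along an edge of the given length (inches). */
    static double dpiFor(int pixels, double inches) {
        return pixels / inches;
    }

    public static void main(String[] args) {
        // Image size at 300 DPI under the "inches x DPI" assumption
        System.out.printf("300 DPI -> %d x %d px%n",
                pixels(PAGE_WIDTH_IN, 300), pixels(PAGE_HEIGHT_IN, 300));
        // DPI that would correspond to the 2200 x 3000 target
        System.out.printf("2200 px wide -> %.1f DPI%n", dpiFor(2200, PAGE_WIDTH_IN));
        System.out.printf("3000 px high -> %.1f DPI%n", dpiFor(3000, PAGE_HEIGHT_IN));
    }
}
```

Under that assumption, 300 DPI gives roughly 2481 x 3507 pixels, somewhat above the 2200 x 3000 target, which itself corresponds to roughly 257-266 DPI.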

atir.tahir · February 14, 2022, 7:29pm

@nirm

We cannot reproduce this issue at our end. Please have a look at this output.png (183.5 KB). Could you please share your development environment details (e.g. OS version, Java version) and a sample application? We’ll then look into this scenario further.

nirm · February 15, 2022, 10:02am

I use GroupDocs.Conversion 21.10.1.
I reproduced the issue in two environments:
OS: Windows 10 Pro, Java 11, JDK Amazon Corretto 11.0.11.
OS (running on AWS): AWS Fargate, so it’s Linux (Amazon Linux 2.0.20220121).

Am I doing something wrong in my convert code?
How is it possible that I’m getting different results from you?

Please keep me posted, thank you

atir.tahir · February 15, 2022, 11:15am

@nirm

Make sure the font family used in the source file is installed in your environment. If you look closely at output.png, or even at the source Word file, you will see the font difference.jpg (120.7 KB). Please install the missing font. You can also specify a default font or a font substitution; have a look at Load WordProcessing Document with Options.

nirm · February 15, 2022, 12:41pm

As you can see in my code, I don’t use WordProcessingLoadOptions. I’ll try it and update here.
Have you had the chance to inspect my input file in an environment that is similar to my own?
Also, what machine (OS, Java JDK type and version) produced your output file?

atir.tahir · February 15, 2022, 3:09pm

@nirm

OS: Windows 11
JAVA: openjdk version 11.0.14.1
Runtime environment: Corretto-11.0.14.10.1

Below is the code:

package com.groupdocs.conversion.examples;
import java.io.File; 
import com.groupdocs.conversion.Converter; 
import com.groupdocs.conversion.filetypes.ImageFileType;
import com.groupdocs.conversion.options.convert.ImageConvertOptions; 
import java.io.FileInputStream;

public class RunExamples {
	public static void main(String[] args) throws Throwable { 
		String outputFolder = "D:/";
		String outputFileTemplate = new File(outputFolder, "output-page-%d.png").getPath();
		final Converter converter = new Converter(new FileInputStream("D:/2101TLy.docx"));
		final ImageConvertOptions options = new ImageConvertOptions();
		options.setHorizontalResolution(300);
		options.setVerticalResolution(300);
		options.setFormat(ImageFileType.Png);
		options.setPagesCount(1);  
		options.setPageNumber(1); 
		converter.convert(outputFileTemplate, options); 
		converter.close();
	} 
}

Could you please share a list of installed fonts on your machine and a running console application?
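
One way to produce such a list from Java is sketched below. It prints the font families that the JVM's AWT layer can see and reports which of a given set is absent. Note the assumptions: the fonts visible to AWT are only a proxy for what the conversion library finds on the machine, and the class name `FontCheck` and the example font names are hypothetical.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class FontCheck {
    /** Font family names visible to the JVM, lower-cased and sorted. */
    static Set<String> installedFamilies() {
        String[] names = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        return Arrays.stream(names)
                .map(n -> n.toLowerCase(Locale.ROOT))
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Required families absent from the (lower-cased) installed set; comparison is case-insensitive. */
    static List<String> missing(List<String> required, Set<String> installedLowerCase) {
        return required.stream()
                .filter(r -> !installedLowerCase.contains(r.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> installed = installedFamilies();
        installed.forEach(System.out::println);
        // Hypothetical list; replace with the fonts the document actually uses.
        List<String> required = Arrays.asList("Calibri", "Arial");
        System.out.println("Missing: " + missing(required, installed));
    }
}
```

On a minimal Linux container the list is often very short, which would be consistent with the issue appearing on Linux but not on a desktop Windows machine.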

atir.tahir · February 15, 2022, 9:08pm

@nirm

We have reproduced this issue at our end on Linux and logged it in our internal issue tracking system with ID CONVERSIONJAVA-1568. It will now be investigated further. We’ll notify you of any update.

atir.tahir · February 20, 2022, 7:06pm

@nirm

Please install all the fonts that are used in the source Word document. Once these fonts are installed, the issue will be resolved.
With the fonts installed, we cannot reproduce this issue at our end.

nirm · February 21, 2022, 8:39am

My customers send me documents that I convert to images. I don’t know which fonts are used. Can you please advise which fonts I should install on my machine? Or, from your experience, do you know of a good general list of common fonts to install?
Thank you
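
For what it's worth, the fonts a DOCX declares can usually be read from the file itself: a .docx is a ZIP package, and the `word/fontTable.xml` part lists font entries as `<w:font w:name="...">`. The sketch below uses only the JDK. It assumes the conventional `w:` namespace prefix, does not decode XML entities in names, and ignores theme fonts. The font table may also list fonts that the visible text never uses. The class name `DocxFonts` is made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxFonts {
    // Matches the w:name attribute of <w:font ...> entries in word/fontTable.xml
    private static final Pattern FONT_NAME =
            Pattern.compile("<w:font\\b[^>]*\\bw:name=\"([^\"]+)\"");

    /** Returns the font names declared in the DOCX font table, sorted. */
    static Set<String> declaredFonts(InputStream docx) throws IOException {
        Set<String> fonts = new TreeSet<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/fontTable.xml".equals(entry.getName())) {
                    continue;
                }
                // readAllBytes (Java 9+) reads to the end of the current ZIP entry
                String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                Matcher m = FONT_NAME.matcher(xml);
                while (m.find()) {
                    fonts.add(m.group(1));
                }
            }
        }
        return fonts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DocxFonts path/to/file.docx
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            declaredFonts(in).forEach(System.out::println);
        }
    }
}
```

Running this over incoming documents before conversion would give a list of font names to compare against what is installed on the conversion host.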

atir.tahir · February 21, 2022, 4:49pm

@nirm

You can try these top/frequently used fonts or, to be on the safe side, the fonts from this PDF.pdf (910.5 KB).