
Free Support Forum - groupdocs.com

File conversion is a bit off

Hi,
The following file, Amasia2 Inv 06-28-2021.docx (180.1 KB), is converting badly.
On the first page the table content is broken up; I attached a screenshot comparing the original page with the converted page (bad conversion.png (81.8 KB)). I’m converting a DOCX file and creating one PNG per page.

Another problem: the 4th page isn’t being converted, although it does contain a few line breaks / whitespace (no actual text).

Can you please help?
I’m using paid version 21.10.1.

Here’s the conversion code:

import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.filetypes.FileType;
import com.groupdocs.conversion.options.convert.ImageConvertOptions;

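import java.io.InputStream;
// Note: the imports for Logger, FileFormat and ByteBufferOutputStream were not included in the post.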

public class FileConvertor2 {

private static final Logger logger = Logger.getLogger(FileConvertor2.class);

public static int convert(InputStream inputStream, FileFormat inputFormat, FileFormat outputFormat) throws Exception {
    try {
        final Converter converter = new Converter(inputStream);
        final ImageConvertOptions options = new ImageConvertOptions();
        tryToEnhanceQuality(options, inputFormat);
        options.setFormat(FileType.fromExtension(outputFormat.getExtension()));
        options.setPagesCount(1); // currently we convert just 1 page separately
        final int totalDocumentPages = converter.getDocumentInfo().getPagesCount();
        for (int i = 1; i <= totalDocumentPages; i++) {
            try (ByteBufferOutputStream outputStream = new ByteBufferOutputStream()) {
                options.setPageNumber(i); // The page index to be converted
                converter.convert(outputStream, options);
                // Save the output stream to an external source....
            } catch (Exception e) {
                logger.error(e, "Failed to convert page #%s to %s", i, options.getFormat().getExtension());
                throw e;
            }
        }
        return totalDocumentPages;
    } catch (Exception e) {
        logger.error(e, "Failed to convert file to %s", outputFormat.getExtension());
        throw e;
    }
}

private static void tryToEnhanceQuality(ImageConvertOptions options, FileFormat inputFormat) {
    int resolutionDPIVal = -1;
    // Approximate print-size of PDF: width 8.27 X  height 11.69 INCH
    switch (inputFormat) {
        case DOC:
        case DOCX:
        case JPG:
        case JPEG:
            // Why 300?
            // Because the maximum resolution we aim is 2200 X 3000 which is about 220-300 DPI.
            // Also, the default DPI value of a Microsoft Word doc is 220 DPI (Windows 10) and we want to maintain/enhance that quality when converting to image
            resolutionDPIVal = 300;
            break;
        default:
            break;
    }
    if (resolutionDPIVal > 0) {
        options.setHorizontalResolution(resolutionDPIVal);
        options.setVerticalResolution(resolutionDPIVal);
    }
}

}
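
As a side check on the DPI comment in tryToEnhanceQuality above, the relation pixels = inches × DPI can be worked through for the page size quoted there (8.27 × 11.69 inches). This is only an illustrative sketch; the class and method names (DpiCheck, pixelsAt, dpiFor) are hypothetical and not part of GroupDocs.

```java
public class DpiCheck {

    /** Pixels along one edge when rendering the given length at the given DPI, rounded to the nearest pixel. */
    public static long pixelsAt(double inches, int dpi) {
        return Math.round(inches * dpi);
    }

    /** DPI needed to fit the given number of pixels along an edge of the given length. */
    public static double dpiFor(int pixels, double inches) {
        return pixels / inches;
    }

    public static void main(String[] args) {
        // Page size taken from the comment in the posted code.
        double widthIn = 8.27;
        double heightIn = 11.69;

        System.out.printf("300 DPI -> %d x %d px%n",
                pixelsAt(widthIn, 300), pixelsAt(heightIn, 300));
        System.out.printf("2200 px wide needs %.1f DPI%n", dpiFor(2200, widthIn));
        System.out.printf("3000 px high needs %.1f DPI%n", dpiFor(3000, heightIn));
    }
}
```

At 300 DPI the page comes out at roughly 2481 × 3507 px, which is larger than the 2200 × 3000 target; that target corresponds to about 266 DPI horizontally and 257 DPI vertically.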

Also, can you please provide code for converting multiple pages at once, and not 1 by 1?
Thank you very much,
Nir


@nirm

We cannot reproduce this issue at our end. Have a look at this output.zip (182.5 KB).
Please share OS details (e.g. name/version) and a working console application with which the issue can be reproduced.

This issue is successfully reproduced. Hence, we’ve logged it in our internal issue tracking system with ID CONVERSIONJAVA-1563. You’ll be notified in case of any update.

We are also investigating the possibility to convert multi-paged file(s) to multiple images (e.g. PNG) at once. Your investigation ticket ID is CONVERSIONJAVA-1564.

The issue was reproduced in 2 cases:
OS: Windows 10 Pro, Java 11, JDK Amazon Corretto 11.0.11.
OS (running on AWS): Amazon FARGATE so it’s Linux/Unix, Amazon Linux 2.0.20220121

Am I doing something wrong in my convert code?
How is it possible that I’m getting different results from you?

Please keep me posted, thank you.

@nirm

We still cannot reproduce the issue using the following environment:
OS: Windows 11
JAVA: openjdk version 11.0.14.1
Runtime environment: Corretto-11.0.14.10.1

Please share a sample application and a list of installed fonts on your machine. Could you please also share a screencast/video of all the steps used to reproduce the issue?

I don’t know what fonts are installed.
I added a small java file which is my conversion application: FileConvertor.7z (1.2 KB)
Please try to reproduce the “bug” on a Linux machine and not Windows using my java class and input file.
Input file again: Amasia2 Inv 06-28-2021.docx (180.1 KB)
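
One way to produce the font list requested above is to scan the usual font directories for font files. This is a minimal sketch, not GroupDocs functionality: the class name FontLister, the method listFontFiles, and the directory list (common Linux locations) are assumptions, and a library may look for fonts in other places.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FontLister {

    /** Returns the sorted names of .ttf/.otf/.ttc files found under dir, searched recursively. */
    public static List<String> listFontFiles(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList(); // directory does not exist: no fonts there
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(FontLister::isFontFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Case-insensitive check on the file extension. */
    private static boolean isFontFile(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".ttf") || lower.endsWith(".otf") || lower.endsWith(".ttc");
    }

    public static void main(String[] args) throws IOException {
        // Assumed locations; adjust for your OS or container image.
        String[] dirs = {
            "/usr/share/fonts",
            "/usr/local/share/fonts",
            System.getProperty("user.home") + "/.fonts"
        };
        for (String d : dirs) {
            System.out.println(d + ": " + listFontFiles(Paths.get(d)));
        }
    }
}
```

Running main on the machine or container that performs the conversion prints the font files found in each directory; an empty list on a minimal Linux image would be consistent with fonts missing there.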


@nirm

Thank you for sharing the details. We successfully reproduced this issue on Linux. We are now further investigating it. Your investigation ticket ID is CONVERSIONJAVA-1566.

@nirm

The only solution to this issue is to install the standard Windows and Linux fonts. We cannot reproduce this issue if all the standard fonts are installed on Linux.

I wonder what the default behaviour of your library is.
If you don’t find/recognize the specific font of the input file, should it take the DEFAULT font or the font that is most similar to the original font?
Why is there an indentation? Unwanted whitespace, line breaks, tabs?

@nirm

If the actual font is not installed and you don’t provide/load a default font, the API uses Arial as the default font.
However, we’ll continue investigating to see whether the above-mentioned scenario is achievable.
All those alignment issues (line breaks, tabs) are caused by the wrong font being used.