Unexpected behavior when converting to PDF

bhoughton · July 18, 2024, 5:58pm

In recent testing, I observed several unexpected results when converting plain text files to PDF. I used the latest version of GroupDocs.Conversion (24.6) in a Spring Boot 3.3.1 application.

Sending conversion output to a stream does not improve memory usage

Certain large files require an exceptional amount of memory to convert. For example, converting one 161 MB plain text file to PDF required more than 11 GB of heap memory at times. If I tried to reduce the maximum heap size, the application would run out of heap. (It was not handling any other requests at the time.)
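For anyone trying to reproduce heap figures like these, the following is a minimal sketch of one way to capture peak heap usage around a block of work. HeapPeakProbe is a hypothetical helper, not part of GroupDocs.Conversion, and it is not necessarily how the figures above were gathered. It uses only the JDK's java.lang.management API. Because the per-pool peaks are summed, the result is an upper bound: the pools may not all peak at the same moment.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: reports peak heap usage observed while a task runs. */
final class HeapPeakProbe {

    /** Resets the peak counters of all heap pools so earlier activity is not counted. */
    static void reset() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                pool.resetPeakUsage();
            }
        }
    }

    /** Sums the peak usage of every heap pool since the last reset, in bytes. */
    static long peakBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage peak = pool.getPeakUsage(); // null if the pool is no longer valid
            if (peak != null) {
                total += peak.getUsed();
            }
        }
        return total;
    }

    /** Runs the task and returns the summed heap-pool peak recorded while it ran. */
    static long measure(Runnable task) {
        reset();
        task.run();
        return peakBytes();
    }
}
```

In the handler, the conversion call would be the task passed to HeapPeakProbe.measure, with the returned value logged in MB or GB.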

In an attempt to reduce memory usage when converting large files, I tried using an async handler with StreamingResponseBody:

    public class ConverterListener implements IConverterListener {
        private final Clock systemClock = Clock.systemDefaultZone();

        public void started() {
            log.info("Conversion started...");
        }

        public void progress(byte current) {
            var output = "... " + current + "% at " + systemClock.instant();
            log.info(output);
        }

        public void completed() {
            log.info("... conversion completed");
        }
    }

    @PostMapping("/convert/pdf")
    public ResponseEntity<StreamingResponseBody> convertToPdf(@RequestParam("file") MultipartFile file) {
        var listener = new ConverterListener();
        var settings = new ConverterSettings();
        settings.setListener(listener);

        try (var converter = new Converter(() -> {
            try {
                return file.getInputStream();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }, () -> settings)) {
            PdfConvertOptions options = new PdfConvertOptions();

            StreamingResponseBody responseBody = outputStream -> {
                converter.convert(() -> outputStream, options);
            };

            return ResponseEntity.ok().body(responseBody);
        } catch (Exception e){
            throw new GroupDocsConversionException(e.getMessage());
        }
    }

I expected that GroupDocs.Conversion would write to the stream as it converted and therefore use much less memory. However, memory usage when writing to the stream was no different from when I sent the output to a file on the local file system. Is this the intended behavior?

Sending conversion output to a stream does not truly stream the output

This is related to the previous observations. When performing the same conversion, I noticed that nothing was written to the HTTP response body until the file conversion had finished. This is not what I would have expected when writing to a stream. Is this the intended behavior?
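One way to check when output actually reaches the stream is to wrap it in a small probe that records the time of the first write and the number of bytes passed through. The sketch below is only an illustration: WriteTimingOutputStream is a hypothetical class name, and it relies solely on java.io and java.time.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.time.Instant;

/** Hypothetical wrapper: records when the first byte is written and counts bytes. */
final class WriteTimingOutputStream extends FilterOutputStream {
    private volatile Instant firstWrite; // stays null until something is written
    private long bytesWritten;

    WriteTimingOutputStream(OutputStream delegate) {
        super(delegate);
    }

    @Override
    public void write(int b) throws IOException {
        mark(1);
        out.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        mark(len);
        // Delegate the whole range at once instead of FilterOutputStream's byte-by-byte default.
        out.write(buf, off, len);
    }

    private void mark(int count) {
        if (firstWrite == null && count > 0) {
            firstWrite = Instant.now();
        }
        bytesWritten += count;
    }

    Instant firstWrite() {
        return firstWrite;
    }

    long bytesWritten() {
        return bytesWritten;
    }
}
```

In the handler above, the supplier passed to converter.convert would return the probe wrapping outputStream rather than outputStream itself. Comparing firstWrite() with the time the conversion started shows whether any bytes were emitted before the conversion finished.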

Different plain text files of similar size can require different amounts of memory to convert

I found that two plain text files of almost the same size but with different contents required remarkably different amounts of memory to convert. I mentioned above the 161 MB file that required over 11 GB of heap at times. Another 164 MB file required only about 5.5 GB, apart from a burst at the end.

Is this the intended behavior? It would make sense to me if different file types with contents that varied widely in complexity might require different amounts of memory to convert, but I was very surprised to observe such a disparity when converting two plain text files.

Aside from the above question, why does converting these files require several GB of heap memory at all? I would have guessed that, regardless of how the output is written, GroupDocs.Conversion might periodically write what has already been converted and then free the memory used to convert it. That does not always appear to happen, though.

Conversion events are not received until conversion completes

In the code snippet above, I followed GroupDocs’ instructions for listening to events in order to monitor and report conversion progress. However, whether I wrote the output to a local file or to a StreamingResponseBody, none of the events were sent while the file was being converted. All of the events were sent after conversion completed. The timestamps in the lines written by ConverterListener#progress were all within a fraction of a second of each other. This is not the behavior that the documentation I linked above appears to describe.
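The clustering of timestamps can be quantified rather than read off the log. The following minimal sketch records each callback time and reports the span between the earliest and the latest. ProgressTimeline is a hypothetical helper that uses only java.time; it is not part of the GroupDocs API.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: tracks the time span covered by a series of progress callbacks. */
final class ProgressTimeline {
    private Instant earliest;
    private Instant latest;

    /** Records one callback time; arrival order does not matter. */
    synchronized void record(Instant at) {
        if (earliest == null || at.isBefore(earliest)) {
            earliest = at;
        }
        if (latest == null || at.isAfter(latest)) {
            latest = at;
        }
    }

    /** Returns the time between the earliest and latest callbacks, or zero if none were recorded. */
    synchronized Duration spread() {
        return earliest == null ? Duration.ZERO : Duration.between(earliest, latest);
    }
}
```

Calling record(systemClock.instant()) from ConverterListener#progress and logging spread() in completed() would give a single number to compare with the total conversion time. A spread of well under a second on a conversion that runs for minutes would match the behavior described above.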

Conclusion

Please respond at your earliest convenience with answers to my questions and any guidance needed to resolve the issues I observed. Thank you for your time and attention.

vsevolod.orefin · July 19, 2024, 10:04am

@bhoughton
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CONVERSIONJAVA-2437

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.