Unexpected behavior when converting to PDF

In recent testing, I observed several results that were not what I expected when attempting to convert various plain text files to PDF. I used the latest version of GroupDocs.Conversion (24.6) in a Spring Boot 3.3.1 application.

Sending conversion output to a stream does not improve memory usage

Certain large files require an exceptional amount of memory to convert. For example, converting one 161 MB plain text file to PDF required more than 11 GB of heap memory at times. If I tried to reduce the maximum heap size, the application would run out of heap. (It was not handling any other requests at the time.)

In an attempt to reduce memory usage when converting large files, I tried using an async handler with StreamingResponseBody:

    public class ConverterListener implements IConverterListener {
        private final Clock systemClock = Clock.systemDefaultZone();

        public void started() {
            log.info("Conversion started...");
        }

        public void progress(byte current) {
            var output = "... " + current + "% at " + systemClock.instant();
            log.info(output);
        }

        public void completed() {
            log.info("... conversion completed");
        }
    }

    @PostMapping("/convert/pdf")
    public ResponseEntity<StreamingResponseBody> convertToPdf(@RequestParam("file") MultipartFile file) {
        var listener = new ConverterListener();
        var settings = new ConverterSettings();
        settings.setListener(listener);

        try (var converter = new Converter(() -> {
            try {
                return file.getInputStream();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }, () -> settings)) {
            PdfConvertOptions options = new PdfConvertOptions();

            StreamingResponseBody responseBody = outputStream -> {
                converter.convert(() -> outputStream, options);
            };

            return ResponseEntity.ok().body(responseBody);
        } catch (Exception e){
            throw new GroupDocsConversionException(e.getMessage());
        }
    }

I expected that GroupDocs.Conversion would write to the stream as it converted and use much less memory. However, the memory usage when writing to the stream was no different than when I sent the output to a file on the local file system. Is this the intended behavior?

Sending conversion output to a stream does not truly stream the output

This is related to the previous observations. When performing the same conversion, I noticed that nothing was written to the HTTP response body until the file conversion had finished. This is not what I would have expected when writing to a stream. Is this the intended behavior?

Different plain text files of similar size can require different amounts of memory to convert

I found that two plain text files of almost the same size but with different contents required remarkably different amounts of memory to convert. I mentioned above the 161 MB file that required over 11 GB of heap at times. Another 164 MB file barely required 5.5 GB except for a burst at the end.

Is this the intended behavior? It would make sense to me if different file types with contents that varied widely in complexity might require different amounts of memory to convert, but I was very surprised to observe such a disparity when converting two plain text files.

Aside from the above question, why does converting these files require even several GB of heap memory? I would have guessed that, regardless of how the output is written, GroupDocs.Conversion might periodically write what has already been converted and then free the memory used to convert it. It does not appear that that always happens, though.

Conversion events are not received until conversion completes

In the code snippet above, I followed GroupDocs’ instructions for listening to events in order to monitor and report conversion progress. However, whether I wrote the output to a local file or to a StreamingResponseBody, none of the events were sent while the file was being converted. All of the events were sent after conversion completed. The time stamps in the lines written by ConverterListener#progress all were within a fraction of a second of each other. This is not the behavior that the documentation I linked above appears to describe.

Conclusion

Please respond at your earliest convenience with answers to my questions and any guidance needed to resolve the issues I observed. Thank you for your time and attention.

@bhoughton
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CONVERSIONJAVA-2437

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

I’m having the same problems with the ConversionListener to get events as mentioned above.
The ConversionListener is still not sending events properly in GroupDocs Conversion 25.9.
As can be seen in the logging below, after the pipeline has been selected the conversion takes 17 seconds to perform, and then in 0 ms all percentage events are sent directly after eachother.
In this way it is completely useless. Can you please fix this.

18-12-25 11:32:34.504 [  1] Info webprint: ProcJob[254]: Starting conversion process...
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: Determine loaded for source document _lms_2004003_x_x_6227105744396446424_DOCX - very large - tables - links - indexes.docx ...
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: ... loader selected.
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: Loading source document _lms_2004003_x_x_6227105744396446424_DOCX - very large - tables - links - indexes.docx ...
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: ... document loaded.
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: Main document will be converted.
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: Starting conversion of _lms_2004003_x_x_6227105744396446424_DOCX - very large - tables - links - indexes.docx...
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]:  ... pipeline selected ... 
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 6% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 13% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 20% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 26% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 33% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 40% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 46% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 53% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 60% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 66% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 73% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 80% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 86% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 93% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 100% completed...
18-12-25 11:32:51.355 [  1] Debug webprint: ProcJob[254]:  ... conversion completed.
18-12-25 11:32:52.286 [  1] Info webprint: ProcJob[254]: Conversion done

Hello @ravitsec ,

Thank you for contacting us. We have rechecked the behavior of our IConverterListener using our GitHub example and noticed that it does not fire an event only for the Started() method, while the other methods receive events as expected according to the logic.

For the issue with the Started() method, we have created a corresponding ticket with the ID CONVERSIONJAVA-3050 and linked it to this forum thread.

If your situation is fundamentally different, please share the code sample you are using, as it may help clarify the issue.

Yes, I noticed that too that the start event wasn’t fired.
But that is not the problem I’m experiencing.
For me (and also for the orginal creator of this thread) it is about WHEN the progress events are fired.

I use the following code:

ConversionProgress convprogress = new ConversionProgress(logwebprint, sJobID);
ConversionLogger convlogger = new ConversionLogger(logwebprint, sJobID);
ConverterSettings convsettings = new ConverterSettings();
convsettings.setLogger(convlogger); //define logging mechanism to log progress of the conversion
convsettings.setListener(convprogress); //define listener to log progress of the conversion

Converter converter = new Converter(nonpdffile.getAbsolutePath(), () -> convsettings);

PdfConvertOptions options = new PdfConvertOptions();

logwebprint.log(LogHandle.s_INFO, "ProcJob[" + sJobID + "]: Starting conversion process...");
converter.convert(convertpdffile.getAbsolutePath(), options);

And the class for keeping track of the progress:

public class ConversionProgress implements IConverterListener
{
	private LogHandle m_logwebprint;
	private String m_sJobID = "";
	
	public ConversionProgress(LogHandle logwebprint, String sJobID)
	{
		m_logwebprint = logwebprint;
		m_sJobID = sJobID;
	}
	
	@Override
    public void started() {
    }

    @Override
    public void progress(byte current) {
        m_logwebprint.log(LogHandle.s_INFO, "ProcJob[" + m_sJobID + "]: Conversion " + current + "% completed...");
    }

    @Override
    public void completed() {
		m_logwebprint.log(LogHandle.s_INFO, "ProcJob[" + m_sJobID + "]: Conversion done");
    }
}

The resulting output when I do a conversion is then:

18-12-25 11:32:34.504 [  1] Info webprint: ProcJob[254]: Starting conversion process...
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: Determine loaded for source document _lms_2004003_x_x_6227105744396446424_DOCX - very large - tables - links - indexes.docx ...
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: ... loader selected.
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: Loading source document _lms_2004003_x_x_6227105744396446424_DOCX - very large - tables - links - indexes.docx ...
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: ... document loaded.
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: Main document will be converted.
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]: Starting conversion of _lms_2004003_x_x_6227105744396446424_DOCX - very large - tables - links - indexes.docx...
18-12-25 11:32:34.599 [  1] Debug webprint: ProcJob[254]:  ... pipeline selected ... 
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 6% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 13% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 20% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 26% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 33% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 40% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 46% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 53% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 60% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 66% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 73% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 80% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 86% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 93% completed...
18-12-25 11:32:51.355 [  1] Info webprint: ProcJob[254]: Conversion 100% completed...
18-12-25 11:32:51.355 [  1] Debug webprint: ProcJob[254]:  ... conversion completed.
18-12-25 11:32:52.286 [  1] Info webprint: ProcJob[254]: Conversion done

The calls to the progress function all come at the same time. All within the same millisecond. And that while the conversion takes 17 seconds in total.
Also strange is that whatever document I use to convert it always results in 6, 13, 20, 26, 33, 40, 46, 53, 60, 66, 73, 80, 93, 100.
Each time all within the same millisecond fired.

@ravitsec ,

I understand your interest in these events, and I will need some time to investigate this matter. I will get back to you as soon as I receive the necessary information from our developers.

It is possible that we may need to revisit the logic of this functionality. In the meantime, could you please share your thoughts on what kind of behavior or features you would like to see in IConverterListener? This would allow me to pass your feedback on to our development team.

We always strive to listen to our users’ suggestions to improve our products, so we will be looking forward to your feedback.

The main purpose that I would want to use those progress events for, is to let the end user know what is happening to his uploaded document.
I would like to show a progress bar to show how far along the conversion process is.

That listener that we have now seemed to be perfect for this, but with this issue of getting all % values in the same millisecond, it is not usable.
The user would currently see it as being 0% for 17 seconds, before it jumps straight to 100%.

So it would help a lot if you would be able to have those % events sent really during the converting of the document, and not after its done like it seems to be doing now.

@ravitsec ,

Please allow me to quote the response from our development team verbatim.

Response:

In general the progress is based on the completed operations. For each conversion we have a list of operations to be done. 
The progress is based on which operations are passed and it is calculated as percentage.  
The operations we are going to really execute depend on the applied LoadOptions and ConvertOptions, so not all operations are really executed. 
For instance if we have to image conversion - we have options to resize, apply watermark, etc, so they are in the list which builds the conversion chain, but if we do not have the corresponding options set, those operations are exiting very quickly. 
Most likely that is the reason for those milliseconds.

As you can see, everything mainly depends on the selected LoadOptions and ConvertOptions. In your case, I believe the 17-second delay is caused by the conversion process itself (i.e., it is treated as a single operation), while the other operations are executed instantly because they are not actually involved, although they are still included in the overall percentage calculation.

In some conversion scenarios, we use operations such as intermediate conversion, and we may also re-save the document after applying certain ConvertOptions, which can potentially introduce delays in those operations as well. This behavior is then reflected in the corresponding events.

If you have any further questions, please feel free to ask.