Memory Usage when Converting a DOCX file in .NET

Hi,

I’m finding that a simple console app using GroupDocs.Conversion to convert a 4-page Word document uses 300MB-700MB of memory (Private Working Set) on a Windows 2012 Standard machine with the latest updates.

When I use Office Automation, WinWord.exe and the controlling application combined use not much more than 40MB for the same conversion. Office Automation is what we’re looking to replace, so the large difference in memory usage is getting some pushback.

I’ve tried setting UseCache to false but it seemed to have no effect. In our case, we wouldn’t want any files cached as the source documents are usually unique.

I also tried updating to 18.1.0 because the release notes mention LocalCacheDataHandler, but I can’t find any documentation that references this class, so I’m not sure it would give me any additional control over turning off caching. At this point, I don’t know that caching is the problem; I was just looking for something to give me a little more insight into what could be holding on to the memory.

Below is the code I’m using. In my default test it is executed 3 times for the same document. I’m hoping there’s something I can do to reduce memory usage, at the very least once conversion is complete and/or between conversions.

Any help you could provide would be appreciated.
Thanks
-Jonathan

 // Create the conversion handler with a default configuration.
 var config = new ConversionConfig();
 var conv = new ConversionHandler(config);

 // Output settings: grayscale TIFF with CCITT Group 4 compression.
 var imageSaveOptions = new ImageSaveOptions
 {
    ConvertFileType = ImageSaveOptions.ImageFileType.Tiff,
    Grayscale = true,
    TiffOptions =
    {
       Compression = TiffOptions.TiffCompression.Ccitt4
    },
    HorizontalResolution = 203,
    VerticalResolution = 192,
 };

 var loadOptions = new LoadOptions();

 // Stacked using statements dispose the result and both streams in reverse order.
 using (var source = File.OpenRead(parameters.Source.File))
 using (var destination = File.OpenWrite(parameters.Target.File))
 using (var result = conv.Convert(source, loadOptions, imageSaveOptions))
 {
    result.Save(destination);
 }

@jisabell,

Thank you for your inquiry.

Caching reduces memory usage and conversion time only when you convert the same (cached) file again with caching enabled (UseCache = true). Since every document you convert is unique, you are right that caching is not the problem.
However, to investigate the issue at our end, we need the problematic file(s). Please share them with us and we will update you on the outcome.

Basic.zip (79.5 KB)

@atirtahir3,
Thanks for your response. Please find a zip file containing Basic.docx attached. It is the file I did most of my experimentation with while monitoring memory.
-Jonathan

@jisabell,

Thanks for sharing the problematic file. We reproduced this scenario at our end as well: the conversion process reaches 400MB-600MB of memory usage. Please note that two documents of the same file size may not need the same time or memory to convert, because it also depends on the document’s content. A document with more clip art, graphs, charts, or images may take more memory and time to convert than a simple text-based document.
However, we’ve logged this behavior for investigation in our internal issue tracking system with ID CONVERSIONNET-2375. As soon as we have an update, we’ll notify you.

Hi, again.

I just wanted to add that, upon further testing today, if I apply sufficient memory pressure or call GC.Collect, the GC does seem to recover a lot of this memory. I’m not sure this was actually an issue with this module, unless it happens to have been corrected in 18.1.
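
For reference, here is a minimal sketch of one way to check this kind of recovery: take a memory snapshot, force a full collection, and take another. It uses only standard .NET calls (GC.Collect, GC.GetTotalMemory, Process.PrivateMemorySize64). The MemorySnapshot and MemoryProbe names and the AllocateGarbage stand-in are hypothetical and not part of GroupDocs.Conversion; in a real test the conversion call would take the place of AllocateGarbage. Note that PrivateMemorySize64 reports private bytes, which is related to but not the same figure as the Private Working Set shown in Task Manager.

```csharp
using System;
using System.Diagnostics;

// Hypothetical holder for the two figures we compare.
public sealed class MemorySnapshot
{
    public long ManagedBytes { get; set; }
    public long PrivateBytes { get; set; }
}

public static class MemoryProbe
{
    // Optionally forces a full, blocking collection, then reads the managed
    // heap size and the process's private bytes.
    public static MemorySnapshot Take(bool forceCollect)
    {
        if (forceCollect)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers(); // let finalizers release what they hold
            GC.Collect();                  // reclaim objects freed by finalizers
        }

        using (var process = Process.GetCurrentProcess())
        {
            return new MemorySnapshot
            {
                ManagedBytes = GC.GetTotalMemory(false),
                PrivateBytes = process.PrivateMemorySize64
            };
        }
    }

    // Hypothetical stand-in for a conversion: allocates short-lived buffers.
    private static void AllocateGarbage()
    {
        for (int i = 0; i < 100; i++)
        {
            var buffer = new byte[1024 * 1024];
            buffer[0] = 1;
        }
    }

    public static void Main()
    {
        AllocateGarbage();
        MemorySnapshot before = Take(false);
        MemorySnapshot after = Take(true);
        Console.WriteLine("Managed heap:  {0:N0} -> {1:N0} bytes",
            before.ManagedBytes, after.ManagedBytes);
        Console.WriteLine("Private bytes: {0:N0} -> {1:N0} bytes",
            before.PrivateBytes, after.PrivateBytes);
    }
}
```

If the managed heap figure drops sharply after the forced collection, the memory was most likely garbage awaiting collection rather than something still being held.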

Thanks
-Jonathan

@jisabell,

Thanks for sharing your findings. So, is your issue resolved by forcing garbage collection?

@atirtahir3,

Well, if there’s something that could be done about the total memory usage while converting, that would be great. It seems that, as the file gets bigger, more memory is used. For now, I’m hoping that the work is really being done per page, and that .NET is just slow to free the memory without sufficient memory pressure. It’s tough for me to make that assertion from the outside.

Maybe I could throw a GC.Collect in the progress callback to see. I haven’t tried the progress callbacks and I’m not at a machine that could try it right now. I will try to do this Monday.
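
A rough sketch of that experiment, under stated assumptions: ConvertPages below is a hypothetical stand-in that simulates per-page work and invokes a callback after each page. It is not the GroupDocs.Conversion progress API, whose actual callback signature is not shown in this thread. The callback forces a collection and records the managed heap size, so the per-page figures can be compared.

```csharp
using System;

public static class PerPageCollectSketch
{
    // Simulated per-page work: the buffer becomes garbage once this returns.
    private static void SimulatePageWork()
    {
        var pageBuffer = new byte[8 * 1024 * 1024];
        pageBuffer[0] = 1;
    }

    // Hypothetical converter that reports progress after each page.
    public static void ConvertPages(int pageCount, Action<int> onPageDone)
    {
        for (int page = 1; page <= pageCount; page++)
        {
            SimulatePageWork();
            onPageDone(page);
        }
    }

    // Records the managed heap size after a forced collection on every page.
    public static long[] Run(int pageCount)
    {
        var managedAfterPage = new long[pageCount];
        ConvertPages(pageCount, page =>
        {
            // Passing true makes GetTotalMemory collect before measuring.
            managedAfterPage[page - 1] = GC.GetTotalMemory(true);
        });
        return managedAfterPage;
    }

    public static void Main()
    {
        long[] sizes = Run(4);
        for (int i = 0; i < sizes.Length; i++)
        {
            Console.WriteLine("After page {0}: {1:N0} managed bytes", i + 1, sizes[i]);
        }
    }
}
```

If the recorded size stays roughly flat from page to page, that would suggest the per-page memory is collectable; steady growth would suggest something is still holding references.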

Thanks
-Jonathan

@jisabell,

We are still investigating this issue. As soon as we have a workaround, we will notify you.
Thanks

@jisabell,

We are working on some improvements that will reduce memory consumption. They will be announced in upcoming version(s) of the API. As soon as we have any further updates, we will notify you.

Thanks for the update.
-Jonathan

@jisabell,

You are welcome.