Converting the attached file to HTML with GroupDocs.Viewer or to PDF with GroupDocs.Conversion takes about half an hour to complete. Version 21.2 of both products was used for testing.
The resulting HTML/PDF file looks fine; the only problem is the long conversion time for such a small file.
It's probably related to the embedded images, which cannot be loaded.
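Viewer:
// ViewOpts is presumably a namespace alias: using ViewOpts = GroupDocs.Viewer.Options;
using (var viewer = new Viewer(documentPath))
{
    // "{0}" in the file name is replaced with the page number.
    var options = ViewOpts.HtmlViewOptions.ForEmbeddedResources("output_viewer{0}.html");
    viewer.View(options);
}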
Conversion:
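// ConvOpts is presumably a namespace alias: using ConvOpts = GroupDocs.Conversion.Options.Convert;
using (var converter = new Converter(documentPath))
{
    var options = new ConvOpts.PdfConvertOptions();
    converter.Convert("output.pdf", options);
}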
We were able to reproduce this issue at our end. It has been logged in our internal issue tracking system with ID VIEWERNET-3129. As soon as there is an update, you'll be notified.
I have investigated this file: it contains many invalid links. The files behind these links exist, but they are served with the wrong content type.
For example: https://www.nespresso.com/emailing/NespressoVisuals/common/temp_2020/logo-light.jpg - the resource exists, but it is not a JPEG; it is actually WEBP. Neither Outlook nor Viewer can handle it correctly, because the server www.nespresso.com returns the wrong content type (JPEG, although the data is WEBP).
We have fixed the long rendering time (we will provide code for you), and the fix will be included in the current release (21.3). But because of these invalid links, the linked resources will not be visible (just as they are not visible in Outlook).
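The JPEG/WEBP mismatch described above can be detected by comparing the declared content type with the file signature at the start of the payload. The following is only a minimal sketch of that idea, not how Viewer does it internally; `ImageSniffer`, `SniffMimeType` and `MatchesDeclaredType` are hypothetical names chosen for illustration, and only three signatures are checked.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of GroupDocs.Viewer.
public static class ImageSniffer
{
    // Guesses the MIME type from the leading bytes of the payload.
    // Unknown signatures map to "application/octet-stream".
    public static string SniffMimeType(byte[] data)
    {
        if (data == null) return "application/octet-stream";

        // JPEG streams start with FF D8 FF.
        if (data.Length >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF)
            return "image/jpeg";

        // PNG streams start with 89 'P' 'N' 'G'.
        if (data.Length >= 4 && data[0] == 0x89 && data[1] == 0x50
            && data[2] == 0x4E && data[3] == 0x47)
            return "image/png";

        // WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
        if (data.Length >= 12
            && Encoding.ASCII.GetString(data, 0, 4) == "RIFF"
            && Encoding.ASCII.GetString(data, 8, 4) == "WEBP")
            return "image/webp";

        return "application/octet-stream";
    }

    // True when the sniffed type equals the type the server declared.
    // The declared value is expected without parameters, e.g. "image/jpeg".
    public static bool MatchesDeclaredType(byte[] data, string declaredMimeType)
    {
        return string.Equals(SniffMimeType(data), declaredMimeType,
            StringComparison.OrdinalIgnoreCase);
    }
}
```

For a resource like the logo in the example, `MatchesDeclaredType(bytes, "image/jpeg")` would return false, because the bytes carry the RIFF/WEBP signature.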
Thanks for the quick and detailed answer!
I understand that the images cannot be viewed because of the wrong content type.
As long as the conversion time is fixed in 21.3, I am happy.
Thank you
Hi, I have a small follow-up question regarding this resource loading timeout:
Will this timeout only affect the time it takes to establish a connection, or would this timeout also terminate existing, but slow connections?
I’m wondering if it can be safely set to 1 second, or if this might interrupt slowly loading images?
The conversion time is still rather slow for this document when I set the timeout to 10 seconds, so I was wondering if it might be possible, as a future improvement, to retrieve multiple images at once?
This specific file has 24 images that all can’t be loaded. With the 10 seconds timeout, it took over 4 minutes to render, which strongly suggests that images are currently not loaded in parallel.
I believe it applies to both cases, as we're using WebClient under the hood and it is supposed to drop any connection regardless of the connection's state.
Yes, you're totally right. The resources are loaded sequentially, so the total loading time is the sum of the time taken to load each image. We've created an issue in our bug tracker to investigate whether we can add such a feature. The issue ID is VIEWERNET-3353. We'll notify you in case of any updates.
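With 24 unreachable images and a 10-second timeout, sequential loading costs up to 24 × 10 s = 240 s, which matches the roughly 4 minutes observed. The sketch below illustrates what concurrent loading with a per-resource deadline could look like. It is an assumption for illustration only, not the Viewer implementation: `ParallelResourceLoader`, `LoadAllAsync`, `HttpFetchAsync` and the `maxConcurrency` parameter are hypothetical, and it uses `HttpClient` rather than the `WebClient` mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of GroupDocs.Viewer.
public static class ParallelResourceLoader
{
    private static readonly HttpClient Http = new HttpClient();

    // Default fetcher. GetAsync with the default completion option returns
    // only after the body is buffered, so the token limits the whole
    // download, not just connection setup.
    public static async Task<byte[]> HttpFetchAsync(string url, CancellationToken token)
    {
        using (HttpResponseMessage response = await Http.GetAsync(url, token))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsByteArrayAsync();
        }
    }

    // Loads all distinct URLs concurrently, at most maxConcurrency at a time.
    // Resources that fail or exceed the deadline map to null.
    public static async Task<Dictionary<string, byte[]>> LoadAllAsync(
        IEnumerable<string> urls,
        TimeSpan perResourceTimeout,
        int maxConcurrency,
        Func<string, CancellationToken, Task<byte[]>> fetch)
    {
        List<string> distinct = urls.Distinct().ToList();
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            byte[][] payloads = await Task.WhenAll(
                distinct.Select(url => LoadOneAsync(url, perResourceTimeout, gate, fetch)));

            var result = new Dictionary<string, byte[]>();
            for (int i = 0; i < distinct.Count; i++)
                result[distinct[i]] = payloads[i];
            return result;
        }
    }

    private static async Task<byte[]> LoadOneAsync(
        string url,
        TimeSpan timeout,
        SemaphoreSlim gate,
        Func<string, CancellationToken, Task<byte[]>> fetch)
    {
        await gate.WaitAsync();
        try
        {
            // The deadline starts once the request is allowed to run.
            using (var cts = new CancellationTokenSource(timeout))
            {
                return await fetch(url, cts.Token);
            }
        }
        catch (OperationCanceledException) { return null; } // deadline hit
        catch (HttpRequestException) { return null; }       // network or HTTP error
        finally
        {
            gate.Release();
        }
    }
}
```

A call such as `await ParallelResourceLoader.LoadAllAsync(urls, TimeSpan.FromSeconds(10), 6, ParallelResourceLoader.HttpFetchAsync)` would bound the worst case for 24 unreachable images at about 4 batches × 10 s = 40 s instead of 240 s, while the concurrency cap keeps the number of simultaneous requests modest.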
Thanks for the update
Even when building the page sequentially, loading images in parallel should not be a problem.
I'm not certain that loading resources in parallel would really be considered a DDoS attack. After all, browsers load images in parallel as well.
The proposed solution sounds good as well.
For the proposed improvement "Prevent loading resources if the host is unavailable", I have created a new entry, VIEWERNET-3598, in our tracker. I will reply here in case of any updates.
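One way to picture the "Prevent loading resources if the host is unavailable" idea is a small per-document record of hosts that have already failed, so that later resources on the same host are skipped instead of each waiting for the full timeout. This is purely an illustrative assumption about how such a feature might work; `UnavailableHostFilter`, `ShouldSkip` and `MarkUnavailable` are hypothetical names and do not describe the actual VIEWERNET-3598 implementation.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, not part of GroupDocs.Viewer.
public sealed class UnavailableHostFilter
{
    // Thread-safe set of host names that already failed (value is unused).
    private readonly ConcurrentDictionary<string, byte> _deadHosts =
        new ConcurrentDictionary<string, byte>(StringComparer.OrdinalIgnoreCase);

    // True when an earlier request to the same host was marked as failed.
    public bool ShouldSkip(string url)
    {
        Uri uri;
        return Uri.TryCreate(url, UriKind.Absolute, out uri)
            && _deadHosts.ContainsKey(uri.Host);
    }

    // Records the host of a URL whose request timed out or could not connect.
    public void MarkUnavailable(string url)
    {
        Uri uri;
        if (Uri.TryCreate(url, UriKind.Absolute, out uri))
            _deadHosts.TryAdd(uri.Host, 0);
    }
}
```

A loader would call `ShouldSkip` before each request and `MarkUnavailable` after a timeout. With all 24 images on one unreachable host, only the first request would then pay the timeout.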