How to exclude downloading of hyperlinked images during document conversion in Java?

We need to convert some documents that contain hyperlink-based images, i.e. images that are downloaded at runtime or when the document is opened. We want to avoid these downloads during conversion because they take a lot of time and run into firewall restrictions.

Any suggestions?

@Suds

Could you please share more details on this scenario? Please share the sample conversion code as well.

Hi

Attached is the HTML file for your reference.

Here we want to exclude any src that connects to an external source.

Reasons: 1. The servers won't be exposed to the Internet. 2. We are converting from .msg to PDF.

For testing purposes I have shared an .html file similar to what might be sent as an email.

New Text Document.7z (388 Bytes)

Observation: It takes around 10 minutes to convert a 100 KB file because the converter tries to download external content. We want to disable these external sources.
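One possible workaround, independent of any converter option, is to blank out external image references before the file reaches the converter. The sketch below is only an illustration for HTML input like the attached test file: the class name `ExternalSrcStripper` and its method are hypothetical, it uses a regular expression rather than a real HTML parser, and applying it to a .msg file would first require extracting the HTML body, which is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of any conversion API. */
class ExternalSrcStripper {

    // Matches src="http://...", src='https://...' and protocol-relative src="//...".
    // Group 1 is the attribute prefix, group 2 is the quote character.
    private static final Pattern EXTERNAL_SRC = Pattern.compile(
            "(\\bsrc\\s*=\\s*)([\"'])\\s*(?:https?:)?//[^\"']*\\2",
            Pattern.CASE_INSENSITIVE);

    /** Replaces every external src value with an empty one; local and cid: sources are untouched. */
    static String stripExternalSources(String html) {
        Matcher matcher = EXTERNAL_SRC.matcher(html);
        return matcher.replaceAll("$1$2$2");
    }

    public static void main(String[] args) {
        String html = "<img src=\"https://example.com/a.png\"><img src=\"cid:logo\">";
        System.out.println(stripExternalSources(html));
    }
}
```

The cleaned HTML could then be written to a temporary file and passed to the converter in place of the original. Unquoted attribute values and CSS `url(...)` references are not handled by this sketch.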

@Suds

Are you using the .NET or the Java variant of the API? Please also share the sample code.

Hi, Java.

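import java.io.File;
import com.groupdocs.conversion.Converter;
import com.groupdocs.conversion.options.convert.PdfConvertOptions;

File directoryPath = new File("D:\\msg\\");
// List of all files and directories in the input folder
String[] contents = directoryPath.list();
// Load the first file in the folder
Converter converter = new Converter(directoryPath.getAbsolutePath() + "\\" + contents[0]);
PdfConvertOptions options1 = new PdfConvertOptions();
// Final path to be saved: same file name with the .msg extension replaced by .pdf
String convertedFile = "D:\\pdf\\" + contents[0].replace(".msg", "") + ".pdf";
converter.convert(convertedFile, options1);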
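If the snippet above is meant to run over every file in the folder, the output name has to be derived per file. The following is a minimal sketch of that mapping using only the JDK; the class `MsgBatchPlan` and its method are hypothetical names introduced for illustration, and the actual conversion call is intentionally left out.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Hypothetical helper that only computes output paths; it performs no conversion. */
class MsgBatchPlan {

    /** Maps e.g. "mail.msg" to "<outputDir>/mail.pdf"; other names just get ".pdf" appended. */
    static Path pdfPathFor(Path msgFile, Path outputDir) {
        String name = msgFile.getFileName().toString();
        // Strip the extension only when it really is ".msg" (case-insensitive).
        String base = name.toLowerCase(Locale.ROOT).endsWith(".msg")
                ? name.substring(0, name.length() - ".msg".length())
                : name;
        return outputDir.resolve(base + ".pdf");
    }

    public static void main(String[] args) {
        // Example paths only; replace with the real input and output folders.
        Path target = pdfPathFor(Paths.get("msg", "report.msg"), Paths.get("pdf"));
        System.out.println(target);
    }
}
```

Inside a loop over the input folder, the returned path (converted with `toString()`) would take the place of `convertedFile`.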


@Suds

It took only 7-8 seconds to convert the provided HTML file to PDF. Please have a look at this screenshot.png (81.2 KB) and this output.pdf (96.0 KB). We can further investigate this issue, if you share the exact problematic file.

You have to disable the internet connection to test.

In our scenario there are 8+ pages, and all of them reference external image sources.

@Suds

We still cannot reproduce this issue. Please note that the API doesn’t rely on the internet. It works perfectly offline and doesn’t try to fetch any URL. Please share the problematic file and we’ll investigate accordingly.

Thanks

What I observed at the customer site is that the Internet is enabled, but the src links try to connect and take a long time to connect and download. Later the API gives up and renders the output without the image contents.

The environment has a lot of firewall restrictions, so I was looking for an API option that can completely disable external access during conversion.
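In the absence of a confirmed converter option, one JVM-level approach is to make outbound HTTP(S) requests fail quickly instead of waiting on the firewall. The sketch below installs a default `ProxySelector` that routes http and https through a local port where nothing is expected to listen, and sets the JDK's default connect and read timeouts. It assumes the library fetches images through the standard `java.net` stack, which has not been confirmed in this thread; the class name `BlockOutboundHttp`, the port, and the timeout values are illustrative choices.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.List;

/** Hypothetical JVM-wide guard; affects every component that uses the default ProxySelector. */
class BlockOutboundHttp extends ProxySelector {

    // Loopback "discard" port: assumed to have no listener, so connections are refused quickly.
    private static final Proxy DEAD_PROXY =
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("127.0.0.1", 9));

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        String scheme = uri.getScheme();
        if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
            return Collections.singletonList(DEAD_PROXY);
        }
        // Other protocols are left on a direct connection.
        return Collections.singletonList(Proxy.NO_PROXY);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress address, IOException failure) {
        // Failures are expected here; nothing to record.
    }

    /** Call once at startup, before any HTTP connection is opened in this JVM. */
    static void install() {
        // Default timeouts (milliseconds) used by the JDK's URLConnection protocol handlers.
        System.setProperty("sun.net.client.defaultConnectTimeout", "2000");
        System.setProperty("sun.net.client.defaultReadTimeout", "2000");
        ProxySelector.setDefault(new BlockOutboundHttp());
    }
}
```

Calling `BlockOutboundHttp.install()` before creating the `Converter` would apply the guard to the whole process, so it is only suitable when nothing else in the same JVM needs HTTP access.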

@Suds

As we already explained, this API doesn’t require an internet connection at all to convert documents. However, if you are loading the source file from cloud storage (e.g. Azure Blob) then you may need one, but the core conversion process doesn’t require an internet connection.
Could you please share a screencast/video of this issue? Also share the problematic file that is taking time during conversion.

Rec 31-08-2021.7z (292.5 KB)

The file is a bit sensitive, hence I did not upload it, but it has a lot of images linked for download.

Please open the recording in VLC player.

@Suds

We are already investigating this exception (screenshot: this.PNG, 84.0 KB). Your investigation ticket ID is CONVERSIONJAVA-1402.

You could share the problematic file in a private message. Otherwise, please create a new MSG file with which the issue can be reproduced.
We tried to reproduce this issue (long conversion time for a file with hyperlinked images) at our end multiple times, but all sample files were converted in under 30 seconds.
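When reproducing a slow conversion, it can help to run it under a time limit so a hanging download shows up as a clear timeout rather than a ten-minute wait. The sketch below is a generic JDK-only wrapper; `TimeBoxed` and `finishesWithin` are hypothetical names, and the task passed in would be whatever wraps the conversion call. Cancellation works by interrupting the worker thread, so it only stops code that responds to interrupts.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for running a task under a time limit. */
class TimeBoxed {

    /** Returns true if the task finished in time, false if it timed out or the caller was interrupted. */
    static boolean finishesWithin(Runnable task, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(task);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The task itself failed; surface the original cause.
            throw new IllegalStateException("Task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

For example, `TimeBoxed.finishesWithin(() -> runConversion(), 30, TimeUnit.SECONDS)` would report whether a placeholder `runConversion()` method completes inside the 30-second window mentioned above.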

Hi, we will work on this.

We did another test using .NET and it worked fine without any issue, i.e. it was able to download the images and completed faster. As our design is already in Java, we cannot switch now.

@Suds

You can create a new MSG file with which the issue can be reproduced and share it with us. Meanwhile, we’ll notify you as soon as there’s any update on CONVERSIONJAVA-1402.