PDF to HTML conversion issue in .NET

Hi, I am using GroupDocs.Conversion to convert .pdf and .docx files to .html format:
string inDirectory = Path.Combine(Properties.Settings.Default.DocumentStorageFolderPath, document.Location);

using (Converter converter = new Converter(inDirectory))
{
    MarkupConvertOptions options = new MarkupConvertOptions
    {
        FixedLayout = true
    };

    converter.Convert(Path.Combine(Properties.Settings.Default.DocumentStagingFolderPath, "search", "Html", convertedFileName + ".html"), options);
}

With this conversion, the output is not converted properly.
I have attached the original file and the HTML file for your reference: Venkatesh.pdf (251.3 KB)

Converted HTML file:
image.png (251.2 KB)

HTML file in code view; look at the selected span, "Primary Contact":
upload.jpg (188.0 KB)

The word "Contact" is split into "Conta" and "ct".

@bharathiGK,

Thank you for taking interest in GroupDocs.Conversion for .NET.

Is this the only issue you are facing? When we inspect the element “Primary Contact”, the word Contact is split in the markup, but it renders as a single word.

Yes, I noticed it for this word only. I am wrapping the word “Contact” in a mark tag, and because the word is split into “Conta” and “ct”, the tag cannot be applied.
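
A minimal sketch of why the split matters, assuming markup of the shape described above. The fragment below is a hypothetical stand-in, not the actual converter output, and the class name is made up for illustration. A plain string search over the markup misses the word, while the same search over the tag-stripped text finds it:

```csharp
using System;
using System.Text.RegularExpressions;

class SplitSpanDemo
{
    static void Main()
    {
        // Hypothetical fragment: the word "Contact" is split across two spans.
        string html = "<span>Primary Conta</span><span>ct</span>";

        // Searching the raw markup fails because a tag boundary sits inside the word.
        bool foundInMarkup = html.Contains("Contact");

        // Removing the tags yields the text as it renders, where the word is whole.
        string text = Regex.Replace(html, "<[^>]+>", string.Empty);
        bool foundInText = text.Contains("Contact");

        Console.WriteLine("Found in raw markup: " + foundInMarkup); // False
        Console.WriteLine("Found in rendered text: " + foundInText); // True
    }
}
```

This only illustrates the symptom. Wrapping the word in a mark tag would still require the word to sit inside one element, or the search to account for the tag boundary.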

@bharathiGK,

We have reproduced this issue at our end and are now investigating it. Your investigation ticket ID is CONVERSIONNET-3851. You will be notified as soon as there is an update.

Okay. Thank you.

@bharathiGK,

You are welcome.

When can I get an update on this issue?

@bharathiGK

We are planning to fix this issue in API version 20.7 and the release is expected sometime in July. However, in case of any further update, we’ll notify you.

Okay, thank you.

@bharathiGK

You are welcome.

Hi, I would like an update on this issue.

@bharathiGK

The fix is planned for this month, provided nothing goes wrong (e.g. a blockage due to some other issue). We’ll notify you in case of any update.

Okay, but I am stuck in the middle of my work because of this issue.

Is there any other way to convert documents of any format to HTML without this problem?

@bharathiGK

You can try a Word to HTML conversion. For example, have a look at these files.zip (69.3 KB).

Do you mean that every file, whatever its type, must first be converted to a Word document and then to HTML? Is that what you are suggesting?
I ask because the documents I receive for conversion are .pdf, .xlsx, .pptx, and so on.

@bharathiGK

No.

I think there was some misunderstanding. What I said was based on this message. However, you can use GroupDocs.Viewer for .NET as an alternative solution to render/convert a PDF to HTML.
We also tried converting the provided PDF to HTML, and here are the results.zip (323.7 KB). “Primary Contact” appears as a whole in a single span.

OK, this results.zip looks good. Can I get sample code for that?

Which version of Viewer did you use? If possible, please share an example.

@bharathiGK

I used version 20.6.1 and below is the sample code:

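// Requires: using System.IO; using GroupDocs.Viewer; using GroupDocs.Viewer.Options;
string outputDirectory = @"D:/";
// {0} is replaced with the page number, so each page gets its own HTML file.
string pageFilePathFormat = Path.Combine(outputDirectory, "page_{0}.html");
using (Viewer viewer = new Viewer(@"D:/Venkatesh.pdf"))
{
    // Render each page as HTML with its resources embedded in the page file.
    HtmlViewOptions options =
        HtmlViewOptions.ForEmbeddedResources(pageFilePathFormat);
    viewer.View(options);
}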

A complete open-source example project is also available on GitHub.

Okay, thank you. I am currently using Viewer 20.1.0. If I update to 20.6.1, which other DLLs need to be updated?

@bharathiGK

You only have to update the GroupDocs.Viewer for .NET DLL. However, you can also try the same code with version 20.1.