Prevent hyperlink creation for the table of contents when converting DOCX to HTML in .NET

Hi,

When I convert a Word file that contains a table of contents to HTML using GroupDocs.Viewer, a hyperlink is created for each entry.
Unfortunately, those hyperlinks are relative, so they point into the currently open document.
It looks something like this:

<a name="_Toc156299515" style="left:0pt; top:0pt;"></a>
<a href="#_Toc156299515">...</a>

Obviously, this does not work when the HTML output is split into pages.
options.EmailOptions.PageSize = PageSize.A4;

Is it possible to turn those links off?
Or could they even be fixed somehow? I guess that’s not possible, though, as the file names cannot be known in advance.

@Clemens_Pestuka

Thank you for reporting this issue. We’ll investigate how we can turn the links off and update you. The issue ID is VIEWERNET-4658.


@Clemens_Pestuka

To work around this, you can use the following code:

using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using GroupDocs.Viewer;
using GroupDocs.Viewer.Interfaces;
using GroupDocs.Viewer.Options;

internal class Program
{
    private static void Main(string[] args)
    {
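        using (Viewer viewer = new Viewer("Toc.docx"))
        {
            // Instead of writing files directly, the viewer hands each rendered
            // page to the custom stream factory below.
            ViewOptions viewOptions =
                HtmlViewOptions.ForEmbeddedResources(new LinkDisablingPageStreamFactory());
            viewer.View(viewOptions);
        }
    }
}

class LinkDisablingPageStreamFactory : IPageStreamFactory
{
    // Render each page into an in-memory buffer.
    public Stream CreatePageStream(int pageNumber) =>
        new MemoryStream();

    // Called once a page has been rendered: post-process its HTML, then save it.
    public void ReleasePageStream(int pageNumber, Stream pageStream)
    {
        MemoryStream memoryStream = (MemoryStream)pageStream;
        string html = Encoding.UTF8.GetString(memoryStream.GetBuffer(), 0, (int)memoryStream.Length);

        // Strip the href attribute from <a> tags (where href is the first attribute),
        // so TOC entries, and all other links, are no longer clickable.
        html = Regex.Replace(html, @"<a href=""[^""]*""", "<a");

        File.WriteAllText($"page-{pageNumber}.html", html);

        pageStream.Dispose();
    }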
}

It uses a regular expression to remove the href attribute from every a tag. This has a side effect: all other links in the document stop working too. Unfortunately, we can’t add this behavior to the library itself, as it would affect other users. Please let us know if it works for you.
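If the side effect on other links is a problem, the regular expression can be narrowed. The sketch below is an illustration only, not part of GroupDocs.Viewer: the `TocLinkRemover` class is hypothetical, and it assumes the generated markup looks like the sample in the question (`<a href="#_Toc...">`, with href as the first attribute). Word names its table-of-contents bookmarks with a `_Toc` prefix, so only links to those bookmarks are stripped, while external links and other internal links keep working.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: removes only href attributes that point to
// table-of-contents bookmarks (href="#_Toc..."), leaving all other links intact.
// Assumes href is the first attribute of the <a> tag, as in the sample markup.
static class TocLinkRemover
{
    // Matches: <a href="#_Toc<any characters except a double quote>"
    private static readonly Regex TocHref =
        new Regex(@"<a href=""#_Toc[^""]*""", RegexOptions.Compiled);

    public static string RemoveTocLinks(string html) =>
        TocHref.Replace(html, "<a");
}
```

In `ReleasePageStream`, the `Regex.Replace` line would then become `html = TocLinkRemover.RemoveTocLinks(html);`.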

We’ll take a look for other solutions and update you.
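Regarding the question of fixing the links instead of removing them: in the workaround above, the caller chooses the page file names (`page-{pageNumber}.html`), so they are known. What a single page cannot know while it is rendered is which other page holds the target bookmark. A two-pass post-processing step could resolve that once all pages are rendered. The sketch below is hypothetical and untested against real GroupDocs output: the `TocLinkRewriter` class and its `fileNameFormat` parameter are illustrative names, and it assumes the markup matches the sample in the question (`<a name="_Toc...">` targets and `href="#_Toc..."` links) and that all pages fit in memory.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical post-processing step: after all pages have been rendered into memory,
// point each TOC link at the page file that actually contains its bookmark.
static class TocLinkRewriter
{
    // Bookmark target, e.g. <a name="_Toc156299515" ...>
    private static readonly Regex Bookmark = new Regex(@"<a name=""(_Toc[^""]*)""");
    // Link to a bookmark, e.g. href="#_Toc156299515"
    private static readonly Regex TocHref = new Regex(@"href=""#(_Toc[^""]*)""");

    // pages[i] holds the HTML of page i + 1; fileNameFormat is e.g. "page-{0}.html".
    public static string[] Rewrite(string[] pages, string fileNameFormat)
    {
        // Pass 1: record the first page on which each bookmark is defined.
        var bookmarkPage = new Dictionary<string, int>();
        for (int i = 0; i < pages.Length; i++)
        {
            foreach (Match m in Bookmark.Matches(pages[i]))
            {
                string name = m.Groups[1].Value;
                if (!bookmarkPage.ContainsKey(name))
                    bookmarkPage.Add(name, i + 1);
            }
        }

        // Pass 2: prefix each TOC link with the file name of its target page.
        var result = new string[pages.Length];
        for (int i = 0; i < pages.Length; i++)
        {
            result[i] = TocHref.Replace(pages[i], link =>
            {
                string name = link.Groups[1].Value;
                return bookmarkPage.TryGetValue(name, out int page)
                    ? $"href=\"{string.Format(fileNameFormat, page)}#{name}\""
                    : link.Value; // bookmark not found: leave the link unchanged
            });
        }
        return result;
    }
}
```

To use it, `ReleasePageStream` would store each page's HTML (for example in a list indexed by page number) instead of writing it right away, and the files would be written after `viewer.View(viewOptions)` returns.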


@Clemens_Pestuka

This feature was implemented in GroupDocs.Viewer for .NET 24.1. See the “Unlink table of contents” section of the documentation for how to use it. The version can be found at:

Have a nice day!


@Clemens_Pestuka

Please note that a hot-fix version, GroupDocs.Viewer for .NET 24.1.1, has been published. The version can be found at:

Please use 24.1.1 instead of 24.1.
We’re sorry for the inconvenience.