When converting a Word file that contains a table of contents to HTML using GroupDocs.Viewer, hyperlinks are created for each entry.
Unfortunately, these hyperlinks are relative, so they point to the currently open document.
It looks something like this:
To work around this, you can use the following code:
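```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using GroupDocs.Viewer;
using GroupDocs.Viewer.Interfaces;
using GroupDocs.Viewer.Options;

internal class Program
{
    private static void Main(string[] args)
    {
        using (Viewer viewer = new Viewer("Toc.docx"))
        {
            // Render each page to HTML with embedded resources, routing the
            // output through a custom page stream factory.
            ViewOptions viewOptions =
                HtmlViewOptions.ForEmbeddedResources(new LinkDisablingPageStreamFactory());
            viewer.View(viewOptions);
        }
    }
}

// Buffers each rendered page in memory, strips href attributes from <a> tags,
// and then writes the page to disk.
class LinkDisablingPageStreamFactory : IPageStreamFactory
{
    // Called before a page is rendered: hand the viewer an in-memory buffer.
    public Stream CreatePageStream(int pageNumber) =>
        new MemoryStream();

    // Called after a page is rendered: post-process the HTML and save it.
    public void ReleasePageStream(int pageNumber, Stream pageStream)
    {
        MemoryStream memoryStream = (MemoryStream)pageStream;
        string html = Encoding.UTF8.GetString(memoryStream.GetBuffer(), 0, (int)memoryStream.Length);

        // Remove the href attribute from every <a href="..."> tag,
        // so the links are no longer clickable.
        html = Regex.Replace(html, @"<a href=""[^""]*""", "<a");

        File.WriteAllText($"page-{pageNumber}.html", html);
        pageStream.Dispose();
    }
}
```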
It uses a regular expression to remove the href attribute from every `<a>` tag. This has a side effect: all other links in the document stop working too. Unfortunately, we can’t add this behavior to the product as-is, because it may affect other users. Please let us know whether it works for you.
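If losing every link is too aggressive, one option is to narrow the pattern so it only touches table-of-contents entries. The sketch below is a standalone helper, not part of the GroupDocs API, and `TocLinkRemover` / `UnlinkTocAnchors` are hypothetical names. It assumes that the rendered TOC entries link to Word's hidden `_Toc…` bookmarks (for example `href="#_Toc123456"`); please verify that against your own HTML output before relying on it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: strips href only from anchors that point to
// Word's "_Toc..." bookmarks (an assumption about the rendered HTML).
internal static class TocLinkRemover
{
    // Group 1 keeps "<a" plus any attributes that precede href,
    // so the rest of the tag survives the replacement.
    private static readonly Regex TocHref = new Regex(
        @"(<a\b[^>]*?)\s+href=""#_Toc[^""]*""",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string UnlinkTocAnchors(string html) =>
        TocHref.Replace(html, "$1");

    private static void Main()
    {
        string html = "<a href=\"#_Toc123\">Intro</a> <a href=\"https://example.com\">Site</a>";

        // prints: <a>Intro</a> <a href="https://example.com">Site</a>
        Console.WriteLine(UnlinkTocAnchors(html));
    }
}
```

Inside `ReleasePageStream`, you could call `UnlinkTocAnchors(html)` in place of the broader `Regex.Replace` call, so external links and other internal anchors keep working.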
We’ll look into other solutions and update you.
This feature was implemented in GroupDocs.Viewer for .NET 24.1. See the Unlink table of contents section of the documentation for how to use it. The version can be found at: