Empty <title> tags being created when loading PDF and EPub documents

bgrimes · October 31, 2022, 7:50pm

Hello,

We are reviewing the HTML within the reader and noticed that there are empty <title> tags throughout the document. We see this with both EPub and PDF documents. These empty tags will cause accessibility issues in our product, so we would like to understand why they are being created and how to either use them or get rid of them.

Thank you

vladimir.litvinchik · October 31, 2022, 8:55pm

@bgrimes

We can reproduce this issue on our side. A simple workaround is to replace the <title> and </title> strings with an empty string. We’ll investigate this issue and update you.

bgrimes · November 1, 2022, 12:17pm

Vladimir,

Thank you for taking a look. This suggestion implies that we can control the HTML used in the viewer for displaying a document. If I am correct in that, do you have documentation on how to do this?

Thanks again

vladimir.litvinchik · November 1, 2022, 1:54pm

@bgrimes

You have full control over the output produced by the Viewer. To replace text in the HTML, you have to use a custom stream factory, a class that is responsible for instantiating the output stream and closing it. You can find an example in "Save output to a stream".

The following code converts a file to HTML and removes the <title> and </title> tags from each page.

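using System.Collections.Generic;
using System.IO;
using System.Text;
using GroupDocs.Viewer;
using GroupDocs.Viewer.Interfaces;
using GroupDocs.Viewer.Options;

// Each rendered page ends up in its own MemoryStream.
List<MemoryStream> pages = new List<MemoryStream>();

using (Viewer viewer = new Viewer("sample.pdf"))
{
    MemoryPageStreamFactory pageStreamFactory = new MemoryPageStreamFactory(pages);

    ViewOptions viewOptions =
        HtmlViewOptions.ForEmbeddedResources(pageStreamFactory);

    viewer.View(viewOptions);
}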

internal class MemoryPageStreamFactory : IPageStreamFactory
{
    private readonly List<MemoryStream> _pages;

    public MemoryPageStreamFactory(List<MemoryStream> pages)
    {
        _pages = pages;
    }

    public Stream CreatePageStream(int pageNumber)
    {
        MemoryStream pageStream = new MemoryStream();

        _pages.Add(pageStream);

        return pageStream;
    }

    public void ReleasePageStream(int pageNumber, Stream pageStream)
    {
        // Called after the page has been rendered; rewrite the HTML in place.
        MemoryStream memoryStream = (MemoryStream)pageStream;

        string html = Encoding.UTF8.GetString(memoryStream.ToArray());

        string result = html
            .Replace("<title>", string.Empty)
            .Replace("</title>", string.Empty);

        byte[] bytes = Encoding.UTF8.GetBytes(result);

        // Overwrite the stream with the cleaned HTML and rewind it for readers.
        memoryStream.SetLength(0);
        memoryStream.Write(bytes, 0, bytes.Length);
        memoryStream.Position = 0;
    }
}
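
Note that the plain string replacement above strips every <title> and </title> tag, so a title that does have text would lose its tags and leave the bare text in the markup. If that matters for your documents, a narrower variant is to remove only the empty elements. The sketch below assumes the empty elements look like <title></title>, possibly with attributes or whitespace between the tags; EmptyTitleCleaner and RemoveEmptyTitles are hypothetical names and are not part of GroupDocs.Viewer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of GroupDocs.Viewer): removes only <title>
// elements that contain nothing but whitespace and leaves titled ones intact.
public static class EmptyTitleCleaner
{
    // Opening tag with optional attributes, optional whitespace, closing tag.
    // Assumption: a self-closing <title/> does not occur and is not handled here.
    private static readonly Regex EmptyTitle = new Regex(
        @"<title\b[^>]*>\s*</title\s*>",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string RemoveEmptyTitles(string html)
    {
        return EmptyTitle.Replace(html, string.Empty);
    }
}
```

To try it, you could call EmptyTitleCleaner.RemoveEmptyTitles(html) in ReleasePageStream in place of the two Replace calls.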

bgrimes · November 1, 2022, 2:56pm

Vladimir,

Thanks, we will work on this today and get back to you if we have any questions.

vladimir.litvinchik · November 1, 2022, 3:17pm

@bgrimes

You’re welcome!

bgrimes · November 2, 2022, 6:10pm

Vladimir, we are all set on this one. Thanks for your help.

vladimir.litvinchik · November 2, 2022, 7:20pm

@bgrimes

You’re welcome! Thank you for the feedback!

vladimir.litvinchik · June 26, 2023, 1:54pm

@bgrimes

This issue has been fixed in GroupDocs.Viewer for .NET 23.6. The version is available at

Have a nice day!