How to view a document using a stream from Azure Blob Storage in .NET

Hi all,

I’m using GroupDocs.Viewer with a stream opened from Azure Blob Storage. When I pass the stream to the Viewer for a large document (an 18 MB PDF in my tests), it behaves badly: my instrumentation shows it repeatedly spamming the blob in an endless loop, and the Viewer never completes.
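
Roughly, our code looks like the sketch below (simplified; “container” is an already-initialized CloudBlobContainer, and I’m using the Viewer constructor that accepts a stream):

using System.IO;
using GroupDocs.Viewer;
using GroupDocs.Viewer.Options;
using Microsoft.WindowsAzure.Storage.Blob;

// "container" is an already-initialized CloudBlobContainer.
CloudBlockBlob blob = container.GetBlockBlobReference("large-document.pdf");

// OpenReadAsync returns a read-only, seekable stream over the blob.
using (Stream blobStream = await blob.OpenReadAsync())
using (Viewer viewer = new Viewer(blobStream))
{
    // For the 18 MB PDF this call never completes; my instrumentation
    // shows repeated requests against the blob.
    viewer.View(HtmlViewOptions.ForEmbeddedResources("page_{0}.html"));
}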

I note that the blob SDK documentation has the following remarks on the CloudBlob.OpenReadAsync method:

“On the System.IO.Stream object returned by this method, the System.IO.Stream.EndRead(System.IAsyncResult) method must be called exactly once for every System.IO.Stream.BeginRead(System.Byte[],System.Int32,System.Int32,System.AsyncCallback,System.Object) call. Failing to end the read process before beginning another read process can cause unexpected behavior.”

Can you please verify that GroupDocs.Viewer will treat the stream in the way recommended by the Azure Blob SDK docs?

Otherwise, can you please advise on a solution here?

Kind Regards,
Adrian Hofman

@AdrianHofman

Thank you for sharing the details about this issue.

While I can’t confirm that the stream is treated in the way recommended in the Azure Blob SDK docs, the actual issue may be related to the fact that we seek through the stream, and as a result there can be multiple requests to the Azure API.

The simplest solution could be to read the entire file into memory with DownloadToStreamAsync and then pass this stream to the Viewer, e.g.:

MemoryStream stream = new MemoryStream();
await blob.DownloadToStreamAsync(stream);
stream.Position = 0; // rewind; DownloadToStreamAsync leaves the position at the end
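
Then the buffered stream can be passed to the Viewer, e.g. (a sketch assuming the Viewer constructor that accepts a stream):

using (Viewer viewer = new Viewer(stream))
{
    viewer.View(HtmlViewOptions.ForEmbeddedResources("page_{0}.html"));
}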

Please let us know if such a workaround works for you.

Thanks Vladimir,

Yes, we are currently downloading the entire blob to an in-memory stream and then passing that to GroupDocs as a workaround; this works fine.

I am concerned about the consequences of this, particularly for large documents: holding every document fully in memory obviously won’t scale well as concurrency increases. Ideally we could make GroupDocs work well with an Azure Blob stream.
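
One mitigation we are considering is buffering to a temporary file instead of memory, so per-request memory stays bounded; a rough sketch (paths and options are illustrative):

// Buffer the blob to a temp file that is deleted when the stream is disposed.
string tempPath = Path.GetTempFileName();
using (FileStream fileStream = new FileStream(
    tempPath, FileMode.Open, FileAccess.ReadWrite, FileShare.None,
    4096, FileOptions.DeleteOnClose))
{
    await blob.DownloadToStreamAsync(fileStream);
    fileStream.Position = 0; // rewind before rendering

    using (Viewer viewer = new Viewer(fileStream))
    {
        viewer.View(HtmlViewOptions.ForEmbeddedResources("page_{0}.html"));
    }
}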

When you say you seek through the stream, do you mean you seek backwards through the stream, or seek to arbitrary points in the stream?

@AdrianHofman

I’m sorry for the delayed response and thank you for the details.

When processing a document, all the bytes should be loaded into memory to make it possible to render it completely (all the pages) or partially (some pages). Can you please describe the case where it won’t scale well, and we’ll try to find a solution?
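
For example, even when only a few pages are rendered, the Viewer works from a fully buffered stream; a sketch, assuming the View overload that accepts page numbers:

using (Viewer viewer = new Viewer(stream)) // stream already holds all the document bytes
{
    HtmlViewOptions options = HtmlViewOptions.ForEmbeddedResources("page_{0}.html");
    viewer.View(options, 1, 2, 3); // render only pages 1-3
}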

We’ll check and update you here.

@AdrianHofman

During the investigation, we found that when processing PDF files we do perform a lot of seek/read operations. To confirm this, you can run the sample_app.zip (305.3 KB), which logs all operations performed on the stream object.
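
For reference, the kind of instrumentation the sample app performs can be sketched as a stream wrapper like the one below (illustrative code, not the attached app itself):

using System;
using System.IO;

// Wraps a stream and writes every Read/Seek call to the console.
public class LoggingStream : Stream
{
    private readonly Stream _inner;

    public LoggingStream(Stream inner) { _inner = inner; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        Console.WriteLine($"Read  position={_inner.Position} count={count}");
        return _inner.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        Console.WriteLine($"Seek  offset={offset} origin={origin}");
        return _inner.Seek(offset, origin);
    }

    // The remaining members simply delegate to the wrapped stream.
    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }
    public override void Flush() => _inner.Flush();
    public override void SetLength(long value) => _inner.SetLength(value);
    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
}

Wrapping the input, e.g. new Viewer(new LoggingStream(stream)), then prints every seek/read the Viewer performs.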

While we continue to investigate this issue, please use the proposed workaround for now: load the file into memory instead of opening a stream with CloudBlob.OpenReadAsync.