Convert PDF to HTML programmatically in Java

Hello team,

I just started the conversion service on the Jetty server and got it running. After opening it in the browser, I tried to convert the sample file provided in your examples package (“intelligent systems.pdf”). When I selected HTML as the output format and clicked the Convert button, I got this exception:

Exception: Font Arial was not found
class com.groupdocs.conversion.internal.c.a.pd.exceptions.FontNotFoundException: Font Arial was not found
at com.groupdocs.conversion.internal.c.a.pd.FontRepository.findFont(Unknown Source)
at com.groupdocs.conversion.c.e.d(Unknown Source)
at com.groupdocs.conversion.c.e.i(Unknown Source)
at com.groupdocs.conversion.c.e$1$2.m(Unknown Source)
at com.groupdocs.conversion.c.e.ax(Unknown Source)
at com.groupdocs.conversion.c.d.ax(Unknown Source)
at com.groupdocs.conversion.c.v.ax(Unknown Source)
at com.groupdocs.conversion.c.d.ax(Unknown Source)
at com.groupdocs.conversion.c.i.ax(Unknown Source)
at com.groupdocs.conversion.c.d.ax(Unknown Source)
at com.groupdocs.conversion.c.i.ax(Unknown Source)
at com.groupdocs.conversion.c.d.ax(Unknown Source)
at com.groupdocs.conversion.c.c.ax(Unknown Source)
at com.groupdocs.conversion.converter.a.a(Unknown Source)
at com.groupdocs.conversion.converter.b.e.a(Unknown Source)
at com.groupdocs.conversion.handler.ConversionHandler.au(Unknown Source)
at com.groupdocs.conversion.handler.ConversionHandler.convert(Unknown Source)
at com.groupdocs.conversion.handler.ConversionHandler.convert(Unknown Source)
at com.groupdocs.ui.Conversion.service(Conversion.java:126)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

I’m running the server on a Windows 10 virtual machine with JDK 8 installed.

Then I tried converting the same PDF to Word and got this error:
Exception: null
java.lang.NullPointerException
at com.groupdocs.conversion.internal.c.a.pd.internal.p319.z8.m3(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p319.z8.dtc(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p319.z8.dtb(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p319.z8.m1(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p319.z10.m1(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p319.z10.m1(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p319.z10.m1(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p133.z16.(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p132.z12.m1(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p133.z1.m41(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p133.z1.(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p132.z12.m1(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p42.z5.(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.p81.z2.m1(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.ApsUsingConverter.a(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.ApsUsingConverter.a(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.ApsUsingConverter.b(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.fn.a(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.ADocument.a(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.Document.a(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.ADocument.save(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.Document.save(Unknown Source)
at com.groupdocs.conversion.domain.b.k.h.a(Unknown Source)
at com.groupdocs.conversion.domain.a.k.a(Unknown Source)
at com.groupdocs.conversion.c.d.ay(Unknown Source)
at com.groupdocs.conversion.c.t.ax(Unknown Source)
at com.groupdocs.conversion.c.d.ax(Unknown Source)
at com.groupdocs.conversion.c.i.ax(Unknown Source)
at com.groupdocs.conversion.c.d.ax(Unknown Source)
at com.groupdocs.conversion.c.c.ax(Unknown Source)
at com.groupdocs.conversion.converter.a.a(Unknown Source)
at com.groupdocs.conversion.converter.f.n.a(Unknown Source)
at com.groupdocs.conversion.handler.ConversionHandler.au(Unknown Source)
at com.groupdocs.conversion.handler.ConversionHandler.convert(Unknown Source)
at com.groupdocs.conversion.handler.ConversionHandler.convert(Unknown Source)
at com.groupdocs.ui.Conversion.service(Conversion.java:126)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

Conversion to image partially succeeded: I didn’t get an exception, but the result was three green images. See the attached result: groupdocs-conversion-result-3774516239621393745.zip (2.1 KB)
Do you have any idea what is going wrong on my side?
Thanks in advance.
Best regards,
Nicole

@nseidel,

Please specify the API version that you integrated into the project (17.12, 18.7, etc.).

It’s the one you can currently download from GitHub. I assume it’s 17.12, because that is what it says on GitHub.
Cheers,
Nicole

@nseidel,

That is an old front-end, and we have discontinued support for it.
We will publish a new, modified UI (built on the latest version of the API), but we cannot share an ETA at the moment. Meanwhile, you can evaluate the API features in our console application.
We’d recommend that you clone/download our updated console application and share your feedback.

Hi @atirtahir3,
thank you for your quick reply.
I downloaded your examples project and started it (I’m using Eclipse Mars). In MainClass, I uncommented line 35, where the method convertToHtmlAsFilePath is called. Now I’m getting the following exception:
Exception in thread "main" class com.groupdocs.foundation.exception.a: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 ---> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
--- End of inner exception stack trace ---
at com.groupdocs.conversion.domain.savers.pdf.d.a(Unknown Source)
at com.groupdocs.conversion.domain.savers.pdf.d.c(Unknown Source)
at com.groupdocs.conversion.domain.documents.l.a(Unknown Source)
at com.groupdocs.conversion.operations.f.aK(Unknown Source)
at com.groupdocs.conversion.operations.B.aJ(Unknown Source)
at com.groupdocs.conversion.converter.html.e.a(Unknown Source)
at com.groupdocs.conversion.converter.html.e.a(Unknown Source)
at com.groupdocs.conversion.handler.ConversionHandler.aI(Unknown Source)
at com.groupdocs.conversion.handler.ConversionHandler.convert(Unknown Source)
at com.groupdocs.conversion.handler.ConversionHandler.convert(Unknown Source)
at com.groupdocs.conversion.examples.Conversion.convertToHtmlAsFilePath(Conversion.java:198)
at com.groupdocs.conversion.examples.MainClass.main(MainClass.java:35)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967)
at com.groupdocs.conversion.internal.c.a.pd.internal.ms.System.I12l.iSV(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.ms.System.I12l.lI(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.Document.l2if(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.Document.ll(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.Document.getLocalFontPaths(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.l137.I2l.l0IF(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.l137.I2l.l0l(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.l137.I2l.l0if(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.internal.l137.I2l.ll(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.I17I.a(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.ADocument.a(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.ADocument.save(Unknown Source)
at com.groupdocs.conversion.internal.c.a.pd.Document.save(Unknown Source)
... 12 more
I started it in debug mode as a Java application.
Cheers,
Nicole

@nseidel,

Did you use “intelligent systems.pdf” for the conversion process? We are not able to reproduce this issue at our end. As this screenshot shows, we can convert this sample PDF to HTML successfully: document conversion.JPG (170.1 KB).
Here is the output file: intelligent systems.pdf.zip (225.7 KB).
To investigate this issue further, we need the following details:

  • OS and its version
  • Java version
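
If it helps, both details can be read from the standard JVM system properties. The following is only a minimal sketch; the class name EnvironmentReport is illustrative and is not part of the GroupDocs examples project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnvironmentReport {

    // Standard JVM system properties that describe the OS and the Java runtime.
    private static final String[] KEYS = {
        "os.name", "os.version", "os.arch", "java.version", "java.vendor"
    };

    // Collects the properties in a fixed order so the output is easy to paste into a reply.
    static Map<String, String> collect() {
        Map<String, String> details = new LinkedHashMap<>();
        for (String key : KEYS) {
            details.put(key, System.getProperty(key));
        }
        return details;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Running it on the same machine and with the same JDK that runs the conversion gives the values to post here.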

Thank you for sending me the converted file. As stated above, I am working with Windows 10 v1607 on a virtual machine. The installed JDK is 1.8.0_201.
Can I also convert .doc(x) files to HTML directly using your product?
Cheers,
Nicole

@nseidel,

Yes, you can convert a Word document to HTML. Let us know if you face any issues.

We are investigating this further; your investigation ticket ID is CONVERSIONJAVA-597. We’ll notify you as soon as we have an update.
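
While the ticket is open, one check that may be worth running: both traces above pass through font lookup (FontRepository.findFont and Document.getLocalFontPaths), so it could help to confirm whether the JVM on the virtual machine can see Arial at all. The sketch below uses only the standard java.awt API. The class name FontCheck is illustrative, and the assumption that the library's own font search matches what the JVM reports is unverified, so treat the result as a hint only.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;

public class FontCheck {

    // Case-insensitive search for a font family name in a list of family names.
    static boolean containsFamily(String[] families, String wanted) {
        return Arrays.stream(families).anyMatch(f -> f.equalsIgnoreCase(wanted));
    }

    // Font families the JVM reports for this machine (not necessarily what the library scans).
    static String[] installedFamilies() {
        return GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames();
    }

    public static void main(String[] args) {
        // Defaults to Arial, the font named in the first exception.
        String wanted = args.length > 0 ? args[0] : "Arial";
        boolean found = containsFamily(installedFamilies(), wanted);
        System.out.println(wanted + (found ? " is visible to the JVM" : " is NOT visible to the JVM"));
    }
}
```

If Arial is reported as missing, that would point at the machine's font installation rather than at the sample PDF.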

@nseidel,

Can you reproduce this issue using API version 19.10?