Hi,
We want to try your Viewer Java Lib, so I downloaded the latest version and the GitHub examples today.
I tried one of our existing docx files and ran into encoding problems: ü and ä are not displayed properly in the resulting HTML. I can also reproduce the error by modifying the word.docx file shipped with the GitHub examples (adding an ü).
I tried to set
options.getWordsOptions().setEncoding(Charset.defaultCharset());
I tried both the default charset (Windows-1252) and UTF-8; neither had any effect.
Even if it worked, it would still be a problem for our use case: in our application customers upload their own documents, and we may not know the encoding in which a document was created.
Which leads me to the real question: why do I have to specify an encoding for a docx or xlsx file at all? I use Apache POI in another project and do not have to care about the encoding there.
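To illustrate why I would not expect to need an encoding setting: as far as I understand, a docx file is a ZIP package of XML parts, and each part can declare its own encoding in its XML prolog. The sketch below uses only the JDK (no GroupDocs or POI calls) to read the encoding declared by word/document.xml. The class and method names are my own, and it assumes the prolog is stored in an ASCII-compatible encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any library.
final class DocxEncodingProbe {

    // Matches the encoding attribute of an XML declaration, e.g. <?xml version="1.0" encoding="UTF-8"?>
    private static final Pattern ENCODING =
            Pattern.compile("<\\?xml[^>]*encoding\\s*=\\s*[\"']([^\"']+)[\"']");

    /**
     * Returns the encoding declared in the prolog of word/document.xml,
     * or null if the part is missing or declares no encoding.
     * Assumption: the prolog is ASCII-compatible (a UTF-16 part would not match).
     */
    static String declaredEncoding(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                // Read only the first bytes of the part; the prolog comes first.
                byte[] head = new byte[200];
                int n = 0;
                int r;
                while (n < head.length && (r = zip.read(head, n, head.length - n)) > 0) {
                    n += r;
                }
                String prolog = new String(head, 0, n, StandardCharsets.ISO_8859_1);
                Matcher m = ENCODING.matcher(prolog);
                return m.find() ? m.group(1) : null;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DocxEncodingProbe path/to/file.docx
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("Declared encoding: " + declaredEncoding(in));
        }
    }
}
```

If the document already declares its encoding like this, I would expect a viewer to pick it up from the file instead of asking the caller for it.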
I attached my test word file for you to reproduce the issue.
The code I am using is mostly copied from the samples:
public static void renderTestWord() {
    try {
        // Setup GroupDocs.Viewer config
        ViewerConfig config = Utilities.getConfiguration();

        // Create html handler
        ViewerHtmlHandler htmlHandler = new ViewerHtmlHandler(config);
        String guid = FILE_TEST_WORD;

        HtmlOptions htmlOptions = new HtmlOptions();
        htmlOptions.setResourcesEmbedded(true);
        // htmlOptions.setPageNumbersToConvert(Arrays.asList(2, 3));

        // Encoding settings I tried (no effect either way)
        // Charset charset = Charset.forName("UTF-8");
        // options.getWordsOptions().setEncoding(charset);
        // options.getCellsOptions().setEncoding(charset);
        // options.getEmailOptions().setEncoding(charset);

        // Perform page reorder
        // ReorderPageOptions reorderPageOptions = new ReorderPageOptions(
        //         guid, 2, 1);
        // htmlHandler.reorderPage(reorderPageOptions);

        // Render each page to HTML and save it to its own file
        List<PageHtml> pages = htmlHandler.getPages(guid, htmlOptions);
        for (PageHtml page : pages) {
            Utilities.saveAsHtml(datePart() + "_" + page.getPageNumber()
                    + "_testdocx", page.getHtmlContent());
        }
    } catch (Exception exp) {
        System.out.println("Exception: " + exp.getMessage());
        exp.printStackTrace();
    }
}
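One thing that might help narrow this down is ruling out the step that writes the HTML to disk. The sketch below is not taken from the samples. It assumes the page content is an ordinary Java String and uses only the JDK; Utf8HtmlWriter and saveUtf8 are hypothetical names of mine. If a file written this way shows ü and ä correctly, the problem would lie in how the string is saved rather than in the rendering.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Viewer library or its samples.
final class Utf8HtmlWriter {

    /**
     * Writes the given HTML to the target path as UTF-8,
     * independent of the platform default charset (e.g. Windows-1252).
     */
    static Path saveUtf8(Path target, String html) throws IOException {
        return Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // \u00fc and \u00e4 are ü and ä; escapes avoid depending on the source file encoding.
        String html = "<html><head><meta charset=\"utf-8\"></head>"
                + "<body>\u00fc\u00e4</body></html>";
        Path out = saveUtf8(Paths.get("umlaut-test.html"), html);
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out.toAbsolutePath());
    }
}
```

In renderTestWord() this would stand in for the Utilities.saveAsHtml call, with page.getHtmlContent() passed as the second argument.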
Regards,
Martin