Document search, parse and editing APIs in Java

I want to get the directory of .doc or .docx documents. Which API should I use? I have searched for a long time and couldn't find one. Thanks very much!


@YvanZhang,

Please try the GroupDocs.Search API. It is primarily a full-text search API; however, when you search for a keyword, it also returns the exact file path of each matching document.
Please specify your development environment (.NET or Java). We'll then guide you accordingly.

@atirtahir3, thank you for your answer. My development environment is Java. I want to split documents by their table of contents, so I need to get the table of contents of the document first and then split the document based on the page numbers listed in it. Do you have any suggestions for this?

@YvanZhang,

From the details you have provided, we understand that you want to:

  1. Parse the doc/docx document and extract the table of contents from it.
  2. Parse the table of contents to get the page range for each section. For example, Introduction … Page 1 and Conclusion … Page 4.
  3. Split the document based on each section in the table of contents. For example, create one document for the section “Introduction” (pages 1 to 3) and a second document for the section “Conclusion” (page 4).
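Step 2 above could be sketched roughly as follows, assuming the table of contents is already available as plain text lines shaped like "Introduction … Page 1". The class `TocLineParser`, its `TocEntry` type, and the line pattern are hypothetical helpers written for illustration; they are not part of any GroupDocs API, and real tables of contents may need a different pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any GroupDocs API.
class TocLineParser {

    // One parsed entry: a section title and the page number printed next to it.
    static final class TocEntry {
        final String title;
        final int page;

        TocEntry(String title, int page) {
            this.title = title;
            this.page = page;
        }
    }

    // Assumed line shape: title, a leader made of dots, ellipsis characters
    // or whitespace, an optional "Page" label, and a trailing page number.
    private static final Pattern LINE =
            Pattern.compile("^(.*?)[\\s.\\u2026]+(?:[Pp]age\\s+)?(\\d+)\\s*$");

    // Returns one entry per line that matches the assumed shape;
    // lines without a trailing page number or without a title are skipped.
    static List<TocEntry> parse(List<String> lines) {
        List<TocEntry> entries = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches() && !m.group(1).trim().isEmpty()) {
                entries.add(new TocEntry(m.group(1).trim(),
                        Integer.parseInt(m.group(2))));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<TocEntry> entries = parse(Arrays.asList(
                "Introduction \u2026 Page 1",
                "Conclusion \u2026 Page 4"));
        for (TocEntry e : entries) {
            System.out.println(e.title + " starts on page " + e.page);
        }
    }
}
```

The pattern is deliberately loose about the leader so that both "Introduction ..... 1" and "Introduction … Page 1" are accepted.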

In such a case, you can use GroupDocs.Parser for Java to extract the table of contents and then use GroupDocs.Merger for Java to split the document based on the page numbers. Please confirm whether we have correctly identified your requirements. If anything needs to be added or changed, please let us know.
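As a rough illustration of how section start pages turn into split ranges, the sketch below assumes each section runs until the page before the next section starts and that the last section runs to the end of the document. `SectionRanges` and `Range` are hypothetical names used only for this example; the code does not call GroupDocs.Parser or GroupDocs.Merger, and the resulting ranges would still have to be passed to whatever split API you use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it does not call any GroupDocs API.
class SectionRanges {

    // Inclusive, 1-based page range covered by one section.
    static final class Range {
        final String title;
        final int firstPage;
        final int lastPage;

        Range(String title, int firstPage, int lastPage) {
            this.title = title;
            this.firstPage = firstPage;
            this.lastPage = lastPage;
        }
    }

    // startPages maps section title to start page, in document order.
    // Assumption: a section ends on the page before the next one starts,
    // and the last section ends on the last page of the document.
    static List<Range> compute(Map<String, Integer> startPages, int totalPages) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(startPages.entrySet());
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < entries.size(); i++) {
            int first = entries.get(i).getValue();
            int last = (i + 1 < entries.size())
                    ? entries.get(i + 1).getValue() - 1
                    : totalPages;
            // Two sections starting on the same page still get a one-page range.
            ranges.add(new Range(entries.get(i).getKey(), first, Math.max(first, last)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        Map<String, Integer> toc = new LinkedHashMap<>();
        toc.put("Introduction", 1);
        toc.put("Conclusion", 4);
        for (Range r : compute(toc, 4)) {
            System.out.println(r.title + ": pages " + r.firstPage + " to " + r.lastPage);
        }
    }
}
```

With the example from this thread (Introduction on page 1, Conclusion on page 4, four pages in total), this yields pages 1 to 3 for the first section and page 4 for the second.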


@atirtahir3 You are great; you understood my requirements accurately. Let me try it. Thank you very much.


@YvanZhang,

You are welcome.

@atirtahir3 I ran into a problem:
public static final String SampleDocx = GetFilePath("sample.docx");

public static void run() {

    // Create an instance of Parser class
    try (Parser parser = new Parser(Constants.SampleDocx)) {
        // Check if text extraction is supported
        if (!parser.getFeatures().isText()) {
            System.out.println("isText()-----Text extraction isn't supported.");
            return;
        }
        // Check if toc extraction is supported
        if (!parser.getFeatures().isToc()) {
            System.out.println("isToc()-----Toc extraction isn't supported.");
            return;
        }
        // Get table of contents
        Iterable<TocItem> toc = parser.getToc();
        // Iterate over items
        for (TocItem i : toc) {
            // Print the Toc text
            System.out.println(i.getText());
            // Check if page index has a value
            if (i.getPageIndex() == null) {
                continue;
            }
            // Extract a page text
            try (TextReader reader = parser.getText(i.getPageIndex())) {
                System.out.println(reader.readToEnd());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}


Result :
Open RunExamples.java.
In Main() method uncomment the example that you want to run.
License set successfully.

isToc()-----Toc extraction isn't supported.

All done.

I used GroupDocs.Parser-for-Java-master\Examples\Resources\SampleFiles\sample.docx, but ToC extraction is reported as not supported for it. What kind of .doc file supports extracting the directory (table of contents)? Could you send me a template? Thanks very much!

@YvanZhang,

We are sorry for the inconvenience. ToC extraction is not supported for Word documents at the moment. However, we are investigating this on our end. Your investigation ticket ID is PARSERJAVA-97. As soon as there is an update, you'll be notified.

@atirtahir3 That's a pity. Does the getToc method only support directory (table of contents) extraction from EPUB documents? Are any other formats supported?

@YvanZhang,

We’re currently investigating this.
Meanwhile, can you please share a sample Word document and the expected split Word documents showing the desired output? You can create the split documents manually in MS Word, just for our reference.

@atirtahir3 Thanks very much, sir. I'm working on it too.

@YvanZhang,

You are welcome.

@atirtahir3 Hello, sir. I want to use GroupDocs products as back-end services, with Angular as the front end. I found that your examples use @groupdocs.examples.angular/editor. Can I use it? Or do you have any other suggestions? I would be very grateful.


@YvanZhang,

As you are interested in and evaluating several GroupDocs APIs (e.g. Editor, Parser, Search), we recommend that you always create a new topic/thread for each issue in the corresponding forum category.

Please note that all GroupDocs APIs are back-end and UI-independent. Hence, you can integrate them into any Java or .NET application without any third-party tool or software dependency.

We do have open-source UI projects for GroupDocs.Editor for Java. Please have a look at our Spring application and video demo here.

OK, sir. I will follow your rules. Thanks again!


@YvanZhang,

You are welcome.