PDF to Docx Conversion - paragraph splitting issue

Hey!

I am trying to convert a PDF to DOCX using the GroupDocs converter. The conversion looks good overall, but each line is converted into a separate paragraph (see the attached screenshots). Is there a way to group the run objects that fall within a particular frame (rectangle)? I am attaching the relevant screenshots and the PDF document for reference. Please advise.

Thank you!
GroupDocs Output:
each_line_paragraph.png (78.3 KB)

Ideal Output (Manual):
ideal_para_split.png (81.0 KB)

PFTF_201.pdf (492.7 KB)


Please share the following details and we’ll investigate this issue:

  • API version (e.g. 20.2, 20.7) and variant (Java or .NET) that you are evaluating
  • Sample conversion code

package temp.testing;

import org.json.JSONArray;
import org.json.JSONObject;

import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageTextArea;
import com.groupdocs.parser.data.Rectangle;

public class TestPosition {
    public static void main(String[] args) throws Exception {
        // args[0] is the path to the PDF file
        try (Parser parser = new Parser(args[0])) {
            // Extract text areas
            Iterable<PageTextArea> areas = parser.getTextAreas();
            // Check if text areas extraction is supported
            if (areas == null) {
                System.out.println("Error in areas: text area extraction isn't supported");
                return;
            }
            JSONArray map = new JSONArray();
            // Iterate over page text areas
            for (PageTextArea a : areas) {
                // Collect the rectangle position and edges of each text area
                Rectangle r = a.getRectangle();
                JSONObject details = new JSONObject();
                details.put("pos_rect_x", r.getPosition().getX());
                details.put("pos_rect_y", r.getPosition().getY());
                details.put("x_left_edge", r.getLeft());
                details.put("x_right_edge", r.getRight());
                details.put("y_top_edge", r.getTop());
                details.put("y_bot_edge", r.getBottom());
                // Key the details by the text of the area
                map.put(new JSONObject().put(a.getText(), details));
            }
            // Iterate through the elements of the array and get their values
            for (int i = 0; i < map.length(); i++) {
                JSONObject temp = map.getJSONObject(i);
                System.out.println(temp);
            }
        }
    }
}

The code above extracts the text areas, not the frames.

My requirement is to extract frames (styles) and group the paragraphs within each frame (if they share the same font size/text style).
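
For illustration, the grouping described here could be approximated from line rectangles alone. The following is only a rough sketch, not a GroupDocs API: `LineGrouper`, `Line`, `maxGap` and `maxIndent` are hypothetical names, and it assumes that the y coordinate grows downward and that lines belonging to one paragraph are vertically close and share (almost) the same left edge. It does not look at font size or style.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LineGrouper {
    /** Hypothetical stand-in for one extracted line: its text and rectangle edges. */
    public static final class Line {
        final String text;
        final double left;
        final double top;
        final double bottom;

        public Line(String text, double left, double top, double bottom) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.bottom = bottom;
        }
    }

    /**
     * Merges lines into paragraphs. A line joins the previous one when the
     * vertical gap between them is at most maxGap and their left edges differ
     * by at most maxIndent. Assumes y grows downward (top < bottom).
     */
    public static List<String> group(List<Line> lines, double maxGap, double maxIndent) {
        List<Line> sorted = new ArrayList<>(lines);
        // Process lines from the top of the page to the bottom
        sorted.sort(Comparator.comparingDouble((Line l) -> l.top));

        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = null;
        Line prev = null;
        for (Line line : sorted) {
            boolean sameParagraph = prev != null
                    && line.top - prev.bottom <= maxGap
                    && Math.abs(line.left - prev.left) <= maxIndent;
            if (sameParagraph) {
                current.append(' ').append(line.text);
            } else {
                // Close the paragraph in progress and start a new one
                if (current != null) {
                    paragraphs.add(current.toString());
                }
                current = new StringBuilder(line.text);
            }
            prev = line;
        }
        if (current != null) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }
}
```

Each `PageTextArea` could be turned into a `Line` from `getText()` and the `getLeft()`, `getTop()` and `getBottom()` values of its rectangle. The thresholds would need tuning per document, and multi-column pages would need the lines split by column first.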

I have attached the necessary screenshots, along with a picture of the ideal-case output (above). Please advise. Thank you.
Screenshot from 2020-08-27 00-16-30.png (76.5 KB)

Thank You


Thank you for the details. We are investigating this scenario at our end with ID CONVERSIONJAVA-1074. You’ll be notified as soon as there’s an update.


The issues you have found earlier (filed as CONVERSIONJAVA-1074) have been fixed in this update. This message was posted using Bugs notification tool by Atir_Tahir