I am using GroupDocs.Conversion (23.2.0) to convert a PDF to a CSV file. My PDF is a one-page document that contains some plain text and a few tables, but the conversion from PDF to CSV takes around 5 to 6 seconds, which is too long. Could you please look into it and let me know whether this conversion time can be optimized and what needs to be done for that?
Please find the source PDF file and the generated CSV file attached. The sample conversion code is as follows:
using System;
using System.Diagnostics;

Stopwatch watch1 = new Stopwatch();
watch1.Start();
// Load PDF file
var converter = new GroupDocs.Conversion.Converter("PDFFilePath");
// Set conversion parameters for CSV format
var convertOptions = converter.GetPossibleConversions()["csv"].ConvertOptions;
// Convert to CSV format
converter.Convert("CSVFilePath", convertOptions);
watch1.Stop();
Console.WriteLine("Time to convert CSV ----" + watch1.ElapsedMilliseconds.ToString());
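To see how much of the 5 to 6 seconds is one-time start-up cost, the measurement can be repeated a few times in the same process. The following is only a sketch: `TimeRuns` and `convertOnce` are hypothetical names, and the `Thread.Sleep` call is a placeholder for the conversion calls shown above. In .NET the first run often includes JIT compilation and library initialization, so later runs may be a better indication of the steady-state time.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Runs the action several times and returns the elapsed milliseconds of each run.
static long[] TimeRuns(Action action, int runs)
{
    var results = new long[runs];
    for (int i = 0; i < runs; i++)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        results[i] = watch.ElapsedMilliseconds;
    }
    return results;
}

// Hypothetical stand-in for one conversion; replace the body with the
// Converter / Convert calls from the snippet above.
Action convertOnce = () => Thread.Sleep(20);

long[] timings = TimeRuns(convertOnce, 3);
for (int i = 0; i < timings.Length; i++)
{
    Console.WriteLine($"Run {i + 1}: {timings[i]} ms");
}
```

If the first run is much slower than the following ones, most of the time is probably initialization rather than the conversion itself.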
· We are using GroupDocs.Conversion API version 23.2.0.
· For now we are using the trial version of the API for evaluation, to check whether it fulfils our requirements.
There is one more issue that I want to add here:
If you check the generated CSV, the value “Senior Vice President, Operations” in the “Position” column of the “ExecutiveWorkHistory” table is split across multiple rows.
Please compress your CSV into a ZIP file and re-upload it. Secondly, the text “Senior Vice President, Operations” is on two lines in the source PDF file. Therefore, it is also split across two lines in the CSV, as you can see in the screenshot shared above.
Thanks for your response. I’ll check the PDF for this. Please find the generated CSV file attached. Could you also please check the issue related to the time it takes to convert the PDF to CSV?
@manika
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): CONVERSIONNET-5950
You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.
Thanks for your response. I will update the API to the latest version and will look into the temporary license as well.
I have one more query about the PDF to CSV conversion. Is it possible to save the converted CSV file directly to a stream object instead of saving it physically somewhere on the local machine? If yes, then please let me know how I can do it.
Thanks for your response. I have also tried the approach explained in the article “Save file to stream”, but with it we need to pass a physical path for the converted CSV on our machine. The CSV file gets saved to that location and then has to be read back as a stream, as shown in the following screenshot:
Correct me if I am wrong, but the highlighted text is the physical location of the converted file. I don’t want to save the converted file to a physical location, as I cannot do that. Please let me know whether it is possible to convert the PDF directly to a CSV stream without referring to any physical location. I am using a stream for the input file (PDF) as well.
// Holds the converted output in memory instead of writing it to disk
MemoryStream outputStream = new MemoryStream();
using (var converter = new GroupDocs.Conversion.Converter("source file"))
{
    var options = new SpreadsheetConvertOptions();
    // Copy each converted stream into outputStream when the conversion completes
    converter.Convert(() => new MemoryStream(), (convertedStream, sourceFileName) => convertedStream.CopyTo(outputStream), options);
}
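Once the converted data has been copied into `outputStream`, its position is at the end, so it needs to be rewound before it can be read. The sketch below shows only that step with plain .NET classes. The sample bytes are a stand-in for the converted output, and UTF-8 is an assumed encoding that may not match what the API actually writes.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for the converted output: in the real code, outputStream is
// filled by the CopyTo call inside the Convert callback.
var outputStream = new MemoryStream();
byte[] sample = Encoding.UTF8.GetBytes("Name,Position\nA,Manager\n");
outputStream.Write(sample, 0, sample.Length);

// After writing, the position is at the end of the stream; rewind before reading.
outputStream.Position = 0;

string csvText;
using (var reader = new StreamReader(outputStream, Encoding.UTF8)) // assumed encoding
{
    csvText = reader.ReadToEnd();
}
Console.WriteLine(csvText);
```

Note that disposing the `StreamReader` also closes `outputStream`, so read everything that is needed before leaving the `using` block.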
Thank you so much for the response; the given solution worked for me. I have one query regarding the conversion of a PDF to a CSV file. I was trying to convert a multipage PDF file to CSV using the same code that I shared with you earlier in this thread and the trial version of the API (23.3.1), but it only converts the first page of the PDF to CSV.
I have tried one of the options given in the link, but the mentioned methods are not available on the “ConvertOptions” class object (SpreadsheetConvertOptions) that I have created.
Could you please let me know what needs to be done to convert the complete PDF file to CSV?
Please find attached the sample multipage PDF file that I am converting to CSV. You can see in the generated CSV that, after the data of the first page of the sample PDF, it displays the data of the first page again rather than the data of the second page. Please let me know if anything else is required from my side.
Since you are evaluating the API in trial mode (without any license), the free trial limitations apply. However, the good thing is that you can request a temporary license. Here are the steps to obtain a temporary license.
You’ll then get an output.zip (9.7 KB) like this.
Thank you for your quick response. I’ll check for the temporary license to test the complete functionality.
Could you please let me know whether it is possible to get the data of the PDF file that I shared in one sheet? In the output file you shared, the data of the second page is on a separate sheet.
We have a requirement where the source PDF can contain data for multiple candidates. In that case we want to split the data across separate sheets, i.e. data related to candidate A on one sheet and data for candidate B on another sheet. The PDF file that I shared with you contains the data for a single candidate, which is why we want to get its data in a single sheet. Could you please let me know whether it is possible to achieve this with the Conversion API?
Is there any update on the issue that I raised about the time the Conversion API takes to convert PDF to CSV format?
We are investigating this scenario. Your investigation ticket ID is CONVERSIONNET-5960. However, the other ticket is still under investigation.
We’ll notify you in case of any update.
As you explained, I checked the source PDF for the text in the “Position” column of the ExecutiveWorkHistory table. As you said, this text is on two lines, which is why it is split into two lines in the CSV, and I understand that. I just want to add that, as you can see in the screenshot that you shared, the text “Senior Vice President, Operations” is on multiple lines but in a single cell and a single row in the generated CSV file.
But that’s not the case for the rest of the values in the same column. There are more values in the “Position” column, such as “Senior Manager, Digital Operations” and “Senior Manager, Product Development”, and for these values the data gets split into separate rows and separate cells in the generated CSV. The same is the case for the Company column. You can refer to the attached screenshot for more clarification; I have highlighted the mentioned cells.
Could you please look into this and let me know how I can resolve it?