How to convert PDF to TXT file using C#

Hi,

I am trying to convert a PDF to TXT using the latest version of GroupDocs.Conversion. I have used the following code:

FileStream fileStream = new FileStream(Path.Combine(directoryPath, fileName), FileMode.Open, FileAccess.ReadWrite);

var convertedDocumentStream = conversionHandler.Convert(fileStream, saveOptions);



I am trying to do it using a stream, but I am getting junk characters in the output. I only want the text content (not the images), and I want to save that content into a TXT file.

Please suggest how to do this the right way.

Waiting for your response.

Hello,


Thank you for trying the GroupDocs.Conversion API for .NET.

By default, the GroupDocs.Conversion API converts the complete file with all of its contents; however, advanced options are available. Please click here to see the complete list of supported conversion formats as well.

As per your requirement, you can try the following to get only the text from PDF files:
  1. Convert PDF to DOCX
  2. Convert DOCX to DOCX, but with the save options property ConvertFileType set to TXT
  3. The final result files will be TXT files containing only the text content, excluding all formatting and images.

var convertedDocumentPath = conversionHandler.Convert(InputSourceFile, new WordsSaveOptions { OutputType = OutputType.String, ConvertFileType = WordsSaveOptions.WordsFileType.Txt });
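
Since the original question used a stream, here is a minimal sketch of writing a converted stream to a TXT file in a folder of your choice. It uses only System.IO and makes no GroupDocs calls; it assumes the conversion result is already available as a Stream. The class name ConvertedTextSaver and the method SaveToFile are hypothetical helpers, not part of the GroupDocs.Conversion API.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of GroupDocs.Conversion.
public static class ConvertedTextSaver
{
    // Copies the converted stream into outputDirectory/outputFileName
    // and returns the full path of the file that was written.
    public static string SaveToFile(Stream convertedStream, string outputDirectory, string outputFileName)
    {
        if (convertedStream == null) throw new ArgumentNullException(nameof(convertedStream));

        // Creates the folder if it does not exist yet; does nothing otherwise.
        Directory.CreateDirectory(outputDirectory);
        string outputPath = Path.Combine(outputDirectory, outputFileName);

        // Rewind first, so the copy starts at the beginning of the content.
        if (convertedStream.CanSeek)
        {
            convertedStream.Position = 0;
        }

        // Copy the raw bytes unchanged; no re-encoding happens here.
        using (FileStream target = File.Create(outputPath))
        {
            convertedStream.CopyTo(target);
        }

        return outputPath;
    }
}
```

A call could look like ConvertedTextSaver.SaveToFile(convertedDocumentStream, outputFolder, "result.txt"), where outputFolder is whatever folder you choose. Because the bytes are copied as-is, the file will only be readable text if the stream really holds TXT output.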

If you need any help or have any other questions, please feel free to ask.

Warm Regards,

Thanks for your response. This is a good solution, and it is working for me.

By default, it keeps the converted document under the Converted folder. Is there any setting for a custom path?

Hi,

We are glad to hear that the solution we provided worked for you. Regarding your question about a custom path: yes, you can set a custom path. Please click here for the solution and more details.

If you need any help or have any other questions, please feel free to ask.


Warm Regards,