PDF search not working at all in some files and inconsistently in others

Hi,

I am having a problem with terms not being found in PDF files. One file, ‘golfcroquetrules.pdf’, never appears in any search results. Another file, MyWords.pdf, is found for some words but not for others.

I have reviewed the thread "Text cannot be searched from a PDF in C#" (post #9 by bharathiGK) and it doesn’t resolve my issue.

Could you take a look at this and let me know why I am not getting the expected results? I am about to purchase a license but need to know that all terms will be found, or at least why they wouldn’t be.

My application is a .NET 5 console app. Here is the code.

PdfDemo.csproj


<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="GroupDocs.Search" Version="21.8.1" />
  </ItemGroup>

</Project>

Program.cs


using GroupDocs.Search;
using GroupDocs.Search.Results;
using System;

//words tested
//'Boundary' expected result: golfcroquetrules.pdf, actual result: nothing found
//'dummy' expected result: test.pdf actual: same as expected
//'Equipment' expected result: MyWordfile.docx and MyWords.pdf, actual: same as expected
//'works' expected result: MyWordfile.docx and MyWords.pdf actual: only found in MyWordfile.docx

namespace PdfDemo
{

    internal class Program
    {

        static void Main(string[] args)
        {

            var filesLocation = @"C:\Temp\GroupDocs\MyFiles";
            var indexLocation = @"C:\Temp\GroupDocs\MyIndex";
            IndexSettings settings = new IndexSettings();
            settings.UseRawTextExtraction = false;
            GroupDocs.Search.Index index = new GroupDocs.Search.Index(indexLocation, settings);

            index.Add(filesLocation);        

            //var query = "Boundary";
            //var query = "dummy";
            //var query = "Equipment";
            var query = "works";

            SearchResult result = index.Search(query);          

            foreach (FoundDocument document in result)
            {
                Console.WriteLine(document.DocumentInfo.FilePath);
                Console.WriteLine($"{document.OccurrenceCount} occurrences");
            }

        }
    }

}

test.pdf (13.0 KB)
MyWords.pdf (189.6 KB)
MyWordfile.docx (11.7 KB)
golfcroquetrules.pdf (493.1 KB)

@dbfeatdb

We cannot reproduce this issue on our end using API version 21.8.1, the sample code you shared, and .NET 5.0 as the target framework. Please take a look at this screenshot.png (12.8 KB). The search returned 29 occurrences for the keyword 'Boundary'.
Could you please share a sample application with which the issue can be reproduced?

Thanks for the quick reply.

I have zipped up the sample application and uploaded it. Note that I have deleted all non-essential files, such as the DLLs, but if you want a copy that includes them, let me know. I could also send the generated index files if that would help.

PdfDemo.zip (1.7 KB)

@dbfeatdb

The issue is that you are evaluating the API in trial mode (without a license). We simply applied a license in your application and it started showing the expected results. Please request a temporary license here in the purchase wizard.
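
For anyone hitting the same symptom, applying a license at startup might look like the minimal sketch below. It assumes the `License` class and its `SetLicense` method from the GroupDocs.Search package. The `LicenseSetup` helper and the license file path are hypothetical placeholders, not taken from the uploaded project.

```csharp
using GroupDocs.Search;

namespace PdfDemo
{
    // Hypothetical helper, introduced only for illustration.
    internal static class LicenseSetup
    {
        // Call once at startup, before the index is created or documents are added.
        internal static void Apply()
        {
            // Assumed location of the license file; replace with the real path.
            string licensePath = @"C:\Temp\GroupDocs\GroupDocs.Search.lic";

            License license = new License();
            license.SetLicense(licensePath);
        }
    }
}
```

Calling `LicenseSetup.Apply()` as the first statement in `Main` should take the API out of trial mode. It may also be worth deleting the existing index folder and re-indexing, since an index built in trial mode could be incomplete.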

Yes, you are completely right. Thank you!

@dbfeatdb

You are welcome.