Hi,
I am having a problem with search terms not being found in PDF files. One file, ‘golfcroquetrules.pdf’, never appears in any search results. Another file, ‘MyWords.pdf’, is found for some words but not for others.
I have reviewed the thread "Text cannot be searched from a PDF in C#" (post #9 by bharathiGK), but it doesn’t resolve my issue.
Could you take a look at this and let me know why I am not getting the expected results? I am about to purchase a license but need to know that all terms will be found, or at least why they wouldn’t be.
My application is a .NET 5 console app. Here is the code.
PdfDemo.csproj
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="GroupDocs.Search" Version="21.8.1" />
  </ItemGroup>
</Project>
Program.cs
using GroupDocs.Search;
using GroupDocs.Search.Results;
using System;
// Words tested:
// 'Boundary'  expected: golfcroquetrules.pdf; actual: nothing found
// 'dummy'     expected: test.pdf; actual: same as expected
// 'Equipment' expected: MyWordfile.docx and MyWords.pdf; actual: same as expected
// 'works'     expected: MyWordfile.docx and MyWords.pdf; actual: only found in MyWordfile.docx
namespace PdfDemo
{
    internal class Program
    {
        static void Main(string[] args)
        {
            var filesLocation = @"C:\Temp\GroupDocs\MyFiles";
            var indexLocation = @"C:\Temp\GroupDocs\MyIndex";

            IndexSettings settings = new IndexSettings();
            settings.UseRawTextExtraction = false;

            // Fully qualified to avoid ambiguity with System.Index.
            GroupDocs.Search.Index index = new GroupDocs.Search.Index(indexLocation, settings);
            index.Add(filesLocation);

            // Uncomment one query at a time to reproduce each case.
            //var query = "Boundary";
            //var query = "dummy";
            //var query = "Equipment";
            var query = "works";

            SearchResult result = index.Search(query);
            foreach (FoundDocument document in result)
            {
                Console.WriteLine(document.DocumentInfo.FilePath);
                Console.WriteLine($"{document.OccurrenceCount} occurrences");
            }
        }
    }
}
test.pdf (13.0 KB)
MyWords.pdf (189.6 KB)
MyWordfile.docx (11.7 KB)
golfcroquetrules.pdf (493.1 KB)