Does GroupDocs.Search support Arabic?

I want to search for Arabic text.

Is there any documentation about the languages supported by GroupDocs.Search?

@meldeeb
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): SEARCHNET-3131

@meldeeb

You first need to enable Arabic indexing.

var index = new Index(@"../../../../Index");
// Enable indexing of Arabic letters
index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x0610, 11).ToArray(), CharacterType.Letter);
index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x0620, 74).ToArray(), CharacterType.Letter);
index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x066E, 102).ToArray(), CharacterType.Letter);
index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x06D5, 8).ToArray(), CharacterType.Letter);
index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x06DF, 10).ToArray(), CharacterType.Letter);
index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x06EA, 19).ToArray(), CharacterType.Letter);

// Indexing
index.Add(@"../../../../arabic-ar.docx");

// Search
var result = index.Search("شجرة");
Console.WriteLine("Occurrences: " + result.OccurrenceCount);

// Highlighting occurrences in the text
if (result.DocumentCount > 0)
{
    var outputAdapter = new FileOutputAdapter(OutputFormat.Html, @"../../../../Highlight.html");
    var highlighter = new DocumentHighlighter(outputAdapter);
    index.Highlight(result.GetFoundDocument(0), highlighter);
}
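
A note on the ranges above: `Enumerable.Range(start, count)` takes a starting value and a count, not an end value, so each `SetRange` call covers `count` consecutive code points. The following is a small standalone sketch (plain .NET, no GroupDocs dependency; the class name `ArabicRanges` is only for illustration) that prints the inclusive bounds of each range so they can be checked against the Unicode Arabic block:

```csharp
using System;
using System.Linq;

class ArabicRanges
{
    static void Main()
    {
        // (start, count) pairs copied from the SetRange calls above
        var ranges = new (int Start, int Count)[]
        {
            (0x0610, 11), (0x0620, 74), (0x066E, 102),
            (0x06D5, 8), (0x06DF, 10), (0x06EA, 19),
        };

        foreach (var (start, count) in ranges)
        {
            // Same expression that is passed to SetRange
            int[] codePoints = Enumerable.Range(start, count).ToArray();
            Console.WriteLine(
                $"U+{codePoints.First():X4}..U+{codePoints.Last():X4} ({codePoints.Length} code points)");
        }
    }
}
```

The ranges work out to U+0610..U+061A, U+0620..U+0669, U+066E..U+06D3, U+06D5..U+06DC, U+06DF..U+06E8 and U+06EA..U+06FC. The gaps between them fall on code points such as U+061F (Arabic question mark) and U+06D4 (Arabic full stop), which presumably should not be indexed as letters.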

Moreover, take a look at the following documentation articles:

@meldeeb

We have published the new release, 24.4, in which Arabic indexing is enabled by default.

Thank you for the support. Arabic search works this way.

But when I search for part of a word, no results are returned. For example, if the document contains “welcome” and I search for “welcom”, nothing is found.

Arabic indexing support was added to GroupDocs.Search.

Will it also be added to the GroupDocs.Total NuGet package?

@meldeeb

We are investigating this scenario. Your investigation ticket ID is TOTALNET-68.

Could you please share the sample code and the source file?

        // Set the license
        var licenseFilePath = Path.Combine(_hostingEnvironment.WebRootPath, "GroupDocs.Total.NET.lic");
        License lic = new License();
        lic.SetLicense(licenseFilePath);

        string indexFolder = @"c:\MyIndex";

        // Creating an index
        Index index = new Index(indexFolder);
        // Enable indexing of Arabic letters
        index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x0610, 11).ToArray(), CharacterType.Letter);
        index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x0620, 74).ToArray(), CharacterType.Letter);
        index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x066E, 102).ToArray(), CharacterType.Letter);
        index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x06D5, 8).ToArray(), CharacterType.Letter);
        index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x06DF, 10).ToArray(), CharacterType.Letter);
        index.Dictionaries.Alphabet.SetRange(Enumerable.Range(0x06EA, 19).ToArray(), CharacterType.Letter);

        // Subscribing to the FileIndexing event for adding attributes
        index.Events.FileIndexing += (sender, args) =>
        {
            if (args.DocumentKey == "documentKey")
            {
                // Adding two attributes
                args.Attributes = new string[] { "attribute1", "attribute2" };
            }
        };

        // Creating a document object
        Stream stream = new MemoryStream(fileData);
        Document document = Document.CreateFromStream("documentKey", DateTime.Now, ".pdf", stream);
        Document[] documents = new Document[]
        {
            document,
        };

        // Indexing document from the stream
        IndexingOptions options = new IndexingOptions();
        index.Add(documents, options);

        // Closing the document stream after indexing is complete
        stream.Close();

        // Searching in the index
        SearchOptions searchOptions = new SearchOptions();
        // Creating a document filter by attribute
        ISearchDocumentFilter filter1 = SearchDocumentFilter.CreateAttribute("attribute1");
        ISearchDocumentFilter filter2 = SearchDocumentFilter.CreateAttribute("attribute2");
        ISearchDocumentFilter orFilter = SearchDocumentFilter.CreateOr(filter1, filter2);
        searchOptions.SearchDocumentFilter = orFilter;
        string query = "Welcom";
        SearchResult result = index.Search(query, searchOptions);

        // Printing the result
        Console.WriteLine("Documents: " + result.DocumentCount);
        Console.WriteLine("Total occurrences: " + result.OccurrenceCount);

Test.pdf (17.3 KB)

@meldeeb

Text is indexed word by word, so the search is also performed on whole words. To find all words starting with “wel” (welcome, welcom), use the wildcard query “wel?(0~255)”, which matches “wel” followed by 0 to 255 arbitrary characters. Alternatively, you can use fuzzy search.
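
For illustration, here is a minimal sketch of both approaches. It assumes the index in `c:\MyIndex` from the sample above already contains the document. The fuzzy search settings (`FuzzySearch.Enabled`, `FuzzyAlgorithm`, `TableDiscreteFunction`) and the namespaces are given as I recall them from the GroupDocs.Search for .NET API, so please check them against the version you have installed. The allowed number of mistakes (1) is an arbitrary choice for this example.

```csharp
using System;
using GroupDocs.Search;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;

class PartialWordSearch
{
    static void Main()
    {
        // Assumption: this folder holds an existing index with the document added
        Index index = new Index(@"c:\MyIndex");

        // Wildcard search: "wel" followed by 0 to 255 arbitrary characters
        SearchResult wildcardResult = index.Search("wel?(0~255)");
        Console.WriteLine("Wildcard occurrences: " + wildcardResult.OccurrenceCount);

        // Fuzzy search: tolerate a small number of differences from the query word
        SearchOptions fuzzyOptions = new SearchOptions();
        fuzzyOptions.FuzzySearch.Enabled = true;
        // Assumption: allow at most 1 mistake regardless of word length
        fuzzyOptions.FuzzySearch.FuzzyAlgorithm = new TableDiscreteFunction(1);
        SearchResult fuzzyResult = index.Search("welcom", fuzzyOptions);
        Console.WriteLine("Fuzzy occurrences: " + fuzzyResult.OccurrenceCount);
    }
}
```

The wildcard query is the better fit when you know the beginning of the word, while fuzzy search is aimed at typos and small spelling differences.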

Yes, the latest changes in GroupDocs.Search 24.4 will be included in the GroupDocs.Total 24.4 release.

The issues you found earlier (filed as TOTALNET-68) have been fixed in this update. This message was posted using the Bugs notification tool by yevgen-nykytenko.