HTML comparison is not working in .NET

Hi,

We are facing an issue when comparing HTML files.

We used the code below to run the comparison:

  public string CompareDcumentsFromFileToOutputFileWithSettings(string SourceFileName, string TargetFileName)
    {
        // Apply the license before calling the comparison API.
        GroupDocs.Comparison.Common.License.License license = new GroupDocs.Comparison.Common.License.License();
        license.SetLicense("GroupDocs.Comparison.lic");

        // Style settings: deleted text is red with strikethrough, inserted text is green.
        GroupDocs.Comparison.Common.ComparisonSettings.ComparisonSettings objComparisonSettings = new GroupDocs.Comparison.Common.ComparisonSettings.ComparisonSettings();
        objComparisonSettings.GenerateSummaryPage = false;
        objComparisonSettings.DeletedItemsStyle.StrikeThrough = true;
        objComparisonSettings.DeletedItemsStyle.FontColor = System.Drawing.Color.Red;
        objComparisonSettings.InsertedItemsStyle.FontColor = System.Drawing.Color.Green;

        GroupDocs.Comparison.Comparer comparison = new GroupDocs.Comparison.Comparer();

        // Compare the two HTML files and save the result document.
        GroupDocs.Comparison.Common.ICompareResult HTMLresult = comparison.Compare(SourceFileName, TargetFileName, objComparisonSettings);

        string resultPath = "D:/1_Result.html";
        HTMLresult.SaveDocument(resultPath);

        // The method is declared to return a string, so return the output path.
        return resultPath;
    }

Source, Target, Result Files Comparision.zip (16.9 KB)

The input files above were generated with Aspose.Words (Word to HTML). The comparison does not work on them.

@kranthireddyr,

I cannot find any attachment.
Please mention the API version you are using.

Please provide the solution

@kranthireddyr,

We are investigating this at our end. Your investigation ticket ID is COMPARISONNET-1908. We will notify you as soon as we have an update.

Please mention API version.

The latest version, i.e. 19.3.1

Please provide the solution

It is reproducible with the above code.

@kranthireddyr,

Thanks for the details.
Please see this screenshot (2_Source.html and 2_Target.html) - changes.JPG (126.9 KB).
If you inspect the HTML/CSS of the source and target files, you will notice a clear difference in some tags.
For example, in source.html the text is written in a single span: <span style="font-family:Calibri">Moisturizing Cream with Sunscreen</span>
In target.html, it is split across two spans: <span style="font-family:Calibri">Mois</span><span style="font-family:Calibri">turizing Cream with S</span>.
Hence, the API detects this change and shows it in the output.
Can you please tell us what output you are expecting?

@atirtahir3

The HTML was converted by Aspose.Words from .doc files. Both are the same document in different versions.

Source HTML : Moisturizing Cream with Sunscreen
Target HTML : Moisturizing Cream with Sunscreen Broad Spectrum SPF 30 SUNSCREEN
Result HTML should be "Moisturizing Cream with Sunscreen Broad Spectrum SPF 30 SUNSCREEN"

Please check the source and target in an HTML viewer.

@kranthireddyr,

You cannot see the difference in a viewer or browser. The difference is in the markup, which I shared with you. In target.html, the word Moisturizing is not in a single tag. Instead, Mois is in one span and turizing is in another.
In source.html, however, the word Moisturizing is in a single span. The API detects this difference and shows it in the output.

Aspose.Words is splitting the text into different tags, and GroupDocs.Comparison does not account for this.
What should we do to compare this kind of HTML file?

@kranthireddyr,

The Target.html file generated by Aspose.Words has the word Moisturizing in two spans, while the Source.html file has it in a single span. This is the change GroupDocs.Comparison detects.
The difference you see within a single span in the result file is caused by the change in node positions. The affected nodes are shown in green in the screenshot - changes.JPG (126.9 KB). The API is comparing these documents correctly according to their content.
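
Since the reported difference comes only from how the text is divided into spans, one possible workaround is to normalize both HTML files before passing them to the comparison. The following is a minimal sketch, not a GroupDocs.Comparison feature: SpanNormalizer and MergeAdjacentSpans are hypothetical names, and the regular expression only handles the simple case shown above (two directly adjacent spans with identical attributes, the first containing plain text). A real HTML parser would be more robust.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of GroupDocs.Comparison or Aspose.Words.
public static class SpanNormalizer
{
    // Matches a span holding only text, its closing tag, and the opening tag
    // of a directly following span whose attribute text is identical.
    private static readonly Regex AdjacentSpans = new Regex(
        @"<span(?<attrs>[^>]*)>(?<first>[^<]*)</span><span\k<attrs>>",
        RegexOptions.Compiled);

    // Merges runs of adjacent spans with identical attributes into one span.
    public static string MergeAdjacentSpans(string html)
    {
        string previous;
        do
        {
            previous = html;
            // Drop the "</span><span ...>" seam, keeping the first opening tag
            // and its text. Repeat until nothing changes, because a run of
            // three or more spans needs more than one pass.
            html = AdjacentSpans.Replace(html, "<span${attrs}>${first}");
        } while (html != previous);
        return html;
    }
}
```

Applied to the target.html fragment quoted earlier, MergeAdjacentSpans returns <span style="font-family:Calibri">Moisturizing Cream with S</span>, which matches the single-span form used in source.html. Both files would need to be normalized and saved before calling Compare.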

@kranthireddyr,

We will further investigate it. Can you please share the original Word files?