a DotNetStyling - .Net World by Armen Ayvazyan : Optimal String Comparison
Welcome to DotNetStyling Sign in | Join | Help
Add to Technorati Favorites Add to Google Reader or Homepage Add to My AOL Subscribe in FeedLounge Subscribe in Bloglines Add to Excite MIX Add to flurry Add to Pageflakes Subscribe in NewsGator Online


Optimal String Comparison
Technorati tags: , , , , , , ,

.Net Framework gives us several tools how to compare two strings. We can use String.Equals(), String.Compare(), StringCompareTo(), etc. What we are going to discover now it is what is the performance impact of each of this methods.

 Let's first take a look into implementation of some of them.

String.Equals()

Static String.Equals(string, string) method is one of the fastest ways how to compare two strings because String.Equals(string, string) ignores CultureInfo information. It is the best way how to compare two ordinal strings. Basically in most of cases we are comparing ordinal text used for internal purposes of application and it is not related to any specific culture.

Implementation of String.Equals(string, string) includes several steps which increase comparison performance.

public static bool Equals(string a, string b)

{

    if (a == b)

    {

        return true;

    }

    if ((a != null) && (b != null))

    {

        return EqualsHelper(a, b);

    }

    return false;

}

 

 

private static unsafe bool EqualsHelper(string strA, string strB)

{

    int length = strA.Length;

    if (length != strB.Length)

    {

        return false;

    }

 ...

 

  1. First, method compares whether both strings are referencing to the same object. If they do, method will immediately return true value without checking context.
  2. Then method checks whether stings have some value. Both input parameters should contain any value.
  3. After method compares length of both strings. It is obvious that if strings contain different number of characters, they cannot be equal.
  4. And then if strings passed first three steps, method starts to compare context char by char.

Although String.Equals(String, String) is good approach for strings comparison, it is recommended to use another overload of Equals method - String.Equals(String, String, StringComparation). Except of string parameters, method accepts StringComparation enumeration which specifies comparison type. 

public enum StringComparison
{
    CurrentCulture,
    CurrentCultureIgnoreCase,
    InvariantCulture,
    InvariantCultureIgnoreCase,
    Ordinal,
    OrdinalIgnoreCase
}

Method becomes much more readable. Signature says itself which type of comparison has been used.

'==' or '!=' Operators

'==' and '!=' operators have the same impact like String.Equals does. Internally they call Equals(string, string) method. Using operators is also good and fast approach but you have to keep in mind that it compares ordinal (not culturally aware) strings.

op_Equality
public
static bool operator ==(string a, string b)
{

    return Equals(a, b);

}

 

 

op_Inequality

public static bool operator !=(string a, string b)

{

    return !Equals(a, b);

}

 

 

 

String.Compare()

Static method String.Compare(string, string) compares strings by using culture information which makes it slower than String.Equals. Cultural aware comparison immediately checks strings symbol by symbol as even two stings with different lengths could be equal.

public static int Compare(string strA, string strB)

{

    return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, strB, CompareOptions.None);

}

Culturally aware strings can contain symbol expanded to another. For instance in German language 'ß' always expands to 'ss'. If we will compare those two stings by using String.Equals('ss', 'ß)' method result will be 'false' unlike using String.Compare('ss', 'ß') will return us 'true' value (of course first we have to set CurrentCulture to 'de-DE').

 

 

String.CompareTo()

CompareTo() performs the same way as String.Compare does with one difference that CompareTo is not a static method. The method exists due to having String class inherited from IComparable.

public int CompareTo(string strB)

{

    if (strB == null)

    {

        return 1;

    }

    return CultureInfo.CurrentCulture.CompareInfo.Compare(this, strB, CompareOptions.None);

}

 

It is also using culture comparison as String.Compare() does.

 

 

 

String.CompareOrdinal()

As you can see by name of the method String.CompareOrdinal() compares two ordinal strings.

public static int CompareOrdinal(string strA, string strB)

{

    if (strA == strB)

    {

        return 0;

    }

    if (strA == null)

    {

        return -1;

    }

    if (strB == null)

    {

        return 1;

    }

    return CompareOrdinalHelper(strA, strB);

}

 

 

 

String.Intern()

The most interesting mechanism of all is string Interning.

If application frequently compares strings by using ordinal comparison or application uses a lot of strings with the same value, we can improve performance by using interning mechanism.

When CLR initialize it create an internal Hash table where keys are strings and values are references to the String object in Heap.

String.Intern(String) method used to add strings to hash table. It first checks whether identical string does not exist already, if there is already a string with the same signature it will not add string to table but just return reference to existing string object. In case if such string not exits yet, it will add then. Once strings have been added to hash table it is not possible to remove it from table. Hash table destroys after destroying or restarting AppDomain.

After having strings in the table and due to fact that Strings are immutable it is enough to use Object.ReferenceEquals(string, string) method for string comparison. And as you can guess it is extremely fast way to achieve comparison result.

Having too big Array of strings can decrease the performance of application. Although even it is a fast solution we have to be very careful when using intern mechanism.

 

 

Test

Let's support our theory with some concrete numbers. I created simple console application where use all mechanism listed above. Application writes out the result of performance on console.

First of all there are a couple of fields which will be used internally.

private const int iterationNumber = 50000000;

private const string PHRAZE = "This is match text.";

private static string[] stringArray = new string[3]

      {

            "This is a first text.",

            "This is a second text.",

            "This is match text."

      };

  • There is a stringArray which contains several strings inside.
  • PHRAZE constant is used to compare text to one of the strings from stringArray.
  • To see some valuable duration I had to put comparison to cycle as computer is so fast that it does comparison instantly. iterationNumber is a number of iterations in cycle.

 

PublishOutputs is a help method which will write out time result to console. It was created to avoid repeating the same code in all methods.

/// <summary>

/// Writes out to

/// console the duration of execution.

/// </summary>

/// <param name="startTime">

/// Date when concrete execution

/// have been started.

/// </param>

/// <param name="approachName">

/// The name of used approach

/// </param>
static void PublishOutputs(
                  DateTime startTime,
                  string approachName)

{

      TimeSpan durationSpan = new TimeSpan(

            DateTime.Now.Subtract(startTime).Ticks

            );

 

      Console.WriteLine("{0} performed in {1} seconds.",

                        approachName,

                        durationSpan.Seconds);

 

      //Adding space between the lines.

      Console.WriteLine(Environment.NewLine);

}

 

Now we are creating individual methods which have pretty similar implementation with one difference that each of them uses different comparison mechanism

/// <summary>

/// Compare two strings by using String.Compare method

/// </summary>

static void CompareTwoStringsByUsingStringCompare()

{

      DateTime startTime = DateTime.Now;

 

      for (int i = 0; i <= iterationNumber; i++)

      {

            foreach (string s in stringArray)

            {

                  String.Compare(PHRAZE, s);

            }

      }

PublishOutputs(startTime, "String.Compare()");

}

 

/// <summary>

/// Compare two strings by using CompareTo method.

/// </summary>

static void CompareTwoStringsByUsingStringCompareTo()

{

      DateTime startTime = DateTime.Now;

 

      for (int i = 0; i <= iterationNumber; i++)

      {

            foreach (string s in stringArray)

            {

                  s.CompareTo(PHRAZE);

            }

      }

 

PublishOutputs(startTime, "String.CompareTo()");

}

 

/// <summary>

/// Compare two strings by using '==' or '!=' operators.

/// </summary>

static void CompareTwoStringsByUsingEqualOperator()

{

      DateTime startTime = DateTime.Now;

 

      for (int i = 0; i <= iterationNumber; i++)

      {

            foreach (string s in stringArray)

            {

                  if (s == PHRAZE)

                        break;

            }

 

      }

 

PublishOutputs(startTime, "'==' operation");

}

 

/// <summary>

/// Compare two strings by using String.CompareOrdinal() aproach.

/// </summary>

static void CompareTwoStringsByUsingStringCompareOrdinal()

{

      DateTime startTime = DateTime.Now;

 

      for (int i = 0; i <= iterationNumber; i++)

      {

            foreach (string s in stringArray)

            {

                  String.CompareOrdinal(PHRAZE, s);

            }

      }

 

PublishOutputs(startTime, "String.CompareOrdinal()");

}

 

/// <summary>

/// Compare two strings by using String.Equals() mehtod.

/// </summary>

static void CompareTwoStringsByUsingStringEquals()

{

      DateTime startTime = DateTime.Now;

 

      for (int i = 0; i <= iterationNumber; i++)

      {

            foreach (string s in stringArray)

            {

                  String.Equals(PHRAZE, s);

            }

      }

 

PublishOutputs(startTime, "String.Equals()");

}

 

During intern mechanism we are first adding all strings to internal Array and then starting to compare whether string instances are referencing to the same object.

/// <summary>

/// Compare two strings by using Inter mechanism.

/// </summary>

static void CompareTwoStringsByUsingStringIntern()

{

      DateTime startTime = DateTime.Now;

 

      foreach (string s in stringArray)

      {

            String.Intern(s);

      }

 

      String.Intern(PHRAZE);

 

      for (int i = 0; i <= iterationNumber; i++)

      {

            foreach (string s in stringArray)

            {

                  Object.ReferenceEquals(PHRAZE, s);

            }

      }

 

PublishOutputs(startTime, "String.Intern()");

}

 

Lets now run all methods in one.

static void Main(string[] args)

{

      //Compare two strings by using Compare method

      CompareTwoStringsByUsingStringCompare();

 

      //Using CompareTo method of one string instance

      CompareTwoStringsByUsingStringCompareTo();

 

      //Using Equal

      CompareTwoStringsByUsingStringEquals();

 

      //Using == operator

      CompareTwoStringsByUsingEqualOperator();

                 

      //Using CompareOrdinal.

      CompareTwoStringsByUsingStringCompareOrdinal();

                 

      //Compare two string by interning them before comparison.

      CompareTwoStringsByUsingStringIntern();

                 

      Console.ReadKey();

}

 

The output:

image

 

As you can see the intern mechanism is the fastest approach but we should be very carefully using it, I am not sure if it will perform with the same result when internal Array will be much more bigger. 

Very close to Intern result is String.Equals(). 

And of course, the longest duration is comparing culturally aware stings: String.Compare(), CompareTo()

 

Conclusion

Use String.Equals(string, string, StringComparation.Ordinal) to compare two ordinal strings and String.Equals(String, String, StringComparation.CurrentCulture) to compare two culturally aware strings. It performs fast and signature of the method shows exactly which type of comparison used: Ordinal or Culture aware.

 

 

kick it on DotNetKicks.com

Posted: 4. října 2007 10:08 by admin
Filed under: , , ,

Comments

New Comments to this post are disabled