Optimal String Comparison
.Net Framework gives us several tools how to compare two strings. We can use String.Equals(), String.Compare(), StringCompareTo(), etc. What we are going to discover now it is what is the performance impact of each of this methods.
Let's first take a look into implementation of some of them.
String.Equals()
Static String.Equals(string, string) method is one of the fastest ways how to compare two strings because String.Equals(string, string) ignores CultureInfo information. It is the best way how to compare two ordinal strings. Basically in most of cases we are comparing ordinal text used for internal purposes of application and it is not related to any specific culture.
Implementation of String.Equals(string, string) includes several steps which increase comparison performance.
public static bool Equals(string a, string b)
{
if (a == b)
{
return true;
}
if ((a != null) && (b != null))
{
return EqualsHelper(a, b);
}
return false;
}
private static unsafe bool EqualsHelper(string strA, string strB)
{
int length = strA.Length;
if (length != strB.Length)
{
return false;
}
...
-
First, method compares whether both strings are referencing to the same object. If they do, method will immediately return true value without checking context.
-
Then method checks whether stings have some value. Both input parameters should contain any value.
-
After method compares length of both strings. It is obvious that if strings contain different number of characters, they cannot be equal.
-
And then if strings passed first three steps, method starts to compare context char by char.
Although String.Equals(String, String) is good approach for strings comparison, it is recommended to use another overload of Equals method - String.Equals(String, String, StringComparation). Except of string parameters, method accepts StringComparation enumeration which specifies comparison type.
public enum StringComparison
{
CurrentCulture,
CurrentCultureIgnoreCase,
InvariantCulture,
InvariantCultureIgnoreCase,
Ordinal,
OrdinalIgnoreCase
}
Method becomes much more readable. Signature says itself which type of comparison has been used.
'==' or '!=' Operators
'==' and '!=' operators have the same impact like String.Equals does. Internally they call Equals(string, string) method. Using operators is also good and fast approach but you have to keep in mind that it compares ordinal (not culturally aware) strings.
op_Equality
public static bool operator ==(string a, string b)
{
return Equals(a, b);
}
op_Inequality
public static bool operator !=(string a, string b)
{
return !Equals(a, b);
}
String.Compare()
Static method String.Compare(string, string) compares strings by using culture information which makes it slower than String.Equals. Cultural aware comparison immediately checks strings symbol by symbol as even two stings with different lengths could be equal.
public static int Compare(string strA, string strB)
{
return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, strB, CompareOptions.None);
}
Culturally aware strings can contain symbol expanded to another. For instance in German language 'ß' always expands to 'ss'. If we will compare those two stings by using String.Equals('ss', 'ß)' method result will be 'false' unlike using String.Compare('ss', 'ß') will return us 'true' value (of course first we have to set CurrentCulture to 'de-DE').
String.CompareTo()
CompareTo() performs the same way as String.Compare does with one difference that CompareTo is not a static method. The method exists due to having String class inherited from IComparable.
public int CompareTo(string strB)
{
if (strB == null)
{
return 1;
}
return CultureInfo.CurrentCulture.CompareInfo.Compare(this, strB, CompareOptions.None);
}
It is also using culture comparison as String.Compare() does.
String.CompareOrdinal()
As you can see by name of the method String.CompareOrdinal() compares two ordinal strings.
public static int CompareOrdinal(string strA, string strB)
{
if (strA == strB)
{
return 0;
}
if (strA == null)
{
return -1;
}
if (strB == null)
{
return 1;
}
return CompareOrdinalHelper(strA, strB);
}
String.Intern()
The most interesting mechanism of all is string Interning.
If application frequently compares strings by using ordinal comparison or application uses a lot of strings with the same value, we can improve performance by using interning mechanism.
When CLR initialize it create an internal Hash table where keys are strings and values are references to the String object in Heap.
String.Intern(String) method used to add strings to hash table. It first checks whether identical string does not exist already, if there is already a string with the same signature it will not add string to table but just return reference to existing string object. In case if such string not exits yet, it will add then. Once strings have been added to hash table it is not possible to remove it from table. Hash table destroys after destroying or restarting AppDomain.
After having strings in the table and due to fact that Strings are immutable it is enough to use Object.ReferenceEquals(string, string) method for string comparison. And as you can guess it is extremely fast way to achieve comparison result.
Having too big Array of strings can decrease the performance of application. Although even it is a fast solution we have to be very careful when using intern mechanism.
Test
Let's support our theory with some concrete numbers. I created simple console application where use all mechanism listed above. Application writes out the result of performance on console.
First of all there are a couple of fields which will be used internally.
private const int iterationNumber = 50000000;
private const string PHRAZE = "This is match text.";
private static string[] stringArray = new string[3]
{
"This is a first text.",
"This is a second text.",
"This is match text."
};
- There is a stringArray which contains several strings inside.
- PHRAZE constant is used to compare text to one of the strings from stringArray.
- To see some valuable duration I had to put comparison to cycle as computer is so fast that it does comparison instantly. iterationNumber is a number of iterations in cycle.
PublishOutputs is a help method which will write out time result to console. It was created to avoid repeating the same code in all methods.
/// <summary>
/// Writes out to
/// console the duration of execution.
/// </summary>
/// <param name="startTime">
/// Date when concrete execution
/// have been started.
/// </param>
/// <param name="approachName">
/// The name of used approach
/// </param>
static void PublishOutputs(
DateTime startTime,
string approachName)
{
TimeSpan durationSpan = new TimeSpan(
DateTime.Now.Subtract(startTime).Ticks
);
Console.WriteLine("{0} performed in {1} seconds.",
approachName,
durationSpan.Seconds);
//Adding space between the lines.
Console.WriteLine(Environment.NewLine);
}
Now we are creating individual methods which have pretty similar implementation with one difference that each of them uses different comparison mechanism
/// <summary>
/// Compare two strings by using String.Compare method
/// </summary>
static void CompareTwoStringsByUsingStringCompare()
{
DateTime startTime = DateTime.Now;
for (int i = 0; i <= iterationNumber; i++)
{
foreach (string s in stringArray)
{
String.Compare(PHRAZE, s);
}
}
PublishOutputs(startTime, "String.Compare()");
}
/// <summary>
/// Compare two strings by using CompareTo method.
/// </summary>
static void CompareTwoStringsByUsingStringCompareTo()
{
DateTime startTime = DateTime.Now;
for (int i = 0; i <= iterationNumber; i++)
{
foreach (string s in stringArray)
{
s.CompareTo(PHRAZE);
}
}
PublishOutputs(startTime, "String.CompareTo()");
}
/// <summary>
/// Compare two strings by using '==' or '!=' operators.
/// </summary>
static void CompareTwoStringsByUsingEqualOperator()
{
DateTime startTime = DateTime.Now;
for (int i = 0; i <= iterationNumber; i++)
{
foreach (string s in stringArray)
{
if (s == PHRAZE)
break;
}
}
PublishOutputs(startTime, "'==' operation");
}
/// <summary>
/// Compare two strings by using String.CompareOrdinal() aproach.
/// </summary>
static void CompareTwoStringsByUsingStringCompareOrdinal()
{
DateTime startTime = DateTime.Now;
for (int i = 0; i <= iterationNumber; i++)
{
foreach (string s in stringArray)
{
String.CompareOrdinal(PHRAZE, s);
}
}
PublishOutputs(startTime, "String.CompareOrdinal()");
}
/// <summary>
/// Compare two strings by using String.Equals() mehtod.
/// </summary>
static void CompareTwoStringsByUsingStringEquals()
{
DateTime startTime = DateTime.Now;
for (int i = 0; i <= iterationNumber; i++)
{
foreach (string s in stringArray)
{
String.Equals(PHRAZE, s);
}
}
PublishOutputs(startTime, "String.Equals()");
}
During intern mechanism we are first adding all strings to internal Array and then starting to compare whether string instances are referencing to the same object.
/// <summary>
/// Compare two strings by using Inter mechanism.
/// </summary>
static void CompareTwoStringsByUsingStringIntern()
{
DateTime startTime = DateTime.Now;
foreach (string s in stringArray)
{
String.Intern(s);
}
String.Intern(PHRAZE);
for (int i = 0; i <= iterationNumber; i++)
{
foreach (string s in stringArray)
{
Object.ReferenceEquals(PHRAZE, s);
}
}
PublishOutputs(startTime, "String.Intern()");
}
Lets now run all methods in one.
static void Main(string[] args)
{
//Compare two strings by using Compare method
CompareTwoStringsByUsingStringCompare();
//Using CompareTo method of one string instance
CompareTwoStringsByUsingStringCompareTo();
//Using Equal
CompareTwoStringsByUsingStringEquals();
//Using == operator
CompareTwoStringsByUsingEqualOperator();
//Using CompareOrdinal.
CompareTwoStringsByUsingStringCompareOrdinal();
//Compare two string by interning them before comparison.
CompareTwoStringsByUsingStringIntern();
Console.ReadKey();
}
The output:
As you can see the intern mechanism is the fastest approach but we should be very carefully using it, I am not sure if it will perform with the same result when internal Array will be much more bigger.
Very close to Intern result is String.Equals().
And of course, the longest duration is comparing culturally aware stings: String.Compare(), CompareTo()
Conclusion
Use String.Equals(string, string, StringComparation.Ordinal) to compare two ordinal strings and String.Equals(String, String, StringComparation.CurrentCulture) to compare two culturally aware strings. It performs fast and signature of the method shows exactly which type of comparison used: Ordinal or Culture aware.
