Whether you are still learning C# or already have some experience, it is often difficult to know what you don’t know. This article aims to short-circuit your C# learning.

String manipulation is a crucial aspect of programming in C#. In this blog post, we will cover some essential tips and tricks for handling strings, characters, and formatting in C#. First, we will explore the difference between strings and characters and how each is used in C#. Then, we will dive into some tips for formatting strings, including string interpolation and composite formatting. Finally, we will discuss some best practices for working with strings, such as using StringBuilder instead of repeated string concatenation.

Some of the main topics we will introduce include string, character, and formatting techniques; techniques for using files, paths, and URIs; techniques for organizing and structuring classes and code; techniques for code compilation; and conversion techniques. This post covers the first of these: strings, characters, and formatting.

Tips covered in this blog post include:

  • Understanding the difference between strings and characters in C#
  • Using string interpolation and composite formatting for easier string formatting
  • Utilizing StringBuilder for more efficient string manipulation
  • Avoiding repeated string concatenation for better performance and readability
  • Handling special characters and escape sequences in strings

By the end of this blog post, readers will have a solid understanding of how to work with strings, characters, and formatting in C#, and will be equipped with some useful tips and best practices for writing efficient and readable code.

Simplifying String Empty and Null Checking Code

Understanding the difference between strings and characters in C#

In C#, a string is a sequence of characters, and a char is a single UTF-16 code unit. Strings are represented by the string data type, while characters are represented by the char data type.

The main difference between strings and characters is their length and usage. Strings can contain any number of characters and can represent words, sentences, paragraphs, and more. A char, on the other hand, holds a single letter, digit, or symbol. Most common characters fit in one char, but characters outside the Basic Multilingual Plane, such as many emoji, need two chars (a surrogate pair).

In C#, both strings and characters are enclosed in quotes. Single quotes (') are used for characters, while double quotes (") are used for strings. For example:

char myChar = 'a';
string myString = "Hello World!";

It’s important to understand the difference between strings and characters when working with them in C#. Knowing when to use one or the other can help you write more efficient and effective code.
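To make the relationship concrete, here is a small sketch written as C# top-level statements (C# 9 or later). Indexing a string yields individual char values, and Length counts chars. The last lines show that a char is a UTF-16 code unit, so an emoji such as U+1F600 occupies two chars in a string.

```csharp
using System;

string greeting = "Hello";

// Indexing a string returns a single char.
char first = greeting[0];
Console.WriteLine(first);           // H

// Length counts UTF-16 code units (chars), not user-perceived characters.
Console.WriteLine(greeting.Length); // 5

// U+1F600 lies outside the Basic Multilingual Plane, so it is stored
// as a surrogate pair: two char values inside one string.
string emoji = char.ConvertFromUtf32(0x1F600);
Console.WriteLine(emoji.Length);                   // 2
Console.WriteLine(char.IsSurrogatePair(emoji, 0)); // True
```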

Most of us have come across a requirement of checking whether a given string is null or empty. 

Let us have a look at the usual way to code this condition:

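// Assumes using static System.Console; at the top of the file, so WriteLine and ReadLine can be called directly.
// IsAllWhiteSpace is a helper method that is not shown here.
private static string GetFirstName()
{
    while (true)
    {
        WriteLine("Please enter first name");
        string firstName = ReadLine();
        if (firstName == null || firstName.Length == 0 || IsAllWhiteSpace(firstName))
        {
            WriteLine("ERROR: Invalid first name");
        }
        else
        {
            return firstName;
        }
    }
}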

As you can see in the code above, once the user has typed in the first name, we want to validate it. The if statement treats the first name as invalid if it is null, if its length is 0, or if it contains only whitespace.

To simplify this, we can use the string.IsNullOrWhiteSpace method. It returns true if the string is null, is empty, or consists only of whitespace characters such as spaces and tabs.

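private static string GetFirstName()
{
    while (true)
    {
        WriteLine("Please enter first name");
        string firstName = ReadLine();
        if (string.IsNullOrWhiteSpace(firstName))
        {
            WriteLine("ERROR: Invalid first name");
        }
        else
        {
            return firstName;
        }
    }
}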

There’s also another convenience method on the string class called string.IsNullOrEmpty. This method returns true if the specified string is null or empty, meaning that it contains no characters. In our case, however, we use IsNullOrWhiteSpace because we also want to reject input where the user has typed a few spaces and hit Enter.
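The difference between the two methods is easiest to see side by side. This sketch (top-level statements; the sample inputs are made up for illustration) prints both results for a null, an empty string, whitespace-only strings, and a normal name.

```csharp
using System;

string?[] samples = { null, "", "   ", "\t", "Morgan" };

foreach (string? s in samples)
{
    // Build a readable label so null and the tab character are visible in the output.
    string label = s == null ? "null" : "\"" + s.Replace("\t", "\\t") + "\"";
    Console.WriteLine(
        $"{label,-10} IsNullOrEmpty={string.IsNullOrEmpty(s),-5} " +
        $"IsNullOrWhiteSpace={string.IsNullOrWhiteSpace(s)}");
}
```

Only the whitespace-only inputs ("   " and "\t") separate the two methods: IsNullOrEmpty returns False for them, while IsNullOrWhiteSpace returns True.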

Handling special characters and escape sequences in strings

When working with strings in C#, it’s important to know how to handle special characters and escape sequences. Special characters are characters that have a special meaning in the context of a string, such as a newline character or a tab character. Escape sequences are a way to represent these special characters in a string.

In C#, escape sequences start with a backslash (\) character and are followed by one or more characters that represent the special character. Some commonly used escape sequences in C# include:

  • \n – represents a newline character
  • \t – represents a tab character
  • \" – represents a double quote character
  • \' – represents a single quote character
  • \\ – represents a backslash character

For example, if you wanted to include a double quote character in a string, you would use the escape sequence \". Similarly, if you wanted to include a newline character in a string, you would use the escape sequence \n.

Here’s an example of how to use escape sequences in a C# string:

string myString = "This is a \"quoted\" string.\nIt has a newline character.";

In this example, we use the escape sequence \" to include the double quotes around the word “quoted”, and we use the escape sequence \n to include a newline character after the word “string”.

By understanding how to use special characters and escape sequences in strings, you can create more complex and meaningful strings in your C# code.
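As a further sketch (top-level statements; the literal values are illustrative), the snippet below contrasts a regular string literal, where the backslash starts an escape sequence, with a verbatim literal prefixed by @, where backslashes are taken literally and a double quote is written as "". It also uses the \u escape to insert a character by its hexadecimal UTF-16 value.

```csharp
using System;

// Regular string literal: the backslash starts an escape sequence.
string escaped = "Name\tScore\n\"Alice\"\t42";

// Verbatim literal (@): backslashes are literal and a quote is written as "".
string path = @"C:\Temp\report.txt";
string quoted = @"She said ""hello""";

// \u followed by four hex digits inserts a character by its UTF-16 value (U+00A9 is the copyright sign).
string copyright = "\u00A9 Example Corp";

Console.WriteLine(escaped);
Console.WriteLine(path);      // C:\Temp\report.txt
Console.WriteLine(quoted);    // She said "hello"
Console.WriteLine(copyright);
```

Verbatim strings are handy for Windows paths and regular expressions, where escaping every backslash quickly becomes hard to read.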

Next, let’s look at how we can validate whether a value is a valid Unicode character.

Testing Char Unicode Validity

Understanding characters and their Unicode values

It’s important to understand the relationship between characters and their corresponding Unicode values.

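In C#, characters are represented using the char data type, a 16-bit value that holds one UTF-16 code unit. That covers every character in the Basic Multilingual Plane (U+0000 to U+FFFF); characters above U+FFFF are stored in a string as two chars (a surrogate pair). Each character is assigned a unique Unicode value, which is a standardized numerical representation of that character.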

Unicode is a character encoding standard that assigns each character in the world a unique code point. Unicode supports a wide range of characters, including those used in different languages, symbols, and emoji.

When working with characters in C#, it’s important to know the corresponding Unicode value of each character. You can use the Convert.ToUInt16() method to get the Unicode value of a character. For example:

char myChar = 'A';
ushort unicodeValue = Convert.ToUInt16(myChar);

In this example, we assign the character 'A' to the myChar variable and then use the Convert.ToUInt16() method to get its corresponding Unicode value, which is 65.

Understanding the relationship between characters and Unicode values is important when working with different languages, character sets, and text encodings. By knowing the Unicode value of a character, you can ensure that it is represented correctly in your code and in any output.
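A short sketch of converting in both directions, written as top-level statements. A char converts implicitly to int, which gives the same value as Convert.ToUInt16; going back requires an explicit cast. For a code point above U+FFFF, which does not fit in one char, char.ConvertToUtf32 reads the full value from the surrogate pair in a string.

```csharp
using System;

char letter = 'A';

// char converts implicitly to int; Convert.ToUInt16 gives the same UTF-16 value.
int code = letter;
ushort viaConvert = Convert.ToUInt16(letter);
Console.WriteLine($"{letter} = {code} = U+{code:X4}"); // A = 65 = U+0041
Console.WriteLine(viaConvert);                         // 65

// The reverse direction needs an explicit cast.
char fromCode = (char)97;
Console.WriteLine(fromCode); // a

// A code point above U+FFFF does not fit in one char, so read it from the string.
string smile = char.ConvertFromUtf32(0x1F600);
int codePoint = char.ConvertToUtf32(smile, 0);
Console.WriteLine($"U+{codePoint:X}"); // U+1F600
```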

private static char GetEmployeeCode()
{
    while (true)
    {
        WriteLine("Please enter employee code");

        // First() is a LINQ extension method (using System.Linq); it throws if the input is empty.
        char employeeCode = ReadLine().First();
        return employeeCode;
    }
}

Above we have our GetEmployeeCode method. All this method does is loop until we’ve got an employee code. It has no additional validation, such as checking for null, empty, or whitespace strings. In a real application you’d want more validation, but let’s keep things simple for demo purposes.

We’re making use of the console’s ReadLine method, and then we’re just getting the first character. Sometimes you might want to check that a value represents a valid Unicode character.

To do this, we can take a character and get its Unicode category. Below we declare a variable ucCategory of type UnicodeCategory, which lives in the System.Globalization namespace. To get the Unicode category, we call the static char.GetUnicodeCategory method and, in our case, pass it employeeCode:

UnicodeCategory ucCategory = char.GetUnicodeCategory(employeeCode);

Now we know which Unicode category the employee code belongs to. To check that it is valid, we declare a variable called isValidUnicode that is true when the category is not equal to the UnicodeCategory.OtherNotAssigned enum value.

bool isValidUnicode = ucCategory != UnicodeCategory.OtherNotAssigned;

The UnicodeCategory enum can also be used to check for other categories, for example whether the character is a currency symbol, dash punctuation, a lowercase letter, and so on. The full list of values is in the UnicodeCategory enum documentation.
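To see a few of these categories in practice, the sketch below (top-level statements; the sample characters are arbitrary) prints the category of some common characters alongside (char)888, the same unassigned value used in the employee code demo.

```csharp
using System;
using System.Globalization;

// A few sample characters, plus (char)888 (U+0378), which Unicode leaves unassigned.
char[] samples = { 'a', 'Z', '7', '$', '-', ' ', (char)888 };

foreach (char c in samples)
{
    UnicodeCategory category = char.GetUnicodeCategory(c);
    Console.WriteLine($"U+{(int)c:X4} {category}");
}

// Output:
// U+0061 LowercaseLetter
// U+005A UppercaseLetter
// U+0037 DecimalDigitNumber
// U+0024 CurrencySymbol
// U+002D DashPunctuation
// U+0020 SpaceSeparator
// U+0378 OtherNotAssigned
```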

private static char GetEmployeeCode()
{
    while (true)
    {
        WriteLine("Please enter employee code");

        // char employeeCode = ReadLine().First();
        // Hard-coded for the demo: U+0378 (888) is unassigned, so this loop never exits.
        char employeeCode = (char)888;

        // UnicodeCategory lives in the System.Globalization namespace.
        UnicodeCategory ucCategory = char.GetUnicodeCategory(employeeCode);

        bool isValidUnicode = ucCategory != UnicodeCategory.OtherNotAssigned;

        if (!isValidUnicode)
        {
            WriteLine("ERROR: Invalid employee code (invalid character)");
        }
        else
        {
            return employeeCode;
        }
    }
}

It is a bit tricky to type an invalid Unicode character, especially with the Console.ReadLine method, so in the code above I have simply set the employee code to the integer 888 cast to a char. This triggers the error because code point 888 (U+0378) is not assigned in Unicode, which confirms that the check works.
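One way to make this validation easier to test, without typing awkward characters into the console, is to move it into a pure function that takes the raw input. The sketch below uses a hypothetical helper named TryParseEmployeeCode (not from the original code) that combines the null/whitespace check from earlier with the Unicode category check; it is written as top-level statements with a static local function.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: validates raw input as an employee code without doing
// any console I/O, so the rules can be exercised directly.
static bool TryParseEmployeeCode(string? input, out char employeeCode)
{
    employeeCode = default;

    // Reject null, empty, and whitespace-only input before reading a character.
    if (string.IsNullOrWhiteSpace(input))
    {
        return false;
    }

    char candidate = input.Trim()[0];

    // Reject code points that Unicode has not assigned, such as U+0378.
    if (char.GetUnicodeCategory(candidate) == UnicodeCategory.OtherNotAssigned)
    {
        return false;
    }

    employeeCode = candidate;
    return true;
}

Console.WriteLine(TryParseEmployeeCode("S123", out char code) ? $"OK: {code}" : "invalid"); // OK: S
Console.WriteLine(TryParseEmployeeCode("   ", out _) ? "OK" : "invalid");                    // invalid
Console.WriteLine(TryParseEmployeeCode(((char)888).ToString(), out _) ? "OK" : "invalid");   // invalid
```

The console loop in GetEmployeeCode would then only read a line and call the helper, keeping input handling and validation separate.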

Handling non-ASCII characters and encodings

It’s also important to know how to handle special characters and escape sequences in strings, especially when working with non-ASCII characters.

Unicode supports a wide range of characters from different languages, symbols, and emoji, and some of these characters require multiple bytes to represent them. When working with non-ASCII characters, it’s important to use the correct encoding to represent these characters in your code and in any output.

In C#, you can use the Encoding class to encode and decode strings using different encodings, such as UTF-8, UTF-16, and ASCII. The Encoding class provides methods to convert strings to byte arrays and vice versa, using the specified encoding.

For example, to encode a string as UTF-8, you can use the following code:

using System.Text;

string myString = "こんにちは";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(myString);

In this example, we use the Encoding.UTF8.GetBytes() method to encode the string "こんにちは" as a byte array using the UTF-8 encoding.
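Extending that example slightly, this sketch (top-level statements) compares the byte counts produced by UTF-8 and UTF-16 for the same string and then decodes the UTF-8 bytes back. Each of these five hiragana characters takes 3 bytes in UTF-8 and 2 bytes in UTF-16, even though the string itself has 5 chars.

```csharp
using System;
using System.Text;

string myString = "こんにちは"; // 5 hiragana characters, all inside the BMP

byte[] utf8Bytes = Encoding.UTF8.GetBytes(myString);
byte[] utf16Bytes = Encoding.Unicode.GetBytes(myString); // Encoding.Unicode is UTF-16 little-endian

Console.WriteLine(myString.Length);   // 5  (chars)
Console.WriteLine(utf8Bytes.Length);  // 15 (3 bytes per character in UTF-8)
Console.WriteLine(utf16Bytes.Length); // 10 (2 bytes per character in UTF-16)

// Decoding with the same encoding restores the original string.
string roundTrip = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(roundTrip == myString); // True
```

Decoding bytes with a different encoding than the one used to produce them is a common source of garbled text, so it pays to keep the two sides consistent.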

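When working with special characters and escape sequences in strings, you should also be aware of where encoding comes into play. Escape sequences such as \u (four hex digits) and \U (eight hex digits) let you specify Unicode code points directly in a string literal. The compiler turns them into UTF-16 chars, so the string’s contents do not depend on any encoding; the encoding only determines how those chars are converted to bytes, and characters that the target encoding cannot represent (for example, é in ASCII) are replaced or lost.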
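A small sketch of that distinction (top-level statements): the \u00E9 escape becomes a single char in the string regardless of encoding, while converting that string to bytes gives two bytes for é in UTF-8 and a '?' replacement in ASCII, which is the default fallback behavior of Encoding.ASCII.

```csharp
using System;
using System.Text;

// The compiler turns \u00E9 into the single char 'é'; no encoding is involved yet.
string cafe = "caf\u00E9";
Console.WriteLine(cafe.Length); // 4

// The encoding matters only when converting the string to bytes.
byte[] utf8 = Encoding.UTF8.GetBytes(cafe);   // 'é' becomes two bytes: 0xC3 0xA9
byte[] ascii = Encoding.ASCII.GetBytes(cafe); // 'é' has no ASCII form, so it is replaced by '?'

Console.WriteLine(utf8.Length);                     // 5
Console.WriteLine(Encoding.ASCII.GetString(ascii)); // caf?
```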

By understanding how to handle special characters and escape sequences in strings, and by using the correct encoding for non-ASCII characters, you can ensure that your C# code works correctly with a wide range of characters from different languages and character sets.

String Formatting and String Interpolation

Using string interpolation and composite formatting for easier string formatting

String interpolation allows you to embed expressions directly within a string literal by using the $ symbol before the string. You can then include expressions within curly braces {} to be evaluated and included in the resulting string.

For example:

string name = "Alice";
int age = 30;
string message = $"My name is {name} and I am {age} years old.";

In this example, the expressions within the curly braces {} are evaluated and included in the resulting string. The value of the name variable is inserted where {name} appears, and the value of the age variable is inserted where {age} appears.

Composite formatting, on the other hand, uses numbered placeholders within a string to specify where arguments should be inserted. Each placeholder is written in braces as {index[,alignment][:formatString]}: the zero-based index of the argument, optionally followed by a field width and a format specifier. The String.Format() method formats the string and inserts the arguments.

For example:

string name = "Bob";
int age = 25;
string message = String.Format("My name is {0} and I am {1} years old.", name, age);

In this example, the placeholders {0} and {1} represent the first and second arguments passed to the String.Format() method. The value of the name variable is inserted where {0} appears, and the value of the age variable is inserted where {1} appears.

Now let’s say we want to output an employee’s first name and employee code. Often we see string concatenation used for this:

WriteLine("First Name: " + employee.FirstName + " Employee Code: " + employee.EmployeeCode);
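For comparison, here is the same message built three ways: concatenation, composite formatting, and interpolation. This is a sketch in top-level statements; the local variables firstName and employeeCode are hypothetical stand-ins for employee.FirstName and employee.EmployeeCode, using the sample values Morgan and S.

```csharp
using System;

// Hypothetical sample values standing in for employee.FirstName and employee.EmployeeCode.
string firstName = "Morgan";
char employeeCode = 'S';

// 1. Concatenation with +
string concatenated = "First Name: " + firstName + " Employee Code: " + employeeCode;

// 2. Composite formatting: numbered placeholders, arguments listed separately
string composite = string.Format("First Name: {0} Employee Code: {1}", firstName, employeeCode);

// 3. String interpolation: expressions sit inside the placeholders
string interpolated = $"First Name: {firstName} Employee Code: {employeeCode}";

Console.WriteLine(concatenated);                                              // First Name: Morgan Employee Code: S
Console.WriteLine(composite == concatenated && interpolated == concatenated); // True
```

All three produce identical text; the difference is how easy it is to read the template and see which value goes where.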

Both string interpolation and composite formatting offer a powerful and flexible way to format strings in C#. They can help make your code more readable and maintainable by allowing you to easily combine variables and other expressions with text. By choosing the right approach for your use case, you can ensure that your string formatting is easy to understand and modify.

Utilizing StringBuilder for more efficient string manipulation

When working with large or complex strings, it’s often more efficient to use the StringBuilder class instead of string concatenation.

String concatenation creates a new string every time two strings are added together, which can be very inefficient when done repeatedly. StringBuilder, on the other hand, builds and modifies text in a mutable internal buffer, so appending does not create a new string each time.

Here’s an example of using StringBuilder to concatenate a list of items:

using System.Collections.Generic;
using System.Text;

var items = new List<string> { "apples", "bananas", "oranges" };
var sb = new StringBuilder();

foreach (var item in items)
{
    sb.Append(item);
    sb.Append(", ");
}

// Remove the trailing comma and space (guarded so an empty list cannot make Length negative)
if (sb.Length >= 2)
{
    sb.Length -= 2;
}

var result = sb.ToString(); // "apples, bananas, oranges"

In this example, we create a new StringBuilder instance and then use the Append() method to add each item to the buffer. Once all items have been added, we remove the trailing comma and space using the Length property and then convert the StringBuilder instance to a string using the ToString() method.

Using StringBuilder can be much more efficient than string concatenation, especially when building large strings or when the number of concatenations is not known ahead of time, such as inside a loop. By using StringBuilder, you can minimize memory usage and avoid unnecessary intermediate string allocations. For a handful of concatenations in a single expression, however, the + operator is fine: the compiler turns it into a single string.Concat call.
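When the goal is simply to join items with a separator, two alternatives avoid the trailing-separator cleanup altogether. This sketch (top-level statements) shows string.Join, and a StringBuilder loop that writes the separator before every item except the first; both produce the same text as the earlier example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var items = new List<string> { "apples", "bananas", "oranges" };

// Option 1: string.Join inserts the separator between items and returns "" for an empty list.
string joined = string.Join(", ", items);

// Option 2: StringBuilder, appending the separator before every item except the first,
// so there is no trailing ", " to trim afterwards.
var sb = new StringBuilder();
bool first = true;
foreach (var item in items)
{
    if (!first)
    {
        sb.Append(", ");
    }
    sb.Append(item);
    first = false;
}
string built = sb.ToString();

Console.WriteLine(joined); // apples, bananas, oranges
Console.WriteLine(built);  // apples, bananas, oranges
```

string.Join is usually the clearest choice for plain joining; the StringBuilder loop is worth it when each item needs extra formatting or conditional logic along the way.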

Below are a few more composite formatting examples you may find useful:

String format index:

int oranges = 2; 
int apples = 4; 
int bananas = 3; 

string line = "There are {0} oranges, {1} apples and {2} bananas"; 
Console.WriteLine(line, oranges, apples, bananas);

// Expected output:
//There are 2 oranges, 4 apples and 3 bananas

String format numeric data:

Console.WriteLine("{0} {1, 12}", "Decimal", "Hexadecimal"); 
Console.WriteLine("{0:D} {1,8:X}", 502, 546); 

// Expected output:
// Decimal  Hexadecimal
// 502      222

String format date and time:

DateTime today = DateTime.Now;
Console.WriteLine("Short date: {0:d}", today);
Console.WriteLine("Long date: {0:D}", today);
Console.WriteLine("Short time: {0:t}", today);

// Example output (en-US culture; the date and time will differ):
// Short date: 11/28/2019
// Long date: Thursday, November 28, 2019
// Short time: 1:27 PM

More on String Interpolation

One disadvantage of the string.Format approach is the disconnect between the numbered placeholders in the format string, such as {0} and {1}, and the actual data passed into them. String interpolation, available starting with C# 6, solves this problem by placing the expressions directly inside the string.

To identify a string literal as an interpolated string, prefix it with the $ symbol. You can embed any valid C# expression that returns a value in an interpolated string. In the following example, each expression is evaluated, its result is converted to a string, and that text is included in the resulting string:

WriteLine($"First Name: {employee.FirstName} Employee Code: {employee.EmployeeCode}");

// Expected output:
// First Name: Morgan Employee Code: S

double a = 3;
double b = 4; 
Console.WriteLine($"Area of the right triangle with legs of {a} and {b} is {0.5 * a * b}"); 
Console.WriteLine($"Length of the hypotenuse of the right triangle with legs of {a} and {b} is {CalculateHypotenuse(a, b)}"); 

double CalculateHypotenuse(double leg1, double leg2) => Math.Sqrt(leg1 * leg1 + leg2 * leg2); 

// Expected output: 
// Area of the right triangle with legs of 3 and 4 is 6 
// Length of the hypotenuse of the right triangle with legs of 3 and 4 is 5

Control the field width and alignment of the formatted interpolation expression:

const int NameAlignment = -9;
const int ValueAlignment = 7;

double a = 3;
double b = 4;
Console.WriteLine($"Three classical Pythagorean means of {a} and {b}:");
Console.WriteLine($"|{"Arithmetic",NameAlignment}|{0.5 * (a + b),ValueAlignment:F3}|");
Console.WriteLine($"|{"Geometric",NameAlignment}|{Math.Sqrt(a * b),ValueAlignment:F3}|");
Console.WriteLine($"|{"Harmonic",NameAlignment}|{2 / (1 / a + 1 / b),ValueAlignment:F3}|");

// Expected output:
// Three classical Pythagorean means of 3 and 4:
// |Arithmetic|  3.500|
// |Geometric|  3.464|
// |Harmonic |  3.429|

Use escape sequences in an interpolated string:

var xs = new int[] { 1, 2, 7, 9 };
var ys = new int[] { 7, 9, 12 };
Console.WriteLine($"Find the intersection of the {{{string.Join(", ",xs)}}} and {{{string.Join(", ",ys)}}} sets.");

var userName = "Jane";
var stringWithEscapes = $"C:\\Users\\{userName}\\Documents";
var verbatimInterpolated = $@"C:\Users\{userName}\Documents";
Console.WriteLine(stringWithEscapes);
Console.WriteLine(verbatimInterpolated);

// Expected output:
// Find the intersection of the {1, 2, 7, 9} and {7, 9, 12} sets.
// C:\Users\Jane\Documents
// C:\Users\Jane\Documents

The ternary conditional operator ?: in an interpolation expression:

var rand = new Random();
for (int i = 0; i < 7; i++)
{
    Console.WriteLine($"Coin flip: {(rand.NextDouble() < 0.5 ? "heads" : "tails")}");
}

Using the invariant culture:

string messageInInvariantCulture = FormattableString.Invariant($"Date and time in invariant culture: {DateTime.Now}");
Console.WriteLine(messageInInvariantCulture);

// Example output (the date and time will differ):
// Date and time in invariant culture: 05/17/2018 15:46:24
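A related sketch (top-level statements; the amount is an arbitrary sample value): assigning an interpolated string to a FormattableString captures the format and its arguments without formatting them yet, so you can choose the culture afterwards. FormattableString.Invariant gives stable output for logs or files, while ToString(CultureInfo.CurrentCulture) follows the user’s regional settings.

```csharp
using System;
using System.Globalization;

double amount = 1234.5;

// FormattableString captures the format and its arguments; nothing is formatted yet.
FormattableString message = $"Amount: {amount:N1}";

// Format with the invariant culture (stable output, e.g. for logs or files) ...
string invariant = FormattableString.Invariant(message);
// ... or with the current culture (for display to users).
string current = message.ToString(CultureInfo.CurrentCulture);

Console.WriteLine(invariant); // Amount: 1,234.5
Console.WriteLine(current);   // depends on the machine's culture settings
```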

Interpolated strings support all the capabilities of the string composite formatting feature. That makes them a more readable alternative to the use of the String.Format method.

Conclusion:

In conclusion, string manipulation is a vital skill for any C# developer. By understanding the difference between strings and characters and using the right formatting techniques, developers can write cleaner, more efficient code. Best practices such as using StringBuilder instead of repeated concatenation in loops improve both performance and readability. Incorporating these tips and tricks into everyday programming helps you write better, more effective C# code.