C# basics in practice - manipulating text files

If you have just started learning C# and .NET, you might be tired of simple demo apps that don’t do anything other than output “Hello world!” text on the screen. The good news is that if you have mastered the basic syntax of C#, you are already equipped to build some useful apps with it.

This short tutorial will walk you through such an app. What we will look at is a simple console application that serves a very useful purpose – reading text from a text file and saving it into another text file as HTML.

I built this utility for my own purposes. The website that I built for blogging a while ago, mobiletechtracker.co.uk, started as a learning project. Therefore, instead of using a pre-existing blogging platform such as WordPress, I built it from scratch.

Although the process of building every component of the website has taught me many useful skills and has improved my programming career, the disadvantage of this approach is that I have to write every article as HTML before I post it. As I type most of my articles on a mobile phone while I’m out and about, typing HTML is quite awkward. Therefore, I have built this tool that does the HTML formatting for me, which I am now sharing with the world.

In this tutorial, you will be given access to the source code of this tool and you will learn how it reads a text file, processes the text and saves the output into another text file. This will be enough for you to adapt the concepts to your own goals.

This tutorial is suitable for anyone who doesn’t know much C# beyond its basic syntax. It is also a good refresher for those who have forgotten how to do file processing in C#.

Let’s begin.

Importing the source code

The complete code of the .NET Core console application can be found in the following GitHub repository:

HtmlTextParser

If you aren’t familiar with Git, this page will tell you how to install it and this page will tell you how to clone the code to your own machine. However, you don’t have to clone the code to follow this tutorial. You can explore all of it on the web page of the GitHub repository itself.

The application starts

Any .NET application needs an entry point to start. It can be any class that has a static void method called Main() which, optionally, can accept parameters as an array of strings. By convention, the class is called Program. However, there is nothing that stops you from giving it any name you want.

The Main method can also return int instead of void and, from C# version 7.1 onwards, it can return Task or Task<int>, which allows it to be asynchronous.

Our entry point is located in the Program.cs file inside our solution folder. Let’s have a look at it.

When our program starts, it first goes through the following code:

Console.WriteLine("Welcome to HTML text parser.");
Console.WriteLine("The following actions will be performed:");
Console.WriteLine("- The string would be HTML-encoded.");
Console.WriteLine("- Every paragraph will be enclosed in <p> tag.");
Console.WriteLine("- Every text enclosed in =1= will be enclosed in <h1> tag.");
Console.WriteLine("- Every text enclosed in =2= will be enclosed in <h2> tag.");
Console.WriteLine("- Every text enclosed in =3= will be enclosed in <h3> tag.");
Console.WriteLine("- A line with <br/> tag will be added before any header tag.");
Console.WriteLine("- Any text enclosed in == tags will be enclosed in <a> tag with a placeholder for the URL.");
Console.WriteLine("Please specify the text file to convert to HTML.");

Console.WriteLine() is the standard way of writing a line of text to the command line output in C#. So, what we are doing here is writing a number of lines that tell the user what the application does and ask them to provide the full path to the text file that is to be processed.

Next, we create an instance of the FileProcessor class and store it in a variable named fileProcessor.

var fileProcessor = new FileProcessor();

The following line waits for the user to input a textual value. The standard way of doing so in C# is to call Console.ReadLine(). The execution of the code will pause on that line until the user has entered the input.

var fullFilePath = Console.ReadLine();

Then, we call the ProcessFile() method on the fileProcessor object and pass it the path that the user has just entered.

fileProcessor.ProcessFile(fullFilePath);

That’s it for the logic that launches the file processor. Let’s now have a look at how the text in the file is processed. Open the FileProcessor.cs file and have a look inside its ProcessFile() method.

Processing the text

The first line that you see is this:

var inputText = System.Web.HttpUtility.HtmlEncode(File.ReadAllText(fullFilePath));

This attempts to open the file at the specified path and read its entire content. It then converts any characters that have a special meaning in HTML into textual representations of these characters (HTML entities) that can be displayed on an HTML page. For example, a " character is converted to &quot;. If you don’t do this, certain characters may break your HTML. For the full list of characters that the HtmlEncode() method converts, you can visit this page.
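To make the encoding step more concrete, here is a minimal sketch that runs HtmlEncode() on a short string. The HtmlEncodeDemo class and its Encode() method are hypothetical names used only for illustration; they are not part of the repository.

```csharp
using System.Web;

// Hypothetical helper, for illustration only.
public static class HtmlEncodeDemo
{
    // Converts characters that are meaningful in HTML into entities.
    public static string Encode(string rawText)
    {
        return HttpUtility.HtmlEncode(rawText);
    }
}
```

Calling HtmlEncodeDemo.Encode("<b>\"Tom & Jerry\"</b>") returns &lt;b&gt;&quot;Tom &amp; Jerry&quot;&lt;/b&gt;, so the browser will display the original characters instead of treating them as markup.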

You may be asking at this point: what if the path that has been provided is incorrect or doesn’t point at a text file? In this case, an exception will be thrown and it will be handled by the “catch” block inside the Program class that has called this method. The following line will display the exact error message to the user:

Console.WriteLine($"Error processing file: {ex.Message}");
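The surrounding try/catch code is not reproduced in this article, so the following is only a rough sketch of how such a block could be wired up. SafeRunner and TryProcess() are hypothetical names, and the real Program class in the repository may be structured differently.

```csharp
using System;

// Hypothetical wrapper, for illustration only.
public static class SafeRunner
{
    // Runs the supplied file-processing action and reports any failure
    // on the console instead of letting the application crash.
    public static bool TryProcess(Action<string> processFile, string fullFilePath)
    {
        try
        {
            processFile(fullFilePath);
            return true;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error processing file: {ex.Message}");
            return false;
        }
    }
}
```

With a wrapper like this, a call such as SafeRunner.TryProcess(fileProcessor.ProcessFile, fullFilePath) would print the error message and return false if the file could not be read.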

The following piece of code splits the text into individual paragraphs. It does so by using a regular expression that matches any new line or carriage return sequence. Pieces that contain no letters or digits, such as the line breaks themselves, are then filtered out.

var paragraphs = Regex.Split(inputText, @"(\r\n?|\n)")
  .Where(p => p.Any(char.IsLetterOrDigit));
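Here is a small sketch of what this split produces. Because the pattern is wrapped in a capturing group, Regex.Split() also returns the line breaks themselves as separate items, and the Where() filter is what removes them. ParagraphSplitDemo and SplitParagraphs() are hypothetical names used for illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ParagraphSplitDemo
{
    // Splits on line breaks and keeps only the pieces that contain
    // at least one letter or digit.
    public static string[] SplitParagraphs(string inputText)
    {
        return Regex.Split(inputText, @"(\r\n?|\n)")
            .Where(p => p.Any(char.IsLetterOrDigit))
            .ToArray();
    }
}
```

For the input "First\r\n\r\n---\nSecond", this returns only "First" and "Second". Note that a line consisting purely of punctuation, such as "---", is dropped along with the empty lines.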

We then instantiate a StringBuilder for an efficient construction of output text:

var sb = new StringBuilder();

Then, we loop through the individual paragraphs and pass each of them to our specialized processing methods, which append the resulting HTML to the StringBuilder:

foreach (var paragraph in paragraphs)
{
  if (paragraph.Length == 0)
    continue;

  ApplyHeadingIfRelevant(paragraph, sb, 1);
  ApplyHeadingIfRelevant(paragraph, sb, 2);
  ApplyHeadingIfRelevant(paragraph, sb, 3);
  ApplyParagraphIfRelevant(paragraph, sb);
}

Let’s have a look at them individually.

The first one of them, ApplyHeadingIfRelevant(), checks whether the paragraph should be treated as a header and, if so, appends it to the output as an HTML header:

private void ApplyHeadingIfRelevant(string paragraph, StringBuilder sb, int headerType)
{
  if (paragraph.StartsWith($"={headerType}=") && paragraph.EndsWith($"={headerType}="))
  {
    sb.AppendLine("<br/>");
    sb.AppendLine($"<h{headerType}>{paragraph.Replace($"={headerType}=", string.Empty)}</h{headerType}>");
  }
}

As you have probably noticed, we are calling this method three times. And this is because the utility application supports three types of headers: <h1>, <h2> and <h3>.

Of course, it is much easier to write =2= instead of placing an <h2> opening tag and a </h2> closing tag. There is also less chance of accidental errors being introduced due to misspelled HTML. So, as you type your original content in, all you’ll have to do is surround your line with either =1=, =2= or =3=. The above method will automatically convert it into a syntactically correct HTML header and will prepend a line break before it, so it’s more readable on a web page.
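To see the conversion in isolation, here is a minimal sketch that returns the header HTML as a string instead of appending it to a StringBuilder. HeadingDemo and ToHeading() are hypothetical names, and the use of StringComparison.Ordinal is my own assumption to keep the comparison culture-independent.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class HeadingDemo
{
    // Returns the header HTML for a matching line,
    // or an empty string if the markers don't match.
    public static string ToHeading(string paragraph, int headerType)
    {
        var marker = $"={headerType}=";

        if (paragraph.StartsWith(marker, StringComparison.Ordinal) &&
            paragraph.EndsWith(marker, StringComparison.Ordinal))
        {
            return $"<h{headerType}>{paragraph.Replace(marker, string.Empty)}</h{headerType}>";
        }

        return string.Empty;
    }
}
```

Calling HeadingDemo.ToHeading("=2=Processing the text=2=", 2) returns <h2>Processing the text</h2>, while passing 1 or 3 as the header type for the same line returns an empty string.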

The next method we’ll have a look at is ApplyParagraphIfRelevant():

private void ApplyParagraphIfRelevant(string paragraph, StringBuilder sb)
{
  if (IsParagraph(paragraph))
  {
    paragraph = ApplyAnchorsIfRelevant(paragraph);
    sb.AppendLine($"<p>{paragraph}</p>");
  }
}

This checks whether the piece of text that has been passed in is meant to be a paragraph. If so, it first replaces anchor placeholders with syntactically correct HTML anchors and then applies an opening <p> tag at the beginning of the text and a closing </p> tag at the end.

Verifying whether the text is a paragraph, which is done by the IsParagraph() method, is simple. It does the reverse of what ApplyHeadingIfRelevant() does and returns false if the text starts and ends with any valid header marker.
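The body of IsParagraph() is not shown in this article, so the following is only a sketch of how such a check could look based on the description above. The ParagraphCheckDemo class name is hypothetical, and the implementation in the repository may differ.

```csharp
using System;

// Hypothetical re-creation, for illustration only.
public static class ParagraphCheckDemo
{
    // Returns false if the text is wrapped in any of the supported
    // header markers (=1=, =2= or =3=); otherwise returns true.
    public static bool IsParagraph(string paragraph)
    {
        for (int headerType = 1; headerType <= 3; headerType++)
        {
            var marker = $"={headerType}=";

            if (paragraph.StartsWith(marker, StringComparison.Ordinal) &&
                paragraph.EndsWith(marker, StringComparison.Ordinal))
            {
                return false;
            }
        }

        return true;
    }
}
```

With this sketch, "=1=Title=1=" is not a paragraph, while both "Plain text" and "=4=Not a header=4=" are, because =4= is not one of the supported markers.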

The anchor replacement is done by the following methods:

private string ApplyAnchorsIfRelevant(string paragraph)
{
  if (IsParagraph(paragraph) &&
    paragraph.Contains("=="))
  {
    // Exit if there is an uneven number of anchor placeholders
    if (CountStringOccurrences(paragraph, "==") % 2 == 0)
      paragraph = ApplyAnchorPlaceholders(paragraph, "==");
  }

  return paragraph;
}

private int CountStringOccurrences(string text, string pattern)
{
  int count = 0;
  int currentIndex = 0;
  while ((currentIndex = text.IndexOf(pattern, currentIndex)) != -1)
  {
    currentIndex += pattern.Length;
    count++;
  }
  return count;
}

private string ApplyAnchorPlaceholders(string text, string pattern)
{
  int count = 0;
  int currentIndex = 0;

  while ((currentIndex = text.IndexOf(pattern, currentIndex)) != -1)
  {
    count++;

    if (count % 2 != 0)
    {
      var prepend = "<a href=\"...\">";
      text = text.Insert(currentIndex, prepend);
      currentIndex += prepend.Length + pattern.Length;
    }
    else
    {
      var append = "</a>";
      text = text.Insert(currentIndex, append);
      currentIndex += append.Length + pattern.Length;
    }
  }

  return text.Replace(pattern, string.Empty);
}

The HTML anchor element <a> provides the ability to add a web link to a piece of text. In the raw input, we mark such text by surrounding it with == characters.

The first thing that we do is check whether the paragraph has an even number of these markers by counting their occurrences via the CountStringOccurrences() method and applying a modulo (%) of two. The modulo operation returns the whole number remainder after the division, so a result of zero indicates that the count is even. If the count is odd, one of the markers has no partner and the paragraph is left unchanged.

Once we have counted the occurrences, we replace the placeholders with actual anchor elements. We do so, once again, by performing a modulo operation. If the instance of the placeholder has an odd count (count % 2 != 0), we replace it with the opening tag of <a href="...">. Otherwise, we apply a closing anchor tag of </a>.

Please note that in the code, we have backslashes next to the double quote characters inside the opening anchor tag text. This is because double quotes in C# indicate opening and closing of string values. To use them as characters inside string values, we need to escape them with the backslash character immediately before them.
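As a small aside, C# also has verbatim strings (prefixed with @), in which a double quote is escaped by doubling it rather than by a backslash. The sketch below shows that both notations produce the same text. QuoteEscapingDemo is a hypothetical name used for illustration.

```csharp
// Hypothetical holder class, for illustration only.
public static class QuoteEscapingDemo
{
    // Regular string literal: quotes are escaped with a backslash.
    public const string Escaped = "<a href=\"...\">";

    // Verbatim string literal: quotes are escaped by doubling them.
    public const string Verbatim = @"<a href=""..."">";
}
```

Both constants hold exactly the same 14 characters, so which notation to use is purely a matter of readability.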

We are not placing any concrete web links into the anchor elements just yet. All we are doing is providing placeholders for them. Once you have processed the text and created the HTML, you can open the output file, search for all the instances of href="..." and replace ... with any web links of your choice.
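If you are comfortable with regular expressions, the same idea can be expressed more compactly. The sketch below is an alternative of my own rather than the code from the repository; AnchorDemo and ApplyAnchors() are hypothetical names. For well-formed input it should produce the same result as the methods above, although I have not compared the two on every edge case.

```csharp
using System.Text.RegularExpressions;

// Hypothetical alternative implementation, for illustration only.
public static class AnchorDemo
{
    public static string ApplyAnchors(string paragraph)
    {
        // Leave the text untouched if the markers are unbalanced.
        if (Regex.Matches(paragraph, "==").Count % 2 != 0)
            return paragraph;

        // Wrap every ==text== pair in an anchor with a placeholder URL.
        return Regex.Replace(paragraph, "==(.*?)==", "<a href=\"...\">$1</a>");
    }
}
```

For example, "Read the ==official docs== or ==ask me==." becomes Read the <a href="...">official docs</a> or <a href="...">ask me</a>. whereas "Broken ==link" is returned unchanged.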

Finally, once we have processed all of our paragraphs and have converted the text into HTML, it’s time to save the text into the output file. We are doing so inside the WriteToFile() method:

var outputFilePath = Path.GetDirectoryName(fullFilePath) + Path.DirectorySeparatorChar +
  Path.GetFileNameWithoutExtension(fullFilePath) + "-html" + Path.GetExtension(fullFilePath);

using (StreamWriter file = new StreamWriter(outputFilePath))
{
  file.Write(text);
}

Let’s break it down.

We need to define the path of the output file. First, we extract the directory name of the original input file, so the output file will be saved into the same directory. We do this via the following call:

Path.GetDirectoryName(fullFilePath)

Next, we append a directory separator to it, so the file name can be written after it. As the directory separator differs between Windows (\) and Unix-based systems (/), we use the built-in Path.DirectorySeparatorChar field to return the one that is relevant to our specific host environment.

Next, we are appending “-html” to the original file name, so it will be easily identifiable, but will be different from the input file:

Path.GetFileNameWithoutExtension(fullFilePath) + "-html"

After that, we are appending the original file extension:

Path.GetExtension(fullFilePath)

So, if your original file path was C:\Temp\text.txt, the output file will be C:\Temp\text-html.txt.
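As a possible variation, the same path can be assembled with Path.Combine(), which inserts the separator for us. This is a sketch of an alternative rather than the code from the repository; OutputPathDemo and BuildOutputPath() are hypothetical names.

```csharp
using System.IO;

// Hypothetical alternative implementation, for illustration only.
public static class OutputPathDemo
{
    public static string BuildOutputPath(string fullFilePath)
    {
        // GetDirectoryName() can return null (for example, for a root path),
        // so we fall back to an empty string.
        var directory = Path.GetDirectoryName(fullFilePath) ?? string.Empty;

        var fileName = Path.GetFileNameWithoutExtension(fullFilePath)
            + "-html"
            + Path.GetExtension(fullFilePath);

        // Path.Combine() adds the separator only when it is needed.
        return Path.Combine(directory, fileName);
    }
}
```

One practical difference is how a bare file name is handled. As far as I know, Path.GetDirectoryName("notes.txt") returns an empty string, so plain concatenation would start the output path with a separator, while BuildOutputPath("notes.txt") returns notes-html.txt.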

Finally, we are writing our output text into the file:

using (StreamWriter file = new StreamWriter(outputFilePath))
{
  file.Write(text);
}

The “using” statement in this context guarantees that the StreamWriter object is disposed of once we have finished with it, even if an exception is thrown. Disposing of the writer flushes any buffered text and closes the file.
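For comparison, here are two shorter ways of writing the same text to a file. Neither is taken from the repository; WriteDemo and its method names are hypothetical. The first relies on the using declaration that is available from C# 8, and the second on the File.WriteAllText() helper.

```csharp
using System.IO;

// Hypothetical alternatives, for illustration only.
public static class WriteDemo
{
    // C# 8 using declaration: the writer is disposed of
    // when the method exits, so no braces are needed.
    public static void WriteWithUsingDeclaration(string outputFilePath, string text)
    {
        using var file = new StreamWriter(outputFilePath);
        file.Write(text);
    }

    // Opens the file, writes the text and closes the file in one call.
    public static void WriteInOneCall(string outputFilePath, string text)
    {
        File.WriteAllText(outputFilePath, text);
    }
}
```

Both methods overwrite the file if it already exists, which matches the behavior of creating a StreamWriter from a path alone.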

Parting words

This code is simple enough to give a C# novice a good real-life example. However, it is also useful enough that even more experienced C# programmers may benefit from it.

If you want to copy the code for your own purposes, copy all you want. I really don’t mind.

I may decide to add other features to the code. If this happens, this article will be updated accordingly.

If you are interested in gaining a good knowledge of .NET Core and C# basics, you may find the following free Udemy course helpful:

.NET Core for absolute beginners

And there are some more online courses available via my website:

Coding Courses

Good luck and happy hacking!