C# Get Plain Text from HTML String

The function below removes all HTML tags, scripts, and CSS styles from an HTML string and converts it to plain text.

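// Requires: using System.Text.RegularExpressions;
private string GetPlainTextFromHtml(string htmlString)
{
    // Matches any remaining tag; [^>] also covers tags that span several lines.
    string htmlTagPattern = "<[^>]*>";
    // Matches <script>...</script> and <style>...</style> blocks, including their contents.
    var regexCss = new Regex(@"(<script(.+?)</script>)|(<style(.+?)</style>)", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    htmlString = regexCss.Replace(htmlString, string.Empty);
    htmlString = Regex.Replace(htmlString, htmlTagPattern, string.Empty);
    // Turn non-breaking space entities into ordinary spaces so adjacent words stay separated.
    htmlString = htmlString.Replace("&nbsp;", " ");
    // Remove lines that contain only whitespace.
    htmlString = Regex.Replace(htmlString, @"^\s+$[\r\n]*", "", RegexOptions.Multiline);

    return htmlString;
}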
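
If you also want entities such as &amp; or &lt; turned back into real characters, here is a minimal sketch of the same steps as a static helper. The names HtmlText and ToPlainText are made up for this example and are not part of the original function. It assumes System.Net.WebUtility is available (.NET Framework 4.0 or later, or .NET Core). Like the function above, it relies on regular expressions rather than a real HTML parser, so treat it as a starting point for simple, well-formed markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; it mirrors the steps of GetPlainTextFromHtml.
public static class HtmlText
{
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // 1. Drop <script> and <style> blocks together with their contents.
        //    \1 makes the closing tag name match the opening one.
        string text = Regex.Replace(
            html,
            @"<(script|style)\b.*?</\1\s*>",
            string.Empty,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        // 2. Drop every remaining tag, including tags that span several lines.
        text = Regex.Replace(text, @"<[^>]*>", string.Empty);

        // 3. Decode entities (&amp;, &lt;, &nbsp;, ...) into the characters they stand for.
        text = WebUtility.HtmlDecode(text);

        // 4. Remove lines that contain only whitespace, then trim both ends.
        text = Regex.Replace(text, @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline);
        return text.Trim();
    }
}
```

With this sketch, HtmlText.ToPlainText("<p>Hello &amp; <b>world</b></p>") returns Hello & world. Decoding happens after the tags are stripped, so an encoded &lt;b&gt; in the page text comes out as the literal characters <b> instead of being removed as a tag.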

Enjoy!!

8 thoughts on “C# Get Plain Text from HTML String”

  1. When I attempt to implement this, I get a “not enough )’s” exception on the second line in the method. How to fix…?
