The function below removes all HTML tags, scripts, and CSS styles from an HTML string and converts it to plain text.
// Requires: using System.Text.RegularExpressions;
private string GetPlainTextFromHtml(string htmlString)
{
    string htmlTagPattern = "<.*?>";

    // Remove <script> and <style> blocks together with their contents.
    var regexCss = new Regex("(\\<script(.+?)\\</script\\>)|(\\<style(.+?)\\</style\\>)", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    htmlString = regexCss.Replace(htmlString, string.Empty);

    // Remove all remaining tags.
    htmlString = Regex.Replace(htmlString, htmlTagPattern, string.Empty);

    // Remove lines that are empty or contain only whitespace.
    htmlString = Regex.Replace(htmlString, @"^\s+$[\r\n]*", "", RegexOptions.Multiline);

    // Remove non-breaking-space entities.
    htmlString = htmlString.Replace("&nbsp;", string.Empty);

    return htmlString;
}
Enjoy!!
Thanks. This was simple and helpful
How can I add spaces in the HTML text?
Add the &nbsp; character.
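On the spaces question: one possible approach, sketched here under some assumptions, is to replace each tag with a space instead of an empty string, decode the entities, and then collapse the extra whitespace. The class and method names (HtmlTextSketch, ToPlainText) are made up for this example and are not part of the original post. WebUtility.HtmlDecode comes from System.Net.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not the original author's code.
public static class HtmlTextSketch
{
    // Matches whole <script>...</script> and <style>...</style> blocks, contents included.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<script.*?</script>|<style.*?</style>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Matches any remaining tag, including tags that span several lines.
    private static readonly Regex AnyTag = new Regex("<.*?>", RegexOptions.Singleline);

    // Matches runs of whitespace so they can be collapsed to a single space.
    private static readonly Regex WhitespaceRun = new Regex(@"\s+");

    public static string ToPlainText(string html)
    {
        // Replace blocks and tags with a space so neighbouring words stay apart.
        string text = ScriptOrStyle.Replace(html, " ");
        text = AnyTag.Replace(text, " ");

        // Decode entities such as &amp; and &nbsp; into real characters.
        text = WebUtility.HtmlDecode(text);

        // Turn non-breaking spaces into regular spaces.
        text = text.Replace('\u00A0', ' ');

        // Collapse whitespace runs and trim the ends.
        return WhitespaceRun.Replace(text, " ").Trim();
    }

    public static void Main()
    {
        string html = "<style>p { color: red; }</style><p>Hello&nbsp;<b>world</b></p><p>Bye &amp; thanks</p>";
        Console.WriteLine(ToPlainText(html)); // prints: Hello world Bye & thanks
    }
}
```

One trade-off to be aware of: because every tag becomes a space, inline markup inside a word (for example w<b>or</b>d) will be split into separate pieces. For messy real-world HTML, a proper HTML parser is usually more reliable than regular expressions.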
http://www.blackbeltcoder.com/Articles/strings/convert-html-to-text
try this
Thank you very much!
Working perfectly, thank you
thanks. this is a simple and beautiful way of extracting text from html.
When I attempt to implement this, I get a “not enough )’s” exception on the second line in the method. How to fix…?