This package includes two functions that convert HTML to text and text to HTML. Both are optimised for speed and use regular expressions to do most of the work. HTML2Text formats links as URLs, decodes all HTML character encoding (name and number codes), and removes styles, scripts, and HTML tags. It also cleans up spacing and formatting. Text2HTML formats URLs as links, preserves line feeds, tabs and spacing, and HTML-encodes the text. A third function, ContainsHTML, detects whether the text contains HTML tags.
The HTMLText.asp file contains all three functions. See the demo script (HTMLTextDemo.asp) for a working example.
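To give a feel for the regular-expression approach, here is a minimal sketch of tag detection in VBScript. It is not the code shipped in HTMLText.asp: the function name LooksLikeHTML and its pattern are assumptions made for illustration only, and the real ContainsHTML may work differently.

```vbscript
' Hypothetical illustration only; not the ContainsHTML shipped in HTMLText.asp.
' Returns True when the string contains something shaped like an HTML tag,
' e.g. <p>, </td> or <br/>.
Function LooksLikeHTML(s)
    Dim re
    Set re = New RegExp
    ' "<", an optional "/", a tag name starting with a letter,
    ' then anything except angle brackets up to the closing ">".
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"
    re.IgnoreCase = True
    LooksLikeHTML = re.Test(s)
End Function
```

A bare comparison such as "a < b and c > d" does not match this pattern, because the character after "<" is not a letter or "/".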
HTML to Text
myText = HTML2Text (myHTMLText)
Text to HTML
myHTML = Text2HTML (myText)
If you run HTML2Text on text that does not actually contain any HTML, all of the formatting in that text is stripped (the function removes it as HTML formatting). If you are unsure whether your text is HTML or plain text, call ContainsHTML first to find out.
if ContainsHTML (myUnknownText) then myText = HTML2Text (myUnknownText)
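The check above can be wrapped in a small helper so that plain text passes through untouched. This is only a sketch: ToPlainText and the sample string are hypothetical names introduced here, and the include path assumes HTMLText.asp sits in the same folder as the calling page.

```vbscript
<!--#include file="HTMLText.asp"-->
<%
' Hypothetical helper (not part of HTMLText.asp): returns plain text
' whether the input is HTML or already plain text.
Function ToPlainText(unknownText)
    If ContainsHTML(unknownText) Then
        ' Tags found: convert the HTML to text.
        ToPlainText = HTML2Text(unknownText)
    Else
        ' No tags found: keep the existing line breaks and spacing.
        ToPlainText = unknownText
    End If
End Function

Dim sample
sample = "<p>Hello <b>world</b></p>"   ' sample input, for illustration only

' Convert to text, then HTML-encode the result for display in the page.
Response.Write Text2HTML(ToPlainText(sample))
%>
```

Keeping the test inside one function means callers never run HTML2Text on plain text by accident.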
If you improve this code, please send me a copy! Thanks!
hunter @ beanland.net.au
1.5 HTML2Text: Added self-terminating/ending tag support for br, p. Added div, tr, td to RegEx for better detection. Text2HTML: Outputs XHTML (self-terminating) tags.
1.46 ContainsHTML: improved detection. Text2HTML: aware of < > around links. HTML2Text: Support tag inside a <A>...</A> tag set. Errors ignored when attempting to replace double byte chars. Changed order of stripping. Remove extra space from table conversions.
1.4 Added ContainsHTML function. HTML2Text: Changed LF to CRLF
1.3 HTML2Text: Removed spacing in HTML, support <a ...> with other tags before </a>. Text2HTML: fixed HTML encode again.
1.2 HTML2Text: Added TR = new line. Text2HTML: Fixed HTML encode. Added Arial font.
1.1 Added most functionality
1.0 First version - very basic.