I was recently pondering the best way to parse "HTML" text in JavaScript. To be more specific, I was trying to parse an RSS feed and get at the <description> tag. I needed just the text so I could create a Silverlight TextBlock and display it in my own fashion.
Here's the type of thing I was trying to parse:
<description><div><b>Title:</b> Your customers' laptop data could be at risk. Check out the Data Encryption Toolkit for Mobile PCs.</div> <div><b>Description:</b> <div>The Data Encryption Toolkit for Mobile PCs provides tested guidance and powerful tools to help you protect your customers' vulnerable laptop data. The methods outlined in the toolkit are easy to understand, and show you how to optimize your customer data protection strategy using two key encryption technologies: Microsoft BitLocker Drive Encryption and the Encrypting File System. </div></div> <div><b>URL:</b> <a href="http://go.microsoft.com/?linkid=7792650">Learn more.</a></div> </description>
Not the most elegant string to parse! So I started off down various roads: String.prototype.replace(), regular expressions, etc. None of them seemed a great solution, but there probably isn't much else to go at in plain JavaScript.
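For comparison, here is roughly what the replace()/regular-expression road looks like. This is a minimal sketch, not code from the original project: the function name stripTagsNaive and the shortened sample fragment are made up for illustration. It swaps every tag for a space and collapses whitespace, which copes with simple markup like the feed above, but it leaves entities such as &amp; undecoded and would be tripped up by a > inside an attribute value.

```javascript
// Hypothetical helper: naive tag stripping with String.prototype.replace().
// Assumes a small, simple HTML fragment; it does NOT decode entities like &amp;
// and breaks if an attribute value contains a ">" character.
function stripTagsNaive(html) {
  return html
    .replace(/<[^>]*>/g, " ") // replace each tag with a space so adjacent blocks don't run together
    .replace(/\s+/g, " ")     // collapse runs of whitespace into a single space
    .trim();                  // drop leading/trailing spaces left by the outer tags
}

// Made-up, shortened fragment in the same shape as the RSS description.
const sample =
  '<div><b>Title:</b> Laptop data &amp; you</div>' +
  '<div><b>URL:</b> <a href="http://example.com">Learn more.</a></div>';

console.log(stripTagsNaive(sample));
// prints "Title: Laptop data &amp; you URL: Learn more."
```

Note how the &amp; entity survives untouched; handling that (and every other entity) by hand is where the regex approach starts to get ugly.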
Then I remembered that when I looked at the HTML DOM objects in Visual Studio, the innerText property of any element I looked at did exactly what I was after. So a few lines of code later I had what is probably the simplest solution going! Create a new DOM element, set its innerHTML to my text, and then read back its innerText.
Where description contains the HTML text above:
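var descNode = document.createElement("div");
// Caution: assigning untrusted feed markup to innerHTML can fire inline handlers such as <img onerror>.
descNode.innerHTML = description;
// innerText gives the text with the markup stripped; textContent is a fallback for browsers without innerText.
this._description = descNode.innerText || descNode.textContent;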