Best Way To "fix" Malformed HTML For Use In An XSL Transform
I have an input XML document that contains malformed HTML which has been XML-encoded, i.e. the XML document itself is technically valid. Now I am applying an XSL transform to the
Solution 1:
Is using a third-party library acceptable? The HTML Agility Pack (available on NuGet) might go part of the way to solving your invalid HTML, and it also (according to the website) supports XSLT.
Solution 2:
I went for using an SGML parsing library and converting the HTML to valid XML.
I chose MindTouch's library: https://github.com/MindTouch/SGMLReader
Once it was compiled and added to the GAC, I could use this XSL:
<msxsl:script language="C#" implements-prefix="myns">
  <msxsl:assembly name="SgmlReaderDll, Version=1.8.11.0, Culture=neutral, PublicKeyToken=46b2db9ca481831b"/>
  <![CDATA[
  public XPathNodeIterator SGMLStringToXml(string strSGML)
  {
    Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
    sgmlReader.DocType = "HTML";
    sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
    sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
    sgmlReader.InputStream = new System.IO.StringReader(strSGML);
    // create document
    XmlDocument doc = new XmlDocument();
    doc.PreserveWhitespace = true;
    doc.XmlResolver = null;
    doc.Load(sgmlReader);
    return doc.CreateNavigator().Select("/*");
  }
  public string CurDir()
  {
    return (new System.IO.DirectoryInfo(".")).FullName;
  }
  ]]>
</msxsl:script>
<xsl:template match="node()" mode="PreventSelfClosingTags">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
    <xsl:text></xsl:text>
  </xsl:copy>
</xsl:template>
<xsl:template match="@*" mode="PreventSelfClosingTags">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>
and use it like so:
<xsl:apply-templates select="myns:SGMLStringToXml(.)/body/*" mode="PreventSelfClosingTags"/>
N.B. You have to run the transform manually with an XslCompiledTransform instance. The asp:xml control doesn't like the DLL reference.
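For anyone wondering what "running the transform manually" might look like, here is a minimal sketch. The class name ManualTransform, the method Run and the two path parameters are made up for illustration and are not from the original answer. The one thing I'm fairly sure you need is script support switched on, since XslCompiledTransform ignores msxsl:script blocks unless XsltSettings enables them. As far as I know, script blocks only work on the .NET Framework, not on .NET Core or later.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ManualTransform
{
    // Hypothetical helper (not from the original answer): applies the stylesheet
    // at stylesheetPath to the document at inputPath and returns the output text.
    public static string Run(string stylesheetPath, string inputPath)
    {
        // msxsl:script is disabled by default, so enable it explicitly.
        // Only do this for stylesheets you trust, because the script runs as code.
        var settings = new XsltSettings(enableDocumentFunction: false, enableScript: true);

        var transform = new XslCompiledTransform();
        transform.Load(stylesheetPath, settings, new XmlUrlResolver());

        using (var output = new StringWriter())
        {
            // Writing to a TextWriter lets the stylesheet's xsl:output settings apply.
            transform.Transform(inputPath, null, output);
            return output.ToString();
        }
    }
}
```

You would then write the returned string into the page yourself (for example into a Literal control) instead of pointing asp:xml at the stylesheet.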