Normalize Space Issue With Html Tags

Here's one for you XSLT gurus :-) I have to deal with XML output from a Java program I cannot control. In the docs outputted by this app the html tags remain as

Solution 1:

An XSLT 1.0 solution is an XPath expression that replaces each run of whitespace characters with a single space. The idea is not my own; it is taken from an answer by Dimitre Novatchev.

The advantage over the built-in normalize-space() function is that leading and trailing whitespace (in your case, the spaces before and after the b element) is kept, reduced to a single space, instead of being stripped.
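
To make the idea concrete, here is the expression on its own, as it is used in the stylesheet further down, with a trace of how it evaluates. The two sample strings in the comment are assumptions chosen for illustration; they are not taken from your document.

```xml
<!-- Evaluated with a text node as the context node (XPath 1.0). -->
<xsl:value-of select="
  concat(
    substring(' ', 1 + not(substring(., 1, 1) = ' ')),
    normalize-space(),
    substring(' ', 1 + not(substring(., string-length(.)) = ' '))
  )"/>

<!-- Trace for the hypothetical text node ' and   is  ':
       first character is a space, so not(...) is false and counts as 0;
         substring(' ', 1) gives ' '
       normalize-space() gives 'and is'
       last character is a space, so the third argument is also ' '
       result: ' and is '

     Trace for the hypothetical text node 'bold':
       first and last characters are not spaces, so not(...) is true and counts as 1;
         substring(' ', 2) gives the empty string on both sides
       result: 'bold' -->
```

The trick relies on XPath 1.0 converting the boolean to a number in the addition, so the start position passed to substring() is either 1 (keep the space) or 2 (past the end, giving an empty string). Note that the boundary test compares against a literal space only, so a text node that starts or ends with a tab or line break will not get a boundary space; if that matters for your input, the comparison would need to be widened.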

EDIT: In response to your edited question, the XPath expression is incorporated into your stylesheet below. Also:

  • Explicitly saying omit-xml-declaration="no" is redundant; it is the default action taken by the XSLT processor.
  • Several of your templates have the same content. I merged them into a single template using the union operator |.

Stylesheet

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy every node and attribute unless a more specific template matches. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Elements nested inside Text, Instruction or Title: write their tags out as escaped text. -->
  <xsl:template match="Text//*|Instruction//*|Title//*">
    <xsl:value-of select="concat('&lt;',name(),'&gt;')"/>
    <xsl:apply-templates/>
    <xsl:value-of select="concat('&lt;/',name(),'&gt;')"/>
  </xsl:template>

  <!-- Text nodes: collapse whitespace, but keep one leading and one trailing space if present. -->
  <xsl:template match="text()">
    <xsl:value-of select=
    "concat(substring(' ', 1 + not(substring(., 1, 1) = ' ')),
            normalize-space(),
            substring(' ', 1 + not(substring(., string-length(.)) = ' '))
            )
    "/>
  </xsl:template>

</xsl:stylesheet>

XML Output

<?xml version="1.0" encoding="UTF-8"?>
<Locator Precode="7">
   <Text LanguageId="7">The next word is &lt;b&gt;bold&lt;/b&gt; and is correctly spaced around the html tag, but the sentence has extra whitespace and line breaks</Text>
</Locator>
