Normalize Space Issue With Html Tags
Here's one for you XSLT gurus :-) I have to deal with XML output from a Java program I cannot control. In the docs outputted by this app the html tags remain as
Solution 1:
An XSLT 1.0 solution is an XPath expression that replaces each run of whitespace characters with a single space. The idea is not my own; it is taken from an answer by Dimitre Novatchev.
The advantage over calling the built-in normalize-space() function on its own is that a single leading or trailing space (in your case, before and after the b element) is kept.
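To make the behavior concrete outside of XSLT, here is a rough Python model of what the expression in the stylesheet's text() template computes. It is only a sketch: the function names normalize_space and normalize_keep_edges are made up for illustration, and it assumes XPath 1.0 semantics, where whitespace means space, tab, carriage return and line feed.

```python
import re

# XPath 1.0 treats only these four characters as whitespace.
XPATH_WS = " \t\r\n"


def normalize_space(s: str) -> str:
    """Approximate XPath normalize-space(): trim, then collapse whitespace runs."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(XPATH_WS))


def normalize_keep_edges(s: str) -> str:
    """Model of the concat(...) expression (hypothetical helper name).

    A single space is re-added at either end only when the original string
    starts or ends with a literal space character.
    """
    lead = " " if s[:1] == " " else ""
    trail = " " if s[-1:] == " " else ""
    return lead + normalize_space(s) + trail


if __name__ == "__main__":
    # Text node before an inline element: the trailing space survives.
    print(repr(normalize_keep_edges("The next   word\n   is ")))  # 'The next word is '
    # Plain normalize-space() would drop it.
    print(repr(normalize_space("The next   word\n   is ")))       # 'The next word is'
```

Note that, like the XPath original, this only checks for a literal space at the edges, so a text node that ends in a line break instead of a space loses its trailing whitespace entirely.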
EDIT: In response to your edited question, below is that XPath expression incorporated into your stylesheet. Also:
- Explicitly specifying omit-xml-declaration="no" is redundant; it is the default behavior of the XSLT processor.
- Several of your templates have the same content. I merged them into a single template by combining their match patterns with |.
Stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy everything not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Write elements nested in Text, Instruction or Title as literal tag text -->
  <xsl:template match="Text//*|Instruction//*|Title//*">
    <xsl:value-of select="concat('&lt;',name(),'&gt;')"/>
    <xsl:apply-templates/>
    <xsl:value-of select="concat('&lt;/',name(),'&gt;')"/>
  </xsl:template>

  <!-- Collapse whitespace, keeping one leading/trailing space if present -->
  <xsl:template match="text()">
    <xsl:value-of select=
      "concat(substring(' ', 1 + not(substring(., 1, 1) = ' ')),
              normalize-space(),
              substring(' ', 1 + not(substring(., string-length(.)) = ' ')))"/>
  </xsl:template>
</xsl:stylesheet>
XML Output
<?xml version="1.0" encoding="UTF-8"?>
<Locator Precode="7">
<Text LanguageId="7">The next word is &lt;b&gt;bold&lt;/b&gt; and is correctly spaced around the html tag, but the sentence has extra whitespace and line breaks</Text>
</Locator>