<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>kushalm.com &#187; XML</title>
	<atom:link href="http://kushalm.com/category/programming/xml/feed" rel="self" type="application/rss+xml" />
	<link>http://kushalm.com</link>
	<description></description>
	<lastBuildDate>Wed, 08 Sep 2010 18:37:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>The Perils of XPath Expressions (Specifically, Escaping Quotes)</title>
		<link>http://kushalm.com/the-perils-of-xpath-expressions-specifically-escaping-quotes</link>
		<comments>http://kushalm.com/the-perils-of-xpath-expressions-specifically-escaping-quotes#comments</comments>
		<pubDate>Thu, 28 Jun 2007 22:26:15 +0000</pubDate>
		<dc:creator>kushal</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://kushalm.com/the-perils-of-xpath-expressions-specifically-escaping-quotes</guid>
		<description><![CDATA[Escaping a single/double quote in an XPath expression such as this: "books/book[@publisher = 'publisher name here']";]]></description>
			<content:encoded><![CDATA[<p>The other day, I was grappling with a particularly irritating problem with XPaths. I was using <a href="http://msdn2.microsoft.com/en-us/library/system.xml.xmlnode.selectsinglenode.aspx">SelectSingleNode</a> to dig some info out of an XML document.</p>
<h3>The problem:</h3>
<p>&#8230; was simple: escaping a single or double quote in an XPath expression such as this one:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">string</span> myXPathExpression <span style="color: #008000;">=</span>
    <span style="color: #666666;">&quot;books/book[@publisher = 'publisher name here']&quot;</span><span style="color: #008000;">;</span></pre></div></div>

<p>If the publisher name contained an apostrophe (e.g. <span class="km_code">O'Reilly</span>), I&#8217;d be in trouble: the apostrophe would end the string literal early, leaving an expression that no longer parses.</p>
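<p>To see the failure concretely, here is a small sketch in Java rather than C#. It uses the standard JAXP <span class="km_code">javax.xml.xpath</span> API, and the class and method names are made up for illustration. It builds the predicate by naive string concatenation and then asks the XPath engine to compile the result:</p>

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

// Illustrative sketch using the standard JAXP XPath API; the class and
// method names are hypothetical.
final class NaiveXPathConcat {

    // Builds the predicate the naive way: wrap the raw value in apostrophes.
    static String naiveExpression(String publisherName) {
        return "books/book[@publisher = '" + publisherName + "']";
    }

    // Returns true if the XPath engine accepts the expression, false if
    // compilation fails with a syntax error.
    static boolean compiles(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            xpath.compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }
}
```

<p>With <span class="km_code">O'Reilly</span>, the generated expression is <span class="km_code">books/book[@publisher = 'O'Reilly']</span>. The compiler rejects it with an <span class="km_code">XPathExpressionException</span>, so <span class="km_code">compiles</span> returns false. A name without an apostrophe compiles fine.</p>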
<h3>Lazy Hack #1:</h3>
<p>The simple, straightforward solution would be the following:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">string</span> myXPathExpression <span style="color: #008000;">=</span>
    <span style="color: #666666;">&quot;books/book[@publisher = <span style="color: #008080; font-weight: bold;">\&quot;</span>O'Reilly<span style="color: #008080; font-weight: bold;">\&quot;</span>]&quot;</span><span style="color: #008000;">;</span></pre></div></div>

<p>&#8230; i.e. enclose the string literal inside the <a href="http://www.w3.org/TR/xpath#NT-PredicateExpr">PredicateExpr</a> in double quotes instead of single quotes.<br />
But, as is often the case, words like &quot;simple&quot; and &quot;straightforward&quot; are merely euphemisms for &quot;short-sighted&quot;.<br />
<br />
The problem with that solution, of course, was this: what if that blasted <span class="km_code">publisher name</span> had a double quote in it?<br />
Would I go back to enclosing it in single quotes? What if it had both? What if I simply didn&#8217;t know, because I was building up the string like this:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">string</span> myXPathExpression <span style="color: #008000;">=</span>
    <span style="color: #666666;">&quot;books/book[@publisher = '&quot;</span> <span style="color: #008000;">+</span> publisherName <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;']&quot;</span><span style="color: #008000;">;</span></pre></div></div>

<p>&#8230; where <span class="km_code">publisherName</span> was a user-entered string I had no control over (which was, in fact, the case).</p>
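<p>The double-quote trick comes down to picking whichever delimiter the value does not contain. The Java sketch below turns that idea into a helper, and the class and method names are hypothetical. Writing it out makes the limit obvious. An XPath 1.0 string literal is either <span class="km_code">'...'</span> or <span class="km_code">"..."</span>, with no escape mechanism, so a value containing both kinds of quote has no valid delimiter at all.</p>

```java
// Sketch of "Lazy Hack #1" as a reusable helper (hypothetical names).
// An XPath 1.0 string literal is either '...' or "..." with no escape
// mechanism, so the value must not contain the delimiter we pick.
final class DelimiterChooser {

    static String quoteLiteral(String value) {
        if (value.indexOf('\'') == -1) {
            return "'" + value + "'";      // no apostrophes: use '...'
        }
        if (value.indexOf('"') == -1) {
            return "\"" + value + "\"";    // apostrophes only: use "..."
        }
        // Both kinds of quote: no single literal can represent the value.
        throw new IllegalArgumentException(
                "value contains both ' and \": " + value);
    }
}
```

<p><span class="km_code">quoteLiteral("O'Reilly")</span> gives <span class="km_code">"O'Reilly"</span>. A name containing both an apostrophe and a double quote throws instead, which is exactly the dead end described above.</p>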
<h3>Lazy Hack #2:</h3>
<p>I could, of course, wimp out and prevent the user from entering double or single quotes (or worse, both). I could even rationalise it by pretending I was really thinking of the &quot;bigger picture&quot;, and that fixing this issue simply isn&#8217;t worth the time and resources. But I decided not to, mostly because it&#8217;s irritating enough listening to pseudo-managerial-cop-out-speak when it isn&#8217;t coming from me; I really didn&#8217;t need to add to it.</p>
<h3>Wrong Solution <strike>Lazy Hack</strike> #3:</h3>
<p>My first thought was that I should replace single quotes with &amp;apos; (or its numeric equivalent &amp;#39;) and double quotes with &amp;quot; (or &amp;#34;), according to the XML 1.0 <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#syntax">markup rules</a>. That should have worked, right?<br />
<br />
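But apparently that isn&#8217;t the case, even though the guys at W3C <a href="http://www.w3.org/TR/xpath">recommend</a> it. That advice, though, is aimed at XPath expressions embedded in XML attribute values (an XSLT <span class="km_code">select</span> attribute, say). There, the XML parser turns &amp;apos; and &amp;quot; back into quote characters before the XPath engine ever sees the expression. A string passed straight to an API like SelectSingleNode never goes through an XML parser, so nothing decodes the entities.<br />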
<br />
It turns out that I didn&#8217;t need to use any of the standard XML <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#dt-entref">entities</a><sup><a href="#fn1-28Jun07">1</a></sup> in my XPath query at all (even though I positively <I>do</I> need to use them in my XML markup).<br />
<br />
So not only is this a valid XPath expression:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">string</span> myXPathExpression <span style="color: #008000;">=</span>
    <span style="color: #666666;">&quot;tvshows/tvshow[@name = 'Starsky &amp; Hutch']&quot;</span><span style="color: #008000;">;</span>
    <span style="color: #008080; font-style: italic;">//no need to use &amp;amp; in place of ampersand.</span></pre></div></div>

<p>&#8230; but this one would <I>not</I> return the result I expected:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">string</span> myXPathExpression <span style="color: #008000;">=</span>
    <span style="color: #666666;">&quot;tvshows/tvshow[@name = 'Starsky &amp;amp; Hutch']&quot;</span><span style="color: #008000;">;</span>
    <span style="color: #008080; font-style: italic;">// this will *not* return the tvshow node with an attribute</span>
    <span style="color: #008080; font-style: italic;">//called &quot;Starsy &amp; Hutch&quot;</span></pre></div></div>

<h3>Solution:</h3>
<p>It turned out the only solution was to use the <a href="http://www.w3.org/TR/xpath#function-concat">concat function</a> defined in the W3C XPath recommendation.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">string</span> myXPathExpression <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;books/book[@publisher = &quot;</span> <span style="color: #008000;">+</span>
   <span style="color: #666666;">&quot;concat('Single', &quot;</span><span style="color: #666666;">'&quot;, '</span>quote. <span style="color: #FF0000;">Double</span><span style="color: #666666;">', '</span><span style="color: #666666;">&quot;', 'quote.')]&quot;</span><span style="color: #008000;">;</span>
   <span style="color: #008080; font-style: italic;">//looks for a publisher called Single'quote. Double&quot;quote</span></pre></div></div>

<p>In other words: break up the search string around single and double quotes, and concatenate all the bits using the concat function (it takes two or more string arguments), enclosing each single quote in double quotes and each double quote in single quotes.<br />
<br />
Pretty crazy, huh? By the way, this is true in .Net, Java<sup><a href="#fn2-28Jun07">2</a></sup>, Mozilla&#8217;s XPath implementation, and Internet Explorer&#8217;s. (In IE, you would be using the MSXML parser; more on this below.)<br />
<br />
So, since I was building up a string like this:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">string</span> myXPathExpression <span style="color: #008000;">=</span>
    <span style="color: #666666;">&quot;books/book[@publisher = '&quot;</span> <span style="color: #008000;">+</span> publisherNameHere <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;']&quot;</span><span style="color: #008000;">;</span></pre></div></div>

<p>I had no alternative but to write a method that would generate the required concat function call for me:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">string</span> myXPathExpression <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;books/book&quot;</span> <span style="color: #008000;">+</span>
  <span style="color: #666666;">&quot;[@publisher = &quot;</span> <span style="color: #008000;">+</span> GenerateConcatForXPath<span style="color: #000000;">&#40;</span>publisherNameHere<span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;]&quot;</span><span style="color: #008000;">;</span></pre></div></div>

<p>Here is the method written in C#. </p>
<div class="km_collapsible">
    <a name="GenerateConcatForXPath" href="#GenerateConcatForXPath" onclick="km_collapse(this);return false;" title="GenerateConcatForXPath" border="0"><img src="/images/plus.gif" height="19" width="20" border="0"></img>GenerateConcatForXPath</a>
<div class="km_collapsible_content">

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #008080; font-style: italic;">//you may want to use constants like HtmlTextWriter.SingleQuoteChar and</span>
<span style="color: #008080; font-style: italic;">//HtmlTextWriter.DoubleQuoteChar intead of strings like &quot;'&quot; and &quot;\&quot;&quot;</span>
<span style="color: #0600FF;">private</span> <span style="color: #0600FF;">static</span> <span style="color: #FF0000;">string</span> GenerateConcatForXPath<span style="color: #000000;">&#40;</span><span style="color: #FF0000;">string</span> a_xPathQueryString<span style="color: #000000;">&#41;</span>
<span style="color: #000000;">&#123;</span>
    <span style="color: #FF0000;">string</span> returnString <span style="color: #008000;">=</span> <span style="color: #FF0000;">string</span>.<span style="color: #0000FF;">Empty</span><span style="color: #008000;">;</span>
    <span style="color: #FF0000;">string</span> searchString <span style="color: #008000;">=</span> a_xPathQueryString<span style="color: #008000;">;</span>
    <span style="color: #FF0000;">char</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> quoteChars <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> <span style="color: #FF0000;">char</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#123;</span> <span style="color: #666666;">'<span style="color: #008080; font-weight: bold;">\'</span>'</span>, <span style="color: #666666;">'&quot;'</span> <span style="color: #000000;">&#125;</span><span style="color: #008000;">;</span>
&nbsp;
    <span style="color: #FF0000;">int</span> quotePos <span style="color: #008000;">=</span> searchString.<span style="color: #0000FF;">IndexOfAny</span><span style="color: #000000;">&#40;</span>quoteChars<span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>quotePos <span style="color: #008000;">==</span> <span style="color: #008000;">-</span><span style="color: #FF0000;">1</span><span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        returnString <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;'&quot;</span> <span style="color: #008000;">+</span> searchString <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;'&quot;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
    <span style="color: #0600FF;">else</span>
    <span style="color: #000000;">&#123;</span>
        returnString <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;concat(&quot;</span><span style="color: #008000;">;</span>
        <span style="color: #0600FF;">while</span> <span style="color: #000000;">&#40;</span>quotePos <span style="color: #008000;">!=</span> <span style="color: #008000;">-</span><span style="color: #FF0000;">1</span><span style="color: #000000;">&#41;</span>
        <span style="color: #000000;">&#123;</span>
            <span style="color: #FF0000;">string</span> subString <span style="color: #008000;">=</span> searchString.<span style="color: #0000FF;">Substring</span><span style="color: #000000;">&#40;</span><span style="color: #FF0000;">0</span>, quotePos<span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
            returnString <span style="color: #008000;">+=</span> <span style="color: #666666;">&quot;'&quot;</span> <span style="color: #008000;">+</span> subString <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;', &quot;</span><span style="color: #008000;">;</span>
            <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span>searchString.<span style="color: #0000FF;">Substring</span><span style="color: #000000;">&#40;</span>quotePos, <span style="color: #FF0000;">1</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">==</span> <span style="color: #666666;">&quot;'&quot;</span><span style="color: #000000;">&#41;</span>
            <span style="color: #000000;">&#123;</span>
                returnString <span style="color: #008000;">+=</span> <span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\&quot;</span>'<span style="color: #008080; font-weight: bold;">\&quot;</span>, &quot;</span><span style="color: #008000;">;</span>
            <span style="color: #000000;">&#125;</span>
            <span style="color: #0600FF;">else</span>
            <span style="color: #000000;">&#123;</span>
                <span style="color: #008080; font-style: italic;">//must be a double quote</span>
                returnString <span style="color: #008000;">+=</span> <span style="color: #666666;">&quot;'<span style="color: #008080; font-weight: bold;">\&quot;</span>', &quot;</span><span style="color: #008000;">;</span>
            <span style="color: #000000;">&#125;</span>
            searchString <span style="color: #008000;">=</span> searchString.<span style="color: #0000FF;">Substring</span><span style="color: #000000;">&#40;</span>quotePos <span style="color: #008000;">+</span> <span style="color: #FF0000;">1</span>,
                             searchString.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">-</span> quotePos <span style="color: #008000;">-</span> <span style="color: #FF0000;">1</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
            quotePos <span style="color: #008000;">=</span> searchString.<span style="color: #0000FF;">IndexOfAny</span><span style="color: #000000;">&#40;</span>quoteChars<span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #000000;">&#125;</span>
        returnString <span style="color: #008000;">+=</span> <span style="color: #666666;">&quot;'&quot;</span> <span style="color: #008000;">+</span> searchString <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;')&quot;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
    <span style="color: #0600FF;">return</span> returnString<span style="color: #008000;">;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

</div>
</div>
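<p>For anyone following along in Java, here is a straightforward port of the C# method above. It is a sketch and has not been tried against every XPath engine, and the class name is hypothetical. It produces the same output as the C# version, but it collects the pieces in <span class="km_code">StringBuilder</span>s instead of repeatedly slicing the string:</p>

```java
// Java port (a sketch; class name hypothetical) of GenerateConcatForXPath.
// Quote-free values become a plain '...' literal; anything else is split at
// every quote character and rebuilt with concat(), putting each apostrophe
// in "..." and each double quote in '...'.
final class XPathLiterals {

    static String generateConcatForXPath(String value) {
        if (value.indexOf('\'') == -1 && value.indexOf('"') == -1) {
            return "'" + value + "'";
        }
        StringBuilder out = new StringBuilder("concat(");
        StringBuilder chunk = new StringBuilder();   // quote-free text so far
        for (char c : value.toCharArray()) {
            if (c == '\'' || c == '"') {
                // Flush the pending text as a '...' argument.
                out.append('\'').append(chunk).append("', ");
                chunk.setLength(0);
                // Then the quote itself, wrapped in the other kind of quote.
                out.append(c == '\'' ? "\"'\", " : "'\"', ");
            } else {
                chunk.append(c);
            }
        }
        // The trailing (possibly empty) text closes the argument list, so
        // concat() always gets at least three arguments here.
        out.append('\'').append(chunk).append("')");
        return out.toString();
    }
}
```

<p>For example, <span class="km_code">generateConcatForXPath("O'Reilly")</span> returns <span class="km_code">concat('O', "'", 'Reilly')</span>. That can be dropped straight into the predicate, just as in the C# call above.</p>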
<h3>The Exception (there&#8217;s always one):</h3>
<p>Microsoft&#8217;s <a href="http://msdn2.microsoft.com/en-us/library/ms763742.aspx">MSXML</a> parser (the COM implementation, not the .Net one &#8211; and they <I>are</I> different) is still widely in use. Mostly in Visual Studio 6 based apps (like VB6), on apps with client-side XML processing done on IE, and those glorified batch files written in <a href="http://msdn2.microsoft.com/en-us/library/ms950396.aspx">Windows Scripting Host</a>. Also, there are probably more than a few .Net apps using MSXML via the COM Interop Services.<br />
<br />
This problem of escaping quotes exists in MSXML too, of course, and the solution is the same, but only for MSXML 4 and later. For versions 3 and earlier, you would have to escape single and double quotes with C-style backslashes.<br />
This naturally also means escaping backslashes themselves with a second backslash. Keep this in mind if you are porting an application from MSXML 1, 2 or 3 to a later version.</p>
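<p>Taken at face value, the legacy escaping described above might be sketched as follows. This is an assumption-laden illustration: it is in Java for consistency, the names are hypothetical, and it simply implements the rule as stated, without having been checked against MSXML 3 itself. The backslash case is the one that is easy to forget.</p>

```java
// Sketch of the pre-MSXML 4 escaping described above, ASSUMING the rule is
// exactly as stated: put a backslash before every ', " and \ character.
// Not verified against MSXML; names are hypothetical.
final class LegacyMsxmlEscape {

    static String escapeForLegacyMsxml(String value) {
        StringBuilder out = new StringBuilder(value.length() + 8);
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == '\'' || c == '"') {
                out.append('\\');   // C-style escape prefix
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

<p>For example, <span class="km_code">escapeForLegacyMsxml("O'Reilly")</span> yields <span class="km_code">O\'Reilly</span>. Code like this needs revisiting when porting to MSXML 4 or later.</p>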
<p>Sigh! Sometimes I miss the old XPath-free days when shoot&#8217;em ups were still innovative, they actually ran on two megabytes of RAM, and no-one had heard of Paris Hilton. </p>
<div class="km_footnotes">
<div class="km_footnote">
<a name="fn1-28Jun07">1</a> Predefined XML Entities: &amp;, &lt;, &gt;, &quot; and &apos;
</div>
<div class="km_footnote">
<a name="fn2-28Jun07">2</a> XPaths in Java: I tested it using Apache&#8217;s <a href="http://xml.apache.org/xalan-j/">Xalan</a> XSLT Processor. And using the <a href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/xpath/XPath.html#compile(java.lang.String)">compile</a> method which of course adheres to Sun&#8217;s <a href="http://java.sun.com/webservices/jaxp/">JAXP</a> specification.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://kushalm.com/the-perils-of-xpath-expressions-specifically-escaping-quotes/feed</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
	</channel>
</rss>

