webstuff:digiTransformer:v1

XSLT Phone Number Transformation Tool

This is the first version, I made around august'04. Archived here: most of the code explanations are still valid for version 2. The second version adds a dynamic user-interface.

Some time ago, someone on the xsl-list mentioned a coding challenge called Phraser. I was inspired by Zhanyong Wan's notes on the Haskell code he wrote. I am currently learning XSLT, which is (like Haskell) a functional programming language, and I couldn't resist trying to "port" Phraser to XSLT.

The goal was: meet the guidelines as much as possible ("The merit of an entry will be judged mainly by how clear the code is, not necessarily the efficiency") and produce a "client-side transformation" stylesheet in standard XSLT: no compiler, library or whatever needed, just run the program in your favorite browser. XSLT 1.0 is fully supported in IE 6, NS 7 and the Mozilla type browsers (Mozilla 1.7, Firefox 0.9).

demo

If your browser can't handle XSLT, you can check out the output here.

Running the stylesheet locally

Input

input.xml

This is our input file. The first line asks the browser to use the digiTransformer.xsl stylesheet to transform the document:

<?xml-stylesheet type="text/xsl" href="digiTransformer.xsl"?>

Like any XML document, the file contains one root element (digiTransformer). It accepts an optional maxDigits attribute, a dial number and a pointer to a dictionary.

<digiTransformer maxDigits="3">
  <dial>642-394-6369</dial>
  <dictionary href="english.xml"/>
</digiTransformer>

english.xml

This is our dictionary, and as such, a second input file. I used the same Basic English wordlist as the other submissions, with 4 words added.

<dictionary name="english">
  <source href="http://www.langmaker.com/wordlist/basiclex.htm">Basic English Words List</source>
  <note>I added the words fox, is, nice, win</note>
  <w>a</w>
  <w>able</w>
  ...
  <w>young</w>
</dictionary>

I had to convert the text file to XML, because XSLT is designed to work best with XML files. But you will see, that the combination of XML, XSLT and XPath is very powerful and flexible!

Transformation

What XSLT does, basically, is transforming one XML node into another XML node. In this case, the digiTransformer element from the input file is transformed to a complete XHTML document.

digiTransformer.xsl

This is the stylesheet that generates all possible "phrases" and writes the results to XHTML.

In XSLT, source code is organised in an XML document: every statement is an XML node. The root element xsl:stylesheet contains a number of variables and templates. A template, in most cases, just takes some XML as input, and transforms it to some other XML. Output can be plain text, simple xml, html, whatever. By using namespaces, several types of output can be combined.

I have added very little comments to the xsl. I always try to use short descriptive names for variables and functions (here: templates). Ideally, by just reading the code, you should be able to understand what happens. Let's take a look at the code, I will give step-by-step explanations here:

<xsl:stylesheet
  version="1.0"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  >

We will use XSLT 1.0 with two namespaces: the (default) xhtml and the xsl namespace. All XML nodes you will see in this stylesheet, are either an XHTML output element (without a namespace prefix) or an XSL instruction (name starts with "xsl:").

The first child element of xsl:stylesheet is xsl:output. Here, I specify that I want to generate XHTML 1.0 Strict:

<xsl:output
  method="xml"
  version="1.0"
  encoding="iso-8859-1"
  indent="yes"
  doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
  doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
  />

Then I declare a few global variables. In XSLT (like in other functional programming languages), the value of a variable cannot be changed once it's declared.

<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="digit" select="'22233344455566677778889999'"/>

The scope of a variable is its direct parent. Because in this case, that's the root element, these variables can be used throughout the stylesheet. You 'use' the variable by putting a dollar sign in front of its name, like "$upper". Note the double brackets in the select attribute: this to indicate that the variable holds a string.

These three variables will be used to convert strings. In XSLT 1.0, we have a limited set of functions at hand. Eg. there are no case conversion functions (tolower, toupper...) but we can use the translate function: translate($string, $upper, $lower) will convert the string to lower case. We can use the same mechanism to "digitize" or "hash" the string: translate($string, $lower, $digit). Now isn't that beautifully simple?

Next, we collect the information provided in the input file, and put it into variables:

<xsl:variable name="maxDigits">
  <!-- if it's provided in the input file, read it, otherwise take -1 (means no max) -->
  <xsl:choose>
    <xsl:when test="string(/digiTransformer/@maxDigits)">
      <xsl:value-of select="number(/digiTransformer/@maxDigits)"/>
    </xsl:when>
    <xsl:otherwise>-1</xsl:otherwise>
  </xsl:choose>
</xsl:variable>

<xsl:variable name="dial" select="translate(/digiTransformer/dial/.,'-/.() ','')"/>
<xsl:variable name="dictionary" select="document(/digiTransformer/dictionary/@href)/dictionary"/>

/digiTransformer/@maxDigits is XPath for "the 'digiTransformer' root element's 'maxDigits' attribute".

The dial variable is a copy of the dial input element, containing only the digits (by stripping out the characters from '-/.() '). The dictionary variable holds the contents of the external dictionary file. In XSLT, you access external files with the document function: document($fileName).

In the next variable, I create a subset of the words in the dictionary:

<xsl:variable name="words" select="$dictionary/w[contains($dial, translate(translate(., $upper, $lower), $lower, $digit))]"/>

The contents of the select attribute is an XPath statement that reads: all 'w' elements in the dictionary whose 'digitized' value matches any part of the phonenumber. In our example, we now have a subset of 15 words to iterate, and we can forget about the 850+ words in the dictionary!

Now the core algorithm... let's collect all possible phrases. I use recursion:

<xsl:template name="collect-phrases">
  <xsl:param name="index" select="1"/>
  <xsl:param name="phrase" select="''"/>
  <xsl:param name="countDigits" select="0"/>
  <xsl:choose>
    <xsl:when test="$index > string-length($dial)">
      <!-- complete phonenumber has been processed: write this phrase to the output tree -->
      <li><xsl:value-of select="$phrase"/></li>
    </xsl:when>
    <xsl:otherwise>
      <xsl:if test="$maxDigits < 0 or $countDigits < $maxDigits">
        <!-- simply use the digit as a 'word' in the phrase and continue with the next digit -->
        <xsl:call-template name="collect-phrases">
          <xsl:with-param name="index" select="$index + 1"/>
          <xsl:with-param name="phrase" select="concat($phrase, substring($dial, $index, 1))"/>
          <xsl:with-param name="countDigits" select="$countDigits + 1"/>
        </xsl:call-template>
      </xsl:if>
      <xsl:for-each select="$words">
        <xsl:if test="($index + string-length(.) - 1) <= string-length($dial)">
          <xsl:variable name="value" select="translate(translate(., $upper, $lower), $lower, $digit)"/>
          <xsl:if test="starts-with(substring($dial, $index), $value)">
            <!-- match found: fill in word and recurse into the rest of the string -->
            <xsl:call-template name="collect-phrases">
              <xsl:with-param name="index" select="$index + string-length(.)"/>
              <xsl:with-param name="phrase" select="concat($phrase, .)"/>
              <xsl:with-param name="countDigits" select="$countDigits"/>
            </xsl:call-template>
          </xsl:if>
        </xsl:if>
      </xsl:for-each>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

A bit further, we will call this template with <xsl:call-template name="collect-phrases"/> which means: without parameters passed: this will invoke their default values and start the recursion process. I borrowed the algorithm from Frans Bouma's C# program. I just renamed the parameters and switched the 'use digit' and 'use word' blocks around, to obtain a sorted result (digits first, then the words). Note the default value for index being 1 rather than 0, because in XSLT/XPath, arrays are numbered starting from 1.

The result is a bunch of phrases each inside an XHTML <li> element, because in the next template, they will be put inside a <ul> element, as part of the html output document.

The rest of the stylesheet is all about generating the output file:

<xsl:template match="/digiTransformer">
  <!-- input root element: start generating xhtml output -->
  <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
    <head>
      <title>digiTransformer</title>
      <meta http-equiv="content-type" content="text/html;charset=iso-8859-1"/>
      <meta name="generator" content="digiTransformer XSLT stylesheet"/>
      <meta name="author" content="Anton Triest [anton@cking.be]"/>
      <link rel="stylesheet" type="text/css" href="digiTransformer.css"/>
    </head>
    <body>
      <h1>digiTransformer</h1>
      <p class="larger"><a href="http://users.telenet.be/cking/webstuff/digiTransformer/v1/info.html" title="more info">An XSLT Phone Number Transformation Tool</a></p>
      <p>
        This is <a href="mailto:anton@cking.be" title="anton@cking.be">my</a> solution to Zhanyong Wan's
        <a href="http://blogs.msdn.com/the1/archive/2004/03/29/102123.aspx">Phraser</a> programming challenge.
      </p>
      <h2>input</h2>
      <xsl:apply-templates select="dial"/>
      <xsl:apply-templates select="@maxDigits"/>
      <xsl:apply-templates select="dictionary"/>
      <h2>output</h2>
      <xsl:call-template name="print-words"/>
      <ul class="phrases">
        <xsl:call-template name="collect-phrases"/>
      </ul>
    </body>
  </html>
</xsl:template>

<xsl:template match="dial">
  <p>dial: <span class="dial"><xsl:value-of select="."/></span></p>
</xsl:template>

<xsl:template match="@maxDigits">
  <p>maxDigits: <b><xsl:value-of select="."/></b></p>
</xsl:template>

<xsl:template match="dictionary">
  <p>dictionary: <a href="{@href}"><xsl:value-of select="@href"/></a>
  (<xsl:value-of select="count($dictionary/w)"/> words)</p>
</xsl:template>

<xsl:template name="print-words">
  <p>
    subset:
    <b>
      <xsl:for-each select="$words">
        <xsl:value-of select="."/> 
      </xsl:for-each>
    </b>
    (<xsl:value-of select="count($words)"/> words)
  </p>
</xsl:template>

digiTransformer.css

This is the CSS stylesheet that handles the HTML layout.

Output

As already mentioned, any modern "standards compliant" browser should be able to process the stylesheet and generate the html output document. Of course, instead of a browser, any XSLT processor can be used. Here's a html file that I generated with Saxon 6.5.3:

output.html