<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="/assets/atom.xsl" type="text/xsl"?><feed
	xmlns="http://www.w3.org/2005/Atom"
	xmlns:thr="http://purl.org/syndication/thread/1.0"
	xml:lang="en-US"
	><title>Sacha Chua - tag - xslt</title>
	<subtitle>Emacs, sketches, and life</subtitle>
	<link rel="self" type="application/atom+xml" href="https://sachachua.com/blog/tag/xslt/feed/atom/index.xml" />
  <link rel="alternate" type="text/html" href="https://sachachua.com/blog/tag/xslt" />
  <id>https://sachachua.com/blog/tag/xslt/feed/atom/index.xml</id>
  <generator uri="https://11ty.dev">11ty</generator>
	<updated>2008-12-20T05:39:22Z</updated>
<entry>
		<title type="html">Summarizing my WordPress posts using XSLT; 2008 as a PDF</title>
		<link rel="alternate" type="text/html" href="https://sachachua.com/blog/2008/12/summarizing-my-wordpress-posts-using-xslt-2008-as-a-pdf/"/>
		<author><name><![CDATA[Sacha Chua]]></name></author>
		<updated>2008-12-20T10:42:32Z</updated>
    <published>2008-12-20T05:39:22Z</published>
    <category term="blogging" />
<category term="geek" />
<category term="wordpress" />
		<id>https://sachachua.com/blog/?p=5465</id>
		<content type="html"><![CDATA[<p>It&#8217;s the time of the year for annual updates. I was thinking of reviewing all the blog posts I&#8217;d written this year. My weekly and monthly posts are incomplete, though, and I want to make sure I cover everything. I also know a few people who are slowly working their way through my archives. So I thought I&#8217;d export all of my posts from 2008 into something that people can read with fewer clicks.</p>
<p>If you want to skip past all the geek details, you can get the files here: <a href="https://sachachua.com/notebook/files/sacha-chua-2008-blog.pdf">2008 blog (4.6 MB, 307 pages(!))</a>, <a href="https://sachachua.com/notebook/files/sacha-chua-2008-nongeek.pdf">2008 mostly nongeek entries (3.8 MB, 195 pages)</a>.</p>
<p>After tinkering with wptex and other modules that are supposed to make this easier, I gave up and decided to do it myself. I toyed with the idea of writing a short Ruby program that either parsed the XML or read the database, but I ended up taking it as an excuse to learn XSLT, a language for transforming XML. WordPress can export posts and comments as XML. After I scrubbed the spam from my WordPress install and raised the PHP execution time limit, I downloaded the XML file and started figuring out how to get it into the form I wanted: a document organized by month, with a table of contents listing all the posts.</p>
<p>Here&#8217;s the main stylesheet I used:</p>
<pre>
 &lt;xsl:stylesheet version="1.0"
                 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 xmlns:content="http://purl.org/rss/1.0/modules/content/"
                 xmlns:wp="http://wordpress.org/export/1.0/"&gt;
   &lt;xsl:output method="html"/&gt;
   &lt;xsl:template match="/"&gt;
     &lt;html&gt;&lt;body&gt;
       &lt;h0&gt;January 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Jan 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;February 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Feb 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;March 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Mar 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;April 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Apr 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;May 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'May 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;June 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Jun 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;July 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Jul 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;August 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Aug 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;September 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Sep 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;October 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Oct 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;November 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Nov 2008') and wp:status='publish']"/&gt;
       &lt;h0&gt;December 2008&lt;/h0&gt;
       &lt;xsl:apply-templates select="/rss/channel/item[contains(pubDate, 'Dec 2008') and wp:status='publish']"/&gt;
   &lt;/body&gt;&lt;/html&gt;
   &lt;/xsl:template&gt;
   &lt;!-- One section per post: linked title, URL, date, then the post body. --&gt;
   &lt;xsl:template match="//item"&gt;
     &lt;h1&gt;&lt;a&gt;
       &lt;xsl:attribute name="href"&gt;
         &lt;xsl:value-of select="link"/&gt;
       &lt;/xsl:attribute&gt;
       &lt;xsl:value-of select="title"/&gt;&lt;/a&gt;&lt;/h1&gt;
     &lt;div class="link"&gt;&lt;xsl:value-of select="link"/&gt;&lt;/div&gt;
     &lt;div class="date"&gt;&lt;xsl:value-of select="pubDate"/&gt;&lt;/div&gt;
     &lt;div class="content"&gt;
       &lt;!-- content:encoded holds the post's HTML as text; emit it as markup instead of re-escaping it. --&gt;
       &lt;xsl:value-of select="content:encoded" disable-output-escaping="yes" /&gt;
     &lt;/div&gt;
   &lt;/xsl:template&gt;
 &lt;/xsl:stylesheet&gt;
</pre>
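<p>The stylesheet above doesn&#8217;t build the table of contents. Here&#8217;s a minimal sketch of one way to add one in XSLT 1.0, assuming the same export format. The <code>toc</code> mode, the <code>posts</code> variable, and the use of <code>generate-id()</code> for anchor names are illustrative choices, not part of the original stylesheet, and the month sections are left out to keep it short:</p>
<pre>
 &lt;xsl:stylesheet version="1.0"
                 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 xmlns:content="http://purl.org/rss/1.0/modules/content/"
                 xmlns:wp="http://wordpress.org/export/1.0/"&gt;
   &lt;xsl:output method="html"/&gt;
   &lt;!-- Assumption: pubDate looks like "Sat, 20 Dec 2008 05:39:22 +0000". --&gt;
   &lt;xsl:variable name="posts"
                 select="/rss/channel/item[contains(pubDate, ' 2008 ') and wp:status='publish']"/&gt;
   &lt;xsl:template match="/"&gt;
     &lt;html&gt;&lt;body&gt;
       &lt;!-- First pass: one list entry per post, linking to its anchor. --&gt;
       &lt;ul class="toc"&gt;
         &lt;xsl:apply-templates select="$posts" mode="toc"/&gt;
       &lt;/ul&gt;
       &lt;!-- Second pass: the posts themselves. --&gt;
       &lt;xsl:apply-templates select="$posts"/&gt;
     &lt;/body&gt;&lt;/html&gt;
   &lt;/xsl:template&gt;
   &lt;xsl:template match="item" mode="toc"&gt;
     &lt;li&gt;&lt;a href="#{generate-id()}"&gt;&lt;xsl:value-of select="title"/&gt;&lt;/a&gt;&lt;/li&gt;
   &lt;/xsl:template&gt;
   &lt;xsl:template match="item"&gt;
     &lt;!-- The anchor goes on the link so the heading tag itself stays free of attributes. --&gt;
     &lt;h1&gt;&lt;a name="{generate-id()}" href="{link}"&gt;&lt;xsl:value-of select="title"/&gt;&lt;/a&gt;&lt;/h1&gt;
     &lt;div class="content"&gt;
       &lt;xsl:value-of select="content:encoded" disable-output-escaping="yes"/&gt;
     &lt;/div&gt;
   &lt;/xsl:template&gt;
 &lt;/xsl:stylesheet&gt;
</pre>
<p>Within a single run, <code>generate-id()</code> returns the same identifier for the same node, so the link in the list and the anchor on the post line up without needing a post ID from the export.</p>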
<p>For the non-geek version, I replaced the template with:</p>
<pre>
   &lt;xsl:template match="//item"&gt;
     &lt;xsl:if test="not(category[@nicename='emacs']) and not(category[@nicename='drupal']) and not(category[@nicename='geek'])"&gt;
     &lt;h1&gt;&lt;a&gt;
       &lt;xsl:attribute name="href"&gt;
         &lt;xsl:value-of select="link"/&gt;
       &lt;/xsl:attribute&gt;
       &lt;xsl:value-of select="title"/&gt;&lt;/a&gt;&lt;/h1&gt;
     &lt;div class="link"&gt;&lt;xsl:value-of select="link"/&gt;&lt;/div&gt;
     &lt;div class="date"&gt;&lt;xsl:value-of select="pubDate"/&gt;&lt;/div&gt;
     &lt;div class="content"&gt;
       &lt;xsl:value-of select="content:encoded" disable-output-escaping="yes" /&gt;
     &lt;/div&gt;
     &lt;/xsl:if&gt;
   &lt;/xsl:template&gt;
</pre>
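<p>A variation on the same idea, as a sketch rather than what was actually used: leave the original item template alone and add an empty template that matches the geeky posts, so they produce no output. The explicit priority is there because both match patterns would otherwise get the same default priority of 0.5, which leaves the choice between them up to the processor:</p>
<pre>
   &lt;!-- Sketch: an empty template means no output for any post in these categories. --&gt;
   &lt;xsl:template match="item[category/@nicename='emacs' or category/@nicename='drupal' or category/@nicename='geek']"
                 priority="1"/&gt;
</pre>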
<p>I didn&#8217;t want to figure out how to demote all the headings in my blog posts from within XSLT (I have a few), so I used a made-up &lt;h0&gt; element for the month headings, one level above the &lt;h1&gt; post titles. I used xsltproc to transform the XML file I got from WordPress. Then I shifted every heading down one level with the following bit of Emacs Lisp:</p>
<pre>
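 (defun sacha/demote-all-headings ()
   "Demote every attribute-free &lt;hN&gt; tag in the buffer by one level."
   (interactive)
   (save-excursion
     (goto-char (point-min))
     ;; Group 1 is the heading level; [0-6] includes the made-up h0 month headings.
     (while (re-search-forward "&lt;/?h\\([0-6]\\)&gt;" nil t)
       (replace-match (number-to-string (1+ (string-to-number (match-string 1)))) nil t nil 1))))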
</pre>
<p>It&#8217;s all held together with bubblegum and string, really.</p>
<p><a href="https://sachachua.com/notebook/files/sacha-chua-2008-blog.pdf">2008 blog (4.6 MB, 307 pages(!))</a>, <a href="https://sachachua.com/notebook/files/sacha-chua-2008-nongeek.pdf">2008 mostly nongeek entries (3.8 MB, 195 pages)</a></p>
<p>I haven&#8217;t looked at these files much yet &#8211; I just scrolled through them quickly. No, don&#8217;t worry, I&#8217;m not going to send my 2008 update as 307 pages in the mail. ;) But it&#8217;s there so that we can flip through it or you can borrow the code, and someday I&#8217;ll even figure out how to format the output neatly and everything.</p>
<p>Next step: I need to read all of that and highlight a couple of things that made my year.</p>
<p>(307 pages! Wow.)</p>
]]></content>
		</entry>
</feed>