My Life In Plain Text

This is the first in what I intend to be a series of articles about the various ways I use Org Mode in my life, naturally starting out with how it serves as the basis for this website. It's a bit of a long read for something so niche, but if you're curious to learn something about the capabilities and internals of Org Mode or you just enjoy taking something simple and making it hard for the fun of it (and dare I say profit?), read on!

Why Org Mode?

I've used Emacs as my primary editor more or less as long as I've been programming, and it continues to embed itself into more aspects of my life thanks to its extensive capabilities beyond just editing code ¹ . Of particular note is Org Mode, an Emacs major mode aptly described by its tag line: "Your life in plain text". At its simplest Org is a text format useful for structured documents, like a Markdown alternative that happens to have good editing and navigation support in Emacs. Under the surface however, it hosts a dizzying array of features including:

I think Org is uniquely well-suited as a text format for static site generation, particularly for technical content. First, the format itself is much more expressive than something like Markdown - you can for example attach arbitrary metadata to headings with properties or wrap content with drawers or blocks to signal that it should be treated specially on export. Consider for example a feature encountered fairly often in blogs and technical documentation: callouts. We can represent callouts in an org-mode document with a custom block type and pass in a parameter to signal the nature of the callout and inform its styling. Here's what that looks like in Org syntax:

org

#+begin_callout info
Blocks in Org Mode are defined like this:

#+begin_src org
#+begin_name
...
#+end_name
#+end_src

where =name= is any string of nonwhitespace characters. Some types of blocks
have a special significance in Org mode, but "callout" isn't one of them. As
we'll see later though, the workflow I've set up for this blog makes it very
easy to attach significance to new block types when we ingest Org content into
Astro.
#+end_callout

Blocks in Org Mode are defined like this:

org

#+begin_name
...
#+end_name

where name is any string of nonwhitespace characters. Some types of blocks have a special significance in Org mode, but "callout" isn't one of them. As we'll see later though, the workflow I've set up for this blog makes it very easy to attach significance to new block types when we ingest Org content into Astro.

Granted, you could pretty easily come up with a scheme to represent something like this with a custom Markdown extension. Obsidian, for instance, supports callouts using block quotes that start with a specially-formatted first line as described here. As far as I know however, this isn't supported anywhere outside of Obsidian and there's no official open source Obsidian-flavored Markdown parser, so you're on your own if you want to use it anywhere else. Compare that to our Org Mode example. When we use Org Mode's export functionality, the block above is provided to the exporter as an AST node that contains all the information we need in a structured format:

emacs-lisp

;; You can inspect the AST representation of any element in an Org Mode document
;; by moving the cursor to that element and running (org-element-at-point)
'(special-block (... :type "callout" :parameters "info"))

Not Your Father's Source Blocks

Next let's look at how Org Mode deals with source code. Org Mode comes with a framework for executing source code called Babel. It supports a variety of different use cases, but to highlight some of the features that feel particularly useful to me for writing a technical blog:

Here's a motivating example that showcases a few of Babel's features: let's use source blocks to demonstrate how the Monte Carlo method can be used to estimate the value of π. I'll include both the Org Mode syntax for each code block (in a separate code block labeled "Org") and the result when the code block is rendered, explaining the nontrivial bits of syntax as we go.

Monte Carlo methods refer to a broad class of algorithms that use random sampling to solve a mathematical problem. As a simple example, consider that we can estimate the value of π by randomly generating points in the range [(0,0), (1,1)] and calculating how many of them fall inside a circle with radius 1.

org

#+name: generate_data
#+begin_src ruby :results value table :exports code :var sample_size=100
  srand(123) # fix the random seed so we get consistent results every time we run this
  (0..sample_size).map do |_x|
    point = [rand(), rand()]
    [point[0], point[1], Math.sqrt(point[0]**2 + point[1]**2) <= 1 ? 0 : 1]
  end
#+end_src

We're using a few header arguments here in the source block that are worth explaining:

generate_data

ruby

srand(123) # fix the random seed so we get consistent results every time we run this
(0...samples).map do |_x|
  point = [rand(), rand()]
  [point[0], point[1], Math.sqrt(point[0]**2 + point[1]**2) <= 1 ? 0 : 1]
end

Now we can use the random samples to estimate the value of π with the formula:

π \approx 4 \frac{r}{n}

where r is the number of points inside the circle of radius 1 and n is the total number of points.

org

#+name: monte_carlo_pi_estimate
#+headers: :results value table :exports both
#+headers: :var data_10=generate_data(samples=10)
#+headers: :var data_100=generate_data(samples=100)
#+headers: :var data_1000=generate_data(samples=1000)
#+begin_src ruby
  [data_100, data_1000].map do |data|
    points_inside = data.select { |point| point[2] == 0 }
    pi_estimate = 4.0 * points_inside.length / data.length
    [data.length, pi_estimate]
  end
#+end_src

monte_carlo_pi_estimate

ruby

estimates = [data_10, data_100, data_1000].map do |data|
  points_inside = data.select { |point| point[2] == 0 }
  pi_estimate = 4.0 * points_inside.length / data.length
  [data.length, pi_estimate]
end

# note: the `nil` here causes Org to add a horizontal line below the previous
# row when it renders the table, so it's visually clear that the first row
# should be treated as the table header
[["samples", "estimate"], nil] + estimates

samples	estimate
10	3.6
100	3.36
1000	3.14

org

#+name: alt_text
#+begin_src emacs-lisp :var data="" text=""
  (format "#+attr_alt_text: %s\n%s" text data)
#+end_src

#+headers: :results output :file ../assets/monte_carlo.svg :exports results
#+headers: :var data_10=generate_data(samples=10)
#+headers: :var data_100=generate_data(samples=100)
#+headers: :post alt_text(data=*this*, text="2D plots of the data from the generate_data code block")
#+begin_src gnuplot
  set terminal svg size 600,300
  set nokey

  set palette maxcolors 2
  set palette defined (0 "green", 1 "red")
  unset colorbox

  set multiplot layout 1,2 title "Estimating Pi with Monte Carlo method"

  set title "10 Samples"
  set object 1 ellipse center 0,0 size 2,2
  plot data_10 using 1:2:3 with points palette

  set title "100 Samples"
  set object 2 ellipse center 0,0 size 2,2
  plot data_100 using 1:2:3 with points palette

  unset multiplot
#+end_src

Again we're invoking the generate_data code block to pass the data into gnuplot - the Ruby array from that block is magically transformed into a gnuplot variable! We also use :results output :file ./images/monte_carlo.svg to tell Babel to use the output of the code block as the result and to write it to a file. When we export this to JSON, the code itself is omitted but a link to the generated SVG is included (and then translated into an <img /> during the Astro build). Finally we use the :post header to post-process the outputted link, attaching an attribute to indicate what to set the alt text to in the exported HTML.

Pretty neat! With very little effort, we've got Ruby, Emacs Lisp and Gnuplot code working in harmony.

Liftoff

Now that we've covered the motivation for writing content with Org Mode (perhaps you're not convinced, but I've talked myself into it in any case), let's move on to how we're going to publish that content on the Internet in a format your browser can actually comprehend. Org Mode does come with an HTML exporter so in theory this could be as simple as running org-html-export-to-html . Out of the box this gives us a pretty raw but functional translation of an Org document, and we could slap some CSS on that to make it prettier and call it a day. We can even use Org Mode's publishing framework to publish our entire site at once with some additional niceties like an automatically-generated sitemap. There's a deeper problem, however, in that as soon as we want to customize our site in a way that deviates from the assumptions of the exporter we're probably going to have to write a bunch of Emacs Lisp code.

Consider for instance that we might want to provide pages like /tags/[tag] that list all articles with the given tag. There's no builtin feature for that, so we'd either need to manually maintain a separate Org document for every tag or programmatically generate them. Now, maybe you're thinking it's a bit odd for a so-called Emacs enthusiast to complain about having to write some Lisp code but for me it's a matter of ergonomics and using the right tool for the job. This is of course a matter of taste, but when I look at the Org HTML exporter code compared to Astro's syntax it's clear which one I personally find to be a more ergonomic way to generate a website. The HTML exporter largely consists of a collection of functions that take different types of Org elements as arugments and use string interpolation to craft a string that represents hopefully-valid HTML. In my experience, this kind of approach to generating markup tends to be finnicky compared to using a proper templating system where it's easier to tell at a glance what the output will look like and where tooling can often provide a better editing experience.

With all that in mind, I decided to take the approach of exporting from Org to JSON and then ingesting the result into Astro. I chose JSON as the export format because just about every static site generator is able to import JSON and turn it into HTML, and it should be easy to translate from Org to JSON without losing fidelity. I've never used Astro prior to this, but it stood out to me primarily because of its flexibility and ergonomics. If you just need a static site (as is the case for this site as of writing), Astro generates plain HTML with no Javascript by default. When you do need to sprinkle in some interactivity, it's easy to do so using the framework of your choice and in a way that doesn't affect the load times of the static parts of your site (thanks to what Astro refers to as Islands Architecture).

While there's no builtin Org exporter for JSON, I found a project that seemed to fit the bill - ox-json. Initially, it produced JSON output that pretty faithfully captured the full AST of an Org document but it had a few obvious issues:

org

#+title: Will it JSON?
#+date: <2025-03-06 Thu>

* Heading

** Nested Heading
Hopefully this gets exported properly. Let's test source blocks too for good measure:

#+begin_src emacs-lisp
  (+ 2 2)
#+end_src

json

{
    "$$data_type": "org-document",
    "properties": {
        "title": [
            "Will it JSON?"
        ],
        ...,
    },
    "contents": [
        {
            "$$data_type": "org-node",
            "type": "section",
            "ref": "org4b27978",
            "properties": {
                "standard-properties": {
                    "$$data_type": "error",
                    "message": "Don't know how to encode value [1 1 1 50 50 0 nil first-section nil nil nil 1 50 nil #<buffer test.org<tom><2>> ..."
                }
            }
        },
        ...,
        {
            "$$data_type": "org-node",
            "type": "src-block",
            "ref": "org01e6955",
            "properties": {
                "language": "emacs-lisp",
                ...,
                "value": {
                    "$$data_type": "error",
                    "message": "Expected string or symbol, got [org-element-deferred org-element--unescape-substring (23 31) t]"
                }
            }
        }
    ]
}

That's not quite what we were looking for. First, every node in the exported JSON seems to have a standard-properties property that ox-json is unable to encode as JSON. Worse still, the error message includes a string representation of the full value which among other things includes the AST of the entire document! The second issue is that the value of the source block seems to have the type org-element-deferred and the actual contents of the block are missing from the JSON output.

After a little digging, I discovered that both of these issues were related to the same Org Mode commit. As the commit message explains, some properties that are common to most AST nodes are now stored in an array under standard-properties . Separately, some values are now deferred and computed on-demand when accessed - so apparently we must be accessing those values in the wrong way. Fortunately the Org Mode changelog explained both of these changes and the way to handle them clearly:

Previously, one could rely on the fact that every node in an Org AST had the structure (type properties-plist contents-list) and inspect properties-plist directly to get all the node's property values, but now some of those might be deferred and others have been consolidated into an array for performance reasons ³ . Indeed, this is exactly what the code in ox-json was relying on:

emacs-lisp

(defun ox-json-node-properties (node)
  "Get property plist of element/object NODE."
   ; It's the 2nd element of the list
  (cadr node))

Let's try following the recommendation in the changlog and use org-element-properties-map to iterate over a node's properties:

emacs-lisp

(defun ox-json-node-properties (node)
  "Get property plist of element/object NODE."
  (if (fboundp 'org-element-properties-map)
      (let ((expanded-properties nil))
        (org-element-properties-map
         (lambda (name value)
           (setq expanded-properties (plist-put expanded-properties name value)))
         node t) ; the last argument is a flag that will cause deferred properties to be resolved
        expanded-properties)
    ;; for org versions < 9.7, just return the property list, which is the second
    ;; element of the list
    (cadr node)))

json

{
    "$$data_type": "org-document",
    "properties": {
        "title": [
            "Will it JSON?"
        ],
        ...,
    },
    "contents": [
        {
            "$$data_type": "org-node",
            "type": "section",
            "ref": "org4b27978",
            "properties": {
                "post-affiliated": 1,
                "post-blank": 0,
                ...,
            }
        },
        ...,
        {
            "$$data_type": "org-node",
            "type": "src-block",
            "ref": "org01e6955",
            ...,
            "properties": {
                "language": "emacs-lisp",
                ...,
                "value": "(+ 2 2)\n"
            }
        }
    ]
}

This won't be the end of our modifications to ox-json, but at this point we've got something workable so let's move on to ingesting it into Astro.

Org Mode in Space

My plan at a high level is to define an Article component to represent a full Org document that recursively descends through its child nodes, with a roughly one-to-one mapping between node types and Astro components. Since the Org AST is also pretty complicated and doesn't have any official formal specification, I'm also going to define TypeScript types as I flesh out these components to document that structure and hopefully make my life easier down the road.

/src/pages/index.astro

web

---
import ArticlePreview from '@components/ArticlePreview.astro';
import type { OrgDocument } from 'ox-json-types';
import BaseLayout from 'src/layouts/BaseLayout.astro';
const articles = import.meta.glob<OrgDocument>('../articles/*.json', { eager: true });
---

<BaseLayout>
    <h1>Posts</h1>
    {
        Object.values(articles).map((article) => (
            <ArticlePreview article={article} />
        ))
    }
</BaseLayout>

/src/pages/articles/[path].astro

web

---
import type { OrgDocument } from 'ox-json-types';
import { articlePath } from '@lib/org';
import Article from '@components/Article.astro';
import BaseLayout from 'src/layouts/BaseLayout.astro';

interface Props {
    path: string,
    article: OrgDocument,
}


export function getStaticPaths() {
    const articles = import.meta.glob<OrgDocument>(
        '../../articles/*.json',
        { eager: true }
    );

    return Object.values(articles).map((article) => {
        return { params: { path: articlePath(article) }, props: { article } };
    });
}

const { article } = Astro.props;
---

<BaseLayout>
    <Article article={article} />
</BaseLayout>

This leverages the glob import feature of Vite to import every article into an array, and then uses Astro's getStaticPaths() function to generate static pages for each article at build time. Now to render the contents of the article:

/src/components/Article.astro

web

---
import type { OrgDocument } from 'ox-json-types';
import Node from '@components/Node.astro';
interface Props {
article: OrgDocument,
}

const { article } = Astro.props;
---

<h1>{article.properties.title[0]}</h1>

{article.contents.map((node) => <Node node={node} />)}

/src/components/Node.astro

web

---
import type { OrgNode } from 'ox-json-types';

import Headline from '@components/Headline.astro';

interface Props {
    node: OrgNode,
}

const { node } = Astro.props;
---

{
  node.type === 'headline' ? <Headline node={node} /> :
  <p> unhandled node {node.ref} of type {node.type}</p>
}

From here, all that's left is the tedious work of defining components for every type of Org Mode element and a whole bunch of fussing with CSS. This ended up being fairly straightfoward so I'll spare you the details, but you can see the final result here if you're curious.

Finishing Touches

With all the essential Astro components I needed defined, things were looking pretty good but there was still one glaring issue that would require going back to where we began. At the time, source blocks were exported as plain strings so there was no syntax highlighting in the exported HTML, an embarassing state of affairs for a technical blog. Astro has builtin support for adding syntax highlighting to code blocks via the Code component so there's an easy fix, but I liked the idea of preserving highlighting exactly as I see it in Emacs - not necessarily with the exact same color scheme but rather annotated with labels like "keyword" and "variable name" that we can apply appropriate styles to in CSS. Org Mode's HTML exporter uses a package called htmlize to transform source blocks into HTML snippets, inspecting the font faces throughout to preserve syntax highlighting. We can reuse this in ox-json, but first we'll need to take a closer look at its structure to understand how nodes are converted to JSON.

The entrypoint for ox-json (and all Org exporters) is an invocation of the org-export-define-backend function:

emacs-lisp

(org-export-define-backend 'json
  ;; Transcoders
  (ox-json--merge-alists
    '(
       (template . ox-json-transcode-template)
       (plain-text . ox-json-transcode-plain-text)
       (headline . ox-json-transcode-headline)
       (link . ox-json-transcode-link)
       (timestamp . ox-json-transcode-timestamp))
    ; Default for all remaining element/object types
    (cl-loop
      for type in (append org-element-all-elements org-element-all-objects)
      collect (cons type #'ox-json-transcode-base)))
  ;; Filters
  :filters-alist '()
  ;; Options
  :options-alist
  '(
     (:json-data-type-property nil "json-data-type-property" "dataType")
     (:json-exporters nil nil nil)
     (:json-property-types nil nil nil)
     (:json-strict nil nil nil)
     (:json-include-extra-properties nil nil t))
  ;; Menu
  :menu-entry
  '(?j "Export to JSON" (
  (?J "As JSON buffer" ox-json-export-to-buffer)
  (?j "To JSON file" ox-json-export-to-file))))

The 'transcoders' argument is an alist mapping node types to functions that return their exported representation. Currently, source blocks use the default ox-json-transcode-base function but we can define a new transcoder that maps the source block's contents to HTML:

emacs-lisp

(org-export-define-backend 'json
  ;; Transcoders
  (ox-json--merge-alists
   '(
     ...
     (src-block . ox-json-transcode-src-block))
   ...))

(defun ox-json-transcode-src-block (src-block _contents info)
  "Transcode a src-block object to JSON"
  (let ((org-html-htmlize-output-type 'css))
    (ox-json-export-node-base
     src-block
     info
     :extra-properties
     `((value .
              ,(ox-json-encode-string (org-html-format-code src-block info)))))))

The source code itself is stored in the value property of the source block node, so we just override the way that property is encoded. org-html-format-code wraps the code in <span> tags with the class attribute set to the name of the font face in that region of the code. For example:

html

<span class="org-function-name">ox-json-node-properties</span>

This worked well for some languages but not others, through no fault of htmlize . Rather, syntax highlighting in Org Mode code blocks is not always totally consistent with syntax highlighting in regular source files. The way Org Mode highlights source blocks is by copying the code into a temporary buffer, invoking the block language's major mode, and then copying the code back into the Org document (see org-src-font-lock-fontify-block ). This doesn't work perfectly in some cases - I use web-mode to edit Astro code, for example. web-mode supports Astro, but it usually relies on the file extension to detect that you're editing an Astro file. The temporary buffer that Org Mode creates isn't backed by a file and so highlighting doesn't work as expected. It also doesn't work, funnily enough, for Org source blocks although I'm not totally clear on the reason for this.

There may be more refined ways to work around these issues, but I opted for the crude but effective approach of extracting source blocks to individual files and htmlizing those files rather than the code blocks directly. This relies on a feature of Org Mode called tangling, which allows you to extract source blocks into files. You can provide the :tangle header argument to a source block to control whether the block should be extracted or not, and so we can selectively use this technique for blocks where syntax highlighting within Org Mode doesn't work the way we want and skip the overhead for blocks where it does.

emacs-lisp

(defun ox-json-transcode-src-block (src-block _contents info)
  "Transcode a src-block object to JSON"
  (let* ((org-html-htmlize-output-type 'css)
         (buffer-fn (buffer-file-name (buffer-base-buffer)))
         (block-info (org-babel-get-src-block-info 'no-eval src-block))
         (src-lang (nth 0 block-info))
         (src-tfile (cdr (assq :tangle (nth 2 block-info))))
         (filename (org-babel-effective-tangled-filename buffer-fn src-lang src-tfile))
         (htmlized-src
          (if (eq filename nil)
              (org-html-format-code src-block info)
            (with-current-buffer (find-file-noselect filename)
              ;; If flycheck kicks in before the buffer is passed to htmlize-buffer,
              ;; it can pollute the exported code with exclamation marks.
              (flycheck-mode 0)
              (setq htmlized-src
                    (with-current-buffer (htmlize-buffer (current-buffer))
                      (buffer-substring
                       (plist-get htmlize-buffer-places 'content-start)
                       (plist-get htmlize-buffer-places 'content-end))))
              (kill-current-buffer)
              htmlized-src))))
    (ox-json-export-node-base src-block info
                              :extra-properties `((value . ,(ox-json-encode-string htmlized-src))))))

There's a lot more going on here, but you can mostly ignore all the variable assignments in the let* form and just know that filename gets assigned to the name of the file the block has been extracted to if its configured to be tangled and nil otherwise. If we do have a non-nil filename, we open the file and pass that into htmlize. The invocation of htmlize is a little more complicated here because htmlize-buffer produces a whole HTML document, but we really just want the fragment with the marked-up source code (I adapted this from what ox-html does internally).

Closing Thoughts

If you've made it this far you may be questioning my sanity and indeed your own for humoring me, but hopefully you've walked away with a good impression of the power and flexibility of Org Mode. I think this was a worthwhile pursuit which gave me the opportunity to peak under the hood and make a small contribution (PR pending) that might be useful to someone else one day.

I'm also happy with where this ended up: as I was putting this article together I found the experience of iterating both on the content and presentation really gelled for me. As an example, at first I was relying on MathJax to render mathematicaly formulas since Org Mode has good support for converting LaTeX to MathJax, but it felt kind of gratuitous to ship 2 MB of JavaScript just to render the occasional formula. With a couple minutes of work I brought in katex as a replacement, which I could use at build time from Astro to transform a LaTeX formula into HTML that just needed a stylesheet and font totalling tens of kilobytes to render properly (diff).

I intend to mix things up for the next few post, but I have a lot more to say about Org Mode and Emacs generally so stay tuned for more content like this in the future!

Footnotes

It might be more accurate to view Emacs as a Lisp interpreter that just so happens to come bundled with a bunch of very useful facilities for editing text. The origin of the name "Emacs" is sort of revealign - it was short for "Editor Macros" back when it was simply a collection of macros for the TECO text editor/programming language. Although Emacs has evolved a lot since then, I feel the spirit of this idea has been preserved on some level.

If you're wondering why the plot for 1000 samples is missing, I ran into an interesting bug where the temporary file that Org creates to store the value of the data_10 variable was overwritten by the data_1000 variable. There also seems to be a performance issue with loading large tables into gnuplot - the version you see here took about a second to run while it took 2 minutes to run when I added the 1000 sample data. This might make for an interesting followup post when I get around to taking a closer look!

If you're scratching your head as to how this could yield a significant performance improvement, it might be helpful to take a closer look at the data structures we're dealing with here. plist is short for "property list" and is a collection of key-value pairs represented with a flat linked list where the first element is the first key, the second is the first value, and so on. This makes them poorly-suited for random access on long lists. While the property lists for Org nodes tend to be pretty short, squashing all common properties into a single entry will still substantially reduce the plist length on average, which I would expect to have a substantial impact in contexts where the properties of a large number of nodes need to be accessed.