Adding anchors to headings

Solutions to automatically add anchors to links in Jekyll

Motivation and context

⏳ The timesaving power of URI fragments

If you are not familiar with URI¹ fragments (or you are but never knew the name) they are what’s attached to the end of a URL² following a # hash mark. Clicking a link with a URI fragment has probably caused your browser to conveniently jump to a part of the linked page with what you were looking for; much like these footnotes³.

The potential timesaving power of URI fragments

Sounds great especially for long articles unfortunately it doesn’t seem common to use them. Perhaps it’s different for you but I find the accompanying comment “it’s halfway down the page” more often than a link with a URI fragment pointing to that thing “halfway down the page”.

Why aren’t more people using these?

Wikipdia, for example, has an id for each section of an article but the only way to link to it is via the contents page at the top of the document requiring the user to scroll back up. Users are required to remember the section name and spend additional time scrolling in order to share a link with a URI fragment.

In comparison Confluence copies a link to the heading into your clipboard by clicking on the small icon next to the heading. Fast and more convenient than copying the URL from your browser’s URL bar.

The problem(s)

I would like this blog to have the same behaviour as Confluence. The aim of this post is to make this happen.

This can already by done manually writing each heading and it’s id.

<h1 id="hello-world"><a href="#hello-world">Hello world</a></h1>

This is quite verbose and stops Kramdown (the default markdown processor for Jekyll) from adding the heading to the Kramdown provided table of contents.

Solutions

What is already available

One of the solutions I found was a simple plugin that ran during the post-rendering phase. The plugin can be found here and is extremely simple it uses RegEx for conversion. But the author notes that using RegExps as a substitute for a real HTML parser is a bad time. In my own short trial with this plugin I have never produced HTML complex enough for it to fail parsing so take that for what you will.

Jekyll::Hooks.register :documents, :post_render do |doc|
  if doc.output_ext == ".html"
    doc.output =
      doc.output.gsub(
        /<h([1-6])(.*?)id="([\w-]+)"(.*?)>(.*?)<\/h[1-6]>/,
        '<a href="#\3"><h\1\2id="\3"\4>\5</h\1></a>'
      )
  end
end

GitHub Pages is a popular use case of Jekyll, unfortunately a small subset of plugins are allowed to run under GitHub. There is a need to use non-plugin solutions like Jekyll Pure Liquid Heading Anchors. Looking at the code it appears to run string manipulation by splitting on <h and </h which are the start and end of heading elements. This is no better (arguably a little worse) than the RegEx solution mentioned earlier but it is really nice for use in GitHub pages.

Another solution is to use client-side JavaScript to modify the DOM (Document Object Model) and produce the links. This solves the foibles of the previous solutions. Since we’re working on the DOM we don’t need ill-advised RegEx to parse and modify the heading elements and it’s also able to work in GitHub pages since it isn’t a plugin, it’s client-side JavaScript.

document.addEventListener("DOMContentLoaded", () => {
  document.querySelectorAll("h1, h2, h3, h4, h5, h6").forEach((el) => {
    if (!el.id) {
      return;
    }
    el.innerHTML = `<a href="#${el.id}">${el.innerHTML}</a>`;
  });
});

The only issue I personally hold against this solution is the unnecessary use of JavaScript, if it can be done without requiring the user to download and run author code then I believe it should be.

My solution

Since I don’t have the restriction applied by GitHub Pages I felt I might try my hand at creating another plugin that modifies Kramdown to do this.

Kramdown produces an internal representation of the parsed markdown file before converting it into HTML. I can modify this internal representation like how the JavaScript solution modifies the DOM.

Diagram of Kramdown's input and output formats

Unlike the JavaScript solution the modification will occur at processing time and won’t require the user to download scripts.

I found the Kramdown documentation pretty thin on it’s implementation or how to make modifications. Luckily Kramdown is a “pure-Ruby” converter which will make hacky modification easier.

🙊 Monkey patching in Ruby

Monkey patching in Ruby is the act of modifying a pre-existing class or object. For example the builtin String class can be extended.

class String  
  def write_size  
    self.size  
  end  
end  
size_writer = "Tell me my size!"  
puts size_writer.write_size 

Now every string has a write_size method.

It’s already pretty clear that monkey patching is dangerous. But for our purposes we just want something working. There are some design choices we can make later to mitigate issues of monkey patching.

First solution

Jumping into the source code we can find how Kramdown outputs HTML under lib/kramdown/converter/ along with other output formats. The code was surprisingly easy to understand, the convert_header method is what we’re after.

# ... stuff
def convert_header(el, indent)
  attr = el.attr.dup
  if @options[:auto_ids] && !attr['id']
    attr['id'] = generate_id(el.options[:raw_text])
  end
  @toc << [el.options[:level], attr['id'], el.children] if attr['id'] && in_toc?(el)
  level = output_header_level(el.options[:level])
  format_as_block_html("h#{level}", attr, inner(el, indent), indent) # <-- this is the line we want
end
# ... other stuff

We can now create a new script in our _plugins folder which “opens” the class and rewrites the convert_header method.

Expand for my_processor.rb

# Modifies kramdown's html converter. Works for 2.3.1
module Kramdown
  module Converter
    class Html
      def convert_header(el, indent)
        attr = el.attr.dup
        if @options[:auto_ids] && !attr['id']
          attr['id'] = generate_id(el.options[:raw_text])
        end
        @toc << [el.options[:level], attr['id'], el.children] if attr['id'] && in_toc?(el)
        level = output_header_level(el.options[:level])
        format_as_block_html("h#{level}", attr, "<a href=\"##{attr['id']}\">#{inner(el, indent)}</a>", indent)
      end
    end
  end
end

As already mentioned in Monkey patching in Ruby, this is not maintainable. What if an update changes the internals of the HTML class? What if convert_header gets used in a way that is incompatible with my modification? I currently don’t have an answer for the first one but the second one can be fixed by using inheritance.

Expand for my_processor_v2.rb

# Modifies kramdown's html converter. Works for 2.3.1
module Kramdown
  module Converter
    class MyHtml < Html
      def convert_header(el, indent)
        attr = el.attr.dup
        if @options[:auto_ids] && !attr['id']
          attr['id'] = generate_id(el.options[:raw_text])
        end
        @toc << [el.options[:level], attr['id'], el.children] if attr['id'] && in_toc?(el)
        level = output_header_level(el.options[:level])
        format_as_block_html("h#{level}", attr, "<a href=\"##{attr['id']}\">#{inner(el, indent)}</a>", indent)
      end
    end
  end

  class MyJekyllDocument < JekyllDocument
    def to_html
      output, warnings = Kramdown::Converter::MyHtml.convert(@root, @options)
      @warnings.concat(warnings)
      output
    end
  end
end

# Modifies Jekyll's html converter. Works for 4.2.0
module Jekyll::Converters
  class Markdown
    class MyKramdownParser < KramdownParser
      def convert(content)
        document = Kramdown::MyJekyllDocument.new(content, @config)
        html_output = document.to_html
        if @config["show_warnings"]
          document.warnings.each do |warning|
            Jekyll.logger.warn "Kramdown warning:", warning
          end
        end
        html_output
      end
    end
  end
end

Be sure to modify your _config.yml to use your new markdown processor.

Universal Resource Identifier ↩
Universal Resource Locator, a type of URI ↩
Hello there ↩