Solutions to automatically add anchors to links in Jekyll
Motivation and context
⏳ The timesaving power of URI fragments
If you are not familiar with URI1 fragments (or you are but
never knew the name) they are what’s attached to the end of a URL2 following
a #
hash mark. Clicking a link with a URI fragment has probably
caused your browser to conveniently jump to a part of the linked page with what
you were looking for; much like these footnotes3.
The potential timesaving power of URI fragments
Sounds great especially for long articles unfortunately it doesn’t seem common to use them. Perhaps it’s different for you but I find the accompanying comment “it’s halfway down the page” more often than a link with a URI fragment pointing to that thing “halfway down the page”.
Why aren’t more people using these?
Wikipdia, for example, has an id
for each section of an article but the only
way to link to it is via the contents page at the top of the document requiring
the user to scroll back up. Users are required to remember the section name
and spend additional time scrolling in order to share a link with a URI
fragment.
In comparison Confluence copies a link to the heading into your clipboard by clicking on the small icon next to the heading. Fast and more convenient than copying the URL from your browser’s URL bar.
The problem(s)
I would like this blog to have the same behaviour as Confluence. The aim of this post is to make this happen.
This can already by done manually writing each heading and it’s id
.
1
<h1 id="hello-world"><a href="#hello-world">Hello world</a></h1>
This is quite verbose and stops Kramdown (the default markdown processor for Jekyll) from adding the heading to the Kramdown provided table of contents.
Solutions
What is already available
One of the solutions I found was a simple plugin that ran during the post-rendering phase. The plugin can be found here and is extremely simple it uses RegEx for conversion. But the author notes that using RegExps as a substitute for a real HTML parser is a bad time. In my own short trial with this plugin I have never produced HTML complex enough for it to fail parsing so take that for what you will.
1
2
3
4
5
6
7
8
9
Jekyll::Hooks.register :documents, :post_render do |doc|
if doc.output_ext == ".html"
doc.output =
doc.output.gsub(
/<h([1-6])(.*?)id="([\w-]+)"(.*?)>(.*?)<\/h[1-6]>/,
'<a href="#\3"><h\1\2id="\3"\4>\5</h\1></a>'
)
end
end
GitHub Pages is a popular use case of Jekyll, unfortunately a small
subset of plugins are allowed to run under
GitHub. There is a need to use non-plugin solutions like Jekyll Pure Liquid
Heading Anchors. Looking at the
code it appears to run string manipulation by splitting on <h
and </h
which
are the start and end of heading elements. This is no better (arguably a
little worse) than the RegEx solution mentioned earlier but it is really nice
for use in GitHub pages.
Another solution is to use client-side JavaScript to modify the DOM (Document Object Model) and produce the links. This solves the foibles of the previous solutions. Since we’re working on the DOM we don’t need ill-advised RegEx to parse and modify the heading elements and it’s also able to work in GitHub pages since it isn’t a plugin, it’s client-side JavaScript.
1
2
3
4
5
6
7
8
document.addEventListener("DOMContentLoaded", () => {
document.querySelectorAll("h1, h2, h3, h4, h5, h6").forEach((el) => {
if (!el.id) {
return;
}
el.innerHTML = `<a href="#${el.id}">${el.innerHTML}</a>`;
});
});
The only issue I personally hold against this solution is the unnecessary use of JavaScript, if it can be done without requiring the user to download and run author code then I believe it should be.
My solution
Since I don’t have the restriction applied by GitHub Pages I felt I might try my hand at creating another plugin that modifies Kramdown to do this.
Kramdown produces an internal representation of the parsed markdown file before converting it into HTML. I can modify this internal representation like how the JavaScript solution modifies the DOM.
Unlike the JavaScript solution the modification will occur at processing time and won’t require the user to download scripts.
I found the Kramdown documentation pretty thin on it’s implementation or how to make modifications. Luckily Kramdown is a “pure-Ruby” converter which will make hacky modification easier.
🙊 Monkey patching in Ruby
Monkey patching in
Ruby is the act of modifying a pre-existing class or object. For example
the builtin String
class can be extended.
1
2
3
4
5
6
7
class String
def write_size
self.size
end
end
size_writer = "Tell me my size!"
puts size_writer.write_size
Now every string has a write_size
method.
It’s already pretty clear that monkey patching is dangerous. But for our purposes we just want something working. There are some design choices we can make later to mitigate issues of monkey patching.
First solution
Jumping into the source code we can find how Kramdown outputs HTML under
lib/kramdown/converter/
along with other output formats.
The code was surprisingly easy to understand, the convert_header
method
is what we’re after.
1
2
3
4
5
6
7
8
9
10
11
# ... stuff
def convert_header(el, indent)
attr = el.attr.dup
if @options[:auto_ids] && !attr['id']
attr['id'] = generate_id(el.options[:raw_text])
end
@toc << [el.options[:level], attr['id'], el.children] if attr['id'] && in_toc?(el)
level = output_header_level(el.options[:level])
format_as_block_html("h#{level}", attr, inner(el, indent), indent) # <-- this is the line we want
end
# ... other stuff
We can now create a new script in our _plugins
folder which “opens” the class and
rewrites the convert_header
method.
Expand for my_processor.rb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Modifies kramdown's html converter. Works for 2.3.1
module Kramdown
module Converter
class Html
def convert_header(el, indent)
attr = el.attr.dup
if @options[:auto_ids] && !attr['id']
attr['id'] = generate_id(el.options[:raw_text])
end
@toc << [el.options[:level], attr['id'], el.children] if attr['id'] && in_toc?(el)
level = output_header_level(el.options[:level])
format_as_block_html("h#{level}", attr, "<a href=\"##{attr['id']}\">#{inner(el, indent)}</a>", indent)
end
end
end
end
As already mentioned in Monkey patching in Ruby, this is not
maintainable. What if an update changes the internals of the HTML class? What if convert_header
gets used in a way that is incompatible with my modification? I currently don’t have an
answer for the first one but the second one can be fixed by using inheritance.
Expand for my_processor_v2.rb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
# Modifies kramdown's html converter. Works for 2.3.1
module Kramdown
module Converter
class MyHtml < Html
def convert_header(el, indent)
attr = el.attr.dup
if @options[:auto_ids] && !attr['id']
attr['id'] = generate_id(el.options[:raw_text])
end
@toc << [el.options[:level], attr['id'], el.children] if attr['id'] && in_toc?(el)
level = output_header_level(el.options[:level])
format_as_block_html("h#{level}", attr, "<a href=\"##{attr['id']}\">#{inner(el, indent)}</a>", indent)
end
end
end
class MyJekyllDocument < JekyllDocument
def to_html
output, warnings = Kramdown::Converter::MyHtml.convert(@root, @options)
@warnings.concat(warnings)
output
end
end
end
# Modifies Jekyll's html converter. Works for 4.2.0
module Jekyll::Converters
class Markdown
class MyKramdownParser < KramdownParser
def convert(content)
document = Kramdown::MyJekyllDocument.new(content, @config)
html_output = document.to_html
if @config["show_warnings"]
document.warnings.each do |warning|
Jekyll.logger.warn "Kramdown warning:", warning
end
end
html_output
end
end
end
end
Be sure to modify your _config.yml
to use your new markdown processor.