
Writing a Tufte-book in Markdown


Somehow, Writing Academic Papers in Markdown is one of my most popular blog posts. I’m glad so many people are looking to partially ditch LaTeX and separate content from markup! Pandoc is a wonderful tool that takes in a plain .md Markdown file and spits out whatever you’d like: Word, HTML, or of course, PDFs using a TeX engine of your choice—which is what we’re interested in.

Writing a paper in Markdown is easy enough since most of the post-processing is done by the conference or journal template you slap on afterwards. My PhD dissertation was a bit more complicated, as I wanted to use the tufte-book document style. Edward Tufte’s books are simply amazing. He’s a statistics and visualization expert who has inspired an entire army of design and styling guidelines, including a TeX package. That means we can do things like this:

An excerpt of an early chapter in my thesis.

Tufte makes maximum use of margins: they can house margin figures, footnotes, and references, and images can stretch to the full page width. The beautiful font face and styling are a free bonus.

But. I want to write primarily in Markdown, which means I’ll need Pandoc to convert it into .tex, which in turn means tufte-book-specific commands like \newthought{blah} are impossible to produce unless you start mixing TeX and Markdown, again muddling the content, and we don’t want that. I stumbled on a lot of issues and had to jump through a lot of hoops to get the most out of it. In this post, I’ll try to summarize all the dirty hacks for posterity.

Most custom stuff below is simply a Python script that gets executed after running the pandoc command, but before calling upon xelatex to render the PDF. This series of commands builds everything:

  pandoc -f markdown \
    -V documentclass=tufte-book \
    --include-in-header=preamble.tex \
    --include-before-body=voorblad.tex \
    --pdf-engine=xelatex \
    --natbib \
    --template=../pandoc/templates/pandoc-tufte.tex \
    --top-level-division=part \
    --metadata-file=metadata.yml \
    -t latex+smart \
    --highlight-style=haddock > thesis.tex \
    chapters/ch0-preface.md chapters/ch1-introduction.md \
    chapters/pt1.md chapters/pt1ch1-whatever.md
  python ../pandoc/filters/tufte-postprocessor.py thesis.tex
  xelatex thesis.tex
  bibtex thesis
  xelatex thesis.tex
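
For readers who want a starting point, here is a minimal sketch of how such a post-processing script could be organised. It is not the actual tufte-postprocessor.py: the RULES table and the postprocess and postprocess_file names are hypothetical, and the single rule shown simply turns \citet{fig:x} into Figure~\ref{fig:x}. The idea is to read the .tex file, run every regex over it in order, and write the result back in place before xelatex runs.

```python
import re
from pathlib import Path

# Hypothetical rule table: (compiled pattern, replacement template) pairs, applied in order.
RULES = [
    # Turn \citet{fig:label} into Figure~\ref{fig:label}.
    (re.compile(r"\\citet\{fig:(\w+)\}"), r"Figure~\\ref{fig:\1}"),
]


def postprocess(tex: str) -> str:
    """Apply every rule to the LaTeX source and return the rewritten text."""
    for pattern, replacement in RULES:
        tex = pattern.sub(replacement, tex)
    return tex


def postprocess_file(path: str) -> None:
    """Rewrite the given .tex file in place (illustrative helper)."""
    tex_file = Path(path)
    original = tex_file.read_text(encoding="utf-8")
    tex_file.write_text(postprocess(original), encoding="utf-8")
```

Calling postprocess_file("thesis.tex") between the pandoc and xelatex steps would then mirror the build sequence above. Keeping the rules in one ordered list makes it easy to see which hack runs when.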

References

Thanks to the Pandoc citation system, @someref says that... and others say as well [@otherref]. is translated into \citet{someref} says that... and others say as well \citep{otherref}. That’s excellent, because tufte-book redefines \cite{} to make citations appear in the margin. Since Pandoc emits \citet{} and \citep{} rather than \cite{}, the citations stay in the running text. That is what I want: in a dissertation, citations would quickly overrun the margin.

We’ll want to use apacite with its natbibapa option, but leave the natbib options empty by putting this in metadata.yml:

classoption: justified,symmetric,marginals=raggedright,notoc,numbers,nobib
# leave these intentionally blank!
natbiboptions: 
biblio-style: 

Don’t forget the --metadata-file=metadata.yml and --natbib Pandoc options. The apacite package will take care of your citations as long as you stick to the @ notation that Pandoc translates. I killed a bunch of statements in the Pandoc template that check which citation system you use because I had trouble compiling, but I can’t remember the specifics.

Okay, and what about possessive citations, like “Kaufman’s (2009) framework is such and such”? By default, @kaufman's framework becomes “Kaufman (2009)’s framework”. This Overleaf hint inspired me to auto-replace \citet{(\w+)}'s with \citeauthor{\1}'s \citeyearpar{\1}.
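
A minimal Python sketch of that replacement, assuming Pandoc writes the possessive as \citet{key}'s in the .tex file. The function name fix_possessive_citations is made up for illustration, and accepting a typographic apostrophe as well is my own addition, in case the smart-quote settings differ.

```python
import re

# \citet{key} followed by 's, with either a straight or a typographic apostrophe.
# Keys are limited to \w+ here, mirroring the regex mentioned in the text.
POSSESSIVE_CITET = re.compile(r"\\citet\{(\w+)\}(['’])s")


def fix_possessive_citations(tex: str) -> str:
    """Rewrite \\citet{key}'s as \\citeauthor{key}'s \\citeyearpar{key}."""
    return POSSESSIVE_CITET.sub(r"\\citeauthor{\1}\2s \\citeyearpar{\1}", tex)
```

Run over \citet{kaufman}'s framework, this yields \citeauthor{kaufman}'s \citeyearpar{kaufman} framework, while ordinary \citet{} calls without the possessive are left alone.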

Also, there are a couple of interesting Pandoc filters made by Tom Duck called pandoc-fignos, -secnos, and -tablenos. They make it possible to avoid using \ref{} in your text, but unfortunately rely on header includes, which get overridden by the --include-in-header flag I use to pass in a custom preamble. Nevertheless, the filters inspired me to come up with something simple for myself.

This will translate

![#fig:label Some Caption](somefig.jpg)

@fig:label shows some cool graph. Blah blah. See also @pt1ch2-something for more details.

into

\begin{figure}
...
\label{fig:label}
...
\end{figure}

Figure~\ref{fig:label} shows some cool graph. Blah blah. See also Chapter~\ref{pt1ch2-something} for more details.

using a simple regex: re.sub(r"\\citet{fig:(\w+)}", r"Figure~\\ref{fig:\1}", file) (and a similar one for producing the image label). But why replace a \citet{}? See above; Pandoc auto-replaces @blah with \citet{}. And why add Figure~ myself? I know pandoc-fignos internally uses packages that take care of that for you, but I wanted to keep things simple. It also means I don’t have to type “Chapter” or “Figure” each time in the source file.
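
Put together, the cross-reference pass might look like the sketch below. Several things here are assumptions rather than the actual script: that Pandoc escapes the # in the caption as \#, that captions contain no nested braces, and that chapter labels follow a naming scheme like pt1ch2-something or ch1-introduction (needed to tell them apart from real citation keys). The function name is hypothetical too.

```python
import re


def resolve_cross_references(tex: str) -> str:
    """Turn the #fig:label caption prefix and @-style references into LaTeX labels and refs."""
    # \caption{\#fig:label Some Caption} -> \caption{Some Caption}\label{fig:label}
    # Assumption: Pandoc escaped "#" as "\#" and the caption has no nested braces.
    tex = re.sub(
        r"\\caption\{\\#(fig:[\w-]+)\s+([^{}]*)\}",
        r"\\caption{\2}\\label{\1}",
        tex,
    )
    # \citet{fig:label} -> Figure~\ref{fig:label}
    tex = re.sub(r"\\citet\{(fig:[\w-]+)\}", r"Figure~\\ref{\1}", tex)
    # \citet{pt1ch2-something} -> Chapter~\ref{pt1ch2-something}
    # Assumption: chapter labels look like "ch1-..." or "pt1ch2-...".
    tex = re.sub(r"\\citet\{((?:pt\d+)?ch\d+-[\w-]+)\}", r"Chapter~\\ref{\1}", tex)
    return tex
```

Anything that does not match one of these prefixes, such as \citet{someref}, stays a normal citation.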

Figures

A lot of figures end up misaligned depending on whether they land on a left-hand or right-hand page, since the caption appears in the margin. This is very irritating, as adding or removing text moves them around and breaks the layout. That’s fixed by hacking in \checkoddpage \ifoddpage \forcerectofloat \else \forceversofloat \fi just after each \begin{figure}; see this GitHub issue.
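
Injecting that snippet can be automated with one substitution. The sketch below is one possible way to do it, with a hypothetical function name; it only touches plain figure environments and keeps an optional placement argument such as [htbp] attached to the \begin{figure} line.

```python
import re

# The page-side fix quoted above, inserted verbatim.
FLOAT_SIDE_FIX = r"\checkoddpage \ifoddpage \forcerectofloat \else \forceversofloat \fi"

# \begin{figure} with an optional placement argument; figure* is deliberately not matched.
FIGURE_START = re.compile(r"\\begin\{figure\}(?:\[[^\]]*\])?")


def force_float_side(tex: str) -> str:
    """Insert the odd/even page check on its own line after every \\begin{figure}."""
    # A callable replacement avoids having to escape the backslashes in FLOAT_SIDE_FIX.
    return FIGURE_START.sub(lambda match: match.group(0) + "\n" + FLOAT_SIDE_FIX, tex)
```

Note that the function is not idempotent: running it twice on the same file inserts the snippet twice, so it should run on freshly generated Pandoc output.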

Another problem: how can you produce \begin{figure*}—note the star—to create full-width images spanning across the extended margin? By default, you can’t. You can do this:

![](sup.jpg){width=100%}

Pandoc will interpret the width ratio and produce \includegraphics[width=1\textwidth,height=\textheight]. That of course does not work, as it’s still wrapped in a regular figure block. I had to regex for it, then go back up to find the enclosing block and add a *.
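
Instead of searching backwards from the \includegraphics line, a sketch can match each figure environment as a whole and star it when its body contains a 100% wide image. This is an alternative to the approach described above, not the original code. The names are hypothetical, and accepting \linewidth next to \textwidth is an assumption in case a different Pandoc version emits that instead.

```python
import re

# A whole (unstarred) figure environment; DOTALL lets the body span multiple lines.
FIGURE_BLOCK = re.compile(r"\\begin\{figure\}(.*?)\\end\{figure\}", re.DOTALL)

# An image whose options include width=1\textwidth (or width=1\linewidth, as an assumption).
FULL_WIDTH_IMAGE = re.compile(r"\\includegraphics\[[^\]]*width=1\\(?:textwidth|linewidth)")


def star_full_width_figures(tex: str) -> str:
    """Turn figure into figure* when the block holds a full-width image."""

    def widen(match: re.Match) -> str:
        body = match.group(1)
        if FULL_WIDTH_IMAGE.search(body):
            return "\\begin{figure*}" + body + "\\end{figure*}"
        return match.group(0)  # leave all other figures untouched

    return FIGURE_BLOCK.sub(widen, tex)
```

Figures with any other width, for example {width=50%}, keep their regular figure environment.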

I have no solution for margin figures except for a custom property within {} that does more or less the same.

As for tables, Pandoc generates longtable blocks instead of regular ones, and it’s full of weird crap. Most of the tables I have require special TeX commands anyway, for instance to rotate certain column headers, so I gave up and simply relied on TeX for those blocks instead.

If you want subtables: do not use the deprecated subfigure package, which is incompatible with tufte-book! booktabs and subfig (with caption=false) do the trick; see this Stack Exchange post.

Acronyms

Inspired by pandoc-acro, I created a simplified version by replacing \s\+([A-Z]\w+) with \ac{\1}. That means you write:

The +SE world is a peculiar one.

Many students in +SE don't know how to grok Node.

In the PDF, this becomes:

The Software Engineering (SE) world is a peculiar one.

Many students in SE don't know how to grok Node.

The second +SE won’t get unfolded but that’s customizable, for instance if you want to do so for each new chapter.

Don’t forget to include package acro and define each acronym in your preamble using \DeclareAcronym{SE}{short = SE, long = Software Engineering}. There’s all kinds of options there for you to fiddle with as well. I also auto-replace +SEs with \acfp{SE}—the full plural version. If that’s too much effort for you, just try out the original filter, but I wanted more control and already had a script that grepped around, so whatever.
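
A sketch of the acronym pass, under a few assumptions of my own: acronyms consist of at least two uppercase letters or digits, a trailing lowercase s marks the plural, and the + must sit at the start of a word. Unlike the \s\+([A-Z]\w+) pattern above, this version uses a lookbehind so the whitespace in front of the + is kept. The function name is hypothetical.

```python
import re

# "+SE" or "+SEs" at the start of a word. The lookbehind leaves the preceding space in place.
ACRONYM = re.compile(r"(?<!\S)\+([A-Z][A-Z0-9]+)(s?)\b")


def expand_acronyms(tex: str) -> str:
    """Replace +SE with \\ac{SE} and +SEs with \\acfp{SE}."""

    def to_macro(match: re.Match) -> str:
        macro = "acfp" if match.group(2) else "ac"
        return "\\" + macro + "{" + match.group(1) + "}"

    return ACRONYM.sub(to_macro, tex)
```

Expressions such as 1+SE are not touched because the + does not start a word there.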

Layouting

Tufte starts out his later books with a “new thought” in each new chapter and section, where the first three or four words are capitalized and spread out. tufte-book supports this with \newthought{}, but I don’t want to add this manually in the Markdown file, hence another hack. It’s too barebones (and dirty!) to share here but it boils down to:

  1. Find all \begin{section blocks. Take optional []s into account.
  2. Scan for the next line that is not empty; a TeX command; or the start of a TeX block—in case of that last one, fast-forward to the first \end{}.
  3. Break up the line, push the first words into \newthought{}, and save.
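
The three steps above can be sketched roughly as follows. This is not the original script: it assumes headings show up in Pandoc’s output as lines starting with \chapter or \section (optionally starred or with a [short title]), it skips an environment by fast-forwarding to the first line starting with \end{ (so nested environments are not handled), and the function name and the word_count parameter are made up.

```python
import re

# Step 1: a heading line, optionally starred and with a [short title] argument.
HEADING = re.compile(r"\\(?:chapter|section)\*?(?:\[[^\]]*\])?\{")


def add_newthoughts(tex: str, word_count: int = 3) -> str:
    """Wrap the first words of the first prose line after each heading in \\newthought{}."""
    lines = tex.split("\n")
    pending = False   # a heading was seen and its opening prose line is still to be found
    in_block = False  # currently skipping over a \begin{...} ... \end{...} block
    for index, line in enumerate(lines):
        stripped = line.strip()
        if HEADING.match(stripped):
            pending, in_block = True, False
            continue
        if not pending:
            continue
        # Step 2: skip TeX blocks, empty lines, and lines that start with a TeX command.
        if in_block:
            if stripped.startswith("\\end{"):
                in_block = False
            continue
        if stripped.startswith("\\begin{"):
            in_block = True
            continue
        if not stripped or stripped.startswith("\\"):
            continue
        # Step 3: push the first words into \newthought{} and keep the rest of the line.
        words = stripped.split()
        head = " ".join(words[:word_count])
        tail = " ".join(words[word_count:])
        lines[index] = "\\newthought{" + head + "}" + (" " + tail if tail else "")
        pending = False
    return "\n".join(lines)
```

Working line by line keeps the logic easy to follow, at the cost of missing paragraphs that open with an inline command such as \emph{}.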

As for text alignment, tufte-book uses left-aligned text instead of justified text, as Tufte believes it’s easier to read. I think I agree, but as it’s a text-heavy academic work, I still like it to be justified. The justified option for the document class breaks more than it fixes though: so many hyphenation errors occurred that fixing them manually with \hyphenation{} was fruitless. Thanks to this blog article, adding

\tolerance=1
\emergencystretch=\maxdimen
\hyphenpenalty=10000
\hbadness=10000

creates a Word-like justified style, spreading out words rather than breaking them.

Other TeX-specific settings

Remember that tufte-book by default doesn’t show sections in the table of contents, and that dotted lines are absent. This can be fixed with:

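% These macro names contain "@", so in a preamble file they need \makeatletter
\makeatletter
\renewcommand*\l@section{\@dottedtocline{1}{0em}{2.3em}}
\renewcommand*\l@figure{\@dottedtocline{1}{0em}{2.3em}}
\makeatother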

\setcounter{secnumdepth}{1}
\setcounter{tocdepth}{1}

I also use titletoc to customize the styling of the title.

If you’re interested in getting things up and running but encounter difficulties, feel free to reach out. I’m happy to share scripts and source material!


I'm Wouter Groeneveld, a Brain Baker, and I love the smell of freshly baked thoughts (and bread) in the morning. I sometimes convince others to bake their brain (and bread) too.
