Migrating my blog from Jekyll to Pelican

05 Jul 2022 - Tobias Erdle

After a few years of using Jekyll and GitHub Pages to generate my blog, more and more major and minor maintenance challenges arose.

A simple example of this was updating my operating system to Ubuntu 22.04, which ships Ruby 3 by default. However, the version of Jekyll required by GitHub Pages must be built using Ruby 2.7, which no longer compiled on my system. The workaround was too much work for me, since I'm no expert in Ruby and its ecosystem. Besides, setting up Jekyll for comfortable local development was a pain every time.

Of course there are solutions for all these and upcoming problems, but I don't want to invest time again and again just to generate some HTML. So I started looking for an alternative to this setup, which is easier for me and less work intensive.

What is this article?

This article is a personal story of why and how I migrated from Jekyll to Pelican. It does not show each technical step in detail, as most of those can be found in the linked Pelican documentation. The article demonstrates the main actions I took and what the result looked like. The whole code can be viewed at the erdlet.de GitHub repository.

What is this article not?

This article is not a step-by-step introduction to Pelican. Please refer to the documentation for detailed information on this topic.

Migration to my own server

First I wanted to solve the problem of outdated versions caused by GitHub Pages. The simple solution: move to my own server. For this I used an existing virtual server that already runs a web server. I added the corresponding domain to it and stored the generated content there.

The result: it works, but requires manual effort for deployment.

Automation of the deployment - a failed attempt

After deploying the site manually a few times, I wanted to automate this task. This was supposed to be done by a simple shell job in my Jenkins instance.

For this, Ruby 3.0 was to be installed via RVM. The server's operating system was Ubuntu 22.04, so I expected no problems. However, the installation failed immediately because Ruby 3.0 depends on an old OpenSSL version. It would be possible to downgrade the library (see RVM GitHub Issue 5209), but this can in turn cause problems with other packages, including at the system level. That was definitely not worth it to me. A manual upgrade of the RVM index would also have been possible, but this information simply escaped me among the many options.

Thus, the attempt to set up a simple deployment already failed when installing the underlying programming language.

The search for a successor for Jekyll

After this "experience" I decided to look for another static site generator and rebuild my blog with it.

To keep the migration effort within limits, the new generator should have the following features:

- The latest version of the programming language is easy to install on Ubuntu
- Established framework: no alpha or beta versions of any hip frameworks
- No programming language that is new to me: I want to spend little time on setup, etc.
- As lightweight as possible, so no full-fledged CMS like WordPress
- Markdown must work directly: I don't want to have to change all blog entries in a big way

Since the programming languages I mainly use are Java, Python and, from time to time, Go, I looked for a suitable generator in these languages. The search resulted in the following hits:

Go
- Hugo
Python
- Pelican
- MkDocs
Java
- JBake

After looking at all of these generators a little closer, my choice fell pretty quickly on Pelican. From a usage point of view it was the closest to what I had in mind. In addition, Python 3 runs without problems on Ubuntu and I already have experience with the tools used there. Java seemed too complex from a setup point of view and Go would have required too much effort in my eyes.

Migration to Pelican

With the choice of Pelican as the new tool for the homepage, the conversion began immediately. But before I dive into the real migration, I'll show some Pelican fundamentals.

Pelican installation

First, a virtual environment for Python was set up in the project directory to install the dependencies. To do so, I created the project directory and navigated into it. Inside the directory I ran the following bash commands, assuming Ubuntu 22.04. They may differ on other operating systems.

# Create virtual environment in project directory
python3 -m venv ./venv

# Activate virtual environment
. ./venv/bin/activate

# Install pelican with Markdown support
python -m pip install "pelican[markdown]"

Then a new Pelican project was generated directly in the project directory by running pelican-quickstart. This command will ask you several questions I'll skip here for brevity. The resulting directory structure looks like this.

erdlet.de/
├── content
│   └── (pages)
├── output
├── tasks.py
├── Makefile
├── pelicanconf.py  # contains the basic Pelican configuration
└── publishconf.py  # contains configuration for publishing the page (if done by Pelican)
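To give an idea of what the generated pelicanconf.py contains, here is a minimal sketch of a quickstart-style configuration. The setting names are standard Pelican settings, but the values (site name, time zone, language) are assumptions for illustration and not necessarily what this blog uses.

```python
# pelicanconf.py: sketch of a quickstart-style configuration.
# All values below are illustrative assumptions.
AUTHOR = "Tobias Erdle"
SITENAME = "erdlet.de"
SITEURL = ""  # left empty for local development

PATH = "content"  # directory that holds articles and pages

TIMEZONE = "Europe/Berlin"
DEFAULT_LANG = "en"

# Feeds are usually not needed while developing locally
FEED_ALL_ATOM = None
CATEGORY_FEED_ATOM = None

# No pagination of the article index
DEFAULT_PAGINATION = False
```

The file is plain Python, so settings can be computed or commented like any other module.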

By running pelican --listen inside the project directory, the content is generated and served on localhost:8000. Pelican also supports auto-reload when --autoreload is added to the command.

Pelican concepts

Now that the Pelican project is up and running, let's start looking into the concepts of Pelican. Pelican is based on "themes", which provide specific Jinja2 templates for the respective types of content. The topic "Themes" will be discussed later.

In addition, Pelican distinguishes between "articles" and "pages". "Articles" are chronological entries, e.g. blog entries, which are provided with a date. "Pages", on the other hand, are static in nature and independent of a specific date, such as an imprint.

Each piece of content can be enriched with metadata, which is then available for processing in the templates. The following table lists the metadata fields provided by Pelican (taken from the "Writing content" topic of the documentation). You can also add your own fields, but their names must not overlap with the existing ones.

Metadata	Description
title	Title of the article or page
date	Publication date (e.g., YYYY-MM-DD HH:SS)
modified	Modification date (e.g., YYYY-MM-DD HH:SS)
tags	Content tags, separated by commas
keywords	Content keywords, separated by commas (HTML content only)
category	Content category (one only — not multiple)
slug	Identifier used in URLs and translations
author	Content author, when there is only one
authors	Content authors, when there are multiple
summary	Brief description of content for index pages
lang	Content language ID (en, fr, etc.)
translation	If content is a translation of another (true or false)
status	Content status: draft, hidden, or published
template	Name of template to use to generate content (without extension)
save_as	Save content to this relative file path
url	URL to use for this article/page

An example for using this metadata in an article is the source of this one:

Title: Migrating my blog from Jekyll to Pelican
Date: 2022-07-05
Category: python
Tags: python, pelican, migration
Slug: migrating-blog-from-jekyll-to-pelican
Authors: Tobias Erdle
Status: draft

The Title field is used by most themes for generating headlines, browser tab titles, et cetera. The other fields are described above, so there won't be further explanation.
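To make the structure of such a metadata header more tangible, here is a small sketch that splits a source file into metadata and body. It is a simplified illustration and not Pelican's own parser: the function name parse_metadata is hypothetical, and it assumes flat "Key: value" lines that end at the first blank line.

```python
def parse_metadata(text):
    """Split a Pelican-style Markdown source into (metadata, body).

    Simplified illustration: reads "Key: value" lines until the first
    blank line. This is not how Pelican itself is implemented.
    """
    metadata = {}
    lines = text.splitlines()
    body_start = len(lines)
    for index, line in enumerate(lines):
        if not line.strip():
            # The first blank line ends the metadata header
            body_start = index + 1
            break
        key, _, value = line.partition(":")
        metadata[key.strip().lower()] = value.strip()
    body = "\n".join(lines[body_start:])
    return metadata, body


source = """Title: Migrating my blog from Jekyll to Pelican
Date: 2022-07-05
Category: python
Tags: python, pelican, migration
Slug: migrating-blog-from-jekyll-to-pelican
Authors: Tobias Erdle
Status: draft

After using Jekyll and GitHub Pages ..."""

metadata, body = parse_metadata(source)
# Tags are comma-separated, so they can be split into a list
tags = [tag.strip() for tag in metadata["tags"].split(",")]
print(metadata["title"])
print(tags)
```

Only the first colon of a line separates key and value, so values that contain colons themselves stay intact.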

Pelican themes

The last concept to explain is "themes", which are used for rendering the content. A Pelican theme has a fixed directory structure, as shown below (taken from the Pelican documentation, topic "Themes").

themename
├── static
│   ├── css
│   └── images
└── templates
    ├── archives.html         // to display archives
    ├── period_archives.html  // to display time-period archives
    ├── article.html          // processed for each article
    ├── author.html           // processed for each author
    ├── authors.html          // must list all the authors
    ├── categories.html       // must list all the categories
    ├── category.html         // processed for each category
    ├── index.html            // the index (list all the articles)
    ├── page.html             // processed for each page
    ├── tag.html              // processed for each tag
    └── tags.html             // must list all the tags. Can be a tag cloud.

These files must exist within a theme, but not all of them have to be implemented. If, for example, categories or tags aren't used, the templates templates/category.html, templates/categories.html et cetera can be left empty.

The static/css directory contains all necessary styling, and static/images holds assets like logos or icons. You're free to add custom directories like static/fonts in case a site-specific font is used. Personally, I added this directory, as using Google Fonts via CDN is problematic in Germany due to data privacy laws.

There are already a lot of themes if someone wants to use them instead of a custom implementation. Those themes can be found in the pelican-themes GitHub repository.

Migrating files to their corresponding directory

At the beginning of the migration, the Jekyll repository had the following directory structure.

.
├── _layouts/
│   ├── layout.html
│   └── base.html
├── _posts/
│   ├── 2019-08-07-jsr371-multi-language-support.md
│   ├── 2019-08-23-jsr371-custom-locale-resolver.md
│   ├── 2019-09-10-jsr371-talk-at-jug-in.md
│   └── ...
├── assets/
│   └── css/
│       └── styles.css
├── .gitignore
├── .ruby-version
├── .rvmrc
├── 404.html
├── Gemfile
├── Gemfile.lock
├── _config.yml
├── about_me.html
├── impress.html
├── index.html
└── privacy.html

And, as described above, the new Pelican project looks like this:

erdlet.de/
├── content
│   └── (pages)
├── output
├── tasks.py
├── Makefile
├── pelicanconf.py
└── publishconf.py

In the old project, _layouts/layout.html contained the basic layout, whereas _layouts/post.html contained the layout for the posts (or articles). Static files, like CSS, were stored in the assets/ directory. Last but not least, the static pages, like the imprint, were simply lying around in the project's root directory.

To bring this into the Pelican directory structure, all files from _posts/ were first moved into the erdlet.de/content/ directory, and the static pages into erdlet.de/content/pages/. The result looked like this:

erdlet.de
├── content/
│   ├── pages/
│   │   ├── imprint.html
│   │   ├── privacy.html
│   │   └── about_me.html
│   ├── 2019-08-07-jsr371-multi-language-support.md
│   ├── 2019-08-23-jsr371-custom-locale-resolver.md
│   ├── 2019-09-10-jsr371-talk-at-jug-in.md
│   └── ...
├── output
├── tasks.py
├── Makefile
├── pelicanconf.py
└── publishconf.py

The next step was to migrate the content/pages/*.html files to Markdown files. This could be done easily by changing the file extension, as HTML tags are allowed within Markdown. This saves a lot of time, because bigger pages don't have to be rewritten completely. After this, the repository looks like this:

erdlet.de
├── content/
│   ├── pages/
│   │   ├── imprint.md
│   │   ├── privacy.md
│   │   └── about_me.md
│   ├── 2019-08-07-jsr371-multi-language-support.md
│   ├── 2019-08-23-jsr371-custom-locale-resolver.md
│   ├── 2019-09-10-jsr371-talk-at-jug-in.md
│   └── ...
├── output
├── tasks.py
├── Makefile
├── pelicanconf.py
└── publishconf.py
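The renaming step described above can also be scripted instead of being done by hand. The following is a minimal sketch using only the standard library. The function name rename_html_pages is hypothetical, and the content/pages path in the usage comment is taken from the directory tree above.

```python
from pathlib import Path


def rename_html_pages(pages_dir):
    """Rename every *.html file in pages_dir to *.md and return the new paths."""
    renamed = []
    for html_file in sorted(Path(pages_dir).glob("*.html")):
        md_file = html_file.with_suffix(".md")
        html_file.rename(md_file)
        renamed.append(md_file)
    return renamed


# Usage from the project root (assumed layout):
# rename_html_pages("content/pages")
```

Only the file extension changes; the HTML inside the files stays untouched, and the metadata header still has to be added to each page afterwards.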

After running pelican --autoreload --listen, a bunch of errors occurred, as the metadata of my articles and pages was wrong. Fixing this took up most of the migration time. Since this is a straightforward task and the necessary metadata was explained above, it won't be covered in detail. A short example of the metadata migration for 2019-08-07-jsr371-multi-language-support.md should be enough:

# Before

---
layout: post
title:  "MVC API: I18n fundamentals"
date:   2019-08-07 10:00:00 +0200
categories: jakarta-ee
author: Tobias Erdle
---


# After
Title:  Jakarta MVC: I18n fundamentals
Date:   2019-08-07
Modified: 2022-07-04
Category: jakartaee
Tags: jakartaee, jakarta-mvc, java
Slug: jakarta-mvc-i18n
Authors: Tobias Erdle
Status: published

One important thing to mention here is that Status controls whether the article is loaded into the articles collection that is available when implementing templates. The Slug determines the generated URL.
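For a larger number of posts, the mechanical part of the Before/After conversion above could be scripted. The following is only a sketch under assumptions: the function name jekyll_to_pelican is hypothetical, it handles only flat "key: value" front matter, and it expects the keys title, date, categories and author to be present. Fields such as Tags, Slug and Modified, as well as renamed titles or categories, still have to be maintained by hand.

```python
def jekyll_to_pelican(source, status="published"):
    """Convert flat Jekyll YAML front matter into a Pelican metadata header."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != "---":
        return source  # no front matter, nothing to convert

    # Find the closing "---" of the front matter
    end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")

    front_matter = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        front_matter[key.strip()] = value.strip().strip('"')

    header = [
        f"Title: {front_matter['title']}",
        # Keep only the date part, dropping time and UTC offset
        f"Date: {front_matter['date'].split()[0]}",
        f"Category: {front_matter['categories']}",
        f"Authors: {front_matter['author']}",
        f"Status: {status}",
    ]

    body = lines[end + 1:]
    # Drop leading blank lines so exactly one blank line follows the header
    while body and not body[0].strip():
        body.pop(0)
    return "\n".join(header + [""] + body)


post = """---
layout: post
title:  "MVC API: I18n fundamentals"
date:   2019-08-07 10:00:00 +0200
categories: jakarta-ee
author: Tobias Erdle
---

Post body"""

converted = jekyll_to_pelican(post)
print(converted)
```

The layout key is dropped on purpose, because in Pelican the template is chosen by the content type (article or page) unless the Template field overrides it.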

Creating custom theme

Once the site worked as intended with the default theme, it was time to create the custom theme, since the original style and templating were to stay the same. Therefore, the directory structure was extended to match the following diagram.

erdlet.de
├── content/
│   ├── pages/
│   │   ├── imprint.md
│   │   ├── privacy.md
│   │   └── about_me.md
│   ├── 2019-08-07-jsr371-multi-language-support.md
│   ├── 2019-08-23-jsr371-custom-locale-resolver.md
│   ├── 2019-09-10-jsr371-talk-at-jug-in.md
│   └── ...
├── output
├── themes/
│   └── erdlet/
│       ├── static/
│       │   ├── css
│       │   ├── fonts
│       │   └── images
│       └── templates/
│           ├── article.html
│           ├── authors.html
│           ├── author.html
│           ├── base.html
│           ├── categories.html
│           ├── category.html
│           ├── index.html
│           ├── page.html
│           ├── period_archives.html
│           ├── tag.html
│           └── tags.html
├── tasks.py
├── Makefile
├── pelicanconf.py
└── publishconf.py

The assets/css/styles.css file was copied directly into the themes/erdlet/static/css/ directory and the necessary fonts were downloaded into themes/erdlet/static/fonts/. Adding the fonts to the CSS file won't be covered here. Afterwards, the necessary templates had to be implemented. In this blog's case, only article.html, index.html and page.html are filled, as well as base.html, which contains the overall layout. Some variables had to be changed to match Pelican's data structures, and the pages use Jinja2 syntax to extend base.html.

To activate the custom theme, the setting THEME = 'themes/erdlet' had to be set in pelicanconf.py. Running the site now worked almost as expected.

The last thing to do was to set PAGE_SAVE_AS = 'pages/{slug}.html' in pelicanconf.py, so the files of the pages in output/ are named by their slug instead of their title. Those names are shorter and easier to link.
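Put together, the two settings mentioned above could look like the following excerpt of pelicanconf.py. The PAGE_URL line is an assumption that is not mentioned in the text: it is a standard Pelican setting, added here only to show how the link target can be kept in sync with the file name.

```python
# pelicanconf.py (excerpt)

# Use the custom theme from the project directory
THEME = "themes/erdlet"

# Name the generated page files after their slug
PAGE_SAVE_AS = "pages/{slug}.html"

# Assumption: keep the URL used in links in sync with the file name
PAGE_URL = "pages/{slug}.html"
```

For a page with Slug: imprint, the {slug} placeholder expands to pages/imprint.html.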

Now running pelican --autoreload --listen worked and showed the pages as intended.

Automation of the deployment - a successful attempt

The trigger for this migration was, as mentioned at the beginning, the dissatisfaction with the installation and configuration of the automatic deployment on the Jenkins server.

Python 3 is already installed on the server and runs without problems. To be able to use venv, the package python3.10-venv has to be installed, which also works without problems. After that, a simple Jenkins job can be created with a script like this one and, if the Jenkins user has the necessary permissions, executed.

# Create virtual environment
python3 -m venv ./venv

# Activate virtual environment
. ./venv/bin/activate

# Install the dependencies with pip
pip install -r requirements.txt

# Generate `output/`
make html

# Copy generated content into the web server directory (if on the same server)
cp -r output/* /var/www/foobar/

The automatic deployment works as expected and no further action is needed.

Conclusion

Would I do this migration again? I am sure: yes.

The whole migration from Jekyll to Pelican took me about 4 hours, so probably less than rewriting everything in another language. I've also always had problems keeping Ruby and Jekyll running, which has never happened to me with Python and its frameworks (never say never - haha). Besides the fact that I find it easier to run with Pelican, I also like the clear structure of the repository much better and the clear guidelines on themes complete the picture for me.

Would I recommend Pelican?

Here, too, I clearly say: yes. If I had known Pelican before, I would have used it directly.