Site meta: Incorporating Old Posts
Building out the recent posts list to include direct links to posts on the old blog, while skipping any post that already exists here
One of the challenges I’ve experienced when creating the new software stack for my website/blog is how to bring content over. WordPress, the previous hosting system, kept everything in a database and exported it as XML. Jekyll, the new home for this site, keeps everything in individual Markdown files. And then there’s the obvious question of whether some of these posts should even make the transition.
My first thought was to use PowerShell for everything, but I couldn’t find a good XML parser utility for PowerShell. So, I had ChatGPT generate some code to parse the WordPress XML output file using a language of its choosing, and it chose Python.
It’s a lot of code, so rather than embed it here, I’ve put it up on GitHub: rewrite-wordpress-export.py.
It seemed to be successful, outputting what I had in mind: the title, the link, and the post date. Here’s an example element:
{
  "title": "Friday Five: Updates to A Brief History of Midtown Phoenix",
  "link": "https://edwardjensen.net/friday-five/friday-five-updates-to-a-brief-history-of-midtown-phoenix/6493/",
  "pubDate": "Fri, 18 Mar 2022 16:30:00 +0000"
}
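For a feel of the parsing step without opening the full script, here is a minimal sketch. It is not the ChatGPT-generated script: it assumes the export follows the usual RSS layout of item elements with title, link, and pubDate children, and both the trimmed sample XML and the function name extract_posts are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a WordPress export file.
SAMPLE_EXPORT = """<rss version="2.0">
  <channel>
    <item>
      <title>Friday Five: Updates to A Brief History of Midtown Phoenix</title>
      <link>https://edwardjensen.net/friday-five/friday-five-updates-to-a-brief-history-of-midtown-phoenix/6493/</link>
      <pubDate>Fri, 18 Mar 2022 16:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>
"""


def extract_posts(xml_text):
    """Return one {title, link, pubDate} dict per <item> element."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return posts


posts = extract_posts(SAMPLE_EXPORT)
print(json.dumps(posts, indent=2))
```

A real export likely contains items for pages and attachments as well as posts, which this sketch makes no attempt to filter out.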
There are a few issues here. First, the URL it detected is the original post URL, which is on the edwardjensen.net domain rather than its new place, old.edwardjensen.net. Second, the pubDate is a text string, so I need to convert that into something machine-readable. Lastly, I need a way to indicate that this post is from the old post list, not the list of posts on the new site. This is where a bit of PowerShell comes into play:
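# $posts holds the objects parsed from the Python script's JSON output
foreach ($p in $posts) {
    # Point the link at the old site's new home (-replace takes a regex, so anchor and escape it)
    $p.link = $p.link -replace "^https://edwardjensen\.net/", "https://old.edwardjensen.net/"
    # Parse the date string into a DateTime object
    $p.pubDate = ($p.pubDate | Get-Date)
    # Flag the post as coming from the old blog
    Add-Member -InputObject $p -MemberType NoteProperty -Name "ejnet" -Value $true
}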
Easy. We’ve transformed the JSON block into something that’s usable:
{
  "title": "Friday Five: Updates to A Brief History of Midtown Phoenix",
  "link": "https://old.edwardjensen.net/friday-five/friday-five-updates-to-a-brief-history-of-midtown-phoenix/6493/",
  "pubDate": "2022-03-18T09:30:00-07:00",
  "ejnet": true
}
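Since the parsing was already happening in Python, the same three fixes could also be done there. The sketch below is an illustration rather than what I actually ran: the transform function and the two prefix constants are hypothetical names, and it leaves the date in UTC instead of converting to local time, so the timestamp reads differently from the PowerShell output above while describing the same instant.

```python
from email.utils import parsedate_to_datetime

# Assumed prefixes, taken from the example URLs in this post.
OLD_PREFIX = "https://edwardjensen.net/"
NEW_PREFIX = "https://old.edwardjensen.net/"


def transform(post):
    """Return a copy of one parsed post with the three fixes applied."""
    fixed = dict(post)
    # Rewrite only the start of the URL so the rest of the link is untouched.
    if fixed["link"].startswith(OLD_PREFIX):
        fixed["link"] = NEW_PREFIX + fixed["link"][len(OLD_PREFIX):]
    # pubDate is an RFC 2822 style string; convert it to ISO 8601.
    fixed["pubDate"] = parsedate_to_datetime(fixed["pubDate"]).isoformat()
    # Flag the entry as coming from the old site.
    fixed["ejnet"] = True
    return fixed


example = {
    "title": "Friday Five: Updates to A Brief History of Midtown Phoenix",
    "link": "https://edwardjensen.net/friday-five/friday-five-updates-to-a-brief-history-of-midtown-phoenix/6493/",
    "pubDate": "Fri, 18 Mar 2022 16:30:00 +0000",
}
result = transform(example)
print(result)
```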
Now export the output as a new JSON file, upload it to the _data folder in Jekyll, and we’re in business. Right? There were a couple of hiccups that I found based on the original Python script:
- Jekyll uses date as the date field, not pubDate as I used in the Python script. OK, that’s simple to replace: just do a find/replace action to replace "pubDate" with "date" in the JSON file.
- Similarly, Jekyll uses "url" instead of "link". Same action with find and replace.
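Find and replace did the job for me, but renaming the keys on the parsed data is a slightly safer alternative, since it can't accidentally touch the word "link" inside a post title. A minimal sketch, assuming the posts are already loaded as a list of dictionaries; KEY_RENAMES and rename_keys are names I've made up for this example:

```python
import json

# Map from the keys the Python script produced to the keys used in the templates.
KEY_RENAMES = {"pubDate": "date", "link": "url"}


def rename_keys(post):
    """Return a copy of the post with keys renamed; other keys pass through."""
    return {KEY_RENAMES.get(key, key): value for key, value in post.items()}


posts = [
    {
        "title": "Friday Five: Updates to A Brief History of Midtown Phoenix",
        "link": "https://old.edwardjensen.net/friday-five/friday-five-updates-to-a-brief-history-of-midtown-phoenix/6493/",
        "pubDate": "2022-03-18T09:30:00-07:00",
        "ejnet": True,
    }
]
renamed = [rename_keys(post) for post in posts]
print(json.dumps(renamed, indent=2))
```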
With those out of the way, we’re now in business. Liquid, the templating engine for Jekyll, can do a bit of data analysis. If a post exists on the new blog, I don’t need to have it repeated from the old blog post list. So now, I need to construct an operation to import posts from the new blog, the JSON list from the old blog, and delete duplicates on the JSON list if the post is on the new blog.
This is the Liquid I used:
{% assign recent_posts = site.posts %}
{% assign ejnet_posts = site.data.ejnet-posts %}
{% assign combined_posts = recent_posts | concat: ejnet_posts %}
{% assign unique_posts = combined_posts | uniq: 'title' | sort: 'date' | reverse %}
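To make the logic of that filter chain easier to follow, here is roughly the same idea sketched in Python. It is an illustration only: merge_posts and the sample titles are hypothetical, and it assumes, as I understand Liquid's uniq filter to behave, that the first entry with a given title is the one that is kept. Because the new site's posts come first in the combined list, they win over the old blog's entries.

```python
from datetime import datetime, timezone


def merge_posts(new_posts, old_posts):
    """Combine both lists, keep the first post per title, sort newest first."""
    seen_titles = set()
    unique = []
    for post in new_posts + old_posts:  # new posts first, so they take priority
        if post["title"] in seen_titles:
            continue
        seen_titles.add(post["title"])
        unique.append(post)
    return sorted(unique, key=lambda post: post["date"], reverse=True)


# Made-up sample data: "Post A" exists on both sites, "Post B" only on the old one.
new_posts = [
    {"title": "Post A", "date": datetime(2024, 1, 5, tzinfo=timezone.utc)},
]
old_posts = [
    {"title": "Post A", "date": datetime(2022, 1, 5, tzinfo=timezone.utc), "ejnet": True},
    {"title": "Post B", "date": datetime(2022, 3, 18, tzinfo=timezone.utc), "ejnet": True},
]

merged = merge_posts(new_posts, old_posts)
for post in merged:
    print(post["date"].date(), post["title"], "(old blog)" if post.get("ejnet") else "")
```

One caveat of matching on title: if a post is retitled during migration, both versions will show up in the list.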
Et voilà ! The posts are now imported as one list. In the HTML that generates the list, it’s now easy to add formatting that depends on where a post lives. I’ve set up the formatting such that if the post is from the JSON post list, the post title is in italic type. Part of that is to create a visual reference to a post on a different infrastructure, but part of that is also for me as a check for posts that I still need to import over. I’ve found, for instance, that a couple of recent posts didn’t make it over when I ran the original WordPress to Markdown import operation!
{% if post.ejnet %}
<span class="italic">{{ post.title }}</span>
{% else %}
<span>{{ post.title }}</span>
{% endif %}
I’ve also added a utility to group posts by each year, sorted from newest to oldest. That’s accomplished, again, using Liquid:
{% assign posts_by_year = unique_posts | group_by_exp: "post", "post.date | date: '%Y'" %}
{% for year in posts_by_year %}
  {% for post in year.items %}
    {% comment %} per-post markup goes here {% endcomment %}
  {% endfor %}
{% endfor %}
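The grouping step can be sketched in Python too. This is only an illustration of the idea, with a hypothetical group_by_year function and made-up sample posts. It assumes the input list is already sorted newest first, so the years come out newest first as well, because Python dictionaries keep insertion order.

```python
from datetime import datetime


def group_by_year(posts):
    """Group posts by the year of their date, preserving the input order."""
    groups = {}
    for post in posts:
        groups.setdefault(post["date"].year, []).append(post)
    return groups


# Made-up sample data, already sorted from newest to oldest.
posts = [
    {"title": "Post C", "date": datetime(2024, 1, 5)},
    {"title": "Post B", "date": datetime(2022, 3, 18)},
    {"title": "Post A", "date": datetime(2022, 1, 5)},
]

grouped = group_by_year(posts)
for year, items in grouped.items():
    print(year)
    for post in items:
        print("  ", post["title"])
```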
And that’s how it looks now on the edwardjensen.net/writing page.
Because my writing up until 2018 was on central-city Phoenix matters, I’ve added this note at the top of the writing index page:
Post titles in italic type are links to those posts on the previous version of this blog, prior to them being migrated over. NB: Most historical posts from 2011 through 2018 are from when I took an acute interest in central-city Phoenix matters, and may not be as relevant today. Still, in the interest of historical completeness, those posts are going to be linked below.
I’m looking at a lot of these posts, and while I do stand by them as something I’ve written, some of the posts are mildly cringeworthy. I’m thinking some of them might be left behind in the migration, because they’re just not relevant any longer.
Technically, I’m quite pleased with how this has worked. Check out the GitHub repository for more info: edwardjensen/wordpress-json-list.