sc
Scott avatar
ttwillsey
新無料サイト!!

Astro RSS Compiled Content Revisited

In the past I wrote about full-text RSS in Astro, and using sanitize-html on the blog post content to escape and filter html tags. This is or was the recommended procedure when creating full-text RSS feeds according to Astro’s documentation on RSS feeds, and worked fine until recently.

At some point, sanitize-html started breaking when htmlparser2 was updated (and sanitize-html apparently wasn’t). I started getting the following compilation error:

SanitizeErrorMessage

My temporary solution was to pin htmlparser2 to version 8, which I didn’t love, especially since I have multiple Astro sites, and since it would be hard to know when an update would fix the issue.

However, after this problem persisted for awhile, I started wondering what was going on. Surely it must be affecting other people making Astro sites too, but I didn’t see any mention of it at all in the Astro Discord. I posted a question about it and got crickets until today, when a contributor named Armand responded with several helpful bits of information.

First off, he mentioned that instead of sanitize-html, he himself uses ultrahtml to handle the html cleanup and parsing. He even provided an example of it in use for me: astro-blog-full-text-rss/src/pages/rss.xml.ts at latest · delucis/astro-blog-full-text-rss.

He went on to test my issue and pointed out that my code was calling the async function post.compiledContent() without await, and this was breaking htmlparser2 as it received a Promise object from sanitize-html, which was passing it on after receiving it from me:

content: globalImageUrls(
site.url,
sanitizeHtml(post.compiledContent(), {
allowedTags: sanitizeHtml.defaults.allowedTags.concat(["img"]),
}),

I’m not 100% sure why this wasn’t breaking in htmlparser2 8.0 and prior, but in fact it WAS resulting in the expected content not being output at all. Instead, where the full-text content should have been for each post in the RSS feed, was an empty <content:encoded /> tag. So the fact that I was not awaiting the return result of the async post.compiledContent() already WAS causing me issues, and I hadn’t even noticed.

At any rate, by the time he informed me of how stupid I was (he didn’t phrase it that way, but I am), I’d already implemented ultrahtml and moved on.

src/pages/rss.xml.js
import rss from "@astrojs/rss";
import { experimental_AstroContainer as AstroContainer } from "astro/container";
import { getCollection, render } from "astro:content";
import { transform, walk } from "ultrahtml";
import sanitize from "ultrahtml/transformers/sanitize";
import { rfc2822 } from "../components/utilities/DateFormat";
import site from "../data/site.json";
export async function GET(context) {
let baseUrl = site.url;
if (baseUrl.at(-1) === "/") baseUrl = baseUrl.slice(0, -1);
const container = await AstroContainer.create();
const posts = (await getCollection("posts"))
.filter((post) => post.data.draft !== true)
.sort(
(a, b) =>
new Date(b.data.date).valueOf() - new Date(a.data.date).valueOf(),
);
const items = [];
for (const post of posts) {
const { Content } = await render(post);
const rawContent = await container.renderToString(Content);
const content = await transform(rawContent.replace(/^<!DOCTYPE html>/, ""), [
async (node) => {
await walk(node, (node) => {
if (node.name === "a" && node.attributes.href?.startsWith("/")) {
node.attributes.href = baseUrl + node.attributes.href;
}
if (node.name === "img" && node.attributes.src?.startsWith("/")) {
node.attributes.src = baseUrl + node.attributes.src;
}
});
return node;
},
sanitize({ dropElements: ["script", "style"] }),
]);
items.push({
title: post.data.title,
link: `${baseUrl}/${post.id}`,
pubDate: rfc2822(post.data.date),
description: post.data.description,
customData: `<summary>${post.data.description}</summary>`,
content,
});
}
return rss({
title: site.title,
description: site.description,
site: context.site,
xmlns: {
atom: "http://www.w3.org/2005/Atom/",
dc: "http://purl.org/dc/elements/1.1/",
content: "http://purl.org/rss/1.0/modules/content/",
},
items,
});
}

The bottom line is, if you are doing full-text RSS feeds in Astro and you do use sanitize-html or ultrahtml, don’t be like me and send a Promise to things that want the Promise’s returned object instead.