RFC 5005 Part 1 – Paged and archived feeds? Who cares?
An interview with two passionate RFC 5005 fans on how to handle big Atom feeds
This conversation took almost an hour, so I split it into two shows:
- Part 1 talks mostly about the RFC itself, what it means and why.
- Part 2 goes into personal experiences with the RFC and with syndication in general, in particular in the context of web comics.
This is part 1.
The why
When serving most RSS/Atom feed readers today, you have to choose: Do you make a complete feed with all the things you ever published, or do you make a shorter feed with just the latest entries?
This is a trade-off with pros and cons, and it seems like one you have to make. But a solution that lets your Atom feed have its cake and eat it too already existed 13 years ago, if only our feed readers would adhere to it: RFC 5005, Feed Paging and Archiving.
The what
RFC 5005 (https://tools.ietf.org/html/rfc5005) was published in September 2007.
- The XML namespace for RFC 5005 elements is http://purl.org/syndication/history/1.0, aliased as fh below.
- Section 2 defines the complete feed: It is one document (Atom file) that contains the entire set the feed describes. The document is marked with an fh:complete element.
- Section 3 defines the paged feed: It is a series of documents connected with Atom link elements with rel set to the link relations first, last, previous or next.
- Section 4 defines the archived feed: It has a subscription document that may change at any time, and a series of archive documents that are expected to have stable contents and URIs. The link relations defined are current, prev-archive and next-archive. The semantics are clearer: prev-archive refers to previously published entries, and because the contents are stable you can stop when you see a URI to a document you already have. Archive documents are marked with the fh:archive element.
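To make the three document types above a bit more concrete, here is a minimal Python sketch that labels a single Atom document by the RFC 5005 markers it carries. The function names, the label strings and the example.org URIs are made up for illustration; only the two namespaces, the fh:complete and fh:archive elements and the link relation names come from the specs. It assumes the root element is an Atom feed, and it only looks at that one document, so it cannot verify that a feed really is complete or stable.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
FH = "{http://purl.org/syndication/history/1.0}"

PAGING_RELS = {"first", "last", "previous", "next"}          # RFC 5005 Section 3
ARCHIVE_RELS = {"current", "prev-archive", "next-archive"}   # RFC 5005 Section 4


def feed_links(feed):
    """Map each rel on the feed's own atom:link children to its href."""
    links = {}
    for link in feed.findall(ATOM + "link"):
        # Atom treats a link without a rel attribute as rel="alternate".
        links[link.get("rel", "alternate")] = link.get("href")
    return links


def classify(xml_text):
    """Return a rough, illustrative RFC 5005 label for one feed document."""
    feed = ET.fromstring(xml_text)
    rels = set(feed_links(feed))
    if feed.find(FH + "complete") is not None:
        return "complete"
    if feed.find(FH + "archive") is not None:
        return "archive"
    if rels & ARCHIVE_RELS:
        # No fh:archive marker, but archive links: the subscription document.
        return "archived-subscription"
    if rels & PAGING_RELS:
        return "paged"
    return "plain"


SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0">
  <title>Example</title>
  <fh:archive/>
  <link rel="current" href="http://example.org/feed.atom"/>
  <link rel="prev-archive" href="http://example.org/2019.atom"/>
</feed>"""

print(classify(SAMPLE))  # archive
```

The order of the checks is a design choice of this sketch: the explicit fh marker elements win over whatever link relations happen to be present.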
The who
In this show I’m talking to:
fluffy
Jamey
Conversation notes
- Google Reader was terminated 2013-07-01, all subscription data permanently gone on 2013-07-15:
https://www.google.com/reader/about/
- Mastodon had Atom feeds with paging, but the feeds went away when OStatus went away:
https://github.com/tootsuite/mastodon/pull/11247
- HTML4 does indeed define the HTML link relations:
https://www.w3.org/TR/html4/types.html#h-6.12
It has prev rather than the previous of RFC 5005, but mentions that some browsers support previous as an alias.
- HTML5 also defines the HTML link relations:
https://html.spec.whatwg.org/multipage/links.html
Here, treating previous as a synonym for prev is a lower-case must, for historical reasons.
- IANA manages the Registry of Link Relations:
https://www.iana.org/assignments/link-relations/link-relations.xhtml
It references RFC 5005 for the Section 4 relations, but not the Section 3 ones.
- RFC 5005 singles out its own Section 3 (Paged Feeds) as the best-effort, loose, discouraged model.
- Section 3: “Therefore, clients SHOULD NOT present paged feeds as coherent or complete, or make assumptions to that effect.”
- Section 4: “Unlike paged feeds, archived feeds enable clients to do this without losing entries.”
- I’m confused about it in the show, but the RFC is clear that an archived feed has one dynamic subscription document, which points to a chain of immutable archive documents.
- Back in 2002, Aaron Swartz published his joke MIME-header-based RSS 3:
http://www.aaronsw.com/weblog/000574
The cultural context at the time and the rivalry between RSS 0.91+, RSS 1.0, RSS 2.0 and Atom deserve a show of their own.
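As a footnote to the archived-feed notes above (one dynamic subscription document pointing to a chain of stable archive documents), here is a rough Python sketch of how a reader might catch up: follow prev-archive from the subscription document and stop at the first archive URI it already has. The fetch callable, the seen set and the example.org URIs are hypothetical stand-ins for an HTTP client and the reader's local store, and the sketch assumes the href values are absolute URIs.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def prev_archive(xml_text):
    """Return the href of the feed-level prev-archive link, or None."""
    feed = ET.fromstring(xml_text)
    for link in feed.findall(ATOM + "link"):
        if link.get("rel") == "prev-archive":
            return link.get("href")
    return None


def new_archives(subscription_uri, fetch, seen):
    """List archive URIs that are not in seen yet, oldest first.

    fetch(uri) must return the XML text of the document at uri.
    """
    todo = []
    uri = prev_archive(fetch(subscription_uri))
    # Archives are expected to be stable, so a known URI ends the walk.
    # The "not in todo" check guards against a chain that loops.
    while uri is not None and uri not in seen and uri not in todo:
        todo.append(uri)
        uri = prev_archive(fetch(uri))
    todo.reverse()
    return todo


def page(prev=None):
    """Build a tiny feed document for the demo, optionally with prev-archive."""
    link = f'<link rel="prev-archive" href="{prev}"/>' if prev else ""
    return f'<feed xmlns="http://www.w3.org/2005/Atom">{link}</feed>'


# In-memory stand-in for the web: subscription -> a3 -> a2 -> a1.
DOCS = {
    "http://example.org/feed": page("http://example.org/a3"),
    "http://example.org/a3": page("http://example.org/a2"),
    "http://example.org/a2": page("http://example.org/a1"),
    "http://example.org/a1": page(),
}

print(new_archives("http://example.org/feed", DOCS.__getitem__,
                   {"http://example.org/a1"}))
# ['http://example.org/a2', 'http://example.org/a3']
```

The subscription document itself is fetched every time because it may change at any moment; only the archive documents are skipped once seen.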