FeedLounge development: the parser
We have noted in our alpha invitations that we intend for FeedLounge (company, people and application) to be as open as we can possibly be. So along those lines, I will be posting here and on the FeedLounge Blog about architecture, features and development of FeedLounge, so that everyone can see inside the beast, so to speak.
Which feed parser should we use?
When you are building a web-based feed reader like FeedLounge, having data to read is step one. Luckily, there are many feed parsers already out there, so the “build vs. buy” decision was fairly easy. Since our focus is the user experience of the feed reader, the feed parser is only a ‘necessary evil’ in the scheme of things. After checking out several possibilities, including using my own Java/SAX framework, we decided on feedparser, the canonical namesake of the feed parsing world. Built by Mark Pilgrim and currently at version 3.3, it is probably the most forgiving feed parser on the planet. Had I gone with my own solution, I would have spent months and months creating something as good. And with a liberal open source license, I am allowed to use it in a commercial project like this.
feedparser features
- feed format support - v3.3 has impressive support for 4 feed formats and 15 different versions of those formats. Building that support ourselves would have taken a good chunk of time.
- encoding detection - Anyone who has done this understands the difficulty without any explanation.
- tidy support - Want clean HTML content as output? No problem, it’s in there.
- translated access between specific terms - If you know it as a channel rather than a feed, feedparser treats the two names as the same thing. Use the terms that you are comfortable with.
- relative URL resolution - feedparser turns relative URLs into absolute ones. This is useful to us since we are ripping the feed apart to store it, so not having to deal with relative URLs is a great relief.
- great documentation - Mark produces some of the best, most useful documentation in the open source world, and feedparser is no exception: terse, but covering what you need to know. Need to handle HTTP authentication (a 401 response)? It’s documented. Wondering about ETag support? That’s covered too.
- over 2000 unit tests - I may run into some arcane case not covered here, but the likelihood is not very high.
- HTML sanitizing - Extremely useful for a feed reader, since it strips out dangerous markup. You don’t want someone else’s JavaScript running inside your app: debugging that would be a nightmare, and malicious scripts are a real concern.
- date parsing - Supports every date format Mark has come across. You get dates back in one simple format (a standard Python time tuple in UTC), consistent from feed to feed.
- It just works! - The best is saved for last, as this point cannot be made often enough. In the months of development so far, feedparser has never been the source of a single problem. The closest we have come is forgetting to check whether an element exists before accessing it. feedparser has been a huge net positive on development, with almost no overhead. When alpha testers tell us that feeds which won’t open in nearly anything else show up in FeedLounge, that isn’t us; it’s feedparser and its magic voodoo.
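To give a sense of the chores listed above that feedparser takes off our plate, here is a deliberately naive, standard-library-only Python sketch of two of them: resolving relative links and normalizing dates to UTC. It is not feedparser’s implementation or API. The function name normalize_entries and the sample feed are hypothetical, and the sketch assumes well-formed RSS 2.0 with RFC 822 dates. It has none of feedparser’s encoding detection, sanitizing, multi-format support, or tolerance for broken markup.

```python
# Naive, standard-library-only sketch of two chores feedparser handles for us.
# Assumptions: well-formed RSS 2.0 input, RFC 822 dates, hypothetical names.
# This is NOT feedparser's implementation or API.
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime
from urllib.parse import urljoin

# Hypothetical sample feed with one relative link and one missing date.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <link>http://example.com/blog/</link>
    <item>
      <title>First post</title>
      <link>posts/1.html</link>
      <pubDate>Tue, 10 Jan 2006 09:30:00 -0500</pubDate>
    </item>
    <item>
      <title>Undated post</title>
      <link>/about.html</link>
    </item>
  </channel>
</rss>"""


def normalize_entries(xml_text):
    """Return one dict per <item>, with absolute links and UTC dates (or None)."""
    channel = ET.fromstring(xml_text).find("channel")
    base_url = channel.findtext("link", default="")
    entries = []
    for item in channel.findall("item"):
        link = item.findtext("link")
        raw_date = item.findtext("pubDate")
        entries.append({
            "title": item.findtext("title", default=""),
            # Resolve relative links against the channel's own link.
            "link": urljoin(base_url, link) if link else None,
            # Convert RFC 822 dates to UTC; a missing date stays None.
            "published": (
                parsedate_to_datetime(raw_date).astimezone(timezone.utc)
                if raw_date else None
            ),
        })
    return entries


if __name__ == "__main__":
    for entry in normalize_entries(SAMPLE_RSS):
        print(entry["link"], entry["published"])
```

The undated item comes back with published set to None instead of raising an error. Checking for missing fields like this (or calling .get() on feedparser’s dict-like results) is exactly the kind of existence check mentioned above.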
Mark, thanks a million. I know you have ‘gone dark’ in the blogging world, but you are still rocking mine.
