FeedLounge slow this week
For the last week or so, FeedLounge has been vacillating between “moderately slow” and “unusable”. After an excruciating investigation into the bowels of the application, database and server, I believe I have identified the root of the issues with the slowdown of the past week - and thought I would share it with everyone.
Disclaimer: This post might come across as my complete lack of understanding as to how MySQL operates, but let me assure you that I have used MySQL for five+ years, in some heavy enterprise situations.
For the last week, I have been digging into FeedLounge anywhere and everywhere to see if I could uncover the problem. Yesterday, in watching the server using top, vmstat, and mtop (mysqltop), I found the cause.
When designing the FeedLounge database I decided to use MyISAM tables, mainly for the capability of full-text search. I did understand that the MyISAM engine performs table level locking, but this never presented itself as a problem in any of our testing.
It seems that the MyISAM engine has a special condition whereby INSERTs do not lock the table, IF the table has never had a DELETE executed on it. When we moved to our new server, everything was fine until we did some data manintenance a couple weeks ago. The clean-up we did (deleting a few rows) has sent the application in a downward spiral performance-wise.
This morning I caught 40 queries locking on 2 INSERTs, and all the queries had crossed 300 seconds of real time. Obviously, we cannot expect any FL user to wait 5+ minutes for an item from a feed to load - but that is what was happening.
We are currently testing a switch to the InnoDB engine, which does row level locking and also keeps primary keys and data in order (so most queries should speed up for us). This is somewhat counter-intuitive, but FeedLounge needs the scalibility of row-level locking more than the performance of the MyISAM engine.
If you asked me when I started this, I would have told you that a web-based feedreader is just another web application, but that just is not be the case. Because of the input and output sides of a feed reader, the application does nearly as many writes to the database as it does reads from the database. This fundamentally changes the way I tackle problems at the database level.
How did I find this problem? First, *lots* persistence in looking for the problem (since Alex and I both agreed with the users about the slowness). Second, the world’s best MySQL book, High Performance MySQL, by Jeremy Zawodny and Derek Balling. These little caveats are the types of things that the book is loaded with. If you are dealing with a database that is larger than the RAM on your machine by a few orders of magnitude, this book is “The Book” you should be reading.
If our testing shows that we indeed have identified and fixed the problem, we’ll take FeedLounge offline tonight to make the needed changes to the database (it already takes more than an hour to re-load the database, and we’re only in alpha!).
We greatly appreciate the patience our alpha testers have displayed as we work through these growing pains.

THANK YOU!!!!
Yeah.. I could have told you that
.
InnoDb is def the way to go. I havne’t been too impressed with MySQL full text search either.
If you REALLY need it you can just create an overview table and copy the data form the innodb table into the overview table and to your queries there. More IO and memory would be required though.
You could also have a dedicated subcluster that’s identical to your primray cluster with the same tables only running MyISAM thereby allowing full text search.
Replication would work just fine in this scenario btw.
And the INSERT only option is a bit shocking since there’s no way to actually find out if you’ve ever deleted data. :-/. There should be an overview mechanism to figure this out.
I’m not sure how the performance would be compared to just reloading, but you can do myisam to innodb conversions online.
ALTER TABLE TYPE=innodb;
You may also want to look into whether or not it defaults to dynamic or fixed row format. I can’t remember the default, but I suspect you strongly want dynamic (Actually now they I think of it, I think fixed only works if you have no variable length fields).
Anyway, the downside to this is that no inserts can occur while you’re doing this, since it effectively does a table lock, creates the innodb table, copies everything and does a rename and unlock. Might be better than being offline completely though.
The other thing to be aware of is that if you abort it, the rollback will take 5-10x longer than the load. That’s true of a reload or an online conversion unless you break it up into a large number of smaller transactions. This is something they fixed in 5.0.
Wow. Nice find. I look forward to the increased speed. Thanks for the hard work.
Clayton, we would have done that, but we are actually testing on a separate box, so we had to rsync the mysql dump over to the test box.
I had to go back to feeddemon and netnews because, for now, it’s just unusable. It has been trying to load my unread feeds for the last 20 minutes and there are only 60.
I can’t wait till everything is resolved, either by new hardware or new backend.
Please keep us posted.
Database Testing Continues
We’re continuing to test various solutions for the database scale issues we’ve run into. I’ve talked to a couple other developers in this space and they’ve been chuckling and nodding as they read through our previous post (havi…
I don’t have any experience with MySQL whatsoever, but I’ve heard quite a lot of people talking about how MySQL doesn’t really scale and that it should be used only for a small database, like a blog or forum, not something so dynamic as FeedLounge is supposed to be.
Have you ever considered other database servers, like Oracle or SQL Server? Several independent tests show that Oracle is the fastest while SQL Server has the best price/performance.
Considering you only have a couple of hundreds of readers and that you want to provide Feedlounge as a paid service, maybe other platforms are worth a look. And now, while you are upgrading you server, might be just the right time.
We’re looking at a wide variety of alternatives. However, the pricing of some of these databases is rather prohibitive.
…i am waiting my alpha-beta tester ticket!!!! please remember me!!!
Yes, they are expensive, but looking at the SQL Server performance and pricing, it might be a viable solution.
One think that got me going on SQL Server is the Execution Plan, that tells you every single instruction/procedure it ran and how long it took, helping you pin-point performance issues.
A small disclaimer: I have some small experience with Microsoft’s database server and, so far, it delivers.
Wikipedia uses MySQL, and seems to do just fine.
The primary problem is that most databases are tuned to handle a write-heavy application (data entry) or a read-heavy application (like a Wikipedia). On the surface a web-based feed reader may seem like a read-heavy application, but it’s actually both write and read heavy.
Just for fun, run some numbers on your own regarding feed refreshing. Say you have 200 users, each subscribed to an average of 50 unique feeds, and you refresh those every hour. That is 10000 feeds in 60 minutes (7200 seconds).
Some portion of those feeds will return a 304 response, but just as many will be formatted in such a way that you have to compare the items using a checksum instead of relying on their GUID. Your database has to be able to retrieve the current feed data for the feed that is stored in the database, compare that to what has been downloaded, and add new items to the database - in a very short time (< 1 second).
All the while, users want to be able to quickly go through the web interface as they browse through their feed list.
FeedLounge Migration, part duex
I am heading home today, and will return home late tonight to begin the second FeedLounge migration. Seems that we just did this last week, but it was actually almost 2 months ago. Wow.
The first migration was such a smashing success that we are go…
Software Development Guide
I couldn’t understand some parts of this article, but it sounds interesting