A Look at Citizendium’s Backend

We spoke with Jason Potkanski earlier about his work on Wafaa Bilal’s “Domestic Tension” project. After we made contact with him, we found out that he was also the technical director of Citizendium, the expert alternative to Wikipedia. While we had his attention, we decided to talk with him about the Citizendium project and what it takes to run such an ambitious Web 2.0 project.
Citizendium runs on a LAPP (Linux, Apache, PostgreSQL, PHP) configuration, he explained. The team went with PostgreSQL for a number of reasons, including better scalability. PostgreSQL is an MVCC (multiversion concurrency control) database, so, unlike Wikipedia, Citizendium never has to lock the database for reads and writes. MySQL can do a lot of things quickly and replicate them to slave servers, Potkanski said, but PostgreSQL excels at complex functions and full features like JOINs, and it can handle complicated categories and full-text searches faster than Wikipedia can.
“The reason we went with PostgreSQL was threefold,” Potkanski said. “First, to be different from Wikipedia. Second, we already had Greg Sabino Mullane, a core PostgreSQL developer, on board. Finally, we felt from reading various mailing lists over MediaWiki development that MediaWiki was hitting the ceiling of the features MySQL can provide as a backend.”
There is a performance hit with PostgreSQL, however. It has longer TCP setup times, which reduces the number of users Citizendium can serve compared to MySQL, Potkanski explained.
Potkanski explained that Citizendium’s current setup consists of five servers, one of which is a dedicated database/file server. That machine was the original “pilot” server that ran everything; it is now relegated to database duty.
“I am reluctant to call the caesarwiki [the Citizendium wiki code] a fork,” Potkanski explained. “While the code is pretty divergent from stock MediaWiki, we still have to rely on MediaWiki developers for various issues with the code. I accidentally called the code a fork on the Citizendium forums, and the brouhaha that ensued was insane. We make sure to return ‘Mediawiki 1.10 (modified)’ as our official version.”
In the meantime, the site, though still a second cousin to Wikipedia, has seen some measure of success. It handled over 100,000 unique visitors in the month of April, with Potkanski providing a “guestimate” of 2.2 million page views for a total of 30 GB of data traffic.
To tweak the servers for optimal performance, Potkanski changed the Maxlimits variable in Apache to the exact level each server can handle. Keepalive is on and kept to a low 3. Squid listens on external port 80 and Apache listens on localhost port 80, so a request that travels from Squid to Apache to the database is the worst-case scenario. Load balancing is done via DNS round robin, provided by GoDaddy and BIND 9.
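As a rough illustration of the DNS round robin mentioned above, the Python sketch below rotates a list of address records on every query, so successive clients are pointed at different web servers first. The class name and the 192.0.2.x addresses are hypothetical and are not taken from Citizendium’s configuration; a real DNS server such as BIND handles this rotation internally.

```python
from collections import deque


class RoundRobinZone:
    """Toy model of round-robin DNS: same records, rotated per query."""

    def __init__(self, addresses):
        # Hypothetical pool of web server addresses for one hostname.
        self._records = deque(addresses)

    def answer(self):
        # Return the records in their current order, then rotate so the
        # next query sees a different address in first position.
        response = list(self._records)
        self._records.rotate(-1)
        return response


if __name__ == "__main__":
    zone = RoundRobinZone(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
    for _ in range(4):
        # Most clients connect to the first address in the answer.
        print(zone.answer()[0])
```

Round robin of this kind spreads load only approximately, because resolvers and browsers cache answers, and it does not notice when one of the servers is down.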
“From a performance perspective, a big gotcha I was dealing with was using a hostname entry for the database rather than the IP address, so each time a web request would come in, it would look up db.cz.org, connect and do its thing,” said Potkanski. “Because of this, I was getting hung web servers at random times. I was pulling my hair out. Literally.”
“We use Monit as the service monitor and I would get up and down notices every hour or less, because Monit restarted the server. Add to this that Steadfast had a flaky DNS server. Two and two finally came together when I did a ping of yahoo.com from one of our web servers and noticed it was failing. Ever since then, the only time web servers have locked up is when we are doing a vacuum on the database.”
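Potkanski’s fix was to point the web servers at the database’s IP address directly. For setups that need to keep a hostname, one alternative is to resolve it once and reuse the result, which is sketched below in Python. This is not Citizendium’s code: the `CachedResolver` class, the TTL value, and the fallback to a stale address are all assumptions made for illustration.

```python
import socket
import time


class CachedResolver:
    """Resolve a hostname once and reuse the address until a TTL expires."""

    def __init__(self, ttl_seconds=300.0, resolve=socket.gethostbyname,
                 clock=time.monotonic):
        self._ttl = ttl_seconds
        self._resolve = resolve      # injectable so it can be tested offline
        self._clock = clock
        self._cache = {}             # hostname -> (ip, expiry time)

    def lookup(self, hostname):
        now = self._clock()
        cached = self._cache.get(hostname)
        if cached is not None and cached[1] > now:
            # Fresh entry: no DNS traffic for this request.
            return cached[0]
        try:
            ip = self._resolve(hostname)
        except OSError:
            # The DNS server is unreachable or failing. Reuse the last
            # known address instead of hanging the request, if we have one.
            if cached is not None:
                return cached[0]
            raise
        self._cache[hostname] = (ip, now + self._ttl)
        return ip


if __name__ == "__main__":
    resolver = CachedResolver(ttl_seconds=60.0)
    # "localhost" stands in for a database hostname here.
    print(resolver.lookup("localhost"))
```

With a cache like this, a flaky DNS server affects at most one lookup per TTL window rather than every incoming web request.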
