I’ve been developing for the web long enough — since my foray into Perl and HTML in 1994 — to know that there will always be certain “rules” developers adhere to when banging out lines of code. They are usually good rules, and it is important for developers to follow them. However, developers also need to know when to break from convention, when a higher purpose requires them to sacrifice the brilliance and elegance of their code. I personally have a few favorite examples from my own years of experience and from working with other developers…
The Almighty Framework
Frameworks are great for web development. The Model-View-Controller (MVC) framework, in its many interpretations for languages such as PHP, Ruby and Python, has made building web sites so much faster. And MVC frameworks make for nice clean code and a logical separation of data, business logic, actions and content. The view layer in and of itself is a godsend and the MVC framework has finally provided recognition for all the hard work put in by templating engine developers (i.e. Smarty).
But what happens when the framework becomes a performance barrier? While developing our online project management software, Intervals, we’ve come across two main areas where we’ve had to lift the hood and rework some of the framework internals.
First, web-based applications can be complex enough that SQL queries become cumbersome if they are not fine-tuned. The default SELECT and JOIN conventions provided by the framework are not always ideal. When you get into granular SQL optimization you have to get your hands dirty at the Model level: writing new queries, then tweaking and tuning them until they run as fast as possible. In some circumstances this means associating a Model, especially a list model, with a database table other than the one it was intended for. In addition, the framework is not going to optimize your database structure for you. Once the database is built you will need to tune the indices and learn about vacuuming and clustering.
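Index tuning in particular is easy to see in action. Here’s a minimal sketch using Python’s built-in sqlite3 module for illustration (the `task` table and its columns are hypothetical, not Intervals’ actual schema): the query plan changes from a full table scan to an index search once we add an index by hand.

```python
import sqlite3

# Hypothetical schema for illustration; any real database would
# use the same principle with its own EXPLAIN facility.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, project_id INTEGER, title TEXT)")
conn.executemany("INSERT INTO task (project_id, title) VALUES (?, ?)",
                 [(i % 10, f"task {i}") for i in range(1000)])

query = "SELECT title FROM task WHERE project_id = ?"

# Without an index, the planner has to scan the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, (3,)).fetchall()
print(plan_before)

# Hand-tuning the schema: add an index on the filtered column.
conn.execute("CREATE INDEX idx_task_project ON task (project_id)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, (3,)).fetchall()
print(plan_after)
```

On a large table that single index is the difference between milliseconds and seconds, and it’s exactly the kind of tuning no framework will do for you.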
Second, frameworks consume memory as they sift data up from the database, through layers of business logic and actions, and finally into the view layer. For basic web-based applications that serve up limited information on a page this is not a big concern, and memcache can usually overcome it if it becomes one. However, if your web-based application is churning through a lot of data and presenting it to the user in real time, you will hit memory limits. This can happen, for example, with reports covering a large date range (probably why Basecamp limits report data to a given number of months). In these cases you will get the best performance by having the view layer access the database directly using cursors. Yes, I know developers won’t like this, but there comes a time when providing speed to your customers is more important than the framework the app is built upon. By cutting out the middleman you remove most of the strain on memory and speed up the reports.
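To make the cursor idea concrete, here’s a sketch in Python with sqlite3 (the `time_entry` table is hypothetical): the “view” consumes rows one at a time straight off the cursor instead of materializing the whole report as framework objects, so memory use stays flat no matter how big the report gets.

```python
import sqlite3

# Hypothetical reporting table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_entry (id INTEGER PRIMARY KEY, hours REAL)")
conn.executemany("INSERT INTO time_entry (hours) VALUES (?)",
                 [(h,) for h in (1.5, 2.0, 0.5)])

def report_rows(conn):
    # The cursor yields one row at a time; nothing is buffered
    # in business-logic objects between the database and the view.
    cur = conn.execute("SELECT id, hours FROM time_entry ORDER BY id")
    for row in cur:
        yield row

# The view renders each row as it arrives, then discards it.
total = 0.0
for _id, hours in report_rows(conn):
    total += hours
print(total)  # 4.0
```

The same pattern applies with server-side cursors in PostgreSQL or unbuffered queries in MySQL; the point is that the rows never pile up in the framework’s memory.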
The Normalized Database
When we design databases, our primary goal is to reduce the redundancy of data through multiple tables, foreign keys, and queries that rely on JOINs. Developers will nitpick over a normalized database until every last bit of redundancy is ironed out. This approach works great for most web-based applications, but as traffic increases, all of those carefully crafted tables and keys, along with the JOIN-heavy queries, begin costing you milliseconds, then seconds, until the app becomes unusable.
The solution is to begin denormalizing the data. We called this a “necessary redundancy” at Pelago. You begin by identifying the slowest queries and removing their JOINs by placing the JOINed data in multiple tables. Then it’s up to stored procedures at the database level, or developers at the code level, to make sure the redundant data is always kept in sync. The entire database doesn’t have to be denormalized all at once, just the tables behind the JOINs that are causing you performance issues.
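As a sketch of that “necessary redundancy” (using a hypothetical client/project schema in sqlite3), the client name is copied onto each project row so the hot listing query needs no JOIN, and a database-level trigger keeps the redundant column in sync:

```python
import sqlite3

# Hypothetical schema for illustration; real apps might use
# stored procedures or application code instead of a trigger.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project (id INTEGER PRIMARY KEY, client_id INTEGER,
                      client_name TEXT);  -- denormalized copy

-- Database-level upkeep of the redundant data.
CREATE TRIGGER sync_client_name AFTER UPDATE OF name ON client
BEGIN
    UPDATE project SET client_name = NEW.name
    WHERE client_id = NEW.id;
END;
""")
conn.execute("INSERT INTO client VALUES (1, 'Acme')")
conn.execute("INSERT INTO project VALUES (10, 1, 'Acme')")

# The hot listing query no longer needs a JOIN.
name = conn.execute("SELECT client_name FROM project WHERE id = 10").fetchone()[0]

# Renaming the client still flows through to the redundant column.
conn.execute("UPDATE client SET name = 'Acme Corp' WHERE id = 1")
name_after = conn.execute("SELECT client_name FROM project WHERE id = 10").fetchone()[0]
print(name, name_after)  # Acme Acme Corp
```

The trade-off is explicit: you spend a little extra write-time work and storage to make the read path, where your users are waiting, as fast as possible.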
One Database to Rule Them All
Another harsh reality of web development is that the database may become too large and unwieldy to handle the number of people using the web-based application. Sys admins will start throwing around the four-letter word “sharding” as developers begin to cringe. If your app grows large enough, sharding may become a necessity for it to scale. Breaking up your database onto multiple servers, and keeping each copy of the database in sync with the others, is a laborious task and should be a last resort. However, dismissing sharding altogether in favor of throwing hardware at the problem is shortsighted. If your web-based application is growing, you should be thinking about how you would shard the database if it becomes necessary in the long term. It’s better to have a plan in place before it’s needed than to be scrambling at the last moment to relieve an overloaded web application.
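Part of having that plan is deciding, in advance, what key routes each query to a shard. A minimal Python sketch (the host names are hypothetical, and a real deployment would likely use consistent hashing so that adding shards doesn’t remap every key):

```python
import hashlib

# Hypothetical shard hosts; in practice this list would live in config.
SHARDS = ["db1.example.com", "db2.example.com", "db3.example.com"]

def shard_for(account_id: int) -> str:
    # A stable hash of a stable key (here, the account id) pins
    # each account to the same shard on every request.
    digest = hashlib.sha1(str(account_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every query for a given account goes to one predictable server.
print(shard_for(42))
```

Picking a shard key like the account id also keeps each customer’s data on a single server, which avoids the cross-shard JOINs that make sharded schemas painful.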
In fact, all of the “rules” I mentioned above should be addressed by web developers at some point if they plan on scaling their web-based applications. Meanwhile, let’s hear from other web developers out there. What are some of the “rules” you’ve had to break?