Optimizing for Shared Hosting

Author: Jeff Anderson

Shared hosting, no matter how the packages are painted, has limits. Avoiding heavy use of CPU, memory, and other resources helps ensure the longevity and performance of your shared hosting account. If you are just looking into how to build a site, or if you already have a very busy site on shared hosting, these guidelines can help you get the most out of your account before making the switch to more expensive hosting. One of the goals of this post is to encourage "good neighbor" practices so that you aren't disrupting fellow users on the server that hosts your account. Many of these principles also apply to other types of hosting, but this is written with shared hosting in mind.

Common bottlenecks

With modern shared hosting, you generally have plenty of available disk space and bandwidth. The most common bottlenecks encountered by a site in shared hosting are CPU and databases. Memory usage is a less common bottleneck, but it can happen depending on how well or how poorly your site's code is written.

On-the-fly page generation takes quite a bit of CPU compared to serving flat HTML files. When that page generation depends on a shared database system, you have opened yourself up to both of the common bottlenecks. If CPU usage spikes on your shared server because of any of the other users, your scripts will run much more slowly than normal. If any one of the other users is hammering the database server that you are using, that's another hit your site will take. Occasional "database connection" errors are indicative of problems connecting to the database server.

Also, if your site happens to be the cause of the CPU spike or the database hammering, you will both experience these problems yourself and cause them for everyone else on the server.

If you need to run a memory-intensive application, shared hosting may not be the best choice. If you can offload memory-intensive processes to your home computer, or to other types of hosting, you may still be able to keep your main website on the shared hosting provider of your choice.

Processing

To get the best performance out of your site, you should limit the use of your shared hosting account to, well, your website. Many photo and video gallery web applications can resize or re-encode uploaded content as needed. Do this kind of processing on your content before uploading, and upload the result, instead of expecting your shared server to do it for you.

Trim the fat

Remember that you are using shared hosting. Enabling lots and lots of plugins, or adding lots of miscellaneous functionality to your site may produce an extremely functional site, but you'll be pressing up against the practical limits of your shared hosting account with only a few users simultaneously using your feature-rich, bloated site. Turn those extras off unless they are fundamentally needed for the functionality of your site.

Another possible bottleneck is the classic one-process-per-request problem. If possible, use FastCGI or a similar one-process-handles-many-requests model. PHP can be run in FastCGI mode, where the PHP interpreter stays in memory. Python applications that support the WSGI interface can run in FastCGI mode with the flup adapter.
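
As a rough illustration of the Python case, here is a minimal sketch of a WSGI application handed to flup's FastCGI server. The application itself and the serve() helper are made up for this example; flup is a third-party package, and how your host launches the script will vary.

```python
def application(environ, start_response):
    """A tiny WSGI application; a WSGI framework exposes a callable like this."""
    body = b"Hello from a long-running process\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve():
    """Hypothetical entry point, called from the script your host runs as FastCGI."""
    # flup is imported here so the module still imports without it installed.
    from flup.server.fcgi import WSGIServer
    WSGIServer(application).run()
```

The process stays resident and answers many requests, so the interpreter and your imports are loaded once rather than on every hit.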

Databases

Database theory and design is a branch of Computer Science all on its own. In the context of shared hosting, if your application uses a database, you can follow a few guidelines to get the most out of your account.

Keep your databases small. This doesn't mean you should post less content. Don't store things that aren't needed. Two examples of storing unneeded information include sessions and error logs. It is fine to store these things, but if they are never cleared out, that can become a problem.
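
A periodic cleanup along these lines is one way to keep those tables from growing forever. This is only a sketch: the sessions and error_log tables, their columns, and the one-week log retention are assumptions for the example, and it uses an in-memory SQLite database so it runs anywhere.

```python
import sqlite3
import time


def purge_expired(conn, now=None, log_max_age=7 * 24 * 3600):
    """Delete expired sessions and old error-log rows; return rows removed."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        sessions = conn.execute(
            "DELETE FROM sessions WHERE expires_at < ?", (now,)
        ).rowcount
        logs = conn.execute(
            "DELETE FROM error_log WHERE created_at < ?", (now - log_max_age,)
        ).rowcount
    return sessions + logs


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, expires_at REAL)")
conn.execute(
    "CREATE TABLE error_log (id INTEGER PRIMARY KEY, created_at REAL, message TEXT)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("old", 100.0), ("live", 10_000_000.0)],
)
conn.executemany(
    "INSERT INTO error_log (created_at, message) VALUES (?, ?)",
    [(50.0, "stale"), (999_990.0, "recent")],
)
removed = purge_expired(conn, now=1_000_000.0)
```

Run something like this from cron. Many frameworks ship an equivalent; current Django versions, for instance, have a clearsessions management command for the session table.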

Keep your queries sane and simple. If you are designing a web application, make sure that any columns you will be querying on are indexed. Searches on unindexed columns take longer and longer as the number of rows grows. Don't implement a search algorithm that simply does a LIKE against all the posts in your blog. As your database gets bigger, this will be harder and harder on the database each time someone searches. For really big datasets, using a full-text search engine for SQL databases, like Sphinx, is the way to go, but that usually means your site is big enough that you've already graduated from shared hosting anyway. Also, keep in mind that any query that does a JOIN can multiply the number of rows that need to be examined.
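
To see what an index buys you, you can ask the database how it plans to run a query. The sketch below uses SQLite's EXPLAIN QUERY PLAN on a made-up posts table (the table, column, and index names are assumptions); MySQL and PostgreSQL have their own EXPLAIN, with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO posts (author, title) VALUES (?, ?)",
    [("author%d" % (i % 50), "Post %d" % i) for i in range(1000)],
)

QUERY = "SELECT id, title FROM posts WHERE author = ?"


def query_plan(conn):
    """Return SQLite's description of how it would run QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("author7",)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_without_index = query_plan(conn)  # a full scan of the table
conn.execute("CREATE INDEX idx_posts_author ON posts (author)")
plan_with_index = query_plan(conn)     # a lookup through idx_posts_author
```

Printing the two plans shows the first one scanning every row and the second one searching via the index. The exact wording depends on the SQLite version.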

Even if you are a nice neighbor in your usage of the database, that doesn't mean that others can't ruin your experience. If you've ever shared a hot water heater with a neighbor, you know what I'm talking about. Fortunately, with caching, you can reduce your site's dependence on the shared database resource.

Caching

Server-side caching addresses the symptoms caused by both of the most common bottlenecks experienced on shared hosting. There are several types of caching, and which one is appropriate depends on your specific site. The best method for shared hosting in most cases, however, is by far static file caching. The idea behind static file caching is simple.

The webserver receives a request for a URL. It checks to see if a file that corresponds to that request exists. If it does, the webserver serves that file directly. If it does not, it passes the request to your script. Every time your script is invoked to generate the HTML for a request, it saves the result to the appropriate location. If the content ever changes, the file is deleted.

If your site has additional functions, such as a member login, you have to add logic that checks for the presence of a member-logged-in cookie sent with the request. Some sites may find this trick necessary only for their main page, or for a specific popular article. Any time a site on shared hosting gets slashdotted, this is a must.
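
The script's side of that arrangement can be sketched roughly as follows. The names (CACHE_ROOT, cache_path, handle_request, invalidate) are invented for this example, and CACHE_ROOT is a temporary directory here only so the code runs as-is; on a real site it would be a directory the webserver checks before calling your script.

```python
import os
import tempfile

# Hypothetical location; really a directory the webserver can serve from.
CACHE_ROOT = tempfile.mkdtemp(prefix="static-cache-")


def cache_path(url_path):
    """Map a URL path such as /blog/hello/ to a file under CACHE_ROOT."""
    relative = url_path.strip("/")
    target = os.path.normpath(os.path.join(CACHE_ROOT, relative, "index.html"))
    # Refuse paths that escape the cache directory (e.g. "/../../etc").
    if not target.startswith(CACHE_ROOT + os.sep):
        raise ValueError("unsafe path: %r" % url_path)
    return target


def handle_request(url_path, render):
    """Called only when the webserver found no cached file for url_path."""
    html = render(url_path)
    target = cache_path(url_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Write to a temp file and rename so a half-written page is never served.
    tmp = target + ".tmp"
    with open(tmp, "w", encoding="utf-8") as handle:
        handle.write(html)
    os.replace(tmp, target)
    return html


def invalidate(url_path):
    """Delete the cached copy when the underlying content changes."""
    try:
        os.remove(cache_path(url_path))
    except FileNotFoundError:
        pass
```

After handle_request has run once for a URL, the webserver can answer later requests from the file without touching your script or the database, until invalidate removes it.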

Most shared hosting environments have Apache, and this can be accomplished using mod_rewrite. WordPress has a plugin called WP Super Cache that employs this very technique. There is a Django project called django-staticgenerator that helps authors of Django sites create and delete the static HTML files as necessary.

Another type of caching is key/value based caching. Applications may store any piece of information in a cache, big or small. A common practice is to query the database, and then store the result in the cache with the full query as the key. Other types of calculated results can be cached as well: generated images, compiled template engine objects, and other expensive-to-generate values can go into a key-based cache system. This may include entire responses for a particular URL, but the difference is usually that when they are stored in a key-based caching system, the webserver isn't configured to serve them directly.

Key-based caching can be accomplished in a variety of ways. The values can be stored in flat files, in a database, in memcached, or even in memory inside the web application process.
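
Here is a minimal sketch of the in-process variant, using the query text as the key. SimpleCache and run_query are invented for the example, and run_query merely stands in for a real database call.

```python
import time


class SimpleCache:
    """In-process key/value cache with a per-entry timeout."""

    def __init__(self, timeout=300, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self._store = {}

    def get_or_compute(self, key, compute):
        entry = self._store.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # still fresh, skip the expensive work
        value = compute()
        self._store[key] = (self.clock() + self.timeout, value)
        return value


calls = []


def run_query(sql):
    calls.append(sql)  # stands in for a real database round trip
    return [("row", len(calls))]


cache = SimpleCache(timeout=60)
sql = "SELECT id, title FROM posts ORDER BY id DESC LIMIT 10"
first = cache.get_or_compute(sql, lambda: run_query(sql))
second = cache.get_or_compute(sql, lambda: run_query(sql))  # served from cache
```

Only the first call reaches run_query. A cache like this lives and dies with the process, which is one reason it pairs well with a long-running FastCGI process and poorly with one-process-per-request setups.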

Working Smarter

Another thing that can be done to help lighten the load on your server or on your account is to get upstream caches to work more nicely with your application. Upstream caches include browser caches, as well as shared caches that some organizations implement. Two kinds of HTTP headers help with accomplishing this: ETags and modification dates.

ETags and modification timestamps are just different ways for user agents to make "conditional GET" requests. The client sends the ETag or last known modification time it has, and the server can return a "304 Not Modified" response if the resource hasn't changed. If it has changed, the server answers the request normally.
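
The ETag half of that exchange can be sketched like this. The function names are invented, the ETag is derived from a hash of the body purely for illustration, and the comparison is simplified (it ignores weak validators, for example).

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag value from the response body."""
    return '"%s"' % hashlib.sha256(body).hexdigest()[:32]


def respond(request_headers, body):
    """Return (status, headers, body), honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    sent = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in sent.split(",")]
    if etag in candidates or "*" in candidates:
        return 304, headers, b""  # client copy is current, send no body
    return 200, headers, body
```

Hashing the finished body saves bandwidth but not the work of generating the page. Deriving the ETag from something cheap, such as a stored modification timestamp, lets the server skip the page generation as well.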

Applying Best Practices

Since I'm a Django developer, I'm most accustomed to implementing these optimizations in my Django projects. Fortunately, Django makes a lot of this easy to do with the framework tools that it provides. Since one of its aims is to promote best practice, it's usually a matter of just connecting the dots.

Django has a built-in caching framework that can be configured to use various backends. Honestly, even though many hosts and developers will tell you that memcached doesn't make any sense for shared hosting, I highly recommend it if you can reliably launch and run a daemon on your shared host. The potential problem with running memcached is that you may end up using too much memory quite easily. Make sure you set the maximum memory limit with the -m option, and set a cache timeout that isn't too terribly long. I also recommend that you run memcached listening on a UNIX domain socket file rather than on a port. If you run it on a port (even a localhost port), any other user on the server could connect and store information.
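
For reference, a settings fragment along these lines points Django at a memcached instance on a UNIX socket. It is a sketch that assumes a current Django release with the pymemcache client installed (older releases, including the ones current when this was written, used a different setting), and the socket path, memory cap, and timeout are placeholders.

```python
# settings.py fragment (sketch). Start memcached first, for example:
#   memcached -d -m 32 -s /home/example/run/memcached.sock
# (-d runs it as a daemon, -m caps memory in megabytes, -s sets the socket path)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        # Placeholder path; use a location only your account can access.
        "LOCATION": "unix:/home/example/run/memcached.sock",
        # Default expiry in seconds for entries stored without an explicit timeout.
        "TIMEOUT": 300,
    }
}
```

Keeping the socket file in a directory with restrictive permissions is what actually keeps other accounts out.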

Static file caching is fairly easily implemented in Django with the help of a pluggable app, django-staticgenerator. It uses either a middleware or a signal to generate the static HTML files, and uses triggers to delete them as necessary. It doesn't include an example for Apache/mod_rewrite, but it's fairly straightforward to do. I also recommend creating a cron job that deletes any files that survive past a certain age, or that ensures your count of statically generated files doesn't go beyond a certain number. Many shared hosts track file count, and ending up with hundreds of thousands of files can be a problem. I recommend using the middleware method for generating the static files rather than the generate-on-save method.

Django 1.1 and newer includes view decorators (condition, etag, and last_modified in django.views.decorators.http) that make it easy to add support for ETags and modification timestamps. You simply write functions that can generate an ETag or determine the last modification time for a given resource, and use those with the decorator.
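
The cron job mentioned above could call something like the following. It is a sketch with an invented function name, and the age and file-count limits are whatever suits your host.

```python
import os
import time


def prune_cache(root, max_age_seconds, max_files, now=None):
    """Delete cached files older than max_age_seconds, then trim to max_files."""
    now = time.time() if now is None else now
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            entries.append((os.path.getmtime(path), path))
    entries.sort()  # oldest first
    removed = []
    for index, (mtime, path) in enumerate(entries):
        too_old = now - mtime > max_age_seconds
        # Files from this one onward that would remain if we stopped here.
        over_limit = len(entries) - index > max_files
        if too_old or over_limit:
            os.remove(path)
            removed.append(path)
    return removed
```

Because the list is sorted oldest first, the count limit always evicts the stalest pages and keeps the most recently generated ones.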

And Beyond

Sometimes shared hosting just isn't enough anymore. Even after all the optimization you can muster, sometimes you just need to look into beefier hosting. However, I wouldn't recommend making the switch until you have done all the optimizations that are practical.

I plan to do some "case studies" of some specific Django apps, and post some specifics that can be done for those cases. Please let me know if I've left anything out, or if I'm just plain wrong anywhere.

Posted: Jun 10, 2010 | Tags: Django Open Source Bluehost Hosting FastCGI
  • memo said at Jun 11, 2010:
    This is quite thoughtful. I wish I had read this before choosing my host provider. Thanks for sharing anyway.

