Called the “Swiss Army knife” of Apache modules, mod_rewrite can be used for everything from URL rewriting to load balancing. Where mod_rewrite and its ilk shine is in abbreviating and rewriting URLs.
One of the most effective optimization techniques available to web developers, URL abbreviation substitutes short URLs like “r/pg” for longer ones like “/programming” to save space. Apache, IIS, Manila, and Zope all support this technique. Yahoo.com, WebReference.com, and other popular sites use URL abbreviation to shave anywhere from 20% to 30% off of HTML file size. The more links you have, the more effective this technique is.
How mod_rewrite Works
As its name implies, mod_rewrite rewrites URLs using regular expression pattern matching. If a URL matches a pattern that you specify, mod_rewrite rewrites it according to the rule conditions that you set. mod_rewrite essentially works as a smart abbreviation expander. Let’s take our example above from WebReference.com. To expand “r/pg” into “/programming”, Apache requires two directives: one turns on the rewriting engine (RewriteEngine On) and the other specifies the pattern-matching rewrite rule (RewriteRule). The RewriteRule syntax looks like this:
RewriteRule <pattern> <rewrite as>
For our example, this becomes:
RewriteEngine On
RewriteRule ^/r/pg(.*) /programming$1
This regular expression matches a URL that begins with /r/ (we chose this sequence to signify a redirect to expand) with “pg” following immediately afterwards. The pattern (.*) matches zero or more characters after the “pg.” So when a request comes in for the URL <a href="/r/pg/perl/">Programming Perl</a>, the rewrite rule expands this abbreviated URI into <a href="/programming/perl/">Programming Perl</a>.
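To see what the rule does to a request path, the same pattern can be tried outside Apache. The following is only a minimal sketch in Python, not part of mod_rewrite: the function name `expand` is hypothetical, and Python's `re` module stands in for Apache's regular expression engine.

```python
import re

# Mirrors the rule: RewriteRule ^/r/pg(.*) /programming$1
PATTERN = re.compile(r"^/r/pg(.*)")


def expand(url_path: str) -> str:
    """Return the expanded path, or the input unchanged if the rule does not match."""
    match = PATTERN.match(url_path)
    if match is None:
        return url_path
    # $1 in the RewriteRule corresponds to the first captured group.
    return "/programming" + match.group(1)


if __name__ == "__main__":
    for path in ("/r/pg/perl/", "/r/pg", "/about/"):
        print(path, "->", expand(path))
```

Note that “/r/pg” with nothing after it still matches, because (.*) is allowed to match an empty string.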
RewriteMap for Multiple Abbreviations
That’ll work well for a few abbreviations, but what if you have lots of links? That’s where the RewriteMap directive comes in. RewriteMaps group multiple lookup keys (abbreviations) and their corresponding expanded values into one tab-delimited file. Here’s an example map file snippet from WebReference.com.
d dhtml/
dc dhtml/column
pg programming
h html/
ht html/tools/
The MapName file maps keys to values for a rewrite rule using the following syntax:
${ MapName : LookupKey | DefaultValue }
MapNames require a generalized RewriteRule using regular expressions. The RewriteRule references the MapName instead of a hard-coded value. If there is a key match, the mapping function substitutes the expanded value into the substitution string. If there’s no match, the rule substitutes a default value or a blank string.
To use this MapName we need a RewriteMap directive to show where the mapping file is, and a generalized regular expression for our RewriteRule.
RewriteEngine On
RewriteMap abbr txt:/www/misc/redir/abbr_webref.txt
RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]
The new RewriteMap directive points the rewrite module to the text version of our map file. The revamped RewriteRule looks up the value for matching keys in the map file. The redirect=permanent flag sends a 301 redirect instead of the default 302, and the last flag boosts performance by stopping rule processing once the matching abbreviation is found in the map file.
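The lookup-plus-substitution step can be sketched in a few lines of Python. This is an illustration under assumptions, not how mod_rewrite is implemented: `parse_map` and `expand` are hypothetical names, the map text is inlined rather than read from the file path above, and a missing key falls back to a blank string as described for the default value.

```python
import re

# The sample map from above, inlined for the sketch (key, whitespace, value).
MAP_TEXT = """\
d\tdhtml/
dc\tdhtml/column
pg\tprogramming
h\thtml/
ht\thtml/tools/
"""

# Same pattern as the generalized RewriteRule.
RULE = re.compile(r"^/r/([^/]*)/?(.*)")


def parse_map(text):
    """Build a dict from a text map, skipping blank lines and # comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        table[key] = value
    return table


def expand(path, table, default=""):
    """Emulate ${abbr:$1|default}$2; return None when the rule does not match."""
    match = RULE.match(path)
    if match is None:
        return None
    return table.get(match.group(1), default) + match.group(2)


TABLE = parse_map(MAP_TEXT)

if __name__ == "__main__":
    for path in ("/r/ht/", "/r/d/column/", "/r/zz/x", "/about/"):
        print(path, "->", expand(path, TABLE))
```

Because the /? in the pattern consumes the slash after the key, a map value that is meant to be followed by a trailing path needs to end in a slash itself, as “dhtml/” and “html/tools/” do.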
Binary Hash RewriteMaps
For maximum speed, you should convert your text map file into a binary *DBM hash file, which is optimized for lookup speed. Then the above RewriteMap line would look like this:
RewriteMap abbr dbm:/www/misc/redir/abbr_webref
Automating URL Abbreviation
The above URL abbreviation technique works well for URLs that don’t change very often. But what about news or blog sites where URLs change every hour or every minute? You can create a shell script that automatically scans and abbreviates incoming URLs or use the free open source script available at WebReference.com (http://www.webreference.com/scripts/) that does just that. That’s the abbreviated version of URL abbreviation.
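As a rough idea of what such automation involves, here is a minimal Python sketch that shortens the links in a page before it is served. It is not the WebReference.com script. The table `ABBREVIATIONS`, the function `abbreviate_links`, and the assumption that links are double-quoted, site-relative href values are all hypothetical choices made for illustration.

```python
import re

# Hypothetical reverse table: expanded prefix -> abbreviated prefix.
# Each abbreviation must have a matching entry in the server's map file.
ABBREVIATIONS = {
    "/dhtml/": "/r/d/",
    "/html/": "/r/h/",
    "/html/tools/": "/r/ht/",
}

# Assumes double-quoted href attributes; unquoted links are left alone.
HREF = re.compile(r'href="([^"]*)"')


def abbreviate_links(html, table=ABBREVIATIONS):
    """Replace known long URL prefixes in href attributes with short ones."""
    # Try longer prefixes first so /html/tools/ wins over /html/.
    prefixes = sorted(table, key=len, reverse=True)

    def shorten(match):
        url = match.group(1)
        for prefix in prefixes:
            if url.startswith(prefix):
                return 'href="' + table[prefix] + url[len(prefix):] + '"'
        return match.group(0)  # no known prefix: keep the link as is

    return HREF.sub(shorten, html)


if __name__ == "__main__":
    page = '<a href="/html/tools/validator.html">Validator</a> <a href="/about/">About</a>'
    print(abbreviate_links(page))
```

Matching the longest prefix first keeps the output as short as possible and avoids producing a key that the server-side map would expand to the wrong directory.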
Abbreviating Yahoo.com
Yahoo! uses a similar technique to squeeze nearly 30% off of their home page. Because they manage the busiest page on the Web, Yahoo! takes abbreviation to the extreme. So this expanded URL:
http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/
Becomes this minuscule abbreviation:
r/ww
Yahoo’s webmaster created a mapping file that looks something like this:
r/bu http://dir.yahoo.com/Business_and_Economy/
r/bb http://dir.yahoo.com/Business_and_Economy/Business_to_Business/
r/fi http://dir.yahoo.com/Business_and_Economy/Finance_and_Investment/
r/bs http://dir.yahoo.com/Business_and_Economy/Shopping_and_Services/
r/jo http://dir.yahoo.com/Business_and_Economy/Employment_and_Work/
r/ci http://dir.yahoo.com/Computers_and_Internet/
r/in http://dir.yahoo.com/Computers_and_Internet/Internet/
r/ww http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/
r/sf http://dir.yahoo.com/Computers_and_Internet/Software/
r/ga http://dir.yahoo.com/Recreation/Games/Video_Games/
...
So this expanded version:
<font size=-1><b><a href=http://dir.yahoo.com/Business_and_Economy/>Business & Economy</a></b></font><br><font size=-2><a href=http://dir.yahoo.com/Business_and_Economy/Business_to_Business/>B2B</a>,
<a href=http://dir.yahoo.com/Business_and_Economy/Finance_and_Investment/>Finance</a>, <a href=http://dir.yahoo.com/Business_and_Economy/Shopping_and_Services/>Shopping</a>, <a href=http://dir.yahoo.com/Business_and_Economy/Employment_and_Work/>Jobs</a>...</font> <br><br><font size=-1><b><a href=http://dir.yahoo.com/Computers_and_Internet/>Computers & Internet</a></b></font><br>
<font size=-2><a href=http://dir.yahoo.com/Computers_and_Internet/Internet/>Internet</a>, <a href=http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/>WWW</a>, <a href=http://dir.yahoo.com/Computers_and_Internet/Software/>Software</a>, <a href=http://dir.yahoo.com/Recreation/Games/Video_Games/>Games</a>...</font>
Becomes this abbreviated version:
<font size=-1><b><a href=r/bu>Business & Economy</a></b></font><br>
<font size=-2><a href=r/bb>B2B</a>, <a href=r/fi>Finance</a>, <a href=r/bs>Shopping</a>, <a href=r/jo>Jobs</a>...</font><br><br>
<font size=-1><b><a href=r/ci>Computers & Internet</a></b></font><br>
<font size=-2><a href=r/in>Internet</a>, <a href=r/ww>WWW</a>, <a href=r/sf>Software</a>, <a href=r/ga>Games</a>...</font>
Note that Yahoo! does not quote URLs, which is invalid but works in most browsers. Yahoo saves nearly 30% off of their home page HTML with this technique. Yahoo also uses subdomains, which further redistributes the load.
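To check what abbreviation buys on your own pages, you can compare byte counts before and after. The Python sketch below is a minimal illustration: `percent_saved` is a hypothetical helper, and the two strings are a single link taken from the example above, so the figure it prints applies to that one anchor and will be far higher than a whole-page saving, which also includes text and markup that abbreviation does not touch.

```python
def percent_saved(expanded: str, abbreviated: str) -> float:
    """Return the size reduction, in percent of the expanded byte count."""
    before = len(expanded.encode("utf-8"))
    after = len(abbreviated.encode("utf-8"))
    if before == 0:
        raise ValueError("expanded text must not be empty")
    return 100.0 * (before - after) / before


# One link from the example above, in expanded and abbreviated form.
EXPANDED = "<a href=http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/>WWW</a>"
ABBREVIATED = "<a href=r/ww>WWW</a>"

if __name__ == "__main__":
    print(f"{percent_saved(EXPANDED, ABBREVIATED):.1f}% smaller")
```

Running the same function over the full HTML of a page, before and after abbreviation, gives the page-level number.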
Conclusion
Abbreviating URLs with mod_rewrite is one of the most effective techniques available to optimize HTML files. File size savings can range from 20% to 30%, depending on the number of links in your HTML page. You can combine this technique with URL Rewriting with Content Negotiation for maximum savings. Best used on high-traffic pages like home pages, automated URL abbreviation can squeeze more bytes out of critical pages for server-savvy developers.
About the Author
Andy King is the founder of five developer-related sites, and the author of Speed Up Your Site: Web Site Optimization (http://www.speedupyoursite.com) from New Riders Publishing. He publishes the monthly Bandwidth Report, the weekly Optimization Week, the weekly Speed Tweak of the Week, and the semiweekly WebReference Update.
Further Reading
- Apache URL Rewriting Guide: Ralf Engelschall shows how to use mod_rewrite.
- Case Studies: Yahoo.com and WebReference.com: Chapter 19 summary of Speed Up Your Site shows how Yahoo and WebReference abbreviate their URLs with mod_rewrite.
- ISAPI_Rewrite: URI rewriting ISAPI filter for Microsoft’s IIS server, from Helicon Tech.
- modrewrite.com: Resources on this versatile module.
- mod_rewrite: Documentation from Apache.
- URLS! URLS! URLS!: Documents URL rewriting in Apache with mod_rewrite. By Bill Humphries for A List Apart.
- Rewrite URLs with Content Negotiation: Content negotiation can make your URLs shorter and more abstract. By rewriting URLs without file extensions to the right resources you can save bytes and migration headaches.
- Server-Side Techniques: Chapter 17 summary of Speed Up Your Site shows how to shunt work to the server to shrink XHTML code. Details URL abbreviation with mod_rewrite, browser sniffing, mod_include for SSI, and form and CGI script optimization.
Thanks for the helpful post. It would be great if you could explain how to do this in IIS as well. For example, can you do the same without having to purchase some software?
Thank you for a great description on URL rewrite.
I applied it to JBOSS and worked like a charm.