Search Engine Friendly URLs with IIS and Classic ASP

This post shows how you can "simulate" the effects of .htaccess and mod_rewrite using Microsoft's Internet Information Services (IIS) and classic ASP.

The Problem

A typical CMS stores its content in a database for easy maintenance. When an end user visits a web page, the content for that page must be retrieved from the database so that it can be displayed. So how does the system "know" which database record should be retrieved for a particular page? The answer is that the URL for that page contains a query string or "parameter" that uniquely identifies the content, e.g.:

http://www.example.com/showitem.php?id=12345

In this example, there is only a single parameter ("id"), but it is quite possible to have URLs with two or more parameters. For example, if you have a piece of clothing that is available in different colors and sizes, you could have something like:

http://www.example.com/showitem.php?id=12345&col=12&siz=34

Unfortunately, search engines can have trouble indexing URLs with parameters (a.k.a. "dynamic URLs"), and that is especially true for URLs with multiple parameters. Therefore, if we want all of our pages to be indexed, we need a mechanism that hides the parameters from the search engines and turns them into "static" URLs that look something like:

http://www.example.com/showitem/12345/12/34/

Step One: 404 Error Handler

Let's use the ASP version of the earlier example with three parameters. In other words, when end users request the page:

http://www.example.com/showitem/12345/12/34/

we will act as if they had requested the page:

http://www.example.com/showitem.asp?id=12345&col=12&siz=34

The first thing to notice is that the "parameterless page" does not really exist, so when end users request it, they will actually trigger an error ("404 File Not Found").

This leads us to the idea that we can use a custom error handler to deal with the problem.

To specify a custom error handler for our web site in IIS, we go to the "Custom Errors" tab of the site's Properties. The default handler for the 404 error will be of type "File" and will point to a file called "404b.htm" somewhere in the Windows directory. We click on "Edit" to specify a new error handler. First, we change "Message type" from "File" to "URL". Next, we enter the URL of the ASP file that will act as our 404 error handler, written as a path from the site root; i.e., we enter "/my404.asp" rather than "my404.asp". Finally, we click "OK" to confirm.

We have now stated that there will be a file called "my404.asp" in the root directory of our site that will deal with "file not found" errors, so our next step is to create one.

How do we know which (non-existent) file has been requested by an end user? Fortunately, that is something that we can easily find out by looking at "Request.QueryString".

If someone requests "http://www.example.com/showitem/12345/12/34", Request.QueryString will contain "404;http://www.example.com:80/showitem/12345/12/34", i.e., the error code "404" followed by a semicolon and the requested URL. (By the way, notice that the URL includes the port number ":80"!)

Now all we have to do is "parse" the URL to find the three "hidden" parameters, and then we can "translate" the requested URL into the actual URL that we will send back to the browser.
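
Before picking out the parameters, it can help to separate the error code from the requested URL and to drop the scheme, host, and port. The following is only a minimal sketch of that step; the function name ExtractRequestedPath is made up for this illustration and is not part of ASP.

```vbscript
' Hypothetical helper: returns the path part of the URL that IIS hands to
' the custom 404 handler, or "" if the string does not have the expected shape.
Function ExtractRequestedPath(ByVal rawQuery)
    Dim semi, url, schemeEnd, pathStart

    ExtractRequestedPath = ""

    semi = InStr(rawQuery, ";")
    If semi = 0 Then Exit Function              ' no "404;" prefix

    url = Mid(rawQuery, semi + 1)               ' everything after the semicolon

    schemeEnd = InStr(url, "://")
    If schemeEnd = 0 Then Exit Function         ' not a full URL

    ' First slash after the host (and port) marks the start of the path
    pathStart = InStr(schemeEnd + 3, url, "/")
    If pathStart = 0 Then Exit Function         ' host only, no path

    ExtractRequestedPath = Mid(url, pathStart)
End Function
```

In my404.asp this could be called as ExtractRequestedPath(RQ) after RQ = Request.QueryString. For the string "404;http://www.example.com:80/showitem/12345/12/34" it returns "/showitem/12345/12/34", so the port number no longer gets in the way.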

A first, extremely "naive" version of our code could be something like:


Dim RQ, P, ID, Color, Size

RQ = Request.QueryString

P = Instr(RQ,"showitem/")

If P > 0 Then
    RQ = Mid(RQ,P+9) ' The string "showitem/" contains 9 characters!

    P = Instr(RQ, "/")
    ID = Left(RQ,P-1)
    RQ = Mid(RQ,P+1)

    P = Instr(RQ, "/")
    Color = Left(RQ,P-1)
    RQ = Mid(RQ,P+1)

    P = Instr(RQ, "/")
    Size = Left(RQ,P-1)

    Response.Write "ID: " & ID & ", Color: " & Color & ", Size: " & Size

End If


In reality, we would need much better error handling; what, for example, if the URL does not contain the required number of parameters, or if it does not contain a trailing slash?

For the sake of simplicity, we will respond to these cases by sending a status code of 404 to the browser and stopping further processing; we'll do the same when someone requests a completely unrelated (non-existent) page (e.g., http://www.example.com/nosuchpage.htm). This can be done with the following code:


Dim RQ, P, ID, Color, Size, ErrorFound

RQ = Request.QueryString
ErrorFound = False

P = Instr(RQ,"showitem/")

If P > 0 Then
    RQ = Mid(RQ,P+9) ' The string "showitem/" contains 9 characters!
    P = Instr(RQ, "/")
    If P > 0 Then
        ID = Left(RQ,P-1)
        RQ = Mid(RQ,P+1)
        P = Instr(RQ, "/")
        If P > 0 Then
            Color = Left(RQ,P-1)
            RQ = Mid(RQ,P+1)
            P = Instr(RQ, "/")
            If P > 0 Then
                Size = Left(RQ,P-1)
            Else
                ErrorFound = True
            End If
        Else
            ErrorFound = True
        End If
    Else
        ErrorFound = True
    End If
Else
    ErrorFound = True
End If

If Not ErrorFound Then
    Response.Write "ID: " & ID & ", Color: " & Color & ", Size: " & Size
Else
    Response.Status = "404 File Not Found"
    Response.End
End If
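
The nested checks above can also be flattened by letting Split do the slicing. What follows is a rough sketch rather than a drop-in replacement: the function names ParseItemUrl and IsDigits are invented here, and the sketch additionally assumes that all three values consist of digits only and that there are exactly three of them, which the code above does not check.

```vbscript
' Hypothetical helper: True only if s is non-empty and contains digits only.
Function IsDigits(ByVal s)
    Dim i, ch
    IsDigits = False
    If Len(s) = 0 Then Exit Function
    For i = 1 To Len(s)
        ch = Mid(s, i, 1)
        If ch < "0" Or ch > "9" Then Exit Function
    Next
    IsDigits = True
End Function

' Hypothetical helper: fills id, color and size and returns True when rq
' contains "showitem/<id>/<color>/<size>/" (trailing slash required).
Function ParseItemUrl(ByVal rq, ByRef id, ByRef color, ByRef size)
    Dim p, parts

    ParseItemUrl = False

    p = InStr(rq, "showitem/")
    If p = 0 Then Exit Function

    ' "12345/12/34/" becomes "12345", "12", "34", ""
    parts = Split(Mid(rq, p + 9), "/")

    ' Expect three values plus the empty string after the trailing slash
    If UBound(parts) <> 3 Then Exit Function
    If parts(3) <> "" Then Exit Function

    If Not IsDigits(parts(0)) Then Exit Function
    If Not IsDigits(parts(1)) Then Exit Function
    If Not IsDigits(parts(2)) Then Exit Function

    id = parts(0)
    color = parts(1)
    size = parts(2)
    ParseItemUrl = True
End Function
```

With these helpers, the handler shrinks to a single test: if ParseItemUrl(RQ, ID, Color, Size) returns False, we send the 404 status and stop; otherwise ID, Color, and Size are ready to use.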


Step Two: Server.Transfer

So far, we have responded to a (well-formed) URL request by displaying the three parameters ID, Color, and Size. In reality, however, we want to return the page:

http://www.example.com/showitem.asp?id=12345&col=12&siz=34

This can easily be accomplished using Server.Transfer:


Server.Transfer "/showitem.asp?id=" & ID & "&col=" & Color & "&siz=" & Size


(We have to make sure, however, that the file "showitem.asp" itself uses absolute, rather than relative, URLs for graphics, style sheets, etc. Otherwise the browser will resolve them against the requested address, /showitem/12345/12/34/, which is a directory that does not exist!)
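
One caveat: on some IIS versions, Server.Transfer may reject a path that contains a query string. If that happens on your server, one possible workaround is to hand the values over in Session variables and transfer to the bare file name. This is only a sketch under two assumptions: session state is enabled for the site, and the Session key names used here (item_id, item_col, item_siz) are made up for the example.

```vbscript
' Sketch for my404.asp. ID, Color and Size stand in for the values that
' were parsed from the requested URL earlier in the handler.
Dim ID, Color, Size
ID = "12345"
Color = "12"
Size = "34"

' Hypothetical key names; showitem.asp has to read the same ones.
Session("item_id") = ID
Session("item_col") = Color
Session("item_siz") = Size

' The path carries no query string in this variant.
Server.Transfer "/showitem.asp"
```

In this variant, "showitem.asp" would read Session("item_id") and its siblings whenever Request.QueryString("id") is empty, so the page keeps working for direct requests with ordinary parameters as well.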

A Flexible Alternative

The example above deals with a single type of page (clothing items) with three parameters (ID, Color, and Size). Of course, we could expand the code so that it can handle different page types and (perhaps variable) numbers of parameters. As a result, we would be able to use URLs like:

http://www.example.com/showbook/9876/

to display information on books (that have no color or size, just an ID), or:

http://www.example.com/showitem/12345/12/34/5/

for clothing items that have a fourth parameter (e.g., material). However, as you can imagine, the required code could quickly get very messy and hard to debug...
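
To give an idea of what such an expansion might look like, here is a minimal sketch that first splits a path into a page name and a list of values, and then maps each page type onto its real URL. Everything in it is an assumption made for illustration: the function names ParsePath and TargetFor, the file name "showbook.asp", and the parameter name "mat" for the material do not come from a real system, and the values should still be validated before use.

```vbscript
' Hypothetical helper: splits "/showitem/12345/12/34/5/" into the page name
' "showitem" and the array ("12345", "12", "34", "5").
Function ParsePath(ByVal path, ByRef pageName, ByRef values)
    Dim parts, vals(), i, n

    ParsePath = False

    ' Require a leading and a trailing slash
    If Len(path) < 3 Then Exit Function
    If Left(path, 1) <> "/" Or Right(path, 1) <> "/" Then Exit Function

    parts = Split(Mid(path, 2, Len(path) - 2), "/")
    n = UBound(parts)
    If n < 1 Then Exit Function                 ' need a page name and a value

    For i = 0 To n
        If parts(i) = "" Then Exit Function     ' empty segment such as "//"
    Next

    ReDim vals(n - 1)
    For i = 1 To n
        vals(i - 1) = parts(i)
    Next

    pageName = LCase(parts(0))
    values = vals
    ParsePath = True
End Function

' Hypothetical mapping from page type to real URL; "" means "no such page".
Function TargetFor(ByVal pageName, ByVal values)
    Dim n, target

    target = ""
    n = UBound(values) + 1                      ' number of values supplied

    Select Case pageName
        Case "showbook"
            If n = 1 Then
                target = "/showbook.asp?id=" & values(0)
            End If
        Case "showitem"
            If n = 3 Or n = 4 Then
                target = "/showitem.asp?id=" & values(0) & _
                         "&col=" & values(1) & "&siz=" & values(2)
                If n = 4 Then target = target & "&mat=" & values(3)
            End If
    End Select

    TargetFor = target
End Function
```

Even in this small form, every new page type adds another Case branch with its own parameter count and names, which hints at how quickly the handler would grow.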

As I was thinking about a way to improve upon this idea, the following thought struck me. What if we were to use the entire query string (after some "basic cleaning", perhaps, like converting it to lower case and removing extraneous characters) to retrieve the associated content from a database, with something like:


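' Double any single quotes so that the value cannot break out of the SQL string
SQL = "SELECT * FROM MyContent WHERE MyTitle = '" & Replace(CleanQueryString, "'", "''") & "'"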


This would provide us with a very flexible way to display content from our database!
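
As a rough illustration of the "basic cleaning" step, the sketch below lower-cases a string and keeps only letters, digits, slashes, and hyphens; it then looks the result up with a parameterized ADO command rather than a concatenated SQL string. The function names CleanKey and FindContent are invented for this example, the set of permitted characters is an assumption, and conn is assumed to be an ADODB.Connection that has already been opened elsewhere.

```vbscript
' Hypothetical helper: lower-cases s and drops every character that is
' not a letter, a digit, a slash or a hyphen.
Function CleanKey(ByVal s)
    Dim i, ch, result

    s = LCase(s)
    result = ""
    For i = 1 To Len(s)
        ch = Mid(s, i, 1)
        If (ch >= "a" And ch <= "z") Or (ch >= "0" And ch <= "9") _
           Or ch = "/" Or ch = "-" Then
            result = result & ch
        End If
    Next
    CleanKey = result
End Function

' Hypothetical helper: returns a recordset with the matching content rows.
' conn is assumed to be an open ADODB.Connection.
Function FindContent(ByVal conn, ByVal key)
    Dim cmd

    Set cmd = CreateObject("ADODB.Command")
    Set cmd.ActiveConnection = conn
    cmd.CommandText = "SELECT * FROM MyContent WHERE MyTitle = ?"

    ' 200 = adVarChar, 1 = adParamInput, 255 = assumed maximum length
    cmd.Parameters.Append cmd.CreateParameter("title", 200, 1, 255, key)

    Set FindContent = cmd.Execute
End Function
```

For example, CleanKey("/ShowItem/12345/12/34/") returns "/showitem/12345/12/34/", and passing that value as a parameter means the database never sees it as part of the SQL text.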

source:evolt.org