Know your redirects!
This is one of the most important chapters in this guide. If you read nothing else, understand and grasp the contents of this chapter.
Your job title does not matter. You can be an SEO, developer, system admin, or consultant. You need to understand redirects.
Why?
Redirects impact server load, site speed, indexability, crawlability, UX, and just about everything else that could break a website.
What we will cover
- URL mapping first
- mod_alias vs. mod_rewrite
- mod_alias basics
- mod_rewrite basics
- Redirect regular expressions and variables
URL mapping first
Before we can talk about how to create redirects we need to talk about URL mapping.
This is a guide about using the .htaccess file for SEO so we will be talking about how the Apache web server works.
What is URL mapping?
URL mapping is the process of matching a uniform resource locator (URL) to a resource (file) in the server filesystem.
Back when websites were built with static HTML pages this was simple. Today URL mapping is much more complex.
URL mapping basics
When your browser requests a URL, the hostname is first resolved by DNS servers. Once the DNS lookup has been completed, the request is passed to the web server.
The web server then needs to determine what response should be served for that request. There are a number of inputs (like HTTP headers and cookies) but for simplicity, we will just look at the URL.
The most basic way for the Apache server to map URLs is to match the path and file to a directory and file in the DocumentRoot.
Example: Apache URL mapping
Your website uses the domain example.com.
The DocumentRoot for example.com on your server is set to the path /var/www/html.
A request for http://example.com/somepath/file.html would be matched to /var/www/html/somepath/file.html in the server filesystem.
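For reference, a DocumentRoot like the one in this example is set in the server or virtual host configuration, not in .htaccess. The following is a minimal sketch, assuming a single name-based virtual host on port 80. The port and the surrounding layout are assumptions made for illustration.

```apache
# Server configuration (for example a virtual host file), not .htaccess.
<VirtualHost *:80>
    ServerName example.com
    # A request for http://example.com/somepath/file.html is looked up
    # under this directory: /var/www/html/somepath/file.html
    DocumentRoot "/var/www/html"
</VirtualHost>
```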
This is all pretty simple so far.
It gets more complicated when resources are stored outside the DocumentRoot or when you don't want a direct URL-to-resource match.
For instance, what happens if you want http://example.com/somepath/file.html to look like http://example.com/somepath/file, i.e. minus the .html?
How do CMSs like WordPress and Joomla serve URLs that match nothing in the server filesystem?
The answer to these questions can be found in two Apache server modules: mod_alias and mod_rewrite.
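As a rough preview, here is a sketch of how mod_rewrite could answer both questions from a .htaccess file placed in the DocumentRoot. The two rulesets are independent illustrations. The second follows the front-controller pattern commonly seen in WordPress installs. Both assume a typical setup and are not meant to be copied verbatim.

```apache
RewriteEngine On

# 1. Serve /somepath/file from /somepath/file.html without changing the URL.
#    Only applies when the requested name is not already a real file
#    and a matching .html file exists.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.+)$ $1.html [L]

# 2. CMS-style front controller: anything that is not an existing file
#    or directory is handed to index.php, which decides what to render.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```

The directives used here are explained in the mod_rewrite sections of this chapter.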
mod_alias vs. mod_rewrite
Apache uses two modules to map URLs to resources that are not direct filesystem matches to the requested URL. These modules are mod_alias and mod_rewrite.
Each one has its advantages and uses.
Ok, maybe I'm a little harsh on mod_alias.
mod_alias is good at basic redirects and mapping URLs to aliases. However, mod_rewrite holds all the bells and whistles you might want.
mod_alias basics
What can you do with mod_alias?
Although mod_rewrite is superior in almost every way, mod_alias has a few things that it does really well. The first is simple redirects.
If you want to redirect one URL to another, mod_alias is going to be your best option.
Let's look at some of the directives that belong to mod_alias.
Redirect Directive
### Redirect Directive Syntax
# Redirect [status] [URL-path] URL
### Example
Redirect 301 "/old-url.html" "/new-url.html"
The redirect directive syntax is as follows...
- Redirect is the directive.
- [status] identifies the HTTP status code to be served.
- [URL-path] is the path of the URL to be redirected.
- URL is the new URL to be served.
Each of these items after the Redirect directive is called an argument. Let's take a closer look at each of these arguments.
1. The status argument
There are some things to remember about the status argument.
The status argument is optional.
You can include the status if you want. You can also choose not to include it. However, it should be noted that the default status for the Redirect directive is 302. This means that if you do not specify an HTTP status, Apache will serve a status of 302.
You can also specify the status using a keyword status argument instead of a numeric status argument.
There are four keyword status arguments that can be used.
- permanent is the same as 301
- temp is the same as 302
- seeother is the same as 303
- gone is the same as 410
The status argument accepts more than just these four options. However, depending on what status you are declaring, the URL-path and URL arguments may or may not be required. See the section about those arguments for more information.
I always specify the status, even if it is 302. It is clearer and more maintainable for future SEOs and developers.
For example, these four redirect statements will have the same effect.
### These redirects are all the same
Redirect "/old-path" "/new-path"
Redirect 302 "/old-path" "/new-path"
Redirect temp "/old-path" "/new-path"
RedirectTemp "/old-path" "/new-path"
I will explain the RedirectTemp directive later in this chapter.
2. The URL-path argument
The URL-path cannot be a relative URL.
This means that you must include the slash at the beginning of the URL.
The URL-path cannot include the scheme and hostname.
Correct | Incorrect |
---|---|
|
|
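To make the two rules above concrete, here is a short sketch of URL-path arguments that satisfy them, with the invalid forms commented out. The paths are placeholders chosen for illustration.

```apache
# Correct: the URL-path begins with a slash and has no scheme or host.
Redirect 301 "/old-url.html" "/new-url.html"

# Incorrect: relative URL-path (no leading slash).
# Redirect 301 "old-url.html" "/new-url.html"

# Incorrect: URL-path includes the scheme and hostname.
# Redirect 301 "http://example.com/old-url.html" "/new-url.html"
```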
3. The URL argument
The URL argument, like the URL-path argument, must start with a slash when it is a path on the same host.
Correct | Incorrect |
---|---|
|
|
The URL argument can include the scheme and host.
Redirect "/old-url.html" "http://www.example.com/new-url.html"
If the status argument is between 300 and 399, the URL argument must be present.
If the status argument is NOT between 300 and 399, the URL argument must be omitted.
Correct | Incorrect |
---|---|
|
|
Redirect "/old-path" is incorrect because the default status is 302. Therefore, it needs the URL argument.
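The relationship between the status and the URL argument can be sketched as follows. The paths are hypothetical, and the invalid form is left commented out so the file would still load.

```apache
# 3xx status: the URL argument is required.
Redirect 301 "/old-path" "/new-path"

# Non-3xx status: the URL argument must be omitted.
Redirect 410 "/retired-page.html"

# Invalid: no status means 302, and a 302 needs a URL argument.
# Redirect "/old-path"
```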
RedirectMatch Directive
### RedirectMatch Directive Syntax
# RedirectMatch [status] regex URL
### Example
RedirectMatch 301 "(.*)\.pdf$" "$1.html"
The RedirectMatch directive syntax is as follows...
- RedirectMatch is the directive.
- [status] identifies the HTTP status code to be served.
- regex is the URL-path matched using a regular expression (Regex).
- URL is the new URL to be served.
The above example will create a 301 redirect for all PDF documents to an HTML document with the same path and filename on the host.
It should be noted that RedirectMatch and Redirect are the same except for two things.
First, as already stated, RedirectMatch uses regex to match the URL-path argument.
Second, the URL argument may include backreferences to the capture groups in the regex.
A backreference is written as $0 through $9, and it points back to what was captured by a group surrounded by parentheses.
Let's look at the example again.
RedirectMatch 301 "(.*)\.pdf$" "$1.html"
(.*) is the first capture group. This means that it can be referenced with the $1 backreference.
We can use multiple capture groups and backreferences. However, remember that they are referenced sequentially from left to right.
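RedirectMatch 301 "^/media/([0-9]+)/(.+)$" "https://cdn.example.com/images/$1/$2"
This will redirect /media/168/file-icon.png to https://cdn.example.com/images/168/file-icon.png.
Note that RedirectMatch, like Redirect, only matches against the URL-path. The query string is not part of the string being tested, so a pattern such as "/media.php?id=(.*)&file=(.*)$" would never match a request for /media.php?id=168&file=file-icon.png. Matching on query strings requires mod_rewrite.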
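Because RedirectMatch only sees the URL-path, a redirect that depends on the query string needs mod_rewrite, which is covered later in this chapter. A minimal sketch, assuming a .htaccess file in the DocumentRoot, Apache 2.4 (for the QSD flag), and a hypothetical media.php script that takes id and file parameters:

```apache
RewriteEngine On
# The query string is tested in a condition; %1 and %2 refer to its capture groups.
RewriteCond %{QUERY_STRING} ^id=([0-9]+)&file=([^&]+)$
# QSD drops the original query string from the redirect target.
RewriteRule ^media\.php$ https://cdn.example.com/images/%1/%2 [R=301,L,QSD]
```

With these rules, /media.php?id=168&file=file-icon.png would be sent to https://cdn.example.com/images/168/file-icon.png.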
RedirectPermanent and RedirectTemp
There are two additional directives in mod_alias I want to address. They are RedirectPermanent and RedirectTemp.
RedirectPermanent is an exact equivalent to Redirect permanent.
RedirectTemp is an exact equivalent to Redirect temp.
I personally don't use the keyword redirect method, RedirectPermanent, or RedirectTemp. I prefer to simply include the numeric status code. It looks cleaner and seems more intuitive.
mod_alias has some other important directives. However, they cannot be set in the context of the .htaccess file.
mod_rewrite basics
We were able to do a lot with Redirect and RedirectMatch in mod_alias. However, once you see what we can do with mod_rewrite you will begin to look at mod_alias a little like a Yugo.
mod_rewrite is more than a module to create redirects. It is a powerful and capable URL manipulation module. This means that it has the ability to change how URLs are mapped within the Apache web server.
mod_alias can do a lot with its Alias and AliasMatch directives, but they cannot be set in the .htaccess context.
This means that there are a host of things we can do with mod_rewrite that we cannot do with mod_alias.
How mod_rewrite works
Although most SEOs may not need to know all the details of how mod_rewrite works internally, that knowledge is invaluable when troubleshooting errors. It is also wise to grasp the fundamentals of how it works.
Because of the complexity of the system, I will not write out the entire process. However, I have created this flowchart to illustrate the rewrite ruleset processing.
Note: the only two possible outcomes are the eventual serving of a resource, or redirecting to or proxying an external resource. This, of course, assumes that there were no errors. The end result could be a 404 error. In fact, it often is.
Interesting Factoid:
.htaccess files are executed from top to bottom. However, here is an exception to that general rule. You may have noticed it if you looked closely at the chart above.
Before mod_rewrite checks for a RewriteCond match, the URL is matched against the RewriteRule Pattern argument. If the RewriteRule Pattern matches, mod_rewrite looks above the RewriteRule for RewriteCond directives. If all the RewriteCond tests are true, the RewriteRule is processed.
Directive order in the .htaccess file.
RewriteCond %{REQUEST_URI} !^/update\.html
RewriteCond %{REQUEST_URI} ^(.*)$
RewriteRule ^(.*)$ https://www.example.com/update.html?path=%1 [R=302,L]
Order directives are processed.
RewriteRule ^(.*)$ https://www.example.com/update.html?path=%1 [R=302,L]
RewriteCond %{REQUEST_URI} !^/update\.html
RewriteCond %{REQUEST_URI} ^(.*)$
You can share this little gem at your next SEO office party... you're welcome!
This interesting factoid does have significance. Because the RewriteRule is evaluated prior to any rewrite conditions, you can use a backreference from the RewriteRule in a RewriteCond.
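Here is a small sketch of a RewriteRule backreference used inside a RewriteCond. It assumes a .htaccess file in the DocumentRoot and a hypothetical /archive/ directory holding copies of PDF files.

```apache
RewriteEngine On
# $1 is captured by the RewriteRule Pattern below, yet it is already
# available here because the Pattern is matched before the conditions run.
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -f
# Serve the archived copy only when that file actually exists.
RewriteRule ^(.+\.pdf)$ /archive/$1 [L]
```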
Basics and Syntax
Because mod_rewrite is as complex and powerful as it is, we will need to explain a few things first.
The syntax used by the rewrite module, although more complex than the alias module, is straightforward.
There are five directives you will want to know about. They are...
- RewriteEngine
- RewriteBase
- RewriteOptions
- RewriteCond
- RewriteRule
There is an additional directive, RewriteMap. It is one of my favorites. However, it cannot be used in the .htaccess file context.
Out of all of these directives, we will spend the majority of our time looking at RewriteCond and RewriteRule. This is because they do most of the heavy lifting, and are the most customizable.
RewriteEngine directive
The first step to using mod_rewrite is to enable the runtime rewrite engine with the RewriteEngine directive. This is very simple.
### Syntax
# RewriteEngine on/off
# Turn on RewriteEngine
RewriteEngine on
# Turn off RewriteEngine
RewriteEngine off
Hint:
Don't forget that you can also turn off the RewriteEngine by setting the on/off argument to off. This can be handy if you have a large block of rewrite rules you want to disable. It is much faster than commenting out each line with #.
RewriteBase directive
It is generally good practice to declare the RewriteBase in your .htaccess files. The RewriteBase directive defines the URL prefix to be used for relative URLs.
This is important since mod_rewrite, unlike mod_alias, can accept relative URLs (URLs that don't begin with a slash).
### Syntax
# RewriteBase URL-path
### Example
RewriteBase /
The above RewriteBase would set the prefix for relative URLs to the root of the site, which maps to the DocumentRoot. In the latest Apache versions, this is almost always unnecessary. However, I still do it.
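To see what the prefix does, consider a sketch of a .htaccess file that lives in a subdirectory. The /blog/ directory and the file names are hypothetical.

```apache
# .htaccess located in the /blog/ directory
RewriteEngine On
RewriteBase /blog/
# The Substitution is relative (no leading slash), so the RewriteBase
# prefix is applied and the redirect target becomes /blog/new-post.html.
RewriteRule ^old-post\.html$ new-post.html [R=301,L]
```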
RewriteOptions Directive
To be honest, if you need this directive, you should get a server that gives you root access. It is a little beyond your typical .htaccess file.
It allows you to change the configuration of the mod_rewrite module.
### Syntax
# RewriteOptions Options
### Example
RewriteOptions InheritDownBefore
The Options argument can accept one of a number of predefined options. For instance, the above example uses InheritDownBefore. This will cause the current rewrite rules to be applied to child configurations before the child rewrite rules.
RewriteOptions can be helpful in creating maintainable and less verbose configurations. However, it is not likely that you will need to use this directive.
RewriteCond Directive
The RewriteCond directive specifies a condition under which the RewriteRule that follows it will be executed.
### Syntax
# RewriteCond TestString CondPattern [flags]
### Example
RewriteCond %{HTTP_HOST} ^example\.com [NC]
The RewriteCond syntax is as follows...
- RewriteCond is the directive.
- TestString identifies the string that will be tested.
- CondPattern is the pattern that the TestString will be matched against.
- [flags] are options that define how the RewriteCond should be processed.
The above example uses the HTTP hostname as the TestString. (In the URL https://www.example.com/path/file.html the hostname would be www.example.com.)
Because our example CondPattern uses ^ to define the beginning of the pattern, www.example.com would not match since it begins with www. not example.com.
Our example also uses the [NC] (no-case) flag. This means that it will match both example.com and Example.com.
RewriteRule Directive
The RewriteRule is the foundational element of mod_rewrite. It is what is used to manipulate URLs and create redirects.
It is similar to the RedirectMatch directive. However, it has a few major differences that will be evident as we go through it.
### Syntax
# RewriteRule Pattern Substitution [flags]
### Example
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
The RewriteRule syntax is as follows...
- RewriteRule is the directive.
- Pattern is a Perl compatible regular expression (PCRE) used to match against the URL-path. In the context of a .htaccess file, the Pattern is matched against a partial path, based on where the .htaccess file is located.
- Substitution is the string that replaces the URL-path that Pattern was matched against.
- [flags] are options that define how the RewriteRule should be processed.
I have chosen to include our previous RewriteCond in this example.
In this example, the RewriteCond tests the requested hostname to determine if it begins with the string example.com. If the test is positive, the RewriteRule is processed.
This RewriteRule uses the Pattern ^(.*)$, which contains the capture group (.*), to capture the entire URL-path minus the possible initial slash. The captured URL-path is then appended to the string http://www.example.com/ using the backreference $1.
The last part of our RewriteRule is the two flags. L is the last flag. This tells mod_rewrite to stop processing the ruleset at this RewriteRule and return the result without continuing on to the next RewriteRule. R is the redirect flag. This tells the server to redirect the old URL to the new URL, not just serve the new URL resource over the old URL. The =301 parameter on the R flag identifies the type of redirect to use.
The end result of our example is that all requests to example.com will be redirected to the canonical host www.example.com. This is not the best way to accomplish this. However, it serves as a good example of how RewriteCond and RewriteRule work.
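For comparison, here is one commonly seen variation on a canonical host redirect. It is a sketch and not necessarily the approach this guide recommends. It assumes the canonical host is https://www.example.com and sends any other hostname there in a single step.

```apache
RewriteEngine On
# Match every host except the canonical one.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# %{REQUEST_URI} carries the full original path, so no capture group is needed.
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]
```

Using %{REQUEST_URI} in the Substitution keeps the full path intact even when the .htaccess file sits in a subdirectory, where $1 would only hold a partial path.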
Rewrite flags
The last argument for both RewriteCond and RewriteRule is [flags]. A flag is surrounded by brackets e.g. [R]. You can include more than one flag by separating them with a comma e.g. [R,L].
There are a number of common flags you will use.
[E] or [env]
You can use the [E] flag to set an environment variable. The syntax is as follows...
### Syntax
# [E=VAR:VAL]
# [E=!VAR]
### Example
RewriteRule (.*)\.(pdf)$ - [E=doc:$1]
This example would create an environment variable called doc if a PDF is requested. If white-paper.pdf was requested the doc variable would contain the value white-paper.
You can unset the doc variable with the flag [E=!doc].
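Once set, the variable can be read back in a later condition with %{ENV:doc}. A minimal sketch, assuming a .htaccess file in the DocumentRoot. Refusing the request is only a stand-in for whatever you actually want to do with the value.

```apache
RewriteEngine On
# Set doc to the requested path without the .pdf extension.
RewriteRule (.*)\.(pdf)$ - [E=doc:$1]
# Read the variable back in a later condition. Here a request for
# /white-paper.pdf (doc = white-paper) is refused with a 403.
RewriteCond %{ENV:doc} ^white-paper$
RewriteRule ^ - [F]
```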
[END]
The [END] flag terminates the current round of rewrite processing as well as any subsequent processing in the .htaccess file.
[F] Forbidden
The [F] flag will cause the server to return a 403 forbidden response code. This can be used to restrict access to sensitive files.
It should be noted that when using the [F] flag the [L] flag is implied.
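As an illustration, the following sketch refuses requests for a few file types that often hold sensitive data. The list of extensions is an assumption and should be adapted to the site.

```apache
RewriteEngine On
# The dash means "no substitution": the URL is left alone and a 403 is returned.
RewriteRule \.(ini|log|bak)$ - [F]
```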
[L] Last
The [L] flag is used to stop processing the ruleset and return the current results.
[N] Next
The [N] flag will cause the ruleset to start over from the top, using the result so far as its input. It keeps doing so until the Pattern no longer returns a match.
RewriteRule "(.*)A(.*)" "$1a$2" [N]
This example will run each time it finds an A and will replace it with an a.
You can also specify the maximum number of times an [N] flag can loop like this: [N=8]. The next process will then run at most 8 times.
[NC] No Case
The [NC] flag processes the rule or condition as case insensitive. This is important any time you are handling %{HTTP_HOST}.
If you include the [NC] flag, www and Www, as well as example.com and Example.com, will be handled the same. If you don't include the [NC] flag, the CondPattern ^www\. will not match the TestString %{HTTP_HOST} if Www. is how the requested host begins.
[QSA] QS Append
The [QSA] flag will cause the query string from the request to be appended to the end of the new query string in the Substitution. This is only needed if your Substitution string contains a query string.
If you do not include the [QSA] flag and your Substitution string contains a query string, the query string from the request will be dropped.
[QSD] QS Discard
The [QSD] flag is used to remove the request query string from the target URI when the Substitution doesn't contain a query string of its own. Without it, the request query string is carried over to the target URI unchanged.
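The two query string flags can be sketched side by side. This assumes a .htaccess file in the DocumentRoot and Apache 2.4 (for QSD). The page.php script and the paths are hypothetical.

```apache
RewriteEngine On
# [QSA]: /pages/123?one=two is mapped to /page.php?page=123&one=two
RewriteRule ^pages/(.+)$ /page.php?page=$1 [QSA,L]
# [QSD]: /old-page?ref=email is redirected to /new-page with no query string
RewriteRule ^old-page$ /new-page [R=301,QSD,L]
```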
[R] Redirect
The [R] flag will cause the rewrite rule to be processed as a redirect. If it is not supplied, the URL will be mapped to the new location without redirecting.
You can specify the response code by using the syntax [R=NUM]. When no code is given, a 302 is used. The status code may be any valid status code. For instance, you could use [R=404]. If the status code is not a 3XX status code, the Substitution string will be dropped, and the [L] flag will be implied.
[S] Skip
The [S] flag can be used to skip a specified number of RewriteRule directives. [S=3] will skip the next three RewriteRule directives.
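A typical use of [S] is a simple if/else construct. In this sketch, requests for files that already exist skip the next two rules. The index.php parameters are hypothetical.

```apache
RewriteEngine On
# If the request maps to an existing file, skip the next two rules.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [S=2]
# These two rules only run for requests that are not existing files.
RewriteRule ^blog/(.+)$ /index.php?post=$1 [L]
RewriteRule ^shop/(.+)$ /index.php?product=$1 [L]
```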
Redirect regular expressions and variables
One of the most powerful aspects of the rewrite module is the ability to use Perl compatible regular expressions (PCRE) and environment variables.
Understanding the meaning of regex characters and variables is important to writing rulesets.
Common regex characters
Character | Meaning | Example |
---|---|---|
. | Any single character. | .ish would match fish, dish, wish, etc. but not wash |
+ | Repeats the previous construct one or more times. | o+ would match o, oo, ooo, etc. but not an empty string |
* | Repeats the previous construct zero or more times. | o* would match an empty string, o, oo, ooo, etc. |
? | Makes the previous construct optional. | colou?r would match colour and color because the u is optional. |
\ | Escapes the following character. | \. would match . instead of any character. |
^ | This anchor defines the beginning of the string. | ^o would match any string that begins with o. |
$ | This anchor defines the end of the string. | o$ would match any string that ends with o. |
( ) | Matches a group of characters. It also creates a capture group for backreferences. | .*(oo).* would match book, took, oops or any other string containing oo. |
[ ] | Matches any character from this Character Set. | [dlf]og would match dog, log, fog, but not cog. |
[^ ] | Matches any character not in this Negated Set. | [^c]og would match dog, log, fog, but not cog. |
Common variables
Server variables are wrapped in %{}. For example, the HTTP_HOST variable is written as %{HTTP_HOST}.
Variable | Value |
---|---|
HTTP_HOST | The HTTP hostname e.g. www.example.com. |
HTTP_REFERER | The HTTP referer from the HTTP request header e.g. https://otherwebsite.com/somepage |
HTTPS | A value of on if the connection is using SSL/TLS, otherwise, it is off. |
REQUEST_URI | The path of the requested URL, i.e. everything after the domain and before the query string or fragment. |
REQUEST_SCHEME | The scheme of the request. Usually, http or https. |
REQUEST_FILENAME | The local filesystem path of the resource matching the request, once the server has determined it. Where the path has not yet been determined, such as for rules placed in the virtual host context, it is the same as the REQUEST_URI. In the .htaccess context it is normally the filesystem path. |
QUERY_STRING | This is the query string from a requested URL. |
REMOTE_ADDR | The IP of the remote host. This is usually the IP of the visitor. |
There are many more variables. However, this is a good introduction. If you need more information about Apache variables you can find that information in the Apache documentation.
Hint:
It is valuable to point out that a RewriteRule is only passed the URL-path to compare against the Pattern. This means that the Pattern sees what comes after the domain, up to but not including the query string. However, you can test the variables %{REQUEST_SCHEME} and %{HTTP_HOST} in a RewriteCond to take the scheme and host into account.
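A short sketch of the idea in this hint: the scheme and host are tested in conditions, while the RewriteRule Pattern only handles the URL-path. It assumes Apache 2.4 (for REQUEST_SCHEME), a .htaccess file in the DocumentRoot, and www.example.com as the host.

```apache
RewriteEngine On
# Only plain http requests...
RewriteCond %{REQUEST_SCHEME} ^http$
# ...for this particular host...
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# ...are redirected to the https version of the same path.
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```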
How backreference works
In the .htaccess file, a RewriteRule Pattern, as well as a RewriteCond CondPattern, can contain capture groups. A capture group is anything in a pair of parentheses. These capture groups can be referenced to include the captured strings in the Substitution or TestString.
Backreferences to a CondPattern capture group are created with %1 through %9.
Backreferences to a Pattern capture group are created with $1 through $9.
Backreferences to a capture group within the regular expression are created with \1 through \9.
In this example, with the first rewrite ruleset, http://www.example.com would be redirected to https://www.mydomain.com/example-com. In fact, it will redirect any requested root domain to https://www.mydomain.com/ followed by the requested domain with the dot before the TLD changed to a hyphen.
This could be handy if you managed a website that sold domains and you wanted to redirect all domain requests to an HTML page without creating individual redirects for each domain.
In the example, the second ruleset would redirect all requests that map to a file that doesn't exist or is empty to an error page.
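Two rulesets along these lines could be sketched as follows. This is an illustration of the %N and $N backreferences under stated assumptions, not the original example. The /error.html page and its path parameter are hypothetical, and the file is assumed to sit in the DocumentRoot.

```apache
RewriteEngine On

# First ruleset: send any other requested domain to a page on www.mydomain.com.
# %1 and %2 refer to the capture groups in the last matched CondPattern.
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?([a-z0-9-]+)\.([a-z]+)$ [NC]
RewriteRule ^ https://www.mydomain.com/%1-%2 [R=302,L]

# Second ruleset: redirect requests for missing or empty files to an error page.
# $1 refers to the capture group in the RewriteRule Pattern.
RewriteCond %{REQUEST_URI} !^/error\.html$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule ^(.*)$ /error.html?path=$1 [R=302,L]
```

In a real site, URLs that are generated by an application rather than stored as files would need their own exclusion in the second ruleset.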
Pro Tip:
mod_rewrite only allows backreferences to other regular expressions in TestString and Substitution. This means that %1 and $1 cannot be used in either Pattern or CondPattern.
You can get around this by placing the contents of %1 or $1 in the TestString, separated from the rest of the TestString by a delimiter, and then matching it again with an internal backreference in the CondPattern.
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([a-zA-Z0-9-]+)\.([a-zA-Z]+)$ [NC]
RewriteCond %2-%3::%{REQUEST_URI} !^(.*?)::/\1
RewriteRule ^(.*)$ https://www.mydomain.com/%2-%3 [R]
In this example, the backreferences %2 and %3 are called in the TestString. They are then captured using the (.*?) in the CondPattern. We then use \1 in the CondPattern to backreference the (.*?) capture group, which captured %2-%3.
This ultimately allows the %{REQUEST_URI} to be compared with %2-%3, which is not possible otherwise.
There is more
We have only scratched the surface of the world of redirects. The power and sophistication of mod_rewrite is a wonderful thing. It is used often in this guide to solve SEO problems.
I hope this has helped you understand the fundamentals of how redirects work in the .htaccess file.