Be Careful with .htaccess Rewrite Rules and WordPress

The last few days have highlighted a gap in my knowledge of the WordPress platform.

I use .htaccess rules to block page requests, and direct calls to scripts in WordPress directories, when those requests look malicious.

Mostly, those rules are designed to stop bots and hackers from using query string exploits to do nasty little things to my blogs. As it happens, my rewrite directives didn’t work in every circumstance. I now know why… I think.

It turns out that custom rewrite rules must be placed before the WordPress rewrite rules in .htaccess. I figured this out a few days ago, but at the time I wasn’t sure why.

A Little Digging into WordPress

WordPress has a class called WP_Rewrite, which manages the rewrite rules that let WordPress serve pretty URLs when permalinks are set to a pretty format. Here’s an extract from the WP_Rewrite class reference page on wordpress.org. It covers the mod_rewrite_rules() function:

mod_rewrite_rules() is the function that takes the array generated by rewrite_rules() and actually turns it into a set of rewrite rules for the .htaccess file. This function also has a filter, mod_rewrite_rules, which will pass functions the string of all the rules to be written out to .htaccess, including the <IfModule> surrounding section. (Note: you may also see plugins using the rewrite_rules hook, but this is deprecated).

Don’t worry if that’s gone over your head. I’ve included it only as proof that I know a little about the subject. Correct me if I’m wrong with any of this.

My understanding is that .htaccess works in the same way as any other Unix-based configuration file: the server (Apache) reads .htaccess and processes the directives in it one after another, from the top of the file to the bottom.

Whenever Apache receives a request for a file, it reads every .htaccess file in every directory along the path to that file before it serves the file (wherever AllowOverride permits .htaccess files). Requests for directories are treated the same way. Apache checks those .htaccess files for several things: whether access to the requested file or directory is blocked, whether directory browsing is enabled, whether the URL should be rewritten, and any number of other directives that might change the way the request is handled.
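The "every directory along the path" lookup can be made concrete with a small Python sketch. It lists the .htaccess files Apache would consult for a request. This is a simplified model built on assumptions: the document root is the hypothetical /var/www/html, the URL maps straight onto the filesystem, and .htaccess files are disabled above the document root but allowed in every directory below it. It models only the lookup order and does not read or apply any directives.

```python
from pathlib import PurePosixPath


def htaccess_lookup_order(document_root: str, url_path: str) -> list[str]:
    """Return the .htaccess paths Apache would check, outermost first.

    Simplified model (assumptions): the URL maps straight onto the filesystem
    under document_root, and .htaccess files are disabled above the document
    root but allowed (AllowOverride) in every directory below it.
    """
    root = PurePosixPath(document_root)
    parts = PurePosixPath(url_path.lstrip("/")).parts
    # A trailing slash means the request is for a directory; otherwise the
    # last component is the file itself and is not searched for .htaccess.
    dir_parts = parts if url_path.endswith("/") else parts[:-1]
    checked = [root / ".htaccess"]
    current = root
    for part in dir_parts:
        current = current / part
        checked.append(current / ".htaccess")
    return [str(path) for path in checked]
```

Take a request for /wp-content/plugins/x/y.php, where "x" is a hypothetical plugin folder. The model returns four files: the one in the document root, then the ones in wp-content, wp-content/plugins and wp-content/plugins/x.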

One of the Apache modules (plugins) that lets requested URLs be filtered and rewritten is mod_rewrite. mod_rewrite uses Perl-compatible regular expressions (PCRE) to match URL patterns. When a requested URL matches one of those patterns, it rewrites the URL or blocks access to it, as directed.

For example, directives that check for URLs containing the directory “script” and rewrite them without that directory would look like this:

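# mod_rewrite in .htaccess needs FollowSymLinks (or SymLinksIfOwnerMatch)
Options +FollowSymLinks
RewriteEngine On
# Capture what comes before and after a "/script/" segment as %1 and %2
RewriteCond %{REQUEST_URI} ^(.*)/script/(.*)$
# Rewrite to the same path without "script/"; [L] makes this the last rule applied
RewriteRule ^(.*)script(.*)$ %1/%2 [L]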

In practice, it would internally rewrite a request for

http://example.com/script/one.zip

to

http://example.com/one.zip
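Apache’s mod_rewrite uses PCRE, and for a simple pattern like this one Python’s re module behaves the same way. That makes it handy for showing how the RewriteCond captures feed %1 and %2. The sketch below approximates this one rule only. The function name is hypothetical. It assumes the request is handled by the root .htaccess, where the RewriteRule pattern always matches whenever the condition does, so only the condition matters.

```python
import re

# The same pattern used in the RewriteCond line above.
SCRIPT_COND = re.compile(r"^(.*)/script/(.*)$")


def strip_script_dir(request_uri: str) -> str:
    """Approximate the RewriteCond/RewriteRule pair that drops a /script/ segment.

    Returns the rewritten path, or the original path when the condition
    fails (in which case mod_rewrite would leave the URL alone).
    """
    match = SCRIPT_COND.match(request_uri)
    if match is None:
        return request_uri
    # %1 and %2 are the first and second capture groups of the condition.
    return f"{match.group(1)}/{match.group(2)}"
```

Under this model, /script/one.zip becomes /one.zip, /files/script/a/b.zip becomes /files/a/b.zip, and /downloads/one.zip is left untouched.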

The custom directives in my .htaccess file are supposed to block known query string exploits and SQL injection exploits.

My custom rewrite rules didn’t always work. I had guessed that WordPress was altering request URLs before my custom directives got to play with them. It wasn’t until I read the quote above from wordpress.org that I knew for sure what was happening.

Are You Familiar with Sed?

Sed is a command-line stream editor that processes an input stream on the fly. Several sed commands can be chained together to process the same input stream multiple times.

When sed commands are chained together, each one processes the stream in turn. As the stream passes through each process, that sed command alters it in whatever way it specifies. Each subsequent sed command receives the altered stream from the command before it.

Try this example to get an idea of what I mean:

echo "I like fish fingers" | sed -e 's#like#hate#' | sed -e 's#fish fingers#apples#' | sed -e 's#apples#sour green apples#'

Changing the above command slightly lets you see the intermediate result after each step:

input="I like fish fingers"; echo "$input"
input=$(echo "$input" | sed -e 's#like#hate#'); echo "$input"
input=$(echo "$input" | sed -e 's#fish fingers#apples#'); echo "$input"
input=$(echo "$input" | sed -e 's#apples#sour green apples#'); echo "$input"

The above examples each use three sed commands to change “I like fish fingers” to “I hate sour green apples”. The vertical bar (|) is called a pipe. In Linux it feeds the output of whatever comes before it into the input of whatever comes after it. It’s a little like a colon in written English: a colon delivers the goods promised before it.

The second example shows how each sed process in turn changes the stream from its original form.
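If sed is unfamiliar, the same chaining can be sketched in Python. Each substitution receives the output of the one before it, just as each sed does in the pipeline above. Here re.sub with count=1 stands in for sed’s s### command, which by default replaces only the first match on each line.

```python
import re

# Each pair mirrors one "sed -e 's#pattern#replacement#'" step from the pipeline above.
STEPS = [
    ("like", "hate"),
    ("fish fingers", "apples"),
    ("apples", "sour green apples"),
]


def run_pipeline(text, steps):
    """Apply each substitution to the output of the previous one; return every stage."""
    stages = [text]
    for pattern, replacement in steps:
        # count=1 matches sed's default of replacing only the first match on a line.
        text = re.sub(pattern, replacement, text, count=1)
        stages.append(text)
    return stages


for stage in run_pipeline("I like fish fingers", STEPS):
    print(stage)
```

Running it prints the same four lines as the second shell example, ending with “I hate sour green apples”.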

mod_rewrite works in a similar way to sed: each rewrite rule processes the URL as already modified by the rules before it. Whatever the earlier directives have done to a URL is the form the URL takes when a subsequent rewrite rule receives it.

For example, if a rewrite rule changes http://example.com/script.php into http://example.com/no-script-here.php then a subsequent rewrite rule would work on http://example.com/no-script-here.php and not http://example.com/script.php.

This means that each directive in .htaccess needs to be written to match the request URL as it appears after any changes made by the directives above it.
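Here is a toy Python model of that sequential behaviour. It is a deliberate simplification. Rules are assumed to be plain (pattern, substitution) pairs applied in file order, with Python’s re standing in for PCRE. Conditions, flags and per-directory prefix handling are all ignored. The rules themselves are hypothetical. The second one is written against the already-rewritten name, so it fires only because the first rule ran before it. The third targets the original name and never matches.

```python
import re

# Hypothetical rules, in the order they would appear in .htaccess.
RULES = [
    (r"^/script\.php$", "/no-script-here.php"),
    # Targets the *rewritten* URL produced by the rule above.
    (r"^/no-script-here\.php$", "/final.php"),
    # Targets the original URL, which no longer exists at this point.
    (r"^/script\.php$", "/never-reached.php"),
]


def rewrite(url_path, rules):
    """Apply every matching rule in order; return each version of the URL."""
    trace = [url_path]
    for pattern, substitution in rules:
        if re.search(pattern, url_path):
            url_path = re.sub(pattern, substitution, url_path)
            trace.append(url_path)
    return trace
```

Under this model, a request for /script.php passes through /no-script-here.php and ends at /final.php. /never-reached.php is never produced.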

mod_rewrite uses several flags to control the flow of the rules being processed by Apache. One of those flags is the L (last) flag. It tells Apache not to process any rewrite rules beyond the current one if the current rule matches the URL being processed.
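The L flag can be shown with a similar toy model. This sketch assumes hypothetical (pattern, substitution, last) triples and models a single pass only. In .htaccess context, Apache can start another pass with the rewritten URL, and this sketch does not model that.

```python
import re

# (pattern, substitution, last) triples in .htaccess order; all hypothetical.
RULES = [
    (r"^/old-page$", "/new-page", True),     # [L]: stop here if this matches
    (r"^/new-page$", "/newer-page", False),  # skipped when the rule above matched
    (r"^/other$", "/other-page", False),
]


def rewrite_one_pass(url_path, rules):
    """Apply rules in order for a single pass, honouring the last flag."""
    for pattern, substitution, last in rules:
        if re.search(pattern, url_path):
            url_path = re.sub(pattern, substitution, url_path)
            if last:
                break  # like [L]: no further rules in this pass
    return url_path
```

A request for /old-page stops at /new-page. Without the flag, the second rule would have turned it into /newer-page.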

How Does This Affect WordPress Sites?

The WordPress rewrite rules for single-site (non-multisite) installations look like this:

# BEGIN WordPress

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

They tell Apache to intercept ALL requests and pass any request for something that isn’t an existing file or directory to WordPress (i.e. they cause index.php to execute). Once WordPress has the URL, it does its magic, working out from the pretty permalink which post or page to display.

Notice the L flag after RewriteRule . /index.php? It tells Apache to stop processing further rewrite directives once the URL has been passed to WordPress.
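To see why placement matters, here is a simplified Python model of the WordPress block with a custom rule in one of two positions: after # END WordPress or before # BEGIN WordPress. Everything here is an assumption for illustration:

- A set of hypothetical existing files stands in for Apache’s -f and -d tests.
- The custom rule, which blocks query strings containing “union select”, is hypothetical.
- Paths have no leading slash, as in a root .htaccess.
- A rewrite that changes the URL under [L] starts another pass with the new URL, as Apache does in per-directory context.

Under those assumptions, a rule placed after the WordPress block never sees pretty-permalink requests. It does still see requests for files that physically exist, because the !-f condition stops the WordPress catch-all from firing. That may be why rules placed after # END WordPress seem to be honored only some of the time.

```python
import re

# Hypothetical files/directories on disk, relative to the site root
# (stand-ins for Apache's -f and -d tests).
EXISTING = {"index.php", "wp-login.php", "wp-content"}


def wordpress_rules(path):
    """Model the WordPress block; return the new path, or None if no rule matched.

    Both WordPress rules carry [L], so any match ends the current pass.
    """
    if path == "index.php":                # RewriteRule ^index\.php$ - [L]
        return path                        # "-" means: leave the URL unchanged
    if path and path not in EXISTING:      # RewriteRule . /index.php [L] (+ !-f, !-d)
        return "index.php"
    return None


def custom_rule(query):
    """Hypothetical blocking rule: reject query strings that look like SQL injection."""
    return bool(re.search(r"union\s+select", query, re.IGNORECASE))


def handle(path, query="", custom_first=False):
    """Run passes until the URL stops changing (rough model of .htaccess [L] behaviour)."""
    for _ in range(10):                    # Apache caps internal redirects too
        if custom_first and custom_rule(query):
            return "403 Forbidden"         # custom rule placed before "# BEGIN WordPress"
        new_path = wordpress_rules(path)
        if new_path is None:               # WordPress rules did not match...
            if not custom_first and custom_rule(query):
                return "403 Forbidden"     # ...so a rule after "# END WordPress" runs
            return f"serve {path}"
        if new_path == path:               # matched with no change: processing ends
            return f"serve {path}"
        path = new_path                    # URL changed under [L]: start a new pass
    raise RuntimeError("too many internal redirects")
```

In this model, a malicious query on a pretty permalink such as hello-world/ reaches index.php unless custom_first=True. The same query on wp-login.php, an existing file, is blocked either way.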

What Does this Mean?

Whenever you add custom rewrite rules to the root .htaccess file of a WordPress site, you have two options. Either place them before the line that reads “# BEGIN WordPress” and make sure they don’t interfere with the WordPress rules, or use the WP_Rewrite class to add your rules from within WordPress (as a plugin) instead of in .htaccess.
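To audit an existing file for this problem, the Python sketch below scans .htaccess text and reports any RewriteRule or RewriteCond lines that appear after the “# END WordPress” marker. It is a hypothetical helper, not part of WordPress. It assumes the standard marker comment is present and looks only at line order.

```python
def rules_after_wordpress_block(htaccess_text):
    """Return (line_number, line) for rewrite directives found after '# END WordPress'."""
    after_end = False
    findings = []
    for number, raw in enumerate(htaccess_text.splitlines(), start=1):
        line = raw.strip()
        if line == "# END WordPress":
            after_end = True
            continue
        # Apache directive names are case-insensitive, so compare in lower case.
        if after_end and line.lower().startswith(("rewriterule", "rewritecond")):
            findings.append((number, line))
    return findings
```

Feed it the contents of your .htaccess file. Any lines it reports are candidates for moving above “# BEGIN WordPress”.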

What Next?

Don’t know about you, but I’m off to get some peaches and ice cream :D

I’m still unsure whether WordPress somehow passes a URL back to .htaccess once it has finished rewriting it, or whether WordPress reads .htaccess and processes rewrite rules placed after the # END WordPress line internally. The reason I think something strange is happening is that rewrite rules placed after # END WordPress are sometimes honored by either Apache or WordPress (I don’t know which).

I know rewrite directives continue to be honored after the first matched rewrite condition or rule. But as I understand it, once mod_rewrite finds a match that has the [L] flag set, it should ignore any further rewrite directives in .htaccess.

I will update this post when I learn more about how WordPress URL rewriting affects the flow of .htaccess directives. Do you know the answer? Does WordPress read .htaccess? Does WordPress pass a modified URL back to Apache?

In the meantime, I look forward to learning your thoughts on this.

Now I’m going to enjoy my ice cream… mmmmm, raspberry ripple :D

elundmark

Another important thing to remember about WordPress with pretty permalinks and .htaccess is what happens when you have subdomains together with a WP installation in the root / directory. A 500 error will be generated for each file that isn’t in the subdomain (a 404). It takes a lot of explaining, but the solution is simple, though it took me a while to figure out.
Add this at the top of your htaccess file:
# Fix wp subdomain 500 (hijacked requests) error
# One for each Subdomain. Replace "subdomainfolder"
RewriteRule ^subdomainfolder/.*$ - [PT]

Thanks for this tip. The code hasn’t quite posted properly, so for anyone who didn’t get it, add this to the top of the .htaccess file in the directory that holds the subdomains (with “subdomainfolder” replaced by the actual name of your subdomain folder):
RewriteRule ^subdomainfolder/.*$ - [PT]

Mickey

This is interesting. I wonder if that’s why Joomla puts their permalink (SEF) rules at the end of their suggested Master .htaccess file.

http://docs.joomla.org/index.php?title=Htaccess_examples_%28security%29&direction=next&oldid=62075

It could be. When WordPress initially writes its directives to .htaccess it does put them at the end. I am aware that placing additional rewrite rules before the WordPress rules sometimes creates problems with a few plugins and a few themes that don’t occur when those same rules are placed after the WordPress ones. On those occasions you can either edit the custom rules and keep them before the WP ones or take a chance and move the conflicting rules after the WP ones. I definitely need to do more research on this.

Jim

The [L] flag only stops processing RewriteRules for that “pass” through the .htaccess. If (1) a rewrite did occur, and (2) if the ending URL is different than the starting URL, and (3) if a [L] flag is present — then the new URL will be fed through .htaccess, again, from the top, for another pass at the rules. If (1) and (2) occur, but not (3), then processing on the same journey through the pass will continue, and any new URL will still be fed through the .htaccess again, either at a qualifying [L] further down the line or…