Protecting PHP scripts from Cross-site scripting (XSS) attacks
Cross-site scripting (XSS) attacks are one of the biggest threats to dynamic web pages. Up to 80 percent of all web sites are vulnerable to some type of this attack. Yet, many programmers and site owners are simply not aware of this problem.
So, how does it work?
Typically, a web site fails to filter user input or escape its output. An attacker exploits that by submitting malicious code (usually JavaScript, but it can be any kind of code that can be executed on the client side), which the site then sends to victims, whose browsers execute it.
There are two types of XSS attacks: stored (the malicious code is saved on the server and later sent to end users without proper encoding) and reflected (the malicious code is sent to the server, usually in the GET or POST parameters of an HTTP request, and the server returns it in the response without proper encoding).
Reflected attacks are more common, and they are often carried out by sending malicious links to end users. The code, which can be in the link itself or hosted on a third-party site (in a hidden iframe or JavaScript), is executed in the end user’s browser in the context of the target site, where it can steal cookies, read sensitive data, or do whatever else the attacker wants.
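To make the reflected case concrete, here is a minimal sketch. The function names and the payload are hypothetical; they only illustrate the difference between copying input into the page as-is and escaping it on output.

```php
<?php
// Hypothetical page fragment that greets the visitor by a name taken from the request.

// Vulnerable: the value is copied into the HTML unchanged.
function greeting_unsafe($name) {
    return '<p>Hello, ' . $name . '</p>';
}

// Safer: the value is HTML-escaped at the moment it is written into HTML.
function greeting_escaped($name) {
    return '<p>Hello, ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';
}

// What an attacker might place in a link parameter.
$payload = '<script>alert(document.cookie)</script>';

echo greeting_unsafe($payload), "\n";  // the script tag reaches the browser intact
echo greeting_escaped($payload), "\n"; // the browser shows the text instead of running it
```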
Protection
There are many solutions to this problem, but first you need to understand some basic principles.
Filter Input, Escape Output
The first step is usually reading the data from user input. Here you need to check the maximum string length, accept only numbers if the field is numeric, validate e-mail addresses and URLs, and so on. Don’t encode the data (for example, with htmlentities) at this point.
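A validation step along those lines might look like the sketch below. The field names and limits are assumptions made for illustration; note that the data is only checked here, not HTML-encoded.

```php
<?php
// Hypothetical validator for a sign-up form; field names and limits are assumptions.
// Returns the names of the fields that failed validation.
function validate_signup(array $input) {
    $errors = array();

    // Length check: strlen counts bytes, a conservative upper bound for UTF-8 text.
    $name = isset($input['name']) ? $input['name'] : '';
    if ($name === '' || strlen($name) > 50) {
        $errors[] = 'name';
    }

    // Numeric field: accept only an integer within a sensible range.
    $age = isset($input['age']) ? $input['age'] : '';
    $range = array('options' => array('min_range' => 0, 'max_range' => 150));
    if (filter_var($age, FILTER_VALIDATE_INT, $range) === false) {
        $errors[] = 'age';
    }

    // E-mail and URL syntax checks.
    $email = isset($input['email']) ? $input['email'] : '';
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errors[] = 'email';
    }
    $website = isset($input['website']) ? $input['website'] : '';
    if (filter_var($website, FILTER_VALIDATE_URL) === false) {
        $errors[] = 'website';
    }

    return $errors;
}
```

The comparisons use === false on purpose: a valid age of 0 is returned by filter_var as the integer 0, which a loose comparison would treat as a failure.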
Next, you’ll probably want to save that data somewhere. To save it to a database, you need to make sure your queries are safe from SQL injection. I’d recommend using PDO with parameterized SQL queries. The second option is to use functions like mysqli_real_escape_string, mysql_real_escape_string and pg_escape_string, but make sure that ALL data that came from the user is escaped. Note that the old mysql_* functions were removed in PHP 7.
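As a rough sketch of the PDO approach, the snippet below uses an in-memory SQLite database so that it is self-contained. It assumes the pdo_sqlite driver is available; the table and the sample input are made up for illustration.

```php
<?php
// Assumes the pdo_sqlite driver is installed; any other PDO driver works the same way.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT)');

// Hypothetical user input, stored exactly as received (no HTML escaping at this stage).
$author = "O'Brien";
$body   = "'); DROP TABLE comments; --";

// Placeholders keep the data separate from the SQL text.
$stmt = $pdo->prepare('INSERT INTO comments (author, body) VALUES (:author, :body)');
$stmt->execute(array(':author' => $author, ':body' => $body));

// Reading the row back gives the original string, quotes included.
$stmt = $pdo->prepare('SELECT body FROM comments WHERE author = :author');
$stmt->execute(array(':author' => $author));
$stored = $stmt->fetchColumn();
```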
Finally, you’ll want to output that data somewhere. That can be a PDF file, a text file, an e-mail, the console or, in most cases, HTML that will be read by the user’s browser. That’s why we didn’t escape the data for XSS earlier: we didn’t know where it would end up. This type of escaping is needed only for HTML output.
PHP’s magic_quotes feature, which is deprecated as of PHP 5.3 and removed in PHP 5.4, does just the opposite of what we have described. It escapes all incoming data with addslashes and, instead of making things more secure, only makes them worse. Make sure you have disabled magic_quotes.
Character encoding
It’s very important to choose one character encoding and use it everywhere in your application. UTF-8 is recommended for most web applications. If you don’t properly specify the encoding, some browsers may try to guess it from the content, and that can lead to UTF-7 attacks.
You need to explicitly set a meta tag like this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
And a Content-Type HTTP header like this:
Content-Type: text/html; charset=UTF-8
In PHP that can be done with the header function, like this:
header('Content-Type: text/html; charset=UTF-8');
Also, make sure that the same encoding is set on the database and on any text files you use.
Check that the functions you use to filter the data are aware of the encoding in use (e.g. the htmlentities function takes the charset as its third parameter).
Filter out invalid UTF-8 sequences before output, because an incomplete multi-byte character may swallow the character that follows it. If that character is, for example, the quote that closes an HTML attribute, user-supplied code can end up being executed. I use HTMLPurifier’s cleanUTF8 method for that purpose, but a simple regexp match or iconv should also do the job.
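The regexp route could be sketched as follows. Both function names are hypothetical, and this is not a replacement for cleanUTF8 (which does more); it only shows how invalid byte sequences can be detected and dropped with PCRE alone.

```php
<?php
// Returns true when $str is well-formed UTF-8.
// With the /u modifier PCRE validates the subject and preg_match returns false if it is invalid.
function is_valid_utf8($str) {
    return preg_match('//u', $str) === 1;
}

// Drops every byte that is not part of a well-formed UTF-8 sequence.
function strip_invalid_utf8($str) {
    if (is_valid_utf8($str)) {
        return $str;
    }
    // Byte-level pattern listing the well-formed UTF-8 sequences.
    $valid = '/[\x00-\x7F]'
           . '|[\xC2-\xDF][\x80-\xBF]'
           . '|\xE0[\xA0-\xBF][\x80-\xBF]'
           . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
           . '|\xED[\x80-\x9F][\x80-\xBF]'
           . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
           . '|[\xF1-\xF3][\x80-\xBF]{3}'
           . '|\xF4[\x80-\x8F][\x80-\xBF]{2}/';
    preg_match_all($valid, $str, $matches);
    return implode('', $matches[0]);
}

// A lone lead byte (\xC3) placed right before a quote is removed; the quote survives.
echo strip_invalid_utf8("a\xC3\""), "\n";
```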
Solutions
Let’s see how popular PHP frameworks handle this issue.
Yii – output escaping with integrated HTMLPurifier
Kohana2 – input filtering / global XSS filter
Kohana3 – input filtering, they recommend output escaping with HTMLPurifier, but it’s not included
CodeIgniter – input filtering / global XSS filter
Zend Framework – custom output escaping
HTMLPurifier is a great solution when you need to display clean HTML that came from an untrusted source, but using it to escape every piece of data that won’t be displayed as HTML is overkill.
Global XSS filtering is a very bad idea because, as mentioned above, you don’t know in which context the data will be used.
OWASP has a good security encoding library, but unfortunately the PHP version is not complete yet. They have a good reference on this matter here.
Now let’s see in which contexts we can insert our data and how to properly encode it.
HTML elements and attributes
Examples:
<body><?php echo htmlencode($untrusted_var); ?></body>
<input value="<?php echo htmlencode($untrusted_var); ?>" />
While in most cases we can just use PHP’s htmlentities function, it’s better to write custom wrapper functions, so that we only need to change the code in one place if, for example, we want to add more filtering or switch to another library.
function htmlencode($str) {
    $str = HTMLPurifier_Encoder::cleanUTF8($str);
    $str = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');
    return $str;
}
This function encodes HTML special characters and prevents the data from breaking out of its context. If you need to output user data that contains HTML, HTMLPurifier will do the job.
It is important to always enclose the attribute value in double or single quotes (I use double quotes).
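The sketch below shows why the quotes matter. The helper names are hypothetical, and plain htmlspecialchars stands in for the wrapper so the snippet runs without HTMLPurifier.

```php
<?php
// Hypothetical helpers that render the same value with and without attribute quotes.
function input_unquoted($value) {
    return '<input value=' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '>';
}

function input_quoted($value) {
    return '<input value="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '">';
}

// This payload contains none of the characters that htmlspecialchars encodes.
$payload = 'x onmouseover=alert(1)';

echo input_unquoted($payload), "\n"; // the space ends the value; onmouseover becomes a new attribute
echo input_quoted($payload), "\n";   // the whole payload stays inside the quoted value
```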
Be very careful with attributes that allow scripting, like href, src, style and onclick.
For example, suppose you have a form in which the user can enter the URL of his site. What if he enters something like this:
javascript: alert(1);
You should validate the input and allow only real URLs, but data can reach the database from different places, so it’s wise to make sure such code won’t execute in the user’s browser. Our function above will encode special characters, but that won’t stop the JavaScript code from executing.
URLs
For PHP 5.2+ you could use something like this:
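<a href="<?php echo htmlencode(filter_var($url, FILTER_VALIDATE_URL)); ?>">url description</a>
filter_var returns false when the value is not a valid URL. Keep in mind that FILTER_VALIDATE_URL checks only the syntax and does not restrict the scheme, so you should also check that the URL uses a scheme you allow, such as http or https.
Parameter values should be URL-encoded (you can use PHP’s urlencode function for that purpose).
Again, you can wrap the filter_var call in your own function, so you can easily add functionality or change the library later.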
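Such a wrapper might look like the sketch below. The name safe_url, the default scheme list and the '#' fallback are assumptions for illustration; plain htmlspecialchars is used on output so the snippet runs on its own.

```php
<?php
// Hypothetical wrapper: returns the URL only if it is well-formed and uses an allowed scheme,
// otherwise a harmless placeholder.
function safe_url($url, array $schemes = array('http', 'https')) {
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return '#';
    }
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), $schemes, true)) {
        return '#';
    }
    return $url;
}

// Parameter values are URL-encoded before being placed in the query string.
$search = 'https://example.com/search?q=' . urlencode('a&b=c d');

// The checked URL is still HTML-encoded when written into the attribute.
echo '<a href="' . htmlspecialchars(safe_url($search), ENT_QUOTES, 'UTF-8') . '">search</a>', "\n";
echo '<a href="' . htmlspecialchars(safe_url('javascript: alert(1);'), ENT_QUOTES, 'UTF-8') . '">site</a>', "\n";
```

Listing the allowed schemes, rather than trying to block javascript: specifically, means that other script-capable schemes are rejected as well.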
CSS
Examples:
<div style="background-color: '<?php echo cssencode($untrusted_data); ?>';"></div>
<style>
div { background-color: '<?php echo cssencode($untrusted_data); ?>'; }
</style>
function cssencode($str) {
    $str = HTMLPurifier_Encoder::cleanUTF8($str);
    // Escape every ASCII character except letters and digits, including quotes
    // and the backslash, so the value cannot leave the quoted string.
    return preg_replace_callback('/[^a-zA-Z0-9\x80-\xFF]/', function ($match) {
        // The trailing space ends the escape sequence.
        return '\\' . str_pad(dechex(ord($match[0])), 2, '0', STR_PAD_LEFT) . ' ';
    }, $str);
}
This escapes every ASCII character other than letters and digits in the \HH format. The space after each escape ends it, so a hex digit that follows is not read as part of the escape. Don’t insert untrusted data into unquoted places or into complex properties like url and behavior.
JavaScript
Examples:
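<div onclick="alert(<?php echo htmlencode(json_encode($untrusted_data)); ?>);"></div>
<script>
alert(<?php echo json_encode($untrusted_data, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT); ?>);
</script>
json_encode wraps strings in double quotes and escapes the characters that could end the string, so don’t add quotes of your own around its output. Inside an event-handler attribute the result still has to be HTML-encoded, because the browser decodes entities before it runs the script. Inside a <script> block entities are not decoded, so instead of HTML-encoding use the JSON_HEX_TAG, JSON_HEX_AMP, JSON_HEX_APOS and JSON_HEX_QUOT flags (PHP 5.3+), which keep sequences like </script> out of the output. Be very careful when inserting untrusted user data into JavaScript. Don’t insert it anywhere outside a string literal, and don’t pass it to functions that evaluate strings as code, like window.setInterval.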
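In the spirit of the wrapper functions used for the other contexts, the script-block case could be wrapped as sketched below. The name jsencode and the 'null' fallback are assumptions for illustration, not part of any library.

```php
<?php
// Hypothetical wrapper for values placed inside a <script> block.
function jsencode($value) {
    // The HEX flags turn <, >, &, ' and " into \uXXXX escapes.
    $flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
    $json = json_encode($value, $flags);
    // json_encode returns false when it cannot encode the value (for example, invalid UTF-8).
    return $json === false ? 'null' : $json;
}

$untrusted = "</script><script>alert('x')</script>";
$literal = jsencode($untrusted);

// The result is a complete JavaScript string literal, quotes included.
echo "<script>\nvar message = " . $literal . ";\n</script>\n";
```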
These are the most common contexts for web applications. If you need to insert potentially dangerous user data into some other context, make sure you understand how to properly encode the data for that specific context.
A good collection of XSS vectors for testing can be found here.
Hello!
I’m using HTMLPurifier and it’s URL-encoding characters such as + and ^
1. Are there any potential security risks if I don’t encode these characters?
2. Can I prevent HTMLPurifier from encoding these characters?
Thanks so much,
Eric
Hi Nikola,
It’s a great read! But I still have a question about URL encoding and URL escaping.
For example, an attacker can attack an application by encoding < as %3c to bypass some filters, but at the same time the application defends itself by escaping < as %3c to display it safely. Don’t they do the same thing? They both convert < to %3c.
Many thanks!
Hi Shanshan…
The attacker’s goal is to break out of the context on output. For example, to break out of the URL context inside an href, he needs to inject a quote that is not URL-encoded. A URL-encoded quote does just the opposite: it prevents the attack, and that’s what we do…
@Eric
If you need to encode HTML chunks which contain URLs, you should be fine with HTMLPurifier; otherwise encode them with the method above.