From Liway Wiki
Given that almost all modern websites are database driven, it is very important to know and understand fully what SQL injections are and how they work. This tutorial will concentrate on PHP, however it should be noted, that SQL injections can affect any language. I'm assuming no prior knowledge of the security problems and solutions regarding SQL injects, and will take users through a range of solutions and their respective flaws. I am to be very detailed in this guide, and provide all the necessary background information, so that by the end, hopefully you will understand SQL injections and how to deal with them perfectly.
Introduction into SQL Injections
What is an SQL Injection?
For databases to be useful, they must take input from the user, and store it for later processing. This means that at some point in time, you will be be accepting user input from an HTML form and using it in SQL queries within a database. Because of the nature of SQL, the data a user inputs may change the meaning of the SQL query which you tried to perform, creating a malicious threat.
Let me demonstrate this with a very simple PHP script, which stores a user's name in a database table called 'users'.
<?php // Connect to MySQL here $name = $_POST['name']; mysql_query ("INSERT INTO users (name) VALUES ('$name');"); ?>
Pretty trivial. Works well in most cases, but a malicious user could very easily destroy your entire database, by entering the following as their name;
John'); DROP ALL TABLES; --
That input has the affect of executing a query which inserts the user John, then another query which drops (ie. deletes) all tables from your database. The SQL code '--' means "whatever follows is a comment", and is used to disregard any more SQL following a variable in SQL code, within a PHP script
In order words, what would be executed would actually be this;
mysql_query ("INSERT INTO users (name) VALUES ('John'); DROP ALL TABLES;");
And that, is called an SQL injection. It literally means extra SQL is being injected into your queries through user-input.
What is affected by SQL injections?
As mentioned, SQL injections can and do affect all languages which communicate with a database. At the heart of an SQL injection, is un-checked user inputted data and is used inside an SQL query.
Likewise, all mainstream databases are affected by SQL injections, as injections are really non-database specific. All the database sees is SQL data coming in, and there is no way for it to tell whether it's been injected or not.
All this means that it is up to you as a programmer to protect your scripts against injections!
Bad history with PHP
Because of the immense popularity of the web, and the way PHP makes it immensely easy to program for the web, it has been used by countless non-security concious people to make all sorts of scripts, which have SQL injections (and other security problems) in them.
This led to widespread problems of SQL injections in a lot of PHP code, and the PHP developers decided to do something, which in hindsight was pretty stupid. They added a feature to PHP called "Magic Quotes GPC", GPC standing for "GET POST COOKIE". This feature, turned on by default on countless servers, automatically performed an "addslashes()" operation on the superglobals GET, POST and COOKIE - which contain incoming user data. (more on addslashes in the next section)
While this may seem like a good idea at first, in actual fact this made problems significantly worse in the long-run, creating a very unstable situation where many servers had this feature enabled and many disabled, meaning there was no standard for developers to work towards. This meant that scripts working on servers with this feature on would break and the reverse would also hold true. This lead to developers working around this feature by performing the reverse "stripslashes()" on all three of those superglobals if this feature was turned on, or the opposite if it was turned off - depending on how they wanted to treat incoming data.
Even worse, this created a generation of developers who didn't care and didn't even learn about SQL injections, since magic quotes gpc simply covered for them, and hence they were led into a false sense of security. Any code which disregards the GPC situation is absolutely non-portable and toxic.
As for PHP 5.3.0 this horrific feature was depreciated from PHP and in PHP 6.0.0 it will be removed for good.
Once again, as the developer, it is solely up to you to protect your scripts from all foreseeable problems. See wikipedia on defensive programming.
How we prevent them from happening
There are several ways to protect from SQL injections. Here I will outline as many as I can come up with, and more importantly describe why you should absolutely not use the majority of them.
An important note to state here, is that often developers make silly assumptions about incoming data. For example, if you have a select box which asks the user their favourite number between 1-5.
<form> <select name="number"> <option>1</option> <option>2</option> <option>3</option> <option>4</option> <option>5</option> </option> </form>
Now you may think, "okey, so the user can only select 1-5, therefore I can safely assume that the input from this particular form field will be a number, and I don't need to worry about it", heck it's not even possible to set a select field to an option which hasn't been defined. A naive user may this this is ok...Wrong!!! Put the above code in a blank .html file, then run it with a browser, and then put the following into the URL bar and press enter:
The lesson to learn here is to, never, ever, ever make any assumptions about the incoming form input.
Restricting Database User Privileges
Once you have a database schema created, it may be temping to remove certain privileges from the username you are using to access the database. Such privileges may be to create and drop tables, alter tables and other significant operations.
While it is true, that this will prevent malicious users from executing the above queries, it is important to remember that they can just as well execute queries to delete all records from any and all tables. In addition, if you have a script like a forum or a wiki installed, chances are, it will need these privileges once in a blue moon, especially when updating the software.
Clearly, this is a bad way to go.
The PHP addslashes() function escapes all single and double quotes, as well as backslashes and null characters in a given variable. Escaping means that those special characters are preceded with another particular special character (a backslash) effectively telling the database to ignore their special meaning, and treat them literally. Since in general an SQL injection involves one or more of those characters, this has histoically been used very extensively as a means to prevent SQL injections, even in PHP's own magic quotes GPC. Re-writing one of the previous examples with addslashes;
<?php // Connect to MySQL here $name = addslashes($_POST['name']); mysql_query ("INSERT INTO users (name) VALUES ('$name');"); ?>
This may seem like a really good solution - and indeed, it is deployed on a lot of scripts around the web...unfortunately so.
Addslashes() has some serious flaws. For one, it only escapes four common characters which are used by common databases as SQL control characters having the capability to escape an SQL string. Confusion aside, the point is that depending on what database you use, it may have other characters capable of escaping SQL or not even use those characters (leading to data corruption). Or future versions of common databases may introduce new such characters. Needless to say, addslashes is a very unreliable and insecure way of dealing with input, because it is an arbitrary solution, which is not at all database dependent - in a situation, where it would very much so help to be database dependent.
But that's not even the worst. Because addslashes is not database specific, it doesn't take into account what character set the database is using. And because PHP traditionally has not supported multi-byte character sets, and this is only planned for version 6, addslashes can screw up multi-byte characters quite badly. To say the least, there is absolutely no consistency between the operations of addslashes() and whatever character set your specific database and/or tables are using.
This exploit was fixed in both MySQL and PostgreSQL, however as I understand it other database systems have been affected by the exact same problem, and not all of them may have fixed it. This is one of those difficult cases where it's hard to decide whether PHP or the DBMS is responsible for fixing the security issue. In a perfect world, both would take equal responsibility, but unfortunately we live in a very different world.
I will show an example of how the above character set inconsistencies represent a serious vulnerability within addslashes()'s ability to sanitize user input. Normal or single-byte characters can be represented by a single hex number, such as 0xFF. For those who don't know, '0x' means "the following is in hex", and FF are two 16bit characters, each representing a 4bit binary number. Put together, these 4-bit numbers make up an 8-bit number or a byte. Most multi-byte character sets use 2 bytes (and sometimes more) instead of one, which means their characters may have 4 hex numbers instead of 2, like this: 0xFFFF.
A character of particular interest to us, is 0xbf27. This two-byte character interpreted as two one-byte characters is 0xbf followed by 0x27 which is ¿' - an upside down question mark followed by a single quote. It also just so happens that 0xbf27 is not a valid multi-byte character in some character sets. Now you might think this makes it very useless...but in fact, because addslashes() can only process single byte characters, it thinks of 0xbf27 as 0xbf followed by 0x27, and incorrectly places a backslash (which is 0x5c) before the single quote (second part of this multi-byte character), effectively changing the character to 0xbf5c27. Bare with me here. All modern databases on the other hand, interpret and use multi-byte character sets by default (and even if some users still use one-byte sets, the large majority have moved to utf8, which is now the default standard), and on certain specific character sets (such as GBK - but there are others) 0xbf5c27 will be read as 0xbf5c followed by 0x27. Now if it just so happens that 0xbf5c is a valid multi-byte character in a specific character set, this leaves just the 0x27 - which if you remember, is a single quote!
So now, by entering this one character in an input field, addslashes changes it in such a way, that the database can potentially see it as another character followed by a single quote - and the malicious user can now inject whatever SQL they want.
As I said previously, this has been fixed, and as I understand it, you can not longer mix two-byte and one-byte characters like that in popular DBMSes - however this should not be an excuse to continue using addslashes. The next time something like this is discovered, your script as well as any other using addslashes may be exploited and your website hacked. Granted it's the worst case scenario, security is getting too lax in the modern world of computing, and this should absolutely signify to you that addslashes is not a safe function to use because of it's inability to work correctly with the database's character set. Stay away from it!
Now it should be obvious, that we must consider two very important things when we wish to escape a string before using it in a query:
- Database specific escape characters
- The character-set the database is set to
The only way to do that, is to use database-specific library functions. As it turns out, mysql, postgresql and sqlite libraries all have a function to do this very thing in a very correct way. These functions work just like addslashes() - they accept one input variable and return the escaped variable, which is ready to be used in an SQL query safely. They take into account all database-specific constants and the character set used by the current connection.
I encourage everyone to check out these functions if you do not already use them, and I highly encourage their use over addslashes().
- MySQL - mysql_real_escape_string
- MySQli - mysqli_real_escape_string
- Pgsql - pg_escape_string
- Sqlite - sqlite_escape_string
The only one drawback, is that it is not immediately portable like addslashes(), however you could very well use your own database abstraction to have a generic escape function, which calls one of those 4, depending on the database used by that particular instance of the script. Or you could use an already made library like PDO (available for all of the above 3 databases on Liway's server).
Although this method is almost full-proof, believe it or not there are still subtle security problems even here. These functions in general use the character set which was used by your database at connection-time. This means that if you change the character set during the execution of your script, it could potentially lead to the same situation as having used the addslashes() function. Now I know what you're thinking - you're very unlikely to do that (but at least now you know to never do that), but still, this is not the most defensive programming one could engage in. Considering the possibility, that a hacker may somehow find a way to change the character set the database uses during your script execution, this could cause problems.
Furthermore, if you use a database system which allows multiple character sets within a datable (perhaps different depending on what each table schema specifies) and you use this functionality, this could also cause similar problems by not having a consistent character-set to use within the database - or even within a table!
Going One Step Further - Prepared SQL Statements
In reality, the best way to prevent SQL injection, is to remove one fundamental requirement of all sql injection attacks - the requirement that user input be used in sql queries. This may seem silly and impossible, but in fact a high-level, tried and tested method exists to do just this. It is called prepared SQL statements.
The idea behind prepared statements, is that you send the SQL query itself and the data used in it separately. This entirely negates the need to escape data, because the database will handle everything, and since data is not being embedded within the SQL statement, there is absolutely no way to change it's meaning and context.
If one can claim there is any drawback to using prepared statements, it may be that it is slightly more complicated and not quite as neat. However that should be an acceptable price to pay for 100% security from SQL injections.
MySQL and SQLite libraries do not support prepared statements, however MySQLi and Pgsql libraries do. In addition, PDO supports prepared statements for all databases.
Here is an example that builds upon a previous one, with MySQLi and prepared statements. Although this exact same principle can be applied to any database libraries which support prepared statements.
<?php // Connect to MySQL here $username = $_POST['username']; $password = $_POST['passwordname']; $age = intval($_POST['age']); $email = $_POST['email']; $stmt = mysqli_prepare($link, "INSERT INTO users (username, password, age, email) VALUES (?, ?, ?, ?);"); $stmt->bind_param('ssis', $username, $password, $age, $email); $stmt->execute(); $stmt->close(); ?>
Slightly more work, but again - this could be minimised with database abstraction if you wish.
As you can see, instead of a query function we have a prepare function, and where you would use a variable, you place a question mark, indicating that a variable will go in that place, in the query. Then you bind parameters to the query. The first argument of the 'bind_param' function relates to the types of variables, "s" meaning string and "i" meaning integer - so "ssis" means two strings, followed by an integer, followed by a string. Finally, you use the variables as the remaining arguments of that function - noting that they must be in the same order as they would appear in the SQL query. Then the 'execute()' function sends both the SQL statement and data to the database server (separately) for execution, and the 'close()' function closes that particular statement-data link between your script and the server.
This is very secure, and by far the best way of executing SQL queries from PHP (and any other language for that matter!). Remember - it is your responsibility as a programer to make sure your scripts are safe. Things may change, and unexpected events may occur, which render almost any escaping function insecure - just like addslashes, but by using this entirely different system of SQL execution, you are sending both the SQL and the data separately to the database - which will certainly always be able to process it in a secure way.
Of course it is not entirely necessary to use this method for every single SQL query - if you need to execute simple queries without user input, you can still use the old query function for versatility, but please remember to use this method when dealing with user input!
I hope I have educated everyone, from the novice to the professional about the most obvious and more subtle things about SQL injections. After reading this article, you should be able to draw an educated conclusion about which technique to use in order to safeguard your scripts from SQL injections.
Just remember that it is very important to keep your scripts secure, that it is very important to make sure your scripts are portable across web-servers which could potentially have different settings and most importantly, that other developers may build upon your work, and it really sucks when everything needs to be re-written, because the previous developer did not do things right.
Good luck with your PHP development, and code safe!