I’ve never seen SQL injection explained really well, until one of my coworkers did just that. I’m going to try to repeat his explanation here, because SQL injection is something that everyone seems to expect everyone else to just know.
SQL injection (sometimes abbreviated SQLi) is the technical term for getting a form in a web site to run SQL commands when it shouldn’t. Here’s what it is and how and why it works.
Let’s start with the basics: Most modern web sites, such as this one, have two major components. There’s the component that you and I interact with, which is usually comprised of HTML and one or more other scripting languages, like ASP or PHP. HTML handles things like changing the font and placing the font on the screen; languages like ASP or PHP handle logic.
Behind the scenes, there’s usually a database, and the language of that database is usually SQL, which stands for Structured Query Language.
Now let’s say you want to post a comment to something I wrote. You’ll scroll down to the comments form, you’ll type something like “Thanks for explaining that,” and you’ll click “Submit.”
When you click submit, my site will send a command that looks something like this to my database:
INSERT INTO Comments('Thanks for explaining that.');
That simply tells the database to put your comment into a database called “Comments.” The part you typed is in bold. The rest are SQL commands.
But you have to be careful. Some smart aleck can come along and type this into the form:
Tee Hee Hee’); DROP DATABASE WordPress; —
So now the command that my blog sends to its database looks a lot different:
INSERT INTO Comments ('Tee Hee Hee'); DROP DATABASE WordPress; --');
What I typed got injected into the spot where the comment used to go. Now instead of sending one command to the database, my blog sends two. In SQL, “Drop” means “erase,” so entering that command would theoretically cause my blog to self-destruct.
In SQL, “–” is a sequence that indicates a comment, so most SQL systems will ignore anything after the –. So most malicious inputs will end in that sequence to avoid producing errors.
There are several ways to prevent this problem. The easiest is just to filter out apostrophes, semicolons, and dashes. If my code does that, then the command looks like this:
INSERT INTO Comments ('Tee Hee Hee) DROP DATABASE WordPress ');
Now it’s just one command again. Instead of self-destructing, it just posts a comment that doesn’t make any sense.
I once worked someplace that had an internal HR system wouldn’t allow names like O’Malley. The system always filtered it to OMalley. That neatly prevented situations like the popularly cited XKCD “Exploits of a Mom”/”Bobby Tables” comic:
But since they had other systems that did allow apostrophes, they never could keep all of the systems synchronized.
A better way to handle it is to just encode whatever the user types. So then the command looks like this:
INSERT INTO Comments ('VGVlIEhlZSBIZWUnKTsgRFJPUCBEQVRBQkFTRSBXb3JkcHJlc3M7IC0t');
Harmless. The system then has to decode the text when it retrieves it, which adds overhead, but that’s a lot better than self-destructing. If you need an HR system that doesn’t discriminate against people with apostrophes, or a blogging system that allows writers to use semicolons and apostrophes, one solution is to base-64 the data as it goes into the database and unbase-64 it as you pull it out.
That’s SQL injection explained. I hope that clarifies it for you. This is a very popular job interview question, and being able to explain it by writing an actual SQL query on a whiteboard or a sheet of paper would be a good way to nail the question. Cross site scripting is a similar issue that most people tend to overcomplicate.