I am testing possible SQL injections on my DB, and I am running a simple function to get results which a user should not be getting. The return value is correct based on the id; however, the rest of the query is being completely ignored.
I want to return all the data from the data table.
Is there something wrong in my syntax?
Here is my implementation:
function test(id) {
  db.query("SELECT * FROM users WHERE id = ?", [id], (err, result) => {
    console.log(result[0]);
  });
}
const id = "122 UNION SELECT * FROM data";
test(id);
This looks like Node.js JavaScript with the npm mysql driver package. And I guess your id column is defined as an INT or BIGINT, not as some kind of text string.
The way you use the .query() method is the correct way to prevent SQL injection. It's parameterized. That means each parameter in the SQL is represented by a ? placeholder. The second argument to .query() is an array of parameter values to substitute for the placeholders. For your use case the driver generates a query looking like this.
SELECT * FROM users WHERE id = '122 UNION SELECT * FROM data'
and passes it to the MySQL server. The server then takes the string you passed and attempts to interpret it as a number. Due to a quirk in MySQL, it interprets your '122 UNION SELECT * FROM data' string as the number 122, and so looks up WHERE id = 122. (MySQL coerces strings to integers by looking for a leading number. So '123RedLight' gives 123, and 'Hello' gives 0. It can be confusing. Other makes and models of RDBMS throw errors when given strings where they expect integers.)
It correctly ignores the rest of your string.
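You can demonstrate the coercion with the same driver; a minimal sketch, assuming the db connection from your question:
// Both calls return the same row: MySQL coerces the string to its leading
// number (122) before comparing it against the INT id column.
db.query("SELECT * FROM users WHERE id = ?", [122], (err, result) => {
  console.log(result[0]);
});
db.query("SELECT * FROM users WHERE id = ?", ["122 UNION SELECT * FROM data"], (err, result) => {
  console.log(result[0]); // same row as above
});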
If you wanted to make your code vulnerable to SQL injection (you do not want to do that!) you would write
function test(id) { /* danger: SQL injection in next line */
  db.query("SELECT * FROM users WHERE id = " + id, (err, result) => { /* wrong! */
    console.log(result[0]);
  });
}
This would send
SELECT * FROM users WHERE id = 122 UNION SELECT * FROM data
to the server, and give you your data leak.
You can't do it this way. In fact, not being able to do this is the WHOLE POINT of parameterized queries. They prevent an attacker from giving you a string like 122; DROP TABLE users; as the input.
I know you can set arguments in a schema to default values, but is it possible to make the limit argument completely optional in my GraphQL schema?
Right now, when I hit this without specifying a limit, I get Int cannot represent non-integer value: undefined.
const schema = buildSchema(`
  companies(limit: Int): [Company]
  ...
`);
I want to be able to skip the limit so that it gets all companies.
In JS, I call it like this:
query: `query {
  companies(limit: ${limit}) {
    ...
but sometimes I don't want to specify a limit. So what is happening is the client is sending companies(limit: undefined) and it's probably trying to convert that to Int. I'm not sure how to avoid sending limit at all, and how to make that entire param optional.
(I also read that from the client I should instead be specifying the arguments as variables, like query($limit: Int) { companies(limit: $limit) {. I guess that goes in my client, in JS? If so, how would I send my limit JS variable into that?)
Arguments in GraphQL are nullable (i.e. optional) by default. So if your type definition looks like this:
companies(limit: Int): [Company]
there is nothing else you need to do to make limit optional -- it already is. If you wanted to make limit required, you would make it non-nullable by appending a ! to the type like this:
companies(limit: Int!): [Company]
The errors you are seeing are unrelated to the type of the limit argument. The issue is with the query that you're sending, which based on the error messages, looks something like this:
query ($limit: Int) {
  companies(limit: undefined) {
    # ...
  }
}
There are two issues here: One, you are defining a variable ($limit) that you never actually use inside the query (as indicated by the second error). Two, you are setting the limit to undefined, which isn't a valid literal in GraphQL.
Instead of using string interpolation, you should use variables to pass any dynamic values to your query. For example:
query ($limit: Int) {
  companies(limit: $limit) {
    # ...
  }
}
If the variable is nullable (notice we used Int instead of Int!), then it can be omitted from the request entirely, effectively making the value of the argument undefined server-side. It's unclear how you're sending your requests to the server, but assuming you're not using some client library, you can check the GraphQL documentation on serving over HTTP for how to structure your request. Otherwise, check your library's documentation for how to correctly use variables with your query.
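For example, here is a minimal sketch of a raw HTTP request without a client library; the /graphql endpoint and the name field are assumptions about your setup, and limit is assumed to be in scope:
// Omit the `limit` key from `variables` entirely when it is unset, so the
// server sees the argument as undefined instead of an invalid literal.
const body = {
  query: "query ($limit: Int) { companies(limit: $limit) { name } }",
  variables: limit != null ? { limit } : {},
};
fetch("/graphql", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(body),
})
  .then((res) => res.json())
  .then(console.log);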
Below is an example of how you could define a query on the client and pass a non-required argument. I'm not sure about your client-side config, but you may want to use a lib like graphql-tag to convert the string to an AST.
const GET_COMPANIES = gql`
  query Companies($limit: Int) {
    companies(limit: $limit) {
      # ... return fields
    }
  }
`;
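And a hedged sketch of how you might then pass (or omit) your limit variable; client here is assumed to be something like an Apollo Client instance:
// Omit `limit` from the variables object when it is unset; since $limit is
// nullable, the server will simply see the argument as undefined.
client
  .query({
    query: GET_COMPANIES,
    variables: limit != null ? { limit } : {},
  })
  .then(({ data }) => console.log(data.companies));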
I read that prepared statements are a good way to avoid SQL injection on databases.
The problem is that the customer wants a quite variable UI,
where they first select a table, then some constraints consisting of a column and a text value.
So basically the (naive) end product will look like this:
Select * from %TABLENAME% where %ATTRIBUTENAME% = %VALUE%
Now the question is: how do I make this secure?
I could, of course, build a prepared-statement solution where I create statements for all tables in advance, but that sounds to me like a pretty stupid idea, because the effort to maintain it would be quite big (the customer has quite a few tables).
Any idea how to solve this in a secure manner that is as generic as possible?
You should change your example to
select * from %TABLENAME% where %ATTRIBUTENAME% = ?
So that at least the VALUE can't be used for SQL injection. You could then have validation for TABLENAME and ATTRIBUTENAME against the known tables and columns in your database.
See DatabaseMetaData.getColumns(...) which you might use in your validation at runtime. Or perhaps you might keep a static file, generated at build time, with the valid tables/columns.
You could generate an enum at build time for every table/column combination. I know jOOQ does this sort of build-time Java code generation from a DB schema... perhaps it can help?
E.g.
public enum TableColumn {
    CUSTOMER_NAME("customer", "name"), CUSTOMER_ID("customer", "id"),
    ORDER_ID("order", "id"); // etc etc

    final String table;
    final String column;

    TableColumn(String table, String column) { // enum constructors may not be public
        this.table = table;
        this.column = column;
    }
}
public List<Row> doSelect(TableColumn tc, Object value) {
    // the table and column names come from the enum, never from user input;
    // only the value is bound, via the ? placeholder
    String sql = String.format("select * from %s where %s = ?", tc.table, tc.column);
    Connection con = getConnection();
    try {
        PreparedStatement ps = con.prepareStatement(sql);
        ps.setObject(1, value);
        ...
I am using Node.js to connect to a DB2 database and running different queries. But I would like to use parameters from user input.
My connection string is:
var ibmdb = require("ibm_db")
, cn = "DATABASE=namedb;HOSTNAME=hst;UID=portal;PWD=portalpwd;PORT=50000;PROTOCOL=TCPIP;"
;
But my query is static; I need to pass parameters/variables from user input.
I am collecting user input with var rl = require('readline');, but the problem now is how to get those variables into this query as dynamic parameters, rather than single fixed values like name, id, etc.
var rows = conn.querySync(
"select name,id,uid,password,type from db21.rep_name fetch first 10 rows only"
);
The Node.js package ibm_db is fully documented. There are several ways you could solve the problem, depending on whether you want to have the async or sync version and whether you want to first prepare, then execute the statement.
The simplest option probably is to use querySync as already done by you. There is an optional parameter bindingParameters to pass in an array of values. Those are bound to all places having a ?. Something like the following should work.
var rows = conn.querySync(
"select name,id,uid,password,type from db21.rep_name where name=? fetch first 10 rows only", ['henrik']
);
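To wire that up to your readline input, here is a minimal sketch; it assumes conn is an already-opened connection from ibmdb.openSync(cn), and the prompt text is made up:
var readline = require("readline");
var rl = readline.createInterface({ input: process.stdin, output: process.stdout });

rl.question("Name to search for: ", function (answer) {
  // the answer is passed as a bind value, never concatenated into the SQL string
  var rows = conn.querySync(
    "select name,id,uid,password,type from db21.rep_name where name=? fetch first 10 rows only",
    [answer]
  );
  console.log(rows);
  rl.close();
});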
I need to make a scan with a limit and a condition on DynamoDB.
The docs say:
In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter). If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements (the number of results returned will be less than or equal to 6).
The code (NODEJS):
var params = {
  ExpressionAttributeNames: { "#user": "User" },
  ExpressionAttributeValues: { ":user": parseInt(user.id) },
  FilterExpression: "#user = :user and attribute_not_exists(Removed)",
  Limit: 2,
  TableName: "XXXX"
};
DynamoDB.scan(params, function (err, data) {
  if (err) {
    dataToSend.message = "Unable to query. Error: " + err.message;
  } else if (data.Items.length == 0) {
    dataToSend.message = "No results were found.";
  } else {
    dataToSend.data = data.Items;
    console.log(dataToSend);
  }
});
Table XXXX definitions:
Primary partition key: User (Number)
Primary sort key: Identifier (String)
INDEX:
Index Name: RemovedIndex
Type: GSI
Partition key: Removed (Number)
Sort key: -
Attributes: ALL
In the code above, if I remove the Limit parameter, DynamoDB returns the items that match the filter requirements, so the conditions are OK. But when I scan with the Limit parameter, the result is empty.
The XXXX table has 5 items. Only the first 2 have the Removed attribute. When I scan without the Limit parameter, DynamoDB returns the 3 items without the Removed attribute.
What am I doing wrong?
From the docs that you quoted:
If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements
By combining Limit and FilterExpression you have told DynamoDB to only look at the first two items in the table, and evaluate the FilterExpression against those items. Limit in DynamoDB can be confusing because it works differently from limit in a SQL expression in a RDBMS.
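You can see this in the response metadata; a small sketch against the params from your question:
// With Limit: 2, DynamoDB reads only the first 2 items and then filters them.
// ScannedCount is how many items were read; Count is how many passed the filter.
DynamoDB.scan(params, function (err, data) {
  if (!err) console.log(data.ScannedCount, data.Count); // e.g. 2, 0 -> empty result
});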
I also ran into this issue; I guess you will just have to scan the whole table, up to a max of 1 MB per call:
Scan
The result set from a Scan is limited to 1 MB per call. You can use the LastEvaluatedKey from the scan response to retrieve more results.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
You might be able to get what you need by using a secondary index. Take the classic RDB customer/order example: you have one table for customers and one for orders. The Orders table has a key consisting of Customer (HASH) and Order (RANGE). So if you wanted to get the latest 10 orders, there would be no way to do it without a scan.
But if you create a Global Secondary Index on Orders with some constant as the HASH and Date as the RANGE, and query against that index, the query would do what you want and only charge you for the RCUs involved with the records returned. No expensive scan needed. Note that writes will be more expensive, but in most cases there are many more reads than writes.
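A hedged sketch of what that query could look like; the table, index, and attribute names here are made up:
// Query the GSI whose hash key is a constant and sort descending by Date,
// so the latest 10 orders come back without scanning the table.
var params = {
  TableName: "Orders",
  IndexName: "OrdersByDate",                         // hypothetical GSI name
  KeyConditionExpression: "#c = :c",
  ExpressionAttributeNames: { "#c": "SomeConstant" },
  ExpressionAttributeValues: { ":c": "ALL" },
  ScanIndexForward: false,                           // newest first
  Limit: 10
};
DynamoDB.query(params, function (err, data) {
  if (!err) console.log(data.Items);
});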
Now you have your original problem if you want to get the 10 biggest orders for a day larger than $1000. The query would return the last 10 orders, and then filter out those less than $1000.
In this case, you could create a computed key of Date-OrderAmount, and queries against that index would return what you want.
It's not as simple as SQL, but you need to think about access patterns in SQL too. If you have a lot of data, you need to create indexes in SQL as well, or the DB will happily do table scans on your behalf, which will impair performance and raise your costs.
Note that everything I proposed is normalized in the sense that there is only one source of truth. You are not duplicating data -- you are merely recasting views of it to get what you need from DynamoDB.
Bear in mind that the constant as a HASH is subject to the 10 GB per-partition limit, so you would need to design around it if you had a lot of active data. For example, depending on your expected access pattern, you could use Customer and not a constant as the HASH. Or use Streams to organize the data (or subsets of it) in other ways.
Small hack: iterate until you get results. Shown here with the SDK's .promise() form, so it must run inside an async function:
let lastEvaluatedKey = null;
let items = [];
do {
  if (lastEvaluatedKey != null) {
    params.ExclusiveStartKey = lastEvaluatedKey; // resume where the last page ended
  }
  const data = await DynamoDB.scan(params).promise();
  items = items.concat(data.Items);
  lastEvaluatedKey = data.LastEvaluatedKey; // undefined once the scan is complete
} while (lastEvaluatedKey != null && items.length === 0); // === 0, or < yourLimit
If the number of items retrieved is 0 and LastEvaluatedKey is not null, that means DynamoDB has scanned or queried the number of rows that matches your Limit (and the result size is zero because none of them matched the filter expression), so you continue from LastEvaluatedKey.
Does node-oracledb escape / sanitize queries? It has parameterized queries via binding:
connection.execute(
  "INSERT INTO countries VALUES (:country_id, :country_name)",
  [90, "Tonga"],
  function (err, result)
  {
    if (err)
      console.error(err.message);
    else
      console.log("Rows inserted " + result.rowsAffected);
  });
I looked in the documentation and took a quick read through the source code, but nowhere does it state or show that it escapes the queries.
If it does not, I was thinking of using a combination of node-mysql's escaping as well as copious checks on the user input and queries before passing them to the connection.execute method.
The driver doesn't do the escaping, the database does, but only when you use bind variables rather than string concatenation.
The example you showed is correct and safe.
Here's an example of how to do it the WRONG way which opens you up to SQL injection:
connection.execute(
"INSERT INTO countries VALUES (" + countryId + ",'" + countryName + "')",
function(err, result)
{
if (err)
console.error(err.message);
else
console.log("Rows inserted " + result.rowsAffected);
});
To add to what Dan just said, here is (more or less) what would be the right way:
The SQL query string would now look like this:
'INSERT INTO countries VALUES(?,?)'
(Notice that the question marks are not inside quotes!)
This SQL specifies that the two values to be inserted are "parameters." Therefore, specific values must be "bound to" both of these parameters, each and every time the statement is executed.
The SQL engine will retrieve the values for each parameter directly from whatever data-source has been bound to them, "this time." Therefore, regardless of "what (text) they happen to contain," SQL will never consider them to be "part of the SQL statement, itself."
(And so, if you happen to occasionally find that one or the other of the columns in your countries table contains: "foo; drop table countries", as no doubt they will, you'll know ... ;-) exactly what to do with those ["nice try, Loser!" ...] rows.)
EDIT: As Christopher Jones kindly pointed out in a reply to this post, Oracle uses a different syntax to identify parameters. Nevertheless, the essential idea remains the same: “the SQL query,” as presented to the engine, contains specifications that call for input-values which must be supplied at runtime, each and every time the statement is executed. These values stand entirely separate from the SQL statement itself and will never be mis-construed as being part of it. Use the syntax that is called-for by whatever SQL engine you are using.
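For completeness, here is a hedged sketch of the same insert using node-oracledb's bind-by-name form; the statement is the one from the question, only the bind style changes:
connection.execute(
  "INSERT INTO countries VALUES (:country_id, :country_name)",
  { country_id: 90, country_name: "Tonga" }, // values bound by name, not by position
  function (err, result)
  {
    if (err)
      console.error(err.message);
    else
      console.log("Rows inserted " + result.rowsAffected);
  });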