Sorry if it's a dumb question but I'm new to express and MySQL and I couldn't find a solution anywhere.
I need to create a MySQL query based on some user input filters from an HTML form.
The parameters are something like this:
let sDate = req.body.startDate;
let eDate = req.body.endDate;
let param1 = req.body.param1;
let param2 = req.body.param2;
let param3 = req.body.param3;
let param4 = req.body.param4;
The query I would normally use, if all the params are non-null, would be:
const query = `
SELECT * FROM table
WHERE
date BETWEEN ? AND ?
AND col1 = ?
AND col2 = ?
AND col3 = ?
AND col4 = ?;`
db.query(query, [sDate, eDate, param1, param2, param3, param4], (e, rows) => {});
But every single parameter can be null, so there are a lot of possible filter combinations.
Is there a way to query the db with one single query? I could handle it in many if-else statements but it feels wrong.
Thanks for your help in advance!
For your col1 = ? pattern you can use (? IS NULL OR col1 = ?).
For your date matching you can use MySQL's IF(condition, value_if_true, value_if_false):
WHERE date <= IF(? IS NOT NULL, ?, '2999-12-31')
AND date >= IF(? IS NOT NULL, ?, '1000-01-01')
But notice that parameters appear twice. So your query is
const sql = `
SELECT *
FROM tbl
WHERE date <= IF(? IS NOT NULL, ?, '2999-12-31')
AND date >= IF(? IS NOT NULL, ?, '1000-01-01')
AND (? IS NULL OR col1 = ?)
AND (? IS NULL OR col2 = ?)
AND (? IS NULL OR col3 = ?)
AND (? IS NULL OR col4 = ?);`
And you'll invoke it like this.
const params = [endDate, endDate,     // upper bound: date <= ?
                startDate, startDate, // lower bound: date >= ?
                param1, param1,
                param2, param2,
                param3, param3,
                param4, param4];
db.query(sql, params, (e, rows) => {});
But beware: this kind of query is a notorious worst case for the query optimizer in the database server. The optimizer can't tell enough about what you want to use table statistics and choose the right index. Still, be sure to put an index on
(date, col1, col2, col3, col4)
for this query, since you're almost always filtering on a date range.
Even so, this may be slow, especially if the table is really large. You may be better off building up a custom query from strings each time you handle the request: the optimizer understands a plain query like the one below far better than one riddled with ? IS NULL OR clauses.
WHERE date <= ?
AND date >= ?
AND col2 = ?
AND col3 = ?
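If you go that route, here's a minimal sketch of the idea, assuming the req.body fields from the question. The column names come from a fixed whitelist, never from user input, while the values still go through ? placeholders:
let conditions = [];
let params = [];
if (sDate != null && eDate != null) {
  conditions.push('date BETWEEN ? AND ?');
  params.push(sDate, eDate);
}
// one entry per optional filter; only these hardcoded column names are concatenated
[['col1', param1], ['col2', param2], ['col3', param3], ['col4', param4]]
  .forEach(([col, val]) => {
    if (val != null) {
      conditions.push(col + ' = ?');
      params.push(val);
    }
  });
const where = conditions.length ? ' WHERE ' + conditions.join(' AND ') : '';
db.query('SELECT * FROM table' + where, params, (e, rows) => {});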
You can use a LIKE clause with a combination of the COALESCE() and NULLIF() functions to keep a NULL (or empty) value from being passed to LIKE: NULLIF(?, '') turns an empty string into NULL, and COALESCE() then falls back to the match-everything pattern '%'.
const query = "SELECT * FROM accounts WHERE name LIKE COALESCE(NULLIF(?, ''), '%')";
const values = [name];
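To illustrate how the two cases behave (a sketch assuming the same db handle and callback style as above):
// name = 'alice' -> NULLIF('alice', '') keeps 'alice'; LIKE 'alice' matches only that name
// name = ''      -> NULLIF('', '') is NULL; COALESCE falls back to '%' and every row matches
db.query(query, values, (e, rows) => {});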
I'm having trouble binding the results from SQL to my model in JavaScript/TypeScript.
In my model, I have a created_at property typed as Date.
When I use JSON functions in the Postgres SQL statement to avoid duplicate parent rows for a relationship, I get a different format for the timestamp.
Here is a simple example
SELECT
  a.*,
  (
    SELECT ROW_TO_JSON(profile)
    FROM (
      SELECT *
      FROM profile p
      WHERE p.account_id = a.id
    ) profile
  ) AS profile
FROM account a
WHERE a.id = 16
And here are the results in JSON
{
"id":16,
"email":"test#gmail.com",
"password":"$password",
"role_id":0,
"created_at":"2020-04-01T22:03:44.324Z",
"profile":{
"id":8,
"username":"firmanjml",
"gender":0,
"bio":null,
"avatar":"www.firmanjml.me/test.jpg",
"account_id":16,
"created_at":"2020-04-02T06:03:44.32498"
}
}
I noticed that the parent row, which is from the account table, has the Z at the end of created_at, whereas the child row that was converted to JSON has a different timestamp format.
Is there a way to make all the timestamps come back in the JavaScript (ISO 8601) format?
Query to create schema and insert data
CREATE TABLE "account"(
id SERIAL primary key,
email varchar(50) not null,
password varchar(50) not null,
role_id int2 default 0 not null,
created_at timestamp default now() not null
);
CREATE TABLE "profile"(
id SERIAL primary key,
username varchar(50) not null,
gender int2 not null,
bio varchar(50),
avatar varchar(50),
account_id integer not null REFERENCES account (id),
created_at timestamp default now() not null
);
INSERT INTO "account" (email,"password","role_id",created_at) VALUES
('test@gmail.com','$password',0,'2020-04-02 06:03:44.324');
INSERT INTO "profile" (username,gender,bio,avatar,account_id,created_at) VALUES
('fimrnajml',0,NULL,'www.firmanjml.me/test.jpg',1,'2020-04-02 06:03:44.324');
Use the TO_CHAR() function to format the timestamp in your SQL; see https://www.postgresqltutorial.com/postgresql-to_char/
A format of 'YYYY-MM-DD"T"HH24:MI:SS.US"Z"' should do it. This assumes all your timestamps are in UTC (the way the pros do it :-)
Your SQL then looks like:
SELECT
  a.*,
  (
    SELECT ROW_TO_JSON(profile)
    FROM (
      SELECT username, gender, bio, avatar, account_id,
             TO_CHAR(created_at, 'YYYY-MM-DD"T"HH24:MI:SS.US"Z"') AS created_at
      FROM profile p
      WHERE p.account_id = a.id
    ) profile
  ) AS profile
FROM account a
WHERE a.id = 16
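Alternatively, you could leave the SQL alone and normalize on the JavaScript side when you bind the row to your model. A minimal sketch, assuming the node-postgres (pg) client; accountQuery is a hypothetical variable holding the query above:
const { rows } = await pool.query(accountQuery);
const account = rows[0];
// pg returns plain timestamp columns as Date objects already, but fields
// serialized through ROW_TO_JSON come back as strings; if they are stored
// as UTC, append the missing 'Z' before constructing the Date
account.profile.created_at = new Date(account.profile.created_at + 'Z');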
I have a lot of tables with the same _ips endings. Example:
first-domain.com_ips
secons-domain.com_ips
...
I'm trying to get a UNION result table which contains all rows from all the _ips tables. For this I use:
SELECT id, expirationDate FROM `first-domain.com_ips` WHERE isBlocked = 1
UNION
SELECT id, expirationDate FROM `secons-domain.com_ips` WHERE isBlocked = 1
...;
I have an array which consists of the domain names. So I'm looking for a way to use this domain array in the SQL query, something like what we already do with IN() for values. Example:
const ids = [3, 4, 6, 8];
const query = 'SELECT * FROM table WHERE id IN (' + ids.join() + ')';
Is there a way to use an array for table names in SQL? Thank you in advance!
You can do this with a dynamic query built from INFORMATION_SCHEMA and a regexp:
SELECT
GROUP_CONCAT(
CONCAT(
'SELECT * FROM `',
TABLE_NAME,
'`') SEPARATOR ' UNION ALL ')
FROM
`INFORMATION_SCHEMA`.`TABLES`
WHERE
`TABLE_NAME` REGEXP '_ips$'
INTO @sql;
SELECT @sql;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
This script produces two outputs. The first is the generated SQL string, which looks like:
SELECT * FROM `first-domain.com_ips` UNION ALL SELECT * FROM `second-domain.com_ips`
and the second is the actual data from all the tables. If you want only the final data, you can remove this statement:
SELECT @sql;
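Alternatively, since you already have the domain array on the Node side, you could build the same UNION there. A sketch assuming the mysql npm package, whose escapeId() backtick-quotes each table name so the dynamic identifiers stay safe:
const mysql = require('mysql');
const domains = ['first-domain.com', 'secons-domain.com'];
// one SELECT per domain table, glued together with UNION ALL
const query = domains
  .map(d => 'SELECT id, expirationDate FROM ' + mysql.escapeId(d + '_ips') + ' WHERE isBlocked = 1')
  .join(' UNION ALL ');
db.query(query, (e, rows) => {});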
I want to pass a table as parameter on an ajax callback procedure in Oracle APEX 5, because I need to make an SQL query on that table.
The SQL process is stored as shared component inside the Apex 5 application. Screenshot
My procedure is like this
(procedure name: THIS_PROCESS)
declare
v_tablename varchar(128); --max table_name length
v_ID number;
v_somevar varchar2(100);
BEGIN
SELECT Columname
INTO v_somevar
FROM v_tablename
WHERE ID = v_ID;
--Do stuff
END;
This code (FROM v_tablename) gives me a compilation error:
ORA-00942: table or view does not exist ORA-06550: line 9, column 5:
PL/SQL: SQL Statement ignored
I'm a total newbie. I had been reading that I should call that procedure with this javascript:
apex.server.process ( "THIS_PROCESS", {
x01: "TABLENAME",
x02: "Row_ID",
pageItems: "#P1_Item,#P2_Item"
}, {
success: function( pData ) {
// do something here
}
} );
I do not understand why I should pass x01 and x02 instead of v_tablename and v_ID.
Are x01 and x02 automatically assigned to v_tablename and v_ID?
Here's an example page process THIS_PROCESS of type "Ajax Callback". Note that you need Dynamic SQL to select from a table name that isn't hardcoded.
declare
v_table varchar2(128) := apex_application.g_x01;
v_id number := apex_application.g_x02;
v_somevar varchar2(100);
v_sql varchar2(4000);
begin
-- validate v_table parameter to avoid sql injection. will throw exception if it fails
select table_name into v_table from all_tables where table_name = v_table;
v_sql := 'SELECT Columname
FROM ' || v_table || '
WHERE ID = :A1';
execute immediate v_sql into v_somevar using v_id;
-- do something with v_somevar
end;
Do be careful with this sort of thing - this design will allow a malicious user to write their own javascript function which can pass any table name that it likes to your procedure.
You need to use dynamic sql:
declare
v_tablename varchar(128); --max table_name length
v_sql varchar2(1000);
v_ID number;
v_somevar varchar2(100);
BEGIN
v_sql := 'SELECT Columname FROM ' || v_tablename || ' where ID = :1';
EXECUTE IMMEDIATE v_sql INTO v_somevar USING v_ID;
--Do stuff
END;
/
I'm wondering if anyone knows of a way to measure string similarity in BigQuery.
Seems like it would be a neat function to have.
In my case I need to compare the similarity of two URLs, as I want to be fairly sure they refer to the same article.
I can find examples using JavaScript, so maybe a UDF is the way to go, but I've not used UDFs at all (or JavaScript, for that matter :))
Just wondering if there may be a way using existing regex functions, or if anyone might be able to get me started with porting the JavaScript example into a UDF.
Any help much appreciated, thanks
EDIT: Adding some example code
So if i have a UDF defined as:
// distance function
function levenshteinDistance (row, emit) {
//if (row.inputA.length <= 0 ) {var myresult = row.inputB.length};
if (typeof row.inputA === 'undefined') {var myresult = 1};
if (typeof row.inputB === 'undefined') {var myresult = 1};
//if (row.inputB.length <= 0 ) {var myresult = row.inputA.length};
var myresult = Math.min(
levenshteinDistance(row.inputA.substr(1), row.inputB) + 1,
levenshteinDistance(row.inputB.substr(1), row.inputA) + 1,
levenshteinDistance(row.inputA.substr(1), row.inputB.substr(1)) + (row.inputA[0] !== row.inputB[0] ? 1 : 0)
) + 1;
emit({outputA: myresult})
}
bigquery.defineFunction(
'levenshteinDistance', // Name of the function exported to SQL
['inputA', 'inputB'], // Names of input columns
[{'name': 'outputA', 'type': 'integer'}], // Output schema
levenshteinDistance // Reference to JavaScript UDF
);
// make a test function to test individual parts
function test(row, emit) {
if (row.inputA.length <= 0) { var x = row.inputB.length} else { var x = row.inputA.length};
emit({outputA: x});
}
bigquery.defineFunction(
'test', // Name of the function exported to SQL
['inputA', 'inputB'], // Names of input columns
[{'name': 'outputA', 'type': 'integer'}], // Output schema
test // Reference to JavaScript UDF
);
And if I try to test with a query such as:
SELECT outputA FROM (levenshteinDistance(SELECT "abc" AS inputA, "abd" AS inputB))
I get error:
Error: TypeError: Cannot read property 'substr' of undefined at line 11, columns 38-39
Error Location: User-defined function
It seems like maybe row.inputA is not a string, or for some reason string functions are not able to work on it. Not sure if this is a type issue or something funny about what utils the UDF is able to use by default.
Again any help much appreciated, thanks.
Ready-to-use shared UDFs - Levenshtein distance:
SELECT fhoffa.x.levenshtein('felipe', 'hoffa')
, fhoffa.x.levenshtein('googgle', 'goggles')
, fhoffa.x.levenshtein('is this the', 'Is This The')
6 2 0
Soundex:
SELECT fhoffa.x.soundex('felipe')
, fhoffa.x.soundex('googgle')
, fhoffa.x.soundex('guugle')
F410 G240 G240
Fuzzy choose one:
SELECT fhoffa.x.fuzzy_extract_one('jony'
, (SELECT ARRAY_AGG(name)
FROM `fh-bigquery.popular_names.gender_probabilities`)
#, ['john', 'johnny', 'jonathan', 'jonas']
)
johnny
How-to:
https://medium.com/@hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83
If you're familiar with Python, you can use the functions defined by fuzzywuzzy in BigQuery using external libraries loaded from GCS.
Steps:
Download the javascript version of fuzzywuzzy (fuzzball)
Take the compiled file of the library, dist/fuzzball.umd.min.js, and rename it to something clearer (like fuzzball.js)
Upload it to a google cloud storage bucket
Create a temp function to use the lib in your query (set the path in OPTIONS to the relevant path)
CREATE TEMP FUNCTION token_set_ratio(a STRING, b STRING)
RETURNS FLOAT64
LANGUAGE js
OPTIONS (
library="gs://my-bucket/fuzzball.js")
AS """
return fuzzball.token_set_ratio(a, b);
""";
with data as (select "my_test_string" as a, "my_other_string" as b)
SELECT a, b, token_set_ratio(a, b) from data
Levenshtein via JS would be the way to go. You can use the algorithm to get the absolute string distance, or convert it to a percentage similarity by calculating (strlen - distance) / strlen. For example, two 10-character strings at distance 3 are (10 - 3) / 10 = 70% similar.
The easiest way to implement this would be to define a Levenshtein UDF that takes two inputs, a and b, and calculates the distance between them. The function could return a, b, and the distance.
To invoke it, you'd then pass in the two URLs as columns aliased to 'a' and 'b':
SELECT a, b, distance
FROM
Levenshtein(
SELECT
some_url AS a, other_url AS b
FROM
your_table
)
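As for the distance function itself, an iterative (Wagner-Fischer) implementation avoids the recursion and undefined-substring problems in the question's code. A sketch in the same legacy UDF shape, reusing the question's inputA/inputB column names:
// plain iterative Levenshtein distance over two strings
function levenshtein(a, b) {
  a = a || ''; b = b || '';
  var prev = [], curr = [], i, j;
  for (j = 0; j <= b.length; j++) prev[j] = j;
  for (i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (j = 1; j <= b.length; j++) {
      curr[j] = Math.min(prev[j] + 1,      // deletion
                         curr[j - 1] + 1,  // insertion
                         prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)); // substitution
    }
    prev = curr.slice();
  }
  return prev[b.length];
}
function levenshteinDistance(row, emit) {
  emit({outputA: levenshtein(row.inputA, row.inputB)});
}
bigquery.defineFunction(
  'levenshteinDistance',
  ['inputA', 'inputB'],
  [{'name': 'outputA', 'type': 'integer'}],
  levenshteinDistance
);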
Below is a simpler version for Hamming distance, using WITH OFFSET instead of ROW_NUMBER() OVER()
#standardSQL
WITH Input AS (
SELECT 'abcdef' AS strings UNION ALL
SELECT 'defdef' UNION ALL
SELECT '1bcdef' UNION ALL
SELECT '1bcde4' UNION ALL
SELECT '123de4' UNION ALL
SELECT 'abc123'
)
SELECT 'abcdef' AS target, strings,
(SELECT COUNT(1)
FROM UNNEST(SPLIT('abcdef', '')) a WITH OFFSET x
JOIN UNNEST(SPLIT(strings, '')) b WITH OFFSET y
ON x = y AND a != b) hamming_distance
FROM Input
I did it like this:
CREATE TEMP FUNCTION trigram_similarity(a STRING, b STRING) AS (
(
WITH a_trigrams AS (
SELECT
DISTINCT tri_a
FROM
unnest(ML.NGRAMS(SPLIT(LOWER(a), ''), [3,3])) AS tri_a
),
b_trigrams AS (
SELECT
DISTINCT tri_b
FROM
unnest(ML.NGRAMS(SPLIT(LOWER(b), ''), [3,3])) AS tri_b
)
SELECT
COUNTIF(tri_b IS NOT NULL) / COUNT(*)
FROM
a_trigrams
LEFT JOIN b_trigrams ON tri_a = tri_b
)
);
Here is a comparison to Postgres's pg_trgm:
select trigram_similarity('saemus', 'seamus');
-- 0.25 vs. pg_trgm 0.272727
select trigram_similarity('shamus', 'seamus');
-- 0.5 vs. pg_trgm 0.4
I gave the same answer on How to perform trigram operations in Google BigQuery?
I couldn't find a direct answer to this, so I propose this solution, in standard SQL
#standardSQL
CREATE TEMP FUNCTION HammingDistance(a STRING, b STRING) AS (
(
SELECT
SUM(counter) AS diff
FROM (
SELECT
CASE
WHEN X.value != Y.value THEN 1
ELSE 0
END AS counter
FROM (
SELECT
value,
ROW_NUMBER() OVER() AS row
FROM
UNNEST(SPLIT(a, "")) AS value ) X
JOIN (
SELECT
value,
ROW_NUMBER() OVER() AS row
FROM
UNNEST(SPLIT(b, "")) AS value ) Y
ON
X.row = Y.row )
)
);
WITH Input AS (
SELECT 'abcdef' AS strings UNION ALL
SELECT 'defdef' UNION ALL
SELECT '1bcdef' UNION ALL
SELECT '1bcde4' UNION ALL
SELECT '123de4' UNION ALL
SELECT 'abc123'
)
SELECT strings, 'abcdef' as target, HammingDistance('abcdef', strings) as hamming_distance
FROM Input;
Compared to other solutions, it takes two strings (of the same length, following the definition of Hamming distance) and outputs the expected distance.
While I was looking at Felipe's answer above, I worked on my own query and ended up with two versions, one I called string approximation and the other string resemblance.
The first looks at the shortest distance between letters of the source string and the test string and returns a score between 0 and 1, where 1 is a complete match. It always scores against the longer of the two strings. It turns out to return results similar to the Levenshtein distance.
#standardSql
CREATE OR REPLACE FUNCTION `myproject.func.stringApproximation`(sourceString STRING, testString STRING) AS (
(select avg(best_result) from (
select if(length(testString)<length(sourceString), sourceoffset, testoffset) as ref,
case
when min(result) is null then 0
else 1 / (min(result) + 1)
end as best_result,
from (
select *,
if(source = test, abs(sourceoffset - (testoffset)),
greatest(length(testString),length(sourceString))) as result
from unnest(split(lower(sourceString),'')) as source with offset sourceoffset
cross join
(select *
from unnest(split(lower(testString),'')) as test with offset as testoffset)
) as results
group by ref
)
)
);
The second is a variation of the first, where it looks at sequences of matching distances, so that a character matching at an equal distance from the character preceding or following it counts as one point. This works quite well, better than string approximation, but not quite as well as I would like (see the example output below).
#standardSQL
CREATE OR REPLACE FUNCTION `myproject.func.stringResemblance`(sourceString STRING, testString STRING) AS (
(
select avg(sequence)
from (
select ref,
if(array_length(array(select * from comparison.collection intersect distinct
(select * from comparison.before))) > 0
or array_length(array(select * from comparison.collection intersect distinct
(select * from comparison.after))) > 0
, 1, 0) as sequence
from (
select ref,
collection,
lag(collection) over (order by ref) as before,
lead(collection) over (order by ref) as after
from (
select if(length(testString) < length(sourceString), sourceoffset, testoffset) as ref,
array_agg(result ignore nulls) as collection
from (
select *,
if(source = test, abs(sourceoffset - (testoffset)), null) as result
from unnest(split(lower(sourceString),'')) as source with offset sourceoffset
cross join
(select *
from unnest(split(lower(testString),'')) as test with offset as testoffset)
) as results
group by ref
)
) as comparison
)
)
);
Now here is a sample of result:
#standardSQL
with test_subjects as (
select 'benji' as name union all
select 'benjamin' union all
select 'benjamin alan artis' union all
select 'ben artis' union all
select 'artis benjamin'
)
select name, `myproject.func.stringApproximation`('benjamin artis', name) as approximation, `myproject.func.stringResemblance`('benjamin artis', name) as resemblance
from test_subjects
order by resemblance desc
This returns
+---------------------+---------------------+---------------------+
| name                | approximation       | resemblance         |
+---------------------+---------------------+---------------------+
| artis benjamin      | 0.2653061224489796  | 0.8947368421052629  |
+---------------------+---------------------+---------------------+
| benjamin alan artis | 0.6078947368421053  | 0.8947368421052629  |
+---------------------+---------------------+---------------------+
| ben artis           | 0.4142857142857142  | 0.7142857142857143  |
+---------------------+---------------------+---------------------+
| benjamin            | 0.6125850340136053  | 0.5714285714285714  |
+---------------------+---------------------+---------------------+
| benji               | 0.36269841269841263 | 0.28571428571428575 |
+---------------------+---------------------+---------------------+
Edited: updated the resemblance algorithm to improve results.
Try Flookup for Google Sheets... it's definitely faster than computing the Levenshtein distance, and it calculates percentage similarities right out of the box.
One Flookup function you might find useful is this:
FUZZYMATCH (string1, string2)
Parameter Details
string1: compares to string2.
string2: compares to string1.
The percentage similarity is then calculated based on these comparisons. Both parameters can be ranges.
I'm currently trying to optimise it for large data sets, so your feedback would be very welcome.
Edit: I'm the creator of Flookup.