Order random with seed. PostgreSQL + Sequelize - javascript

Why does PostgreSQL not support random(<seed>)?
In most dialects the following code works.
const randomSeed = 123;
return Story.findOne({ order: sequelize.fn('random', randomSeed) });
Is there a way to do a seeded random with Sequelize & PostgreSQL, or do I need to switch back to MySQL?
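As a hedged aside: PostgreSQL seeds random() per session via setseed() rather than accepting random(<seed>). A minimal, untested sketch of that workaround with Sequelize, reusing the Story model from the question (the seed value is made up, and the transaction is only there to pin both statements to the same pooled connection):
// Sketch only: setseed() takes a value in [-1, 1] and affects subsequent
// random() calls on the same session/connection.
const seedValue = 0.123; // hypothetical seed

async function findSeededRandomStory() {
  // Use a transaction so both statements run on the same pooled connection.
  return sequelize.transaction(async (t) => {
    await sequelize.query('SELECT setseed(:seed)', {
      replacements: { seed: seedValue },
      transaction: t,
    });
    return Story.findOne({ order: sequelize.fn('random'), transaction: t });
  });
}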

Related

Sequelize how to return result as a 2D array instead of array of objects?

I am using Sequelize query() method as follows:
const sequelize = new Sequelize(...);
...
// IMPORTANT: No changes allowed on this query
const queryFromUser = "SELECT table1.colname, table2.colname FROM table1 JOIN table2 ON/*...*/";
const result = await sequelize.query(queryFromUser);
Because I am selecting two columns with identical names (colname), in the result, I am getting something like:
[{ "colname": "val1" }, { "colname": "val2" }...], and this array contains values only from the column table2.colname, as it is overwriting the table1.colname values.
I know that there is an option to use aliases in the SQL query with AS, but I don't have control over this query.
I think it would solve the issue if there were a way to return the result as a 2D array instead of an array of objects. Is there any way to configure the Sequelize query to do that?
I'm afraid this will not be possible without changes to the library that connects directly to the database and parses its response.
The reason is:
the database returns BOTH values
then, in JavaScript, the received row values are mapped onto objects
This mapping would look something like this:
// RETURNED VALUE FROM DB: row1 -> fieldName:value&fieldName:value2
// the JavaScript parsing code would then look similar to this:
const row = {};
row.fieldName = value;  // first column
row.fieldName = value2; // second column with the same name overwrites the first
return row;
As you can see, unless you change the inner mechanism of the libraries, it is impossible to change this (JavaScript object) behaviour.
UNLESS you are using MySQL. If you are, you might use this: https://github.com/mysqljs/mysql#joins-with-overlapping-column-names, but there is one catch: Sequelize does not support this option, so you would be forced to use both libraries at once (and keep both connected).
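For completeness, a minimal sketch of that mysql-module option (nestTables, described in the README linked above), assuming a separate connection created with the mysql package alongside Sequelize; the table and column names are the ones from the question:
const mysql = require('mysql');
const connection = mysql.createConnection({ /* connection config */ });

// With nestTables the driver groups columns by table instead of flattening them,
// so the duplicate column names no longer collide.
connection.query(
  { sql: 'SELECT table1.colname, table2.colname FROM table1 JOIN table2 ON /*...*/', nestTables: true },
  function (error, results) {
    if (error) throw error;
    // both values are available, keyed by table name:
    console.log(results[0].table1.colname, results[0].table2.colname);
  }
);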
Below this line is the older answer (written before the 'no changes allowed on this query' constraint was added).
Because you use a direct SQL query (not built by Sequelize, but written by hand), you need to alias the columns properly.
As you saw, one colname would be overwritten by the other:
SELECT table1.colname, table2.colname FROM table1 JOIN table2 ON/*...*/
But if you alias them, that collision will not occur:
SELECT table1.colname as colName1, table2.colname as colName2 FROM table1 JOIN table2 ON/*...*/
and you will end up with rows like: {colName1: ..., colName2: ...}
If you use Sequelize's built-in query builder with models, Sequelize aliases everything and returns the results under the names you wanted.
PS: Here is a link with some basics about aliasing in SQL, as you can alias more than just column names: https://www.w3schools.com/sql/sql_alias.asp
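A minimal sketch of the aliased query run through Sequelize's raw query interface (assuming QueryTypes.SELECT; the aliases are the ones shown above):
const { QueryTypes } = require('sequelize');

// With distinct aliases, each row keeps both values instead of one
// colname overwriting the other.
const rows = await sequelize.query(
  'SELECT table1.colname AS colName1, table2.colname AS colName2 FROM table1 JOIN table2 ON /*...*/',
  { type: QueryTypes.SELECT }
);
// rows -> [{ colName1: 'val1', colName2: 'val2' }, ...]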
In my case I was using:
const newVal = await sequelize.query(query, {
  replacements: [null],
  type: QueryTypes.SELECT,
})
I removed type: QueryTypes.SELECT, and it worked fine for me.

How would you convert an ObjectID of mongodb to an object in order to store inside an array?

How would you convert an objectID of a MongoDB entry into an object so that you can store it in an array?
Having searched all over the internet, I am resorting to asking here as my last option.
What I want to achieve is storing IDs in an array. Those IDs are strings; they need to be converted into objects before they can be stored.
How would you do that?
Following is a small snippet of the code for your review:
var user_id_object = JSON.parse(JSON.stringify(user_id));
console.log((user_id_object));
console.log(typeof (user_id_object));
Here's what I get in the console:
Your server is running on the port number 8080
Connected to the MongoDB
5ee9ce5ded28da51fc4072c8
string
Parameter "obj" to Document() must be an object, got 5ee9ce5ded28da51fc4072c8
What do you think?
Thanks a TON!!
EDIT 1: Got the below error after implementing Arjun's code:
const mongoose = require('mongoose');
const ID_OF_24_CHARACTERS = '5c6bf11473e216001afa5608' // example
const arrayOfObjectIds = []
if (mongoose.Types.ObjectId.isValid(ID_OF_24_CHARACTERS)) { // validate ObjectId
const id = mongoose.Type.ObjectId(ID_OF_24_CHARACTERS) // converting string to an ObjectId
arrayOfObjectIds.push(id)
}
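A hedged observation on the snippet above: mongoose exposes the constructor under mongoose.Types (plural), so mongoose.Type.ObjectId is undefined and will throw. A minimal sketch of the string-to-ObjectId conversion, using the example id from the snippet:
const mongoose = require('mongoose');

const idString = '5c6bf11473e216001afa5608'; // example 24-character hex string
const arrayOfObjectIds = [];

if (mongoose.Types.ObjectId.isValid(idString)) { // validate the string first
  // note: mongoose.Types (plural), not mongoose.Type
  const id = new mongoose.Types.ObjectId(idString);
  arrayOfObjectIds.push(id);
}
console.log(arrayOfObjectIds);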

UDF worker timed out during execution when using a JavaScript UDF in BigQuery for tf-idf calculation

I have tried to implement a query in BigQuery that finds the top keywords for a document, out of a larger collection of documents, using tf-idf scores.
Before calculating the tf-idf scores of the keywords, I clean the documents (e.g. remove stop words and punctuation), then create 1-, 2-, 3- and 4-grams out of the documents, and then do stemming inside the n-grams.
To perform this cleaning, n-gram creation and stemming, I am using JavaScript libraries in a JS UDF. Here is the example query:
CREATE TEMP FUNCTION nlp_compromise_tokens(str STRING)
RETURNS ARRAY<STRUCT<ngram STRING, count INT64>> LANGUAGE js AS '''
  // Create 1-, 2-, 3- and 4-grams using compromise js.
  // Before that, stop words are removed with .removeStopWords,
  // a function borrowed from remove_stop_words.js.
  tokens_from_compromise = nlp(str.removeStopWords()).normalize().ngrams({max:4}).data()

  // The stemming function that stems each space-separated token
  // inside the n-grams; snowball.babel.js is used here.
  function stems_from_space_separated_string(tokens_string) {
    var stem = snowballFactory.newStemmer('english').stem;
    splitted_tokens = tokens_string.split(" ");
    splitted_stems = splitted_tokens.map(x => stem(x));
    return splitted_stems.join(" ")
  }

  // Return the n-grams from compromise (stemmed internally)
  // alongside the count of the token inside the document.
  var ngram_count = tokens_from_compromise.map(function(item) {
    return {
      ngram: stems_from_space_separated_string(item.normal),
      count: item.count
    };
  });
  return ngram_count
'''
OPTIONS (
  library=["gs://fh-bigquery/js/compromise.min.11.14.0.js","gs://syed_mag/js/snowball.babel.js","gs://syed_mag/js/remove_stop_words.js"]);
with doc_table as (
SELECT 1 id, "A quick brown 20 fox fox fox jumped over the lazy-dog" doc UNION ALL
SELECT 2, "another 23rd quicker browner fox jumping over Lazier broken! dogs." UNION ALL
SELECT 3, "This dog is more than two-feet away." #UNION ALL
),
ngram_table as(
select
id,
doc,
nlp_compromise_tokens(doc) as compromise_tokens
from
doc_table),
n_docs_table as (
select count(*) as n_docs from ngram_table
),
df_table as (
SELECT
compromise_token.ngram,
count(*) as df
FROM
ngram_table, UNNEST(compromise_tokens) as compromise_token
GROUP BY
ngram
),
idf_table as(
SELECT
ngram,
df,
n_docs,
LN((1+n_docs)/(1+df)) + 1 as idf_smooth
FROM
df_table
CROSS JOIN
n_docs_table),
tf_idf_table as (
SELECT
id,
doc,
compromise_token.ngram,
compromise_token.count as tf,
idf_table.ngram as idf_ngram,
idf_table.idf_smooth,
compromise_token.count * idf_table.idf_smooth as tf_idf
FROM
ngram_table, UNNEST(compromise_tokens) as compromise_token
JOIN
idf_table
ON
compromise_token.ngram = idf_table.ngram)
SELECT
id,
ARRAY_AGG(STRUCT(ngram,tf_idf)) as top_keyword,
doc
FROM(
SELECT
id,
doc,
ngram,
tf_idf,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY tf_idf DESC) AS rn
FROM
tf_idf_table)
WHERE
rn < 5
group by
id,
doc
Here is what the example output looks like:
There were only three handmade sample rows in this example.
When I try the same code on a slightly larger table with 1,000 rows, it again works fine, although it takes quite a bit longer to finish (around 6 minutes for only 1,000 rows). This sample table (1 MB) can be found here in JSON format.
Now, when I try the query on a larger dataset (159K rows, 155 MB), it aborts after around 30 minutes with the following message:
Errors: User-defined function: UDF worker timed out during execution.;
Unexpected abort triggered for worker worker-109498: job_timeout
(error code: timeout)
Can I improve my UDF or the overall query structure to make sure it runs smoothly on even larger datasets (124,783,298 rows, 244 GB)?
N.B. I have set the proper permissions on the JS files in Google Cloud Storage, so these scripts are accessible to anyone running the example queries.
BigQuery UDFs are very handy but not computationally cheap; they can make your query slow or exhaust resources. See the doc reference for limitations and best practices. In general, any UDF logic you can convert to native SQL will be far faster and use fewer resources.
I would split your analysis into multiple steps, saving the result of each step into a new table (see the sketch below):
Clean the documents (e.g. remove stop words and punctuation).
Create 1-, 2-, 3- and 4-grams out of the documents, then do stemming inside the n-grams.
Calculate the score.
Side note: you might be able to run it using multiple CTEs to hold the stages instead of saving each step into a native table, but I do not know whether that will make the query exceed the resource limits.
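A minimal sketch of that staged approach from Node.js, using the @google-cloud/bigquery client and CREATE OR REPLACE TABLE statements; the dataset and table names are made up, and each stage's SQL is whatever part of the original query produces that intermediate result:
const { BigQuery } = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

// Each stage materialises its result into a table so the next stage starts
// from stored data instead of re-running the UDF-heavy work.
// Note: a CREATE TEMP FUNCTION only lives inside a single query job, so the
// UDF definition has to be prepended to the script of the stage that uses it.
const stages = [
  // 1. cleaning + n-gram creation (the UDF-heavy part)
  `CREATE OR REPLACE TABLE my_dataset.ngrams AS
   SELECT id, doc, nlp_compromise_tokens(doc) AS compromise_tokens
   FROM my_dataset.docs`,
  // 2. document frequencies from the stored n-grams
  `CREATE OR REPLACE TABLE my_dataset.df AS
   SELECT t.ngram, COUNT(*) AS df
   FROM my_dataset.ngrams, UNNEST(compromise_tokens) AS t
   GROUP BY t.ngram`,
  // 3. the tf-idf scoring then joins the two stored tables
];

async function runStages() {
  for (const sql of stages) {
    await bigquery.query({ query: sql }); // waits for each stage to finish
  }
}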

How to handle very big numbers in node js?

I am using the BIGINT(20) datatype for an auto-increment id in a MySQL database.
When the integer value grows beyond JavaScript's number precision, I can no longer insert and read the value correctly. How can I handle this?
I have read about big-integer libraries, but I don't get the expected result.
Example:
var x = 999999999999999999999999999999999999999;
How can I print this number without it being shown in exponential notation or with garbage digits?
I tried this:
var BigNumber = require('big-number');
var x = new BigNumber(999999999999999999999999999999999999999, 10);
console.log(x);
Example 2:
If I get the last inserted id, how can I handle this value?
connection_db.query('INSERT INTO tableName SET ?', tableData,
  function (error1, results1, fields1) {
    if (error1) {
      // Db error
    } else {
      var lastInserted = new BigNumber(results1.insertId);
      console.log(lastInserted); // still wrong value
    }
  });
You can only pass/show large numbers like that as strings:
var BigNumber = require('big-number');
var x = new BigNumber('999999999999999999999999999999999999999', 10);
console.log(x.toString())
However, in the end, it's up to the MySQL driver how it handles large numbers like this, because it does have to take into account Number.MAX_SAFE_INTEGER.
For instance, the mysql module has various options (supportBigNumbers and bigNumberStrings) that relate to handling BIGINT.
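For example, a minimal sketch of those mysql options (the connection parameters are placeholders); with both flags enabled, oversized BIGINT values come back as strings and keep their precision:
const mysql = require('mysql');

const connection = mysql.createConnection({
  host: 'localhost',
  user: 'user',
  password: 'secret',
  database: 'mydb',
  supportBigNumbers: true, // handle BIGINT/DECIMAL beyond Number.MAX_SAFE_INTEGER
  bigNumberStrings: true,  // always return such values as strings
});

connection.query('SELECT id FROM tableName ORDER BY id DESC LIMIT 1', function (err, rows) {
  if (err) throw err;
  console.log(rows[0].id); // e.g. '999999999999999999999999999' as a string, no precision loss
});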

Express + node.js API filtering

I am developing a node API.
I have an endpoint like GET /messages.
This will return all messages. I want to have pagination on this endpoint, something like:
GET /messages?filter=my_filter_string&limit=10&offset=10
The filter statement will look like this:
{fieldName1}={fieldValue1}&...{fieldNameN}>{fieldValueN}.
Operations can be =, > or <; the < and > operations are only for numbers, integers and dates.
I am using Sequelize as the ORM and PostgreSQL as the DB.
My question is: how can I parse the filter string (my_filter_string) and convert it into a search-criteria object for Sequelize?
Also, if I call the API like
GET /messages?filter="id=10&contentlength>20"&limit=10&offset=10
it does not work.
Can you have, for example:
GET /messages?greater=1&less=5&limit=10&offset=10
And then in node:
var url = require('url');
var url_parts = url.parse(request.url, true);
var query = url_parts.query;
And match the query parameters, such as greater/less than (which could be shortened to "gt" and "lt"), to the symbols you need in your statement?
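A minimal sketch of that idea with Express and Sequelize operators, using req.query instead of the url module; the gt/lt parameter suffixes and the Message model are made up for illustration:
const express = require('express');
const { Op } = require('sequelize');

const app = express();

// e.g. GET /messages?id=10&contentlength_gt=20&limit=10&offset=10
app.get('/messages', async (req, res) => {
  const { limit = 10, offset = 0, ...filters } = req.query;
  const where = {};

  for (const [key, value] of Object.entries(filters)) {
    if (key.endsWith('_gt')) {
      where[key.slice(0, -3)] = { [Op.gt]: value }; // "greater than" for numbers/dates
    } else if (key.endsWith('_lt')) {
      where[key.slice(0, -3)] = { [Op.lt]: value }; // "less than" for numbers/dates
    } else {
      where[key] = value; // plain equality
    }
  }

  // Message is the hypothetical Sequelize model behind GET /messages
  const messages = await Message.findAll({ where, limit: +limit, offset: +offset });
  res.json(messages);
});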
