Search an array using massivejs - javascript

I'm trying to construct a query on an array using massivejs, but it keeps telling me the operator is unsupported.
This query works:
SELECT * FROM my_table WHERE data->'items' #> '[{"foo": "bar"}]';
where data is a jsonb field and items is an array of objects. My massivejs query is:
{ 'data #>> {items} #>': '[{ "foo": "bar" }]' }
but massive tells me the #> operator doesn't exist.
I realize I can execute raw SQL, but I'm building up a query with paging, sorting, and other query conditions, so I'd rather not rebuild all that if I can avoid it.
Is there a mistake in my query? Is this something massivejs even supports?

I don't think massivejs supports jsonb operators in its criteria objects.
Your query is correct SQL, so I suspect you'll have to drop to raw SQL for this one.
I just read through the API docs and they seem to point the same way, but I might be overlooking something.
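If it comes to that, dropping to raw SQL doesn't have to mean abandoning massive for the rest of the query. A minimal sketch, assuming your massive version exposes a raw-query method (db.run in older releases, db.query in newer ones; check your version's docs); the paging values and the containment operator are placeholders, not taken from your code:
// Hedged sketch: run the working SQL through massive's raw-query method,
// parameterizing the value. Method name and promise/callback style depend
// on your massive version.
var sql =
  "SELECT * FROM my_table " +
  "WHERE data->'items' @> $1 " +   // jsonb containment; swap in whatever operator your working SQL uses
  "ORDER BY id LIMIT $2 OFFSET $3";

db.run(sql, ['[{"foo": "bar"}]', pageSize, pageSize * pageNumber])
  .then(function (rows) {
    // rows: the matching records for this page
  });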

Related

Inserting an array of json to hasura

I am trying to insert an array of objects to my hasura table. I have defined my columns like the image below
But I am receiving a malformed array literal: "[]" error. I am using JSON.stringify in my client-side code to stringify my array before calling the mutation. What am I doing wrong?
I'd highly recommend that you just make the column a jsonb and then store the array directly. jsonb querying capabilities are significantly improved over json and Hasura doesn't have the greatest support for array columns (of any type).
When submitting data to an array column, Hasura expects to receive it as a string using the PG array literal syntax, e.g. '{1, 2, 3}' for an int[] column.
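To make the difference concrete, here's a small client-side sketch (the variable names are illustrative, not part of Hasura's API): a jsonb column can take the array value itself, while a Postgres array column wants the {...} literal form.
// Hedged sketch; the names here are made up for illustration.
const tags = ["email", "urgent"];

// jsonb column: you can generally pass the array itself as the GraphQL variable value
const jsonbValue = tags;

// text[] (or int[], etc.) column: Hasura expects the PG array literal syntax as a string
const pgArrayLiteral = `{${tags.map(t => `"${t}"`).join(",")}}`;   // '{"email","urgent"}'

// Sending JSON.stringify(tags) ('["email","urgent"]') to an array column is what
// produces the "malformed array literal" error, since Postgres expects '{...}'.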

Firebase - Get All Data That Contains

I have a firebase model where each object looks like this:
done: boolean
tags: array
text: string
Each object's tags array can contain any number of strings.
How do I obtain all objects with a matching tag? For example, find all objects whose tags array contains "email".
Many of the more common search scenarios, such as searching by attribute (like the tags in your array), will be baked into Firebase as the API continues to expand.
In the mean time, it's certainly possible to grow your own. One approach, based on your question, would be to simply "index" the list of tags with a list of records that match:
/tags/$tag/record_ids...
Then to search for records containing a given tag, you just do a quick query against the tags list:
new Firebase('URL/tags/' + tagName).once('value', function (snap) {
  var listOfRecordIds = snap.val();
});
This is a pretty common NoSQL mantra: put more effort into the initial write to make reads easy later. It's also a common denormalization approach (and one most SQL databases use internally, on a much more sophisticated level).
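On the write side, a minimal sketch of maintaining that index (legacy Firebase API to match the snippet above; recordId and tags are assumed values standing in for the record being saved):
// Sketch only: keep /tags/$tag/$recordId entries in sync whenever a record is written
var root = new Firebase('URL');
var recordId = 'record123';        // id of the record being saved (assumed)
var tags = ['email', 'urgent'];    // that record's tags (assumed)

tags.forEach(function (tag) {
  root.child('tags').child(tag).child(recordId).set(true);
});
Reading back the keys stored under /tags/$tag then gives you the record ids to fetch, as in the query above.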
Also see the post Frank mentioned as that will help you expand into more advanced search topics.

Parsing and constructing filtering queries similiar to SQL WHERE clause in Python/JavaScript

I am building a query engine for a database which pulls data from SQL and other sources. For normal use cases the users can use a web form where the user can specify filtering parameters with select and ranged inputs. But for advanced use cases, I'd like to provide a filtering equation box where the users could type
AND, OR
Nested parentheses
Variable names
>, <, =, != operators
So the filtering equation could look something like:
((age > 50) or (weight > 100)) and diabetes='yes'
Then this input would be parsed, input errors detected (non-existing variable name, etc) and SQL Alchemy queries built based on it.
I saw an earlier post about a similar problem: https://stackoverflow.com/a/1395854/315168
There seem to be several language and mini-language parsers for Python: http://navarra.ca/?p=538
However, is there any package which would be an out-of-the-box solution, or close to one, for my problem? If not, what would be the simplest way to construct such a query parser and constructor in Python?
Have a look at https://github.com/dfilatov/jspath
It's similar to xpath, so the syntax isn't as familiar as SQL, but it's powerful over hierarchical data.
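For a rough idea of what that looks like against the example filter above, here's a hedged sketch based on the predicate syntax shown in the jspath README (details may vary by version):
// Roughly equivalent to: ((age > 50) or (weight > 100)) and diabetes = 'yes'
var JSPath = require('jspath');

var data = {
  patients: [
    { age: 62, weight: 80,  diabetes: 'yes' },
    { age: 40, weight: 120, diabetes: 'no'  },
    { age: 30, weight: 70,  diabetes: 'yes' }
  ]
};

var matches = JSPath.apply(
  '.patients{(.age > 50 || .weight > 100) && .diabetes === "yes"}',
  data
);
// matches -> [{ age: 62, weight: 80, diabetes: 'yes' }]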
I don't know if this is still relevant to you, but here is my answer:
Firstly I have created a class that does exactly what you need. You may find it here:
https://github.com/snow884/filter_expression_parser/
It takes a list of dictionaries plus a filter query as input and returns the filtered results. You just have to define the list of fields that are allowed, plus functions for checking the format of the constants passed as part of the filter expression.
The filter expression it ingests has to have the following format:
(time > 45.34) OR (((user_id eq 1) OR (date gt '2019-01-04')) AND (username ne 'john.doe'))
or just
username ne 'john123'
Secondly, it was foolish of me to even create this code, because dataframe.query(...) from pandas already does almost exactly what you need: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html

Javascript serialization and performance with V8 and PostgreSQL

I have been experimenting with PostgreSQL and PL/V8, which embeds the V8 JavaScript engine into PostgreSQL. Using this, I can query into JSON data inside the database, which is rather awesome.
The basic approach is as follows:
CREATE OR REPLACE FUNCTION
json_string(data json, key text) RETURNS TEXT AS $$
  var data = JSON.parse(data);
  return data[key];
$$ LANGUAGE plv8 IMMUTABLE STRICT;

SELECT id, data FROM things WHERE json_string(data, 'name') LIKE 'Z%';
Using V8, I can parse the JSON data into JS, return a field, and use it as a regular pg query expression.
BUT
On large datasets, performance can be an issue, as the data has to be parsed for every row.
The parser is fast, but it is definitely the slowest part of the process, and it has to happen every time.
What I am trying to work out (to finally get to an actual question) is whether there is a way to cache or pre-process the JSON ... even storing a binary representation of the JSON in the table that V8 could use directly as a JS object might be a win. I've had a look at alternative formats such as MessagePack and protobuf, but I don't think they will necessarily be as fast as the native JSON parser in any case.
THOUGHT
PG has blobs and binary types, so the data could be stored in binary, then we just need a way to marshall this into V8.
Postgres supports indexes on arbitrary function calls. The following index should do the trick:
CREATE INDEX json_idx ON things (json_string(data, 'name'));
The short version appears to be that with Pg's new json support, so far there's no way to store json directly in any form other than serialised json text. (This looks likely to change in 9.4)
You seem to want to store a pre-parsed form that's a serialised representation of how v8 represents the json in memory, and that's not currently supported. It's not even clear that v8 offers any kind of binary serialisation/deserialisation of json structures. If it doesn't do so natively, code would need to be added to Pg to produce such a representation and to turn it back into v8 json data structures.
It also wouldn't necessarily be faster:
If json was stored in a v8 specific binary form, queries that returned the normal json representation to clients would have to format it each time it was returned, incurring CPU cost.
A binary serialised version of json isn't the same thing as storing the v8 json data structures directly in memory. You can't write a data structure that involves any kind of graph of pointers out to disk directly, it has to be serialised. This serialisation and deserialisation has a cost, and it might not even be much faster than parsing the json text representation. It depends a lot on how v8 represents JavaScript objects in memory.
The binary serialised representation could easily be bigger, since most json is text and small numbers, where you don't gain any compactness from a binary representation. Since storage size directly affects the speed of table scans, value fetches from TOAST, decompression time required for TOASTed values, index sizes, etc, you could easily land up with slower queries and bigger tables.
I'd be interested to see whether an optimisation like what you describe is possible, and whether it'd turn out to be an optimisation at all.
To gain the benefits you want when doing table scans, I guess what you really need is a format that can be traversed without having to parse it and turn it into what's probably a malloc()'d graph of javascript objects. You want to be able to give a path expression for a field and grab it out directly from the serialised form where it's been read into a Pg read buffer or into shared_buffers. That'd be a really interesting design project, but I'd be surprised if anything like it existed in v8.
What you really need to do is research how the existing json-based object databases do fast searches for arbitrary json paths and what their on-disk representations are, then report back on pgsql-hackers. Maybe there's something to be learned from people who've already solved this - presuming, of course, that they have.
In the mean time, what I'd want to focus on is what the other answers here are doing: Working around the slow point and finding other ways to do what you need. You could also look into helping to optimise the json parser, but depending on whether the v8 one or some other one is in use that might already be far past the point of diminishing returns.
I guess this is one of the areas where there's a trade-off between speed and flexible data representation.
Perhaps, instead of making the retrieval phase responsible for parsing the data, creating a new data type which could pre-process the json data on input might be a better approach?
http://www.postgresql.org/docs/9.2/static/sql-createtype.html
I don't have any experience with this, but it got me curious so I did some reading.
JSON only
What about something like the following (untested, BTW)? It doesn't address your question about storing a binary representation of the JSON, it's an attempt to parse all of the JSON at once for all of the rows you're checking, in the hope that it will yield higher performance by reducing the processing overhead of doing it individually for each row. If it succeeds at that, I'm thinking it may result in higher memory consumption though.
The CREATE TYPE...set_of_records() stuff is adapted from the example on the wiki where it mentions that "You can also return records with an array of JSON." I guess it really means "an array of objects".
Is the id value from the DB record embedded in the JSON?
Version #1
CREATE TYPE rec AS (id integer, data text, name text);

CREATE FUNCTION set_of_records() RETURNS SETOF rec AS
$$
  var records = plv8.execute( "SELECT id, data FROM things" );
  var data = [];

  // Use a plain for loop instead if it performs better
  records.forEach( function ( rec, i, arr ) {
    data.push( rec.data );
  } );

  data = "[" + data.join( "," ) + "]";
  data = JSON.parse( data );

  records.forEach( function ( rec, i, arr ) {
    rec.name = data[ i ].name;
  } );

  return records;
$$
LANGUAGE plv8;

SELECT id, data FROM set_of_records() WHERE name LIKE 'Z%';
Version #2
This one gets Postgres to aggregate / concatenate some values to cut down on the processing done in JS.
CREATE TYPE rec AS (id integer, data text, name text);

CREATE FUNCTION set_of_records() RETURNS SETOF rec AS
$$
  var cols = plv8.execute(
    "SELECT " +
      "array_agg( id ORDER BY id ) AS id, " +
      "string_agg( data, ',' ORDER BY id ) AS data " +
    "FROM things"
  )[0];

  cols.data = JSON.parse( "[" + cols.data + "]" );

  var records = cols.id;

  // Use a plain for loop instead if it performs better
  records.forEach( function ( id, i, arr ) {
    arr[ i ] = {
      id   : id,
      data : cols.data[ i ],
      name : cols.data[ i ].name
    };
  } );

  return records;
$$
LANGUAGE plv8;

SELECT id, data FROM set_of_records() WHERE name LIKE 'Z%';
hstore
How would the performance of this compare? Duplicate the JSON data into an hstore column at write time (or, if the performance somehow managed to be good enough, convert the JSON to hstore at select time) and use the hstore in your WHERE, e.g.:
SELECT id, data FROM things WHERE hstore_data -> 'name' LIKE 'Z%'
I heard about hstore from here: http://lwn.net/Articles/497069/
The article mentions some other interesting things:
PL/v8 lets you...create expression indexes on specific JSON elements and save them, giving you stored search indexes much like CouchDB's "views".
It doesn't elaborate on that and I don't really know what it's referring to.
There's a comment attributed as "jberkus" that says:
We discussed having a binary JSON type as well, but without a protocol to transmit binary values (BSON isn't at all a standard, and has some serious glitches), there didn't seem to be any point.
If you're interested in working on binary JSON support for PostgreSQL, we'd be interested in having you help out ...
I don't know if it would be useful here, but I came across this: pg-to-json-serializer. It mentions functionality for:
parsing JSON strings and filling postgreSQL records/arrays from it
I don't know if it would offer any performance benefit over what you've been doing so far though, and I don't really even understand their examples.
Just thought it was worth mentioning.

Creating a BIRT Data Set with Dynamic Data - ORA-01722

Having some trouble getting BIRT to allow me to create a Data Set with Parameters that are set at run time.
The SQL that is giving me the error is:
...
FROM SPRIDEN, SPBPERS P, POSNCTL.NBRJOBS X, NHRDIST d1
where D1.NHRDIST_PAYNO between '#PAYNO_BEGIN' and '#PAYNO_END'
AND D1.NHRDIST_YEAR = '#YEAR'
...
I have my Report Parameters defined as PaynoBegin, PaynoEnd, Year
I also have a Data Set script set for beforeOpen as follows:
queryText = String (queryText).replace ("#PAYNO_END", Number(params["PaynoEnd"]));
queryText = String (queryText).replace ("#PAYNO_BEGIN", Number(params["PaynoBegin"]));
queryText = String (queryText).replace ("#YEAR", Number(params["Year"]));
The problem seems to be that the JDBC driver can't get the ResultSet from this; however, I have 10 other reports that work the same way. If I comment out the where clause, it will generate the data set. I also tried breaking the where clause out into two AND clauses with <= and >=, but it still throws an ORA-01722 invalid number error on the line.
Any thoughts on this?
Two quite separate thoughts:
1) You have single quotes around each of your parameters in the query, yet it appears as though each one is a numeric - try removing the single quotes, so that the where clause looks like this:
where D1.NHRDIST_PAYNO between #PAYNO_BEGIN and #PAYNO_END
AND D1.NHRDIST_YEAR = #YEAR
Don't forget that all three parameters should be required. If the query still returns an error, try replacing #PAYNO_BEGIN, #PAYNO_END and #YEAR with hardcoded numeric values in the query string, and see whether you still get an error.
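One quick way to do that test without editing the query text by hand is to hardcode the substitutions in the beforeOpen script (the numbers below are placeholders, purely for isolating the problem):
// Temporary test only: substitute fixed numbers so the query no longer depends on report parameters
queryText = String (queryText).replace ("#PAYNO_BEGIN", 1);
queryText = String (queryText).replace ("#PAYNO_END", 26);
queryText = String (queryText).replace ("#YEAR", 2013);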
2) You are currently using dynamic SQL - amending query strings to replace specified markers with the text of the entered parameters. This makes you vulnerable to SQL Injection attacks - if you are unfamiliar with the term, you can find a simple example here.
If you are familiar with the concept, you may be under the impression that SQL Injection attacks cannot be implemented with numeric parameters - Tom Kyte has recently posted a few articles on his blog about SQL Injection, including one that deals with a SQL Injection flaw using NLS settings with numbers.
Instead, you should use bind parameters. To do so with your report, amend your query to include:
...
FROM SPRIDEN, SPBPERS P, POSNCTL.NBRJOBS X, NHRDIST d1
where D1.NHRDIST_PAYNO between ? and ?
AND D1.NHRDIST_YEAR = ?
...
instead of the existing markers. Then remove the queryText replacement code from the beforeOpen script and map the three dataset parameters to the PaynoBegin, PaynoEnd and Year report parameters respectively in the Dataset Editor. (You should also change any other replaced text in your query to bind parameter markers (?) and map dataset parameters to them as required.)
