I have a firebase model where each object looks like this:
done: boolean
tags: array
text: string
Each object's tag array can contain any number of strings.
How do I obtain all objects with a matching tag? For example, find all objects where the tag contains "email".
Many of the more common search scenarios, such as searching by attribute (as your tag array would contain) will be baked into Firebase as the API continues to expand.
In the meantime, it's certainly possible to grow your own. One approach, based on your question, would be to simply "index" the list of tags with a list of records that match:
/tags/$tag/record_ids...
Then to search for records containing a given tag, you just do a quick query against the tags list:
new Firebase('URL/tags/'+tagName).once('value', function(snap) {
  // snap.val() is null when no records are indexed under this tag
  var listOfRecordIds = snap.val();
});
This is a pretty common NoSQL mantra: put more effort into the initial write to make reads easy later. It's also a common denormalization approach (and one most SQL databases use internally, on a much more sophisticated level).
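As a rough sketch of the write side of this pattern, the helper below builds the /tags/$tag/$recordId entries for a record so they can be written together with it. The function name tagIndexUpdates and the records/ path are hypothetical, and how the resulting path map is written depends on the SDK version in use.

```javascript
// Hypothetical helper: given a record id and its data, return a flat map of
// paths to values covering the record itself plus one index entry per tag.
// Assumption: tags contain no characters that are invalid in Firebase keys
// (. $ # [ ] /); otherwise they would need escaping first.
function tagIndexUpdates(recordId, record) {
  const updates = {};
  updates['records/' + recordId] = record;
  (record.tags || []).forEach(function (tag) {
    // Storing `true` under tags/$tag/$recordId makes the index a set of ids.
    updates['tags/' + tag + '/' + recordId] = true;
  });
  return updates;
}

const exampleUpdates = tagIndexUpdates('rec1', {
  done: false,
  tags: ['email', 'work'],
  text: 'Reply to Bob'
});
console.log(Object.keys(exampleUpdates));
```

Reading /tags/email then returns the ids of every record carrying that tag, which is what the query above relies on.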
Also see the post Frank mentioned as that will help you expand into more advanced search topics.
I have a bunch of JSON documents in my db. I need to perform a delete operation on a few documents by searching for the documents that have a particular field present in them (key only). What query can I add to my code so that it finds all the documents with the field? I will be using them to get their values (integers), put them in an array and then use them one by one.
Expanding a bit on the link provided by George Bailey, you might want to use cts.uris() instead of cts.search() because xdmp.documentDelete() takes uri strings instead of documents:
const uris = cts.uris(
null,
['score-zero', 'unchecked'],
cts.jsonPropertyScopeQuery('theKey', cts.trueQuery())
);
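// xdmp.documentDelete() accepts a single URI string, so delete one at a time.
for (const uri of uris) {
  xdmp.documentDelete(uri);
}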
If it's a large number of documents, you might need to specify the start value and a limit on the call to cts.uris() to delete different slices of documents in multiple passes.
Hoping that helps,
I would like to come straight to the point and show you my sample data, which averages around 180,000 lines from a .csv file, so a lot of lines. I am reading in the .csv with papaparse. Then I am saving the data as an array of objects, which looks like this:
I just used this picture as you can also see all the properties my objects have or should have. The data is from Media Transparency Data, which is open source and shows the payments between institutions.
The array of objects is saved using localforage, which is basically IndexedDB or WebSQL with a localStorage-like API. So I never save the data on a server, only in the client!
The Question:
My question: the user can add the sourceHash and/or targetHash attributes in a client interface. For example, assume the user loaded the "Energie Steiermark Kunden GmbH" object and now adds the sourceHash "company" to it, which is basically a tag. This is already reflected and shown in the client, but I also need to get it into localforage, which means rewriting the initial array of objects. So I would need to search my huge 180,000-line array for every object that has the name "Energie Steiermark Kunden GmbH" (there can be several), set its sourceHash property to "company", and then save the array to localforage again.
The first question is how to do this most efficiently. I can get the data out of localforage and set it again using the following methods.
Get:
localforage.getItem('data').then((value) => {
...
});
Set:
localforage.setItem('data', dataObject);
However, how do I do this most efficiently? If the sourceNode starts with "E", for example, we don't need to search all sourceNodes. The same goes for the targetNode, of course.
Thank you in advance!
UPDATE:
Thanks for the answers so far! How would you do it most efficiently in JavaScript? Is it possible in a few lines? Assume, for example, that I have the current sourceHash "company" and want to assign it to every node starting with "Energie Steiermark Kunden GmbH" across all timeNodes. It could be 20151, 20152, 20153, 20154 and so on...
Localforage is only a localStorage/sessionStorage-like wrapper over the actual storage engine, and so it only offers you the key-value capabilities of localStorage. In short, there's no more efficient way to do this for arbitrary queries.
This sounds more like a case for IndexedDB, as you can define search indexes over the data, for instance for sourceNodes, and do more efficient queries that way.
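If the data stays in localforage as a single array, the update boils down to one linear pass followed by a re-save. Here is a minimal sketch of that pass, assuming each row carries sourceNode and sourceHash properties as described in the question; the function name applySourceHash and the sample rows are made up for illustration.

```javascript
// Set sourceHash on every row whose sourceNode equals `name`.
// Mutates `rows` in place and returns the number of rows changed.
function applySourceHash(rows, name, hash) {
  let changed = 0;
  for (const row of rows) {
    if (row.sourceNode === name && row.sourceHash !== hash) {
      row.sourceHash = hash;
      changed++;
    }
  }
  return changed;
}

// Usage with localforage (shown as a comment, not executed here):
// localforage.getItem('data').then((rows) => {
//   applySourceHash(rows, 'Energie Steiermark Kunden GmbH', 'company');
//   return localforage.setItem('data', rows);
// });

const sampleRows = [
  { timeNode: 20151, sourceNode: 'Energie Steiermark Kunden GmbH', targetNode: 'A' },
  { timeNode: 20152, sourceNode: 'Other GmbH', targetNode: 'B' },
  { timeNode: 20153, sourceNode: 'Energie Steiermark Kunden GmbH', targetNode: 'C' }
];
console.log(applySourceHash(sampleRows, 'Energie Steiermark Kunden GmbH', 'company')); // 2
```

To match every node that merely starts with a given name, the equality check could be swapped for row.sourceNode.startsWith(name).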
I have a set of documents, each annotated with a set of tags, which may contain spaces. The user supplies a set of possibly misspelled tags and I want to find the documents with the highest number of matching tags (optionally weighted).
There are several thousand documents and tags but at most 100 tags per document.
I am looking for a lightweight and performant solution where the search runs fully on the client side using JavaScript, but some preprocessing of the index with node.js is possible.
My idea is to create an inverted index of tags to documents using a multiset, and a fuzzy index that finds the correct spelling of a misspelled tag. Both are created in a preprocessing step in node.js and serialized as JSON files. In the search step, for each item of the query set I want to first consult the fuzzy index to get the most likely correct tag and, if one exists, consult the inverted index and add the result set to a bag (a counted set). After doing this for all input tags, the contents of the bag, sorted by count in descending order, should give the best matching documents.
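The idea above could be sketched roughly as follows. This is only an illustration: the "fuzzy index" is replaced by a plain linear scan using Levenshtein distance, and all function names are hypothetical.

```javascript
// Classic dynamic-programming Levenshtein edit distance between two strings.
function editDistance(a, b) {
  let prev = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur.push(Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = cur;
  }
  return prev[b.length];
}

// Preprocessing step: map each tag to the ids of the documents carrying it.
function buildInvertedIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    for (const tag of doc.tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(doc.id);
    }
  }
  return index;
}

// Stand-in for the fuzzy index: closest known tag within maxDistance, or null.
function closestTag(index, query, maxDistance) {
  let best = null;
  let bestDistance = maxDistance + 1;
  for (const tag of index.keys()) {
    const d = editDistance(query, tag);
    if (d < bestDistance) { best = tag; bestDistance = d; }
  }
  return best;
}

// Search step: count matching tags per document, then sort by count descending.
function searchByTags(index, queryTags, maxDistance) {
  const bag = new Map();
  for (const q of queryTags) {
    const tag = closestTag(index, q, maxDistance);
    if (tag === null) continue;
    for (const id of index.get(tag)) bag.set(id, (bag.get(id) || 0) + 1);
  }
  return [...bag.entries()].sort((x, y) => y[1] - x[1]);
}

const tagIndex = buildInvertedIndex([
  { id: 1, tags: ['foo bar', 'baz'] },
  { id: 2, tags: ['baz', 'qux'] }
]);
console.log(searchByTags(tagIndex, ['fo bar', 'bazz'], 2)); // [ [ 1, 2 ], [ 2, 1 ] ]
```

The linear scan is fine for a few thousand tags; a real fuzzy index would only be needed if that step turned out to be too slow.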
My Questions
This seems like a common problem, is there already an implementation for it that I can reuse? I looked at lunr.js and fuse.js but they seem to have a different focus.
Is this a sensible approach to the problem? Do you see any obvious improvements?
Is it better to keep the fuzzy step separate from the inverted index or is there a way to combine them?
You should be able to achieve what you want using Lunr. Here is a simplified example (and a jsfiddle):
var documents = [{
  id: 1, tags: ["foo", "bar"]
}, {
  id: 2, tags: ["hurp", "durp"]
}]
var idx = lunr(function (builder) {
  builder.ref('id')
  builder.field('tags')
  documents.forEach(function (doc) {
    builder.add(doc)
  })
})
console.log(idx.search("fob~1"))
console.log(idx.search("hurd~2"))
This takes advantage of a couple of features in Lunr:
If a document field is an array, then Lunr assumes the elements are already tokenised. This would allow you to index tags that include spaces as-is, i.e. "foo bar" would be treated as a single tag (if this is what you wanted; it wasn't clear from the question).
Fuzzy search is supported, here using the query string format. The number after the tilde is the maximum edit distance. There is some more documentation that goes into the details.
The results will be sorted by which document best matches the query, in simple terms, documents that contain more matching tags will rank higher.
Is it better to keep the fuzzy step separate from the inverted index or is there a way to combine them?
As ever, it depends. Lunr maintains two data structures, an inverted index and a graph. The graph is used for doing the wildcard and fuzzy matching. It keeps separate data structures to facilitate storing extra information about a term in the inverted index that is unrelated to matching.
Depending on your use case, it would be possible to combine the two. An interesting approach would be a finite state transducer, so long as the data you want to store is simple, e.g. an integer (think document id). There is an excellent article about this data structure, which is similar to what is used in Lunr: http://blog.burntsushi.net/transducers/
I need to do a query where I can show only specific data using an 'AND' statement or equivalent to it. I have taken the example which is displayed in the Firebase Documentation.
// Find all dinosaurs whose height is exactly 25 meters.
var ref = firebase.database().ref("dinosaurs");
ref.orderByChild("height").equalTo(25).on("child_added", function(snapshot) {
  console.log(snapshot.key);
});
I understand this line is going to retrieve all the dinosaurs whose height is exactly 25, BUT, I need to show all dinosaurs whose height is '25' AND name is 'Dino'. Is there any way to retrieve this information?
Thanks in advance.
Actually, Firebase only supports filtering/ordering by one property. If you want to filter by more than one property, such as height and name in your example, you have to use composite keys.
There is a third-party library called Querybase which gives you some multi-property filtering capabilities. See https://github.com/davideast/Querybase
You cannot query by multiple keys.
If you need to sort by two properties your options are:
Create a hybrid key. In reference to your example, if you wanted to get all dinosaurs with name 'Dino' and height '25', you would create a hybrid name_height key which could look something like Dino_25. This will allow you to query for items with exactly the same value, but you lose the ability to order or filter by range (e.g. height less than x).
Perform one query on Firebase and the other client side. You can query by name on Firebase and then iterate through the results and keep the ones that match height 25.
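Both options can be sketched in a few lines. The helpers below are illustrative only: the property name "name_height" and both function names are hypothetical, and the results object stands in for what snapshot.val() would return from a query by name.

```javascript
// Option 1: build a composite value to store on each record, e.g. under a
// child property named "name_height" (hypothetical name).
function compositeKey(name, height) {
  return name + '_' + height;
}

// Option 2: query one property on the server, filter the other on the client.
// `results` stands in for the object returned by snapshot.val().
function filterByHeight(results, height) {
  return Object.keys(results || {}).filter(function (key) {
    return results[key].height === height;
  });
}

const dinosByName = {
  dino1: { name: 'Dino', height: 25 },
  dino2: { name: 'Dino', height: 4 }
};
console.log(compositeKey('Dino', 25));        // "Dino_25"
console.log(filterByHeight(dinosByName, 25)); // [ 'dino1' ]
```

With the composite value stored on each record, the query from the question would become ref.orderByChild("name_height").equalTo("Dino_25").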
Without knowing much about your schema I would advise you to make sure you're flattening your data sufficiently. Often I have found that many multi-level queries can be solved by looking at how I'm storing the data. This is not always the case and sometimes you may just have to take one of the routes I have mentioned above.
I have two classes, _User and Car. A _User will have a low/limited number of Cars that they own. Each Car has only ONE owner and thus an "owner" column that is a pointer to the _User. When I go to the user's page, I want to see their _User info and all of their Cars. I would like to make one call, in Cloud Code if necessary.
Here is where I get confused. There are 3 ways I could do this -
In _User have a relationship column called "cars" that points to each individual Car. If so, how come I can't use the "include(cars)" function on a relation to include the Cars' data in my query?!!
_User.cars = relationship, Car.owner = _User(pointer)
Query the _User, and then query all Cars with (owner == _User.objectId) separately. This is two queries though.
_User.cars = null, Car.owner = _User(pointer)
In _User have an array of pointers column called "cars". Manually inject pointers to cars upon car creation. When querying the user I would use "include(cars)".
_User.cars = [Car(pointer)], Car.owner = _User(pointer)
What is your recommended way to do this and why? Which one is the fastest? The documentation just leaves me further confused.
I recommend the 3rd option, and yes, you can ask to include an array. You don't even need to "manually inject" the pointers; you just add the objects to the array and they'll automatically be converted into pointers.
You've got the right ideas. Just to clarify them a bit:
A relation. User can have a relation column called cars. To get from user to car, there's a user query and then second query like user.relation("cars").query, on which you would .find().
What you might call a belongs_to pointer in Car. To get from user to car you'd have a query to get your user, and then you'd create a carQuery like carQuery.equalTo("owner", user)
An array of pointers. For small-sized collections, this is superior to the relation, because you can aggressively load cars when querying user by saying include("cars") on a user query. Not sure if there's a second query under the covers - probably not if parse (mongo) is storing these as embedded.
But I wouldn't get too tied up over one or two queries. Using the promise forms of find() will keep your code nice and tidy. There probably is a small speed advantage to the array technique, which is good while the collection size is small (<100 is my rule of thumb).
It's easy to google (or I'll add here if you have a specific question) code examples for maintaining the relations and for getting from user->car or from car->user for each approach.