I'm using Cloudant and I have created a search index. However, I'd like the index to return documents matching the exact term I'm querying, i.e. the documents that have the specific date I choose.
1) I have created a Cloudant database and loaded it with some data.
2) I have created a search index.
3) Node set-up.
4) And the function content.
I expected to see the whole document for this exact "ts" value, but I got this:
I have been struggling with this for a few days and can't seem to get it working. I am sure it's just a newbie issue.
Many thanks in advance.
A search index uses the Apache Lucene library for text pre-processing and indexing. It's designed for chopping up sentences into words and words into stemmed tokens for "free text" search, i.e. finding the documents that best match a multi-word phrase. You can optionally choose the type of stemming that is performed by specifying the "analyzer" when creating the index, that is, the text pre-processing algorithm used to break up the strings.
If you want to keep the string intact, then choose the "keyword" analyzer:
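For illustration only, here's a rough sketch of such an index created by PUTting a design document through the Cloudant HTTP API (the account, database, design-document and index names are placeholders, authentication is omitted, and ts is the field from your question):

fetch('https://ACCOUNT.cloudant.com/mydb/_design/app', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    indexes: {
      byTs: {
        // "keyword" keeps the whole string as a single token instead of splitting it
        analyzer: 'keyword',
        index: 'function (doc) { if (doc.ts) { index("ts", doc.ts, { store: true }); } }'
      }
    }
  })
});

// an exact-match query then looks like:
// GET /mydb/_design/app/_search/byTs?q=ts:"2019-05-01T10:00:00"&include_docs=true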
You may also want to investigate using Cloudant Query, whose type=json indexes would not pre-process your timestamp string.
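And a similarly rough sketch of the Cloudant Query route, with the same placeholder names: create a type=json index on ts, then POST a selector to _find that matches the exact string:

// 1) create a JSON index on the ts field
fetch('https://ACCOUNT.cloudant.com/mydb/_index', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ index: { fields: ['ts'] }, type: 'json' })
});

// 2) find documents whose ts equals the chosen value, untouched by any analyzer
fetch('https://ACCOUNT.cloudant.com/mydb/_find', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ selector: { ts: '2019-05-01T10:00:00' } })
})
  .then(function (res) { return res.json(); })
  .then(function (result) { console.log(result.docs); });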
Related
We have a MongoDB database in a development environment. There are a lot of collections that contain names of people. What we want to do is the following:
mask the names in each collection, the fields need to be updated directly in the database, cannot run them through some external pipeline
once masked, it is ok if we are unable to retrieve the original names (so one-way masking)
every unique name should result in the same mask
the masking script can be run on the mongodb cli or a MongoDB gui like Studio3T
I was thinking of maybe using MD5 or SHA, but I am not sure if either is available to use directly in mongo operations like update or even in javascript without external libraries.
Also, since MD5 always produces the same hash for the same input, and since we will not be masking the field name, anyone who got access to a document could feed typical names into the algorithm until the hash matched and so figure out the name. But I think we may be able to live with this.
An alternative I was thinking of was, to loop through the unique names we have, and create a map from names to UUIDs. Then, go through each collection and use this map to update the names with the UUIDs. The problem with this is that we'll need to keep this mapping dictionary for when we receive additional documents for an existing person.
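For what it's worth, the hashing approach should not need any external libraries: the legacy mongo shell exposes hex_md5() as a built-in (the newer mongosh does not), so a deterministic one-way mask can be applied directly in the database. A minimal sketch, assuming a collection called people with a name field:

// the same input name always produces the same masked value,
// and the update happens directly in the database
db.people.find({ name: { $exists: true } }).forEach(function (doc) {
  db.people.updateOne(
    { _id: doc._id },
    { $set: { name: hex_md5(doc.name) } }
  );
});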
I am having trouble with Firebase using the Cloud Firestore .where query.
I want to query a big collection of documents (in my case posts) but I only want to query the posts in which the groupId matches any of the groups that the user is in. The reason for this is that I want to query a combined feed for the user with all the latest relevant data (using orderBy and limit).
I know that I can use array-contains, so I could, for instance, query all of the posts where the user is a member:
firebase.db.collection('posts').where('members','array-contains',firebase.uid)
This would work if I decided to keep track of the members in a group. The problem is that if the members of a group changed, I would have to loop through all posts and change the array of members (not really good practice). It would be better for each post to contain the id of the group it was posted in.
So let's say the user has an array containing all the groups he is in
user.groups = ['companyGroup', '{id}', '{id2}']
I would then like to query the whole posts collection and get all the documents where the field groupId matches any of the values in user.groups, something like this:
firebase.db.collection('posts').where('groupId','==',[any of user.groups])
or maybe the reverse:
firebase.db.collection('posts').where(user.groups,'array-contains','groupId')
^ I have not tried this one, but according to the docs I am fairly certain it doesn't work:
The where() method takes three parameters: a field to filter on, a comparison operation, and a value. The comparison can be <, <=, ==, >, >=, or array_contains.
Is there a possible way to do something like this? I can't really query multiple locations at once and combine them, because that defeats the purpose of being able to limit the data and order it by fields. I understand that I could put a new collection called feed under every user and, for every post, use a Firebase function that pushes the post (only the id and latestActivity) to the feed of the relevant members. But then, as soon as that post changes (I am going to use a field called latestActivity to order the data by relevance, and posts can also be deleted), I would need to loop through all docs under every affected user and change the value or delete the doc.
Any ideas are greatly appreciated!
Currently, there is no way to pass an array of ids to the where() function and expect to get all the documents that correspond to each particular id:
firebase.db.collection('posts').where('groupId','==',[any of user.groups])
The option that you have is to store in the array either the ids as strings, or references (paths to the group documents). There is no real advantage to storing references rather than strings, so it's up to you to decide which one you feel more comfortable with.
To get all the group documents, read the array that contains those ids/references and create a separate database request for each one. Unfortunately, there is no other way to get those documents at once using a single query.
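For illustration, a rough sketch of what that could look like with the JavaScript SDK, following the firebase.db shorthand from the question (the latestActivity ordering, the limit of 10 and the assumption that latestActivity is stored as a number are all mine; Firestore may also prompt you to create a composite index for this combination):

// one query per group id, merged and re-sorted client-side,
// since each query is ordered independently
var queries = user.groups.map(function (groupId) {
  return firebase.db.collection('posts')
    .where('groupId', '==', groupId)
    .orderBy('latestActivity', 'desc')
    .limit(10)
    .get();
});

Promise.all(queries).then(function (snapshots) {
  var posts = [];
  snapshots.forEach(function (snap) {
    snap.forEach(function (doc) {
      posts.push(Object.assign({ id: doc.id }, doc.data()));
    });
  });
  posts.sort(function (a, b) { return b.latestActivity - a.latestActivity; });
  // posts now holds the combined feed across all of the user's groups
});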
However, creating extra database calls doesn't mean that fetching, let's say, 3 documents will be 3x slower than fetching one document. You can make some tests yourself.
To quote Frank van Puffelen: "Firebase has a history of performing fine in such cases, since it pipelines the requests."
Use array-contains; it works perfectly.
firebase.db.collection('posts').where(user.groups,'array-contains','groupId')
It works pretty well for me. You should try this.
When looking at products like DnD Insider and the Kindle app, users can quickly search for matching text strings in a large structure of text data. If I were to make a web application that allowed users to quickly search a "rulebook" (or similar text) for a matching entry and pull up the data to read, how should I organize the data?
I don't think it's a good idea to put all the data into memory. But if I stored it in some kind of database, what would be a good way to search the database and retrieve the appropriate matching entry?
So far, I believe I'm going to use the Boyer-Moore algorithm to actually do the searching. I can put the various sections of rule-text into different database entries. The user search will prioritize searching section titles over section body text. Since the text will be static and not user-editable, perhaps an array to store every word would work?
Typically some kind of inverted index is used for this purpose: https://en.wikipedia.org/wiki/Inverted_index
Basically this is a map from each word to a list of the places in which it appears. Each "place" could be a (document ID, occurrence count), or something more precise if you want to support phrase searching or if you want to give more weight to matches in titles, etc.
Search results are usually ranked with some variant of tf-idf: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
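As a concrete sketch of both ideas (the section shape and field names here are made up for illustration), an inverted index plus a crude tf-idf style ranking can fit in a few lines of JavaScript:

// build a map from each lowercased word to a map of sectionId -> occurrence count
function buildIndex(sections) {                      // sections: [{ id, title, body }]
  const index = new Map();
  for (const { id, title, body } of sections) {
    const words = (title + ' ' + body).toLowerCase().match(/[a-z0-9']+/g) || [];
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Map());
      const postings = index.get(word);
      postings.set(id, (postings.get(id) || 0) + 1);
    }
  }
  return index;
}

// score sections against a multi-word query: term count times inverse document frequency
function search(index, totalSections, query) {
  const scores = new Map();                          // sectionId -> score
  for (const word of query.toLowerCase().split(/\s+/)) {
    const postings = index.get(word);
    if (!postings) continue;
    const idf = Math.log(totalSections / postings.size);   // rarer words weigh more
    for (const [id, count] of postings) {
      scores.set(id, (scores.get(id) || 0) + count * idf);
    }
  }
  return [...scores.entries()].sort(function (a, b) { return b[1] - a[1]; });
}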
I regularly receive emails from the same person, each containing one or more unique identifying codes. I need to get those codes.
The email body contains a host of inconsistent email content, but it is these strings that I am interested in. They look like...
loYm9vYzE6Z-aaj5lL_Og539wFer0KfD
FuZTFvYzE68y8-t4UgBT9npHLTGmVAor
JpZDRwYzE6dgyo1legz9sqpVy_F21nx8
ZzZ3RwYzE63P3UwX2ANPI-c4PMo7bFmj
What the strings seem to have in common is that they are all 32 characters in length and all composed of a mixture of uppercase letters, lowercase letters, numbers and symbols. But a given email may contain none, one or several, and the strings will be in unpredictable positions, not on adjacent lines as above.
I wish to make a Zap workflow in Zapier, the linking tool for web services, to find these strings and use them in another app - ie. whenever a string is found, create a new Trello card.
I have already started the workflow with Zapier's "Gmail" integration as a "trigger", specifically a search using the "from:" field corresponding to the regular sender. That's the easy part.
But the actual parsing of the email body is foxing me. Zapier has a rudimentary email parser, but it is not suitable for this task. What is suitable is using Zapier's own "Code" integration to execute freeform code - namely, a regular expression to identify those strings.
I have never done this before and am struggling to formulate working code. Zapier Code can take either Python (documentation) or Javascript (documentation). It supports data variables "input_data" (Python) or "inputData" (Javascript) and "output" (both).
See below how I insert the Gmail body into "body" for parsing...
I need to use the Code box to construct a regular expression to find each unique identifier string and output it as input to the next integration in the workflow, ie. Trello.
For info, in the above screengrab, the existing "hello world" code in the box is Zapier's own test code. The fields "id" and "hello" are made available to the next workflow app in the chain.
But I need to do my process for all of the strings found within an email body - ie. if an email contains just one code, create one Trello card; but if an email contains four codes, create a Trello card for each of the four.
That is, there could be multiple outputs. I have no idea how this could work, since I think these workflows are only supposed to accommodate one action.
I could use some help getting over the hill. Thank you.
David here, from the Zapier Platform team.
I'm glad you're showing interest in the code step. Assuming your assumption (exactly 32 characters) always holds true, this should be fairly straightforward.
First off, the regex. We want to look for characters that are letters, numbers, underscores, or hyphens. Luckily, JavaScript's \w is equivalent to [A-Za-z0-9_], which covers everything in your examples besides the -, which we'll include manually. Next, we want strings of exactly 32 characters, so we'll ask for that length. We also want to add the global flag so we find all matches, not just the first. That gives us the following:
/[\w-]{32}/g
You've already covered mapping the body in, so that's good. The javascript code will be as follows:
// stores an array of any length (0 or more) with the matches
var matches = inputData.body.match(/[\w-]{32}/g)
// the .map function executes the nameless inner function once for each
// element of the array and returns a new array with the results
// [{str: 'loYm9vYzE6Z-aaj5lL_Og539wFer0KfD'}, ...]
return (matches || []).map(function (m) { return {str: m} })
Here, you'll be taking advantage of an undocumented feature of code steps: when you return an array of objects, subsequent steps are executed once for each object. If you return an empty array (which is what'll happen if no keys are found), the zap halts and nothing else happens. When you're testing, there'll be no indicator that anything besides the first result does anything. Once your zap is on and runs for real though, it'll fan out as described here.
That's all it takes! Hopefully that all makes sense. Let me know if you've got any other questions!
I am building an HTML5 single page app, and I want to allow the user to keep the current application state for later use. I want to achieve this by creating a link URL to my page, with a specially crafted query part. When called again, with the URL, the application would parse the query part and recreate the stored state.
Now, part of the state is a list whose items are numerical values with associated text. Neither the floating-point numerical values nor the text are required to be unique.
Like this:
4.54 first
12.1 another
12.1 more
34 more
My intent is to create a URL like so:
www.myappdomain.com/SinglePage.html?4.54=first&12.1=another&12.1=more&34=more
Is this a legal URL? Given proper encoding of the text, will this work in the wild?
I have read What Every Developer Should Know About URLs by Alan Skorkin, which I can generally recommend about URLs and this Answer about URL character usage.
To me, doing it that way seems legal but I still feel a little uncomfortable, since I have not found information about the possibly non-unique keys I might have and about numbers as keys in query parts in general.
Edit: I've got it to work, see below (tell me if the link ever breaks):
http://quir.li/player.html?media=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D0VqTwnAuHws&title=What%20makes%20you%20beautiful&artist=The%20piano%20guys%20covering%20One%20Republic&album=Youtube&6.49=Intro&30.12=Knocking%20part&46.02=Real%20playing&51.5=Piano%20forte&93.32=Stringified&123.35=Vocals&139.38=Key%20cover%20jam&150.16=Good%20morning%20sky&173.96=Final%20chord
This is a legal URL by the URI specification -- https://www.rfc-editor.org/rfc/rfc3986. However, whether this will work in the wild is a different issue since the specification only defines the generic syntax of URIs.
Since there is no specification on what should be done for duplicate keys in the query part (see Authoritative position of duplicate HTTP GET query keys) different software frameworks will treat such URIs differently. However, most frameworks will correctly detect duplicate keys as multiple values with the same key and group such values into a single array/list of values for the given key (rather than using the last value with the given key and discarding all the previous values for that key). Using numbers as keys is also OK since keys are parsed as text strings. In short: you should be safe.
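For example, the browser's standard URLSearchParams API keeps every duplicate value; a small sketch using the query string from the question:

var params = new URLSearchParams('4.54=first&12.1=another&12.1=more&34=more');
params.getAll('12.1');   // ['another', 'more'], both values are preserved
params.get('4.54');      // 'first', the numeric key is treated as an ordinary string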
I don't think this is a good way to pass data. Query string parameters are generally regarded as parameters that can be accessed by their name; here you are actually passing data as the parameter names. While from a technical point of view this can be done, it makes your code somewhat obfuscated. I would pass this data in a single parameter using JSON encoding.
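A small sketch of that idea (the list parameter name and the t/label keys are made up for illustration):

// pack the whole list into one JSON-encoded query parameter
var entries = [
  { t: 4.54, label: 'first' },
  { t: 12.1, label: 'another' },
  { t: 12.1, label: 'more' },
  { t: 34, label: 'more' }
];
var url = 'http://www.myappdomain.com/SinglePage.html?list=' +
  encodeURIComponent(JSON.stringify(entries));

// and on page load, reading the state back
var restored = JSON.parse(new URLSearchParams(window.location.search).get('list') || '[]');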
This post suggests that there is no spec which says that non-unique keys are invalid.
Authoritative position of duplicate HTTP GET query keys
I can't seem to find anything concrete about number keys.
However, this might be a workaround if you don't want to use non-unique numeric keys for any reason: use some basic encoding to map numbers to strings. Something basic could be 1-a, 2-b, 3-c, 4-d ... 9-i, 0-j. And '.' could be 'k' (in case there is no spec about whether '.' is a legal character in a URL parameter key).
Then, e.g., 21.3 would encode to bakc. You can also add a number at the end of the encoded key to ensure that keys are unique. These numbers would be ignored while decoding (or could help differentiate between the parameters). Then the first 21.3 would encode to bakc1, the next to bakc2, etc.
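A tiny sketch of that encoding, with the running counter appended to keep keys unique (the mapping is exactly the one described above):

var digitToLetter = { '1': 'a', '2': 'b', '3': 'c', '4': 'd', '5': 'e',
                      '6': 'f', '7': 'g', '8': 'h', '9': 'i', '0': 'j', '.': 'k' };
var counter = 0;
function encodeKey(value) {
  var letters = String(value).split('').map(function (ch) { return digitToLetter[ch]; }).join('');
  counter += 1;
  return letters + counter;   // e.g. 21.3 -> 'bakc1' the first time, 'bakc2' the next
}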