In my React-based single-page application, the page is divided into two panes.
Left Pane: Filter Panel.
Right Pane: Grid (table containing data that passes through applied filters)
In summary, the application looks very similar to amazon.com. By default, when a user hits the application's root endpoint (/) in the browser, I fetch the last 7 days of data from the server and show it inside the grid.
The filter panel has a couple of filters (e.g. a time filter to fetch data that falls inside a specified time interval, an ID filter to search data with specific IDs, etc.) and a search button in its header. Hitting the search button makes a POST call to the server with the selected filters in the form body; the server returns the data that matches those filters, and my frontend application displays the returned data inside the grid.
Now, when someone hits the search button in the filter panel, I want to reflect the selected filters in the query parameters of the URL, because that would let me share these URLs with other users of my website, so that they can see the filters I applied and see only the data matching those filters in the grid.
The problem is that if I use an HTTP GET with query parameters on search button click, I will end up breaking the application because of the limits different browsers impose on URL length.
Please suggest a correct way to create such URLs that will let me set the selected filters in the filter panel without causing any side effects in my application.
Possible solution: Considering that we cannot directly add plain strings to the query parameters because of the URL length limitations in different browsers (note: the specification does not limit the length of an HTTP GET request, but browsers implement their own limits), we could use something like a message digest or hash (convert input of arbitrary length into an output of fixed length) and save it in a DB so that the server can understand the request and serve the content back. This is just a thought; I am not sure whether it is an ideal solution to this problem.
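For illustration, a rough sketch of this idea, assuming a Node/Express backend; the /api/filters endpoints are made up and the in-memory Map stands in for a real DB:

const crypto = require('crypto');
const express = require('express');

const app = express();
app.use(express.json());

const store = new Map(); // stand-in for a real DB / cache

function hashFilters(filters) {
  // Canonicalize the (flat) filter object by sorting its keys, so the same
  // selection always produces the same digest.
  const canonical = JSON.stringify(filters, Object.keys(filters).sort());
  return crypto.createHash('sha256').update(canonical).digest('hex').slice(0, 12);
}

// The search button POSTs the full filter selection; the server answers with a short
// token the client can put in the URL, e.g. /search?f=<token>.
app.post('/api/filters', (req, res) => {
  const token = hashFilters(req.body);
  store.set(token, req.body);
  res.json({ token });
});

// Anyone opening a shared URL resolves the token back into the filter selection.
app.get('/api/filters/:token', (req, res) => {
  const filters = store.get(req.params.token);
  if (!filters) return res.status(404).end();
  res.json(filters);
});

app.listen(3000);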
Behavior of other heavily used websites:
amazon.com, newegg.com -> use hashed URLs.
kayak.com -> since they have very well-defined keywords, they use short forms like IN for INDIA, BLR for Bangalore, etc., and combine this with negation logic to further optimize the maximum URL length. Not verified, but this will presumably still break after a large selection of filters.
flipkart.com -> appends strings directly to the query parameters and breaks once the limit is breached. Verified this.
In response to #cauchy's answer, we need to make a distinction between hashing and encryption.
Hashing
Hashes are by necessity irreversible. In order to map the hash to the specific filter combination, you would either need to
hash each permutation of filters on the server for every request to try matching the requested hash (computationally intensive) or
store a map of hash to filter combination on the server (memory intensive).
For the vast majority of cases, option 1 is going to be too slow. Depending on the number of filters and options, option 2 may require a sizable map, but it's still your best option.
Encryption
In this scheme, the server would send its public key to the client, then the client could use that to encrypt its filter options. The server would then decrypt the encrypted data with its private key. This is good, but your encrypted data will not be fixed length. So, as more options are selected, you run into the same problem of indeterminate parameter length.
Thus, in order to ensure your URL is short for any number of filters and options, you will need to maintain a mapping of hash->selection on the server.
How should we handle permanent vs temporary links?
You mentioned in your comment above
If we use some persistent store to save the mapping between this hash and the actual filters, we would ideally want to segregate long-lived "permalinks" from short-lived ephemeral URLs, and use that understanding to efficiently expire the short-lived hashes.
You likely have a service on the server that handles all of the filters that you support in your application. The trick here is letting that service also manage the hashmap. As more filters and options are added/removed, the service will need to re-hash each permutation of filter selections.
If you need strong support for permalinks, then whenever you remove filters or options, you'll want to maintain the "expired" hashes and change their mapping to point to a reasonable alternative hash.
When do we update hashes in our DB?
There are lots of options, but I would generally prefer build time. If you're using a CI solution like Jenkins, Travis, AWS CodePipeline, etc., then you can add a build step to update your DB (a sketch of such a step follows the list below). Basically, you're going to...
1. Keep a persistent record of all the existing supported filters.
2. On build, check to see if there are any new filters. If so...
   - Add those filters to the record from step 1.
   - Hash all new filter permutations (just those that include your new filters) and store those in the hash DB.
3. Check to see if any filters have been removed. If so...
   - Remove those filters from the record from step 1.
   - Find all the hashes for permutations that include those filters and either...
     - remove those hashes from the DB (weak permalinks), or
     - point that hash to a reasonable alternative hash in the DB (strong permalinks).
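For illustration, a rough sketch of such a build step, assuming a Node script run from CI; every db.* function name here (getSupportedFilters, findHashesUsingFilter, remapHash, ...) is a placeholder for your own persistence layer:

// Hypothetical CI build step (e.g. invoked from Jenkins/CodePipeline): diff the filters the
// new build supports against the persisted record and update the hash DB accordingly.
const db = require('./db');                 // hypothetical persistence module
const filters = require('./filters.json');  // the filters the new build supports

async function syncFilterHashes({ strongPermalinks }) {
  const previous = await db.getSupportedFilters();   // record from step 1
  const added = filters.filter(f => !previous.includes(f));
  const removed = previous.filter(f => !filters.includes(f));

  // New filters: register them and pre-hash only the permutations that involve them.
  for (const f of added) {
    await db.registerFilter(f);
  }

  // Removed filters: expire or remap every stored hash whose selection referenced them.
  for (const f of removed) {
    const staleHashes = await db.findHashesUsingFilter(f);
    for (const h of staleHashes) {
      if (strongPermalinks) {
        await db.remapHash(h, await db.closestAlternative(h)); // keep old links working
      } else {
        await db.removeHash(h); // weak permalinks: old links simply stop resolving
      }
    }
    await db.unregisterFilter(f);
  }

  await db.saveSupportedFilters(filters);
}

syncFilterHashes({ strongPermalinks: true }).catch(err => { console.error(err); process.exit(1); });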
Let's analyse your problem and the possible solutions.
Problem: You want a URL which carries information about the applied filters, so that when you share that URL the recipient doesn't land on an arbitrary page.
Solutions:
1) Append the applied filters to the URL. To achieve this you will need to shorten the keys for the filter types and the filter values, so that the URL length doesn't grow too much per filter.
Drawback: This is not the most reliable solution, because as the number of filters increases the URL length has to increase; there is no way around that.
2) Append a unique key (hash) for the applied filters to the URL. To achieve this you will need changes on both the server and the client. On the client side you need an encoding algorithm which converts the applied filters into a unique hash. On the server side you need a decoding algorithm which converts that hash back into the applied filters. Then, whenever such a URL is hit, the client can make a POST API call that takes this hash and returns the array of applied filters, or the client alone can hold the logic to convert the hash.
Do all this in componentWillMount to avoid any side effects.
I think the 2nd solution is scalable and efficient in almost all cases.
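For illustration, a rough sketch of the client side of the 2nd solution; the /search?f=<hash> URL shape and the /api/filters/<hash> and /api/search endpoints are assumptions, not part of the question:

import React from 'react';

class SearchPage extends React.Component {
  constructor(props) {
    super(props);
    this.state = { filters: null, rows: [] };
  }

  componentWillMount() {
    // (componentDidMount / useEffect is the usual place for data fetching in current React;
    // this just mirrors the suggestion above.)
    const token = new URLSearchParams(window.location.search).get('f');
    if (!token) return;

    // Resolve the hash back into the concrete filter selection, then load the matching rows.
    fetch(`/api/filters/${token}`)
      .then(res => res.json())
      .then(filters => {
        this.setState({ filters });
        return fetch('/api/search', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(filters),
        });
      })
      .then(res => res.json())
      .then(rows => this.setState({ rows }));
  }

  render() {
    // Render the filter panel pre-populated from this.state.filters and the grid from this.state.rows.
    return null;
  }
}

export default SearchPage;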
Related
I'm working on a web application where a large set of data can be filtered using JavaScript. When a user selects filters, I want to update the URL of the page to reflect the selected filters so that the user can share that URL with someone else, and that person can load the page and my app will apply the same filters. I don't need the browser's back button to cycle through the previous filters that were selected.
I think I have two approaches here:
I can create a representation of the filters and add them to the fragment of the current page via window.location.hash. I can parse them on page load to see if there are any already set.
I can create a representation of the filters as query string params, and manipulate the URL using the history API. I would use the replaceState method.
Is there a reason to choose one over the other? Again, I want to emphasize that I'm not concerned with any routing or browser history manipulation. I just want to provide a way for someone to put certain params in the URL that my JS code will parse and apply as the filters.
Using the Vue router and maybe also Vuex for state management should help you save some time. There is also a little helper library for URL encoding/decoding --> qs.
To your question "history vs hash": that depends on your application, the system which hosts the application (e.g. part of a content management system with its own URL handling) and the meaning of the params.
History mode generates better-looking paths and gives you some more control, as long as you stay inside your application. But: since your path segments have no identifiers, the sort order matters.
Scenario: You have an application which can have three params:
/value1/value2/value3 means something different from /value1/value3/value2
With a query string you don't need to care about the sort order, as every value has its key:
key1=value1&key2=value2&key3=value3 is the same as key1=value1&key3=value3&key2=value2
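For illustration, a small sketch of the query-string variant with history.replaceState and the qs helper mentioned above; the filter shape is made up:

import qs from 'qs';

function writeFiltersToUrl(filters) {
  // e.g. { color: ['red', 'blue'], maxPrice: 50 } -> ?color[0]=red&color[1]=blue&maxPrice=50
  const query = qs.stringify(filters, { addQueryPrefix: true });
  window.history.replaceState(null, '', window.location.pathname + query);
}

function readFiltersFromUrl() {
  // The keys carry the meaning, so the order of the parameters does not matter.
  return qs.parse(window.location.search, { ignoreQueryPrefix: true });
}

// On page load:
const initialFilters = readFiltersFromUrl();
// ...apply initialFilters, and call writeFiltersToUrl(newFilters) whenever the user changes them.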
We are looking to use Algolia Search for an application. We like the convenience of Algolia but are stuck on one point. We have custom user groups, and each user group can only see a subset of the records. When we push records to Algolia, all the records show up. How do we pair that with our custom logic where specific users can see only specific records, so that the other records don't show up in their search results?
The best way to handle this use case is to encode the permission information directly inside your records (like a group or a user). You can for example add a permission array on your record:
"permission": ["group1", "user42"]
You then just need to add this permission attribute in the list of attributes for faceting and apply the restriction in your query via a facetFilters argument.
I would also recommend using the secured API key feature, which allows you to apply this restriction in a secure way even if the query comes from a browser or mobile app. An HMAC-SHA256 signature is computed in your backend from the API key and the restriction, to ensure no one can change the restriction.
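For illustration, a rough backend sketch with the algoliasearch Node client; the key names, index settings and the user/group values are placeholders:

const algoliasearch = require('algoliasearch');
const client = algoliasearch('YOUR_APP_ID', 'YOUR_ADMIN_KEY');

// 1) Each record carries its permissions, e.g.:
//    { objectID: '123', title: '...', permission: ['group1', 'user42'] }
//    and 'permission' is declared in attributesForFaceting in the index settings.

// 2) For a logged-in user, generate a secured, search-only key that bakes in the restriction.
//    The HMAC signature is computed server-side, so the client cannot widen the filter.
function securedKeyFor(user) {
  const filters = [`permission:user_${user.id}`, ...user.groups.map(g => `permission:${g}`)]
    .join(' OR ');
  return client.generateSecuredApiKey('YOUR_SEARCH_ONLY_KEY', { filters });
}

// 3) Hand securedKeyFor(user) to the browser / mobile app; it initialises its own Algolia
//    client with that key, and every query is implicitly restricted to the user's records.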
I am using the Nodejs Cassandra driver and I want to be able to retrieve the previous and next pages. So far the documentation shows the retrieval of the next page, which is saving the pageState from the previous page and passing it as a parameter. Sadly there is no info on how to navigate to the previous page.
As I see it there are two options:
Save each pageState and page as a key-value pair and use the pageState for the page that you want to navigate to.
Save the retrieved data in an array and use the array to navigate to the previous page. (I don't think that this is a good solution, as I'll have to store large chunks of the data in memory.)
Neither method seems like an elegant solution to me, but if I have to choose I'll use the first one.
Is there any way to do this out of the box using the Nodejs Cassandra driver?
Another thing is that in the documentation the manual paging is used by calling the eachRow function. If I understand it correctly, it gives you every row as soon as it is read from the database. The problem is that this is implemented in my API and I am returning the data for the current page in the HTTP response. So in order to do that, I'll have to push each row into a custom array and then return the array once the data for the current page has been retrieved. Is there a way to use execute with manual paging, as the above seems redundant?
Thanks
EDIT:
This is my data model:
CREATE TABLE store_customer_report (
store_id uuid,
segment_id uuid,
report_time timestamp,
sharder int,
customer_email text,
count int static,
first_name text,
last_name text,
PRIMARY KEY ((store_id, segment_id, report_time, sharder), customer_email)
) WITH CLUSTERING ORDER BY (customer_email ASC)
I am displaying the data in a grid, so that the user can navigate through it.
As I was writing this I thought of a way to do it without needing the "previous page" functionality, but nevertheless I think this is a valid case and it would be great if there were an elegant solution to it.
Sadly there is no info on how to navigate to the previous page.
That is correct, when you make a query and there are more rows, Cassandra returns a paging state to fetch the next set of rows, but not the previous ones. While what the paging state represents is abstracted away, it is generally a pointer to where to continue reading the next set of data, there really isn't a concept of reading the previous set of data (because you just read it).
Save each pageState and page as a key-value pair and use the pageState for the page that you want to navigate to.
This is the strategy I'd recommend too; of course, to get the paging states you actually have to make the queries.
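For illustration, a small sketch of that strategy with the Node.js driver; the fetchSize, the per-session cache shape and the keys object are assumptions:

// pageStates[n] holds the state needed to fetch page n (page 0 needs none).
// In a real app this cache would live per user/session, not at module level.
const pageStates = [undefined];

async function fetchPage(client, keys, pageNumber) {
  const result = await client.execute(
    'SELECT * FROM store_customer_report WHERE store_id=? AND segment_id=? AND report_time=? AND sharder=?',
    [keys.storeId, keys.segmentId, keys.reportTime, keys.sharder],
    { prepare: true, fetchSize: 20, pageState: pageStates[pageNumber] }
  );
  // Remember where the *next* page starts; "previous" is just re-using an earlier entry.
  pageStates[pageNumber + 1] = result.pageState;
  return result.rows;
}

// Caveat: you can only jump to page N after having walked pages 0..N-1 at least once,
// which is exactly the limitation described above.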
Is there any way to do this out of the box using the Nodejs Cassandra driver?
Not that I am aware of unfortunately, if you want to go back to previous pages you are going to need to track the state.
Is there a way to use execute with the manual paging as the above seems redundant?
Yep, you can provide pageState in the options parameter with execute as well and it will be regarded, i.e.:
client.execute('SELECT * FROM table', [], {pageState: pageState}, function (err, result) {
...
});
There is an option to be "more" manual; this was common before the paging features existed. You can store the last partition/clustering keys of the current page you are sending back in the response (possibly encrypted, depending on what it is; most likely regenerate the partition key server-side and only send/receive the clustering keys to avoid security issues). Then, for your "next page" request, you just start your CQL query from there. To go back a page, flip the ORDER BY clause on the clustering key and walk backwards. If you provide your schema it would be easier to give examples. That's all the pageState really does anyway.
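For illustration, a rough sketch of that manual approach against the store_customer_report table from the question; firstEmail/lastEmail are the clustering-key bounds of the page currently on screen:

const PAGE_SIZE = 20;

// Next page: continue forward from the last customer_email on the current page.
const nextPageQuery = `
  SELECT * FROM store_customer_report
  WHERE store_id=? AND segment_id=? AND report_time=? AND sharder=?
    AND customer_email > ?
  LIMIT ${PAGE_SIZE}`;

// Previous page: walk backwards by reversing the clustering order, then flip the rows
// back into ascending order before returning them to the grid.
const prevPageQuery = `
  SELECT * FROM store_customer_report
  WHERE store_id=? AND segment_id=? AND report_time=? AND sharder=?
    AND customer_email < ?
  ORDER BY customer_email DESC
  LIMIT ${PAGE_SIZE}`;

async function previousPage(client, keys, firstEmail) {
  const result = await client.execute(
    prevPageQuery,
    [keys.storeId, keys.segmentId, keys.reportTime, keys.sharder, firstEmail],
    { prepare: true }
  );
  return result.rows.reverse(); // back into ascending customer_email order
}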
I am building a webapp and have a few arrays that I would like to pass through the URL in order to make the results of my application easily sharable.
Is there an efficient way to do this? I know a lot of websites (like YouTube) use some sort of encoding to make their URLs shorter; would that be an option here?
Thanks in advance!
What I suspect you're asking is you have some page where the user can alter information, etc, and you want a way to create a URL on the fly with that information so it can easily be accessed again. I've listed two approaches here:
Use the query string. On your page you can have a button saying "save" that produces a URL with info about what the user did. For example, if I have a webpage where all I do is put my name in and select a color, I can encode that as http://my-website.com/page?name=John_Doe&color=red. Then, if I visit that link, your page could access the query object in JavaScript and load a page with the name and color field already set.
An approach for the "YouTube-style" URLs would be to create a hash of the relevant information corresponding to the page. For example, suppose I were creating a service for users to store plaintext files, and these files have the following attributes: title, date, name, and body. We can create a hash of the string: hash_string = someHashFunction(title+date+name).
Of course, this is a very naive hashing scheme, but something like this may be what you are looking for. Following this, your URL would be something like http://my-website.com/hash_string. The key here is not only creating these URLs, but having a means to route requests on the server side to the page corresponding to the hash_string.
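For illustration, a tiny sketch of that naive scheme in Node; the truncated SHA-256 stands in for someHashFunction:

const crypto = require('crypto');

function makeHashString({ title, date, name }) {
  return crypto.createHash('sha256')
    .update(title + date + name)
    .digest('hex')
    .slice(0, 10); // short, YouTube-style id
}

// Store { hashString -> record } server-side and route GET /:hashString to the matching page.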
I'm building a node.js application that opens up a connection to the Twitter Streaming API (v1.1)
I would like to filter multiple keywords (hashtags & words) as separate queries. My original idea was to have multiple public streams.
However, I understand that I can only have one open connection to the Twitter streaming api per application and per IP address and that Twitter encourages us to come up with creative solutions to get what we want.
So my question is this:
If I stream with no filters, for example using statuses/sample (which I believe is 1%), and use custom JavaScript to filter the output, would I get the same tweets as if I used the API's filtering (i.e. track='twitter')?
Edit: I have created a diagram explaining this:
As you can see, I want to know whether the two outputs will be the same. I suspect that they won't be, because although both apply effectively the same filter, one source is a 1% sample, and maybe the other source is a 100% sample that only delivers 1% of the tweets drawn from it.
So can someone please clarify if both outputs are the same?
Thank you.
According to the Twitter streaming API rules, if the keywords that you track don't exceed 1% of the whole global traffic, you will receive all the data (some tweets might be lost due to network issues etc., but that is not significant). This is called the garden hose (the firehose is a special stream which gives you all the data, but it is offered as a paid service through third parties such as http://datasift.com/).
So if a tweet is filtered through public stream then it would be part of your custom filter too unless your keyword set is too broad.
By using custom filters you can track multiple search keywords, and if you miss some data because your keyword set is too broad, Twitter sends a track limitation notice indicating how much data you are missing.
My suggestion would be to use a custom filter and analyze what you get from the stream versus what you get as a result for the same keywords from Twitter. When you start getting track limitation notices from Twitter, it is time to split your keyword set into chunks and stream through different streamers running on different machines.
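For illustration, a hedged sketch using the community twit package against the v1.1 streaming API (any other client with equivalent calls would do); credentials and keywords are placeholders:

const Twit = require('twit');

const T = new Twit({
  consumer_key: 'CONSUMER_KEY',
  consumer_secret: 'CONSUMER_SECRET',
  access_token: 'ACCESS_TOKEN',
  access_token_secret: 'ACCESS_TOKEN_SECRET',
});

// One connection, several keywords: statuses/filter accepts a comma-separated track list.
const stream = T.stream('statuses/filter', { track: 'keyword1,keyword2,#hashtag1' });

stream.on('tweet', tweet => {
  // Route each tweet to the "query" it matched with your own JavaScript.
});

stream.on('limit', msg => {
  // Track limitation notice: this tells you how many matching tweets were not delivered.
  // When these start appearing, split the keyword set across separate streamers/machines.
});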
The details of filter streaming are below (taken from the official website https://dev.twitter.com/docs/api/1.1/post/statuses/filter):
Returns public statuses that match one or more filter predicates. Multiple parameters may be specified which allows most clients to use a single connection to the Streaming API. Both GET and POST requests are supported, but GET requests with too many parameters may cause the request to be rejected for excessive URL length. Use a POST request to avoid long URLs.
The default access level allows up to 400 track keywords, 5,000 follow userids and 25 0.1-360 degree location boxes. If you need elevated access to the Streaming API, you should explore our partner providers of Twitter data here.
I would like to answer my question with the results of my findings.
I tested both side by side in the same time frame and concluded that the custom filter method, whilst it supports multiple filters, does not provide enough tweets to create an interesting enough visualisation.
I think the only way to get something more interesting with concurrent filters is to look at other methods, but I am wondering whether it is possible at all. Maybe with a third party.
I have attached a screenshot of the visualisation tracking 'barackobama'. The left is the custom filter, the right is statuses/filter.
The statuses/filter API operates on all tweets, not just those returned by statuses/sample. You can tell by looking at the tweet IDs: sample tweets all come from a specific time window, so from the millisecond-resolution creation time you can definitely tell that filter returns tweets outside of sample.
For more details about getting creation time from tweet id and the time window on sample tweets, consult this post: http://blog.falcondai.com/2013/06/666-and-how-twitter-samples-tweets-in.html
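For illustration, a small sketch of recovering the creation time from a post-2010 ("snowflake") tweet id, which is what the comparison above relies on; 1288834974657 is the Twitter snowflake epoch in milliseconds:

function tweetIdToDate(idStr) {
  const id = BigInt(idStr);
  // The top bits of the 64-bit id are a millisecond offset from the snowflake epoch.
  const ms = (id >> 22n) + 1288834974657n;
  return new Date(Number(ms));
}

// Example usage: tweetIdToDate(tweet.id_str) gives the moment that tweet was created,
// letting you bucket sample vs. filter tweets by time window.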