How much JSON is too much JSON? - javascript

I am developing a bookmark site like Delicious. To provide a better and faster user experience, I grab all the bookmarks from the DB table and form a JSON object with all the bookmark information in it. E.g., for each bookmark I have an id, title, url, description, tags, etc. The JSON object is formed on the first page load; I then take the resulting JSON and use jQuery.each to style and inject the relevant HTML on the fly.
Right now I have no way to test this, so here is my question: imagining there is no limit on the number of bookmarks a user can save, what would be the effect of this structure on the browser (or what other problems might arise) if a user has, say, 2000 bookmarks, also considering that paging is not an option for this particular project?
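For reference, a minimal sketch of the rendering step described above; the #bookmark-list container and the exact markup are placeholders, and real code would also HTML-escape the values before concatenating them:
// Build one HTML string from the bookmark objects and inject it once,
// rather than appending per item (one DOM update instead of 2000).
function renderBookmarks(bookmarks) {
  var html = '';
  $.each(bookmarks, function (i, bm) {
    html += '<li class="bookmark">' +
            '<a href="' + bm.url + '">' + bm.title + '</a>' +
            '<p>' + bm.description + '</p>' +
            '</li>';
  });
  $('#bookmark-list').html(html);
}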

Probably controversial, but anyway: how can paging not be an option? When is it ever relevant to show 2k bookmarks at a time? I'd say never.
When you're returning that much data (it depends on how much text, of course) you're wide open to DDoS attacks. Imagine an attacker who gets hold of a URL returning several megabytes of JSON; it would not be that hard to sink your servers.
It would be nice to have some more information about your UI so we can analyze what data you really need.

Related

Save Value State for Public Sharing (Add to URL)

http://liveweave.com/xfOKga
I'm trying to figure out how to save code similar to Liveweave.
Basically, whatever you code, you click the save button and it generates a hash after the URL. When you go to this URL you can see the saved code. (I've been trying to learn this; I just keep having trouble finding the right sources, and my search results end up with references completely unrelated to what I'm looking for.)
I spent the past two days researching this and I've gotten nowhere.
Can anyone help direct me to a tutorial or article that explains this type of save feature thoroughly?
To understand the functionality, it is best to try and identify everything that is happening. Dissect this feature according to the technology that would typically be used for each distinguishable component. That dissected overview will then make it easier to see how the underlying technologies work together. I suspect you may lack the experience or nomenclature to see at a glance how a site like liveweave works or how to search for the individual pieces, so I will break it down for you. It will be up to you to research the individual components that I will name. Knowing this, here are the keys you need to research:
Note that without being the actual developer of liveweave, knowing all the backend technology is not possible, but intelligent guesses will suffice. The practice is all the same. This is a cursory breakdown.
1) A marked up page, with HTML, CSS, and JavaScript. This is the user-facing part of the application, where content can be typed, and how the user interacts with the application.
2) JavaScript to asynchronously (AJAX) submit the page's form to the backend for processing.
3) A backend programming/scripting language to process the incoming form. In the case of liveweave, the form is POSTed. It is also using PHP to process the form.
4) A database table with a column for each language (liveweave has HTML, CSS, and JavaScript). The current data from each textarea, submitted in the form and processed by PHP, is inserted into this table as a new row. Each row also gets a newly generated hash, stored alongside the data just inserted. A popular database is MySQL.
5) When the database insert is complete, the scripting language takes over again and sends its response back to the marked-up page (1). That page is waiting for a response from the backend, and JavaScript handles it. In the case of liveweave, the response is the latest hash to be used in the URL.
6) The URL magic happens with JavaScript. You want to look up JavaScript's latest History API, where methods like pushState will be used to update the URL in the browser without actually refreshing the page.
When a URL with a given hash is navigated to, the scripting language processes the request, grabs the hash, searches for the hash in the database table, finds a matching row, and populates the page's textareas with the data just found.
Throughout all this, there should be checks to avoid duplication and a multitude of exploits. This is also up to you to research.
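As a rough sketch of how steps 2, 5, and 6 could fit together (the /save.php endpoint and the response shape are invented for illustration; this is not liveweave's actual API):
// Send the editor contents to the backend, then put the returned hash
// into the URL without reloading the page.
function saveSnippet(html, css, js) {
  fetch('/save.php', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ html: html, css: css, js: js })
  })
    .then(function (res) { return res.json(); })
    .then(function (data) {
      // data.hash would be something like 'xfOKga'
      history.pushState({ hash: data.hash }, '', '/' + data.hash);
    });
}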
It should be noted that currently there are two comments for your question. Darren's link will indeed allow the URL to change, but it is a redirect, and not what you want. ksealey's answer is not wrong; that is one way of doing it, but it is not the most robust or scalable, and would not be the recommended approach for solving this.

What's the standard way to store formatted content in the database?

I have an application that involves storing and retrieving lots of user-formatted content using a WYSIWYG html editor. Kind of like how SO saves formatted questions and answers.
What's the standard approach to do this?
EDIT:
Just to clarify: I am not asking about the data type to store in the DB. Rather I am concerned about storing chunks of html tags with style information in the DB.
This is just text data. Usually a VARCHAR is best.
UPDATE:
Yes, if you want to support Unicode (which you probably do in this case) then make that an NVARCHAR.
As for the OP's update, you are imagining difficulties which don't really exist. HTML is textual data, so it goes into a text field. You do not want to separate the formatting from the text at all.
That is the answer, but it isn't the end of your concerns on this matter. The reason doing this bothers you is probably that databases hold structured data (all of the data is in named and typed columns) while this is unstructured content, meaning the data in this field is not stored in a DB-friendly manner. You should try to structure your data as much as possible because it lets you search quickly by field values. Here we are throwing whatever the user types into that field, and if we ever need to find data in it we'll have to scan the entire field. That is a very slow process, and to make things worse we aren't just searching through the text but also through the formatting around that text.
This is all true and not good, so we should avoid it as much as possible. If you can avoid allowing users to enter free-form text, then do so by all means. From that point you can apply HTML formatting to the data from your client application in a fast and consistent manner.
However, the basis of this question is that you want a field of unstructured content, and you are asking how to store it. That answer is pretty simple (even though I didn't get it 100% correct on the first try): use NVARCHAR.
Even though storing this unstructured content is not DB-friendly, it is sometimes website-friendly and a common practice in the situation you are describing. The thing to remember is that we want to avoid searching on this unstructured data, and we may need to go to fairly extreme measures to do so.
Many applications solve this slow-search problem by creating a separate table, parsing the text out of the HTML, and inserting each individual word (along with the foreign key for the original table's entry) into that other table to be searched later. Even if you do this, you'll still want to keep the original formatted text for display purposes.
I generally make this type of optimization Phase II because the site will function without such optimizations; it'll just be slower and that isn't going to even be noticed until the site has plenty of content to search through.
One other thing to note is that often this will not be HTML-formatted text; there are several other commonly used formats such as BBCode or Markdown. SQL doesn't care, though; to your SQL server this is all just text.
The title of the question could be stored in a VARCHAR and the question in a TEXT.
Here, have a look at the data types of the SQL Server: http://msdn.microsoft.com/en-us/library/ms187752.aspx

Detect the referral/s of Url/s using JavaScript or PHP from inside a Bookmarklet

Let's think out of the box!
Without any programming skills, how can you say/detect if you are on a web page that lists products, and not on the page that prints specific details of a product?
The bookmarklet is inserted using JavaScript right after the body tag of a website (eBay, Bloomingdales, Macy's, Toys'R'Us ...).
Now, my story is: (programming skills needed now)
I have a bookmarklet, and my main problem is detecting whether I am on a page that lists products or on the page that shows a single product's details.
The best way I could think of to detect that I am on a product's detail page is to detect the referral(s) of the current URL (maybe all the referrals, the entire click history).
Possible problem: a user adds the URL as a favorite, does not use my bookmarklet, and closes the browser; later the user opens the browser again, clicks the favorite link, and uses my bookmarklet. I think I can't detect the referral in this case. That's OK; not all cases are covered or coverable.
Can I detect the referral of this link using the cache in that case? (There are many browser cache systems involved here, I know.)
how can you say/detect if you are on a web page that lists products, and not on the page that prints specific details of a product
I'd set up Brain.js (a neural net implemented in JavaScript) and train it on a (necessarily broad and varied) sample set of DOMs, then pick a threshold product:details ratio to 'detect' (as near as possible) what type of page I'm on.
This will require some trial and error, but is the best approach I can think of (neural nets can get to "good enough" results pretty quickly - try it, you'll be surprised at the results).
No. You can't check history with a bookmarklet, or with any normal client side JavaScript. You are correct, the referrer will be empty if loaded from a bookmark.
The bookmarklet can, however, store the referrer the first time it is used, in a cookie or in localStorage; then the next time it is used, if the referrer is empty, it can check the cookie or localStorage.
That said, your entire approach to this problem seems really odd to me, but I don't have enough details to know if it is genius or insanity.
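A tiny sketch of that referrer fallback (the storage key is arbitrary):
// Remember the referrer the first time the bookmarklet runs, and fall
// back to the stored value when document.referrer is empty (e.g. the
// page was opened from a favorite).
if (document.referrer) {
  localStorage.setItem('bm_last_referrer', document.referrer);
}
var referrer = document.referrer || localStorage.getItem('bm_last_referrer') || '';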
If I were trying to determine whether the current page was a list or a details page, I'd either inspect the URL for common patterns or inspect the content of the page for common patterns.
Example of common URL patterns: many 'list' pages are search results, so the query string will have keys like "search=", "q=", "keywords=", etc.
Example of page content patterns: a product page will have only one "buy" or "add to cart" button, whereas a list page will have either no such button or many.
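Something along these lines, as a rough sketch; the query-string keys and button selectors are guesses that would need tuning per site:
// Crude list-vs-detail heuristic based on the two patterns above.
function guessPageType() {
  var looksLikeSearch = /[?&](search|q|keywords|query)=/i.test(location.search);
  var buyButtons = document.querySelectorAll(
    'button[name*="add-to-cart"], input[value*="Add to Cart"], .add-to-cart'
  ).length;
  if (buyButtons === 1) return 'detail';
  if (looksLikeSearch || buyButtons > 1) return 'list';
  return 'unknown';
}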
Why don't you use the URL? Then you can do something like this: http://www.le.url.com?pageid=10&type=DS and the code will be something like this:
<?php
if (isset($_GET['type']) && $_GET['type'] == 'DS') {
    // Do stuff related to Details Show
} else {
    // Show all the products
}
?>
And you can make the URL look like this with an .htaccess file:
http://www.le.url.com/10/DS
I would say your goal should first be for it to work for some websites. Then many websites and then eventually all websites.
A) Try hand coding the main sites like Amazon, eBay etc... Have a target in mind.
B) Something more creative might be to keep a list of all currency symbols and then detect whether a page has, say, ten of them scattered around. For instance, the $ symbol is found all over Amazon, but only when there are around 20 per page can you really say it is a product listing (this is a bad example; Amazon's pages are fairly crazy). Perhaps the currency symbols won't work, but I think you can generalize something similar: lots of currency symbols plus detection of a "grid"-type layout with things lined up in rows. You'll get lots of garbage, so you'll need good filtering. Data analysis is needed once you have something like this working algorithmically.
C) I think after B) you'll realize that your system might be better with parts of A). In other words you are going to want to customize the hell out of certain popular websites (or more niche ones for that matter). This should help fill the gap for sites that don't follow any known models.
Now, as far as tracking where the user came from, why not use a tracking-cookie type of concept? You could of course use IndexedDB or localStorage or whatever. In other words, always keep a reference to the last page by saving it on the current page. You could also keep a stack and push URLs onto it on every page. If you want to save it for some reason, just send that data back to your server.
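A sketch of that "stack of URLs" idea; the storage key and the cap of 50 entries are arbitrary choices:
// On every page the bookmarklet touches, push the current URL (and the
// referrer, when present) onto a history stack kept in localStorage.
function rememberPage() {
  var stack = JSON.parse(localStorage.getItem('bm_history') || '[]');
  stack.push({ url: location.href, referrer: document.referrer, at: Date.now() });
  localStorage.setItem('bm_history', JSON.stringify(stack.slice(-50)));
}
rememberPage();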
Detecting favorite clicks could involve detecting all AJAX traffic and analyzing it (although this might be hard). You should first do a survey to see what those calls typically look like; I'd imagine something like amazon.com/favorite/product_id would be fairly common. Also, you could try to detect the selector for the "favorite" button on the page and then add an onclick handler to detect when it is clicked.
I tried to solve each problem you mentioned. I don't think I understand exactly what you are trying to do.

Use AJAX or pre-load: dynamic changes of items in select element

Apologies in advance for a long question: I do want to give all the relevant information.
In our (quite large) web application, we have a generic code for entering addresses (there could be a number of different addresses: business address, users' address, online shop delivery address, etc.) The addresses can be anywhere in the world, although the site itself is in English (and for now we have no plans to change this aspect). The standard address has these fields:
Street address
City
State/County/Province
Postal/ZIP code
Country
Some fields are optional, of course (e.g. there are no postcodes in the Republic of Ireland, and many countries have no state/county/province division). The issue we're having is with the state/county/province field specifically: since an address can be anywhere in the world, we currently use <input type='text'/> for this field. However, users put anything they feel like into it, and we don't even have unified values for what they should enter (e.g. for Boston, Massachusetts, some users put MA, some put Mass, some put Massachusetts, some put Middlesex county, Ma, and so on, not even counting all the misspellings). This makes any statistics by geography almost useless.
To mitigate this issue, we're moving to a different way of entering addresses: the user must select the country first, then based on the country selection we will display a dropdown <select> element with the list of states, counties, provinces, etc. valid for that country. If the country doesn't have this division (as far as our system is aware), then we revert back to the plain text field. So far, so good.
Now, for the actual question. We have a table in a DB that contains this county/state/province/etc division per country. The volume of data is not large: at present, 7 countries with 262 counties/states/provinces across all of them (i.e. total 262 rows in the table). I'm sure this will grow, but not hugely. There are two ways to handle this:
Pre-load all this data, put it into global javascript variables and in the onchange of the dropdown for the country update the corresponding dropdown for the state/county/province.
Use AJAX in the onchange for the country dropdown to load the country-specific list from the database as/when it's needed.
Which option, in your opinion, is the better one (preferably with some reasoning as to why)?
I would pre-load the data: have all the data in a JavaScript file (as JSON, for instance), minify it, gzip it, and send it to the client. This is text data; it should not take much bandwidth.
I would use Ajax only for really dynamic items, or when you have so much data that you cannot load all of it up front (because the volume is large and the user will only ever use a subset of it).
If you are really worried about bandwidth (you have users that may not have a high-speed connection, users on mobile phones, etc.), you can detect a slow connection and fall back to Ajax.
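A minimal sketch of the pre-load option; the element ids and the shape of the embedded data are assumptions for illustration:
// Regions keyed by country code, embedded in the page at load time.
var REGIONS = {
  US: ['Alabama', 'Alaska', 'Arizona'], // ...and so on
  IE: []                                // empty: fall back to a text input
};

document.getElementById('country').onchange = function () {
  var regions = REGIONS[this.value] || [];
  var select = document.getElementById('region');
  select.innerHTML = '';
  regions.forEach(function (name) {
    var opt = document.createElement('option');
    opt.value = opt.textContent = name;
    select.appendChild(opt);
  });
  // If regions is empty, show the plain <input type="text"> instead.
};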
Assuming your audiences are desktop users, I would tend to go for the first option, as the size of data you're loading doesn't seem to be too large. You can even put more intelligence into it to improve the user experience, such as loading the data in batches, in order of popularity.
If you are targeting mobile users, then you may want to lean towards the second option.
In either case, you will want to try it out yourself with the slowest connection speed you expect, making sure you deliver an acceptable user experience even in the worst case. The chances are you will end up choosing an option somewhere between the two you listed: pre-load some data, and load the rest in the background and/or on demand.
I'd load the data once the page has loaded. Your page won't suffer from increased loading time, and the data will be ready when the onchange event fires.

Handling large grid datasets in JavaScript

What are some of the better solutions for handling large datasets (100K rows) on the client with JavaScript? In particular, if you have multi-column sort and search capabilities, how do you handle fetching (and pre-fetching) the data, client-side model binding (for display), and caching the data?
I would imagine a good solution would do some thoughtful work in the background. For instance, if the table initially displays N items, it might fetch 2N items, return the data to the user, and then go fetch the next 2N items in the background (even if the user hasn't requested them). As the user makes search/sort changes, it would throw the fetched data out (or maybe cache the initial base case) and repeat the same kind of work.
Can you share the best solutions you have seen?
Thanks
Use a jQuery table plugin like DataTables: http://datatables.net/
It supports server-side processing for sorting, filtering, and paging. And it includes pipelining support to prefetch the next x pages of records: http://www.datatables.net/examples/server_side/pipeline.html
Actually the DataTables plugin works 4 different ways:
1. With an HTML table, so you could send down a bunch of HTML and then have all the sorting, filtering, and paging work client-side.
2. With a JavaScript array, so you could send down a 2D array and let it create the table from there.
3. Ajax source - which is not really applicable to you.
4. Server-side, where you send data in JSON format to an empty table and let DataTables take it from there.
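For reference, a server-side setup looks roughly like this with the current DataTables API; the endpoint is a placeholder that would return JSON in the shape DataTables expects:
// Rough DataTables server-side configuration.
$('#grid').DataTable({
  processing: true,   // show a "processing" indicator during Ajax calls
  serverSide: true,   // sorting/filtering/paging handled by the server
  ajax: '/bookmarks/data',
  pageLength: 50
});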
SlickGrid does exactly what you're looking for. (Demo)
Using the AJAX data store, SlickGrid can handle millions of rows without flinching.
Since you tagged this with Ext JS, I'll point you to Ext.ux.LiveGrid if you haven't already seen it. The source is available, so you might have a look and see how they've addressed this issue. This is a popular and widely-used extension in the Ext JS world.
With that said, I personally think (virtually) loading that much data is useless as a user experience. Manually pulling a scrollbar around (jumping hundreds of records per pixel) is a far inferior experience to simply typing what you want. I'd much prefer some robust filtering/searching instead of presenting that much data to the user.
What if you went to Google and instead of a search box, it just loaded the entire internet into one long virtual list that you had to scroll through to find your site... :)
It depends on how the data will be used.
For a large dataset, where the browser's Find function was adequate, just returning a straight HTML table was effective. It takes a while to load, but the display is responsive on older, slower clients, and you never have to worry about it breaking.
When the client did the sorting and searching, and you're not showing the entire table at once, I had the server send tab-delimited tables through XMLHttpRequest, parsed them in the browser with rows = responseText.split('\n'), and updated the display with repeated calls to $('node').innerHTML = '...'. The JS engine can store long strings pretty efficiently. That ran a lot faster on the client than showing, hiding, and rearranging DOM nodes; creating and destroying DOM nodes on the fly turned out to be really slow. Splitting each line into fields on demand seems to work; I haven't experimented with that degree of freedom.
I've never tried the obvious pre-fetch & background trick, because these other methods worked well enough.
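A sketch of that split-and-innerHTML technique; the element id is a placeholder and, as described, escaping is assumed to happen server-side:
// Parse a tab-delimited response and write the whole table body in one
// innerHTML assignment.
function renderRows(responseText) {
  var rows = responseText.split('\n');
  var html = '';
  for (var i = 0; i < rows.length; i++) {
    if (!rows[i]) continue; // skip blank lines
    html += '<tr><td>' + rows[i].split('\t').join('</td><td>') + '</td></tr>';
  }
  document.getElementById('grid-body').innerHTML = html;
}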
Check out this comprehensive list of data grids and spreadsheets.
For filtering/sorting/pagination purposes you may be interested in the excellent Handsontable, or DataTables as a free alternative.
If you simply need to display a huge list without any additional features, Clusterize.js should be sufficient.
