How to read a very large RSS/Atom feed - JavaScript

I have a very large RSS feed (it may be 1 MB or more), so reading it takes a lot of time.
If I limit the number of items read (for example, to 4), that will not guarantee that I get all the data updated since the last time I read the feed, and I will lose some items.
What can I do?
I am using the Google AJAX Feed API to read the RSS/Atom feed.
Update:
I am using the Google AJAX Feed API to handle the RSS, then I store the data in my database.

Edit, possible specific solution:
If accessing a limited set of items from a feed does speed up the Google Feed API access, then simply keep asking for the most recent items until you encounter an item you have seen before (a sketch follows below). Unless the feed has been re-ordered, this will ensure all items have been seen (however, remember that feed items may be updated -- those changes would be lost).
If accessing a limited set of items does not have a performance benefit, then another approach, such as a server-side helper (or another feed accessor), needs to be considered.
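Here is the first approach as a minimal sketch, assuming the (now retired) Google AJAX Feed API loaded via www.google.com/jsapi, its feed.setNumEntries()/feed.load() interface, and a seenLinks lookup built from the item links already stored in your database. The feed URL and the batch-size cap are placeholders:

    // Minimal sketch: ask for a small batch of the newest items and grow the
    // batch until we hit an item we have already stored.
    google.load("feeds", "1");

    var FEED_URL = "http://example.com/feed.xml";   // placeholder feed URL
    var seenLinks = {};                             // e.g. { "http://example.com/item-1": true }

    function fetchNewEntries(batchSize, onDone) {
      var feed = new google.feeds.Feed(FEED_URL);
      feed.setNumEntries(batchSize);
      feed.load(function (result) {
        if (result.error) { return onDone([]); }

        var entries = result.feed.entries;
        var fresh = [];
        var sawKnownItem = false;

        for (var i = 0; i < entries.length; i++) {
          if (seenLinks[entries[i].link]) { sawKnownItem = true; break; }
          fresh.push(entries[i]);
        }

        // Every entry was new: older unseen items may exist beyond this batch,
        // so ask again with a bigger batch (arbitrary cap; the API also imposes
        // its own maximum).
        if (!sawKnownItem && entries.length === batchSize && batchSize < 100) {
          return fetchNewEntries(batchSize * 2, onDone);
        }
        onDone(fresh);
      });
    }

    google.setOnLoadCallback(function () {
      fetchNewEntries(4, function (newItems) {
        // insert newItems into your database here
      });
    });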
General information (not specific to this question):
The feed server should correctly handle the If-Modified-Since header. So, while it won't directly shrink the 1 MB+ download, you only need to perform the download when the feed has actually been modified.
Additionally, you can request just a Range of data from the server, if the server supports Range requests, and manually merge the data in. Even if the server doesn't support Range requests, you can abort the download once you have enough data to continue (this approach lets you inspect the inbound data and terminate at exactly the right point).
In either case, you are responsible for ensuring enough is read -- from there it may be easiest just to "fix up" the local XML and pass it to a normal feed processor.
And neither of the above is possible in plain client-side JavaScript :-)
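Purely as illustration of the two server-side ideas above, here is a minimal Node sketch (Node 18+, built-in fetch); the feed URL and the persisted Last-Modified value are placeholders:

    const FEED_URL = "http://example.com/feed.xml";       // placeholder
    let lastModified = "Mon, 01 Jan 2024 00:00:00 GMT";   // persisted from the previous fetch

    // Conditional GET: the server replies "304 Not Modified" if nothing changed,
    // so the 1 MB body is only transferred when the feed really was updated.
    async function fetchFeedIfChanged() {
      const res = await fetch(FEED_URL, {
        headers: { "If-Modified-Since": lastModified },
      });
      if (res.status === 304) return null;                // nothing new
      lastModified = res.headers.get("Last-Modified") || lastModified;
      return res.text();
    }

    // Range request: only the first `bytes` bytes are transferred if the server
    // honours Range (a 206 response); otherwise the whole body comes back anyway,
    // and the truncated XML still has to be "fixed up" before parsing.
    async function fetchFeedHead(bytes) {
      const res = await fetch(FEED_URL, { headers: { Range: "bytes=0-" + (bytes - 1) } });
      return { partial: await res.text(), isPartial: res.status === 206 };
    }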

Gosh, that would definitely be a whole archive. I know how difficult large XML files can be to parse!

Related

large JSON data persist across pages

I have a 40-50MB JSON object that I need to persist across to a different page.
This only needs to happen once (one transition), but I'm still way over the HTML5 localStorage limits. What other options do I have?
Unfortunately, that is too much data for most browsers to store. Even combining sessionStorage and localStorage will not get close.
There are a few options you can try though:
You can store the data on your own server (see the sketch after this list). This will depend on what web server/environment you are using.
You can use someone else's server to store the data. For example, you could use Google Drive's API. This does mean that your user needs a google account. You could also pay for a service like Amazon S3 to store it.
You could create a 'container' page, which loads and displays the pages, but keeps the session going. How exactly this works depends again on your environment.
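As a minimal sketch of the first option (your own server), here is one way to hand the object off before navigating and pull it back on the next page; the /stash endpoint and its token response are assumptions about your backend:

    // Page one: push the big object to the server, then navigate with a short token.
    async function stashAndNavigate(bigObject, nextUrl) {
      const res = await fetch("/stash", {                 // hypothetical endpoint
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(bigObject),
      });
      const { token } = await res.json();                 // server returns a short id
      window.location.href = nextUrl + "?data=" + encodeURIComponent(token);
    }

    // Page two: read the token back and re-download the object.
    async function restoreStashed() {
      const token = new URLSearchParams(window.location.search).get("data");
      const res = await fetch("/stash/" + encodeURIComponent(token));
      return res.json();
    }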
40-50 MB is too large for a browser, and it's even worse if mobile is involved. What you could do is split the data into chunks and keep some in sessionStorage, some in localStorage, and the rest on your server, so the server-held part is fast enough to load. You would have to join the chunks once everything is loaded. I wouldn't recommend this method, though.

Getting user info using Client ID

I have inserted the analytics.js tracking script into my code, and now I am trying to get user data such as medium, source, etc. using javascript and putting them into variables. Is there a way I can do this using Client Id?
I assume you mean getting the data in realtime for use in your website. That is not possible.
The Client ID is not exposed in the reporting interface by default; you'd need to store it in a custom dimension (see the sketch at the end of this answer).
There is a processing delay; report data may only be reliable the next day.
While there is the (less reliable) data from the Real Time API (which at least contains medium and source information), it does not support custom dimensions, so you could not use the Client ID as a query key.
Also, to retrieve data from the API you need to be authenticated, which the current users of your webpage are not. So you would need to set up some kind of server-side proxy that handles authentication for you.
Also there are API limits determining how many requests you can make in a given time frame. Even a small site would exhaust those requests pretty quickly.
So while in theory this sounds doable, it is not actually feasible for any real-life purpose.
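If the goal is simply to be able to look the user's session data up later (e.g. the next day) keyed by Client ID, a minimal analytics.js sketch looks like the following. The tracking ID and the dimension1 index are placeholders, and the custom dimension has to be created in the GA property settings first:

    // Record the Client ID into a custom dimension so it shows up in (next-day) reports.
    ga('create', 'UA-XXXXX-Y', 'auto');      // usually already part of your tracking snippet
    ga(function(tracker) {
      var clientId = tracker.get('clientId');   // the same Client ID stored in the _ga cookie
      ga('set', 'dimension1', clientId);        // 'dimension1' must exist in your GA property
      ga('send', 'pageview');
    });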

Data security highcharts

Just starting to use Highcharts. If I include data in an array within the javascript the data is available for anyone to download when they view the source. This would be the same when data is called, say, from a csv file. Is there a way of protecting the data against copying/download?
No, since HighCharts is a client-side JavaScript library, data available to it is also potentially available to the end user. There really is no way to "secure" it once the data reaches the user's browser, although you can use HTTPS, server-side authentication, etc to at least guarantee in principle that only the intended user receives the data.
If you need to visualize your data while keeping the actual raw data secure, the obvious solution is to render the data on the server and just (in the end) serve up an image or other static content to the user. But then you lose the nice, interactive charts.
You might be able to use Flash or Silverlight to retrieve the data, to make part of the process harder to reverse engineer. This is not securing anything, just making it a bit harder for a determined user.
On the other hand, a user can see the data anyway in the final chart. If they really want to download the data they could painstakingly identify each data point and create their own CSV file, right? You need to figure out what is good enough for your particular use case, and strike the appropriate balance.
Since Highcharts is a client-side JS library, I don't believe there is a way to get data to it securely. If you just make an AJAX call to fetch the data at runtime, a user can see that call and the response. And, as you said, you cannot just populate a variable in the source, as it is visible there.
Try the render charts on server feature:
http://www.highcharts.com/docs/export-module/render-charts-serverside
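As a minimal sketch of the serve-only-an-image idea: assuming an Express app with some session-based auth already in place, and a chart that has already been rendered on the server (e.g. via the export module linked above), only the finished image ever reaches the client. The paths and session fields are placeholders:

    const express = require("express");
    const app = express();

    // Only the rasterised chart leaves the server; the raw data never does.
    app.get("/charts/sales.png", (req, res) => {
      if (!req.session || !req.session.user) {            // whatever auth scheme you use
        return res.sendStatus(403);
      }
      res.sendFile("/var/charts/sales.png");              // pre-rendered on the server
    });

    app.listen(3000);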

Ideas on Protecting Web App data sources

I'm working on a new web app where a large amount of content (text, images, meta-data) is requested via an Ajax request.
No auth or login required for a user to access this.
My concern is that you could easily look up the data source URL and hit it directly outside the app to pull down large amounts of data. Then again, if you can do this, you could probably also scrape the static HTML pages elsewhere that have this content.
Are there any suggestions on methods to obfuscate, hide, or otherwise make it very difficult to access the data directly?
Example: web app HTML page contains a key that is republished every 30 min. On the server side the data is obfuscated based on this key. In order to get the data outside the app you'd need to figure out the data source but also the extra step of scraping the page for a key every 30 min.
I realize there is no 100% way to stop someone, but I'm talking more about deterrence.
Use sessions in your webapp. Make a note (e.g. database entry or some other mechanism which your server-side code can access) when a valid request for the first page is received and include code in the second page to exclude the data when processing a request without a corresponding session entry.
Obviously the specifics on how to do this will vary between languages, but most robust web platforms will support sessions, largely for this type of reason.
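A minimal sketch of that idea with Express and express-session (the endpoint names are made up for illustration):

    const express = require("express");
    const session = require("express-session");
    const app = express();

    app.use(session({ secret: "change-me", resave: false, saveUninitialized: false }));

    // Page one: serving the app marks the session as "real".
    app.get("/", (req, res) => {
      req.session.appLoaded = true;
      res.sendFile(__dirname + "/index.html");
    });

    // The Ajax data source refuses requests that did not come through the app first.
    app.get("/api/content", (req, res) => {
      if (!req.session.appLoaded) {
        return res.sendStatus(403);
      }
      res.json({ items: [] /* the real content, images, meta-data */ });
    });

    app.listen(3000);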
If you want to display real-time data and are concerned about scrapers, and if this is a big enough concern, then I suggest doing it with Flash instead of JS (AJAX). Have the data displayed within a Flash object. Flash can make real-time send/receive requests to the server just like AJAX, but the benefit of Flash is that the whole stage, data, code, etc. live within a Flash object, which cannot be scraped. The Flash object makes the request, you return the content as an encrypted string, then decrypt it within Flash and display it from there.
"Are there any suggestions on methods to obfuscate, hide, or otherwise make it very difficult to access the data directly?"
That answers your own question: if the data is worth getting, it will be obtained; obfuscating it merely makes it harder to find.
In the server-side script that processes the Ajax request and returns the data, you could check where the request came from.
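For example (again assuming Express, continuing the sketch above; keep in mind the Referer header is trivial to spoof, so this is only a deterrent):

    app.get("/api/content", (req, res) => {
      const referer = req.get("Referer") || "";
      if (referer.indexOf("https://yourapp.example.com/") !== 0) {   // placeholder origin
        return res.sendStatus(403);
      }
      res.json({ items: [] /* the real content */ });
    });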

JavaScript - Storing data during user interaction

I'm working on a web-based form builder that uses a mix of jQuery and PHP server-side interaction. While the user is building the form, I'm trying to determine the best method to store each of the form items before all the data is sent to the server. I've looked at the following methods:
Javascript arrays
XML document
Send each form item to the server side to be stored in a session
The good, the bad and the ugly
It depends on your application's functionality and requirements, but JavaScript would probably be the best way. You can use arrays, objects, or whatever you like in JavaScript. It's server independent, and it will preserve data over a long period of time as long as the client session stays present (the browser window doesn't close for whatever reason), and even that limitation can be quite easily worked around (check my last paragraph).
Using XML documents would be the worst solution because XML is not as well supported on the client side as you might think.
Server-side sessions are good and bad. They are fine if you store intermediate results from time to time, so that if the client session ends for whatever reason, the user doesn't lose all of their data. But the problem is that the session may just as well expire on the server.
If I were you, I'd use JavaScript storage and, if needed, occasionally send JSON-serialized results to the server and persist them there as well (depending on the business process, storing this data somewhere other than the session could be a better solution). I'd do the second part (the server-side combination) only if I knew that the user will most probably build forms in multiple stages, over a longer period of time and multiple client sessions, but it can be used for failure prevention as well. Anyway, JavaScript is your best bet, with possible server-side interaction.
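A minimal sketch of that recommendation, assuming jQuery on the client and a /save-draft.php endpoint (a placeholder) on the PHP side:

    // Keep the form definition in a plain JavaScript object while the user builds it.
    var formState = { title: "Untitled form", fields: [] };

    function addField(field) {
      formState.fields.push(field);      // e.g. { type: "text", label: "Name" }
    }

    // Occasionally push a JSON snapshot to the server so a lost client
    // session doesn't lose all of the user's work.
    function saveDraft() {
      $.ajax({
        url: "/save-draft.php",          // hypothetical PHP endpoint
        method: "POST",
        contentType: "application/json",
        data: JSON.stringify(formState),
      });
    }

    setInterval(saveDraft, 2 * 60 * 1000);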
Preserving data between pages on the client
Be aware that it's also possible to preserve data between pages on the client side. Check the sessvars library for this. Even if the page gets refreshed, or redirected and then returned to, all of this can be kept on the client side between those events like magic. A marvelous and rather tiny library that has saved my life several times, and it lessened application complexity considerably compared with what would otherwise have had to be implemented with something more complex.
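A tiny sketch of the basic sessvars usage: anything assigned to the global sessvars object is still there after a refresh or redirect within the same window (the formState shape is just an example):

    // Initialise once, or pick up whatever survived the last page load.
    sessvars.formState = sessvars.formState || { title: "Untitled form", fields: [] };

    // ...later, possibly after a refresh or redirect, simply read it back:
    var restored = sessvars.formState;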
I used TaffyDB to store data, and it's just wonderfully easy to implement.
Hope this helps you
You may want to check out PersistJS, which exposes a cross-browser persistent storage object. Of course, being persistent, data stored with this library survives sessions, not just page changes.
The latest version (0.2.0) is here – note the version in the above linked post is 0.1.0.
A combination of #1 (although I'd use objects, not arrays necessarily) and #3 would seem like a good approach. Storing the data locally in the browser (#1) makes it immediately accessible. Backing that up with session-based server-side storage defends you from the page being refreshed; you can magically restore the page just as it was.
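A minimal sketch of that combination (jQuery again; the /draft.php endpoint that returns the session-stored copy is an assumption about your server):

    var formState = null;

    // On load, rebuild the builder from the server's session copy if one exists.
    function restoreOnLoad() {
      $.getJSON("/draft.php").done(function (saved) {
        formState = (saved && saved.fields) ? saved : { title: "Untitled form", fields: [] };
        // re-render the builder UI from formState here
      });
    }

    $(restoreOnLoad);   // run once the DOM is ready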
