How to read a large file (>1GB) in JavaScript? - javascript

I use AJAX $.get to read a file from a local server. However, the page crashes because my file is too large (>1GB). How can I solve this problem? Are there other solutions or alternatives?
$.get("./data/TRACKING_LOG/GENERAL_REPORT/" + file, function(data){
console.log(data);
});

A solution, assuming that you don't have control over the report generator, would be to download the file in multiple smaller pieces using Range headers, process each piece, extract what's needed from it (I assume you'll be building some HTML components based on the report), and move on to the next piece.
You can tweak the piece size until you find a reasonable value: one that doesn't make the browser crash, but also doesn't result in a large number of HTTP requests.
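For illustration, here is a minimal sketch of that approach using fetch and the Range request header. It assumes the server honours range requests (answering with 206 Partial Content); the piece size and the processPiece callback are placeholders, not something from the original question.

async function downloadInPieces(url, pieceSize, processPiece) {
    let start = 0;
    while (true) {
        // Ask the server for just the next byte range of the file
        const response = await fetch(url, {
            headers: { Range: "bytes=" + start + "-" + (start + pieceSize - 1) }
        });
        if (!response.ok) {
            throw new Error("Unexpected HTTP status " + response.status);
        }
        const piece = await response.text();
        if (piece.length === 0) break;       // nothing left to read
        processPiece(piece);                 // extract what's needed, then let the piece be garbage-collected
        if (piece.length < pieceSize) break; // a short piece means the end of the file (assumes single-byte characters)
        start += pieceSize;
    }
}

// Example usage: 10MB pieces, just logging the size of each one
downloadInPieces("./data/TRACKING_LOG/GENERAL_REPORT/" + file, 10 * 1024 * 1024,
    piece => console.log("received", piece.length, "characters"));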
If you can control the report generator, you can configure it to generate multiple smaller reports instead of a huge one.

Split the file into a lot of smaller files, or give a set of users FTP access. I doubt you'd want too many people downloading a gigabyte each off your web server.

Related

How can I suppress a 404 error when a file is not found?

I have a list of webpages: example.com/object/140, example.com/object/141, example.com/object/142, ...
and each page should have a particular background image: example.com/assets/images/object/140.jpg, example.com/assets/images/object/141.jpg, ...
Some images are missing, in which case I use a default image. The problem is that when I check whether the image exists, I get a 404 error. I have already seen on several pages that there isn't a direct way to avoid this problem.
So I did the following: I created a service in the backend (C#) that checks whether the file exists with File.Exists(fileName);. That way I managed to avoid this error on my localhost. So far so good.
Now I have published the frontend and the backend as two different services in Azure. The images are in the frontend but the file service is in the backend. My method no longer works because I can't access the frontend folders directly from the backend. One solution could be to make an HTTP call from the backend to the frontend, but I think that doesn't make much sense; it's getting too messy.
One option could be to store in the DB a boolean with the (non)existence information, but I think this is prone to inconsistencies (if the boolean is not updated immediately when a new image is added or deleted, for example), even if I run a daily job to clean it up.
Still another option could be to store the images directly in the DB and retrieve them together with the DTOs of the objects I'm loading on each particular page, but I guess that images shown only in the frontend should be stored in the frontend... shouldn't they?
Therefore:
a) Is any of these ideas acceptable? Is there a better way to avoid this error?
b) Another possibility: is there a way to access the frontend folders from the backend? I get a bit lost with the publishing and artifacts in Azure and I don't know if I could do it somehow.
I'm not sure how you've built the frontend, but I'm assuming that the background images are set using CSS. It is possible to set multiple background images in the same rule; the browser will load them all and stack them, and if the first one loads successfully and isn't transparent, that is the only thing the user will see. But if the first image fails to load, for example because it doesn't exist, the second image will be shown instead.
See this other answer for more details: https://stackoverflow.com/a/22287702/53538
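As a rough sketch of that idea applied to this question (the default-image path /assets/images/object/default.jpg is an assumption, not something from the original setup), you can declare both images in one background-image value, preferred first, fallback second:

// Layer the per-object image on top of a default; if the first URL 404s,
// the browser simply shows the second image underneath.
function setObjectBackground(element, objectId) {
    element.style.backgroundImage =
        'url("/assets/images/object/' + objectId + '.jpg"), ' +  // preferred, may be missing
        'url("/assets/images/object/default.jpg")';              // fallback that always exists
}

setObjectBackground(document.body, 140);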

Saving filename in DB after uploading to GCP Storage or using bucket.getFiles()

I've been searching on Stack Overflow, but it seems that this question has not been asked yet. It's an architecture question about files being uploaded to GCP Storage.
TL;DR: Is there any issue with using bucket.getFiles() directly (from a server), rather than storing each filename in my DB, then asking for them one by one and returning the array to the client?
The situation:
I'm working on a feature that will allow the user to upload image attachments linked to a delivery note. A delivery note can have multiple attachments.
I use a simple upload button on my client (a mobile device) and upload the content to GCP in a path/to/id-deliveryNote folder, such as path/to/id-deliveryNote/filename.jpg, path/to/id-deliveryNote/filename2.jpg, etc.
Somewhere else in the app the user should be able to click on each of those attachments and download it.
The solution
After the upload is done in GCP, I asked myself how to read those files and give the user a download link for each one. That's when I found the bucket.getFiles() function.
Since my file paths all share the same id-deliveryNote/ prefix, I can use bucket.getFiles(prefix) and, after the promise resolves, safely return to my user the list of available links.
The issue
I do not store the filenames in my deliveryNote table in my DB, which can sound a bit problematic: I'm relying on GCP to know the attachments of a deliveryNote. The way I see it, I do not need to replicate the information in our DB (and possibly handle failures in two spots), and if I need those files I will just ask GCP for their links. The opposing view is that by storing the names you can list the attachments for the clients, and then generate the download link only when the user clicks a specific attachment.
My question is: Is there any issue with using bucket.getFiles() directly (from a server), rather than storing each filename in my DB, then asking for them one by one and returning the array to the client?
Some points that could influence the chosen method:
GCP cost-per-call differences?
Invalid application data structure?
Other things?
There is no issue with using this method to return the links for the files to download. In the API documentation for this method (accessible here) they even show an example of returning files using prefixes. You just need to be aware that Cloud Storage doesn't actually use real folders, only names that look like they are in folders (more details on that case here), so you don't mix up concepts when working with names and prefixes.
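As a minimal Node.js sketch of that pattern (the bucket name, the path prefix, and the 15-minute signed-URL expiry are assumptions for illustration):

const { Storage } = require("@google-cloud/storage");
const storage = new Storage();

async function listDeliveryNoteLinks(deliveryNoteId) {
    const bucket = storage.bucket("my-attachments-bucket");  // hypothetical bucket name
    // getFiles() with a prefix returns every object stored under that "folder"
    const [files] = await bucket.getFiles({ prefix: "path/to/" + deliveryNoteId + "/" });

    // Build a signed download link for each attachment
    const links = await Promise.all(files.map(async (file) => {
        const [url] = await file.getSignedUrl({
            action: "read",
            expires: Date.now() + 15 * 60 * 1000,  // link valid for 15 minutes
        });
        return { name: file.name, url: url };
    }));
    return links;
}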
For the pricing point, you can find the full pricing for Google Cloud Storage in this documentation, including how much each operation will cost (for example, $0.02 per 50,000 operations for object gets and retrieving bucket and object metadata), as well as the cost of storing data, etc. After checking that, you can compare it with your database costs to see whether this point will impact you.
To summarize, there is no problem with following this approach. The advantage of storing the names in the database is that, even though you could then fail in two spots, it's more probable that you'll face an issue in only one place, and in that case having the information replicated would be a great thing. So you just need to decide which option fits you best.

PHP: Requesting 50MB-100MB JSON - browser crashes / does not display any result

Huge JSON server responses: around 50MB - 100MB, for example.
From what I know, the browser might crash when loading huge amounts of data into a table (I usually use DataTables). The result: memory usage reaches almost 8GB and the browser crashes. Chrome might not return a result; Firefox will usually ask if I want to wait or kill the process.
I'm going to start working on a project which will send requests for huge JSONs, all compressed (done by the server-side PHP). The purpose of my report is to fetch data and display it all in a table that is easy to filter and order, so I can't see how "lazy loading" would work for this specific case.
I might use a Vue.js datatable library this time (not sure which one specifically).
What exactly is using so much of my memory? I know for sure that the JSON result is received. Is it the rendering/parsing of the JSON into the DOM? (I'm referring to the DataTables example for now: https://datatables.net/examples/data_sources/ajax)
What are the best practices in this kind of situation?
I started researching this issue and noticed that there are posts from 2010 that no longer seem relevant.
There is no limit on the size of an HTTP response. There is a limit on other things, such as:
local storage
session storage
cache
cookies
query string length
memory (per your machine's limitations or the browser's allocation)
Instead, the problem is most likely with the implementation of your datatable. You can't just insert 100,000 nodes into the DOM and not expect some kind of performance impact. Furthermore, if the datatable is running logic against each of those data points as they come in, processing them before the node insertion, that's also going to be a big no-no.
What you've done here is essentially pass the legwork of pagination from the server to the client, with dire impacts.
If you must return a response that big, consider using one of the storage options that browsers provide (a few are mentioned above), then paginate off the stored JSON response.
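A minimal sketch of that last point, assuming the response has already been parsed into an array of row objects held in a variable (the table selector, page size, and /report.json URL are placeholders):

// Insert only one page of rows into the DOM at a time instead of 100,000+ nodes
function renderPage(rows, tableBody, page, pageSize) {
    const fragment = document.createDocumentFragment();
    const start = page * pageSize;
    for (const row of rows.slice(start, start + pageSize)) {
        const tr = document.createElement("tr");
        for (const value of Object.values(row)) {
            const td = document.createElement("td");
            td.textContent = String(value);
            tr.appendChild(td);
        }
        fragment.appendChild(tr);
    }
    tableBody.replaceChildren(fragment);  // swap pages without rebuilding the whole table
}

// Usage: fetch once, keep the parsed array, render the first page of 100 rows
// fetch("/report.json")
//     .then(r => r.json())
//     .then(rows => renderPage(rows, document.querySelector("#report tbody"), 0, 100));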

Alternative ways to scrape large data from external URLs in Google-Apps-Script

Question
In Google Apps Script, are there alternative ways to get data from external URLs?
The preferred way using UrlFetchApp.fetch(URL) doesn't work in my case.
The retrieved data (a JSON string) is cut off, probably because it's too long.
Code to show the limitation
function ShowFetchLimit() {
    var request = UrlFetchApp.fetch("http://www.gw2spidy.com/api/v0.9/json/all-items/all");
    var response = request.getContentText();
    Logger.log('Your retrieved string length is: ' + response.length);
}
Preview on a public Google-Apps script
Log output
Your retrieved string length is: 10.485.277
But the real string should be 11.439.522 characters long. It seems I'm hitting a Google Apps Script limitation, maybe the URL Fetch POST size limit, which is 10MB per call.
You are hitting a limit that seems to be unpublished but is consistent with the UrlFetch "post" quota.
There is really no way to work around it that I've seen. For example, let's say you instead call your server and ask it to write the response to a spreadsheet, and then from your app you try to read the spreadsheet. I've tried reading huge sheet ranges from GAS, and at some point it just fails to read them. GAS has limited RAM too, and for one thing the data would need to fit in RAM (besides any other network read quotas).
This is actually possible, provided your server supports partial downloading.
See this post for a complete explanation with code:
Need alternative to UrlFetchApp.fetch for fetching large data in Apps Script
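A rough Apps Script sketch of that idea, assuming the remote server honours the Range header and the payload is mostly single-byte text such as JSON (the 5MB chunk size is an arbitrary placeholder):

function fetchInRanges(url) {
    var chunkSize = 5 * 1024 * 1024;  // 5MB per request, comfortably under the 10MB cap mentioned above
    var parts = [];
    var start = 0;
    while (true) {
        var response = UrlFetchApp.fetch(url, {
            headers: { Range: "bytes=" + start + "-" + (start + chunkSize - 1) }
        });
        var text = response.getContentText();
        parts.push(text);
        if (text.length < chunkSize) break;  // a short chunk means we reached the end
        start += chunkSize;
    }
    // Note: the joined result still has to fit in the script's memory
    return parts.join("");
}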

How to properly process base64 into a stored server image

I'm working on an add-item page for a basic webshop. The shop owner can add item images via drag/drop or by browsing directly. When images are selected, I'm storing their base64 in an array. I'm now not too sure how best to deal with sending/storing these item images for proper use. After giving Google a bit of love, I'm thinking the image data could be sent as base64 and saved back to an image via something like file_put_contents('/item-images/randomNumber.jpg', base64_decode($base64)); then adding the item's image paths to its database data for later retrieval. Below is an untested example of how I currently imagine sending the image data; is something like this right?
$("#addItem").click(function() {
var imgData = "";
$.each(previewImagesArray, function(index, value) {
imgData += previewImagesArray[index].value;
});
$.post
(
"/pages/add-item.php",
"name="+$("#add-item-name").val()+
"&price="+$("#add-item-price").val()+
"&desc="+$("#add-item-desc").val()+
"&category="+$("#add-item-category :selected").text()+
"&images="+imgData
);
return false;
});
I really appreciate any help; I'm relatively new to web development.
As you are doing, so do I, essentially: get the base64 from the browser, then post it back and store it. A few comments...
First, HTTP POST has no mandatory size limitation, but practically your backend will limit the size of posted data (e.g., a 2M post_max_size in PHP). Since you are sending base64, you are reducing the effective payload you can send: every three bytes of image become four bytes of base64, so you get less image transferred per byte of network. Either send multiple posts or increase your post size limit to fit your needs.
Second, as #popnoodles mentioned, using a randomNumber will likely not be sufficient in the long term. Use either a database primary key or the tempnam family of functions to generate a unique identifier. I disagree with #popnoodles's implementation, however: it's quite possible for two different people to upload the same file. For example, my circa-2013 Winter Bash avatar on SO was taken from an online image library; someone else could use that same icon. We would collide, so the MD5 is not sufficient in general, although in your use case it could be.
Finally, you will probably want to base64-decode, but give some thought to whether you need to. You can use a data: URL and inline the base64 image data. This has the same network issue as before: significantly more transfer is required to send it. But a data URL works very well for lots of very small images (e.g., avatars) or pages that will be cached for a very long time (especially if your users have reasonable data connections). In summary: consider the use case before presuming you need to base64-decode.
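For illustration only, here is the same decode-and-store idea sketched in Node.js rather than PHP (the directory, the JPEG extension, and the UUID-based name are placeholders; in PHP you would keep using base64_decode and file_put_contents as in the question, just with a unique name instead of randomNumber):

const fs = require("fs");
const path = require("path");
const crypto = require("crypto");

function saveBase64Image(base64Data, dir) {
    // Strip an optional data-URL prefix such as "data:image/jpeg;base64,"
    const raw = base64Data.replace(/^data:image\/\w+;base64,/, "");
    const buffer = Buffer.from(raw, "base64");
    // A UUID instead of a random number avoids filename collisions
    const name = crypto.randomUUID() + ".jpg";
    fs.writeFileSync(path.join(dir, name), buffer);
    return name;  // store this name/path in the item's database row
}

// e.g. saveBase64Image(postedImageData, "./item-images"); where postedImageData
// is the base64 string received from the client (hypothetical variable name)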
