PHP - file_get_contents() reading chunks gets slower with time - javascript

I'm running some tests to store large files locally with the IndexedDB API, and I'm using PHP with JSON (and AJAX on the Javascript side) to receive the file data.
Right now I'm trying to fetch some videos and, to do so, I use the following PHP code:
$content_ar['content'] = base64_encode(file_get_contents("../video_src.mp4", false, NULL, $bytes_from, $package_size));
return json_encode($content_ar);
I know base64_encode will deliver about 1/3 more data than the original, but that's not the problem right now, as it's the only way I know to retrieve binary data without losing it on the way.
As you can see, I specify from which byte it has to start reading and how many bytes I want to retrieve. So, on my JS side, I know how much of the file I have already stored, and I ask the script to give me the bytes from actual_size to actual_size + $package_size.
What I'm seeing is that the script seems to run more slowly as time goes by, and depending on the file size. I'm trying to understand what happens there.
I've read that file_get_contents() stores the file contents in memory, so with big files it could be a problem (that's why I'm reading it in chunks).
But seeing that it gets slower with big files (and over time), could it be that it's still loading the whole file into memory and then delivering the chunk I tell it to? Like it loads everything and then returns the part I demand?
Or is it just reading everything up to $bytes_from + $package_size (which would explain why it gets slower over time, as that offset keeps increasing)?
If either of the above is the case, is there any way to get it to run more efficiently and improve performance? Maybe I have to do some operations before or after the call to free memory?
EDIT:
I've made a screenshot showing the difference (in ms) between the moment I make the call to get the file bytes I need and the moment I receive the AJAX response (before I do anything with the received data, so Javascript has no impact on the performance). Here it is:
As you can see, it's increasing with every call.
I think the problem is the time it spends getting to the initial byte I need. It does not load the whole file into memory, but it is slow getting to the first byte it has to read, so as the starting offset increases, it takes more time.
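If the time really is being spent reaching the starting offset, one thing worth testing is opening the file and jumping straight to that offset with fseek(), which on a regular local file does not scan the preceding bytes. This is only a minimal sketch, assuming $bytes_from and $package_size arrive as request parameters (the parameter names "from" and "size" are made up):
<?php
// Hypothetical chunk endpoint; "from" and "size" are assumed request parameters.
$bytes_from   = (int) $_GET['from'];
$package_size = (int) $_GET['size'];

$content_ar = array();

$fp = fopen("../video_src.mp4", "rb"); // binary-safe open
fseek($fp, $bytes_from);               // jump directly to the offset, without reading earlier bytes
$chunk = fread($fp, $package_size);    // read only the requested slice (may be shorter near EOF)
fclose($fp);

$content_ar['content'] = base64_encode($chunk);
echo json_encode($content_ar);
?>
Timing this against the file_get_contents() version for large offsets would show whether reaching the offset is really the bottleneck, or whether the cost lies elsewhere.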
EDIT 2:
Could it have something to do with the fact that I'm JSON-encoding the base64 content? I've been running some performance tests, and I've seen that setting $content_ar['content'] = strlen(base64_encode(file...)) takes much less time (when, theoretically, it's doing the same work).
However, if that's the case, I still cannot understand why the slowness increases over time. Encoding the same number of bytes should take the same amount of time, shouldn't it?
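If json_encode() of the large base64 string does turn out to be the expensive part, one way around it is to skip base64 and JSON for the payload entirely and return the chunk as raw bytes, reading the response on the JS side as binary instead of text. A rough sketch, again assuming the offset and size come in as request parameters:
<?php
// Sketch: serve the chunk as raw binary instead of base64 wrapped in JSON.
$bytes_from   = (int) $_GET['from'];
$package_size = (int) $_GET['size'];

$fp = fopen("../video_src.mp4", "rb");
fseek($fp, $bytes_from);
$chunk = fread($fp, $package_size); // may be shorter than $package_size near the end of the file
fclose($fp);

header('Content-Type: application/octet-stream');
header('Content-Length: ' . strlen($chunk));
echo $chunk;
?>
This avoids both the ~33% base64 overhead and the cost of JSON-encoding a multi-megabyte string; the trade-off is that the client has to handle a binary response (e.g. xhr.responseType = 'arraybuffer') rather than a JSON object.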
Thank you so much for your help!

Related

Repeatedly re-writing JSON and getting undefined due to re-write

I am currently generating some data in Python and saving it to a JSON file. This data grows quite quickly, although it is not large enough to be an issue, and it is also re-generated fairly fast. I am then reading this data into Javascript at a very fast rate. The issue I am having is that most of the time, when I request the JSON data, the file returns undefined on the Python side, because I re-write the JSON file each time, which leaves the data undefined while it is being written. The solution I am looking for would keep the file undefined for far less time; I could also switch to other methods for transferring data between Python and Javascript if any spring to mind.

Fastest way to read in very long file starting on arbitrary line X

I have a text file which is written to by a python program, and then read in by another program for display on a web browser. Currently it is read in by JavaScript, but I will probably move this functionality to python, and have the results passed into javascript using an ajax Request.
The file is irregularly updated every now and then, sometimes appending one line, sometimes as many as ten. I then need to get an updated copy of the file to javascript for display in the web browser. The file may grow to as large as 100,000 lines. New data is always added to the end of the file.
As it is currently written, javascript checks the length of the file once per second, and if the file is longer than it was the last time it was read in, it reads it in again, starting from the beginning. This quickly becomes unwieldy for files of 10,000+ lines, doubly so since the program may sometimes need to update the file every single second.
What is the fastest/most efficient way to get the data displayed to the front end in javascript?
I am thinking I could:
1. Keep track of how many lines long the file was before, and only read in from that point in the file next time.
2. Have one program pass the data directly to the other without it reading an intermediate file (although the file must still be written to as a permanent log for later access).
Are there specific benefits/problems with each of these approaches? How would I best implement them?
For Approach #1, I would rather not call file.next() 15,000 times in a for loop just to get to where I want to start reading the file. Is there a better way?
For Approach #2, since I need to write to the file no matter what, am I saving much processing time by not reading it too?
Perhaps there are other approaches I have not considered?
Summary: The program needs to display, in a web browser, data from python that is constantly being updated and may grow to as many as 100k lines. Since I am checking for updates every second, it needs to be efficient, just in case it has to do a lot of updates in a row.
The function you seek is seek. From the docs:
f.seek(offset, from_what)
The position is computed from adding offset to a reference point; the reference point is selected by the from_what argument. A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and defaults to 0, using the beginning of the file as the reference point.
Limitation for Python 3:
In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero. Any other offset value produces undefined behaviour.
Note that seeking to a specific line is tricky, since lines can be variable length. Instead, take note of the current position in the file (f.tell()), and seek back to that.
Opening a large file and reading the last part is simple and quick: Open the file, seek to a suitable point near the end, read from there. But you need to know what you want to read. You can easily do it if you know how many bytes you want to read and display, so keeping track of the previous file size will work well without keeping the file open.
If you have recorded the previous size (in bytes), read the new content like this.
fp = open("logfile.txt", "rb")
fp.seek(old_size, 0)
new_content = fp.read() # Read everything past the current point
On Python 3, this will read bytes which must be converted to str. If the file's encoding is latin1, it would go like this:
new_content = str(new_content, encoding="latin1")
print(new_content)
You should then update old_size and save the value in persistent storage for the next round. You don't say how you record context, so I won't suggest a way.
If you can keep the file open continuously in a server process, go ahead and do it the tail -f way, as in the question that @MarcJ linked to.

Read file from last document read - JSON stream

I'm receiving a stream from a server that is growing exponentially, and I need to check every minute for new data, process that data, and ask for more the next minute.
The data is JSON documents; I receive on average ~600-700 documents per minute.
I have to avoid reading the documents already processed due to performance issues.
Is it possible to only read the data received from the last minute?
You can use a circular buffer and push the incoming data into it from a listener.
For example, store the last N documents or chunks there; the exact setup mainly depends on your application's code.
This way, older data is discarded by design, and you don't have to deal with the stream's internals or with poorly designed workarounds.
It becomes a matter of choosing the right size for the buffer, but that looks to me like a far easier problem.

How do you send and receive the same image between the client and the server?

I'm trying to implement a bandwidth test, and it looks like the most conventional way to do this is basically to transmit one or more images back and forth between a client and a server and see what the upload and download times are. In fact, this answer already covers the idea of getting a download time.
At this point, though, I'm not too sure how to make the process go both ways, with or without using the other answer. Even after adding debugging statements, I haven't found where the picture's data is stored in the other answer's code. And if I try to start off with a clean slate, a lot of the API information I'm finding on the Internet / Stack Overflow about sending images back and forth has very little explanation or elaboration, so I'm not able to use it to put the two together. Furthermore, some experiments I have put together (which sometimes involved other platforms) seemed to really throttle bandwidth usage, as well as scale the delay improperly with the images' sizes.
Using JavaScript, how do you transmit the same image back and forth between the client and server in such a way that you can accurately use the delay and the image's size to measure bandwidth? How do you make this work both ways with the same image, without throttling, and without interaction from the user?
EDIT
I could try posting things I've tried, but it's hard for them to be meaningful. A lot of it was in Flash. When I started using JavaScript, I began to experiment a little along these lines:
$.get('http://ip address/test.php?filename=XXX&data=LONG STRING OF DATA REPRESENTING THE DATA TO BE SAVED PROBABLY LESS THAN 2K IN SIZE AND ALSO YOU SHOULD ESCAPE OR AT LEAST URIENCODE IT', function(data) {
alert('here')
eval(data);
});
The PHP file being:
<?php
echo "response=here";
?>
And I used the PHP file both for Flash and for JavaScript. I also used Adobe Media Server with Flash. But going from a 1MB file to a 32MB file while using Flash/PHP, Flash would only scale the delay by 10 times, nowhere near 32. It also seemed to throttle bandwidth usage at least when paired with the AMS, and maybe even when it was paired with the PHP file.
I was about to convert the JavaScript code to pass the actual image into the PHP file... but I can't get at it. Even when I do things like:
for (var s in download) {
alert(s + ": " + download[s]);
}
download being the object that downloaded the image in the JavaScript (see the linked answer for the code), I'm not seeing anything useful. download.children.length is 0 and so on. I'm also reluctant to trust that the results aren't throttling bandwidth usage, like the Flash experiments did, without further confirmation; maybe the image has to be passed in using one type of API call or another to get it to really work right?
In essence, I'm really looking for good API information. Other stuff I saw just wasn't elaborate enough to connect the dots with.
2ND EDIT
One of the pitfalls I've run into is using POST to download the images. I'm running into a lot of difficulty in getting IIS 7 to allow POST to download arbitrary file types (namely jpgs) with "unusual" binary characters and allow them to be more than 2MB in size.
Why don't you send some text using $.post?
E.g:
Generate some big text:
var someText = '0123456789';
for (var i = 0; i < 10000; i++)
{
    someText += '0123456789'; // append a fixed 10-character chunk (~100 KB total); doubling someText each pass would exhaust memory
}
then post it to the server:
var start = new Date();
$.post('yourUrl', {data:someText}, function(){
var end = new Date();
//this will show the bytes per second upload bandwidth
alert(someText.length / ((end - start)/1000));
});
To make the result more reliable, you can run this, for example, 10 times and take the average value.
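The answer only shows the client side. A minimal sketch of the receiving script, assuming the posted field is data and 'yourUrl' points at it, might look like this (the file name and the response format are made up):
<?php
// upload_test.php (hypothetical name): reads the posted payload and acknowledges it,
// so the measured time is dominated by the upload itself.
$payload = isset($_POST['data']) ? $_POST['data'] : '';
header('Content-Type: text/plain');
echo strlen($payload); // echo back the received size as a cheap sanity check
?>
For payloads larger than a couple of megabytes, the post_max_size setting in php.ini may need to be raised, otherwise $_POST arrives empty and the measurement is meaningless.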

Maximum size of ajax returned data

Fellow coders, I have asked this question before but did not get a conclusive answer to it. The question is: how much data can I safely return from an ajax POST call before I run into some limitation somewhere?
The scenario is basically like this: the front-end makes an ajax call to a PHP controller/model. The controller returns a bunch of rows from the database, or returns some HTML representing a report, which will be stored in a JS string var to be displayed later.
I see two limitations here: the size of the data returned through the ajax call, and the max size the JS var can hold.
Does anyone know what the limits are?
thanks
See this answer: Javascript maximum size for types?
In short, unless the browser specifies otherwise, variable sizes are not subject to a restriction. As for Ajax: there's no limit unless one is defined server-side (such as this one).
I don't think either factor you listed would be an issue. What I would look at are:
The amount of time the user is willing to wait for the response. Also, your server-side programming language or web server might impose a limit on the length of any one request.
The amount of RAM the client has. Even if there is no variable size limit, eventually the computer will run out of space.
In these situations, you are almost always better off delivering smaller chunks of the data at a time and allowing the user to load what data they need (either by granularity [showing summaries and letting them drill down], or by pagination / search). No one wants to wait 10 minutes for the site to load, and HTTP doesn't really handle large requests all that well.
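To make the pagination suggestion concrete, here is a rough sketch of what a paginated PHP endpoint could look like; the $pdo connection, the reports table, and the column names are all hypothetical:
<?php
// Hypothetical paginated endpoint: returns one page of rows as JSON instead of the whole result set.
$page     = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$per_page = 100;                     // rows per response; tune to keep payloads small
$offset   = ($page - 1) * $per_page;

$stmt = $pdo->prepare('SELECT id, label, value FROM reports LIMIT :lim OFFSET :off');
$stmt->bindValue(':lim', $per_page, PDO::PARAM_INT);
$stmt->bindValue(':off', $offset, PDO::PARAM_INT);
$stmt->execute();

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
?>
The front-end then requests page 1, 2, 3... as the user drills down or scrolls, so no single response has to carry the full report.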
