I have a program that logs every GET/POST request made by a website during the page load process. I want to go through these requests one by one, execute them, and then determine whether the file that was returned is JavaScript. Given that it won't necessarily have a .js extension (because of scripts like this, yanked from google.com a minute ago), how can I parse the file returned by the request and identify whether it is a JavaScript file?
Thanks!
EDIT:
It is better to get a false positive than a false negative. That is, I would rather have some non-JS included in the JS-list than cut some real JS from the list.
The JavaScript link that you referred to does not have a content type, nor does it have the .js extension.
Any text file can be considered JavaScript if it can be executed, which makes detection from scratch very difficult. Two methods come to mind.
Run a linter on the file contents. If it reports a syntax error or a parsing error, the file is not JavaScript. If there are no syntax or parsing errors, it should be considered JavaScript.
Parse the file contents into an AST (abstract syntax tree). A JavaScript file would parse without errors. There should be a number of AST libraries available. I haven't worked with JS ASTs, so I can't recommend any one of them, but a quick search should give you some options.
I am not sure, but a linter probably also builds an AST before doing its checks. In that case, parsing to an AST alone seems like the lighter option.
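As a rough illustration of the "does it parse?" idea, here is a minimal sketch that assumes the check runs under Node.js. It uses the built-in vm module to compile the text without executing it; the function name looksLikeJavaScript is made up for this example. Note that it only understands classic scripts, so a file using import/export would be rejected, and trivial text such as an empty string or a single word parses fine (a false positive, which the question says is acceptable).

```javascript
const vm = require("vm");

// Returns true if `source` compiles as a classic script.
// The code is only compiled, never run.
function looksLikeJavaScript(source) {
  try {
    new vm.Script(source);
    return true;
  } catch (err) {
    // A syntax error means the text does not parse as JavaScript.
    if (err && err.name === "SyntaxError") return false;
    throw err; // anything else is unexpected, so surface it
  }
}

console.log(looksLikeJavaScript("var x = 1;"));
console.log(looksLikeJavaScript("<html><body>hi</body></html>"));
```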
The easiest way would be to identify JavaScript files by their URI, because the alternatives are a lot heavier. But since you said this isn't an option, you can check the syntax of each file's contents using some heuristic tool. You can also check the response headers for the Content-Type.
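For the header check, something like the following sketch could work when a Content-Type is present. The list of MIME types is an assumption about what servers commonly send for scripts, and isJavaScriptContentType is a made-up helper name; a missing header simply yields false, so it would have to be combined with a content-based check.

```javascript
// MIME types commonly used for JavaScript responses (assumed list, extend as needed).
const JS_MIME_TYPES = new Set([
  "text/javascript",
  "application/javascript",
  "application/x-javascript",
  "text/ecmascript",
  "application/ecmascript",
]);

// Takes the raw Content-Type header value, e.g. "text/javascript; charset=UTF-8".
function isJavaScriptContentType(headerValue) {
  if (!headerValue) return false;
  // Drop parameters such as charset and normalise case.
  const mimeType = headerValue.split(";")[0].trim().toLowerCase();
  return JS_MIME_TYPES.has(mimeType);
}
```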
I have an HTML file with some JavaScript and CSS applied to it.
I would like to duplicate that file to make, say, file1.html, file2.html, file3.html, ...
All of that using JavaScript, jQuery or something like that!
The idea is to create different pages (from that kind of template) that will be printed afterwards with different data in them (from an XML file).
I hope it is possible!
Feel free to ask for more details if you want!
Thank you all in advance.
Note: I do not want to copy the content only but the entire file.
Edit: I know I should use a server-side language, I just don't have the option ):
There are a couple of ways you could go about implementing something similar to what you are describing. Which implementation you should use depends on exactly what your goals are.
First of all, I would recommend some sort of template system such as VueJS, AngularJS or React. However, given that you say you don't have the option of using a server-side language, I suspect you won't have the option to implement one of these systems.
My next suggestion would be to build your own 'templating system'. A simple implementation that may suit your needs could be something mirroring the following:
In your primary file (root file) which you want to route or copy the other files through to, you could use JS to include the correct HTML files. For example, you could have JS conditionally load a file depending on certain circumstances by putting something like the following after a conditional statement:
Note that while doing this could optimize your server's data usage (as it would only serve required files and not everything all the time), it would also probably increase loading times. Your site would need to wait for the additional HTTP request to come through and for whatever requested content to load & render on the client. While this sounds very slow it has the potential of not being that bad if you don't have too many discrete requests, and none of your code is unusually large or computationally expensive.
If using vanilla JS, the following snippet will illustrate the above:
In a script that comes loaded with your routing file:
function read(path) {
  // Request the HTML file at the given path
  var xhr = new XMLHttpRequest();
  xhr.open('GET', path);
  xhr.onload = show;
  xhr.send();
}
function show() {
  // `this` is the XMLHttpRequest; its response holds the loaded HTML
  var text = this.response;
  document.body.innerHTML = text; // you can replace document.body with whatever element you want to wrap your imported HTML
}
read('path/to/file/on/server');
Note a couple of things about the above code. If you are testing on your computer (i.e. opening your HTML file in a browser, with a path like file://__) without a local server, you will get some sort of cross-origin request error when trying to make an XMLHttpRequest. To bypass this error, either test your code on an actual server (not ideal to be constantly pushing code, I know) or, preferably, set up a local testing server. If this is something you would want to explore, it's not that difficult to do; let me know and I'd be happy to walk you through the process.
Alternatively, you could implement the above loading system with jQuery and the .load() function. http://api.jquery.com/load/
If none of the above solutions work for you, let me know more specifically what it is that you need, and I'll be happy to give a more useful/ relevant answer!
I catch JS errors by subscribing to the window.onerror event, so if a user hits an 'undefined variable' error, I send it to the server for debugging purposes.
As you know, this event returns the error message, the URL and the line where the error occurred.
The problem is with compressed files.
If a file is compressed, all the code ends up on one line, and it's a big problem to determine the exact place of the error.
Is there any solution for this problem?
JavaScript compressors usually do two things:
Remove all unnecessary white-space (“unnecessary” in terms of syntactical validity).
Shorten variable names, if possible. This applies to local variables, i.e. those which are not in the global scope or members of an object.
There are some other optimizations, such as function inlining, but these are usually not so problematic.
As for the first point, you can run the code through one of the many JavaScript source formatters. This should give you a pretty readable JavaScript file.
As for the second point, the obfuscation is usually not reversible. If “speaking” variable names like width, height or whatever have been changed to a or b, you cannot know what they were meant to express in the first place.
If this problem applies to an open source product, you can usually download the sources and try to reconstruct the problem with them.
If it's closed source, and only “beautifying” the code doesn't help, you have to write a bug report to the vendor.
No. There is no way to "unminify" a JavaScript include for the purposes of error logging.
Your best bet is probably to log the Error Type in the hope that this will help you debug the problem.
If you really want to get to the specific line number, you would have to remove the minification and rely on browser caching to attain performance.
I think you could use source maps...
It's a file that can be generated when minifying, and can be used to map the line/character of the minified file to the original source.
http://www.html5rocks.com/en/tutorials/developertools/sourcemaps/
I found the best answer to this question here
Malyw suggests using Uglify with the max-line-len option and source maps.
That's probably the best solution for identifying the exact place in the code.
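To make use of shorter lines or a source map, the error report needs the column as well as the line. Here is a minimal sketch of an onerror handler that collects both; modern browsers pass the column number and the Error object as the fourth and fifth arguments, while older ones may leave them undefined. The /log-js-error endpoint and the buildErrorReport helper are hypothetical names for this example.

```javascript
// Build the payload from the arguments the browser passes to window.onerror.
function buildErrorReport(message, source, lineno, colno, error) {
  return {
    message: String(message),
    source: source,
    line: lineno,
    column: colno, // may be undefined in older browsers
    stack: error && error.stack ? String(error.stack) : null,
  };
}

// Only register the handler when running in a browser.
if (typeof window !== "undefined") {
  window.onerror = function (message, source, lineno, colno, error) {
    var report = buildErrorReport(message, source, lineno, colno, error);
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/log-js-error"); // hypothetical logging endpoint
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.send(JSON.stringify(report));
    return false; // let the browser's default error handling continue
  };
}
```

The logged line and column can then be looked up in the source map on the server side to find the position in the original source.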
I have a piece of JavaScript that works (alongside HTML) to produce the GUI for a program written in C++. The program has to run for a long time (sometimes 14 or 15 days, without monitoring).
The C++ and JavaScript communicate by writing to and reading from an XML file.
After running the program for over 24 hours at a time, I've noticed the occasional JavaScript error appearing: 'someArray[...].name' is null or not an object.
Now, these are all arrays that are filled with information taken from the XML file written by the C++. The contents of these arrays are refreshed every few seconds (to update information in the GUI 'live').
The question is: could these errors be caused by an access/timing problem, as in: the JavaScript starts reading a line from the XML just as the C++ swoops in and rewrites that line, so that information is parsed into the JavaScript arrays with some illegal characters (etc.), which then throws the errors when accessed?
Hope that all makes sense. Thanks.
Your suggestion seems to provide a plausible explanation of what's happening.
You're probably seeing a race condition.
To fix it, you could implement a synchronization mechanism between C++ and JS.
The simplest form of sync I can think of is creating a second file each time C++ writes to your main XML file (this file acts as a lock). JS waits for the lock file to disappear before reading the XML. The same is done on the C++ side.
Sample code:
C++:
while(programRunning) {
    do stuff;
    // Now it's time to write XML
    while("lockCpp.txt" exists)
        ; // Do nothing, JS is reading
    create file "lockJS.txt";
    write to xml;
    delete file "lockJS.txt";
}
JavaScript:
while(programRunning) {
    do stuff;
    // Now it's time to read XML
    while("lockJS.txt" exists)
        ; // Do nothing
    create file "lockCpp.txt";
    read xml;
    delete file "lockCpp.txt";
}
This should in practice eliminate race conditions (some are still theoretically possible, but unlikely).
Should JS not be allowed to write to the file system, you could remove one of the lock files (lockCpp.txt) and, if the reading on the JS side is normally faster than the writing, it should still eliminate most conflicts.
EDIT after comment:
If you only have access to JS, you could check that the XML document is complete when reading, e.g. that the root element is correctly matched by a </rootElementName> at the end.
That will ensure the file write is complete provided C++ doesn't do writes at random locations, but always rewrites the whole document.
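A minimal sketch of that completeness check, assuming the XML has already been read into a string. isXmlComplete is a made-up helper name, and String.prototype.trim needs an ES5-capable engine. It only looks at the closing tag, so it relies on the assumption above that C++ always rewrites the whole document.

```javascript
// Returns true if the text ends with the closing tag of the expected root element.
function isXmlComplete(xmlText, rootElementName) {
  var trimmed = xmlText.trim(); // ignore trailing newlines/whitespace
  var closingTag = "</" + rootElementName + ">";
  return trimmed.length > 0 &&
    trimmed.slice(-closingTag.length) === closingTag;
}
```

If the check fails, the reader can simply skip this refresh cycle and keep showing the previous data until the next read.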
Another route would be checking the file is not changing over time. If C++ only sporadically writes to XML, you can read it a few times over a few, say, seconds, and, if unchanged, use the read value. If changed, keep waiting.
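The "read it a few times and wait until it stops changing" idea could be sketched like this. createStabilityChecker is a hypothetical helper: you feed it each snapshot of the file contents (for example from a timer), and it hands the snapshot back only once it has seen the same text the required number of times in a row.

```javascript
// Returns a function that accepts successive snapshots of the file contents.
// It returns the snapshot once it has been seen `requiredMatches` times in a
// row, and null while the contents are still changing.
function createStabilityChecker(requiredMatches) {
  var lastSnapshot = null;
  var matches = 0;
  return function feed(snapshot) {
    if (snapshot === lastSnapshot) {
      matches += 1;
    } else {
      lastSnapshot = snapshot; // contents changed, start counting again
      matches = 1;
    }
    return matches >= requiredMatches ? snapshot : null;
  };
}
```

How many matching reads to require, and how far apart to space them, depends on how long the C++ side takes to write the file.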
HTH
I sometimes see chunks like the one below on web pages, so I am curious to know what this really does, and why it is written this way.
<script src="somefile.js?param1=one&param2=two" />
I can only make out the following few intentions behind it:
It's not a page URL (I mean .aspx/.php/.jsp etc.), so it's not the kind of hack where someone adds code like this to pass data without attracting the user's attention (as it's a tag that does not render in the UI), or an implementation of an old-style AJAX alternative.
URL parameters like this are useful if you do not want the JS file (or any other resource, such as an image) to be cached. This can be a quick way to manage caching.
But I am unable to figure out the following:
Looks like page URL parameters but are these parameters anyway readable in JavaScript file and have some additional utility?
Do these parameters have any extra role to play here ?
What are the other possible practical scenarios where code like this can be/is used?
So please share any input you have on this.
Thanks,
Running Non-JS Code within .js Extensions
In cases like this, that source .js file might (given proper server-configurations) actually have PHP/.NET code within it, which can read those appended values.
As you said, Avoiding Cache...
Additionally, people will at times append a random string at the end of their referenced elements to avoid loading cached data.
The URL having '.js' means nothing. It could still be handled by a server-side script like an ASP or PHP.
Either the javascript file is not static (it is generated by the server based on the parameters in its querystring)
OR
In the JavaScript file itself, you can have it check its own querystring parameters (not just that of the page, but that of the javascript source url).
OR
(This doesn't exactly match your scenario, but) you can also add parameters at the end of image and script URLs as a way of versioning. The version with url="somescript.js?V=3" will be cached by the user until the page changes and the URL is now "somescript.js?V=4". The file will then be replaced by the version on the server no matter what the browser settings may be.
My guess (without looking at this specific case) is that the JavaScript file is reading its own querystring. I have done this, and it's very helpful.
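As a rough sketch of how a script might read its own querystring: in browsers that support it, document.currentScript points at the script element that is currently executing, so its src can be parsed. parseScriptParams is a made-up helper name, and the placeholder base URL is only there so that relative paths parse.

```javascript
// Parse the query string of a script URL into a plain object.
function parseScriptParams(scriptSrc) {
  // The base URL is a placeholder so that relative paths can be parsed.
  var url = new URL(scriptSrc, "http://placeholder.invalid/");
  var params = {};
  url.searchParams.forEach(function (value, key) {
    params[key] = value;
  });
  return params;
}

// In a browser, read the parameters of the script that is currently running.
if (typeof document !== "undefined" && document.currentScript) {
  var ownParams = parseScriptParams(document.currentScript.src);
  console.log(ownParams);
}
```

Older browsers without document.currentScript would need another way to find the script element, such as looking through the script tags on the page.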
Looks like page URL parameters but are these parameters anyway readable in JavaScript file and have some additional utility?
Yes you can read them in JavaScript, Scriptaculous uses that approach for loading modules, eg:
<script type="text/javascript" src="scriptaculous.js?load=effects,dragdrop">
</script>
Do these parameters have any extra role to play here ?
What are the other possible practical scenarios where code like this can be/is used?
That can also be used for server-side script joining and minifying, of course using some URL rewriting technique to keep the .js extension, and as you say, it's a common technique to add timestamp parameters to break the browser cache.
It can be used for three different reasons:
1) To generate the JavaScript file in the server depending on the parameters;
2) To avoid caching;
3) To pass parameters to JavaScript itself
An example of this in practice would be a server side handler for somefile.js that uses the parameters (names of other scripts) to determine which scripts are actually required and combine/minify them, returning them as a single somefile.js script file.
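A server-side handler of that kind could, very roughly, look like the sketch below. It is written in JavaScript only for consistency with the rest of the thread; the MODULE_SOURCES map, the load parameter name and buildCombinedScript are all assumptions for illustration, and a real handler would read the scripts from disk and probably minify and cache the result.

```javascript
// Hypothetical in-memory store of script sources; a real handler would read files.
var MODULE_SOURCES = {
  effects: "function fade() {}",
  dragdrop: "function drag() {}",
};

// Takes the query string of the request (e.g. "load=effects,dragdrop")
// and returns the requested scripts joined into one body.
function buildCombinedScript(queryString) {
  var requested = new URLSearchParams(queryString).get("load") || "";
  return requested
    .split(",")
    .filter(function (name) {
      // Ignore names that are not known modules.
      return Object.prototype.hasOwnProperty.call(MODULE_SOURCES, name);
    })
    .map(function (name) { return MODULE_SOURCES[name]; })
    .join("\n");
}
```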
I am trying to get to a page at http://www.ocwconsortium.org/ straight from Bash. The page appears when you type mathematics into the search field at the top right corner. I tested
open http://www.ocwconsortium.org/#mathematics
but it leads to the main page. It is clearly some javascript thing. How can I get the results straight from Bash on the first page?
[Clarification]
Let's take an example. I have the following lines for a Math search engine in .bashrc:
alias mathundergradsearch='/Users/user/bin/mathundergraduate'
Things in a separate file:
#!/bin/sh
q=$1
w=$2
e=$3
r=$4
t=$5
open "http://www.google.com/cse?cx=007883453237583604479%3A1qd7hky6khe&ie=UTF-8&q=$q+$w+$e+$r+$t&hl=en"
Now, I want something similar to the example. The difference is that the other site contains javascript or something that does not allow me to see the parameters. How could I know where to put the search parameters as I cannot see the details?
open "http://www.ocwconsortium.org/index.php?q=mathematics&option=com_coursefinder&uss=1&l=&s=&Itemid=166&b.x=0&b.y=0&b=search"
You need quotes because the URL contains characters the shell considers to be special.
The Links web browser more or less runs from the commandline (like lynx) and supports basic javascript.
Even though the title of the post sounds general, your question is very specific. It's unclear to me what you're trying to achieve in the end. Clearly you can access sites that rely heavily on javascript (else you wouldn't be able to post your question here), so I'm sure that you can open the mentioned site in a normal browser.
If you just want to execute javascript from the commandline (as the title suggests), it's easy if you're running bash via cygwin. You just call cscript.exe and provide a .js scriptname of what you wish to execute.
I didn't get anything handled by JavaScript - it just took me to
http://www.ocwconsortium.org/index.php?q=mathematics&option=com_coursefinder&uss=1&l=&s=&Itemid=166&b.x=0&b.y=0&b=search
Replacing mathematics (right after q=) should work. You may be able to strip out some of that query string, but I tried a couple of things and it didn't play nice.
Don't forget to encode your query for URLs.
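To illustrate the encoding step, here is a small sketch in JavaScript that builds the search URL quoted above with the query percent-encoded. buildSearchUrl is a made-up helper name, and the other query parameters are simply copied from that URL.

```javascript
// Build the course-finder search URL with the query safely encoded.
function buildSearchUrl(query) {
  return "http://www.ocwconsortium.org/index.php?q=" +
    encodeURIComponent(query) + // spaces become %20, & becomes %26, etc.
    "&option=com_coursefinder&uss=1&l=&s=&Itemid=166&b.x=0&b.y=0&b=search";
}

console.log(buildSearchUrl("linear algebra"));
```

The printed URL can then be passed, in quotes, to open.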
You will need to parse the response, find the URL that is being opened via JavaScript and then open that URL.
Check this out: http://www.phantomjs.org/.
PhantomJS is a CLI tool that runs a real, fully fledged browser without the chrome (that is, without a visible window).