How to copy certain urls from a webpage? - javascript

I have hundreds of pages from which I need to grab the URLs of certain file hosters, but since the number of links varies from page to page, I have tried to find the string and copy it to the clipboard using JavaScript:
static getFileUrl(id){
var url=new URL("https://fileshosting.net/file/FileUrl")
url.searchParams.append('*',id)
return url.href
}
static copyFileUrl(id){
copy(this.getFileUrl(id))
}
I'm not too used to JavaScript, but this should suffice; all I get is the following console output:
Uncaught SyntaxError: unexpected token: identifier
I understand this means a semicolon is missing before a statement, but I don't see where. AFAIK there are only two statements (the static procedures).
Also, some pages have only one URL while others have 20 or 50, so perhaps I should use an array, but I don't know how to implement an array in JavaScript.
So is there some way, instead of opening each page manually, to parse them from a terminal and pipe the result into this (Java)Script or a routine in another language?
Thanks in advance.
UPDATE
I have found another way, since this approach doesn't work.
Thanks for the support and time.
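For reference, the SyntaxError comes from `static`: it is only valid inside a class body, so the two methods have to be wrapped in a class. A minimal sketch, assuming a hypothetical class name `FileLinks` and keeping the question's `'*'` parameter name as-is; `getFileUrls` and `filterByHost` are hypothetical helpers showing the array case.

```javascript
// `static` is only valid inside a class body; at top level the parser sees
// the identifier `getFileUrl` right after `static` and throws a SyntaxError.
class FileLinks { // hypothetical class name
  static getFileUrl(id) {
    const url = new URL("https://fileshosting.net/file/FileUrl");
    url.searchParams.append("*", id); // parameter name kept from the question
    return url.href;
  }

  // One URL per id, so the same call handles 1, 20 or 50 links.
  static getFileUrls(ids) {
    return ids.map((id) => FileLinks.getFileUrl(id));
  }
}

// Hypothetical helper: keep only hrefs on `host` or one of its subdomains.
function filterByHost(hrefs, host) {
  return hrefs.filter((href) => {
    try {
      const { hostname } = new URL(href);
      return hostname === host || hostname.endsWith("." + host);
    } catch {
      return false; // not an absolute URL
    }
  });
}
```

In the DevTools console of a page, something like `copy(filterByHost(Array.from(document.links, (a) => a.href), "fileshosting.net").join("\n"))` would copy every matching link, one per line. Note that `copy()` is a console utility in Chrome and Firefox DevTools, not an API available to normal page scripts.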

Related

including javascript file and w3c validation

I bought a template, and in its main page it includes the JavaScript file as follows:
<script src="thefile.js?v=v1.9.6&sv=v0.0.1"></script>
As you can see, there are two arguments at the end of the file name: one is v, which I assume is the version, and the other is sv, which I don't know.
When I check the page with the W3C validator, it shows an error saying "& did not start a character reference".
Now I have two questions: first, what does sv stand for, and second, should I remove the & from the script tag to eliminate the error?
The parameters ?v=v1.9.6&sv=v0.0.1 form a query string, so the browser treats thefile.js?v=v1.9.6&sv=v0.0.1 as a different URL from the same file with any other query string. That makes it load the file from the server rather than from a previously cached copy, which is what you want after an update.
The browser will also cache the file under that exact URL, so it remains cached, however your server is set up, until you move to ?v=v1.9.7&sv=v0.0.2 and so on.
The & is what the validator complains about: in HTML markup it should be written as the character reference &amp; (with the semicolon), as mentioned in the comments. The browser decodes &amp; back to &, so the requested URL does not change.
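If the tag is generated from script rather than typed by hand, the two concerns above can be kept separate: build the query string with a raw `&`, then escape it only when writing HTML. A small sketch; `versionedSrc`, `escapeAttr` and `scriptTag` are hypothetical helper names.

```javascript
// Build the cache-busting URL: URLSearchParams joins pairs with a raw "&",
// which is correct for the URL itself.
function versionedSrc(file, params) {
  return `${file}?${new URLSearchParams(params)}`;
}

// Inside HTML markup, "&" must be written as the character reference &amp;
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;") // first, so the entities below are not re-escaped
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function scriptTag(src) {
  return `<script src="${escapeAttr(src)}"></script>`;
}
```

For example, `scriptTag(versionedSrc("thefile.js", { v: "v1.9.6", sv: "v0.0.1" }))` produces `<script src="thefile.js?v=v1.9.6&amp;sv=v0.0.1"></script>`, which validates and still requests `thefile.js?v=v1.9.6&sv=v0.0.1`.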

How to create a submatch for this expression?

I am running a regular expression against the DOM to return an account status from a page.
This is the string on the page:
<h3>Status</h3><p>Completed</p>
And this is the Expression I'm currently using
<h3>Status</h3>[\s\S]*?<p>([\s\S]*?)</p>
My goal is to get only the status "Completed" from this string, but I'm not sure how to do this. I have read a little about submatching; I'm just not sure how to implement it.
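String.prototype.match() (or RegExp.prototype.exec()) returns an array containing the sub-matches for each capture group, as long as the regex has no g flag. Note that match() is a string method, not a RegExp method, and that a regex literal avoids the double escaping a RegExp string would need ('\s' inside a string literal is just 's'). So use:
var re = /<h3>Status<\/h3>[\s\S]*?<p>([\s\S]*?)<\/p>/;
var match = str.match(re); // null if nothing matched
var submatch = match && match[1]; // "Completed"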
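To make the difference between one match and several concrete, here is a small sketch using the question's pattern; `getStatus` and `getAllStatuses` are hypothetical names.

```javascript
// Without the g flag, exec() returns [fullMatch, group1, ...] or null.
function getStatus(html) {
  const match = /<h3>Status<\/h3>[\s\S]*?<p>([\s\S]*?)<\/p>/.exec(html);
  return match ? match[1] : null;
}

// With the g flag (required by matchAll), one match array per occurrence,
// each carrying its own capture groups.
function getAllStatuses(html) {
  const re = /<h3>Status<\/h3>[\s\S]*?<p>([\s\S]*?)<\/p>/g;
  return Array.from(html.matchAll(re), (m) => m[1]);
}
```

`getStatus("<h3>Status</h3><p>Completed</p>")` gives `"Completed"`; a page with several status blocks would need `getAllStatuses`.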
This will work: /<h3>Status<\/h3>[\s\S]*<[^>]*>([^<]+)<.*/
See it working here: http://jsfiddle.net/M7kJ7/
But seriously... use DOM functions for that! Why a regex?
EDIT: Example of how you could solve it using DOM functions: http://jsfiddle.net/DycGh/
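A DOM-based version might look like the sketch below. `statusFromDocument` is a hypothetical name, and it assumes the markup from the question, where the `<p>` is the element immediately after the `<h3>`.

```javascript
// Return the text of the <p> that directly follows an <h3> reading "Status".
function statusFromDocument(doc) {
  for (const h3 of doc.querySelectorAll("h3")) {
    if (h3.textContent.trim() !== "Status") continue;
    const next = h3.nextElementSibling;
    // tagName is upper-case for elements in HTML documents
    if (next && next.tagName === "P") return next.textContent.trim();
  }
  return null;
}
```

In a browser, pass `document` for the current page, or `new DOMParser().parseFromString(html, "text/html")` for HTML fetched as a string.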
EDIT2: OK, after reading all the comments, I came to the conclusion that you do have a valid reason not to access the database directly (you can't; they don't give you access to it).
And you can't use native DOM functions (you are not executing JS directly on each page; instead, one central page is going to be used for searching the other pages).
However, I still don't think browser-side javascript is the correct path.
Using either server-side JavaScript (Node.js) or another language such as Perl would be better. And using the DOM, by means of a parser, is correct too.
If you choose the Node.js path, you can use node-htmlparser. From your Node app you'll open each URL, get the data using the parser's functions and then construct a JSON output. Your page will make an Ajax request to Node, get its JSON results, and use them to create the output.
If you go for Perl, you can use HTML::DOM. The rest of the procedure would be similar.
It doesn't have to be Perl or Node.js; those are just the options I know. You can do it with PHP, Python or Ruby too (but you'll have to google for parsers).
But it is best to do it with a server-side script.
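The Node.js flow described above (open each URL, extract the data, return JSON) could be sketched as follows. `collectStatuses` is a hypothetical name, the global `fetch` assumes Node 18 or later, and a regex stands in for node-htmlparser only to keep the sketch dependency-free; a real parser is the more robust choice, as the answer says.

```javascript
// Stand-in for a real HTML parser, to keep the sketch dependency-free.
const STATUS_PATTERN = /<h3>Status<\/h3>[\s\S]*?<p>([\s\S]*?)<\/p>/;

// Fetch each page in turn and return a JSON string mapping url -> status.
async function collectStatuses(urls, fetchImpl = globalThis.fetch) {
  const results = {};
  for (const url of urls) {
    try {
      const res = await fetchImpl(url);
      if (!res.ok) {
        results[url] = null; // HTTP error: no status available
        continue;
      }
      const match = STATUS_PATTERN.exec(await res.text());
      results[url] = match ? match[1].trim() : null;
    } catch {
      results[url] = null; // network error
    }
  }
  return JSON.stringify(results);
}
```

The page's Ajax request would then receive this JSON string from whatever HTTP handler calls `collectStatuses`.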

Run time error handling on lazy loaded javascript?

Does anyone have any ideas how to do error handling on lazy-loaded JavaScript? I am using an approach in which an Ajax request is made and the code is eval'd in global scope. When a runtime error is hit, the reported filename is my lazy-loading script, and the line number is the error line plus the line number of my eval in the loading script. This wouldn't be so bad except that all the JavaScript files get combined into modules for sections of the site. A try/catch around the JavaScript file itself won't catch runtime errors thrown later by its functions. Any ideas? window.onerror doesn't provide the correct filename, so it is out of the question. I need to catch the error before it gets there.
I was thinking maybe I could programmatically wrap all the functions within the eval'd code in try/catch blocks (which is ugly), but since the code runs at the window level I am not sure how to access the eval'd code specifically and dynamically. Sure, if the JavaScript defines an object named "Bob" I can access window.Bob, but I need to do it dynamically.
I solved the issue, however it is not the most elegant solution. Essentially what I do is this:
1. After the site loads, I look at all the objects in window and push them into an array. This basically tells my code to ignore these objects.
2. When I modularize my code, I keep track of the lengths and names of the files being placed into a module.
3. The last line of the modularizer passes the fileLengths and fileNames arrays to a function in my error-handling object.
4. The error-handling code looks for new objects in window. If any exist, it sets properties on them matching fileLengths and fileNames.
5. It recurses through the new objects and decorates their functions with try/catch wrappers.
6. When one of those catches is hit, it traverses upward to find those properties.
7. It calculates the file and line number based on the properties.
8. It outputs a new error with the correct file and line number.
Yes ugly... but it works.
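Steps 5 and 7 could look roughly like the sketch below. `locate` and `decorate` are hypothetical names; `locate` assumes `fileLengths` holds line counts in concatenation order and that the loader's eval offset has already been subtracted. Reading the line number out of a caught error (for example by parsing `err.stack`) differs between browsers, so it is left to the `report` callback.

```javascript
// Map a line number in the combined module back to its original file.
function locate(combinedLine, fileNames, fileLengths) {
  let start = 1; // first line of the current file within the module
  for (let i = 0; i < fileNames.length; i++) {
    const end = start + fileLengths[i]; // first line of the next file
    if (combinedLine < end) {
      return { file: fileNames[i], line: combinedLine - start + 1 };
    }
    start = end;
  }
  return null; // past the end of the module
}

// Wrap every function property of obj (recursing into nested objects) so
// errors are reported before being rethrown. Note: this does not preserve
// the prototype of constructor functions.
function decorate(obj, report, seen = new Set()) {
  if (obj === null || typeof obj !== "object" || seen.has(obj)) return;
  seen.add(obj);
  for (const key of Object.keys(obj)) {
    const original = obj[key];
    if (typeof original === "function") {
      obj[key] = function (...args) {
        try {
          return original.apply(this, args);
        } catch (err) {
          report(err, key); // e.g. parse err.stack, then call locate()
          throw err;
        }
      };
    } else {
      decorate(original, report, seen);
    }
  }
}
```

With files of 3 and 5 lines, `locate(4, ["a.js", "b.js"], [3, 5])` maps line 4 of the module to line 1 of `b.js`.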

Getting URL of executing JavaScript file (IE6-7 problem mostly)

Hey all, I've been trying to throw together a generic function that retrieves the absolute URL of an executing JavaScript file on a web page:
http://gist.github.com/433486
Basically you get to call something like this:
getScriptName(function(url) {
console.log(url);
// http://www.example.com/myExternalJsFile.js
});
inside an external JavaScript file on a page and can then do something with it (like find the <script> tag that loaded it for example).
It works great in almost all the browsers I've tested (Firefox, Chrome, Safari, Opera v10 at least, and IE 8).
It seems to fail, however, in IE 6 and 7. The callback function gets executed, but the retrieved name is the URL to the main HTML page, not the JavaScript file. Continuing with the example, getScriptName invokes the callback with the parameter: http://www.example.com/index.html
So all I'm really asking is if there's some other way of getting the URL of the current JavaScript file (which could be IE 6 and 7 specific hackery)? Thanks in advance!
EDIT: Also, this won't work in every case, so please don't recommend it:
var scripts = document.getElementsByTagName("script");
return scripts[scripts.length-1].src;
I'd like it to work in the case of dynamically created script tags (possibly not placed last in the page), aka lazy-loading.
A lot of this depends on what you have access to. If, as it appears, you are trying to do this entirely within the JS code, I do not believe that you are able to do it, for some of the reasons shown above. You could get 90% of the way maybe, but not be definitive.
If you are working in a .NET environment (which is the only one I know), I would suggest using a module that intercepts all JS requests and adds the request location into them, or something of that nature.
I think you need to address this from the server side, not the client side. I do not think you will get a definitive answer from the client side. I think you will also struggle to get an answer from the server side, but you might be more successful.
Sorry, I suspect you might struggle with this. IE earlier than version 8 typically gives error messages from javascript errors of the form:
line: 342
char: 3
error: expected identifier, string or number
code: 0
url: http://example.com/path/to/resource
where the url is the window.location.href, rather than the URL of the external Javascript resource that contains the problem. I suggest that IE gives the unhelpful URL value since the script URL isn't available to IE at that point, and neither is it available to any Javascript you might write to try to display it.
I would love to be able to link to IE8 release notes which say this bug / feature has been fixed, hence the reason I created this as community wiki. My MSDN foo is pretty weak!

How can I execute javascript in Bash?

I'm trying to get to a page on http://www.ocwconsortium.org/ straight from Bash. The page appears when you type mathematics into the search field in the top right corner. I tested
open http://www.ocwconsortium.org/#mathematics
but it leads to the main page. It is clearly some javascript thing. How can I get the results straight from Bash on the first page?
[Clarification]
Let's take an example. I have the following lines for a Math search engine in .bashrc:
alias mathundergradsearch='/Users/user/bin/mathundergraduate'
And this is in a separate file:
#!/bin/sh
q=$1
w=$2
e=$3
r=$4
t=$5
open "http://www.google.com/cse?cx=007883453237583604479%3A1qd7hky6khe&ie=UTF-8&q=$q+$w+$e+$r+$t&hl=en"
Now, I want something similar to the example. The difference is that the other site contains javascript or something that does not allow me to see the parameters. How could I know where to put the search parameters as I cannot see the details?
open "http://www.ocwconsortium.org/index.php?q=mathematics&option=com_coursefinder&uss=1&l=&s=&Itemid=166&b.x=0&b.y=0&b=search"
You need quotes because the URL contains characters the shell considers to be special.
The Links web browser more or less runs from the commandline (like lynx) and supports basic javascript.
Even though the title of the post sounds general, your question is very specific. It's unclear to me what you're trying to achieve in the end. Clearly you can access sites that rely heavily on javascript (else you wouldn't be able to post your question here), so I'm sure that you can open the mentioned site in a normal browser.
If you just want to execute javascript from the commandline (as the title suggests), it's easy if you're running bash via cygwin. You just call cscript.exe and provide a .js scriptname of what you wish to execute.
I didn't get anything handled by JavaScript - it just took me to
http://www.ocwconsortium.org/index.php?q=mathematics&option=com_coursefinder&uss=1&l=&s=&Itemid=166&b.x=0&b.y=0&b=search
Replacing mathematics (right after q=) should work. You may be able to strip out some of that query string, but I tried a couple of things and it didn't play nice.
Don't forget to encode your query for URLs.
You will need to parse the response, find the URL that is being opened via JavaScript and then open that URL.
Check this out: http://www.phantomjs.org/.
PhantomJS is a command-line tool that runs a real, fully fledged (WebKit-based) browser headlessly, i.e. without the browser chrome.
