I want to fetch this site
https://www.film-fish.com/modern-mindless-action
to fetch the IMDB IDs of all movies listed there.
The problem is that the page only loads the full list of movies as you scroll down, so a simple wget doesn't work.
Even if I scroll to the bottom of the page and view the source code, I do not see the last movie in the list (Hard Kill (2020)).
So the problem seems to be that the content is being created via JavaScript.
Does anybody have a tip on how to achieve that?
Indeed, executing JavaScript code is beyond the scope of GNU Wget. You would need a browser automation tool. If you know some Node.js or JavaScript, I suggest taking a look at the PhantomJS Quick Start and Page Automation guides. Look at the first example on the Page Automation page; you should be able to rework it to your needs, i.e. instruct the page to scroll down using JavaScript, then extract what you need using JavaScript.
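Whichever automation tool ends up rendering the page, the extraction step itself can be kept separate and simple. Below is a minimal sketch, assuming each movie entry in the rendered HTML links to IMDb with the usual imdb.com/title/tt... URL shape (I have not verified the markup of the site in question). extractImdbIds is a hypothetical helper name, and the sample markup is made up.

```javascript
// Hypothetical helper: collect unique IMDb title IDs from already-rendered HTML.
// Assumes links of the form imdb.com/title/tt<digits>; adjust the pattern if the
// site embeds the IDs differently.
function extractImdbIds(html) {
  const pattern = /imdb\.com\/title\/(tt\d{7,})/g;
  const ids = new Set(); // a Set drops duplicates and keeps first-seen order
  for (const match of html.matchAll(pattern)) {
    ids.add(match[1]);
  }
  return Array.from(ids);
}

// Example with made-up markup (not taken from the real page):
const sample =
  '<a href="https://www.imdb.com/title/tt0000001/">A</a>' +
  '<a href="https://www.imdb.com/title/tt0000002/">B</a>' +
  '<a href="https://www.imdb.com/title/tt0000001/">A again</a>';
console.log(extractImdbIds(sample));
```

In a headless browser you would pass the page's rendered HTML to this function after scrolling to the bottom, so that all entries have been loaded first.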
I am trying to scrape website data, and I can find the a tags that I want to explore.
However, these tags look like <a title="Annex" href="https://www.myite.com/d…-3.pdf?sfvrsn=b3a84558_2" sfref="[documents|librariesProv…-4ef8-8b90-ae20b6b7590d">
Reading a.href just returns https://www.myite.com/d…-3.pdf?sfvrsn=b3a84558_2, and that results in a 404 page.
However, when I click on the tag on the web page, it opens the PDF; the URL is slightly modified.
How do I handle these types of links in JavaScript? I am using JS fetch, similar to this post.
Thanks in advance.
sfref is an attribute that is used to resolve dynamic links. For example, when the a tag is clicked, the sfref attribute value is used to build the href on the fly.
I assume you do not have access to the logic used for link resolution.
The best thing you can do here is to use Chrome DevTools -> Event Listeners and find the event where the link resolution happens.
Please keep in mind that different a tags might have different resolution logic attached to them.
I am trying to automatically download a plugin on my WordPress site using PhantomJS. For some reason, I cannot seem to access the download button (shown below).
This is the HTML code to the image (with domain sensitive information blurred out for security purposes)
So far, I have tried accessing this element by using the following code:
function() {
page.evaluate(function() {
let mainLink = document.querySelector('a[data-slug="better-wp-security"]')
mainLink.click()
})
}
Some things to mention:
This function, as it is part of a larger file, will NOT execute until the page has finished loading.
PhantomJS is executing correctly, there are no problems with permissions
The script before-hand is properly accessing the install plugins page, which I verified by capturing screenshots before trying to click.
I have defined click earlier in the file, and it works perfectly.
Any ideas how I can accomplish this? Thanks all!
ADDED INFORMATION:
It seems as if the path from the main div element is as follows:
#the-list .plugin-card plugin-card-better-wp-security .plugin-card-top .action-links .plugin-action-buttons .install-now button
I imagine the solution to this question has something to do with this sequence.
I was able to accomplish this by not going after the data-slug attribute, but rather going after the href attribute itself. Although I can't generate my own _wpnonce value without the use of the REST API, I was able to search the document for an href that contained certain parts of the URL. This is the final code below:
document.querySelector('a[href*="action=install-plugin&plugin=better-wp-security"]').click()
That's it! Simple and easy!
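A slightly more defensive variant of the same idea, as a sketch: it checks that the link was actually found before clicking, which makes a failed match easier to diagnose. The function name clickInstallLink and its parameters are my own; the selector is the one from the line above.

```javascript
// Hypothetical helper: click the plugin's install link if it exists.
// `doc` is the page's document object; `slug` is the plugin slug.
function clickInstallLink(doc, slug) {
  var selector = 'a[href*="action=install-plugin&plugin=' + slug + '"]';
  var link = doc.querySelector(selector);
  if (!link) {
    return false; // nothing matched; the caller can log or take a screenshot
  }
  link.click();
  return true;
}
```

The function would need to be defined and called inside page.evaluate, e.g. clickInstallLink(document, 'better-wp-security'), since that is where the page's document is available.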
The website I'm trying to link to is pretty much a text document (see below). I'm trying to link to the last line, preferably; highlighting it would be ideal, but a link to the end of the page would work.
I've tried various code snippets, but as I have no access to the code of the page, I cannot create an anchor in the target page and link directly to it.
If I can get the following code to run on the page once I have navigated to it, I believe that would solve the problem, but my JS knowledge does not extend that far.
window.onload=toBottom;
function toBottom()
{
alert("Scrolling to bottom ...");
window.scrollTo(0, document.body.scrollHeight);
}
I am linking using the following code
`— Alan Turing `
http://www.loebner.net/Prizef/TuringArticle.html
I would deep link: find an ID on the remote page and link directly to it, for example
www.loebner.net/Prizef/TuringArticle.html#ELEMENTID
However, if the page does not have an element at the bottom with an ID, this might be a problem. Do you have access to that page to add an ID?
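As a small illustration of the deep link: the browser scrolls to the element whose id matches the part after the #. The sketch below only builds such a link. buildDeepLink is a hypothetical helper name, ELEMENTID is a placeholder as above, and the link only scrolls anywhere if the remote page really contains an element with that id.

```javascript
// Hypothetical helper: append a fragment identifier to a page URL.
// Uses the standard URL class (available in browsers and in Node.js).
function buildDeepLink(pageUrl, elementId) {
  const url = new URL(pageUrl);
  url.hash = elementId; // the URL class adds the leading "#" itself
  return url.toString();
}

console.log(buildDeepLink('http://www.loebner.net/Prizef/TuringArticle.html', 'ELEMENTID'));
```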
The cleanest solution was contacting the site administrator, setting up a mirror of the original file on my server, adding an ID to the element I wanted to deep link to, and linking to that #ID from within my webpage
href="<c:url value="loebner#ID"/>"
I have developed a small component which can be put into any website. Now I want to write code that demonstrates what my component would look like on any website.
So, a person would come to my page and enter their URL, and then my code should embed my custom JS/CSS into the downloaded HTML and display it. Something like this.
Here, like the feedback tab, I want to show my component anywhere on that page.
Try a bookmarklet.
Create a piece of JavaScript that adds your code to the page, such as the following:
javascript:(function(){var%20script=document.createElement('script');script.src='http://www.example.org/js/example.js';document.getElementsByTagName('head')[0].appendChild(script);})()
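Add it as the href of a link like so:
<a href="javascript:(function(){var%20script=document.createElement('script');script.src='http://www.example.org/js/example.js';document.getElementsByTagName('head')[0].appendChild(script);})()">Link Text Here</a>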
Tell your users to drag the link to their bookmark toolbar and click on it on different websites to try your code out.
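If the script URL changes often, the bookmarklet href can be generated instead of hand-written. This is a minimal sketch, not the answerer's own code: buildBookmarklet is a hypothetical helper, and the example.org URL is the same placeholder used above.

```javascript
// Hypothetical helper: build a bookmarklet href that loads a remote script.
function buildBookmarklet(scriptUrl) {
  var code =
    "(function(){var s=document.createElement('script');" +
    's.src=' + JSON.stringify(scriptUrl) + ';' +
    "document.getElementsByTagName('head')[0].appendChild(s);})()";
  // encodeURI percent-encodes spaces, double quotes and a few other
  // characters so the result can be used as an href value.
  return 'javascript:' + encodeURI(code);
}

console.log(buildBookmarklet('http://www.example.org/js/example.js'));
```

The returned string goes into the href attribute of the link that users drag to their bookmark toolbar.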
Some examples: http://www.reclaimprivacy.org/, http://www.readability.com/bookmarklets
In the example you linked, they are requesting the page specified in the url querystring parameter on the server, and then doing more or less the following steps:
1. In the <head> tag they add a <base href="url" /> tag to the document. The base tag makes any relative links in the document treat the value of the href attribute as their root. This is how they get around broken CSS / images. (The base tag is supported by all browsers.)
2. At the end of the document (i.e. just before the </body> tag) they inject the JavaScript that runs their demos.
3. They serve the modified HTML to the browser.
All of this is pretty straightforward to implement. You could use regular expressions to match the <head> and </body> tags for steps 1 and 2 respectively. How you actually request the page will vary depending on the server platform, but here are some links to get you started:
C# - HttpWebRequest object documentation
PHP - HttpRequest::send
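The two injection steps can be sketched independently of how the page is fetched. The version below is written in JavaScript rather than C# or PHP, purely as an illustration. injectWidget and escapeAttr are hypothetical names, the regular expressions are a simple approximation (a real HTML parser would be more robust), and the fetching of the page is left out entirely.

```javascript
// Hypothetical helper: escape a value for use inside a double-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
}

// Hypothetical helper: add a <base> tag after <head> and a widget <script>
// before the last </body>. `html` is the already-downloaded page source.
function injectWidget(html, pageUrl, widgetUrl) {
  var baseTag = '<base href="' + escapeAttr(pageUrl) + '" />';
  var scriptTag = '<script src="' + escapeAttr(widgetUrl) + '"></script>';

  // Step 1: match <head> or <head ...attributes>, but not <header>.
  var withBase = html.replace(/<head(\s[^>]*)?>/i, function (m) {
    return m + baseTag;
  });

  // Step 2: match only the last closing body tag in the document.
  var closingBody = /<\/body\s*>(?![\s\S]*<\/body\s*>)/i;
  if (!closingBody.test(withBase)) {
    return withBase + scriptTag; // no </body> found: append at the very end
  }
  return withBase.replace(closingBody, function (m) {
    return scriptTag + m;
  });
}
```

Function replacers are used instead of replacement strings so that a "$" in either URL is not interpreted as a replacement pattern.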
Nathan's answer is the closest to how we have done the demo feature at WebEngage. To make such a demo functional, you'll need to create a JavaScript widget that can be embedded on third-party sites. syserr0r's answer on creating a bookmarklet is the simplest approach to do so. Ours is a Java backend and we use HttpClient to fetch the responses. As Nathan suggested, we parse the response, sanitize it and add our widget JavaScript to the response. The widget JS code takes over from there to render the Feedback tab and load a short demo survey.
Disclosure: I am a co-founder and CEO at WebEngage.
You cannot do this with jQuery alone because of the browser's cross-domain (same-origin policy) restrictions.
I suggest you write a PHP script that downloads the URL specified by the user, adds your widget code to it, and then echoes the result back to the user.
I recommend using bookmarklets. I've made a bookmarklet generator for adding jQuery-enabled bookmarklets to a page to make development easier.
There's a caliper bookmarklet on the page that you can mess around with just to show an example of it working.
Full disclosure, this is something I've made, I'm not trying to be spammy as I think it's relevant: zbooks
You could make an iframe page, which loads their page in the iframe, and uses javascript to inject your code into the iframe.
Here is my approach...
http://jsfiddle.net/L2kEf/
html
<iframe src="http://www.bing.com"></iframe>
<div>I am div</div>
css
div { background: red; position: absolute; top: 20px; width: 100px; left:20px;}
iframe{width: 100%; height: 500px;}
You can add JavaScript/jQuery too, so you could do something like the following.
jQuery (not 100% sure it would work because of the cross-domain restrictions, but it is worth a try):
$('div').click(function () {
  // Only works when the iframe content is same-origin; the browser blocks cross-domain access
  $('iframe').contents().find('body').html('changed');
});
If this can't change any of the contents, you can display a dialog saying that it would normally work on the user's own website, and then fall back to syserr0r's bookmarklet approach for better results. Since you are offering this kind of service to developers, I'm sure they would know about bookmarklets, so my approach would rarely be used :) Hope it helps.
I had a problem of a similar nature, and the main obstacle is the cross-domain policy.
You have to ask the user to put your code in a <script src="..."> or create a proxy solution that would add your code for them.
I went for the proxy and here are my observations:
it's easy to create a basic proxy in PHP - there are some PHP proxies on SourceForge, and Ben Alman has created a simple PHP proxy for AJAX. Based on those I was able to create a PHP proxy that altered the content properly in one day.
after that I spent a lot of time making it work with more and more sites with issues. You can never create a perfect proxy.
As an alternative (as long as you are non-commercial) you can use http://www.jmarshall.com/tools/cgiproxy/ and put the site in an iframe, and then do whatever you want with the iframe's document, as it's in your domain thanks to the proxy. You can then access iframeDOMnode.contentWindow.document, etc.
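Once the proxied page is served from your own domain, adding a script to the iframe's document could look roughly like this. It is a sketch under that same-origin assumption; injectScriptIntoFrame is a hypothetical helper name.

```javascript
// Hypothetical helper: add a <script> element to a same-origin iframe's document.
function injectScriptIntoFrame(iframeNode, scriptUrl) {
  // For a cross-origin frame the browser blocks this access, which is why
  // the page has to come through the proxy first.
  var doc = iframeNode.contentWindow.document;
  var script = doc.createElement('script');
  script.src = scriptUrl;
  (doc.head || doc.body).appendChild(script);
  return script;
}
```

It should be called only after the iframe has finished loading (for example from its load event), otherwise the document it touches may be replaced by the incoming page.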
You can create a Crossrider extension which your users can download.
Then simply add this to your App/Extension code:
appAPI.dom.addRemoteJS("http://yourdomain.com/file.js")
Your users can then download the extension (it works cross-browser for Internet Explorer, Chrome and Firefox) and it will load your JS code on every page load.
You can get an approximation of what it will look like using an iframe. Take a look at this link for an example.
http://jsfiddle.net/jzaun/5PjRy/
The issue with this approach is that you can't move your DIV(s) when the page scrolls; they are in effect just floating over the iframe. There is no way around this, as cross-domain scripting restrictions won't let you access the iframe's document to monitor scroll events.
The only other option for a better-fitting example would be to load the page on the server side, in whatever scripting language you are using, and load that into the iframe (or into a div, etc.). Then you can use JavaScript all you want, as the page is coming from your domain.
For your example of what your widget will look like, I imagine floating your DIV(s) over an iframe would give enough of an idea.
Please note the example you gave is using the server-side method, not the iframe method.
I agree with the bookmarklet strategy.
I'm a fan of http://bookmarklets.heroku.com/, which lets you generate bookmarklets easily, inject jQuery, etc.