How do I get all hrefs (links) that reside in anchor tags containing JavaScript code, using Selenium with Python? Those links change dynamically every time.
I got all the anchor tags on that web page, but for the anchor tag I use to click, the href comes back like this:
javascript:selectItem('/7000/7020.aspx?reqID=' + ContentPH_hidReqID.getValue() + '&wf=0' + addUrlText() + '#ContentPH_tabReq:ContentPH_pnlCandidates')
Can anyone help me get these kinds of links from a web page using Selenium with Python?
Thanks in advance
himabindu y
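Use this to find all anchor elements on the page that have an href attribute (find_elements_by_css_selector is deprecated in Selenium 4 and removed in later 4.x releases; the equivalent call there is driver.find_elements(By.CSS_SELECTOR, "a[href]")):
driver.find_elements_by_css_selector("a[href]")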
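As a rough sketch of how the selector above might be used, the helper below reads the href of every matching anchor and separates ordinary URLs from javascript: pseudo-URLs such as the one in the question. The function name collect_hrefs and the example URL are illustrative assumptions, not part of the original answer. A javascript: href is only the source text of the script, so the URL it builds at click time still has to be resolved in the browser.

```python
def collect_hrefs(driver):
    """Return (plain, scripted) href lists for every <a href> on the page.

    `driver` is assumed to be a Selenium WebDriver with a page already loaded.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR,
    # so this call works without importing By.
    anchors = driver.find_elements("css selector", "a[href]")
    hrefs = [a.get_attribute("href") for a in anchors]

    # javascript: hrefs hold code, not a destination URL.
    scripted = [h for h in hrefs if h.lower().startswith("javascript:")]
    plain = [h for h in hrefs if not h.lower().startswith("javascript:")]
    return plain, scripted


# Hypothetical usage with a real browser (assumes Selenium and Chrome are installed):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://example.com")  # placeholder URL
#   plain, scripted = collect_hrefs(driver)
```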
Related
I am trying to get the value of the href attribute of an anchor element from a web page using a self-made Python script. However, all of the contents of the div element inside which the anchor element sits are received by the web page by using AJAX jQuery calls when the web page initially loads. The div element contains about 90% of the web page's content. How can I get the contents of the div element and then the value of the href attribute of the anchor element?
Later, after I get the value of the 'href' attribute, I want to get the contents of the web page that the link points to. Unfortunately, that call is also made with AJAX (jQuery). When I click the link in the web browser, the address in the address bar does not change, which means that the received content is loaded into the same web page (inside the above-mentioned div element).
After I get this, I will be using BeautifulSoup to parse the web page. So, how will I be able to do this with Python? What sort of modules do I need to use? And what is the general pseudo-code required?
By the way, the anchor element has an onclick event handler that triggers the corresponding jQuery function that loads the contents into the div element inside the web page.
Moreover, the anchor element has no id, in case that is needed for the solution.
You'd want to use a headless web browser. Take a look at Ghost.py or phantompy.
I just realized that phantompy is no longer being actively developed, so here's an example with Ghost.py.
I created an HTML page that is initially blank. Some JavaScript adds a link to a div.
<html>
<body>
<div id="links">
<!-- Links go here -->
</div>
</body>
<script type="text/javascript">
var div = document.getElementById('links');
var link = document.createElement('a');
link.innerHTML = 'DuckDuckGo';
link.setAttribute('href', 'http://duckduckgo.com');
div.appendChild(link);
</script>
</html>
So if you were to scrape the page as served with Beautiful Soup, using something like soup.find_all('a'), you wouldn't get any links, because the static HTML doesn't contain any.
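To make that point concrete, here is a minimal sketch that parses the static HTML above without running its JavaScript. It uses the standard library's html.parser as a stand-in for Beautiful Soup so that it has no dependencies; the class name AnchorCollector is an illustrative assumption.

```python
from html.parser import HTMLParser

# The page exactly as the server sends it, before any JavaScript runs.
STATIC_HTML = """<html>
<body>
<div id="links">
<!-- Links go here -->
</div>
</body>
<script type="text/javascript">
var div = document.getElementById('links');
var link = document.createElement('a');
link.innerHTML = 'DuckDuckGo';
link.setAttribute('href', 'http://duckduckgo.com');
div.appendChild(link);
</script>
</html>"""


class AnchorCollector(HTMLParser):
    """Record the href of every <a> start tag found in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))


collector = AnchorCollector()
collector.feed(STATIC_HTML)
collector.close()
print(collector.hrefs)  # [] because the link only exists after the script runs
```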
But we can use a headless browser to render the content for us.
>>> from ghost import Ghost
>>> from bs4 import BeautifulSoup
>>>
>>> ghost = Ghost()
>>>
>>> ghost.open('http://localhost:8000')
>>>
>>> soup = BeautifulSoup(ghost.content, 'html.parser')
>>> soup.find_all('a')
[<a href="http://duckduckgo.com">DuckDuckGo</a>]
If you have to do something like clicking a link to change the content on the page, you could also do this. Check out the Sample use case on the project's website.
I'm not sure if this is possible, but is there any way to extract the dynamic link on this web page: http://www.mobileonline.tv/channel.php?n=69111 ? The dynamic link is named "Link 1 (HLS): not compatible with all channels". Thanks.
I suppose, besides XSS, the only easy way is by:
going to that page,
right-clicking on that link,
clicking on Inspect Element.
The anchor tag should be highlighted for you in the developer tool. You can find the url in the href attribute of the anchor tag.
Can I insert a script tag after a div tag in HTML using this:
$("<script src='//ajax.googleapis.com/ajax/libs/jquery/1.10.1/jquery.min.js'></script>").html("").insertAfter($('#sidebar'));
I tried it, but it does not seem to work; the script does not appear in the page source.
You have to remove the .html("") and split the closing script tag inside the string; otherwise the browser thinks you are closing the script tag your code is running in.
$("<script src='//ajax.googleapis.com/ajax/libs/jquery/1.10.1/jquery.min.js'></scr" + "ipt>").insertAfter($('#sidebar'));
** The solution requires both of the suggestions by @HerrSerker and @AlexK. to work properly.
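The reason for splitting the tag can be illustrated with a small sketch. An HTML parser ends an inline script element at the first literal closing script tag it sees, even if that tag sits inside a JavaScript string. The example below uses Python's html.parser as a stand-in for a browser's parser (an assumption; browsers follow the HTML specification, which behaves the same way for this case). The file name a.js and the done() call are hypothetical placeholders.

```python
from html.parser import HTMLParser


class ScriptText(HTMLParser):
    """Collect only the text the parser treats as script content."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def script_text(page):
    """Return the script content as seen by the parser."""
    parser = ScriptText()
    parser.feed(page)
    parser.close()
    return "".join(parser.chunks)


# Unsplit: the closing tag inside the string ends the script element early.
naive = """<script>$("<script src='a.js'></script>").insertAfter($('#sidebar')); done();</script>"""
# Split: no literal closing tag appears until the real end of the script.
split = """<script>$("<script src='a.js'></scr" + "ipt>").insertAfter($('#sidebar')); done();</script>"""

naive_text = script_text(naive)
split_text = script_text(split)
print("done()" in naive_text)  # False: the script was cut off before done()
print("done()" in split_text)  # True: the whole script survived
```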
I have an external link on my test page, and I want to read the title tag of the page it points to (something like <title>Desired Title Tag</title>) into a variable on my test page, using pure JavaScript only. Is this possible, and if so, what would the code look like?
I am using CKEditor to allow users to add images to their textboxes in a CMS.
A possible scenario is this: I develop a new site for a customer at http://developer.com/customer/a. The base url is "/customer/a". But when I ship the finished site to their domain www.customer-a.com, base url is changed to "/" and all image links are broken.
I would like CKEditor to save something like {base_url}/media/my-image.jpg, but still keep all the WYSIWYG features of CKEditor. Is there a hook or event in CKEditor where I could replace, e.g., {base_url} before the HTML is viewed?
I would appreciate any hints.
The hard way would be to use CKEditor's HTML parser to traverse the whole HTML text when it's loaded into the editor and check or correct the URL of each img tag.
A second option, although I'm not sure it applies in your case, would be to make all images dependent on CKEDITOR.basePath and determine just that when CKEDITOR is initializing.
Or just develop on http://developer.com/customer/a, but let images be placed on www.customer-a.com even for development :)
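The first option above (walking the HTML and correcting img URLs) can also be sketched outside the editor, for example on the server before the stored HTML is rendered. This is not CKEditor's own parser: it is a simplified regex-based illustration in Python, and the function name rebase_img_srcs and the sample markup are assumptions made for the example.

```python
import re

# Matches the src attribute of an <img> tag: group 1 is the tag text up to the
# opening quote, group 2 is the quote character, group 3 is the URL.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def rebase_img_srcs(html, old_base, new_base):
    """Swap old_base for new_base at the start of every <img> src URL."""

    def fix(match):
        prefix, quote, url = match.groups()
        if url.startswith(old_base):
            url = new_base + url[len(old_base):]
        return prefix + quote + url + quote

    return IMG_SRC.sub(fix, html)


html = ('<p><img alt="x" src="/customer/a/media/my-image.jpg"> '
        '<a href="/customer/a/page">link</a></p>')
rebased = rebase_img_srcs(html, "/customer/a/", "/")
print(rebased)
```

Only img tags are touched here; the anchor's href keeps its original path. A regex is a shortcut that can miss unusual markup, so a real HTML parser would be the safer choice for production content.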