How to browse over a page using PhantomJS and Selenium

I have some DIV elements on a web page. In total there are about 30 DIV blocks with the following structure:
<div class="w-dyn-item">
<a href="/project/soft" class="jobs-wrapper no-line w-inline-block w-clearfix">
<div class="jobs-client">
<img data-qazy="true" src="https://global.com/test.jpg" alt="Soft" class="image-9">
<div style="background-color:#cd7f32" class="job-time">Level 1</div>
</div>
<div class="jobs-content w-clearfix">
<div class="w-clearfix">
<div class="text-block-19 w-condition-invisible">PROMO</div>
<h3 class="job-title">Soft</h3>
<img height="30" data-qazy="true" src="https://global.com/test.jpg" alt="Soft" class="image-15 w-hidden-main w-hidden-medium w-hidden-small"></div>
<div class="div-block w-clearfix">
<div class="text-block-4">Italy</div>
<div class="text-block-4 w-hidden-small w-hidden-tiny">AMB</div>
<div class="text-block-4 w-hidden-small w-hidden-tiny">GTL</div>
<div class="text-block-13">January 10, 2017</div><div class="text-block-14">End date:</div></div><div class="space small"></div><p class="paragraph-3">Text text text</p></div>
</a>
</div>
I am trying to access the a href and click on the link. The problem is that I cannot use find_element_by_link_text, because the link has no link text of its own. Is it possible to access the a href by its class, class="jobs-wrapper no-line w-inline-block w-clearfix"? When I used find_element_by_class_name, I got the error Message: {"errorMessage":"Compound class names not permitted","request
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get("https://myurl.com/")
driver.find_element_by_link_text("//a href").click()
print driver.current_url
driver.quit()

The error you're getting is because Selenium's find_element_by_class_name does not support multiple classes.
Use a CSS selector with find_elements_by_css_selector instead:
driver.find_elements_by_css_selector('.jobs-wrapper.no-line.w-inline-block.w-clearfix')
This selects all tags that carry all of the wanted classes; you can then iterate over them and call click() or any other action you need.
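As a rough illustration of how the compound class attribute maps to the selector above, here is a minimal Python sketch. The helper name classes_to_css_selector and its tag parameter are hypothetical and not part of Selenium; the function only builds the selector string that you would then pass to find_elements_by_css_selector.

```python
def classes_to_css_selector(class_attr, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration; it is not part of Selenium.
    """
    # split() with no arguments also collapses repeated or stray whitespace
    class_names = class_attr.split()
    # Each class becomes ".name"; chaining them means "has ALL of these classes"
    return tag + "".join("." + name for name in class_names)


selector = classes_to_css_selector(
    "jobs-wrapper no-line w-inline-block w-clearfix", tag="a"
)
print(selector)  # a.jobs-wrapper.no-line.w-inline-block.w-clearfix
```

Prefixing the tag name is optional; it simply narrows the match to a elements in case other tags share the same classes.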
EDIT
Following your comment, new snippet to help you do what you want:
result = {}
urls = []
# 'elements' is the list you previously obtained using the CSS selector
for element in elements:
    urls.append(element.get_attribute('href'))
# Now you can iterate over all extracted hrefs:
for url in urls:
    url_data = {}
    driver.get(url)
    field1 = driver.find_element_by_id('wanted_id_1')
    url_data['field1'] = field1
    field2 = driver.find_element_by_id('wanted_id_2')
    url_data['field2'] = field2
    result[url] = url_data
Now, result is a dictionary in a structure similar to what you wanted.
Note that field1 and field2 are of type WebElement so you'll probably need to do something with them first (extract attribute, text, etc).
Also, on a personal note, look into requests together with BeautifulSoup; they might be a much better fit than Selenium for this or similar future cases.
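To give a feel for the non-Selenium route, here is a minimal sketch that pulls the hrefs out of static HTML. It deliberately swaps in the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs; the class JobLinkParser, the function extract_job_urls, and the shortened sample markup are assumptions made for illustration. In practice the HTML string would come from something like requests.get(url).text, and this only works if the links are present in the served HTML rather than rendered by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class JobLinkParser(HTMLParser):
    """Collect href values of <a> tags that carry a given class (illustrative)."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute is a space-separated list, so split before testing
        classes = (attributes.get("class") or "").split()
        if self.wanted_class in classes and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def extract_job_urls(html, base_url, wanted_class="jobs-wrapper"):
    """Return absolute URLs for every matching link found in the HTML."""
    parser = JobLinkParser(wanted_class)
    parser.feed(html)
    # Relative hrefs such as /project/soft are resolved against the page URL
    return [urljoin(base_url, href) for href in parser.hrefs]


# Shortened version of the markup from the question
sample_html = """
<div class="w-dyn-item">
<a href="/project/soft" class="jobs-wrapper no-line w-inline-block w-clearfix">
<h3 class="job-title">Soft</h3>
</a>
</div>
"""

print(extract_job_urls(sample_html, "https://myurl.com/"))
# ['https://myurl.com/project/soft']
```

The returned list plays the same role as urls in the snippet above: each entry can be fetched and parsed in turn.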

If your only requirement is to click the a tag inside a tag with w-dyn-item class, then you could do it like this:
driver.find_element_by_class_name("w-dyn-item").find_element_by_tag_name("a").click()
To iterate over all tags with w-dyn-item class -> click the a inside them -> do something -> go back, do this:
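tags = driver.find_elements_by_class_name("w-dyn-item")
for i in range(len(tags)):
    # Look the elements up again on every pass: references found before
    # navigating away go stale once the page is reloaded
    tag = driver.find_elements_by_class_name("w-dyn-item")[i]
    tag.find_element_by_tag_name("a").click()
    # Do what you want inside the page...
    driver.back()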
The key here is of course to go back to the root page after you're done with the inner page.

To access and click the a href you can use the following line of code:
driver.find_element_by_xpath("//div[@class='w-dyn-item']/a[@href='/project/soft']").click()

Related

Using Tampermonkey to make an "ignore" button?

I have a script that goes as follows:
// ==UserScript==
// @name TempleOS Ignorer with added Braed Ignoring
// @include *://v3rmillion.net/*
// @grant none
// ==/UserScript==
const ignore = ['189822','695729', '1797404', '1439', '1290050', '941293', '1696676', '440792', '1391811', '114505']
new Array(...document.getElementsByClassName('author'))
.filter(author => ignore.includes(author.firstElementChild.getAttribute('href').match(/=([0-9]+)/)[1]))
.forEach(author => author.parentElement.parentElement.parentElement.remove())
new Array(...document.getElementsByClassName('author_information'))
.filter(author => ignore.includes(author.firstElementChild.firstElementChild.firstElementChild.getAttribute('href').match(/=([0-9]+)/)[1]))
.forEach(author => author.parentElement.parentElement.remove())
There's a div called post-head and I want to add a button to it that takes the user's ID and adds it to the ignore const. I'm fairly new to using Tampermonkey, most of this was done by someone else and I want to make it easier to add users to this ignore list. Something like this. (Very sloppily done, but you get the idea.)

So you can try taking a look at this script I came up with real quick, and then you can try putting it into a UserScript if you'd like. I tried to keep this extremely simple and basic. Just know there are plenty of ways to accomplish this, and this isn't exactly the standard in 2022. Otherwise, you can just copy this JS code into the Dev Tools Console tab on the message board, and it should work for you.
Here is the overall summary of what this is:
The HTML is just basic message board HTML; the button is added at the bottom of the post, on the left side.
The script loops through each .post element on the page using jQuery .each() and gets the id of the post and the id of the author.
There are probably tons of ways to get the author/post id from the markup on this message board. I simply chose to split() the URL on = and grab the ID at array index [2]. Getting the id of the post was just a matter of removing the post_ prefix from the id attribute of the .post element.
I left in the console.log() calls so it can be tested, and the button's onclick outputs an alert() notification.
The markup, which was created using a template literal, is appended to the .author_buttons elements. I used an <a> element instead of a <button> because I didn't want to add any CSS to make it match.
const ignore = [];
$('.post').each(function() {
  let US_pid = $(this).attr('id').replace('post_', '');
  console.log('id=post_' + US_pid);
  let US_author_id_array = $('.author_buttons > .postbit_find', this).attr('href').split('=')
  let US_author_id = US_author_id_array[2];
  console.log('author id: ' + US_author_id)
  let US_ignore_btn_markup = `Ignore User`;
  console.log(US_ignore_btn_markup);
  $('.author_buttons', this).append(US_ignore_btn_markup);
})
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div class="post " style="" id="post_7747511">
<div class="post_head" style="padding-right:5px">
<div class="float_right" style="vertical-align: middle">
<strong><a class="postlink" href="showthread.php?pid=7747511#pid7747511" title=""></a></strong>
</div>
</div>
<div class="post_author">
</div>
<div class="post_content">
</div>
<div class="post_controls">
<div class="postbit_buttons author_buttons float_left">
<span>PM</span>
<span>Find</span>
</div>
<div class="postbit_buttons post_management_buttons float_right">
<span>Report</span>
</div>
</div>
</div>

Capturing the text of the closest span element with a specific class

I'm trying to fetch the text of a span that has a given class -> closest to the click via Google Tag Manager. Is it possible via plain JS or JQuery?
The code looks like this:
<a class="contenttile" href="/mypage" style="height: 193px;">
<div class="imageContainer" style="height: 97px;">
<img src="http://http:someadress.com/foto.jpg" class="blurr" alt="">
</div>
<div class="textContainer">
<span class="text3">My text</span>
<br>
</div>
</a>
What I want to return via a function is My text.
I tried different snippets found here, but since I'm a JS novice I couldn't adjust them to work properly.
For example this one:
function(){
var ec = {{Click Element}};
var x = $(ec).closest('span');
return x.innerText;
}

Since you are using jQuery, wrap the clicked element in a jQuery object and search inside it:
return $(ec).find('.text3').text();

In GTM, your click may register two events: gtm.click and gtm.linkClick. Depending on which one your tag is set to fire on (i.e. you can set it to fire on all clicks or just links), you could use either of the following, where ec holds the {{Click Element}} variable:
If using just links, then $(ec).find('.textContainer').find('span').text()
If using all clicks, then $(ec).closest('span').text()

Protractor: Retrieve text from Element that shares its name with other elements

I have an <h3> tag inside a <div>, and I'd like to retrieve the text inside the <h3> tag. The only problem is that the class name of the <div> is used by other elements across the page. I know XPath is one way of doing this, but I've been advised to try a different method. Any suggestions?
This is the html
<div class="box-header">
<h3>User's Balance: EUR 45,173.80</h3>
</div>
This is my page object file:
checkReceivablesDue (receivablesDue) {
  browser.sleep(8000);
  var checkBalance = element.all(by.css('box-header> h3'));
  checkBalance.getText().then(function (balance){
    console.log(balance);
  });
}
When I run the above, I get [] printed in the console, I've tried a few variations but can't get it to work.
Thanks

Using JQuery to replace text in HTML but ignoring certain HTML id's

So I'm currently using Handlebars to grab data from a JSON file to show its data on the screen. Right now it's looking similar to this:
<div class="content" id = "topic">
{{#each topics}}
<a href="{{topic}}" id = "ignore">
<h2>{{topic}}</h2>
</a>
{{/each}}
</div>
I want to replace specific characters within the word topic with another one, for example if the {{topic}} was "Hi%3F" I want to replace the "%3F" part with a '?' everywhere except the part with id="ignore". The replace function I'm using right now is:
$("#topic").html($("#topic").html().replace(/%3F/g,'?'));
This manages to replace everything so far, but I'm not sure how to get it to ignore the tags with id="ignore". There's probably an easier way to make the link portion work like it's supposed to, but this is what I have now and I don't want to mess around or change too much.
Thanks!

It is not legal HTML to duplicate IDs; IDs need to be unique. I'd suggest adding a class such as class="ignore" to the anchor tags you want to skip and using:
<div class="content" id = "topic">
<a href="Hi%3F" class = "topic">
<h2>Hi%3F</h2>
</a>
<a href="Hi%3F" class = "topic ignore">
<h2>Hi%3F</h2>
</a>
<a href="Hi%3F" class = "topic">
<h2>Hi%3F</h2>
</a>
<a href="Hi%3F" class = "topic ignore">
<h2>Hi%3F</h2>
</a>
</div>
<script>
$("a.topic").not(".ignore").each(function() {
$(this).html($(this).html().replace(/%3F/g,'?'));
});
</script>
http://jsbin.com/huquyufafe/edit?html,output

Google Tag Manager - CSS selector challenge

I am trying to get the URL of a link in the source code. The challenge is that the URL is hidden behind an image, so I can only fetch the image URL.
I've been trying to figure a way to solve this issue by using the new CSS selector in the trigger system and also made a DOM variable that should get the URL when the image is clicked. There can also be multiple downloads.
Here is an example of what I am trying to achieve:
<div>
<div class="download">
<a href="example.com/The-URL-I-Want-to-get-if-top-image-is-clicked.pdf" target="_blank">
<img src="some-download-image.png"/></a>
<div class="download">
<a href="example.com/Another-URL-I-Want-to-get-if-middle-image-is-clicked.pdf" target="_blank">
<img src="some-download-image.png"/></a>
<div class="download">
<a href="example.com/Last-URL-I-Want-to-get-if-bottom-image-is-clicked.pdf" target="_blank">
<img src="some-download-image.png"/></a>
</div>
</div>
There is much more code above and below this snippet, but with the selector it should be fairly easy to get the information I want. Only I don't.
If anyone has hit this wall and solved it, I would really like to know how. :)

This is one possible solution. So as I understand it, you would like to grab the anchor element's href attribute when you click the "download" image.
A Custom Javascript variable would need to be created so that you can manipulate the click element object:
function(){
var ec = {{Click Element}};
var href = $(ec).closest('a').attr('href');
return href;
}
So you will need to do your due diligence and add in your error checking and stuff, but basically this should return to you the href, and then you will need to parse the string to extract the portion that you need.
