Does Cypress decode all text elements? - javascript

I have a question about decoding and Cypress.
On a test website, it is possible for a user to sort a table by a specific column via a URL: https://www.example.com/Search?text=blabla&maxrowcount=20&orderby=abc+def. The sort column is called "abc def" and consists of two words. The URL then says "abc+def". After the user has called the URL via GET, the URL https://www.example.com/Search?text=blabla&maxrowcount=100&orderby=abc%20def is embedded on the new page. maxrowcount is now 100.
When I test this with Cypress, the test passes:
let orderby = 'abc def'
cy.get("#new-link-maxrowcount-100").invoke('attr', 'href').then(href => {
  const urlParams = new URLSearchParams(href);
  cy.expect(urlParams.get('orderby')).to.equal(orderby);
});
With orderby='abc%20def' the test fails.
Is it correct that Cypress always decodes the text and then tests it?
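For what it's worth, the decoding here comes from URLSearchParams rather than from Cypress itself: its get() method always returns the percent-decoded value, and it also treats '+' as an encoded space. A minimal sketch in plain Node (no Cypress involved) shows the behaviour:

```javascript
// URLSearchParams always returns decoded values from get():
const fromPlus = new URLSearchParams('orderby=abc+def');
console.log(fromPlus.get('orderby'));    // 'abc def' — '+' is decoded to a space

const fromPercent = new URLSearchParams('orderby=abc%20def');
console.log(fromPercent.get('orderby')); // 'abc def' — '%20' is decoded too

// To assert on the raw, still-encoded value, inspect the href string directly:
const href = '/Search?text=blabla&maxrowcount=100&orderby=abc%20def';
console.log(href.split('orderby=')[1]);  // 'abc%20def'
```

So comparing against 'abc def' succeeds for both encodings, while comparing against 'abc%20def' can only succeed if you test the raw string instead of the parsed parameter.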
P.S.
The web page was created just because of this question. It will never go online.

Related

Recursive Facebook Page Webscraper with Selenium & Node.js

What I am trying to do is loop through an array of Facebook page IDs and return the source code of each event page. Unfortunately, I only get the code of the last page ID in the array, repeated as many times as there are elements in the array. E.g. with 3 IDs in the array I get the code of the last page ID 3 times.
I have already experimented with async/await, but without success.
The expected outcome would be the code of each page.
Thank you for any help and examples.
//Looping through pages
pages.forEach(function(page) {
  //Creating URL
  let url = "https://mbasic.facebook.com/" + page + "?v=events";
  //Getting URL
  driver.get(url).then(function() {
    //Page loaded
    driver.getPageSource().then(function(result) {
      console.log(result);
    });
  });
});
You are facing the same issue I did when I created a scraper using Python and Selenium. Facebook has countermeasures against manual URL changes, so you cannot just swap the URL: I received the same data again and again even though the requests were automated. To get a good result you need access to Facebook's Graph API, which provides a complete object for a Facebook page along with its pagination URL.
The second way I got it right was to use Selenium's button clicks to scroll down to the next page. It won't work the way you are doing it; I prefer using the Graph API.
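Independent of Facebook's countermeasures, note that the forEach in the question fires all the driver.get() calls without waiting for each navigation to finish, so the page sources can all be read from whatever page the browser ends up on. If you want the requests to run strictly one after another, an async/await loop does that. A minimal sketch of the pattern, where fetchPage is a hypothetical stand-in for the driver.get(url) + driver.getPageSource() pair:

```javascript
// fetchPage stands in for driver.get(url) followed by driver.getPageSource();
// swap in the real selenium-webdriver calls in your own code.
async function fetchPage(url) {
  return `source of ${url}`; // placeholder for the real page source
}

async function scrapePages(pages) {
  const sources = [];
  // for...of + await visits the pages strictly in sequence, so each page's
  // source is captured before the next navigation starts.
  for (const page of pages) {
    const url = `https://mbasic.facebook.com/${page}?v=events`;
    sources.push(await fetchPage(url));
  }
  return sources;
}

scrapePages(['idA', 'idB']).then(sources => console.log(sources));
```

The page IDs 'idA' and 'idB' are placeholders for illustration only.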

Webscraping javascript rendered pages by inspecting and going to network tab and checking to see the request which is getting the data

This is code I am using to get the reviews of Sephora webpage.
My problem is I want to get the reviews in the webpage https://www.sephora.com/product/crybaby-coconut-oil-shine-serum- P439093?skuId=2122083&icid2=just%20arrived:p439093
I want to inspect the webpage, go to the network tab, check through all the requests, and find which URL is returning my data.
I am not able to find the URL which is returning the reviews.
from selenium import webdriver
chrome_path = r"C:/Users/Connectm/Downloads/chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.implicitly_wait(20)
driver.get("https://www.sephora.com/product/crybaby-coconut-oil-shine-serum- P439093?skuId=2122083&icid2=just%20arrived:p439093")
reviews = driver.find_element_by_xpath('//*[@id="ratings-reviews"]/div[4]/div[2]/div[2]/div[1]/div[3][@data-comp="Elipsis Box"]')
print(reviews.text)
In the string you pass to driver.get() you have a white space that should not be there...
The correct string should be:
driver.get("https://www.sephora.com/product/crybaby-coconut-oil-shine-serum-P439093?skuId=2122083&icid2=just%20arrived:p439093")

Python Web Scraper - limited results per page defined by page JavaScript

I'm having trouble getting the full results of searches on this website:
https://www.gasbuddy.com/home?search=67401&fuel=1
This link is one of the search results I'm having trouble with. The problem is that it only displays the first 10 results (I know, that's a common issue that has been described in multiple threads on stackoverflow - but the solutions found elsewhere haven't worked here.)
The page's html seems to be generated by a javascript function, which doesn't embed all of the results into the page. I've tried using a function to access the link provided in the "More [...] Gas Prices" button, but that doesn't yield the full results either.
Is there a way to access this full list, or am I out of luck?
Here's the Python I'm using to get the information:
import requests
from bs4 import BeautifulSoup

# Gets the prices from gasbuddy based on the zip code.
def get_prices(zip_code, store):
    search = zip_code
    # Establishes the search params to be passed to the website.
    params = {'search': search, 'fuel': 1}
    # Contacts the website and makes the search.
    r = requests.get('https://www.gasbuddy.com/home', params=params, cookies={'DISPLAYNUM': '100000000'})
    # Turns the result of the above into a Beautiful Soup object.
    soup = BeautifulSoup(r.text, 'html.parser')
    # Searches out the div that contains the gas station information.
    results = soup.findAll('div', {'class': 'styles__stationListItem___xKFP_'})
Use selenium. It's a little bit of work to set up, but it sounds like it's what you need.
Here I used it to click on a website's "show more" button. See more at my exact project.
from selenium import webdriver

url = 'https://www.gofundme.com/discover'
driver = webdriver.Chrome('C:/webdriver/chromedriver.exe')
driver.get(url)

for elem in driver.find_elements_by_link_text('Show all categories'):
    try:
        elem.click()
        print('Successful click')
    except:
        print('Unsuccessful click')

source = driver.page_source
driver.close()
So basically you need to find the name of the element you need to click to show more info, or you need to use a webdriver to scroll down the webpage.

Python webscraping a form that uses Javascript to process request

I am trying to scrape the resulting page from this page:
http://data.philly.com/philly/property/
I am using 254 W Ashdale St as my trial entry, when I do that in my browser it directs me to the result I'm looking for in the HTML (same URL though).
Python requests successfully submits the address I enter and lands on the results page, but I am not able to get the owner information, which is what I am trying to scrape. I have been trying with Selenium and PhantomJS, but nothing I do is working.
I was also confused about the form action, it seemed to just be the same URL as the page the form is on.
I appreciate any and all advice or help!
Selenium takes care of virtually everything: just find the elements, enter the information, find the button and click on it, then go to the owner, click on it, and scrape the info you need.
import selenium
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('http://data.philly.com/philly/property/')
#enter the street address
driver.find_element_by_name('LOC').send_keys('254 W Ashdale St')
#click on the submit button
driver.find_element_by_name('sendForm').click()
#find the owner
owner_tag = driver.find_elements_by_tag_name('td')[2]
owner = driver.find_elements_by_tag_name('td')[2].text
print(owner)
#click on the owner
owner_tag.find_element_by_tag_name('a').click()
#get the table with the relevant info
rows = driver.find_element_by_tag_name('tbody').find_elements_by_tag_name('tr')
#get the row with the sale prices
sale_prices = list()
for row in rows:
    sale_prices.append(row.find_elements_by_tag_name('td')[4].text)
print('\n'.join(sale_prices))
Output:
FIRSTNAME LASTNAME
$123,600.00
$346,100.00
[..]
$789,500.00

Change URL data on page load

Hello, I have a small website where data is passed between pages via the URL.
My question is: can someone break into it and make it always pass the same data?
For example let say, when you click button one, page below is loaded.
example.com?clicked=5
Then on that page I take the value 5 and get some more data from the user through a form. Then I pass all the data to a third page, where it is entered into a database. While observing the collected data I saw some unusual combinations of records. How can I verify this?
Yes. Since the JavaScript is open on the website, anyone can tamper with it.
You will need to write some code on your backend to validate it.
Always assume that your user/customer will try to hack your system.
So take precautions: check that the user matches the session, that he is logged in, that he is allowed to do what he is trying to do, and that the record he is trying to get actually exists.
If you are running a stand-alone site where you wrote the entire code from scratch, you will need to implement these things yourself, e.g. by using standard PHP sessions and doing the data validation.
Or you can use classes other people have made; you can find a lot of these on Google, as this is a common problem in web programming.
If you are using any reasonable backend framework, it probably already has this built in, so check its documentation.
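Whatever the backend, the core of the fix is the same: never trust the clicked value coming in from the URL, and re-check it on the server against the set of values the server itself considers legal before touching the database. A hypothetical sketch of that whitelist check, shown here in plain JavaScript (the same idea translates directly to PHP); the five button IDs are an assumption for illustration:

```javascript
// Assumption for illustration: the site has five buttons with ids '1'..'5'.
const VALID_BUTTON_IDS = new Set(['1', '2', '3', '4', '5']);

function validateClicked(rawValue) {
  // Reject anything that is not exactly one of the known button ids;
  // a null result means "bad request — do not insert into the database".
  if (!VALID_BUTTON_IDS.has(String(rawValue))) {
    return null;
  }
  return String(rawValue);
}

console.log(validateClicked('5'));          // '5'
console.log(validateClicked('999'));        // null
console.log(validateClicked('5; DROP x'));  // null
```

A strict whitelist like this is safer than trying to blacklist bad input, because anything you did not explicitly anticipate is rejected by default.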
html:
<a id='button-one' name='5'>Button One</a>
javascript:
window.onload = function() {
  document.getElementById('button-one').onclick = function() {
    changeURL(this.attributes.name.value);
  };
};

function changeURL(data) {
  location.hash = data;
}
