Python webscraping a form that uses Javascript to process request - javascript

I am trying to scrape the resulting page from this page:
http://data.philly.com/philly/property/
I am using 254 W Ashdale St as my trial entry, when I do that in my browser it directs me to the result I'm looking for in the HTML (same URL though).
Python requests is successfully putting the address I put in in the results page, but I am not able to get the owner information, which is what I am trying to scrape. I have been trying with Selenium and phantomjs, nothing I am doing is working.
I was also confused about the form action, it seemed to just be the same URL as the page the form is on.
I appreciate any and all advice or help!

Selenium takes care of virtually everything: find the elements, enter the information, find the button and click it, then go to the owner link, click it, and scrape the info you need.
from selenium import webdriver
from selenium.webdriver.common.by import By

# Selenium 4 locator API; the find_element_by_* helpers were removed in Selenium 4.3
driver = webdriver.Chrome()
driver.get('http://data.philly.com/philly/property/')
# enter the street address
driver.find_element(By.NAME, 'LOC').send_keys('254 W Ashdale St')
# click on the submit button
driver.find_element(By.NAME, 'sendForm').click()
# find the owner (third <td> on the results page)
owner_tag = driver.find_elements(By.TAG_NAME, 'td')[2]
owner = owner_tag.text
print(owner)
# click on the owner
owner_tag.find_element(By.TAG_NAME, 'a').click()
# get the table with the relevant info
rows = driver.find_element(By.TAG_NAME, 'tbody').find_elements(By.TAG_NAME, 'tr')
# collect the sale price (fifth column) from each row
sale_prices = list()
for row in rows:
    sale_prices.append(row.find_elements(By.TAG_NAME, 'td')[4].text)
print('\n'.join(sale_prices))
Output:
FIRSTNAME LASTNAME
$123,600.00
$346,100.00
[..]
$789,500.00
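If you would rather pull the cells out of the HTML yourself (for example from driver.page_source after the clicks above), the column lookup can be sketched with the standard library alone. This is only a minimal sketch: the SAMPLE table is made up and is not the site's real markup, and TableCells and column are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (such as <a>) still lands in the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def column(html, index):
    """Return the text of cell number `index` from every row that has one."""
    parser = TableCells()
    parser.feed(html)
    return [row[index] for row in parser.rows if len(row) > index]


# Made-up sample markup; the real results page will differ.
SAMPLE = """
<table><tbody>
<tr><td>2001</td><td>A</td><td>B</td><td>C</td><td>$123,600.00</td></tr>
<tr><td>2010</td><td>A</td><td>B</td><td>C</td><td>$346,100.00</td></tr>
</tbody></table>
"""

print("\n".join(column(SAMPLE, 4)))
```

Index 4 mirrors the [4] used in the Selenium loop, so the same column is picked in both versions.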

Related

Netsuite Customize NetSuite Transaction Summary

I'm trying to figure out how to change the Transaction Summary on the sales order UI.
I've been doing some research and I'm getting the impression I need to create a user event script(?)
For example: I want to add more details like Markup / Freight cost / etc.
I'm very new to this, but please let me know if I'm going in the right direction.
/**
 * @NApiVersion 2.0
 * @NScriptType UserEventScript
 */
define([], function() {
    return {
        afterSubmit : function (context) {
            var salesorder = context.newRecord;
            var total = salesorder.getValue('total');
            var subtotal = salesorder.getValue('subtotal');
            var tbl = document.getElementById("totallingtable");
            log.debug('total', total);
            log.debug('subtotal', subtotal);
        }
    };
});
NetSuite does not support access to the native UI through the DOM. You could do this on the client side, but it will absolutely break the record. Therefore, I would recommend one of the following:
1. Make an HTML field on the Sales Order record and write HTML/CSS to mimic the Transaction Summary box. It would probably be best to place it in its own subtab, something like "Transaction Details", and write to it with a client script on pageInit().
2. Add a button to the Sales Order record on User Event beforeLoad(). Have that button open a Suitelet that acts as a popup, with HTML/CSS that mimics the Transaction Summary as described in option 1.
Thinking about it, I think the second option is better.
If you need additional help, please comment back and I'll be happy to assist. I have no problem stepping through code, even if baby steps.
EDIT:
Based on the conversation below, the second option seems better for your business needs.
Before we dive into that, I did some research and believe you were trying to do something similar to this post. That approach may very well work, but is not something I would do, citing this excerpt from the NetSuite Help Documentation:
"SuiteScript does not support direct access to the NetSuite UI through the Document Object Model (DOM). You should only access the NetSuite UI by using SuiteScript APIs."
NetSuite Help Documentation: SuiteScript 2.0 Custom Pages
In addition, I'm adding a few notes to this post regarding the whole thing overall. I'll say this once here, and once at the end as a reminder. But remember, it carries for everywhere.
First, if you don't understand something, don't hesitate to ask questions. The only stupid question is the one you never ask, even if it's the i-th time you're asking; asking questions is a fundamental part of learning anything. Second, make sound coding decisions. There's no way for me to know whether this code is exactly what you need, especially since I don't know your business or the setup of your account. But I'm going to try my best to help.
With that said, for this example, do the following in order. Once you get the hang of it, you can work with the nuances as you see fit. I won't discuss any of the nuances here. I wish I could, but that's somewhat like explaining how to ride a bike. Here we go!
1. Make a sub-folder in your SuiteScripts folder called "Transaction_Details" (make sure to use the underscore). We'll upload everything here so it all lives in one folder under SuiteScripts, which is where files need to be for scripts to run. From the Administrator role that's:
Documents > Files > SuiteScripts > Add Folder
2. Create an HTML file that displays what you're looking for when opened on localhost. Name the file "Transaction_Details.html"; NetSuite will default to this file name later, and we use it several times in this example, so don't name it something else. You may want to test the file in a few browsers, especially if you support Internet Explorer. Since we're mimicking the Transaction Summary, make that the visual target: to me, it looks like a table with some CSS to make it stand out. "Transaction Summary" is also a native NetSuite term, so we'll call ours "Transaction Details". That way, when someone says, "I don't see it in the Transaction Summary", you can say, "Click the Transaction Details button, which shows more detail than the summary". We want to pass into this HTML the numbers that make up our "details". No special syntax is required for this, but it helps readability to mark each spot where a value is supposed to go, so we'll arbitrarily mark them like [VARIABLE_TO_REPLACE]. A sample file is below to get you started. When the file looks the way you like, upload it to your "Transaction_Details" subfolder. From the Administrator role that's:
Documents > Files > SuiteScripts > Transaction_Details > Add File
Select Transaction_Details.html from your computer and set the folder to SuiteScripts : Transaction_Details
<!DOCTYPE html>
<html>
  <head>
    <title>Transaction Details</title>
    <style>
      /* Any CSS you might need; background color, bold, etc. */
    </style>
  </head>
  <body>
    <table>
      <tr>
        <td>Gross</td>
        <td>[GROSS]</td>
      </tr>
      <tr>
        <td>Discount</td>
        <td>[DISCOUNT]</td>
      </tr>
      <tr>
        <td>Variable To Replace</td>
        <td>[VARIABLE_TO_REPLACE]</td>
      </tr>
      <!--Rinse and repeat-->
    </table>
  </body>
</html>
3. Create a Saved Search with whatever information you need. Title the Saved Search "Transaction Details" and give it an ID of "_transaction_details". You may need more than one Saved Search to get everything out, but in many situations one is enough; most times, this is done using SQL formulas. Unfortunately, I don't know everything you'll need here, so I have to leave much of this step up to you; there are a lot of nuances. One thing to consider is adding a filter on Internal ID so the results pertain to a single record. In the Saved Search, pick the Internal ID of any relevant record (a Sales Order, most likely) and test on that. Once the results look right, remove the Internal ID filter, as we will push it dynamically in our Suitelet (coming up in step 4). Eventually, you'll need to provide access to the Saved Search. Access is controlled on the Saved Search itself using things like the Audience subtab and the "Public" checkbox. You will know which access is best for your business, but note that anyone who needs to see the Transaction Details we are building will at least need access to the search. Access permissions are a whole other topic in NetSuite; more nuances. To create a Saved Search from the Administrator role that's:
List > Search > Saved Searches > New > Transaction
4. Warning: this is a big step, but it really is all one step. Create a Suitelet that loads our HTML file and replaces each placeholder with the proper text. Call this file "Transaction_Details_Suitelet.js". We get the replacement text by running the Saved Search we created, with an Internal ID filter that points to our transaction, which narrows the data and makes it easier to extract. I'm assuming here that the data we need comes out in a single row of results. If there is more than one row, that is fine, but you'll have to do your own formatting, or give me a screenshot of your results so I can edit my answer again. An example is below to get you started. This is going to take some configuration, so it's best to first get something into NetSuite that at least passes its code inspection (which is done as you upload the file). Once it's uploaded, you can open the URL to the Script from its Script Deployment, which, if error-free, will render the HTML we wrote in step 2. If it doesn't and you're getting frustrated, just upload the script as seen in example 1 below. All example 1 says is "render my HTML", on the condition that your HTML is properly formatted. If that works, you know "Transaction_Details.html" is good, so you can move on to example 2. If example 2 breaks, "Transaction_Details_Suitelet.js" is likely the problem. To upload the Suitelet from the Administrator role that's:
Customization > Scripting > Scripts > New > Select Transaction_Details_Suitelet.js from your computer
Set the folder to SuiteScripts : Transaction_Details
Some error checking will now occur.
If it passes, title it "Transaction Details Suitelet", and give it an ID of "_transaction_details_sl" (sl is short for Suitelet)
Click the Deployments subtab, and title the Deployment "Transaction Details Suitelet". Give it an id of "_transaction_details_sl".
So, the Script and its Deployment should mimic each other in their names. This will be important in our next step
Example 1:
/**
 * @NScriptType Suitelet
 * @NApiVersion 2.x
 */
define(["N/file"], function(file) {
    function onRequest(context) {
        if (context.request.method === "GET") {
            var fileTransactionDetails = file.load({
                id: "/SuiteScripts/Transaction_Details/Transaction_Details.html"
            }).getContents();
            // Tell NetSuite to take the HTML in our "Transaction_Details.html" file and render that.
            // We'll navigate to the Suitelet which renders our HTML file in step 5.
            context.response.write(fileTransactionDetails);
        }
    }
    return {onRequest: onRequest};
});
Example 2:
/**
 * @NScriptType Suitelet
 * @NApiVersion 2.x
 */
define(["N/file", "N/search"], function(file, search) {
    function onRequest(context) {
        if (context.request.method === "GET") {
            var fileTransactionDetails = file.load({
                id: "/SuiteScripts/Transaction_Details/Transaction_Details.html"
            }).getContents();
            var searchTransactionDetails = search.load({
                id: "customsearch_transaction_details"
            });
            // internalid is a select-type field, so it is filtered with ANYOF rather than EQUALTO.
            // The URL parameters of the request live on context.request.parameters.
            searchTransactionDetails.filters.push(search.createFilter({
                name: "internalid",
                operator: search.Operator.ANYOF,
                values: context.request.parameters.id
            }));
            var resultsTransactionDetails = searchTransactionDetails.run();
            // The column order below is an assumption; match the indexes to your own Saved Search.
            resultsTransactionDetails.each(function(result) {
                // getText, if you need the text representation
                fileTransactionDetails = fileTransactionDetails.replace("[GROSS]", result.getText(resultsTransactionDetails.columns[0]));
                // getValue, if you need something that is a date, a SQL formula result, or something "behind the scenes" like the Internal ID of an element in a list/record.
                fileTransactionDetails = fileTransactionDetails.replace("[DISCOUNT]", result.getValue(resultsTransactionDetails.columns[1]));
                // I'm intentionally commenting this line out, but if you had more than one row to consider, you would return true, which tells the search, "go to the next row". For now, don't do that.
                //return true;
            });
            // Tell NetSuite to take the HTML in our "Transaction_Details.html" file and render that.
            // We'll navigate to the Suitelet which renders our HTML file in step 5.
            context.response.write(fileTransactionDetails);
        }
    }
    return {onRequest: onRequest};
});
5. Create a User Event Script that places a button on a Transaction, on beforeLoad, in view mode only. The button opens our Suitelet from step 4, which runs our search from step 3, replaces the text in our HTML file from step 2, and renders the whole thing (that really is the one-sentence summary of the whole approach). A very important note here: it is best to show buttons in view mode only, so you can be sure all the data you need has been written to the database.
We'll need to upload the User Event script and tell it to make the button appear only on the records selected in the User Event's Deployment tab. Let's assume it's a Sales Order for now. From the Administrator role that's:
Customization > Scripting > Scripts > New > Select Transaction_Details_User_Event.js from your computer and set the folder to SuiteScripts : Transaction_Details
Some error checking will now occur.
If it passes, title it "Transaction Details User Event", and give it an ID of "_transaction_details_ue" (ue is short for User Event).
Select the Deployments tab > Applies To > Sales Order and set the ID to "_so_transaction_details_ue" which is like saying "This is the Sales Order Deployment for Transaction Details on the User Event side".
/**
 * @NScriptType UserEventScript
 * @NApiVersion 2.1
 */
define(["N/url"], function(url) {
    function beforeLoad(context) {
        if (context.type === context.UserEventType.VIEW) {
            // This gives us the link to our Suitelet.
            // We can pass any URL parameters to our Suitelet with params.
            // One param you will definitely need is id,
            // which is the id of the current record you are opening
            // the transaction details from, and is used in our
            // Saved Search in step 3.
            // Recall that we need the ids of our Suitelet and its Script
            // Deployment to be exactly what we said above. NetSuite will prepend
            // "customscript" and "customdeploy" to the id of the Suitelet and
            // Deployment, respectively, so we need to do the same.
            // Missing this step is a common reason why links won't open as
            // expected.
            // Lastly, although returnExternalUrl defaults to false, let's
            // set it to false for peace of mind. We don't want to expose
            // anything to the outside.
            var urlTransactionDetails = url.resolveScript({
                scriptId: "customscript_transaction_details_sl",
                deploymentId: "customdeploy_transaction_details_sl",
                params: {
                    id: context.newRecord.id
                },
                returnExternalUrl: false
            });
            // Because we're using at least 2.1 (see @NApiVersion above), we can
            // use `backticks` to interpolate urlTransactionDetails into
            // window.open, which makes things much easier.
            context.form.addButton({
                id: "custpage_view_transaction_details",
                label: "Transaction Details",
                functionName: `window.open("${urlTransactionDetails}");`
            });
        }
    }
    return {beforeLoad: beforeLoad};
});
And that's it! Hopefully that gets you farther along. As I stated before, don't hesitate to ask questions. Also, follow good coding practices; good practices in anything, for that matter. My intention is to help you and others here. This post will not solve everything, but I hope it helps you and others who stop by.
Let me know how it goes!
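Since step 2 suggests checking the HTML on localhost first, the [PLACEHOLDER] convention can be tried outside NetSuite before any script is uploaded. The following is a minimal Python sketch of that local check, not SuiteScript; fill_template and unfilled are hypothetical helper names, and the sample values are made up.

```python
import re

# Markers look like [GROSS] or [VARIABLE_TO_REPLACE]: capitals and underscores in brackets.
PLACEHOLDER = re.compile(r"\[([A-Z_]+)\]")


def fill_template(template, values):
    """Replace every [NAME] marker that has a value; leave the others untouched."""
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


def unfilled(html):
    """Return the names of markers that are still present in the HTML."""
    return PLACEHOLDER.findall(html)


# A fragment of the sample table, with a made-up value for one marker only.
TEMPLATE = (
    "<tr><td>Gross</td><td>[GROSS]</td></tr>"
    "<tr><td>Discount</td><td>[DISCOUNT]</td></tr>"
)

page = fill_template(TEMPLATE, {"GROSS": "100.00"})
print(page)
print(unfilled(page))
```

Listing the markers that are still unfilled is a quick way to spot a placeholder the Suitelet forgot to replace. Note that this sketch replaces every occurrence of a marker, whereas a JavaScript replace() with a string pattern replaces only the first one.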

Webscraping javascript rendered pages by inspecting and going to network tab and checking to see the request which is getting the data

This is the code I am using to get the reviews from a Sephora product page.
I want the reviews shown on https://www.sephora.com/product/crybaby-coconut-oil-shine-serum- P439093?skuId=2122083&icid2=just%20arrived:p439093
My plan is to inspect the page, go to the Network tab, check through all the requests, and find which URL returns my data.
So far I have not been able to find the URL that returns the reviews.
from selenium import webdriver
chrome_path = (r"C:/Users/Connectm/Downloads/chromedriver.exe")
driver = webdriver.Chrome(chrome_path)
driver.implicitly_wait(20)
driver.get("https://www.sephora.com/product/crybaby-coconut-oil-shine-serum- P439093?skuId=2122083&icid2=just%20arrived:p439093")
reviews = driver.find_element_by_xpath('//*[@id="ratings-reviews"]/div[4]/div[2]/div[2]/div[1]/div[3][@data-comp()="Elipsis Box"]')
print(reviews.text)
In the string you pass to driver.get() there is a space (right after "serum-") that should not be there.
The correct string should be:
driver.get("https://www.sephora.com/product/crybaby-coconut-oil-shine-serum-P439093?skuId=2122083&icid2=just%20arrived:p439093")
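A stray space or a line wrap inside a long URL is easy to miss by eye, so one option is to strip whitespace before handing the string to driver.get(). This is a small sketch under the assumption that the URL contains no intentional raw whitespace (encoded spaces such as %20 are left alone); clean_url is a hypothetical helper name.

```python
import re


def clean_url(url):
    """Remove raw whitespace (spaces, tabs, newlines) from a URL string."""
    return re.sub(r"\s+", "", url)


# The URL as it appears in the question, including the accidental space.
broken = ("https://www.sephora.com/product/crybaby-coconut-oil-shine-serum- "
          "P439093?skuId=2122083&icid2=just%20arrived:p439093")

print(clean_url(broken))
```

The cleaned string could then be passed to driver.get() in place of the hand-typed literal.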

Python Web Scraper - limited results per page defined by page JavaScript

I'm having trouble getting the full results of searches on this website:
https://www.gasbuddy.com/home?search=67401&fuel=1
This link is one of the search results I'm having trouble with. The problem is that it only displays the first 10 results (I know, that's a common issue that has been described in multiple threads on stackoverflow - but the solutions found elsewhere haven't worked here.)
The page's html seems to be generated by a javascript function, which doesn't embed all of the results into the page. I've tried using a function to access the link provided in the "More [...] Gas Prices" button, but that doesn't yield the full results either.
Is there a way to access this full list, or am I out of luck?
Here's the Python I'm using to get the information:
import requests
from bs4 import BeautifulSoup

# Gets the prices from gasbuddy based on the zip code.
def get_prices(zip_code, store):
    search = zip_code
    # Establishes the search params to be passed to the website.
    params = {'search': search, 'fuel': 1}
    # Contacts the website and makes the search.
    r = requests.get('https://www.gasbuddy.com/home', params=params, cookies={'DISPLAYNUM': '100000000'})
    # Turns the results of the above into a Beautiful Soup object.
    soup = BeautifulSoup(r.text, 'html.parser')
    # Searches out the divs that contain the gas station information.
    results = soup.findAll('div', {'class': 'styles__stationListItem___xKFP_'})
Use selenium. It's a little bit of work to set up, but it sounds like it's what you need.
Here I used it to click on a website's "show more" button. See more at my exact project.
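from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = 'https://www.gofundme.com/discover'
# Selenium 4 style: the driver path is passed through a Service object
driver = webdriver.Chrome(service=Service('C:/webdriver/chromedriver.exe'))
driver.get(url)
for elem in driver.find_elements(By.LINK_TEXT, 'Show all categories'):
    try:
        elem.click()
        print('Successful click')
    except Exception:
        print('Unsuccessful click')
source = driver.page_source
driver.close()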
So basically you need to find the name of the element you need to click to show more info, or you need to use a webdriver to scroll down the webpage.
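When a "show more" button has to be pressed several times, the clicking can be wrapped in a loop that stops once the button disappears or a safety cap is reached. This is a rough sketch of that loop only; click_until_gone is a hypothetical helper, and FakePage and FakeButton are stand-ins so the loop can be tried without a browser. With Selenium you would pass a function that looks the button up again on every pass.

```python
def click_until_gone(find_buttons, max_clicks=50):
    """Click the first element returned by find_buttons() until none remain.

    find_buttons is any zero-argument callable returning a list of objects
    with a click() method. Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = find_buttons()
        if not buttons:
            break
        buttons[0].click()
        clicks += 1
    return clicks


class FakeButton:
    """Stand-in for a web element; each click reveals one more batch."""

    def __init__(self, page):
        self.page = page

    def click(self):
        self.page.remaining -= 1


class FakePage:
    """Stand-in for a page that offers a 'more' button a fixed number of times."""

    def __init__(self, remaining):
        self.remaining = remaining

    def find_buttons(self):
        return [FakeButton(self)] if self.remaining > 0 else []


page = FakePage(3)
print(click_until_gone(page.find_buttons))

# With Selenium 4 the lookup might be (the link text here is an assumption):
# click_until_gone(lambda: driver.find_elements(By.PARTIAL_LINK_TEXT, "Gas Prices"))
```

Looking the button up again on each pass avoids holding on to an element that the page's JavaScript has already replaced, and the cap keeps the loop from running forever if the button never goes away.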

creating a configuration page and passing variables to a simply.js app

I developed a simply.js app that fetches bus arrival times from a web service. The problem is that, as of now, it works for only one stop.
I want to create a configuration page with a multiselect where I can choose multiple stops, send them to the Pebble as an array, and cycle through the array with the up/down buttons to show different bus stops.
I'm not good at C and I prefer JavaScript, which is why I used simply.js.
I'd like to learn how to do this, because I can't find much documentation or many examples online.
I found a similar question/issue on the simply.js GitHub page: https://github.com/Meiguro/simplyjs/issues/11. The code example below comes from Meiguro's first answer. The code sends the user to your configuration website, which you should set up to send JSON back.
You can probably copy the code example for enabling the configuration window and paste it at the beginning of your main Pebble app.js file. Do not forget to add "capabilities": [ "configurable" ], to your appinfo.json file. If you are using CloudPebble, go to the settings page of your app and make sure the configurable box is checked.
var initialized = false;
Pebble.addEventListener("ready", function() {
    console.log("ready called!");
    initialized = true;
});
Pebble.addEventListener("showConfiguration", function() {
    console.log("showing configuration");
    //change this url to yours
    Pebble.openURL('http://assets.getpebble.com.s3-website-us-east-1.amazonaws.com/pebble-js/configurable.html');
});
Pebble.addEventListener("webviewclosed", function(e) {
    console.log("configuration closed");
    // webview closed
    var options = JSON.parse(decodeURIComponent(e.response));
    console.log("Options = " + JSON.stringify(options));
});
(https://github.com/pebble-hacks/js-configure-demo/blob/master/src/js/pebble-js-app.js)
To then push the settings back to the Pebble, I think you need to add
Pebble.sendAppMessage(options);
just after
console.log("Options = " + JSON.stringify(options));
so that options has already been parsed when it is sent.
I found this out at the last post on this pebble forum thread http://forums.getpebble.com/discussion/12854/appmessage-inbox-handlers-not-getting-triggered-by-javascript-configuration-data
You can also find a configuration website example named configurable.html in the same repository as the code example, at https://github.com/pebble-hacks/js-configure-demo
Hope this helps a bit on the way to achieving your goal
So the configuration page is a web page, and you can host it and provide your URL as mentioned by Ankan above.
Like this:
Pebble.openURL('http://assets.getpebble.com.s3-website-us-east-1.amazonaws.com/pebble-js/configurable.html');
Let's say you decide to ask for the user's name and age in the configuration page. You would have two text fields for them to enter their information, plus a submit button. For the submit button, write a JavaScript function that uses jQuery to read the values of the text fields on click, saves those values to a variable, and uses JSON to send them to the phone. Here is an example of a fully built configuration web page: https://github.com/pebble-hacks/js-configure-demo
Enjoy.
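The webviewclosed handler shown earlier runs decodeURIComponent and then JSON.parse on e.response, so whatever the configuration page hands back has to be JSON text that was URL-encoded. That round trip can be checked on its own; the sketch below does it in Python purely as an illustration (the app itself stays in JavaScript), and the stop IDs are made up.

```python
import json
from urllib.parse import quote, unquote


def encode_options(options):
    """Serialize to JSON, then percent-encode, as the config page would before returning."""
    return quote(json.dumps(options), safe="")


def decode_response(response):
    """Mirror of JSON.parse(decodeURIComponent(e.response)) in the handler."""
    return json.loads(unquote(response))


# Made-up selection of bus stops coming from a multiselect.
selected = {"stops": ["1001", "1002", "1003"]}

encoded = encode_options(selected)
print(encoded)
print(decode_response(encoded))
```

If the decoded value is not equal to what was selected, the problem is in the encoding step on the configuration page rather than in the watch app.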

Form filling and pressing javascript buttons in python

I want to automate filling in a form on a website; the site returns products based on the parameters I enter. I tried using mechanize in Python, but it does not support JavaScript, and it seems that getting through the whole process of filling in the parameters requires pressing some buttons that look like JavaScript objects. For instance, the Guided Selection button:
<a onclick="_gaq.push(['_trackEvent', 'Navigation Menu', 'Click', 'Guided Selection Link']);" id="ctl00_NavigationMenu_ConfigureLink" href="javascript:__doPostBack('ctl00$NavigationMenu$ConfigureLink','')">Guided Selection</a></li>
I also tried using selenium but I do not want to create a new instance of a browser. Any python based suggestions? Perhaps jython? Thanks a lot!
My recommendation would be to simply capture the form information using a tool like Firebug, and then "replay" the request (whether it's GET or POST) through Python. Here is a snippet for a POST request.
import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
# In Python 3 the POST body must be bytes, so encode the urlencoded string
data = urllib.parse.urlencode(values).encode('ascii')
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
the_page = response.read()
In Firebug you can get the URL and values from under the Console tab.
This probably won't work if you are doing automated form discovery.
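For links like the Guided Selection one in the question, the two arguments of __doPostBack are what the browser copies into the hidden form fields __EVENTTARGET and __EVENTARGUMENT before submitting the form, so they belong in the replayed POST too. The sketch below only builds that payload; postback_fields is a hypothetical helper, and the __VIEWSTATE value is a placeholder, since such pages usually expect the hidden fields from the page you fetched to be sent back as well.

```python
import re
from urllib.parse import urlencode

# Matches __doPostBack('target','argument') inside an href or onclick value.
POSTBACK = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def postback_fields(href, hidden_fields=None):
    """Build the form fields a __doPostBack link would submit."""
    match = POSTBACK.search(href)
    if match is None:
        raise ValueError("not a __doPostBack link")
    target, argument = match.groups()
    fields = dict(hidden_fields or {})
    fields["__EVENTTARGET"] = target
    fields["__EVENTARGUMENT"] = argument
    return fields


href = "javascript:__doPostBack('ctl00$NavigationMenu$ConfigureLink','')"
# PLACEHOLDER stands in for the real hidden value scraped from the page.
fields = postback_fields(href, {"__VIEWSTATE": "PLACEHOLDER"})

# Encoded body, ready to be passed as the data of a POST request.
body = urlencode(fields).encode("ascii")
print(fields)
```

The resulting body could be sent with the same kind of POST request as in the snippet above, using the URL of the page that contains the link.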
