Python Web Scraper - limited results per page defined by page JavaScript


I'm having trouble getting the full results of searches on this website:
https://www.gasbuddy.com/home?search=67401&fuel=1
This link is one of the search results I'm having trouble with. The problem is that it only displays the first 10 results. (I know this is a common issue that has been described in multiple threads on Stack Overflow, but the solutions found elsewhere haven't worked here.)
The page's HTML seems to be generated by a JavaScript function, which doesn't embed all of the results into the page. I've tried using a function to access the link provided in the "More [...] Gas Prices" button, but that doesn't yield the full results either.
Is there a way to access this full list, or am I out of luck?
Here's the Python I'm using to get the information:
import requests
from bs4 import BeautifulSoup

# Gets the prices from gasbuddy based on the zip code.
def get_prices(zip_code, store):
    search = zip_code
    # Establishes the search params to be passed to the website.
    params = {'search': search, 'fuel': 1}
    # Contacts the website and makes the search.
    r = requests.get('https://www.gasbuddy.com/home', params=params, cookies={'DISPLAYNUM': '100000000'})
    # Turns the results of the above into a Beautiful Soup object.
    soup = BeautifulSoup(r.text, 'html.parser')
    # Searches out the divs that contain the gas station information.
    results = soup.find_all('div', {'class': 'styles__stationListItem___xKFP_'})

Use selenium. It's a little bit of work to set up, but it sounds like it's what you need.
Here I used it to click on a website's "show more" button. See more at my exact project.
from selenium import webdriver

# Note: find_elements_by_link_text is the Selenium 3 API; Selenium 4 uses find_elements(By.LINK_TEXT, ...).
url = 'https://www.gofundme.com/discover'
driver = webdriver.Chrome('C:/webdriver/chromedriver.exe')
driver.get(url)
for elem in driver.find_elements_by_link_text('Show all categories'):
    try:
        elem.click()
        print('Successful click')
    except Exception:
        print('Unsuccessful click')
source = driver.page_source
driver.close()
So basically you need to find the name of the element you need to click to show more info, or you need to use a webdriver to scroll down the webpage.

Related

Does Cypress decode all text elements?

I have a question about decoding and Cypress.
On a test website, it is possible for a user to sort a table by a specific column via a URL: https://www.example.com/Search?text=blabla&maxrowcount=20&orderby=abc+def. The sort column is called "abc def" and consists of two words. The URL then says "abc+def". After the user has called the URL via GET, the URL https://www.example.com/Search?text=blabla&maxrowcount=100&orderby=abc%20def is embedded on the new page. maxrowcount is now 100.
When I test this with Cypress, the test will pass
let orderby = 'abc def'
cy.get("#new-link-maxrowcount-100").invoke('attr', 'href').then(href => {
    const urlParams = new URLSearchParams(href);
    cy.expect(urlParams.get('orderby')).to.equal(orderby);
});
With orderby='abc%20def' the test fails.
Is it correct that Cypress always decodes the text and then tests it?
P.S.
The web page was created just because of this question. It will never go online.
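One way to see where the decoding happens is to reproduce it outside Cypress. The sketch below uses Python's urllib.parse instead of the browser's URLSearchParams (a swap made purely for illustration). Both follow the same form-encoding convention, where `+` and `%20` each decode to a space. The helper name get_query_param is hypothetical. If this analogy holds, the decoded value in the test comes from the query-string parser rather than from Cypress itself.

```python
from urllib.parse import parse_qs, urlsplit


def get_query_param(url: str, name: str) -> str:
    """Return the first decoded value of a query parameter (hypothetical helper)."""
    query = urlsplit(url).query
    return parse_qs(query)[name][0]


# The two URLs from the question: one uses "+", the other "%20".
plus_url = "https://www.example.com/Search?text=blabla&maxrowcount=20&orderby=abc+def"
percent_url = "https://www.example.com/Search?text=blabla&maxrowcount=100&orderby=abc%20def"

plus_value = get_query_param(plus_url, "orderby")
percent_value = get_query_param(percent_url, "orderby")

# Both values are decoded by the parser before any comparison takes place.
print(repr(plus_value))
print(repr(percent_value))
```

To compare against the raw, still-encoded text instead, the test would need to inspect the href string directly rather than the parsed parameter.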

Recursive Facebook Page Webscraper with Selenium & Node.js

I am trying to loop through an array of Facebook page IDs and return the source code of each event page. Unfortunately, I only get the code of the last page ID in the array, repeated as many times as there are elements in the array. For example, with 3 IDs in the array I get the code of the last page ID 3 times.
I have already experimented with async/await, but had no success.
The expected outcome would be the code of each page.
Thank you for any help and examples.
// Looping through pages
pages.forEach(
    function(page) {
        // Creating URL
        let url = "https://mbasic.facebook.com/" + page + "?v=events";
        // Getting URL
        driver.get(url).then(
            function() {
                // Page loaded
                driver.getPageSource().then(function(result) {
                    console.log(result);
                });
            }
        );
    }
);

You have hit the same issue I did when I built a scraper with Python and Selenium. Facebook has countermeasures against changing the URL manually: even though my run was automated, I received the same data again and again. To get good results you need access to Facebook's Graph API, which returns a complete object for a Facebook page along with its pagination URL.
The second way I got it right was to use Selenium's browser automation to click the button that moves to the next page, since navigating as if you were typing the URL yourself won't work. I prefer using the Graph API.

Webscraping javascript rendered pages by inspecting and going to network tab and checking to see the request which is getting the data

This is the code I am using to get the reviews from a Sephora product page.
My problem is that I want to get the reviews on the page https://www.sephora.com/product/crybaby-coconut-oil-shine-serum- P439093?skuId=2122083&icid2=just%20arrived:p439093
I want to inspect the page, go to the Network tab, check through all the requests, and find which URL is returning my data.
I am not able to find the URL that returns the reviews.
from selenium import webdriver

chrome_path = r"C:/Users/Connectm/Downloads/chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.implicitly_wait(20)
driver.get("https://www.sephora.com/product/crybaby-coconut-oil-shine-serum- P439093?skuId=2122083&icid2=just%20arrived:p439093")
reviews = driver.find_element_by_xpath('//*[@id="ratings-reviews"]/div[4]/div[2]/div[2]/div[1]/div[3][@data-comp="Elipsis Box"]')
print(reviews.text)

In the string you pass to driver.get() you have a white space that should not be there...
The correct string should be:
driver.get("https://www.sephora.com/product/crybaby-coconut-oil-shine-serum-P439093?skuId=2122083&icid2=just%20arrived:p439093")
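A small guard against this kind of mistake is to strip stray whitespace from the URL before passing it to driver.get(). The sketch below is only an illustration: strip_url_whitespace is a hypothetical helper, and it assumes the URL should contain no literal whitespace at all (encoded spaces such as %20 are left alone).

```python
def strip_url_whitespace(url: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines) from a URL.

    Hypothetical helper: assumes any literal whitespace is a copy/paste
    accident. Percent-encoded spaces (%20) are not touched.
    """
    return "".join(url.split())


# The URL from the question, including the line break and the stray space.
broken_url = (
    "https://www.sephora.com/product/crybaby-coconut-oil-shine-\n"
    "serum- P439093?skuId=2122083&icid2=just%20arrived:p439093"
)

fixed_url = strip_url_whitespace(broken_url)
print(fixed_url)
```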

Python webscraping a form that uses Javascript to process request

I am trying to scrape the resulting page from this page:
http://data.philly.com/philly/property/
I am using 254 W Ashdale St as my trial entry. When I submit that in my browser, it takes me to the result I'm looking for in the HTML (the URL stays the same, though).
With Python requests, the address I submit does show up in the results page, but I am not able to get the owner information, which is what I am trying to scrape. I have also tried Selenium and PhantomJS, but nothing I do works.
I was also confused about the form action; it seems to be the same URL as the page the form is on.
I appreciate any and all advice or help!

Selenium takes care of virtually everything: find the elements, enter the information, find the button and click on it, then go to the owner, click on it, and scrape the info you need.
from selenium import webdriver

# Note: the find_element_by_* helpers are the Selenium 3 API; Selenium 4 uses find_element(By.NAME, ...).
driver = webdriver.Chrome()
driver.get('http://data.philly.com/philly/property/')
# Enter the street address.
driver.find_element_by_name('LOC').send_keys('254 W Ashdale St')
# Click on the submit button.
driver.find_element_by_name('sendForm').click()
# Find the owner.
owner_tag = driver.find_elements_by_tag_name('td')[2]
owner = owner_tag.text
print(owner)
# Click on the owner.
owner_tag.find_element_by_tag_name('a').click()
# Get the table with the relevant info.
rows = driver.find_element_by_tag_name('tbody').find_elements_by_tag_name('tr')
# Collect the sale price from each row.
sale_prices = list()
for row in rows:
    sale_prices.append(row.find_elements_by_tag_name('td')[4].text)
print('\n'.join(sale_prices))
Output:
FIRSTNAME LASTNAME
$123,600.00
$346,100.00
[..]
$789,500.00

Python - Sending __doPostBack (To join groups in Roblox)

Roblox is a game, and you can also make groups (Like Clans in other games).
I am making a program that goes through these group pages, and checks to see if they have an owner. If there is not an owner, then join the group to become the owner.
I am doing this to collect some data from non-owned groups.
Lastly, I'd like it to leave the group once the data is collected.
Here is the "Join Group" button:
<div id="ctl00_cphRoblox_JoinGroup" class="btn-neutral btn-large" onclick="__doPostBack('JoinGroupDiv', 'Click');" style="margin-top: 10px;">
Join Group
</div>
Here is the snippet of code I've tried:
import requests
s = requests.session()
Join_Group = dict('JoinGroupDiv', 'Click')
s.post('http://www.roblox.com/Groups/Group.aspx?gid=40', data=Join_Group)
I get the following error:
Traceback (most recent call last):
File "C:\Users\User\Desktop\Group.py", line 18, in <module>
Join_Group = dict('JoinGroupDiv', 'Click')
TypeError: dict expected at most 1 arguments, got 2
I'm pretty sure that I'm not supposed to be sending a Post request to the page, as to join the group, it does an onClick event.
EDIT 1:
Here is my updated code:
import requests
s = requests.session()
Join_Group = {"__EVENTTARGET":"ctl00_cphRoblox_JoinGroup", "__EVENTARGUMENT":"JoinGroupDiv"}
s.post('http://www.roblox.com/Groups/Group.aspx?gid=40', data=Join_Group)
But I don't quite know what to pass as the __EVENTARGUMENT. I don't get any errors, but when I load the page in Google Chrome it doesn't show that I've tried to join.
EDIT 2:
I've also tried this:
import requests
s = requests.session()
Join_Group = {"__EVENTTARGET":"ctl00_cphRoblox_JoinGroup", "__EVENTARGUMENT":{"JoinGroupDiv","Click"}}
s.post('http://www.roblox.com/Groups/Group.aspx?gid=40', data=Join_Group)
EDIT 3:
So what I have here is the updated Join_Group with all of the necessary fields, but not all of the information. I don't know where to get the information.
Join_Group = {"__EVENTTARGET":"JoinGroupDiv", "__EVENTARGUMENT":"Click", "__LASTFOCUS":"", "__VIEWSTATE":"a lot of mumbo jumbo", "__VIEWSTATEGENERATOR":"It is 3D1CCC47 but that might change per post", "__EVENTVALIDATION":"some more mumbo jumbo"}
Here is a correct sample POST, but I don't know where it gets that information for __VIEWSTATE, __VIEWSTATEGENERATOR, and __EVENTVALIDATION.
__EVENTTARGET=JoinGroupDiv&__EVENTARGUMENT=Click&__LASTFOCUS=&__VIEWSTATE=fM9l9%2BTw8z%2FRNbMArorqQ7HQ2w3soDiNp5gJOc5sNNQPEzEzApHiWLOHghGcAriUGv6pCVCi15my4%2BUxUozLVQyGx%2Fiq%2FU9BxRdN80kWJgMyiIyZYtSnfsvlFkqmrHaLIMNKag4eYwnKi5K3TP6JpP5xAxfNIOjekh6vpSa84YVL6eOwPsh5vqlHSN9VjFlwjA1r7AJVZkoeVliUz7vpK1f7DM6lDnOWBtFaAc66xxe2SIoLcjdMlfrVJpJADRjLTEfLp6PNARua3FLJKDezN7WOekGOlSIXHHrzAFlyMY0uZflFykzc9E3zE%2FldijdHWJnIoBVNW1c7fl2ehrbbv%2BAQWpeWqMagkuGNWOmFD6SHPixWLVNgNMlVLXrVKtLEMp3jmXLR40vqdv3bpCkJTLqA9M1XpKsBJlX5szlJlqJzYleD8NncUA8sO2sqiRhnk%2BZURIHV0EQRdPdzJIWMuENZZr%2FPKRl6MNekpZHOtF6wcWQYd4oRYptUjwzWQupbovVnaAyehNLMJbJi6sARkiXAQMC1kUyHpMQVrEdC69%2FNPEW8Cy6QIffUW1d0VBzVuK2hNJV1IgsbqEJZ56HXTtQCz4rJRlATwum2%2BcDS1ITfjA1JWuXwkbFON73TDPFsuz4fOhpVdmNvgN%2B%2F9h4YhmlxlsP3Ud%2BIfuPgzm4b%2FymVzdK%2FBOag7SjO9YRbqH6%2BrIXeXUmPf332sRw4twDy6LhqYxQmzBcr%2BLF4O0%2FTWW1N1pbusZ2p5nyVwaQEQd1FFeVTz5UXNvIUqA89x%2FWzrNHRUEdbR4ZQh5%2B9Eefd0IzIYza6TgRnWYknClVK1sq0qZ7bM1xhJfCsH3xCpWpVNqRqUZvOcWl8JH8aTrs2PhkkDU2%2Flto%2BlcsKG95lV06xukVyfa%2B2uzH%2FVp1Pph2jGLvMHTfLFdgt8jymryMt84jWtg8g%2F9N%2B%2Fb%2B69Pf4OuqLKx0md6Z6gOo4erBMvjrxSwWZxPBf%2BX674CclYnPpBsQjprvRMYGGGUtO957breZv5yQ0zWEB86BWHN6%2Fp6GJ%2Fb3TpeCEgEIKWB6VXxIt9EV7Ls9DrSb43eYSfUDKiBqpKcSD59g5W98rxrm%2FJhrzUnGQsVS4GRNR%2FquTJeWOc4NFIeyULqyLxLECaeXSPoqKe1ijkZc2cz%2FIDLNnmKY%2BMte7JslLswmWC469aL5%2BGMlgC%2FRJ%2BDhxZVbgOfAy7WZ4CoPtpoM2ixT5l5%2Ffg%2F1Z9jlRM4SAdU6XuC%2FnIJjYRBjZtA2IHE4xl87bgR8noUW%2FpMUmWeJrrScXuNpOKKhGoeM1XBg%2BiCZbFQ6oIihzGdqsvW4YQ3i%2BdzJEQvO%2FabotjD6zgekjvonlWOXTiSFXRS9h1JhyPFjMUdooDCkB7%2BtvuMj%2BUXUX2TccPtt%2BHdZpwtRVwzzf96J7bcs%2F1FaYhDsoJO01SqEtUAFWVqLO004kauPB6uJc%2FDSlBDqGI2hY02ORAl6BfyHVT4vxW8YDv55kD0R2hJhBtfhHsvQaEvrOR2BFHIB9hp5G2KW6zDuCWthVckhjUDsBDzOzc%2FDYtcBHCw8oWVXAbX%2BEjJNwHH2CVXVFO7HRSg3LEjRmqX1Hu72wXCp%2FlV3PCa%2FZcytUhuwMl%2B4PaqzD%2BQcTM0bm8JXylJAuksYMqQNkb9D%2BukLxyzADwfYFWy3aEJI1bVu0S0s6SJlLhGL1Z1E8x3sVh6vrMwYV6lZXzAU3BjYBgozfNvWpUDbM7bRJmEgBIgJFZJImFKs%2FwltOI6XACd1E%2BHQXchinr%2FHFYX2JW7uawkF0eJ7uNcYQ9O0p
w9UNPWTOdVP855V3JIJZA0HvKt99Sn2Q%2FHgTWBAFmjefEI73B0wEGqIjPCTPJfljbcouvH6DfzttfGItUJ5g2lAN2H6lr%2F2UuRpjaJa3X924BF6WlIr%2FZ0JoAwxWvw1dvd00BCBamThc%2BK515yzwo5uYFKBlyt%2BQz9y%2FAf2mfRlI%2Bg%2BBV7b3LipC%2B66gfY6HJwOxNniy67qD1pkgKMRT6xeP8h86IOHUDyaG54RIMQAkYDSOyhUlKbMEsVbhq%2Fuqo9wvVq9djLSB%2BUcjpes%2Bfsn%2FyDdTl3nCWHd3xDC6MywPSvb%2F7pvhriQmpPjiXHhrvaMpWdecsXEF60kB3Hl5gm6klilE4ZkbcwakkSdGUVYn0b2mbM%2B7gq2lopcmTzAJuPAZI%2Fdp%2BnfT9mMywge6d1RP%2BCFBWCZdSXUjHHGvuWBPg2u8yfF7lQGOq7avsR%2BXZ2pi2afwwgzEyoHoxPYSfXn3J5%2FSkAINRE8rjbxK2n2qCRXSUlex1fp38OrPHK9TODFHPuq0whQQ2JYgq2IHXeUUiysJ1j5cub8JfZkoOEAm2TwnExVx9Fvvq3SadMWKKW8yOwZ6u%2BSLJUKu%2FtSzD6cXtiCbDpoZTe9LAeAgD%2FH9CDsvgX6sm6m441qf2glcOg01GsiLi%2BcVOGy14oT0I%2F0qtAjZS5qQskhMFNhrPudHgXgIw8%2BC81KK396oDN5JccahOO2Zuxkgiv3BtWO%2Fp8RBzrkPLR3hK2hj%2FVcSdwe0VUvq7%2FBNKyoRUsTN0tLWk6uvsce8P5avqG0VXXnUYuhvQTRBd88QACFMQW47kINvBq0N%2Ba8byLHlN8Kq%2BogTeBDuTn7CKOlOxi1ryxYm%2BUtzP4Ep0XBfrYba9Ztyp9L4Il0aNXElYudeV7i1NGwjh5FRgvYOPLQwTs7kHCIIUgpcYcX4oUUwdvyyZUDnzGPuWUhu1E0RELWiV89%2BJIiCrw4SKdohI86thnXhR9Ye%2FciOOjq8%2Foo0Vl9lq2Re2uJttZ9l99cCi54xFIPS1celZfZYKxR%2B2HTaB3EZ4z7%2FvdvQJ8BqXTWlfLTUh6M1wDJvZJPJkHBILh8sHMdew3NT2NSfF7dIzXIdg%2F8h5Pyy7NCQEXYn0nhlEecjuL3%2FOD4ccd4nq7FvoA9RcSLkpVg4OUy7KsQgfxwdp3KLrgJFowsqb5oN6zBojmWpFXhqaSSQFM3jsDQ6eCtxruTKTFqXc0eb23enSXNJoEIk%2BMQnwbycOYZomJ05Bi0dQrCEwMQ6jG%2Fq6pH2qfNiA5hoVHZww5miQHEHCT4vw0mTRky9d%2FuUNgjpZlK28iKV%2BT7i%2Ftq6UJ8ldEi%2FeFIpJUNi2tjQtQk2qrBw3L%2FN0Z7c%2BdkzqXxAByQCXuCA5v1Zt4yJKoC0763V6rrKU9JaIyeNAl4arqk%2BZ%2BQxfY6Ia6MmCpaGrzn4yHRnxnR52TEbn%2FhWM3LTzNeghmRphtKi4AereUSOJXbQOrdtVEG0uVF6ZNhX7DWYQ7BVI%2Bw5bN4Cp%2FBL9WxjD0s0WetzoRor0vL7IUHIzkoQkJHlBnCXXHh6Mif6Rzgm8O%2FfEgalGqhScqAwlXE5JYeZpm%2BusCO4gC0PE%2BSJzWQy63gHq5q3semEMn1cje95%2FIO5No09ZPoYkqAVd7XSBsVrYu%2BQhoGR8h95PcKeW7q2SVRanFCzC%2FhNALnQBh9qs78L23MiioERyE8D%2BnLi0K6l2Gba1BJ5tkbTaFynhuRWyq%2Bacvjbno5gPXVINSPoVf%2BbpZYSmgbZCAIrctXgInQuNBaWTzWICeJxKctTmaEIO6kXBB7y1p%2BzDpqVr4hjMkjZTS9mz8YLuU4eSAMhYD59GJZQi6mVkR1U8C9OsV6O%2BXOvnDQVWjrqQG
IpVmfRv0B6sCtrBd%2BAX6N4TkaRfVehaW39%2BgnBB6SpTc0IZZ4apy%2BHXiOqVfLuABrL6JN6gH29V%2BRsd%2BybvouJRxdb47mxv7trl6JOwINUbERqsLQAPjrr54wBtS4LlRkvfBZpDAM7yCK%2FRpeT%2BjuC1X9i1p3CwL74V%2FN5Q5LxlA51Z9oxciOohCy4GkawbMSE8qor5%2BPwM4bNn%2FM%2FhgxmaOYRsoQZ4ZaEcRpAI5Vi4A3sIyYIdCbUvMmAKrEW%2FUeRhShapsCjLxcXeNuumWDCOHpJfVt3n8K0q6Ona%2FU4VkupnBbdK6Yo2%2FdBt64Zhmuzl3Pp9tnCmbYeuq%2BmR4B7PkCVNibRS0%2Fe8JI2S2mtSKb8XLVk%2BhmOXBxJ7sscsa%2FMjtWICOMVjg7fGYcwEXSUeJbVKMoCY1Jv7B6k4KV2sO9yruESkscm%2By3bV1rOFQgoQ2BFCS%2F1XwvFCWBoa4hoMgs1yVihXN7mRz61taqiPLqVoHNTSA4wOQtgS1xsYThYlZDpKyltNnFrs%2BB42yjByPn8qyR91eoobFYzjBvxnVrDg4DX58kJ2VumhFRe4NPg29seqk64Ps%2BbLUFDiCV39CVrsriWGR4G91xCeZpcv5NABou3M3RareJeoTTAmpBD3KUA3orZ8xRAUuFOkC9nDPJujH9nI117UMgScL9t9Rx7eB6l6jEt7KiMocEoRGHOMnSQJz28bbidFQWx0z0H67%2FcwOwSq0P71fDojxRc8Uhyk9yZKFfCiQ5HN6WnBS0u4O61hYDHO4oioPrFh4VvxdeYXf5Wz%2BDQFtdqUzYhC%2FlA%2FLWAWJF8WkQ%2BX7NEKUh3JO%2BxxFZJaJH28xgd4E5jnTT92orjalpTwTZ1y2Pm%2FsVUKyLm8oXhD9N2nTi%2FB2oey0H6Dn1Charhyc1MiS9dYLICym1zFBcTAQt6Our9D96Z20LRDi5UYci%2FxUDVJrwsBgREMxT4tXx41XsCJhkAxYvL1coY7HHmlgQUg6RDruY21h85VBJQeyNX0GkNUqXTaW5lTPRPflsIIbs80L0UX%2FZc6coyvlnWmkl3e93liSifT4eWW7AhhCAdXXZRKX2ZTzHKWnHzuXHoeP6MSHmSw6lA4Shi3VHv8TAmK4aumANtqASpxPw0I0rXy1U6dXW2UMj8FHEJP8pZ6BX5smKqMQaB6JuWP1pZCPgINg%3D%3D&__VIEWSTATEGENERATOR=3D1CCC47&__EVENTVALIDATION=qv5aJzTc3fM%2BcENN%2BTP2DgRs4ocXUM4qLJNP%2FtEv4q0xMardTbzlDm9uqRxoi%2BfRFn8e%2FC0PuVHiCvBR2xRuCXu%2BBQLORcJ%2BQ%2FANICydWIh6GamZbMbX0BfCN%2BuVKQqW8v1HzL9oN9IOmupGv9F%2BvyxGsToAR94w6szmvNNYvcmQKqcflo2K04UZh1lqzC7ScOHIhyMJb4xooM4oTg3qlmISKwYKDPhVgVgzU4zvzFueU2kToA0DykBBodt8%2BJcKHXbxt4UkL%2FBAZvHssrUeFA9OtAECcG4T3r68EB632IBprg8m9uiVX1wP%2BB8yQQdpQjBtYfT9rBHblz1HvaAu1mcRB0E%3D&ctl00%24cphRoblox%24GroupSearchBar%24SearchKeyword=Search+all+groups&ctl00%24cphRoblox%24rbxGroupRoleSetMembersPane%24dlRolesetList=2343447&ctl00%24cphRoblox%24rbxGroupRoleSetMembersPane%24RolesetCountHidden=2&ctl00%24cphRoblox%24rbxGroupRoleSetMembersPane%24currentRoleSetID=2343447&ctl00%24cphRoblox%24GroupWal
lPane%24GroupWallPager%24ctl01%24PageTextBox=1
EDIT 4:
import requests
from bs4 import BeautifulSoup

s = requests.session()
login_data = dict(username='USERNAMEHERE', password='PASSWORDHERE')
s.post('https://www.roblox.com/newlogin', data=login_data)
page = s.get('http://www.roblox.com/Groups/Group.aspx?gid=403577')
soup = BeautifulSoup(page.content, "html.parser")
# Read the hidden ASP.NET fields from the page that was just served.
VIEWSTATE = soup.find(id="__VIEWSTATE")['value']
VIEWSTATEGENERATOR = soup.find(id="__VIEWSTATEGENERATOR")['value']
EVENTVALIDATION = soup.find(id="__EVENTVALIDATION")['value']
Join_Group = dict(__EVENTTARGET="JoinGroupDiv", __EVENTARGUMENT="Click", __LASTFOCUS="", __VIEWSTATE=VIEWSTATE, __VIEWSTATEGENERATOR=VIEWSTATEGENERATOR, __EVENTVALIDATION=EVENTVALIDATION)
join = s.post('http://www.roblox.com/Groups/Group.aspx?gid=403577', data=Join_Group)
print(join)

Your error occurs because dict('JoinGroupDiv', 'Click') is not a valid way to construct a dict. You need either a literal (like {'foo': 'bar'}) or, if using the constructor, keyword arguments such as dict(foo='bar').
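As a quick illustration of the difference, the following sketch builds the same dictionary in the valid ways and then reproduces the TypeError from the question. The variable names are placeholders chosen for this example.

```python
# Valid: a dict literal.
literal_form = {"JoinGroupDiv": "Click"}

# Valid: the constructor with keyword arguments.
keyword_form = dict(JoinGroupDiv="Click")

# Valid: the constructor with a single iterable of (key, value) pairs.
pairs_form = dict([("JoinGroupDiv", "Click")])

# Invalid: two positional arguments, as in the question.
try:
    dict("JoinGroupDiv", "Click")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```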
You should do a POST, as that's what __doPostBack() does: it posts back to the same page/URL that was served (see "What is a postback?").
To determine what actually needs to be set for the POST, you should understand the __doPostBack() function.
Those arguments are used to populate hidden form fields called __EVENTTARGET and __EVENTARGUMENT.
You can read an explanation about this here:
http://www.codeproject.com/Articles/667531/doPostBack-function
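To make the mapping concrete, here is a minimal sketch that reads the two arguments out of an onclick attribute like the one on the "Join Group" button and turns them into the hidden form fields. The function name postback_fields and the regular expression are assumptions for illustration. The sketch only handles the simple single-quoted form shown in the question.

```python
import re

# The onclick value from the "Join Group" button in the question.
ONCLICK = "__doPostBack('JoinGroupDiv', 'Click');"


def postback_fields(onclick: str) -> dict:
    """Map __doPostBack(target, argument) to the hidden fields it fills in.

    Hypothetical helper: only understands two single-quoted string arguments.
    """
    match = re.search(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)", onclick)
    if match is None:
        raise ValueError("no __doPostBack call found")
    target, argument = match.groups()
    return {"__EVENTTARGET": target, "__EVENTARGUMENT": argument}


fields = postback_fields(ONCLICK)
print(fields)
```

These two fields are only part of the POST body. The remaining hidden fields served with the page still have to be sent along with them.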
Edit: It's also worth noting that, when interacting with an application in this manner, you also need to be aware of any other state that's being managed and passed to the server. There are cookies, viewstate etc. to be considered. It's best to go through the process in a browser and inspect/record the requests with the developer tools to help you determine what data is required.
Edit 2: I see that Roblox has an API. You're much better off using this if/where possible rather than interacting with their UI.
See http://wiki.roblox.com/index.php/Web_APIs#Group_APIs
If this isn't letting you get access to a list of groups (I don't see a search here at a glance) then you might consider a hybrid approach.
Edit 3: For the scraping-type approach, you may want to consider Beautiful Soup for parsing the page/form and extracting the values being added by ASP.NET -
http://www.crummy.com/software/BeautifulSoup/
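The extraction step can be sketched without any third-party dependency. The example below uses the standard library's html.parser in place of Beautiful Soup (a deliberate swap so the snippet runs as-is), and it parses a made-up HTML fragment with placeholder values instead of a real Roblox page. The class name HiddenFieldParser is hypothetical.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


# Made-up fragment standing in for the served page; values are placeholders.
SAMPLE_HTML = """
<form method="post" action="Group.aspx?gid=40">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="placeholder-viewstate" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="3D1CCC47" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="placeholder-validation" />
  <input type="text" name="search" value="ignored" />
</form>
"""

parser = HiddenFieldParser()
parser.feed(SAMPLE_HTML)

# Combine the server-provided state with the postback target and argument.
payload = dict(parser.fields, __EVENTTARGET="JoinGroupDiv", __EVENTARGUMENT="Click")
print(sorted(payload))
```

Because values such as __VIEWSTATE change from one page load to the next, they need to be read from a fresh GET in the same session just before the POST is sent.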
