Get dynamically loaded source code from Python using mechanize and bs4

I want to get the source code of a page that is loaded by JavaScript. The page is a LinkedIn profile page, and I want to get the job and education details.
I'm not using Selenium because I don't want a browser window to open. I know about headless mode, but I ran into a cookies problem.
I have logged in through mechanize and retrieved some data, such as phone number, address, headline, emails, and full name. But because the page is loaded by JavaScript, I can't get the whole page data.
The data I'm getting:
.....<code id="bpr-guid-892585" style="display: none">
{"data":{"entityUrn":"urn:li:collectionResponse:uPYuDSPXzooiHx+zPOguG1+f+JFMWTWFEfhiIQtEFMM=","elements":[],"paging":{"count":10,"start":0,"total":0,"links":[]},"$type":"com.linkedin.restli.common.CollectionResponse"},"included":[]}
</code>
<code id="datalet-bpr-guid-892585" style="display: none">
{"request":"/voyager/api/takeovers","status":200,"body":"bpr-guid-892585","method":"GET","headers":{"x-li-uuid":"AAXafRyXk/WxvhRuOZTrnA\u003D\u003D"}}
</code>
<img class="datalet-bpr-guid-892585" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" style="display: none"/><code id="bpr-guid-892586" style="display: none">
{"data":{"entityUrn":"urn:li:collectionResponse:nZx6/1e1AAbOHh075gv083zrunZT186/K+rx5FP70A4=","elements":[{"lixTracking":{"urn":"urn:li:member:865882626","segmentIndex":2,"experimentId":4358724,"treatmentIndex":1,"$type":"com.linkedin.voyager.common.ChameleonConfigLixTrackingInfo"},"data":{"namespace":"premium/templates/components/chooser/plan-card","locale":"en_US","message":"Learn more","key":"i18n_card_select_plan","$type":"com.linkedin.voyager.common.ChameleonConfigDataI18n"},"displayName":"card_select_plan","description":"testing 'Learn more' on SKU cards, vs control of 'Select plan' ","lixTreatment":"VAR_t20152_PR_1","lixKey":"chameleon.PREMIUM:us.copy.17654","creatorDisplayName":"cyount","status":"PERMANENT_RAMP","$type":"com.linkedin.voyager.common.ChameleonConfigItem"},{"lixTracking":{"urn":"urn:li:member:865882626","segmentIndex":3,"experimentId":4395729,"treatmentIndex":0,"$type":"com.linkedin.voyager.common.ChameleonConfigLixTrackingInfo"},"data":{"namespace":"onboarding/templates/components/widgets/people-you-may-know","locale":"en_US","message":"Connecting with people lets you see updates and keep in touch","key":"i18n_onboarding_pymk_page_header_phase_3","$type":"com.linkedin.voyager.common.ChameleonConfigDataI18n"},"displayName":"onboarding_pymk_page_header_phase_3","description":"Onboarding PEOPLE_YOU_MAY_KNOW widget header copy test","lixTreatment":"control","lixKey":"chameleon.ONBOARDING:global.copy.19060","creatorDisplayName":"zihliu","status":"MAX_RAMP","$type":"com.linkedin.voyager.common.ChameleonConfigItem"},{"lixTracking":{"urn":"urn:li:member:865882626","segmentIndex":3,"experimentId":4395707,"treatmentIndex":1,"$type":"com.linkedin.voyager.common.ChameleonConfigLixTrackingInfo"},"data":{"namespace":"onboarding/templates/components/widgets/profile-edit-common","locale":"en_US","message":"What’s your most recent 
experience?","key":"i18n_onboarding_profile_edit_work_header_v2","$type":"com.linkedin.voyager.common.ChameleonConfigDataI18n"},"displayName":"onboarding_profile_edit_work_header_v2","description":"Onboarding PROFILE_EDIT widget header copy test","lixTreatment":"VAR_t21697_PR_1","lixKey":"chameleon.ONBOARDING:global.copy.19063","$recipeTypes":["com.linkedin.voyager.dash.deco.relationships.ProfileWithEmailRequired","com.linkedin.voyager.dash.deco.identity.profile.WebTopCardCore"],"$type":"com.linkedin.voyager.dash.identity.profile.Profile","firstName":"Adarsh ","profilePicture":{"displayImageWithFrameReferenceUnion":{"vectorImage":{"$recipeTypes":["com.linkedin.voyager.dash.deco.common.VectorImageOnlyRootUrlAndAttribution"],"rootUrl":"https://media-exp1.licdn.com/dms/image/C4E35AQEIVkoUWgLFvw/profile-framedphoto-shrink_","artifacts":[{"width":200,"$recipeTypes":["com.linkedin.voyager.dash.deco.common.VectorArtifact"],"fileIdentifyingUrlPathSegment":"200_200/0/1597096649541?e=1647694800&v=beta&t=XVOK0upwO6V3NaJtUWLwy-yLMDa8cZICzYH0do67vhU","expiresAt":1647694800000,"height":200,"$type":"com.linkedin.common.VectorArtifact"},{"width":400,"$recipeTypes":["com.linkedin.voyager.dash.deco.common.VectorArtifact"],"fileIdentif
</code>
<img class="terminatorlet" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" style="display: none"/>
<div aria-live="polite" class="visually-hidden" id="a11y-notification" role="region"></div>
</body></html>
If there is a way to do this with Selenium, please guide me, but using the headless option.
The output above contains all the data I mentioned, but it differs from what the browser shows after login.
Thanks for any help.
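The hidden <code> elements in the output above come in pairs: each datalet-… block names a request and points, through its "body" field, at the id of the block that holds the JSON for that request. Below is a minimal sketch of pulling those pairs out of HTML that has already been downloaded. It uses only the Python standard library so it runs as-is; the class name, function name, and sample HTML are made up for illustration (with bs4, soup.find_all("code") would collect the same elements). It only reads what the response already contains and does not fetch anything the server did not send.

```python
import json
from html.parser import HTMLParser


class CodeBlockParser(HTMLParser):
    """Collect the text of every <code> element that has an id, keyed by that id."""

    def __init__(self):
        super().__init__()
        self.blocks = {}
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self._current_id = dict(attrs).get("id")
            if self._current_id is not None:
                self.blocks[self._current_id] = ""

    def handle_data(self, data):
        if self._current_id is not None:
            self.blocks[self._current_id] += data

    def handle_endtag(self, tag):
        if tag == "code":
            self._current_id = None


def extract_embedded_json(html):
    """Map each request path named in a datalet block to its decoded JSON body."""
    parser = CodeBlockParser()
    parser.feed(html)
    parser.close()
    responses = {}
    for block_id, text in parser.blocks.items():
        if not block_id.startswith("datalet-"):
            continue
        try:
            meta = json.loads(text)
            body = json.loads(parser.blocks[meta["body"]])
        except (KeyError, TypeError, ValueError):
            # Skip pairs that are missing, truncated, or not valid JSON.
            continue
        responses[meta["request"]] = body
    return responses


# Hypothetical, shortened sample in the same shape as the output above.
sample_html = """
<code id="bpr-guid-1" style="display: none">
{"data":{"elements":[]},"included":[]}
</code>
<code id="datalet-bpr-guid-1" style="display: none">
{"request":"/voyager/api/takeovers","status":200,"body":"bpr-guid-1","method":"GET"}
</code>
"""
responses = extract_embedded_json(sample_html)
print(sorted(responses))
```

Blocks whose JSON is cut off, like the last one in the output above, are skipped rather than raising an error.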

Related

Proper way to handle ajax HTML block

So I was wondering what the "spec" or "proper" way is to handle HTML that is used via ajax.
For example, should I keep all the HTML in the actual page that is using it? Or should I just use an ajax call to load it in?
Is there a performance increase in keeping it loaded in the page, since it's one less load? Or does loading that extra data at page load offset it?
Here is a screenshot illustrating what I mean. You can see the {name}, which is changed depending on what the user provides (limited characters, of course).
Any help/opinion is appreciated! Thanks!
Partial source for those asking:
<!-- text field -->
<div class="add-field-wrapper float-left">
<input type="radio" value="text" name="input_type" id="rad-type-text" class="type-radio-btn">
<label for="rad-type-text" class="radio-lbl" data-tooltip="Used for simple inputs such as: <b>Phone Number</b> or <b>Email Address</b>">
<img src="https://placeholdit.imgix.net/~text?txtsize=16&txt=70x70&w=70&h=70" class="field-type-icon" />
<div class="field-type-text">Text Field</div>
</label>
</div>
<!-- select -->
<div class="add-field-wrapper float-left">
<input type="radio" value="select" name="input_type" id="rad-type-select" class="type-radio-btn">
<label for="rad-type-select" class="radio-lbl" data-tooltip="Use this option when you need to provide a list of choices for the user." >
<img src="https://placeholdit.imgix.net/~text?txtsize=16&txt=70x70&w=70&h=70" class="field-type-icon" />
<div class="field-type-text">Select Menu</div>
</label>
</div>
<!-- textarea -->
<div class="add-field-wrapper float-left">
<input type="radio" value="textarea" name="input_type" id="rad-type-textarea" class="type-radio-btn">
<label for="rad-type-textarea" class="radio-lbl" data-tooltip="The textarea field will appear as a <b>WYSIWIG</b> (What you see is what you get) editor. This allows for some customization of the appearance of the input.">
<img src="https://placeholdit.imgix.net/~text?txtsize=16&txt=70x70&w=70&h=70" class="field-type-icon" />
<div class="field-type-text">Text Area</div>
</label>
</div>
Edit:
<!-- Datepicker HTML block - used in JS -->
<div id="datepicker_html" style="display: none;">
<div id="{name}-block" class="datepicker-wrapper form-input-wrapper">
<div class="template-drag-handle"><img src="images/design/up-down-icon.png" class="template-drag-handle-icon" alt="Drag" /></div>
<div class="inputs-wrapper">
<div class="form-row"><input type="text" name="{name}" class="input-datepicker" placeholder="{placeholder}" id="{name}"/></div>
</div>
<?php echo $default_template_chkbox_options_html; ?>
</div>
</div>
That's a "piece" of the HTML. It gets loaded into a JS variable.
This is what processes it: it adds the name and changes the placeholder (these can be reused as many times as you want).
function addDatePickerField(){
    // Get the HTML template from the hidden block
    var datepicker_html = $('#datepicker_html').html();
    // Fill in the {name} and {placeholder} markers
    datepicker_html = datepicker_html.replaceAll(/{name}/g, input_name_underscores);
    datepicker_html = datepicker_html.replaceAll('{placeholder}', input_name);
    // Append the finished field to the form
    $('#template-fields-wrapper').append(datepicker_html);
    wrapUpAddInput('datepicker');
}
I just didn't know if it would be better to do an ajax call, store the "external" HTML, and call it in when I need it. For example, that datepicker HTML block would be stored in a separate file and then, on a link click, loaded into the DOM.

I will try to address your question, even though it is a very broad one.
Loading your content (e.g. HTML) dynamically via an ajax request does not always give you a performance boost; it depends on what you are doing and trying to achieve.
Should you always pre-load all of your HTML content with the initial request, or should you load a portion of it via ajax after the page is already on screen? That depends solely on your application and its needs.
I will explain with an example:
Assume I am developing a content site, which is mainly content oriented (e.g. articles) and is served to traditional web browsers (desktop or mobile). Loading the articles for each page via ajax might not be a good idea, with very few exceptions.
On the other hand:
If I am developing a web application that needs to send and receive blocks of data in "real time", with a rich, "enterprise"-like UI where things are executed, updated and displayed on the fly without refreshing and reloading the application page every time I save my work or execute an operation, I will certainly use ajax requests for some of that work.
Another aspect is the overall loading time of your page:
Some websites load part of their HTML via ajax after the page body is loaded. This reduces the perceived loading time of the page: to the user, the page appears to load faster, since the partial page loads almost instantly and some blocks inside it then load asynchronously via ajax.
As I said, this is a very broad question, and there are many methods to learn and investigate before you see what works best for your specific needs.
Good luck

Python Requests getting all html data from site

I am trying to get product data from Metal Mulisha. I have a list of product IDs that I need to find data on, so I use Python with the requests package and the search URL "http://www.metalmulisha.com/shop/search/?q=20M35518334Z%20M45518403Z%20M45518415Z"
I then use BeautifulSoup to find the class and data I need, but I get an error that says there was nothing there.
So I first went to the URL in Chrome and inspected the elements, and all the information I needed was in the HTML in Chrome.
Here is a snippet of what Chrome showed.
<div class="col-md-10 col-md-push-2">
<div data-rfkid="rfkid_7" data-keyphrase="20M35518334Z M45518403Z M45518415Z" class="rfk_sp rfk-sp">
<div class="rfk_sp_container" data-nrp="2" data-ntp="2" data-pg="1" data-status="2" rfk_track_appear_once="f=sp,rfkid=rfkid_7,a=1,c=1">
<div class="rfk_header">
</div>
<div class="rfk_message">
<div class="rfk_msg_noresult">
</div>
<div class="rfk_msg_results">Top Results for "20m35518334z m45518403z m45518415z"</div>
The markup continues below the first div; the point is that there is a lot of information after <div data-rfkid=.
When I ran my Python script to find the first div, this is what I got.
<div class="col-md-10 col-md-push-2">
<div data-keyphrase="20M35518334Z M45518403Z M45518415Z" data-rfkid="rfkid_7"></div>
</div>
It is as if all the product information that I need is not there.
Here is my Python code, so you can see what I did. I am using Python 3.5.
import requests
from bs4 import BeautifulSoup
url = "http://www.metalmulisha.com/shop/search/?q=20M35518334Z%20M45518403Z%20M45518415Z"
html = requests.get(url).text
bs = BeautifulSoup(html, 'lxml')
possible_links = bs.find('div', attrs={'class': 'col-md-10 col-md-push-2'})
print(possible_links)
My question is why can't python find the html I need? If I inspect the site in Chrome I see it just fine, but when I use Python and request the site, it's not there. Is this to do with JavaScript? And if so how do I fix this?
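requests only downloads the HTML the server sends and does not execute JavaScript, so one quick check is to look for the class names seen in Chrome's inspector inside the raw response text. A minimal sketch follows, using the two snippets from this question as hard-coded strings instead of a live request; the function name and the marker list are chosen here for illustration.

```python
def missing_markers(raw_html, markers):
    """Return the markers that do not occur anywhere in the raw HTML."""
    return [marker for marker in markers if marker not in raw_html]


# What the Python script printed (copied from the question).
raw_html = """<div class="col-md-10 col-md-push-2">
<div data-keyphrase="20M35518334Z M45518403Z M45518415Z" data-rfkid="rfkid_7"></div>
</div>"""

# Names visible in Chrome's element inspector (copied from the question).
seen_in_chrome = ["rfk_sp_container", "rfk_msg_results", "data-rfkid"]

absent = missing_markers(raw_html, seen_in_chrome)
print(absent)  # ['rfk_sp_container', 'rfk_msg_results']
```

Names that show up in the inspector but not in the raw response were added to the page after it loaded, which points at script-generated content.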

Ajax Live Search Questions

This has been asked many times, but I couldn't find an answer that fits my needs. I have found many sites with links like example.com/ajax/search?query=Thing
I have a header in the works, and I currently use the W3Schools XML version, but it doesn't fit my needs at all. I need it to search IMDB for whatever the user enters. Once they enter, for example, 'The Simpsons', it should pop up all the search results, each showing the name as a clickable link to the IMDB page, for example http://www.imdb.com/title/tt0096697/, but with imdb.com in that URL replaced by my website's URL (to make it responsive, in a way).
I need it to use AJAX/jQuery so that it searches IMDB, so the XML file method won't work.
How are the sites with /ajax/search doing this type of IMDB search, which is used a lot on torrenting sites lately?
This is where I got my current search code from: Live search with PHP AJAX and XML
But as I said, it needs to run with Ajax, have live search, scrape/search IMDB in some way, and then change imdb.com to mysite.com
Update:
I managed to find something like this:
http://pastebin.com/PAD5AXUK
And this is the HTML:
<div class="main-nav-links hidden-sm hidden-xs">
<form method="GET" action="http://www.imdb.com/find" accept-charset="UTF-8" id="quick-search" name="quick-search">
<div id="quick-search-container">
<input id="quick-search-input" name="query" autocomplete="off" value="Quick search" type="search">
<div style="background-position: -160px 0px;" class="ajax-spinner"></div>
</div>
</form>
<ul class="nav-links">
<li> Home
</li>
<li> Browse
</li>
</ul>
<ul class="nav-links nav-link-guest">
<li> <a class="login-nav-btn" href="javascript:void(0)"> Login </a> | <a class="register-nav-btn" href="javascript:void(0)"> Register </a>
</li>
</ul>
</div>
But it still doesn't seem to work at all.

You can check this out:
https://twitter.github.io/typeahead.js/
It should solve the problem.

As for searching IMDB, you're going to have to use something like the PHP cURL extension to get the contents from the website and parse them with an HTML parser library.
For the live search, you can change the URL in JavaScript with the pushState() function, as in the following answer: Changing the url with javascript. Then you can use some jQuery to make it easier to send ajax GET requests to YOUR OWN server, where the PHP script will process the request to IMDB, with something like:
$.get("/ajax?query=TMNT", function(data) { //handle the data in here } );
Then in that callback function you can update your page with the contents of the result. You could even do some of the processing locally, such as changing the URL of the IMDB link.
The process would resemble the following chart:
User Enters Query (TMNT) ->
Ajax sends data to my own backend page (process.php) ->
process.php scrapes IMDB search query and parses it with html parser ->
outputs results which return to ajax function ->
ajax callback function places results in a DOM element
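The link-rewriting step in that flow (swapping imdb.com for your own domain) is plain URL manipulation. Here is a minimal sketch of it in Python rather than PHP, using only the standard library. The function names, the result format, and mysite.com are assumptions for illustration, and the scraping itself is left out.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(url, new_host, old_domain="imdb.com"):
    """Point a URL at new_host if its host is old_domain or a subdomain of it."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host == old_domain or host.endswith("." + old_domain):
        parts = parts._replace(netloc=new_host)
    return urlunsplit(parts)


def build_results(scraped, new_host):
    """Turn (title, url) pairs from the scraper into records for the ajax callback."""
    return [{"title": title, "url": rewrite_host(url, new_host)}
            for title, url in scraped]


# Hypothetical scraper output for the query "The Simpsons".
scraped = [("The Simpsons", "http://www.imdb.com/title/tt0096697/")]
results = build_results(scraped, "mysite.com")
print(results[0]["url"])  # http://mysite.com/title/tt0096697/
```

Checking the host instead of doing a blind string replace leaves links to other sites untouched.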

How to dynamically change facebook comments plugin url based on javascript variable?

I want to dynamically change the data-href for the fb comments plugin below based on a javascript variable. I'm running a flash swf file and am passing the new link for data-href into the html wrapper via a javascript function. When I do that, I want the fb comments plugin to refresh to the new data-href link.
<div style="float: left; padding-left:5px; min-height:500px" class="fb-comments" data-href="www.example.com" data-num-posts="20" data-width="380"></div>
Called javascript function passing in the new link for the comments plugin:
function changeCommentsUrl(newUrl){
// should refresh fb comments plugin for the "newUrl" variable
}

This will load the initial comments box. When executed, the script clears the comments div wrapper and replaces the HTML5 comments box with one for the new URL. The JS SDK will then parse the new box.
The JS SDK is required for this to work. Refer to https://developers.facebook.com/docs/reference/javascript/
Fix for XFBML rendering after DOM manipulation:
<div id="comments">
<div style="float: left; padding-left:5px; min-height:500px" class="fb-comments" data-href="www.example.com" data-num-posts="20" data-width="380"></div>
</div>
<script>
function changeCommentsUrl(newUrl){
    // Refresh the fb comments plugin for the "newUrl" variable
    var parser = document.getElementById('comments');
    // Clear the wrapper, then insert a comments box for the new URL
    parser.innerHTML = '';
    parser.innerHTML = '<div style="float: left; padding-left:5px; min-height:500px" class="fb-comments" data-href="'+newUrl+'" data-num-posts="20" data-width="380"></div>';
    // Ask the JS SDK to render the new box
    FB.XFBML.parse(parser);
}
</script>
The user solved it with:
document.getElementById('comments').innerHTML='<div style="float: left; padding-left:5px; min-height:500px" class="fb-comments" data-href="'+link+'" data-num-posts="20" data-width="380"></div>';
FB.XFBML.parse(document.getElementById('comments'));

I found the simplest and most effective way to make your Facebook comment box recognize the individual URL of each page (particularly good for e-commerce sites).
Add this script to the header portion of your website template (it generates the data-href value for your comment box div):
<script type="text/javascript" language="javascript">
jQuery("#FC").attr("data-href", window.location.href.split("?")[0]);
</script>
Then, on your comment box div, add the id that the JavaScript targets:
<div id="FC" class="fb-comments" data-href="" data-width="700" data-numposts="5" data-colorscheme="light">
Voilà. I spent so much time cracking this nut that I just had to share it, so you can save some time for a break!
Cheers!

Can I get some help decoding this bit of a Facebook page?

I'm trying to figure out just how a particular function works on a Facebook page, and being no friend of JS syntax, I'm having trouble. Here's the bit in question:
<a href="#" clicktoshowdialog="my_dialog" onclick="
(new Image()).src = '/ajax/ct.php?app_id=4949752878&action_type=3&post_form_id=3b933f46f9c4c44981e51b90c754bfce&position=2&' + Math.random();
FBML.clickToShowDialog("app4949752878_my_dialog");
return false;">
<img src="linktopicture" title="Are your friends fans?" width="190" height="230" />
</a>
<div style="display:none">
<div id="app4949752878_my_dialog" fbcontext="aa3fcff8e653">
<div class="app_content_4949752878" style="padding:10px">
<div with hidden then exposed content...
This is an image that, when clicked, pops out the previously hidden div. I know that the app###### prefix is prepended to all JS used in Facebook to limit its scope. I'm confused by the anchor attribute
clicktoshowdialog="my_dialog"
What is that identifying, and how is it targeting the div that's exposed when the image is clicked? Thanks for any clarification, and let me know if I can post any more sample code.

According to the wiki, it's just for opening the dialog (which is defined at the bottom). Facebook generates the JS to open the dialog. The attribute was post-processed, and the JS code (which you see in the onclick= attribute) was generated on its basis.
