Scraping text generated by script with PHP Simple HTML DOM Parser - javascript

I am trying to get the following text "Hug­gies Pure Baby Wipes 4 x 64 per pack" shown in the code below.
<div class="offerList-item-description-title">
<div id="result-title-5" class="offerList-item-description-title">
<script type="text/javascript">
document.write(getContents('wF8UD9Jj8:6D !FC6 q23J (:A6D c I ec A6C A24\<'));
</script>Hug­gies Pure Baby Wipes 4 x 64 per pack
</div>
</div>
I have tried using code such as:
foreach($element -> find('.offerList-item-description-title') as $title)
{
foreach($element -> find('text') as $text){
echo $text;
}
}
But I just get an empty string back. Any suggestions?
Thanks.

The HTML your scraper receives does not include anything rendered by JavaScript. In your case the text is generated by a script, which is why you get an empty response. What you need is a headless browser such as PhantomJS; there is a PHP wrapper for it at http://jonnnnyw.github.io/php-phantomjs/.
This should solve your problem. It has the following features:
Load webpages through the PhantomJS headless browser
View detailed response data including page content, headers, status code etc.
Handle redirects
View javascript console errors
Hope this helps.
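As an aside, for this particular page a headless browser may not be strictly necessary. The argument passed to getContents looks like ROT47-encoded text: applying ROT47 to it yields the product title. Below is a minimal sketch in JavaScript, assuming getContents is a plain ROT47 decoder (the page's actual function is not shown in the question, so this is a guess). The name rot47 is introduced here for illustration only.

```javascript
// Hypothetical stand-in for the page's getContents(); assumes plain ROT47.
// ROT47 rotates printable ASCII (codes 33..126) by 47 positions and
// leaves everything else (e.g. spaces) untouched.
function rot47(input) {
  let out = "";
  for (const ch of input) {
    const code = ch.charCodeAt(0);
    if (code >= 33 && code <= 126) {
      out += String.fromCharCode(33 + ((code - 33 + 47) % 94));
    } else {
      out += ch;
    }
  }
  return out;
}

// The argument from the question. In the page source the last two
// characters are written as "\<", which JavaScript reads as a plain "<".
const encoded = "wF8UD9Jj8:6D !FC6 q23J (:A6D c I ec A6C A24<";

console.log(rot47(encoded));
```

The decoded string still contains the HTML entity &shy; (a soft hyphen), so entities would need decoding afterwards. If the assumption holds, the same rotation could be ported to PHP and applied to the string pulled out of the script tag.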

I'm not sure what code you're using in your example (and I suspect the result of the getContents function gets in the way of your method for retrieving the text), but if you wrap the text you're after in a <span> like so:
<div class="offerList-item-description">
<div id="result-title-5" class="offerList-item-description-title">
<script type="text/javascript">
document.write(getContents('wF8UD9Jj8:6D !FC6 q23J (:A6D c I ec A6C A24\<'));
</script><span>Hug­gies Pure Baby Wipes 4 x 64 per pack</span>
</div>
</div>
you can retrieve it using javascript:
<script>
var $title = document.getElementsByClassName("offerList-item-description-title");
for (var i = 0; i < $title.length; i++) {
var span = $title[i].getElementsByTagName("span");
var $text = span[0].innerText || span[0].textContent;
//echo $text;
console.log("==> " + $text);
}
</script>

How to convert string into html format using ejs in javascript

The scenario is: the user enters text in HTML format (e.g. <b>Testing</b>), and the entered text is saved to the database as a string of HTML code (e.g. <b>Testing</b>).
I want the string fetched back from the database to be displayed as HTML text (e.g. Testing).
I followed the below snippet but didn't get anything as output.
Note: <%= cData.description %> worked fine when executed simply but displayed HTML code as plain text.
test.js (route file):
var testid = 234123;
b.find(testid, function(data) {
b.otherdet(testid, function(cdata){
res.render('blogdesc',{
Data: data,
cdata: cdata
});
});
});
test.ejs file:
<p class="" id="descid"></p>
<script>
var $log = $( "#descid" );
html = $.parseHTML('<%= cData.description %>'); //description is column in database
$log.append( html );
</script>
https://stackoverflow.com/a/8125053/20394 shows how to emit HTML as-is in EJS, but please make sure that you sanitize that content to avoid XSS.
I found where it went wrong. I was using:
html = $.parseHTML('<%=blogData.description%>');
while the actual syntax should be this:
html = $.parseHTML('<%-blogData.description%>');
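To illustrate the difference: <%= %> HTML-escapes the value before emitting it, while <%- %> emits it unchanged. The sketch below mimics that in plain JavaScript. escapeHtml is a hypothetical helper written for this example, not EJS's own implementation, and the exact set of characters EJS escapes may differ.

```javascript
// Hypothetical helper approximating what an escaping tag such as <%= %> does.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const description = "<b>Testing</b>"; // value as stored in the database

// Roughly what <%= description %> emits: markup shown as literal text.
const escaped = escapeHtml(description);
// Roughly what <%- description %> emits: markup passed through as-is.
const raw = description;

console.log(escaped);
console.log(raw);
```

Because the raw form is not escaped, anything a user typed reaches the page as markup, which is why the content should be sanitized before it is stored or rendered.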

Getting Javascript value in Python

I am currently writing a script that prints the content of a page and then extracts the data I need for a future request payload.
However, I am unable to locate a certain value named "dfValue". It seems to be located within JavaScript, so when I try to extract the "dfValue" content I only get a blank response.
The dfValue snippet is below:
<script type="text/javascript" src="/hpp/js/df.js?v=20170531"></script>
<div id="df_swf_c" style="display:none;"></div>
<input type="hidden" name="dfValue" id="dfValue" value="" />
<script type="text/javascript">
//<![CDATA[
dfDo("dfValue");
//]]>
</script>
With similar values on the page I am able to extract them simply by using code such as:
soup.find(None, {'name': 'dfValue'}).get('value')
but this does not work here. Is there a particular way I can extract the dfValue?
Advice is appreciated.
That input element is not within Javascript. It's accessible. As shown here, the name of the input element is 'dfValue' and its value is '' (an empty string).
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(open('temp.htm'), 'lxml')
>>> input = soup.find('input')
>>> input.attrs['name']
'dfValue'
>>> input.attrs['value']
''
If you need to be able to enter data into this field, or otherwise manipulate this form, then you could consider using the selenium library.

Tooltips in Joomla containing HTML

I created a component using Component Creator for Joomla 3.X
The problem is that on the site the tooltips show html code. For example:
<strong>Title</strong><br/>Description
Instead, on the administrator they are displayed properly:
Title
Description
I was reviewing the documentation here and here, and it seems that it is possible to define the output format when calling the tooltip function, but there are no calls to that function inside the component. The only call I see is JHtml::_('behavior.tooltip'); at the beginning of the view file, and I don't know how to specify the output format.
The site is in this URL:
http://50.87.185.99/colombianadederecho_blank/index.php/administradores-ph/registrarse
Added:
It seems like the code is called when building the label for each field:
<div class="control-group">
<div class="control-label"><?php echo $this->form->getLabel('document'); ?></div>
<div class="controls"><?php echo $this->form->getInput('document'); ?></div>
</div>
And it is stored in the title attribute:
title="<strong>Title</strong><br />Description"
So, I think the problem is in the Javascript function, maybe using a .text() jQuery function instead of a .html().
Inspecting the source code I found this, but don't understand why it is not working properly:
window.addEvent('domready', function() {
$$('.hasTip').each(function(el) {
var title = el.get('title');
if (title) {
var parts = title.split('::', 2);
el.store('tip:title', parts[0]);
el.store('tip:text', parts[1]);
}
});
var JTooltips = new Tips($$('.hasTip'), {"maxTitleChars": 50,"fixed": false});
});
jQuery(document).ready(function() {
jQuery('.hasTooltip').tooltip({"html": true,"container": "body"});
});
Solution:
As @ilias found, it was a problem between libraries. I had to disable the bootstrap call in the header using this plugin and call it from the end of the body:
<script src="/site/media/jui/js/bootstrap.min.js" type="text/javascript"></script>
I don't know the code that Component Creator produces, however I assume that the tooltips are like in the administrator views. If this is the case you should check for something like this:
<button type="submit" class="btn hasTooltip" title="<?php echo JHtml::tooltipText('JSEARCH_FILTER_SUBMIT'); ?>"><i class="icon-search"></i></button>
and enclose the contents of JHtml::tooltipText() in strip_tags().
Otherwise try to find the line <strong>Title</strong><br/>Description where the tags are shown and enclose it in strip_tags(). For example you might have something that looks like:
echo $this->escape($item->description);
that should be turned into:
echo $this->escape(strip_tags($item->description));

Dynamically creating jQuery listview in PHP loop

I would like to populate a jQuery listview in a PHP loop, and I attempted to do so by echoing javascript code that populates the list with a PHP variable. This is what I'm working with:
My HTML
<div data-role='page' id='feedPage'>
<div data-role='content'>
<ul id='pics' data-role='listview'>
<li>test</li>
</ul>
</div>
</div>
and my PHP / JavaScript
echo "<script type='javascript'>
var pics = \$('#pics')
var pitem = \$('<li/>').html($myArray[element])
var plink = \$('<a/>')
pitem.append(plink)
pics.append(pitem)
pics.listview('refresh')
</script>";
but the list comes up blank. This code is running inside of a PHP for loop, and I am able to access and manipulate all the elements of $myArray just fine in PHP, but I cannot seem to populate the list. I even tried running this code with a simple .html('hello') to no avail. All I get is a blank list with the exception of the test item I hardcoded in the HTML. Is there a way to generate a list in PHP like this, and if so, how can I do it properly?
Thanks!
SOLUTION:
I got this working simply by doing <script type='text/javascript'> and .html('$myArray[element]') (notice the single quotes). This works because the javascript is running inside a PHP echo. Oh, and none of my $ needed to be escaped. Final code:
echo "<script type='text/javascript'>
var pics = $('#pics')
var pitem = $('<li/>').html('$myArray[element]')
var plink = $('<a/>')
pitem.append(plink)
pics.append(pitem)
pics.listview('refresh')
</script>";
I think the main problem is in type='javascript', it should be type='text/javascript'.
Another thing to consider is that the content of the $myArray[element] needs to be printed as a javascript string. Running $myArray = array_map('json_encode', $myArray); should do the trick.
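The idea behind json_encode here is that the value ends up as a valid, quoted JavaScript string literal no matter which characters it contains. Below is a minimal sketch of the same idea in JavaScript, with JSON.stringify playing the role json_encode plays in PHP. buildListItemSnippet is a hypothetical helper used only for illustration.

```javascript
// Hypothetical helper: builds the line of generated script for one item.
// JSON.stringify adds the surrounding quotes and escapes quotes,
// backslashes and newlines inside the value.
function buildListItemSnippet(value) {
  return "var pitem = $('<li/>').html(" + JSON.stringify(value) + ")";
}

console.log(buildListItemSnippet("hello"));
console.log(buildListItemSnippet('He said "hi"'));
```

With the encoded value, no manual quotes are needed around the placeholder in the generated script, since the encoder supplies them.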
You forgot to put your JS code inside a document.ready callback:
$(document).ready(function($){
var pics = $('#pics')
var pitem = $('<li/>').html($myArray[element])
var plink = $('<a/>')
pitem.append(plink)
pics.append(pitem)
pics.listview('refresh')
});

Showing text from resources.resx in JavaScript

This is example code in ASP.NET MVC 3 Razor:
@section header
{
<script type="text/javascript">
$(function() {
alert('@Resources.ExampleCompany');
});
</script>
}
<div>
<h1>@Resources.ExampleCompany</h1>
</div>
The code above is just an example, but it shows my problem with encoding. The variable @Resources.ExampleCompany comes from the file resources.resx, with the value ExampleCompany = "Twoja firma / Twój biznes"
In JavaScript, the alert shows "Twoja firma / Tw&#243;j biznes".
Why is the character 'ó' shown as '&#243;'? What am I doing wrong?
In the HTML tag, <h1>@Resources.ExampleCompany</h1> is displayed correctly.
UPDATE:
Mark Schultheiss wrote a good hint and my "ugly solution" is:
var companySample = "@Resources.ExampleCompany";
$('#temp').append(companySample);
alert($('#temp').text());
Now the character is ó and looks good, but this is still not an answer to my issue.
According to HTML Encoding Strings - ASP.NET Web Forms VS Razor View Engine, the @ syntax automatically HTML-encodes its output, and the solution is to use the Raw extension method (e.g., @Html.Raw(Resources.ExampleCompany)) to emit the string without HTML encoding. Try that and let us know if that works.
Some of this depends upon WHAT you do with the text.
For example, using the tags:
<div id='result'>empty</div>
<div id='other'>other</div>
And code (since you are using jQuery):
var whatitis="Twoja firma / Tw&#243;j biznes";
var whatitisnow = unescape(whatitis);
alert(whatitis);
alert(whatitisnow);
$('#result').append(whatitis+" changed to:"+whatitisnow);
$('#other').text(whatitis+" changed to:"+whatitisnow);
In the browser, the "result" tag shows both correctly (as you desire) whereas the "other" shows it with the escaped character. And BOTH alerts show it with the escaped character.
See here for example: http://jsfiddle.net/MarkSchultheiss/uJtw3/.
I use the following trick:
<script type="text/javascript">
$('<div/>').html("@Resources.ExampleCompany").text();
</script>
Maybe it will help.
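The trick works because the browser's HTML parser decodes the entity when the string is set as markup, and .text() then returns the decoded characters. For illustration, here is a DOM-free sketch of the same decoding, limited to numeric entities such as &#243;. decodeNumericEntities is a hypothetical helper, and named entities such as &amp; are deliberately not handled.

```javascript
// Hypothetical helper: decodes decimal (&#243;) and hexadecimal (&#xF3;)
// numeric character references. Named entities are left untouched.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, ref) => {
    const isHex = ref[0].toLowerCase() === "x";
    const codePoint = isHex ? parseInt(ref.slice(1), 16) : parseInt(ref, 10);
    // Leave out-of-range references as they are instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities("Twoja firma / Tw&#243;j biznes"));
```

In a page that already loads jQuery, the one-liner above is shorter; a helper like this is only useful where no DOM is available.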
UPDATE
I have tested this behavior of Razor more thoroughly and I've found that:
1. When the text is placed as normal HTML content, the @Html.Raw method helps and writes the char 'ó' without HTML encoding (not as: &#243;)
example:
<div> @Html.Raw("ó") </div>
example:
<script type="text/javascript">
var a = $('<div/>').html('@("ó")').text();// or var a = '@Html.Raw("ó")';
console.log(a); // it shows: ó
</script>
2. But if it is placed inside an HTML tag as an attribute, Razor converts it to: &#243; and @Html.Raw doesn't help at all
example:
<meta name="description" content="@("ó")" />
You can fix it by putting the entire tag into a Resource (as in that post) or into a string (as in my example)
@("<meta name="description" content="ó" />")
So the answers may help some people but not others, which can be confusing.
I had a similar issue, but in my case I was assigning a value from a Resource to a JavaScript variable. There was the same problem with the encoding of the letter ó. Afterwards this variable was bound to an HTML object (by a knockout binding, to be precise). In my situation the code below did the trick:
var label = '@Html.Raw(Resource.ResourceName)';
