Parsing CDATA from Javascript - javascript

This is my first post and I'm sorry if I'm doing it wrong but here we go:
I've been working on a project that should scrape values from a website. The values are stored in a JavaScript array. I'm using PHP Simple HTML DOM, and it works with normal scripts but not with the ones wrapped in CDATA blocks. Therefore, I'm looking for a way to scrape data from within a CDATA block. Unfortunately, all the help I could find was for XML files, and I'm scraping from an HTML file.
The JavaScript I'm trying to scrape is as follows:
<script type="text/javascript">
//<![CDATA[
var data = [{"value":8.41,"color":"1C5A0D","text":"17/11"},{"value":9.86,"color":"1C5A0D","text":"18/11"},{"value":7.72,"color":"1C5A0D","text":"19/11"},{"value":9.42,"color":"1C5A0D","text":"20/11"}];
//]]>
</script>
What I need to scrape is the "value" field of each entry in var data.
The problem was that I had been trying to strip the CDATA markers from a DOM object instead of from the raw HTML string.
The following code works perfectly :-)
include('simple_html_dom.php');
$lines = file_get_contents('http://www.virtualmanager.com/players/7793477-danijel-pavliuk/training');
$lines = str_replace("//<![CDATA[","",$lines);
$lines = str_replace("//]]>","",$lines);
$html = str_get_html($lines);
foreach ($html->find('script') as $element) {
    echo $element->innertext;
}
I will provide you with more information if needed.

A decent HTML parser shouldn't require JavaScript to be wrapped in a CDATA block. If the CDATA markers are throwing the parser off, just remove them from the HTML before parsing, like this:
1. Download the HTML file into a string, using file_get_contents(), or cURL if your host has disabled HTTP support in that function.
2. Get rid of the //<![CDATA[ and //]]> bits using str_replace().
3. Parse the HTML from the cleaned string using Simple DOM's str_get_html().
4. Process the DOM object as before.
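Once the script text is available as a plain string, the values themselves can be pulled out without a DOM parser. The following is a minimal sketch in JavaScript rather than PHP, assuming the array assigned to data is valid JSON (as it is in the question); the helper name extractValues is made up for illustration.

```javascript
// Sample page source, copied from the <script> block in the question.
const html = `<script type="text/javascript">
//<![CDATA[
var data = [{"value":8.41,"color":"1C5A0D","text":"17/11"},{"value":9.86,"color":"1C5A0D","text":"18/11"},{"value":7.72,"color":"1C5A0D","text":"19/11"},{"value":9.42,"color":"1C5A0D","text":"20/11"}];
//]]>
</script>`;

// Hypothetical helper: return the "value" field of every entry in `var data`.
function extractValues(source) {
  // Remove the CDATA comment markers before looking at the script body.
  const cleaned = source.replace('//<![CDATA[', '').replace('//]]>', '');
  // Capture the array literal assigned to `data`, up to the first "];".
  const match = cleaned.match(/var\s+data\s*=\s*(\[.*?\]);/s);
  if (!match) return [];
  // The captured text is assumed to be valid JSON.
  return JSON.parse(match[1]).map((entry) => entry.value);
}

console.log(extractValues(html)); // [ 8.41, 9.86, 7.72, 9.42 ]
```

The same idea should carry over to PHP with preg_match() and json_decode(), although that variant is not shown here.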

Related

I want to display a php code shortcode in JavaScript, how do I do that?

This is my code
<script>
let myDiv = document.createElement("div");
myDiv.classList.add('test');
let my_var = `<?php echo do_shortcode("[elementor-template id="5078"]"); ?>`;
myDiv.innerHTML = my_var;
document.querySelector("#instagram").appendChild(myDiv);
</script>
Actually, it is not possible to do it this way, because the script runs on the client side while the PHP interpreter runs on the server side.
If you are sure that you need JS to render the output of some PHP code, I'd suggest using wp_ajax (https://codex.wordpress.org/AJAX_in_Plugins) to send a request to the server, which can then return whatever PHP output you want. Keep in mind that shortcodes may rely on additional assets that will not be included in the response.
The best way is to use shortcodes in their native form, inside the content or in PHP templates. This applies especially to Elementor templates, which let you create template parts for different places on the website easily.
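The client side of the wp_ajax suggestion might look roughly like the sketch below. It assumes the site's AJAX endpoint is /wp-admin/admin-ajax.php and that a PHP handler has been registered on the server through the wp_ajax_{action} (and, for logged-out visitors, wp_ajax_nopriv_{action}) hooks to echo the do_shortcode() output. The action name render_instagram_template, the template_id parameter, and both function names are hypothetical.

```javascript
// Hypothetical action name; the server needs a matching PHP handler hooked to
// wp_ajax_render_instagram_template (and the wp_ajax_nopriv_ variant if needed).
const ACTION = 'render_instagram_template';

// Build the POST request for WordPress's admin-ajax.php endpoint.
function buildShortcodeRequest(ajaxUrl, templateId) {
  const body = new URLSearchParams({
    action: ACTION,
    template_id: String(templateId), // hypothetical parameter read by the handler
  });
  return { url: ajaxUrl, options: { method: 'POST', body } };
}

// Browser-only part: fetch the rendered HTML and append it to #instagram.
async function renderTemplate(ajaxUrl, templateId) {
  const { url, options } = buildShortcodeRequest(ajaxUrl, templateId);
  const response = await fetch(url, options);
  const wrapper = document.createElement('div');
  wrapper.classList.add('test');
  wrapper.innerHTML = await response.text();
  document.querySelector('#instagram').appendChild(wrapper);
}
```

In the page this could be called as renderTemplate('/wp-admin/admin-ajax.php', 5078). Any scripts or styles the template depends on would still need to be loaded separately.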
It's a quoting problem:
You nested double quotes inside a double-quoted PHP string, which breaks the echo.
Provided your <script> tag is in your PHP template (so that the shortcode call is actually parsed by PHP), try this, replacing the shortcode's double quotes with single quotes:
<script>
let myDiv = document.createElement("div");
myDiv.classList.add('test');
let my_var = `<?php echo do_shortcode("[elementor-template id='5078']"); ?>`;
myDiv.innerHTML = my_var;
document.querySelector("#instagram").appendChild(myDiv);
</script>

PHP json to a JavaScript variable inside an HTML file

I wrote the following script, which creates a nice JSON listing of all the images in the subfolders of the current folder:
<?php
header('Content-type: application/json');
$output = new stdClass();
$pattern="/^.*\.(jpg|jpeg|png|gif)$/i"; //valid image extensions
$dirs = array_filter(glob('*'), 'is_dir');
foreach ($dirs as $dirname) {
    $files = glob(''.$dirname.'/*');
    $images = preg_grep($pattern, $files);
    $output->{$dirname} = $images;
}
echo json_encode($output, JSON_PRETTY_PRINT);
?>
I have an HTML file with a basic page and I want to display the JSON's data in a formatted way after some javascript manipulation.
So the question is how can I get the PHP data into a javascript variable?
<html>
...
<body>
<script src="images.php"></script>
<script type="text/javascript">
// Desired: Get access to JSON $output
</script>
...
<div>
<img ... >
</div>
</body>
</html>
I tried putting both https://stackoverflow.com/a/61212271/1692261
and https://stackoverflow.com/a/50801851/1692261 inside that script tag, but neither of them works, so I am guessing I am missing something fundamental here (this is my first ever experience with PHP :)
You should focus on what needs to be done; at the moment you are trying to force your own idea of how to do it. Maybe you should change your approach and get what you want another way?
Passing a PHP variable to JS is possible, but why do you need this JSON? If you want to use it to generate HTML (e.g. to show images to the user), you can do that in pure PHP without JS. If you really need JSON, you can generate a JSON file with PHP and fetch it with an additional JS request. But the simplest way is:
// PHP code that generates the JSON with the images
$images = json_encode($output, JSON_PRETTY_PRINT);
...
// PHP code inside the HTML template
<script type="text/javascript">
// No surrounding quotes: the JSON text is itself a valid JavaScript literal
var images = <?= $images ?>;
</script>
I can't guarantee that this JS line will work as written, but you get the idea.
P.S. You don't need stdClass for this. In PHP we usually use arrays for such purposes (in your case, associative arrays), and they are very powerful. json_encode() generates the same JSON from an array or an object. But if this part of the code works fine, let it stay as it is.
I took @Zeusarm's advice and just used AJAX (and jQuery) instead.
For anyone who needs a reference:
Nothing to change in PHP script in the original post.
Add jquery to the HTML file with <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
Make a GET request like this:
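<script type="text/javascript">
var images = '';
$.get('images.php', function (jsondata) {
    // jQuery parses the response into an object because images.php sends a JSON content type
    images = jsondata;
});
</script>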
I am sure this is not the cleanest code but it works :)
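Once the response arrives, it still has to be walked to get at the individual image paths. The sketch below assumes the response maps each folder name to its image paths, as the PHP script suggests. One detail worth handling: PHP's preg_grep() keeps the original array keys, so json_encode() may emit a folder's images either as a JSON array or as an object keyed by index. The function name listImages and the sample data are made up for illustration.

```javascript
// Flatten the folder -> images mapping into a list of { folder, path } records.
// Object.values() works for both arrays and index-keyed objects.
function listImages(imagesByFolder) {
  const result = [];
  for (const [folder, images] of Object.entries(imagesByFolder)) {
    for (const path of Object.values(images)) {
      result.push({ folder, path });
    }
  }
  return result;
}

// Made-up sample data, for illustration only.
const sample = {
  holiday: ['holiday/beach.jpg', 'holiday/hills.png'],
  pets: { 1: 'pets/cat.gif' }, // shape produced when array keys are not sequential
};

console.log(listImages(sample));
```

Inside the $.get callback this could be called as listImages(jsondata), which also avoids reading the images variable before the request has completed.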

How to access JSON file in JavaScript?

I have a JSON file, data.json, and an HTML file with JavaScript embedded in it. I want to access the data from the JSON file in a plain HTML file opened locally (file:///C:/Users/XYZ/Desktop/htmlpage.html) and NOT in a client-server manner (http://....). I have tried the following simple code to import the JSON file.
<!DOCTYPE html>
<html>
<body>
<p>Access an array value of a JSON object.</p>
<p id="demo"></p>
<script type="text/javascript" src="F:/Folder/data.json">
var myObj, x;
x = data[0].topic;
document.getElementById("demo").innerHTML = x;
</script>
</body>
</html>
I have read this method of using
<script type="text/javascript" src="F:/Folder/data.json">
on other StackOverflow Questions. But it is not working.
Please tell me the simplest way to access the data in the JSON file.
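You could try writing something like this in your JSON file in order to assign the data to a variable:
window.data = JSON.parse('insert your json string here');
You can then access window.data in your page's JavaScript. You can also omit window. and just assign to and/or read from data, which is the same as window.data. Note that a <script> element with a src attribute ignores its inline content, so the code that reads data has to go in a separate <script> block after the one that loads the file.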
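As a rough sketch of that approach, the data file and the page script could look like this. The file name data.js and the sample JSON are placeholders, and globalThis is used here because it refers to window in a browser while also letting the snippet run outside one.

```javascript
// data.js (hypothetical file name), loaded with <script src="data.js"></script>.
// The JSON text below is placeholder content standing in for the real file.
globalThis.data = JSON.parse('[{"topic":"Example topic"}]');

// In a separate, later <script> block on the page:
const topic = globalThis.data[0].topic;
console.log(topic); // Example topic
```

In the page itself, the last step would be document.getElementById("demo").innerHTML = topic; as in the question.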
Perhaps a cleaner approach would be to use an AJAX request, either with jQuery or with vanilla JavaScript; both approaches have many answers available on this site.
You could also look into a solution with jQuery.getJSON(): Loading local JSON file
If you are able to use PHP for your task (reading data from a JSON file and then working with it), it will be easier to open the JSON file in PHP. You can use the following code to read it.
<?php
$str = file_get_contents('data.json');
$json = json_decode($str, true);
?>
Here $json holds the decoded top-level value: an associative array if the file starts with '{', or an indexed array if it starts with '['. You can then use it like any other PHP array.
Some of you may wonder why I'm posting a PHP solution to a JavaScript question. I simply found this much easier than opening the file in JavaScript, so if you are allowed to use PHP, go with that.

Download javascript output using file_get_content

Is it possible, using PHP, to download the resulting HTML after the JavaScript on the page has run?
For example, when the page has the jQuery code $("p").html("Hello world"); and I use file_get_contents('website.com'), I don't get the string "Hello world", because the JavaScript runs after the page loads.
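Use cURL to fetch the page (note that, like file_get_contents, cURL returns the raw response and does not run the page's JavaScript):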
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
Then do :
<?php echo get_data('http://theURLhere.com'); ?>
Hope that helped
Please refer to similar questions and share your research you have done so far:
Get the content (text) of an URL after Javascript has run with PHP
Get Document source After AJAX or js Action With PHP
How to get content of a javascript/ajax -loaded div on a site?
Get HTML code after javascript execution using CURL PHP
One way to achieve this would be to use Selenium, and write a custom script to gather the output from it... But I'm sure that falls far beyond the scope of what you're attempting to do.
The way I would go would be to invert the responsibility. Have the JS send the output to a PHP endpoint, and use that output however you see fit.
Here's an example.
Javascript
<script>
var outputElement = 'html';
var HTML = $(outputElement).html();
var endpoint = 'myEndpoint.php';
$.post(endpoint, { html: HTML }, function(data) {
alert('Output sent');
});
</script>
One caveat here is that you will not get the DOCTYPE declaration or any attributes on your HTML tag. If this isn't acceptable, you can reconstruct them in the PHP file below.
PHP
<?php
$html = $_POST['html']; // Be VERY CAREFUL with what you do with this...
// If you need to have the doctype and html tag... Use your own doctype.
// $html = sprintf('<!DOCTYPE html><html class="my-class">%s</html>', $html);
// Do something with the HTML.
You have to be very careful when sending HTML over POST. If you're using this HTML to output on your website, it can easily be spoofed to reveal sensitive data on your website.
Reference
jQuery.post()

ruby nokogiri restclient to scrape javascript variable

I'm using restclient and nokogiri to parse some html which works great, but there is one piece of information stored in a js (jquery) variable which I need to return and I'm not sure how to parse it. I can use Nokogiri to parse the javascript block, but I need one subset of it which is probably simple but I'm not sure how to do it. I could probably regex it but I'm assuming there's an easier way to just ask for it using JS.
@resource = RestClient.get 'http://example.com'
doc = Nokogiri::HTML(@resource)
doc.css('script').each do |script|
  puts script.content
end
What I'm trying to get:
<script type="text/javascript">
$(function(){
//this is it
$.Somenamespace.theCurrency = 'EUR';
//a lot more stuff
Not sure if that fits, but you could retrieve it as follows:
irb(main):017:0> string
=> "<script type=\"text/javascript\"> $(function(){$.Somenamespace.theCurrency = \"EUR\"}); "
irb(main):018:0> string.scan(/\$\.Somenamespace\.(.*)}\);/)
=> [["theCurrency = \"EUR\""]]
Nokogiri is an XML and HTML parser. It doesn't parse the CDATA or text content of nodes, but it can give you the content, letting you use string parsing or regex to get at the data you want.
In the case of Javascript, if it's embedded in the page then you can get the text of the parent node. Often that is simple:
js = doc.at('script').text
if there is the usual <script> tag in the <head> block of the page. If there are multiple script tags you have to extend the accessor to retrieve the right node, then process away.
It gets more exciting when the scripts are loaded dynamically, but you can still get the data by parsing the URL from the script's src parameter, then retrieving it, and processing away again.
Sometimes Javascript is embedded in the links of other tags, but it's just another spin on the previous two methods to get the script and process it.
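To make the "get the text, then use a regex" step concrete, here is a small sketch of a pattern that captures only the quoted value instead of the whole assignment. It is written in JavaScript for illustration rather than Ruby; a similar pattern should carry over to Ruby's String#match, though that is not shown here. The sample script text is adapted from the question (with a closing line added), and the helper name extractCurrency is hypothetical.

```javascript
// Script text as the HTML parser might return it (sample adapted from the question).
const scriptText = `
$(function(){
//this is it
$.Somenamespace.theCurrency = 'EUR';
//a lot more stuff
});`;

// Hypothetical helper: capture the quoted value assigned to theCurrency.
// The backreference \1 makes the closing quote match the opening one.
function extractCurrency(text) {
  const match = text.match(/\$\.Somenamespace\.theCurrency\s*=\s*(['"])(.*?)\1/);
  return match ? match[2] : null;
}

console.log(extractCurrency(scriptText)); // EUR
```

Anchoring on the full property name keeps the match from picking up other assignments in the same namespace.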
