ruby nokogiri restclient to scrape javascript variable - javascript

I'm using restclient and nokogiri to parse some html which works great, but there is one piece of information stored in a js (jquery) variable which I need to return and I'm not sure how to parse it. I can use Nokogiri to parse the javascript block, but I need one subset of it which is probably simple but I'm not sure how to do it. I could probably regex it but I'm assuming there's an easier way to just ask for it using JS.
require 'rest-client'
require 'nokogiri'
@resource = RestClient.get 'http://example.com'
doc = Nokogiri::HTML(@resource)
# Print the text of every <script> element on the page
doc.css('script').each do |script|
  puts script.content
end
What I'm trying to get:
<script type="text/javascript">
$(function(){
//this is it
$.Somenamespace.theCurrency = 'EUR';
//a lot more stuff

Not sure if that fits, but you could retrieve it as follows:
irb(main):017:0> string
=> "<script type=\"text/javascript\"> $(function(){$.Somenamespace.theCurrency = \"EUR\"}); "
irb(main):018:0> string.scan(/\$\.Somenamespace\.(.*)}\);/)
=> [["theCurrency = \"EUR\""]]

Nokogiri is an XML and HTML parser. It doesn't parse the CDATA or text content of nodes, but it can give you the content, letting you use string parsing or regex to get at the data you want.
In the case of Javascript, if it's embedded in the page then you can get the text of the parent node. Often that is simple:
js = doc.at('script').text
if there is the usual <script> tag in the <head> block of the page. If there are multiple script tags you have to extend the accessor to retrieve the right node, then process away.
It gets more exciting when the scripts are loaded dynamically, but you can still get the data by parsing the URL from the script's src attribute, then retrieving it, and processing away again.
Sometimes Javascript is embedded in the links of other tags, but it's just another spin on the previous two methods to get the script and process it.
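As a rough illustration of the string-processing step, the sketch below pulls the value out of script text with a regular expression. It is written in JavaScript; the same pattern should also work with Ruby's String#match. The function name extractCurrency and the sample text are made up for illustration, and the pattern assumes the assignment is a simple quoted string literal.

```javascript
// Hypothetical helper: find `$.Somenamespace.theCurrency = '...'` in script text.
function extractCurrency(scriptText) {
  // (['"]) remembers the opening quote so \1 can require the same closing quote.
  const pattern = /\$\.Somenamespace\.theCurrency\s*=\s*(['"])(.*?)\1/;
  const match = pattern.exec(scriptText);
  return match ? match[2] : null;
}

// Sample input modeled on the snippet in the question.
const sample = "$(function(){\n//this is it\n$.Somenamespace.theCurrency = 'EUR';\n});";
console.log(extractCurrency(sample));
```

Anchoring the pattern on the full variable name keeps it from matching other assignments in the same namespace.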

Related

How do I get a JS script to parse raw JSON from a file?

I have a really simple JSON data file with no kind of wrapping, e.g.:
[{"name": "fakename"}, {"lang": "javascript"}]
I'm trying to use this data in a js script in the same directory. I'm reading in both files in the document <head>, like so:
<script id="myJSON" src="data.json"></script>
<script src="myScript.js"></script>
and then the part I'm stuck on is how to get myScript to SEE that JSON data. In the script I can do:
d = document.getElementById("myJSON");
console.log(d);
The script returns the DOM element "myJSON" but I don't know how to access its JSON contents. I figured it would be a property like 'text' or 'value' or 'innerHTML' but I don't see it anywhere. I've tried various combinations of the type attribute in the tag but none of that makes any difference.
I know I could use an API like fetch but that's more complexity than I want. My best idea so far is to edit the json file to put an identifier at the beginning of the json file, like var myJSON = '[{"json"}]'; so then myScript would have a clear handle to JSON.parse(myJSON). But that's an extra step in an automated process which I don't want (or think I need).
How do I get my JS to see this "anonymous" JSON?
If you don't want to use fetch, you could use server-side code (PHP, ASP, etc.) to write the text of the JSON into a hidden div in the page. You could then use getElementById to get the text, followed by JSON.parse to get an object that represents the JSON data.
This was not that much extra code but it was kind of a pain to wrap my head around. I had to wrap everything in a .then clause of fetch, or of an async function which called fetch. Here's what my code ended up looking like.
readJSONFile().then(jq => {
  for (let i = 0; i < jq.length; ++i) {
    console.log(jq[i].myJSONKey);
  }
});
// Fetch the file and resolve with the parsed JSON
async function readJSONFile() {
  const response = await fetch('data.json');
  const jq = await response.json();
  return jq;
}

Transferring javascript from a view to a separate JS file

I am working on a legacy application and I want to move some JS code onto a separate JS file.
I will have to refactor some of the code to do this. I can put @Url.Content statements into data attributes in the HTML.
But how would I replace this line of code?
var array = @Html.Raw(Json.Encode(ViewBag.JobList));
A separate JS file will not know what @Html.Raw means.
Server-side code like that cannot run in a separate javascript file. My solution for such problems is having a short javascript part in the head that runs on the onload event. There you can set variables that you can use in a separate javascript file:
in the head:
array = @Html.Raw(Json.Encode(ViewBag.JobList));
in the separate javascript file:
var array;
Then, in the separate javascript file you can do with your array whatever is necessary.
The ViewBag.JobList data is only known at HTML page generation time. To include it in an external JavaScript file, you would need another ASP.NET resource that recalculates ViewBag.JobList and serves it as part of a dynamically generated JavaScript file. This is pretty inefficient.
Instead, do what you're doing with the URLs: pass the data through the DOM. If you're writing into normal DOM instead of a script block, you don't need the raw-output any more (*), normal HTML escaping is fine:
<script
id="do_stuff_script" src="do_stuff.js"
data-array="@Json.Encode(ViewBag.JobList)"
></script>
...
var array = $('#do_stuff_script').data('array');
// jQuery hack - equivalent to JSON.parse($('#do_stuff_script').attr('data-array'));
(Actually, the raw-output might have been a security bug, depending on what JSON encoder you're using and whether it chooses to escape </script to \u003C/script. Writing to HTML, with well-understood HTML-encoding requirements, is a good idea as it avoids problems like this too.)
I think you need to create an action that returns a JavaScriptResult:
public ActionResult Test()
{
string script = "var textboxvalue=$('#name').val();";
return JavaScript(script);
}
But before proceeding, please go through the following links:
Beware of ASP.NET MVC JavaScriptResult
Working example for JavaScriptResult in asp.net mvc
I would also follow MelanciaUK's suggestion:
In your javascript file, put your code inside a function:
function MyViewRefactored( array ){
... your code ...
}
In your view, leave a minimal javascript block:
<script>
var array = @Html.Raw(Json.Encode(ViewBag.JobList));
MyViewRefactored( array );
</script>

Parsing CDATA from Javascript

This is my first post and I'm sorry if I'm doing it wrong but here we go:
I've been working on a project that should scrape values from a website. The values are variables in a javascript array. I'm using PHP Simple HTML DOM, and it works with the normal scripts but not the ones stored in CDATA blocks. Therefore, I'm looking for a way to scrape data within the CDATA block. Unfortunately, all the help I could find was for XML files, and I'm scraping from an HTML file.
The javascript I'm trying to scrape is as follows:
<script type="text/javascript">
//<![CDATA[
var data = [{"value":8.41,"color":"1C5A0D","text":"17/11"},{"value":9.86,"color":"1C5A0D","text":"18/11"},{"value":7.72,"color":"1C5A0D","text":"19/11"},{"value":9.42,"color":"1C5A0D","text":"20/11"}];
//]]>
</script>
What I need to scrape is the "value"-variable in the var data.
The problem was that I tried to replace the CDATA string on an object.
The following code works perfectly :-)
include('simple_html_dom.php');
$lines = file_get_contents('http://www.virtualmanager.com/players/7793477-danijel-pavliuk/training');
$lines = str_replace("//<![CDATA[","",$lines);
$lines = str_replace("//]]>","",$lines);
$html = str_get_html($lines);
foreach($html->find('script') as $element) {
echo $element->innertext;
}
I will provide you with more information if needed.
A decent HTML parser shouldn't require Javascript to be wrapped in a CDATA block. If the CDATA markers are throwing the parser off, just remove them from the HTML before parsing, doing something like this:
1) Download the HTML file into a string, using file_get_contents(), or cURL if your host has disabled HTTP support in that function.
2) Get rid of the //<![CDATA[ and //]]> bits using str_replace().
3) Parse the HTML from the cleaned string using Simple DOM's str_get_html().
4) Process the DOM object as before.
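The same steps can be sketched in JavaScript, going one step further and reading the "value" fields out of the array. This is only an illustration under stated assumptions: the helper name extractValues is hypothetical, the sample input is a shortened copy of the script block from the question, and the approach relies on the array literal being valid JSON, which it is in that sample.

```javascript
// Sample input modeled on the question's script block (shortened to two entries).
const html = [
  '<script type="text/javascript">',
  '//<![CDATA[',
  'var data = [{"value":8.41,"color":"1C5A0D","text":"17/11"},{"value":9.86,"color":"1C5A0D","text":"18/11"}];',
  '//]]>',
  '</script>',
].join('\n');

// Hypothetical helper: strip the CDATA markers, then read the array literal.
function extractValues(source) {
  // split/join removes every occurrence of each marker.
  const cleaned = source.split('//<![CDATA[').join('').split('//]]>').join('');
  // Capture everything between "var data =" and the first "];".
  const match = /var data\s*=\s*(\[.*?\]);/s.exec(cleaned);
  if (!match) return [];
  // Assumption: the literal is valid JSON, so JSON.parse can read it.
  return JSON.parse(match[1]).map((entry) => entry.value);
}

console.log(extractValues(html));
```

If the array ever contained something that is not valid JSON (unquoted keys, for example), JSON.parse would throw, and a different parsing strategy would be needed.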

Store very small amount of data with javascript

I have one of those websites that basically gives you a yes or no response to a question posed by the url. An example being http://isnatesilverawitch.com.
My site is more of an in-joke and the answer changes frequently. What I would like to be able to do is store a short one or two word string and be able to change it without editing the source on my site if that is possible using only javascript. I don't want to set up an entire database just to hold a single string.
Is there a way to write to a file without too much trouble, or possibly a web service designed to retrieve and change a single string that I could use to power such a site? I know it's a strange question, but the people in my office will definitely get a kick out of it. I am even considering building a mobile app to manipulate the answer on the fly.
ADDITIONAL:
To be clear I just want to change the value of a single string but I can't just use a random answer. Without being specific, think of it as a site that states if the doctor is IN or OUT, but I don't want it to spit out a random answer, it needs to say IN when he is IN and OUT when he is out. I will change this value manually, but I would like to make the process simple and something I can do on a mobile device. I can't really edit source (nor do I want to) from a phone.
If I understand correctly you want a simple text file that you change a simple string value in and have it appear someplace on your site.
var string = "loading";
$.get('filename.txt', function(result) {
  string = result;
  // use string
});
Since you don't want to have server-side code or a database, one option is to have javascript retrieve values from a Google Spreadsheet. Tabletop (http://builtbybalance.com/Tabletop/) is one library designed to let you do this. You simply make a public Google Spreadsheet and enable "Publish to web", which gives you a public URL. Here's a simplified version of the code you'd then use on your site:
function init() {
Tabletop.init( { url: your_public_spreadsheet_url,
callback: function (data) {
console.log(data);
},
simpleSheet: true } )
}
Two ideas for you:
1) Using only JavaScript, generate the value randomly (or perhaps based on a schedule, which you can hard code ahead of time once and the script will take care of the changes over time).
2) Using Javascript and a server-side script, you can change the value on the fly.
Use JavaScript to make an AJAX request to a text file that contains the value. Shanimal's answer gives you the code to achieve that.
To change the value on the fly you'll need another server-side script that writes the value to some sort of data store (your text file in this case). I'm not sure what server-side scripting (e.g. PHP, Perl, ASP, Python) runtime you have on your web server, but I could help you out with the code for PHP where you could change the value by pointing to http://yoursite.com/changeValue.php?Probably in a browser. The PHP script would simply write Probably to the text file.
Although a JavaScript solution is possible, it is discouraged. PHP is designed for things like changing parts of a site. Assuming you know that, I will jump to the JavaScript solution.
Because you want to store the word in a text file, you will need to either download that file using AJAX or store the word in a .js file as an array or string.
Then you will want to change the words. Using AJAX makes it possible to change the words after the page has loaded (so they may, but do not have to, change in front of the viewer's eyes).
Changing page HTML
One possible way of changing it (the word is stored in a variable):
wordlist.js
var status = "IN"; //Edit IN to OUT whenever you want
index.html
<script src="wordlist.js"></script>
<div>Doctor is <span id="changing">IN</span></div>
<script>
function changeWord(s) { //Change to anything
document.getElementById("changing").innerHTML = s;
}
changeWord(status); //Get the status defined in wordlist.js
</script>
Reloading from server
If you want to change the answer dynamically and have the change visible on all open pages, you will need AJAX, or you will have to make the browser reload the word list, as follows:
Reloading script
function reloadWords() {
var script = document.createElement("script"); //Create <script>
script.type="text/javascript";
script.src = "wordlist.js"; //Set the path
script.onload = function() {changeWord(status)}; //Change answer after loading
document.getElementsByTagName("head")[0].appendChild(script); //Append to <head> so it loads as script. Can be appended anywhere, but I like to use <head>
}
Using AJAX
Here we assume use of text file. Simplest solution I guess. With AJAX it looks much like this:
// Use XMLHttpRequest where available; fall back to ActiveX on old Internet Explorer
var http = window.XMLHttpRequest ? new XMLHttpRequest() : new ActiveXObject("Microsoft.XMLHTTP");
http.onloadend = function() {
  document.getElementById("changing").innerHTML = this.responseText; //Set the new response, "IN" or "OUT"
};
http.open("GET", "words.txt");
http.send();
The performance of the AJAX call may be improved using long polling. I will not go into that here, unless someone is interested.

node.js javascript var available in res.render()'s target

I'm trying to make a variable (eventually to be replaced by more complex json selected from the database) accessible to client-side javascript. I wanted to load it when the page is rendered instead of through an ajax call, and it's not going to be rendered via a template like ejs (I want to pass the data to an extjs store for a combobox). So I have a standard response that I render:
function (req, res) {
res.render('index.html', {foo: ['a','b']});
}
and a blank html page I want to access foo:
<!DOCTYPE html>
<html>
<head>
<script type=text/javascript>
console.log(foo);
</script>
</head>
<body>
</body>
</html>
Any ideas? I've thought of maybe writing the whole html page via res.send() (which has a few more things than the example above), but that seems like such a workaround for something that should be obvious to do...
Assuming the same array foo in your question above, here are a couple ways you could do this.
This one uses an EJS filter to write an array literal:
<script type="text/javascript">
var foo = ['<%=: foo | join:"', '" %>'];
</script>
This one encodes it as JSON, to later be parsed by your client-side javascript:
<script type="text/javascript">
// note the "-" instead of "=" on the opening tag; it avoids escaping HTML entities
var fooJSON = '<%-JSON.stringify(foo)%>';
</script>
IIRC, ExtJS can handle JSON directly as its data. If not, then you could use its JSON parser first and then hand it a local variable. If you weren't using ExtJS, you could use this to parse on the client: https://github.com/douglascrockford/JSON-js
If you choose to encode it as JSON, it would also make it easier to later switch back to AJAX for retrieving your data. In some cases, that would have an advantage. The page could load and display some data, along with a busy icon over the element for which you're loading data.
This isn't to say there's anything inherently wrong with including all the data in the original request. It's just that sticking with JSON gives you the flexibility to choose later.
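One caveat with writing JSON.stringify output straight into a script block: a string value containing </script> or a single quote could end the script or the surrounding string literal early. A minimal sketch of one common mitigation follows; it escapes the risky characters so the data can be emitted as a plain JavaScript literal. The helper name serializeForInlineScript is hypothetical and is not part of EJS or Express.

```javascript
// Hypothetical helper: JSON that is safe to place inside an inline <script> block.
function serializeForInlineScript(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')   // prevents "</script>" from closing the block
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028') // line separators that older JS engines reject in literals
    .replace(/\u2029/g, '\\u2029');
}

const foo = ['a', 'b', '</script>'];
const literal = serializeForInlineScript(foo);
console.log(literal);
```

In a template the result could then be written with the unescaped tag, for example var foo = <%- literal %>;, assuming the string is passed to the template under the name literal. Because the output is still valid JSON, JSON.parse on the client gives back the original data.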
In EJS the following should work
<script type=text/javascript>
console.log( <%= foo %>);
</script>
I do recommend against dynamically generating JavaScript, though, as it breaks separation of concerns and forces JavaScript to be on.
Edit:
Turns out the above doesn't work nicely for arrays. So simply encode your data in semantic HTML. Then enhance it with JavaScript. If the JavaScript must get data then store it somewhere more sensible like the cookie or retrieve it through ajax.
