Python: Setting JavaScript Variables With BeautifulSoup?

Most of the pertinent questions I see answered online revolve around accessing JS variables from parsed pages. I want to go the other way as I have an HTML page which is being printed in python by beautifulsoup. Pretty easy stuff except that the page contains a bunch of dynamic JavaScript (GUI stuff) which I am not sure how to pre-populate on render without things getting messy.
To clarify: The page is being rendered by a python script running on a server. The python script retrieves a bunch of market data which it slices/dices and then is supposed to populate various JS variables in my HTML page. Again the static HTML is fairly straightforward and I've done that before. But populating the JS variables and arrays could get tricky. FYI - this is a single page and thus it's not worth setting up Flask or Django.
Finally, I wonder if it may just be easier to skip beautifulsoup and simply parse a static HTML file and pre-populate placeholder strings.
Thanks for any pointers, insights, or even better: examples ;-)

If I understood correctly, you could create a script tag at the end of the HTML file with the help of BeautifulSoup and set the variables inside it, like this.
your_soup is the BeautifulSoup object for your page
import json
variable = "Example"
# json.dumps quotes and escapes the value so it becomes a valid JavaScript literal
script_tag = your_soup.new_tag("script")
script_tag.string = "var yourvariable = " + json.dumps(variable) + ";"
# append the new tag as the last child of <body>
your_soup.body.append(script_tag)
I hope it works.
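The alternative raised in the question, skipping BeautifulSoup and filling placeholder strings in a static HTML file, can be sketched with the standard library alone. This is a minimal sketch, not a definitive implementation: the page markup, the `$market_data` placeholder, and the helper names `to_js_literal` and `render_page` are all made up for illustration, and in practice the template would be read from a file.

```python
import json
from string import Template

# Hypothetical page template; $market_data is a placeholder name chosen for this sketch.
PAGE = Template("""<html><body>
<div id="chart"></div>
<script>var marketData = $market_data;</script>
</body></html>""")


def to_js_literal(value):
    """Serialize a Python value to a JSON literal that can sit inside <script>."""
    # Escaping "<" keeps a string such as "</script>" from closing the tag early.
    return json.dumps(value).replace("<", "\\u003c")


def render_page(data):
    """Fill the placeholder with the serialized data and return the HTML text."""
    return PAGE.substitute(market_data=to_js_literal(data))


page_html = render_page({"symbols": ["AAPL", "MSFT"], "note": "</script>"})
print(page_html)
```

Because the value is produced by `json.dumps`, nested lists and dictionaries arrive in the browser as ordinary JavaScript arrays and objects, so the GUI code can read `marketData.symbols` directly.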

Related

Preferred way to dynamically change variable in Javascript using Python/Flask framework

I am relatively new to HTML/Javascript, and I have a need which I am sure is common, but I don't know the preferred/standard way to handle it.
Basically, I have a web page with Javascript/jQuery code to use AJAX to dynamically change values on a page. I prefer not to encode the AJAX URL statically in the Javascript, but rather be able to pass the URL using Jinja from the Flask application.
So, for example:
$("#inputtext").autocomplete({
  source: function (request, response) {
    $.getJSON("{{ url_for('main.auto') }}?q=" + request.term,
      function (data) { console.log('doing something in the function'); });
  }
});
The key question is the Jinja template in the middle {{ url_for('main.auto') }}. This correctly renders the route for 'main.auto' if the Javascript code is embedded in the HTML file. However, I prefer to separate the JS from the HTML. So, I took the above code and put it in a separate .js file and imported it into the HTML like this:
<script type="text/javascript" src="{{ url_for('static', filename='js/memberSearch.js') }}"></script>
The above code is in the memberSearch.js file in this example.
When I import the JS code like this, Jinja doesn't render the {{url_for('main.auto')}}. I suppose that this is because Jinja goes through the HTML code first before the JS code is imported.
I found an answer by @martijn-pieters here that lists four methods of passing data from HTML to Javascript:
You can put that information in HTML tags. In the above example, your data is put in the repeated tags.
Or you could add data attributes to your HTML, which are accessible both to Javascript code and to CSS.
Or use AJAX to load data asynchronously; e.g. when you pick an option in the box, use Javascript and AJAX to call a separate endpoint on your Flask server that serves more data as JSON or ready-made HTML, then have the Javascript code update your webpage based on that.
Or generate JSON data and put it directly into your HTML page. Just a <script>some_variable_name = {{datastructure|tojson|safe}};</script> section is enough; then access that some_variable_name from your static Javascript code to do interesting things on the page. JSON is (almost entirely) a subset of Javascript, and the way the tojson filter works is guaranteed to produce a Javascript-compatible data structure for you. The browser will load it just like any other embedded Javascript code.
Of these methods, I have successfully used the first one by embedding the Jinja expression in a hidden <input> element like this, for example:
<input type="hidden" id="urlForPdf" value="{{ url_for('main.auto') }}">
Which allows me to access the rendered URL in my JS code.
I have also tried the fourth method of generating JSON directly in JS like this:
<script type="text/javascript">var url = {{ url_for('main.auto')|tojson}}</script>
which allows me to access the variable url from the JS function.
The answer I referred to above also mentions using Data Attributes. I haven't tried this, but I believe it would work as well.
My question, then, is simply whether there is a preferred way of doing something like this? It seems that this must be a very common scenario and I assume that there is some more-or-less standard way of handling this. I'm just not sure which of the four possible solutions is preferred, or if perhaps there is another way altogether.
Thanks in advance for any light you can shed on this.

jinja2 variables in javascript

If I have written jinja2 variables in javascript, for example
var array = [{{count}}...
and it works, will it work even if I move the code to a separate js file? Is there anything else I need to know about this practice?
You can certainly create a Jinja2 template that contains Javascript with Jinja2 variables, render that into a JavaScript file, and serve it to your users. Jinja2 doesn't care what kind of file you are rendering.
An important consideration is that you are changing a static file to a dynamic file. A typical Javascript file is static but you are now making it dynamic which puts additional load on your servers.
A typical solution is to use static JavaScript but render JavaScript data into your HTML page that the JavaScript file can access.
I came across this looking for the same kind of solution, and it was pointed out to me somewhere else that the data attribute in HTML is a good solution here as well.
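The data-attribute approach mentioned above can be sketched on the server side with only the standard library. This is a rough illustration under stated assumptions: the helper name `data_attribute_div`, the element id `search`, and the attribute name `autocomplete-url` are hypothetical, and a template engine such as Jinja2 would normally do the attribute escaping for you.

```python
import json
from html import escape


def data_attribute_div(element_id, name, value):
    """Return a <div> carrying `value` as JSON in a data-<name> attribute."""
    # escape(..., quote=True) encodes &, <, > and both quote characters,
    # so the JSON text cannot break out of the double-quoted attribute.
    payload = escape(json.dumps(value), quote=True)
    # element_id and name are assumed to be trusted, hard-coded strings.
    return f'<div id="{element_id}" data-{name}="{payload}"></div>'


markup = data_attribute_div("search", "autocomplete-url", "/auto?q=")
print(markup)
```

A static script can then read the value with `JSON.parse(document.getElementById("search").dataset.autocompleteUrl)`, so the .js file itself never needs to pass through the template engine.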

Include a JavaScript file as it was written in a script tag

I have some html, that had a bunch of JS code inside a script tag. So I moved it to a separate .js file.
JS code also loaded some variables from CGI, using strings in a form of <%ejGet(var)%>. But after separating the code from HTML file, the strings don't get replaced with data from the server.
Is there a way to include a JS file as if it was written inside a script tag or is there another way to do this?
<script language="javascript">
<!-- hide
var syncNvram = '<%ejGetWl(wlSyncNvram)%>';
...about 1000 lines more...
</script>
So after moving this code to a separate file, the variables don't load.
The problem is that your <% ejGetWl(wlSyncNvram) %> is being executed on the server by some templating or processing engine before it gets sent to the browser, so the browser is actually seeing the output, e.g.
var syncNvram = 'abcdefg'; // or whatever the output is
The question you are really asking is, can my server side templating/processing engine process a javascript file as opposed to an html file.
The answer is, it depends on the template/processing engine, but in general, this is a bad idea. JS files should remain static assets for lots of good reasons (breaking code, distributing via CDNs, etc.)
The better thing to do is separate them out:
<script>var syncNvram = '<%ejGetWl(wlSyncNvram)%>';</script>
<script src="myfile.js"></script>
Declare it separately.
Even better might be using ajax to get it, but that is a whole different architecture which may not suit here.
To do that you need to generate the script from the CGI program.
<script src="/cgi-bin/example.js.cgi"></script>
Of course, that will be a different CGI program so getting the variables in the right state may be problematic.
Usually you would solve the problem using a different approach: include the data in the document (either in the script element or in an element such as a <meta> element, a hidden input or a data-* attribute on something relevant) and then have a static script read the data from the DOM.

How to extract hidden tags created by javascript from source page by python

I have THIS page that has some javascript in it. You can see them by clicking on show details.
So how can I extract these data from that url source?
Using re? What I tried in re is:
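import re
from urllib.request import urlopen
# decode the response bytes so the str pattern can be applied (UTF-8 is assumed)
gdoc = urlopen('ThatURL').read().decode('utf-8', errors='replace')
scriptlis = re.findall('(?si)<script>(.*?)</script>', gdoc)
print(scriptlis)
But no response...
Using selenium?
In this case, how?
import lxml.html
out = lxml.html.tostring(lxml.html.parse('ThatURL'))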
.
.
.
?
When pages use scripting to generate content, it becomes hard to scrape. Instead of plain html reading, you need a full virtual environment capable of executing the script on the document.
For python, there's ghost.py. It's pretty flexible, and will allow you to inspect the fully rendered website, as well as to execute your own javascript to interact with the page.
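ghost.py is a Python counterpart to PhantomJS, a headless WebKit browser that is scripted in JavaScript. This second tool is superior, in my opinion, but it's not written for Python.
You can try this; the pattern also matches script tags that carry attributes or span several lines:
re.findall(r'(?si)<script[^>]*>.*?</script>', url_file)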
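For scripts that are already present in the page source, the standard library's `html.parser` can collect them without a regular expression. The sketch below is only an illustration: the `ScriptCollector` class and the `SAMPLE` markup are invented for the example, and it does not execute any JavaScript, so content that only exists after the page runs its scripts still needs a browser-based tool such as the ones discussed above.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # one string per <script> element
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


# Made-up sample page standing in for the downloaded source.
SAMPLE = """<html><body>
<script type="text/javascript">var hidden = {"price": 42};</script>
<p>visible</p>
<script>
var other = 1;
</script>
</body></html>"""

collector = ScriptCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.scripts)
```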

Passing Python Data to JavaScript via Django

I'm using Django and Apache to serve webpages. My JavaScript code currently includes a data object with values to be displayed in various HTML widgets based on the user's selection from a menu of choices. I want to derive these data from a Python dictionary. I think I know how to embed the JavaScript code in the HTML, but how do I embed the data object in that script (on the fly) so the script's functions can use it?
Put another way, I want to create a JavaScript object or array from a Python dictionary, then insert that object into the JavaScript code, and then insert that JavaScript code into the HTML.
I suppose this structure (e.g., data embedded in variables in the JavaScript code) is suboptimal, but as a newbie I don't know the alternatives. I've seen write-ups of Django serialization functions, but these don't help me until I can get the data into my JavaScript code in the first place.
I'm not (yet) using a JavaScript library like jQuery.
n.b. see 2018 update at the bottom
I recommend against putting much JavaScript in your Django templates - it tends to be hard to write and debug, particularly as your project expands. Instead, try writing all of your JavaScript in a separate script file which your template loads and simply including just a JSON data object in the template. This allows you to do things like run your entire JavaScript app through something like JSLint, minify it, etc. and you can test it with a static HTML file without any dependencies on your Django app. Using a library like simplejson also saves you the time spent writing tedious serialization code.
If you aren't assuming that you're building an AJAX app this might simply be done like this:
In the view:
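import json
from django.shortcuts import render
def view(request, …):
    # the standard json module stands in for django.utils.simplejson,
    # which newer Django releases no longer ship
    js_data = json.dumps(my_dict)
    …
    return render(request, "my_template.html", {"my_data": js_data, …})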
In the template:
<script type="text/javascript">
data_from_django = {{ my_data }};
widget.init(data_from_django);
</script>
Note that the type of data matters: if my_data is a simple number or a string from a controlled source which doesn't contain HTML, such as a formatted date, no special handling is required. If it's possible to have untrusted data provided by a user you will need to sanitize it using something like the escape or escapejs filters and ensure that your JavaScript handles the data safely to avoid cross-site scripting attacks.
As far as dates go, you might also want to think about how you pass dates around. I've almost always found it easiest to pass them as Unix timestamps:
In Django:
time_t = time.mktime(my_date.timetuple())
In JavaScript, assuming you've done something like time_t = {{ time_t }} with the results of the snippet above:
my_date = new Date();
my_date.setTime(time_t*1000);
Finally, pay attention to UTC - you'll want to have the Python and Django date functions exchange data in UTC to avoid embarrassing shifts from the user's local time.
EDIT: Note that setTime in JavaScript takes milliseconds, whereas time.mktime returns seconds. That's why we need to multiply by 1000.
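One detail worth illustrating for the UTC advice: `time.mktime` interprets its argument as local time, whereas `calendar.timegm` interprets it as UTC. A minimal sketch of producing the timestamp in UTC follows; the helper name `to_epoch_seconds` is hypothetical, and treating naive datetimes as UTC is an assumption of this sketch, not a rule from the answer.

```python
import calendar
from datetime import datetime, timezone


def to_epoch_seconds(moment):
    """Convert a datetime to whole seconds since the Unix epoch, in UTC."""
    if moment.tzinfo is None:
        # Assumption for this sketch: naive datetimes already hold UTC values.
        moment = moment.replace(tzinfo=timezone.utc)
    # timegm reads the time tuple as UTC, unlike time.mktime (local time).
    return calendar.timegm(moment.utctimetuple())


my_date = datetime(2018, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
time_t = to_epoch_seconds(my_date)
print(time_t)
```

On the JavaScript side the value is used exactly as described above: `new Date(time_t * 1000)`.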
2018 Update: I still like JSON for complex values but in the intervening decade the HTML5 data API has attained near universal browser support and it's very convenient for passing simple (non-list/dict) values around, especially if you might want to have CSS rules apply based on those values and you don't care about unsupported versions of Internet Explorer.
<div id="my-widget" data-view-mode="tabular">…</div>
let myWidget = document.getElementById("my-widget");
console.log(myWidget.dataset.viewMode); // Prints tabular
somethingElse.addEventListener('click', evt => {
myWidget.dataset.viewMode = "list";
});
This is a neat way to expose data to CSS if you want to set the initial view state in your Django template and have it automatically update when JavaScript updates the data- attribute. I use this for things like hiding a progress widget until the user selects something to process or to conditionally show/hide errors based on fetch outcomes or even something like displaying an active record count using CSS like #some-element::after { content: attr(data-active-transfers); }.
For anyone who might be having problems with this, be sure you are rendering your JSON object in safe mode in the template. You can set this manually like this:
<script type="text/javascript">
data_from_django = {{ my_data|safe }};
widget.init(data_from_django);
</script>
As of mid-2018 the simplest approach is to use Python's json module; simplejson is now deprecated. Beware that, as @wilblack mentions, you need to prevent Django's autoescaping, either with the safe filter or with the autoescape tag set to off. In both cases, in the view you add the contents of the dictionary to the context:
viewset.py
import json
def get_context_data(self, **kwargs):
    context = super().get_context_data(**kwargs)
    context['my_dictionary'] = json.dumps(self.object.mydict)
    return context
and then in the template you add, as @wilblack suggested:
template.html
<script>
my_data = {{ my_dictionary|safe }};
</script>
Security warning:
json.dumps does not escape forward slashes: an attack is {'</script><script>alert(123);</script>': ''}. Same issue as in other answers. Added another answer hopefully fixing it.
You can include <script> tags inside your .html templates, and then build your data structures however is convenient for you. The template language isn't only for HTML, it can also do Javascript object literals.
And Paul is right: it might be best to use a json module to create a JSON string, then insert that string into the template. That will handle the quoting issues best, and deal with deep structures with ease.
It is suboptimal. Have you considered passing your data as JSON using django's built in serializer for that?
See the related response to this question. One option is to use jsonpickle to serialize between Python objects and JSON/Javascript objects. It wraps simplejson and handles things that are typically not accepted by simplejson.
Embedding JavaScript in a Django template is almost always a bad idea.
Almost, because there are some exceptions to this rule.
Everything depends on the size and functionality of your JavaScript code.
It is better to keep static files such as JS separate, but the problem is that every separate file needs another connection and GET request/response cycle. Sometimes, for a small one- or two-line piece of JS, it is fine to put it into the template, but then use Django's template tag mechanism so that you can reuse it in other templates ;)
The same goes for objects. If your site has an AJAX/Web 2.0-like flavour, you can achieve a very good effect by moving some counting/math operations to the client side. If the objects are small, embed them in the template; if they are large, return them in a separate request to avoid hanging the page for the user.
Fixing the security hole in the answers by @willblack and @Daniel_Kislyuk.
If the data is untrusted, you cannot just do
viewset.py
def get_context_data(self, **kwargs):
    context = super().get_context_data(**kwargs)
    context['my_dictionary'] = json.dumps(self.object.mydict)
    return context
template.html
<script>
my_data = {{ my_dictionary|safe }};
</script>
because the data could be something like
{"</script><script>alert(123);</script>":""}
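and forward slashes aren't escaped by default. The problem is that json.dumps only produces valid JSON; it does not make the output safe to embed in HTML, and the browser's HTML parser ends the script element at the first </script> it sees, even when that text sits inside a JavaScript string.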
Fixed solution
As far as I can tell, the following fixes the problem:
<script>
my_data = JSON.parse("{{ my_dictionary|escapejs }}");
</script>
If there are still issues, please post in the comments.
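Another way to close the same hole is to keep the JSON in a non-executing `<script type="application/json">` element and escape the characters that matter to the HTML parser. Django has shipped a `json_script` template filter for this purpose since version 2.1. The function below is only a standalone approximation of that idea, not Django's own code; its name, the escape table, and the `my-data` id are choices made for this sketch.

```python
import json

# Characters that could end the <script> element or start an HTML entity.
_JSON_SCRIPT_ESCAPES = {
    ord("<"): "\\u003C",
    ord(">"): "\\u003E",
    ord("&"): "\\u0026",
}


def json_script(value, element_id):
    """Wrap `value` as JSON in a <script type="application/json"> element."""
    payload = json.dumps(value).translate(_JSON_SCRIPT_ESCAPES)
    # element_id is assumed to be a trusted, hard-coded identifier.
    return f'<script id="{element_id}" type="application/json">{payload}</script>'


attack = {"</script><script>alert(123);</script>": ""}
tag = json_script(attack, "my-data")
print(tag)
```

The page's static JavaScript can then load the value with `JSON.parse(document.getElementById("my-data").textContent)`. The `\u003C` style escapes are valid JSON, so the parsed object contains the original characters.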
