IPython Notebook Javascript: retrieve content from JavaScript variables - javascript

Is there a way for a function (called by an IPython Notebook cell) to retrieve the content of a JavaScript variable (for example IPython.notebook.notebook_path which contains the path of the current notebook)?
The following works well when written directly within a cell (for example, based on this question and its comments):
from IPython.display import display,Javascript
Javascript('IPython.notebook.kernel.execute("mypath = " + "\'"+IPython.notebook.notebook_path+"\'");')
But that falls apart if I try to put it in a function:
# this doesn't work
from IPython.display import display, Javascript
def getname():
    my_js = """
    IPython.notebook.kernel.execute("mypath = " + "\'"+IPython.notebook.notebook_path+"\'");
    """
    Javascript(my_js)
    return mypath
(And yes, I've tried to make global the mypath variable, both from within the my_js script and from within the function. Also note: don't be fooled by possible leftover values in variables from previous commands; to make sure, use mypath = None; del mypath to reset the variable before calling the function, or restart the kernel.)
Another way to formulate the question is: "what's the scope (time and place) of a variable set by IPython.notebook.kernel.execute()"?
I think this isn't an innocuous question; it is probably related to the mechanism that IPython uses to control its kernels and their variables, which I don't know much about. The following experiment illustrates one aspect of that mechanism. It works when done in two separate cells, but not if the two cells are merged:
Cell [1]:
my_out = None
del my_out
my_js = """
IPython.notebook.kernel.execute("my_out = 'hello world'");
"""
Javascript(my_js)
Cell [2]:
print(my_out)
This works and produces the expected hello world. But if you merge the two cells, it doesn't work (NameError: name 'my_out' is not defined).

I think the problem is that the JavaScript runs asynchronously while the Python code does not. You would expect the Javascript(""" python cmd """) command to execute first, so that the print statement after it works. However, the JavaScript command is dispatched but not executed immediately. Most probably it is executed only after the execution of cell 1 has fully completed.
I tried your example with a sleep call. It did not help.
The async problem can easily be seen by adding an alert statement in my_js, before the kernel.execute line. The alert should fire before any Python command is executed.
But when the print(my_out) statement is present in cell 1, you get the same error again, without any alert. If you take the print line out, the alert pops up during cell 1, but the variable my_out is only set afterwards.
my_out = None
del my_out
my_js = """
alert("about to execute python command");
IPython.notebook.kernel.execute("my_out = 'hello world'");
"""
Javascript(my_js)
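The ordering described above can be reproduced without a browser. What follows is only a toy model, not IPython's actual implementation: ToyKernel, its pending queue, and run_cell are hypothetical names, and the one assumption built in is that code sent through kernel.execute() waits until the currently running cell has finished. (Separately, a bare Javascript(...) object is only rendered when it is the last expression of a cell; anywhere else, such as inside a function, it has to be passed to display().)

```python
from collections import deque


class ToyKernel:
    """Toy model (not IPython's real code): only one request runs at a time."""

    def __init__(self):
        self.namespace = {}     # stands in for the notebook's global variables
        self.pending = deque()  # execute requests waiting for their turn

    def execute(self, code):
        """Queue code, the way the browser does through kernel.execute()."""
        self.pending.append(code)

    def run_cell(self, code):
        """Run one cell to completion, then run whatever was queued meanwhile."""
        try:
            exec(code, self.namespace)
        finally:
            while self.pending:
                exec(self.pending.popleft(), self.namespace)


kernel = ToyKernel()
kernel.namespace["kernel"] = kernel  # lets cell code reach the toy kernel

queue_assignment = "kernel.execute(\"my_out = 'hello world'\")"

# Merged cell: print() runs before the queued assignment gets its turn.
try:
    kernel.run_cell(queue_assignment + "\nprint(my_out)")
    merged_outcome = "printed"
except NameError:
    merged_outcome = "NameError"

# The queued assignment still ran after the failed cell, so reset the variable.
del kernel.namespace["my_out"]

# Two cells: the assignment has been processed before the second cell starts.
kernel.run_cell(queue_assignment)
kernel.run_cell("separate_outcome = my_out")

print(merged_outcome)
print(kernel.namespace["separate_outcome"])
```

Under that assumption, a function that queues the assignment and then immediately returns mypath can never see the new value, whatever the scoping of the variable.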
The notebook has other JavaScript utilities, such as the IPython.display.display_xxx functions, which display anything from video to text objects, but even the text object option does not work.
Funnily enough, I tested this with my WebGL canvas application, which displays objects on an HTML5 canvas: display.display_javascript(javascript object) works fine (and that is a very long HTML5 document), while the two words of output do not show up. Maybe I should embed the output somewhere in the canvas application, so that it is displayed on the canvas :)

I wrote a related question (Cannot get Jupyter notebook to access javascript variables) and came up with a hack that does the job. It uses the fact that Python's input(prompt) command blocks the execution loop and waits for user input. So I looked at how this is processed on the JavaScript side and inserted interception code there.
The interception code is:
import json
from IPython.display import display, Javascript
display(Javascript("""
const CodeCell = window.IPython.CodeCell;
// keep a reference to the original handler (only on the first run)
CodeCell.prototype.native_handle_input_request = CodeCell.prototype.native_handle_input_request || CodeCell.prototype._handle_input_request
CodeCell.prototype._handle_input_request = function(msg) {
  try {
    // only apply the hack if the prompt is valid JSON
    console.log(msg.content.prompt)
    const command = JSON.parse(msg.content.prompt);
    const kernel = IPython.notebook.kernel;
    // evaluate the JavaScript expression in command["eval"]
    // and send the result back as the reply to input()
    kernel.send_input_reply(eval(command["eval"]))
  } catch(err) {
    // not JSON: fall back to the normal input prompt
    console.log('Not a command',msg,err);
    this.native_handle_input_request(msg);
  }
}
"""))
The interception code checks whether the input prompt is valid JSON, and in that case it executes an action depending on the command argument. In this case, it evaluates the command["eval"] JavaScript expression and returns the result.
After running this cell, you can use:
notebook_path = input(json.dumps({"eval":"IPython.notebook.notebook_path"}))
Quite a hack, I must admit.
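The Python side of this hack can be wrapped in a small helper so that callers do not build the JSON prompt by hand. This is just a sketch: build_eval_prompt, js_eval, and fake_frontend are hypothetical names, and fake_frontend (with its made-up notebook path) only stands in for the patched browser handler so the round trip can be tried outside a notebook.

```python
import json


def build_eval_prompt(expression):
    """Encode a JavaScript expression as the JSON prompt the hook looks for."""
    return json.dumps({"eval": expression})


def js_eval(expression, ask=input):
    """Send the JSON prompt through `ask` (input() by default); return the reply."""
    return ask(build_eval_prompt(expression))


def fake_frontend(prompt):
    """Stand-in for the patched browser handler; the values are canned."""
    command = json.loads(prompt)
    canned = {"IPython.notebook.notebook_path": "experiments/demo.ipynb"}
    return canned[command["eval"]]


print(js_eval("IPython.notebook.notebook_path", ask=fake_frontend))
```

In a notebook where the interception cell has already run, calling js_eval("IPython.notebook.notebook_path") with the default ask=input should behave like the one-liner above.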

Okay, I found a way around the problem: call a Python function from Javascript and have it do all of what I need, rather than returning the name to "above" and work with that name there.
For context: my colleagues and I have many experimental notebooks; we experiment for a while and try various things (in a machine learning context). At the end of each variation/run, I want to save the notebook, copy it under a name that reflects the time, upload it to S3, strip its output and push it to git, log the filename, comments, and result scores into a DB, etc. In short, I want to automatically keep track of all of our experiments.
This is what I have so far. At the bottom of my notebooks, I put:
In [127]: import mymodule.utils.lognote as lognote
lognote.snap()
In [128]: # not to be run in the same shot as above
lognote.last
Out[128]: {'file': '/data/notebook-snapshots/2015/06/18/20150618-004408-save-note-exp.ipynb',
'time': datetime.datetime(2015, 6, 18, 0, 44, 8, 419907)}
And in a separate file, e.g. mymodule/utils/lognote.py:
# (...)
from datetime import datetime
from subprocess import call
from os.path import basename, join
from IPython.display import display, Javascript
# TODO: find out where the topdir really is instead of hardcoding it
_notebook_dir = '/data/notebook'
_snapshot_dir = '/data/notebook-snapshots'
def jss():
    return """
    IPython.notebook.save_notebook();
    IPython.notebook.kernel.execute("import mymodule.utils.lognote as lognote");
    IPython.notebook.kernel.execute("lognote._snap('" + IPython.notebook.notebook_path + "')");
    """
def js():
    return Javascript(jss())
def _snap(x):
    global last
    snaptime = datetime.now()
    src = join(_notebook_dir, x)
    dstdir = join(_snapshot_dir, '{}'.format(snaptime.strftime("%Y/%m/%d")))
    dstfile = join(dstdir, '{}-{}'.format(snaptime.strftime("%Y%m%d-%H%M%S"), basename(x)))
    call(["mkdir", "-p", dstdir])
    call(["cp", src, dstfile])
    last = {
        'time': snaptime,
        'file': dstfile
    }
def snap():
    display(js())
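The naming scheme used by _snap can be pulled out into a pure function, which makes it easy to check without a notebook. The sketch below is a possible variation, not the code I actually run: snapshot_path and copy_snapshot are hypothetical names, posixpath is used on the assumption that forward-slash paths are wanted, and os.makedirs plus shutil.copy2 are swapped in for the mkdir and cp subprocess calls.

```python
import os
import posixpath
import shutil
from datetime import datetime


def snapshot_path(snapshot_dir, notebook_path, when):
    """Return <snapshot_dir>/YYYY/MM/DD/YYYYmmdd-HHMMSS-<notebook file name>."""
    day_dir = posixpath.join(snapshot_dir, when.strftime("%Y/%m/%d"))
    file_name = "{}-{}".format(
        when.strftime("%Y%m%d-%H%M%S"), posixpath.basename(notebook_path)
    )
    return posixpath.join(day_dir, file_name)


def copy_snapshot(src, dst):
    """Create the destination directory if needed, then copy the notebook file."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)
    return dst


example = snapshot_path(
    "/data/notebook-snapshots",
    "save-note-exp.ipynb",
    datetime(2015, 6, 18, 0, 44, 8),
)
print(example)
```

With the timestamp from the Out[128] example above (minus the microseconds), this yields the same file name as shown there.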

To add to the other great answers: there is a subtlety in that the browser attempts to run the Jupyter notebook JavaScript magic when the notebook loads.
To demonstrate: create and run the following cell:
%%javascript
IPython.notebook.kernel.execute('1')
Now save the notebook, close it and then re-open it. When you do that, under that cell suddenly you will see an error in red:
Javascript error adding output!
TypeError: Cannot read property 'execute' of null
See your browser Javascript console for more details.
That means the browser has parsed some JS code and tried to run it. This is the error in Chrome; it will probably be different in another browser.
I have no idea why this Jupyter JavaScript magic cell is run on load and why Jupyter Notebook is not properly escaping things, but the browser sees some JS code, runs it, and fails, because the notebook kernel doesn't exist yet!
So you must add a check that the object exists:
%%javascript
if (IPython.notebook.kernel) {
    IPython.notebook.kernel.execute('1')
}
and now there is no problem on load.
In my case, I needed to save the notebook and run an external script on it, so I ended up using this code:
from IPython.display import display, Javascript
def nb_auto_export():
    display(Javascript("if (IPython.notebook) { IPython.notebook.save_notebook() }; if (IPython.notebook.kernel) { IPython.notebook.kernel.execute('!./notebook2script.py ' + IPython.notebook.notebook_name )}"))
and in the last cell of the notebook:
nb_auto_export()
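If you need the same guard in several places, the JavaScript string can be generated from Python. This is only a sketch: guarded_execute_js is a hypothetical helper, and it relies on json.dumps producing a string literal that JavaScript also accepts, which avoids the hand-written quote escaping seen in the question.

```python
import json


def guarded_execute_js(python_code):
    """Build JavaScript that runs python_code in the kernel only if one is attached."""
    return (
        "if (IPython.notebook && IPython.notebook.kernel) {"
        " IPython.notebook.kernel.execute(" + json.dumps(python_code) + ");"
        " }"
    )


snippet = guarded_execute_js("mypath = 'demo.ipynb'")
print(snippet)
```

In a notebook the result would be passed to display(Javascript(snippet)); outside a notebook it is just a string, which is convenient for inspecting what will be sent.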

Related

Get variable inside DOM before DOM changes when clicking redirecting button

I have been trying for a long time to find a way to persist variables across page refreshes and across different pages within one browser session opened from Selenium in Python.
Unfortunately, after a lot of testing and research, storing the variable in localStorage, sessionStorage, or window.name does not work.
So I have resorted to a Python script that repeatedly calls driver.execute_script('return variable') and keeps gathering data while I browse.
The data to collect is the value of the element that gets clicked; a click event listener catches it and stores it in a variable I have added to the page.
This all works fine, except when the clicked element is a button containing a link that redirects the page and changes the DOM.
My best guess is that the click, my JavaScript that stores the variable, my script that retrieves the variable, and the page redirect all happen at almost the same time, and that the DOM changes before the variable is retrieved, which cancels my efforts to get that data.
This is the code:
from selenium.common import TimeoutException, WebDriverException
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
class Main:
    def __init__(self, page_url):
        self.__driver = webdriver.Chrome()
        self.__element_list = []
        self.__page_url = page_url
    def start(self):
        program_return = []
        self.__driver.get(self.__page_url)
        event_js = '''
            var array_events = []
            var registerOuterHtml = (e) => {
                array_events.push(e.target.outerHTML)
                window.array_events = array_events
            }
            var registerUrl = (e) => {
                array_events.push(document.documentElement.outerHTML)
            }
            getElementHtml = document.addEventListener("click", registerOuterHtml, true)
            getDOMHtml = document.addEventListener("click", registerUrl, true)
        '''
        return_js = '''return window.array_events'''
        self.__driver.set_script_timeout(10000)
        self.__driver.execute_script(event_js)
        try:
            for _ in range(800):
                if array_events := self.__driver.execute_script(return_js):
                    if array_events[-2:] not in program_return:
                        program_return.append(array_events[-2:])
                else:
                    try:
                        WebDriverWait(self.__driver, 0.1).until(
                            lambda driver: self.__driver.current_url != self.__page_url)
                    except TimeoutException:
                        pass
                    else:
                        self.__page_url = self.__driver.current_url
                        self.__driver.execute_script(event_js)
        except WebDriverException:
            pass
        finally:
            print(len(program_return))  # should print total number of clicks made.
To test it out, call it like this:
Main('any url you wish').start()
After clicking around, including at least one button that changes the page, you can close the window manually and check the results.
Any idea or ideally a solution to this problem would be greatly appreciated.
Overall question: taking for granted that variable persistence between different pages is not possible, how can I get the value of the variable that is set at the time of the click, before the page changes, from that same click action? (Maybe by delaying the whole page?)
Theoretically you can get some global data before a navigation like:
data = driver.execute_async_script("""
    let [resolve] = arguments
    window.onunload = () => resolve(window.some_global_data)
""")
but it's likely to time out ... Puppeteer / Playwright are better suited to things like this. There are Python ports of them you might try.
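Separately from the unload race, the polling side of the question could read and clear the recorded events in a single script call, so that nothing is counted twice and the array_events[-2:] comparison is not needed. This is a rough sketch under assumptions: it expects the page to expose window.array_events as in the question, drain_events and poll_events are hypothetical helpers, and FakeDriver only mimics the execute_script method of a WebDriver so the loop can be tried without a browser.

```python
# JavaScript that returns all recorded events and empties the array in place.
DRAIN_JS = "return (window.array_events || []).splice(0);"


def drain_events(driver):
    """Fetch and clear the recorded events in one round trip; [] if there are none."""
    return driver.execute_script(DRAIN_JS) or []


def poll_events(driver, rounds):
    """Call drain_events repeatedly and return everything collected, in order."""
    collected = []
    for _ in range(rounds):
        collected.extend(drain_events(driver))
    return collected


class FakeDriver:
    """Stand-in for a WebDriver: returns canned batches, then None."""

    def __init__(self, batches):
        self._batches = list(batches)

    def execute_script(self, script):
        return self._batches.pop(0) if self._batches else None


fake = FakeDriver([["<a>one</a>"], None, ["<a>two</a>", "<html></html>"]])
events = poll_events(fake, rounds=5)
print(events)
```

This does not solve the case where the page navigates away before the next poll; it only keeps the bookkeeping on the Python side simpler.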

Scraping dynamic content from website in near-realtime

I’m trying to implement a web scraper scraping dynamically updated content from a website in near-realtime.
Let’s take https://www.timeanddate.com/worldclock/ as an example and assume I want to continuously get the current time at my home location.
My solution right now is as follows: Get the rendered page content every second and extract the time using bs4. Working Code:
import asyncio
import bs4
import pyppeteer
def get_current_time(content):
    soup = bs4.BeautifulSoup(content, features="lxml")
    clock = soup.find(class_="my-city__digitalClock")
    hour_minutes = clock.contents[3].next_element
    seconds = clock.contents[5].next_element
    return hour_minutes + ":" + seconds
async def main():
    browser = await pyppeteer.launch()
    page = await browser.newPage()
    await page.goto("https://www.timeanddate.com/worldclock/")
    for _ in range(30):
        content = await page.content()
        print(get_current_time(content))
        await asyncio.sleep(1)
    await browser.close()
asyncio.run(main())
What I would like to do instead is: React only when the time is updated on the page. Reasons: Faster reaction and less computationally intensive (especially when monitoring multiple pages that may update in irregular intervals smaller or much larger than a second).
I came up with the following three ideas for solving this, but I don't know how to continue. There might also be a much simpler or more elegant approach:
1) Intercepting network responses using pyppeteer
This does not seem to work, since there is no more network activity after initially loading the page (except from advertising), as I can see in the Network tab in Chrome Dev Tools.
2) Reacting to custom events on the page
Using the “Event Listener Breakpoints” in the “Sources” tab in Chrome Dev Tools, I can stop the JavaScript code execution on various events (e.g. the “Set innerHTML” event).
Is it possible to do something like this using pyppeteer, provide some context information about the event (e.g. which element is updated with which new text)?
It seems to be possible using JavaScript and puppeteer (see https://github.com/puppeteer/puppeteer/blob/main/examples/custom-event.js), but I think pyppeteer does not provide this functionality (I could not find it in the API Reference).
3) Overriding a function in the JavaScript code of the page
Override a relevant function and intercept the relevant data (which are provided to that function as a parameter).
This idea is inspired by this blogpost: https://antoinevastel.com/javascript/2019/06/10/monitor-js-execution.html
Entire code for the blogpost: https://github.com/antoinevastel/blog-post-monitor-js/blob/master/monitorExecution.js
I experimented a bit, but my JavaScript knowledge seems too limited to even override a function in one of the scripts used by the page.
You could achieve this with Selenium. I am using the Chrome webdriver via webdriver-manager but you can modify this to use whatever you prefer.
First, all of our imports
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.remote.webelement import WebElement
from webdriver_manager.chrome import ChromeDriverManager
Create our driver object with the headless parameter so that the browser window doesn't open.
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
Define a function that accepts a WebElement to extract the clock time.
def getTimeString(myClock: WebElement) -> str:
    hourMinute = myClock.find_element(By.XPATH, "span[position()=2]").text
    seconds = myClock.find_element(By.CLASS_NAME, "my-city__seconds").text
    return f"{hourMinute}:{seconds}"
Get the page and extract the clock WebElement
driver.get("https://www.timeanddate.com/worldclock/")
myClock = driver.find_element(By.CLASS_NAME, "my-city__digitalClock")
Finally, implement our loop
last = None
while True:
    now = getTimeString(myClock)
    if now == last:
        continue
    print(now)
    last = now
Before your logic concludes, be sure to run driver.quit() to clean up.
Output
05:27:56
05:27:57
05:27:58
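The loop above polls as fast as it can. A variation that sleeps briefly between reads and only reports changes might look like the sketch below; watch_changes is a hypothetical helper, the interval is an arbitrary choice, and the canned readings merely stand in for getTimeString(myClock) so it can be run without a browser.

```python
import time


def watch_changes(read_value, on_change, interval=0.05, max_polls=None):
    """Poll read_value() and call on_change(value) only when the value differs."""
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current = read_value()
        if current != last:
            on_change(current)
            last = current
        polls += 1
        time.sleep(interval)  # short pause so the loop does not spin the CPU


# Canned readings in place of a live page.
readings = iter(["05:27:56", "05:27:56", "05:27:57", "05:27:57", "05:27:58"])
seen = []
watch_changes(lambda: next(readings), seen.append, interval=0, max_polls=5)
print(seen)
```

With the Selenium objects from this answer, the call would be something like watch_changes(lambda: getTimeString(myClock), print). This is still polling, so it reacts within one interval rather than at the exact moment the page updates.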

Visual Studio Lightswitch HTML Client validation fails

I have this piece of JavaScript code that's supposed to force a string of text to upper-case characters, but it won't work. I know it hits a breakpoint when I set it, but the code doesn't seem to do what it's supposed to.
I'm new to JavaScript. What am I missing here?
myapp.AddEditVehicle.beforeApplyChanges = function (screen) {
    // force string to uppercase
    screen.Vehicle.RegNum.toUpperCase();
};
If you'd like to tackle this in JavaScript on the client side, you need to use the following code:
myapp.AddEditVehicle.beforeApplyChanges = function (screen) {
    // toUpperCase() returns a new string, so assign the result back to the property
    screen.Vehicle.RegNum = screen.Vehicle.RegNum.toUpperCase();
};
Alternatively, if you'd like to do this in C# on the server side, you can add the following RegNum_Validate code by selecting the Write Code option on the designer screen for your Vehicle.lsml entity:
partial void RegNum_Validate(EntityValidationResultsBuilder results)
{
    // results.AddPropertyError("<Error-Message>");
    if (this.Details.Properties.RegNum.IsChanged)
    {
        this.RegNum = this.RegNum.ToUpper();
    }
}
Please bear in mind that the Write Code option for the RegNum_Validate general method will only be available if you have the Server project perspective selected at the bottom of the entity designer.

Issue with variable that persists across multiple HTML files with one JS file

This has been giving me a lot of trouble and so far every stackoverflow question/answer I've found and every other bit of googling hasn't helped me all that much.
I have two HTML files. One is called cs, and the other is called csi. They both are linked to a common js file where I'm trying to implement the variable that will persist across both of them.
The variable is defined in the cs html by user input, and is then brought up in csi.
Here's what the Javascript looks like. I have it to run onlyCS on the cs html on body load, and onlyCSI on the csi html on body load.
The colors persist, and the variable MyApp.str is established in cs, but when it loads to csi, MyApp.str becomes "undefined"
I figured I would've avoided this because I established MyApp.str = strChar, which is itself established as a bit of user input that's only available in cs.
var MyApp = {}; // Globally scoped object
function onlyCS(){
    MyApp.color = 'green';
    setInterval(strdefine, 3000)
}
function onlyCSI(){
    MyApp.color = 'red';
    setInterval(bar, 3000)
}
function strdefine(){
    alert(MyApp.color); // Alerts 'green'
    strChar = parseInt($('#Xdemo').text(), 10);
    $('#result').text(strChar);
    MyApp.str = strChar;
    alert('the myapp global obj (str) is currently ' + MyApp.str);
}
function bar(){
    alert(MyApp.color); // Should alert 'red'
    alert('the myapp global obj (str) is currently ' + MyApp.str);
}
If anyone could help me out, I'd really appreciate it.
EDIT: The comments helped me figure out that using localStorage to hold the variables is a good solution for my problem.
within strdefine I put
strChar = parseInt($('#Xdemo').text(), 10);
localStorage.setItem('str', strChar);
and within bar I put
alert('LS "str" is currently ' + localStorage.str);
var ex = localStorage.getItem('str') || 0;
$('#result').text(ex);
You have these choices:
Persist/read in a cookie
Persist/read in local storage
Persist/read in session storage
Send to the server, then get it back from the server (likely in some session variable)
Just putting a variable in a JavaScript object does NOT persist past a page refresh. You can also use AJAX to read an HTML fragment stored in a server-side file and apply it to a container on your page (not a full page, just a partial such as <div>my new</div>).

Is it possible for the admin to get the full sourcecode of my js-file if I redirect a Javascript file to a local modified Javascript file?

I created a Google Chrome extension which redirects all requests for a JavaScript file on a website to a modified version of this file on my hard drive.
It works, and in simplified form I do it like this:
... redirectUrl: chrome.extension.getURL("modified.js") ...
Modified.js is the same javascript file except that I modified a line in the code.
I changed something that looks like
var message = mytext.value;
to var message = aes.encrypt(mytext.value,"mysecretkey");
My question now is: can the admin of the website whose JavaScript file I redirect modify his web page so that he can obtain "mysecretkey"? (The admin knows how my extension works and which line is modified, but doesn't know the key used.)
Thanks in advance
Yes, the "admin" can read the source code of your code.
Your method is very insecure. There are two ways to read "mysecretkey".
Let's start with the non-trivial one: get a reference to the source. For example, assume that your aes.encrypt method looks like this:
(function() {
    var aes = {encrypt: function(val, key) {
        if (key.indexOf('whatever')) {/* ... */}
    }};
})();
Then it can be compromised using:
(function(indexOf) {
    String.prototype.indexOf = function(term) {
        if (term !== 'known') (new Image).src = '/report.php?t=' + term;
        return indexOf.apply(this, arguments);
    };
})(String.prototype.indexOf);
Many prototype methods result in possible leaking, as well as arguments.callee. If the "admin" wants to break your code, he'll surely be able to achieve this.
The other method is much easier to implement:
var x = new XMLHttpRequest();
x.open('GET', '/possiblymodified.js');
x.onload = function() {
    console.log(x.responseText); // Full source code here....
};
x.send();
You could replace the XMLHttpRequest method, but at this point, you're just playing the cat and mouse game. Whenever you think that you've secured your code, the other will find a way to break it (for instance, using the first described method).
Since the admin can control any aspect of the site, they could easily modify aes.encrypt to post the second argument to them and then continue as normal. Therefore your secret key would be immediately revealed.
No. The Web administrator would have no way of seeing what you set it to before it could get sent to the server where he could see it.
