Kotlin, how can I read a dynamic website as text? - javascript

As the title says, I'm trying to read the content of sites like this one, which appears to be JavaScript-based.
I tried the plain JDK library, then jsoup, and then HtmlUnit, but I couldn't get anything useful out of them (I see just the source code, just the title, or null):
val url = URL("https://registry.terraform.io/providers/hashicorp/tls/latest/docs/data-sources/certificate")
val connection = url.openConnection()
val scanner = Scanner(connection.getInputStream())
scanner.useDelimiter("\\Z")
val content = scanner.next()
scanner.close()
println(content)
val doc = Jsoup.connect("https://registry.terraform.io/providers/hashicorp/tls/latest/docs/data-sources/certificate").get()
println(doc.text())
WebClient().use { webClient ->
val page = webClient.getPage<HtmlPage>("https://registry.terraform.io/providers/hashicorp/tls/latest/docs/data-sources/certificate")
val pageAsText = page.asNormalizedText()
println(pageAsText)
}
WebClient(BrowserVersion.FIREFOX).use { webClient ->
val page = webClient.getPage<HtmlPage>("https://registry.terraform.io/providers/hashicorp/tls/latest/docs/data-sources/certificate")
println(page.textContent)
}
It should be something easy peasy, but I can't see what's wrong.

In order for this to be possible, you need something to execute the JS that modifies the DOM.
It might be a bit overkill depending on the use case, and probably won't be possible if you're on Android, but one way to do this is to launch a headless browser separately and interact with it from your code. For instance, using Chrome Headless and the Chrome DevTools Protocol. If you're interested, I have written a Kotlin library called chrome-devtools-kotlin to interact with a Chrome browser in a type-safe way.
There might be simpler options, though. For instance maybe you can run an embedded browser instead with JBrowserDriver and still use JSoup to parse the HTML, as mentioned in this other answer.

Regarding HtmlUnit:
The page initially has no content; everything you see is rendered by JavaScript on the client side using one of these SPA frameworks.
It looks like there is a feature check at the beginning that determines that the JS support in HtmlUnit lacks some of the required features, and based on this you only get a hint like "Please enable Javascript to use this application".
You can use
page.asXml()
to have a look at the content through HtmlUnit's eyes.
You can open an HtmlUnit issue on GitHub, but I fear adding support for this will be a longer story.

Related

Replace Javascript on-the-fly

This question pops up again and again across the internet (even on SO), but I haven't found a satisfying solution to this problem:
How can we change/replace Javascript code in a running web application, without reloading the page?
Many people answer this with "you cannot, because it is impossible". Some experiments with IntelliJ IDEA's Live Edit plugin proved to me that it is possible. But I don't want to be bound to an IDE for this feature. (Bonus: browser independent)
Here is what I tried:
add //# sourceURL=whatever.js to my dynamically loaded script
add folder to Chrome containing whatever.js
mapping the local whatever.js to the network whatever.js
Changing code in either does not affect the web page at all. In fact, editing the network-side file results in an odd "flashing" of the dev tools.
Please understand that I do not expect the changed JS to magically apply to the webpage once I change it, but I expect it to use the new code when the execution point is passed again.
Example:
Given a button that triggers 'alert(1);'
Change to 'alert(2);'
I expect the button to trigger 'alert(2);'
Having many dependencies and a huge script that is triggered pretty late in a workflow it is really a big problem for me to refresh the page, so I need to find a solution that works on-the-fly.
First of all: what you ask for is really tricky, and allowing it in your applications can introduce security problems. It is not impossible, though.
But if you want to achieve your example, follow these steps:
Make a code snippet like this:
var message = "1"; // this must be a global variable!!!!
function showMessage() {
alert(message);
}
Given a button that triggers 'alert(1);'
Make the button call a function, i.e. onclick='showMessage()'
Change to 'alert(2);'
I expect the button to trigger 'alert(2);'
Now it's easy. When you detect the event that should change the alert message to 2, you just need to change the value of message:
message = "2";
That's all.
Option 1: Livereload
I would say that as long as it's for development purposes, you can use livereload on your server.
It depends on your server type. I'm not a big expert in Apache, GlassFish, and other Java-world stuff, but in the JS world (Node.js) this is a shorter way.
(link for npm-livereload)
Hack: you can serve static files such as JS and CSS with a simple Node.js server with built-in livereload.
Option 2: JRebel
I'm not sure about JS, but perhaps JRebel can handle this issue. In any case, it's a good addition to the development process: at least it would give you "hot reload" for Java.
Option 3: Monkey-patching
You can use monkey-patching techniques: a function's source can be handled as a string, and you can turn a string into a function with new Function().
For example:
var foo = {
sum: function (a, b) {return a+b;}
}
//...
foo.sum = new Function(....) // Now you've replaced the original code
Check this article about a graceful way to do monkey-patching.
And a small advertisement for my monkey-patching lib: monkey-punch
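As a rough sketch of the monkey-patching idea above (not the answerer's own code): the names onButtonClick and newBody are hypothetical, and the replacement source is assumed to arrive as a string, for example from an edited file. The point is that the call site looks up foo.sum each time it runs, so the new code is used the next time the execution point is passed.

```javascript
// Hypothetical object standing in for code already loaded in the page.
var foo = {
  sum: function (a, b) { return a + b; }
};

// The caller looks up foo.sum at call time, so it picks up a replacement.
function onButtonClick() {
  return foo.sum(2, 3);
}

var before = onButtonClick(); // uses the original implementation

// Replacement source as a string (assumed to come from an edited file).
var newBody = 'return a * b;';
foo.sum = new Function('a', 'b', newBody);

var after = onButtonClick(); // same call site, new behavior
```

Note that code which captured a direct reference to the old function (for instance by storing foo.sum in a variable or passing it straight to addEventListener) keeps calling the old version, so this only works where the lookup happens at call time.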
Option 4: Attach new tag
You can attach js files with:
var s = document.createElement("script");
s.type = "text/javascript";
s.src = "http://somedomain.com/somescript";
$("head").append(s);
You can also remove DOM elements (scripts, styles) and attach new ones at any time.

JavaScript code working on old server, but not on new

This has been driving me crazy. I'm trying to migrate some legacy applications to a new server and I'm having a lot of problems with document.all being peppered throughout the code.
Before you say it, I know. Don't use document.all. It's there and there's nothing I can do about it...this particular problem has JavaScript within an assembly for which I do not have the source code and I don't have permission to redevelop.
My main confusion is that the current version of the application is working when I test in the same browser as my migrated version. It's a straight copy and paste job and no code has changed during the migration, but when I run the app on the new server document.all(element) always returns null.
Does IIS or newer .NET frameworks somehow handle client-side scripts differently?
I'm coming from .NET 2.0 hosted on Windows Server 2005 with IIS 6, and going to .NET 4.0 hosted on Windows Server 2012 with IIS 8.
I'm looking for ANY idea why these would be behaving differently when tested in the same browser
UPDATE:
A user control being targeted by document.all is getting encoded, which is messing up the ID of the control, i.e. what should be 'elementId' is being output as &#39;elementId&#39;
The code in the assembly is using Attributes.Add, which is including single quotes. I've found a number of sources suggesting a new class such as:
public class HtmlAttributeEncodingNot : System.Web.Util.HttpEncoder
{
protected override void HtmlAttributeEncode(string value, System.IO.TextWriter output)
{
output.Write(value);
}
}
being added, which will allow encoding to be turned off by using <httpRuntime encoderType="HtmlAttributeEncodingNot"/> in the web.config, but I am not able to add new classes to this project.
I'm going to leave the accepted answer as it is, but add this answer in case anyone else has issues with this in the future.
My circumstances changed and I was able to add a class to the project. The class in my question:
public class HtmlAttributeEncodingNot : System.Web.Util.HttpEncoder
{
protected override void HtmlAttributeEncode(string value, System.IO.TextWriter output)
{
output.Write(value);
}
}
along with using the class for the encoding in the web.config:
<httpRuntime encoderType="HtmlAttributeEncodingNot"/>
was sufficient for dealing with the single quotes in Attributes.Add being encoded to &#39;
Additionally, something to look out for with old JavaScript is the way .NET renders the ClientID of controls. Using a combination of the above code, along with <xhtmlConformance mode="Legacy"/> and clientIDMode="AutoID", I was able to render the HTML the same as its 2.0 equivalent and solve my problem for the time being.
It sounds like your project is nearing the end of its life and you just want to breathe new life into it. I think document.all is done as of IE11, isn't it? So the clock is ticking.
What I'd suggest is writing some Javascript to iterate the DOM and look for any encoded IDs - then change those back to the .NET 2 form. Run that JS at the bottom of the page (as opposed to onload) and hopefully nothing will have tried to access document.all before you 'correct' the ids.
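A minimal sketch of that suggestion, under a loud assumption: it presumes the ids in the DOM literally contain the text &#39; (browsers normally decode entities in attribute values, so check with the dev tools what the ids actually look like first). The function names decodeId and fixEncodedIds are hypothetical.

```javascript
// Decode the one entity the question reports: &#39; (an encoded single quote).
function decodeId(id) {
  return id.replace(/&#39;/g, "'");
}

// Rewrite ids in place on any array-like collection of elements.
// Returns the number of ids that were changed.
function fixEncodedIds(elements) {
  var changed = 0;
  for (var i = 0; i < elements.length; i++) {
    var el = elements[i];
    if (el.id && el.id.indexOf('&#39;') !== -1) {
      el.id = decodeId(el.id);
      changed++;
    }
  }
  return changed;
}

// In a page, this would run from a script tag at the bottom of the body:
// fixEncodedIds(document.getElementsByTagName('*'));
```

Keeping the decoding in a separate function makes it easy to extend if other differences from the .NET 2.0 output turn up.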

Android, Javascript, Rhino, JSON

After long search in repo folders I found rhino1_7R2.jar for Android at http://code.google.com/p/android-scripting/source/browse/rhino/rhino1_7R2.jar Unfortunately 1_7R3 is not there.
The script I'm using calls the JSON.stringify function, which is not present in 1_7R2. There is the JSON2.js file for Rhino, but I don't know the proper way to load it at run time. Documentation and example code are weak. Should I load it as a string and prepend it to the running script? Or is there a better way?
I'm using JavaScript to dynamically evaluate some calculations in a loop. I really want to avoid prepending JSON2.js every time I call a JavaScript function. I spent almost a day finding out that Rhino added the JSON object in a later version and nobody bothered to port it to Android. Looks like another open source project lacking support.
Should I give up and consider using the WebView method? Any ideas?
As I understand it, you have some JavaScript that you want to run with Rhino. If you want to load another JavaScript file, you can use the load function:
load("/your/path/json2.js");
After that call, your script can use the json2 library.
var testStr = '{"test" : {"a": "aval", "b" : "bval"}}';
var jsonObj = JSON.parse(testStr);
var a = jsonObj.test.a;
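To avoid loading the shim more than once, one option is to guard the load call with a feature check. This is only a sketch: it assumes the load function from the answer above is available in your scope (it is provided by the Rhino shell; an embedded setup may need to expose it), and the result object is a made-up example.

```javascript
// Only load the shim when the engine lacks a native JSON object.
if (typeof JSON === 'undefined') {
  // Path taken from the answer above; `load` is assumed to be available.
  load('/your/path/json2.js');
}

// Hypothetical calculation result to serialize.
var result = { total: 3 * 14, label: 'calc' };
var text = JSON.stringify(result);
```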

Outlook 2010: How to compose e-mail from VBScript/JScript

I have some JScript code I have been using for a few years which is able to read an XML file and open an Outlook compose window with the to/cc/subject fields prepopulated and files pre-attached based on the XML data. The user can then attach more files, make any corrections and send the e-mail. The core part of the code uses CDO to create the new message:
var ol = WScript.CreateObject("Outlook.Application");
var msg = ol.CreateItem(olMailItem);
Unfortunately I have just discovered this no longer works with Outlook 2010 64-bit as CDO is no longer supported. The suggestion from Microsoft is to update your applications to use the Outlook object model instead, but I can't find any examples at all of how I might use the Outlook object model to open a compose window from either VBScript or JScript. All the "VB" examples on MSDN produce syntax errors when run through the VBScript interpreter.
Can anyone point me to any short examples demonstrating how to interface with Outlook 2010 using either VBScript or JScript?
EDIT: Just realised the problem is that I'm using MAPI.Session to adjust attachment properties and this is what's failing. I guess I need to find what this has been replaced by:
var oSession = WScript.CreateObject("MAPI.Session");
oSession.Logon("", "", false, false);
var oMsg = oSession.GetMessage(strMsgID);
var oAttachFields = oMsg.Attachments.Item(i+1).Fields;
...
Ok, turns out most of the MAPI.Session stuff has been merged in with the actual objects, which are still accessible using the first bit of code in my post ("Outlook.Application"). I was only using the MAPI.Session stuff to hide image attachments (so they can be shown inline in the message body, and not as files attached to the e-mail) but this now seems to be incorporated automatically.
So all I actually had to do was remove the MAPI.Session stuff and then everything started working. I will post a link to the code shortly in case anyone else finds it useful.
EDIT: Here is the code on GitHub if anyone is after a relatively simple example.

Tutorial for using JavaScript on a Desktop

I need to write some scripts in JavaScript.
I am working on it but couldn't find solutions to a few problems.
First of all I need a GOOD tutorial, not for a web page but for a DESKTOP script.
Things I couldn't figure out:
1) I wanted a simple message box in order to debug my program, I used:
var name = prompt("What is your name","Type Name Here");
When running it I get error of "Object expected"
2) Couldn't find how to open a file
Based on your comments, I guess that you are attempting to run a JavaScript file directly on Windows. Double-clicking on a .js file in windows will (probably) run it in Windows Script Host.
The prompt() function will not work this way, since WSH provides a completely different API than browser-embedded engines.
The following code should accomplish your intentions. However if you want anything more than a simple popup, HTAs are the only way to do complex GUIs with JScript on the desktop.
var fso, ws, ts;
fso = new ActiveXObject('Scripting.FileSystemObject');
ws = WScript.CreateObject('WScript.Shell');
var ForWriting= 2;
ts = fso.OpenTextFile('foo.txt', ForWriting, true);
ts.WriteLine(new Date().getTime());
ts.Close();
ws.Popup('Wrote to file!');
var ForReading= 1;
ts = fso.OpenTextFile('foo.txt', ForReading, false);
var fileContents = ts.ReadLine();
ts.Close();
ws.Popup('The file contained: ' + fileContents);
WScript.Quit();
I have to ask: why is JavaScript the right tool for the job? Why not use a scripting language intended to be used this way, such as Python, Ruby, Lua, ... etc?
If you are using Microsoft's JScript (and it sounds like you are), look to the MSDN web site for help. The page here looks fairly good. Google can also help with that.
Assuming you don't mind using Java, you could also use the Mozilla Rhino shell. But it doesn't look like there is a standard way of reading from the console in JavaScript. (presumably since this is not something typically required in a JavaScript application...) The built in JavaScript functions in the shell seem fairly basic, but you can read a file.
There are also examples of using Rhino, which may be helpful. You can interface with the Java API to do whatever else you need to do.
Edit: I wrote this answer a long time ago; today I would use node.js. See their learning page.
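For comparison, here is a minimal Node.js sketch of the same write-then-read flow as the Windows Script Host example earlier in this thread. The file name and the use of the temp directory are assumptions made to keep the sketch self-contained.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical file location; the temp directory keeps the sketch self-contained.
const filePath = path.join(os.tmpdir(), 'foo.txt');

// Write a timestamp, replacing any existing file contents.
const written = String(Date.now());
fs.writeFileSync(filePath, written + '\n', 'utf8');

// Read the file back and take its first line.
const fileContents = fs.readFileSync(filePath, 'utf8').split('\n')[0];
console.log('The file contained: ' + fileContents);
```

The synchronous fs calls keep a short script simple; longer-running programs would usually use the asynchronous variants.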
The latest prerelease of Opera acts as a runtime for JS applications.
They have tutorials describing how to use it.
I used: var name = prompt("What is your name","Type Name Here");
When running it I get error of "Object expected"
Presumably your runtime doesn't implement prompt in a way that is compatible with those arguments.
2) Couldn't find how to open a file
This depends on the runtime you use. JS itself doesn't have anything built in to read files (or display a prompt). You need an environment that provides those objects.
