Compiling the Oniguruma regex library to JavaScript using Emscripten

I'm trying to get a more powerful regex library into JavaScript. The only solution I have found is to compile the Oniguruma regex library to JavaScript using Emscripten.
I've installed Emscripten and tested it with its small test scripts, and I've downloaded the Oniguruma source code, but I still don't know what to do next.
Is anyone familiar with Emscripten?

When you use Emscripten, the general process of building C/C++ code stays much the same. What changes is the compiler: instead of, for example, gcc, you invoke the Emscripten compiler.
That said, the real question is whether you are familiar with C/C++ and, more specifically, with autotools (which appears to be the build system Oniguruma uses). If you are not, you will probably have a very hard time understanding what needs to be done and how.
Last I checked, Emscripten did not support Libtool, so a build that goes through autotools will probably fail. Feel free to ask on the Emscripten IRC channel whether this is indeed still impossible.
Another approach I can think of is to use autotools to generate the Makefiles and then write custom targets for the Emscripten programs. Beware that this is for advanced users who are familiar with the quirks of make.
If these steps are too taxing for you, perhaps you should check whether a pure JavaScript library is sufficient.

A more realistic approach is to use XRegExp (http://xregexp.com). It adds many features to regular expressions and compiles them down to JavaScript's more limited RegExp dialect, so you get the best of both features and performance. A regex library compiled with Emscripten is very unlikely to be performant enough for production use. For some uses Emscripten is excellent, but in this case the overhead does not seem worth the cost.
The author of XRegExp even has an article on lookbehinds: http://blog.stevenlevithan.com/archives/javascript-regex-lookbehind
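For the lookbehind case specifically, one common workaround in engines without lookbehind syntax is to match the prefix as a capture group and skip over it by hand. Here is a minimal sketch in plain JavaScript, assuming the goal is to find digits that follow a dollar sign. The function name `matchAfterPrefix` and the sample pattern are made up for illustration and are not part of XRegExp.

```javascript
// Emulates the lookbehind (?<=\$)\d+ without using lookbehind syntax:
// the prefix is captured as group 1, the wanted text as group 2,
// and only group 2 (with an adjusted index) is reported.
function matchAfterPrefix(input) {
  var re = /(\$)(\d+)/g; // hypothetical pattern, chosen for illustration
  var results = [];
  var m;
  while ((m = re.exec(input)) !== null) {
    results.push({
      text: m[2],
      index: m.index + m[1].length // position just after the prefix
    });
  }
  return results;
}

var found = matchAfterPrefix("cost: $42, tax: $7, id: 99");
console.log(found);
```

This is not a full replacement for a real lookbehind: the prefix is consumed by the match, so results can differ when matches would otherwise overlap.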

Related

Can you compile JS code using V8 and feed that directly to Chrome?

I'm looking for a way to protect some JavaScript code from being read or modified. I know many people consider that impossible, but still...
From what I can see, Chrome's V8 engine performs a number of optimizations when it sees JS code, probably compiles it (?), and then runs it.
So I'm wondering: is it possible to use V8's C++ API to compile the JS code into machine code (or some Chrome-specific code) and then feed that directly into Chrome? (I don't care about other browsers.)
Supposedly it would be not only faster but also not human-readable, something like assembly.
Is this possible?
WebAssembly does this kind of thing, so I don't understand why we can't do it with JS code.
There are also EncloseJS and pkg, which do something very similar.
V8 developer here. No, it is not possible to compile JavaScript ahead of time and send only the compiled code to the browser. V8 (and other virtual machines like it) contain compilers, but they cannot be used as standalone compilers to produce standalone binaries.
In theory, you could compile JavaScript to WebAssembly: any two Turing-complete programming languages can in theory be compiled to each other. As far as I know, no such compiler exists today, though. One big reason is that the performance of the end result would be horrible (see the discussion with Andreas Rossberg for details), so considering that browsers can execute JavaScript directly, people have little reason to develop such a thing. (It would also be a large and difficult task.)
As for your stated goal: your best shot at making JavaScript code unreadable is to minify it. In fact, that is effectively just as good as your idea to generate assembly, because disassemblers exist that turn assembly back into minified-like higher-level language code; they cannot reconstruct variable names or comments (because that information is lost during compilation), but they can reconstruct program logic.
What I ended up doing was moving some of the logic from JavaScript into C++ and compiling that into Node.js native modules (that's possible for Electron apps).
It works pretty well and it's very fast, and the source is about as protected as it can get. You may need to worry about cross-platform issues, and compiling/linking can be a bit of a pain, but other than that it's great.
WebAssembly is not doing that. And no, it's not possible either. The web is supposed to be both browser- and hardware-independent.
Moreover, a language like JS would not be faster if compiled offline -- it only is anything close to fast because it is dynamically compiled and optimised, taking dynamic profile information into account.

What do those javascript front-end build tools mean when they say "compile" my js codes?

I have seen JavaScript front-end build tools, e.g. webpack, use the word "compile" from time to time. I am not sure what compiling JavaScript code means exactly; at least it is not like compiling C/C++ code.
I think I understand the "build" process in general: bundling all the JS code into one big file, minifying/uglifying the code, and using Babel to transform ES6 syntax (transpiling). But what does compiling mean here? How does it fit into the whole build process, or is it just another name for the whole build process?
Currently, I think it may be just another name for using Babel to transform ES6 syntax.
PS. After reading the SO question "Is Babel a compiler or transpiler?", I believe my question is not the same, because it is not just about Babel. For example, webpack also uses the term compiler (https://webpack.js.org/api/compiler/), and I do not understand what it means there.
Browserify uses the term as well, e.g. https://github.com/robrichard/browserify-compile-templates: "Compiles underscore templates from HTML script tags into CommonJS in a browserify transform"
It's better to describe the process as "transpilation."
Javascript always executes in a specific environment: in Chrome and Electron, it's the V8 engine; in Firefox, it's SpiderMonkey; etc. Each of these engines supports a specific set of language features and not others. As an example, some engines only support var and do not support const or let. Some support async/await, and others only support Promise.
But web developers know about these other features, and they want to use them, even when they're writing for an engine that doesn't support those features. Why? Most new language features are designed with the goal of making it possible to express complicated concepts in simpler and cleaner ways. This is extremely important, because the number one job of code is to make its purpose clear.
Thus, most language features are essentially syntactic sugar for existing functionality. In those cases, it's always possible to express a routine using both new and old syntax. This is a logical necessity.
A transpiler like Babel can read a script written using advanced syntax, and then re-express the script using a restricted set of language features. Relying on an intermediate representation called an abstract syntax tree, it can produce code that is guaranteed to be functionally equivalent, even though it does the work using very different, more widely-supported control structures.
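To make that concrete, here is a small hand-written before/after pair. It is only a sketch of the idea: the second function is an ES5 rewrite I wrote by hand to mirror the first, not actual Babel output, and the names `greetModern` and `greetLegacy` are made up for the example.

```javascript
// Source as an author might write it, using ES2015 features:
// const, an arrow function, a default parameter, and a template literal.
const greetModern = (name = "world") => `Hello, ${name}!`;

// A hand-written ES5 equivalent, roughly the shape a transpiler could emit.
// Assumption: this is illustrative and NOT the output of any real tool.
var greetLegacy = function (name) {
  if (name === undefined) {
    name = "world"; // default parameters apply only when the argument is undefined
  }
  return "Hello, " + name + "!";
};

// Both versions produce the same strings for the same inputs.
console.log(greetModern("Ada") === greetLegacy("Ada"));
console.log(greetModern() === greetLegacy());
```

The older syntax is longer and noisier, which is the reason people prefer to write the first form and let a tool produce the second.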
Perhaps we web developers have gotten lazy in our terminology. When we talk of "compiling" javascript, we aren't usually talking about converting the script to something like bytecode. We're talking about transpilation.
Other kinds of build tasks are also becoming quite common. These days, the front-end is obsessed with a million flavors of "templating," because it's extremely tedious and confusing to describe DOM changes using pure javascript, and because application complexity is increasing very rapidly. Some frameworks require you to convert source code to other intermediary forms that are later consumed by the web application at runtime. Others permit devs to describe UI using invented syntaxes that no browser is even attempting to support natively. Which tasks are needed varies by application, depending on which frameworks are being used, the particulars of the application architecture, and the contours of the deployment environment, and that's just a start.
At its foundation, a web page is built using HTML, CSS, and javascript. That much hasn't changed. But today, most serious applications are built almost entirely in javascript (or something very much like it) and sass. Building the application is the process of applying a set of transformations to the source code to yield the final artifacts in those three bedrock languages.
We lump all that stuff under the term "compile."
You pretty much hit the nail on the head. When the "compile" (or, more appropriately, transpilation) step happens on a JavaScript project, it can mean a number of things. As you mentioned, these range from minification and applying polyfills or shims to the literal act of "compiling" the scripts into a single bundle file for platform/browser consumption.
When you use supersets of the JavaScript language such as TypeScript, ActionScript, or UnityScript, transpilation describes the process of converting the source x-script back into native JavaScript, which can in turn be interpreted by a browser (since the browser doesn't recognize the superset languages).
However, you are absolutely correct. We aren't compiling our JavaScript into binary, but the term gets thrown around a lot, which can lead to confusion. All that said, we are closing in on the age of adoption of WebAssembly and asm.js, which promise to bring bytecode to the browser and with it some interesting possibilities, but alas... That's a story for another day ;)
You're right when you say these front-end JavaScript tools don't use the word compile in the same sense you're used to from build tools for languages like C/C++. C/C++ compilers turn source code into machine code.
These JavaScript build tools, like Webpack, use the word compile in a sense that's more metaphorical than conventional.
When web build tools use the word compile, they mean that they are transpiling, minifying (a.k.a. uglifying), and bundling the source files so they are better optimized for client browsers and network requests. (Smaller file sizes, better browser compatibility, fewer HTTP requests thanks to bundled assets, etc.)

Headless single-source library for JVM and JavaScript

I need to write a library that can be compiled to Java-classes (to be more specific: Android) and JavaScript or TypeScript (modern Browser and Node.js).
The lib will deal with lists of objects containing a lot of numbers and has to calculate statistics and filter/manipulate the lists. No rocket science; dependencies can be bridged for each environment. No problems with decimal arithmetic. (=> The libs could be developed in TypeScript and Java, but nobody wants to maintain two semantically equal sources.)
I'm not afraid of learning a new language, but integration should be smooth (i.e. create a .jar with a standard Java interface and Java types, and a .js file for JavaScript/TypeScript without hundreds of kilobytes of runtime).
Could I choose Scala/Scala.js for this?
Would it work with Kotlin?
Has anybody managed a similar task successfully? What are the caveats?
Well, the basics are very normal for Scala/Scala.js these days -- many libraries cross-compile with no changes. The Scala.js compiler is highly optimized, and only includes code that is actually invoked, so the output is reasonably lean. (Unless you need bulky external dependencies, which the SJS compiler can't do much about.)
Managing the dependency differences will be some extra effort, if you need to deal with them differently on the two sides. This isn't terribly unusual for Scala/Scala.js, but does require that the project be structured for that. The documentation of CrossProject gets into the details.
But overall -- yeah, that's all fairly common at this point...

How to write an obfuscator for JavaScript code?

I want to write a program that scans JavaScript code and replaces variable names with short, meaningless identifiers without breaking the code.
I know YUI Compressor and Google's Closure Compiler can do this. I am wondering how I can implement it myself.
Is it necessary to build the abstract syntax tree? If not, how can I find the candidate variables for renaming?
Most modern javascript compressors are actually compilers. They parse javascript input into an abstract syntax tree, perform operations on the tree (some safe, some not) and then use the syntax tree to print out code. Both Uglify and Closure-Compiler are true compilers.
Implementing your own compiler is a large project and requires a great knowledge of computing theory. The dragon book is a great resource from which to get started.
You may be able to leverage existing work. I recommend starting from a non-optimizing compiler for reference.
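To give a feel for the "transform the tree, then print it" part, here is a toy sketch. It deliberately skips the hard part (parsing): the tree is built by hand, the node shapes are simplified and only loosely modeled on ESTree, and every helper name (`walk`, `renameVariables`, `print`, `id`, `lit`, `decl`) is made up for this example. A real tool would also have to track scopes and avoid clashing with globals.

```javascript
// Generate short names: a, b, ..., z, a1, b1, ...
function shortName(i) {
  var letter = String.fromCharCode(97 + (i % 26));
  var round = Math.floor(i / 26);
  return round === 0 ? letter : letter + round;
}

// Visit every object in the tree that has a string "type" field.
function walk(node, visit) {
  if (node === null || typeof node !== "object") return;
  if (Array.isArray(node)) {
    node.forEach(function (n) { walk(n, visit); });
    return;
  }
  if (typeof node.type === "string") visit(node);
  Object.keys(node).forEach(function (key) { walk(node[key], visit); });
}

// Pass 1 collects declared names; pass 2 rewrites matching identifiers.
// Mutates the tree in place. Assumes one flat scope and no shared nodes.
function renameVariables(program) {
  var mapping = Object.create(null);
  var count = 0;
  walk(program, function (node) {
    if (node.type === "VariableDeclaration" && !(node.id.name in mapping)) {
      mapping[node.id.name] = shortName(count++);
    }
  });
  walk(program, function (node) {
    if (node.type === "Identifier" && node.name in mapping) {
      node.name = mapping[node.name];
    }
  });
  return mapping;
}

// Print the tree back out as compact source text.
function print(node) {
  switch (node.type) {
    case "Program": return node.body.map(print).join("");
    case "VariableDeclaration": return "var " + print(node.id) + "=" + print(node.init) + ";";
    case "ReturnStatement": return "return " + print(node.argument) + ";";
    case "BinaryExpression": return "(" + print(node.left) + node.operator + print(node.right) + ")";
    case "Identifier": return node.name;
    case "Literal": return JSON.stringify(node.value);
    default: throw new Error("Unsupported node type: " + node.type);
  }
}

// Hand-built tree standing in for a parser's output (treated as a function body).
function id(name) { return { type: "Identifier", name: name }; }
function lit(value) { return { type: "Literal", value: value }; }
function decl(name, init) { return { type: "VariableDeclaration", id: id(name), init: init }; }

var program = {
  type: "Program",
  body: [
    decl("price", lit(20)),
    decl("quantity", lit(3)),
    decl("total", { type: "BinaryExpression", operator: "*", left: id("price"), right: id("quantity") }),
    { type: "ReturnStatement", argument: id("total") }
  ]
};

var before = print(program);
renameVariables(program);
var after = print(program);
console.log(before);
console.log(after);
```

Working on the tree rather than on raw text is what makes the rename safe here: only Identifier nodes are touched, so the same characters inside a string literal or a longer name are never rewritten by accident.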
I made http://www.whak.ca for whacking scripts into unreadable obfuscation, with over 75 algorithms ready for you to obfuscate your code. There are also 20 JavaScript compression packers at http://www.whak.ca/packer/ that will obfuscate your code. All of these can be reverse engineered if someone wants your code badly enough. But people can pick locks, yet we still lock our doors...

Pre-Compile JavaScript from VS2008 IDE

Is there a way for me to pre-compile my JS code when building my solution? I would like to be made aware of common problems before I get to the browser. Ideally I would build the sln and, if necessary, have a plugin or call from the build events examine the js code against a Java compiler.
Thank you very much in advance!
Despite what many of the other posters have said, in many cases (including the Mozilla SpiderMonkey engine found in the Firefox browser) JavaScript is in fact compiled into bytecode, vaguely similar to (but not compatible with) the bytecode used in Java. You just don't see the compiler's output because it is never available to you, only to the JavaScript bytecode interpreter. As far as I know, it's also not possible to save the compiled bytecode for reuse, at least in the web browser context. In other uses of the SpiderMonkey engine I do think it is possible to keep the compiled bytecode in memory for reuse, but not in a form that can be saved to disk for later use.
I use a JavaScript shell, JSDB, which also uses the SpiderMonkey engine; when you load a file, it complains about syntax errors before it runs even one line of code. This is not the same type of compilation as Java's, though. JavaScript is a loosely typed language, so it won't catch problems the way a Java compiler would (e.g. complain about every last thing under the sun that it knows you haven't done right).
Having said that, I would second JSLint, as it would probably catch many of your errors.
As a side note, the Rhino project lets you compile JavaScript into Java classes; I've never tried this, but it sounds interesting.
JavaScript is an interpreted language; it's not compiled until runtime. Also, JavaScript has very, very little to do with Java. Netscape released JavaScript around the same time Sun released Java, and there was some sort of marketing deal between them. Otherwise, they're unrelated.
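The "parse first, complain before running anything" behaviour can be reproduced in plain JavaScript. Here is a minimal sketch, assuming an environment where the `Function` constructor is allowed (a strict Content Security Policy in a browser may block it). The helper name `checkSyntax` is made up for this example and has nothing to do with JSDB or JSLint.

```javascript
// Returns null if the source parses, or the SyntaxError message if it does not.
// new Function(source) parses the text as a function body but never calls it,
// so none of the checked code is executed.
function checkSyntax(source) {
  try {
    new Function(source);
    return null;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return err.message;
    }
    throw err; // anything else is unexpected, so let it surface
  }
}

console.log(checkSyntax("var total = 1 + 2;"));              // parses fine
console.log(checkSyntax("var total = 1 + ;"));               // reports a syntax error
console.log(checkSyntax('throw new Error("never runs");'));  // parses fine, is not run
```

As noted above, this only catches syntax errors. A misspelled method name or a call on the wrong kind of object still parses cleanly, which is where a linter helps. Because the text is parsed as a function body, module-level `import`/`export` statements are rejected and a top-level `return` is accepted.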
I apologize for my misuse of the term 'compile'. I do fully understand the difference between compiled and interpreted languages. What I am interested in is to have my syntax checked routinely during a build so typos, invalid method calls and the like are flagged. I'm going to look into what Jason S recommended for this. I am also fully aware that JavaScript is not Java, but have read before that you could run your JS code through a Java Compiler for syntax checking. I was hoping to find something better integrated with VS.
Thank you very much to everyone understanding the intent of my request.
Javascript isn't compiled, has nothing to do with Java but some shared syntax, and is best tested by loading your app into a browser.
There are some JS testing frameworks/tools available but I couldn't recommend any of them myself.
Try JSLint.VS
