Semi-obfuscate/uglify JavaScript

I know about JS minifiers, uglifiers and obfuscators. I was wondering if there is any existing tool (or any fast-to-code solution) to partially obfuscate JavaScript. By partially I mean that it should become difficult to read, but not appear uglified/minified. It should keep indentation but lose comments, and partially change variable names, making them unclear without reducing them to "a, b, c" the way a full obfuscator does.
The purpose of this could be to take explicit, reusable code and make it implicit and difficult for other people to reuse, without making it impossible for yourself to work with.
Any idea where to start to achieve this? Maybe by editing an existing obfuscator?

[This answer is a direct response to OP's request].
Semantic Designs JavaScript obfuscator will do what you want, but you'll need two passes.
On the first pass, run it as an obfuscator; it will rename identifiers (although you can control how much or how that is done) and strip whitespace and comments. If you limit its ability to rename identifiers, you lose some of the strength of the obfuscator, but that's your choice.
On the second pass, run it as a prettyprinter; it will introduce nice indentation again.
(In fact, the idea for obfuscation came from building a prettyprinter; if you can print pretty, surely it is easy to print ugly.)
From the point of view of working with the code, you are better off working with your master copy any way you like, complete with your indentation and nice commentary as documentation. When you are ready to obfuscate, you run the obfuscator and ship the obfuscated result. Errors reported in the obfuscated result that involve obfuscated names can be mapped back to the original names, using the map of obfuscated <--> original names produced during the obfuscation step.
This is a product of my company. I'd provide a link, but SO hates it when I do that, so you'll have to find it via my bio or by googling.
PS: It works exactly as @georg suggests, by parsing to an AST, mangling, and prettyprinting. It doesn't use esprima.

I'm not aware of a tool that meets your specific requirements, but it seems relatively easy to create, given that the vital parts already exist (see the sketch after these steps):
parse the source into an AST, using esprima or similar
manipulate the tree in the way you want (e.g. remove comments, mangle identifiers, etc.)
rebuild the source from the tree using escodegen
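A minimal sketch of that pipeline (assuming esprima 4's parseScript and escodegen's generate; the vowel-stripping rename and all helper names here are illustrations only, and a real tool would also need scope analysis, e.g. via escope, to avoid renaming globals and object properties):

const esprima = require('esprima');
const escodegen = require('escodegen');

// Make a name unclear without reducing it to "a, b, c":
// strip vowels and append the original length to limit collisions.
function mangle(name) {
  const stripped = name.replace(/[aeiou]/gi, '');
  return (stripped || name[0]) + name.length;
}

// Generic AST walk: visit every child that looks like a node.
function walk(node, visit) {
  visit(node);
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) {
      child.forEach(c => c && typeof c.type === 'string' && walk(c, visit));
    } else if (child && typeof child.type === 'string') {
      walk(child, visit);
    }
  }
}

function semiObfuscate(source) {
  // Comments are dropped simply by not asking esprima to keep them.
  const ast = esprima.parseScript(source);
  walk(ast, node => {
    if (node.type === 'Identifier') node.name = mangle(node.name);
  });
  // escodegen reintroduces consistent indentation, so the output keeps
  // a readable shape while the naming becomes opaque.
  return escodegen.generate(ast, { format: { indent: { style: '    ' } } });
}

console.log(semiObfuscate('function addTotal(price, quantity) { return price * quantity; }'));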

Related

Coverage-based fuzzing of JavaScript applications

American Fuzzy Lop (AFL) and the conceptually related LLVM libFuzzer not only generate random inputs; they also watch branch coverage of the code under test and use genetic algorithms to try to cover as many branches as possible. This increases the hit frequency of the more interesting code further downstream, since otherwise most of the generated inputs would be stopped early in some deserialization or validation layer.
But those tools work at the native-code level, which is not useful for JavaScript applications: they would end up covering the interpreter, but not really the interpreted code.
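To make the coverage-guidance idea concrete, here is a toy loop in plain JavaScript. The hand-instrumented target and the mutation alphabet are illustrative only; real tools obtain coverage from automatic instrumentation (e.g. istanbul/nyc) or from the engine itself:

// Branch-coverage probes: the target records every branch id it reaches.
const hits = new Set();
const cov = id => hits.add(id);

// Hypothetical target: a tiny parser whose branches we want to explore.
function target(input) {
  if (input.startsWith('{')) {
    cov('obj');
    if (input.endsWith('}')) cov('obj-closed');
  } else if (input.startsWith('[')) {
    cov('arr');
  }
}

// Mutation: insert one random character from a small alphabet.
function mutate(s) {
  const alphabet = '{}[]"x,';
  const i = Math.floor(Math.random() * (s.length + 1));
  return s.slice(0, i) + alphabet[Math.floor(Math.random() * alphabet.length)] + s.slice(i);
}

// The genetic part: keep any input that reached a branch we had not seen.
const corpus = [''];
for (let round = 0; round < 10000; round++) {
  const parent = corpus[Math.floor(Math.random() * corpus.length)];
  const child = mutate(parent);
  const before = hits.size;
  try { target(child); } catch (e) { console.log('crash on:', JSON.stringify(child)); }
  if (hits.size > before) corpus.push(child);
}
console.log('corpus:', corpus, 'branches:', [...hits]);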
So is there a way to fuzz JavaScript (preferably in browser, but tests running in node.js would help too) with coverage guidance?
I looked at the tools mentioned in this old question, but those that handle JavaScript don't seem to mention anything about coverage profiling. And while radamsa mentions optionally pairing it with coverage analysis, I haven't found any documentation on how to actually do that.
How can one fuzz-test a JavaScript (in-browser) application with coverage guidance?
Fuzzing JavaScript engines draws a lot of attention, as the number of browser users is about 4 billion. Several works have been done to find bugs in JS engines, including popular large engines, e.g. V8, WebKit, ChakraCore, Gecko, and some small embedded engines, like JerryScript, QuickJS, jsish, mjs, mujs.
It is really difficult to find bugs using AFL, as the mutation mechanisms provided by AFL are not practical for JS files; a bitflip, for example, can hardly produce a valid mutation. Since JS is a structured language, several works use the ECMAScript grammar to mutate/generate JS files (seeds):
LangFuzz parses sample JS files and splits them into code fragments. It then recombines the fragments to produce test cases.
jsfunfuzz randomly generates syntactically valid JS statements from a JS grammar manually written for fuzzing.
Dharma is a generation-based, context-free grammar fuzzer, generating files based on given grammar.
Superion extends AFL using tree-based mutation guided by JS grammar.
The above works can easily pass syntax checks but fail at semantic checks: a lot of the generated seeds are semantically invalid.
CodeAlchemist uses a semantics-aware approach to generate code segments based on a static type analysis.
There are two levels of bugs in JS engines: simple parser/interpreter bugs and bugs deep inside the logic. The recent trend is that the number of simple bugs is decreasing while more and more deep bugs surface.
DIE uses aspect-preserving mutation to preserve the desirable properties of existing proof-of-concept inputs for CVEs. It also uses type analysis to generate semantically valid test cases.
Some works focus on mutating intermediate representations.
Fuzzilli is a coverage-guided fuzzer that mutates at the level of an intermediate representation (IR). Mutations on the IR can guarantee semantic validity and can be translated back to JS.
Fuzzing JS is an interesting and hot topic, judging by the top security/SE conferences of recent years. Hope this information is helpful.

How would I create a Text to Html parser? [duplicate]

Edit: I recently learned about a project called CommonMark, which correctly identifies and deals with the ambiguities in the original Markdown specification: http://commonmark.org/. It has great C# library support.
You can find the syntax here.
The source that ships with the download is written in Perl, which I have no intention of honoring. It is riddled with regular expressions, and it relies on MD5 hashes to escape certain characters. Something is just wrong about that!
I'm about to hard-code a parser for Markdown. What is your experience with this?
If you don't have anything meaningful to say about the actual parsing of Markdown, spare me the time. (This might sound harsh, but yes, I'm looking for insight, not a solution, that is, not a third-party library.)
To help a bit with the answers: regular expressions are meant to identify patterns, NOT to parse an entire grammar. That people consider doing so is foobar.
If you think about Markdown, it's fundamentally based around the concept of paragraphs.
As such, a reasonable approach might be to split the input into paragraphs.
There are many kinds of paragraphs, for example, heading, text, list, blockquote, and code.
The challenge is thus to identify these paragraphs and in what context they occur.
I'll be back with a solution, once I find it worthy of being shared.
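In the meantime, a minimal sketch of that paragraph-splitting idea (assuming blank-line-separated paragraphs; the classification rules are illustrative, not a full Markdown grammar):

// Split on blank lines, then classify each paragraph by its first characters.
function classifyParagraph(p) {
  if (/^#{1,6}\s/.test(p)) return 'heading';
  if (/^(\s{4}|\t)/.test(p)) return 'code';
  if (/^>/.test(p)) return 'blockquote';
  if (/^([*+-]|\d+\.)\s/.test(p)) return 'list';
  return 'text';
}

function splitParagraphs(markdown) {
  return markdown
    .split(/\n{2,}/)
    .map(p => ({ kind: classifyParagraph(p), text: p }));
}

console.log(splitParagraphs('# Title\n\nSome text.\n\n> a quote'));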
The only Markdown implementation I know of that uses an actual parser is John MacFarlane's peg-markdown. Its parser is based on a Parsing Expression Grammar parser generator called peg.
EDIT: Mauricio Fernandez recently released his Simple Markup Markdown parser, which he wrote as part of his OcsiBlog weblog engine. Because the parser is written in OCaml, it is extremely simple and short (268 SLOC for the parser, 43 SLOC for the HTML emitter), yet blazingly fast (20% faster than discount, which is written in hand-optimized C, and six hundred times faster than BlueCloth, in Ruby), despite the fact that it isn't even optimized for performance yet. Because it is only intended for internal use by Mauricio himself for his weblog, there are a few deviations from the official Markdown specification, but Mauricio has created a branch which reverts most of those changes.
I released a new parser-based Markdown Java implementation last week, called pegdown.
pegdown uses a PEG parser to first build an abstract syntax tree, which is subsequently written out to HTML. As such, it is quite clean and much easier to read, maintain and extend than a regex-based approach.
The PEG grammar is based on John MacFarlane's C implementation, peg-markdown.
Maybe something of interest to you...
If I were to try to parse Markdown (and its extension, Markdown Extra), I think I would use a state machine and parse it one character at a time, linking together internal structures representing bits of text as I go along; then, once everything is parsed, I'd generate the output from the objects all strung together.
Basically, I'd build a mini-DOM-like tree as I read the input file.
To generate output, I would just traverse the tree and emit HTML or anything else (PS, LaTeX, RTF, ...).
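For instance, a minimal emit pass over such a tree might look like this (the node shape is an assumption, just to illustrate the traversal):

// Walk a mini-DOM of the kind described above and emit HTML.
function emitHtml(node) {
  switch (node.kind) {
    case 'doc':     return node.children.map(emitHtml).join('\n');
    case 'heading': return '<h' + node.level + '>' + node.text + '</h' + node.level + '>';
    case 'para':    return '<p>' + node.text + '</p>';
    default:        return '';
  }
}

console.log(emitHtml({ kind: 'doc', children: [
  { kind: 'heading', level: 1, text: 'Title' },
  { kind: 'para', text: 'Some text.' },
] }));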
Things that can increase complexity:
The fact that you can mix HTML and Markdown, although the rule could be easy to implement: just ignore anything between two balanced tags and output it verbatim.
URLs and notes can have their references at the bottom of the text. Using data structures for hyperlinks, one could simply record something like the following (a resolution sketch follows the structure):
[my text to a link][linkkey]
results in a structure like:
URLStructure:
| InnerText : "my text to a link"
| Key : "linkkey"
| URL : <null>
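The <null> URL would then be filled in by a second pass, once the reference definitions at the bottom have been collected; a sketch (names assumed):

// Resolve deferred link references against the collected definitions.
function resolveLinks(links, definitions) {
  for (const link of links) {
    if (link.url === null) link.url = definitions[link.key.toLowerCase()] || '#';
  }
}

const links = [{ innerText: 'my text to a link', key: 'linkkey', url: null }];
resolveLinks(links, { linkkey: 'http://example.com/' });
console.log(links[0].url); // http://example.com/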
Headers can be defined with an underline, which could force us to use a simple data structure for a generic paragraph and modify its properties as we read the file (see the sketch after this structure):
ParagraphStructure:
| InnerText : the current paragraph text
| (beginning of line until end of line).
| HeadingLevel : <null> or 1-4 when we can assess
| that paragraph heading level, if any.
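A sketch of that property-upgrading step, assuming a list of paragraph objects shaped like the structure above (with = underlines mapping to level 1 and - to level 2, as in standard Markdown):

// Detect a setext-style underline and upgrade the paragraph to a heading.
function applyUnderlines(paragraphs) {
  for (const p of paragraphs) {
    const m = p.innerText.match(/^(.+)\n(=+|-+)\s*$/);
    if (m) {
      p.innerText = m[1];
      p.headingLevel = m[2][0] === '=' ? 1 : 2;
    }
  }
  return paragraphs;
}

console.log(applyUnderlines([{ innerText: 'A Title\n=======', headingLevel: null }]));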
Anyway, just some thoughts.
I'm sure there are many small details to take care of, and I'm pretty sure regexes could come in handy during the process.
After all, they were meant to process text.
I'd probably read the syntax specification enough times to know it, and get a feel for how to parse it.
Reading the existing parser code is of course brilliant, both to see what seems to be the main source of complexity and whether any special clever tricks are being used. The use of MD5 checksumming seems a bit weird, but I haven't studied the code enough to understand why it's done. A comment in a routine called _EscapeSpecialChars() states:
We're replacing each such character with its corresponding MD5 checksum value;
this is likely overkill, but it should prevent us from colliding with the escape
values by accident.
Replacing a single character by a full MD5 does seem extravagant, but perhaps it really makes sense.
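For reference, the trick itself does not depend on MD5: any token that cannot collide with user input will do. A sketch of the same idea with counter-based placeholders (the function names are assumptions, not Markdown.pl's):

let counter = 0;
const stash = new Map();

// Replace backslash-escaped characters with opaque placeholders so that
// later regex passes cannot misinterpret them.
function escapeSpecialChars(text) {
  return text.replace(/\\([\\`*_{}\[\]()#+\-.!])/g, (_, ch) => {
    const token = '\x1AESC' + (counter++) + '\x1A';
    stash.set(token, ch);
    return token;
  });
}

// After all other passes have run, swap the placeholders back.
function unescapePlaceholders(html) {
  for (const [token, ch] of stash) html = html.split(token).join(ch);
  return html;
}

console.log(unescapePlaceholders(escapeSpecialChars('literal \\* star'))); // literal * star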
Of course, it'd be clever to consider creating a "true" syntax for a tool such as Flex, to get out of the regex bog.
If Perl isn't your thing, there are Markdown implementations in at least 10 other languages. They probably don't all have 100% compatibility, but tend to be pretty close.
MarkdownPapers is another Java implementation whose parser is defined in a JavaCC grammar.
If you are using a programming language that has more than three other users, you should be able to find a library to parse it for you. A quick Google-ing reveals libraries for CL, Haskell, Python, JavaScript, Ruby, and so on. It is highly unlikely that you will need to reinvent this wheel.
If you really have to write it from scratch, I recommend writing a proper parser. With this technique, you won't have to escape things with MD5 hashes. (I agree that if you have to do something like this, it's time to reconsider your design.)
There are libraries available in a number of languages, including PHP, Ruby, Java, C#, and JavaScript. I'd suggest looking at some of these for ideas.
It depends on which language you wish to use; there will be idiomatic and non-idiomatic ways to implement it.
Regexes work in Perl, because Perl and regex are best friends.
Markdown is a JAWL (just another wiki language).
There are plenty of open-source wikis out there whose parser code you can examine. Most use regex.
Check out the ScrewTurn wiki; it has an interesting multi-pass formatter pipeline, a very nice technique - see /core/Formatter.cs and /core/FormatterPipeline.cs.
Best is to use/join an existing project; these sorts of things are always much harder than they appear.
Here you can find a JavaScript implementation of Markdown. It also relies heavily on regular expressions, as this is just the fastest and easiest way to parse the text.
But it spares you the MD5 part.
I cannot help directly with the coding of the parsing, but maybe this link can help you one way or another.

Javascript library to manage translation forms

Is anybody aware of any JavaScript tool (compatible with jQuery, TinyMCE or any other client-side library) able to manage the following requirements?
I need to show translation forms in which every field (either input or textarea) can contain segment variables or code sections (mostly HTML).
For example:
"Hello {{firstname}}, this is your personal page."
or
"You improved your personal score of <strong>{{n}} points</strong>."
Of course I obtain these segments from a template parser, and I need to show them to a set of translators who will perform localization into many languages. I know that in many cases I can (and should!) avoid variables and code inside translation segments, but in many other cases I really can't.
The problem is: I would like to enforce coherence of variables and code directly in the browser (I trust my translators, but a bit more UI/UX help is always a good thing!).
A nice approach could be to provide the set of variables and code tags ready to be inserted with a single click (to avoid misspelled variables or incorrect code syntax), plus a bit of pre-submit validation to be sure everything was inserted; a sketch of such a check follows.
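Not a full library, but the pre-submit check is small enough to sketch (the {{...}} placeholder syntax is taken from the examples above):

// Make sure the translation still contains exactly the source's placeholders.
function placeholders(segment) {
  return (segment.match(/\{\{\s*\w+\s*\}\}/g) || []).sort();
}

function sameVariables(source, translation) {
  const a = placeholders(source), b = placeholders(translation);
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

console.log(sameVariables(
  'Hello {{firstname}}, this is your personal page.',
  'Bonjour {{firstname}}, voici votre page personnelle.')); // true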
I've seen this approach in other websites, such as Facebook or Freelancer.com (who have the power and the ability to reimplement the whole thing from scratch!).
Do you know about any almost-ready tool/library for this purpose?
Thank you all in advance for any suggestion.
If you are asking for a library to translate text - here is the Google Translate API: https://developers.google.com/translate/?csw=1
If you are asking for a library which can take user input, perform validation, and insert into the DOM - then jQuery has everything you need.
If you are asking for something else, let me know and I'll edit my answer.

Jquery/Javascript solution for converting wiki-text to HTML and vice versa?

For my web front end I have to implement subsets of the wiki syntax in my system. Do I need to manually specify rules and reinvent the wheel? Is there an existing JavaScript library or jQuery plugin that could help with this?
For example, a user enters == Header ==. This needs to get converted to a medium header (assuming medium is defined in this context as a span, as below):
<span class="mediumHeader" id="Header">Header</span>
Now when the user edits the above text, I'm guessing it'll involve replacing the <span...> ... </span> with ==...==.
Now, for every system I design, this will be as per 'my rules', and I will almost always have to reinvent the wheel. Is there something I could use to ease this wiki-to/from-HTML transformation using jQuery/JavaScript? I'm sure it's a problem with a known solution.
I would prefer to customize what's acceptable and what isn't, i.e. I don't want everything to be translated into wiki syntax (or HTML), only subsets of it. Should I just roll my own for my application?
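For the specific header rule above, a hand-rolled round trip is tiny (the id-from-text convention is an assumption; a fuller subset would still deserve a real parser):

// Wiki -> HTML for == Header == lines only.
function wikiToHtml(text) {
  return text.replace(/^==\s*(.+?)\s*==$/gm,
    (_, h) => '<span class="mediumHeader" id="' + h.replace(/\s+/g, '_') + '">' + h + '</span>');
}

// HTML -> wiki, the inverse of the rule above.
function htmlToWiki(html) {
  return html.replace(/<span class="mediumHeader"[^>]*>([\s\S]*?)<\/span>/g, '== $1 ==');
}

console.log(wikiToHtml('== Header =='));             // <span ...>Header</span>
console.log(htmlToWiki(wikiToHtml('== Header =='))); // == Header ==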
It's been long enough that you may not need this, but yours was the top SO hit when I started looking into it.
There are a couple of JavaScript options - you're probably looking at instaview (check out test/test.js), or maybe Wiky.js (the less fully documented of the two).
If you aren't limited to JavaScript, check out the exhaustive list of MediaWiki parsers at http://www.mediawiki.org/wiki/Alternative_parsers - lots of tools for C++, Java, Perl, Ruby, and more. That's the link to watch for new developments.
At the time of writing, Parsoid seems to be the only one that translates in both directions. It also powers the visual editor on Wikipedia. But it is no handy client-side lib to include in your app; it's a full-blown parsing and transformation server suite. A production version of Parsoid on the Wikimedia cluster can be accessed at http://parsoid-lb.eqiad.wikimedia.org/.
Other JavaScript libraries that translate from wikitext to HTML only (ordered by popularity) are:
Wiky.js - doesn't support full wikitext syntax. (By tanin47; not to be confused with Wiki.js from Requarks - a completely different project.)
wtf_wikipedia - doesn't translate directly to HTML but to JSON, which enables much more powerful possibilities (e.g. info-boxes as key-value pairs). This is the most up-to-date library and "its a combination of instaview, txtwiki, and uses the inter-language data from Parsoid javascript parser."
instaview - no updates in the last 2 years.
Also check out the current and full list of alternative MediaWiki parsers.

Syntax / Logical checker In Javascript?

I'm building a solution for a client which allows them to create very basic code.
I've done some basic syntax validation, but I'm stuck at variable verification.
I know JSLint does this using JavaScript, and I was wondering if anyone knew of a good way to do this.
So for example, say the user wrote the code:
moose = "barry"
base = 0
if(moose == "barry"){base += 100}
Then I'm trying to find a way to verify that the "if" expression is in the correct syntax, that the variable moose has been initialized, etc.,
but I want to do this without scanning character by character;
the code is a mini-language built just for this application, so it is very basic and doesn't need to manage memory or anything like that.
I had thought about splitting first by carriage return and then by space, but there is nothing to say the user won't write something like moose="barry" or if(moose=="barry"),
and there is nothing to say the user won't keep the result of a condition inline.
Obviously compilers and interpreters do this on a much more extensive scale, but I'm not sure whether they do it character by character, and if they do, how they have optimized it.
(The other option is that I could send it back to PHP to process, which would relieve the browser of the responsibility.)
Any suggestions?
Thanks
The use case is limited and the syntax will never be extended in this case; the language is a simple scripting language to enable the client to create a unique cost based on their users' input. The end result will be processed by PHP regardless, to ensure the calculation can't be adjusted by the end user and to ensure some consistency.
So for example, say there is a base cost of £1.00,
and there is a field on the form called "Additional Cost"; the language will allow them to manipulate the base cost relative to the "Additional Cost" field.
So
base = 1;
if(additional > 100 && additional < 150){base += 50}
elseif(additional == 150){base *= 150}
else{base += additional;}
This is a basic example of how the language would be used.
Thank you for all your answers.
I've investigated a parser, and creating one would be far more complex than is required.
Having run several tests with thousands of lines of code, I found that character by character it only takes a few seconds to process, even on a single-core P4 with 512 MB of memory (which is far less than the customer uses).
I've decided to build a PHP-based syntax checker, which will check the information and convert the variables etc. into valid PHP code while it's checking (so that it's ready to be called later without recompilation). Using PHP instead of JavaScript seems more appropriate and will allow more complex code to arise without hindering the validation process.
It's only taken an hour, and I have code which is able to check the validity of an if statement and isn't confused by nested ifs, spaces or odd expressions; there is very little left to check, whereas a parser and full-blown scripting language would have taken a lot longer.
You've all given me a lot to think about, and I've rated the relevant answers. Thank you.
If you really want to do this — and by that I mean if you really want your software to work properly and predictably, without a bunch of weird "don't do this" special cases — you're going to have to write a real parser for your language. Once you have that, you can transform any program in your language into a data structure. With that data structure you'll be able to conduct all sorts of analyses of the code, including procedures that at least used to be called use-definition and definition-use chain analysis.
If you concoct a "programming language" that enables some scripting in an application, then no matter how trivial you think it is, somebody will eventually write a shockingly large program with it.
I don't know of any readily available parser generators that generate JavaScript parsers. Recursive-descent parsers are not too hard to write, but they can get ugly to maintain, and they make it a little difficult to extend the syntax (especially if you're not very experienced crafting the original version).
You might want to look at JS/CC, which is a parser generator that generates a parser for a grammar, in JavaScript. You will need to figure out how to describe your language using BNF or EBNF. Also, JS/CC has its own syntax (which is somewhat close to actual BNF/EBNF) for specifying the grammar. Given the grammar, JS/CC will generate a parser for it.
Your other option, as Pointy said, is to write your own lexer and recursive-descent parser from scratch. Once you have a BNF/EBNF, it's not that hard. I recently wrote a parser from an EBNF in JavaScript (the grammar was pretty simple, so it wasn't that hard to write one; YMMV).
To address your comments about it being "client specific": I will also add my own experience here. If you're providing a scripting language and a scripting environment, there is no better route than an actual parser.
Handling special cases through a bunch of if-elses is going to be horribly painful and a maintenance nightmare. When I was a freshman in college, I tried to write my own language. This was before I knew anything about recursive-descent parsers, or just parsers in general. I figured out by myself that code can be broken down into tokens. From there, I wrote an extremely unwieldy parser using a bunch of if-elses, and also splitting the tokens by spaces and other characters (exactly what you described). The end result was terrible.
Once I read about recursive-descent parsers, I wrote a grammar for my language and easily created a parser in a tenth of the time it took me to write my original parser. Seriously, if you want to save yourself a lot of pain, write an actual parser. If you go down your current route, you're going to be fixing issues forever. You're going to have to handle cases where people put a space in the wrong place, or perhaps have one too many (or one too few) spaces. The only other alternative is to impose an extremely rigid structure (i.e. you must have exactly x number of spaces following this statement), which is liable to make your scripting environment extremely unattractive. An actual parser will automatically fix all these problems.
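To show how small such a parser can be for a language like the one in the question, here is a sketch of a tokenizer plus recursive-descent validator that also tracks variable initialization (the grammar is an assumption based on the examples above; && and || in conditions are left out for brevity):

// Tokenize: multi-char operators first, then numbers, words, strings, symbols.
function tokenize(src) {
  const re = /\s*(==|!=|>=|<=|\+=|-=|\*=|\d+|\w+|"[^"]*"|\S)/g;
  const out = [];
  for (let m; (m = re.exec(src)); ) out.push(m[1]);
  return out;
}

function validate(src) {
  const t = tokenize(src), vars = new Set();
  let i = 0;
  const peek = () => t[i];
  const eat = tok => {
    if (t[i] !== tok) throw new Error('expected "' + tok + '", got "' + t[i] + '"');
    i++;
  };
  const ident = () => {
    if (!/^[A-Za-z_]\w*$/.test(t[i])) throw new Error('bad identifier "' + t[i] + '"');
    return t[i++];
  };
  const value = () => {                        // number, string, or known variable
    if (/^(\d+|"[^"]*")$/.test(t[i])) { i++; return; }
    const name = ident();
    if (!vars.has(name)) throw new Error('"' + name + '" used before assignment');
  };
  const condition = () => {                    // value comparison value
    value();
    if (!['==', '!=', '>', '<', '>=', '<='].includes(t[i]))
      throw new Error('bad comparison "' + t[i] + '"');
    i++;
    value();
  };
  const statement = () => {
    if (peek() === 'if') {                     // if ( condition ) { statements }
      eat('if'); eat('('); condition(); eat(')'); eat('{');
      while (peek() !== '}') statement();
      eat('}');
    } else {                                   // assignment: name op value
      const name = ident();
      if (!['=', '+=', '-=', '*='].includes(t[i]))
        throw new Error('bad assignment operator "' + t[i] + '"');
      const op = t[i++];
      if (op !== '=' && !vars.has(name)) throw new Error('"' + name + '" used before assignment');
      value();
      vars.add(name);
      if (peek() === ';') eat(';');
    }
  };
  while (i < t.length) statement();
  return true;
}

console.log(validate('moose = "barry" base = 0 if(moose == "barry"){base += 100}')); // true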
JavaScript has a function 'eval'.
var code = 'alert(1);';
eval(code);
It will show an alert. You can use 'eval' to execute basic code.
