Special characters (and MooTools) are ruining my life

Special characters (and MooTools) are ruining my life - javascript

I'm working on localization for my toolkit.
My goal is that if you were a German web developer and you wanted to use a forEach loop, rather then type ['hey', 'there'].forEach(function () {}); they could type ['hey', 'there'].fürJeder(function () {});
I have all the words stored in an object at $.i18n.de.
In my JavaScript file I have
de: {
extend: 'verlänger',
forEach: 'fürJeder'
}
but when I go into the object to get the words they turn into verlÃ¤nger and fÃ¼rJeder.
I have no idea why.
Some details:
I'm on a MacBook Pro running 10.6.7
I'm using Kod as my editor.
I'm using Google Chrome as my browser.
I'm using Option + U + letter to type the ä and ü.
My question: How do I get the browser to handle these correctly?
I've tried using backslashes before them but it stays the same.
EDIT: Screw it. I just found out that the people who inspired me to do this did it as an April Fool's day joke. I really should've clicked on some of those links. It would've saved me 2 hours of trying to set up an API.

Turns out that this is a really bad idea to try and do.
Programming languages are almost exclusively written in English (JavaScript being one of those) which means that even if you write your program in a different language, keywords like return, var, function are still going to be in English and you're still going to have to use them which would get confusing when using functions, constants etc. that have non-English names.
The best solution is to just avoid using non-latin characters in variable names all together.
Even thought it works in most modern browsers, it makes your code harder to write and more confusing.
Leave the coding to the English speakers.

Although I don't get the point of that (especially for german people generally comfortable with english and sharing the same germanic linguistic origin), the simplest workaround to avoid encoding issues is to replace those special characters with their latin counterparts:
fürJeder -> fuerJeder
verlänger -> verlaenger
Uncommon spelling but still correct

Related

JavaScript equivalent of C#'s Char.IsSymbol

I'm trying to strip all 'Unicode Symbols' from a string. That is, keeping all multilingual characters but removing dingbats, arrows, and all of that stuff.
C# has a very handy function called Char.IsSymbol that can be run on all characters of a string, stripping the character when the functions returns true.
I've been searching on doing something similar in JavaScript. If it's a regex then how can I compile a list of all the unicode ranges of the symbol characters? I looked at XRegExp but couldn't find something that only filters symbols.

XRegExp does have support for what you're looking for - http://xregexp.com/plugins/#unicode
You'd probably match either for \pL or \pS. You can find a nice list of the typical unicode categories in http://www.regular-expressions.info/unicode.html#category
Overall, Unicode is quite tricky. It gives plenty of opportunities for giving you trouble, especially with software that isn't fully Unicode compatible (sadly, this includes JavaScript - see https://mathiasbynens.be/notes/javascript-unicode for a nice set of example). This is further exacerbated by the fact that JS often runs with double-encoding (HTML+JS, and there's worse cases as well). Somebody will probably find a way to bypass your checks, but I'm afraid there's no easy way to prevent that. Just be on the lookout :)

Native options for obscuring/encrypting a string?

As an exercise, I've been working on replicating this game. In case it becomes inaccessible, the premise of the game is to take a quote that's been scrambled by swapping pairs of letters (eg replace A with M and vice versa), and unscramble it to its original arrangement.
As I'm studying this game, I realize it's almost trivial to extract the solution from the source - there are any number of breakpoints you can place to access it. I've been trying to come up with a way to obscure the string in a way that it isn't immediately accessible, and the only thing I can think of is some kind of native obscuring function before the quote even has a chance to land in a variable. Something like this:
var litmus, quotes = [
"String One",
"String Two",
....
"String n",
];
litmus = obscureString(quotes[Math.floor(Math.random()*(n-1))]);
This way the user can't summon up the raw quote, or even the random integer that was used - they're gone by the time the breakpoint hits.
My question is this: is there any kind of native function that would fit the role of obscureString() in the above example, even loosely? I'm aware JavaScript doesn't have any native encryption/hash methods, and any libraries that provide that functionality just provide a chance to drop a breakpoint. Thus, I'm hoping someone here can come up with a creative way to natively obscure a string, if it's even possible in JS.

Been crunching on it for a while, and I found a very makeshift solution.
The only native (read: non-user-corruptible) transformation/hash function I was able to find was window.btoa. It does exactly what I need, in letting me obscure a string before the user ever has a chance to get their hands on it. The problem, however, is that it has a counterpart window.atob, whose only purpose is to reverse the process.
To solve that, I was able to neutralize window.atob with the following line of code, essentially making window.btoa a one-way trip:
window.atob = function(f){ return f; };
Don't make a habit of this.
This is horrific practice, and I feel dirty for writing it. It's passable in this case because my application is small, self-contained, and won't ever need to rely on that function elsewhere - but I can't in good conscience recommend this as a general solution. Many browsers won't even let you override native functions in the first place.
Just wanted to post the answer in case someone found themselves in a similar situation needing a similar answer - this may be the closest we can get to a one-way native hash function for now.

How to obfuscate variable names in JavaScript?

I'm an artist that's written a simple game in Javascript. Yah! But go easy on me because I bruise like a peach!
I'm looking into difficult to cheat at the game. So code obfuscation will make it difficult to cheat, right? Difficult, not impossible. I realise that, and could accidentally open a can of worms here...
Essentially, I'm Looking for an online tool that renames variables; and don't say search and replace in textpad :).
For example using http://packer.50x.eu/ on one line of code
var loopCounter = 0;
we get the result:
eval(function(p,a,c,k,e,d){e=function(c){return c};if(!''.replace(/^/,String)){while(c--){d[c]=k[c]||c}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('1 2=0;',3,3,'|var|loopCounter'.split('|'),0,{}))
The above looks like a mess, which is great; but it's quite easy to pick out English words like loopCounter. I would have expected it to make variable names obscure (single letter? words without nouns? look very similar?? Or should have that been my task anyway as part of writing the code. Or is this a waste of time trying to make variable names since a variable declaration is preceded by var and therefore there's no point to disguise it?

After a lot of searching (and links to the above) I found this which allows obfuscated string variables. And that is what I was after.

there are a few online tools available for this: javascript compressor and then theres javascript minifier that you can use for large images also. otherwise you could just google some offline tools, pretty sure they're easy to find

You could use the Javascript Obfuscator... your code will be very difficult to decode!
Hope it helps! ^_^

Is this safe? eval or new Function for simple arithmetic expression

I have heard so many bad things about eval that I've never even tried to use it. However today I have a situation where it seems to be the right answer.
I need a script that can do simple calculations by combining variables. For example, if value=5 and max=8, I want to evaluate value*100/max. Both the values and the formulas will be retrieved from external sources, which is why I am concerned with eval.
I have set up a jsfiddle demo with some sample code:
http://jsfiddle.net/6yzgA/
The values are converted to numbers using parseFloat, so I believe I'm pretty safe here. The characters in the formula are matched again this regular expression:
regex=/[^0-9\.+-\/*<>!=&()]/, // allows numbers (including decimal), operations, comparison
My questions:
Does my regex filter protect me from any attack?
Is there any reason to use eval vs. new Function in this case?
Is there another, safer way to evaluate formulas?

Since you aren't sending anything sending anything to your server, or using anything on anyone else's system, the worst that can happen is that the user crashes his own browser, nothing more. There is nothing unsafe about using eval here, since everything happens user-side.

Escaping and preventing anything on the client-side doesn't make sense at all. User can alter any piece of JS code and run it just as easy as I can change the jsfiddle you posted. Trust me, it's just that simple and you cannot rely on the client-side security.
If you remember to escape input fields on the server-side it's nothing to be worried about. There are plenty of functions for that by default, depending on which language you're using.
If user wants to type in <script>haxx(l33t);</script> - let him do it. Just remember to escape special characters so you'll have <script>haxx(l33t);</script>.

Syntax / Logical checker In Javascript?

I'm building a solution for a client which allows them to create very basic code,
now i've done some basic syntax validation but I'm stuck at variable verification.
I know JSLint does this using Javascript and i was wondering if anyone knew of a good way to do this.
So for example say the user wrote the code
moose = "barry"
base = 0
if(moose == "barry"){base += 100}
Then i'm trying to find a way to clarify that the "if" expression is in the correct syntax, if the variable moose has been initialized etc etc
but I want to do this without scanning character by character,
the code is a mini language built just for this application so is very very basic and doesn't need to manage memory or anything like that.
I had thought about splitting first by Carriage Return and then by Space but there is nothing to say the user won't write something like moose="barry" or if(moose=="barry")
and there is nothing to say the user won't keep the result of a condition inline.
Obviously compilers and interpreters do this on a much more extensive scale but i'm not sure if they do do it character by character and if they do how have they optimized?
(Other option is I could send it back to PHP to process which would then releave the browser of responsibility)
Any suggestions?
Thanks
The use case is limited, the syntax will never be extended in this case, the language is a simple scripted language to enable the client to create a unique cost based on their users input the end result will be processed by PHP regardless to ensure the calculation can't be adjusted by the end user and to ensure there is some consistency.
So for example, say there is a base cost of £1.00
and there is a field on the form called "Additional Cost", the language will allow them manipulate the base cost relative to the "additional cost" field.
So
base = 1;
if(additional > 100 && additional < 150){base += 50}
elseif(additional == 150){base *= 150}
else{base += additional;}
This is a basic example of how the language would be used.
Thank you for all your answers,
I've investigated a parser and creating one would be far more complex than is required
having run several tests with 1000's of lines of code and found that character by character it only takes a few seconds to process even on a single core P4 with 512mb of memory (which is far less than the customer uses)
I've decided to build a PHP based syntax checker which will check the information and convert the variables etc into valid PHP code whilst it's checking it (so that it's ready to be called later without recompilation) using this instead of javascript this seems more appropriate and will allow for more complex code to arise without hindering the validation process
It's only taken an hour and I have code which is able to check the validity of an if statement and isn't confused by nested if's, spaces or odd expressions, there is very little left to be checked whereas a parser and full blown scripting language would have taken a lot longer
You've all given me a lot to think about and i've rated relevant answers thank you

If you really want to do this — and by that I mean if you really want your software to work properly and predictably, without a bunch of weird "don't do this" special cases — you're going to have to write a real parser for your language. Once you have that, you can transform any program in your language into a data structure. With that data structure you'll be able to conduct all sorts of analyses of the code, including procedures that at least used to be called use-definition and definition-use chain analysis.
If you concoct a "programming language" that enables some scripting in an application, then no matter how trivial you think it is, somebody will eventually write a shockingly large program with it.
I don't know of any readily-available parser generators that generate JavaScript parsers. Recursive descent parsers are not too hard to write, but they can get ugly to maintain and they make it a little difficult to extend the syntax (esp. if you're not very experienced crafting the original version).

You might want to look at JS/CC which is a parser generator that generates a parser for a grammer, in Javascript. You will need to figure out how to describe your language using a BNF and EBNF. Also, JS/CC has its own syntax (which is somewhat close to actual BNF/EBNF) for specifying the grammar. Given the grammer, JS/CC will generate a parser for that grammar.
Your other option, as Pointy said, is to write your own lexer and recursive-descent parser from scratch. Once you have a BNF/EBNF, it's not that hard. I recently wrote a parser from an EBNF in Javascript (the grammar was pretty simple so it wasn't that hard to write one YMMV).
To address your comments about it being "client specific". I will also add my own experience here. If you're providing a scripting language and a scripting environment, there is no better route than an actual parser.
Handling special cases through a bunch of if-elses is going to be horribly painful and a maintenance nightmare. When I was a freshman in college, I tried to write my own language. This was before I knew anything about recursive-descent parsers, or just parsers in general. I figured out by myself that code can be broken down into tokens. From there, I wrote an extremely unwieldy parser using a bunch of if-elses, and also splitting the tokens by spaces and other characters (exactly what you described). The end result was terrible.
Once I read about recursive-descent parsers, I wrote a grammar for my language and easily created a parser in a 10th of the time it took me to write my original parser. Seriously, if you want to save yourself a lot of pain, write an actual parser. If you go down your current route, you're going to be fixing issues forever. You're going to have to handle cases where people put the space in the wrong place, or perhaps they have one too many (or one too little) spaces. The only other alternative is to provide an extremely rigid structure (i.e, you must have exactly x number of spaces following this statement) which is liable to make your scripting environment extremely unattractive. An actual parser will automatically fix all these problems.

Javascript has a function 'eval'.
var code = 'alert(1);';
eval(code);
It will show alert. You can use 'eval' to execute basic code.

Develop Reference

JavaScript is the programming language of the Web.