Compiling regular expressions for google-code-prettify - javascript

We are working on an extension to google-code-prettify, which does the code coloring for source code on a web page. We have a very long list of keywords (approx. 4000) for Mathematica, and while the performance is still very good, I wondered whether I could speed things up.
The regular expression for our keyword list looks like this
var keywords = 'AbelianGroup|Abort|AbortKernels|AbortProtect|Above|Abs|Absolute|\
AbsoluteCurrentValue|AbsoluteDashing|AbsoluteFileName|AbsoluteOptions|\
AbsolutePointSize|AbsoluteThickness|AbsoluteTime|AbsoluteTiming|AccountingForm';
new RegExp('^(?:' + keywords + ')\\b')
Can such an or-ed regex be made faster when it is compiled? Would it even make sense to compile it in the first place, given that google-code-prettify is JavaScript running on the server? I don't know whether this script is loaded freshly every time a web page is loaded; in that case, it is maybe not worth the overhead to compile it.

google-code-prettify runs on the client (it's a script; the source is requested from the server by the browser).
Creating the RegExp object does compile it, at runtime.
In other words, just leave it as-is.
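As a minimal sketch of what that means in practice (the keyword list below is truncated for brevity): build the RegExp once, outside any per-token code, and reuse the compiled object.
var keywords = 'AbelianGroup|Abort|AbortKernels|AbortProtect|Above|Abs';
var KEYWORD_RE = new RegExp('^(?:' + keywords + ')\\b'); // compiled here, once, at load time
KEYWORD_RE.test('Abort[]');    // true; no recompilation on this call
KEYWORD_RE.test('myVariable'); // false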

Related

How to debug javascript implementation in V8

I'm learning V8 now, but I have encountered some problems.
How do I set a breakpoint at a method's start address in memory if I want to debug a method's C++ implementation?
e.g. var a = new Array(0,1); a.indexOf(1); I want to set a breakpoint at slice's beginning, or are there other ways to track the assembler code?
A lot of functions are compiled and written into a file named snapshot.bin, so I can't set a breakpoint at the beginning of these functions.
You need to check the source code and find the implementation of slice, then set a gdb/lldb breakpoint in that .cc file: builtins-typedarray.cc
A lot of functions are defined as builtin or runtime functions.
It depends on the kind of function you want to inspect.
You can compile without snapshot to get around snapshot-related debugging difficulties (at the cost of making startup quite a bit slower: several seconds in Debug mode).
You can modify the respective code generator to emit a break instruction at the beginning of the function. For the example of Array.indexOf, that's probably the easiest solution; the CodeStubAssembler instruction is called DebugBreak().
You can break somewhere else using GDB, find your way to the function in question (e.g. via isolate->builtins), and set a breakpoint on the address of its entry. (This requires a bit of V8 knowledge and/or code reading skills, but it's not difficult.)
You can use various --print-*-code flags to print code to stdout (without breaking on it).

Is eval in Javascript considered safe if not using variable code? [duplicate]

I'm writing some JavaScript code to parse user-entered functions (for spreadsheet-like functionality). Having parsed the formula I could convert it into JavaScript and run eval() on it to yield the result.
However, I've always shied away from using eval() if I can avoid it because it's evil (and, rightly or wrongly, I've always thought it is even more evil in JavaScript, because the code to be evaluated might be changed by the user).
So, when is it OK to use it?
I'd like to take a moment to address the premise of your question - that eval() is "evil". The word "evil", as used by programming language people, usually means "dangerous", or more precisely "able to cause lots of harm with a simple-looking command". So, when is it OK to use something dangerous? When you know what the danger is, and when you're taking the appropriate precautions.
To the point, let's look at the dangers in the use of eval(). There are probably many small hidden dangers just like everything else, but the two big risks - the reason why eval() is considered evil - are performance and code injection.
Performance - eval() runs the interpreter/compiler. If your code is compiled, then this is a big hit, because you need to call a possibly-heavy compiler in the middle of run-time. However, JavaScript is still mostly an interpreted language, which means that calling eval() is not a big performance hit in the general case (but see my specific remarks below).
Code injection - eval() potentially runs a string of code under elevated privileges. For example, a program running as administrator/root would never want to eval() user input, because that input could potentially be "rm -rf /etc/important-file" or worse. Again, JavaScript in a browser doesn't have that problem, because the program is running in the user's own account anyway. Server-side JavaScript could have that problem.
On to your specific case. From what I understand, you're generating the strings yourself, so assuming you're careful not to allow a string like "rm -rf something-important" to be generated, there's no code injection risk (but please remember, it's very very hard to ensure this in the general case). Also, if you're running in the browser then code injection is a pretty minor risk, I believe.
As for performance, you'll have to weigh that against ease of coding. It is my opinion that if you're parsing the formula, you might as well compute the result during the parse rather than run another parser (the one inside eval()). But it may be easier to code using eval(), and the performance hit will probably be unnoticeable. It looks like eval() in this case is no more evil than any other function that could possibly save you some time.
eval() isn't evil. Or, if it is, it's evil in the same way that reflection, file/network I/O, threading, and IPC are "evil" in other languages.
If, for your purpose, eval() is faster than manual interpretation, or makes your code simpler, or more clear... then you should use it. If neither, then you shouldn't. Simple as that.
When you trust the source.
In the case of JSON, it is rather hard to tamper with the source, because it comes from a web server you control. As long as the JSON itself contains no data a user has uploaded, there is no major drawback to using eval.
In all other cases I would go to great lengths to ensure user-supplied data conforms to my rules before feeding it to eval().
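As a minimal, hedged sketch of what such a check could look like for the spreadsheet-formula case (the character whitelist is illustrative, not a complete rule set):
function safeFormulaEval(formula) {
    // allow only digits, arithmetic operators, parentheses and whitespace
    if (!/^[\d+\-*\/().\s]+$/.test(formula)) {
        throw new Error('formula contains disallowed characters');
    }
    return eval(formula);
}
safeFormulaEval('2 * (3 + 4)'); // 14
safeFormulaEval('alert(1)');    // throws before eval is ever reached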
Let's get real folks:
Every major browser now has a built-in console which your would-be hacker can use with abundance to invoke any function with any value - why would they bother to use an eval statement - even if they could?
If it takes 0.2 seconds to compile 2000 lines of JavaScript, what is my performance degradation if I eval four lines of JSON?
Even Crockford's explanation for 'eval is evil' is weak.
eval is Evil, The eval function is the most misused feature of
JavaScript. Avoid it
As Crockford himself might say "This kind of statement tends to generate irrational neurosis. Don't buy it."
Understanding eval and knowing when it might be useful is way more important. For example, eval is a sensible tool for evaluating server responses that were generated by your software.
BTW: Prototype.js calls eval directly five times (including in evalJSON() and evalResponse()). jQuery uses it in parseJSON (via Function constructor).
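Roughly, the Function-constructor approach mentioned above looks like this (jsonText stands in for the raw server response; this is a sketch, not either library's actual implementation):
var jsonText = '{"ok": true}';                     // stands in for a server response
var data = (new Function('return ' + jsonText))(); // { ok: true }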
I tend to follow Crockford's advice for eval(), and avoid it altogether. Even ways that appear to require it do not. For example, setTimeout() allows you to pass a function rather than a string to be evaluated.
setTimeout(function() {
    alert('hi');
}, 1000);
Even if it's a trusted source, I don't use it, because the code returned by JSON might be garbled, which could at best do something wonky and at worst expose something bad.
Bottom Line
If you created or sanitized the code you eval, it is never evil.
Slightly More Detailed
eval is evil if running on the server using input submitted by a client that was not created by the developer or that was not sanitized by the developer.
eval is not evil if running on the client, even if using unsanitized input crafted by the client.
Obviously you should always sanitize the input, as to have some control over what your code consumes.
Reasoning
The client can run any arbitrary code they want to, even if the developer did not code it; this is true not only for what is evaled, but for the call to eval itself.
Eval is complementary to compilation, which is used in templating code. By templating I mean that you write a simplified template generator that generates useful template code, which increases development speed.
I have written a framework, where developers don't use EVAL, but they use our framework and in turn that framework has to use EVAL to generate templates.
The performance of EVAL can be improved by using the following method: instead of executing the script directly, have it return a function.
var a = eval("3 + 5");
It should be organized as
var f = eval("(function(a,b) { return a + b; })");
var a = f(3,5);
Caching f will certainly improve the speed.
Also Chrome allows debugging of such functions very easily.
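A minimal sketch of such a cache (the names are illustrative):
var compiledCache = {};
function compileExpr(expr) {
    // compile each expression string only once, then reuse the resulting function
    if (!compiledCache[expr]) {
        compiledCache[expr] = eval("(function(a,b) { return " + expr + "; })");
    }
    return compiledCache[expr];
}
var add = compileExpr("a + b");
add(3, 5); // 8; later calls with the same expression skip eval entirely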
Regarding security, using eval or not will hardly make any difference,
First of all, the browser invokes the entire script in a sandbox.
Any code that is evil in EVAL is evil in the browser itself. An attacker can easily inject a script node into the DOM and do anything if they can eval anything. Not using EVAL will not make any difference.
It is mostly poor server-side security that is harmful. Poor cookies validation or poor ACL implementation on the server causes most attacks.
A recent Java vulnerability, etc. was in Java's native code. JavaScript was and is designed to run in a sandbox, whereas applets were designed to run outside a sandbox with certificates, etc., which led to vulnerabilities and many other things.
Writing code to imitate a browser is not difficult. All you have to do is make an HTTP request to the server with your favourite user-agent string. All testing tools mock browsers anyway; if an attacker wants to harm you, EVAL is their last resort. They have many other ways to deal with your server-side security.
The browser DOM does not have access to files, nor even to a user name. In fact, there is nothing on the machine that eval can give access to.
If your server-side security is solid enough to withstand an attack from anywhere, you should not worry about EVAL. As I mentioned, if EVAL did not exist, attackers would have many other tools to hack into your server, irrespective of your browser's EVAL capability.
Eval is only good for generating templates that do complex string processing based on something that is not known in advance. For example, I would prefer
"FirstName + ' ' + LastName"
As opposed to
"LastName + ' ' + FirstName"
As my display name, which can come from a database and which is not hardcoded.
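A small sketch of that idea (the expression string stands in for a value fetched from the database):
var FirstName = 'Ada', LastName = 'Lovelace';
var displayNameExpr = "FirstName + ' ' + LastName"; // would come from the database
var displayName = eval(displayNameExpr);             // "Ada Lovelace"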
When debugging in Chrome (v28.0.1500.72), I found that variables are not bound to closures if they are not used in a nested function that produces the closure. I guess, that's an optimization of the JavaScript engine.
BUT: when eval() is used inside a function that causes a closure, ALL the variables of outer functions are bound to the closure, even if they are not used at all. If someone has the time to test if memory leaks can be produced by that, please leave me a comment below.
Here's my test code:
(function () {
    var eval = function (arg) {
    };
    function evalTest() {
        var used = "used";
        var unused = "not used";
        (function () {
            used.toString(); // Variable "unused" is visible in debugger
            eval("1");
        })();
    }
    evalTest();
})();
(function () {
    var eval = function (arg) {
    };
    function evalTest() {
        var used = "used";
        var unused = "not used";
        (function () {
            used.toString(); // Variable "unused" is NOT visible in debugger
            var noval = eval;
            noval("1");
        })();
    }
    evalTest();
})();
(function () {
    var noval = function (arg) {
    };
    function evalTest() {
        var used = "used";
        var unused = "not used";
        (function () {
            used.toString(); // Variable "unused" is NOT visible in debugger
            noval("1");
        })();
    }
    evalTest();
})();
What I'd like to point out here is that eval() does not necessarily refer to the native eval() function. It all depends on the name used to call it. So when calling the native eval() through an alias (say var noval = eval; and then, in an inner function, noval(expression);), the evaluation of expression may fail when it refers to variables that should be part of the closure but actually are not.
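A minimal sketch of the direct versus aliased (indirect) eval difference, independent of the debugger observation above:
function demo() {
    var local = 42;
    var indirect = eval;
    console.log(eval("local"));            // 42: direct eval sees the enclosing scope
    console.log(indirect("typeof local")); // "undefined": indirect eval runs in the global scope
}
demo();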
I've seen people advocate not using eval because it is evil, but I've seen the same people use Function and setTimeout dynamically, so they use eval under the hood :D
BTW, if your sandbox is not secure enough (for example, if you're working on a site that allows code injection), eval is the last of your problems. The basic rule of security is that all input is evil, but in the case of JavaScript even JavaScript itself could be evil, because in JavaScript you can overwrite any function and you just can't be sure you're using the real one. So if malicious code starts before yours, you can't trust any JavaScript built-in function :D
Now the epilogue to this post is:
If you REALLY need it (80% of the time eval is NOT needed) and you're sure of what you're doing, just use eval (or better, Function ;) ). Closures and OOP cover 80-90% of the cases where eval can be replaced by another kind of logic; the rest is dynamically generated code (for example, if you're writing an interpreter) and, as you already said, evaluating JSON (here you can use Crockford's safe evaluation ;) ).
The only instance when you should be using eval() is when you need to run dynamic JS on the fly. I'm talking about JS that you download asynchronously from the server...
...And 9 times of 10 you could easily avoid doing that by refactoring.
On the server side, eval is useful when dealing with external scripts such as SQL, InfluxDB, or Mongo, where custom validation at runtime can be done without re-deploying your services.
For example, an achievement service with the following metadata:
{
    "568ff113-abcd-f123-84c5-871fe2007cf0": {
        "msg_enum": "quest/registration",
        "timely": "all_times",
        "scope": [
            "quest/daily-active"
        ],
        "query": "`SELECT COUNT(point) AS valid from \"${userId}/dump/quest/daily-active\" LIMIT 1`",
        "validator": "valid > 0",
        "reward_external": "ewallet",
        "reward_external_payload": "`{\"token\": \"${token}\", \"userId\": \"${userId}\", \"amountIn\": 1, \"conversionType\": \"quest/registration:silver\", \"exchangeProvider\":\"provider/achievement\",\"exchangeType\":\"payment/quest/registration\"}`"
    },
    "efdfb506-1234-abcd-9d4a-7d624c564332": {
        "msg_enum": "quest/daily-active",
        "timely": "daily",
        "scope": [
            "quest/daily-active"
        ],
        "query": "`SELECT COUNT(point) AS valid from \"${userId}/dump/quest/daily-active\" WHERE time >= '${today}' ${ENV.DAILY_OFFSET} LIMIT 1`",
        "validator": "valid > 0",
        "reward_external": "ewallet",
        "reward_external_payload": "`{\"token\": \"${token}\", \"userId\": \"${userId}\", \"amountIn\": 1, \"conversionType\": \"quest/daily-active:silver\", \"exchangeProvider\":\"provider/achievement\",\"exchangeType\":\"payment/quest/daily-active\"}`"
    }
}
This then allows:
Direct injection of objects/values through literal strings in the JSON, useful for templating text
Use as a comparator, say for making rules about how to validate quests or events in a CMS
Cons of this:
There can be errors in the code that break things in the service if not fully tested.
If a hacker can write script on your system, then you are pretty much screwed.
One way to validate your script is keep the hash of your scripts somewhere safe, so you can check them before running.
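A minimal sketch of that hash check (Node.js; it assumes the trusted hash is stored somewhere an attacker cannot reach):
const crypto = require('crypto');
function runIfTrusted(script, trustedHash) {
    // compare the script's SHA-256 digest against the stored, trusted value
    const hash = crypto.createHash('sha256').update(script).digest('hex');
    if (hash !== trustedHash) {
        throw new Error('script does not match the trusted hash');
    }
    return eval(script);
}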
Eval isn't evil, just misused.
If you created the code going into it or can trust it, it's alright.
People keep talking about how user input doesn't matter with eval. Well sort of~
If there is user input that goes to the server, then comes back to the client, and that input is used in eval without being sanitized, congrats: you've opened Pandora's box for user data to be sent to whoever.
Depending on where the eval is: many websites use SPAs, and eval could make it easier for the user to access application internals that otherwise wouldn't have been easy to reach. Now they can make a bogus browser extension that can tap into that eval and steal data again.
You just have to figure out what the point of using eval is. Generating code isn't really ideal when you could simply make methods to do that sort of thing, use objects, or the like.
Now a nice example of using eval.
Your server is reading the swagger file that you have created. Many of the URL params are created in the format {myParam}. So you'd like to read the URLs and then convert them to template strings without having to do complex replacements because you have many endpoints. So you may do something like this.
Note this is a very simple example.
const params = { id: 5 };
let route = '/api/user/{id}';
route = route.replace(/{/g, '${params.'); // '/api/user/${params.id}'
const url = eval('`' + route + '`');      // evaluated as a template string: '/api/user/5'
eval is rarely the right choice. While there may be numerous instances where you can accomplish what you need to accomplish by concatenating a script together and running it on the fly, you typically have much more powerful and maintainable techniques at your disposal: associative-array notation (obj["prop"] is the same as obj.prop), closures, object-oriented techniques, functional techniques - use them instead.
As far as client script goes, I think the issue of security is a moot point. Everything loaded into the browser is subject to manipulation and should be treated as such. There is zero risk in using an eval() statement when there are much easier ways to execute JavaScript code and/or manipulate objects in the DOM, such as the URL bar in your browser.
javascript:alert("hello");
If someone wants to manipulate their DOM, I say swing away. Security to prevent any type of attack should always be the responsibility of the server application, period.
From a pragmatic standpoint, there's no benefit to using an eval() in a situation where things can be done otherwise. However, there are specific cases where an eval SHOULD be used. When so, it can definitely be done without any risk of blowing up the page.
<html>
<body>
    <textarea id="output"></textarea><br/>
    <input type="text" id="input" />
    <button id="button" onclick="execute()">eval</button>
    <script type="text/javascript">
        var execute = function(){
            var inputEl = document.getElementById('input');
            var toEval = inputEl.value;
            var outputEl = document.getElementById('output');
            var output = "";
            try {
                output = eval(toEval);
            }
            catch(err){
                for(var key in err){
                    output += key + ": " + err[key] + "\r\n";
                }
            }
            outputEl.value = output;
        }
    </script>
</body>
</html>
Since no one has mentioned it yet, let me add that eval is super useful for WebAssembly-JavaScript interop. While it's certainly ideal to have pre-made scripts included in your page that your WASM code can invoke directly, sometimes it's not practicable and you need to pass in dynamic JavaScript from a WebAssembly language like C# to really accomplish what you need to do.
It's also safe in this scenario because you have complete control over what gets passed in. Well, I should say, it's no less safe than composing SQL statements using C#, which is to say it needs to be done carefully (properly escaping strings, etc.) whenever user-supplied data is used to generate the script. But with that caveat it has a clear place in interop situations and is far from "evil".
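As a hedged sketch of the JavaScript side of such an interop call (the function name and calling convention are illustrative, not any specific framework's API):
// registered once on the page; the WASM host calls it with a script it composed itself
window.evalFromWasm = function (script) {
    return eval(script);
};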
It's okay to use it if you have complete control over the code that's passed to the eval function.
Code generation. I recently wrote a library called Hyperbars which bridges the gap between virtual-dom and handlebars. It does this by parsing a handlebars template and converting it to hyperscript. The hyperscript is generated as a string first and before returning it, eval() it to turn it into executable code. I have found eval() in this particular situation the exact opposite of evil.
Basically from
<div>
    {{#each names}}
        <span>{{this}}</span>
    {{/each}}
</div>
To this
(function (state) {
    var Runtime = Hyperbars.Runtime;
    var context = state;
    return h('div', {}, [Runtime.each(context['names'], context, function (context, parent, options) {
        return [h('span', {}, [options['#index'], context])]
    })])
}.bind({}))
The performance of eval() isn't an issue in a situation like this either, because you only need to interpret the generated string once and then reuse the executable output many times over.
You can see how the code generation was achieved if you're curious here.
There is no reason not to use eval() as long as you can be sure that the code comes from you or the actual user. Even though the user can manipulate what gets sent into the eval() function, that's not a security problem, because they can already manipulate the source code of the web site and could therefore change the JavaScript code itself.
So... when should you not use eval()? Only when there is a chance that a third party could change the input, for example by intercepting the connection between the client and your server (but if that is a problem, use HTTPS). You shouldn't use eval() to parse code written by others, as in a forum.
If it's really needed eval is not evil. But 99.9% of the uses of eval that I stumble across are not needed (not including setTimeout stuff).
For me the evil is not a performance or even a security issue (well, indirectly it's both). All such unnecessary uses of eval add to maintenance hell. Refactoring tools are thrown off. Searching for code is hard. Unanticipated effects of those evals are legion.
My example of using eval: import.
How it's usually done.
var components = require('components');
var Button = components.Button;
var ComboBox = components.ComboBox;
var CheckBox = components.CheckBox;
...
// That quickly gets very boring
But with the help of eval and a little helper function it gets a much better look:
var components = require('components');
eval(importable('components', 'Button', 'ComboBox', 'CheckBox', ...));
importable might look like this (this version doesn't support importing specific members):
function importable(path) {
    var name;
    var pkg = eval(path);
    var result = '\n';
    for (name in pkg) {
        result += 'if (name !== undefined) throw "import error: name already exists";\n'.replace(/name/g, name);
    }
    for (name in pkg) {
        result += 'var name = path.name;\n'.replace(/name/g, name).replace('path', path);
    }
    return result;
}
I think any cases of eval being justified would be rare. You're more likely to use it thinking that it's justified than you are to use it when it's actually justified.
The security issues are the most well known. But also be aware that JavaScript uses JIT compilation and this works very poorly with eval. Eval is somewhat like a blackbox to the compiler, and JavaScript needs to be able to predict code ahead of time (to some extent) in order to safely and correctly apply performance optimisations and scoping. In some cases, the performance impact can even affect other code outside eval.
If you want to know more:
https://github.com/getify/You-Dont-Know-JS/blob/master/scope%20%26%20closures/ch2.md#eval
Only during testing, if possible. Also note that eval() is much slower than other specialized JSON etc. evaluators.
My belief is that eval is a very powerful function for client-side web applications, and safe... as safe as JavaScript itself, which is not. :-) The security issues are essentially a server-side problem because, now, with tools like Firebug, you can attack any JavaScript application.
I always try to discourage the use of eval. Almost always, a cleaner and more maintainable solution is available. Eval is not needed even for JSON parsing. Eval adds to maintenance hell. Not without reason, it is frowned upon by masters like Douglas Crockford.
But I found one example where it should be used:
When you need to pass the expression.
For example, I have a function that constructs a general google.maps.ImageMapType object for me, but I need to tell it the recipe for how it should construct the tile URL from the zoom and coord parameters:
my_func({
    name: "OSM",
    tileURLexpr: '"http://tile.openstreetmap.org/"+b+"/"+a.x+"/"+a.y+".png"',
    ...
});
function my_func(opts)
{
    return new google.maps.ImageMapType({
        getTileUrl: function (coord, zoom) {
            var b = zoom;
            var a = coord;
            return eval(opts.tileURLexpr);
        },
        ....
    });
}
Eval is useful for code generation when you don't have macros.
For (a stupid) example, if you're writing a Brainfuck compiler, you'll probably want to construct a function that performs the sequence of instructions as a string, and eval it to return a function.
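For instance, a toy sketch of that idea (it skips the bracket matching, input and error handling a real compiler would need):
function compileBF(src) {
    var body = 'var mem = new Uint8Array(30000), p = 0, out = "";\n';
    for (var i = 0; i < src.length; i++) {
        switch (src[i]) {
            case '+': body += 'mem[p]++;\n'; break;
            case '-': body += 'mem[p]--;\n'; break;
            case '>': body += 'p++;\n'; break;
            case '<': body += 'p--;\n'; break;
            case '.': body += 'out += String.fromCharCode(mem[p]);\n'; break;
            case '[': body += 'while (mem[p]) {\n'; break;
            case ']': body += '}\n'; break;
        }
    }
    body += 'return out;';
    return eval('(function () {\n' + body + '\n})');
}
compileBF('++++++++[>++++++++<-]>+.')(); // "A"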
When you parse a JSON structure with a parse function (for example, jQuery.parseJSON), it expects a perfectly formed JSON structure (every property name in double quotes). JavaScript is more flexible, so you can use eval() to avoid that strictness.
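A small sketch of the difference (the object literal is illustrative):
var lenient = "{ name: 'Ada', age: 36 }"; // not valid JSON: unquoted key names, single quotes
// JSON.parse(lenient) would throw a SyntaxError; eval accepts it:
var obj = eval('(' + lenient + ')');
obj.name; // "Ada"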

Javascript build system to handle large objects

I have a huge web app in Javascript and it's starting to become something of a hassle to manage everything. I broke everything up into little files, each with its own subsection of the app.
e.g. if the app is named "myApp", I have a file called
myApp.ajax.js which contains
myApp.ajax = (function(){return {/*stuff*/}})();
and one called
myApp.canvas.js which contains
myApp.canvas = (function(){return {/*stuff*/}})();
so on and so forth. When I concatenate all the files and minify them I get a huge garbled mess of all of this together. So I was wondering: is there a build system that would turn all of this into one single
var myApp = {
    ajax: /*stuff*/,
    canvas: /*stuff*/,
    /*etc*/
}
when it compiles everything?
I ask because I ran a small test and noticed a serious performance decay when having each part of the object separate. Test is here: http://jsperf.com/single-object-vs-multiples
I'm not sure if I get the point of this. Concatenating and minifying JavaScript will always end up as a fairly garbled mess (at least to read). Just make sure that you concatenate first and then minify; then the compiler you are using can optimize the whole thing.
And as for performance concerns: the jsPerf test told me that the way you attach your modules is roughly 12% slower (at least in Firefox; it seems to be different for V8). But you are doing it only once at application load, not 1,000,000 times. That can only make a difference of some microseconds at page load.
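A minimal sketch of what "concatenate first, then minify" could look like as a Node build step (file and directory names are illustrative):
const fs = require('fs');
const files = ['myApp.core.js', 'myApp.ajax.js', 'myApp.canvas.js'];
// seed the namespace once, then append each module file in order
const bundle = 'var myApp = myApp || {};\n' +
    files.map(f => fs.readFileSync('src/' + f, 'utf8')).join('\n');
fs.writeFileSync('dist/myApp.js', bundle); // run the minifier on dist/myApp.js afterwards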
From what I gather from your question and what I have seen, people tend to use make to collate multiple js files into one and then run compression against it, etc.; see this.
Yep, there is http://brunch.io/ which handles concatenation and stuff like that for you.

Does Javascript compile or two-pass interpret?

I'm an admitted novice JavaScript programmer and am attempting to learn more. So I turn to you folks for help with this easy question :). The O'Reilly book that I'm reading keeps referring to the compile-time of the JavaScript code. My knowledge of functional programming (Scheme and the like) tells me that JavaScript is actually interpreted by the browser, most likely requiring two passes through the JavaScript.
Am I incorrect in my assessment? Or is the compile-time that the book references actually just the first pass of the interpreter, similar to how Perl or Python would function? Thanks!
It is browser-dependent. Look up WebKit's SquirrelFish Extreme and Google V8 to see what's at the fastest end of things, and look at Mozilla's JaegerMonkey for that implementation.
AFAIK V8 and SFX are JITs, so they compile JS code to native code. JaegerMonkey and TraceMonkey combine in Firefox to form a system where, if code would be faster traced, TraceMonkey executes it, and if it would be faster as native code, JaegerMonkey compiles it, just like SFX.
Do you have a sentence that you could quote to help with context?
Javascript is compiled at the browser (it's sent to the browser in plain source). But it only gets compiled as it is loaded. So if you have a script tag followed by a div tag followed by a script tag then it will load those things sequentially. The browser will stop loading the entire page (it still downloads resources, just doesn't load HTML) until your script has been loaded (this is because the script may have 'document.write' within it).
<script>
var someVariable = 'hello world';
alert(document.getElementById('someid')); // alerts null (the div hasn't been parsed yet)
</script>
<div id='someid'></div>
<script>
alert(document.getElementById('someid')); // alerts the div element ([object HTMLDivElement]); it exists by now
alert(someVariable); //alerts 'hello world'
</script>
There's read-time and run-time in JS (as I like to think of it, since it's not really compiled, but interpreted). It sounds like the O'Reilly book is using compile-time as a synonym for read-time.
Read-time is when the engine reads all of the code and evaluates everything at the global scope. Usually this sets up hooks on events that will trigger code execution.
Run-time is everything else.
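A small sketch of the distinction; hoisting is the visible effect of that first "read" pass:
greet(); // works: the function declaration was processed at read-time
function greet() {
    alert('hello');
}
alert(someVar); // alerts undefined: the declaration was hoisted, but the assignment runs at run-time
var someVar = 5;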

What are the arguments against the inclusion of server side scripting in JavaScript code blocks?

I've been arguing for some time against embedding server-side tags in JavaScript code, but was put on the spot today by a developer who seemed unconvinced.
The code in question was a legacy ASP application, although this is largely unimportant as it could equally apply to ASP.NET or PHP (for example).
The example in question revolved around the use of a constant that they had defined in ServerSide code.
'VB
Const MY_CONST: MY_CONST = 1
If sMyVbVar = MY_CONST Then
'Do Something
End If
//JavaScript
if (sMyJsVar === "<%= MY_CONST%>"){
//DoSomething
}
My standard arguments against this are:
Script injection: The server-side tag could include code that can break the JavaScript code
Unit testing. Harder to isolate units of code for testing
Code Separation : We should keep web page technologies apart as much as possible.
The reason for doing this was so that the developer did not have to define the constant in two places. They reasoned that as it was a value that they controlled, that it wasn't subject to script injection. This reduced my justification for (1) to "We're trying to keep the standards simple, and defining exception cases would confuse people"
The unit testing and code separation arguments did not hold water either, as the page itself was a horrible amalgam of HTML, JavaScript, ASP.NET, CSS, XML... you name it, it was there. No code that was ever going to be included in this page could possibly be unit tested.
So I found myself feeling like a bit of a pedant insisting that the code was changed, given the circumstances.
Are there any further arguments that might support my reasoning, or am I, in fact being a bit pedantic in this insistence?
Script injection: The server-side tag could include code that can break the JavaScript code
So write the code properly and make sure that values are correctly escaped when introduced into the JavaScript context. If your framework doesn't include a JavaScript "quoter" tool (hint: the JSON support is probably all you need), write one.
Unit testing. Harder to isolate units of code for testing
This is a good point, but if it's necessary for the server to drop things into the page for code to use, then it's necessary. I mean, there are times when this simply has to be done. A good way to do it is for the page to contain some sort of minimal block of data. Thus the server-munged JavaScript on the page really isn't "code" to be tested, it's just data. The real client code included from .js files can find the data and use it.
Thus, the page may contain:
<script>
(function(window) {
    window['pageData'] = {
        companyName: '<%= company.name %>',
        // etc
    };
})(this);
</script>
Now your nicely-encapsulated pure JavaScript code in ".js" files just has to check for window.pageData, and it's good to go.
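For example, a sketch of the consuming side in one of those .js files (initHeader is an illustrative app function, not something defined above):
if (window.pageData) {
    initHeader(window.pageData.companyName);
}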
Code Separation : We should keep web page technologies apart as much as possible.
Agreed, but it's simply a fact that sometimes server-side data needs to drive client-side behavior. To create hidden DOM nodes solely for the purpose of storing data and satisfying your rules is itself a pretty ugly practice.
Coding rules and aesthetics are Good Things. However, one should be pragmatic and take everything in perspective. It's important to remember that the context of such rules is not always a Perfect Divine Creation, and in the case of HTML, CSS, and JavaScript I think that fact is glaringly clear. In such an imperfect environment, hard-line rules can force you into unnecessary work and code that's actually harder to maintain.
edit — oh here's something else I just thought of; sort-of a compromise. A "trick" popularized (in part) by the jQuery gang with their "micro template" facility (apologies to the web genius who actually hit upon this first) is to use <script> tags that are sort-of "neutered":
<script id='pageData' type='text/plain'>
{
    "companyName": "<%= company.name %>",
    "accountType": "<%= user.primaryAccount.type %>",
    // etc
}
</script>
Now the browser itself will not even execute that script - the "type" attribute isn't something it understands as being code, so it just ignores it. However, browsers do make the content of such scripts available, so your code can find the script by "id" value and then, via some safe JSON library or a native browser API if available, parse the notation and extract what it needs. The values still have to be properly quoted etc, but you're somewhat safer from XSS holes because it's being parsed as JSON and not as "live" full-blown JavaScript.
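A minimal sketch of the reading side, assuming the block contains only valid JSON (no comment lines) and using the native JSON API:
var el = document.getElementById('pageData');
var pageData = JSON.parse(el.textContent || el.innerText);
pageData.companyName; // whatever the server rendered into the block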
The reason for doing this was so that the developer did not have to define the constant in two places.
To me, this is a better argument than any argument you can make against it. It is the DRY principle. And it greatly enhances code maintainability.
Every style guide/rule taken to extreme leads to an anti-pattern. In this case your insistence of separation of technology breaks the DRY principle and can potentially make code harder to maintain. Even DRY itself if taken to extreme can lead to an anti-pattern: softcoding.
Code maintainability is a fine balance. Style guides are there to help maintain that balance. But you have to know when those very guides help and when they themselves become a problem.
Note that in the example you have given, the code would not break syntax highlighting or parsing (even Stack Overflow highlights it correctly), so the IDE argument would not work, since the IDE can still parse that code correctly.
It simply gets unreadable. You have to take a closer look to separate the different languages. If JavaScript and the mixed-in language use the same variable names, things get even worse. This is especially hard for people who have to look at other people's code.
Many IDEs have problems with syntax highlighting of heavily mixed documents, which can lead to the loss of auto-completion, proper syntax highlighting, and so on.
It makes the code less reusable. Think of a JavaScript function that does a common task, like echoing an array of things. If you separate the JavaScript logic from the data it's iterating over, you can use the same function all over your application, and changes to this function have to be done only once. If the data it's iterating over is mixed with the JavaScript output loop, you'll probably end up repeating the JavaScript code just because the mixed-in language has an additional if-statement before each loop.
