How to change the don't-escape HTML delimiter in Mustache.js - javascript

I know I can change the default delimiter using Mustache.tags('[[', ']]');
I dig into the source code, but I can't find and figure out how to change the don't-escape HTML delimiter, which is {{{ }}} by default. Any help is appreciated.

I believe your question is how to turn off the default html entity escaping behaviour of a mustache template when you have specified custom delimiters. This can be a bit confusing since the default behaviour, that you will see if you look this up, is to use triple braces such as {{{some-value}}}. I'm going to assume you mean from a users point of view and not a developers point of view - despite the reference to the source code.
There are two ways:
Mustache provides an alternative syntax for turning off HTML escaping using the & character. So with your custom delimiters of '[[' and ']]' you would specify your placeholder as
[[&some-value]]
Simply use '{ }' within your custom delimiters. E.g.
[[{some-value}]]
I don't believe there is any way to change either of these inner syntaxes. Some templating systems are a lot more flexible (e.g. doT uses regexes for all matching), but mustache is less flexible (which many will see as an advantage)
Hope that clears things up. I know this is an old question, but perhaps this might still
help you or anyone else also looking this up.

Changing don't-escape HTML delimiter is only possible by modifying the source, because it's hardcoded into the parser and defined as openingTag + "{" and "}" + closingTag. And with hardcoded I mean, that you'd possibly have to change logic, not just a (few) regex. Thanks to #Thomas to dedicate his time for me.

Related

angular translate sanitize / escape

I got a strange or maybe intended behavior with angular translate.
Our value strategie is
$translateProvider.useSanitizeValueStrategy('sanitize');
We use mostly the translate filter in our application, but when it comes to special characters we get for example instead of Überschrift something like &#220 ;berschrift.
If I use the directive it works.
If I use the filter this only works when the sanitize strategy is set to "escaped".
Is there another solution than to rewrite ALL the translation filters to directives?
Here is my plnkr http://plnkr.co/edit/QIMVQcyH5APeYxNnS82v
For your information,
I can't simply use the "escaped" strategy, because we use angular translate variables as well and these variables contain sometimes even html tags.
Thanks!
Use sanitizeParameters instead of sanitize. Here is the fixed plnkr: http://plnkr.co/edit/qicVqPXn3qo6hMNa1fY2?p=preview
(EDIT: 07/10/2016): There is a significant difference between the two sanitization strategies. sanitizeParameters sanitizes the interpolation parameters and not the translated output. That means that it doesn't allow for changes in those parameters, but the translated content is still vulnerable since it's not sanitized.
The problem with sanitize and UTF-8 characters is a known issue and I believe it's being worked on.
$translateProvider.useSanitizeValueStrategy(['escape', 'sanitizeParameters']);
This works for my project. I hope this is secure enough.
Source: https://stackoverflow.com/a/39118996/9798484

Java Regex replace function not working as intended

I need some help with a JS Regex.
Here's the string I'm passing, I want to delete everything before 'Hanyuu-sama' with JS Replace.
Hanyuu","dj":{"id":18,"djname":"Hanyuu-sama
The first and second "Hanyuu" can change, the id number can change. This has already been cropped quite a bit with regular expressions.
Now I've tried a few and surprisingly it's failing when I do simple and complex regexes:
I've tried:
.*\"
And it does nothing, I've tried disgusting stuff in my desperation:
.*\","dj\":{\"id":.*,\"djname\":\"
And nada.
Here's a JS Fiddle and here's a http://regex101.com/r/tE2uY0/1 Regex JS matching platform.
Does anyone know why this isn't working?
I know this is likely bad practice, I'm just trying to learn Regexes.
Bonus points if anyone can refer me to a good source to learn Regular expressions. I'd love a solution but I'd like to learn how to do this myself in the future and why this one failed even more.
Your method call should look like this:
source = source.replace(/.*"/, "");
Regular expression in javascript are written between /.../ and not "/.../" like they are in many other languages.
If your string is always structured like that and it does not contain any more characters, your regex should do the trick. That's because the * quantifier acts greedy by default, thus always matching the last " in the string.

Jison / Flex: Trying to capture anything (.*) between two tokens but having problems

I'm currently working on a small little dsl, not unlike rabl. I'm struggling with the implementation of one of my rules. Before we get to the problem, I'll explain a bit about my syntax/grammar.
In my little language you can define properties, object/array blocks, or custom blocks (these are all used to build a json object/array). A "custom block" can either be one that contains my standard expressions (property, object/array block, etc) or some JavaScript. These expressions are written as such -
-- An object block
object #model
-- A property node
property some, key(name="value")
-- A custom node
object custom_obj as
property some(name="key")
end
-- A custom script node
property full_name as (u)
// This is JavaScript
return u.first_name + ' ' + u.last_name;
end
end
The problem I'm running into is with my custom script node. I'm having a real hard defining the script token so that JISON can properly capture the stuff inside the block.
In my lexer, I currently have...
# script_param is basically a regex to match "(some_ident)"
{script_param} { this.begin('js'); return 'SCRIPT_PARAM'; }
<js>(.|\n|\r)*?"end" %{
this.popState();
yytext = yytext.substr(0, yyleng - 3).trim();
return 'SCRIPT';
%}
That SCRIPT token will basically match anything after (u) up to (and including) the end token (which usually ends a block). I really dislike this because my usual block terminator (end) is actually part of the script token, which feels totally hacky to me. Unfortunately, I'm not able to find a better way to capture ANYTHING between (..) and end.
I've tried writing a regex that captures anything that ends with a ";", but that poses problems when I have multiple script nodes in my dsl code. I've only been able to make this work by including the "end" keyword as part of my capture.
Here are the links to my grammar and lexer files.
I'd greatly appreciate any insight into solving my problem! If I didn't explain my problem clearly, let me know and I'll try my best to clarify!
Many thanks in advance!!
I will also happily accept any advice as to how to clean up my grammar. I'm still fairly new at this stuff and feel like my stuff is a mess right now :)
It's easy enough to match a string up to but not including the first instance of end:
([^e]|e[^n]|en[^d])*
(And it doesn't even need non-greedy repetition.)
However, that's not what you want. The included JavaScript might include:
variables whose names happen to include the characters end (tendency)
comments (/* Take the values up to the end of the line */)
character strings (if (word == "end"))
and, indeed, the word end itself, which is not a reserved word in js.
Really, the only clean solution is to be able to lex javascript. Fortunately, you don't have to do it precisely, because you're not interpreting it, but even so it is a bit of work. The most annoying part of javascript lexing, like other similar languages, is identifying when / is the beginning of a regular expression, and when it is just division; getting that right requires most of a javascript parser, particularly since it also interacts with the semicolon rule.
To deal with the fact that the included javascript might actually use a variable named end, you have a couple of choices, as far as I can see:
Document the fact that end is a reserved word.
Only recognize end when it appears outside of parentheses and in a place where a statement might start (not too difficult if you end up building enough of a JS parser to correctly identify regular expressions)
Only recognize end when it appears by itself on a line.
This last choice would really simplify your problem a lot, so you might want to think about it, although it's not really very elegant.

Finding comments in HTML

I have an HTML file and within it there may be Javascript, PHP and all this stuff people may or may not put into their HTML file.
I want to extract all comments from this html file.
I can point out two problems in doing this:
What is a comment in one language may not be a comment in another.
In Javascript, remainder of lines are commented out using the // marker. But URLs also contain // within them and I therefore may well eliminate parts of URLs if I
just apply substituting // and then the
remainder of the line, with nothing.
So this is not a trivial problem.
Is there anywhere some solution for this already available?
Has anybody already done this?
Problem 2: Isn't every url quoted, with either "www.url.com" or 'www.url.com', when you write it in either language? I'm not sure. If that's the case then all you haft to do is to parse the code and check if there's any quote marks preceding the backslashes to know if it's a real url or just a comment.
Look into parser generators like ANTLR which has grammars for many languages and write a nesting parser to reliably find comments. Regular expressions aren't going to help you if accuracy is important. Even then, it won't be 100% accurate.
Consider
Problem 3, a comment in a language is not always a comment in a language.
<textarea><!-- not a comment --></textarea>
<script>var re = /[/*]not a comment[*/]/, str = "//not a comment";</script>
Problem 4, a comment embedded in a language may not obviously be a comment.
<button onclick="// this is a comment//
notAComment()">
Problem 5, what is a comment may depend on how the browser is configured.
<noscript><!-- </noscript> Whether this is a comment depends on whether JS is turned on -->
<!--[if IE 8]>This is a comment, except on IE 8<![endif]-->
I had to solve this problem partially for contextual templating systems that elide comments from source code to prevent leaking software implementation details.
https://github.com/mikesamuel/html-contextual-autoescaper-java/blob/master/src/tests/com/google/autoesc/HTMLEscapingWriterTest.java#L1146 shows a testcase where a comment is identified in JavaScript, and later testcases show comments identified in CSS and HTML. You may be able to adapt that code to find comments. It will not handle comments in PHP code sections.
It seems from your word that you are pondering some approach based on regular expressions: it is a pain to do so on the whole file, try to use some tools to highlight or to discard interesting or uninteresting text and then work on what is left from your sieve according to the keep/discard criteria. Have a look at HTML::Tree and TreeBuilder, it could be very useful to deal with the HTML markup.
I would convert the HTML file into a character array and parse it. You can detect key strings like "<", "--" ,"www", "http", as you move forward and either skip or delete those segments.
The start/end indices will have to be identified properly, which is a challenge but you will have full power.
There are also other ways to simplify the process if performance is not a problem. For example, all tags can be grabbed with XML::Twig and the string can be parsed to detect JS comments.

Parsing Custom JavaScript Annotations

Implementing a large JavaScript application with a lot of scripts, its become necessary to put together a build script. JavaScript labels being ubiquitous, I've decided to use them as annotations for a custom script collator. So far, I'm just employing the use statement, like this:
use: com.example.Class;
However, I want to support an 'optional quotes' syntax, so the following would be parsed correctly as well
use: 'com.example.Class';
I'm currently using this pattern to parse the first form:
/\s*use:\s*(\S+);\s*/g
The '\S+' gloms all characters between the annotation name declaration and the terminating semi colon. What rule can I write to substitute for \S+ that will return an annotation value without quotes, no matter if it was quoted or not to begin with? I can do it in two steps, but I want to do it in one.
Thanks- I know I've put this a little awkwardly
Edit 1.
I've been able to use this, but IMHO its a mess- any more elegant solutions? (By the way, this one will parse ALL label names)
/\s*([a-z]+):\s*(?:['])([a-zA-Z0-9_.]+)(?:['])|([a-zA-Z0-9_.]+);/g
Edit 2.
The logic is the same, but expresses a little more succinctly. However, it poses a problem as it seems to pull in all sorts of javascript code as well.
/\s*([a-z]+):\s*'([\w_\.]+)'|([\w_\.]+);/g
Ok -this seemed to do it. Hope someone can improve on it.
/\s*([a-z]+): *('[\w_\/\.]+'|[\w_\/\.]+);/g

Categories

Resources