Escaping raw, unescaped strings in bookmarklet - javascript

I’m trying to write a search engine bookmarklet (for Chrome), but I’m having trouble escaping the string.
For example if the search engine bookmarklet is the following:
javascript:alert("%s"); //%s is the search engine query, passed literally by chrome.
Then running it on the following string gives incorrect results:
c:\zebra
It alerts c:zebra instead of c:\zebra.
If the character after the backslash happens to be an actual escape character (for example \t or \n), the result varies depending on the character.
I’ve tried escaping and unescaping the string, I’ve tried regexes, and I’ve tried replacing the backslash with a double backslash, but I cannot get this to work: the first time the raw string enters the script it has already been unescaped, so any operation after that sees it incorrectly.
How can this be handled correctly?
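A minimal sketch of what happens, run as plain JavaScript outside any bookmarklet: once the query is pasted between the quotes of a string literal, the escape is processed when the code is parsed, so no later operation ever sees the original backslash. The variable names are only for illustration.

```javascript
// What the literal looks like after "c:\zebra" is pasted into alert("%s"):
const zebra = "c:\zebra"; // \z is not a recognized escape, so the backslash is silently dropped

// If the next character is a real escape letter, it changes meaning instead:
const table = "c:\table"; // \t becomes a TAB character

console.log(zebra); // c:zebra
console.log(zebra.length); // 7
console.log(table.includes("\t")); // true

// Any replace() that runs now can only guess at the original text:
// a TAB the user really typed would look exactly the same.
```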

So far I can only make this work in Chrome:
javascript: var str = (function(){STARTOFSTRING:/*%s*/ENDOFSTRING:;}).toString().match( /STARTOFSTRING:\/\*([\s\S]*)\*\/ENDOFSTRING:/ )[1]; alert(str);
Typing c:\zebra alerts c:\zebra.
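Unfortunately, Firefox doesn't preserve the comments inside the function body when the function is decompiled. (Newer engines that implement the ES2019 revision of Function.prototype.toString return the exact source text, comments included.)
You also can't write the sequence */ in the string, but everything else should be passed literally, including quotes such as " and '.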
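The comment trick can be checked outside the browser; below, the query text is hard-coded where Chrome would substitute %s. The second half shows an alternative that is not part of the answer above: in engines supporting ES2018 tagged templates, String.raw returns a template literal's raw source text, so a bookmarklet like javascript:alert(String.raw`%s`) should keep backslashes, assuming %s is substituted literally as described in the question. That variant breaks on a backtick, on ${ and on a trailing backslash instead of on */.

```javascript
// The toString() trick: the comment survives because the engine returns the
// function's source text, so the backslash is never processed as an escape.
const viaComment = (function(){STARTOFSTRING:/*c:\zebra*/ENDOFSTRING:;})
  .toString()
  .match(/STARTOFSTRING:\/\*([\s\S]*)\*\/ENDOFSTRING:/)[1];

// Alternative (not from the answer; assumes an ES2018+ engine): String.raw reads the
// raw text of the template, so backslashes and even an invalid \u escape survive.
const viaRaw = String.raw`c:\zebra`;
const viaRawUnicode = String.raw`MYDOMAIN\ulrike`;

console.log(viaComment); // c:\zebra
console.log(viaRaw); // c:\zebra
console.log(viaRawUnicode); // MYDOMAIN\ulrike
```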

Related

 appearing in textarea elements but not in string

I am working on an autocomplete used inside a textarea. I know there are already some existing autocompletes, but anyway.
It works well, but when I type something, select one or more characters and delete them, an invisible special character (shown as an HTML entity) appears at the end of my string (or wherever I was inside it). I tried to remove it with replaceAll while retrieving my HTML, but that doesn't work (indexOf doesn't find this special character either). The problem is that my search returns no results because of this character. Here is an example:
This is my array (a little bit cut, but that doesn't really matter):
let array = [{
name: "test",
value: "I'm a test value"
},
{
name: "valueorange",
value: "I'm just an orange"
},
// ... (rest of the array cut)
];
// This is how I get the contents of my span (I tried both innerHTML and innerText, same results).
// Same while using .text() or .html() with jquery
let value = jqElement.find("#searching-span")[0].innerHTML.substring(1).toLowerCase();
value = value.replaceAll(" ", " ");
value = value.replaceAll("", "");
I can replace every non-breaking space without any problems. Finally, I loop over the values and check each one with indexOf; if it matches, I push it into a new array. But when the special character is present, I get no results.
Any idea how I can resolve this?
I tried to be clear; I hope my English isn't too bad, and sorry if I made many mistakes!
Character entities and numeric character references in HTML source code, such as the ones for the non-breaking space and the zero-width no-break space, are converted by the HTML parser into Unicode characters like \u00a0 and \ufeff before being inserted into the DOM.
If you replace them in JavaScript, match their Unicode characters, not the HTML escape sequences, in DOM strings. For example:
p.textContent = p.textContent.replaceAll("\ufeff", '*'); // zero-width no-break space (U+FEFF)
p.textContent = p.textContent.replaceAll("\xa0", '-'); // nbsp
<p id="p">   </p>
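Outside a browser the same idea can be sketched on plain strings. The normalizeDomText() helper and the sample data are hypothetical, and the filter only approximates the indexOf loop described in the question (here it searches the name field):

```javascript
// Hypothetical helper: remove U+FEFF and turn U+00A0 into a normal space before searching.
function normalizeDomText(text) {
  return text.replaceAll("\ufeff", "").replaceAll("\u00a0", " ").toLowerCase();
}

const entries = [
  { name: "test", value: "I'm a test value" },
  { name: "valueorange", value: "I'm just an orange" },
];

// Text as it might come out of the span: an invisible U+FEFF trails the query.
const typed = "Orange\ufeff";

// Without normalization the invisible character makes every indexOf() fail.
const rawMatches = entries.filter((entry) => entry.name.indexOf(typed.toLowerCase()) !== -1);

// After normalization the search works as expected.
const query = normalizeDomText(typed);
const matches = entries.filter((entry) => entry.name.indexOf(query) !== -1);

console.log(rawMatches.length); // 0
console.log(matches.map((entry) => entry.name)); // [ 'valueorange' ]
```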
Note that zero-width joiners are used a lot in emoji character sequences, and removing them arbitrarily may break emoji decoding (although decoding badly formed emoji strings is almost a prerequisite for handling emojis in the wild).
Second note: I am not suggesting this as a way to work around characters that were decoded badly from a Unicode Transformation Format such as UTF-8. Making sure decoding is performed correctly is always the better option.

Javascript How to escape \u in string literal

Strange thing...
I have a string literal that is passed to my source code as a constant token (I cannot prehandle or escape it beforehand).
Example
var username = "MYDOMAIN\tom";
username = username.replace('MYDOMAIN','');
The string somewhere contains a backslash followed by a character.
It's too late to escape the backslash at this point, so I have to escape these special characters individually like
username = username.replace(/\t/ig, 't');
However, that does not work in the following scenario:
var username = "MYDOMAIN\ulrike";
\u seems to introduce a Unicode escape sequence. \ulrik cannot be interpreted as a Unicode character, so the JavaScript engine stops interpreting at this point and my replace(/\u/ig,'u') comes too late.
Does anybody have a suggestion or workaround for escaping such a non-Unicode character sequence contained in a given string literal? There seems to be a similar issue with \b, as in "MYDOMAIN\bernd".
I have a string literal that is passed to my source code
Assuming you don't have any < or >, move this value into an HTML element or control (instead of your script block) and use JavaScript to read it. Something like:
<div id="myServerData">
MYDOMAIN\tom
</div>
and retrieve it like this:
alert(document.getElementById("myServerData").innerText);
IMPORTANT: injecting unescaped content that the user can control (say, data entered on some other page) is a security risk, whether you inject it into a script or into HTML.
Writing var username = "MYDOMAIN\ulrike"; will throw a syntax error. I think you have this string coming from somewhere.
I would suggest creating an HTML element, setting its innerHTML to the received value, and then reading it back.
Have something like:
<div id="demo"></div>
Then do document.getElementById("demo").innerHTML = username;
Then read the value from there as document.getElementById("demo").innerHTML;
This should work I guess.
Important: Please make sure this does not expose the webpage to script injections. If it does, this method is bad, don't use it.
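To illustrate the syntax-error point made above, the sketch below compiles each literal at runtime with the Function constructor (used here only so the error can be caught; evaluateLiteral() is a hypothetical helper). It shows that \t has already become a TAB before replace() can run, and that \u followed by non-hex letters never parses at all:

```javascript
// Hypothetical helper: compile a literal at runtime so its SyntaxError can be caught.
// In a normal script the error would stop the whole script before any code runs.
function evaluateLiteral(source) {
  try {
    return { ok: true, value: new Function("return " + source + ";")() };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

// The source text "MYDOMAIN\tom": the \t is already a TAB when replace() runs.
const tom = evaluateLiteral('"MYDOMAIN\\tom"');
console.log(tom.value.includes("\t")); // true
console.log(tom.value.replace("MYDOMAIN", "").replace(/\t/g, "t")); // tom

// \u must be followed by four hex digits, so this literal never parses.
const ulrike = evaluateLiteral('"MYDOMAIN\\ulrike"');
console.log(ulrike); // { ok: false, error: 'SyntaxError' }
```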

JS - JSON.parse - preserve special characters

I'm running a NodeJS app that gets certain posts from an API.
When I try to JSON.parse posts containing special characters, JSON.parse fails.
Special characters can be just any other language, emojis etc.
Parsing works fine when posts don't have special characters.
I need to preserve all of the text, I can't just ignore those characters since I need to handle every possible language.
I'm getting the following error:
"Unexpected token �"
Example of a text I'm supposed to be able to handle:
"summary": "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦"�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"
How can I properly parse such a text?
Thanks
You have misdiagnosed your problem, it has nothing to do with that character.
Your code contains an unescaped " immediately before the special character you think is causing the problem. The early " is prematurely terminating the string.
If you insert a backslash to escape the ", your string can be parsed as JSON just fine:
x = '{"summary": "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦\\"�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"}';
console.log(JSON.parse(x));
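A short check of the diagnosis above: non-ASCII text parses fine, the stray quote is what fails, and building the JSON with JSON.stringify escapes the quote automatically. The sample strings are shortened from the question:

```javascript
// Non-ASCII text (CJK, symbols, emoji) is valid inside a JSON string.
const ok = JSON.parse('{"summary": "★リプライ 匦\\"アイルランド 🍊"}');
console.log(ok.summary); // ★リプライ 匦"アイルランド 🍊

// An unescaped " ends the string early, which is what actually breaks parsing.
let error = null;
try {
  JSON.parse('{"summary": "匦"アイルランド"}');
} catch (e) {
  error = e.name;
}
console.log(error); // SyntaxError

// Building the text with JSON.stringify escapes the quote automatically.
const roundTrip = JSON.parse(JSON.stringify({ summary: '匦"アイルランド' }));
console.log(roundTrip.summary === '匦"アイルランド'); // true
```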
You need to pass a string, not an object.
Example
JSON.parse('{"summary" : "a"}');
In your case it should be like this
JSON.parse(
'{"summary" : "★リプライは殆ど見てません★ Tokyo-based E-J translator. ここは流れてくるニュースの自分用記録でRT&メモと他人の言葉の引用、ブログのフィード。ここで意見を述べることはしません。「交流」もしません。関心領域は匦�アイルランドと英国(他は専門外)※Togetterコメ欄と陰謀論が嫌いです。"}')

Escaping JavaScript special characters from ASP.NET

I have following C# code in my ASP.NET application:
string script = @"alert('Message head:\n\n" + CompoundErrStr + " message tail.');";
System.Web.UI.ScriptManager.RegisterClientScriptBlock(this, this.GetType(), "Test", script, true);
CompoundErrStr is an error message generated by SQL Server (exception text bubbled up from the stored procedure). If it contains any table column names they are enclosed in single quotes and JavaScript breaks during execution because single quotes are considered a string terminator.
As a fix for single quotes I changed my code to this:
CompoundErrStr = CompoundErrStr.Replace("'", @"\'");
string script = #"alert('Message head:\n\n" + CompoundErrStr + " message tail.');";
System.Web.UI.ScriptManager.RegisterClientScriptBlock(this, this.GetType(), "Test", script, true);
and it now works fine.
However, are there any other special characters that need to be escaped like this? Is there a .Net function that can be used for this purpose? Something similar to HttpServerUtility.HtmlEncode but for JavaScript.
EDIT I use .Net 3.5
Note: for this task you can't (and shouldn't) use HTML encoders (like HttpServerUtility.HtmlEncode()) because the rules for HTML and for JavaScript strings are quite different. One example: the string "Check your Windows folder c:\windows" passes through an HTML encoder unchanged, but inside a JavaScript string literal \w is not a valid escape, so the user sees "c:windows", which is obviously wrong. Moreover, because it follows HTML encoding rules, it won't produce JavaScript escape sequences for \, " and '. It's simply meant for something else.
If you're targeting ASP.NET Core or .NET 5 then you should use System.Text.Encodings.Web.JavaScriptEncoder class.
If you're targeting .NET 4.x you can use HttpUtility.JavaScriptStringEncode() method.
If you're targeting .NET 3.x and 2.x:
What do you have to encode? Some characters must be escaped (\, " and ') because they have special meaning for the JavaScript parser, while others may interfere with HTML parsing and should be escaped too (if the JS is inside an HTML page). You have two options for escaping: the JavaScript escape character \ or \uxxxx Unicode escapes (note that \uxxxx can be used for all of them, while a plain \ escape won't help with characters that interfere with the HTML parser).
You may do it manually (with search and replace) like this:
string JavaScriptEscape(string text)
{
return text
.Replace("\\", @"\u005c") // Because it's the JS string escape character
.Replace("\"", @"\u0022") // Because it may be a string delimiter
.Replace("'", @"\u0027") // Because it may be a string delimiter
.Replace("&", @"\u0026") // Because it may interfere with HTML parsing
.Replace("<", @"\u003c") // Because it may interfere with HTML parsing
.Replace(">", @"\u003e"); // Because it may interfere with HTML parsing
}
Of course \ should not be escaped if you're using it as the escape character! This blind replacement is useful for unknown text (like input from users or text messages that may be translated). Note that if the string is enclosed in double quotes then single quotes don't need to be escaped, and vice versa. Be careful to keep verbatim strings in the C# code, or the Unicode replacement will be performed by C# and your client will receive unescaped strings. A note about interfering with HTML parsing: nowadays you seldom need to create a <script> node and inject it into the DOM, but it used to be a pretty common technique, and the web is full of code like + "</s" + "cript>" to work around this.
Note: I said blind escaping because if your string contains an escape sequence (like \uxxxx or \t) then it should not be escaped again. For this you have to do some tricks around this code.
If your text comes from user input and it may be multiline then you should also be ready for that or you'll have broken JavaScript code like this:
alert("This is a multiline
comment");
Simply add .Replace("\n", "\\n").Replace("\r", "") to previous JavaScriptEscape() function.
For completeness, there is another method: if you encode your string with Uri.EscapeDataString(), you can decode it in JavaScript with decodeURIComponent(), but this is more a dirty trick than a solution.
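Since the rest of this page uses JavaScript, here is a hypothetical port of the JavaScriptEscape() replacements above (plus the newline handling) that checks them with a round trip: the escaped text is embedded in a single-quoted literal, as in alert('...'), and compiled with the Function constructor. This only tests the rules; on .NET 4.0+ the built-in HttpUtility.JavaScriptStringEncode() mentioned in this thread is the simpler route.

```javascript
// Hypothetical JavaScript port of the C# JavaScriptEscape() rules, including newlines.
function javaScriptEscape(text) {
  return text
    .replaceAll("\\", "\\u005c") // must run first: later replacements add backslashes
    .replaceAll('"', "\\u0022")
    .replaceAll("'", "\\u0027")
    .replaceAll("&", "\\u0026")
    .replaceAll("<", "\\u003c")
    .replaceAll(">", "\\u003e")
    .replaceAll("\n", "\\n")
    .replaceAll("\r", "");
}

const message = "Invalid column 'Name' in c:\\windows\r\n</script><b>&</b>";
const escaped = javaScriptEscape(message);

// Embed the escaped text in a single-quoted literal and let the JS parser decode it.
const decoded = new Function("return '" + escaped + "';")();

console.log(decoded === message.replaceAll("\r", "")); // true
console.log(/[<>&'"]/.test(escaped)); // false
```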
While the original question mentions .NET 3.5, it should be known to users of 4.0+ that you can use HttpUtility.JavaScriptStringEncode("string")
A second bool parameter specifies whether to include quotation marks (true) or not (false) in the result.
All too easy:
@Html.Raw(myString)

Regex won't find '\u2028' unicode characters

We're having a lot of trouble tracking down the source of \u2028 (Line Separator) in user submitted data which causes the 'unterminated string literal' error in Firefox.
As a result, we're looking at filtering it out before submitting it to the server (and then the database).
After extensive googling and reading of other people's problems, it's clear I have to filter these characters out before submitting to the database.
Before writing the filter, I attempted to search for the character just to ensure it can find it using:
var index = content.search("/\u2028/");
alert("Index: [" + index + "]");
I get -1 as the result every time, even when I know the character is in the content variable (I've confirmed this via a Java JUnit test on the server side).
Assuming that content.replace() would work the same way as search(), is there something I'm doing wrong or anything I'm missing in order to find and strip these line separators?
Your regex syntax is incorrect. You only use the two forward slashes when using a regex literal. It should be just:
var index = content.search("\u2028");
or:
var index = content.search(/\u2028/); // regex literal
But this should really be done on the server, if anywhere. JavaScript sanitization can be trivially bypassed. It's only useful for user convenience, and I don't think accidentally entering line separator is that common.
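A quick sketch of both points: the string form "/\u2028/" is turned into a regex that also requires the slashes, while the regex literal finds the separator. Replacing with "\n" is an assumption about what the cleanup should produce, and as noted above it is a convenience, not a substitute for server-side handling:

```javascript
const content = "first line\u2028second line\u2029third";

// search() converts a string argument to a RegExp, so this pattern also needs the slashes.
const withSlashes = content.search("/\u2028/");
// The regex literal matches the Line Separator itself.
const withLiteral = content.search(/\u2028/);
// Hypothetical cleanup: replace Line Separator and Paragraph Separator with "\n".
const cleaned = content.replace(/[\u2028\u2029]/g, "\n");

console.log(withSlashes); // -1
console.log(withLiteral); // 10
console.log(JSON.stringify(cleaned)); // "first line\nsecond line\nthird"
```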
