I am looking at an example in dotnet which looks like the following: https://dotnetfiddle.net/t0y8yD.
The output for the HttpServerUtility.UrlTokenEncode method is:
Pn55YBwEH2S2BEM5qlNrq-LMNE8BDdHYwbWKFEHiPZo1
When I try to complete the same in NodeJS with encodeURI, encodeURIComponent or any other attempt I get the following:
Pn55YBwEH2S2BEM5qlNrq+LMNE8BDdHYwbWKFEHiPZo=
As you can see from the above the '-' should be a '+' and the last character part is different. The hash is created the same and outputs the same buffer.
var hmac = crypto.createHmac("sha256", buf);
hmac.update("9644873");
var hash = hmac.digest("base64");
How can I get the two to match? One other important note is that this is one use case and I am unsure if there are other chars that do the same.
I am unsure if the dotnet variant is incorrect or the NodeJS version is. However, the comparison will be done on the dotnet side, so I need node to match that.
The difference of the two results is caused by the use of Base64URL encoding in the C# code vs. Base64 encoding in node.js.
Base64URL and Base64 are almost identical, but Base64 encoding uses the characters +, / and =, which have a special meaning in URLs and thus have to be avoided. In Base64URL encoding + is replaced with -, / with _ and = (the padding character on the end) is either replaced with %20 or simply omitted.
In your code you're calculating a HMAC-SHA256 hash, so you get a 256 bit result, which can be encoded in 32 bytes. In Base64/Base64URL every character represents 6 bits, therefore you would need 256/6 = 42,66 => 43 Base64 characters. With 43 characters you would have 2 'lonesome' bits on the end, therefore a padding char (=) is added.
The question now is why HttpServerUtility.UrlTokenEncode adds a 1 as a replacement for the padding char on the end. I didn't find anything in the documentation. But you you should keep in mind that it's insignificant anyway.
To to get the same in node.js, you can use the package base64url, or just use simple replace statements on the base64 encoded hash.
With base64url package:
const base64url = require('base64url');
var hmacB64 = "Pn55YBwEH2S2BEM5qlNrq+LMNE8BDdHYwbWKFEHiPZo="
var hmacB64url = base64url.fromBase64(hmacb64)
console.log(hmacB64url)
The result is:
Pn55YBwEH2S2BEM5qlNrq-LMNE8BDdHYwbWKFEHiPZo
as you can see, this library just omits the padding char.
With replace, also replacing the padding = with 1:
var hmacB64 = "Pn55YBwEH2S2BEM5qlNrq+LMNE8BDdHYwbWKFEHiPZo="
console.log(hmacb64.replace(/\//g,'_').replace(/\+/g,'-').replace(/\=+$/m,'1'))
The result is:
Pn55YBwEH2S2BEM5qlNrq-LMNE8BDdHYwbWKFEHiPZo1
I tried the C# code with different data and always got '1' on the end, so to replace = with 1 seems to be ok, though it doesn't seem to be conform to the RFC.
The other alternative, if this is an option for you, is to change the C# code. Use normal base64 encoding plus string replace to get base64url output instead of using HttpServerUtility.UrlTokenEncode
A possible solution for that is described here
I'm new here so I can't comment (need 50 reputation), but I would like to add to #jqs answer that if the string ends with two "=", the replace needs to be done with "2". So my replace looks like:
hmacb64.replace(///g,'_').replace(/+/g,'-').replace(/\=\=$/m,'2').replace(/\=$/m,'1')
Related
I have a JSON object where one attribute contains a static special character - https://www.compart.com/en/unicode/U+1F514
I have tried to store the string both as encoded UTF-8 "\xF0\x9F\x94\x94"
or tried to print it using its HEX value - String.fromCharCode(0x1F514) or decimal value String.fromCharCode(128276)
But it all results in an empty charater/empty square character in Google Chrome.
How can I please store this character properly, statically in a simple JSON {header1:"____"} and then echo it?
Also not able to display it in IntelliJ - so if you have a comment regarding this side issue would be very thankful.
For historial reasons, JavaScript doesn't have full Unicode support because language creators assumed that UTF-16 would never need more than 2-bytes to encode a single character. JSON inherits that and \u entities only accept 4 hexadecimal characters.
You need to use a workaround that basically consists on splitting the actual 4-byte UTF-16 character in two 2-byte characters, as in:
var raw = "🔔";
var doesNotWork = "\u1F514";
var works = "\uD83D\uDD14";
console.log(raw, doesNotWork, works);
... or get rid of entities and just dump the actual binary character:
var data = ["🔔"];
var json = JSON.stringify(data);
console.log(json, JSON.parse(json));
I think that the problem is that the font doesn't have support for such symbol, hence the square character being drawn. If there is not an specific reason as why you are using this character, you could draw it with an icon, or using a character in an icon font.
I need to insert an Omega (Ω) onto my html page. I am using its HTML escaped code to do that, so I can write Ω and get Ω. That's all fine and well when I put it into a HTML element; however, when I try to put it into my JS, e.g. var Omega = Ω, it parses that code as JS and the whole thing doesn't work. Anyone know how to go about this?
I'm guessing that you actually want Omega to be a string containing an uppercase omega? In that case, you can write:
var Omega = '\u03A9';
(Because Ω is the Unicode character with codepoint U+03A9; that is, 03A9 is 937, except written as four hexadecimal digits.)
Edited to add (in 2022): There now exists an alternative form that better supports codepoints above U+FFFF:
let Omega = '\u{03A9}';
let desertIslandEmoji = '\u{1F3DD}';
Judging from https://caniuse.com/mdn-javascript_builtins_string_unicode_code_point_escapes, most or all browsers added support for it in 2015, so it should be reasonably safe to use.
Although #ruakh gave a good answer, I will add some alternatives for completeness:
You could in fact use even var Omega = 'Ω' in JavaScript, but only if your JavaScript code is:
inside an event attribute, as in onclick="var Omega = 'Ω';
alert(Omega)" or
in a script element inside an XHTML (or XHTML + XML) document
served with an XML content type.
In these cases, the code will be first (before getting passed to the JavaScript interpreter) be parsed by an HTML parser so that character references like Ω are recognized. The restrictions make this an impractical approach in most cases.
You can also enter the Ω character as such, as in var Omega = 'Ω', but then the character encoding must allow that, the encoding must be properly declared, and you need software that let you enter such characters. This is a clean solution and quite feasible if you use UTF-8 encoding for everything and are prepared to deal with the issues created by it. Source code will be readable, and reading it, you immediately see the character itself, instead of code notations. On the other hand, it may cause surprises if other people start working with your code.
Using the \u notation, as in var Omega = '\u03A9', works independently of character encoding, and it is in practice almost universal. It can however be as such used only up to U+FFFF, i.e. up to \uffff, but most characters that most people ever heard of fall into that area. (If you need “higher” characters, you need to use either surrogate pairs or one of the two approaches above.)
You can also construct a character using the String.fromCharCode() method, passing as a parameter the Unicode number, in decimal as in var Omega = String.fromCharCode(937) or in hexadecimal as in var Omega = String.fromCharCode(0x3A9). This works up to U+FFFF. This approach can be used even when you have the Unicode number in a variable.
One option is to put the character literally in your script, e.g.:
const omega = 'Ω';
This requires that you let the browser know the correct source encoding, see Unicode in JavaScript
However, if you can't or don't want to do this (e.g. because the character is too exotic and can't be expected to be available in the code editor font), the safest option may be to use new-style string escape or String.fromCodePoint:
const omega = '\u{3a9}';
// or:
const omega = String.fromCodePoint(0x3a9);
This is not restricted to UTF-16 but works for all unicode code points. In comparison, the other approaches mentioned here have the following downsides:
HTML escapes (const omega = 'Ω';): only work when rendered unescaped in an HTML element
old style string escapes (const omega = '\u03A9';): restricted to UTF-16
String.fromCharCode: restricted to UTF-16
The answer is correct, but you don't need to declare a variable.
A string can contain your character:
"This string contains omega, that looks like this: \u03A9"
Unfortunately still those codes in ASCII are needed for displaying UTF-8, but I am still waiting (since too many years...) the day when UTF-8 will be same as ASCII was, and ASCII will be just a remembrance of the past.
I found this question when trying to implement a font-awesome style icon system in html. I have an API that provides me with a hex string and I need to convert it to unicode to match with the font-family.
Say I have the string const code = 'f004'; from my API. I can't do simple string concatenation (const unicode = '\u' + code;) since the system needs to recognize that it's unicode and this will in fact cause a syntax error if you try.
#coldfix mentioned using String.fromCodePoint but it takes a number as an argument, not a string.
To finally cross the finish line, just add parseInt and pass 16 (since hex is base 16) to it's second parameter. You'll finally get a unicode string from a simple hex string.
This is what I did:
const code = 'f004';
const toUnicode = code => String.fromCodePoint(parseInt(code, 16));
toUnicode(code);
// => '\uf004'
Try using Function(), like this:
var code = "2710"
var char = Function("return '\\u"+code+"';")()
It works well, just do not add any 's or "s or spaces.
In the example, char is "✐".
My page state can be described by a JavaScript object that can be serialized into JSON. But I don't think a JSON string is suitable for use in a fragment ID due to, for example, the spaces and double-quotes.
Would encoding the JSON string into a base64 string be sensible, or is there a better way? My goal is to allow the user to bookmark the page and then upon returning to that bookmark, have a piece of JavaScript read window.location.hash and change state accordingly.
I think you are on a good way. Let's write down the requirements:
The encoded string must be usable as hash, i.e. only letters and numbers.
The original value must be possible to restore, i.e. hashing (md5, sha1) is not an option.
It shouldn't be too long, to remain usable.
There should be an implementation in JavaScript, so it can be generated in the browser.
Base64 would be a great solution for that. Only problem: base64 also contains characters like - and +, so you win nothing compared to simply attaching a JSON string (which also would have to be URL encoded).
BUT: Luckily, theres a variant of base64 called base64url which is exactly what you need. It is specifically designed for the type of problem you're describing.
However, I was not able to find a JS implementation; maybe you have to write one youself – or do a bit more research than my half-assed 15 seconds scanning the first 5 Google results.
EDIT: On a second thought, I think you don't need to write an own implementation. Use a normal implementation, and simply replace the “forbidden” characters with something you find appropriate for your URLs.
Base64 is an excellent way to store binary data in text. It uses just 33% more characters/bytes than the original data and mostly uses 0-9, a-z, and A-Z. It also has three other characters that would need encoded to be stored in the URL, which are /, =, and +. If you simply used URL encoding, it would take up 300% (3x) the size.
If you're only storing the characters in the fragment of the URL, base64-encoded text it doesn't need to be re-encoded and will not change. But if you want to send the data as part of the actual URL to visit, then it matters.
As referenced by lxg, there there is a base64url variant for that. This is a modified version of base64 to replace unsafe characters to store in the URL. Here is how to encode it:
function tobase64url(s) {
return btoa(x).replace(/\+/g,'-').replace(/\//g,'_').replace(/=/g,'');
}
console.log(tobase64url('\x00\xff\xff\xf1\xf1\xf1\xff\xff\xfe'));
// Returns "AP__8fHx___-" instead of "AP//8fHx///+"
And to decode a base64 string from the URL:
function frombase64url(s) {
return atob(x.replace(/-/g,'+').replace(/_/g, '/'));
}
Use encodeURIComponent and decodeURIComponent to serialize data for the fragment (aka hash) part of the URL.
This is safe because the character set output by encodeURIComponent is a subset of the character set allowed in the fragment. Specifically, encodeURIComponent escapes all characters except:
A - Z
a - z
0 - 9
- . _ ~ ! ' ( ) *
So the output includes the above characters, plus escaped characters, which are % followed by hexadecimal digits.
The set of allowed characters in the fragment is:
A - Z
a - z
0 - 9
? / : # - . _ ~ ! $ & ' ( ) * + , ; =
percent-encoded characters (a % followed by hexadecimal digits)
This set of allowed characters includes all the characters output by encodeURIComponent, plus a few other characters.
In my app a user submits text through a form's textarea and this text is passed on to the app and is then processed by jsesc library, which escapes javascript strings.
The problem is that when I type in a text in Russian, such as
нам #интересны наши #идеи
what i get is
'\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'
I then need to pass this data through FlowDock to extract hashtags and FlockDock just does not recognize it.
Can someone please tell me
1) What is the need for converting it into that representation;
2) If it makes sense to convert it back to cyrillic encoding for FlowDock and for the database, or shall I keep it in Unicode and try to make FlowDock work with it?
Thanks!
UPDATE
The complete script is:
result = getField(req, field);
result = S(result).trim().collapseWhitespace().s;
// at this point result = "нам #интересны наши #идеи"
result = jsesc(result, {
'quotes': 'double'
});
// now i end up with Unicode as above above (\u....)
var hashtags = FlowdockText.extractHashtags(result);
FlowDock receives the result which is
\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438
And doesn't extract hashtags from it...
These are 2 representations of the same string:
'нам #интересны наши #идеи' === '\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438'
looks like flowdock-text doesn't work well with non-ASCII characters
UPD: Tried, actually works well:
fdt.extractHashtags('\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438');
You shouldn't have used escaping in the first place, it gives you string literal representation (suits for eval, etc), not a string.
UPD2: I've reduced you code to the following:
var jsesc = require('jsesc');
var fdt = require('flowdock-text');
var result = 'нам #интересны наши #идеи';
result = jsesc(result, {
'quotes': 'double'
});
var hashtags = fdt.extractHashtags(result);
console.log(hashtags);
As I said, the problem is with jsesc: you don't need it. It returns javascript-encoded string. You need when you are doing eval with concatenation to protect from code injection, or something like this. For example if you add result = eval('"' + result + '"');, it will work.
What is the need for converting it into that representation?
jsesc is a JavaScript library for escaping JavaScript strings while generating the shortest possible valid ASCII-only output. Here’s an online demo.
This can be used to avoid mojibake and other encoding issues, or even to avoid errors when passing JSON-formatted data (which may contain U+2028 LINE SEPARATOR, U+2029 PARAGRAPH SEPARATOR, or lone surrogates) to a JavaScript parser or an UTF-8 encoder, respectively.
Sounds like in this case you don’t intend to use jsesc at all.
Try this:
decodeURIComponent("\u043D\u0430\u043C #\u0438\u043D\u0442\u0435\u0440\u0435\u0441\u043D\u044B \u043D\u0430\u0448\u0438 #\u0438\u0434\u0435\u0438");
I have a question about encoding special/extended UTF-8 characters in URLs in JavaScript. The same question applies to many characters like the Registered R-circle, but my example uses an umlaut:
ü = %C3%BC in UTF-8 (four rows from bottom of http://www.utf8-chartable.de/)
If the url contains an umlaut represented as UTF-8 (ü = %C3%BC), and I run it through encodeURIComponent, the %s are encode, the string now looks like "%25C3%25BC" and it gets correctly processed by my system. This is good.
url = "http://foo.com/bar.html?%C3%BC"
url = encodeURIComponent(url);
// url is now represented as "http%3A%2F%2Ffoo.com%2Fbar.html%3F%25C3%25BC"
However, the bad: If the pre-encoded string has an unencoded character, the actual umlaut, the after encoding is looks like "%C3%BC" and fails because, I believe, the %s should be encoded, too.:
url = "http://foo.com/bar.html?ü"
url = encodeURIComponent(url);
// url is now represented as "http%3A%2F%2Ffoo.com%2Fbar.html%3F%C3%BC"
I think it fails because it is less thoroughly encoded than the rest of the url.
So, beyond general advice or answers to questions I don't know to ask, what I think i want to know is how to get the raw umlaut (and all other special characters) to fully encode. Is that what is incorrect?
Thanks for your help!
Nate
You cannot encode a URL all at once. If you have already concatenated the host, path, parameters, etc., together then it's impossible to correctly determine which characters actually need to be encoded and which characters are separators that need to be left alone.
The only reliable way to build a URL is by concatenating already-encoded values:
"http://foo.com/bar.html?" + encodeURIComponent("%C3%BC")