Question: can I be certain that Base64 encoded URLs won't output '/' characters?
Background: Firebase uses a key/value structure, and its key names, per the docs,
"can include any unicode characters except for . $ # [ ] / and ASCII
control characters 0-31 and 127"
I'd like to use URLs as keys for one of my collections, but obviously the '/' and '.' make raw strings a no-go.
My plan (to which I'm not married) is to convert the URLs into Base64, using either the browser's functions (atob() and btoa()) or a dedicated function/NPM module (as discussed here).
However, Base64 outputs can include '/', which breaks Firebase rules.
Would the characters a URL might contain ever produce a '/'?
If so, is there any reason I shouldn't just add a simple String.replace() to the front/back of the Base64 encoding function?
Taking suggestions from the OP comments (thanks ceejayoz, Derek), it looks like something like this will work:
let rawUrl = "http://stackoverflow.com/questions/38679286/how-can-i-convert-urls-to-base64-without-outputting-characters?noredirect=1#comment64738129_38679286";
let key = btoa(encodeURIComponent(rawUrl));
let decodedUrl = decodeURIComponent(atob(key));
rawUrl == decodedUrl // True
You can always just replace those characters after base64 encoding them, and then replace again when decoding.
const fireBase64 = {
encode: (str) => btoa(str).replace(/\//g, '_'),
decode: (b64) => atob(b64.replace(/_/g, '/'))
}
The possible characters in a base64 string are
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
If you don't want /, just replace it with another character not present in the list above, e.g. _.
var string = "ü";
var encoded = btoa(string).replace(/\//g, '_');
var decoded = atob(encoded.replace(/_/g, '/'));
console.log(btoa(string).indexOf('/') > -1); // true :(
console.log(encoded.indexOf('/') > -1); // false :)
console.log(string == decoded); // true :)
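To make the round trip concrete: a raw URL can produce '/' in Base64 (for example, btoa('ab?') is 'YWI/'), so replacing characters after encoding is a reasonable safeguard. The sketch below combines the encodeURIComponent step with the usual URL-safe Base64 substitutions ('+' to '-', '/' to '_', padding dropped). The names urlToKey and keyToUrl are made up for this example and are not part of Firebase or any library.

```javascript
// Hypothetical helper names; a sketch, not a Firebase API.
function urlToKey(url) {
  // encodeURIComponent output is pure ASCII, so btoa cannot throw on it.
  return btoa(encodeURIComponent(url))
    .replace(/\+/g, '-')  // '+' becomes '-'
    .replace(/\//g, '_')  // '/' becomes '_'
    .replace(/=+$/, '');  // drop trailing padding
}

function keyToUrl(key) {
  const b64 = key.replace(/-/g, '+').replace(/_/g, '/');
  // Restore the padding that urlToKey stripped.
  const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4);
  return decodeURIComponent(atob(padded));
}

const sampleUrl = 'http://stackoverflow.com/questions/38679286?noredirect=1#comment64738129_38679286';
const sampleKey = urlToKey(sampleUrl);
console.log(/^[A-Za-z0-9_-]+$/.test(sampleKey)); // true
console.log(keyToUrl(sampleKey) === sampleUrl);  // true
```

The resulting key contains only letters, digits, '-' and '_', none of which are on the forbidden list quoted in the question.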
I have a string in JS in this format:
http\x3a\x2f\x2fwww.url.com
How can I get the decoded string out of this? I tried unescape(), string.decode but it doesn't decode this. If I display that encoded string in the browser it looks fine (http://www.url.com), but I want to manipulate this string before displaying it.
Thanks.
You could write your own replacement method:
String.prototype.decodeEscapeSequence = function() {
return this.replace(/\\x([0-9A-Fa-f]{2})/g, function() {
return String.fromCharCode(parseInt(arguments[1], 16));
});
};
"http\\x3a\\x2f\\x2fwww.example.com".decodeEscapeSequence()
There is nothing to decode here. \xNN is an escape sequence in JavaScript that denotes the character with hexadecimal code NN. An escape sequence is simply a way of writing a character in a string literal; once the literal is parsed, it is already "decoded", which is why it displays fine in the browser.
When you do:
var str = 'http\x3a\x2f\x2fwww.url.com';
it is internally stored as http://www.url.com. You can manipulate this directly.
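A quick way to see the difference between a parsed literal and text that really contains backslashes is shown below. String.raw is used only to simulate text that arrived with literal backslashes (for instance from a file or a network response); the variable names are illustrative.

```javascript
// The parser resolves \xNN while reading the literal.
const parsed = 'http\x3a\x2f\x2fwww.url.com';
console.log(parsed === 'http://www.url.com'); // true

// String.raw keeps the backslashes, so the escapes stay as plain text.
const rawText = String.raw`http\x3a\x2f\x2fwww.url.com`;
console.log(rawText.length - parsed.length); // 9

// Only text of the second kind needs an explicit decoding step.
const decoded = rawText.replace(/\\x([0-9A-Fa-f]{2})/g, (_, hex) =>
  String.fromCharCode(parseInt(hex, 16)));
console.log(decoded === parsed); // true
```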
If you already have:
var encodedString = "http\x3a\x2f\x2fwww.url.com";
Then decoding the string manually is unnecessary. The JavaScript interpreter would already be decoding the escape sequences for you, and in fact double-unescaping can cause your script to not work properly with some strings. If, in contrast, you have:
var encodedString = "http\\x3a\\x2f\\x2fwww.url.com";
Those backslashes would be considered escaped (therefore the hex escape sequences remain undecoded), so keep reading.
The easiest way in that case is to use the eval function, which runs its argument as JavaScript code and returns the result:
var decodedString = eval('"' + encodedString + '"');
This works because \x3a is a valid JavaScript string escape code. However, don't do it this way unless the string comes from a source you control, such as your own server; otherwise you would be creating a security hole, because eval can be used to execute arbitrary JavaScript code.
A better (but less concise) approach would be to use JavaScript's string replace method to create valid JSON, then use the browser's JSON parser to decode the resulting string:
var decodedString = JSON.parse('"' + encodedString.replace(/([^\\]|^)\\x/g, '$1\\u00') + '"');
// or using jQuery
var decodedString = $.parseJSON('"' + encodedString.replace(/([^\\]|^)\\x/g, '$1\\u00') + '"');
You don't need to decode it. You can manipulate it safely as it is:
var str = "http\x3a\x2f\x2fwww.url.com";
alert(str.charAt(4)); // :
alert("\x3a" === ":"); // true
alert(str.slice(0,7)); // http://
Maybe this helps: http://cass-hacks.com/articles/code/js_url_encode_decode/
function URLDecode (encodedString) {
var output = encodedString;
var binVal, thisString, match; // declare match so it does not leak as an implicit global
var myregexp = /(%[^%]{2})/;
while ((match = myregexp.exec(output)) != null
&& match.length > 1
&& match[1] != '') {
binVal = parseInt(match[1].substr(1),16);
thisString = String.fromCharCode(binVal);
output = output.replace(match[1], thisString);
}
return output;
}
2019
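Use decodeURI or decodeURIComponent rather than the deprecated unescape. Note that these functions decode percent-encoding (such as %3A); the \x escapes in the literal below are already resolved by the JavaScript parser before decodeURI sees the string.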
console.log(
decodeURI('http\x3a\x2f\x2fwww.url.com')
)
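For comparison, here is a small sketch of what the two built-ins do with a string that is actually percent-encoded. The sample string is made up for illustration. decodeURIComponent decodes every escape, while decodeURI leaves escapes for reserved characters such as ':' and '/' untouched.

```javascript
// Illustrative percent-encoded input; %C3%BC is the UTF-8 encoding of "ü".
const percentEncoded = 'http%3A%2F%2Fwww.url.com%2F%C3%BC';

const decodedComponent = decodeURIComponent(percentEncoded);
console.log(decodedComponent); // http://www.url.com/ü

const decodedUri = decodeURI(percentEncoded);
console.log(decodedUri); // http%3A%2F%2Fwww.url.com%2Fü
```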
REGEX ONLY
I exclusively need Javascript regex code to convert URLs like
https://hello.romeo-juliet.fr
https://hello.romeojuliet.co.uk
https://hello.romeo-jul-iet.fr
https://hello.romeo-juliet.com
into this string romeojuliet
Basically, I want to get the alphabetic part of the domain name, removing all other characters, the https:// prefix, and top-level domains such as com, co.uk, and fr.
Would be helpful if done using JS replace.
This is as far as I got:
let url="https://hello.romeo-juliet.fr";
const test=url.replace(/(^\w+:|^)\/\/(\w+.)/, '');
console.log(test);
A non regex solution:
Get the host of the URL (by parsing the string with the URL() constructor and reading its host property), split it on periods and take the second item in the resulting array, then remove all occurrences of -:
let url="https://hello.romeo-juliet.fr";
const test = new URL(url).host.split(".")[1].replaceAll("-", '');
console.log(test);
You can also do it without regex, as follows:
let url="https://hello.romeo-juliet.fr";
url.substring(url.indexOf(".")+1, url.lastIndexOf("."));
// result: romeo-juliet
I hope this answers your question
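Since the question asked for a regex, here is one possible regex-based sketch. It assumes every URL has exactly one subdomain label (such as hello) before the name of interest, which holds for the four sample URLs but not in general. The function name domainName is made up for this example.

```javascript
// Hypothetical helper; assumes the shape scheme://subdomain.name.tld...
function domainName(url) {
  // Capture the label that follows the first dot of the host.
  const m = url.match(/^\w+:\/\/[^.\/]+\.([^.\/]+)\./);
  // Strip everything that is not a letter (hyphens, digits).
  return m ? m[1].replace(/[^a-z]/gi, '') : null;
}

console.log(domainName('https://hello.romeo-juliet.fr'));   // romeojuliet
console.log(domainName('https://hello.romeojuliet.co.uk')); // romeojuliet
```

Returning null when the pattern does not match avoids silently producing a wrong name for URLs with a different shape.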
I have a string which contains xml. It has the following substring
<Subject>������������������</subject>
I'm pulling the XML from a server and I need to display it to the user. I've noticed the ampersand has been escaped and there are UTF-16 surrogate pairs. How do I ensure the emojis/emoticons are displayed correctly in a browser?
Currently I'm just getting these characters: �������������� instead of the actual emojis.
I'm looking for a simple way to fix this without any external libraries or any 3rd party code if possible just plain old javascript, html or css.
You can convert UTF-16 code units including surrogates to a JavaScript string with String.fromCharCode. The following code snippet should give you an idea.
var str = '��ABC����������������';
// Regex matching either a surrogate or a character.
var re = /&#(\d+);|([^&])/g;
var match;
var charCodes = [];
// Find successive matches
while (match = re.exec(str)) {
if (match[1] != null) {
// Surrogate
charCodes.push(match[1]);
}
else {
// Unescaped character (assuming the code point is below 0x10000),
charCodes.push(match[2].charCodeAt(0));
}
}
// Create string from UTF-16 code units.
var result = String.fromCharCode.apply(null, charCodes);
console.log(result);
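A more compact variant of the same idea, using a replace callback, is sketched below. The sample input is my own (the surrogate pair 55357, 56832 is U+1F600) and is not the string from the question. The function name is hypothetical, and the sketch assumes the ampersand was escaped as &amp; exactly once.

```javascript
// Hypothetical helper: turns decimal numeric entities into characters.
function decodeNumericEntities(text) {
  return text
    .replace(/&amp;#/g, '&#') // undo the extra escaping of the ampersand
    .replace(/&#(\d+);/g, (_, code) => String.fromCodePoint(Number(code)));
}

// Two surrogate halves written as entities, followed by plain text.
const sample = '&amp;#55357;&amp;#56832;ABC';
const decodedSample = decodeNumericEntities(sample);
console.log(decodedSample === '\uD83D\uDE00ABC'); // true
```

String.fromCodePoint accepts lone surrogate values, so adjacent high and low halves join into one emoji once the pieces are concatenated. It throws a RangeError for values above 0x10FFFF, so untrusted input may need a range check first.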
Context
I have code that takes a URL path and replaces path params with '*'.
All my URLs follow the JSON API naming convention.
All valid URL resource parts follow these rules:
Member names SHOULD start and end with the characters “a-z” (U+0061
to U+007A)
Member names SHOULD contain only the characters “a-z”
(U+0061 to U+007A), “0-9” (U+0030 to U+0039), and the hyphen minus
(U+002D HYPHEN-MINUS, “-“) as separator between multiple words.
The path param is usually an id (number, uuid, guid, etc).
Here are several examples of transformations:
/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f/info -> /user/*/info
/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f -> /user/*
/user/1 -> /user/*
What I have
/^[a-z][a-z0-9-]*[a-z]$/
The issue is that it doesn't handle a uuid as a path param.
Here is my function that parses the url (sorry don't have time to create a jsfiddle):
const escapeResourcePathParameters = resource => resource
.substr(resource.startsWith('/') ? 1 : 0)
.split('/')
.reduce((url, member) => {
const match = member.match(REGEX.JSONAPI_RESOURCE_MEMBER);
const part = match
? member
: '*';
return `${url}/${part}`;
}, '');
Questions
I need a regex that follows the rules above and works for the examples above.
UPD:
I've added the function that I use to parse URLs. To test your regex, just use it in place of REGEX.JSONAPI_RESOURCE_MEMBER and pass a URL like
/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f/info; it should return /user/*/info
I am guessing you are looking for a regex to capture the UUID.
This should work in JavaScript:
/[a-z][a-z0-9]+[-]+[a-z0-9-]+[a-z]/
I suppose a UUID should have at least two words, so at least one "-".
let a = "/user/e09e4f9f-cfcd-4a23-a88a/info"
const match = a.match(/[a-z][a-z0-9]+[-]+[a-z0-9-]+[a-z]/)
console.log(match[0])
So for your code, it should be something like
const escapeResourcePathParameters = resource => resource
.substr(resource.startsWith('/') ? 1 : 0)
.split('/')
.reduce((url, member) => {
// with REGEX.JSONAPI_RESOURCE_MEMBER = /[a-z][a-z0-9]*[-]+[a-z0-9-]+[a-z]/
return `${url}/${member.replace(REGEX.JSONAPI_RESOURCE_MEMBER, '*')}`;
}, '');
You could use look-arounds:
(?<=\/)[a-z][a-z0-9-]*[a-z](?=\/)
As noted in my comment, there is no need to use the anchors ^ or $. Also, escape the slash as \/. The regex will match the pattern [a-z][a-z0-9-]*[a-z] only when it is surrounded by slashes.
/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f/info # result: /*/*/info
/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f # result: /*/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f
/user/1 # result: /*/1
To match UUIDs use:
(?<=\/)[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}(?=\/|$)
The UUID format is described here: https://en.wikipedia.org/wiki/Universally_unique_identifier#Format
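Putting the pieces together, one possible sketch of the whole transformation keeps a segment only if it matches the member-name rule from the question and is not a UUID. The names MEMBER, UUID and escapePathParams are made up for this example; MEMBER is the regex from the question, and UUID assumes the canonical 8-4-4-4-12 hexadecimal layout.

```javascript
// MEMBER is the regex from the question; UUID assumes the 8-4-4-4-12 hex layout.
const MEMBER = /^[a-z][a-z0-9-]*[a-z]$/;
const UUID = /^[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}$/i;

// Hypothetical helper: replaces every non-member path segment with '*'.
const escapePathParams = (resource) =>
  resource
    .split('/')
    .map((part) =>
      // Keep empty parts (leading slash) and real member names; mask the rest.
      part === '' || (MEMBER.test(part) && !UUID.test(part)) ? part : '*')
    .join('/');

console.log(escapePathParams('/user/e09e4f9f-cfcd-4a23-a88f-b9f2f265167f/info')); // /user/*/info
console.log(escapePathParams('/user/1')); // /user/*
```

The explicit UUID check is needed because a UUID that happens to start and end with a letter also satisfies the member-name pattern.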
I want to convert given Unicode escape sequences into emojis.
From a function, I get a string that sometimes contains emojis, but as Unicode escapes (like this \ud83c\uddee\ud83c\uddf9).
So I first need to check whether this string contains these Unicode emoji escapes, and then convert them into emojis.
With this line of code, I tried to remove these Unicode chars, but it doesn't work.
But now I need a way to convert these Unicode chars into emojis rather than remove them!
var fullnameWOE = fullname[1].replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
EDIT: I still haven't found a way to solve this problem.
I should add that I'm getting this string (a name containing emoji) from a PHP file, so is there any way to use PHP to solve it?
Just refer to this one, How to find whether a particular string has unicode characters (esp. Double Byte characters), to check whether Unicode characters are present.
But to replace them with emoticons, I suggest you use a dictionary, because it gives you more control.
const raw = 'This string could contain an emoticon: {{example of emoticon unicode}}';
const EMOJS = {
'example of emoticon unicode': 'REPLACED WITH CORRESPONDING VALUE'
};
function compile(input, dict = EMOJS) {
return Object
.keys(dict)
.reduce(
(res, emojId) => {
let tmp;
do {
tmp = res;
res = res.replace(emojId, dict[emojId]);
} while(tmp !== res);
return res;
},
input
)
}
console.log({input: raw, output: compile(raw)});
here you go:
function containsNonLatinCodepoints(s) {
return /[^\u0000-\u00ff]/.test(s);
}
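If the string really contains literal backslash-u sequences (rather than already-parsed characters), a dictionary is not strictly required. The sketch below converts each \uXXXX escape back into its UTF-16 code unit, and adjacent surrogate halves then combine into the emoji. The function name unescapeUnicode is made up, and String.raw is used only to simulate input that arrives with literal backslashes. If the PHP side sends valid JSON, JSON.parse decodes such escapes by itself and this step is unnecessary.

```javascript
// Hypothetical helper: turns literal \uXXXX text into real characters.
function unescapeUnicode(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16)));
}

// String.raw keeps the backslashes, simulating the unparsed input.
const escapedName = String.raw`Ciao \ud83c\uddee\ud83c\uddf9`;
const fixedName = unescapeUnicode(escapedName);
console.log(fixedName === 'Ciao \ud83c\uddee\ud83c\uddf9'); // true
```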