Encode and decode a string in JavaScript - javascript

Let's say I have a string, called str, which is equal to "Hello, world!"
Is there a way to choose some Unicode characters like "azertyuiopqsdfghjklmwxcvbn1234567890-" and return an encoded string that contains only the chosen characters?
It should return something like "hvfebi iehfhe" (something encoded and not human-readable, but decodable).
Thanks

There is no built-in function that lets you configure an arbitrary set of characters for encoding and decoding; you would have to find a library that does that or implement it yourself.
However, you can use base64 encoding, which uses the characters "A-Z", "a-z", "0-9", "+", "/" and "=" to encode the string.
The native browser functions are on window: btoa() to encode and atob() to decode.
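A minimal sketch of that round trip, using the string from the question. It assumes ASCII input, since btoa() only accepts characters in the Latin-1 range (code points up to 255):

```javascript
const str = "Hello, world!";

const encoded = btoa(str);      // "SGVsbG8sIHdvcmxkIQ=="
const decoded = atob(encoded);  // "Hello, world!"

console.log(encoded, decoded);
```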
Edit after your comments:
Any function you come up with can be reversed by anyone who reads the code, so there is no point in trying to hide it. If you don't want plain base64, you can write a function that encodes the string several times.
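For the "implement it yourself" route, here is one possible sketch, not a standard API. It treats the UTF-8 bytes of the string as one big number and writes that number in base N, where N is the size of the chosen alphabet. The names encodeWithAlphabet and decodeWithAlphabet are made up for this example, and the sketch assumes the alphabet contains no repeated characters. Like base64, it is an encoding and not encryption.

```javascript
// Alphabet taken from the question; any string of unique characters should work.
const ALPHABET = "azertyuiopqsdfghjklmwxcvbn1234567890-";

// Hypothetical helper: encode text using only characters from `alphabet`.
function encodeWithAlphabet(text, alphabet = ALPHABET) {
  const base = BigInt(alphabet.length);
  // Start from 1 so that leading zero bytes are not lost.
  let n = 1n;
  for (const byte of new TextEncoder().encode(text)) {
    n = (n << 8n) | BigInt(byte);
  }
  let out = "";
  while (n > 0n) {
    out = alphabet[Number(n % base)] + out;
    n /= base;
  }
  return out;
}

// Hypothetical helper: reverse of encodeWithAlphabet.
function decodeWithAlphabet(encoded, alphabet = ALPHABET) {
  const base = BigInt(alphabet.length);
  let n = 0n;
  for (const ch of encoded) {
    const digit = alphabet.indexOf(ch);
    if (digit < 0) throw new Error("Unexpected character: " + ch);
    n = n * base + BigInt(digit);
  }
  const bytes = [];
  // Stop at the marker 1 that the encoder put in front of the data.
  while (n > 1n) {
    bytes.unshift(Number(n & 0xffn));
    n >>= 8n;
  }
  return new TextDecoder().decode(new Uint8Array(bytes));
}

const scrambled = encodeWithAlphabet("Hello, world!");
console.log(scrambled, decodeWithAlphabet(scrambled));
```

The BigInt arithmetic keeps the code short but gets slow for very long strings; for large inputs a chunked scheme would be a better fit.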

Related

Text encoding that produces legible encodings suitable as Javascript identifiers?

I'm working on a tool that reads arbitrary data files and creates a table out of its data which I then store in a database. I'd like to preserve the column headers. The column headers are already ASCII text (or maybe latin1), but they have characters that aren't valid variable names (e.g., spaces, %), so I need to encode them somehow. I'm looking for an encoding for the column titles that has these properties:
Legible: it would be nice if the encoded text looked as similar as possible to the unencoded text (i.e., for debugging).
Legal identifier: I'd like the encoded text to be a valid JavaScript identifier (ECMA-262 Section 7.6).
Invertible: I'd like to be able to get the exact original text back from the encoded text.
I can think of approaches that work for 2 of the 3 cases, but I don't know how to get all 3. E.g., URL encoding doesn't produce legal identifier names. I think I could transform base64 to be legal, but it isn't legible. What I've got currently just does some substitutions, so it's not invertible.
Efficiency isn't a concern, so if necessary, I could store the encoded and unencoded texts together. The best option I can think of is to use url encoding and then swap percents for $. I thought there would be better options than this though, but I can't find anything. Is there anything better?
This pair of methods relying on Guava's PercentEscaper seems to meet my requirements. Guava doesn't provide an unescaper, but given my simple needs here, I can just use a simple URLDecoder.
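import com.google.common.net.PercentEscaper;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// no extra safe characters; plusForSpace=false so a space becomes %20 rather than +
private static final PercentEscaper escaper = new PercentEscaper("", false);

static String getIdentifier(String str) {
    // minimal safe characters, but leaves ASCII letters and digits alone, so it's somewhat legible
    String escaped = escaper.escape(str);
    // javascript identifiers can't start with a digit, and the escaper doesn't know the first
    // character has different rules, so prepend "%3": "%3" + digit is the percent-encoding of that digit
    if (!escaped.isEmpty() && Character.isDigit(escaped.charAt(0))) {
        escaped = "%3" + escaped;
    }
    // a percent isn't valid in a javascript identifier, so we'll use _ as our special character
    escaped = escaped.replace('%', '_');
    return escaped;
}

static String invertIdentifier(String str) throws UnsupportedEncodingException {
    // turn the _ markers back into percent signs, then percent-decode
    String unescaped = str.replace('_', '%');
    unescaped = URLDecoder.decode(unescaped, "UTF-8");
    return unescaped;
}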

Why are atob and btoa not reversible

I'm trying to find a simple way to record and temporarily obfuscate answers to "quiz" questions I'm writing in Markdown. (I'll tell the students the quiz answers during the presentation, so I'm not looking for any kind of secure encryption.)
I thought I could use atob('message I want to obfuscate') then tell students they can use btoa() in their developer tools panel to reverse the process. However the following does not return 'one':
btoa( atob('one') )
Does anyone know why this doesn't return 'one'? Are there other methods built into JavaScript that will allow one to loosely encrypt and decrypt a message? (I'm working with absolute beginners who might be confused by functions and who would be very confused trying to add libraries to a page).
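The reason is that 'one' is not the output of a base64 encoder.
In base64 encoding, the length of the encoded output string must be a
multiple of 4. If it's not, the output is padded with additional
pad characters (=). On decoding, these extra padding characters are
discarded. 'one' is only 3 characters long, so atob() decodes it to 2 bytes
and drops the leftover bits; btoa() then re-encodes those 2 bytes with
padding, which gives a different string. 'one2' is 4 characters long, so it
survives the round trip.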
var string1 = "one",
string2 = "one2";
console.log("Value of string1", string1)
console.log("Decoded string1", atob(string1))
console.log("Encoded string1", btoa(atob(string1)))
console.log("-------------------------------------")
console.log("Value of string2", string2)
console.log("Decoded string2", atob(string2))
console.log("Encoded string2", btoa(atob(string2)))
As @george pointed out, one must use btoa() before using atob():
atob( btoa( 'hello' ) )
btoa means binary to ASCII: the input is binary, i.e. any kind of data (text, images, audio). The output is ASCII: its base64 encoding, which is an ASCII subset, i.e. a text string containing only upper- and lowercase letters, digits, plus, slash, and the equals sign (only for padding at the end).
atob means ASCII to binary: the input MUST be a subset of ASCII, i.e. a base64-encoded string. The output is binary, i.e. any type of data (text, image, audio, ...).
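Put together, the correct order for the quiz use case looks roughly like this. The variable names are only for illustration, and the sketch assumes the answers are plain ASCII text:

```javascript
const secret = "one";

// Encode first (binary to ASCII), then decode (ASCII to binary).
const obfuscated = btoa(secret);     // "b25l"
const revealed = atob(obfuscated);   // "one"

// The reversed order treats "one" as if it were base64 and loses bits.
const wrongOrder = btoa(atob(secret)); // "onc=", not "one"

console.log(obfuscated, revealed, wrongOrder);
```

So the text in the Markdown file would be the btoa() output, and students would paste it into atob() in the developer tools.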

Encoding in C# and Decoding in Javascript

I have encoded some text in C# like below:
var encodedCredential = Convert.ToBase64String(Encoding.Unicode.GetBytes(JsonConvert.SerializeObject("Sample text")));
The encoded string is: IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA=
I want to decode the encoded string in JavaScript.
I have tried the below
decodeURIComponent(atob("IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA="))
decodeURIComponent(atob("IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA=").replace(' ',''))
The result is something different: there appear to be spaces between the letters, and I can't even replace them.
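You need to use UTF-8 encoding in C#. Encoding.Unicode is UTF-16 little-endian, so every ASCII character is followed by a zero byte, and those zero bytes are what show up as "spaces" after atob(). Produce the base64 string with this call instead: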
Convert.ToBase64String(Encoding.UTF8.GetBytes("Sample text"))
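If the C# side has to stay on Encoding.Unicode, the existing string can still be decoded in JavaScript by treating the bytes as UTF-16LE. This is a sketch under that assumption; decodeUtf16LeBase64 is a made-up helper name, and TextDecoder is assumed to be available (modern browsers and Node.js). Because the C# code also ran the text through JsonConvert.SerializeObject, the decoded value is a JSON string literal and still needs JSON.parse:

```javascript
// Hypothetical helper: base64 -> bytes -> text, reading the bytes as UTF-16LE.
function decodeUtf16LeBase64(b64) {
  const binary = atob(b64); // one character per byte
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-16le").decode(bytes);
}

const json = decodeUtf16LeBase64("IgBTAGEAbQBwAGwAZQAgAHQAZQB4AHQAIgA=");
const text = JSON.parse(json);

console.log(json); // "Sample text" (including the double quotes)
console.log(text); // Sample text
```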
@King_Fisher, you shouldn't be getting additional spaces. Also, the replace method with a string pattern replaces only a single occurrence.

javascript escape problem with unicode characters

I use the following jQuery code to load some data on a specific event from an external file:
$("#container").load("/include/data.php?name=" + escape(name));
If the JavaScript "name" variable contains Unicode characters, it sends encoded symbols to the data.php file, something like this: %u10E1
How can I deal with these encoded symbols? I need to convert them back to readable ones.
When I remove the escape function and leave just the "name" variable, the code doesn't work any more...
Can anyone please help?
If you want to do this manually, then you should be using encodeURIComponent, not escape (which is deprecated)
The jQuery way, however, would be:
$("#container").load("/include/data.php", { "name": name });
Either way PHP should decode it automatically when it populates $_GET.
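A small sketch of the manual route, using the character behind the %u10E1 from the question (U+10E1). encodeURIComponent emits the UTF-8 bytes as standard percent-escapes rather than the non-standard %uXXXX form that escape produces:

```javascript
const name = "\u10E1"; // the character that escape() turned into %u10E1

const url = "/include/data.php?name=" + encodeURIComponent(name);
console.log(url); // "/include/data.php?name=%E1%83%A1"

// The standard escapes can be reversed with decodeURIComponent.
console.log(decodeURIComponent("%E1%83%A1") === name); // true
```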
This may help you.
javascript - how to convert unicode string to ascii

Character/URI encoding in JavaScript getting out of sync?

I have a question about encoding special/extended UTF-8 characters in URLs in JavaScript. The same question applies to many characters like the Registered R-circle, but my example uses an umlaut:
ü = %C3%BC in UTF-8 (four rows from bottom of http://www.utf8-chartable.de/)
If the URL contains an umlaut represented as UTF-8 (ü = %C3%BC), and I run it through encodeURIComponent, the %s are encoded, the string now looks like "%25C3%25BC", and it gets correctly processed by my system. This is good.
url = "http://foo.com/bar.html?%C3%BC"
url = encodeURIComponent(url);
// url is now represented as "http%3A%2F%2Ffoo.com%2Fbar.html%3F%25C3%25BC"
However, the bad: if the pre-encoded string has an unencoded character (the actual umlaut), then after encoding it looks like "%C3%BC" and fails because, I believe, the %s should be encoded too:
url = "http://foo.com/bar.html?ü"
url = encodeURIComponent(url);
// url is now represented as "http%3A%2F%2Ffoo.com%2Fbar.html%3F%C3%BC"
I think it fails because it is less thoroughly encoded than the rest of the url.
So, beyond general advice or answers to questions I don't know to ask, what I think I want to know is how to get the raw umlaut (and all other special characters) to fully encode. Is that what is incorrect?
Thanks for your help!
Nate
You cannot encode a URL all at once. If you have already concatenated the host, path, parameters, etc., together then it's impossible to correctly determine which characters actually need to be encoded and which characters are separators that need to be left alone.
The only reliable way to build a URL is by concatenating already-encoded values:
"http://foo.com/bar.html?" + encodeURIComponent("%C3%BC")

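As a sketch of building the URL from already-encoded pieces: buildUrl and the parameter name q are made up for this example and are not part of the original URLs. Each key and value is encoded on its own, and the separators are added afterwards, so they are never touched by the encoder:

```javascript
// Hypothetical helper: encode each key and value separately, then join them.
function buildUrl(base, params) {
  const query = Object.entries(params)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  return query ? base + "?" + query : base;
}

// A raw umlaut is encoded once.
const rawUmlaut = buildUrl("http://foo.com/bar.html", { q: "ü" });
console.log(rawUmlaut); // "http://foo.com/bar.html?q=%C3%BC"

// A value that literally contains "%C3%BC" gets its percent signs encoded.
const literalPercent = buildUrl("http://foo.com/bar.html", { q: "%C3%BC" });
console.log(literalPercent); // "http://foo.com/bar.html?q=%25C3%25BC"
```

Which of the two outputs is right depends on what the value means: if the input is already percent-encoded, decode it first (or do not encode it again) so that every value goes through the encoder exactly once.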