How to RegExp this Base64 encoded 256-bit number without padding - javascript

I'm trying to validate, on the client as well as the server, a Base64-encoded 256-bit number without the = padding, using this regex:
^[A-Za-z0-9+/]{42}[AEIMQUYcgkosw048]$
This is my code, which isn't working as expected: any value seems to return true.
$.fn.validateKey = function() {
var re = /^[A-Za-z0-9+/]{42}[AEIMQUYcgkosw048]$/
var re = new RegExp($(this).val());
return re;
};
How can I validate Base 64 encoded 256-bit signing keys without padding with javascript?

You're returning a RegExp object. You want to return its evaluation with an input string instead.
$.fn.validateKey = function() {
var re = /^[A-Za-z0-9+/]{42}[AEIMQUYcgkosw048]$/;
return re.test($(this).val());
};
Jan pointed out in the comments that the / doesn't need to be escaped in the regex literal (at least in my browser).
I believe that's because it sits inside a character class.
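To sanity-check the pattern on its own, here is a minimal sketch without jQuery. The name isValidKey and the sample inputs are made up for illustration. It relies on the fact that 32 bytes encode to 43 Base64 characters once the single = is dropped, and that the last character carries only four data bits, which is why it is limited to sixteen possible values.

```javascript
// Hypothetical helper (not from the question): checks an unpadded
// Base64 encoding of a 256-bit (32-byte) value.
var KEY_PATTERN = /^[A-Za-z0-9+/]{42}[AEIMQUYcgkosw048]$/;

function isValidKey(value) {
  // test() returns a boolean; returning the RegExp object itself would
  // always be truthy, which is the bug in the question.
  return typeof value === "string" && KEY_PATTERN.test(value);
}

var body = "A".repeat(42); // first 42 characters of a sample key

console.log(isValidKey(body + "A"));  // true: 43 chars, allowed last char
console.log(isValidKey(body + "B"));  // false: "B" would set unused low bits
console.log(isValidKey(body + "A=")); // false: padding is not accepted
```

The same regex literal should work unchanged on a Node.js server, so client and server can share one definition.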

Related

How to convert string from ENV to hex in JavaScript? [duplicate]

I have a string in JS in this format:
http\x3a\x2f\x2fwww.url.com
How can I get the decoded string out of this? I tried unescape(), string.decode but it doesn't decode this. If I display that encoded string in the browser it looks fine (http://www.url.com), but I want to manipulate this string before displaying it.
Thanks.
You could write your own replacement method:
String.prototype.decodeEscapeSequence = function() {
return this.replace(/\\x([0-9A-Fa-f]{2})/g, function() {
return String.fromCharCode(parseInt(arguments[1], 16));
});
};
"http\\x3a\\x2f\\x2fwww.example.com".decodeEscapeSequence()
There is nothing to decode here. \xNN is an escape sequence in JavaScript string literals that denotes the character with code NN. An escape sequence is simply a way of writing a character in source code: once the literal is parsed, it is already "decoded", which is why it displays fine in the browser.
When you do:
var str = 'http\x3a\x2f\x2fwww.url.com';
it is internally stored as http://www.url.com. You can manipulate this directly.
If you already have:
var encodedString = "http\x3a\x2f\x2fwww.url.com";
Then decoding the string manually is unnecessary. The JavaScript interpreter would already be decoding the escape sequences for you, and in fact double-unescaping can cause your script to not work properly with some strings. If, in contrast, you have:
var encodedString = "http\\x3a\\x2f\\x2fwww.url.com";
Those backslashes are themselves escaped, so the hex escape sequences remain undecoded in the string; in that case, keep reading.
Easiest way in that case is to use the eval function, which runs its argument as JavaScript code and returns the result:
var decodedString = eval('"' + encodedString + '"');
This works because \x3a is a valid JavaScript string escape sequence. However, don't do it this way if the string does not come from your own server; doing so creates a security hole, because eval can execute arbitrary JavaScript code.
A better (but less concise) approach would be to use JavaScript's string replace method to create valid JSON, then use the browser's JSON parser to decode the resulting string:
var decodedString = JSON.parse('"' + encodedString.replace(/([^\\]|^)\\x/g, '$1\\u00') + '"');
// or using jQuery
var decodedString = $.parseJSON('"' + encodedString.replace(/([^\\]|^)\\x/g, '$1\\u00') + '"');
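If neither eval nor a detour through JSON appeals, a single replace pass can do the decoding directly. The following is only a sketch: the function name decodeHexEscapes is invented here, and it assumes the only escapes that matter are \xNN and an escaped backslash (\\), leaving everything else untouched.

```javascript
// Hypothetical helper: decodes literal "\xNN" sequences in a string.
// An escaped backslash ("\\") is consumed first, so the characters
// \ \ x 4 1 become \ x 4 1 instead of being decoded to "\A".
function decodeHexEscapes(input) {
  return input.replace(/\\(\\|x([0-9A-Fa-f]{2}))/g, function (whole, body, hex) {
    if (hex !== undefined) {
      return String.fromCharCode(parseInt(hex, 16));
    }
    return "\\"; // the match was an escaped backslash
  });
}

// The doubled backslashes below are literal backslashes in the string value.
console.log(decodeHexEscapes("http\\x3a\\x2f\\x2fwww.url.com")); // http://www.url.com
```

Unlike the JSON-based version, this does not fail when the input contains a double quote, because nothing is ever parsed as a string literal.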
You don't need to decode it. You can manipulate it safely as it is:
var str = "http\x3a\x2f\x2fwww.url.com";
alert(str.charAt(4)); // :
alert("\x3a" === ":"); // true
alert(str.slice(0,7)); // http://
Maybe this helps: http://cass-hacks.com/articles/code/js_url_encode_decode/
function URLDecode (encodedString) {
var output = encodedString;
var binVal, thisString, match; // declare match so it is not an implicit global
var myregexp = /(%[^%]{2})/;
while ((match = myregexp.exec(output)) != null
&& match.length > 1
&& match[1] != '') {
binVal = parseInt(match[1].substr(1),16);
thisString = String.fromCharCode(binVal);
output = output.replace(match[1], thisString);
}
return output;
}
2019
You can use decodeURI or decodeURIComponent and not unescape.
console.log(
decodeURI('http\x3a\x2f\x2fwww.url.com')
)
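Note that decodeURI and decodeURIComponent handle percent-encoding (%NN), not backslash escapes. A short sketch with made-up sample strings shows the difference; the second string contains literal backslashes and comes back unchanged.

```javascript
// Sample inputs invented for illustration.
var percentEncoded = "http%3A%2F%2Fwww.url.com";
var backslashEncoded = "http\\x3a\\x2f\\x2fwww.url.com"; // literal backslashes

console.log(decodeURIComponent(percentEncoded));
// http://www.url.com

console.log(decodeURIComponent(backslashEncoded) === backslashEncoded);
// true: there is no %NN sequence, so nothing is decoded
```

In the snippet above the \x3a sequences sit inside a string literal, so the JavaScript parser has already turned them into characters before decodeURI ever sees the string.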

How to decode UTF-16 emoji surrogate pairs into UTF-8 and display them correctly in HTML?

I have a string which contains xml. It has the following substring
<Subject>&#55357;&#56898;&#55357;&#56838;&#55357;&#56846;&#55357;&#56838;&#55357;&#56843;&#55357;&#56838;&#55357;&#56843;&#55357;&#56832;&#55357;&#56846;</subject>
I'm pulling the XML from a server and I need to display it to the user. I've noticed the ampersand has been escaped and there are UTF-16 surrogate pairs. How do I ensure the emojis/emoticons are displayed correctly in a browser?
Currently I'm just getting these characters: �������������� instead of the actual emojis.
I'm looking for a simple way to fix this without any external libraries or any 3rd party code if possible just plain old javascript, html or css.
You can convert UTF-16 code units including surrogates to a JavaScript string with String.fromCharCode. The following code snippet should give you an idea.
var str = '&#55357;&#56898;ABC&#55357;&#56838;&#55357;&#56846;&#55357;&#56838;&#55357;&#56843;&#55357;&#56838;&#55357;&#56843;&#55357;&#56832;&#55357;&#56846;';
// Regex matching either a surrogate or a character.
var re = /&#(\d+);|([^&])/g;
var match;
var charCodes = [];
// Find successive matches
while (match = re.exec(str)) {
if (match[1] != null) {
// Surrogate
charCodes.push(match[1]);
}
else {
// Unescaped character (assuming the code point is below 0x10000),
charCodes.push(match[2].charCodeAt(0));
}
}
// Create string from UTF-16 code units.
var result = String.fromCharCode.apply(null, charCodes);
console.log(result);
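The same idea can be written more compactly with a replace callback. This is a sketch under the same assumption as above, namely that every entity is a decimal UTF-16 code unit below 65536; the function name decodeDecimalEntities is made up for illustration.

```javascript
// Hypothetical helper: turns decimal entities such as "&#55357;" into the
// UTF-16 code unit they name. Two adjacent surrogate code units end up next
// to each other in the result and together form one emoji.
function decodeDecimalEntities(text) {
  return text.replace(/&#(\d+);/g, function (whole, digits) {
    return String.fromCharCode(Number(digits));
  });
}

var decoded = decodeDecimalEntities("&#55357;&#56898;ABC");
console.log(decoded.length);                      // 5 (two surrogates + "ABC")
console.log(decoded.codePointAt(0).toString(16)); // 1f642
```

String.fromCharCode truncates values above 0xFFFF, so entities that name a full code point rather than a surrogate would need String.fromCodePoint instead.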

How to decode String which contains characters like 'Total\x20Value' my actual value is 'Total Value'

How can I decode a string containing escape sequences such as 'Total\x20Value'? The actual value is 'Total Value'.
In JavaScript the browser decodes it automatically.
If I write this in the browser console:
var a = 'Total\x20Value';
and then print a, it prints 'Total Value', which means the browser decoded the string automatically.
My question is how to do this in Java. I want to decode this string in Java code, but I haven't found a way to do it.
One more thing: I can't use a per-character string replace here. This particular string only contains an encoded space, but at run time I will get different characters, so I need a generic solution that can decode any such string.
Another example string is:
"DIMENSION\x5f13420895086619127059036175667828\x7e\x24\x7e1\x7e\x24\x7e1"
Its real value is:
"DIMENSION_13420895086619127059036175667828~$~1~$~1".
Please suggest something, ideally using a predefined Java class. I have gone through many solutions but nothing has worked for me.
I suspect that a better way to address the problem you have is to fix the way these strings are created, so they don't have substrings such as \x20 or \x7e to start off with.
However, these strings could well be coming from a third-party API which you might not have any control over. If that's the case, the following method should help. It takes the string value you want to decode, containing such substrings, and replaces them with the appropriate characters:
import java.util.regex.*;
// ...
private static String decode(String input) {
Pattern p = Pattern.compile("\\\\x[0-9A-Fa-f]{2}");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String matchedText = m.group(0);
int characterCode = Integer.parseInt(matchedText.substring(2), 16);
m.appendReplacement(sb,
Matcher.quoteReplacement(Character.toString((char)characterCode)));
}
m.appendTail(sb);
return sb.toString();
}
There are a few things to note about it:
The overall structure of this code is based on example code in the Matcher documentation.
A regexp to match a substring of the form \x24 or \x7e is \\x[0-9A-Fa-f]{2}. Note that we have to double the backslash \ because \ has special meaning in regular expressions and we want to match an actual \ character. However, \ also has a special meaning in Java string literals so we need to double it again.
We need to use Matcher.quoteReplacement to ensure that the string we are replacing with is interpreted as that string and nothing else. In the replacement string, $1 for example will be interpreted as the first matched group, and $ on its own will cause an exception to be thrown. (Fortunately, your second example string contained $ characters - without those I may well have missed this.)
You may want to consider moving the Pattern to a static final constant somewhere, to avoid the regular expression being compiled every time the method is called.
Those \xNN substrings are just the hexadecimal ASCII code of the encoded character. You can look the codes up in any ASCII table.
You can create your own map which holds the mapping hexadecimal to character and use it to manipulate your strings. Example:
import java.util.HashMap;
import java.util.Map;
public class NewClass {
public static void main(String[] args){
String str1 = "Total\\x20Value";
String str2 = "DIMENSION\\x5f13420895086619127059036175667828\\x7e\\x24\\x7e1\\x7e\\x24\\x7e1";
System.out.println(decode(str1));
System.out.println(decode(str2));
}
public static String decode(String str){
Map<String,String> map = new HashMap<>();
//you can extend this to x<256 if you expect your strings to contain special characters like (Ã,Ç,Æ,§,¾ ...)
for(int i = 0; i< 128; i++){
map.put((i<16?"\\x0":"\\x")+Integer.toHexString(i), Character.toString((char)i));
}
for(String key: map.keySet()){
if(str.contains(key)){
str = str.replace(key, map.get(key));
}
}
return str;
}
}
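This solution replaces each backslash with 0, so that every \xNN sequence becomes a hexadecimal literal of the form 0xNN. It then converts that literal to the character with the corresponding ASCII code and substitutes the character for the hex string.
// The original snippet omitted the class declaration; the name HexDecoder is arbitrary.
public class HexDecoder {
public static void main(String[] args) {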
String input = "Total\\x20Value\\x7e";
String modifiedInput = input.replace("\\", "0");
for (int i = 0 ; i<modifiedInput.length() ; i++)
{
if(i<=modifiedInput.length()-3){
if (modifiedInput.charAt(i) == '0' && modifiedInput.charAt(i+1)=='x')
{
String subString = modifiedInput.substring(i, i+4) ;
String ascii = convert(subString);
modifiedInput = modifiedInput.replace(subString.toString(), ascii);
}
}
}
System.out.println(modifiedInput);
}
public static String convert(String hexDigits){
// byte[] bytes = new byte[hexDigits.length];
byte[] bytes = new byte[1];
bytes[0] = Integer.decode(hexDigits).byteValue();
String result;
result = new String(bytes);
return result;
}
}

Convert this unicode to string with javascript (Thai Language)

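&#xE21;&#xE2D;&#xE40;&#xE15;&#xE2D;&#xE23;&#xE4C;&#xE44;&#xE0B;&#xE04;&#xE4C;
Can I convert this Unicode to a string with JS? (It is Thai.)
I use
console.log(String.fromCharCode("&#xE21;&#xE2D;&#xE40;&#xE15;&#xE2D;&#xE23;&#xE4C;&#xE44;&#xE0B;&#xE04;&#xE4C;"));
But the result is not correct. If it were right, it would show มอเตอร์ไซค์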
Your Unicode string is encoded using HTML entity notation. Generally that means that whatever encoded the string expected it to end up in the middle of an HTML document, where it would be seen by an HTML parser.
If you've somehow got that string in JavaScript in a browser, you can get to the encoded Unicode by letting the browser parse it:
var str = "&#xE21;&#xE2D;&#xE40;&#xE15;&#xE2D;&#xE23;&#xE4C;&#xE44;&#xE0B;&#xE04;&#xE4C;";
var elem = document.createElement("div");
elem.innerHTML = str;
alert(elem.textContent);
The String.fromCharCode() function expects one or more numeric arguments; it won't understand HTML entities. Thus if you're not in a browser (for example, if you've got the string in a Node.js program), you could convert the string with your own code:
var str = "&#xE21;&#xE2D;&#xE40;&#xE15;&#xE2D;&#xE23;&#xE4C;&#xE44;&#xE0B;&#xE04;&#xE4C;";
var thai = String.fromCharCode.apply(String, str.match(/x[^;]*;/g).map(function(n) { return parseInt(n.slice(1, -1), 16); }));
That conversion will only work when the code points involved are within the first 64K values.
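Where String.fromCodePoint is available (ES2015 and later), the 64K limitation can be avoided. The sketch below assumes the input uses hexadecimal entities of the form &#x...; and the helper name decodeHexEntities is invented for illustration.

```javascript
// Hypothetical helper: decodes hexadecimal HTML entities such as "&#xE21;".
// String.fromCodePoint produces a surrogate pair automatically for code
// points above 0xFFFF, and throws a RangeError for values above 0x10FFFF.
function decodeHexEntities(text) {
  return text.replace(/&#x([0-9A-Fa-f]+);/g, function (whole, hex) {
    return String.fromCodePoint(parseInt(hex, 16));
  });
}

console.log(decodeHexEntities("&#xE21;&#xE2D;")); // first two letters of the Thai word
console.log(decodeHexEntities("&#x1F642;").length); // 2 (one emoji, two code units)
```

Text outside the entities is left as it is, so the function can be applied to mixed content.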
You may want something like this :
var input = "&#xE21;&#xE2D;&#xE40;&#xE15;&#xE2D;&#xE23;&#xE4C;&#xE44;&#xE0B;&#xE04;&#xE4C;";
var output = input.replace(/&#x[0-9A-Fa-f]+;/g,
function(htmlCode) {
var codePoint = parseInt( htmlCode.slice(3, -1), 16 );
return String.fromCharCode( codePoint );
});

What is the memory issue in this RegEx function

I am trying to scrape a web page for email addresses. I almost have it working, but there seems to be some kind of huge memory error that makes the page freeze when my script loads.
This is what I have:
var bodyText = document.body.textContent.replace(/\n/g, " ").split(' '); // Location to pull our text from. In this case it's the whole body
var r = new RegExp("[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])", 'i');
function validateEmail(string) {
return r.test(string);
}
var domains = [];
var domain;
for (var i = 0; i < bodyText.length; i++){
domain = bodyText[i].toString();
if (validateEmail(domain)) {
domains.push(domain);
}
}
The only thing I can think of is that the email validating function I'm using is a 32 step expression and the page I'm running it on returns with over 3,000 parts, but I feel like this should be possible.
Here is a script that reproduces the error:
var str = "help.yahoo.com/us/tutorials/cg/mail/cg_addressguard2.html";
var r = new RegExp("[a-z0-9!#$%&'*+\/=?^_{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])", 'i');
console.log("before:"+(new Date()));
console.log(r.test(str));
console.log("after:"+(new Date()));
What can I do to overcome the memory issue?
stribizhev pointed out the solution in the comments: write the regex using RegExp literal syntax. Another solution, as shown in the comment by sln, is to escape \ properly in the string literal.
I will not address what the correct regex for validating or matching email addresses is in this answer, since that has been rehashed many times over.
To demonstrate what causes the problem, let us print the string passed to RegExp constructor to the console. Did you notice that some \ are missing?
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])
               ^               ^               ^                                                 ^
The string above is what the RegExp constructor sees and compiles.
/ only needs to be escaped in a RegExp literal (since RegExp literals are delimited by /), and doesn't need to be escaped in the string passed to the RegExp constructor, so that omission doesn't cause any problem.
Below are equivalent examples showing how to write a regex to match / with RegExp literal and RegExp constructor:
/\//;
new RegExp("/");
However, since the \ in \. is not properly escaped in the string, the regex, instead of matching a literal ., allows any character (except line terminators) to be matched.
As a result, instead of being perfectly fine, these parts of the regex suffer from catastrophic backtracking:
(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+
Since . can match any character, the fragments above degenerate into the classic catastrophic backtracking pattern (A*)*. Reducing the regex to a strict subset of what it matches makes the problem easier to see:
(?:a[a]+)*
(?:[a](?:[a]*[a])?a)+
This is the solution with a RegExp literal, which has the same text as the string literal in the question. You escaped correctly for a RegExp literal, but then used the result in the RegExp constructor:
var r = /[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])/i;
And the equivalent RegExp constructor solution:
var r = new RegExp("[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])", "i");
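The escaping difference is easy to check on a tiny pattern. This sketch uses made-up variable names and the toy pattern a\.b rather than the email regex; it only illustrates how the string literal swallows a single backslash before the RegExp constructor sees it.

```javascript
// In a string literal "\." is just ".", so the constructor receives "a.b".
var unescaped = new RegExp("a\.b");
// Doubling the backslash keeps it: the constructor receives "a\.b".
var escaped = new RegExp("a\\.b");
// The RegExp literal needs no doubling.
var literal = /a\.b/;

console.log(unescaped.source);                  // a.b
console.log(escaped.source);                    // a\.b
console.log(unescaped.test("axb"));             // true: "." matches any character
console.log(escaped.test("axb"));               // false: only a literal "." matches
console.log(escaped.source === literal.source); // true
```

Printing the source property like this is a quick way to confirm what pattern a constructor call actually compiled.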
Not exactly an answer to your question, but the first thing to do is reduce the number of text parts you test with your "corrected" pattern. In your HTML example file, you have about 3300 text strings to test with a regex. Keep in mind that using a regex has a cost, so removing useless text parts is a priority:
var textParts = document.body.textContent
.split(/\s+/) // see the note
.filter(function(part) {
return part.length > 4 && part.length < 255 && part.indexOf('#') > 1;
});
alert(textParts.join("\n"));
Now you have only ~50 text parts to test.
Note: if you want to take into account email addresses with spaces inside double quotes, you can try to change:
.split(/\s+/)
to
.split(/(?=[\s"])((?:"[^"\n\\]*(?:\\.[^"\n\\]*)*"[^"\s]*)*)(?:\s+|$)/)
(without any warranty)
About your pattern: the mistake in it has already been pointed out by other answers and comments, but note that you can probably obtain the same result (the same matches) faster with this one:
/\b\w[!#-'*+\/-9=?^-~-]*(?:\.[!#-'*+\/-9=?^-~-]+)*#[a-z0-9]+(?:-[a-z0-9]+)*\.[a-z0-9]+(?:[-.][a-z0-9]+)*\b/i
Here's an example with a less strict regex that's fast.
function getEmails(str) {
var r = /\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b/ig;
var emails = [];
var e = null;
var n = 0;
while ((e = r.exec(str)) !== null) {
emails[n++] = e[0];
}
return emails;
}
function emailTest() {
var str = document.getElementsByTagName('body')[0].innerHTML;
var emails = getEmails(str);
document.getElementById('found').innerHTML=emails.join("\n");
}
emailTest();
#found {
color:green;
font-weight:bold;
}
<pre id="email_test">
test#test.test
foo#bar.baz.test
foo#bar.baz.longdomain
foo-bar#foo.bar
foo_bar99#foo.bar
foo#foo#foo.bar
foo$bar#33#test.test
foo+bar-baz%99#someplace.top
</pre>
<pre id="found"></pre>
