How is a non-breaking space represented in a JavaScript string?

How is a non-breaking space represented in a JavaScript string? - javascript

This apparently is not working:
X = $td.text();
if (X == ' ') {
X = '';
}
Is there something about a non-breaking space or the ampersand that JavaScript doesn't like?

is a HTML entity. When doing .text(), all HTML entities are decoded to their character values.
Instead of comparing using the entity, compare using the actual raw character:
var x = td.text();
if (x == '\xa0') { // Non-breakable space is char 0xa0 (160 dec)
x = '';
}
Or you can also create the character from the character code manually it in its Javascript escaped form:
var x = td.text();
if (x == String.fromCharCode(160)) { // Non-breakable space is char 160
x = '';
}
More information about String.fromCharCode is available here:
fromCharCode - MDC Doc Center
More information about character codes for different charsets are available here:
Windows-1252 Charset
UTF-8 Charset

Remember that .text() strips out markup, thus I don't believe you're going to find in a non-markup result.
Made in to an answer....
var p = $('<p>').html(' ');
if (p.text() == String.fromCharCode(160) && p.text() == '\xA0')
alert('Character 160');
Shows an alert, as the ASCII equivalent of the markup is returned instead.

That entity is converted to the char it represents when the browser renders the page. JS (jQuery) reads the rendered page, thus it will not encounter such a text sequence. The only way it could encounter such a thing is if you're double encoding entities.

The jQuery docs for text() says
Due to variations in the HTML parsers
in different browsers, the text
returned may vary in newlines and
other white space.
I'd use $td.html() instead.

Related

Get javascript node raw content

I have a javascript node in a variable, and if I log that variable to the console, I get this:
"asekuhfas eo"
Just some random string in a javascript node. I want to get that literally to be a string. But the problem is, when I use textContent on it, I get this:
asekuhfas eo
The special character is converted. I need to get the string to appear literally like this:
asekuhfas eo
This way, I can deal with the special character (recognize when it exists in the string).
How can I get that node object to be a string LITERALLY as it appears?

As VisionN has pointed out, it is not possible to reverse the UTF-8 encoding.
However by using charCodeAt() you can probably still achieve your goal.
Say you have your textContent. By iterating through each character, retrieving its charCode and prepending "&#" as well as appending ";" you can get your desired result. The downside of this method obviously being that you will have each and every character in this annotation, even those do not require it. By introducing some kind of threshold you can restrict this to only the exotic characters.
A very naive approach would be something like this:
var a = div.textContent;
var result = "";
var treshold = 1000;
for (var i = 0; i < a.length; i++) {
if (a.charCodeAt(i) > 1000)
result += "&#" + a.charCodeAt(i) + ";";
else
result += a[i];
}

textContent returns everything correctly, as  is the Unicode Character 'ZERO WIDTH SPACE' (U+200B), which is:
commonly abbreviated ZWSP
this character is intended for invisible word separation and for line break control; it has no width, but its presence between two characters does not prevent increased letter spacing in justification
It can be easily proven with:
var div = document.createElement('div');
div.innerHTML = 'xXx';
console.log( div.textContent ); // "xXx"
console.log( div.textContent.length ); // 4
console.log( div.textContent[0].charCodeAt(0) ); // 8203
As Eugen Timm mentioned in his answer it is a bit tricky to convert UTF characters back to HTML entities, and his solution is completely valid for non standard characters with char code higher than 1000. As an alternative I may propose a shorter RegExp solution which will give the same result:
var result = div.textContent.replace(/./g, function(x) {
var code = x.charCodeAt(0);
return code > 1e3 ? '&#' + code + ';' : x;
});
console.log( result ); // "xXx"
For a better solution you may have a look at this answer which can handle all HTML special characters.

Reveal all non-printing ANSI characters and metacharacters in a Javascript string

I'm receiving piped stdout output from a multitude of fairly random shell processes, all as input (stdin) on a single node.js process. For debugging and for parsing, I need to be handle different special character codes that are being piped into the process. It would really help me to see invisible characters (for debugging mostly) and to deal with them accordingly once I've identified the patterns in which they are used.
Given a javascript string with ANSI special characters \u001b* and/or metacharacters such as \n, \t, \r etc., how can one reveal these special characters so they aren't actually rendered, but rather exposed as their code value instead.
For example, let's say I have the following string printed in green (can't show the green colour on SO):
This is a string.
We are now using the green color.
I would like to be able to do a console.log (for example) on this string and have it replace the non-printing characters, metacharacters/newlines, color codes etc with their ANSI codes:
"\u001b[32m\tThis is a string.\nWe are now using the green color.\n"
I can do something like the following, but it is too specific, hard-coded, and inefficient:
line = line.replace(/[\f]/g, '\\n');
line = line.replace(/\u0008/g, '\\b');
line = line.replace(/\u001b|\u001B/g, '\\u001b');
line = line.replace(/\r\n|\r|\n/g, '\\n');
...

Try this:
var map = { // Special characters
'\\': '\\',
'\n': 'n',
'\r': 'r',
'\t': 't'
};
str = str.replace(/[\\\n\r\t]/g, function(i) {
return '\\'+map[i];
});
str = str.replace(/[^ -~]/g, function(i){
return '\\u'+("000" + i.charCodeAt(0).toString(16)).slice(-4);
});

Here's a version that loops through the string, tests to see if it's a normal printable character and, if not, looks it up in a special table for your own representation of that character and if not found in the table, displays whatever default representation you want:
var tagKeys = {
'\n': 'New Line \n',
'\u0009': 'Tab',
'\u2029': 'Line Separator'
/* and so on */
};
function tagSpecialChars(str) {
var output = "", ch, replacement;
for (var i = 0; i < str.length; i++) {
ch = str.charAt(i);
if (ch < ' ' || ch > '~') {
replacement = tagKeys[ch];
if (replacement) {
ch = replacement;
} else {
// default value
// could also use charCodeAt() to get the numeric value
ch = '*****';
}
}
output += ch;
}
return output;
}
Demo: http://jsfiddle.net/jfriend00/bCYa4/
This is obviously not some fancy regex solution, but you said performance was important and you rarely find the best performing operation using a regex and certainly not if you're going to use a whole bunch of them. Plus every regex replace has to loop through the whole string anyway.
This workman-like solution just loops through the input string once and lets you customize the display conversion for any non-printable character you want and also determine what you want to display when it's a non-printable character that you don't have a special display representation for.

ASCII character not being recognized in if statement

I am trying to get a string from a html page with jquery and this is what I have.
var text = $(this).text();
var key = text.substring(0,1);
if(key == ' ' || key == ' ')
key = text.substring(1,2);
text is this  Home
And I want to skip the space and or the keycode above It appears this code does not work either. It only gets the text.substring(0,1); instead of text.substring(1,2); because the if statement is not catching.= and I am not sure why. Any help would be super awesome! Thanks!

There are several problems with the code in the question. First,   has no special meaning in JavaScript: it is a string literal with six characters. Second, text.substring(1,2) returns simply the second character of text, not all characters from the second one onwards.
Assuming that you wish to remove one leading SPACE or NO-BREAK SPACE (which is what   means in HTML; it is not an Ascii character, by the way), then the following code would work:
var first = text.substring(0, 1);
if(first === ' ' || first === '\u00A0') {
text = text.substring(1, text.length);
}
The notation \u00A0 is a JavaScript escape notation for NO-BREAK SPACE U+00A0.
Should you wish to remove multiple spaces at the start, and perhaps at the end too, some modifications are needed. In that case, using a replace operation with regular expression is probably best.

If you want remove spaces at the beginning (and end) of a string, you can use the trim function
var myvar = " home"
myVar.trim() // --> "home"
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/Trim

How to remove emoji code using javascript?

How do I remove emoji code using JavaScript? I thought I had taken care of it using the code below, but I still have characters like 🔴.
function removeInvalidChars() {
return this.replace(/[\uE000-\uF8FF]/g, '');
}

For me none of the answers completely removed all emojis so I had to do some work myself and this is what i got :
text.replace(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g, '');
Also, it should take into account that if one inserting the string later to the database, replacing with empty string could expose security issue. instead replace with the replacement character U+FFFD, see : http://www.unicode.org/reports/tr36/#Deletion_of_Noncharacters

The range you have selected is the Private Use Area, containing non-standard characters. Carriers used to encode emoji as different, inconsistent values inside this range.
More recently, the emoji have been given standardised 'unified' codepoints. Many of these are outside of the Basic Multilingual Plane, in the block U+1F300–U+1F5FF, including your example 🔴 U+1F534 Large Red Circle.
You could detect these characters with [\U0001F300-\U0001F5FF] in a regex engine that supported non-BMP characters, but JavaScript's RegExp is not such a beast. Unfortunately the JS string model is based on UTF-16 code units, so you'd have to work with the UTF-16 surrogates in a regexp:
return this.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
However, note that there are other characters in the Basic Multilingual Plane that are used as emoji by phones but which long predate emoji. For example U+2665 is the traditional Heart Suit character ♥, but it may be rendered as an emoji graphic on some devices. It's up to you whether you treat this as emoji and try to remove it. See this list for more examples.

I solved it by using a regex with Unicode property escapes. I got it from this article, it's for Java but still very helpful - Remove Emojis from a Java String.
'Smile😀'.replace(/[^\p{L}\p{N}\p{P}\p{Z}^$\n]/gu, '');
It removes all symbols except:
\p{L} - all letters from any language
\p{N} - numbers
\p{P} - punctuation
\p{Z} - whitespace separators
^$\n - add any symbols you want to keep
This one should be more correct and it works, but for me it leaves some trash symbols in the string:
'Smile😀'.replace(/\p{Emoji}/gu, '');
Edit: added symbols from comments

I've found many suggestions around but the regex that have solved my problem is:
/(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g
A short example
function removeEmojis (string) {
var regex = /(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g;
return string.replace(regex, '');
}
Hope it can help you

Just an addition to #hababr answer.
If you need to get rid of complicated emojis, you have to remove also additional things like modifiers and etc:
'👨🏿‍🎤'.replace(/[\p{Emoji}\p{Emoji_Modifier}\p{Emoji_Component}\p{Emoji_Modifier_Base}\p{Emoji_Presentation}]/gu, '').charCodeAt(0)
update:
*#0-9 - are Emoji characters with a text representation by default, per the Unicode Standard.
so, my current solution is next:
'👨🏿‍🎤'.replace(/(?![*#0-9]+)[\p{Emoji}\p{Emoji_Modifier}\p{Emoji_Component}\p{Emoji_Modifier_Base}\p{Emoji_Presentation}]/gu, '').charCodeAt(0)

I know this post is a bit old, but I stumbled across this very problem at work and a colleague came up with an interesting idea. Basically instead of stripping emoji character only allow valid characters in. Consulting this ASCII table:
http://www.asciitable.com/
A function such as this could only keep legal characters (the range itself dependent on what you are after)
function (input) {
var result = '';
if (input.length == 0)
return input;
for (var indexOfInput = 0, lengthOfInput = input.length; indexOfInput < lengthOfInput; indexOfInput++) {
var charAtSpecificIndex = input[indexOfInput].charCodeAt(0);
if ((32 <= charAtSpecificIndex) && (charAtSpecificIndex <= 126)) {
result += input[indexOfInput];
}
}
return result;
};
This should preserve all numbers, letters and special characters of the Alphabet for a situation where you wish to preserve the English alphabet + number + special characters. Hope it helps someone :)

#bobince's solution didn't work for me. Either the Emojis stayed there or they were swapped by a different Emoji.
This solution did the trick for me:
var ranges = [
'\ud83c[\udf00-\udfff]', // U+1F300 to U+1F3FF
'\ud83d[\udc00-\ude4f]', // U+1F400 to U+1F64F
'\ud83d[\ude80-\udeff]' // U+1F680 to U+1F6FF
];
$('#mybtn').on('click', function() {
removeInvalidChars();
})
function removeInvalidChars() {
var str = $('#myinput').val();
str = str.replace(new RegExp(ranges.join('|'), 'g'), '');
$("#myinput").val(str);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" id="myinput"/>
<input type="submit" id="mybtn" value="clear"/>
Source

After searching and trying lots of unicode regex, I suggest you try this, it can cover all of emojis:
function removeEmoji(str) {
let strCopy = str;
const emojiKeycapRegex = /[\u0023-\u0039]\ufe0f?\u20e3/g;
const emojiRegex = /\p{Extended_Pictographic}/gu;
const emojiComponentRegex = /\p{Emoji_Component}/gu;
if (emojiKeycapRegex.test(strCopy)) {
strCopy = strCopy.replace(emojiKeycapRegex, '');
}
if (emojiRegex.test(strCopy)) {
strCopy = strCopy.replace(emojiRegex, '');
}
if (emojiComponentRegex.test(strCopy)) {
// eslint-disable-next-line no-restricted-syntax
for (const emoji of (strCopy.match(emojiComponentRegex) || [])) {
if (/[\d|*|#]/.test(emoji)) {
continue;
}
strCopy = strCopy.replace(emoji, '');
}
}
return strCopy;
}
let a = "1️⃣aa🤹‍♂️b#️⃣🔤✅❎23#!^*bb🤹🏾🤹‍♀️🚴🏻ccc";
console.log(removeEmoji(a))
Refrence: Unicode Emoij Document

None of the answers here worked for all the unicode characters I tested (specifically characters in the miscellaneous range such as ⛽ or ☯️).
Here is one that worked for me, (heavily) inspired from this SO PHP answer:
function _removeEmojis(str) {
return str.replace(/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?/g, '');
}
(My use case is sorting in a data grid where emojis can come first in a string but users want the text ordered by the actual words.)

sandre89's answer is good but not perfect.
I spent some time on the subject and have a working solution.
var ranges = [
'[\u00A0-\u269f]',
'[\u26A0-\u329f]',
// The following characters could not be minified correctly
// if specifed with the ES6 syntax \u{1F400}
'[🀄-🧀]'
//'[\u{1F004}-\u{1F9C0}]'
];
$('#mybtn').on('click', function() {
removeInvalidChars();
});
function removeInvalidChars() {
var str = $('#myinput').val();
str = str.replace(new RegExp(ranges.join('|'), 'ug'), '');
$("#myinput").val(str);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" id="myinput" />
<input type="submit" id="mybtn" value="clear" />
Here is my CodePen
There are some points to note, though.
Unicode characters from U+1F000 up need a special notation, so you can use sandre89's way, or opt for the \u{1F000} ES6 notation, which may or may not work with your minificator. I succeeded pasting the emojis directly in the UTF-8 encoded script.
Don't forget the u flag in the regex, or your Javascript engine may throw an error.
Beware that things may not be working due to the file encoding, character set, or minificator. In my case nothing worked until I took the script off an .isml file (Demandware) and pasted it into a .js file.
You may gain some insight by referring to Wikipedia Emoji page and How many bytes does one Unicode character take?, and by tinkering with this Online Unicode converter, as I did.

var emoji =/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?|[\u20E3]|[\u26A0-\u3000]|\uD83E[\udd00-\uddff]|[\u00A0-\u269F]/g;
str.replace(emoji, "");
i add this '\uD83E[\udd00-\uddff]'
these emojis were updated when 2018 june
if u want block emojis after other update then use this
str.replace(/[^0-9a-zA-Zㄱ-힣+×÷=%♤♡☆♧)(*&^/~##!-:;,?`_|<>{}¥£€$◇■□●○•°※¤《》¡¿₩\[\]\"\' \\]/g ,"");
u can block all emojis and u can only use eng, num, hangle, and some Characters
thx :)

You can use this function to replace emojis with nothing:
function msgAfterClearEmojis(msg)
{
var new_msg = msg.replace(/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?|[\u20E3]|[\u26A0-\u3000]|\uD83E[\udd00-\uddff]|[\u00A0-\u269F]/g, '').trim();
return new_msg;
}

You can check here with emoji..
😊 , 😌 , 👽
function removeEmoji() {
var y = document.getElementById('textbox_id1');
y.value = y.value.replace(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g, '');
}
input {
padding: 5px;
}
<input type="text" id="textbox_id1" placeholder="Remove emoji..." oninput="removeEmoji()">
You can take more emojis from here: Emoji Keyboard Online

This is the iteration on #hababr's answer.
His answer removes lots of standard chars like $, +, < and so on.
This version keeps all of them (except for the \ backslash - dunno how to properly escape it).
"hey😁 hau💓 ahoy🏴‍☠️ !##$%^&*()-_=+±§;:'\|`~/?[]{},.<>".replace(/[^\p{L}\p{N}\p{P}\p{Z}{^$=+±\\'|`\\~<>}]/gu, "")
// "hey hau ahoy !##$%^&*()-_=+±§;:'|`~/?[]{},.<>"

I have this regex and it works for all emojis i found on this page
try this regex
<:[^:\s]+:\d+>|<a:[^:\s]+:\d+>|(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff]|\ufe0f)

var emojiRegex = /\uD83C\uDFF4(?:\uDB40\uDC67\uDB40\uDC62(?:\uDB40\uDC65\uDB40\uDC6E\uDB40\uDC67|\uDB40\uDC77\uDB40\uDC6C\uDB40\uDC73|\uDB40\uDC73\uDB40\uDC63\uDB40\uDC74)\uDB40\uDC7F|\u200D\u2620\uFE0F)|\uD83D\uDC69\u200D\uD83D\uDC69\u200D(?:\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67]))|\uD83D\uDC68(?:\u200D(?:\u2764\uFE0F\u200D(?:\uD83D\uDC8B\u200D)?\uD83D\uDC68|(?:\uD83D[\uDC68\uDC69])\u200D(?:\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67]))|\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|(?:\uD83C[\uDFFB-\uDFFF])\u200D(?:\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3]))|\uD83D\uDC69\u200D(?:\u2764\uFE0F\u200D(?:\uD83D\uDC8B\u200D(?:\uD83D[\uDC68\uDC69])|\uD83D[\uDC68\uDC69])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66|(?:\uD83D\uDC41\uFE0F\u200D\uD83D\uDDE8|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2695\u2696\u2708]|\uD83D\uDC68(?:(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2695\u2696\u2708]|\u200D[\u2695\u2696\u2708])|(?:(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)\uFE0F|\uD83D\uDC6F|\uD83E[\uDD3C\uDDDE\uDDDF])\u200D[\u2640\u2642]|(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2640\u2642]|(?:\uD83C[\uDFC3\uDFC4\uDFCA]|\uD83D[\uDC6E\uDC71\uDC73\uDC77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4-\uDEB6]|\uD83E[\uDD26\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDD6-\uDDDD])(?:(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2640\u2642]|\u200D[\u2640\u2642])|\uD83D\uDC69\u200D[\u2695\u2696\u2708])\uFE0F|\uD83D\uDC69\u200D\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D\uDC69\u200D\uD83D\uDC69\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D\uDC68(?:\u200D(?:(?:\uD83D[\uDC68\uDC69])\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D[\uDC66\uDC67])|\uD83C[\uDFFB-\uDFFF])|\uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08|\uD83D\uDC69\u200D\uD83D\uDC67|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])\u200D(?:\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|\uD83D\uDC69\u200D\uD83D\uDC66|\uD83C\uDDF6\uD83C\uDDE6|\uD83C\uDDFD\uD83C\uDDF0|\uD83C\uDDF4\uD83C\uDDF2|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])|\uD83C\uDDED(?:\uD83C[\uDDF0\uDDF2\uDDF3\uDDF7\uDDF9\uDDFA])|\uD83C\uDDEC(?:\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEE\uDDF1-\uDDF3\uDDF5-\uDDFA\uDDFC\uDDFE])|\uD83C\uDDEA(?:\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDED\uDDF7-\uDDFA])|\uD83C\uDDE8(?:\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDEE\uDDF0-\uDDF5\uDDF7\uDDFA-\uDDFF])|\uD83C\uDDF2(?:\uD83C[\uDDE6\uDDE8-\uDDED\uDDF0-\uDDFF])|\uD83C\uDDF3(?:\uD83C[\uDDE6\uDDE8\uDDEA-\uDDEC\uDDEE\uDDF1\uDDF4\uDDF5\uDDF7\uDDFA\uDDFF])|\uD83C\uDDFC(?:\uD83C[\uDDEB\uDDF8])|\uD83C\uDDFA(?:\uD83C[\uDDE6\uDDEC\uDDF2\uDDF3\uDDF8\uDDFE\uDDFF])|\uD83C\uDDF0(?:\uD83C[\uDDEA\uDDEC-\uDDEE\uDDF2\uDDF3\uDDF5\uDDF7\uDDFC\uDDFE\uDDFF])|\uD83C\uDDEF(?:\uD83C[\uDDEA\uDDF2\uDDF4\uDDF5])|\uD83C\uDDF8(?:\uD83C[\uDDE6-\uDDEA\uDDEC-\uDDF4\uDDF7-\uDDF9\uDDFB\uDDFD-\uDDFF])|\uD83C\uDDEE(?:\uD83C[\uDDE8-\uDDEA\uDDF1-\uDDF4\uDDF6-\uDDF9])|\uD83C\uDDFF(?:\uD83C[\uDDE6\uDDF2\uDDFC])|\uD83C\uDDEB(?:\uD83C[\uDDEE-\uDDF0\uDDF2\uDDF4\uDDF7])|\uD83C\uDDF5(?:\uD83C[\uDDE6\uDDEA-\uDDED\uDDF0-\uDDF3\uDDF7-\uDDF9\uDDFC\uDDFE])|\uD83C\uDDE9(?:\uD83C[\uDDEA\uDDEC\uDDEF\uDDF0\uDDF2\uDDF4\uDDFF])|\uD83C\uDDF9(?:\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDED\uDDEF-\uDDF4\uDDF7\uDDF9\uDDFB\uDDFC\uDDFF])|\uD83C\uDDE7(?:\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEF\uDDF1-\uDDF4\uDDF6-\uDDF9\uDDFB\uDDFC\uDDFE\uDDFF])|[#\*0-9]\uFE0F\u20E3|\uD83C\uDDF1(?:\uD83C[\uDDE6-\uDDE8\uDDEE\uDDF0\uDDF7-\uDDFB\uDDFE])|\uD83C\uDDE6(?:\uD83C[\uDDE8-\uDDEC\uDDEE\uDDF1\uDDF2\uDDF4\uDDF6-\uDDFA\uDDFC\uDDFD\uDDFF])|\uD83C\uDDF7(?:\uD83C[\uDDEA\uDDF4\uDDF8\uDDFA\uDDFC])|\uD83C\uDDFB(?:\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDEE\uDDF3\uDDFA])|\uD83C\uDDFE(?:\uD83C[\uDDEA\uDDF9])|(?:\uD83C[\uDFC3\uDFC4\uDFCA]|\uD83D[\uDC6E\uDC71\uDC73\uDC77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4-\uDEB6]|\uD83E[\uDD26\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDD6-\uDDDD])(?:\uD83C[\uDFFB-\uDFFF])|(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)(?:\uD83C[\uDFFB-\uDFFF])|(?:[\u261D\u270A-\u270D]|\uD83C[\uDF85\uDFC2\uDFC7]|\uD83D[\uDC42\uDC43\uDC46-\uDC50\uDC66\uDC67\uDC70\uDC72\uDC74-\uDC76\uDC78\uDC7C\uDC83\uDC85\uDCAA\uDD74\uDD7A\uDD90\uDD95\uDD96\uDE4C\uDE4F\uDEC0\uDECC]|\uD83E[\uDD18-\uDD1C\uDD1E\uDD1F\uDD30-\uDD36\uDDB5\uDDB6\uDDD1-\uDDD5])(?:\uD83C[\uDFFB-\uDFFF])|(?:[\u231A\u231B\u23E9-\u23EC\u23F0\u23F3\u25FD\u25FE\u2614\u2615\u2648-\u2653\u267F\u2693\u26A1\u26AA\u26AB\u26BD\u26BE\u26C4\u26C5\u26CE\u26D4\u26EA\u26F2\u26F3\u26F5\u26FA\u26FD\u2705\u270A\u270B\u2728\u274C\u274E\u2753-\u2755\u2757\u2795-\u2797\u27B0\u27BF\u2B1B\u2B1C\u2B50\u2B55]|\uD83C[\uDC04\uDCCF\uDD8E\uDD91-\uDD9A\uDDE6-\uDDFF\uDE01\uDE1A\uDE2F\uDE32-\uDE36\uDE38-\uDE3A\uDE50\uDE51\uDF00-\uDF20\uDF2D-\uDF35\uDF37-\uDF7C\uDF7E-\uDF93\uDFA0-\uDFCA\uDFCF-\uDFD3\uDFE0-\uDFF0\uDFF4\uDFF8-\uDFFF]|\uD83D[\uDC00-\uDC3E\uDC40\uDC42-\uDCFC\uDCFF-\uDD3D\uDD4B-\uDD4E\uDD50-\uDD67\uDD7A\uDD95\uDD96\uDDA4\uDDFB-\uDE4F\uDE80-\uDEC5\uDECC\uDED0-\uDED2\uDEEB\uDEEC\uDEF4-\uDEF9]|\uD83E[\uDD10-\uDD3A\uDD3C-\uDD3E\uDD40-\uDD45\uDD47-\uDD70\uDD73-\uDD76\uDD7A\uDD7C-\uDDA2\uDDB0-\uDDB9\uDDC0-\uDDC2\uDDD0-\uDDFF])|(?:[#\*0-9\xA9\xAE\u203C\u2049\u2122\u2139\u2194-\u2199\u21A9\u21AA\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA\u24C2\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE\u2600-\u2604\u260E\u2611\u2614\u2615\u2618\u261D\u2620\u2622\u2623\u2626\u262A\u262E\u262F\u2638-\u263A\u2640\u2642\u2648-\u2653\u265F\u2660\u2663\u2665\u2666\u2668\u267B\u267E\u267F\u2692-\u2697\u2699\u269B\u269C\u26A0\u26A1\u26AA\u26AB\u26B0\u26B1\u26BD\u26BE\u26C4\u26C5\u26C8\u26CE\u26CF\u26D1\u26D3\u26D4\u26E9\u26EA\u26F0-\u26F5\u26F7-\u26FA\u26FD\u2702\u2705\u2708-\u270D\u270F\u2712\u2714\u2716\u271D\u2721\u2728\u2733\u2734\u2744\u2747\u274C\u274E\u2753-\u2755\u2757\u2763\u2764\u2795-\u2797\u27A1\u27B0\u27BF\u2934\u2935\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55\u3030\u303D\u3297\u3299]|\uD83C[\uDC04\uDCCF\uDD70\uDD71\uDD7E\uDD7F\uDD8E\uDD91-\uDD9A\uDDE6-\uDDFF\uDE01\uDE02\uDE1A\uDE2F\uDE32-\uDE3A\uDE50\uDE51\uDF00-\uDF21\uDF24-\uDF93\uDF96\uDF97\uDF99-\uDF9B\uDF9E-\uDFF0\uDFF3-\uDFF5\uDFF7-\uDFFF]|\uD83D[\uDC00-\uDCFD\uDCFF-\uDD3D\uDD49-\uDD4E\uDD50-\uDD67\uDD6F\uDD70\uDD73-\uDD7A\uDD87\uDD8A-\uDD8D\uDD90\uDD95\uDD96\uDDA4\uDDA5\uDDA8\uDDB1\uDDB2\uDDBC\uDDC2-\uDDC4\uDDD1-\uDDD3\uDDDC-\uDDDE\uDDE1\uDDE3\uDDE8\uDDEF\uDDF3\uDDFA-\uDE4F\uDE80-\uDEC5\uDECB-\uDED2\uDEE0-\uDEE5\uDEE9\uDEEB\uDEEC\uDEF0\uDEF3-\uDEF9]|\uD83E[\uDD10-\uDD3A\uDD3C-\uDD3E\uDD40-\uDD45\uDD47-\uDD70\uDD73-\uDD76\uDD7A\uDD7C-\uDDA2\uDDB0-\uDDB9\uDDC0-\uDDC2\uDDD0-\uDDFF])\uFE0F|(?:[\u261D\u26F9\u270A-\u270D]|\uD83C[\uDF85\uDFC2-\uDFC4\uDFC7\uDFCA-\uDFCC]|\uD83D[\uDC42\uDC43\uDC46-\uDC50\uDC66-\uDC69\uDC6E\uDC70-\uDC78\uDC7C\uDC81-\uDC83\uDC85-\uDC87\uDCAA\uDD74\uDD75\uDD7A\uDD90\uDD95\uDD96\uDE45-\uDE47\uDE4B-\uDE4F\uDEA3\uDEB4-\uDEB6\uDEC0\uDECC]|\uD83E[\uDD18-\uDD1C\uDD1E\uDD1F\uDD26\uDD30-\uDD39\uDD3D\uDD3E\uDDB5\uDDB6\uDDB8\uDDB9\uDDD1-\uDDDD])/g;
console.log(text.replace(emojiRegex,'');

<!DOCTYPE html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script>
function isEmoji(str) {
var ranges = [
'[\uE000-\uF8FF]',
'\uD83C[\uDC00-\uDFFF]',
'\uD83D[\uDC00-\uDFFF]',
'[\u2011-\u26FF]',
'\uD83E[\uDD10-\uDDFF]'
];
if (str.match(ranges.join('|'))) {
return true;
} else {
return false;
}
}
$(document).ready(function(){
$('input').on('input',function(){
var $th = $(this);
console.log("Value of Input"+$th.val());
emojiInput= isEmoji($th.val());
if (emojiInput==true) {
$th.val("");
}
});
});
</script>
</head>
<body>
Enter your name: <input type="text">
</body>
</html>

There is a modern solution using categories
Modern browsers support Unicode property, which allows you to match emojis based on their belonging in the Emoji Unicode category. For example, you can use Unicode property escapes like \p{Emoji} or \P{Emoji} to match/no match emoji characters. Note that 0123456789#* and other characters are interpreted as emojis using the previous Unicode category. Therefore, a better way to do this is to use the {Extended_Pictographic} Unicode category that denotes all the characters typically understood as emojis instead of the {Emoji} category.
const withEmojis = /\p{Extended_Pictographic}/u
withEmojis.test('😀😀');
//true
withEmojis.test('ab');
//false
withEmojis.test('1');
//false
or with negation
const noEmojis = /\P{Extended_Pictographic}/u
noEmojis.test('😀');
//false
noEmojis.test('1212');
//false

You can use mathiasbynens/emoji-regex package to remove or replace emojis.
You can see the latest build's content to grab the regex by visiting following url:
http://unpkg.com/emoji-regex/index.js

In detail, this function first uses TextEncoder to convert content into a byte array with utf-8 encoding, then loops through this array, if it finds a byte whose first five bits are 11110 (i.e. 0xF0), it means this is an emoji start, then it replaces this byte and the next three bytes with 0x30 (i.e. number 0). Finally, it uses TextDecoder to convert the modified byte array back to a string, and uses replaceAll method to remove extra 0s.
function removeEmoji (content) {
let conByte = new TextEncoder("utf-8").encode(content);
for (let i = 0; i < conByte.length; i++) {
if ((conByte[i] & 0xF8) == 0xF0) {
for (let j = 0; j < 4; j++) {
conByte[i+j]=0x30;
}
i += 3;
}
}
content = new TextDecoder("utf-8").decode(conByte);
return content.replaceAll("0000", "");
}

How to replace whitespaces using javascript?

I'm trying to remove the whitespaces from a textarea . The below code is not appending the text i'm selecting from two dropdowns. Can somebody tell me where i'd gone wrong? I'm trying to remove multiple spaces within the string as well, will that work with the same? Dont know regular expressions much. Please help.
function addToExpressionPreview() {
var reqColumnName = $('#ddlColumnNames')[0].value;
var reqOperator = $('#ddOperator')[0].value;
var expressionTextArea = document.getElementById("expressionPreview");
var txt = document.createTextNode(reqColumnName + reqOperator.toString());
if (expressionTextArea.value.match(/^\s+$/) != null)
{
expressionTextArea.value = (expressionTextArea.value.replace(/^\W+/, '')).replace(/\W+$/, '');
}
expressionTextArea.appendChild(txt);
}

> function addToExpressionPreview() {
> var reqColumnName = $('#ddlColumnNames')[0].value;
> var reqOperator = $('#ddOperator')[0].value;
You might as well use document.getElementById() for each of the above.
> var expressionTextArea = document.getElementById("expressionPreview");
> var txt = document.createTextNode(reqColumnName + reqOperator.toString());
reqOperator is already a string, and in any case, the use of the + operator will coerce it to String unless all expressions or identifiers involved are Numbers.
> if (expressionTextArea.value.match(/^\s+$/) != null) {
There is no need for match here. I seems like you are trying to see if the value is all whitespace, so you can use:
if (/^\s*$/.test(expressionTextArea.value)) {
// value is empty or all whitespace
Since you re-use expressionTextArea.value several times, it would be much more convenient to store it an a variable, preferably with a short name.
> expressionTextArea.value = (expressionTextArea.value.replace(/^\W+/,
> '')).replace(/\W+$/, '');
That will replace one or more non-word characters at the end of the string with nothing. If you want to replace multiple white space characters anywhere in the string with one, then (note wrapping for posting here):
expressionTextArea.value = expressionTextArea.value.
replace(/^\s+/,'').
replace(/\s+$/, '').
replace(/\s+/g,' ');
Note that \s does not match the same range of 'whitespace' characters in all browsers. However, for simple use for form element values it is probably sufficient.

Whitespace is matched by \s, so
expressionTextArea.value.replace(/\s/g, "");
should do the trick for you.
In your sample, ^\W+ will only match leading characters that are not a word character, and ^\s+$ will only match if the entire string is whitespace. To do a global replace(not just the first match) you need to use the g modifier.

Refer this link, you can get some idea. Try .replace(/ /g,"UrReplacement");
Edit: or .split(' ').join('UrReplacement') if you have an aversion to REs

Develop Reference

JavaScript is the programming language of the Web.

How is a non-breaking space represented in a JavaScript string? - javascript

This apparently is not working: X = $td.text(); if (X == ' ') { X = ''; } Is there something about a non-breaking space or the ampersand that JavaScript doesn't like?

That entity is converted to the char it represents when the browser renders the page. JS (jQuery) reads the rendered page, thus it will not encounter such a text sequence. The only way it could encounter such a thing is if you're double encoding entities.

The jQuery docs for text() says Due to variations in the HTML parsers in different browsers, the text returned may vary in newlines and other white space. I'd use $td.html() instead.

Related

Get javascript node raw content

Reveal all non-printing ANSI characters and metacharacters in a Javascript string

ASCII character not being recognized in if statement

How to remove emoji code using javascript?

How to replace whitespaces using javascript?

Categories

Resources