Correct way of converting unicode to emoji - javascript

I'm using String.fromCodePoint to convert Unicode code points to emoji, but some emojis don't convert as expected: they display as line icons. In the example below, the first two emojis render correctly, but the last two don't.
For example:
const unicode = ["1f976", "1f97a", "263a-fe0f", "2639"]
unicode.forEach((val) => {
document.body.innerHTML += String.fromCodePoint(parseInt(val, 16))
});

Your code is not correct.
Older emoji are not coloured by default, so you need to append the variation selector U+FE0F. You tried this on the third one (but not on the fourth one), but the conversion to a number is wrong: parseInt("263a-fe0f", 16) stops parsing at the hyphen and returns 0x263a, so the selector is lost.
This code will fix it (if you have emoji fonts installed):
const unicode = ["1f976", "1f97a", "263a", "fe0f", "2639", "fe0f"]
unicode.forEach((val) => {
document.body.innerHTML += String.fromCodePoint(parseInt(val, 16))
});
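Since String.fromCodePoint accepts several code points at once, the hyphenated notation from the question can also be kept and split before parsing. A minimal sketch, assuming the input always uses hyphen-separated hex code points; the helper name sequenceToEmoji is made up for illustration:

```javascript
// Hypothetical helper: turns "263a-fe0f" into the code points U+263A and U+FE0F
// and builds one string from them.
function sequenceToEmoji(sequence) {
  const codePoints = sequence.split("-").map((hex) => parseInt(hex, 16));
  return String.fromCodePoint(...codePoints);
}

const unicode = ["1f976", "1f97a", "263a-fe0f", "2639-fe0f"];
const emojis = unicode.map(sequenceToEmoji);
console.log(emojis.join(" "));
```

Whether the last two render in colour still depends on the fonts installed on the machine.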

Related

cannot get utf8 icon with charAt js

let a=["๐Ÿ˜"] ;
let b="๐Ÿ˜"
console.log(a[0],b,b.charAt(0))
"๐Ÿ˜"
"๐Ÿ˜"
"๏ฟฝ"
this charAt prints questions mark ... can someone enlighten me how to get utf8 icon with charAt in a string
Emojis are represented using multiple UTF-16 code units:
let b="😁"
console.log(b.length) // 2
With charAt you only get one part of the emoji.
With codePointAt you get:
a non-negative integer that is the Unicode code point value at the given position.
However, you need to know where the emoji starts, because the index itself still refers to code units:
let b="😁👆👍"
console.dir(String.fromCodePoint(b.codePointAt(0)));
console.dir(String.fromCodePoint(b.codePointAt(2)));
You could split your string using the spread operator ... and then access the visible character in question by its index. This works because the string iterator walks code points, not individual code units.
let b="😁test👆test👍"
function splitStringByCodePoint(str) {
return [...str]
}
console.log(splitStringByCodePoint(b)[5])
But that won't work with emojis like 👆🏽, because those consist of 👆 plus a second code point, the skin tone modifier.
let b="👆🏽"
console.log(b.length)
function splitStringByCodePoint(str) {
return [...str]
}
console.log(splitStringByCodePoint(b))
console.log(String.fromCodePoint(b.codePointAt(0)));
console.log(String.fromCodePoint(b.codePointAt(2)));
If you want to support all emojis you currently need to write your own parser or look for a library that does that.
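One option that avoids a hand-written parser is Intl.Segmenter, which splits text into grapheme clusters (user-perceived characters). The sketch below assumes a runtime that ships Intl.Segmenter, which is not guaranteed in older browsers or Node versions; the helper name splitGraphemes is made up for illustration:

```javascript
// Splits a string into grapheme clusters instead of code units or code points.
// Assumes the runtime provides Intl.Segmenter.
function splitGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(str), (part) => part.segment);
}

const pointing = "\u{1F446}\u{1F3FD}"; // pointing-up hand followed by a skin tone modifier
console.log(pointing.length);                 // 4 UTF-16 code units
console.log([...pointing].length);            // 2 code points
console.log(splitGraphemes(pointing).length); // 1 grapheme cluster
```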

Removing everything but emojis in javascript for google sheets script

I need to create a script in Google Sheets (JavaScript) that takes the input of one cell and outputs all the emojis from that cell into the selected one. I want to do this by removing everything from the cell text except the emoji, because if I just try to match emojis my output is not correct.
I'm using this regex to locate emojis.
var re = /[\u1F60-\u1F64]|[\u2702-\u27B0]|[\u1F68-\u1F6C]|[\u1F30-\u1F70]|[\u2600-\u26ff]|[\uD83C-\uDBFF\uDC00-\uDFFF]+/gi;
How can I remove everything from the text except what this regex matches? Or how can I remove everything but the emoji? I have tried all the other suggestions, but the output isn't correct or it doesn't work with spreadsheets.
Currently I have:
function SHOW_EMOJIS(s) {
var re = /[\u1F60-\u1F64]|[\u2702-\u27B0]|[\u1F68-\u1F6C]|[\u1F30-\u1F70]|[\u2600-\u26ff]|[\uD83C-\uDBFF\uDC00-\uDFFF]+/gi;
var result = s.match(re).toString();
return result;
}
This returns all the emojis, but instead of seeing ⚠️🙌❄️👩🏻‍⚕️☃️🥂 I see ⚠,🙌,❄,👩🏻,⚕,☃,🥂. The doctor is returned as two separate emoji characters.
Instead of a custom function, why not try the built-in REGEXREPLACE?
=REGEXREPLACE(A5,"[[:print:]]","")
Emoji is not printable according to Google RE2 😋
To remove all emojis from A1, please try:
=REGEXREPLACE($A$1,"[๐Ÿป๐Ÿผ๐Ÿฝ๐Ÿพ๐Ÿฟยฉยฎโ€ผโ‰โ„ขโ„นโ†”-โ†™โ†ฉ-โ†ชโŒš-โŒ›โŒจโโฉ-โณโธ-โบโ“‚โ–ช-โ–ซโ–ถโ—€โ—ป-โ—พโ˜€-โ˜„โ˜Žโ˜‘โ˜”-โ˜•โ˜˜โ˜โ˜ โ˜ข-โ˜ฃโ˜ฆโ˜ชโ˜ฎ-โ˜ฏโ˜ธ-โ˜บโ™€โ™‚โ™ˆ-โ™“โ™Ÿ-โ™ โ™ฃโ™ฅ-โ™ฆโ™จโ™ปโ™พ-โ™ฟโš’-โš—โš™โš›-โšœโš -โšกโšงโšช-โšซโšฐ-โšฑโšฝ-โšพโ›„-โ›…โ›ˆโ›Ž-โ›โ›‘โ›“-โ›”โ›ฉ-โ›ชโ›ฐ-โ›ตโ›ท-โ›บโ›ฝโœ‚โœ…โœˆ-โœโœโœ’โœ”โœ–โœโœกโœจโœณ-โœดโ„โ‡โŒโŽโ“-โ•โ—โฃ-โคโž•-โž—โžกโžฐโžฟโคด-โคตโฌ…-โฌ‡โฌ›-โฌœโญโญ•ใ€ฐใ€ฝใŠ—ใŠ™๐Ÿ€„๐Ÿƒ๐Ÿ…ฐ-๐Ÿ…ฑ๐Ÿ…พ-๐Ÿ…ฟ๐Ÿ†Ž๐Ÿ†‘-๐Ÿ†š๐Ÿˆ-๐Ÿˆ‚๐Ÿˆš๐Ÿˆฏ๐Ÿˆฒ-๐Ÿˆบ๐Ÿ‰-๐Ÿ‰‘๐ŸŒ€-๐ŸŒก๐ŸŒค-๐ŸŽ“๐ŸŽ–-๐ŸŽ—๐ŸŽ™-๐ŸŽ›๐ŸŽž-๐Ÿฐ๐Ÿณ-๐Ÿต๐Ÿท-๐Ÿบ๐Ÿ€-๐Ÿ“ฝ๐Ÿ“ฟ-๐Ÿ”ฝ๐Ÿ•‰-๐Ÿ•Ž๐Ÿ•-๐Ÿ•ง๐Ÿ•ฏ-๐Ÿ•ฐ๐Ÿ•ณ-๐Ÿ•บ๐Ÿ–‡๐Ÿ–Š-๐Ÿ–๐Ÿ–๐Ÿ–•-๐Ÿ––๐Ÿ–ค-๐Ÿ–ฅ๐Ÿ–จ๐Ÿ–ฑ-๐Ÿ–ฒ๐Ÿ–ผ๐Ÿ—‚-๐Ÿ—„๐Ÿ—‘-๐Ÿ—“๐Ÿ—œ-๐Ÿ—ž๐Ÿ—ก๐Ÿ—ฃ๐Ÿ—จ๐Ÿ—ฏ๐Ÿ—ณ๐Ÿ—บ-๐Ÿ™๐Ÿš€-๐Ÿ›…๐Ÿ›‹-๐Ÿ›’๐Ÿ›•-๐Ÿ›—๐Ÿ›-๐Ÿ›ฅ๐Ÿ›ฉ๐Ÿ›ซ-๐Ÿ›ฌ๐Ÿ›ฐ๐Ÿ›ณ-๐Ÿ›ผ๐ŸŸ -๐ŸŸซ๐ŸŸฐ๐ŸคŒ-๐Ÿคบ๐Ÿคผ-๐Ÿฅ…๐Ÿฅ‡-๐Ÿงฟ๐Ÿฉฐ-๐Ÿฉด๐Ÿฉธ-๐Ÿฉผ๐Ÿช€-๐Ÿช†๐Ÿช-๐Ÿชฌ๐Ÿชฐ-๐Ÿชบ๐Ÿซ€-๐Ÿซ…๐Ÿซ-๐Ÿซ™๐Ÿซ -๐Ÿซง๐Ÿซฐ-๐Ÿซถ๐Ÿ‡ฆ-๐Ÿ‡ฟ#๏ธโƒฃ*๏ธโƒฃ0๏ธโƒฃ1๏ธโƒฃ2๏ธโƒฃ3๏ธโƒฃ4๏ธโƒฃ5๏ธโƒฃ6๏ธโƒฃ7๏ธโƒฃ8๏ธโƒฃ9๏ธโƒฃ]","")
Related Answer:
https://stackoverflow.com/a/70125203/5372400
=REGEXREPLACE(A5,"[[:print:]]","") is a nice solution, but it has some drawbacks:
it leaves the characters 0 1 2 3 4 5 6 7 8 9 # * in place of the respective emojis 0️⃣ 1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣ 7️⃣ 8️⃣ 9️⃣ 🔟 #️⃣ *️⃣
it also replaces other non-printable characters.
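For the custom-function route from the question, the doctor emoji splits because the match stops at the zero-width joiner. A rough sketch in plain JavaScript that keeps such sequences together, assuming the runtime supports Unicode property escapes with the u flag (whether a given Apps Script runtime does is an assumption here); the name extractEmojis is made up, and keycaps and flags are deliberately not handled:

```javascript
// Hypothetical helper: returns the emoji sequences found in `text`, joined without separators.
// A pictograph may be followed by a variation selector or a skin tone modifier,
// and further pictographs may be attached with a zero-width joiner (U+200D).
function extractEmojis(text) {
  const piece = "\\p{Extended_Pictographic}(?:\\uFE0F|\\p{Emoji_Modifier})?";
  const sequence = new RegExp(`${piece}(?:\\u200D${piece})*`, "gu");
  return (text.match(sequence) || []).join("");
}

// Woman + light skin tone + ZWJ + medical symbol + variation selector stays one match.
const doctor = "\u{1F469}\u{1F3FB}\u200D\u2695\uFE0F";
console.log(extractEmojis("Dr " + doctor + " is in") === doctor); // true
```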

Why multiple unicode conversions String.fromCharCode("👉".charCodeAt(0)) ruin the symbol in Chrome console and how to fix it?

I have found this today and can't make out why it fails:
Basically if you take some obscure symbol like
"๐Ÿ‘‰"
then "๐Ÿ‘‰".charCodeAt(0) in chrome console - you will get the code 55357, but when you revert the operation with String.fromCharCode(55357) it produces "๏ฟฝ"
Even if I do it like this String.fromCharCode("๐Ÿ‘‰".charCodeAt(0)) it produces "๏ฟฝ" however String.fromCharCode("๐Ÿ‘‰".charCodeAt(0)).charCodeAt(0) is still 55357, so information isn't lost, and it implies that it is Chrome that can't find correct symbol to map to 55357.
Why Chrome cannot represent symbol correctly? Is it because it cannot map it to font correctly? How do I make double conversion to be shown as "๐Ÿ‘‰" again?
If you log
"๐Ÿ‘‰".length
you will get 2, that is, the string actually contains 2 characters, not one. This is because JS only supports 16-bit unicode (BMP) and encodes "astral plane" symbols with "surrogate pairs". Your symbol is \uD83D\uDC49 internally, and when you do .charCodeAt(0) you only get \uD83D, which is invalid unicode.
More on https://mathiasbynens.be/notes/javascript-unicode
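To make the surrogate pair visible, here is a small sketch using only standard String methods. It shows that the round trip works once both code units, or the whole code point, are passed back:

```javascript
const hand = "\u{1F449}"; // the pointing hand from the question

// charCodeAt sees UTF-16 code units; codePointAt reads the whole surrogate pair.
console.log(hand.charCodeAt(0).toString(16)); // d83d (high surrogate)
console.log(hand.charCodeAt(1).toString(16)); // dc49 (low surrogate)
console.log(hand.codePointAt(0));             // 128073

// Either both code units or the single code point rebuild the original string.
const fromUnits = String.fromCharCode(hand.charCodeAt(0), hand.charCodeAt(1));
const fromPoint = String.fromCodePoint(hand.codePointAt(0));
console.log(fromUnits === hand, fromPoint === hand); // true true
```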
The following expression gets the 'correct' char code (128073):
(("👉".charCodeAt(0)-0xD800)*0x400) + ("👉".charCodeAt(1)-0xDC00) + 0x10000
You can then convert it to an HTML character reference like this:
"&#x"+(((("👉".charCodeAt(0)-0xD800)*0x400) + ("👉".charCodeAt(1)-0xDC00) + 0x10000)).toString(16)+";"
And as a String extension:
String.prototype.charCodeUTF32 = function(){
return ((((this.charCodeAt(0)-0xD800)*0x400) + (this.charCodeAt(1)-0xDC00) + 0x10000));
};
Hope this saves you some time.
TypeScript to convert a text containing emojis:
private emoji2html(text: string): string {
const regexAstralSymbols = /([\uD800-\uDBFF])([\uDC00-\uDFFF])/g;
return text.replace(regexAstralSymbols, (m, first, second) =>
`&#x${(first + second).charCodeUTF32().toString(16)};`);
}
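The same conversion can be sketched without extending String.prototype, since codePointAt already combines the surrogate pair. This assumes an engine with ES2015 support for the u flag; the function name astralToHtmlEntities is made up for illustration:

```javascript
// Hypothetical helper: replaces every code point above U+FFFF with a hexadecimal HTML entity.
function astralToHtmlEntities(text) {
  // With the u flag the character class matches whole code points, not surrogate halves.
  return text.replace(/[\u{10000}-\u{10FFFF}]/gu, (symbol) =>
    `&#x${symbol.codePointAt(0).toString(16)};`);
}

console.log(astralToHtmlEntities("go \u{1F449} there")); // go &#x1f449; there
```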

return all the letters that occur before the first number in a string

I am trying to retrieve the first one or two letters from the outcode part of a postcode. The outcode can be in the format X1, XX11, XX1X, X1X. What I would like to do is retrieve the leading letters (the X or XX part) of these outcodes.
I am using a solution thus far which takes the first two characters of the string and then removes the numbers from the result:
PHP:
$str = preg_replace('/[0-9]+/', '', substr($outcode, 0, 2));
JavaScript:
outcode.substr(0, 2).replace(/\d+/g, '')
This works fine, yet I wonder if there is a more efficient way in PHP and JavaScript/jQuery, such as finding the first number in the string and removing it and everything else after it? I have about 3000 outcodes to filter and efficiency is key. Thanks.
EDIT:
Based on some comments, here are some real life examples, with desired results. Remember that the outcode string is only in one of the formats shown above, no other format:
EX13 => EX
EC2M => EC
E1W => E
E2 => E
This works well
"XX01XX".split(/[0-9]/)[0];
"XX".split(/[0-9]/)[0];
Add .slice(0,2) to get the first chars
In JavaScript, try:
var firstDigit = outcode.match(/\d/); // gives you the first digit in the string
var index = outcode.indexOf(firstDigit);
outcode.substr(0, index);
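Another way to read the requirement is "match the run of letters at the start of the string", which needs only one regex and no index arithmetic. A minimal sketch, assuming outcodes contain only ASCII letters and digits; the helper name leadingLetters is made up:

```javascript
// Returns the letters at the start of the outcode, or "" if it does not start with a letter.
function leadingLetters(outcode) {
  const match = outcode.match(/^[A-Za-z]+/);
  return match ? match[0] : "";
}

for (const outcode of ["EX13", "EC2M", "E1W", "E2"]) {
  console.log(outcode, "=>", leadingLetters(outcode));
}
```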

How to remove emoji code using javascript?

How do I remove emoji code using JavaScript? I thought I had taken care of it using the code below, but I still have characters like 🔴.
function removeInvalidChars() {
return this.replace(/[\uE000-\uF8FF]/g, '');
}
None of the answers completely removed all emojis for me, so I had to do some work myself, and this is what I got:
text.replace(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g, '');
Also, keep in mind that if the string is later inserted into a database, replacing with an empty string could expose a security issue. Instead, replace with the replacement character U+FFFD; see http://www.unicode.org/reports/tr36/#Deletion_of_Noncharacters
The range you have selected is the Private Use Area, containing non-standard characters. Carriers used to encode emoji as different, inconsistent values inside this range.
More recently, the emoji have been given standardised 'unified' code points. Many of these are outside of the Basic Multilingual Plane, in the block U+1F300–U+1F5FF, including your example 🔴 U+1F534 Large Red Circle.
You could detect these characters with [\U0001F300-\U0001F5FF] in a regex engine that supported non-BMP characters, but JavaScript's RegExp, at least without the ES2015 u flag, is not such a beast. The JS string model is based on UTF-16 code units, so without that flag you have to work with the UTF-16 surrogates in a regexp:
return this.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
However, note that there are other characters in the Basic Multilingual Plane that are used as emoji by phones but which long predate emoji. For example U+2665 is the traditional Heart Suit character ♥, but it may be rendered as an emoji graphic on some devices. It's up to you whether you treat this as emoji and try to remove it. See this list for more examples.
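In engines that support the ES2015 u flag, the same block can be written with code point escapes instead of surrogate pairs. A small sketch covering only the U+1F300 to U+1F5FF block named above; the function name is made up for illustration:

```javascript
// Assumes an ES2015+ engine: the u flag makes the regex operate on code points.
function removeMiscSymbolsAndPictographs(text) {
  return text.replace(/[\u{1F300}-\u{1F5FF}]/gu, "");
}

// U+1F534 (Large Red Circle) lies inside the block and is removed.
console.log(removeMiscSymbolsAndPictographs("stop\u{1F534}now")); // stopnow
```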
I solved it by using a regex with Unicode property escapes. I got it from this article, which is for Java but still very helpful: Remove Emojis from a Java String.
'Smile😀'.replace(/[^\p{L}\p{N}\p{P}\p{Z}^$\n]/gu, '');
It removes all symbols except:
\p{L} - all letters from any language
\p{N} - numbers
\p{P} - punctuation
\p{Z} - whitespace separators
^$\n - add any symbols you want to keep
This one should be more correct and it works, but for me it leaves some trash symbols in the string:
'Smile😀'.replace(/\p{Emoji}/gu, '');
Edit: added symbols from comments
I've found many suggestions around, but the regex that solved my problem is:
/(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g
A short example
function removeEmojis (string) {
var regex = /(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g;
return string.replace(regex, '');
}
Hope it can help you
Just an addition to @hababr's answer.
If you need to get rid of complicated emojis, you also have to remove additional things like modifiers:
'👨🏿‍🎤'.replace(/[\p{Emoji}\p{Emoji_Modifier}\p{Emoji_Component}\p{Emoji_Modifier_Base}\p{Emoji_Presentation}]/gu, '').charCodeAt(0)
Update:
*, # and 0-9 are Emoji characters with a text presentation by default, per the Unicode Standard.
So my current solution is:
'👨🏿‍🎤'.replace(/(?![*#0-9]+)[\p{Emoji}\p{Emoji_Modifier}\p{Emoji_Component}\p{Emoji_Modifier_Base}\p{Emoji_Presentation}]/gu, '').charCodeAt(0)
I know this post is a bit old, but I stumbled across this very problem at work and a colleague came up with an interesting idea: instead of stripping emoji characters, only let valid characters through. Consulting this ASCII table:
http://www.asciitable.com/
A function such as this could only keep legal characters (the range itself dependent on what you are after)
// The original snippet was an anonymous function; the name is added so the declaration parses.
function keepLegalCharacters(input) {
  var result = '';
  if (input.length == 0)
    return input;
  for (var indexOfInput = 0, lengthOfInput = input.length; indexOfInput < lengthOfInput; indexOfInput++) {
    var charAtSpecificIndex = input[indexOfInput].charCodeAt(0);
    // Keep only printable ASCII: space (32) through tilde (126).
    if ((32 <= charAtSpecificIndex) && (charAtSpecificIndex <= 126)) {
      result += input[indexOfInput];
    }
  }
  return result;
}
This preserves the printable ASCII range (codes 32 to 126), that is, English letters, digits, spaces and punctuation, for situations where that is all you want to keep. Hope it helps someone :)
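The same whitelist can be written as a single negated character class. A minimal sketch with the same 32 to 126 range; the name keepPrintableAscii is made up for illustration:

```javascript
// Removes every UTF-16 code unit outside printable ASCII (0x20 space to 0x7E tilde).
function keepPrintableAscii(input) {
  return input.replace(/[^\x20-\x7E]/g, "");
}

console.log(keepPrintableAscii("caf\u00E9 \u{1F534} ok~")); // "caf  ok~"
```

Note that this also drops accented letters and every non-Latin script, so it only fits inputs that are expected to be plain English text.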
@bobince's solution didn't work for me: either the emojis stayed there or they were swapped for a different emoji.
This solution did the trick for me:
var ranges = [
'\ud83c[\udf00-\udfff]', // U+1F300 to U+1F3FF
'\ud83d[\udc00-\ude4f]', // U+1F400 to U+1F64F
'\ud83d[\ude80-\udeff]' // U+1F680 to U+1F6FF
];
$('#mybtn').on('click', function() {
removeInvalidChars();
})
function removeInvalidChars() {
var str = $('#myinput').val();
str = str.replace(new RegExp(ranges.join('|'), 'g'), '');
$("#myinput").val(str);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" id="myinput"/>
<input type="submit" id="mybtn" value="clear"/>
Source
After searching and trying lots of Unicode regexes, I suggest you try this one; it should cover all emojis:
function removeEmoji(str) {
let strCopy = str;
const emojiKeycapRegex = /[\u0023-\u0039]\ufe0f?\u20e3/g;
const emojiRegex = /\p{Extended_Pictographic}/gu;
const emojiComponentRegex = /\p{Emoji_Component}/gu;
if (emojiKeycapRegex.test(strCopy)) {
strCopy = strCopy.replace(emojiKeycapRegex, '');
}
if (emojiRegex.test(strCopy)) {
strCopy = strCopy.replace(emojiRegex, '');
}
if (emojiComponentRegex.test(strCopy)) {
// eslint-disable-next-line no-restricted-syntax
for (const emoji of (strCopy.match(emojiComponentRegex) || [])) {
if (/[\d|*|#]/.test(emoji)) {
continue;
}
strCopy = strCopy.replace(emoji, '');
}
}
return strCopy;
}
let a = "1️⃣aa🤹‍♂️b#️⃣🔤✅❎23#!^*bb🤹🏾🤹‍♀️🚴🏻ccc";
console.log(removeEmoji(a))
Reference: Unicode Emoji document
None of the answers here worked for all the Unicode characters I tested (specifically characters in the miscellaneous range such as ⛽ or ☯️).
Here is one that worked for me, (heavily) inspired by this SO PHP answer:
function _removeEmojis(str) {
return str.replace(/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?/g, '');
}
(My use case is sorting in a data grid where emojis can come first in a string but users want the text ordered by the actual words.)
sandre89's answer is good but not perfect.
I spent some time on the subject and have a working solution.
var ranges = [
'[\u00A0-\u269f]',
'[\u26A0-\u329f]',
// The following characters could not be minified correctly
// if specified with the ES6 syntax \u{1F400}
'[🀄-🧀]'
//'[\u{1F004}-\u{1F9C0}]'
];
$('#mybtn').on('click', function() {
removeInvalidChars();
});
function removeInvalidChars() {
var str = $('#myinput').val();
str = str.replace(new RegExp(ranges.join('|'), 'ug'), '');
$("#myinput").val(str);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" id="myinput" />
<input type="submit" id="mybtn" value="clear" />
Here is my CodePen
There are some points to note, though.
Code points above U+FFFF (which includes everything from U+1F000 up) need a special notation, so you can use sandre89's way (surrogate pairs), or opt for the \u{1F000} ES6 notation, which may or may not work with your minifier. I succeeded by pasting the emojis directly into the UTF-8 encoded script.
Don't forget the u flag in the regex, or your JavaScript engine may throw an error.
Beware that things may not work because of the file encoding, character set, or minifier. In my case nothing worked until I moved the script out of an .isml file (Demandware) and into a .js file.
You may gain some insight by referring to the Wikipedia Emoji page and How many bytes does one Unicode character take?, and by tinkering with this Online Unicode converter, as I did.
var emoji =/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?|[\u20E3]|[\u26A0-\u3000]|\uD83E[\udd00-\uddff]|[\u00A0-\u269F]/g;
str.replace(emoji, "");
I added '\uD83E[\udd00-\uddff]' to cover the emojis introduced in the June 2018 update.
If you also want to block emojis added by later updates, use a whitelist instead:
str.replace(/[^0-9a-zA-Zㄱ-힣+×÷=%♤♡☆♧)(*&^/~@#!-:;,?`_|<>{}¥£€$◇■□●○•°※¤《》¡¿₩\[\]\"\' \\]/g ,"");
This blocks all emojis and only allows English letters, digits, Hangul, and some symbols.
Thanks :)
You can use this function to replace emojis with nothing:
function msgAfterClearEmojis(msg)
{
var new_msg = msg.replace(/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?|[\u20E3]|[\u26A0-\u3000]|\uD83E[\udd00-\uddff]|[\u00A0-\u269F]/g, '').trim();
return new_msg;
}
You can check it here with emoji:
😊 , 😌 , 👽
function removeEmoji() {
var y = document.getElementById('textbox_id1');
y.value = y.value.replace(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g, '');
}
input {
padding: 5px;
}
<input type="text" id="textbox_id1" placeholder="Remove emoji..." oninput="removeEmoji()">
You can take more emojis from here: Emoji Keyboard Online
This is an iteration on @hababr's answer.
His answer removes lots of standard characters like $, +, < and so on.
This version keeps all of them (except for the \ backslash; I don't know how to escape it properly).
"hey😁 hau💓 ahoy🏴‍☠️ !@#$%^&*()-_=+±§;:'\|`~/?[]{},.<>".replace(/[^\p{L}\p{N}\p{P}\p{Z}{^$=+±\\'|`\\~<>}]/gu, "")
// "hey hau ahoy !@#$%^&*()-_=+±§;:'|`~/?[]{},.<>"
I have this regex, and it works for all the emojis I found on this page.
Try this regex:
<:[^:\s]+:\d+>|<a:[^:\s]+:\d+>|(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff]|\ufe0f)
var emojiRegex = /\uD83C\uDFF4(?:\uDB40\uDC67\uDB40\uDC62(?:\uDB40\uDC65\uDB40\uDC6E\uDB40\uDC67|\uDB40\uDC77\uDB40\uDC6C\uDB40\uDC73|\uDB40\uDC73\uDB40\uDC63\uDB40\uDC74)\uDB40\uDC7F|\u200D\u2620\uFE0F)|\uD83D\uDC69\u200D\uD83D\uDC69\u200D(?:\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67]))|\uD83D\uDC68(?:\u200D(?:\u2764\uFE0F\u200D(?:\uD83D\uDC8B\u200D)?\uD83D\uDC68|(?:\uD83D[\uDC68\uDC69])\u200D(?:\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67]))|\uD83D\uDC66\u200D\uD83D\uDC66|\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|(?:\uD83C[\uDFFB-\uDFFF])\u200D(?:\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3]))|\uD83D\uDC69\u200D(?:\u2764\uFE0F\u200D(?:\uD83D\uDC8B\u200D(?:\uD83D[\uDC68\uDC69])|\uD83D[\uDC68\uDC69])|\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|\uD83D\uDC69\u200D\uD83D\uDC66\u200D\uD83D\uDC66|(?:\uD83D\uDC41\uFE0F\u200D\uD83D\uDDE8|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2695\u2696\u2708]|\uD83D\uDC68(?:(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2695\u2696\u2708]|\u200D[\u2695\u2696\u2708])|(?:(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)\uFE0F|\uD83D\uDC6F|\uD83E[\uDD3C\uDDDE\uDDDF])\u200D[\u2640\u2642]|(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2640\u2642]|(?:\uD83C[\uDFC3\uDFC4\uDFCA]|\uD83D[\uDC6E\uDC71\uDC73\uDC77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4-\uDEB6]|\uD83E[\uDD26\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDD6-\uDDDD])(?:(?:\uD83C[\uDFFB-\uDFFF])\u200D[\u2640\u2642]|\u200D[\u2640\u2642])|\uD83D\uDC69\u200D[\u2695\u2696\u2708])\uFE0F|\uD83D\uDC69\u200D\uD83D\uDC67\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D\uDC69\u200D\uD83D\uDC69\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D\uDC68(?:\u200D(?:(?:\u
D83D[\uDC68\uDC69])\u200D(?:\uD83D[\uDC66\uDC67])|\uD83D[\uDC66\uDC67])|\uD83C[\uDFFB-\uDFFF])|\uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08|\uD83D\uDC69\u200D\uD83D\uDC67|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])\u200D(?:\uD83C[\uDF3E\uDF73\uDF93\uDFA4\uDFA8\uDFEB\uDFED]|\uD83D[\uDCBB\uDCBC\uDD27\uDD2C\uDE80\uDE92]|\uD83E[\uDDB0-\uDDB3])|\uD83D\uDC69\u200D\uD83D\uDC66|\uD83C\uDDF6\uD83C\uDDE6|\uD83C\uDDFD\uD83C\uDDF0|\uD83C\uDDF4\uD83C\uDDF2|\uD83D\uDC69(?:\uD83C[\uDFFB-\uDFFF])|\uD83C\uDDED(?:\uD83C[\uDDF0\uDDF2\uDDF3\uDDF7\uDDF9\uDDFA])|\uD83C\uDDEC(?:\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEE\uDDF1-\uDDF3\uDDF5-\uDDFA\uDDFC\uDDFE])|\uD83C\uDDEA(?:\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDED\uDDF7-\uDDFA])|\uD83C\uDDE8(?:\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDEE\uDDF0-\uDDF5\uDDF7\uDDFA-\uDDFF])|\uD83C\uDDF2(?:\uD83C[\uDDE6\uDDE8-\uDDED\uDDF0-\uDDFF])|\uD83C\uDDF3(?:\uD83C[\uDDE6\uDDE8\uDDEA-\uDDEC\uDDEE\uDDF1\uDDF4\uDDF5\uDDF7\uDDFA\uDDFF])|\uD83C\uDDFC(?:\uD83C[\uDDEB\uDDF8])|\uD83C\uDDFA(?:\uD83C[\uDDE6\uDDEC\uDDF2\uDDF3\uDDF8\uDDFE\uDDFF])|\uD83C\uDDF0(?:\uD83C[\uDDEA\uDDEC-\uDDEE\uDDF2\uDDF3\uDDF5\uDDF7\uDDFC\uDDFE\uDDFF])|\uD83C\uDDEF(?:\uD83C[\uDDEA\uDDF2\uDDF4\uDDF5])|\uD83C\uDDF8(?:\uD83C[\uDDE6-\uDDEA\uDDEC-\uDDF4\uDDF7-\uDDF9\uDDFB\uDDFD-\uDDFF])|\uD83C\uDDEE(?:\uD83C[\uDDE8-\uDDEA\uDDF1-\uDDF4\uDDF6-\uDDF9])|\uD83C\uDDFF(?:\uD83C[\uDDE6\uDDF2\uDDFC])|\uD83C\uDDEB(?:\uD83C[\uDDEE-\uDDF0\uDDF2\uDDF4\uDDF7])|\uD83C\uDDF5(?:\uD83C[\uDDE6\uDDEA-\uDDED\uDDF0-\uDDF3\uDDF7-\uDDF9\uDDFC\uDDFE])|\uD83C\uDDE9(?:\uD83C[\uDDEA\uDDEC\uDDEF\uDDF0\uDDF2\uDDF4\uDDFF])|\uD83C\uDDF9(?:\uD83C[\uDDE6\uDDE8\uDDE9\uDDEB-\uDDED\uDDEF-\uDDF4\uDDF7\uDDF9\uDDFB\uDDFC\uDDFF])|\uD83C\uDDE7(?:\uD83C[\uDDE6\uDDE7\uDDE9-\uDDEF\uDDF1-\uDDF4\uDDF6-\uDDF9\uDDFB\uDDFC\uDDFE\uDDFF])|[#\*0-9]\uFE0F\u20E3|\uD83C\uDDF1(?:\uD83C[\uDDE6-\uDDE8\uDDEE\uDDF0\uDDF7-\uDDFB\uDDFE])|\uD83C\uDDE6(?:\uD83C[\uDDE8-\uDDEC\uDDEE\uDDF1\uDDF2\uDDF4\uDDF6-\uDDFA\uDDFC\uDDFD\uDDFF])|\uD83C\uDDF7(?:\uD83C[\uDDEA\uDDF4\uDDF8\uDDFA\uDDFC])|\uD
83C\uDDFB(?:\uD83C[\uDDE6\uDDE8\uDDEA\uDDEC\uDDEE\uDDF3\uDDFA])|\uD83C\uDDFE(?:\uD83C[\uDDEA\uDDF9])|(?:\uD83C[\uDFC3\uDFC4\uDFCA]|\uD83D[\uDC6E\uDC71\uDC73\uDC77\uDC81\uDC82\uDC86\uDC87\uDE45-\uDE47\uDE4B\uDE4D\uDE4E\uDEA3\uDEB4-\uDEB6]|\uD83E[\uDD26\uDD37-\uDD39\uDD3D\uDD3E\uDDB8\uDDB9\uDDD6-\uDDDD])(?:\uD83C[\uDFFB-\uDFFF])|(?:\u26F9|\uD83C[\uDFCB\uDFCC]|\uD83D\uDD75)(?:\uD83C[\uDFFB-\uDFFF])|(?:[\u261D\u270A-\u270D]|\uD83C[\uDF85\uDFC2\uDFC7]|\uD83D[\uDC42\uDC43\uDC46-\uDC50\uDC66\uDC67\uDC70\uDC72\uDC74-\uDC76\uDC78\uDC7C\uDC83\uDC85\uDCAA\uDD74\uDD7A\uDD90\uDD95\uDD96\uDE4C\uDE4F\uDEC0\uDECC]|\uD83E[\uDD18-\uDD1C\uDD1E\uDD1F\uDD30-\uDD36\uDDB5\uDDB6\uDDD1-\uDDD5])(?:\uD83C[\uDFFB-\uDFFF])|(?:[\u231A\u231B\u23E9-\u23EC\u23F0\u23F3\u25FD\u25FE\u2614\u2615\u2648-\u2653\u267F\u2693\u26A1\u26AA\u26AB\u26BD\u26BE\u26C4\u26C5\u26CE\u26D4\u26EA\u26F2\u26F3\u26F5\u26FA\u26FD\u2705\u270A\u270B\u2728\u274C\u274E\u2753-\u2755\u2757\u2795-\u2797\u27B0\u27BF\u2B1B\u2B1C\u2B50\u2B55]|\uD83C[\uDC04\uDCCF\uDD8E\uDD91-\uDD9A\uDDE6-\uDDFF\uDE01\uDE1A\uDE2F\uDE32-\uDE36\uDE38-\uDE3A\uDE50\uDE51\uDF00-\uDF20\uDF2D-\uDF35\uDF37-\uDF7C\uDF7E-\uDF93\uDFA0-\uDFCA\uDFCF-\uDFD3\uDFE0-\uDFF0\uDFF4\uDFF8-\uDFFF]|\uD83D[\uDC00-\uDC3E\uDC40\uDC42-\uDCFC\uDCFF-\uDD3D\uDD4B-\uDD4E\uDD50-\uDD67\uDD7A\uDD95\uDD96\uDDA4\uDDFB-\uDE4F\uDE80-\uDEC5\uDECC\uDED0-\uDED2\uDEEB\uDEEC\uDEF4-\uDEF9]|\uD83E[\uDD10-\uDD3A\uDD3C-\uDD3E\uDD40-\uDD45\uDD47-\uDD70\uDD73-\uDD76\uDD7A\uDD7C-\uDDA2\uDDB0-\uDDB9\uDDC0-\uDDC2\uDDD0-\uDDFF])|(?:[#\*0-9\xA9\xAE\u203C\u2049\u2122\u2139\u2194-\u2199\u21A9\u21AA\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA\u24C2\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE\u2600-\u2604\u260E\u2611\u2614\u2615\u2618\u261D\u2620\u2622\u2623\u2626\u262A\u262E\u262F\u2638-\u263A\u2640\u2642\u2648-\u2653\u265F\u2660\u2663\u2665\u2666\u2668\u267B\u267E\u267F\u2692-\u2697\u2699\u269B\u269C\u26A0\u26A1\u26AA\u26AB\u26B0\u26B1\u26BD\u26BE\u26C4\u26C5\u26C8\u26CE\u26CF\u26D1\u26D3\u26D4\u26E9\u26EA\u
26F0-\u26F5\u26F7-\u26FA\u26FD\u2702\u2705\u2708-\u270D\u270F\u2712\u2714\u2716\u271D\u2721\u2728\u2733\u2734\u2744\u2747\u274C\u274E\u2753-\u2755\u2757\u2763\u2764\u2795-\u2797\u27A1\u27B0\u27BF\u2934\u2935\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55\u3030\u303D\u3297\u3299]|\uD83C[\uDC04\uDCCF\uDD70\uDD71\uDD7E\uDD7F\uDD8E\uDD91-\uDD9A\uDDE6-\uDDFF\uDE01\uDE02\uDE1A\uDE2F\uDE32-\uDE3A\uDE50\uDE51\uDF00-\uDF21\uDF24-\uDF93\uDF96\uDF97\uDF99-\uDF9B\uDF9E-\uDFF0\uDFF3-\uDFF5\uDFF7-\uDFFF]|\uD83D[\uDC00-\uDCFD\uDCFF-\uDD3D\uDD49-\uDD4E\uDD50-\uDD67\uDD6F\uDD70\uDD73-\uDD7A\uDD87\uDD8A-\uDD8D\uDD90\uDD95\uDD96\uDDA4\uDDA5\uDDA8\uDDB1\uDDB2\uDDBC\uDDC2-\uDDC4\uDDD1-\uDDD3\uDDDC-\uDDDE\uDDE1\uDDE3\uDDE8\uDDEF\uDDF3\uDDFA-\uDE4F\uDE80-\uDEC5\uDECB-\uDED2\uDEE0-\uDEE5\uDEE9\uDEEB\uDEEC\uDEF0\uDEF3-\uDEF9]|\uD83E[\uDD10-\uDD3A\uDD3C-\uDD3E\uDD40-\uDD45\uDD47-\uDD70\uDD73-\uDD76\uDD7A\uDD7C-\uDDA2\uDDB0-\uDDB9\uDDC0-\uDDC2\uDDD0-\uDDFF])\uFE0F|(?:[\u261D\u26F9\u270A-\u270D]|\uD83C[\uDF85\uDFC2-\uDFC4\uDFC7\uDFCA-\uDFCC]|\uD83D[\uDC42\uDC43\uDC46-\uDC50\uDC66-\uDC69\uDC6E\uDC70-\uDC78\uDC7C\uDC81-\uDC83\uDC85-\uDC87\uDCAA\uDD74\uDD75\uDD7A\uDD90\uDD95\uDD96\uDE45-\uDE47\uDE4B-\uDE4F\uDEA3\uDEB4-\uDEB6\uDEC0\uDECC]|\uD83E[\uDD18-\uDD1C\uDD1E\uDD1F\uDD26\uDD30-\uDD39\uDD3D\uDD3E\uDDB5\uDDB6\uDDB8\uDDB9\uDDD1-\uDDDD])/g;
console.log(text.replace(emojiRegex, ''));
<!DOCTYPE html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script>
function isEmoji(str) {
var ranges = [
'[\uE000-\uF8FF]',
'\uD83C[\uDC00-\uDFFF]',
'\uD83D[\uDC00-\uDFFF]',
'[\u2011-\u26FF]',
'\uD83E[\uDD10-\uDDFF]'
];
return new RegExp(ranges.join('|')).test(str);
}
$(document).ready(function(){
$('input').on('input',function(){
var $th = $(this);
console.log("Value of Input"+$th.val());
var emojiInput = isEmoji($th.val());
if (emojiInput) {
$th.val("");
}
});
});
</script>
</head>
<body>
Enter your name: <input type="text">
</body>
</html>
There is a modern solution using Unicode properties
Modern browsers support Unicode property escapes, which let you match emojis based on Unicode properties such as Emoji. For example, you can use \p{Emoji} or \P{Emoji} to match or exclude emoji characters. Note that 0123456789#* and some other characters also have the Emoji property. Therefore, a better way is to use the Extended_Pictographic property, which denotes the characters typically understood as emojis, instead of the Emoji property.
const withEmojis = /\p{Extended_Pictographic}/u
withEmojis.test('😀😀');
//true
withEmojis.test('ab');
//false
withEmojis.test('1');
//false
or with negation (\P matches any character that is not pictographic)
const noEmojis = /\P{Extended_Pictographic}/u
noEmojis.test('😀');
//false
noEmojis.test('1212');
//true
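Building on the property escapes above, a removal function also has to deal with the pieces that surround pictographs: keycaps, skin tone modifiers, flag letters, variation selectors and joiners. The following is a rough sketch under the assumption that the engine supports ES2018 property escapes; the names emojiPieces and stripEmoji are made up, and the list of pieces is a judgment call rather than a complete definition of "emoji":

```javascript
// Sketch: remove emoji sequences piece by piece. Assumes ES2018 property escapes.
const emojiPieces = new RegExp(
  [
    "[#*0-9]\\uFE0F?\\u20E3",     // keycap sequences such as 1 + U+FE0F + U+20E3
    "\\p{Extended_Pictographic}", // pictographic base characters
    "\\p{Emoji_Modifier}",        // skin tone modifiers
    "[\\u{1F1E6}-\\u{1F1FF}]",    // regional indicators (flag letters)
    "[\\u{E0020}-\\u{E007F}]",    // tag characters used by subdivision flags
    "[\\uFE0F\\u200D]",           // leftover variation selectors and joiners
  ].join("|"),
  "gu"
);

function stripEmoji(text) {
  return text.replace(emojiPieces, "");
}

console.log(stripEmoji("Smile\u{1F600} 1+1=2 #ok")); // Smile 1+1=2 #ok
```

Two caveats of this design: Extended_Pictographic also covers characters such as ©, ® and ™, and removing U+200D can affect scripts that use the joiner for ordinary text.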
You can use the mathiasbynens/emoji-regex package to remove or replace emojis.
To grab the regex itself, you can view the latest build's content at the following URL:
http://unpkg.com/emoji-regex/index.js
In detail, this function first uses TextEncoder to convert the content into a UTF-8 byte array, then loops through this array. If it finds a byte whose first five bits are 11110 (i.e. 0xF0 to 0xF7), that byte starts a four-byte sequence, which is how most emojis are encoded, so it skips this byte and the next three. Finally, it uses TextDecoder to convert the remaining bytes back to a string. Note that this drops every code point above U+FFFF, emoji or not, and leaves emojis inside the Basic Multilingual Plane (such as ☺) untouched.
function removeEmoji (content) {
  // TextEncoder always produces UTF-8 and takes no arguments.
  const conByte = new TextEncoder().encode(content);
  const kept = [];
  for (let i = 0; i < conByte.length; i++) {
    if ((conByte[i] & 0xF8) === 0xF0) {
      // Lead byte of a four-byte sequence: skip it and its three continuation bytes.
      i += 3;
    } else {
      kept.push(conByte[i]);
    }
  }
  return new TextDecoder("utf-8").decode(new Uint8Array(kept));
}
