I need to match interconnected Arabic characters to do expansion like this:
بسم الله الرحمن الرحيم
becomes
بـسـم الـلـه الـرحـمـن الـرحـيـم
Is there a way to do that using regular expressions?
How about something like this:
"بسم الله الرحمن الرحيم".replace(/(ب|ت|ث|ج|ح|خ|س|ش|ص|ض|ط|ظ|ع|غ|ف|ق|ك|ل|م|ن|ه|ي)(?=\S)/g, "$1ـ");
returns:
"بـسـم الـلـه الـرحـمـن الـرحـيـم"
Clarification:
We match any letter that can connect to the following character by putting all of those letters in an OR group, then use a lookahead to make sure the letter is not followed by whitespace (that is, it is not at the end of a word). Each match is then replaced by the captured letter ($1) followed by the elongation character (ـ).
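The same idea can be written with a character class instead of a long alternation. This is only a sketch: the name addTatweel is made up for illustration, and the letter list is copied from the regex above, so it is not a complete table of Arabic joining behavior.

```javascript
// Letters from the regex above that join to the following letter.
// Assumption: the list is taken as-is and is not a complete joining table.
const JOINING_LETTERS = "بتثجحخسشصضطظعغفقكلمنهي";
const TATWEEL = "\u0640"; // the elongation character

// Hypothetical helper: insert a tatweel after each joining letter
// that is followed by a non-whitespace character.
function addTatweel(text) {
  const pattern = new RegExp("([" + JOINING_LETTERS + "])(?=\\S)", "g");
  return text.replace(pattern, "$1" + TATWEEL);
}

console.log(addTatweel("بسم الله الرحمن الرحيم"));
```

Because the lookahead only excludes whitespace, this would also insert the elongation character before punctuation or digits, so real text may need a stricter lookahead.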
I had a project once in which I had to choose the correct Unicode code points to render depending on the position of the letters, so that they appeared connected (or disconnected) as appropriate, because I was using a system that was not Unicode-compliant.
The Unicode value for the disconnected Meem (م) is different from the one for the connected form. BUT:
Unfortunately for your case, and most fortunately for many other cases, the Unicode specification separates how a letter is displayed from its actual code point. This is why you might have the code point for a disconnected Meem and still see it displayed as connected! The specification also treats a connected Meem and a disconnected one as semantically equivalent, so comparing them for equivalence gives the correct result. This makes things a lot easier!
What I ended up doing was to create a static data structure (hard-coded dictionaries or arrays, XML, or whatever). This data structure tells us whether each Arabic letter connects to the letter before it, the letter after it, or both.
For example:
// list of chars that can connect both before and after
var canConnectBeforeAfter = new List<char>() { 'ع', 'ت', 'ب', 'ي' /* and so on */ };
// list of chars that can connect only to the character before them (if that character can connect to the one after it! watch out for وو)
var cannotConnectAfter = new List<char>() { 'ر', 'و' };
// list of chars that never connect
var cannotConnect = new List<char>() { 'ء' };
You will need to add the right characters for the right lists. I hope you don't have to deal with Harakat!!!!
سلام, let me know if you need clarification
I've a little problem.
I'm using NodeJS as the backend. A user has a "biography" field, where the user can write something about himself.
Suppose that this field has 220 maxlength, and suppose this as input:
👶🏻👦🏻👧🏻👨🏻👩🏻👱🏻♀️👱🏻👴🏻👵🏻👲🏻👳🏻♀️👳🏻👮🏻♀️👮🏻👷🏻♀️👷🏻💂🏻♀️💂🏻🕵🏻♀️👩🏻⚕️👨🏻⚕️👩🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾
As you can see, there aren't 220 emojis (there are 37), but if I run this on my NodeJS server:
console.log(bio.length)
where bio is the input text, I get 221. How can I "parse" the input string to get the correct length? Is it a Unicode problem?
SOLVED
I used this library: https://github.com/orling/grapheme-splitter
I tried that:
var Grapheme = require('grapheme-splitter');
var splitter = new Grapheme();
console.log(splitter.splitGraphemes(bio).length);
and the length is 37. It works very well!
str.length gives the count of UTF-16 code units.
A Unicode-proof way to get the string length in code points (in characters) is [...str].length, as the iterable protocol splits the string into code points.
If we need the length in graphemes (grapheme clusters), we have these native ways:
a. Unicode property escapes in RegExp. See for example: Unicode-aware version of \w or Matching emoji.
b. Intl.Segmenter: at the time of writing it was coming soon, probably in ES2021. It could be tested behind a flag in recent V8 versions (the implementation was synced with the latest spec in V8 8.6) and shipped unflagged in V8 8.7.
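As a rough illustration of the three kinds of length discussed above, here is a minimal sketch. It assumes a runtime that ships Intl.Segmenter (recent Node.js and browser versions do); the helper name countGraphemes is made up for this example.

```javascript
// Hypothetical helper: count grapheme clusters with the built-in segmenter.
// Assumption: Intl.Segmenter is available in the runtime.
function countGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(str)].length;
}

// "a" and "b", each followed by U+030B (combining double acute accent).
const sample = "a\u030Bb\u030B";

console.log(sample.length);          // UTF-16 code units
console.log([...sample].length);     // code points
console.log(countGraphemes(sample)); // grapheme clusters
```

For this sample the first two counts are 4 and the grapheme count is 2, which is the number a human reader would expect.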
See also:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
What every JavaScript developer should know about Unicode
JavaScript has a Unicode problem
Unicode-aware regular expressions in ES2015
ES6 Strings (and Unicode, ❤) in Depth
JavaScript for impatient programmers. Unicode – a brief introduction
TL;DR there are solutions, but they don’t work in every case. Unicode can feel like a dark art.
There seem to be limitations in the various solutions I have seen presented, with the issue going beyond emojis and covering other characters in the Unicode range. Consider that é can be stored as é or as e + ´ when using combining characters. This can even lead to two strings that look the same not being equal. Also note that in certain cases a single emoji can take 11 UTF-16 code units when stored, and as a result 22 bytes, assuming UTF-16.
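The "look the same but are not equal" case can be reproduced in a few lines. This sketch uses String.prototype.normalize, which is one common way to make such strings comparable; the variable names are only illustrative.

```javascript
const precomposed = "\u00E9"; // é stored as a single code point
const decomposed = "e\u0301"; // e followed by a combining acute accent

// The two strings render the same but differ in storage.
console.log(precomposed === decomposed);            // false
console.log(precomposed.length, decomposed.length); // 1 2

// Normalizing both to the same form (NFC here) makes them equal.
const sameAfterNfc = precomposed.normalize("NFC") === decomposed.normalize("NFC");
console.log(sameAfterNfc); // true
```

Normalizing before counting or comparing removes this particular difference, but it does not merge emoji sequences into single units.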
The way this is handled and how characters are combined, or displayed, can even vary between browsers and operating systems. So, while you may think you cracked it, there is a risk another environment breaks this. Be sure to test where it matters.
Now, there is the front-end vs back-end problem: you solved the character count problem so it works well for human users, now your single emoji blows right past the allocated field size in the database. Less of an issue with databases such as mongo, but can be one with SQL databases, where field allocation was conservative. This means how you solve your problem will depend where the hardest limitation comes in.
Note, that a basic solution does involve converting a string to an array and getting the length, accepting limitations:
Array.from(str)
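This counts code points, so an astral-plane character such as a single emoji counts as one, but it falls apart when characters are combined (combining marks, skin-tone modifiers, zero width joiner sequences).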
A few high level approaches, that take into account limitations:
use approaches that solve the front-end issue, as best as possible, and then ensure storage issues are resolved
be more conservative with the advertised front-end limits, if the database or other storage can’t be adjusted
limit the character types that can be entered
clearly indicate limitations of the length calculation
Additionally, given the complexity of the issue, it may be worth checking whether there is a popular JS library that already deals with this. I did not find one at the time of writing. Hopefully this is something that will become core to JavaScript at some point.
Other pages to read:
https://blog.jonnew.com/posts/poo-dot-length-equals-two
https://mathiasbynens.be/notes/javascript-unicode
https://www.contentful.com/blog/2016/12/06/unicode-javascript-and-the-emoji-family/
https://dmitripavlutin.com/what-every-javascript-developer-should-know-about-unicode/
I answered a similar question here.
But basically, here it is:
'👍'.match(/./gu).length == 1
whereas:
'👍'.length == 2
More details are in my original post.
function fancyCount2(str) {
  const joiner = "\u{200D}";
  const split = str.split(joiner);
  let count = 0;
  for (const s of split) {
    // removing the variation selectors
    const num = Array.from(s.split(/[\ufe00-\ufe0f]/).join("")).length;
    count += num;
  }
  // assuming the joiners are used appropriately
  return count / split.length;
}
With a regex that can parse emojis, this can be done easily and without the use of external libraries. Please see the code snippets for examples. 👷🏻♀️
Note that grapheme-splitter as suggested in the question will overcount and split apart compound emojis that contain other emojis, such as this one: 🧑🏻🤝🧑🏻. This is reported as three distinct "graphemes", 🧑🏻 and 🤝 and 🧑🏻
Here we are using the 'compact', literal version so it'll fit, but there's a safe, long version that uses Unicode escapes as well.
For more info on the regex see also this answer.
/*the pattern (compact version)*/
var emojiPattern = String.raw`(?:🧑🏻❤️💋🧑🏼|🧑🏻❤️💋🧑🏽|🧑🏻❤️💋🧑🏾|🧑🏻❤️💋🧑🏿|🧑🏼❤️💋🧑🏻|🧑🏼❤️💋🧑🏽|🧑🏼❤️💋🧑🏾|🧑🏼❤️💋🧑🏿|🧑🏽❤️💋🧑🏻|🧑🏽❤️💋🧑🏼|🧑🏽❤️💋🧑🏾|🧑🏽❤️💋🧑🏿|🧑🏾❤️💋🧑🏻|🧑🏾❤️💋🧑🏼|🧑🏾❤️💋🧑🏽|🧑🏾❤️💋🧑🏿|🧑🏿❤️💋🧑🏻|🧑🏿❤️💋🧑🏼|🧑🏿❤️💋🧑🏽|🧑🏿❤️💋🧑🏾|👩🏻❤️💋👨🏻|👩🏻❤️💋👨🏼|👩🏻❤️💋👨🏽|👩🏻❤️💋👨🏾|👩🏻❤️💋👨🏿|👩🏼❤️💋👨🏻|👩🏼❤️💋👨🏼|👩🏼❤️💋👨🏽|👩🏼❤️💋👨🏾|👩🏼❤️💋👨🏿|👩🏽❤️💋👨🏻|👩🏽❤️💋👨🏼|👩🏽❤️💋👨🏽|👩🏽❤️💋👨🏾|👩🏽❤️💋👨🏿|👩🏾❤️💋👨🏻|👩🏾❤️💋👨🏼|👩🏾❤️💋👨🏽|👩🏾❤️💋👨🏾|👩🏾❤️💋👨🏿|👩🏿❤️💋👨🏻|👩🏿❤️💋👨🏼|👩🏿❤️💋👨🏽|👩🏿❤️💋👨🏾|👩🏿❤️💋👨🏿|👨🏻❤️💋👨🏻|👨🏻❤️💋👨🏼|👨🏻❤️💋👨🏽|👨🏻❤️💋👨🏾|👨🏻❤️💋👨🏿|👨🏼❤️💋👨🏻|👨🏼❤️💋👨🏼|👨🏼❤️💋👨🏽|👨🏼❤️💋👨🏾|👨🏼❤️💋👨🏿|👨🏽❤️💋👨🏻|👨🏽❤️💋👨🏼|👨🏽❤️💋👨🏽|👨🏽❤️💋👨🏾|👨🏽❤️💋👨🏿|👨🏾❤️💋👨🏻|👨🏾❤️💋👨🏼|👨🏾❤️💋👨🏽|👨🏾❤️💋👨🏾|👨🏾❤️💋👨🏿|👨🏿❤️💋👨🏻|👨🏿❤️💋👨🏼|👨🏿❤️💋👨🏽|👨🏿❤️💋👨🏾|👨🏿❤️💋👨🏿|👩🏻❤️💋👩🏻|👩🏻❤️💋👩🏼|👩🏻❤️💋👩🏽|👩🏻❤️💋👩🏾|👩🏻❤️💋👩🏿|👩🏼❤️💋👩🏻|👩🏼❤️💋👩🏼|👩🏼❤️💋👩🏽|👩🏼❤️💋👩🏾|👩🏼❤️💋👩🏿|👩🏽❤️💋👩🏻|👩🏽❤️💋👩🏼|👩🏽❤️💋👩🏽|👩🏽❤️💋👩🏾|👩🏽❤️💋👩🏿|👩🏾❤️💋👩🏻|👩🏾❤️💋👩🏼|👩🏾❤️💋👩🏽|👩🏾❤️💋👩🏾|👩🏾❤️💋👩🏿|👩🏿❤️💋👩🏻|👩🏿❤️💋👩🏼|👩🏿❤️💋👩🏽|👩🏿❤️💋👩🏾|👩🏿❤️💋👩🏿|🏴|🏴|🏴|🧑🏻🤝🧑🏻|🧑🏻🤝🧑🏼|🧑🏻🤝🧑🏽|🧑🏻🤝🧑🏾|🧑🏻🤝🧑🏿|🧑🏼🤝🧑🏻|🧑🏼🤝🧑🏼|🧑🏼🤝🧑🏽|🧑🏼🤝🧑🏾|🧑🏼🤝🧑🏿|🧑🏽🤝🧑🏻|🧑🏽🤝🧑🏼|🧑🏽🤝🧑🏽|🧑🏽🤝🧑🏾|🧑🏽🤝🧑🏿|🧑🏾🤝🧑🏻|🧑🏾🤝🧑🏼|🧑🏾🤝🧑🏽|🧑🏾🤝🧑🏾|🧑🏾🤝🧑🏿|🧑🏿🤝🧑🏻|🧑🏿🤝🧑🏼|🧑🏿🤝🧑🏽|🧑🏿🤝🧑🏾|🧑🏿🤝🧑🏿|👩🏻🤝👩🏼|👩🏻🤝👩🏽|👩🏻🤝👩🏾|👩🏻🤝👩🏿|👩🏼🤝👩🏻|👩🏼🤝👩🏽|👩🏼🤝👩🏾|👩🏼🤝👩🏿|👩🏽🤝👩🏻|👩🏽🤝👩🏼|👩🏽🤝👩🏾|👩🏽🤝👩🏿|👩🏾🤝👩🏻|👩🏾🤝👩🏼|👩🏾🤝👩🏽|👩🏾🤝👩🏿|👩🏿🤝👩🏻|👩🏿🤝👩🏼|👩🏿🤝👩🏽|👩🏿🤝👩🏾|👩🏻🤝👨🏼|👩🏻🤝👨🏽|👩🏻🤝👨🏾|👩🏻🤝👨🏿|👩🏼🤝👨🏻|👩🏼🤝👨🏽|👩🏼🤝👨🏾|👩🏼🤝👨🏿|👩🏽🤝👨🏻|👩🏽🤝👨🏼|👩🏽🤝👨🏾|👩🏽🤝👨🏿|👩🏾🤝👨🏻|👩🏾🤝👨🏼|👩🏾🤝👨🏽|👩🏾🤝👨🏿|👩🏿🤝👨🏻|👩🏿🤝👨🏼|👩🏿🤝👨🏽|👩🏿🤝👨🏾|👨🏻🤝👨🏼|👨🏻🤝👨🏽|👨🏻🤝👨🏾|👨🏻🤝👨🏿|👨🏼🤝👨🏻|👨🏼🤝👨🏽|👨🏼🤝👨🏾|👨🏼🤝👨🏿|👨🏽🤝👨🏻|👨🏽🤝👨🏼|👨🏽🤝👨🏾|👨🏽🤝👨🏿|👨🏾🤝👨🏻|👨🏾🤝👨🏼|👨🏾🤝👨🏽|👨🏾🤝👨🏿|👨🏿🤝👨🏻|👨🏿🤝👨🏼|👨🏿🤝👨🏽|👨🏿🤝👨🏾|🧑🏻❤️🧑🏼|🧑🏻❤️🧑🏽|🧑🏻❤️🧑🏾|🧑🏻❤️🧑🏿|🧑🏼❤️🧑🏻|🧑🏼❤️🧑🏽|🧑🏼❤️🧑🏾|🧑🏼❤️🧑🏿|🧑🏽❤️🧑🏻|🧑🏽❤️🧑🏼|🧑🏽❤️🧑🏾|🧑🏽❤️🧑🏿|🧑🏾❤️🧑🏻|🧑🏾❤️🧑🏼|🧑🏾❤️🧑🏽|🧑🏾❤️🧑🏿|🧑🏿❤️🧑🏻|🧑🏿❤️🧑🏼|🧑🏿❤️🧑🏽|🧑🏿❤️🧑🏾|👩🏻❤️👨🏻|👩🏻❤️👨🏼|👩🏻❤️👨🏽|👩🏻❤️👨🏾|👩🏻❤️👨🏿|👩🏼❤️👨🏻|👩🏼❤️👨🏼|👩🏼❤️👨🏽|👩🏼❤️👨🏾|👩🏼❤️👨🏿|👩🏽❤️👨🏻|👩🏽❤️👨🏼|👩🏽❤️👨🏽|👩🏽❤️👨🏾|👩🏽❤️👨🏿|👩🏾❤️👨🏻|👩🏾❤️👨🏼|👩🏾❤️👨🏽|👩🏾❤️👨🏾|👩🏾❤️👨🏿|👩🏿❤️👨🏻|👩🏿❤️👨🏼|👩🏿❤️👨🏽|👩🏿❤️👨🏾|👩🏿❤️👨🏿|👨🏻❤️👨🏻|👨🏻❤️👨🏼|👨🏻❤️👨🏽|👨🏻❤️👨🏾|👨🏻❤️👨🏿|👨🏼❤️👨🏻|👨🏼❤️👨🏼|👨🏼❤️👨🏽|👨🏼❤️👨🏾|👨🏼❤️👨🏿|👨🏽❤️👨🏻|👨🏽❤️👨🏼|👨🏽❤️👨🏽|👨🏽❤️👨🏾|👨🏽❤️👨🏿|👨🏾❤️👨🏻|👨🏾❤️👨🏼|👨🏾❤️👨🏽|👨🏾❤️👨🏾|👨🏾❤️👨🏿|👨🏿❤️👨🏻|👨🏿❤️👨🏼|👨🏿❤️👨🏽|👨🏿❤️👨🏾|👨🏿❤️👨🏿|👩🏻❤️👩🏻|👩🏻❤️👩🏼|👩🏻❤️👩🏽|👩🏻❤️👩🏾|👩🏻❤️👩🏿|👩🏼❤️👩🏻|👩🏼❤️👩🏼|👩🏼❤️👩🏽|👩🏼❤️👩🏾|👩🏼❤️👩🏿|👩🏽❤️👩🏻|👩🏽❤️👩🏼|👩🏽❤️👩🏽|👩🏽❤️👩🏾|👩🏽❤️👩🏿|👩🏾❤️👩🏻|👩🏾❤️👩🏼|👩🏾❤️👩🏽|👩🏾❤️👩🏾|👩🏾❤️👩🏿|👩🏿❤️👩🏻|👩🏿❤️👩🏼|👩🏿❤️👩🏽|👩🏿❤️👩🏾|👩🏿❤️👩🏿|👩❤️💋👨|👨❤️💋👨|👩❤️💋👩|👨👩👧👦|👨👩👦
👦|👨👩👧👧|👨👨👧👦|👨👨👦👦|👨👨👧👧|👩👩👧👦|👩👩👦👦|👩👩👧👧|🧑🤝🧑|👩❤️👨|👨❤️👨|👩❤️👩|👨👩👦|👨👩👧|👨👨👦|👨👨👧|👩👩👦|👩👩👧|👨👦👦|👨👧👦|👨👧👧|👩👦👦|👩👧👦|👩👧👧|👁️🗨️|🧔🏻♂️|🧔🏼♂️|🧔🏽♂️|🧔🏾♂️|🧔🏿♂️|🧔🏻♀️|🧔🏼♀️|🧔🏽♀️|🧔🏾♀️|🧔🏿♀️|👨🏻🦰|👨🏼🦰|👨🏽🦰|👨🏾🦰|👨🏿🦰|👨🏻🦱|👨🏼🦱|👨🏽🦱|👨🏾🦱|👨🏿🦱|👨🏻🦳|👨🏼🦳|👨🏽🦳|👨🏾🦳|👨🏿🦳|👨🏻🦲|👨🏼🦲|👨🏽🦲|👨🏾🦲|👨🏿🦲|👩🏻🦰|👩🏼🦰|👩🏽🦰|👩🏾🦰|👩🏿🦰|🧑🏻🦰|🧑🏼🦰|🧑🏽🦰|🧑🏾🦰|🧑🏿🦰|👩🏻🦱|👩🏼🦱|👩🏽🦱|👩🏾🦱|👩🏿🦱|🧑🏻🦱|🧑🏼🦱|🧑🏽🦱|🧑🏾🦱|🧑🏿🦱|👩🏻🦳|👩🏼🦳|👩🏽🦳|👩🏾🦳|👩🏿🦳|🧑🏻🦳|🧑🏼🦳|🧑🏽🦳|🧑🏾🦳|🧑🏿🦳|👩🏻🦲|👩🏼🦲|👩🏽🦲|👩🏾🦲|👩🏿🦲|🧑🏻🦲|🧑🏼🦲|🧑🏽🦲|🧑🏾🦲|🧑🏿🦲|👱🏻♀️|👱🏼♀️|👱🏽♀️|👱🏾♀️|👱🏿♀️|👱🏻♂️|👱🏼♂️|👱🏽♂️|👱🏾♂️|👱🏿♂️|🙍🏻♂️|🙍🏼♂️|🙍🏽♂️|🙍🏾♂️|🙍🏿♂️|🙍🏻♀️|🙍🏼♀️|🙍🏽♀️|🙍🏾♀️|🙍🏿♀️|🙎🏻♂️|🙎🏼♂️|🙎🏽♂️|🙎🏾♂️|🙎🏿♂️|🙎🏻♀️|🙎🏼♀️|🙎🏽♀️|🙎🏾♀️|🙎🏿♀️|🙅🏻♂️|🙅🏼♂️|🙅🏽♂️|🙅🏾♂️|🙅🏿♂️|🙅🏻♀️|🙅🏼♀️|🙅🏽♀️|🙅🏾♀️|🙅🏿♀️|🙆🏻♂️|🙆🏼♂️|🙆🏽♂️|🙆🏾♂️|🙆🏿♂️|🙆🏻♀️|🙆🏼♀️|🙆🏽♀️|🙆🏾♀️|🙆🏿♀️|💁🏻♂️|💁🏼♂️|💁🏽♂️|💁🏾♂️|💁🏿♂️|💁🏻♀️|💁🏼♀️|💁🏽♀️|💁🏾♀️|💁🏿♀️|🙋🏻♂️|🙋🏼♂️|🙋🏽♂️|🙋🏾♂️|🙋🏿♂️|🙋🏻♀️|🙋🏼♀️|🙋🏽♀️|🙋🏾♀️|🙋🏿♀️|🧏🏻♂️|🧏🏼♂️|🧏🏽♂️|🧏🏾♂️|🧏🏿♂️|🧏🏻♀️|🧏🏼♀️|🧏🏽♀️|🧏🏾♀️|🧏🏿♀️|🙇🏻♂️|🙇🏼♂️|🙇🏽♂️|🙇🏾♂️|🙇🏿♂️|🙇🏻♀️|🙇🏼♀️|🙇🏽♀️|🙇🏾♀️|🙇🏿♀️|🤦🏻♂️|🤦🏼♂️|🤦🏽♂️|🤦🏾♂️|🤦🏿♂️|🤦🏻♀️|🤦🏼♀️|🤦🏽♀️|🤦🏾♀️|🤦🏿♀️|🤷🏻♂️|🤷🏼♂️|🤷🏽♂️|🤷🏾♂️|🤷🏿♂️|🤷🏻♀️|🤷🏼♀️|🤷🏽♀️|🤷🏾♀️|🤷🏿♀️|🧑🏻⚕️|🧑🏼⚕️|🧑🏽⚕️|🧑🏾⚕️|🧑🏿⚕️|👨🏻⚕️|👨🏼⚕️|👨🏽⚕️|👨🏾⚕️|👨🏿⚕️|👩🏻⚕️|👩🏼⚕️|👩🏽⚕️|👩🏾⚕️|👩🏿⚕️|🧑🏻🎓|🧑🏼🎓|🧑🏽🎓|🧑🏾🎓|🧑🏿🎓|👨🏻🎓|👨🏼🎓|👨🏽🎓|👨🏾🎓|👨🏿🎓|👩🏻🎓|👩🏼🎓|👩🏽🎓|👩🏾🎓|👩🏿🎓|🧑🏻🏫|🧑🏼🏫|🧑🏽🏫|🧑🏾🏫|🧑🏿🏫|👨🏻🏫|👨🏼🏫|👨🏽🏫|👨🏾🏫|👨🏿🏫|👩🏻🏫|👩🏼🏫|👩🏽🏫|👩🏾🏫|👩🏿🏫|🧑🏻⚖️|🧑🏼⚖️|🧑🏽⚖️|🧑🏾⚖️|🧑🏿⚖️|👨🏻⚖️|👨🏼⚖️|👨🏽⚖️|👨🏾⚖️|👨🏿⚖️|👩🏻⚖️|👩🏼⚖️|👩🏽⚖️|👩🏾⚖️|👩🏿⚖️|🧑🏻🌾|🧑🏼🌾|🧑🏽🌾|🧑🏾🌾|🧑🏿🌾|👨🏻🌾|👨🏼🌾|👨🏽🌾|👨🏾🌾|👨🏿🌾|👩🏻🌾|👩🏼🌾|👩🏽🌾|👩🏾🌾|👩🏿🌾|🧑🏻🍳|🧑🏼🍳|🧑🏽🍳|🧑🏾🍳|🧑🏿🍳|👨🏻🍳|👨🏼🍳|👨🏽🍳|👨🏾🍳|👨🏿🍳|👩🏻🍳|👩🏼🍳|👩🏽🍳|👩🏾🍳|👩🏿🍳|🧑🏻🔧|🧑🏼🔧|🧑🏽🔧|🧑🏾🔧|🧑🏿🔧|👨🏻🔧|👨🏼🔧|👨🏽🔧|👨🏾🔧|👨🏿🔧|👩🏻🔧|👩🏼🔧|👩🏽🔧|👩🏾🔧|👩🏿🔧|🧑🏻🏭|🧑🏼🏭|🧑🏽🏭|🧑🏾🏭|🧑🏿🏭|👨🏻🏭|👨🏼🏭|👨🏽🏭|👨🏾🏭|👨🏿🏭|👩🏻🏭|👩🏼🏭|👩🏽🏭|👩🏾🏭|👩🏿🏭|🧑🏻💼|🧑🏼💼|🧑🏽💼|🧑🏾💼|🧑🏿💼|👨🏻💼|👨🏼💼|👨🏽💼|👨🏾💼|👨🏿💼|👩🏻💼|👩🏼💼|👩🏽💼|👩🏾💼|👩🏿💼|🧑🏻🔬|🧑🏼🔬|🧑🏽🔬|🧑🏾🔬|🧑🏿🔬|👨🏻🔬|👨🏼🔬|👨🏽🔬|👨🏾🔬|👨🏿🔬|👩🏻🔬|👩🏼🔬|👩🏽🔬|👩🏾🔬|👩🏿🔬|🧑🏻💻|🧑🏼💻|🧑🏽💻|🧑🏾💻|🧑🏿💻|👨🏻💻|👨🏼💻|👨🏽💻|👨🏾💻|👨🏿💻|👩🏻💻|👩🏼💻|👩🏽💻|👩🏾💻|👩🏿💻|🧑🏻🎤|🧑🏼🎤|🧑🏽🎤|🧑🏾🎤|🧑🏿🎤|👨🏻🎤|👨🏼🎤|👨🏽🎤|👨🏾🎤|👨🏿🎤|👩🏻🎤|👩🏼🎤|👩🏽🎤|👩🏾🎤|👩🏿🎤|🧑🏻🎨|🧑🏼🎨|🧑🏽🎨|🧑🏾🎨|🧑🏿🎨|👨🏻🎨|👨🏼🎨|👨🏽🎨|👨🏾🎨|👨🏿🎨|👩🏻🎨|👩🏼🎨|👩🏽🎨|👩🏾🎨|👩🏿🎨|🧑🏻✈️|🧑🏼✈️|🧑🏽✈️|🧑🏾✈️|🧑🏿✈️|👨🏻✈️|👨🏼✈️|👨🏽✈️|👨🏾✈️|👨🏿✈️|👩🏻✈️|👩🏼✈️|👩🏽✈️|👩🏾✈️|👩🏿✈️|🧑🏻🚀|🧑🏼🚀|🧑🏽🚀|🧑🏾🚀|🧑🏿🚀|👨🏻🚀|👨🏼🚀|👨🏽🚀|👨🏾🚀|👨🏿🚀|👩🏻🚀|👩🏼🚀|👩🏽🚀|👩🏾🚀|👩🏿🚀|🧑🏻🚒|🧑🏼🚒|🧑🏽🚒|🧑🏾🚒|🧑🏿🚒|👨🏻🚒|👨🏼🚒|👨🏽🚒|👨🏾🚒|👨🏿🚒|👩🏻🚒|👩🏼🚒|👩🏽🚒|👩🏾🚒|👩🏿🚒|👮🏻♂️|👮🏼♂️|👮🏽♂️|👮🏾♂️|👮🏿♂️|👮🏻♀️|👮🏼♀️|👮🏽♀️|👮🏾♀️|👮
🏿♀️|🕵🏻♂️|🕵🏼♂️|🕵🏽♂️|🕵🏾♂️|🕵🏿♂️|🕵🏻♀️|🕵🏼♀️|🕵🏽♀️|🕵🏾♀️|🕵🏿♀️|💂🏻♂️|💂🏼♂️|💂🏽♂️|💂🏾♂️|💂🏿♂️|💂🏻♀️|💂🏼♀️|💂🏽♀️|💂🏾♀️|💂🏿♀️|👷🏻♂️|👷🏼♂️|👷🏽♂️|👷🏾♂️|👷🏿♂️|👷🏻♀️|👷🏼♀️|👷🏽♀️|👷🏾♀️|👷🏿♀️|👳🏻♂️|👳🏼♂️|👳🏽♂️|👳🏾♂️|👳🏿♂️|👳🏻♀️|👳🏼♀️|👳🏽♀️|👳🏾♀️|👳🏿♀️|🤵🏻♂️|🤵🏼♂️|🤵🏽♂️|🤵🏾♂️|🤵🏿♂️|🤵🏻♀️|🤵🏼♀️|🤵🏽♀️|🤵🏾♀️|🤵🏿♀️|👰🏻♂️|👰🏼♂️|👰🏽♂️|👰🏾♂️|👰🏿♂️|👰🏻♀️|👰🏼♀️|👰🏽♀️|👰🏾♀️|👰🏿♀️|👩🏻🍼|👩🏼🍼|👩🏽🍼|👩🏾🍼|👩🏿🍼|👨🏻🍼|👨🏼🍼|👨🏽🍼|👨🏾🍼|👨🏿🍼|🧑🏻🍼|🧑🏼🍼|🧑🏽🍼|🧑🏾🍼|🧑🏿🍼|🧑🏻🎄|🧑🏼🎄|🧑🏽🎄|🧑🏾🎄|🧑🏿🎄|🦸🏻♂️|🦸🏼♂️|🦸🏽♂️|🦸🏾♂️|🦸🏿♂️|🦸🏻♀️|🦸🏼♀️|🦸🏽♀️|🦸🏾♀️|🦸🏿♀️|🦹🏻♂️|🦹🏼♂️|🦹🏽♂️|🦹🏾♂️|🦹🏿♂️|🦹🏻♀️|🦹🏼♀️|🦹🏽♀️|🦹🏾♀️|🦹🏿♀️|🧙🏻♂️|🧙🏼♂️|🧙🏽♂️|🧙🏾♂️|🧙🏿♂️|🧙🏻♀️|🧙🏼♀️|🧙🏽♀️|🧙🏾♀️|🧙🏿♀️|🧚🏻♂️|🧚🏼♂️|🧚🏽♂️|🧚🏾♂️|🧚🏿♂️|🧚🏻♀️|🧚🏼♀️|🧚🏽♀️|🧚🏾♀️|🧚🏿♀️|🧛🏻♂️|🧛🏼♂️|🧛🏽♂️|🧛🏾♂️|🧛🏿♂️|🧛🏻♀️|🧛🏼♀️|🧛🏽♀️|🧛🏾♀️|🧛🏿♀️|🧜🏻♂️|🧜🏼♂️|🧜🏽♂️|🧜🏾♂️|🧜🏿♂️|🧜🏻♀️|🧜🏼♀️|🧜🏽♀️|🧜🏾♀️|🧜🏿♀️|🧝🏻♂️|🧝🏼♂️|🧝🏽♂️|🧝🏾♂️|🧝🏿♂️|🧝🏻♀️|🧝🏼♀️|🧝🏽♀️|🧝🏾♀️|🧝🏿♀️|💆🏻♂️|💆🏼♂️|💆🏽♂️|💆🏾♂️|💆🏿♂️|💆🏻♀️|💆🏼♀️|💆🏽♀️|💆🏾♀️|💆🏿♀️|💇🏻♂️|💇🏼♂️|💇🏽♂️|💇🏾♂️|💇🏿♂️|💇🏻♀️|💇🏼♀️|💇🏽♀️|💇🏾♀️|💇🏿♀️|🚶🏻♂️|🚶🏼♂️|🚶🏽♂️|🚶🏾♂️|🚶🏿♂️|🚶🏻♀️|🚶🏼♀️|🚶🏽♀️|🚶🏾♀️|🚶🏿♀️|🧍🏻♂️|🧍🏼♂️|🧍🏽♂️|🧍🏾♂️|🧍🏿♂️|🧍🏻♀️|🧍🏼♀️|🧍🏽♀️|🧍🏾♀️|🧍🏿♀️|🧎🏻♂️|🧎🏼♂️|🧎🏽♂️|🧎🏾♂️|🧎🏿♂️|🧎🏻♀️|🧎🏼♀️|🧎🏽♀️|🧎🏾♀️|🧎🏿♀️|🧑🏻🦯|🧑🏼🦯|🧑🏽🦯|🧑🏾🦯|🧑🏿🦯|👨🏻🦯|👨🏼🦯|👨🏽🦯|👨🏾🦯|👨🏿🦯|👩🏻🦯|👩🏼🦯|👩🏽🦯|👩🏾🦯|👩🏿🦯|🧑🏻🦼|🧑🏼🦼|🧑🏽🦼|🧑🏾🦼|🧑🏿🦼|👨🏻🦼|👨🏼🦼|👨🏽🦼|👨🏾🦼|👨🏿🦼|👩🏻🦼|👩🏼🦼|👩🏽🦼|👩🏾🦼|👩🏿🦼|🧑🏻🦽|🧑🏼🦽|🧑🏽🦽|🧑🏾🦽|🧑🏿🦽|👨🏻🦽|👨🏼🦽|👨🏽🦽|👨🏾🦽|👨🏿🦽|👩🏻🦽|👩🏼🦽|👩🏽🦽|👩🏾🦽|👩🏿🦽|🏃🏻♂️|🏃🏼♂️|🏃🏽♂️|🏃🏾♂️|🏃🏿♂️|🏃🏻♀️|🏃🏼♀️|🏃🏽♀️|🏃🏾♀️|🏃🏿♀️|🧖🏻♂️|🧖🏼♂️|🧖🏽♂️|🧖🏾♂️|🧖🏿♂️|🧖🏻♀️|🧖🏼♀️|🧖🏽♀️|🧖🏾♀️|🧖🏿♀️|🧗🏻♂️|🧗🏼♂️|🧗🏽♂️|🧗🏾♂️|🧗🏿♂️|🧗🏻♀️|🧗🏼♀️|🧗🏽♀️|🧗🏾♀️|🧗🏿♀️|🏌🏻♂️|🏌🏼♂️|🏌🏽♂️|🏌🏾♂️|🏌🏿♂️|🏌🏻♀️|🏌🏼♀️|🏌🏽♀️|🏌🏾♀️|🏌🏿♀️|🏄🏻♂️|🏄🏼♂️|🏄🏽♂️|🏄🏾♂️|🏄🏿♂️|🏄🏻♀️|🏄🏼♀️|🏄🏽♀️|🏄🏾♀️|🏄🏿♀️|🚣🏻♂️|🚣🏼♂️|🚣🏽♂️|🚣🏾♂️|🚣🏿♂️|🚣🏻♀️|🚣🏼♀️|🚣🏽♀️|🚣🏾♀️|🚣🏿♀️|🏊🏻♂️|🏊🏼♂️|🏊🏽♂️|🏊🏾♂️|🏊🏿♂️|🏊🏻♀️|🏊🏼♀️|🏊🏽♀️|🏊🏾♀️|🏊🏿♀️|🏋🏻♂️|🏋🏼♂️|🏋🏽♂️|🏋🏾♂️|🏋🏿♂️|🏋🏻♀️|🏋🏼♀️|🏋🏽♀️|🏋🏾♀️|🏋🏿♀️|🚴🏻♂️|🚴🏼♂️|🚴🏽♂️|🚴🏾♂️|🚴🏿♂️|🚴🏻♀️|🚴🏼♀️|🚴🏽♀️|🚴🏾♀️|🚴🏿♀️|🚵🏻♂️|🚵🏼♂️|🚵🏽♂️|🚵🏾♂️|🚵🏿♂️|🚵🏻♀️|🚵🏼♀️|🚵🏽♀️|🚵🏾♀️|🚵🏿♀️|🤸🏻♂️|🤸🏼♂️|🤸🏽♂️|🤸🏾♂️|🤸🏿♂️|🤸🏻♀️|🤸🏼♀️|🤸🏽♀️|🤸🏾♀️|🤸🏿♀️|🤽🏻♂️|🤽🏼♂️|🤽🏽♂️|🤽🏾♂️|🤽🏿♂️|🤽🏻♀️|🤽🏼♀️|🤽🏽♀️|🤽🏾♀️|🤽🏿♀️|🤾🏻♂️|🤾🏼♂️|🤾🏽♂️|🤾🏾♂️|🤾🏿♂️|🤾🏻♀️|🤾🏼♀️|🤾🏽♀️|🤾🏾♀️|🤾🏿♀️|🤹🏻♂️|🤹🏼♂️|🤹🏽♂️|🤹🏾♂️|🤹🏿♂️|🤹🏻♀️|🤹🏼♀️|🤹🏽♀️|🤹🏾♀️|🤹🏿♀️|🧘🏻♂️|🧘🏼♂️|🧘🏽♂️|🧘🏾♂️|🧘🏿♂️|🧘🏻♀️|🧘🏼♀️|🧘🏽♀️|🧘🏾♀️|🧘🏿♀️|😶🌫️|🕵️♂️|🕵️♀️|🏌️♂️|🏌️♀️|🏋️♂️|🏋️♀️|🏳️🌈|🏳️⚧️|⛹🏻♂️|⛹🏼♂️|⛹🏽♂️|⛹🏾♂️|⛹🏿♂️|⛹🏻♀️|⛹🏼♀️|⛹🏽♀️|⛹🏾♀
️|⛹🏿♀️|😮💨|😵💫|❤️🔥|❤️🩹|🧔♂️|🧔♀️|👨🦰|👨🦱|👨🦳|👨🦲|👩🦰|🧑🦰|👩🦱|🧑🦱|👩🦳|🧑🦳|👩🦲|🧑🦲|👱♀️|👱♂️|🙍♂️|🙍♀️|🙎♂️|🙎♀️|🙅♂️|🙅♀️|🙆♂️|🙆♀️|💁♂️|💁♀️|🙋♂️|🙋♀️|🧏♂️|🧏♀️|🙇♂️|🙇♀️|🤦♂️|🤦♀️|🤷♂️|🤷♀️|🧑⚕️|👨⚕️|👩⚕️|🧑🎓|👨🎓|👩🎓|🧑🏫|👨🏫|👩🏫|🧑⚖️|👨⚖️|👩⚖️|🧑🌾|👨🌾|👩🌾|🧑🍳|👨🍳|👩🍳|🧑🔧|👨🔧|👩🔧|🧑🏭|👨🏭|👩🏭|🧑💼|👨💼|👩💼|🧑🔬|👨🔬|👩🔬|🧑💻|👨💻|👩💻|🧑🎤|👨🎤|👩🎤|🧑🎨|👨🎨|👩🎨|🧑✈️|👨✈️|👩✈️|🧑🚀|👨🚀|👩🚀|🧑🚒|👨🚒|👩🚒|👮♂️|👮♀️|💂♂️|💂♀️|👷♂️|👷♀️|👳♂️|👳♀️|🤵♂️|🤵♀️|👰♂️|👰♀️|👩🍼|👨🍼|🧑🍼|🧑🎄|🦸♂️|🦸♀️|🦹♂️|🦹♀️|🧙♂️|🧙♀️|🧚♂️|🧚♀️|🧛♂️|🧛♀️|🧜♂️|🧜♀️|🧝♂️|🧝♀️|🧞♂️|🧞♀️|🧟♂️|🧟♀️|💆♂️|💆♀️|💇♂️|💇♀️|🚶♂️|🚶♀️|🧍♂️|🧍♀️|🧎♂️|🧎♀️|🧑🦯|👨🦯|👩🦯|🧑🦼|👨🦼|👩🦼|🧑🦽|👨🦽|👩🦽|🏃♂️|🏃♀️|👯♂️|👯♀️|🧖♂️|🧖♀️|🧗♂️|🧗♀️|🏄♂️|🏄♀️|🚣♂️|🚣♀️|🏊♂️|🏊♀️|⛹️♂️|⛹️♀️|🚴♂️|🚴♀️|🚵♂️|🚵♀️|🤸♂️|🤸♀️|🤼♂️|🤼♀️|🤽♂️|🤽♀️|🤾♂️|🤾♀️|🤹♂️|🤹♀️|🧘♂️|🧘♀️|👨👦|👨👧|👩👦|👩👧|🐕🦺|🐻❄️|🏴☠️|🐈⬛|🇦🇨|🇦🇩|🇦🇪|🇦🇫|🇦🇬|🇦🇮|🇦🇱|🇦🇲|🇦🇴|🇦🇶|🇦🇷|🇦🇸|🇦🇹|🇦🇺|🇦🇼|🇦🇽|🇦🇿|🇧🇦|🇧🇧|🇧🇩|🇧🇪|🇧🇫|🇧🇬|🇧🇭|🇧🇮|🇧🇯|🇧🇱|🇧🇲|🇧🇳|🇧🇴|🇧🇶|🇧🇷|🇧🇸|🇧🇹|🇧🇻|🇧🇼|🇧🇾|🇧🇿|🇨🇦|🇨🇨|🇨🇩|🇨🇫|🇨🇬|🇨🇭|🇨🇮|🇨🇰|🇨🇱|🇨🇲|🇨🇳|🇨🇴|🇨🇵|🇨🇷|🇨🇺|🇨🇻|🇨🇼|🇨🇽|🇨🇾|🇨🇿|🇩🇪|🇩🇬|🇩🇯|🇩🇰|🇩🇲|🇩🇴|🇩🇿|🇪🇦|🇪🇨|🇪🇪|🇪🇬|🇪🇭|🇪🇷|🇪🇸|🇪🇹|🇪🇺|🇫🇮|🇫🇯|🇫🇰|🇫🇲|🇫🇴|🇫🇷|🇬🇦|🇬🇧|🇬🇩|🇬🇪|🇬🇫|🇬🇬|🇬🇭|🇬🇮|🇬🇱|🇬🇲|🇬🇳|🇬🇵|🇬🇶|🇬🇷|🇬🇸|🇬🇹|🇬🇺|🇬🇼|🇬🇾|🇭🇰|🇭🇲|🇭🇳|🇭🇷|🇭🇹|🇭🇺|🇮🇨|🇮🇩|🇮🇪|🇮🇱|🇮🇲|🇮🇳|🇮🇴|🇮🇶|🇮🇷|🇮🇸|🇮🇹|🇯🇪|🇯🇲|🇯🇴|🇯🇵|🇰🇪|🇰🇬|🇰🇭|🇰🇮|🇰🇲|🇰🇳|🇰🇵|🇰🇷|🇰🇼|🇰🇾|🇰🇿|🇱🇦|🇱🇧|🇱🇨|🇱🇮|🇱🇰|🇱🇷|🇱🇸|🇱🇹|🇱🇺|🇱🇻|🇱🇾|🇲🇦|🇲🇨|🇲🇩|🇲🇪|🇲🇫|🇲🇬|🇲🇭|🇲🇰|🇲🇱|🇲🇲|🇲🇳|🇲🇴|🇲🇵|🇲🇶|🇲🇷|🇲🇸|🇲🇹|🇲🇺|🇲🇻|🇲🇼|🇲🇽|🇲🇾|🇲🇿|🇳🇦|🇳🇨|🇳🇪|🇳🇫|🇳🇬|🇳🇮|🇳🇱|🇳🇴|🇳🇵|🇳🇷|🇳🇺|🇳🇿|🇴🇲|🇵🇦|🇵🇪|🇵🇫|🇵🇬|🇵🇭|🇵🇰|🇵🇱|🇵🇲|🇵🇳|🇵🇷|🇵🇸|🇵🇹|🇵🇼|🇵🇾|🇶🇦|🇷🇪|🇷🇴|🇷🇸|🇷🇺|🇷🇼|🇸🇦|🇸🇧|🇸🇨|🇸🇩|🇸🇪|🇸🇬|🇸🇭|🇸🇮|🇸🇯|🇸🇰|🇸🇱|🇸🇲|🇸🇳|🇸🇴|🇸🇷|🇸🇸|🇸🇹|🇸🇻|🇸🇽|🇸🇾|🇸🇿|🇹🇦|🇹🇨|🇹🇩|🇹🇫|🇹🇬|🇹🇭|🇹🇯|🇹🇰|🇹🇱|🇹🇲|🇹🇳|🇹🇴|🇹🇷|🇹🇹|🇹🇻|🇹🇼|🇹🇿|🇺🇦|🇺🇬|🇺🇲|🇺🇳|🇺🇸|🇺🇾|🇺🇿|🇻🇦|🇻🇨|🇻🇪|🇻🇬|🇻🇮|🇻🇳|🇻🇺|🇼🇫|🇼🇸|🇽🇰|🇾🇪|🇾🇹|🇿🇦|🇿🇲|🇿🇼|👋🏻|👋🏼|👋🏽|👋🏾|👋🏿|🤚🏻|🤚🏼|🤚🏽|🤚🏾|🤚🏿|🖐🏻|🖐🏼|🖐🏽|🖐🏾|🖐🏿|🖖🏻|🖖🏼|🖖🏽|🖖🏾|🖖🏿|👌🏻|👌🏼|👌🏽|👌🏾|👌🏿|🤌🏻|🤌🏼|🤌🏽|🤌🏾|🤌🏿|🤏🏻|🤏🏼|🤏🏽|🤏🏾|🤏🏿|🤞🏻|🤞🏼|🤞🏽|🤞🏾|🤞🏿|🤟🏻|🤟🏼|🤟🏽|🤟🏾|🤟🏿|🤘🏻|🤘🏼|🤘🏽|🤘🏾|🤘🏿|🤙🏻|🤙🏼|🤙🏽|🤙🏾|🤙🏿|👈🏻|👈🏼|👈🏽|👈🏾|👈🏿|👉🏻|👉🏼|👉🏽|👉🏾|👉🏿|👆🏻|👆🏼|👆🏽|👆🏾|👆🏿|🖕🏻|🖕🏼|🖕🏽|🖕🏾|🖕🏿|👇🏻|👇🏼|👇🏽|👇🏾|👇🏿|👍🏻|👍🏼|👍🏽|👍🏾|👍🏿|👎🏻|👎🏼|👎🏽|👎🏾|👎🏿|👊🏻|👊🏼|👊🏽|👊🏾|👊🏿|🤛🏻|🤛🏼|🤛🏽|🤛🏾|🤛🏿|🤜🏻|🤜🏼|🤜🏽|🤜🏾|🤜🏿|👏🏻|👏🏼|👏🏽|👏🏾|👏🏿|🙌🏻|🙌🏼|🙌🏽|🙌🏾|🙌🏿|👐🏻|👐🏼|👐🏽|👐🏾|👐🏿|🤲🏻|🤲🏼|🤲🏽|🤲🏾|🤲🏿|🙏🏻|🙏🏼|🙏🏽|🙏🏾|🙏🏿|💅🏻|💅🏼|💅🏽|💅🏾|💅🏿|🤳🏻|🤳🏼|🤳🏽|🤳🏾|🤳🏿|💪🏻|💪🏼|💪🏽|💪🏾|💪🏿|🦵🏻|🦵🏼|🦵🏽|🦵🏾|🦵🏿|🦶🏻|🦶🏼|🦶🏽|🦶🏾|🦶🏿|👂🏻|👂🏼|👂🏽|👂🏾|👂🏿|🦻🏻|🦻🏼|🦻🏽|🦻🏾|🦻🏿|👃🏻|👃🏼|👃🏽|👃🏾|👃🏿|👶🏻|👶🏼|👶🏽|👶🏾|👶🏿|🧒🏻|🧒🏼|🧒🏽|🧒🏾|🧒🏿|👦🏻|👦🏼|👦🏽|👦🏾|👦🏿|👧🏻|👧🏼|👧🏽|👧
🏾|👧🏿|🧑🏻|🧑🏼|🧑🏽|🧑🏾|🧑🏿|👱🏻|👱🏼|👱🏽|👱🏾|👱🏿|👨🏻|👨🏼|👨🏽|👨🏾|👨🏿|🧔🏻|🧔🏼|🧔🏽|🧔🏾|🧔🏿|👩🏻|👩🏼|👩🏽|👩🏾|👩🏿|🧓🏻|🧓🏼|🧓🏽|🧓🏾|🧓🏿|👴🏻|👴🏼|👴🏽|👴🏾|👴🏿|👵🏻|👵🏼|👵🏽|👵🏾|👵🏿|🙍🏻|🙍🏼|🙍🏽|🙍🏾|🙍🏿|🙎🏻|🙎🏼|🙎🏽|🙎🏾|🙎🏿|🙅🏻|🙅🏼|🙅🏽|🙅🏾|🙅🏿|🙆🏻|🙆🏼|🙆🏽|🙆🏾|🙆🏿|💁🏻|💁🏼|💁🏽|💁🏾|💁🏿|🙋🏻|🙋🏼|🙋🏽|🙋🏾|🙋🏿|🧏🏻|🧏🏼|🧏🏽|🧏🏾|🧏🏿|🙇🏻|🙇🏼|🙇🏽|🙇🏾|🙇🏿|🤦🏻|🤦🏼|🤦🏽|🤦🏾|🤦🏿|🤷🏻|🤷🏼|🤷🏽|🤷🏾|🤷🏿|👮🏻|👮🏼|👮🏽|👮🏾|👮🏿|🕵🏻|🕵🏼|🕵🏽|🕵🏾|🕵🏿|💂🏻|💂🏼|💂🏽|💂🏾|💂🏿|🥷🏻|🥷🏼|🥷🏽|🥷🏾|🥷🏿|👷🏻|👷🏼|👷🏽|👷🏾|👷🏿|🤴🏻|🤴🏼|🤴🏽|🤴🏾|🤴🏿|👸🏻|👸🏼|👸🏽|👸🏾|👸🏿|👳🏻|👳🏼|👳🏽|👳🏾|👳🏿|👲🏻|👲🏼|👲🏽|👲🏾|👲🏿|🧕🏻|🧕🏼|🧕🏽|🧕🏾|🧕🏿|🤵🏻|🤵🏼|🤵🏽|🤵🏾|🤵🏿|👰🏻|👰🏼|👰🏽|👰🏾|👰🏿|🤰🏻|🤰🏼|🤰🏽|🤰🏾|🤰🏿|🤱🏻|🤱🏼|🤱🏽|🤱🏾|🤱🏿|👼🏻|👼🏼|👼🏽|👼🏾|👼🏿|🎅🏻|🎅🏼|🎅🏽|🎅🏾|🎅🏿|🤶🏻|🤶🏼|🤶🏽|🤶🏾|🤶🏿|🦸🏻|🦸🏼|🦸🏽|🦸🏾|🦸🏿|🦹🏻|🦹🏼|🦹🏽|🦹🏾|🦹🏿|🧙🏻|🧙🏼|🧙🏽|🧙🏾|🧙🏿|🧚🏻|🧚🏼|🧚🏽|🧚🏾|🧚🏿|🧛🏻|🧛🏼|🧛🏽|🧛🏾|🧛🏿|🧜🏻|🧜🏼|🧜🏽|🧜🏾|🧜🏿|🧝🏻|🧝🏼|🧝🏽|🧝🏾|🧝🏿|💆🏻|💆🏼|💆🏽|💆🏾|💆🏿|💇🏻|💇🏼|💇🏽|💇🏾|💇🏿|🚶🏻|🚶🏼|🚶🏽|🚶🏾|🚶🏿|🧍🏻|🧍🏼|🧍🏽|🧍🏾|🧍🏿|🧎🏻|🧎🏼|🧎🏽|🧎🏾|🧎🏿|🏃🏻|🏃🏼|🏃🏽|🏃🏾|🏃🏿|💃🏻|💃🏼|💃🏽|💃🏾|💃🏿|🕺🏻|🕺🏼|🕺🏽|🕺🏾|🕺🏿|🕴🏻|🕴🏼|🕴🏽|🕴🏾|🕴🏿|🧖🏻|🧖🏼|🧖🏽|🧖🏾|🧖🏿|🧗🏻|🧗🏼|🧗🏽|🧗🏾|🧗🏿|🏇🏻|🏇🏼|🏇🏽|🏇🏾|🏇🏿|🏂🏻|🏂🏼|🏂🏽|🏂🏾|🏂🏿|🏌🏻|🏌🏼|🏌🏽|🏌🏾|🏌🏿|🏄🏻|🏄🏼|🏄🏽|🏄🏾|🏄🏿|🚣🏻|🚣🏼|🚣🏽|🚣🏾|🚣🏿|🏊🏻|🏊🏼|🏊🏽|🏊🏾|🏊🏿|🏋🏻|🏋🏼|🏋🏽|🏋🏾|🏋🏿|🚴🏻|🚴🏼|🚴🏽|🚴🏾|🚴🏿|🚵🏻|🚵🏼|🚵🏽|🚵🏾|🚵🏿|🤸🏻|🤸🏼|🤸🏽|🤸🏾|🤸🏿|🤽🏻|🤽🏼|🤽🏽|🤽🏾|🤽🏿|🤾🏻|🤾🏼|🤾🏽|🤾🏾|🤾🏿|🤹🏻|🤹🏼|🤹🏽|🤹🏾|🤹🏿|🧘🏻|🧘🏼|🧘🏽|🧘🏾|🧘🏿|🛀🏻|🛀🏼|🛀🏽|🛀🏾|🛀🏿|🛌🏻|🛌🏼|🛌🏽|🛌🏾|🛌🏿|👭🏻|👭🏼|👭🏽|👭🏾|👭🏿|👫🏻|👫🏼|👫🏽|👫🏾|👫🏿|👬🏻|👬🏼|👬🏽|👬🏾|👬🏿|💏🏻|💏🏼|💏🏽|💏🏾|💏🏿|💑🏻|💑🏼|💑🏽|💑🏾|💑🏿|#️⃣|0️⃣|1️⃣|2️⃣|3️⃣|4️⃣|5️⃣|6️⃣|7️⃣|8️⃣|9️⃣|✋🏻|✋🏼|✋🏽|✋🏾|✋🏿|✌🏻|✌🏼|✌🏽|✌🏾|✌🏿|☝🏻|☝🏼|☝🏽|☝🏾|☝🏿|✊🏻|✊🏼|✊🏽|✊🏾|✊🏿|✍🏻|✍🏼|✍🏽|✍🏾|✍🏿|⛹🏻|⛹🏼|⛹🏽|⛹🏾|⛹🏿|😀|😃|😄|😁|😆|😅|🤣|😂|🙂|🙃|😉|😊|😇|🥰|😍|🤩|😘|😗|😚|😙|🥲|😋|😛|😜|🤪|😝|🤑|🤗|🤭|🤫|🤔|🤐|🤨|😐|😑|😶|😏|😒|🙄|😬|🤥|😌|😔|😪|🤤|😴|😷|🤒|🤕|🤢|🤮|🤧|🥵|🥶|🥴|😵|🤯|🤠|🥳|🥸|😎|🤓|🧐|😕|😟|🙁|😮|😯|😲|😳|🥺|😦|😧|😨|😰|😥|😢|😭|😱|😖|😣|😞|😓|😩|😫|🥱|😤|😡|😠|🤬|😈|👿|💀|💩|🤡|👹|👺|👻|👽|👾|🤖|😺|😸|😹|😻|😼|😽|🙀|😿|😾|🙈|🙉|🙊|💋|💌|💘|💝|💖|💗|💓|💞|💕|💟|💔|🧡|💛|💚|💙|💜|🤎|🖤|🤍|💯|💢|💥|💫|💦|💨|🕳|💣|💬|🗨|🗯|💭|💤|👋|🤚|🖐|🖖|👌|🤌|🤏|🤞|🤟|🤘|🤙|👈|👉|👆|🖕|👇|👍|👎|👊|🤛|🤜|👏|🙌|👐|🤲|🤝|🙏|💅|🤳|💪|🦾|🦿|🦵|🦶|👂|🦻|👃|🧠|🫀|🫁|🦷|🦴|👀|👁|👅|👄|👶|🧒|👦|👧|🧑|👱|👨|🧔|👩|🧓|👴|👵|🙍|🙎|🙅|🙆|💁|🙋|🧏|🙇|🤦|🤷|👮|🕵|💂|🥷|👷|🤴|👸|👳|👲|🧕|🤵|👰|🤰|🤱|👼|🎅|🤶|🦸|🦹|🧙|🧚|🧛|🧜|🧝|🧞|🧟|💆|💇|🚶|🧍|🧎|🏃|💃|🕺|🕴|👯|🧖|🧗|🤺|🏇|🏂|🏌|🏄|🚣|🏊|🏋|🚴|🚵|🤸|🤼|🤽|🤾|🤹|🧘|🛀|🛌|👭|👫|👬|💏|💑|👪|🗣|👤|👥|🫂|👣|🦰|🦱|🦳|🦲|🐵|🐒|🦍|🦧|🐶|🐕|🦮|🐩|🐺|🦊|🦝|🐱|🐈|🦁|🐯|🐅|🐆|🐴|🐎|🦄|🦓|🦌|🦬|🐮|🐂|🐃|🐄|🐷|🐖|🐗|🐽|🐏|🐑|🐐|🐪|🐫|🦙|🦒|🐘|🦣|🦏|🦛|🐭|🐁|🐀|🐹|🐰|🐇|🐿|🦫|🦔|🦇|🐻|🐨|🐼|🦥|🦦|🦨|🦘|🦡|🐾|🦃|🐔|🐓|🐣|🐤|🐥|🐦|🐧|🕊|🦅|🦆|🦢|🦉|🦤|🪶|🦩|🦚|🦜|🐸|🐊|🐢|🦎|🐍|🐲|🐉|🦕|🦖|🐳|🐋|🐬|🦭
|🐟|🐠|🐡|🦈|🐙|🐚|🐌|🦋|🐛|🐜|🐝|🪲|🐞|🦗|🪳|🕷|🕸|🦂|🦟|🪰|🪱|🦠|💐|🌸|💮|🏵|🌹|🥀|🌺|🌻|🌼|🌷|🌱|🪴|🌲|🌳|🌴|🌵|🌾|🌿|🍀|🍁|🍂|🍃|🍇|🍈|🍉|🍊|🍋|🍌|🍍|🥭|🍎|🍏|🍐|🍑|🍒|🍓|🫐|🥝|🍅|🫒|🥥|🥑|🍆|🥔|🥕|🌽|🌶|🫑|🥒|🥬|🥦|🧄|🧅|🍄|🥜|🌰|🍞|🥐|🥖|🫓|🥨|🥯|🥞|🧇|🧀|🍖|🍗|🥩|🥓|🍔|🍟|🍕|🌭|🥪|🌮|🌯|🫔|🥙|🧆|🥚|🍳|🥘|🍲|🫕|🥣|🥗|🍿|🧈|🧂|🥫|🍱|🍘|🍙|🍚|🍛|🍜|🍝|🍠|🍢|🍣|🍤|🍥|🥮|🍡|🥟|🥠|🥡|🦀|🦞|🦐|🦑|🦪|🍦|🍧|🍨|🍩|🍪|🎂|🍰|🧁|🥧|🍫|🍬|🍭|🍮|🍯|🍼|🥛|🫖|🍵|🍶|🍾|🍷|🍸|🍹|🍺|🍻|🥂|🥃|🥤|🧋|🧃|🧉|🧊|🥢|🍽|🍴|🥄|🔪|🏺|🌍|🌎|🌏|🌐|🗺|🗾|🧭|🏔|🌋|🗻|🏕|🏖|🏜|🏝|🏞|🏟|🏛|🏗|🧱|🪨|🪵|🛖|🏘|🏚|🏠|🏡|🏢|🏣|🏤|🏥|🏦|🏨|🏩|🏪|🏫|🏬|🏭|🏯|🏰|💒|🗼|🗽|🕌|🛕|🕍|🕋|🌁|🌃|🏙|🌄|🌅|🌆|🌇|🌉|🎠|🎡|🎢|💈|🎪|🚂|🚃|🚄|🚅|🚆|🚇|🚈|🚉|🚊|🚝|🚞|🚋|🚌|🚍|🚎|🚐|🚑|🚒|🚓|🚔|🚕|🚖|🚗|🚘|🚙|🛻|🚚|🚛|🚜|🏎|🏍|🛵|🦽|🦼|🛺|🚲|🛴|🛹|🛼|🚏|🛣|🛤|🛢|🚨|🚥|🚦|🛑|🚧|🛶|🚤|🛳|🛥|🚢|🛩|🛫|🛬|🪂|💺|🚁|🚟|🚠|🚡|🛰|🚀|🛸|🛎|🧳|🕰|🕛|🕧|🕐|🕜|🕑|🕝|🕒|🕞|🕓|🕟|🕔|🕠|🕕|🕡|🕖|🕢|🕗|🕣|🕘|🕤|🕙|🕥|🕚|🕦|🌑|🌒|🌓|🌔|🌕|🌖|🌗|🌘|🌙|🌚|🌛|🌜|🌡|🌝|🌞|🪐|🌟|🌠|🌌|🌤|🌥|🌦|🌧|🌨|🌩|🌪|🌫|🌬|🌀|🌈|🌂|🔥|💧|🌊|🎃|🎄|🎆|🎇|🧨|🎈|🎉|🎊|🎋|🎍|🎎|🎏|🎐|🎑|🧧|🎀|🎁|🎗|🎟|🎫|🎖|🏆|🏅|🥇|🥈|🥉|🥎|🏀|🏐|🏈|🏉|🎾|🥏|🎳|🏏|🏑|🏒|🥍|🏓|🏸|🥊|🥋|🥅|🎣|🤿|🎽|🎿|🛷|🥌|🎯|🪀|🪁|🎱|🔮|🪄|🧿|🎮|🕹|🎰|🎲|🧩|🧸|🪅|🪆|🃏|🀄|🎴|🎭|🖼|🎨|🧵|🪡|🧶|🪢|👓|🕶|🥽|🥼|🦺|👔|👕|👖|🧣|🧤|🧥|🧦|👗|👘|🥻|🩱|🩲|🩳|👙|👚|👛|👜|👝|🛍|🎒|🩴|👞|👟|🥾|🥿|👠|👡|🩰|👢|👑|👒|🎩|🎓|🧢|🪖|📿|💄|💍|💎|🔇|🔈|🔉|🔊|📢|📣|📯|🔔|🔕|🎼|🎵|🎶|🎙|🎚|🎛|🎤|🎧|📻|🎷|🪗|🎸|🎹|🎺|🎻|🪕|🥁|🪘|📱|📲|📞|📟|📠|🔋|🔌|💻|🖥|🖨|🖱|🖲|💽|💾|💿|📀|🧮|🎥|🎞|📽|🎬|📺|📷|📸|📹|📼|🔍|🔎|🕯|💡|🔦|🏮|🪔|📔|📕|📖|📗|📘|📙|📚|📓|📒|📃|📜|📄|📰|🗞|📑|🔖|🏷|💰|🪙|💴|💵|💶|💷|💸|💳|🧾|💹|📧|📨|📩|📤|📥|📦|📫|📪|📬|📭|📮|🗳|🖋|🖊|🖌|🖍|📝|💼|📁|📂|🗂|📅|📆|🗒|🗓|📇|📈|📉|📊|📋|📌|📍|📎|🖇|📏|📐|🗃|🗄|🗑|🔒|🔓|🔏|🔐|🔑|🗝|🔨|🪓|🛠|🗡|🔫|🪃|🏹|🛡|🪚|🔧|🪛|🔩|🗜|🦯|🔗|🪝|🧰|🧲|🪜|🧪|🧫|🧬|🔬|🔭|📡|💉|🩸|💊|🩹|🩺|🚪|🛗|🪞|🪟|🛏|🛋|🪑|🚽|🪠|🚿|🛁|🪤|🪒|🧴|🧷|🧹|🧺|🧻|🪣|🧼|🪥|🧽|🧯|🛒|🚬|🪦|🗿|🪧|🏧|🚮|🚰|🚹|🚺|🚻|🚼|🚾|🛂|🛃|🛄|🛅|🚸|🚫|🚳|🚭|🚯|🚱|🚷|📵|🔞|🔃|🔄|🔙|🔚|🔛|🔜|🔝|🛐|🕉|🕎|🔯|🔀|🔁|🔂|🔼|🔽|🎦|🔅|🔆|📶|📳|📴|💱|💲|🔱|📛|🔰|🔟|🔠|🔡|🔢|🔣|🔤|🅰|🆎|🅱|🆑|🆒|🆓|🆔|🆕|🆖|🅾|🆗|🅿|🆘|🆙|🆚|🈁|🈂|🈷|🈶|🈯|🉐|🈹|🈚|🈲|🉑|🈸|🈴|🈳|🈺|🈵|🔴|🟠|🟡|🟢|🔵|🟣|🟤|🟥|🟧|🟨|🟩|🟦|🟪|🟫|🔶|🔷|🔸|🔹|🔺|🔻|💠|🔘|🔳|🔲|🏁|🚩|🎌|🏴|🏳|🏻|🏼|🏽|🏾|🏿|☺|☹|☠|❣|❤|✋|✌|☝|✊|✍|⛷|⛹|☘|☕|⛰|⛪|⛩|⛲|⛺|♨|⛽|⚓|⛵|⛴|✈|⌛|⏳|⌚|⏰|⏱|⏲|☀|⭐|☁|⛅|⛈|☂|☔|⛱|⚡|❄|☃|⛄|☄|✨|⚽|⚾|⛳|⛸|♠|♥|♦|♣|♟|⛑|☎|⌨|✉|✏|✒|✂|⛏|⚒|⚔|⚙|⚖|⛓|⚗|⚰|⚱|♿|⚠|⛔|☢|☣|⬆|↗|➡|↘|⬇|↙|⬅|↖|↕|↔|↩|↪|⤴|⤵|⚛|✡|☸|☯|✝|☦|☪|☮|♈|♉|♊|♋|♌|♍|♎|♏|♐|♑|♒|♓|⛎|▶|⏩|⏭|⏯|◀|⏪|⏮|⏫|⏬|⏸|⏹|⏺|⏏|♀|♂|⚧|✖|➕|➖|➗|♾|‼|⁉|❓|❔|❕|❗|〰|⚕|♻|⚜|⭕|✅|☑|✔|❌|❎|➰|➿|〽|✳|✴|❇|©|®|™|ℹ|Ⓜ|㊗|㊙|⚫|⚪|⬛|⬜|◼|◻|◾|◽|▪|▫)`
/*compile the pattern string into a regex*/
let emoRegex = new RegExp(emojiPattern, "g");
/*count of emojis*/
let emoCount = [..."👶🏻👦🏻👧🏻👨🏻👩🏻👱🏻♀️👱🏻👴🏻👵🏻👲🏻👳🏻♀️👳🏻👮🏻♀️👮🏻👷🏻♀️👷🏻💂🏻♀️💂🏻🕵🏻♀️👩🏻⚕️👨🏻⚕️👩🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾".matchAll(emoRegex)].length
console.log(emoCount) //37
/*modifying the pattern to count other characters too*/
let generalCounter = new RegExp(emojiPattern+"|.", "g") //emoji or regular character
let allCount = [..."$%^ other stuff equalling 28👶👦🏻👧🏻👨🏻👩🏻👱🏻♀️👱🏻👴🏻👵🏻👲🏻👳🏻♀️👳🏻👮🏻♀️👮🏻👷🏻♀️👷🏻💂🏻♀️💂🏻🕵🏻♀️👩🏻⚕️👨🏻⚕️👩🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾👨🏻🌾".matchAll(generalCounter)].length
console.log(allCount) //28+37 = 65
As you can see from the example below, this has to do with Unicode encoding.
There are some great resources, such as the one I took this example from:
https://blog.jonnew.com/posts/poo-dot-length-equals-two
console.log("👩❤️💋👩".length === 11);
For anyone interested, I had a similar problem where I wanted to count the length of an emoji at the end of a string.
This is the solution I came up with:
var emoji = new RegExp('(\\p{Extended_Pictographic})((\u200D\\p{Extended_Pictographic})*)$', 'u');
var testStrings = ['👨👩👧', '😂', '🌲'];
for (var string = 0; string < testStrings.length; string++) {
  var match = testStrings[string].match(emoji);
  var chars = match == null ? 0 : match[0].length;
  console.log(testStrings[string] + ': ' + chars);
}
Explanation: \\p{Extended_Pictographic} matches a single emoji like 😂, which takes two UTF-16 code units. Emojis like 👨👩👧 consist of several emojis (here 👨, 👩 and 👧) combined by zero width joiners (\u200D).
The regex matches any emoji at the end of the string ($). If there is a match, its length is counted. I am sure it could be adapted to your use case by matching all emojis in a given string and then subtracting the surplus. It's not a complete implementation for your particular question, but I hope it gets you on the right track.
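The "match all emojis, then subtract the surplus" idea could look roughly like the sketch below. The pattern and the name visibleLength are assumptions made for illustration: the pattern extends the one above with optional variation selectors and skin-tone modifiers, and it does not cover flags or keycap sequences.

```javascript
// One emoji, optionally followed by U+FE0F or a skin-tone modifier,
// optionally chained to further emojis with zero width joiners.
// Assumption: this is a simplified pattern, not a full emoji grammar.
const emojiSequence = /\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?)*/gu;

// Hypothetical helper: collapse every emoji sequence to one placeholder,
// then count the remaining code points.
function visibleLength(str) {
  return Array.from(str.replace(emojiSequence, "x")).length;
}

console.log(visibleLength("hi \u{1F602}"));                            // 4
console.log(visibleLength("\u{1F468}\u200D\u{1F469}\u200D\u{1F467}")); // 1
console.log(visibleLength("\u{1F476}\u{1F3FB}"));                      // 1
```

Combining marks on ordinary letters are still counted separately here, so this only addresses the emoji part of the problem.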
Use lodash's toArray method:
console.log(_.toArray("👨👩👧").length); // 1
console.log(_.toArray("👨👩👧🧍♂️👩👧👧").length); // 3
Check here for Codesandbox
I suggest using the runes package for correct handling of multi-byte strings; otherwise you will run into more issues, for example when using reducers to reverse strings.
Take a look at this great small package: runes
For example, with const words = 'a̋b̋';, words.length is 4, but we expect 2 as the "real" length.
Or is there a safe way to iterate over all the characters in the words string above?
There's nothing built into JavaScript that will help you differentiate those combining marks from other characters. You could build something, of course, using the reference information from http://unicode.org. :-)
...but at least one person seems to have already done so for you: https://github.com/orling/grapheme-splitter
Enter the grapheme-splitter.js library. It can be used to properly split JavaScript strings into what a human user would call separate letters (or "extended grapheme clusters" in Unicode terminology), no matter what their internal representation is. It is an implementation of the Unicode UAX-29 standard.
const words = 'a̋b̋';
const splitter = new GraphemeSplitter();
const graphemes = splitter.splitGraphemes(words);
console.log(graphemes);
That results in two entries in graphemes, "a̋" and "b̋". (Can't do live example, live links to github raw pages are disallowed.)
I am fiddling with a program to convert Japanese addresses into romaji (latin alphabet) for use in an emergency broadcast system for foreigners living in a Japanese city.
Emergency evacuation warnings are sent out to lists of areas all at once. I would like to be able to copy/paste this Japanese list of areas and spit out the romanized equivalent.
example Japanese input:
3条4~12丁目、15~18条12丁目、2、3条5丁目
(this list is of three areas, where 条(jo) and 丁目(chome) indicate block numbers in north-south and east-west directions, respectively)
The numbers are fine as they are, and I have already written code to replace the characters 条 and 丁目 with their romanized equivalents. My program currently outputs the first two areas (correctly) as "3-jo 4~12-chome" and "15~18-jo 12-chome"
However, I would like to replace patterns like that in the last area "2、5条6丁目" (meaning blocks 2 and 5 of 6-chome) such that the output reads "2&5-jo 6-chome"
The regular expression that denotes this pattern is \d*、\d* (note the Japanese format comma)
I am still getting used to regex - how can I replace the comma found in all \d*、\d* patterns with an "&"? Note that I can't simply replace all commas because they are also used to separate areas.
The easiest way is to isolate sequences like 15、18 and replace all commas in them.
var text = "3条4~12丁目、15~18条12丁目、2、3条5丁目";
text
  .replace(/(?:\d+、)+\d+/g, function (match) {
    // turn the commas inside number lists such as 2、3 into "&"
    return match.replace(/、/g, "&");
  })
  .replace(/条/g, '-jō ')
  .replace(/丁目/g, '-chōme')
  .replace(/~/g, '-')
  .replace(/、/g, ', ');
// => "3-jō 4-12-chōme, 15-18-jō 12-chōme, 2&3-jō 5-chōme"
(Also... Where the heck do you live that has 丁 well-ordered by cardinal directions? Where I live, addresses are a mess... :P )
(Also also, thanks to sainaen for nitpicking my regexps into perfection :) )
I have the following regex for phone number validation
function validatePhonenumber(phoneNum) {
  var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
  return regex.test(phoneNum);
}
However, I would like to make sure it doesn't pass when the separators differ, as in
111-222.3333
Any ideas how to make sure the separators are the same always?
Just make sure beforehand that there is at most one kind of separator, then pass the string through the regex as you were doing.
function validatePhonenumber(phoneNum) {
  var separators = extractSeparators(phoneNum);
  // more than one kind of separator: reject
  if (separators.length > 1) return false;
  var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
  return regex.test(phoneNum);
}
function extractSeparators(str) {
  // Return an array with all the distinct chars
  // that are present in the passed string
  // and are not numeric (0-9)
  var nonDigits = str.match(/[^0-9]/g) || [];
  return nonDigits.filter(function (ch, index) {
    return nonDigits.indexOf(ch) === index;
  });
}
You can use the following regex instead:
\d{3}([-\s\.])?\d{3}\1?\d{4}
Here is a working example:
http://regex101.com/r/nN9nT7/1
It matches the first three inputs below and rejects the other four:
111-222-3333 --> ok
111.222.3333 --> ok
111 222 3333 --> ok
111-222.3333
111.222-3333
111-222 3333
111 222-3333
EDIT: after Alan Moore's suggestion:
Also matches 111-2223333. That's because you made the \1 optional,
which isn't necessary. One of JavaScript's stranger quirks is that a
backreference to a group that did not participate in the match,
succeeds anyway. So if there's no first separator, ([-\s.])? succeeds
because the ? made it optional, and \1 succeeds because it's
JavaScript. But I would have used ([-\s.]?) to capture the first
separator (which might be nothing), and \1 to match the same thing
again. This works in any flavor, including JavaScript.
We can improve the regex to:
^\d{3}([-\s\.]?)\d{3}\1\d{4}$
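To see how the improved pattern behaves, here is a small check that could be run in Node or a browser console. The sample inputs and the variable names are chosen for illustration only.

```javascript
// The improved pattern from above: the first separator (possibly empty)
// is captured, and the backreference \1 requires the same one again.
const phonePattern = /^\d{3}([-\s.]?)\d{3}\1\d{4}$/;

const samples = [
  "111-222-3333", // same separator twice
  "111.222.3333",
  "111 222 3333",
  "1112223333",   // no separator at all
  "111-222.3333", // mixed separators
  "111-2223333"   // separator used only once
];

for (const sample of samples) {
  console.log(sample, phonePattern.test(sample));
}
```

The first four inputs should pass and the last two should fail, including the 111-2223333 case pointed out in the quoted comment.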
You'll need at least two passes to keep this maintainable and extensible.
JS' RegEx doesn't allow for creating variables for use later in the RegEx, if you want to support older browsers.
If you are only supporting modern browsers, Fede's answer is just fine...
As such, with ghetto-support, you aren't going to be able to reliably check that one separator is the same value every time, without writing a really, really, really, stupidly-long RegEx, using | to basically write out the RegEx 3 times.
A better way might be to grab all of the separators, and use a reduction or a filter to check that they all have the same value.
// numRegEx is assumed to be the format-checking regex, defined elsewhere
var userEnteredNumber = "999.231 3055";
var validNumber = numRegEx.test(userEnteredNumber);
var separators = userEnteredNumber.replace(/\d+/g, "").split("");
var firstSeparator = separators[0];
var uniformSeparators = separators.every(function (separator) { return separator === firstSeparator; });
if (!uniformSeparators) { /* also not valid */ }
You could make that a little neater, using closures and some applied functions, but that's the idea.
Alternatively, here's the big, ugly RegEx that would allow you to test exactly what the user entered.
var separatorTest = /^(?:([0-9]{3}\.[0-9]{3}\.[0-9]{3,4})|([0-9]{3}-[0-9]{3}-[0-9]{3,4})|([0-9]{3} [0-9]{3} [0-9]{3,4})|([0-9]{9,10}))$/;
Notice I had to include the exact same number-test three times, wrap each one in parens (to be treated as a single group), and then separate each group with an | to check each group, like an if, else if, else... ...and then plug in a separate special case for having no separator at all...
...not pretty.
I'm also not using \d; spelling out [0-9] makes it harder to mix up the digits with the - and . separators when trying to maintain one of these abominations.
Now, a word or two of warning:
People are liable to enter all kinds of crap; if this is for a commercial site, it's likely better to just strip separators entirely and validate the number is the right size, and conforms to some specifics (eg: doesn't start with /^555555/).
If not given any instruction about number format, people will happily use either no separator or a formal number, like (555) 555-5555 (or +1 (555) 555-5555 for the really pedantic), which is obviously going to fail hard, in this system (see point #1).
Be prepared to trim what you get, before validating.
Depending on your country/region/etc laws about data-security and consumer-vs-transaction record-keeping (again, may or may not be more important in a commercial setting), it's likely better to store both a "user-given" ugly number, and a system-usable number, which you either clean on the back-end, or submit along with the user-entered text.
From a user-interaction perspective, either forcing the number to conform, explicitly (placeholders showing them xxx-xxx-xxxx right above the input, in bold), or accepting any text, and prepping it yourself, is going to be 1000x better than accepting certain forms, but not bothering to tell the user up-front, and instead telling them what they did was wrong, after they try.
It's not cool for relationships; it's equally not cool, here.
You've got 9-digit and 10-digit numbers, so if you're trying for an international solution, be prepared to deal with all the international separators (spaces, commas, ., -, (, ), + and so on). Again, this is why stripping is more useful, because THAT RegEx would be insane.
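The "strip the separators and validate the size" advice above could be sketched as follows. The function name cleanPhoneNumber and the 10-digit rule are assumptions for illustration; a real system would pick its own length rules and country-code handling.

```javascript
// Hypothetical helper: keep the raw input, derive a digits-only form,
// and validate only the digit count.
// Assumption: a valid number has exactly 10 digits and no country code.
function cleanPhoneNumber(rawInput) {
  const digits = rawInput.replace(/\D/g, ""); // drop everything that is not 0-9
  return { raw: rawInput, digits: digits, valid: digits.length === 10 };
}

console.log(cleanPhoneNumber("(555) 555-5555"));
console.log(cleanPhoneNumber(" 555.555 5555 "));
console.log(cleanPhoneNumber("555-5555"));
```

Returning both the raw and the cleaned value follows the suggestion above to store the user-given number alongside a system-usable one.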
I want to make a JavaScript regular expression that checks for valid names.
minimum 2 chars (space can't count)
space and some special chars allowed (éàëä...)
I know how to write some of these separately, but not combined.
If I use /^([A-Za-z éàë]{2,40})$/, the user could input 2 spaces as a name
If I use /^([A-Za-z]{2,40}[ éàë]{0,40})$/, the user must use 2 letters first and after using space or special char, can't use letters again.
Searched around a bit, but hard to formulate search string for my problem. Any ideas?
Please, please pretty please, don't do this. You will only end up upsetting people by telling them their name is not valid. Several examples of surnames that would be rejected by your scheme: O'Neill, Sørensen, Юдович, 李. Trying to cover all these cases and more is doomed to failure.
Just do something like this:
strip leading and trailing blanks
collapse consecutive blanks into one space
check if the result is not empty
In JavaScript, that would look like:
name = name.replace(/^\s+/, "").replace(/\s+$/, "").replace(/\s+/g, " ");
if (name == "") {
  // show error
} else {
  // valid: maybe put trimmed name back into form
}
Most solutions don't consider the many different names there might be. There can be names with only two characters, like Al or Bo, or someone who writes their name like F. Middlename Lastname.
This RegExp will validate most names but you can optimize it to whatever you want:
/^[a-z\u00C0-\u02AB'´`]+\.?\s([a-z\u00C0-\u02AB'´`]+\.?\s?)+$/i
This will allow:
Li Huang Wu
Cevahir Özgür
Yiğit Aydın
Finlay Þunor Boivin
Josué Mikko Norris
Tatiana Zlata Zdravkov
Ariadna Eliisabet O'Taidhg
sergej lisette rijnders
BRIANA NORMINA HAUPT
BihOtZ AmON PavLOv
Eoghan Murdo Stanek
Filimena J. Van Der Veen
D. Blair Wallace
But will not allow:
Shirley24
66Bryant Hunt88
http://stackoverflow.com
laoise_ibtihaj
hippolyte#example.com
Cy4n 4ur0r4 Blyth3 3ll1
Justisne
Danny
If the name needs to be capitalized, uppercase, lowercase, trimmed or single spaced, that's a task a formatter should do, not the user.
I would like to propose a RegEx that would match all latin based languages with their special characters:
/^([ áàíóúéëöüñÄĞİŞȘØøğışÐÝÞðýþA-Za-z'-]*)$/
P.S. I've included all characters I could find, but please feel free to edit the answer in case I've missed any.
Why not
var reg= /^([A-Za-z]{2}[ éàëA-Za-z]*)$/;
2 letters, then as many spaces, letters or special characters as you want.
I wouldn't allow spaces in usernames though - it's begging for trouble when you have usernames like
ab ba
who's going to remember how many spaces they used?
You could do this:
/^([A-Za-zéàë]{2,40} ?)+$/
2-40 characters, and then optionally a space, repeated at least once. This will allow a space at the end, but you could trim it off separately.
After trimming the input value, the following will match your request, but only for Latin surnames.
rn = /([\w\u00C0-\u02AB']+ ?)+/gi;
m = ln.match(rn); // ln holds the trimmed surname
valid = (m && m.length) ? true : false;
Note that I am using '+' instead of '{2,}', because some surnames use just one letter as a separate word, like "Ortega y Gasset".
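You can see I am not using RegExp.test. That method gave me inconsistent results; a likely cause is that a regex with the g flag keeps its position in lastIndex between calls, so repeated test() calls on the same regex object can alternate between true and false.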
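One possible explanation for unreliable RegExp.test results like those described above is the g flag: a global regex remembers where its last match ended. The sketch below only demonstrates that behavior; the variable names are illustrative.

```javascript
// A regex with the g flag stores its position in lastIndex between calls.
const withGlobalFlag = /[a-z]+/gi;
const first = withGlobalFlag.test("Ortega");  // matches, lastIndex moves to 6
const second = withGlobalFlag.test("Ortega"); // starts at index 6, finds nothing, lastIndex resets

// Without the g flag every call starts from the beginning.
const withoutGlobalFlag = /[a-z]+/i;
const third = withoutGlobalFlag.test("Ortega");
const fourth = withoutGlobalFlag.test("Ortega");

console.log(first, second, third, fourth); // true false true true
```

Dropping the g flag (or resetting lastIndex to 0) is usually enough when the regex is only used for validation.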
In my country, people from countries with non-Latin scripts usually transliterate their names, so the previous RegExp would be enough. However, if you want to match any surname in the world, you can add more \u#### character ranges, taking care not to include symbols, numbers or other character types. Or perhaps the xregexp library may help you.
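Instead of listing \u#### ranges by hand, engines that support Unicode property escapes can match letters from any script. The following is a sketch under that assumption; the pattern and the name looksLikeName are made up for illustration, and it is still a heuristic that some real names may not pass.

```javascript
// \p{L} matches a letter in any script, \p{M} a combining mark.
// Assumption: the runtime supports Unicode property escapes (the u flag).
const namePattern = /^[\p{L}\p{M}'.-]+(?: [\p{L}\p{M}'.-]+)*$/u;

// Hypothetical helper: trim, collapse inner whitespace, then test.
function looksLikeName(input) {
  const cleaned = input.trim().replace(/\s+/g, " ");
  return namePattern.test(cleaned);
}

console.log(looksLikeName("Ortega y Gasset")); // true
console.log(looksLikeName("Юдович"));          // true
console.log(looksLikeName("Shirley24"));       // false
```

The pattern has no g flag, so repeated calls to test() always start from the beginning of the input.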
And please do not forget to validate the input on the server side, and to escape it before using it in SQL statements (if you have them).