Get Initials of full names with accented characters through REGEX - javascript

I want to get the initials of a full name even if the name contains accents, dots, or commas.
If I have the name:
"Raúl, Moreno. Rodríguez Carlos"
I get "RLMRGC".
My code is:
user.displayName.match(/\b[a-zA-Z]/gm).join('').toUpperCase()
I want to get "RMRC". Thanks in advance.

My guess is that this expression might work:
const regex = /[^A-Z]/gm;
const str = `Raúl, Moreno. Rodríguez Carlos`;
const subst = ``;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log(result);

Try this (with REGEX):
const data = "Raúl, Moreno. Rodríguez Carlos";
const result = data.match(/\b[A-Z]/gm);
console.log(result);
Another solution, without REGEX:
const data = "Ędward Ącki";
const result = [...data].filter((c, k, arr) => c !== ' ' && (k == 0 || arr[k-1] == ' ' ));
console.log(result);

A fully Unicode compatible solution should match any letter after a char other than letter or digit.
Here are two solutions: 1) an XRegExp-based solution that works in any browser, and 2) a solution for JS environments that support ECMAScript 2018.
var regex = XRegExp("(?:^|[^\\pL\\pN])(\\pL)");
console.log( XRegExp.match("Łukasz Żak", regex, "all").map(function(x) {return x.charAt(x.length - 1);}).join('').toUpperCase() );
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.min.js"></script>
ECMAScript 2018 compliant solution:
// ONLY WORKING IN ECMAScript2018 COMPLIANT JS ENVIRONMENT!
var regex = /(?<![\p{N}\p{L}])\p{L}/gu;
console.log( "Łukasz Żak".match(regex).join('').toUpperCase() );
// => ŁŻ
NOTE:
(?:^|[^\pL\pN])(\pL) matches the start of the string or any char other than a letter or digit, and then captures any letter (since the char matched by the non-capturing group is also part of the match, .map(function(x) {return x.charAt(x.length - 1);}) is required to get the last char of each match)
(?<![\p{N}\p{L}])\p{L} matches any letter (\p{L}) that is not preceded with a digit or letter (see the negative lookbehind (?<![\p{N}\p{L}]))
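A minimal sketch that applies the ECMAScript 2018 regex above to the name from the question. The helper name getInitials and the NFC normalization step are assumptions added here, not part of the answer; normalizing keeps a decomposed "u + combining accent" from being treated as a word break. It needs an engine that supports lookbehind and Unicode property escapes.

```javascript
// Hypothetical helper, not part of the answers above.
function getInitials(fullName) {
  // Compose accents first so that "u" + combining acute becomes a single letter.
  const normalized = fullName.normalize('NFC');
  // A letter that is not preceded by another letter or a digit.
  const matches = normalized.match(/(?<![\p{N}\p{L}])\p{L}/gu);
  // match() returns null when nothing matches, so fall back to an empty list.
  return (matches || []).join('').toUpperCase();
}

console.log(getInitials('Ra\u00FAl, Moreno. Rodr\u00EDguez Carlos')); // RMRC
console.log(getInitials('\u0141ukasz \u017Bak')); // ŁŻ
```

The escapes spell "Raúl", "Rodríguez", "Łukasz" and "Żak"; they are used only so the example does not depend on how the source file is encoded.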

Related

Regex match apostrophe inside, but not around words, inside a character set

I'm counting how many times different words appear in a text using Regular Expressions in JavaScript. My problem is when I have quoted words: 'word' should be counted simply as word (without the quotes, otherwise they'll behave as two different words), while it's should be counted as a whole word.
(?<=\w)(')(?=\w)
This regex can identify apostrophes inside, but not around words. Problem is, I can't use it inside a character set such as [\w]+.
(?<=\w)(')(?=\w)|[\w]+
Will count it's a 'miracle' of nature as 7 words, instead of 5 (it, ', s becoming 3 different words). Also, the third word should be selected simply as miracle, and not as 'miracle'.
To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.
How can I accomplish that?
1) You can simply use /[^\s]+/g regex
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g);
console.log(result.length);
console.log(result);
2) If you are calculating total number of words in a string then you can also use split as:
const str = `it's a 'miracle' of nature`;
const result = str.split(/\s+/);
console.log(result.length);
console.log(result);
3) If you want a word without quote at the starting and at the end then you can do as:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g).map((s) => {
s = s[0] === "'" ? s.slice(1) : s;
s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
return s;
});
console.log(result.length);
console.log(result);
You might use an alternation with 2 capture groups, and then check for the values of those groups.
(?<!\S)'(\S+)'(?!\S)|(\S+)
(?<!\S)' Negative lookbehind, assert a whitespace boundary to the left and match '
(\S+) Capture group 1, match 1+ non whitespace chars
'(?!\S) Match ' and assert a whitespace boundary to the right
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
const s = "it's a 'miracle' of nature";
Array.from(s.matchAll(regex), m => {
if (m[1]) console.log(m[1])
if (m[2]) console.log(m[2])
});
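Since the question also asks for diacritics, here is a sketch of a different approach: instead of splitting on whitespace, match runs of Unicode letters that may be joined by an apostrophe. The names wordPattern and countWords are made up for this example, the definition of a "word" is an assumption, and only the straight apostrophe (') is handled. It requires Unicode property escapes (ECMAScript 2018).

```javascript
// Assumption: a "word" is a run of letters, optionally joined by single apostrophes.
const wordPattern = /\p{L}+(?:'\p{L}+)*/gu;

// Hypothetical helper that counts occurrences of each word, case-insensitively.
function countWords(text) {
  const counts = new Map();
  for (const word of text.match(wordPattern) || []) {
    const key = word.toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const wordCounts = countWords("it's a 'miracle' of nature, a real miracle");
console.log(wordCounts.get("it's"));    // 1
console.log(wordCounts.get('miracle')); // 2
```

Quotes around a word are never part of a match because an apostrophe is only consumed when letters sit on both sides of it.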

Capture only letter followed by letter, excluding some words - Regex

I need to capture a letter in a string followed by a letter, excluding some specific words. I have the following string in Latex:
22+2p+p^{pp^{2p+pp}}+\delta+\pi+sqrt(2p)+\\frac{2}+{2p}+ppp+2P+\sqrt+xx+\to+p2+\pi+px+ab+\alpha
I want to add * between the letters, but I don't want the following words to apply:
\frac
\delta
\pi
\sqrt
\alpha
The output should be as follows:
22+2p+p^{p*p^{2p+p*p}}+\delta+\pi+\sqrt(2p)+\\frac{2}+{2p}+p*p*p+2P+\sqrt(9)+x*x+\to+p2+\pi+p*x+a*b+\alpha
The letters are dynamic entries, which can be any of the alphabet. I thought about using "positive lookbehind" but its support is limited.
You can achieve the result you want with a string replace with callback, using a regex:
(delta|frac|pi|sqrt|alpha|to)|([a-z](?=[a-z]))
that matches one of the excluded words in group 1 or a letter that is followed by another letter in group 2. In the callback, if group 1 is present, that is returned otherwise group 2 is returned followed by a *:
let str = '22+2p+p^{pp^{2p+pp}}+\\delta+\\pi+\\sqrt(2p)+\\\\frac{2}+{2p}+ppp+2P+\\sqrt(9)+xx+\\to+p2+\\pi+px+ab+\\alpha';
const replacer = (m, p1, p2) => {
return p1 ? p1 : (p2 + '*');
}
console.log(str.replace(/(delta|frac|pi|sqrt|alpha|to)|([a-z](?=[a-z]))/gi, replacer));
You can do something like this:
const str = "22+2p+p^{pp^{2p+pp}}+\\delta+\\pi+\\sqrt(2p)+\\\\frac{2}+{2p}+ppp+2P+\\sqrt+xx+\\to+p2+\\pi+px+ab+\\alpha";
const result = str.replace(/\\?[a-zA-Z]{2,}/g, (v) => {
if (v.startsWith('\\')) {
return v;
}
return v.split("").join("*");
});
console.log(result);
This matches every run of 2 or more consecutive letters, optionally preceded by a \. In the replace callback, a match that starts with \ is returned unchanged; otherwise its letters are split and joined with *.
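The same idea wrapped in a small function and checked on a short input. This is only a sketch; the name insertStars is hypothetical.

```javascript
// Hypothetical wrapper around the regex from the answer above.
function insertStars(latex) {
  return latex.replace(/\\?[a-zA-Z]{2,}/g, (run) =>
    // Runs that start with a backslash are LaTeX commands: leave them untouched.
    run.startsWith('\\') ? run : run.split('').join('*')
  );
}

console.log(insertStars('pp+\\pi+2ab')); // p*p+\pi+2a*b
```

Because any backslash-prefixed run is skipped, no list of excluded command names has to be maintained.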
You could use negative lookbehind to solve this.
const regex = /(?<!\\{1,})(\b[a-zA-Z]{2,}\b)/g;
const str = `22+2p+p^{pp^{2p+pp}}+\\delta+\\pi+\\sqrt(2p)+\\\\frac{2}+{2p}+ppp+2P+\\sqrt+xx+\\to+p2+\\pi+px+ab+\\alpha`;
let result = str.replace(regex, function(match) {
return match.split("").join("*");
});
console.log("Match: ",str.match(regex).toString());
console.log(result);

How to append a string to another string after every N char?

I am trying to create a program that adds "gav" after every second letter, when the string is written.
var string1 = "word"
Expected output:
wogavrdgav
You can use the modulus operator for this -
var string1 = "word";
function addGav(str){
var newStr = '';
var strArr = str.split('');
strArr.forEach(function(letter, index){
newStr += index % 2 == 1 ? letter + 'gav' : letter;
})
return newStr;
}
console.log(addGav(string1)); // gives wogavrdgav
console.log(addGav('gavgrif')) //gives gagavvggavrigavf....
RegEx
Here, we can add a quantifier to . (which matches all chars except for new lines) and design an expression with one capturing group ($1):
(.{2})
JavaScript Demo
const regex = /(.{2})/gm;
const str = `AAAAAA`;
const subst = `$1bbb`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
If you wish to consider new lines as a char, then this expression would do that:
([\s\S]{2})
JavaScript Demo
const regex = /([\s\S]{2})/gm;
const str = `ABCDEF
GHIJK
LMNOP
QRSTU
VWXYZ
`;
const subst = `$1bbb`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
Try this:
const string1 = 'word'
console.log('Input:', string1)
const newStr = string1.replace(/(?<=(^(.{2})+))/g, 'gav')
console.log('Output:', newStr)
.{2}: any two characters
(.{2})+: one or more pairs of characters, i.e. 2, 4, 6, 8, ... characters
^(.{2})+: the same, but anchored at the start of the string; without ^ the pairs could start at any position
(?<=...): a positive lookbehind; it matches the position right after whatever ... matches, without consuming any characters
g: replace every match, not only the first one
The regex therefore matches the empty string at every position that is preceded by 2, 4, 6, etc. characters counted from the start of the string, and replace inserts 'gav' at each of those positions.
Example with word:
The positions after wo and after word match, so 'gav' is inserted at both and the result is wogavrdgav.
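For an arbitrary N, a sketch along the same lines as the chunk-based regex answers: build the pattern from N and append the extra string after each full chunk. The function name insertEvery and its parameters are hypothetical.

```javascript
// Hypothetical helper: append `insert` after every `n` characters of `str`.
function insertEvery(str, n, insert) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  // [\s\S] also matches line breaks, unlike "." without the s flag.
  const chunk = new RegExp(`[\\s\\S]{${n}}`, 'g');
  // A callback is used so that "$" sequences in `insert` are taken literally.
  return str.replace(chunk, (match) => match + insert);
}

console.log(insertEvery('word', 2, 'gav')); // wogavrdgav
console.log(insertEvery('abcde', 2, '-'));  // ab-cd-e
```

A trailing chunk shorter than N gets nothing appended, which matches the expected output in the question.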

JS : Remove all strings which are starting with specific character

I have an array that contains names. Some of them start with a dot (.), and some have a dot in the middle or elsewhere. I need to remove only the names that start with a dot. I'm looking for a good way to do this in JavaScript.
var myarr = 'ad, ghost, hg, .hi, jk, find.jpg, dam.ark, haji, jive.pdf, .find, home, .war, .milk, raj, .ker';
var refinedArr = ??
You can use the filter function and you can access the first letter of every word using item[0]. You do need to split the string first.
var myarr = 'ad, ghost, hg, .hi, jk, find.jpg, dam.ark, haji, jive.pdf, .find, home, .war, .milk, raj, .ker'.split(", ");
var refinedArr = myarr.filter(function(item) {
return item[0] != "."
});
console.log(refinedArr)
Use filter and startsWith:
let myarr = ['ad', 'ghost', 'hg', '.hi', 'jk'];
let res = myarr.filter(e => ! e.startsWith('.'));
console.log(res);
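Combining the two answers above for the comma-separated string from the question: a sketch that also tolerates irregular spacing around the commas. The helper name removeDotNames is made up for this example.

```javascript
// Hypothetical helper: accepts a comma-separated string like the one in the question.
function removeDotNames(list) {
  return list
    .split(',')
    .map((name) => name.trim()) // tolerate "a,b" as well as "a, b"
    .filter((name) => name !== '' && !name.startsWith('.'));
}

console.log(removeDotNames('ad, ghost, .hi,find.jpg , .war, raj'));
// [ 'ad', 'ghost', 'find.jpg', 'raj' ]
```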
You can use the RegEx \B\.\w+,? ? and replace with an empty String.
\B matches a position that is not a word boundary (here: the dot is not directly preceded by a word char)
\. matches a dot
\w+ matches one or more word char
,? matches 0 or 1 ,
[space]? matches 0 or 1 [space]
Demo:
const regex = /\B\.\w+,? ?/g;
const str = `ad, ghost, hg, .hi, jk, find.jpg, dam.ark, haji, jive.pdf, .find, home, .war, .milk, raj, .ker`;
const subst = ``;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);

Get first letter of each word in a string, in JavaScript

How would you go about collecting the first letter of each word in a string, so as to get an abbreviation?
Input: "Java Script Object Notation"
Output: "JSON"
I think what you're looking for is the acronym of a supplied string.
var str = "Java Script Object Notation";
var matches = str.match(/\b(\w)/g); // ['J','S','O','N']
var acronym = matches.join(''); // JSON
console.log(acronym)
Note: this will fail for hyphenated or apostrophe'd words: Help-me I'm Dieing becomes HmImD. If that's not what you want, the split-on-space, grab-first-letter approach might be a better fit.
Here's a quick example of that:
let str = "Java Script Object Notation";
let acronym = str.split(/\s/).reduce((response,word)=> response+=word.slice(0,1),'')
console.log(acronym);
I think you can do this with
'Aa Bb'.match(/\b\w/g).join('')
Explanation: Obtain all /g the alphanumeric characters \w that occur after a non-alphanumeric character (i.e: after a word boundary \b), put them on an array with .match() and join everything in a single string .join('')
Depending on what you want to do you can also consider simply selecting all the uppercase characters:
'JavaScript Object Notation'.match(/[A-Z]/g).join('')
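To see where \b\w breaks down, here is a small comparison sketch (variable names are illustrative). \w only covers ASCII letters, digits and underscore, so both an apostrophe and an accented letter create extra word boundaries.

```javascript
const input = "I'm \u00C9douard"; // "I'm Édouard", with É written as an escape

// \w is ASCII-only, so ' and É both count as non-word characters here.
const byBoundary = input.match(/\b\w/g).join('');

// Splitting on whitespace keeps each word's real first character.
const bySplit = input
  .split(/\s+/)
  .filter(Boolean)
  .map((word) => word[0])
  .join('');

console.log(byBoundary); // Imd
console.log(bySplit);    // IÉ
```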
Easiest way without regex
var abbr = "Java Script Object Notation".split(' ').map(function(item){return item[0]}).join('');
This is made very simple with ES6
string.split(' ').map(i => i.charAt(0)) // keep the case of each letter
string.split(' ').map(i => i.charAt(0).toUpperCase()) // uppercase each letter
string.split(' ').map(i => i.charAt(0).toLowerCase()) // lowercase each letter
This ONLY works with spaces, or whatever separator is passed to the .split(' ') method,
e.g. .split(', '), .split('; '), etc.
string.split(' ').map(i => i.charAt(0)).toString().toUpperCase().split(',')
To add to the great examples, you could do it like this in ES6
const x = "Java Script Object Notation".split(' ').map(x => x[0]).join('');
console.log(x); // JSON
and this works too but please ignore it, I went a bit nuts here :-)
const [j,s,o,n] = "Java Script Object Notation".split(' ').map(x => x[0]);
console.log(`${j}${s}${o}${n}`);
#BotNet flaw:
I think I solved it after an excruciating 3 days of regular expression tutorials:
==> I'm a an animal
(\b used to catch the m of I'm because of the word boundary.) Matching on whitespace or the start of the string instead seems to work for me:
/(\s|^)([a-z])/gi
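Note that each match of this regex includes the preceding whitespace, so the letter has to be read from capture group 2. A sketch of how that might look; the helper name initialsBySpace is hypothetical.

```javascript
const afterSpaceOrStart = /(\s|^)([a-z])/gi;

// Hypothetical helper: collect group 2 (the letter) from every match.
function initialsBySpace(text) {
  return Array.from(text.matchAll(afterSpaceOrStart), (m) => m[2]).join('');
}

console.log(initialsBySpace("I'm a an animal"));             // Iaaa
console.log(initialsBySpace('Java Script Object Notation')); // JSON
```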
Try -
var text = '';
var arr = "Java Script Object Notation".split(' ');
for (var i = 0; i < arr.length; i++) {
text += arr[i].charAt(0);
}
alert(text);
Demo - http://jsfiddle.net/r2maQ/
Using map (from functional programming)
'use strict';
function acronym(words)
{
if (!words) { return ''; }
var first_letter = function(x){ if (x) { return x[0]; } else { return ''; }};
return words.split(' ').map(first_letter).join('');
}
Alternative 1:
you can also use this regex to return an array of the first letter of every word
/(?<=(\s|^))[a-z]/gi
(?<=(\s|^)) is called positive lookbehind which make sure the element in our search pattern is preceded by (\s|^).
so, for your case:
// in case the input is lowercase & there's a word with apostrophe
const toAbbr = (str) => {
return str.match(/(?<=(\s|^))[a-z]/gi)
.join('')
.toUpperCase();
};
toAbbr("java script object notation"); //result JSON
(by the way, there are also negative lookbehind, positive lookahead, negative lookahead, if you want to learn more)
Alternative 2:
match all the words and use replace() method to replace them with the first letter of each word and ignore the space (the method will not mutate your original string)
// in case the input is lowercase & there's a word with apostrophe
const toAbbr = (str) => {
return str.replace(/(\S+)(\s*)/gi, (match, p1, p2) => p1[0].toUpperCase());
};
toAbbr("java script object notation"); //result JSON
// word = not space = \S+ = p1 (p1 is the first pattern)
// space = \s* = p2 (p2 is the second pattern)
It's important to trim the word before splitting it, otherwise, we'd lose some letters.
const getWordInitials = (word: string): string => {
const bits = word.trim().split(' ');
return bits
.map((bit) => bit.charAt(0))
.join('')
.toUpperCase();
};
$ getWordInitials("Java Script Object Notation")
$ "JSON"
How about this:
var str = "", abbr = "";
str = "Java Script Object Notation";
str = str.split(' ');
for (var i = 0; i < str.length; i++) {
abbr += str[i].charAt(0);
}
alert(abbr);
If you came here looking for how to do this that supports non-BMP characters that use surrogate pairs:
const initials = str.split(' ')
.filter(Boolean) // skip empty pieces produced by repeated spaces
.map(s => String.fromCodePoint(s.codePointAt(0)).toUpperCase())
.join('');
Works in all modern browsers with no polyfills (not IE though)
Getting first letter of any Unicode word in JavaScript is now easy with the ECMAScript 2018 standard:
/(?<!\p{L}\p{M}*)\p{L}/gu
This regex finds any Unicode letter (see the last \p{L}) that is not preceded with any other letter that can optionally have diacritic symbols (see the (?<!\p{L}\p{M}*) negative lookbehind where \p{M}* matches 0 or more diacritic chars). Note that u flag is compulsory here for the Unicode property classes (like \p{L}) to work correctly.
To emulate a fully Unicode-aware \b, you'd need to add a digit matching pattern and connector punctuation:
/(?<!\p{L}\p{M}*|[\p{N}\p{Pc}])\p{L}/gu
It works in Chrome, Firefox (since June 30, 2020), Node.js, and the majority of other environments, for any natural language including Arabic.
Quick test:
const regex = /(?<!\p{L}\p{M}*)\p{L}/gu;
const string = "Żerard Łyżwiński";
// Extracting
console.log(string.match(regex)); // => [ "Ż", "Ł" ]
// Extracting and concatenating into string
console.log(string.match(regex).join("")) // => ŻŁ
// Removing
console.log(string.replace(regex, "")) // => erard yżwiński
// Enclosing (wrapping) with a tag
console.log(string.replace(regex, "<span>$&</span>")) // => <span>Ż</span>erard <span>Ł</span>yżwiński
console.log("_Łukasz 1Żukowski".match(/(?<!\p{L}\p{M}*|[\p{N}\p{Pc}])\p{L}/gu)); // => null
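One detail worth illustrating: when the input uses decomposed accents (base letter followed by a combining mark), the regex above returns only the base letter. The sketch below is a variation, not the answer's own code: it also consumes the marks that follow the first letter and recomposes the result with normalize('NFC'). The names firstLetterWithMarks and unicodeInitials are hypothetical.

```javascript
// Letter plus any combining marks, when not preceded by another letter (with its marks).
const firstLetterWithMarks = /(?<!\p{L}\p{M}*)\p{L}\p{M}*/gu;

// Hypothetical helper, not from the answer above.
function unicodeInitials(text) {
  const found = text.match(firstLetterWithMarks) || [];
  // Recompose base letter + mark pairs into single characters where possible.
  return found.join('').normalize('NFC').toUpperCase();
}

// "Żerard Łyżwiński" written with decomposed accents (Z + U+0307, z + U+0307, n + U+0301).
const decomposed = 'Z\u0307erard \u0141yz\u0307win\u0301ski';
console.log(unicodeInitials(decomposed)); // ŻŁ
```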
In ES6:
function getFirstCharacters(str) {
// Take the first character of each space-separated piece and drop empty pieces
return str.split(' ').map(word => word.charAt(0)).filter(char => char !== '');
}
const str1 = "Hello4 World65 123 !!";
const str2 = "123and 456 and 78-1";
const str3 = " Hello World !!";
console.log(getFirstCharacters(str1));
console.log(getFirstCharacters(str2));
console.log(getFirstCharacters(str3));
Output:
[ 'H', 'W', '1', '!' ]
[ '1', '4', 'a', '7' ]
[ 'H', 'W', '!' ]
This should do it.
var s = "Java Script Object Notation",
a = s.split(' '),
l = a.length,
i = 0,
n = "";
for (; i < l; ++i)
{
n += a[i].charAt(0);
}
console.log(n);
JavaScript regular expressions are not Unicode-aware before ECMAScript 6, so anyone who needs to support characters such as "å" in older environments will have to rely on the non-regex versions of these scripts.
Even on version 6, you need to opt in to Unicode handling with the u flag.
More details: https://mathiasbynens.be/notes/es6-unicode-regex
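A short sketch of what the u flag changes, using a code point outside the Basic Multilingual Plane. The variable names are illustrative only.

```javascript
// A single code point outside the BMP, stored as two UTF-16 code units.
const astral = '\u{1D306}';

// Without u, "." matches one code unit, so it cannot span the whole character.
const withoutU = /^.$/.test(astral);
// With u, "." matches one full code point.
const withU = /^.$/u.test(astral);

console.log(astral.length);      // 2
console.log(withoutU);           // false
console.log(withU);              // true
console.log([...astral].length); // 1 (string iteration is code-point aware)
```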
Yet another option using reduce function:
var value = "Java Script Object Notation";
var result = value.split(' ').reduce(function(previous, current){
return {v : previous.v + current[0]};
},{v:""});
$("#output").text(result.v);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<pre id="output"/>
This is similar to others, but (IMHO) a tad easier to read:
const getAcronym = title =>
title.split(' ')
.map(word => word[0])
.join('');
ES6 reduce way:
const initials = inputStr.split(' ').reduce((result, currentWord) =>
result + currentWord.charAt(0).toUpperCase(), '');
alert(initials);
Try This Function
const createUserName = function (name) {
const username = name
.toLowerCase()
.split(' ')
.map((elem) => elem[0])
.join('');
return username;
};
console.log(createUserName('Anisul Haque Bhuiyan'));
