regex get by multiple separator? - javascript

I want to split the sentence
hey ! there you are
into
["hey!","there you are"]
in JS. I found that
(?<=\!)
keeps the separator with the preceding element.
But what if I want the same rule to apply to "!!" or "!!!"?
My goal is to split a sentence such as
hey! there!!! you are!!!!
into
["hey!","there!!!", "you are!!!!"]
Is that possible?
I tried \(?<=!+) and \(?<=\+!), but both failed.
I don't even know whether it is possible to match !, !!, ... (any number of !) in one go.

In addition to the elegant solution by The fourth bird, this specific requirement can be met by simply splitting on the regex (?<=!)\s+|$, which can be read as "one or more whitespace characters preceded by a !, or the end of the line".
const regex = /(?<=!)\s+|$/;
[
"hey! there!!! you are!!!!",
"hey ! there you are"
].forEach(s =>
console.log(
s.split(regex)
.map(s => s.replace(/\s+!/, "!").trim())
)
);

Based on your needs, my solution is to first collect the runs of exclamation marks, then collect the strings between them (by splitting on the exclamation marks).
This method creates two arrays, one of exclamations and one of strings.
Then I loop over them, concatenate each pair, and push the result into a new array.
This should be enough as a starting point; you can always modify it and build on top of it.
const str = 'hey! there!!! you are!!!!';
const exclamations = str.match(/!+/g);
const characters = str.split(/!+/).filter(s => s.trim());
let newArr = [];
for (let i = 0; i < characters.length; i++) {
// a trailing chunk may have no exclamation marks after it
newArr.push(characters[i].trim() + (exclamations[i] || ''));
}
console.log(newArr); // ["hey!","there!!!","you are!!!!"]

You could use split with 2 lookarounds, asserting ! to the left and not ! to the right. If you want to remove the leading whitespace chars before the exclamation mark you could do some sanitizing:
const regex = /(?<=!)(?!!)/g;
[
"hey! there!!! you are!!!!",
"hey ! there you are"
].forEach(s =>
console.log(
s.split(regex)
.map(s => s.replace(/\s+!/, "!").trim())
)
);
Another option could be to match the parts instead of splitting:
[^\s!].*?(?:!(?!!)|$)
const regex = /[^\s!].*?(?:!(?!!)|$)/g;
[
"hey! there!!! you are!!!!",
"hey ! there you are"
].forEach(s =>
console.log(s.match(regex))
);

Here is yet another solution using a .split() and .reduce(). This does not use a lookbehind, for those concerned about Safari and other browsers not supporting it:
[
'hey ! there you are',
'hey! there!!! you are!!!!'
].forEach(str => {
let result = str
.split(/( *!+ *)/) // split and keep split pattern because of parenthesis
.filter(Boolean) // filter out empty items
.reduce((acc, val, idx) => {
if(idx % 2) {
// !+ split pattern => combine with previous array item
acc[acc.length - 1] += val.trim();
} else {
acc.push(val);
}
return acc;
}, []);
console.log(str, ' => ', result);
});
Output:
hey ! there you are => [
"hey!",
"there you are"
]
hey! there!!! you are!!!! => [
"hey!",
"there!!!",
"you are!!!!"
]
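For those who want the lookbehind version where it is available and a fallback elsewhere, here is a minimal sketch. The helper names (supportsLookbehind, tidy, splitWithLookbehind, splitWithoutLookbehind, splitSentences) are made up for illustration and are not from any of the answers above; the fallback pattern is one possible choice, not the only one.

```javascript
// Hypothetical helper: true when this engine can parse lookbehind syntax.
function supportsLookbehind() {
  try {
    new RegExp('(?<=!)x'); // throws a SyntaxError on engines without lookbehind
    return true;
  } catch (e) {
    return false;
  }
}

// Remove whitespace that sits directly before a "!", then trim the chunk.
const tidy = s => s.replace(/\s+!/, '!').trim();

// Lookbehind version; the pattern is built from a string so that older
// engines do not reject the whole script at parse time.
function splitWithLookbehind(str) {
  return str.split(new RegExp('(?<=!)(?!!)')).map(tidy).filter(Boolean);
}

// Fallback: match a chunk of text plus the run of "!" that follows it.
function splitWithoutLookbehind(str) {
  return (str.match(/[^\s!][^!]*!*/g) || []).map(tidy);
}

// Hypothetical wrapper that picks whichever variant the engine supports.
function splitSentences(str) {
  return supportsLookbehind()
    ? splitWithLookbehind(str)
    : splitWithoutLookbehind(str);
}

console.log(splitSentences('hey! there!!! you are!!!!'));
console.log(splitSentences('hey ! there you are'));
```

Both variants should give the same arrays for the two sample sentences from the question; other inputs (for example text with a leading "!") are not covered by this sketch.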

Related

Is there a way to remove a newline character within a string in an array?

I am trying to parse a hyphenated list, given as a string, into an array using JavaScript.
- foo
- bar
I have gotten very close to figuring it out. I have trimmed it down to where I get the two items using this code.
let chunks = input.split(/\ ?\-\ ?/); // let, because chunks is reassigned on the next line
chunks = chunks.slice(1);
This would trim the previous input down to this.
["foo\n", "bar"]
I've tried many solutions to get the newline character out of the string regardless of the number of items in the array, but nothing worked out. It would be greatly appreciated if someone could help me solve this issue.
You could for example split, remove all the empty entries, and then trim each item to also remove all the leading and trailing whitespace characters including the newlines.
Note that you don't have to escape the space and the hyphen.
const input = `- foo
- bar`;
const chunks = input.split(/ ?- ?/)
.filter(Boolean)
.map(s => s.trim());
console.log(chunks);
Or the same approach removing only the newlines:
const input = `- foo
- bar`;
const chunks = input.split(/ ?- ?/)
.filter(Boolean)
.map(s => s.replace(/\r?\n|\r/g, ''));
console.log(chunks);
Instead of split, you might also use a match with a capture group:
^ ?- ?(.*)
The pattern matches:
^ Start of string (or of each line when the m flag is set)
 ?- ? Match - between optional spaces
(.*) Capture group 1, match the rest of the line
const input = `- foo
- bar`;
const chunks = Array.from(input.matchAll(/^ ?- ?(.*)/gm), m => m[1]);
console.log(chunks);
You could loop over like so and remove the newline chars.
const data = ["foo\n", "bar"]
const res = data.map(str => str.replaceAll('\n', ''))
console.log(res)
Instead of trimming after the split, split on the newline and then map over the result to strip the unwanted prefix. There is no need to loop multiple times.
const str = ` - foo
- bar`;
let chunks = str.split("\n").map(s => s.replace(/^\W+/, ""));
console.log(chunks)
let chunks2 = str.split("\n").map(s => s.split(" ")[2]);
console.log(chunks2)
You could use a regex match with a lookbehind:
(?<=- ) asserts the prefix "- " without including it in the match, and [^\n]* matches any number of characters other than "\n".
const str = `
- foo
- bar
`
console.log(str.match(/(?<=- )[^\n]*/g))
chunks.map((data) => {
data = data.replace(/(\r\n|\n|\r|\\n|\\r)/gm, "");
return data;
})
const str = ` - foo
- bar`;
// Inside [...] the | is a literal character, so use plain alternation instead.
const result = str.replace(/\r\n|\n|\r/g, "")
console.log(result)
That should remove all kinds of line breaks from a string. After that, you can perform other steps to get the expected result, for example:
const str = ` - foo
- bar`;
// Remove line breaks and all other whitespace.
const result = str.replace(/\s+/g, "")
console.log(result)
const actualResult = result.split('-')
actualResult.splice(0,1)
console.log(actualResult)

Combining a filter and a map

Is it possible to combine a map and a filter in a single javascript expression? For example, I am currently doing the following to trim whitespace and remove empty results:
const s = "123 hiu 234234"
console.log(s
.split(/\d+/g)
.filter((item, i) => item.trim())
.map((item, i) => item.trim())
);
Is there a more compact way to do that? And, as a follow up question, is the /g necessary when doing split or does that automatically split every occurrence?
that...
const s = "123 hiu 234234"
console.log( s.match(/[a-z]+/ig) )
If a single word comes between the numbers, I think a single regular expression would be enough here - just match non-whitespace, non-digits:
const s = "123 hiu 234234"
console.log(s
.match(/[^\d ]+/g)
);
Another way would be to define a single callback function and pass it to both filter and map. For example:
const s = "123 hiu 234234"
const Trim = (item, i) => item.trim();
console.log(s.split(/\d+/).filter(Trim).map(Trim));
Or you could put the burden on the regex itself, for example only matching letters:
const s = "123 hiu 234234"
console.log(s.match(/[a-zA-Z]+/g));
// /[a-z]+/gi alternately
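On the two questions as asked: a map and a filter can be combined with flatMap (or reduce), and split() splits at every match whether or not the g flag is present. A minimal sketch, using the sample string from the question; the variable names are only illustrative.

```javascript
const s = "123 hiu 234234";

// flatMap maps and filters in one pass: return [] to drop an item,
// or a one-element array to keep the transformed value.
const viaFlatMap = s.split(/\d+/).flatMap(item => {
  const trimmed = item.trim();
  return trimmed ? [trimmed] : [];
});

// reduce is an alternative for environments without flatMap.
const viaReduce = s.split(/\d+/).reduce((acc, item) => {
  const trimmed = item.trim();
  if (trimmed) acc.push(trimmed);
  return acc;
}, []);

// split() behaves the same with and without the g flag.
const withFlag = s.split(/\d+/g);
const withoutFlag = s.split(/\d+/);

console.log(viaFlatMap, viaReduce);
console.log(JSON.stringify(withFlag) === JSON.stringify(withoutFlag));
```

flatMap trims each item only once, which is the main gain over chaining filter and map with the same callback.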

How to split a camel case string containing numbers

I have camel cased strings like this:
"numberOf40"
"numberOf40hc"
How can I split it like this?
["number", "Of", "40"]
["number", "Of", "40hc"]
I am using humps to decamelize keys so I can only pass a split regex as option. I am looking for an answer that only uses split function.
My best attempts:
const a = "numberOf40hc"
a.split(/(?=[A-Z0-9])/)
// ["number", "Of", "4", "0hc"]
a.split(/(?=[A-Z])|[^0-9](?=[0-9])/)
// ["number", "O", "40hc"]
Also I don't understand why the f is omitted in my last attempt.
You don't get the f in your last attempt, (?=[A-Z])|[^0-9](?=[0-9]), because the [^0-9] part of the pattern matches a single character other than a digit, and split consumes that character as the separator.
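To make the difference visible, here is a small sketch that compares the consuming pattern from the question with a zero-width variant. The variable names are illustrative, and the second pattern is just one way to make the separator zero-width.

```javascript
const a = "numberOf40hc";

// [^0-9] consumes the "f" as part of the separator, so it is dropped.
const consuming = a.split(/(?=[A-Z])|[^0-9](?=[0-9])/);

// A lookbehind only asserts that a non-digit precedes the position;
// it consumes nothing, so the "f" stays in the result.
const zeroWidth = a.split(/(?=[A-Z])|(?<=[^0-9])(?=[0-9])/);

console.log(consuming); // ["number", "O", "40hc"]
console.log(zeroWidth); // ["number", "Of", "40hc"]
```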
You could also match instead of split
const regex = /[A-Z]?[a-z]+|\d+[a-z]*/g;
[
"numberOf40",
"numberOf40hc"
].forEach(s => console.log(Array.from(s.matchAll(regex), m => m[0])));
Using split only, you could use lookarounds, including a lookbehind, which is gaining more browser support.
const regex = /(?=[A-Z])|(?<=[a-z])(?=\d)/g;
[
"numberOf40",
"numberOf40hc"
].forEach(s => console.log(s.split(regex)));
const a = "numberOf40hc"
let b = a.split(/(\d+[a-z]*|[A-Z][a-z]*)/).filter(a => a);
console.log(b);
The .filter(a => a) is necessary because split returns both the left and right side of a matched delimiter, even when one of them is empty. E.g. 'a.'.split('.') returns both the left side (i.e. 'a') and the right side (i.e. '') of '.'.
Per your update regarding the need for compatibility with humps, it seems humps supports customizing the handler:
const humps = require('humps');
let myObj = {numberOf40hc: 'value'};
let decamelizedObj = humps.decamelizeKeys(myObj, key =>
key.split(/(\d+[a-z]*|[A-Z][a-z]*)/).filter(a => a).join('_'));
console.log(decamelizedObj);
Try this:
const splitStr = (str='') =>
str.split(/([A-Z][a-z]+)/).filter(e => e);
console.log( splitStr("numberOf40") );
console.log( splitStr("numberOf40hc") );

Could a regular expression be used to find text between pairs of delimiters

I need to parse an email template for custom variables that occur between pairs of dollar signs, e.g:
$foo$bar$baz$foo$bar$baz$wtf
So I would want to start by extracting 'foo' above, since it comes between the first pair (1st and 2nd) of dollar signs. And then skip 'bar' but extract 'baz' as it comes between the next pair (3rd and 4th) of dollar signs.
I was able to accomplish this with split and filter as below, but I am wondering whether there's a way to accomplish the same with a regular expression instead. I presume some sort of formal parser, recursive or otherwise, could be used, but that seems like overkill to me.
const body = "$foo$bar$baz$foo$bar$baz$wtf";
let delimitedSegments = body.split('$');
if (delimitedSegments.length % 2 === 0) {
// discard last segment when length is even since it won't be followed by the delimiter
delimitedSegments.pop();
}
const alternatingDelimitedValues = delimitedSegments.filter((segment, index) => {
return index % 2;
});
console.log(alternatingDelimitedValues);
OUTPUT: [ 'foo', 'baz', 'bar' ]
Code also at: https://repl.it/#dexygen/findTextBetweenDollarSignDelimiterPairs
Just match the delimiter twice in the regexp
const body = "$foo$bar$baz$foo$bar$baz$wtf";
const result = body.match(/\$[^$]*\$/g).map(s => s.replace(/\$/g, ''));
console.log(result);
You could use the regex /\$\w+\$/g to get the expected output:
let regex = /\$\w+\$/g;
let str = '$foo$bar$baz$foo$bar$baz$wtf';
let result = str.match(regex).map( item => item.replace(/\$/g, ''));
console.log(result);
You can use capturing group in the regex.
const str1 = '$foo$bar$baz$foo$bar$baz$wtf';
const regex1 = /\$(\w+)\$/g;
const str2 = '*foo*bar*baz*foo*bar*baz*wtf';
const regex2 = /\*(\w+)\*/g;
const find = (str, regex) =>
new Array(str.match(regex).length)
.fill(null)
.map(m => regex.exec(str)[1]);
console.log('delimiters($)', JSON.stringify(find(str1, regex1)));
console.log('delimiters(*)', JSON.stringify(find(str2, regex2)));
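As a variation on the capturing-group idea, String.prototype.matchAll can collect the groups without relying on the regex's lastIndex state between exec() calls. A minimal sketch; findBetween is a made-up name, and the regex must carry the g flag for matchAll to accept it.

```javascript
// Hypothetical helper: collects capture group 1 of every match.
const findBetween = (str, regex) =>
  Array.from(str.matchAll(regex), m => m[1]);

const dollars = findBetween('$foo$bar$baz$foo$bar$baz$wtf', /\$(\w+)\$/g);
const stars = findBetween('*foo*bar*baz*foo*bar*baz*wtf', /\*(\w+)\*/g);

console.log('delimiters($)', JSON.stringify(dollars)); // ["foo","baz","bar"]
console.log('delimiters(*)', JSON.stringify(stars));   // ["foo","baz","bar"]
```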

Get first letter of each word in a string, in JavaScript

How would you go about collecting the first letter of each word in a string, so as to get an abbreviation?
Input: "Java Script Object Notation"
Output: "JSON"
I think what you're looking for is the acronym of a supplied string.
var str = "Java Script Object Notation";
var matches = str.match(/\b(\w)/g); // ['J','S','O','N']
var acronym = matches.join(''); // JSON
console.log(acronym)
Note: this will fail for hyphenated words and words with apostrophes: "Help-me I'm Dieing" becomes "HmImD". If that's not what you want, splitting on spaces and grabbing the first letter of each word might be a better fit.
Here's a quick example of that:
let str = "Java Script Object Notation";
let acronym = str.split(/\s/).reduce((response,word)=> response+=word.slice(0,1),'')
console.log(acronym);
I think you can do this with
'Aa Bb'.match(/\b\w/g).join('')
Explanation: obtain all (/g) the alphanumeric characters (\w) that occur right after a word boundary (\b), i.e. after a non-alphanumeric character or at the start of the string; .match() puts them in an array and .join('') concatenates them into a single string.
Depending on what you want to do you can also consider simply selecting all the uppercase characters:
'JavaScript Object Notation'.match(/[A-Z]/g).join('')
Easiest way without regex
var abbr = "Java Script Object Notation".split(' ').map(function(item){return item[0]}).join('');
This is made very simple with ES6
string.split(' ').map(i => i.charAt(0)) // keep the case of each letter
string.split(' ').map(i => i.charAt(0).toUpperCase()) // uppercase each letter
string.split(' ').map(i => i.charAt(0).toLowerCase()) // lowercase each letter
This ONLY works with spaces, or with whatever separator is passed to .split(' '),
e.g. .split(', '), .split('; '), etc.
string.split(' ').map(i => i.charAt(0)).toString().toUpperCase().split(',')
To add to the great examples, you could do it like this in ES6
const x = "Java Script Object Notation".split(' ').map(x => x[0]).join('');
console.log(x); // JSON
and this works too but please ignore it, I went a bit nuts here :-)
const [j,s,o,n] = "Java Script Object Notation".split(' ').map(x => x[0]);
console.log(`${j}${s}${o}${n}`);
@BotNet flaw:
I think I solved it after an excruciating 3 days of regular expression tutorials. With an input such as
==> I'm a an animal
the word boundary version also catches the m of I'm; the following pattern seems to work for me:
/(\s|^)([a-z])/gi
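A short sketch of how that pattern might be used, compared with the \b\w version. The function names are invented here, and the second capture group is what holds the letter.

```javascript
// Hypothetical helper: group 2 is the letter that follows whitespace
// or the start of the string.
const initialsOf = str =>
  Array.from(str.matchAll(/(\s|^)([a-z])/gi), m => m[2]).join('');

// The word-boundary version, for comparison.
const initialsWithBoundary = str => (str.match(/\b\w/g) || []).join('');

console.log(initialsOf("I'm a an animal"));           // "Iaaa"
console.log(initialsWithBoundary("I'm a an animal")); // "Imaaa"
```

Note that [a-z] with the i flag only covers ASCII letters, so words starting with a digit or an accented letter are skipped by this sketch.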
Try -
var text = '';
var arr = "Java Script Object Notation".split(' ');
for(var i=0;i<arr.length;i++) {
text += arr[i].substr(0,1)
}
alert(text);
Demo - http://jsfiddle.net/r2maQ/
Using map (from functional programming)
'use strict';
function acronym(words)
{
if (!words) { return ''; }
var first_letter = function(x){ if (x) { return x[0]; } else { return ''; }};
return words.split(' ').map(first_letter).join('');
}
Alternative 1:
you can also use this regex to return an array of the first letter of every word
/(?<=(\s|^))[a-z]/gi
(?<=(\s|^)) is a positive lookbehind, which makes sure the character matched by our pattern is preceded by (\s|^), i.e. by whitespace or the start of the string.
so, for your case:
// in case the input is lowercase & there's a word with apostrophe
const toAbbr = (str) => {
return str.match(/(?<=(\s|^))[a-z]/gi)
.join('')
.toUpperCase();
};
toAbbr("java script object notation"); //result JSON
(By the way, there are also negative lookbehinds, positive lookaheads, and negative lookaheads, if you want to learn more.)
Alternative 2:
Match all the words and use the replace() method to replace each one with its first letter, dropping the spaces (replace() does not mutate the original string).
// in case the input is lowercase & there's a word with apostrophe
const toAbbr = (str) => {
return str.replace(/(\S+)(\s*)/gi, (match, p1, p2) => p1[0].toUpperCase());
};
toAbbr("java script object notation"); //result JSON
// word = not space = \S+ = p1 (p1 is the first pattern)
// space = \s* = p2 (p2 is the second pattern)
It's important to trim the word before splitting it, otherwise, we'd lose some letters.
const getWordInitials = (word: string): string => {
const bits = word.trim().split(' ');
return bits
.map((bit) => bit.charAt(0))
.join('')
.toUpperCase();
};
$ getWordInitials("Java Script Object Notation")
$ "JSON"
How about this:
var str = "", abbr = "";
str = "Java Script Object Notation";
str = str.split(' ');
for (var i = 0; i < str.length; i++) {
abbr += str[i].substr(0,1);
}
alert(abbr);
If you came here looking for a way to do this that supports non-BMP characters (which use surrogate pairs):
initials = str.split(' ')
.filter(Boolean) // skip empty strings produced by repeated spaces
.map(s => String.fromCodePoint(s.codePointAt(0)).toUpperCase())
.join('');
Works in all modern browsers with no polyfills (not IE though)
Getting first letter of any Unicode word in JavaScript is now easy with the ECMAScript 2018 standard:
/(?<!\p{L}\p{M}*)\p{L}/gu
This regex finds any Unicode letter (see the last \p{L}) that is not preceded by any other letter that can optionally have diacritic symbols (see the (?<!\p{L}\p{M}*) negative lookbehind, where \p{M}* matches 0 or more diacritic chars). Note that the u flag is compulsory here for the Unicode property classes (like \p{L}) to work correctly.
To emulate a fully Unicode-aware \b, you'd need to add a digit matching pattern and connector punctuation:
/(?<!\p{L}\p{M}*|[\p{N}\p{Pc}])\p{L}/gu
It works in Chrome, Firefox (since June 30, 2020), Node.js, and the majority of other environments (see the compatibility matrix here), for any natural language including Arabic.
Quick test:
const regex = /(?<!\p{L}\p{M}*)\p{L}/gu;
const string = "Żerard Łyżwiński";
// Extracting
console.log(string.match(regex)); // => [ "Ż", "Ł" ]
// Extracting and concatenating into string
console.log(string.match(regex).join("")) // => ŻŁ
// Removing
console.log(string.replace(regex, "")) // => erard yżwiński
// Enclosing (wrapping) with a tag
console.log(string.replace(regex, "<span>$&</span>")) // => <span>Ż</span>erard <span>Ł</span>yżwiński
console.log("_Łukasz 1Żukowski".match(/(?<!\p{L}\p{M}*|[\p{N}\p{Pc}])\p{L}/gu)); // => null
In ES6:
function getFirstCharacters(str) {
let result = [];
str.split(' ').map(word => word.charAt(0) != '' ? result.push(word.charAt(0)) : '');
return result;
}
const str1 = "Hello4 World65 123 !!";
const str2 = "123and 456 and 78-1";
const str3 = " Hello World !!";
console.log(getFirstCharacters(str1));
console.log(getFirstCharacters(str2));
console.log(getFirstCharacters(str3));
Output:
[ 'H', 'W', '1', '!' ]
[ '1', '4', 'a', '7' ]
[ 'H', 'W', '!' ]
This should do it.
var s = "Java Script Object Notation",
a = s.split(' '),
l = a.length,
i = 0,
n = "";
for (; i < l; ++i)
{
n += a[i].charAt(0);
}
console.log(n);
Regular expressions in JavaScript versions older than ECMAScript 6 are not Unicode-aware, so those who need to support characters such as "å" will have to rely on the non-regex versions of these scripts.
Even on version 6, you need to opt in to Unicode mode with the u flag.
More details: https://mathiasbynens.be/notes/es6-unicode-regex
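A few quick checks that illustrate the point, as a sketch. The sample characters are arbitrary choices, and \p{L} additionally needs ECMAScript 2018 support.

```javascript
const aRing = "\u00E5";      // "å" as a single precomposed code point
const emoji = "\u{1F600}";   // a non-BMP character (a surrogate pair in UTF-16)

// \w only covers ASCII word characters, with or without the u flag.
const wordMatches = /^\w$/.test(aRing);

// Without u, "." matches one UTF-16 code unit, so a surrogate pair
// is seen as two characters.
const dotWithoutU = /^.$/.test(emoji);
const dotWithU = /^.$/u.test(emoji);

// Unicode property escapes such as \p{L} need the u flag.
const letterMatches = /^\p{L}$/u.test(aRing);

console.log(wordMatches, dotWithoutU, dotWithU, letterMatches);
// false false true true
```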
Yet another option using reduce function:
var value = "Java Script Object Notation";
var result = value.split(' ').reduce(function(previous, current){
return {v : previous.v + current[0]};
},{v:""});
$("#output").text(result.v);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<pre id="output"/>
This is similar to others, but (IMHO) a tad easier to read:
const getAcronym = title =>
title.split(' ')
.map(word => word[0])
.join('');
ES6 reduce way:
const initials = inputStr.split(' ').reduce((result, currentWord) =>
result + currentWord.charAt(0).toUpperCase(), '');
alert(initials);
Try This Function
const createUserName = function (name) {
const username = name
.toLowerCase()
.split(' ')
.map((elem) => elem[0])
.join('');
return username;
};
console.log(createUserName('Anisul Haque Bhuiyan'));
