How to split() a string with brackets in JavaScript

I just want to turn
str = "a(bcde(dw)d)e"
into
arr = ["a", "(bcde)", "(dw)", "(d)", "e"]
What regex can I use in str.split()?
PS: Explanations and helpful links are welcome.
Examples:
s: "a(bcdefghijkl(mno)p)q" --> [ 'a', '(bcdefghijkl)', '(mno)', '(p)', 'q' ]
s: "abc(cba)ab(bac)c" --> [ 'abc', '(cba)', 'ab', '(bac)', 'c' ]

Go through each parenthesis using a counter:
let array = [], c = 0;
'abc(cba)ab(bac)c'.split(/([()])/).filter(Boolean).forEach(e =>
// Increase / decrease counter and push desired values to an array
e == '(' ? c++ : e == ')' ? c-- : c > 0 ? array.push('(' + e + ')') : array.push(e)
);
console.log(array)

Edit
str = "a(bcde(dw)d)e"
// replace any `(alpha(` by `(alpha)(`
str1 = str.replace(/\(([^)]+)\(/g, '($1)(');
// replace any `)alpha)` by `)(alpha)`
str2 = str1.replace(/\)([^(]+)\)/g, ')($1)');
// prefix any opening parenthesis with #--# (just a character string unlikely to appear in the original string)
str3 = str2.replace(/\(/g, '#--#(');
// prefix any closing parenthesis with #--#
str4 = str3.replace(/\)/g, ')#--#');
// remove any double `#--#`
str5 = str4.replace(/(#--#)+/g, '#--#');
// split by invented character string
arr = str5.split('#--#');
console.log(arr);
Old wrong answer
str = "a(bcde(dw)d)e"
console.log(str.split(/[()]/));
This looks a little bit weird, but it works like this.
str is a string, which has a split method. This method can take a string or a regular expression as its argument. A string literal is delimited by " and a RegExp literal by /.
The brackets [] wrap a character class, which means any one of the characters inside. Inside it we have the two parentheses ( and ), which are the two characters we are looking for.
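As a minimal sketch of what the character class does here, the snippet below compares the plain split with a variant that wraps the class in a capturing group (as the counter-based answer above does). The variable names are only illustrative.

```javascript
// Compare splitting on a character class with and without a capturing group.
const str = "a(bcde(dw)d)e";

// Without a capturing group the parentheses are discarded.
const plain = str.split(/[()]/);
console.log(plain); // [ 'a', 'bcde', 'dw', 'd', 'e' ]

// With a capturing group the matched delimiters are kept in the result.
const withDelimiters = str.split(/([()])/);
console.log(withDelimiters);
// [ 'a', '(', 'bcde', '(', 'dw', ')', 'd', ')', 'e' ]
```

Neither result contains the nesting information, which is why the other answers track the depth separately.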

I don't think the result you want is possible without modifying the values of the array after the split. But if you want to be able to split the string based on 2 symbols (in this case the brackets '(' and ')') you can do this:
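var arr = str.split("(").join(")").split(")");
It returns an array with the "words" of the string, without the brackets: the join turns every "(" into ")" so that the second split sees a single delimiter.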
I hope I could help.

Given that the desired output includes characters that aren't in the string, e.g., adding closing or opening parentheses to the substrings in the outer part of the nested parentheses, it will be necessary to make some changes to the individual substrings after they are extracted one way or another.
Maybe something like this:
function getGroups(str) {
var groups = str.match(/(?:^|[()])[^()]+/g)
if (!groups) return []
var parenLevel = 0
return groups.map(function(v) {
if (v[0] === "(") {
parenLevel++
} else if (v[0] === ")") {
parenLevel--
}
v = v.replace(/[()]/,"")
return parenLevel > 0 ? "(" + v + ")" : v
})
}
console.log(JSON.stringify( getGroups("a(bcde(dw)d)e") ))
console.log(JSON.stringify( getGroups("abc(cba)ab(bac)c") ))
console.log(JSON.stringify( getGroups("ab(cd)ef(gh)") ))
console.log(JSON.stringify( getGroups("ab(cd)(e(f(gh)i))") ))
console.log(JSON.stringify( getGroups("(ab(c(d))ef(gh)i)") ))

Related

JavaScript RegExp - find all prefixes up to a certain character

I have a string which is composed of terms separated by slashes ('/'), for example:
ab/c/def
I want to find all the prefixes of this string up to an occurrence of a slash or end of string, i.e. for the above example I expect to get:
ab
ab/c
ab/c/def
I've tried a regex like this: /^(.*)[\/$]/, but it returns a single match - ab/c/ with the parenthesized result ab/c, accordingly.
EDIT :
I know this can be done quite easily using split, I am looking specifically for a solution using RegExp.
NO, you can't do that with a pure regex.
Why? Because you need substrings starting at one and the same location in the string, while regex matches non-overlapping chunks of text and then advances its index to search for another match.
OK, what about capturing groups? They are only helpful if you know how many /-separated chunks you have in the input string. You could then use
var s = 'ab/c/def'; // There are exactly 3 parts
console.log(/^(([^\/]+)\/[^\/]+)\/[^\/]+$/.exec(s));
// => [ "ab/c/def", "ab/c", "ab" ]
However, it is unlikely you know that many details about your input string.
You may use the following code rather than a regex:
var s = 'ab/c/def';
var chunks = s.split('/');
var res = [];
for(var i=0;i<chunks.length;i++) {
res.length > 0 ? res.push(chunks.slice(0,i).join('/')+'/'+chunks[i]) : res.push(chunks[i]);
}
console.log(res);
First, you can split the string with /. Then, iterate through the elements and build the res array.
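If a regular expression has to be involved, one option (not taken from the answers here) is to let the regex only locate the separators and then slice the string up to each match position. This is a sketch under that assumption; prefixesOf is a hypothetical helper name, and String.prototype.matchAll requires an ES2020 environment.

```javascript
// Collect every prefix that ends just before a "/" or at the end of the string.
// prefixesOf is a hypothetical helper name used only for this sketch.
function prefixesOf(s) {
  // /\/|$/g matches each slash and, finally, the empty string at the end.
  return Array.from(s.matchAll(/\/|$/g), (m) => s.slice(0, m.index));
}

console.log(prefixesOf("ab/c/def")); // [ 'ab', 'ab/c', 'ab/c/def' ]
```

The matches themselves still do not overlap; the prefixes come from slicing, not from the matched text.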
I do not think a regular expression is what you are after. A simple split and loop over the array can give you the result.
var str = "ab/c/def";
var result = str.split("/").reduce(function(a,s,i){
var last = a[i-1] ? a[i-1] + "/" : "";
a.push(last + s);
return a;
}, []);
console.log(result);
or another way
var str = "ab/c/def",
result = [],
parts=str.split("/");
while(parts.length){
console.log(parts);
result.unshift(parts.join("/"));
parts.pop();
}
console.log(result);
Plenty of other ways to do it.
You can't do it with a RegEx in javascript but you can split parts and join them respectively together:
var array = "ab/c/def".split('/'), newArray = [], key = 0;
while (value = array[key++]) {
newArray.push(key == 1 ? value : newArray[newArray.length - 1] + "/" + value)
}
console.log(newArray);
Maybe like this:
var str = "ab/c/def",
result = str.match(/.+?(?=\/|$)/g)
.map((e,i,a) => a[i-1] ? a[i] = a[i-1] + e : e);
console.log(result);
Couldn't you just split the string on the separator character?
var result = 'ab/c/def'.split(/\//g);

Unexpected behavior of regexp in JavaScript

I've encountered this weird behavior:
I'm on a breakpoint (variables don't change). At the console you can see, that each time I try to evaluate regexp methods on the same unchanging variable "text" I get these opposite responses. Is there an explanation for such thing?
The relevant code is here:
this.singleRe = /<\$([\s\S]*?)>/g;
while( this.singleRe.test( text ) ){
match = this.singleRe.exec( text );
result = "";
if( match ){
result = match[ 1 ].indexOf( "." ) != -1 ? eval( "obj." + match[ 1 ] ) : eval( "value." + match[ 1 ] );
}
text = text.replace( this.singleRe , result );
}
When you use a regex with exec() (or test()) and the global flag g, the regex keeps a cursor, its lastIndex property, which advances after each successful match, like here:
var re = /\w/g;
var s = 'Hello regex world!'
re.exec(s); // => ['H']
re.exec(s); // => ['e']
re.exec(s); // => ['l']
re.exec(s); // => ['l']
re.exec(s); // => ['o']
Note the g flag! It means that the regex will match multiple occurrences instead of one!
EDIT
I suggest using string.match(regex) instead of regex.exec(string) if possible. This will yield an array of occurrences, and it is easy to inspect the array or to iterate through it.
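The alternating results in the question are presumably the same effect seen through test(). A small sketch, using the regex from the question and a made-up sample string, shows how lastIndex changes between calls:

```javascript
// A global (g) regex remembers where the previous match ended in lastIndex.
const re = /<\$([\s\S]*?)>/g;
const text = "Hello <$name>!"; // sample input invented for this sketch

const first = re.test(text);           // true, lastIndex moves past "<$name>"
const indexAfterFirst = re.lastIndex;  // 13
const second = re.test(text);          // false: the search resumed after the only match
const indexAfterSecond = re.lastIndex; // 0: a failed match resets lastIndex
const third = re.test(text);           // true again

console.log(first, indexAfterFirst, second, indexAfterSecond, third);
// true 13 false 0 true
```

In the question's loop, this means exec() starts searching after the match that test() has just found.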

Get first letter of each word in a string, in JavaScript

How would you go about collecting the first letter of each word in a string, for example to build an abbreviation?
Input: "Java Script Object Notation"
Output: "JSON"
I think what you're looking for is the acronym of a supplied string.
var str = "Java Script Object Notation";
var matches = str.match(/\b(\w)/g); // ['J','S','O','N']
var acronym = matches.join(''); // JSON
console.log(acronym)
Note: this will fail for hyphenated words and words with apostrophes: "Help-me I'm Dieing" will become "HmImD". If that's not what you want, the approach of splitting on spaces and grabbing the first letter of each part might be what you want.
Here's a quick example of that:
let str = "Java Script Object Notation";
let acronym = str.split(/\s/).reduce((response,word)=> response+=word.slice(0,1),'')
console.log(acronym);
I think you can do this with
'Aa Bb'.match(/\b\w/g).join('')
Explanation: Obtain all /g the alphanumeric characters \w that occur after a non-alphanumeric character (i.e: after a word boundary \b), put them on an array with .match() and join everything in a single string .join('')
Depending on what you want to do you can also consider simply selecting all the uppercase characters:
'JavaScript Object Notation'.match(/[A-Z]/g).join('')
Easiest way without regex
var abbr = "Java Script Object Notation".split(' ').map(function(item){return item[0]}).join('');
This is made very simple with ES6
string.split(' ').map(i => i.charAt(0)).join('') //Inherit case of each letter
string.split(' ').map(i => i.charAt(0)).join('').toUpperCase() //Uppercase each letter
string.split(' ').map(i => i.charAt(0)).join('').toLowerCase() //lowercase each letter
This ONLY works with spaces or whatever is defined in the .split(' ') method
ie, .split(', ') .split('; '), etc.
string.split(' ') .map(i => i.charAt(0)) .toString() .toUpperCase().split(',')
To add to the great examples, you could do it like this in ES6
const x = "Java Script Object Notation".split(' ').map(x => x[0]).join('');
console.log(x); // JSON
and this works too but please ignore it, I went a bit nuts here :-)
const [j,s,o,n] = "Java Script Object Notation".split(' ').map(x => x[0]);
console.log(`${j}${s}${o}${n}`);
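@BotNet's flaw:
I think I solved it after an excruciating 3 days of regular expression tutorials. For an input such as
==> I'm a an animal
the word-boundary version also catches the m of I'm, because \b matches right after the apostrophe. Requiring whitespace or the start of the string before the letter seems to work for me:
/(\s|^)([a-z])/gi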
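The regex is given without a usage example, so here is one possible way to apply it, as a sketch: initialsOf is a hypothetical name, and String.prototype.matchAll assumes an ES2020 environment.

```javascript
// initialsOf is a hypothetical helper built around the regex above.
function initialsOf(phrase) {
  // Group 1 is the preceding whitespace (or the start); group 2 is the letter.
  return Array.from(phrase.matchAll(/(\s|^)([a-z])/gi), (m) => m[2]).join("");
}

console.log(initialsOf("I'm an animal"));                 // Iaa
// For comparison, the word-boundary version also picks up the "m" of "I'm".
console.log("I'm an animal".match(/\b\w/g).join(""));     // Imaa
```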
Try -
var text = '';
var arr = "Java Script Object Notation".split(' ');
for(i=0;i<arr.length;i++) {
text += arr[i].substr(0,1)
}
alert(text);
Demo - http://jsfiddle.net/r2maQ/
Using map (from functional programming)
'use strict';
function acronym(words)
{
if (!words) { return ''; }
var first_letter = function(x){ if (x) { return x[0]; } else { return ''; }};
return words.split(' ').map(first_letter).join('');
}
Alternative 1:
you can also use this regex to return an array of the first letter of every word
/(?<=(\s|^))[a-z]/gi
(?<=(\s|^)) is called positive lookbehind which make sure the element in our search pattern is preceded by (\s|^).
so, for your case:
// in case the input is lowercase & there's a word with apostrophe
const toAbbr = (str) => {
return str.match(/(?<=(\s|^))[a-z]/gi)
.join('')
.toUpperCase();
};
toAbbr("java script object notation"); //result JSON
(by the way, there are also negative lookbehind, positive lookahead, negative lookahead, if you want to learn more)
Alternative 2:
match all the words and use replace() method to replace them with the first letter of each word and ignore the space (the method will not mutate your original string)
// in case the input is lowercase & there's a word with apostrophe
const toAbbr = (str) => {
return str.replace(/(\S+)(\s*)/gi, (match, p1, p2) => p1[0].toUpperCase());
};
toAbbr("java script object notation"); //result JSON
// word = not space = \S+ = p1 (p1 is the first pattern)
// space = \s* = p2 (p2 is the second pattern)
It's important to trim the word before splitting it, otherwise, we'd lose some letters.
const getWordInitials = (word: string): string => {
const bits = word.trim().split(' ');
return bits
.map((bit) => bit.charAt(0))
.join('')
.toUpperCase();
};
$ getWordInitials("Java Script Object Notation")
$ "JSON"
How about this:
var str = "", abbr = "";
str = "Java Script Object Notation";
str = str.split(' ');
for (i = 0; i < str.length; i++) {
abbr += str[i].substr(0,1);
}
alert(abbr);
Working Example.
If you came here looking for how to do this that supports non-BMP characters that use surrogate pairs:
initials = str.split(' ')
// skip empty parts (e.g. from double spaces), which have no first code point
.map(s => s ? String.fromCodePoint(s.codePointAt(0)).toUpperCase() : '')
.join('');
Works in all modern browsers with no polyfills (not IE though)
Getting first letter of any Unicode word in JavaScript is now easy with the ECMAScript 2018 standard:
/(?<!\p{L}\p{M}*)\p{L}/gu
This regex finds any Unicode letter (see the last \p{L}) that is not preceded with any other letter that can optionally have diacritic symbols (see the (?<!\p{L}\p{M}*) negative lookbehind where \p{M}* matches 0 or more diacritic chars). Note that u flag is compulsory here for the Unicode property classes (like \p{L}) to work correctly.
To emulate a fully Unicode-aware \b, you'd need to add a digit matching pattern and connector punctuation:
/(?<!\p{L}\p{M}*|[\p{N}\p{Pc}])\p{L}/gu
It works in Chrome, Firefox (since June 30, 2020), Node.js, and the majority of other environments (see the compatibility matrix here), for any natural language including Arabic.
Quick test:
const regex = /(?<!\p{L}\p{M}*)\p{L}/gu;
const string = "Żerard Łyżwiński";
// Extracting
console.log(string.match(regex)); // => [ "Ż", "Ł" ]
// Extracting and concatenating into string
console.log(string.match(regex).join("")) // => ŻŁ
// Removing
console.log(string.replace(regex, "")) // => erard yżwiński
// Enclosing (wrapping) with a tag
console.log(string.replace(regex, "<span>$&</span>")) // => <span>Ż</span>erard <span>Ł</span>yżwiński
console.log("_Łukasz 1Żukowski".match(/(?<!\p{L}\p{M}*|[\p{N}\p{Pc}])\p{L}/gu)); // => null
In ES6:
function getFirstCharacters(str) {
let result = [];
str.split(' ').map(word => word.charAt(0) != '' ? result.push(word.charAt(0)) : '');
return result;
}
const str1 = "Hello4 World65 123 !!";
const str2 = "123and 456 and 78-1";
const str3 = " Hello World !!";
console.log(getFirstCharacters(str1));
console.log(getFirstCharacters(str2));
console.log(getFirstCharacters(str3));
Output:
[ 'H', 'W', '1', '!' ]
[ '1', '4', 'a', '7' ]
[ 'H', 'W', '!' ]
This should do it.
var s = "Java Script Object Notation",
a = s.split(' '),
l = a.length,
i = 0,
n = "";
for (; i < l; ++i)
{
n += a[i].charAt(0);
}
console.log(n);
The regular expression support in JavaScript is not Unicode-aware in versions older than ECMAScript 6, so those who want to support characters such as "å" will need to rely on the non-regex versions of the scripts.
Even on version 6, you need to opt in to Unicode mode with the u flag.
More details: https://mathiasbynens.be/notes/es6-unicode-regex
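A short sketch of what the u flag changes, and what it does not. The sample strings are chosen only for illustration.

```javascript
// One code point outside the BMP, stored as two UTF-16 code units.
const emoji = "\u{1F600}";

const withoutU = /^.$/.test(emoji); // false: "." sees two separate code units
const withU = /^.$/u.test(emoji);   // true: "." matches the whole code point

// \w stays ASCII-only even in Unicode mode, so "å" is still not a word character.
const wordChar = /\w/u.test("å");   // false

console.log(withoutU, withU, wordChar); // false true false
```

So the u flag alone does not make \b\w pick up letters such as "å"; that needs a non-regex approach or the Unicode property escapes shown in the ECMAScript 2018 answer.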
Yet another option using reduce function:
var value = "Java Script Object Notation";
var result = value.split(' ').reduce(function(previous, current){
return {v : previous.v + current[0]};
},{v:""});
$("#output").text(result.v);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<pre id="output"/>
This is similar to others, but (IMHO) a tad easier to read:
const getAcronym = title =>
title.split(' ')
.map(word => word[0])
.join('');
ES6 reduce way:
const initials = inputStr.split(' ').reduce((result, currentWord) =>
result + currentWord.charAt(0).toUpperCase(), '');
alert(initials);
Try This Function
const createUserName = function (name) {
const username = name
.toLowerCase()
.split(' ')
.map((elem) => elem[0])
.join('');
return username;
};
console.log(createUserName('Anisul Haque Bhuiyan'));

Remove all dots except the first one from a string

Given a string
'1.2.3.4.5'
I would like to get this output
'1.2345'
(In case there are no dots in the string, the string should be returned unchanged.)
I wrote this
function process( input ) {
var index = input.indexOf( '.' );
if ( index > -1 ) {
input = input.substr( 0, index + 1 ) +
input.slice( index ).replace( /\./g, '' );
}
return input;
}
Live demo: http://jsfiddle.net/EDTNK/1/
It works but I was hoping for a slightly more elegant solution...
There is a pretty short solution (assuming input is your string):
var output = input.split('.');
output = output.shift() + '.' + output.join('');
If input is "1.2.3.4", then output will be equal to "1.234".
See this jsfiddle for a proof. Of course you can enclose it in a function, if you find it necessary.
EDIT:
Taking into account your additional requirement (to not modify the output if there is no dot found), the solution could look like this:
var output = input.split('.');
output = output.shift() + (output.length ? '.' + output.join('') : '');
which will leave eg. "1234" (no dot found) unchanged. See this jsfiddle for updated code.
It would be a lot easier with reg exp if browsers supported look behinds.
One way with a regular expression:
function process( str ) {
return str.replace( /^([^.]*\.)(.*)$/, function ( a, b, c ) {
return b + c.replace( /\./g, '' );
});
}
You can try something like this:
str = str.replace(/\./,"#").replace(/\./g,"").replace(/#/,".");
But you have to be sure that the character # is not used in the string; or replace it accordingly.
Or this, without the above limitation:
str = str.replace(/^(.*?\.)(.*)$/, function($0, $1, $2) {
return $1 + $2.replace(/\./g,"");
});
You could also do something like this. I don't know if this is "simpler", but it uses just indexOf, replace and substr.
var str = "7.8.9.2.3";
var strBak = str;
var firstDot = str.indexOf(".");
str = str.replace(/\./g,"");
str = str.substr(0,firstDot)+"."+str.substr(firstDot);
document.write(str);
Shai.
Here is another approach:
function process(input) {
var n = 0;
return input.replace(/\./g, function() { return n++ > 0 ? '' : '.'; });
}
But one could say that this is based on side effects and therefore not really elegant.
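A variant without the external counter, as a sketch: it relies on the offset argument that replace() passes to its callback. removeExtraDots is a hypothetical name.

```javascript
// removeExtraDots is a hypothetical name; it keeps only the first dot.
function removeExtraDots(input) {
  const firstDot = input.indexOf(".");
  // With no capture groups, the callback receives (match, offset, wholeString).
  return input.replace(/\./g, (match, offset) => (offset === firstDot ? "." : ""));
}

console.log(removeExtraDots("1.2.3.4.5")); // 1.2345
console.log(removeExtraDots("12345"));     // 12345
```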
This isn't necessarily more elegant, but it's another way to skin the cat:
var process = function (input) {
var output = input;
if (typeof input === 'string' && input !== '') {
input = input.split('.');
if (input.length > 1) {
output = [input.shift(), input.join('')].join('.');
}
}
return output;
};
Not sure what is supposed to happen if "." is the first character. I'd check for -1 in indexOf; also, if you use substr once, you might as well use it twice.
if ( index != -1 ) {
input = input.substr( 0, index + 1 ) + input.substr(index + 1).replace( /\./g, '' );
}
var i = s.indexOf(".");
var result = s.substr(0, i+1) + s.substr(i+1).replace(/\./g, "");
Somewhat tricky. Works using the fact that indexOf returns -1 if the item is not found.
Trying to keep this as short and readable as possible, you can do the following:
JavaScript
var match = string.match(/^[^.]*\.|[^.]+/g);
string = match ? match.join('') : string;
Requires a second line of code, because if match() returns null, we'll get an exception trying to call join() on null. (Improvements welcome.)
Objective-J / Cappuccino (superset of JavaScript)
string = [string.match(/^[^.]*\.|[^.]+/g) componentsJoinedByString:''] || string;
Can do it in a single line, because its selectors (such as componentsJoinedByString:) simply return null when sent to a null value, rather than throwing an exception.
As for the regular expression, I'm matching all substrings consisting of either (a) the start of the string + any potential number of non-dot characters + a dot, or (b) any existing number of non-dot characters. When we join all matches back together, we have essentially removed any dot except the first.
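To make the two alternatives concrete, this sketch prints the intermediate match array and wraps the null check in one expression. keepFirstDot is a hypothetical name, and the fallback to [s] is one possible answer to the "improvements welcome" remark rather than the author's own code.

```javascript
const pattern = /^[^.]*\.|[^.]+/g;

// The first alternative can only match at the start; the rest are dot-free runs.
console.log("1.2.3.4.5".match(pattern)); // [ '1.', '2', '3', '4', '5' ]

// keepFirstDot is a hypothetical wrapper: fall back to [s] when match() returns null.
function keepFirstDot(s) {
  return (s.match(pattern) || [s]).join("");
}

console.log(keepFirstDot("1.2.3.4.5")); // 1.2345
console.log(keepFirstDot("1234"));      // 1234
```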
var input = '14.1.2';
reversed = input.split("").reverse().join("");
reversed = reversed.replace(/\.(?=.*\.)/g, '');
input = reversed.split("").reverse().join("");
Based on @Tadek's answer above. This function takes other locales into consideration.
For example, some locales will use a comma for the decimal separator and a period for the thousand separator (e.g. -451.161,432e-12).
First we convert anything other than 1) numbers; 2) negative sign; 3) exponent sign into a period ("-451.161.432e-12").
Next we split by period (["-451", "161", "432e-12"]) and pop out the right-most value ("432e-12"), then join with the rest ("-451161.432e-12")
(Note that I'm tossing out the thousand separators, but those could easily be added in the join step with .join(',').)
var ensureDecimalSeparatorIsPeriod = function (value) {
var numericString = value.toString();
var splitByDecimal = numericString.replace(/[^\d.e-]/g, '.').split('.');
if (splitByDecimal.length < 2) {
return numericString;
}
var rightOfDecimalPlace = splitByDecimal.pop();
return splitByDecimal.join('') + '.' + rightOfDecimalPlace;
};
let str = "12.1223....1322311..";
let finStr = str.replace(/(\d*.)(.*)/, '$1') + str.replace(/(\d*.)(.*)/, '$2').replace(/\./g,'');
console.log(finStr)
const [integer, ...decimals] = '233.423.3.32.23.244.14...23'.split('.');
const result = [integer, decimals.join('')].join('.')
Same solution offered but using the spread operator.
It's a matter of opinion but I think it improves readability.

Javascript split only once and ignore the rest

I am parsing some key value pairs that are separated by colons. The problem I am having is that in the value section there are colons that I want to ignore but the split function is picking them up anyway.
sample:
Name: my name
description: this string is not escaped: i hate these colons
date: a date
On the individual lines I tried line.split(/:/, 1), but it only returned the key part of the data. Next I tried line.split(/:/, 2), but that gave me ['description', 'this string is not escaped'], and I need the whole string.
Thanks for the help!
a = line.split(/:/);
key = a.shift();
val = a.join(':');
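Use a capturing group to split only at the first instance. The greedy (.+) captures the rest of the line after the first ": ", split() includes captured text in its result, and the limit of 2 drops the trailing empty string.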
line.split(/: (.+)?/, 2);
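A quick sketch of what that split returns with and without the limit, using the description line from the question:

```javascript
const line = "description: this string is not escaped: i hate these colons";

// Without a limit: key, captured remainder, and a trailing empty string.
const unlimited = line.split(/: (.+)?/);
console.log(unlimited);
// [ 'description', 'this string is not escaped: i hate these colons', '' ]

// With a limit of 2 only the key and the captured remainder are kept.
const limited = line.split(/: (.+)?/, 2);
console.log(limited);
// [ 'description', 'this string is not escaped: i hate these colons' ]
```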
If you prefer an alternative to regexp consider this:
var split = line.split(':');
var key = split[0];
var val = split.slice(1).join(":");
Reference: split, slice, join.
Slightly more elegant:
a = line.match(/(.*?):(.*)/);
key = a[1];
val = a[2];
Maybe this approach is the best for this purpose:
var a = line.match(/([^:\s]+)\s*:\s*(.*)/);
var key = a[1];
var val = a[2];
So, you can use tabulations in your config/data files of such structure and also not worry about spaces before or after your name-value delimiter ':'.
Or you can use primitive and fast string functions indexOf and substr to reach your goal in, I think, the fastest way (by CPU and RAM)
for ( ... line ... ) {
var delimPos = line.indexOf(':');
if (delimPos <= 0) {
continue; // Something wrong with this "line"
}
var key = line.substr(0, delimPos).trim();
var val = line.substr(delimPos + 1).trim();
// Do all you need with this key: val
}
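The loop above is only an outline. A runnable sketch of the same indexOf idea, applied to the sample from the question, might look like this; parsePairs and the sample variable are hypothetical names, and slice is used in place of substr.

```javascript
// parsePairs is a hypothetical helper: it parses "key: value" lines into an object.
function parsePairs(text) {
  const result = {};
  for (const line of text.split("\n")) {
    const delimPos = line.indexOf(":");
    if (delimPos <= 0) {
      continue; // no colon at all, or nothing before it
    }
    const key = line.slice(0, delimPos).trim();
    const val = line.slice(delimPos + 1).trim(); // later colons stay in the value
    result[key] = val;
  }
  return result;
}

const sample = [
  "Name: my name",
  "description: this string is not escaped: i hate these colons",
  "date: a date",
].join("\n");

console.log(parsePairs(sample));
```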
Split string in two at first occurrence
To split a string that contains multiple colons (:) only at the first colon,
use a positive lookbehind (?<=):
const a = "Description: this: is: nice";
const b = "Name: My Name";
console.log(a.split(/(?<=^[^:]*):/)); // ["Description", " this: is: nice"]
console.log(b.split(/(?<=^[^:]*):/)); // ["Name", " My Name"]
It basically consumes, from the start of the string ^, everything that is not a colon [^:], zero or more times *. Once the positive lookbehind has matched, the regex finally matches the colon :.
If you additionally want to remove any spaces following the colon,
use /(?<=^[^:]*): */
Explanation on Regex101.com
function splitOnce(str, sep) {
const idx = str.indexOf(sep);
if (idx === -1) return [str]; // separator not found: nothing to split
return [str.slice(0, idx), str.slice(idx + sep.length)];
}
splitOnce("description: this string is not escaped: i hate these colons", ":")
