Split string by all spaces except those in parentheses - javascript

I'm trying to split text like the following on spaces:
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}"
but I want it to ignore the spaces within parentheses. This should produce an array with:
var words = ["Text", "(what is)|what's", "a", "story|fable", "called|named|about", "{Search}|{Title}"];
I know this should involve some sort of regex with line.match(). Bonus points if the regex removes the parentheses. I know that word.replace() would get rid of them in a subsequent step.

Use the following approach with a specific regex pattern (based on negative lookahead assertions):
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}",
words = line.split(/(?!\(.*)\s(?![^(]*?\))/g);
console.log(words);
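(?!\(.*) is a negative lookahead placed right before \s. At that position the next character is the whitespace itself, so it can never see a ( there and never prevents a split; despite its intent, it does not check what precedes the separator.
(?![^(]*?\)) ensures that the separator \s is not followed by a closing ) with no opening ( in between, i.e. that the space is not inside parentheses.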
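A minimal sketch of the same idea, assuming parentheses are balanced and never nested. The first lookahead sits at the same position as \s, where the next character is the whitespace itself, so it always succeeds and is omitted here; \s+ is used instead of \s so that runs of spaces do not produce empty strings. splitOutsideParens is a hypothetical helper name.

```javascript
// Split on whitespace that is NOT inside (...).
// Assumes parentheses are balanced and not nested.
function splitOutsideParens(text) {
  // (?![^(]*\)) fails when a ")" appears before any "(" to the right,
  // i.e. when the whitespace sits inside an open parenthesis.
  return text.split(/\s+(?![^(]*\))/);
}

const line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}";
console.log(JSON.stringify(splitOutsideParens(line)));
// ["Text","(what is)|what's","a","story|fable","called|named|about","{Search}|{Title}"]
```

The check only looks to the right of each space, so nested parentheses such as (a (b c) d) would need a small parser instead of a regex.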

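Not a single regexp, but short: it removes the parentheses and then splits the text on spaces. Note that because the parentheses are gone before the split, the space inside (what is) is split on as well, so this yields "what" and "is|what's" as separate items rather than "what is|what's".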
var words = line.replace(/[\(\)]/g,'').split(" ");

One approach which is useful in some cases is to replace spaces inside parens with a placeholder (a string that never occurs in the text), then split, then unreplace:
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}";
var result = line.replace(/\((.*?)\)/g, m => m.replace(/ /g, 'SPACE'))
.split(' ')
.map(x => x.replace(/SPACE/g, ' '));
console.log(result);
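For the "bonus" part of the question, a hedged alternative is to match tokens instead of splitting: each token is a run of pieces, where a piece is either a whole (...) group or a single non-space character, and the parentheses are stripped afterwards. This assumes balanced, non-nested parentheses; tokensWithoutParens is a hypothetical name.

```javascript
// Match tokens instead of splitting: a token is one or more pieces, where a
// piece is either a whole "(...)" group (spaces allowed inside) or one
// non-space character. Assumes balanced, non-nested parentheses.
function tokensWithoutParens(text) {
  const tokens = text.match(/(?:\([^)]*\)|\S)+/g) || []; // match() returns null when nothing matches
  // Strip the parentheses afterwards (the "bonus" in the question).
  return tokens.map((t) => t.replace(/[()]/g, ""));
}

const line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}";
console.log(JSON.stringify(tokensWithoutParens(line)));
// ["Text","what is|what's","a","story|fable","called|named|about","{Search}|{Title}"]
```

An unmatched ( simply falls through to the \S alternative, so it stays part of an ordinary token instead of swallowing the rest of the line.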

Related

Regex match apostrophe inside, but not around words, inside a character set

I'm counting how many times different words appear in a text using Regular Expressions in JavaScript. My problem is when I have quoted words: 'word' should be counted simply as word (without the quotes, otherwise they'll behave as two different words), while it's should be counted as a whole word.
(?<=\w)(')(?=\w)
This regex can identify apostrophes inside, but not around words. Problem is, I can't use it inside a character set such as [\w]+.
(?<=\w)(')(?=\w)|[\w]+
Will count it's a 'miracle' of nature as 7 words, instead of 5 (it, ', s becoming 3 different words). Also, the third word should be selected simply as miracle, and not as 'miracle'.
To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.
How can I accomplish that?
1) You can simply use the /[^\s]+/g regex:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g);
console.log(result.length);
console.log(result);
2) If you are calculating the total number of words in a string, you can also use split:
const str = `it's a 'miracle' of nature`;
const result = str.split(/\s+/);
console.log(result.length);
console.log(result);
3) If you want each word without a quote at the start and at the end, you can do:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g).map((s) => {
s = s[0] === "'" ? s.slice(1) : s;
s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
return s;
});
console.log(result.length);
console.log(result);
You might use an alternation with 2 capture groups, and then check for the values of those groups.
(?<!\S)'(\S+)'(?!\S)|(\S+)
(?<!\S)' Negative lookbehind, assert a whitespace boundary to the left and match '
(\S+) Capture group 1, match 1+ non whitespace chars
'(?!\S) Match ' and assert a whitespace boundary to the right
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
const s = "it's a 'miracle' of nature";
Array.from(s.matchAll(regex), m => {
if (m[1]) console.log(m[1])
if (m[2]) console.log(m[2])
});
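An alternative sketch (not taken from the answers above) for the counting part: define a word as a run of letters optionally joined by single inner apostrophes, so it's stays whole and the quotes around 'miracle' are never matched. The letter class is the one from the question; only the straight apostrophe ' is handled, so a typographic ’ would need adding to the pattern. countWords is a hypothetical helper and counting is case-sensitive.

```javascript
// A word is a run of letters, optionally joined by single inner apostrophes.
// Letter class taken from the question; only the straight apostrophe is handled.
const LETTERS = "A-Za-zÀ-ÖØ-öø-ÿ";
const wordRx = new RegExp(`[${LETTERS}]+(?:'[${LETTERS}]+)*`, "g");

// Hypothetical helper: returns a Map of word -> occurrence count (case-sensitive).
function countWords(text) {
  const counts = new Map();
  for (const [word] of text.matchAll(wordRx)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const counts = countWords("it's a 'miracle' of nature, a miracle");
console.log(JSON.stringify([...counts]));
// [["it's",1],["a",2],["miracle",2],["of",1],["nature",1]]
```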

Regex works only with a single words not separated by whitespaces

I have this regex that looks for a digit and a letter in a word with a minimum length of 4:
^(?=.*\d)(?=.*[a-zA-Z])[a-zA-Z0-9]{4,}$
It works for:
ABCD1
but if I have multiple words like:
ABCD1 ABCD2
it stops working because the whitespace breaks the regex :/
How can I improve my regex to capture all the words separated by spaces?
Demo : https://regex101.com/r/S3APfJ/1
You could use match() on the input to find all matches:
var input = "1234 ABCD1 ABCD2 ABCDE";
var matches = input.match(/\b(?=\S*\d)(?=\S*[a-zA-Z])[a-zA-Z0-9]{4,}\b/g);
console.log(matches);
You can use
text.split(/\s+/).filter(x => /YOUR_VALIDATION_REGEX/.test(x))
NOTE:
.split(/\s+/) - splits the string on whitespace
.filter(x => /^(?=.*\d)(?=.*[a-zA-Z])[a-zA-Z0-9]{4,}$/.test(x)) - keeps an item only if it matches your initial regex.
See a JavaScript demo:
const text = "this is a phrase with ABC1 and ABCD2 but no ABC1!";
const rx = /^(?=.*\d)(?=.*[a-zA-Z])[a-zA-Z0-9]{4,}$/;
console.log(text.split(/\s+/).filter(x => rx.test(x)));
The performance of /^(?=.*\d)(?=.*[a-zA-Z])[a-zA-Z0-9]{4,}$/ can be improved by using
/^(?=\D*\d)(?=[^a-zA-Z]*[a-zA-Z])[a-zA-Z0-9]{4,}$/
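A sketch combining the split/filter answer with the faster lookaheads, under the assumption that "words" means whitespace-separated tokens. mixedTokens and TOKEN_RX are hypothetical names; TOKEN_RX has no g flag, so calling test() on it repeatedly carries no lastIndex state. The last lines contrast it with the \b-based match() answer, whose \S* lookaheads can look past a non-alphanumeric character.

```javascript
// Keep whitespace-separated tokens that are 4+ ASCII letters/digits and
// contain at least one digit and at least one letter.
const TOKEN_RX = /^(?=\D*\d)(?=[^a-zA-Z]*[a-zA-Z])[a-zA-Z0-9]{4,}$/;

function mixedTokens(text) {
  // trim() avoids an empty first token when the text starts with whitespace.
  return text.trim().split(/\s+/).filter((tok) => TOKEN_RX.test(tok));
}

console.log(JSON.stringify(mixedTokens("  1234 ABCD1 ABCD2 ABCDE "))); // ["ABCD1","ABCD2"]

// Contrast: in the \b-based match(), \S* can see the "1" after "-",
// so "ABCD-1" yields "ABCD" even though "ABCD" has no digit.
const viaMatch = "ABCD-1 ABCD1".match(/\b(?=\S*\d)(?=\S*[a-zA-Z])[a-zA-Z0-9]{4,}\b/g);
console.log(JSON.stringify(viaMatch));                     // ["ABCD","ABCD1"]
console.log(JSON.stringify(mixedTokens("ABCD-1 ABCD1")));  // ["ABCD1"]
```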

jQuery autocomplete RegExp for highlight words [duplicate]

I have this function that finds whole words and should replace them. It identifies the surrounding spaces but should not replace them, i.e., it should not consume them.
function asd (sentence, word) {
str = sentence.replace(new RegExp('(?:^|\\s)' + word + '(?:$|\\s)'), "*****");
return str;
};
Then I have the following strings:
var sentence = "ich mag Äpfel";
var word = "Äpfel";
The result should be something like:
"ich mag *****"
and NOT:
"ich mag*****"
I'm getting the latter.
How can I make it so that it identifies the space but ignores it when replacing the word?
At first this may seem like a duplicate, but I did not find an answer to this question, which is why I'm asking it.
Thank you
You should put back the matched whitespaces by using a capturing group (rather than a non-capturing one) with a replacement backreference in the replacement pattern, and you may also leverage a lookahead for the right whitespace boundary, which is handy in case of consecutive matches:
function asd (sentence, word) {
var str = sentence.replace(new RegExp('(^|\\s)' + word + '(?=$|\\s)'), "$1*****");
return str;
};
var sentence = "ich mag Äpfel";
var word = "Äpfel";
console.log(asd(sentence, word));
Details
(^|\s) - Group 1 (later referred to with the help of a $1 placeholder in the replacement pattern): a capturing group that matches either start of string or a whitespace
Äpfel - a search word
(?=$|\s) - a positive lookahead that requires the end of string or whitespace immediately to the right of the current location.
NOTE: If the word can contain special regex metacharacters, escape them:
function asd (sentence, word) {
var str = sentence.replace(new RegExp('(^|\\s)' + word.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + '(?=$|\\s)'), "$1*****");
return str;
};
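A sketch of the same fix, assuming an engine with lookbehind support (ES2018 or later, e.g. current Node.js and browsers). (?<!\S) and (?!\S) assert a whitespace-or-string boundary on each side without consuming anything, so no $1 is needed, and the g flag masks every occurrence. \b is avoided because by default it treats only [A-Za-z0-9_] as word characters, so /\bÄpfel\b/ cannot match before Ä. maskWord and escapeRegExp are hypothetical names.

```javascript
// Escape regex metacharacters (same character set as the answer above).
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Hypothetical helper: replace every whole-word occurrence of `word` with `mask`,
// leaving the surrounding whitespace untouched.
function maskWord(sentence, word, mask = "*****") {
  // (?<!\S): start of string or preceded by whitespace.
  // (?!\S):  end of string or followed by whitespace.
  const rx = new RegExp("(?<!\\S)" + escapeRegExp(word) + "(?!\\S)", "g");
  // A function replacement inserts `mask` literally, even if it contains "$".
  return sentence.replace(rx, () => mask);
}

console.log(maskWord("ich mag Äpfel", "Äpfel"));          // ich mag *****
console.log(maskWord("Äpfel Äpfel und Birnen", "Äpfel")); // ***** ***** und Birnen
console.log(/\bÄpfel\b/.test("ich mag Äpfel"));           // false
```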

How to slice optional arguments in RegEx?

I have the following regular expression:
/^(?:(?:\,([A-Za-z]{5}))?)+$/g
The accepted input should be something like ,IGORA, but ,IGORA,GIANC,LOLLI is also valid, and in that case I would like to split the string into 3 groups; in general, the number of groups should equal the number of values in the user input that passes the RegExp test.
I tried something like this in JavaScript, but it returns only the last value:
var str = ',GIANC,IGORA';
var arr = str.match(/^(?:(?:\,([A-Za-z]{5}))?)+$/).slice(1);
alert(arr);
The output is 'IGORA', while I would like it to be 'GIANC', 'IGORA'.
Here is another example
/^([A-Z]{5})(?:(?:\,([A-Za-z]{2}))?)+$/g
The tested string must contain at least one 5-character string, but it can also contain further 5-character strings separated by commas, so from the input
IGORA,CIAOA,POPOP
I would have an array of ["IGORA","CIAOA","POPOP"]
You can capture the words in a capturing group surrounded by an optional preceding comma and an optional trailing comma.
The regex is: ,?([A-Za-z]+),?
const pattern = /,?([A-Za-z]+),?/gm;
const str = `,IGORA,GIANC,LOLLI`;
let matches = [];
let m;
// Iterate until no match is found
while ((m = pattern.exec(str))) {
// m[1] is the first capture group: the word without its commas
matches.push(m[1]);
}
console.log(matches);
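The same loop can be sketched with String.prototype.matchAll (ES2020), which requires the g flag and yields one match object per word, so no manual exec() loop or extra variable is needed:

```javascript
// Same pattern as above; matchAll() requires the g flag.
const pattern = /,?([A-Za-z]+),?/g;
const str = ",IGORA,GIANC,LOLLI";
// Keep only capture group 1 (the word) from each match object.
const words = Array.from(str.matchAll(pattern), (m) => m[1]);
console.log(JSON.stringify(words)); // ["IGORA","GIANC","LOLLI"]
```

Like the loop above, this pattern accepts words of any length; it extracts but does not validate.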
There are other ways to do this, but I found that one of the simple ways is by using the replace method, as it can replace all instances that match that regex.
For example:
var regex = /,([A-Za-z]{5})/g;
var str = ',GIANC,IGORA';
var arr = [];
str.replace(regex, function(match) {
arr[arr.length] = match;
return match;
});
console.log(arr);
Also, as you can see in my code snippet, each collected string still has a leading comma; you can remove it by changing line 5 to arr[arr.length] = match.replace(/^,/, '').
Is this what you're looking for?
Explanation:
\b word boundary (starting or ending a word)
\w a word character ([A-Za-z0-9_])
{5} exactly 5 of the previous token
So it matches all 5-character words but not NANANANA
var str = 'IGORA,CIAOA,POPOP,NANANANA';
var arr = str.match(/\b\w{5}\b/g);
console.log(arr); //['IGORA', 'CIAOA', 'POPOP']
If you only wish to select words separated by commas and nothing else, you can test for them like so:
(?<=,\s*|^) preceded by a comma and any number of whitespace characters, OR at the start of the list (first word).
(?=,\s*|$) followed by a comma and any number of whitespace characters, OR at the end of the list (last word).
In the following code, POPOP and MOMMA are rejected because they are not separated by a comma, and NANANANA fails because it is not 5 characters long.
var str = 'IGORA, CIAOA, POPOP MOMMA, NANANANA, MEOWI';
var arr = str.match(/(?<=,\s*|^)\b\w{5}\b(?=,\s*|$)/g);
console.log(arr); //['IGORA', 'CIAOA', 'MEOWI']
If spaces after the comma are not allowed, just leave out the \s* from both (?<=,\s*|^) and (?=,\s*|$).
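The reason the original match() returned only 'IGORA' is that a repeated capture group keeps only its last iteration. A sketch that validates the whole string first and then extracts the pieces separately, assuming the leading comma is optional (the question shows both forms); parseCodes and LIST_RX are hypothetical names.

```javascript
// A repeated capture group such as (?:,([A-Za-z]{5}))+ keeps only its LAST
// iteration, so validate and extract in two separate steps.
const LIST_RX = /^,?[A-Za-z]{5}(?:,[A-Za-z]{5})*$/;

// Hypothetical helper: array of 5-letter codes, or null if the input is invalid.
function parseCodes(input) {
  if (!LIST_RX.test(input)) return null;
  return Array.from(input.matchAll(/[A-Za-z]{5}/g), (m) => m[0]);
}

console.log(JSON.stringify(parseCodes(",GIANC,IGORA")));      // ["GIANC","IGORA"]
console.log(JSON.stringify(parseCodes("IGORA,CIAOA,POPOP"))); // ["IGORA","CIAOA","POPOP"]
console.log(JSON.stringify(parseCodes("IGORA,NANANANA")));    // null
```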

Finding ++ in Regular Expression

I want to find a ++, --, // or ** sign in a string. Can anyone help me?
var str = document.getElementById('screen').innerHTML;
var res = str.substring(0, str.length);
var patt1 = ++,--,//,**;
var result = str.match(patt1);
if (result)
{
alert("you cant do this :l");
document.getElementById('screen').innerHTML='';
}
This finds doubled characters by using a backreference:
/([+\/*-])\1/g
[from the question comments]: I know this, but when I type var patt1 = /[++]/i; the code finds both + and ++
[++] means any single one of the listed characters. Normally + is the quantifier "1 or more" and must be escaped with a leading backslash when it should be a literal, except inside brackets, where it has no special meaning.
Characters that do need to be escaped in character classes include the escape character itself (backslash), the expression delimiter (slash), the closing bracket and the range operator (dash/minus); the dash needs no escape at the end of the character class, as in my code example.
A character class [] matches one character. With a quantifier, e.g. [abc]{2}, it would match "aa" and "bb", but also "ab".
You can use a backreference to a match in parentheses:
/(abc)\1/
Here the \1 refers to the first parentheses (abc). The entire expression would match "abcabc".
To clarify again: We could use a quantifier on the backreference:
/([+\/*-])\1{9}/g
This matches 10 identical characters from the class in a row: the subpattern itself plus 9 more backreferences.
/.../g finds all occurrences because of the global (g) flag.
test-case on regextester.com
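A short sketch of the points above, runnable in Node.js or a browser console: a character class matches a single character, escaped literals must appear in sequence, and a backreference repeats exactly what its group matched.

```javascript
// [++] is a character class: it matches ONE "+", so it also fires on "a+b".
console.log(/[++]/.test("a+b"));   // true
// \+\+ needs two literal plus signs in a row.
console.log(/\+\+/.test("a+b"));   // false
console.log(/\+\+/.test("a++b"));  // true

// \1 repeats whatever group 1 matched, so "+-" is not a double but "--" is.
const doubled = /([+\/*-])\1/g;
console.log(JSON.stringify("1+-2".match(doubled)));      // null
console.log(JSON.stringify("1--2 3**4".match(doubled))); // ["--","**"]
```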
Define your pattern like this:
var patt1 = /\+\+|--|\/\/|\*\*/;
Now it should do what you want.
More info about regular expressions: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
You can use:
/\+\+|--|\/\/|\*\*/
as your expression.
Here I have escaped the special characters by using a backslash before each (\).
I've also used .test(str) on the regular expression as all you need is a boolean (true/false) result.
See working example below:
var str = document.getElementById('screen').innerHTML;
var res = str.substring(0, str.length);
var patt1 = /\+\+|--|\/\/|\*\*/;
var result = patt1.test(res);
if (result) {
alert("you cant do this :l");
document.getElementById('screen').innerHTML = '';
}
<div id="screen">
This is some++ text
</div>
Try this:
In a regex, + and * are quantifiers:
n+ matches any string that contains at least one n
n* matches any string that contains zero or more occurrences of n
so we need a backslash before these special characters.
var str = document.getElementById('screen').innerHTML;
var res = str.substring(0, str.length);
var patt1 = /\+\+|--|\/\/|\*\*/;
var result = str.match(patt1);
if (result)
{
alert("you cant do this :l");
document.getElementById('screen').innerHTML='';
}
<div id="screen">2121++</div>
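A sketch that pulls the check into a plain function so it can be tried outside the browser. hasDoubledOperator and DOUBLED_OPERATOR are hypothetical names; the pattern is the escaped alternation from the answers above. It deliberately has no g flag: test() on a global regex advances lastIndex, so reusing one across calls can give alternating results.

```javascript
// Hypothetical helper: true when the expression contains ++, --, // or **.
const DOUBLED_OPERATOR = /\+\+|--|\/\/|\*\*/; // no g flag: test() stays stateless

function hasDoubledOperator(expr) {
  return DOUBLED_OPERATOR.test(expr);
}

console.log(hasDoubledOperator("2121++")); // true
console.log(hasDoubledOperator("21+21"));  // false
console.log(hasDoubledOperator("8//2"));   // true

// In the page, the check from the answers above would then read (not run here):
//   var screen = document.getElementById('screen');
//   if (hasDoubledOperator(screen.innerHTML)) { alert("you cant do this :l"); screen.innerHTML = ''; }
```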
