I'm building a formula builder. The idea is that the user types the text of a formula into a textarea, and then we parse the string value. The result is an array.
For example, parsing this text:
LADV-(GCNBIZ+UNIN)+(TNW*-1)
should produce this result:
["LADV", "-", "(", "GCNBIZ", "+", "UNIN", ")", "+", "(", "TNW", "*", "-1", ")"]
The point is to split the string at each of these characters: +, *, /, -, ( and ), while still including the separator itself in the result.
I tried splitting with the expression /[-+*/()]/g, but the result doesn't include the separator characters. Also, -1 needs to be detected as a single token:
["LADV", "", "GCNBIZ", "UNIN", "", "", "TNW", "", "1", ""]
Which regex would solve this?
var input = 'LADV-(GCNBIZ+UNIN)+(TNW*-1)';
var match = input.match(/(-?\d+)|([a-z]+)|([-+*()\/])/gmi);
console.log(match);
You can use match instead of split with an alternation regex:
var s = 'LADV-(GCNBIZ+UNIN)+(TNW*-1)';
var m = s.match(/(-\d+(?:\.\d+)?|[-+\/*()]|\w+)/g);
console.log(m);
//=> ["LADV", "-", "(", "GCNBIZ", "+", "UNIN", ")", "+", "(", "TNW", "*", "-1", ")"]
RegEx breakdown:
( # start capture group
- # match a hyphen
\d+(?:\.\d+)? # match a number
| # OR
[-+\/*()] # match one of the symbols
| # OR
\w+ # match 1 or more word characters
) # end capture group
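The order of patterns in the alternation is important: at each position the first alternative that matches wins, so -\d+ has to come before [-+\/*()]. Otherwise the - in -1 would be matched on its own as an operator.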
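A quick way to see why the order matters is to run the same input through the pattern from this answer and through a reordered copy that puts the operator class first (the reordered pattern is only for illustration):

```javascript
const s = 'LADV-(GCNBIZ+UNIN)+(TNW*-1)';

// Number alternative first: "-1" is tried as a number before "-" as an operator
const numberFirst = s.match(/(-\d+(?:\.\d+)?|[-+\/*()]|\w+)/g);

// Operator alternative first: "-" always wins, so "-1" is split into "-" and "1"
const operatorFirst = s.match(/([-+\/*()]|-\d+(?:\.\d+)?|\w+)/g);

console.log(numberFirst.slice(-3));   //=> ["*", "-1", ")"]
console.log(operatorFirst.slice(-4)); //=> ["*", "-", "1", ")"]
```

With the operator class first, - wins at the position of -1, and the leftover 1 is then picked up by \w+.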
I want to match a fixed set of character sequences, some of which mix letters with non-letter characters. Many of them are a single letter, some are two letters.
const match = 'tɕ\'i mɑ mɑ ku ʂ ɪɛ'.match(/\b(p|p\'|m|f|t|t\'|n|l|k|k\'|h|tɕ|tɕ\'|ɕ|tʂ|tʂ\'|ʂ|ʐ|ts|ts\'|s)\b/g)
console.log(match)
I thought I could use \b, but that doesn't work because the sets contain non-word characters.
This is the current output:
[
"t",
"m",
"m"
]
But I want this to be the output:
[
"tɕ'",
"m",
"m",
"k",
"ʂ"
]
Note that some sets end with a non-word character, like the ' in tɕ'.
(In phonetic terms, the consonants.)
As stated in the comments above, \b doesn't work with Unicode characters in JS, and moreover, judging from your expected output, you don't need word boundaries at all.
You can use this shortened and refactored regex:
t[ɕʂs]'?|[tkp]'?|[tmfnlhshɕʐʂ]
Code:
const s = 'tɕ\'i mɑ mɑ ku ʂ ɪɛ';
const re = /t[ɕʂs]'?|[tkp]'?|[tmfnlhshɕʐʂ]/g
console.log(s.match(re))
//=> ["tɕ'", "m", "m", "k", "ʂ" ]
RegEx Details:
t[ɕʂs]'?: Match t followed by any letter inside [...] and then an optional '
|: OR
[tkp]'?: Match the letter t, k or p and then an optional '
|: OR
[tmfnlhshɕʐʂ]: Match any letter inside [...]
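The point about \b can be checked directly. In JavaScript, \w (and therefore \b) only treats [A-Za-z0-9_] as word characters, so ɕ counts as a non-word character and a boundary sits between t and ɕ. A small check, using a cut-down version of the question's alternation as an illustrative stand-in:

```javascript
// \w only covers [A-Za-z0-9_]; "ɕ" (U+0255) is therefore a non-word character
const isWordT = /\w/.test("t");
const isWordCurl = /\w/.test("ɕ");
console.log(isWordT, isWordCurl); //=> true false

// \b is satisfied right after "t" (word char followed by non-word "ɕ"),
// and "t" is listed before "tɕ'", so the match is just "t"
const found = "tɕ'i".match(/\b(t|tɕ|tɕ')\b/g);
console.log(found); //=> ["t"]
```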
I am trying to remove all the characters after the comma in a string, except the first letter. The string has the form last name,first name.
For example:
Smith,John
I tried the following, but it removes the comma and everything after it:
let str = "Smith,John";
str = str.replace(/\s/g, ""); // to remove all whitespace if there is any at the beginning, in the middle and at the end
str = str.split(',')[0];
Expected output: Smith,J
Thank you!
Or try (,\w).* with replace:
let str = "Smith,John";
str = str.replace(/(,\w).*/, '$1');
console.log(str);
Try this regex out:
\w+,\w
This matches one or more word characters before the comma, then the comma, and then exactly one word character after it.
Here is the demo: https://regex101.com/r/bKpWt7/1
Note: \w matches any character from [a-zA-Z0-9_].
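Since \w+,\w describes the part to keep, in JavaScript it pairs with match rather than replace. A minimal sketch; the helper name keepFirstInitial and returning the input unchanged when nothing matches are illustrative choices, not part of the answer:

```javascript
// Hypothetical helper: keep "Lastname,F" using the \w+,\w regex with match()
function keepFirstInitial(str) {
  const m = str.match(/\w+,\w/);
  // match() returns null when there is no "word,letter" sequence; return the input then
  return m ? m[0] : str;
}

console.log(keepFirstInitial("Smith,John"));    //=> "Smith,J"
console.log(keepFirstInitial("Van Dyke,John")); //=> "Dyke,J" (\w+ stops at the space)
```

Because \w+ stops at spaces, multi-word last names lose their first part; the next answer handles that case.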
Taking optional spaces around the comma into account, and perhaps multiple "names" before the comma:
/ *([^\s,][^,\n]*?) *, *([^\s,]).*/
 * (a space followed by *) Match optional spaces
( Capture group 1
[^\s,] Match a single char other than a whitespace char or a , (so group 1 is never empty)
[^,\n]*? Match any char except a , or a newline, non greedy
) Close group 1
 *, * Match a comma between optional spaces
([^\s,]) Capture group 2, match a single char other than , or a whitespace char
.* Match the rest of the line
In the replacement, use group 1 and group 2 with a comma in between: $1,$2
const regex = / *([^\s,][^,\n]*?) *, *([^\s,]).*/;
[
"Smith,John Jack",
"Smith Lastname , Jack John",
"Smith , John",
" ,Jack"
].forEach(s => console.log(s.replace(regex, "$1,$2")));
I'm trying to split a string in infix notation into a tokenized list, ideally with regex.
e.g. ((10 + 4) ^ 2) * 5 would return ['(', '(', '10', '+', '4', ')', '^', '2', ')', '*', '5']
At the moment I'm just splitting it up by character, but this doesn't allow me to represent numbers with more than one digit.
I tried tokens = infixString.split("(\d+|[^ 0-9])"); which I found online for this very same problem, but I think it was for Java and it simply gives a list with only one element, being the entire infixString itself.
I know next to nothing about regex, so any tips would be appreciated. Thanks!
It's because you're passing a string to split, which then looks for that exact text instead of treating it as a pattern. If you use a regex literal, the output is closer to what you'd expect:
infixString.split(/(\d+|[^ 0-9])/)
// Array(23) [ "", "(", "", "(", "", "10", " ", "+", " ", "4", … ]
However, there are a number of empty and whitespace-only elements that you will probably want to filter out:
infixString.split(/(\d+|[^ 0-9])/).filter(e => e.trim().length > 0)
// Array(11) [ "(", "(", "10", "+", "4", ")", "^", "2", ")", "*", … ]
Depending on the version of JavaScript/ECMAScript you're targeting, the arrow function in the filter callback may need to be rewritten as a regular function expression, since arrow functions were added in ES2015.
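Putting both observations side by side, a minimal sketch using the example expression from the question:

```javascript
const infixString = "((10 + 4) ^ 2) * 5";

// A string separator is matched as literal text. "\d" in a string literal is
// just "d", so this looks for "(d+|[^ 0-9])", which never occurs.
const withString = infixString.split("(\d+|[^ 0-9])");
console.log(withString.length); //=> 1

// A regex separator is treated as a pattern; the capture group keeps the tokens,
// and the filter drops the empty and whitespace-only pieces in between.
const tokens = infixString
  .split(/(\d+|[^ 0-9])/)
  .filter(e => e.trim().length > 0);
console.log(tokens); //=> ["(", "(", "10", "+", "4", ")", "^", "2", ")", "*", "5"]
```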
let test = "((10 + 4) ^ 2) * 5 * -1.5";
let arr = test.replace(/\s+/g, "").match(/(?:(?<!\d)-)?\d+(?:\.\d+)?|./g);
console.log(arr);
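(?:(?<!\d)-)?\d+(?:\.\d+)?|.
(?:(?<!\d)-)?: an optional minus sign, matched only if it is not preceded by a digit (the negative lookbehind (?<!\d)), so a subtraction such as 5-3 is not read as the number -3.
\d+: one or more digits.
(?:\.\d+)?: an optional non-capturing group, a dot followed by one or more digits.
|.: otherwise, any single character, such as an operator or a parenthesis.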
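Wrapped in a function, the same regex can be tried on a few inputs. The name tokenize is just an illustrative choice. Lookbehind assertions need an ES2018-capable engine, and the last call shows a limitation: the lookbehind only looks for a digit, so a minus right after a closing parenthesis is read as a sign.

```javascript
// Same regex as above: strip whitespace, then match either a number with an
// optional unary minus (only when no digit precedes it) or any single character.
// Lookbehind (?<!...) requires an ES2018+ engine.
function tokenize(expr) {
  return expr.replace(/\s+/g, "").match(/(?:(?<!\d)-)?\d+(?:\.\d+)?|./g);
}

console.log(tokenize("5 - 3"));   //=> ["5", "-", "3"]  (a digit precedes -, so it is an operator)
console.log(tokenize("5 * -3"));  //=> ["5", "*", "-3"] (no digit before -, so it is a sign)
console.log(tokenize("(2) - 3")); //=> ["(", "2", ")", "-3"] (limitation: ")" is not a digit)
```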
I'm trying to match words that consist only of characters in this character class: [A-z'\\/%], excluding cases where:
they are between < and >
they are between [ and ]
they are between { and }
So, say I've got this funny string:
[beginning]<start>How's {the} /weather (\\today%?)[end]
I need to match the following strings:
[ "How's", "/weather", "\\today%" ]
I've tried using this pattern:
/[A-z'/\\%]*(?![^{]*})(?![^\[]*\])(?![^<]*>)/gm
But for some reason, it matches:
[ "[beginning]", "", "How's", "", "", "", "/weather", "", "", "\\today%", "", "", "[end]", "" ]
I'm not sure why my pattern allows stuff between [ and ], since I used (?![^\[]*\]), and a similar approach seems to work for not matching {these cases} and <these cases>. I'm also not sure why it matches all the empty strings.
Any wisdom? :)
There are essentially two problems with your pattern:
Never use A-z in a character class if you intend to match only letters, because it matches more than just letters¹. Instead, use a-zA-Z (or A-Za-z).
Using the * quantifier after the character class will allow empty matches. Use the + quantifier instead.
So, the fixed pattern should be:
[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)
¹ The [A-z] character class means "match any character with an ASCII code from 65 to 122". The problem is that codes 91 to 96 ([, \, ], ^, _ and `) are not letters, which is why the original pattern matches characters like [ and ].
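Applied with match and the g flag to the string from the question (written as a JS string literal, so \\today is a single backslash followed by today), the fixed pattern gives the expected words. A minimal check:

```javascript
const data = "[beginning]<start>How's {the} /weather (\\today%?)[end]";

// Fixed pattern: A-Za-z instead of A-z, and + instead of * to avoid empty matches.
// The lookaheads reject a word if a closing }, ] or > follows before the next opener.
const words = data.match(/[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)/g);
console.log(words); //=> ["How's", "/weather", "\\today%"]
```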
Split it with a regular expression (note that this keeps any characters left between the delimiters, such as the ? in \today%?):
let data = "[beginning]<start>How's {the} /weather (\\today%?)[end]";
let matches = data.split(/\s*(?:<[^>]+>|\[[^\]]+\]|\{[^\}]+\}|[()])\s*/);
console.log(matches.filter(v => "" !== v));
You can match all the cases that you don't want using an alternation and place the character class in a capturing group to capture what you want to keep.
The [^ is a negated character class that matches any character except what is specified.
(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)
Explanation
(?: Non capture group
\[[^\][]*] Match from opening till closing []
| Or
<[^<>]*> Match from opening till closing <>
| Or
{[^{}]*} Match from opening till closing {}
) Close non capture group
| Or
([A-Za-z'/\\%]+) Repeat the character class 1+ times to prevent empty matches and capture in group 1
const regex = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
const str = `[beginning]<start>How's {the} /weather (\\\\today%?)[end]`;
let m;
while ((m = regex.exec(str)) !== null) {
if (m[1] !== undefined) console.log(m[1]);
}
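On engines with String.prototype.matchAll (ES2020), the same loop can be written as a chain that collects group 1 into an array. A sketch, assuming the question's string with a single backslash before today:

```javascript
const regex = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
const str = "[beginning]<start>How's {the} /weather (\\today%?)[end]";

// matchAll needs the g flag; each item is a match array like exec() returns.
// Keep only the matches where group 1 (the wanted word) took part.
const words = [...str.matchAll(regex)]
  .filter(m => m[1] !== undefined)
  .map(m => m[1]);

console.log(words); //=> ["How's", "/weather", "\\today%"]
```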
I want a regular expression in JavaScript that validates that a string contains only lowercase letters and the - character.
I use this expression:
var regex = /^[a-z][-\s\.]$/
It doesn't work. Any idea?
Just use
/^[a-z-]+$/
Explanation
^ : Match the beginning of the string.
[a-z-] : Match any character from a to z, or -.
[] : Only characters within the brackets are allowed.
a-z : Match any character in the range a to z, e.g. p, s, t.
- : Match the hyphen (-) itself; placed last in the brackets, it is not treated as a range.
+ : Shorthand for {1,}, meaning match one or more times.
$ : Match the end of the string.
Example
const regex= /^[a-z-]+$/
console.log(regex.test("abc")) // true
console.log(regex.test("aBcD")) // false
console.log(regex.test("a-c")) // true
Try this:
var regex = /^[-a-z]+$/;
var strs = [
"a",
"aB",
"abcd",
"abcde-",
"-",
"-----",
"a-b-c",
"a-D-c",
" "
];
strs.forEach(str=>console.log(str, regex.test(str)));
Try this
/^[a-z-]*$/
it matches any number of the letters a-z or -, including none at all, so an empty string also passes.
What your regex does is match exactly one letter a-z, followed by exactly one of -, whitespace or dot, and then expect the string to end.
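To make that concrete, here is the question's regex next to the suggested one on a few sample strings (the samples are made up for illustration):

```javascript
// The question's regex: exactly one a-z letter, then exactly one of -, whitespace or .
const original = /^[a-z][-\s\.]$/;
const results = ["a-", "a.", "ab", "abc-def"].map(s => original.test(s));
console.log(results); //=> [true, true, false, false]

// The regex from this answer: any number of a-z or - characters, start to end
const fixed = /^[a-z-]*$/;
console.log(fixed.test("abc-def"), fixed.test("")); //=> true true
```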
Use this regular expression:
let regex = /^[a-z\-]+$/;
Then:
regex.test("abcd") // true
regex.test("ab-d") // true
regex.test("ab3d") // false
regex.test("") // false
PS: If you want to allow the empty string "" to pass, use /^[a-z\-]*$/, with * instead of + at the end. See Regex Cheat Sheet: https://www.rexegg.com/regex-quickstart.html
I hope this helps
var str = 'asdadWW--asd';
console.log(str.match(/[a-z]|\-/g));
This will work (note that it also allows whitespace):
var regex = /^[a-z\-\s]+$/; // inside [...] each listed character is already an alternative, so no | is needed
var str = 'test- ';
str.match(regex); //["test- ", index: 0, input: "test- ", groups: undefined]
str = 'testT- '; // string now contains an uppercase letter, so it shouldn't match anymore
str.match(regex); //null
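Inside a character class, | has no special meaning; it is just one more character in the set. A quick comparison of [a-z|\-|\s] with [a-z\-\s]:

```javascript
// Inside [...], | is an ordinary character, so this class also accepts "|"
const withPipes = /^[a-z|\-|\s]+$/;
// The same set of characters, without the pipes
const withoutPipes = /^[a-z\-\s]+$/;

console.log(withPipes.test("a|b"));       //=> true
console.log(withoutPipes.test("a|b"));    //=> false
console.log(withoutPipes.test("test- ")); //=> true
```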