Matching sets consisting of letters plus non-letter characters - javascript

I want to match sets of characters that include a letter and non-letter characters. Many of them are a single letter. Or two letters.
const match = 'tɕ\'i mɑ mɑ ku ʂ ɪɛ'.match(/\b(p|p\'|m|f|t|t\'|n|l|k|k\'|h|tɕ|tɕ\'|ɕ|tʂ|tʂ\'|ʂ|ʐ|ts|ts\'|s)\b/g)
console.log(match)
I thought I could use \b, but it's wrong because there are "non-words" characters in the sets.
This is the current output:
[
"t",
"m",
"m"
]
But I want this to be the output:
[
"tɕ'",
"m",
"m",
"k",
"ʂ"
]
Note: notice that some sets end with a non-word boundary, like tɕ'.
(In phonetic terms, the consonants.)

As stated in the comments above, \b doesn't work with Unicode characters in JS; moreover, from your expected output it appears that you don't need word boundaries.
You can use this shortened and refactored regex:
t[ɕʂs]'?|[tkp]'?|[tmfnlhshɕʐʂ]
Code:
const s = 'tɕ\'i mɑ mɑ ku ʂ ɪɛ';
const re = /t[ɕʂs]'?|[tkp]'?|[tmfnlhshɕʐʂ]/g
console.log(s.match(re))
//=> ["tɕ'", "m", "m", "k", "ʂ" ]
RegEx Demo
RegEx Details:
- t[ɕʂs]'?: Match t followed by any letter inside [...] and then an optional '
- |: OR
- [tkp]'?: Match letters t or k or p and then an optional '
- |: OR
- [tmfnlhshɕʐʂ]: Match any letter inside [...]
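If the list of consonants changes often, an alternative to factoring the pattern by hand is to generate the alternation from the list itself. The following is only a sketch under that assumption; buildTokenRegex is a hypothetical helper name, not something from the answer above. It sorts the tokens longest-first so that tɕ' is tried before tɕ and t.

```javascript
// Hypothetical helper: build a global alternation regex from literal tokens.
function buildTokenRegex(tokens) {
  // Escape characters that have a special meaning in a regex.
  const escape = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Longest tokens first, so "tɕ'" wins over "tɕ" and "t".
  const sorted = [...tokens].sort((a, b) => b.length - a.length);
  return new RegExp(sorted.map(escape).join('|'), 'g');
}

const consonants = ['p', "p'", 'm', 'f', 't', "t'", 'n', 'l', 'k', "k'", 'h',
  'tɕ', "tɕ'", 'ɕ', 'tʂ', "tʂ'", 'ʂ', 'ʐ', 'ts', "ts'", 's'];

const found = 'tɕ\'i mɑ mɑ ku ʂ ɪɛ'.match(buildTokenRegex(consonants));
console.log(found);
//=> ["tɕ'", "m", "m", "k", "ʂ"]
```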

Related

Is there a regex to remove everything after comma in a string except first letter

I am trying to remove all the characters from the string after comma except the first letter. The string is basically the last name,first name.
For example:
Smith,John
I tried as below but it removes comma and everything after comma.
let str = "Smith,John";
str = str.replace(/\s/g, ""); // to remove all whitespace if there is any at the beginning, in the middle and at the end
str = str.split(',')[0];
Expected output: Smith,J
Thank you!
Or try (,\w).* with replace:
let str = "Smith,John";
str = str.replace(/(,\w).*/, '$1');
console.log(str);
Try this regex out:
\w+,\w
This matches one or more characters before the comma and then matches only 1 character.
Here is the demo: https://regex101.com/r/bKpWt7/1
Note: \w matches any character from [a-zA-Z0-9_].
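A minimal sketch of using that pattern with match instead of replace; lastNameWithInitial is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: keep the last name, the comma, and the first initial.
function lastNameWithInitial(fullName) {
  const m = fullName.match(/\w+,\w/);
  // match returns null when there is no "name,letter" sequence at all.
  return m ? m[0] : null;
}

console.log(lastNameWithInitial("Smith,John")); // "Smith,J"
console.log(lastNameWithInitial("Smith"));      // null
```

Because \w only covers [a-zA-Z0-9_], a last name containing other characters (an apostrophe, for instance) would be cut at that character.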
Taking optional spaces around the comma into account, and perhaps multiple "names" before the comma (note that the pattern starts with a literal space followed by *):
 *([^\s,][^,\n]*?) *, *([^\s,]).*
 * Match optional leading spaces
( Start capture group 1
[^\s,] Match a single char other than a whitespace char or a ,
[^,\n]*? Match any char except a , or a newline, non-greedy
) Close group 1
 *, * Match a comma between optional spaces
([^\s,]) Capture group 2, match a single char other than , or a whitespace char
.* Match the rest of the line
Regex demo
In the replacement using group 1 and group 2 with a comma in between $1,$2
const regex = / *([^\s,][^,\n]*?) *, *([^\s,]).*/;
[
"Smith,John Jack",
"Smith Lastname , Jack John",
"Smith , John",
" ,Jack"
].forEach(s => console.log(s.replace(regex, "$1,$2")));

Validate string in regular expression

I want a regular expression in JavaScript that validates a string containing only lowercase characters and the character -.
I use this expression:
var regex = /^[a-z][-\s\.]$/
It doesn't work. Any idea?
Just use
/^[a-z-]+$/
Explanation
^ : Match from the beginning of the string.
[a-z-] : Match any character between a and z, or -.
[] : Only characters within the brackets are allowed.
a-z : Match any character between a and z. E.g. p, s, t.
- : Match only the hyphen (-) character.
+ : Shorthand for {1,}. It means match 1 or more.
$ : Match until the end of the string.
Example
const regex= /^[a-z-]+$/
console.log(regex.test("abc")) // true
console.log(regex.test("aBcD")) // false
console.log(regex.test("a-c")) // true
Try this:
var regex = /^[-a-z]+$/;
var strs = [
"a",
"aB",
"abcd",
"abcde-",
"-",
"-----",
"a-b-c",
"a-D-c",
" "
];
strs.forEach(str=>console.log(str, regex.test(str)));
Try this
/^[a-z-]*$/
it should match the letters a-z or - as many times as possible.
What your regex does is try to match a-z a single time, followed by any one of -, whitespace, or dot a single time, and then expect the string to end.
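To make the difference concrete, here is a small comparison of the regex from the question with the one suggested here; the variable names original and fixed are only for illustration.

```javascript
const original = /^[a-z][-\s\.]$/; // one letter, then one of - / whitespace / dot
const fixed = /^[a-z-]*$/;         // any number of lowercase letters or hyphens

for (const s of ["a-", "a.", "abc", "a-c"]) {
  console.log(JSON.stringify(s), original.test(s), fixed.test(s));
}
// "a-"  true  true
// "a."  true  false
// "abc" false true
// "a-c" false true
```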
Use this regular expression:
let regex = /^[a-z\-]+$/;
Then:
regex.test("abcd") // true
regex.test("ab-d") // true
regex.test("ab3d") // false
regex.test("") // false
PS: If you want to allow the empty string "" to pass, use /^[a-z\-]*$/. There's an * instead of a + at the end. See Regex Cheat Sheet: https://www.rexegg.com/regex-quickstart.html
I hope this helps
var str = 'asdadWW--asd';
console.log(str.match(/[a-z]|\-/g));
This will work:
var regex = /^[a-z\-\s]+$/ // inside a character class, | is a literal character rather than an 'or' operator, so it is left out
var str = 'test- ';
str.match(regex); //["test- ", index: 0, input: "test- ", groups: undefined]
str = 'testT- ' // string now contains an uppercase letter so it shouldn't match anymore
str.match(regex) //null

Matching UPPERCASE, PascalCase and camelCase in single word

Let's say I have a string testTESTCheckTESTAnother and I want to split it into a few words, like this: ["test", "TEST", "Check", "TEST", "Another"].
Input:
Only [A-Za-z] characters allowed
testTESTCheckTESTAnother
Code:
My best try with regex was:
"testTESTCheckTESTAnother".match(/^[a-z]+|[A-Z][a-z]*/g)
Output: ["test", "T", "E", "S", "T", "Check", "T", "E", "S", "T", "Another"]
I tried a negative lookahead, but it didn't work either:
"testTESTCheckTESTAnother".match(/(?![A-Z][a-z]+)[A-Z]+/g)
Output: ["TESTC", "TESTA"]
Desired output:
["test", "TEST", "Check", "TEST", "Another"]
Other inputs-outputs:
input: "ITest"
output: ["I", "Test"]
input: "WHOLETESTWORD"
output: ["WHOLETESTWORD"]
input: "C"
output: ["C"]
Regex
/[a-z]+|[A-Z]+(?=[A-Z]|$)|([A-Z][a-z]+)/g
Demo
[a-z]+ - Lowercase
[A-Z]+(?=[A-Z]|$) - Uppercase
([A-Z][a-z]+) - TitleCase
let string = "testTESTCheckTESTAnother"
console.log(string.match(/[a-z]+|[A-Z]+(?=[A-Z]|$)|([A-Z][a-z]+)/g))
Use this regular expression: ^[a-z]+|((?![A-Z][a-z])[A-Z])+|[A-Z][a-z]+
See it in action at https://regex101.com/r/5r8MzJ/1
Explanation. We have three alternative patterns we will capture.
^[a-z]+
Accept a series of lowercase letters at the start of the string only.
((?![A-Z][a-z])[A-Z])+
Accept a series of uppercase letters except the last one if followed by a lowercase letter
[A-Z][a-z]+
Accept one uppercase letter followed by at least one lowercase letter.
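As a quick check, the pattern can be wrapped in a small function and run against the inputs from the question. splitCaseWords is a hypothetical name, and the g flag is added so that match returns every word.

```javascript
// Hypothetical helper: split a word into lowercase, UPPERCASE and TitleCase parts.
function splitCaseWords(word) {
  // Returns an empty array instead of null when nothing matches.
  return word.match(/^[a-z]+|((?![A-Z][a-z])[A-Z])+|[A-Z][a-z]+/g) || [];
}

console.log(splitCaseWords("testTESTCheckTESTAnother"));
//=> ["test", "TEST", "Check", "TEST", "Another"]
console.log(splitCaseWords("ITest"));         //=> ["I", "Test"]
console.log(splitCaseWords("WHOLETESTWORD")); //=> ["WHOLETESTWORD"]
console.log(splitCaseWords("C"));             //=> ["C"]
```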

Regex - split mathematic expression

I'm building something called a formula builder. The idea is that the user types the text of a formula in a textarea, and then we parse the string value. The result is an array.
For example, this text will be parsed
LADV-(GCNBIZ+UNIN)+(TNW*-1)
then generate result below
["LADV", "-", "(", "GCNBIZ", "+", "UNIN", ")", "+", "(", "TNW", "*", "-1", ")"]
The point is to split each word joined by one of these characters: +, *, /, -, (, and ), while still including the splitter itself.
I have tried to split using this expression /[-+*/()]/g, but the result doesn't include the splitter characters. Also, the -1 needs to be detected as one expression.
["LADV", "MISC", "", "GCNBIZ", "UNIN", "", "", "TNW", "", "1", ""]
What is the match regex to solve this?
var input = 'LADV-(GCNBIZ+UNIN)+(TNW*-1)';
var match = input.match(/(-?\d+)|([a-z]+)|([-+*()\/])/gmi);
console.log(match);
You can use match instead of split with an alternation regex:
var s = 'LADV-(GCNBIZ+UNIN)+(TNW*-1)';
var m = s.match(/(-\d+(?:\.\d+)?|[-+\/*()]|\w+)/g);
console.log(m);
//=> ["LADV", "-", "(", "GCNBIZ", "+", "UNIN", ")", "+", "(", "TNW", "*", "-1", ")"]
RegEx Breakup:
( # start capture group
- # match a hyphen
\d+(?:\.\d+)? # match a number
| # OR
[-+\/*()] # match one of the symbols
| # OR
\w+ # match 1 or more word characters
) # end capture group
Order of patterns in alternation is important.
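A short illustration of why the order matters, using a fragment of the input: when the symbol alternative is tried first, the hyphen is consumed on its own and -1 is no longer kept together. The variable names are only for illustration.

```javascript
const fragment = 'TNW*-1';

// Signed number first: "-1" is matched as one token.
const numberFirst = fragment.match(/-\d+(?:\.\d+)?|[-+\/*()]|\w+/g);
// Symbols first: "-" is matched alone, then "1" falls through to \w+.
const symbolFirst = fragment.match(/[-+\/*()]|-\d+(?:\.\d+)?|\w+/g);

console.log(numberFirst); //=> ["TNW", "*", "-1"]
console.log(symbolFirst); //=> ["TNW", "*", "-", "1"]
```

Note that with the number-first order a binary minus directly before a digit is also read as a sign, so A-1 yields ["A", "-1"].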

javascript regexp to identify different components of a sentence

I have a very specific requirement. Consider the sentence "I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray"
I am interested in a regexp which recognizes "I", "am", "a" , "robot", "X-rrt", ",", "I", "am", "35", "and", "my", "creator", "is", "5-MAF", ".", "Everything", "here", "is", "5", "times", "than", "my", "world5", "-", "hurray"
i.e. 1) it should recognize all punctuation, except "-" when it is part of a word
2) numbers that are part of a word containing letters should not be recognized separately
I am extremely confused by this one. Would appreciate some advice!
Try splitting at each group of whitespaces, and before dots and commas:
str.split(/\s+|(?=[.,])/);
This is not too easy. I suggest some preprocessing of the text before a split, for example:
var text = "I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray";
var preprocessedText = text.replace(/(\w|^)(\W)( |$)/g, "$1 $2$3");
var tokens = preprocessedText.split(" ");
alert(tokens.join("\n"));
I tested this in perl. Shouldn't be too hard to translate to javascript.
my $sentence = 'I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray';
my @words = split(/\s|(?<!-)\b(?!-)/, $sentence);
say "'" . join ("', '", @words) . "'";
Try this match regexp:
str.match(/[\w\d-]+|\.|,/g);
Here is a solution that meets both your requirements:
/(?:\w|\b-\b)+|[^\w\s]+/g
See the regex demo.
Details:
(?:\w|\b-\b)+ - 1 or more
\w - word char
| - or
\b-\b - a hyphen in between word characters
| - or
[^\w\s]+ - 1 or more characters other than word and whitespace symbols.
See the JS demo below:
var s = "I am a robot X-rrt, I am 35 and my creator is 5-MAF. Everything here is 5 times than my world5 - hurray";
console.log(s.match(/(?:\w|\b-\b)+|[^\w\s]+/g));
