split string based on words and highlighted portions with `^` sign


I have a string with portions highlighted by ^ signs:
const inputValue = 'jhon duo ^has a car^ right ^we know^ that';
How can I return an array that is split on words and on ^ highlights, so that I get this array:
['jhon','duo', 'has a car', 'right', 'we know', 'that']
Using const input = inputValue.split('^'); to split by ^, or const input = inputValue.split(' '); to split by words, doesn't work, and I think a better idea is needed.
How would you do this?

You can use matchAll with a regular expression:
const inputValue = 'jhon duo ^has a car^ right ^we know^ that';
const result = Array.from(inputValue.matchAll(/\^(.*?)\^|([^^\s]+)/g),
([, a, b]) => a || b);
console.log(result);
\^(.*?)\^ will match a literal ^ and all characters until the next ^ (including it), and the inner part is captured in a capture group
([^^\s]+) will match a series of non-white space characters that are not ^ (a "word") in a second capture group
| makes the above two patterns alternatives: if the first doesn't match, the second is tried.
The Array.from callback will extract only what occurs in a capture group, so excluding the ^ characters.
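A minimal sketch wrapping the same regex in a reusable function. The name splitWordsAndHighlights is hypothetical. It uses ?? instead of || so that an empty highlight (^^) comes back as an empty string rather than undefined. Note that a stray, unpaired ^ is silently skipped by this pattern.

```javascript
// Hypothetical helper; the regex is the one from the answer above.
function splitWordsAndHighlights(text) {
  // Group 1: the text between a pair of carets.
  // Group 2: a run of characters that are neither whitespace nor '^'.
  const pattern = /\^(.*?)\^|([^^\s]+)/g;
  // Exactly one group participates in each match; keep that one.
  return Array.from(text.matchAll(pattern), ([, highlighted, word]) => highlighted ?? word);
}

console.log(splitWordsAndHighlights('jhon duo ^has a car^ right ^we know^ that'));
// [ 'jhon', 'duo', 'has a car', 'right', 'we know', 'that' ]
console.log(splitWordsAndHighlights('a ^^ b'));
// [ 'a', '', 'b' ]
console.log(splitWordsAndHighlights('a ^b'));
// [ 'a', 'b' ]  (the unpaired ^ is dropped)
```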

trincot's answer is good, but here's a version that doesn't use regex and will throw an error when there are mismatched ^:
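function splitHighlights(inputValue) {
  const inputSplit = inputValue.split('^');
  // Segments alternate: plain text, highlighted, plain text, ...
  let highlighted = true;
  const result = inputSplit.flatMap(splitVal => {
    highlighted = !highlighted;
    const trimmed = splitVal.trim();
    if (trimmed === '') {
      // Skip empty segments (a ^ at the start or end, or only spaces between highlights)
      return [];
    } else if (highlighted) {
      // Keep a highlighted phrase as a single item
      return [trimmed];
    } else {
      // Split plain text into words, tolerating repeated spaces
      return trimmed.split(/\s+/);
    }
  });
  // An even number of segments means an odd number of '^' characters
  if (highlighted) {
    throw new Error(`unmatched '^' char: expected an even number of '^' characters in input`);
  }
  return result;
}
console.log(splitHighlights('^jhon duo^ has a car right ^we know^ that'));
console.log(splitHighlights('jhon duo^ has^ a car right we^ know that^'));
console.log(splitHighlights('jhon duo^ has a car^ right ^we know^ that'));
try {
  // Five '^' characters: this call throws
  console.log(splitHighlights('jhon ^duo^ has a car^ right ^we know^ that'));
} catch (err) {
  console.log(err.message);
}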

You can still use split(), but capture part of the split sequence so that it is included in the output.
To split, you could use ` *\^([^^]*)\^ *| +`, which gives trimmed items in the result.
const inputValue = 'jhon duo ^has a car^ right ^we know^ that';
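// split() inserts group 1 for every separator: the highlighted text, or undefined
// when the plain-space alternative matched. filter(Boolean) removes those undefined
// entries, plus the empty strings produced when a highlight sits at the start or end.
const input = inputValue.split(/ *\^([^^]*)\^ *| +/).filter(Boolean);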
console.log(input);
` *\^` matches any amount of space followed by a literal caret
`([^^]*)` captures any amount of non-carets
`\^ *` matches a literal caret followed by any amount of space
`| +` OR splits at one or more spaces
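To see why the filter is needed, this sketch prints the array before filtering. When split() is given a regex with a capturing group, the captured text is spliced into the result; for separators where the group did not participate (the ` +` branch), the entry is undefined.

```javascript
const inputValue = 'jhon duo ^has a car^ right ^we know^ that';
const separator = / *\^([^^]*)\^ *| +/;

// Without filtering: highlights are spliced in, and each plain-space
// split contributes an undefined entry for the unused group 1.
const raw = inputValue.split(separator);
console.log(raw);
// [ 'jhon', undefined, 'duo', 'has a car', 'right', 'we know', 'that' ]

// A highlight at the very start also leaves an empty string in front.
const leading = '^a^ b'.split(separator);
console.log(leading); // [ '', 'a', 'b' ]

// filter(Boolean) drops both undefined entries and empty strings.
console.log(raw.filter(Boolean));
console.log(leading.filter(Boolean));
```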

Related

Calculating mixed numbers and chars and concatenating it back again in JS/jQuery

I need to manipulate the drawing of an SVG, so I have attribute "d" values like this:
d = "M561.5402,268.917 C635.622,268.917 304.476,565.985 379.298,565.985"
What I want is to "purify" all the values (strip the characters from them) and calculate with them (for simplicity, let's say add 100 to each value): deconstruct the string, calculate the values inside, and then concatenate it all back together, so that the final result is something like this:
d = "M661.5402,368.917 C735.622,368.917 404.476,665.985 479.298,665.985"
Keep in mind that:
some values can start with a character
values are delimited by commas
some values between commas are separated by a space
values are decimal
This is my attempt:
let arr1 = d.split(',');
arr1 = arr1.map(element => {
let arr2 = element.split(' ');
if (arr2.length > 1) {
arr2 = arr2.map(el => {
let startsWithChar = el.match(/\D+/);
if (startsWithChar) {
el = el.replace(/\D/g,'');
}
el = parseFloat(el) + 100;
if (startsWithChar) {
el = startsWithChar[0] + el;
}
})
}
else {
let startsWithChar = element.match(/\D+/);
if (startsWithChar) {
element = element.replace(/\D/g,'');
}
element = parseFloat(element) + 100;
if (startsWithChar) {
element = startsWithChar[0] + element;
}
}
});
d = arr1.join(',');
I tried the regex replace(/\D/g,''), but it also strips the decimal point from the value, so I think my solution is full of holes.
Another solution might be to modify each of the path values/commands directly. I'm open to that too, but I don't know how.

const s = 'M561.5402,268.917 C635.622,268.917 304.476,565.985 379.298,565.985'
console.log(s.replaceAll(/[\d.]+/g, m=>+m+100))

You might use a pattern to match the format in the string with 2 capture groups.
([ ,]?\b[A-Z]?)(\d+\.\d+)\b
The pattern matches:
( Capture group 1
[ ,]?\b[A-Z]? Match an optional space or comma, a word boundary and an optional uppercase char A-Z
) Close group 1
( Capture group 2
\d+\.\d+ Match 1+ digits, a dot and 1+ digits
) Close group 2
\b A word boundary to prevent a partial word match
First capture the optional delimiter followed by an optional uppercase char in group 1, and the decimal number in group 2.
Then add 100 to the decimal value and join back the 2 group values.
const d = "M561.5402,268.917 C635.622,268.917 304.476,565.985 379.298,565.985";
const regex = /([ ,]?\b[A-Z]?)(\d+\.\d+)\b/g;
const res = Array.from(
d.matchAll(regex), m => m[1] + (+m[2] + 100)
).join('');
console.log(res);
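An alternative sketch using String.prototype.replace with a callback instead of matchAll plus join: replace passes the capture groups to the callback and leaves any text the pattern does not match untouched, so nothing can be dropped silently. As in the answers above, the sums are plain floating-point additions, so some results may show rounding artifacts.

```javascript
const d = "M561.5402,268.917 C635.622,268.917 304.476,565.985 379.298,565.985";
const regex = /([ ,]?\b[A-Z]?)(\d+\.\d+)\b/g;

// The callback receives the full match, then group 1 (delimiter + optional
// command letter) and group 2 (the decimal number).
const res = d.replace(regex, (match, prefix, value) => prefix + (+value + 100));
console.log(res);
```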

Regex match apostrophe inside, but not around words, inside a character set

I'm counting how many times different words appear in a text using Regular Expressions in JavaScript. My problem is when I have quoted words: 'word' should be counted simply as word (without the quotes, otherwise they'll behave as two different words), while it's should be counted as a whole word.
(?<=\w)(')(?=\w)
This regex can identify apostrophes inside, but not around words. Problem is, I can't use it inside a character set such as [\w]+.
(?<=\w)(')(?=\w)|[\w]+
This counts it's a 'miracle' of nature as 7 words instead of 5 (it, ' and s become 3 different words). Also, the third word should be selected simply as miracle, and not as 'miracle'.
To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.
How can I accomplish that?

1) You can simply use the regex /[^\s]+/g:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g);
console.log(result.length);
console.log(result);
2) If you are calculating the total number of words in a string, you can also use split:
const str = `it's a 'miracle' of nature`;
const result = str.split(/\s+/);
console.log(result.length);
console.log(result);
3) If you want each word without a quote at the start and at the end, you can do this:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g).map((s) => {
s = s[0] === "'" ? s.slice(1) : s;
s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
return s;
});
console.log(result.length);
console.log(result);

You might use an alternation with 2 capture groups, and then check for the values of those groups.
(?<!\S)'(\S+)'(?!\S)|(\S+)
(?<!\S)' Negative lookbehind, assert a whitespace boundary to the left and match '
(\S+) Capture group 1, match 1+ non whitespace chars
'(?!\S) Match ' and assert a whitespace boundary to the right
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
const s = "it's a 'miracle' of nature";
for (const m of s.matchAll(regex)) {
  // Exactly one of the two groups participated in this match
  console.log(m[1] || m[2]);
}
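Since the question is about counting words, here is a sketch that feeds the same regex into a Map-based tally. countWords is a hypothetical helper; note that \S+ keeps punctuation such as commas attached to a word, so text with punctuation would need further cleanup.

```javascript
// Same pattern as the answer above.
const wordRegex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;

// Hypothetical helper: tallies each word, using whichever group matched.
function countWords(text) {
  const counts = new Map();
  for (const m of text.matchAll(wordRegex)) {
    const word = m[1] ?? m[2];
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// it's, a and miracle are each counted twice; the quoted 'miracle'
// is counted as the same word as the unquoted one.
console.log(countWords("it's a 'miracle' it's a miracle"));
```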

Ensure that regex moves to the second OR element only if the first one doesn't exist

I'm trying to match a certain word in a string and, only if it doesn't exist, match another one using the OR (|) operator, but the match ignores that order. How can I ensure this behavior?
const str = 'Soraka is an ambulance 911'
const regex = RegExp('('+'911'+'|'+'soraka'+')','i')
console.log(str.match(regex)[0]) // should get 911 instead

911 occurs late in the string, whereas Soraka occurs earlier, and the regex engine iterates character-by-character, so Soraka gets matched first, even though it's on the right-hand side of the alternation.
One option would be to match Soraka or 911 in captured lookaheads instead, and then with the regex match object, alternate between the two groups to get the one which is not undefined:
const check = (str) => {
const regex = /^(?=.*(911)|.*(Soraka))/;
const match = str.match(regex);
console.log(match[1] || match[2]);
};
check('Soraka is an ambulance 911');
check('foo 911');
check('foo Soraka');
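A sketch generalizing this lookahead idea to any number of terms listed in priority order. Both firstByPriority and escapeRegExp are hypothetical helpers written for illustration. [\s\S] replaces . so the scan also crosses line breaks, and returning null covers the no-match case, where str.match returns null and check() above would throw a TypeError.

```javascript
// Hypothetical helper: escapes regex metacharacters so terms match literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: each alternative is a lookahead anchored at position 0
// that scans the whole string, so an earlier term in `terms` always wins.
function firstByPriority(str, terms) {
  const alternatives = terms.map((t) => `[\\s\\S]*(${escapeRegExp(t)})`);
  const regex = new RegExp(`^(?=${alternatives.join('|')})`);
  const match = str.match(regex);
  // Exactly one group is defined on a match; return null when nothing matched.
  return match ? match.slice(1).find((g) => g !== undefined) : null;
}

console.log(firstByPriority('Soraka is an ambulance 911', ['911', 'Soraka'])); // 911
console.log(firstByPriority('foo Soraka', ['911', 'Soraka'])); // Soraka
console.log(firstByPriority('foo oops!', ['911', 'Soraka'])); // null
```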

You can use includes and find.
Pass the strings in priority order; as soon as find encounters a string that occurs in the original string, it returns that string:
const str = 'Soraka is an ambulance 911'
const findStr = (...arg) => {
return [...arg].find(toCheck => str.includes(toCheck))
}
console.log(findStr("911", "Soraka"))
You can extend findStr if you want the match to be case-insensitive, something like this:
const str = 'Soraka is an ambulance 911'
const findStr = (...arg) => {
return [...arg].find(toCheck => str.toLowerCase().includes(toCheck.toLowerCase()))
}
console.log(findStr("Soraka", "911"))
If you want to match whole words rather than partial words, you can build a dynamic regex and use it to search the value:
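const str = '911234 Soraka is an ambulance 911'
// Escape regex metacharacters so each search term is matched literally
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
const findStr = (...arg) => {
  return [...arg].find(toCheck => {
    let regex = new RegExp(`\\b${escapeRegExp(toCheck)}\\b`, 'i')
    return regex.test(str)
  })
}
console.log(findStr("911", "Soraka"))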

Just put a greedy dot pattern in front of the capturing group for 911, and keep Soraka as the second alternative:
/.*(911)|(Soraka)/
The .* (or, if there can be line breaks, /.*(911)|(Soraka)/s with the dotAll flag from ES2018, or /[^]*(911)|(Soraka)/ to support legacy ECMAScript versions) lets the first alternative scan the rest of the string for 911 before Soraka is tried at the current position, so 911 wins whenever it occurs anywhere after the match start.
JS demo (borrowed from @CertainPerformance's answer):
const check = (str) => {
const regex = /.*(911)|(Soraka)/;
const match = str.match(regex) || ["","NO MATCH","NO MATCH"];
console.log(match[1] || match[2]);
};
check('Soraka is an ambulance 911');
check('Ambulance 911, Soraka');
check('foo 911');
check('foo Soraka');
check('foo oops!');
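A small sketch of the line-break caveat mentioned above: without the s flag, . stops at \n, so a 911 on a later line is never reached by the first alternative, and Soraka is matched instead.

```javascript
const text = 'Soraka is an ambulance\n911';

// No flag: . stops at the line break, so only Soraka can match.
const plain = text.match(/.*(911)|(Soraka)/);
console.log(plain[1] || plain[2]); // Soraka

// s (dotAll) flag: .* can cross the line break and reach 911.
const dotAll = text.match(/.*(911)|(Soraka)/s);
console.log(dotAll[1] || dotAll[2]); // 911

// [^] matches any character, including line breaks, without needing the s flag.
const legacy = text.match(/[^]*(911)|(Soraka)/);
console.log(legacy[1] || legacy[2]); // 911
```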

Why is my condition always true in JavaScript?

Could you please tell me why my condition is always true? I am trying to validate my value using a regex. I have a few conditions:
Name should not contain test "text"
Name should not contain three consecutive characters example "abc" , "pqr" ,"xyz"
Name should not contain the same character three times example "aaa", "ccc" ,"zzz"
This is what I do:
https://jsfiddle.net/aoerLqkz/2/
var val = 'ab dd'
if (/test|[^a-z]|(.)\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz/i.test(val)) {
alert( 'match')
} else {
alert( 'false')
}
I tested my code with the following strings and got unexpected results:
input string "abc": output "match" (fine)
input string "aaa": output "match" (fine)
input string "aa a": output "match". Why does it match? There is a space between them.
input string "sa c": output "match". Why does it match? The letters are different and there is a space between them.

The string sa c includes a space, and the pattern [^a-z] (not a to z) matches the space.
Possibly you want to use the anchors ^ and $, so that your pattern has to match from the start to the end of the string instead of finding a match anywhere inside it.

there is space between them why it matched ????
Because of the [^a-z] part of your regular expression, which matches the space:
> /[^a-z]/i.test('aa a');
true

The issue is the [^a-z]. This means that any string that has a non-letter character anywhere in it will be a match. In your example, it is matching the space character.
The solution? Simply remove |[^a-z]. Without it, your regex meets all three criteria.
test checks if the value contains the word 'test'.
abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz checks if the value contains three sequential letters.
(.)\1\1 checks if any character is repeated three times.
Complete regex:
/test|(.)\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz/i
I find it helpful to use a regex tester, like https://www.regexpal.com/, when writing regular expressions.
NOTE: I am assuming that the second criterion actually means "three consecutive letters", not "three consecutive characters" as it is written. If that is not true, then your regex doesn't meet the second criterion, since it only checks for three consecutive letters.
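A quick check of the corrected pattern against the inputs from the question ('my test' is an extra sample added here). The regex has no g flag, which matters: with g, test() would advance lastIndex and could give different results on repeated calls with the same regex object.

```javascript
// The corrected pattern from this answer (|[^a-z] removed).
const forbidden = /test|(.)\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz/i;

for (const input of ['abc', 'aaa', 'aa a', 'sa c', 'ab dd', 'my test']) {
  // test() is true when the input breaks at least one rule.
  console.log(JSON.stringify(input), forbidden.test(input) ? 'match' : 'false');
}
```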

I would not do this with regular expressions: the expression will only get more complicated, and you lose the flexibility you would have if you programmed the checks.
Your rules suggest the concept of a string derivative. The derivative of a string is the list of distances between successive characters. It is especially useful when dealing with password security checks and string variation in general.
const derivative = (str) => {
const result = [];
for(let i=1; i<str.length; i++){
result.push(str.charCodeAt(i) - str.charCodeAt(i-1));
}
return result;
};
//these strings have the same derivative: [0,0,0,0]
console.log(derivative('aaaaa'));
console.log(derivative('bbbbb'));
//these strings also have the same derivative: [1,1,1,1]
console.log(derivative('abcde'));
console.log(derivative('mnopq'));
//up and down: [1,-1, 1,-1, 1]
console.log(derivative('ababa'));
With this in mind, you can apply each of your rules to each string.
// Rules:
// 1. Name should not contain test "text"
// 2. Name should not contain three consecutive characters example "abc" , "pqr" ,"xyz"
// 3. Name should not contain the same character three times example "aaa", "ccc" ,"zzz"
const derivative = (str) => {
const result = [];
for(let i=1; i<str.length; i++){
result.push(str.charCodeAt(i) - str.charCodeAt(i-1));
}
return result;
};
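// Returns true when `sub` appears as a contiguous run inside `master`.
// Comparing element by element avoids false hits such as "1,1" inside "-1,1".
const arrayContains = (master, sub) =>
  master.some((_, i) => sub.every((value, j) => master[i + j] === value));
// Each rule returns true when the text passes it.
const rule1 = (text) => !text.includes('text');
const rule2 = (text) => !arrayContains(derivative(text), [1, 1]);
const rule3 = (text) => !arrayContains(derivative(text), [0, 0]);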
const testing = [
"smthing textual",'abc','aaa','xyz','12345',
'1111','12abb', 'goodbcd', 'weeell'
];
const results = testing.map((input)=>
[input, rule1(input), rule2(input), rule3(input)]);
console.log(results);

Based on the 3 conditions in the post, the following regex should work.
Regex: ^(?:(?!test|([a-z])\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz).)*$
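A sketch of how this pattern could be used. Unlike the earlier answers, it matches valid names (the whole string is anchored), so test() returning true means the name passes all three rules. The i flag is an assumption added to mirror the /i in the question's regex; without it, the check is case-sensitive.

```javascript
// The answer's pattern, with the i flag added (assumption, see above).
const valid = /^(?:(?!test|([a-z])\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz).)*$/i;

for (const name of ['ab dd', 'aa a', 'abc', 'aaa', 'my test', 'ABC']) {
  // Every position must pass the negative lookahead for the whole string to match.
  console.log(JSON.stringify(name), valid.test(name) ? 'valid' : 'invalid');
}
```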

Javascript validation regex for names

I want to accept names in my app that contain letters and hyphens (dashes). I based my code on an answer I found here
and wrote this:
function validName(n){
var nameRegex = /^[a-zA-Z\-]+$/;
if(n.match(nameRegex) == null){
return "Wrong";
}
else{
return "Right";
}
}
The only problem is that it accepts a hyphen as the first character (even several of them), which I don't want.

Use a negative lookahead assertion to reject strings that start with a hyphen. Note that there is no need to escape - inside a character class when it is placed at the end of the class. To also reject a trailing -, end the pattern with a character class that excludes -, or use another lookahead assertion.
var nameRegex = /^(?!-)[a-zA-Z-]*[a-zA-Z]$/;
// or
var nameRegex1 = /^(?!-)(?!.*-$)[a-zA-Z-]+$/;
function validName(n) {
if (n.match(nameRegex) == null) {
return "Wrong";
} else {
return "Right";
}
}
function validName1(n) {
if (n.match(nameRegex1) == null) {
return "Wrong";
} else {
return "Right";
}
}
console.log(validName('abc'));
console.log(validName('abc-'));
console.log(validName('-abc'));
console.log(validName('-abc-'));
console.log(validName('a-b-c'));
console.log(validName1('abc'));
console.log(validName1('abc-'));
console.log(validName1('-abc'));
console.log(validName1('-abc-'));
console.log(validName1('a-b-c'));
FYI: You can use the RegExp#test method, which returns a boolean indicating whether the regex matches:
if(nameRegex.test(n)){
return "Right";
}
else{
return "Wrong";
}
UPDATE: If you want to allow only a single - between words, use a group that starts with - and repeats 0 or more times, as in @WiktorStribiżew's answer.
var nameRegex = /^[a-zA-Z]+(?:-[a-zA-Z]+)*$/;

You need to split your single character class into 2, moving the hyphen outside of it, and use a grouping construct to match sequences of a hyphen followed by letters:
var nameRegex = /^[a-zA-Z]+(?:-[a-zA-Z]+)*$/;
This will match one or more letters at the start of the string and then 0 or more occurrences of - followed by one or more letters, up to the end of the string.
If there can be only 1 hyphen in the string, replace the * at the end with ?.
If you also want to allow whitespace between the letter groups, replace the - with [\s-].
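A sketch comparing the three variants described in this answer. The sample strings are made up for illustration.

```javascript
// Letters, optionally joined by single hyphens (any number of them).
const anyHyphens = /^[a-zA-Z]+(?:-[a-zA-Z]+)*$/;
// At most one hyphen: the trailing * replaced with ?.
const oneHyphen = /^[a-zA-Z]+(?:-[a-zA-Z]+)?$/;
// Hyphens or whitespace as joiners: - replaced with [\s-].
const withSpaces = /^[a-zA-Z]+(?:[\s-][a-zA-Z]+)*$/;

const samples = ['abc', '-abc', 'abc-', 'a-b-c', 'a--b', 'Mary Ann-Lee'];
for (const s of samples) {
  console.log(JSON.stringify(s), anyHyphens.test(s), oneHyphen.test(s), withSpaces.test(s));
}
```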

You can either use a negative lookahead as Pranav C Balan proposed, or just use this simple expression:
^[a-zA-Z]+[a-zA-Z-]*$
Live example: https://regex101.com/r/Dj0eTH/1

The below regex is useful for surnames if one wants to forbid leading or trailing non-alphabetic characters, while permitting a small set of common word-joining characters in between two names.
^[a-zA-Z]+[- ']{0,1}[a-zA-Z]+$
Explanation
^[a-zA-Z]+ must begin with at least one letter
[- ']{0,1} allows zero or at most one of -, space, or '
[a-zA-Z]+$ must end with at least one letter
Test cases
(The double-quotes have been added purely to illustrate the presence of whitespace.)
"Blair" => match
" Blair" => no match
"Blair " => no match
"-Blair" => no match
"- Blair" => no match
"Blair-" => no match
"Blair -" => no match
"Blair-Nangle" => match
"Blair--Nangle" => no match
"Blair Nangle" => match
"Blair -Nangle" => no match
"O'Nangle" => match
"BN" => match
"BN " => no match
" O'Nangle" => no match
"B" => no match
"3Blair" => no match
"!Blair" => no match
"van Nangle" => match
"Blair'" => no match
"'Blair" => no match
Limitations include:
No single-character surnames
No surnames composed of more than two words
Check it out on regex101.
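The test cases above can be checked mechanically. This sketch runs every listed case against the pattern and prints any case whose result differs from the listed expectation.

```javascript
const surname = /^[a-zA-Z]+[- ']{0,1}[a-zA-Z]+$/;

// Test cases from the list above: [input, expected match?].
const cases = [
  ['Blair', true], [' Blair', false], ['Blair ', false], ['-Blair', false],
  ['- Blair', false], ['Blair-', false], ['Blair -', false], ['Blair-Nangle', true],
  ['Blair--Nangle', false], ['Blair Nangle', true], ['Blair -Nangle', false],
  ["O'Nangle", true], ['BN', true], ['BN ', false], [" O'Nangle", false],
  ['B', false], ['3Blair', false], ['!Blair', false], ['van Nangle', true],
  ["Blair'", false], ["'Blair", false],
];

// Collect every case where the regex disagrees with the expectation.
const failures = cases.filter(([input, expected]) => surname.test(input) !== expected);
console.log(failures.length === 0 ? 'all cases pass' : failures);
```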
