Multiple OR conditions for words in JavaScript regular expression - javascript

I'm trying to write a regular expression that captures the text between two words, but the second word is not fixed.
2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303
This is my text. I'm trying to find the word between Soyadı and Sınıfı, in this case ERTANĞA, but the word Sınıfı can also be no, numara, or any number. This is what I tried:
soyad[ıi](.*)S[ıi]n[ıi]f[ıi]|no|numara|[0-9]
[ıi] is there to handle the Turkish dotless ı; don't mind that.

You can use something like the following, which removes everything except the surname:
/.*Soyad(ı|i)|S(ı|i)n(ı|i)f(ı|i).*|no.*|numara.*|[0-9]/gmi
Here is the link I worked on: https://regex101.com/r/QXLjLF/1
In JS code:
const regex = /.*Soyad(ı|i)|S(ı|i)n(ı|i)f(ı|i).*|no.*|numara.*|[0-9]/gmi;
var str = `2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303`;
var newStr = str.replace(regex, '');
console.log(newStr);

You can use a single capture group to get the word ERTANĞA. Keep the character class [ıi] instead of the alternation (ı|i), and group the alternatives at the end of the pattern in a non-capturing group (?:...):
soyad[ıi](.+?)(?:S[ıi]n[ıi]f[ıi]|n(?:o|umara)|[0-9])
soyad[ıi] Match soyadı or soyadi
(.+?) Capture group 1, match 1 or more chars, as few as possible
(?: Non capture group
S[ıi]n[ıi]f[ıi] Match S and then ı or i etc..
| Or
n(?:o|umara) Match either no or numara
| Or
[0-9] Match a digit 0-9
) Close non capture group
Note that you don't need the /m flag as there are no anchors in the pattern.
const regex = /soyad[ıi](.+?)(?:S[ıi]n[ıi]f[ıi]|n(?:o|umara)|[0-9])/gi;
const str = "2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303\n";
console.log(Array.from(str.matchAll(regex), m => m[1]));
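As a minimal sketch, the same pattern can be wrapped in a small helper that returns the captured name, or null when nothing matches. The name extractSurname is hypothetical and not part of the question. One caveat under these assumptions: the no alternative is matched case-insensitively, so a surname that itself contains "no" would be cut short.

```javascript
// Hypothetical helper (the name is an assumption, not from the question).
// Returns the text between "Soyadı" and the first terminator, or null.
function extractSurname(text) {
  // No /g flag: a non-global regex keeps no state between calls.
  const re = /soyad[ıi](.+?)(?:S[ıi]n[ıi]f[ıi]|n(?:o|umara)|[0-9])/i;
  const match = re.exec(text);
  return match ? match[1] : null;
}

// Terminated by "Sınıfı", by "numara", and by a digit, respectively.
console.log(extractSurname("AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303"));
console.log(extractSurname("AdıMEVLÜTSoyadıERTANĞAnumaraE10/ENo303"));
console.log(extractSurname("AdıMEVLÜTSoyadıERTANĞA303"));
// No "Soyadı" marker at all, so the helper returns null.
console.log(extractSurname("no surname field here"));
```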

This might do it
const str = `2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303
2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞAnumaraE10/ENo303
2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞAnoE10/ENo303`
const re = /(?:Soyad(ı|i))(.*?)(?:S(ı|i)n(ı|i)f(ı|i)|no|numara)/gmi
console.log([...str.matchAll(re)].map(x => x[2]))
Without matchAll, using a loop over exec:
const str = `2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞASınıfıE10/ENo303
2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞAnumaraE10/ENo303
2015ÖĞLEYEMEKKARTI(2016-20.AdıMEVLÜTSoyadıERTANĞAnoE10/ENo303`
const re = /(?:Soyad(ı|i))(.*?)(?:S(ı|i)n(ı|i)f(ı|i)|no|numara)/gmi
const res = []
let match;
while ((match = re.exec(str)) !== null) res.push(match[2])
console.log(res)

Related

How do I replace the last character of the selected regex?

I want this string {Rotation:[45f,90f],lvl:10s} to turn into {Rotation:[45,90],lvl:10}.
I've tried this:
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d)\w+/g
console.log(bar.replace(regex, '$&'.substring(0, -1)))
I've also tried to just select the letter at the end using $ but I can't seem to get it right.
You can use
bar.replace(/(\d+)[a-z]\b/gi, '$1')
See the regex demo.
Here,
(\d+) - captures one or more digits into Group 1
[a-z] - matches any letter
\b - at a word boundary, i.e. at the end of the word
gi - all occurrences, case insensitive
The replacement is Group 1 value, $1.
See the JavaScript demo:
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d+)[a-z]\b/gi
console.log(bar.replace(regex, '$1'))
If only the single character before the closing brace has to go, you can also remove it by index:
const str = `{Rotation:[45f,90f],lvl:10s}`.split('');
const x = str.splice(str.length - 2, 1)
console.log(str.join(''));
You can use positive lookahead to match the closing brace, but not capture it. Then the single character can be replaced with a blank string.
const bar= '{Rotation:[45f,90f],lvl:10s}'
const regex = /.(?=})/g
console.log(bar.replace(regex, ''))
{Rotation:[45f,90f],lvl:10}
The following regex will match each group of one or more digits followed by f or s.
$1 represents the contents captured by the capture group (\d+).
const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d+)[fs]/g
console.log(bar.replace(regex, '$1'))
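A variation on the answers above, offered only as a sketch: a lookbehind can assert the digit instead of capturing it, so the replacement is an empty string. This assumes an engine with lookbehind support (added in ES2018); stripNumericSuffix is a hypothetical name.

```javascript
// Removes a single trailing letter from numeric tokens such as 45f or 10s.
// stripNumericSuffix is a hypothetical name used only for illustration.
function stripNumericSuffix(text) {
  // (?<=\d) asserts a digit directly before the letter without consuming it;
  // \b makes sure the letter is the last character of the token.
  return text.replace(/(?<=\d)[a-z]\b/gi, "");
}

console.log(stripNumericSuffix("{Rotation:[45f,90f],lvl:10s}"));
// → {Rotation:[45,90],lvl:10}
```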

Regex match apostrophe inside, but not around words, inside a character set

I'm counting how many times different words appear in a text using Regular Expressions in JavaScript. My problem is when I have quoted words: 'word' should be counted simply as word (without the quotes, otherwise they'll behave as two different words), while it's should be counted as a whole word.
(?<=\w)(')(?=\w)
This regex can identify apostrophes inside, but not around words. Problem is, I can't use it inside a character set such as [\w]+.
(?<=\w)(')(?=\w)|[\w]+
Will count it's a 'miracle' of nature as 7 words, instead of 5 (it, ', s becoming 3 different words). Also, the third word should be selected simply as miracle, and not as 'miracle'.
To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.
How can I accomplish that?
1) You can simply use /[^\s]+/g regex
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g);
console.log(result.length);
console.log(result);
2) If you are calculating total number of words in a string then you can also use split as:
const str = `it's a 'miracle' of nature`;
const result = str.split(/\s+/);
console.log(result.length);
console.log(result);
3) If you want each word without a quote at the start and at the end, you can do:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g).map((s) => {
s = s[0] === "'" ? s.slice(1) : s;
s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
return s;
});
console.log(result.length);
console.log(result);
You might use an alternation with 2 capture groups, and then check for the values of those groups.
(?<!\S)'(\S+)'(?!\S)|(\S+)
(?<!\S)' Negative lookbehind, assert a whitespace boundary to the left and match '
(\S+) Capture group 1, match 1+ non whitespace chars
'(?!\S) Match ' and assert a whitespace boundary to the right
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
See a regex demo.
const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
const s = "it's a 'miracle' of nature";
Array.from(s.matchAll(regex), m => {
if (m[1]) console.log(m[1])
if (m[2]) console.log(m[2])
});
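The answers above split on whitespace, so they do not cover the diacritics requirement from the question. A minimal sketch that does, assuming the letter class given in the question and treating a word as letters optionally joined by inner apostrophes (countWords, LETTER and WORD are hypothetical names):

```javascript
// Letters as listed in the question (Latin-1 diacritics included).
const LETTER = "[A-Za-zÀ-ÖØ-öø-ÿ]";
// A word is a run of letters, optionally followed by apostrophe + letters.
// A quote at the start or end of a word is never part of the match.
const WORD = new RegExp(`${LETTER}+(?:'${LETTER}+)*`, "g");

// Returns a Map from each word to the number of times it occurs.
function countWords(text) {
  const counts = new Map();
  for (const [word] of text.matchAll(WORD)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const counts = countWords("it's a 'miracle' of nature");
console.log(counts.size);       // 5
console.log([...counts.keys()]); // it's, a, miracle, of, nature
```

Because the surrounding quotes are simply not letters, 'word' and word end up under the same key.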

Reg Exp for finding hashtag words

I have the following sentence as a test:
This is a test with #shouldshow and to see if there #show
#yes this#shouldnotshow what is going on here
I have figured out most of the Reg Exp I need. Here's what I have so far: /((?<=#)([A-Z]*))/gi
This matches every tag but also matches the shouldnotshow portion. I want to not match words that are prefixed by anything but # (excluding whitespace & \n).
So the only matched words I should get are: shouldshow show yes.
Note: after #show is a newline
You just need to see if the hash is prefixed with whitespace or starts the string
https://regex101.com/r/JDuGvr/1
/(\s|^)#(\w+)/gm
Or with a positive lookbehind, as the OP used:
https://regex101.com/r/06X3ZX/1
/(?<=(\s|^)#)(\w+)/gm;
Use [a-zA-Z0-9]+ instead of \w+ if you do not want to match underscores.
const re1 = /(\s|^)#(\w+)/gm;
const re2 = /(?<=(\s|^)#)(\w+)/gm;
const str = `This is a test with #shouldshow and to see if there #show
#yes this#shouldnotshow what is going on here`;
const res1 = [...str.matchAll(re1)].map(match => match[2]); // here the match is the third item
console.log(res1)
const res2 = [...str.matchAll(re2)].map(match => match[0]); // match is the first item
console.log(res2)
Another option is to keep your pattern but assert that the # on the left is not preceded by a non-whitespace character, using (?<!\S)#, and get the match only, without capture groups.
Match one or more chars A-Z to prevent matching an empty string.
(?<=(?<!\S)#)[A-Z]+
const regex = /(?<=(?<!\S)#)[A-Z]+/gi;
const str = `This is a test with #shouldshow and to see if there #show
#yes this#shouldnotshow what is going on here`;
console.log(str.match(regex));
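If lookbehind is not available in a target engine (it was added in ES2018, so this is an assumption about your environment), a sketch with a plain capture group gives the same tags. extractHashtags is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the hashtag words without the leading #.
function extractHashtags(text) {
  // (?:^|\s) requires the # to start the string or follow whitespace
  // (a newline counts as whitespace, so no /m flag is needed).
  const re = /(?:^|\s)#([A-Za-z0-9]+)/g;
  return Array.from(text.matchAll(re), (m) => m[1]);
}

const sample = "This is a test with #shouldshow and to see if there #show\n" +
  "#yes this#shouldnotshow what is going on here";
console.log(extractHashtags(sample));
// → [ 'shouldshow', 'show', 'yes' ]
```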

Setting the end of the match

I have the following string:
[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]
And I would like to get the value of the prefix of TITLE, which is a.
I have tried it with (?<=TITLE|)(?<=prefix=).*?(?=]|\|) and that seems to work, but it also gives me the prefix of STORENAME (b). So if [TITLE|prefix=a] is missing from the string, I'll get the wrong value.
So I need to set the end of the match with ] that belongs to [TITLE. Please notice that this string is dynamic. So it could be [TITLE|suffix=x|prefix=y] as well.
const regex = "[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]".match(/(?<=TITLE|)(?<=prefix=).*?(?=]|\|)/);
console.log(regex);
You can use
(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=)[^\]|]+
See the regex demo. Details:
(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=) - a location in the string immediately preceded by TITLE|prefix= or TITLE|suffix=...|prefix=
[^\]|]+ - one or more chars other than ] and |.
See JavaScript demo:
const texts = ['[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]', '[TITLE|suffix=s|prefix=a]'];
for (let s of texts) {
console.log(s, '=>', s.match(/(?<=TITLE(?:\|suffix=[^\]|]+)?\|prefix=)[^\]|]+/)[0]);
}
You could also use a capturing group:
\[TITLE\|(?:\w+=\w+\|)*prefix=(\w+)[^\]]*]
Explanation
\[TITLE\| Match [TITLE|
(?:\w+=\w+\|)* Repeat 0+ occurrences wordchars = wordchars and |
prefix= Match literally
(\w+) Capture group 1, match 1+ word chars
[^\]]* Match 0+ chars other than ]
] Match the closing ]
const regex = /\[TITLE\|(?:\w+=\w+\|)*prefix=(\w+)[^\]]*\]/g;
const str = `[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]
[TITLE|suffix=x|prefix=y]`;
let m;
while ((m = regex.exec(str)) !== null) {
console.log(m[1]);
}
Or with a negated character class instead of \w
\[TITLE\|(?:[^|=\]]*=[^|=\]]*\|)*prefix=([^|=\]]*)[^\]]*]
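Since the attribute order is dynamic, another option is to parse each bracketed tag into key/value pairs and then look up TITLE. This is only a sketch, assuming that names and values never contain ] or |; parseTags is a hypothetical helper name.

```javascript
// Hypothetical helper: parses "[NAME|key=value|...]" blocks into
// an object of the form { NAME: { key: value, ... }, ... }.
function parseTags(text) {
  const tags = {};
  for (const [, body] of text.matchAll(/\[([^\]]+)\]/g)) {
    const [name, ...pairs] = body.split("|");
    tags[name] = Object.fromEntries(
      pairs.map((pair) => {
        // Split on the first "=" only; a bare key gets an empty value.
        const i = pair.indexOf("=");
        return i === -1 ? [pair, ""] : [pair.slice(0, i), pair.slice(i + 1)];
      })
    );
  }
  return tags;
}

const tags = parseTags(
  "[TITLE|prefix=a] [STORENAME|prefix=b|suffix=c] [DYNAMIC|limit=10|random=0|reverse=0]"
);
console.log(tags.TITLE?.prefix); // a
// When TITLE is missing, the lookup yields undefined instead of b.
console.log(parseTags("[STORENAME|prefix=b]").TITLE?.prefix);
```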

Making Regex more safe

I'm trying to make a bunch of regular expressions safer; by safer I mean more accurate.
I'm very new to RegExp, and I want to know if I'm going about this the right way (not the regex itself, but making it stricter).
This is the first RegExp I want to change; I want to extract the 01/2011.
Past RegExp:
var text = 'INSCRIÇÃO: 60.537.263/0001-66 COMP: 01/2011 COD REC: 150';
var reg = /COMP.*?(\d\S*)/;
var match = reg.exec(text);
console.log(match[1]);
New RegExp:
var text = 'INSCRIÇÃO: 60.537.263/0001-66 COMP: 01/2011 COD REC: 150';
var reg = /COMP:\s([0-9]{0,2}\/[0-9]{0,4})/;
var match = reg.exec(text);
console.log(match[1]);
Why? This text is just part of a much larger text, so I need accuracy.
My other question is how to make the match optional, so that if nothing matches, the result is undefined.
Thanks.
According to your feedback:
i want specifically push the value with two numbers, one / and four numbers
You can use
/\bCOMP:\s*(\d{2}\/\d{4})(?!\d)/g
The \b is a word boundary, thus 5COMP won't be matched.
The \s* will match 0 or more whitespace (if there must be whitespace, use + quantifier instead).
The \d{2} will match exactly 2 digits.
The \d{4} will match 4 digits and no more because of the look-ahead (?!\d). This look-ahead just makes sure there is no digit after the 4 previous digits. You may use \b here as well to ensure matching a word boundary.
var arr = [];
var re = /\bCOMP:\s*(\d{2}\/\d{4})(?!\d)/g;
var str = 'COMP:10/9995, COMP: 21/1234, COMP: 21/123434, REGCOMP: 21/1234';
var m;
while ((m = re.exec(str)) !== null) {
arr.push(m[1]);
}
console.log(arr);
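The second part of the question (returning undefined when nothing matches) can be sketched with a small wrapper around the same pattern. extractComp is a hypothetical name; the regex is used without the g flag so that each call starts from the beginning of the text.

```javascript
// Hypothetical helper: returns the MM/YYYY value after "COMP:",
// or undefined when the text contains no such value.
function extractComp(text) {
  // Non-global regex: exec() keeps no lastIndex state between calls.
  const match = /\bCOMP:\s*(\d{2}\/\d{4})(?!\d)/.exec(text);
  return match ? match[1] : undefined;
}

console.log(extractComp("INSCRIÇÃO: 60.537.263/0001-66 COMP: 01/2011 COD REC: 150"));
// → 01/2011
console.log(extractComp("COMP: 21/123434"));
// → undefined (more than four digits after the slash)
```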
