Regex to match all symbols except a word - JavaScript

How can a regex match all symbols except a word?
I need to find all symbols except a word.
(.*) - finds all symbols.
[^v] - finds all symbols except the letter v.
But how do I find all symbols except a word?
Solution (written below):
((?:(?!here any word for block)[\s\S])*?)
or
((?:(?!here any word for block).)*?)
((?:(?!video)[\s\S])*?)
I want to find everything except |end| and replace everything except |end|.
I tried:
Need all except |end|
var str = '|video| |end| |water| |sun| |cloud|';
// May be:
//var str = '|end| |video| |water| |sun| |cloud|';
//var str = '|cloud| |video| |water| |sun| |end|';
str.replace(/\|((?!end|end$).*?)\|/gm, test_fun2);
function test_fun2(match, p1, offset, str_full) {
console.log("--------------");
p1 = "["+p1+"]";
console.log(p1);
console.log("--------------");
return p1;
}
Output console log:
--------------
[video]
--------------
--------------
[ ]
--------------
--------------
[ ]
--------------
--------------
[ ]
--------------
Example of what I need:
Any symbols except [video](
input - '[video](text-1 *******any symbols except: "[video](" ******* [video](text-2 any symbols) [video](text-3 any symbols) [video](text-4 any symbols) [video](text-5 any symbols)'
output - <div>text-1 *******any symbols except: "[video](" *******</div> <div>text-2 any symbols</div><div>text-3 any symbols</div><div>text-4 any symbols</div><div>text-5 any symbols</div>

Scenario 1
Use the best trick ever:
One key to this technique, a key to which I'll return several times, is that we completely disregard the overall matches returned by the regex engine: that's the trash bin. Instead, we inspect the Group 1 matches, which, when set, contain what we are looking for.
Solution:
s = s.replace(/\|end\||\|([^|]*)\|/g, function ($0, $1) {
return $1 ? "[" + $1 + "]" : $0;
});
Details
\|end\| - |end| is matched
| - or
\|([^|]*)\| - | is matched, any 0+ chars other than | are captured into Group 1, and then | is matched.
If Group 1 matched ($1 ?), the replacement occurs; otherwise $0, the whole match, is returned unchanged to the result.
JS test:
console.log(
"|video| |end| |water| |sun| |cloud|".replace(/\|end\||\|([^|]*)\|/g, function ($0, $1) {
return $1 ? "[" + $1 + "]" : $0;
})
)
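To see the trick at work without a replacement, the sketch below runs the same pattern through matchAll and keeps only the Group 1 values. The |end| match still happens, but its Group 1 is undefined, so it lands in the "trash bin". The variable names are illustrative only.

```javascript
// Same pattern as above: |end| is matched as a whole (no capture),
// every other |...| token has its content captured into Group 1.
const trickRe = /\|end\||\|([^|]*)\|/g;
const sample = "|video| |end| |water| |sun| |cloud|";

const kept = [];
for (const m of sample.matchAll(trickRe)) {
  // m[0] is the overall match (ignored); m[1] is undefined for |end|
  if (m[1] !== undefined) {
    kept.push(m[1]);
  }
}
console.log(kept); // ["video", "water", "sun", "cloud"]
```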
Scenario 2
Use
.replace(/\[(?!end])[^\]]*]\(((?:(?!\[video]\()[\s\S])*?)\)/g, '<div>$1</div>')
See the regex demo
Details
\[ - a [ char
(?!end]) - no end] allowed right after the current position
[^\]]* - 0+ chars other than ]
] - a ] char
\( - a ( char
((?:(?!\[video]\()[\s\S])*?) - Group 1 that captures any char ([\s\S]), 0 or more occurrences but as few as possible (*?), that does not start a [video]( char sequence
\) - a ) char.
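A quick check of the Scenario 2 pattern on a few small, made-up inputs (not taken from the question): a normal [video](...) link is wrapped, a link labelled end is left alone because of (?!end]), and a [video]( that is followed by another [video]( before any ) is skipped because of the tempered token.

```javascript
// The Scenario 2 pattern, unchanged
const linkRe = /\[(?!end])[^\]]*]\(((?:(?!\[video]\()[\s\S])*?)\)/g;

const samples = [
  "[video](text-2 any symbols) [video](text-3 any symbols)",
  "[end](skip me) [video](keep)",
  "[video](a [video](b)",
];

const converted = samples.map((s) => s.replace(linkRe, "<div>$1</div>"));
converted.forEach((s) => console.log(s));
// <div>text-2 any symbols</div> <div>text-3 any symbols</div>
// [end](skip me) <div>keep</div>
// [video](a <div>b</div>
```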

Something like this is better done in multiple steps. Also, if you only need to extract the matches, you should use match rather than replace.
var str = '|video| |end| |water| |sun| |cloud|';
var matches = str.match(/\|.*?\|/g);
// strip pipe characters...
matches = matches.map(m=>m.slice(1,-1));
// filter out unwanted words
matches = matches.filter(m=>!['end'].includes(m));
// this allows you to add more filter words easily
// if you'll only ever need "end", just do (m=>m!='end')
console.log(matches); // ["video","water","sun","cloud"]
Notice how much easier it is to understand what's going on, and how much easier this is to maintain and change in future as needed.

You are on the right track. Here is what you need to do with regex:
var str = '|video| |end| |water| |sun| |cloud|';
console.log(str.replace(/(?!\|end\|)\|(\S*?)\|/gm, test_fun2));
function test_fun2(match, p1, offset, str_full) {
return "["+p1+"]";
}
And an explanation of what was wrong - you had your negative-lookahead placed after the | character. That means that the matching engine would do the following:
Match |video|, because the pattern works with it.
Grab the next |.
Find that the next text is end, which the negative lookahead rejects, and drop that attempt.
Grab the | immediately after end.
Grab the space and the next | character, since this passes the negative lookahead and also works with .*?.
Continue grabbing the intermediate | | sequences, because the | at the beginning of each word was consumed by the previous match.
So you end up matching the following things in '|video| |end| |water| |sun| |cloud|':
|video|
| | (the closing | of |end|, the space, and the opening | of |water|)
| | (the closing | of |water|, the space, and the opening | of |sun|)
| | (the closing | of |sun|, the space, and the opening | of |cloud|)
All because the |end match was dropped.
You can see this if you print out the matches
var str = '|video| |end| |water| |sun| |cloud|';
str.replace(/\|((?!end|end$).*?)\|/gm, test_fun2);
function test_fun2(match, p1, offset, str_full) {
console.log(match, p1, offset);
}
You will see that the second, third, and fourth matches are | |, that the captured p1 is a blank space (not very well displayed, but there), and that they were found at offsets 12, 20, and 26:
|video| |end| |water| |sun| |cloud|
01234567890123456789012345678901234
Offset 12 is the closing | of |end|, offset 20 is the closing | of |water|, and offset 26 is the closing | of |sun|.
The change I made was to look explicitly for the whole |end| pattern in a negative lookahead placed before the opening |, and to match only non-whitespace characters, so you don't grab | | again.
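The difference between the two patterns is easiest to see by listing each match with its offset. A small sketch; listMatches is a helper name made up for this illustration.

```javascript
const input = "|video| |end| |water| |sun| |cloud|";

// The asker's pattern: the lookahead sits after the opening |
const originalRe = /\|((?!end|end$).*?)\|/g;
// The fixed pattern: the lookahead rejects a whole |end| before the opening |
const fixedRe = /(?!\|end\|)\|(\S*?)\|/g;

// Hypothetical helper: "offset:match" for every match of re in input
const listMatches = (re) =>
  Array.from(input.matchAll(re), (m) => `${m.index}:${m[0]}`);

console.log(listMatches(originalRe)); // ["0:|video|", "12:| |", "20:| |", "26:| |"]
console.log(listMatches(fixedRe)); // ["0:|video|", "14:|water|", "22:|sun|", "28:|cloud|"]
```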
Also worth noting: you can move your filtering logic into the replacement callback instead of the regex. This simplifies the regex but makes your replacement more complex. Still, it's a fair tradeoff, as complex conditions are usually easier to maintain in code than in a regex:
var str = '|video| |end| |water| |sun| |cloud|';
//capturing word characters - an alternative to "non-whitespace"
console.log(str.replace(/\|(\w*)\|/gm, test_fun2));
function test_fun2(match, p1, offset, str_full) {
if (p1 === 'end') {
return match;
} else {
return "[" + p1 + "]"
}
}

Related

Regex to capture all vars between delimiters

How do I capture all of 1, 2 and 3 in not |1|2|3|?
My regex \|(.*?)\| skips 2.
const re = /\|(.*?)\|/gi;
const text = 'not |1|2|3|'
console.log(text.match(re).map(m => m[1]));
You can use
const re = /\|([^|]*)(?=\|)/g;
const text = 'not |123|2456|37890|'
console.log(Array.from(text.matchAll(re), m => m[1]));
Details:
\| - a | char
([^|]*) - Group 1: zero or more chars other than |
(?=\|) - a positive lookahead that matches a location that is immediately followed with |.
If you do not care about matching the | on the right, you can remove the lookahead.
If you also need to match till the end of string when the trailing | is missing, you can use /\|([^|]*)(?=\||$)/g.
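To compare the two lookahead variants on input with and without the trailing |, here is a short sketch (grab is a hypothetical helper). Note that with (?=\||$), the final | of a properly closed list also starts a match, yielding an empty item at the end of the string, so you may want to drop empty strings afterwards.

```javascript
// Variant 1: only items followed by a |
const strictRe = /\|([^|]*)(?=\|)/g;
// Variant 2: items followed by a | or by the end of the string
const lenientRe = /\|([^|]*)(?=\||$)/g;

// Hypothetical helper: Group 1 of every match
const grab = (re, text) => Array.from(text.matchAll(re), (m) => m[1]);

console.log(grab(strictRe, "not |1|2|3|")); // ["1", "2", "3"]
console.log(grab(strictRe, "not |1|2|3")); // ["1", "2"]
console.log(grab(lenientRe, "not |1|2|3")); // ["1", "2", "3"]
console.log(grab(lenientRe, "not |1|2|3|")); // ["1", "2", "3", ""]
```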

JS - how to replace in regex for links

I have a regex that replaces a link from http up to the first occurrence of one of the characters listed in []. I would like to add to these characters, that is, to replace the string up to the occurrence of one of those characters or a hard space:
"https://test.pl'".replace(/https?:\/\/[^ "'><]+/g," ")
It works fine for the characters listed in [], but I don't know how to add the hard space here.
You can use
.replace(/https?:\/\/.*?(?:&nbsp;|[ '"><]|$)/g," ")
See the regex demo.
Details:
https?:\/\/ - http:// or https://
.*? - any zero or more chars other than line break chars, as few as possible
(?:&nbsp;|[ '"><]|$) - one of the following:
&nbsp; - the &nbsp; char sequence
| - or
[ '"><] - a space, ', ", > or < char
| - or
$ - end of string.
JavaScript demo:
const texts = ["https://test.pl&nbsp;test","https://test.pl'test"];
const re = /https?:\/\/.*?(?:&nbsp;|[ '"><]|$)/g;
for (const s of texts) {
console.log(s, '=>', s.replace(re, " "));
}
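If the hard space in your data is the actual non-breaking space character (U+00A0) rather than an HTML entity, a character class with \s may be enough, since in JavaScript \s also matches U+00A0. A sketch under that assumption, with made-up sample strings:

```javascript
// \s covers ordinary spaces, tabs, line breaks and U+00A0 (non-breaking space)
const linkRe = /https?:\/\/.*?(?:[\s'"><]|$)/g;

const inputs = [
  "https://test.pl\u00A0test", // hard space right after the link
  "https://test.pl'test",
  "see https://test.pl",
];

// As in the answer, the delimiter itself is consumed and replaced too
const outputs = inputs.map((s) => s.replace(linkRe, " "));
console.log(outputs); // [" test", " test", "see  "]
```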

How can I check the last character in a string?

I have these line options:
<40m:22s - ok
<40m:22m; - not ok
<40h:22s;<40m:22m - ok
<40m:22m;<40m:22m; - not ok
I need to check the semicolons. If there is one entry, it should not end with a semicolon. If there are several entries in a row, the last entry should not have a semicolon.
Now I have so far only succeeded:
([<>][1-9][0-9][hms]:[1-9][0-9][hms][;?]+)(?<!;)
I will be grateful for any help or hints.
You can use
^(?:[<>][1-9][0-9]?[hms]:[1-9][0-9]?[hms](?:;(?!$)|$))+$
Or, a bit more verbose since it includes a repetition of the main pattern:
^[<>][1-9][0-9]?[hms]:[1-9][0-9]?[hms](?:;[<>][1-9][0-9]?[hms]:[1-9][0-9]?[hms])*$
See the regex #1 demo and regex #2 demo.
Details:
^ - start of string
(?:[<>][1-9][0-9]?[hms]:[1-9][0-9]?[hms](?:;(?!$)|$))+ - one or more repetitions of
[<>] - a < or > char
[1-9] - a non-zero digit
[0-9]? - an optional digit (remove ? if it must be obligatory)
[hms] - h, m or s
: - a colon
[1-9][0-9]?[hms] - a non-zero digit, an optional digit and h/m/s
(?:;(?!$)|$) - a ; not at the end of string or end of string
$ - end of string.
The ^[<>][1-9][0-9]?[hms]:[1-9][0-9]?[hms](?:;[<>][1-9][0-9]?[hms]:[1-9][0-9]?[hms])*$ pattern follows the ^<MAIN>(?:<SEP><MAIN>)*$ scheme, and this pattern can easily be built dynamically using the RegExp constructor.
const texts = ['<40m:22s', '<40m:22m;', '<40h:22s;<40m:22m', '<40m:22m;<40m:22m;'];
const rx = /^(?:[<>][1-9][0-9]?[hms]:[1-9][0-9]?[hms](?:;(?!$)|$))+$/;
for (let text of texts) {
console.log(text, '=>', rx.test(text));
}
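As a sketch of building the ^<MAIN>(?:<SEP><MAIN>)*$ scheme with the RegExp constructor: buildListRegex is a hypothetical helper, and it assumes both arguments are already valid regex source (a separator containing special characters would need escaping first). Each part is wrapped in (?:...) so that an alternation inside MAIN cannot leak out.

```javascript
// Hypothetical helper: ^MAIN(?:SEP MAIN)*$ from two regex-source strings
function buildListRegex(main, sep) {
  return new RegExp(`^(?:${main})(?:${sep}(?:${main}))*$`);
}

const item = "[<>][1-9][0-9]?[hms]:[1-9][0-9]?[hms]";
const listRe = buildListRegex(item, ";");

const checks = ["<40m:22s", "<40m:22m;", "<40h:22s;<40m:22m", "<40m:22m;<40m:22m;"]
  .map((t) => listRe.test(t));
console.log(checks); // [true, false, true, false]
```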
The general pattern for a delimited list is
^ item (delimiter item)* $
To avoid self-repetition and make it all more or less readable, it would make sense to use variables, template strings and whitespace. This way your regexp looks like a grammar definition (what it actually is) and not as a soup of symbols.
let term = `[1-9] [0-9] [hms]`
let item = `< ${term} : ${term}`
let list = `^ ${item} ( ; ${item} )* $`
// strip the layout whitespace before compiling
let re = new RegExp(list.replace(/\s/g, ''))
console.log(re)
const test = `
<40m:22s
<40m:22m;
<40h:22s;<40m:22m
<40m:22m;<40m:22m;
`
for (const t of test.trim().split('\n'))
console.log(t, re.test(t))
Let's simplify the problem to a = [<>][1-9][0-9][hms]:[1-9][0-9][hms], so the possible strings are:
a - ok
a;a - ok
a; - not ok
a;a; - not ok
So our regex must end with a, which leads to a$.
Now we want to accept zero or more a in front of it, each followed by ;, and the regex for that is (a;)*.
Combining these two results in const regex = /^(a;)*a$/;
Now, if we replace a with [<>][1-9][0-9][hms]:[1-9][0-9][hms], the result is const regex = /^([<>][1-9][0-9][hms]:[1-9][0-9][hms];)*[<>][1-9][0-9][hms]:[1-9][0-9][hms]$/;
demo
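The reasoning can be checked step by step: first on the simplified form, treating a as a plain literal letter, and then by substituting the real item pattern with a template literal instead of by hand.

```javascript
// Step 1: check the shape of the regex with "a" as a plain literal letter
const simplified = /^(a;)*a$/;
const expectations = [["a", true], ["a;a", true], ["a;", false], ["a;a;", false]];
for (const [text, expected] of expectations) {
  console.log(text, simplified.test(text) === expected ? "as expected" : "MISMATCH");
}

// Step 2: substitute the real item pattern for "a" via the RegExp constructor
const a = "[<>][1-9][0-9][hms]:[1-9][0-9][hms]";
const full = new RegExp(`^(${a};)*${a}$`);
console.log(full.test("<40h:22s;<40m:22m")); // true
console.log(full.test("<40m:22m;")); // false
```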

JS Regex Add Spaces to string

I have a two-part string whose parts are always delimited by spaces and a |, like this:
teststring | secondstring
Is it possible to add a predefined number of spaces between the parts using ONLY JavaScript regex.replace()?
I tried something like this:
([^\|]+)(\s){0,17}(?(R2)\s|\s)([\|a-zA-Z0-9]+)
And Substitution:
$1$2$3
Is it possible to repeat a capture group in the substitution, e.g. $2{17}, or to match the same space multiple times?
EDIT:
I have function
function InvokeRegexp(originalString, pattern, replaceExpr)
{
return originalString.replace(pattern, replaceExpr);
}
and I want to pass the two-part text plus either a pattern or a replaceExpr containing the number of spaces, and get the result: firstpart | secondpart
A non-regex answer:
str.split("|").join(
" ".repeat(9 /*whitespaces*/) + "|"
)
Or, with a regex, it's probably:
str.replace(/\|/," ".repeat(9)+"|")
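Since the question wants to pass the number of spaces through the InvokeRegexp function from the EDIT, one option is to build the replacement string with repeat and let the pattern absorb any spaces already in front of the |. A sketch, assuming the input contains a single | (the regex has no g flag); withSpaces is a hypothetical wrapper.

```javascript
// The asker's function, unchanged
function InvokeRegexp(originalString, pattern, replaceExpr) {
  return originalString.replace(pattern, replaceExpr);
}

// /\s*\|/ matches the | together with any whitespace before it,
// so the result always has exactly `count` spaces before the |
const withSpaces = (text, count) => InvokeRegexp(text, /\s*\|/, " ".repeat(count) + "|");

console.log(JSON.stringify(withSpaces("teststring | secondstring", 5)));
// "teststring     | secondstring"
```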
You could use padStart and padEnd, since you said you want the parts to have a certain length.
const input = 'teststring | secondstring';
// Split the input variable and select spaces as well.
// 1. Select multiple spaces: \s+
// 2. Select pipe: \|
// 3. Select all following spaces: \s+
const parts = input.split( /\s+\|\s+/ );
// So every part should be at least 20 chars in this example.
const len = 20;
const output = `${ parts[ 0 ].padEnd( len ) }|${ parts[ 1 ].padStart( len )}`;
console.log( output );

Regex - collect characters between obligatory prefix and optional groups

I'm creating a regex in JavaScript that finds all group occurrences, all of which are optional.
I have now collected the optional groups (thanks to #wiktor-stribiżew). The missing piece is gathering the characters between the new- prefix and the first group that occurs.
Input:
new-rooms-3-area-50
new-poland-warsaw-rooms-3-area-50-bar
new-some-important-location-rooms-3-asdads-anything-area-50-uiop
new-another-location-area-50-else
Requested output:
["rooms-3", "area-50"]
["poland-warsaw", "rooms-3", "area-50"]
["some-important-location", "rooms-3", "area-50"]
["another-location", "area-50"]
I have now
new-(?:.*?(rooms-\d+))?.*?(area-\d+)
regex. I think that collecting .* between new- and rooms|area may be a clumsy solution.
Online demo: https://regex101.com/r/QvmYN0/5
Note: I created two separate questions because they refer to two separate problems. I hope this helps somebody with similar problems in the future.
I think it is better to split this into steps, like this:
// Split by \n to work with each line
const getArrays = input => input.split`\n`.map(x => {
// Split by your desired delimiters:
// dashes that are followed by "area-" or "rooms-"
return x.split(/-(?=area-|rooms-)/g).map(y => {
// remove the "new-" prefix or the trailing non-digits (whichever matches first)
return y.replace(/^new-|\D+$/, '');
// make sure you don't have empty cases
}).filter(y => y);
});
var txt = `new-rooms-3-area-50
new-poland-warsaw-rooms-3-area-50-bar
new-some-important-location-rooms-3-asdads-anything-area-50-uiop
new-another-location-area-50-else`;
console.log(getArrays(txt));
EDIT:
The above code returns the requested output. However, I think you might want an array of models instead:
// initial state of your model
const getModel = () => ({
new: '',
area: 0,
rooms: 0,
});
// the function that will return the array of models:
const getModels = input => input.split`\n`.map(line => {
var model = getModel();
// set delimiters:
var delimiters = new RegExp(
'-(?=(?:' + Object.keys(model).join`|` + ')-)', 'g');
// set the properties of your model:
line.split(delimiters).forEach(item => {
// remove non-digits after the last digit:
item.replace(/(\d)\D+$/, '$1')
// set each matched property:
.replace(/^([^-]+)-(.*)/,
(whole_match, key, val) => model[key] = val);
});
return model;
});
var txt = `new-rooms-3-area-50
new-poland-warsaw-rooms-3-area-50-bar
new-some-important-location-rooms-3-asdads-anything-area-50-uiop
new-another-location-area-50-else`;
console.log(getModels(txt));
This is the high-end solution, which does it all at once.
It doesn't split or massage the data; it just takes it as it is (and as it always will be).
It may not be for beginners, but it suits the more experienced.
(Note that I don't know JS, but I can tell you this took about 20 minutes
of googling about strings. This is just too easy; do people really get paid
to do this?!)
This uses exec to push each element (group 2)
and create an array of records, one for each line.
( ^ new ) # (1)
|
( # (2 start)
(?: rooms | area )
- \d+
| (?<= ^ new - ) # location only right after the new- prefix
(?:
(?:
(?!
(?: rooms | area )
- \d+
)
[a-z]
)+
(?:
-
(?:
(?!
(?: rooms | area )
- \d+
)
[a-z]
)+
)+
)
) # (2 end)
var strTarget = "\
new-rooms-3-area-50\n\
new-poland-warsaw-rooms-3-area-50-bar\n\
new-some-important-location-rooms-3-asdads-anything-area-50-uiop\n\
new-another-location-area-50-else\n\
";
var RxLine = /^new.+/mg;
var RxRecord = /(^new)|((?:rooms|area)-\d+|(?<=^new-)(?:(?:(?!(?:rooms|area)-\d+)[a-z])+(?:-(?:(?!(?:rooms|area)-\d+)[a-z])+)+))/g;
var records = [];
var matches
var match;
while( (match = RxLine.exec( strTarget )) ){
var line = match[0];
matches = [];
while( (match = RxRecord.exec( line )) ){
if ( match[2] )
matches.push( match[2] );
}
records.push( matches );
}
console.log( records );
Here you go:
new-(?:(.*?)-)??(rooms-\d+|area-\d+)(?:.*?(area-\d+))?.*
Demo: https://regex101.com/r/Qvdkdx/1
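A sketch that applies a JavaScript-oriented variant of this pattern to the four sample lines. In the variant, the location is written as (?:(.*?)-)?? so it is only taken when no rooms/area group can start right after new-, and the trailing area group is wrapped as (?:.*?(area-\d+))? so the engine actually attempts it instead of skipping it through a lazy .*?. Both changes are editorial assumptions, not part of the original answer.

```javascript
// Group 1: optional location, Group 2: first rooms/area, Group 3: optional later area
const lineRe = /new-(?:(.*?)-)??(rooms-\d+|area-\d+)(?:.*?(area-\d+))?.*/;

const lines = [
  "new-rooms-3-area-50",
  "new-poland-warsaw-rooms-3-area-50-bar",
  "new-some-important-location-rooms-3-asdads-anything-area-50-uiop",
  "new-another-location-area-50-else",
];

// Drop the full match, then drop groups that did not participate (undefined)
const parsed = lines.map((line) => line.match(lineRe).slice(1).filter(Boolean));
console.log(parsed);
```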
