Regex match first character once, followed by repetitive matching until end - javascript

I'm trying to match characters that shouldn't be allowed in a username string to then be replaced.
Anything outside this range should match first character [a-zA-Z] <-- restricting the first character is causing problems and I don't know how to fix it
And then match everything else outside this range [0-9a-zA-Z_.] <---- repeat until the end of the string
Matches:
/////hey/// <-- first match /////, second match ///
[][123Bc_.// <-- first match [][, second match //
(/abc <-- should match (/
a2__./) <-- should match /)
Non Matches:
a_____
b__...
Current regex
/^([^a-zA-Z])([^\w.])*/
const regex = /^([^a-zA-Z])([^0-9a-zA-Z_.])*/;
'(/abc'.replace(regex, '') // => return expected abc
'/////hey///'.replace(regex, '') // => return expected "hey"

/^([^a-zA-Z])([^\w.])*/
You cannot do it this way, with negated character classes and the pattern anchored at the start. For example, for your a2__./), this of course won’t match: the first character is not in the disallowed range, so the whole expression doesn’t match.
Your allowed characters for the first position are a subset of what you want to allow for “the rest”, so do that second part first: replace everything that does not match [0-9a-zA-Z_.] with an empty string, without anchoring the pattern at the beginning or end.
And then, in the result of that operation, replace any characters not matching [a-zA-Z] from the start. (So that second pattern does get anchored at the beginning, and you’ll want to use + as quantifier - because when you remove the first invalid character, the next one becomes the new first, and that one might still be invalid.)
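A minimal sketch of the two passes described above, assuming a plain chain of replace calls is acceptable. The function name sanitizeUsername is hypothetical and does not come from the question.

```javascript
// Hypothetical helper name; the two patterns follow the two steps described above.
function sanitizeUsername(input) {
  return input
    // Pass 1: drop every character outside [0-9a-zA-Z_.], anywhere in the string.
    .replace(/[^0-9a-zA-Z_.]/g, '')
    // Pass 2: drop any leading run of characters that are not letters.
    .replace(/^[^a-zA-Z]+/, '');
}

console.log(sanitizeUsername('(/abc'));         // "abc"
console.log(sanitizeUsername('/////hey///'));   // "hey"
console.log(sanitizeUsername('[][123Bc_.//'));  // "Bc_."
console.log(sanitizeUsername('a2__./)'));       // "a2__."
console.log(sanitizeUsername('a_____'));        // "a_____" (unchanged)
```

Note that in the third case the leading 123 is removed as well, because digits are allowed in the string but not in the first position.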

Related

Regex for testing if commas are missing from a string

I'm trying to check that a list of items is entered properly and includes a comma between each entry. In this list there can only be a single word and after every word there must be a comma.
I'm attempting to use a lookbehind to assert that there is a comma before every space, but it seems to only work for the first occurrence of the character. How can I look through the entire string?
const nameStringList = "Fozzie, Gonzo, Kermit Animal "
const isValid = /\s+/.test(nameStringList) && !(/(?<=,)\s.*/.test(nameStringList))
console.log(isValid);
/^(\S+(,\s|$))+$/
Explanation:
Match one or more non-whitespace characters followed by either a comma and a whitespace character or the end of the message. This should be repeated at least once but can be repeated more times. This should match from the start to the end of the message, so if part of the string doesn't match then it won't work.
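As a quick check of the pattern above, here is a small sketch run against the string from the question plus a corrected variant. The variable name listPattern is only illustrative.

```javascript
// The pattern from the answer, unchanged.
const listPattern = /^(\S+(,\s|$))+$/;

// "Kermit" is followed by a space without a comma, so the whole string is rejected.
console.log(listPattern.test('Fozzie, Gonzo, Kermit Animal ')); // false

// Every word except the last is followed by a comma and a space.
console.log(listPattern.test('Fozzie, Gonzo, Kermit, Animal')); // true

// \S also matches a comma, so a list without spaces is accepted as one long "word".
console.log(listPattern.test('Fozzie,Gonzo')); // true
```

If the last case should be rejected, one option (an assumption about the requirements, not part of the answer) is to use [^\s,]+ in place of \S+.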

JS Regex - Match until the end of line OR a character

Here is an example of what I'm trying to match:
Match everything after this:::
one
two
three
Match this also:::
one
two
three
___
but not this
My code:
const thing = /[^:]:::\n([\s\S]*)(_{3}|$)/gm
I want it to match everything AFTER ':::', but end either when it sees ___, or if that is not there, then the end of the text input, $.
It works for the FIRST example, but it continues to match the text after the ___ in the second example.
Any ideas how to make this right?
I'm only interested in the results in the first grouping. I had to group the (_{3}|$) otherwise it creates an infinite loop.
The pattern [^:]:::\n([\s\S]*)(_{3}|$) that you tried matches too much because [\s\S]* is greedy and matches all the way to the end of the string. Once there, the alternation (_{3}|$) can match either three underscores or the end of the string.
So the pattern settles for matching the end of the string and never needs to backtrack to the ___.
You could use a capture group, and match all following lines that do not start with ___
[^:](:::(?:\n(?!___).*)*)
[^:] Match any char except :
( Capture group 1
::: Match literally
(?:\n(?!___).*)* Match all consecutive lines that do not start with ___
) Close group 1
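A short sketch of how the capture-group pattern above might be used in JavaScript, treating the two samples from the question as separate inputs. The names blockPattern and bodyOf are hypothetical. Since group 1 includes the ::: marker itself, the helper slices it off to keep only the text after it.

```javascript
// The capture-group pattern from the answer, unchanged.
const blockPattern = /[^:](:::(?:\n(?!___).*)*)/;

const first = 'Match everything after this:::\none\ntwo\nthree';
const second = 'Match this also:::\none\ntwo\nthree\n___\nbut not this';

// Group 1 starts at ":::"; slice(3) drops that marker and trim() drops the leading newline.
const bodyOf = (text) => {
  const m = text.match(blockPattern);
  return m ? m[1].slice(3).trim() : null;
};

console.log(bodyOf(first));  // the lines one, two, three
console.log(bodyOf(second)); // the same three lines; "___" and "but not this" are excluded
```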
Or with a negative lookbehind if supported to get a match only, asserting not : to the left
(?<!:):::(?:\n(?!___).*)*

How to match one 'x' but not one or both of xs in 'xx' globally in string [duplicate]

Not quite sure how to go about this, but basically what I want to do is match a character, say a for example. In this case all of the following would not contain matches (i.e. I don't want to match them):
aa
aaa
fooaaxyz
Whereas the following would:
a (obviously)
fooaxyz (this would only match the letter a part)
My knowledge of RegEx is not great, so I am not even sure if this is possible. Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string).
Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string).
^[^\sa]*\Ka(?=[^\sa]*$)
\K discards the previously matched characters, and the lookahead asserts whether a match is possible or not. So the above matches only the letter a which satisfies the conditions.
OR
a{2,}(*SKIP)(*F)|a
You may use a combination of a lookbehind and a lookahead:
(?<!a)a(?!a)
Details
(?<!a) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is an a char
a - an a char
(?!a) - a negative lookahead that fails the match if, immediately to the right of the current location, there is an a char.
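To see the lookaround pattern in action, here is a small JavaScript sketch over the sample strings from the question. It assumes an engine with lookbehind support (for example, a current version of Node.js); the variable name loneA is illustrative.

```javascript
// An "a" with no "a" directly before it and no "a" directly after it.
const loneA = /(?<!a)a(?!a)/g;

// With the g flag, match() returns an array of all matches, or null if there are none.
for (const s of ['aa', 'aaa', 'fooaaxyz', 'a', 'fooaxyz']) {
  console.log(s, s.match(loneA));
}
// The first three strings give null; "a" and "fooaxyz" each give a single "a".
```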
You need two things:
a negated character class: [^a] (all except "a")
anchors (^ and $) to ensure that the limits of the string are reached (in other words, that the pattern matches the whole string and not only a substring)
Result:
^[^a]*a[^a]*$
Once you know there is only one "a", you can extract, replace, or remove it in whatever way the language you use offers.
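A sketch of this whole-string check in JavaScript, followed by one possible way to locate or replace the single a. The variable names are illustrative.

```javascript
// True only when the entire string contains exactly one "a".
const exactlyOneA = /^[^a]*a[^a]*$/;

const input = 'fooaxyz';
if (exactlyOneA.test(input)) {
  console.log(input.indexOf('a'));       // 3
  console.log(input.replace('a', '#'));  // "foo#xyz"
}

console.log(exactlyOneA.test('fooaaxyz')); // false
console.log(exactlyOneA.test('aba'));      // false: two separate "a"s in the string
```

This validates the string as a whole, so a string such as aba is rejected even though each a stands alone.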

JavaScript Regex start of string clarification + str.replace()

got a question about the start of string regex anchor tag ^.
I was trying to sanitize a string to check if it's a palindrome and found a solution to use regex but couldn't wrap my head around the explanations I found for the start of string anchor tag:
To my understanding:
^ denotes that whatever expression that follows must match, starting from the beginning of the string.
Question:
Why then is there a difference between the two output below:
1)
let x = 'A man, a plan, a canal: Panama';
const re = new RegExp(/[^a-z]/, 'gi');
console.log(x.replace(re, '*'));
Output: A*man**a*plan**a*canal**Panama
VS.
2)
let x = 'A man, a plan, a canal: Panama';
const re = new RegExp(/[a-z]/, 'gi');
console.log(x.replace(re, '*'));
Output: * ***, * ****, * *****: ******
VS.
3)
let x = 'A man, a plan, a canal: Panama';
const re = new RegExp(/^[a-z]/, 'gi');
console.log(x.replace(re, '*'));
Output: * man, a plan, a canal: Panama
Please let me know if my explanation for each of the case above is off:
1) Confused about this one. If it matches a character class of [a-z] case insensitive + global find, with start of string anchor ^ denoting that it must match at the start of each string, should it not return all the words in the sentence? Since each word is a match of [a-z] insensitive characters that occurs at the start of each string per global find iteration?
(i.e.
finds "A" at the start
then on the next iteration, it should start search on the remaining string " man"
finds a space...and moves on to search "man"?
and so on and so forth...
Q: Why, then, does replace only target the non-alpha stuff when I call it? Should I in this case be treating ^ as inverting [a-z]?
2) This seems pretty straightforward: it finds all occurrences of [a-z] and replaces them with the star. Inverse case of 1)??
3) Also confused about this one. I'm not sure how this is different from 1).
/^[a-z]/gi to me means: "starting at the start of the string being looked at, match all alpha characters, case insensitive. Repeat for global find".
Compared to:
1) /[^a-z]/gi to me means: "match all character class that starts each line with alpha character. case insensitive, repeat search for global find."
To me they mean exactly the same #_#. Please let me know how my understanding is off for the above cases.
Your first expression [^a-z] matches anything other than an alphabetic, lower case letter, which is why replacing with * affects all the special characters such as whitespace, commas and colons.
Your second expression [a-z] matches any alphabetic, lower case letter, so the special characters mentioned are not replaced by *.
Your third expression ^[a-z] matches an alphabetic, lower case letter at the start of the string, so only the first letter is replaced by *.
For the first two expressions, the global flag g ensures that all characters that match the specified pattern, regardless of their position in the string, are replaced. For the third pattern however, since ^ anchors the pattern at the beginning of the string, only the first letter is replaced.
As you mentioned, the i flag ensures case insensitivity, so that all three patterns operate on both lower and upper case alphabetic letters, from a to z and A to Z.
The character ^ therefore has two meanings:
It negates characters in a character set.
It asserts position at the start of string.
^ denotes that whatever expression that follows must match, starting from the beginning of the string.
That's only when it's the first thing in the regex; it has other purposes when used elsewhere:
/[^a-z]/gi
In the above regex, the ^ does not indicate anchoring the match to the beginning of a string; it inverts the rest of the contents of the [] -- so the above regex will match any single character except a-z. Since you're using the g flag it will repeat that match for all characters in the string.
/[a-z]/gi
The above is not inverted, so will match a single instance of any character from a-z (and again because of the g flag will repeat to match all of those instances.)
/^[a-z]/gi
In this last example, the caret anchors the match to the beginning of the string; the bracketed portion will match any single a-z character. The g flag is still in use, so the regex would try to continue matching more characters later in the string -- but none of them except the first one will meet the anchored-to-start requirement, so this will end up matching only the first character (if it's within a-z), exactly as if the g flag were not in use.
(Inside a [] group, a ^ anywhere other than the first position is treated as a literal ^. Outside a [] group, an unescaped ^ is always the start anchor, wherever it appears in the regex.)
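The following sketch counts the matches for each of the three patterns on the sentence from the question, and adds one extra case showing a literal ^ inside a [] group. It is only an illustration of the points above.

```javascript
const x = 'A man, a plan, a canal: Panama';

// Negated class: matches the 9 characters that are not letters (spaces, commas, colon).
console.log(x.match(/[^a-z]/gi).length); // 9

// Plain class: matches the 21 letters.
console.log(x.match(/[a-z]/gi).length);  // 21

// Anchored: only the letter at the very start can match, with or without the g flag.
console.log(x.match(/^[a-z]/gi));        // [ 'A' ]
console.log(x.match(/^[a-z]/i)[0]);      // "A"

// Inside a class but not in first position, ^ is a literal caret.
console.log('a^b'.replace(/[b^]/g, '*')); // "a**"
```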
If you're trying to detect palindromes, you'll want to remove everything except letter characters (and will probably want to convert everything to the same letter case, instead of having to detect that "P" == "p":)
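const isPalindrome = function(input) {
  // Lowercase first, then remove everything that is not a letter a-z
  let str = input.toLowerCase().replace(/[^a-z]/g,'');
  // A palindrome reads the same forwards and backwards
  return str === str.split('').reverse().join('')
}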
console.log(isPalindrome("Able was I, ere I saw Elba!"))
console.log(isPalindrome("No, it never propagates if I set a ”gap“ or prevention."))
console.log(isPalindrome("Are we not pure? “No, sir!” Panama’s moody Noriega brags. “It is garbage!” Irony dooms a man –– a prisoner up to new era."))
console.log(isPalindrome("Taco dog is not a palindrome."))

JS Regex: Remove anything (ONLY) after a word

I want to remove all of the symbols (The symbol depends on what I select at the time) after each word, without knowing what the word could be. But leave them in before each word.
A couple of examples:
!!hello! my! !!name!!! is !!bob!! should return...
!!hello my !!name is !!bob ; for !
and
$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$ should return...
$remove the targetted# $$symbol# only $after a $word ; for $
You need to use capture groups and replace:
"!!hello! my! !!name!!! is !!bob!!".replace(/([a-zA-Z]+)(!+)/g, '$1');
Which works for your test string. To work for any generic character or group of characters:
var stripTrailing = trail => {
let regex = new RegExp(`([a-zA-Z0-9]+)(${trail}+)`, 'g');
return str => str.replace(regex, '$1');
};
Note that this fails on any characters that have meaning in a regular expression: []{}+*^$. etc. Escaping those programmatically is left as an exercise for the reader.
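For reference, one possible way to do that escaping is sketched below. The helper name escapeRegExp is hypothetical, and wrapping the symbol in a non-capturing group (so that + repeats a multi-character symbol as a whole) is an added assumption, not part of the original function.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in a string.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const stripTrailing = trail => {
  // (?:...) groups the escaped symbol so + repeats all of it, not just its last character.
  const regex = new RegExp(`([a-zA-Z0-9]+)(?:${escapeRegExp(trail)})+`, 'g');
  return str => str.replace(regex, '$1');
};

console.log(stripTrailing('!')('!!hello! my! !!name!!! is !!bob!!'));
// "!!hello my !!name is !!bob"
console.log(stripTrailing('$')('$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$'));
// "$remove the targetted# $$symbol# only $after a $word"
```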
UPDATE
Per your comment I thought an explanation might help you, so:
First, there's no way in this case to replace only part of a match, you have to replace the entire match. So we need to find a pattern that matches, split it into the part we want to keep and the part we don't, and replace the whole match with the part of it we want to keep. So let's break up my regex above into multiple lines to see what's going on:
First we want to match any number of sequential alphanumeric characters, that would be the 'word' to strip the trailing symbol from:
( // denotes capturing group for the 'word'
[ // [] means 'match any character listed inside brackets'
a-z // list of alpha character a-z
A-Z // same as above but capitalized
0-9 // list of digits 0 to 9
]+ // plus means one or more times
)
The capturing group means we want to have access to just that part of the match.
Then we have another group
(
! // I used ES6's string interpolation to insert the arg here
+ // match that exclamation (or whatever) one or more times
)
Then we add the g flag so the replace will happen for every match in the target string; without the flag it returns after the first match. JavaScript provides a convenient shorthand for accessing the capturing groups in the form of automatically interpolated symbols: the '$1' above means 'insert contents of the first capture group here in this string'.
So, in the above, if you replaced '$1' with '$1$2' you'd see the same string you started with; if you did 'foo$2' you'd see foo in place of every word trailed by one or more !, etc.
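The effect of the different replacement strings can be checked with a short sketch like this one, using a shortened version of the test string. The variable names are illustrative.

```javascript
const pattern = /([a-zA-Z0-9]+)(!+)/g;
const sample = '!!hello! my!';

// '$1' keeps only the word and drops the trailing symbols.
const stripped = sample.replace(pattern, '$1');
console.log(stripped);  // "!!hello my"

// '$1$2' puts both groups back, so the string is unchanged.
const unchanged = sample.replace(pattern, '$1$2');
console.log(unchanged); // "!!hello! my!"

// 'foo$2' swaps each matched word for foo and keeps its trailing symbols.
const swapped = sample.replace(pattern, 'foo$2');
console.log(swapped);   // "!!foo! foo!"
```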
