Match character but not when preceded by - javascript

I want to replace all line breaks but only if they're not preceded by these two characters {] (both, not one of them) using JavaScript. The following expression seems to do the job but it breaks other regex results so something must be wrong:
/[^\{\]]\n/g
What am I doing wrong?

Do you need to be able to strip out \n, \r\n, or both?
This should do the job:
/(^|^.|[^{].|.[^\]])\r?\n/gm
And would require that you place $1 at the beginning of your replacement string.
To answer your question about why /[^\{\]]\n/ is wrong: this regex means "match any single character that is neither { nor ], followed by \n", so it incorrectly fails to match both of the following:
here's a square]\n
see the following{\n
You're also missing the g flag at the end, but you may have noticed that.
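As a rough illustration of the regex above, here is a minimal sketch that replaces line breaks with a space. The sample string, the REPLACEMENT value, and the variable names are assumptions made up for the example, and the input deliberately uses plain \n line endings only.

```javascript
// Assumed sample: the first line ends with "{]", so its line break must survive.
var input = "a{]\nb\nc}\n";
var REPLACEMENT = " "; // hypothetical replacement for each line break

// "$1" puts back the characters that had to be consumed before the line break.
var output = input.replace(/(^|^.|[^{].|.[^\]])\r?\n/gm, "$1" + REPLACEMENT);

console.log(JSON.stringify(output)); // "a{]\nb c} "
```

Only the line break after {] is left in place; the other two are replaced.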

When you use [^\{\]] you're using a negated character class: it stands for "any single character that is neither { nor ]". That means the match will fail on {\n or ]\n.
If you want to negate a pattern longer than one character you need a negative look-ahead:
/^(?!.*{]\n)([^\n]*)\n/mg
^(?! # from the beginning of the line (thanks to the m flag)
.*{]\n # negative lookahead condition: the line doesn't end with {]\n
)
([^\n]*) # select, in capturing group 1, everything up to the line break
\n
And replace it with
$1 + replacement_for_\n
What we do is check line by line that our line doesn't hold the unwanted pattern.
If it doesn't, we select everything up to the ending \n in capturing group 1, and we replace the whole line with that everything, followed by what you want to replace \n with.
Demo: http://regex101.com/r/nM2xE1
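The line-by-line idea described above can be tried out with a short sketch like the following. The sample input and the choice of a space as the replacement are assumptions for the example; the braces are escaped here only to keep the pattern unambiguous.

```javascript
// Assumed sample: only the first line ends with "{]".
var input = "a{]\nb\nc\n";

// For each line that does not end in "{]", keep its text ($1) and
// swap the trailing line break for a space (hypothetical replacement).
var output = input.replace(/^(?!.*\{\]\n)([^\n]*)\n/mg, "$1 ");

console.log(JSON.stringify(output)); // "a{]\nb c "
```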

Lookbehind is not supported (in JavaScript engines that predate ES2018), but you can emulate it this way:
stringWhereToReplaceNewlines.replace(/(.{0,2})\n/g, function(_, behind) {
    return (behind || "") + ((behind === '{]') ? "\n" : "NEWLINE_REPLACE")
})
The callback is called for every "\n", with up to two preceding characters as its second parameter. It must return the string that replaces both the "\n" and those preceding characters. If the two preceding characters are "{]", the newline should not be replaced, so we return exactly the string that was matched; otherwise we return the preceding characters (possibly empty) followed by whatever should replace the newline.

My solution would be this:
([^{].|.[^\]])\n
Your replacement string should be $1<replacement>
Since JavaScript (before ES2018) doesn't support lookbehind, we have to emulate it by matching and capturing the two preceding characters. This is how the regex works:
Anything but { and then anything - [^{].
Or anything and then anything but ] - .[^\]]
Put simply, [^{].|.[^\]] matches everything that .. matches except for {]
Finally a \n
The two chars before the \n are captured, so you can reinsert them into the replacement string using $1.
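For completeness: engines that implement ES2018 do support lookbehind assertions, so in those environments the condition can be written directly. This is a minimal sketch that assumes such an engine; the sample string and the space used as the replacement are made up for the example.

```javascript
// Assumed sample: the first line break follows "{]" and must be kept.
var input = "a{]\nb\nc";

// (?<!\{\]) is a negative lookbehind: match \n only when it is
// not immediately preceded by the two characters "{]".
var output = input.replace(/(?<!\{\])\n/g, " ");

console.log(JSON.stringify(output)); // "a{]\nb c"
```

Because the lookbehind consumes nothing, no $1 is needed in the replacement string.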

Related

regular expression, not reading entire string

I have a regular expression that is not working correctly.
This expression is supposed to catch if a string has invalid characters anywhere in the string. It works perfectly on RegExr.com but not in my tests.
The exp is: /[a-zA-Z0-9'.\-]/g
It is failing on : ####
but passing with : aa####
It should fail both times, what am I doing wrong?
Also, /^[a-zA-Z0-9'.\-]$/g matches nothing...
// All boxes
$('input[type="text"]').each(function () {
    var text = $(this).prop("value")
    var textTest = /[a-zA-Z0-9'.\-]/g.test(text)
    if (!textTest && text != "") {
        allFieldsValid = false
        $(this).css("background-color", "rgba(224, 0, 0, 0.29)")
        alert("Invalid characters found in " + text + " \n\n Valid characters are:\n A-Z a-z 0-9 ' . -")
    }
    else {
        $(this).css("background-color", "#FFFFFF")
        $(this).prop("value", text)
    }
});
Edit: added code
UPDATE AFTER QUESTION RE-TAGGING
You need to use
var textTest = /^[a-zA-Z0-9'.-]+$/.test(text)
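Note the absence of the /g modifier and the added + quantifier. With the /g (global) modifier, RegExp#test() updates the regex's lastIndex property after each successful match, so calling test() repeatedly on the same regex object can start searching mid-string and return inconsistent results.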
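To make the /g issue concrete, here is a small sketch of how a global regex remembers its position between test() calls. The regex and the input string are arbitrary choices for the demonstration.

```javascript
var re = /[a-z]/g;

var first = re.test("a");             // true: matched at index 0
var indexAfterFirst = re.lastIndex;   // 1: the next search starts here

var second = re.test("a");            // false: nothing left from index 1 on
var indexAfterSecond = re.lastIndex;  // 0: reset after the failed match

console.log(first, indexAfterFirst, second, indexAfterSecond); // true 1 false 0
```

Dropping the g flag avoids this, because a non-global regex always searches from the start of the string.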
You may shorten it a bit with the help of the /i case insensitive modifier:
var textTest = /^[A-Z0-9'.-]+$/i.test(text)
Also, as I mention below, you do not have to escape the - at the end of the character class [...], but it is advisable to keep escaped if the pattern will be modified later by less regex-savvy developers.
ORIGINAL C#-RELATED DETAILS
Ok, say, you are using Regex.IsMatch(str, @"[a-zA-Z0-9'.-]"). The Regex.IsMatch searches for partial matches inside a string. So, if the input string contains an ASCII letter, digit, ', . or -, this will pass. Thus, it is logical that aa#### passes this test, and #### does not.
If you use the second one as Regex.IsMatch(str, @"^[a-zA-Z0-9'.-]$"), only 1 character strings (with an optional newline at the end) would get matched as ^ matches at the start of the string, [a-zA-Z0-9'.-] matches 1 character from the specified ranges/sets, and $ matches the end of the string (or right before the final newline).
So, you need a quantifier (+ to match 1 or more, or * to match zero or more occurrences) and the anchors \A and \z:
Regex.IsMatch(str, @"\A[a-zA-Z0-9'.-]+\z")
\A matches the start of string (always) and \z matches the very end of the string in .NET. The [a-zA-Z0-9'.-]+ will match 1+ characters that are either ASCII letters, digits, ', . or -.
Note that - at the end of the character class does not have to be escaped (but you may keep the \- if some other developers will have to modify the pattern later).
And please be careful where you test your regexps. Regexr only supports JavaScript regex syntax. To test .NET regexps, use RegexStorm.net or RegexHero.
/^[a-zA-Z0-9'.-]+$/g
In the second case, your regex (/[a-zA-Z0-9'.-]/g) passed because it matched the first letter. To make it correct, you need to match the whole string (use ^ and $) and allow more than one character by adding + (or * if you want to allow an empty string).
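A minimal sketch of the anchored, whole-string check, run against the two inputs from the question. The function name isValidText and the third sample value are assumptions for the example; the g flag is left out on purpose.

```javascript
// Hypothetical helper: true only if every character is in the allowed set.
function isValidText(text) {
    return /^[a-zA-Z0-9'.-]+$/.test(text);
}

console.log(isValidText("####"));          // false
console.log(isValidText("aa####"));        // false
console.log(isValidText("O'Brien-Smith")); // true
```

With + an empty string is rejected as well, so a separate check for empty input (as in the question's code) is still needed if empty fields are allowed.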
Try this regex; it matches any character that isn't part of the allowed charset:
/[^a-zA-Z0-9'.\-]+/g
Test
>>regex = /[^a-zA-Z0-9'.\-]+/g
/[^a-zA-Z0-9'.\-]+/g
>>regex.test( "####dsfdfjsakldfj")
true
>>regex.test( "dsfdfjsakldfj")
false

Match last character, get next to last if regex is null

I'm trying to get the last character of a string, but only if it matches the following RegEx:
/\W/
If it doesn't match, I want it to move to the next last character and do the test again until it finds a match.
function getLastChar(s) {
var l = s.length - 1;
return s[l - i]; // need logic to keep checking for /\W/
}
getLastChar('hello.'); // returns '.', want it to return 'o'
I have the following idea of how to match if the character isn't a letter/number; however, I'm searching for a more elegant solution, one that would allow me to return the last matching character on a single line with a ternary if()
if(string.match(/\W/) !== null){
//keep looking for a match, going backwards.
}
/(\w)\W*$/
Capture one \w character, that is followed by zero or more \W characters, anchored to the end of the subject.
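One way to plug this regex into the getLastChar function from the question is sketched below. Returning null when the string contains no word character at all is an assumption; the question does not say what should happen in that case.

```javascript
function getLastChar(s) {
    // Group 1 is the last \w character; \W*$ skips any trailing non-word characters.
    var m = /(\w)\W*$/.exec(s);
    return m ? m[1] : null; // null is an assumed fallback for "no word character"
}

console.log(getLastChar('hello.'));   // 'o'
console.log(getLastChar('hello'));    // 'o'
console.log(getLastChar('abc!?'));    // 'c'
console.log(getLastChar('...'));      // null
```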
[Edited after comments.]
Easy enough.. just do a greedy match up to the last \W
string.match(/.*(\W)/)
If you're looking for a simple answer, you might be able to accomplish it with a single regex, no looping required - something like the following:
^.*(\W)[^\W]*$
The capture group will have the last non-word character.
For example, running this regex on ~~~~99*9 puts the character * in the capture group.
Edit:
However, after re-reading your question, it seems like you really meant to use \w not \W - in other words, you want the last word character, not the last non-word character. That's easily fixed by swapping \W for \w in the regex above.

With a JS Regex matching exact word but not hyphenated words starting with said word

I could not find a match to this question.
I have a string like so
var s="one two one-two one-three one one_four"
and my function is as follows
function replaceMatches( str, word )
{
    var pattern=new RegExp( '\\b('+word+')\\b','g' )
    return str.replace( pattern, '' )
}
the problem is if I run the function like
var problem=replaceMatches( s,'one' )
it returns
" two -two -three  one_four"
The function replaces every "one" like it should, but it treats words with a hyphen as two words, replacing the "one" before the hyphen.
My question is not about the function but about the regex. What literal regex will match only the words "one" in my string and not "one-two" or "one-\w"<--you know what I mean lol
basically
var pat=/\b(one)\b/g
"one one-two one".replace( pat, '')
I want the above ^ to return
" one-two "
only replace the exact match "one" and not the one in "one-two"
the "one" on the end is important too; the regex must work if the match is at the very end
Thank you, sorry if my question is relatively confusing. I am just trying to get my learn on, and expand my personal library.
What do you consider to be a word?
A word is a sequence of 1 or more word characters, and word boundary \b is defined based upon the definition of word character (and non-word character).
Word character as defined by \w in JavaScript RegExp is shorthand for character class [a-zA-Z0-9_].
What is your definition of a "word"? Let's say your definition is [a-zA-Z0-9_-].
Emulating word boundary
This post describes how to emulate a word boundary in languages that support look-behind and look-ahead. Too bad, JS doesn't support look-behind.
Let us assume the word to be replaced is one for simplicity.
We can limit the replacement with the following code:
inputString.replace(/([^a-zA-Z0-9_-]|^)one(?![a-zA-Z0-9_-])/g, "$1")
Note: I use the expanded form [a-zA-Z0-9_-] instead of [\w-] to avoid association with \w.
Break down the regex:
(
[^a-zA-Z0-9_-] # Negated character class of "word" character
| # OR
^ # Beginning of string
)
one # Keyword
(?! # Negative look-ahead
[a-zA-Z0-9_-] # Word character
)
I emulate the negative look-behind (which is (?<![a-zA-Z0-9_-]) if supported) by matching a character from negated character class of "word" character and ^ beginning of string. This is natural, since if we can't find a "word" character, then it must be either a non-"word" character or beginning of the string. Everything is wrapped in a capturing group so that it can be replaced back later.
Since one is only replaced if there is no "word" character before or after it, there is no risk of missing a match.
Putting together
Since you are removing "word"s, you must make sure your keyword contains only "word" characters.
function replaceMatches(str, keyword)
{
    // The keyword must not contain non-"word" characters
    if (!/^[a-zA-Z0-9_-]+$/.test(keyword)) {
        throw "not a word";
    }
    // Customize [a-zA-Z0-9_-] and [^a-zA-Z0-9_-] with your definition of
    // "word" character
    var pattern = new RegExp('([^a-zA-Z0-9_-]|^)' + keyword + '(?![a-zA-Z0-9_-])', 'g')
    return str.replace(pattern, '$1')
}
You need to escape meta-characters in the keyword if your definition of "word" character includes regex meta-characters.
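As an illustration of that last point, here is a sketch in which the "word" definition also includes the dot, so the keyword has to be escaped before it is spliced into the pattern. The names escapeRegExp, WORD and replaceWord are hypothetical, and adding . to the word characters is an assumption made purely for the example.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Assumed "word" definition for this sketch: letters, digits, _, . and -
var WORD = 'a-zA-Z0-9_.-';

function replaceWord(str, keyword) {
    var pattern = new RegExp(
        '([^' + WORD + ']|^)' + escapeRegExp(keyword) + '(?![' + WORD + '])', 'g');
    return str.replace(pattern, '$1');
}

console.log(JSON.stringify(replaceWord('one one-two one', 'one'))); // " one-two "
console.log(JSON.stringify(replaceWord('a.b axb a.b-c', 'a.b')));   // " axb a.b-c"
```

Without the escaping step, the . in the keyword a.b would match any character, and axb would be removed as well.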
Use this for your RegExp:
function replaceMatches( str, word ) {
    var pattern = new RegExp('(^|[^-])\\b('+word+')\\b(?=[^-]|$)', 'g');
    return str.replace(pattern, '$1')
}
The (^|[^-]) will match either the start of the string or any character except -. The lookahead (?=[^-]|$) requires either a character other than - or the end of the string after the word, without consuming it, so that two adjacent matches (as in "one one") can share the separator between them.
I'm not a JS pattern function expert, but the function should replace all occurrences.
As for the hyphen in 'one-two': there is a word boundary (i.e. \b) between one and -, and the end of the string is a word boundary if a \w character comes right before it.
But it sounds like you may want 'one' to be preceded by a space or BOL:
([ ]|^)one\b
In that case, make the replacement capture group 1, thus stripping out only 'one'.
And I'm not sure how that function call works in JS.
Edit: after the new expected output, the regex could be:
([ ]|^)one(?=[ ]|$)

Regex get all text from # to quotation

Okay so I currently have:
/(#([\"]))/g;
I want to be able to check for a string like:
#23ad23"
What's wrong with my regex?
Your regex (/(#([\"]))/g) breaks down like this:
without start/end delimiters/flags and capturing braces..
#[\"]
which just means #, followed by ", but the square brackets for the class are unnecessary, as there is only one item, so equivalent to...
#"
I think you want to match all characters between # and " inclusive (and captured exclusively).
Start with regex like this:
#.+?"
Which means # followed by anything (.) one or more times (+) un-greedily (?) followed by "
so with the capturing brackets, and delimiters...
/(#(.+?)")/g
Is this how you mean?
/(#([^\"]+))/g;
This will include everything until it reaches the " char.
For minimum match count (bigger-length matches): #(.+)\"
For maximum match count (smaller-length matches): #(.+?)\"
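The difference between the greedy and the non-greedy form is easiest to see on an input with two candidates. The sample string below is made up for the example; the exec loop collects every non-greedy capture.

```javascript
// Assumed sample input containing two #..." sequences.
var input = 'x #23ad23" y #ff" z';

// Greedy: .+ runs to the LAST quote, so one long capture.
var greedy = /#(.+)"/.exec(input)[1];   // '23ad23" y #ff'

// Non-greedy: .+? stops at the FIRST quote after each #.
var lazy = /#(.+?)"/.exec(input)[1];    // '23ad23'

// Collect every non-greedy capture using the g flag.
var re = /#(.+?)"/g, m, all = [];
while ((m = re.exec(input)) !== null) {
    all.push(m[1]);
}
console.log(all); // [ '23ad23', 'ff' ]
```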

How can I shorten this regex for JavaScript?

Basically I just want it to match anything inside (). I tried the . and * but they don't seem to work. Right now my regex looks like:
\(([\\\[\]\-\d\w\s/*\.])+\)
The strings it's going to match are URL routes like:
#!/foo/bar/([a-z])/([\d\w])/(*)
In this example, my regex above matches:
([a-z])
([\d\w])
(*)
BONUS:
How can I make it so that it only matches when it starts with a ( and ends with a )? I thought I could use the ^ at the front where it's \( and the $ at the end where it's \), but no luck.
Disregard this bonus. I didn't realize it didn't matter...
Are you worried about nested parentheses? If not, you could set it up to match all characters that aren't a closing paren:
\(([^)]*)\)
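Applied to the route from the question, that pattern can be sketched as follows. The variable names are assumptions; note that the backslashes in the route have to be doubled inside a JavaScript string literal.

```javascript
// The route from the question, written as a JS string literal.
var route = '#!/foo/bar/([a-z])/([\\d\\w])/(*)';

// With the g flag, match() returns every parenthesized chunk.
var groups = route.match(/\(([^)]*)\)/g);

// Strip the surrounding parentheses to get just the inner text.
var inner = groups.map(function (g) { return g.slice(1, -1); });

console.log(groups); // [ '([a-z])', '([\\d\\w])', '(*)' ]
console.log(inner);  // [ '[a-z]', '[\\d\\w]', '*' ]
```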
Basically I just want it to match anything inside ().
BONUS: How can I make it so that it only matches when it starts with a ( and ends with a )?
Easy peasy.
var re1 = /^\(.*\)$/
// or
var re2 = new RegExp('^\\(.*\\)$');
Edit
Re: @Mike Samuel's comments
Does not match newlines between the parentheses which were explicitly matched by \s in the original.
...
Maybe you should use [\s\S] instead of .
...
If you're going to exclude newlines you should do so intentionally or explicitly.
Note that . matches any single character except the newline character. If you also want to match newlines as part of the "anything" between parentheses, use the [\s\S] character class:
var re3 = /^\([\s\S]*\)$/
// or
var re4 = new RegExp('^\\([\\s\\S]*\\)$');
To negate a match, you use the [^...] construct. Thus, to match anything within parentheses, you would use:
\([^)]+\)
which says "match any string that starts with an open parenthesis, contains one or more characters that are not closing parentheses, and ends with a closing parenthesis."
To match entire lines that match the above construct, just wrap it with ^ and $:
^\([^)]+\)$
I'm not completely sure I understand what you're doing, but try this:
var re = /\/(\([^()]+\))(?=\/|$)/;
Matching the leading slash in addition to the opening paren ensures that the paren is indeed at the beginning. You can't do the same thing at the end because you don't know there will be a trailing slash. And if there is one, you don't want to consume it because it's also the leading slash for the next match attempt.
Instead, you use the lookahead - (?=\/|$) - to match the trailing slash without consuming it. If there is no slash, I assume no other character should be present either--hence the anchor: $.
@patorjk brought up a good point, though: can there be more parentheses between the outermost pair? If there are, the problem is much more complicated. I won't bother trying to expand my regex to deal with nested parens; some regex flavors can handle such things, but not JavaScript. Instead I'll recommend this sloppier regex:
\/(\([\s\S]+?\))(?=\/|$)
I say "sloppy" because it relies on the assumption that the sequences /( and )/ will never appear inside a valid match. As with my first regex, the text that you're interested in (i.e., everything but the leading and trailing slashes) will be captured in group #1.
Notice the non-greedy quantifier, too. With a regular greedy quantifier it will match everything from the first ( to the last ) in one shot. In other words, it'll match ([a-z])/([\d\w])/(*) instead of ([a-z]), ([\d\w]) and (*) as you wanted.
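A short sketch of the non-greedy regex at work on the route from the question, next to its greedy counterpart. The g flag and the exec loop are additions for the demonstration, and the variable names are assumptions.

```javascript
var route = '#!/foo/bar/([a-z])/([\\d\\w])/(*)';

// Non-greedy: each /(...) segment is captured separately in group 1.
var re = /\/(\([\s\S]+?\))(?=\/|$)/g, m, found = [];
while ((m = re.exec(route)) !== null) {
    found.push(m[1]);
}
console.log(found); // [ '([a-z])', '([\\d\\w])', '(*)' ]

// Greedy: everything from the first ( to the last ) in one shot.
var greedy = /\/(\([\s\S]+\))(?=\/|$)/.exec(route)[1];
console.log(greedy); // ([a-z])/([\d\w])/(*)
```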
