Regex match lines that are not commented - javascript

So I have a string read from a JavaScript file, that will have:
...
require('some/path/to/file.less');
...
// require('some/path/to/file.less');
...
I'm using this reg-ex:
requireRegExp = /require(\ +)?\((\ +)?['"](.+)?['"](\ +)?\)\;?/g
To catch all those lines. However, I need to filter out the ones that are commented.
So when I run
while( match = requireRegExp.exec(str) ){
...
}
I will only get a match for the uncommented line that starts with require...

requireRegExp = /^\s*require\('([^']+)'\);/gm
Explanation:
^ assert position at start of a line
\s* checks for a whitespace character 0 ... many times
require matches the word require
\( matches the character (
' matches the character '
([^']+) matches anything that isn't a ' 1 ... many times
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed
' matches the character ' literally
\) matches the character ) literally
; matches the character ; literally
g modifier: global. All matches (don't return on first match)
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
EDIT
Apparently you wanted to get the path in a group so I edited my answer to better respond to your question.
Here is the example:
https://regex101.com/r/kQ0lY8/3
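A minimal sketch of how the pattern above could be used to collect the paths from uncommented lines. The sample `source` string and the `paths` array are assumptions made for illustration; they are not part of the original question.

```javascript
// Hypothetical input: the contents of a JS file read into a string.
const source = [
  "require('some/path/to/file.less');",
  "// require('some/path/to/commented.less');",
  "  require('another/file.less');",
].join("\n");

// ^ with the m flag anchors at each line start, so a line that
// begins with // can never satisfy ^\s*require.
const requireRegExp = /^\s*require\('([^']+)'\);/gm;

const paths = [];
let match;
while ((match = requireRegExp.exec(source)) !== null) {
  paths.push(match[1]); // group 1 holds the path between the quotes
}

console.log(paths); // expected: the two uncommented paths
```

Note that this pattern only accepts single quotes and no spaces around the parentheses, unlike the looser pattern in the question.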

If you want to match only those with require use the following:
/^\s*require.*/gm
See DEMO

Related

Match everything Between two Characters except when there is a Blank line

I am trying to find a regex pattern that matches everything between one or two dollar signs, \$.*\$|\${2}.*\${2}, except when there is a blank line (it's either two or one, can't be this: \$.*\$\$). Below, I provide examples of what I want to match and what I want to skip. The match should include/exclude everything.
Examples of what I want to match:
$$ \abc + ko$$
$*-ls$
Here the single dollar sign has an escape character before it so it won't break the match.
$$
654a\$
$$
$123
a*/\
[]{}$
Examples of what I want to exclude:
$$
asd
$$
$asdasd$$
Again, I want to match everything if they are bound by one $ or two $ at each side, unless there is (are) empty line(s) in between.
So far I figured out how to match the ones occurring in a single line, but I am struggling with how to include line breaks and exclude matches that contain an entirely empty line.
Here is what I have:
^\${2}.*[^\\$]\${2}$|^\$.*[^\\$]\$$
Demo
You may use
/^[^\S\r\n]{0,3}(\${1,2})(?:(?!\1|^$)[\s\S])+?\1[^\S\r\n]*$/gm
See the regex demo
Details
^ - start of a line (since m makes ^ match line start positions)
[^\S\r\n]{0,3} - zero to three occurrences of any whitespace but CR and LF
(\${1,2}) - Group 1 holding one or two $ chars
(?:(?!\1|^$)[\s\S])+? - any char ([\s\S]), 1 or more occurrences, but as few as possible (due to the lazy +? quantifier), where no char starts the same sequence as captured in Group 1 (\1) or sits at an empty line, i.e. a position that is both a line start and a line end (^$)
\1 - the same value as in Group 1 ($ or $$)
[^\S\r\n]* - zero or more occurrences of any whitespace but CR and LF
$ - end of a line (since m makes $ match line end positions)
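As a rough check of the breakdown above, the pattern can be run against a few inputs modeled on the question's examples. The `samples` array and the variable names are assumptions for illustration only.

```javascript
// Pattern copied from the answer above.
const blockRegExp = /^[^\S\r\n]{0,3}(\${1,2})(?:(?!\1|^$)[\s\S])+?\1[^\S\r\n]*$/gm;

// Hypothetical samples modeled on the question's examples.
const samples = [
  "$*-ls$",                            // single line, one $ on each side
  ["$$", "abc", "$$"].join("\n"),      // multi-line block, no empty line
  ["$$", "asd", "", "$$"].join("\n"),  // block containing an empty line
  "$asdasd$$",                         // mismatched delimiters
];

// String#match with a /g regex returns null when nothing matches.
const results = samples.map((sample) => sample.match(blockRegExp) !== null);

console.log(results); // expected: [true, true, false, false]
```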
For your example data, you might use
(?<!\S)(\$\$?+)[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)*(?:\r?\n(?![^\S\r\n]*$)[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)*)*\1(?!\S)
Explanation
(?<!\S) Assert a whitespace boundary on the left
(\$\$?+) Capture group 1, match $ or $$ where the second one is possessive (prevent backtracking)
[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)* Match any char except $ or newline or a $ when not directly followed by another $
(?: Non capture group
\r?\n(?![^\S\r\n]*$) Match a newline, assert not a line consisting of only spaces
[^\r\n$]*(?:\$(?!\$)[^\r\n$]*)* Same pattern as above
)* Close the group and repeat 0+ times
\1 Backreference to what is captured in group 1
(?!\S) Assert a whitespace boundary on the right
Regex demo

Replace all spaces except the first & last ones

I want to replace all whitespace characters except the first and the last one, as in this image.
How to replace only the red ones?
I tried :
.replace(/\s/g, "_");
but it captures all spaces.
I suggest matching and capturing the initial/trailing whitespace that will be kept, and then matching any other whitespace, which will be replaced with _:
var s = " One Two There ";
console.log(
s.replace(/(^\s+|\s+$)|\s/g, function($0,$1) {
return $1 ? $1 : '_';
})
);
Here,
(^\s+|\s+$) - Group 1: either one or more whitespaces at the start or end of the string
| - or
\s - any other whitespace.
The $0 in the callback method represents the whole match, and $1 is the argument holding the contents of Group 1. Once $1 matches, we return its contents, else, replace with _.
You can use ^ to check for the first character and $ for the last. In other words, search for a space that is neither at the start of the line nor immediately followed by the end of the line:
var rgx = /(?!^)(\s)(?!$)/g;
// (?!^) => not start of line
// (?!$) => not end of line
console.log(' One Two Three '.replace(rgx, "_"));
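The two answers above behave differently once the string has more than one leading or trailing space. The sketch below compares them on a hypothetical input padded with two spaces on each side; the variable names are made up for illustration.

```javascript
// Hypothetical input: two spaces before "One" and two after "Two".
const padded = "  One Two  ";

// First approach: the whole leading/trailing run is captured and kept.
const keepOuterRuns = padded.replace(/(^\s+|\s+$)|\s/g, (whole, outer) =>
  outer ? outer : "_"
);

// Second approach: only the very first and very last character are skipped.
const keepOuterChars = padded.replace(/(?!^)(\s)(?!$)/g, "_");

console.log(JSON.stringify(keepOuterRuns));  // "  One_Two  "
console.log(JSON.stringify(keepOuterChars)); // " _One_Two_ "
```

Which one is wanted depends on whether "first and last" means the outer runs of whitespace or literally one character at each end.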

Why does string.replace(/\W*/g,'_') prepend all characters?

I've been learning regexp in js and encountered a situation that I didn't understand.
I ran a test of the replace function with the following regexp:
/\W*/g
And expected it to prepend the beginning of the string and proceed to replace all non-word characters.
The Number is (123)(234)
would become:
_The_Number_is__123___234_
This would be prepending the string because it has at least zero instances, and then replacing all non-breaking spaces and non-word characters.
Instead, it prepended every character and replaced all non-word characters.
_T_h_e__N_u_m_b_e_r__i_s__1_2_3__2_3_4__
Why did it do this?
The problem is the meaning of \W*. It means "0 or more non-word characters". This means that the empty string "" would match, given that it is indeed 0 non-word characters.
So the regex matches before every character in the string and at the end, hence why all the replacements are done.
You want either /\W/g (replacing each individual non-word character) or /\W+/g (replacing each set of consecutive non-word characters).
"The Number is (123)(234)".replace(/\W/g, '_') // "The_Number_is__123__234_"
"The Number is (123)(234)".replace(/\W+/g, '_') // "The_Number_is_123_234_"
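To make the empty matches visible, one option is to list every match of /\W*/g with its index. This is only an illustrative sketch; the short `input` string is an assumption chosen to keep the output readable.

```javascript
const input = "ab c"; // hypothetical short input

// Collect [index, matchedText] for every match of \W* in the string.
const matches = [...input.matchAll(/\W*/g)].map((m) => [m.index, m[0]]);

console.log(matches);
// expected: [[0, ""], [1, ""], [2, " "], [3, ""], [4, ""]]

// Each of those five matches is replaced, including the four empty ones.
const replaced = input.replace(/\W*/g, "_");
console.log(replaced); // "_a_b__c_"
```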
TL;DR
Never use a pattern that can match an empty string in a regex replace method if your aim is to replace and not insert text
To replace all separate occurrences of a non-word char in a string, use .replace(/\W/g, '_') (that is, remove * quantifier that matches zero or more occurrences of the quantified subpattern)
To replace all chunks of non-word chars in a string with a single pattern, use .replace(/\W+/g, '_') (that is, replace * quantifier with + that matches one or more occurrences of the quantified subpattern)
Note: the solution below is tailored for the OP's much more specific requirements.
A string is parsed by the JS regex engine as a sequence of chars and locations in between them. See the following diagram where I marked locations with hyphens:
-T-h-e- -N-u-m-b-e-r- -i-s- -(-1-2-3-)-(-2-3-4-)-
|||                                             |
||Location between T and h, etc. .............. |
|1st symbol                                     |
start                                      -> end
All these positions can be analyzed and matched with a regex.
Since /\W*/g is a regex matching all non-overlapping occurrences (due to g modifier) of 0 and more (due to * quantifier) non-word chars, all the positions before word chars are matched. Between T and h, there is a location tested with the regex, and as there is no non-word char (h is a word char), the empty match is returned (as \W* can match an empty string).
So, you need to replace the start of string and each non-word char with a _. A naive approach is to use .replace(/\W|^/g, '_'). However, there is a caveat: if a string starts with a non-word character, no extra _ will get prepended at the start of the string:
console.log("Hi there.".replace(/\W|^/g, '_')); // _Hi_there_
console.log(" Hi there.".replace(/\W|^/g, '_')); // _Hi_there_
Note that here, \W comes first in the alternation and "wins" when matching at the beginning of the string: the space is matched and then no start position is found at the next match iteration.
You may now think you can match with /^|\W/g. Look here:
console.log("Hi there.".replace(/^|\W/g, '_')); // _Hi_there_
console.log(" Hi there.".replace(/^|\W/g, '_')); // _ Hi_there_
The _ Hi_there_ second result shows how JS regex engine handles zero-width matches during a replace operation: once a zero-width match (here, it is the position at the start of the string) is found, the replacement occurs, and the RegExp.lastIndex property is incremented, thus proceeding to the position after the first character! That is why the first space is preserved, and no longer matched with \W.
A solution is to use a consuming pattern that will not allow zero-width matches:
console.log("Hi there.".replace(/^(\W?)|\W/g, function($0,$1) { return $1 ? "__" : "_"; }));
console.log(" Hi there.".replace(/^(\W?)|\W/g, function($0,$1) { return $1 ? "__" : "_"; }));
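For reference, here is the same replacement wrapped in a function, with the results it should produce. The helper name `underscoreNonWords` is hypothetical; the pattern and callback logic are the ones shown above.

```javascript
// Hypothetical helper; pattern and callback logic come from the answer above.
function underscoreNonWords(text) {
  // Group 1 is non-empty only when the string starts with a non-word char,
  // in which case both the start marker and that char need a "_".
  return text.replace(/^(\W?)|\W/g, (whole, leading) => (leading ? "__" : "_"));
}

console.log(underscoreNonWords("Hi there."));                // _Hi_there_
console.log(underscoreNonWords(" Hi there."));               // __Hi_there_
console.log(underscoreNonWords("The Number is (123)(234)")); // _The_Number_is__123__234_
```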
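You can use the RegExp /(^\W*){1}|\W(?!=\w)/g to match the start of the string together with any leading \W characters, or any later \W. Note that (?!=\w) as written is a negative lookahead for a literal = followed by a word character rather than a "not followed by \w" check, so on this input it never prevents a match: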
var str = "The Number is (123)(234)";
var res = str.replace(/(^\W*){1}|\W(?!=\w)/g, "_");
console.log(res);
You should have used /\W+/g instead.
"*" means zero or more occurrences of the preceding token, so \W* on its own can also match an empty string.
It's because you're using the * operator. That matches zero or more characters, so the empty position between every pair of characters matches as well. If you replace the expression with /\W+/g it works as you expected.
This should work for you
Find: (?=.)(?:^\W|\W$|\W|^|(.)$)
Replace: $1_
Cases explained:
(?= . ) # Must be at least 1 char
(?: # Ordered Cases:
^ \W # BOS + non-word (consumes bos)
| \W $ # Non-word + EOS (consumes eos)
| \W # Non-word
| ^ # BOS
| ( . ) # (1), Any char + EOS
$
)
Note this could have been done without the lookahead via
(?:^\W|\W$|\W|^$)
But, this will insert a single _ on an empty string.
So, it ends up being more elaborate.
All in all though, it's a simple replacement.
Unlike Stribnez's solution, no callback logic is required
on the replace side.
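A sketch of the find/replace pair above expressed as a JavaScript call, assuming it is applied with the g flag. In JavaScript a `$1` that refers to a group that did not take part in the match is substituted as an empty string, which is what lets the single replacement string cover all the cases. The names `edgeRegExp` and `underscore` are hypothetical.

```javascript
// Find pattern from the answer above, with the g flag added as an assumption.
const edgeRegExp = /(?=.)(?:^\W|\W$|\W|^|(.)$)/g;

// Replace string "$1_": group 1 is only set by the "(.)$" case.
const underscore = (text) => text.replace(edgeRegExp, "$1_");

console.log(underscore("Hi there.")); // _Hi_there_
console.log(underscore("Hi"));        // _Hi_
console.log(JSON.stringify(underscore(""))); // "" (the lookahead blocks the empty string)
```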

regular expression, not reading entire string

I have a regular expression that is not working correctly.
This expression is supposed to catch if a string has invalid characters anywhere in the string. It works perfectly on RegExr.com but not in my tests.
The exp is: /[a-zA-Z0-9'.\-]/g
It is failing on : ####
but passing with : aa####
It should fail both times, what am I doing wrong?
Also, /^[a-zA-Z0-9'.\-]$/g matches nothing...
// All boxes
$('input[type="text"]').each(function () {
    var text = $(this).prop("value")
    var textTest = /[a-zA-Z0-9'.\-]/g.test(text)
    if (!textTest && text != "") {
        allFieldsValid = false
        $(this).css("background-color", "rgba(224, 0, 0, 0.29)")
        alert("Invalid characters found in " + text + " \n\n Valid characters are:\n A-Z a-z 0-9 ' . -")
    }
    else {
        $(this).css("background-color", "#FFFFFF")
        $(this).prop("value", text)
    }
});
edit:added code
UPDATE AFTER QUESTION RE-TAGGING
You need to use
var textTest = /^[a-zA-Z0-9'.-]+$/.test(text)
^^
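Note the absence of the /g modifier and the added + quantifier. There is a known issue when a regex with the /g global modifier is used with RegExp#test(): the regex object remembers its lastIndex between calls, so repeated calls on the same object may start searching in the middle of the next string.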
You may shorten it a bit with the help of the /i case insensitive modifier:
var textTest = /^[A-Z0-9'.-]+$/i.test(text)
Also, as I mention below, you do not have to escape the - at the end of the character class [...], but it is advisable to keep escaped if the pattern will be modified later by less regex-savvy developers.
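The lastIndex behaviour mentioned above can be seen with a small experiment. This is an illustrative sketch; the variable names and sample strings are assumptions.

```javascript
// A /g regex stored in a variable keeps its lastIndex between test() calls.
const globalRegExp = /[a-z]/g;
const calls = [
  globalRegExp.test("a"), // matches at index 0, lastIndex becomes 1
  globalRegExp.test("a"), // search starts at index 1, fails, lastIndex resets to 0
  globalRegExp.test("a"), // matches again
];
console.log(calls); // expected: [true, false, true]

// Anchored, quantified, non-global pattern from the answer above.
const validText = /^[a-zA-Z0-9'.-]+$/;
const checks = ["aa####", "####", "O'Neil-Smith", ""].map((s) => validText.test(s));
console.log(checks); // expected: [false, false, true, false]
```

The empty string fails because of the + quantifier, which is why the original code's separate `text != ""` check still matters.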
ORIGINAL C#-RELATED DETAILS
Ok, say, you are using Regex.IsMatch(str, @"[a-zA-Z0-9'.-]"). The Regex.IsMatch searches for partial matches inside a string. So, if the input string contains an ASCII letter, digit, ', . or -, this will pass. Thus, it is logical that aa#### passes this test, and #### does not.
If you use the second one as Regex.IsMatch(str, @"^[a-zA-Z0-9'.-]$"), only 1 character strings (with an optional newline at the end) would get matched as ^ matches at the start of the string, [a-zA-Z0-9'.-] matches 1 character from the specified ranges/sets, and $ matches the end of the string (or right before the final newline).
So, you need a quantifier (+ to match 1 or more, or * to match zero or more occurrences) and the anchors \A and \z:
Regex.IsMatch(str, @"\A[a-zA-Z0-9'.-]+\z")
                     ^^              ^^^
\A matches the start of string (always) and \z matches the very end of the string in .NET. The [a-zA-Z0-9'.-]+ will match 1+ characters that are either ASCII letters, digits, ', . or -.
Note that - at the end of the character class does not have to be escaped (but you may keep the \- if some other developers will have to modify the pattern later).
And please be careful where you test your regexps. Regexr only supports JavaScript regex syntax. To test .NET regexps, use RegexStorm.net or RegexHero.
/^[a-zA-Z0-9'.-]+$/g
In the second case your (/[a-zA-Z0-9'.-]/g) was working because it matched on the first letter, so to make it correct you need to match the whole string (use ^ and $) and also allow more letters by adding a + or * (if you allow empty string).
Try this regex; it matches any character which isn't part of the allowed charset:
/[^a-zA-Z0-9'.\-]+/g
Test
>>regex = /[^a-zA-Z0-9'.\-]+/g
/[^a-zA-Z0-9'.\-]+/g
>>regex.test( "####dsfdfjsakldfj")
true
>>regex.test( "dsfdfjsakldfj")
false
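Building on the negated character class above, a possible variation is to collect the offending characters so they can be reported to the user. The function name `findInvalidChars` is hypothetical, and the class is written without the + so that each character is listed separately.

```javascript
// Hypothetical helper: returns every character outside the allowed set.
function findInvalidChars(text) {
  // String#match with /g returns all matches, or null when there are none.
  return text.match(/[^a-zA-Z0-9'.-]/g) || [];
}

console.log(findInvalidChars("aa#b c"));       // expected: ["#", " "]
console.log(findInvalidChars("O'Neil-Smith")); // expected: []
```

Because String#match resets the search position on every call, this avoids the lastIndex pitfall of calling test() repeatedly on one /g regex object.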

Match character but not when preceded by

I want to replace all line breaks but only if they're not preceded by these two characters {] (both, not one of them) using JavaScript. The following expression seems to do the job but it breaks other regex results so something must be wrong:
/[^\{\]]\n/g
What am I doing wrong?
Do you need to be able to strip out \n, \r\n, or both?
This should do the job:
/(^|^.|[^{].|.[^\]])\r?\n/gm
And would require that you place $1 at the beginning of your replacement string.
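A short sketch of that pattern in use. The sample text and the choice of a single space as the replacement are assumptions for illustration.

```javascript
// Hypothetical input: the second line ends with {] and must keep its newline.
const sampleText = "one\ntwo{]\nthree";

// $1 restores the captured characters that precede the line break.
const joinedLines = sampleText.replace(/(^|^.|[^{].|.[^\]])\r?\n/gm, "$1 ");

console.log(JSON.stringify(joinedLines)); // "one two{]\nthree"
```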
To answer your question about why /[^\{\]]\n/ is wrong, this regex equates to: "match any character that is neither { nor ]", followed by \n, so it incorrectly fails to match the following:
here's a square]\n
see the following{\n
You're also missing the g flag at the end, but you may have noticed that.
When you're using [^\{\]] you're using a negated character class: this stands for "any one character which is not { or ]". This means the match will fail on {\n or ]\n.
If you want to negate a pattern longer than one character you need a negative look-ahead:
/^(?!.*{]\n)([^\n]*)\n/mg
^(?! # from the beginning of the line (thanks to the m flag)
.*{]\n # negative lookahead condition: the line doesn't end with {]\n
)
([^\n]*) # select, in capturing group 1, everything up to the line break
\n
And replace it with
$1 + replacement_for_\n
What we do is check line by line that our line doesn't hold the unwanted pattern.
If it doesn't, we select everything up to the ending \n in capturing group 1, and we replace the whole line with that everything, followed by what you want to replace \n with.
Demo: http://regex101.com/r/nM2xE1
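The line-by-line lookahead approach could be applied as follows. The sample text and the space used as the replacement are assumptions; the braces are escaped here only for readability, which does not change what the pattern matches.

```javascript
// Hypothetical input: the second line ends with {] and must keep its newline.
const lines = "one\ntwo{]\nthree";

// A line is only matched when it does not end with {] before its \n.
const merged = lines.replace(/^(?!.*\{\]\n)([^\n]*)\n/gm, "$1 ");

console.log(JSON.stringify(merged)); // "one two{]\nthree"
```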
Where lookbehind is not supported (JavaScript only gained it in ES2018), you can emulate it this way:
stringWhereToReplaceNewlines.replace(/(.{0,2})\n/g, function(_, behind) {
return (behind || "") + ((behind === '{]') ? "\n" : "NEWLINE_REPLACE")
})
The callback is called for every "\n", with up to 2 preceding characters as its second parameter. It must return the string that replaces both the "\n" and those preceding characters. If the 2 preceding characters are "{]", the newline should not be replaced, so we return exactly the string that was matched; otherwise we return the preceding characters (possibly empty) followed by whatever should replace the newline.
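For comparison, here is a sketch that runs the callback emulation next to a native negative lookbehind, which engines implementing ES2018 accept. The sample text, the variable names and the space used as the replacement are assumptions for illustration.

```javascript
// Hypothetical input: the second line ends with {] and must keep its newline.
const original = "one\ntwo{]\nthree";

// Emulation: capture up to two preceding characters and decide in the callback.
const emulated = original.replace(/(.{0,2})\n/g, (whole, behind) =>
  behind === "{]" ? whole : behind + " "
);

// Native lookbehind (ES2018+): match \n only when not preceded by {].
const withLookbehind = original.replace(/(?<!\{\])\n/g, " ");

console.log(JSON.stringify(emulated));       // "one two{]\nthree"
console.log(JSON.stringify(withLookbehind)); // "one two{]\nthree"
```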
My solution would be this:
([^{].|.[^\]])\n
Your replacement string should be $1<replacement>
Since JavaScript didn't support lookbehind when this was written, we have to do without it and capture the two characters before the newline instead. This is how the regex works:
Anything but { and then anything - [^{].
Or anything and then anything but ] - .[^\]]
Put simply, [^{].|.[^\]] matches everything that .. matches except for {]
Finally a \n
The two chars before the \n are captured, so you can reinsert them into the replacement string using $1.
