RegEx match end character only when other character is not present - javascript

I'm having quite a bit of trouble defining a regex that I need.
Basically the idea is to detect all lines that end with a , or a ; character. For this I have defined the following regex:
(,|;)$
That works fine, but there is an exception: if there's a * character anywhere in the line (not necessarily at the start), then I don't want that line to match. Based on this sample:
/**
* Here there's a comment I don't want to find,
* but after this comment I do
*/
detectMe;
other,
I would expect to find 2 matches, the first one
/**
* Here there's a comment I don't want to find,
* but after this comment I do
*/
detectMe;
And the second one
other,
I've tried many things, such as non-capturing groups, negative lookaheads, and starting the pattern with [^\s*\*], with no success. Is there a way to do this?
Some of the regexes I've tried:
^[^\*](.*?)(,|;)$
^[^\s*\*](.*?)(,|;)$

To match an optional C comment and the following line ending with ; or , you may use
/(?:\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/\r?\n)?.*[;,]$/gm
See this regex demo
Details
(?:\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/\r?\n)? - an optional (as there is a ? quantifier after the group) non-capturing group matching 1 or 0 occurrences of
\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/ - a C comment pattern
\r?\n - a CRLF or LF ending
.*[;,]$ - a whole line that ends with ; or , ($ is the end of a line anchor here due to m modifier).
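A quick way to check this pattern is to run it with String.prototype.match on the sample from the question. This is only a sketch: the sample text is copied from the question and Unix \n line endings are assumed.

```javascript
// Sample text from the question (assumes \n line endings).
const source = [
  "/**",
  "* Here there's a comment I don't want to find,",
  "* but after this comment I do",
  "*/",
  "detectMe;",
  "other,",
].join("\n");

// Optional C comment plus the line after it, which must end in ; or ,
// g returns every match, m makes $ match at the end of each line.
const re = /(?:\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/\r?\n)?.*[;,]$/gm;

const found = source.match(re);
console.log(found.length); // 2
console.log(found[1]);     // other,
```

The first match is the whole comment followed by detectMe;. Because the comment is consumed as part of that match, the comment line ending in , is never reported on its own.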

You can use this regex:
/^[^*]*?[,;]$/gm
It will start by matching any number of characters other than '*', then match ',' or ';' at the end of the line. It uses the global and multiline flags to match all lines.
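One caveat, offered as an observation rather than as part of the answer above: [^*] also matches line breaks. On text where a line does not end in , or ;, the lazy match can therefore run on into the next line. The sketch below compares the pattern with a variant that also excludes \n. The sample text and the variant are assumptions added for illustration.

```javascript
const text = "foo\nbar;\n* skip me;\nother,";

// Pattern from the answer: [^*] can also consume "\n".
const original = /^[^*]*?[,;]$/gm;
// Adjusted variant (an assumption): excluding "\n" keeps each match on one line.
const singleLine = /^[^*\n]*[,;]$/gm;

console.log(JSON.stringify(text.match(original)));   // ["foo\nbar;","other,"]
console.log(JSON.stringify(text.match(singleLine))); // ["bar;","other,"]
```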

Related

Exclude some characters in string

I'm trying to ignore ; characters inside quotes using a regex in Node.js.
For example, I have an SQL statement like this:
update codes set character='.' where character = ';' and write_date < now()-5 ;
I want to find the affected rows before executing the statement. I wrote the regex below, but it doesn't work correctly when there is a ; character inside quotes, as above:
const regexp = /update\s+((.|\n)[^;]*?)\s+set\s+((.|\n)[^;]*?)\s+where\s+((.|\n)[^;]*?);/ig;
regexp.exec(str)
Expected output:
table: codes
where: character = ';' and write_date < now()-5
But I get:
table: codes
where: character = ';
You can use
update\s+([^;]*?)\s+set\s(?:[^;]*?\s)?where\s+((?:'[^']*'|[^;])*?)\s*;
See the regex demo. Details:
update - a word
\s+ - one or more whitespaces
([^;]*?) - Group 1: zero or more but as few as possible chars other than ;
\s+ - one or more whitespaces
set - a word
\s - a whitespace
(?:[^;]*?\s)? - an optional sequence of any chars other than ; as few as possible, and then a whitespace
where - a word
\s+ - one or more whitespaces
((?:'[^']*'|[^;])*?) - Group 2: zero or more (as few as possible) sequences of ', zero or more non-'s, and then ', or any single char other than a ;
\s* - zero or more whitespaces
; - a ; char.
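Here is a small sketch of the pattern applied to the statement from the question with RegExp.prototype.exec. The answer does not state any flags, so adding the i flag (for upper-case SQL keywords) is an assumption.

```javascript
const sql = "update codes set character='.' where character = ';' and write_date < now()-5 ;";

// Pattern from the answer; the i flag is an added assumption.
const re = /update\s+([^;]*?)\s+set\s(?:[^;]*?\s)?where\s+((?:'[^']*'|[^;])*?)\s*;/i;

const m = re.exec(sql);
if (m) {
  console.log("table:", m[1]); // table: codes
  console.log("where:", m[2]); // where: character = ';' and write_date < now()-5
}
```

The '[^']*' alternative consumes a whole quoted literal in one step, which is why the ; inside ';' does not end the match.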
First off, I'm not sure what (.|\n) is for, so I'm ignoring that.
I believe there are two problems with your regexp. Changing either will probably solve your problem, but I'd change both, just to be sure.
The ? after the * makes the * non-greedy, which means the regex will match as little as possible, so that the final ; in the regexp will match the first possible ; it finds, not the last possible. So I'd leave the ? out.
The regexp doesn't use $ to anchor to the end of the string. Add $ after the ; at the end (possibly \s*$ if you expect additional whitespace at the end of the string). If you do this, you don't actually need to exclude ;. It may also be a good idea to add ^ (or ^\s*) at the beginning to anchor to the start of the string, too.
So the resulting regexp is
const regexp = /^\s*update\s+((.|\n).*)\s+set\s+((.|\n).*)\s+where\s+((.|\n).*);\s*$/ig;
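Each (.|\n) is itself a capturing group, so with this pattern the table, set and where parts land in groups 1, 3 and 5. The quick check below uses the statement from the question. Note that the where group keeps the space before the final ;. Trimming it is an extra step added here, not part of the answer.

```javascript
const sql = "update codes set character='.' where character = ';' and write_date < now()-5 ;";
const regexp = /^\s*update\s+((.|\n).*)\s+set\s+((.|\n).*)\s+where\s+((.|\n).*);\s*$/ig;

const m = regexp.exec(sql);
// (.|\n) adds its own capture groups, so the useful parts are 1, 3 and 5.
const table = m[1];        // "codes"
const setPart = m[3];      // "character='.'"
const where = m[5].trim(); // "character = ';' and write_date < now()-5"
console.log(table, "|", setPart, "|", where);
```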
Finally, some conceptual ideas: why are you doing this in the first place? Instead of starting with the UPDATE SQL, why don't you start out with a structure like:
{
table: "codes",
where: "character = ';' and write_date < now()-5"
}
and build both the UPDATE and the SELECT SQLs from that?
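Here is a minimal sketch of that idea. The function name buildStatements and the setClause field are hypothetical. The fragments are interpolated directly without escaping, so the sketch assumes they come from trusted code and not from user input.

```javascript
// Hypothetical helper: builds a preview SELECT and the UPDATE from one description.
// Assumes table, setClause and where are trusted SQL fragments (no escaping is done).
function buildStatements({ table, setClause, where }) {
  return {
    select: `select * from ${table} where ${where}`,
    update: `update ${table} set ${setClause} where ${where}`,
  };
}

const stmts = buildStatements({
  table: "codes",
  setClause: "character='.'",
  where: "character = ';' and write_date < now()-5",
});
console.log(stmts.select); // select * from codes where character = ';' and write_date < now()-5
console.log(stmts.update); // update codes set character='.' where character = ';' and write_date < now()-5
```

Because both statements share the same where string, the rows previewed by the SELECT are the rows the UPDATE would touch, and no parsing is needed.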
Or, if you only have the UPDATE SQL, then instead of using a regular expression you could use one of the available SQL parser libraries, which would probably be more reliable.

JavaScript Regular Expression both dotall and global flags

I have a string like this:
#a
b
#c
d
I would like to break it up into sections beginning with #:
#a
b
and
#c
d
I have attempted this with a regular expression, but I find that I can’t get it working.
I thought that the following would work:
var test='#a\nb\n#c\nd';
var re=/#.*?/gs;
var match=test.match(re);
alert(match.length);
alert(match);
That is, the s modifier matches through line breaks, and the g modifier picks up multiple instances. The ? lazy quantifier should stop the * from going too far.
However, I find that when I use just s, it only goes to the end of the line.
Clearly there’s something I’m not getting about either the regular expression or the match() method.
By the way, I know that s is only a recent addition to JavaScript, but I’m working in Electron, where it is readily available.
Regex is too much for this job. Use built-in string functions.
var str = `#a
b

#c
d`;
var chunks = str.split("\n\n");
console.log(chunks);
I think that the answer by "I wrestled a bear once" assumes that you wish to break on the basis of line breaks, and the answer by Wiktor Stribiżew is very good, but it fails in one case (at least in my opinion).
For example, if we use Wiktor's regex /#.*(?:\r?\n(?!\r?\n).*)*/g on the string
#Section 1
This is one section
And this is also part of first sections
#Section 2
This is part of section two.
Then it will leave the line "This is also part of second section." out of its match. The reason is simply that his regex stops at a double \r?\n (an empty line), so it just ignores that line.
I am assuming you want something similar to Markdown, where # is used to mark sections and headings.
If that is the case, then use the following regex: /#.*(?:\r?\n(?!#).*)*/g. It's a minor modification of Wiktor's great answer, and it matches the lines as (I hope) we wanted.
It matches the whole section and uses a negative lookahead so that it doesn't include anything beyond the next section, i.e. a line that starts with #.
Link: https://regex101.com/r/ai15fP/2
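Here is a side-by-side sketch of the two patterns. The sample text is an assumption: sections are separated by blank lines, and section 2 contains an extra paragraph after a blank line.

```javascript
// Assumed sample: blank lines between sections and inside section 2.
const text = [
  "#Section 1",
  "This is one section",
  "And this is also part of first sections",
  "",
  "#Section 2",
  "This is part of section two.",
  "",
  "This is also part of second section.",
].join("\n");

// Wiktor's pattern: a section ends at the first blank line.
const byBlankLine = text.match(/#.*(?:\r?\n(?!\r?\n).*)*/g);
// Modified pattern: a section ends only before a line that starts with "#".
const byHeading = text.match(/#.*(?:\r?\n(?!#).*)*/g);

console.log(JSON.stringify(byBlankLine[1])); // "#Section 2\nThis is part of section two."
console.log(JSON.stringify(byHeading[1]));   // "#Section 2\nThis is part of section two.\n\nThis is also part of second section."
```

One side effect of the modified pattern is that each section keeps the line breaks that precede the next #. Here, byHeading[0] ends with "sections\n".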
EDIT: If the only goal is to split into sections at lines starting with # you may just use
test.split(/^(?=#)/m)
See the JS demo:
var test="#a\nb\n\n#c\nd";
console.log(test.split(/^(?=#)/m))
The lazy .*? at the end of the pattern never matches any characters. It first tries to match zero characters, and since nothing follows it in the pattern, the match is already complete at that point.
Use
s.match(/#.*(?:\r?\n(?!\r?\n).*)*/g)
See the regex demo
Details
# - a # char
.* - any 0+ chars other than line break chars
(?:\r?\n(?!\r?\n).*)* - 0 or more repetitions of
\r?\n(?!\r?\n) - an optional CR and then LF that are not followed with an optional CR and then LF
.* - any 0+ chars other than line break chars
Or, use split with /(?:\r?\n){2,}/ that matches 2 or more line break sequences.
JS demo:
var test="#a\nb\n\n#c\nd";
console.log(test.match(/#.*(?:\r?\n(?!\r?\n).*)*/g));
console.log(test.split(/(?:\r?\n){2,}/));
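Here is a sketch of the effect described in this answer, using the same test string. The second pattern gives the lazy part something to stop at: a run of line breaks followed by #, or the end of the input. It is an illustration added here and not one of the answers' solutions.

```javascript
const test = "#a\nb\n\n#c\nd";

// The question's pattern: the trailing lazy .*? is satisfied by zero characters.
const lazyOnly = test.match(/#.*?/gs);
console.log(JSON.stringify(lazyOnly)); // ["#","#"]

// Illustration only: a lookahead after .*? forces it to expand up to the next
// "#" that follows line breaks, or to the end of the input ($ without m).
const bounded = test.match(/#.*?(?=\n+#|$)/gs);
console.log(JSON.stringify(bounded)); // ["#a\nb","#c\nd"]
```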

Regex for given pattern

I have below test case
hello how are you // 1. Allow
hello how are you // 2. Not Allow
hello < // 3. Not Allow
for the following Rules:
Allow spaces at start and end
Not allow more than one space between words
Not allow angle brackets < >
I am trying the below:
^([^<> ]+ )+[^<> ]+$|^[^<> ]+$
This works, but it does not allow spaces at the start or end.
I assume that you use your regex to find matches in the whole
text string (all 3 lines together).
I see also that both your alternatives contain starting ^ and ending $,
so you want to limit the match to a single line
and probably use m regex option.
Note that a [^...]+ expression matches a sequence of characters other than
those placed between the brackets, but it does not exclude \n chars,
which is probably not what you want.
So, maybe you should add \n to each [^...]+ expression.
The regex extended such a way is:
^([^<> \n]+ )+[^<> \n]+$|^[^<> \n]+$
and it matches line 1 and 2.
But note that the first alternative alone (^([^<> \n]+ )+[^<> \n]+$)
also does the job.
If you really want line 2 not to match, please specify why.
Edit
To allow any number of spaces at the begin / end of each line,
add * after initial ^ and before final $, so that the
regex (first alternative only) becomes:
^ *([^<> \n]+ )+[^<> \n]+ *$
But it still matches line 2.
Or maybe the dots in your test string are actually spaces, and you wrote
the string using dots to show the number of spaces?
You should have mentioned that in your question.
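Here is a small check of the first Edit's pattern with the g and m flags. The three lines are hypothetical, because the spacing of the original test lines is unclear: spaces at both ends, a double space between words, and an angle bracket.

```javascript
// Hypothetical test lines.
const text = ["  hello how are you  ", "hello  how are you", "hello <"].join("\n");

// Pattern from the Edit above; m makes ^ and $ work per line.
const re = /^ *([^<> \n]+ )+[^<> \n]+ *$/gm;

console.log(JSON.stringify(text.match(re))); // ["  hello how are you  "]
```

Only the first line matches. The double space breaks the "word + single space" repetition, and < is excluded by the character classes.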
Edit 2
Yet another possibility, allowing dots in place of spaces:
^[ .]*((?:[^<> .\n]+[ .])+[^<> .\n]+|[^<> .\n]+)[ .]*$
Details:
^[ .]* - Start of a line + a sequence of spaces or dots, may be empty.
( - Start of the capturing group - container for both alternatives of
stripped content.
(?:[^<> .\n]+[ .])+ - Alternative 1: A sequence of "allowed" chars (a "word") +
a single space or dot (before the next "word"), repeated one or more times.
This group does not need to be a capturing one, so I put ?:.
[^<> .\n]+ - A sequence of "allowed" chars - the last "word".
| - Or.
[^<> .\n]+ - Alternative 2: A single "word".
) - End of the capturing group.
[ .]*$ - The final (possibly empty) sequence of spaces / dots + the
end of line.
Of course, with m option.
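Here is a sketch showing what capturing group 1 holds. The lines are hypothetical, with dots standing in for spaces. String.prototype.matchAll is used, which requires the g flag.

```javascript
// Hypothetical lines where dots stand in for spaces.
const text = ["..hello.how are.you..", "hello..how", "hello <"].join("\n");

const re = /^[ .]*((?:[^<> .\n]+[ .])+[^<> .\n]+|[^<> .\n]+)[ .]*$/gm;

// Group 1 is the line without its leading/trailing spaces or dots.
for (const m of text.matchAll(re)) {
  console.log(m[1]); // hello.how are.you
}
```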

Grabbing URL sans the last segment/file with a regular expression

Looking for a regex so that I may grab everything BUT the last segment + extension.
So for example
http://stackoverflow.com/stuff/code/apple.jpg
I need
http://stackoverflow.com/stuff/code/
I'm able to grab the last segment, but with a myriad of possible directories this image could be under, I'm unsure how to get everything sans the last segment.
Code
See regex in use here
https?:\/{2}.*\/
You may even be able to simply use .*\/ (if you don't need to ensure it starts with http or https). If that's the case, you may as well just cut the string at the last occurrence of / - see the second snippet below.
var s = "http://stackoverflow.com/stuff/code/apple.jpg"
var r = /https?:\/{2}.*\//
console.log(r.exec(s))
Substring on last occurrence method:
var s = "http://stackoverflow.com/stuff/code/apple.jpg"
console.log(s.substring(0, s.lastIndexOf('/')) + '/')
Explanation
https? Match http or https (s is made optional by ?)
:\/{2} Match the colon character : literally, followed by two forward slash characters / literally
.* Match any character any number of times
\/ Match the forward slash character / literally
If you're able to manipulate the string, you can use this to remove the end piece:
var str = "http://stackoverflow.com/stuff/code/apple.jpg"
str = str.replace(/\/[^/]+$/, '/')
Regex explanation:
\/ match a slash
[^/]+$ match a string that has no / characters in it and continues until the end of the string
This will replace /apple.jpg with /
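Here is the same replace() call wrapped in a small helper. The name dropLastSegment is hypothetical. The second call shows that a URL already ending in / is left unchanged, because [^/]+$ needs at least one non-slash character at the end.

```javascript
// Hypothetical helper wrapping the replace() approach from this answer.
function dropLastSegment(url) {
  // Replace the final "/segment" (which contains no slashes) with a single "/".
  return url.replace(/\/[^/]+$/, "/");
}

console.log(dropLastSegment("http://stackoverflow.com/stuff/code/apple.jpg"));
// http://stackoverflow.com/stuff/code/
console.log(dropLastSegment("http://stackoverflow.com/stuff/code/"));
// http://stackoverflow.com/stuff/code/
```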
EDIT: I just noticed that you need the trailing / char and have updated my answer to meet that requirement. See the # comment lines in the code.
Here's a unix shell solution
var="http://stackoverflow.com/stuff/code/apple.jpg"
echo "${var%/*}/"
#--------------^-- we've deleted everything from that last / to end
#--------------+ just append a replacement char /
output
http://stackoverflow.com/stuff/code/
% is a parameter expansion feature that removes the smallest suffix matching the pattern.
See POSIX definitions for Parameter Expansion for detailed info on this and similar features. (Go to the bottom of the page).
IHTH

Catching start number and final number

I'm trying to create a regex to catch the first number in the line and the last one, but I'm having some trouble with the last one.
The lines look like this:
00005 SALARIO MENSAL 17030 36.397.291,92 36.397.291,92
00010 HORAS TRABALHADAS 0798 19.731,93 19.731,93
And this is my regex:
(^\d+).*(\d)
As you can see here: http://regexr.com/3crbt, it is not working as expected. I can get the first number, but for the last one I only get its final digit.
Thanks!
You can use
/^(\d+).*?(\d+(?:[,.]\d+)*)$/gm
See the regex demo
The regex matches:
^ - start of the line
(\d+) - captures into Group 1 one or more digits
.*? - matches any characters but a newline, as few as possible up to
(\d+(?:[,.]\d+)*) - one or more digits followed with zero or more sequences of , or . followed with one or more digits (Group 2)
$ - end of the line (because of the m modifier)
The /g modifier ensures we get all matches and /m modifier makes the ^ and $ match start and end of a line respectively.
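Here is a sketch applying the pattern to the two sample lines from the question with String.prototype.matchAll, destructuring the two capture groups:

```javascript
const lines = [
  "00005 SALARIO MENSAL 17030 36.397.291,92 36.397.291,92",
  "00010 HORAS TRABALHADAS 0798 19.731,93 19.731,93",
].join("\n");

const re = /^(\d+).*?(\d+(?:[,.]\d+)*)$/gm;

// Skip the full match (index 0) and keep groups 1 and 2.
for (const [, first, last] of lines.matchAll(re)) {
  console.log(first, last);
}
// 00005 36.397.291,92
// 00010 19.731,93
```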
I tried the following one:
(^(\d+))|(\d+$)
And it seems to work on regexr.com. But pairing the results up might require assuming that each line has at least two numbers.
You need to make the .* non-greedy by changing it to .*? and add + to the second digit sequence match.
^(\d+).*?(\d+)$
If you want to match the full last number, use this:
^(\d+).*?([\d\.,]+)$
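Here is a quick comparison of the two patterns in this answer on the first sample line. No g flag is used, so match returns the capture groups.

```javascript
const line = "00005 SALARIO MENSAL 17030 36.397.291,92 36.397.291,92";

// Digits only in the last group: the match starts after the final comma.
const digitsOnly = line.match(/^(\d+).*?(\d+)$/)[2];
// Digits, dots and commas: the whole last number.
const fullNumber = line.match(/^(\d+).*?([\d\.,]+)$/)[2];

console.log(digitsOnly); // 92
console.log(fullNumber); // 36.397.291,92
```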
