Regex to match string from the back - javascript

Let's say we have a string "text\t1\nText that has to be extracted". What regex can be used to check the string from the back, that is, from the last " back to the preceding \n, given that the start of the string can change? In this case, I need to get only Text that has to be extracted. What generic regex can we use here?
I used (?<=\\n1\\n)(.*)(?=") but this will not work if the pattern before the last \n changes to \n2 or \ntext.
Any help is appreciated.

You may use this regex:
/(?<=\\n)[^"\\]+(?="$)/
RegEx Details:
(?<=\\n): Lookbehind to make sure we have a \n before the current position
[^"\\]+: Match 1+ of any character that is not " and not \
(?="$): Lookahead to make sure the next character is a " immediately followed by the end of the line
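A minimal sketch of trying this pattern, assuming the input really contains the two characters backslash and n rather than a newline. The helper name extractTail is made up for the example.

```javascript
// extractTail is a hypothetical helper name used only for this sketch.
function extractTail(input) {
  // Lookbehind for a literal backslash + n, then everything up to the final quote.
  const match = /(?<=\\n)[^"\\]+(?="$)/.exec(input);
  return match ? match[0] : null;
}

// Backslashes are doubled so the string holds a literal backslash followed by n.
console.log(extractTail('"text\\t1\\nText that has to be extracted"'));
// Text that has to be extracted
console.log(extractTail('"other prefix\\n2\\nSome other text"'));
// Some other text
```

Because the lookahead requires the closing quote at the end, only the segment after the last \n can match, whatever comes before it.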

Can't you just split and take the last element?
var item = "text\n1\nText that has to be extracted";
var last = item.split(/\n/g).reverse()[0];
console.log(last) // "Text that has to be extracted"
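If the \n in the input is the literal two-character text rather than a newline character, the same split idea should still work with an adjusted separator. This is only a sketch: lastSegment is a hypothetical name, and stripping one trailing double quote is an assumption about the input.

```javascript
// lastSegment is a hypothetical helper name used only for this sketch.
function lastSegment(input) {
  // Split on either a literal backslash + n or a real newline character.
  const parts = input.split(/\\n|\n/);
  // Take the last piece and drop one trailing double quote, if present.
  return parts[parts.length - 1].replace(/"$/, "");
}

console.log(lastSegment('text\\n1\\nText that has to be extracted"'));
// Text that has to be extracted
console.log(lastSegment("text\n1\nText that has to be extracted"));
// Text that has to be extracted
```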

/^(\d+)\n([^\n"]+)"$/ may have some edge cases, but will find the number (one or more digits), followed by a newline, followed by any character that is neither newline nor a double quote, followed by a literal double quote.
This would require that the double quote occurs immediately before the end-of-line (EOL), but if that's not required (for example, if you have a semi-colon after the closing quote), remove $ from the end.
Edit
Just noticed that it's the literal text \n and not a newline character.
/(?<=\\n)(\d+)\\n((?:[^\\"]+|\\.)*)"/
Breakdown:
(?<=\\n) looks for a \ followed by the letter n.
(\d+) captures the 1-or-more digits.
\\n matches a literal \ followed by the letter n.
((?:...)*) captures some text that repeats 0 or more times.
(?:...|...) matches either a run of characters that are neither a literal \ character nor a double quote character, OR a literal \ character that is followed by "anything", so you can have \n or \" etc. The entire group is matched repeatedly.
" at the end ensures that you're inside (well, we hope) a double-quoted string on the same line.

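A sketch of how the pattern from the Edit above could be used to pull out both captures. It assumes the text contains a literal \n before and after the number; parseNumberedText is a hypothetical name.

```javascript
// parseNumberedText is a hypothetical helper name used only for this sketch.
function parseNumberedText(input) {
  const pattern = /(?<=\\n)(\d+)\\n((?:[^\\"]+|\\.)*)"/;
  const match = pattern.exec(input);
  if (!match) return null;
  // Group 1 is the number, group 2 is the text up to the closing quote.
  return { number: match[1], text: match[2] };
}

const parsed = parseNumberedText('"text\\n1\\nText that has to be extracted"');
console.log(parsed.number); // 1
console.log(parsed.text);   // Text that has to be extracted
```
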
Related

RegExp avoid double space and space before characters

I'm trying to write a regular expression in order to not allow double spaces anywhere in a string, and also force a single space before a MO or GO mandatory, with no space allowed at the beginning and at the end of the string.
Example 1 : It is 40 GO right
Example 2 : It is 40GO wrong
Example 3 : It is 40  GO wrong (double space before GO)
Here's what I've done so far ^[^ ][a-zA-Z0-9 ,()]*[^;'][^ ]$, which prevents spaces at the beginning and at the end, and also the ";" character. This one works like a charm.
My issue is not allowing double spaces anywhere in the string, and also forcing spaces right before MO or GO characters.
After a few hours of research, I've tried these (starting from the previous RegExp I wrote):
To prevent the double spaces: ^[^ ][a-zA-Z0-9 ,()]*((?!.* {2}).+)[^;'][^ ]$
To force a single space before MO: ^[^ ][a-zA-Z0-9 ,()]*(?=\sMO)*[^;'][^ ]$
But neither of the last two actually work. I'd be thankful to anyone that helps me figure this out
The lookahead (?!.* {2}) can be omitted. Instead, start the match with a non-whitespace character, end it with a non-whitespace character, and use a single space in an optionally repeated group.
If the string cannot contain a ' or ; then using [^;'][^ ]$ only means that the second-to-last character should not be any of those characters.
But you can omit that part, as the character class [a-zA-Z0-9,()] does not match ; and '
Note that a character class like [^ ] or [^;'] expects exactly one character, which gives the pattern you tried a minimum length.
Instead, you can rule out the presence of GO or MO preceded by a non whitespace character.
^(?!.*\S[MG]O\b)[a-zA-Z0-9,()]+(?: [a-zA-Z0-9,()]+)*$
The pattern matches:
^ Start of string
(?!.*\S[MG]O\b) Negative lookahead, assert that there is no non-whitespace character followed by either MO or GO to the right. The word boundary \b prevents a partial word match
[a-zA-Z0-9,()]+ Start the match with 1+ occurrences of any of the listed characters (Note that there is no space in it)
(?: [a-zA-Z0-9,()]+)* Optionally repeat the same character class with a leading space
$ End of string
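A quick sketch for checking the pattern against the examples from the question. The function name isValidLabel is hypothetical.

```javascript
// isValidLabel is a hypothetical helper name used only for this sketch.
function isValidLabel(text) {
  return /^(?!.*\S[MG]O\b)[a-zA-Z0-9,()]+(?: [a-zA-Z0-9,()]+)*$/.test(text);
}

console.log(isValidLabel("It is 40 GO"));  // true  (single space before GO)
console.log(isValidLabel("It is 40GO"));   // false (no space before GO)
console.log(isValidLabel("It is 40  GO")); // false (double space)
console.log(isValidLabel(" It is 40 GO")); // false (leading space)
console.log(isValidLabel("It is 40 GO ")); // false (trailing space)
```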

How to extract the last word in a string with a JavaScript regex?

What I need is the last match. In the case below, that is the word test without the $ signs or any other special character:
Test String:
$this$ $is$ $a$ $test$
Regex:
\b(\w+)\b
The $ represents the end of the string, so...
\b(\w+)$
However, your test string seems to have dollar sign delimiters, so if those are always there, then you can use that instead of \b.
\$(\w+)\$$
var s = "$this$ $is$ $a$ $test$";
document.body.textContent = /\$(\w+)\$$/.exec(s)[1];
If there could be trailing spaces, then add \s* before the end.
\$(\w+)\$\s*$
And finally, if there could be other non-word stuff at the end, then use \W* instead.
\b(\w+)\W*$
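As a rough sketch, the last variant can be wrapped in a small function. The name lastWord is hypothetical, and returning null when nothing matches is a choice made for the example.

```javascript
// lastWord is a hypothetical helper name used only for this sketch.
function lastWord(input) {
  // Capture the final run of word characters, allowing trailing non-word characters.
  const match = /\b(\w+)\W*$/.exec(input);
  return match ? match[1] : null;
}

console.log(lastWord("$this$ $is$ $a$ $test$"));     // test
console.log(lastWord("a very talented boxer!"));     // boxer
console.log(lastWord("$$$"));                        // null
```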
In some cases a word may be followed by non-word characters. For example, take the following sentence:
Marvelous Marvin Hagler was a very talented boxer!
If we want to match the word boxer, patterns that anchor the word directly to the end of the string will not suffice, because an exclamation mark follows the word. The following expression captures it and also takes into account extraneous whitespace, newlines and any non-word character.
[a-zA-Z]+?(?=\s*?[^\w]*?$)
https://regex101.com/r/D3bRHW/1
The expression breaks down as follows:
[a-zA-Z]+? looks for letters only, either uppercase or lowercase, and expands only as far as necessary (lazy quantifier).
(?=...) is a positive lookahead, so what follows the word is checked but not captured.
\s*? allows optional whitespace after the word.
[^\w]*? extends that to any run of non-word characters.
$ asserts the end of the line.
The benefits here are that we do not need any flags or word boundaries, and trailing non-word characters are taken into account.
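A short sketch of the expression in use. The function name lastLetterWord is hypothetical; note that, as written, the pattern only considers ASCII letters.

```javascript
// lastLetterWord is a hypothetical helper name used only for this sketch.
function lastLetterWord(input) {
  // Lazily match letters, then require only whitespace/non-word characters up to the end.
  const match = /[a-zA-Z]+?(?=\s*?[^\w]*?$)/.exec(input);
  return match ? match[0] : null;
}

console.log(lastLetterWord("Marvelous Marvin Hagler was a very talented boxer!"));
// boxer
console.log(lastLetterWord("$this$ $is$ $a$ $test$"));
// test
```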
var input = "$this$ $is$ $a$ $test$";
If you use var result = input.match(/\b(\w+)\b/g), an array of all the matches will be returned. You can then get the last one by using pop() on the result or by doing: result[result.length - 1]
Your regex will find a word, and since regexes operate left to right it will find the first word.
A \w+ matches as many consecutive word characters (letters, digits and underscore) as it can, but it must match at least 1.
A \b matches the position between a word character and a non-word character. In your case these positions sit next to the '$' characters.
What you need is to anchor your regex to the end of the input which is denoted in a regex by the $ character.
To support an input that may have more than just a '$' character at the end of the line, spaces or a period for instance, you can use \W+ which matches as many non-alphanumeric characters as it can:
\$(\w+)\W+$
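For comparison, here is a sketch of the "collect every word, then take the last one" approach, using a regex literal with the g flag. The name lastWordFromAll is hypothetical.

```javascript
// lastWordFromAll is a hypothetical helper name used only for this sketch.
function lastWordFromAll(input) {
  // With the g flag, match() returns every match, or null when there is none.
  const words = input.match(/\b\w+\b/g);
  return words ? words[words.length - 1] : null;
}

console.log(lastWordFromAll("$this$ $is$ $a$ $test$")); // test
console.log(lastWordFromAll("$$$"));                    // null
```

Anchoring with $ avoids building the intermediate array, but this version is arguably easier to read.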
Avoid regex - use .split and .pop the result. Use .replace to remove the special characters:
var match = str.split(' ').pop().replace(/[^\w\s]/gi, '');

how to understand this regex pattern

Javascript regex pattern I find in less:
/^([#.](?:[\w-]|\\(?:[A-Fa-f0-9]{1,6} ?|[^A-Fa-f0-9]))+)\s*\(/
especially this section:
\\(?:[A-Fa-f0-9]{1,6} ?|[^A-Fa-f0-9])
([#.](?:[\w-]|\\(?:[A-Fa-f0-9]{1,6} ?|[^A-Fa-f0-9]))+)\s*\(
Let's work it from the inside out, using MDN as reference when necessary:
(?:[A-Fa-f0-9]{1,6} ?|[^A-Fa-f0-9])
(?:) is a non-capturing parenthesis. It groups and matches, but doesn't save the results. Inside that group is 1-6 hex digits followed by an optional space or any character other than a hex character.
(?:[\w-]|\\ above)+
Again, a non-capturing parenthesis, this time of \w, which is any alphanumeric character + _, and since there's [\w-], that's "any alphanum + -_". Then there's an or, a \ character, and the above. Together, that makes this parenthesis group read as: "Any single alphanumeric character, underscore or hyphen, or a backslash followed by either anything not a hexdigit or a hexstring of 1 to 6 characters." The + means "at least 1 instance of the group."
^([#.]above)\s*\(
Now we have ^[#.] which means "the line must start with # or .", followed by the above, then any amount of whitespace, then a left parenthesis.
TL;DR:
When you add that all up, you get:
"A line that starts with either # or . followed by one or more of:
alphanumeric characters, _ or - OR
a backslash followed by a one to six digit hexstring followed by a single optional space OR
a backslash followed by a single nonhexdigit character
followed by any amount of whitespace and then a (".
If a match is found, the entire part before the whitespace and ( is stored in the result of the search.
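To see the capture in action, here is a small sketch. The function name matchSelectorCall and the sample inputs are made up for illustration; they are not taken from the less source.

```javascript
// matchSelectorCall is a hypothetical helper name used only for this sketch.
function matchSelectorCall(line) {
  const pattern = /^([#.](?:[\w-]|\\(?:[A-Fa-f0-9]{1,6} ?|[^A-Fa-f0-9]))+)\s*\(/;
  const match = pattern.exec(line);
  // Group 1 holds everything before the optional whitespace and the "(".
  return match ? match[1] : null;
}

console.log(matchSelectorCall(".my-mixin (10px)")); // .my-mixin
console.log(matchSelectorCall("#ns\\:item("));      // #ns\:item
console.log(matchSelectorCall("plain("));           // null
```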

How to match a string containing specific text followed by any combination of alphanumerics and other characters in regex

I want to match strings which have a specific text at the start, followed by any combination of alphanumeric and other characters, and which end with a double quote ". Here is the sample string
fixed_words_/abcd123/"
In this string, fixed_words_ will always be the same and the end will always be ", but in between there can be digits, letters, underscores and slashes.
I tried mystring.match(/fixed_words_\w*"/g) but it's not working. I am sorry but I am new to regex, so don't mind if it's a stupid question.
Instead of \w, have a character class that can match either \w (alphanumerics/underscores) or a slash:
mystring.match(/fixed_words_[\/\w]*"/g)
The above assumes that your expression can appear anywhere (or multiple times!) in mystring. If you want mystring to contain only your expression, add a start-of-string anchor (^) at the beginning, an end-of-string anchor ($) at the end, and get rid of the g flag permitting multiple matches:
mystring.match(/^fixed_words_[\/\w]*"$/)
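A sketch contrasting the two variants. The helper names findAllFixed and isExactFixed and the sample strings are hypothetical.

```javascript
// findAllFixed and isExactFixed are hypothetical helper names for this sketch.
function findAllFixed(text) {
  // Global search: every occurrence anywhere in the text (empty array if none).
  return text.match(/fixed_words_[\/\w]*"/g) || [];
}

function isExactFixed(text) {
  // Anchored check: the whole string must be the expression.
  return /^fixed_words_[\/\w]*"$/.test(text);
}

console.log(findAllFixed('a fixed_words_/abcd123/" b fixed_words_x"'));
// [ 'fixed_words_/abcd123/"', 'fixed_words_x"' ]
console.log(isExactFixed('fixed_words_/abcd123/"'));   // true
console.log(isExactFixed('a fixed_words_/abcd123/"')); // false
```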
Use the following regex to match your string: ^\s*fixed_words_[^"]*"\s*$
[^"]* will match all the characters until it finds a double quote (") character.

Javascript RegExp Tokenizing

Given a string, I want to use a regular expression to tokenize it. The pattern is as follows: any character (including new line, etc.), until "<", followed by a space zero or more times, followed by "%".
I tried
var patt = /(.)*<(\s)*%/;
but it does not yield the desired result. I would appreciate an explanation along with the pattern.
Use this:
"some string".split(/.*<\s*%/);
/^[\s\S]*?< *%/
should do what you want.
^ causes it to match at the beginning of the string.
[\s\S] matches any character. Literally, it means any space or non-space character, and works around the fact that . does not match newlines.
*? matches zero or more but the fewest necessary for the rest of the pattern to match.
< matches a literal '<'
The space followed by * matches zero or more spaces. This is more readable if written as [ ]*.
% finally matches that character.
If you want to match the entire string (i.e. the % should be the last character in the string), then you can put a $ before the last /.
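A minimal sketch of using this pattern to cut a string at the first "<" followed by optional spaces and "%". The function name splitAtFirstMarker and the returned object shape are assumptions made for the example.

```javascript
// splitAtFirstMarker is a hypothetical helper name used only for this sketch.
function splitAtFirstMarker(input) {
  // [\s\S]*? lazily consumes any characters, including newlines, up to the marker.
  const match = /^[\s\S]*?< *%/.exec(input);
  if (!match) return null;
  return { head: match[0], rest: input.slice(match[0].length) };
}

const pieces = splitAtFirstMarker("line one\nline two <  % tail");
console.log(JSON.stringify(pieces.head)); // "line one\nline two <  %"
console.log(JSON.stringify(pieces.rest)); // " tail"
```

Because the quantifier is lazy, the head stops at the first marker even if more markers follow.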
