Regex does not apply to whole string - javascript

The following regex
var str = "1234,john smith,jack jone";
var match = str.match(/([^,]*,[^,]*,[^ ]*)/g);
alert(match);
returns
1234,john smith,jack
But what I am trying to get is the whole string, which is
1234,john smith,jack jone
Basically, my script only handles the first whitespace between commas, but I want it to work every time there is whitespace between commas.
Can anyone help me out, please?

Your pattern excludes spaces from the last section, so as soon as it encounters a space after the second comma, that's the end of the match. You might want to try this instead:
var match = str.match(/[^,]*,[^,]*,.*/g);
This will allow anything after the second comma, including spaces or more commas (since your original pattern allowed commas after the second).
If you'd like to match the pattern only on a single line, use start / end anchors (^ / $) as well as the multiline flag (m), like this:
var match = str.match(/^[^,]*,[^,]*,.*$/mg);
You can try it out with this simple demo.
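For comparison, here is a small sketch that runs both the original pattern and the anchored one against the sample string from the question. The variable names are just for illustration.

```javascript
const str = "1234,john smith,jack jone";

// Original pattern: the last section ([^ ]*) stops at the first space.
const original = str.match(/([^,]*,[^,]*,[^ ]*)/g);

// Anchored pattern: after the second comma, .* takes the rest of the line.
const whole = str.match(/^[^,]*,[^,]*,.*$/mg);

console.log(original); // [ '1234,john smith,jack' ]
console.log(whole);    // [ '1234,john smith,jack jone' ]
```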

Why aren't you using split?
"1234,john smith,jack jone".split(/,/)
or
"1234,john smith,jack jone".split(",")

Related

Javascript regex for tag in comments

I've been working on a web app in which users can comment and reply to comments, using a tagging system. Users are tagged by their name, which can contain several words, so I've decided to mark the tags like this:
&&John Doe&&
So a comment might look like this:
&&John Doe&&, are you sure that &&Alice Johnson&& is gone?
I'm trying to write a regex to use in a string.replace() JavaScript function, so the regex must match every single tag in the string.
So far I have:
^&&.+{2, 64}&&$
This isn't working, so I'm sure something is wrong. In case you didn't understand what I meant, the regex above is supposed to match strings like this:
&&anythingbetween2and64charslong&&.
Thanks in advance!
(.*?)&& means "everything until &&" :
var before = document.getElementById("before");
var after = document.getElementById("after");
var re = /&&(.*?)&&/g, b = "<b>$1</b>";
after.innerHTML = before.textContent.replace(re, b);
<p id="before">&&John Doe&&, are you sure that &&Alice Johnson&& is gone?</p>
<p id="after"></p>
Try &{2}.{2,64}&{2}
If you want to get the text in between, add parentheses for a capture group:
&{2}(.{2,64})&{2}
Right now you are only checking strings where the entire line matches:
the ^ character means beginning of line
the $ character means end of line
\A means beginning of entire string (in some other regex flavors; JavaScript does not support \A)
\Z means end of entire string (in some other regex flavors; JavaScript does not support \Z)
Here's what you need:
str.match(/&&.{2,64}?&&/g)
You need to remove ^ and $ from the start and the end, since they match the start and the end of the string.
Add the g flag at the end so all the matches are returned.
The ? after the {} makes the match non-greedy, so it will match the shortest possible string between "&&" instead of the longest (it will give you "&&John Doe&&" instead of "&&John Doe&&, are you sure that &&Alice Johnson&&").
Read up on greediness: Repetition with Star and Plus
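To see the difference in practice, here is a minimal sketch using the sample comment from the question. The greedy pattern /&&.+&&/ is only there for contrast and is not taken from any of the answers.

```javascript
const comment = "&&John Doe&&, are you sure that &&Alice Johnson&& is gone?";

// Greedy: .+ runs to the LAST && in the string, swallowing both tags.
const greedy = comment.match(/&&.+&&/g);

// Non-greedy: {2,64}? stops at the first && that closes each tag.
const lazy = comment.match(/&&.{2,64}?&&/g);

// With a capture group, the same pattern can be used in replace().
const bolded = comment.replace(/&&(.{2,64}?)&&/g, "<b>$1</b>");

console.log(greedy); // one long match spanning both tags
console.log(lazy);   // [ '&&John Doe&&', '&&Alice Johnson&&' ]
console.log(bolded);
```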
This regex will match names made of Unicode letters and digits between && signs (the u flag is required for \p{...} to work in JavaScript):
str.match(/&&[\p{L}\p{N}]+(?:\s+[\p{L}\p{N}]+)*&&/gu);
Here,
\p{L} --> Any Unicode letter, so the names can be in any language
\p{N} --> Any Unicode numeric character
[\p{L}\p{N}]+ --> A word constructed with Unicode letters or digits
\s+ --> Gaps between words, one or more whitespace characters (use \s{1,3} to cap the gap at 3)
[\p{L}\p{N}]+(?:\s+[\p{L}\p{N}]+)* --> All word groups
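A quick sketch of how this behaves, assuming an engine with ES2018 Unicode property escapes. Note that \p{...} only works with the u flag, and that under the u flag \& is an invalid escape, so the ampersands are written unescaped here. The sample text is made up for illustration.

```javascript
// The u flag enables \p{L} / \p{N}; g returns every tag.
const tagPattern = /&&[\p{L}\p{N}]+(?:\s+[\p{L}\p{N}]+)*&&/gu;

const text = "&&Анна Каренина&&, have you seen &&李 雷&& or &&John Doe&&?";
const tags = text.match(tagPattern);

console.log(tags);

// Punctuation inside the markers is not a letter or digit, so no match.
console.log("&&not, a name&&".match(tagPattern)); // null
```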

Not sure why this Regex is returning true

Trying to use this regex to verify usernames and this is what I have :
var goodUsername = /[a-zA-Z0-9_]/g;
console.log(goodUsername.test("HELO $"));
But whether or not I have $ in there, it returns true. Not sure why.
I basically only want letters, numbers and _ in usernames and that's it
It seems to work here https://regex101.com/r/nP4iG7/1
The regex that you use searches for any match in the subject string. In your case, HELO matches the criteria. If you want to apply the criteria to the whole string, you should mark the beginning and end of the string using ^ and $:
var goodUsername = /^[a-zA-Z0-9_]+$/;
console.log(goodUsername.test("HELO $"));//false
You need to add anchors:
/^[a-zA-Z0-9_]+$/;
Anchors help to do exact matching: ^ is the start-of-line anchor and $ is the end-of-line anchor. You also need to repeat the character class one or more times, otherwise it would only match a string which contains exactly one character.
You could search for any characters not in the list (a "negated character set"):
var badUsername = /[^a-zA-Z0-9_]/;
console.log(!badUsername.test("HELO $"));
or more simply
var badUsername = /\W/;
since \W is defined as
Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_].
If you prefer to do a positive match, using anchors as other answers have suggested, you can shorten your regexp by using \w:
var goodUsername = /^\w+$/;
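Putting the approaches side by side, here is a rough sketch of how each one treats a few inputs. The helper name isGoodUsername is made up for this example.

```javascript
const unanchored = /[a-zA-Z0-9_]/; // true if ANY character is allowed
const anchored = /^\w+$/;          // true only if EVERY character is allowed
const hasBadChar = /\W/;           // true if any character is NOT allowed

// Hypothetical helper built on the anchored, positive match.
function isGoodUsername(name) {
  return anchored.test(name);
}

console.log(unanchored.test("HELO $"));  // true  (the "H" alone satisfies it)
console.log(isGoodUsername("HELO $"));   // false (space and $ are rejected)
console.log(isGoodUsername("HELO_1"));   // true
console.log(!hasBadChar.test("HELO $")); // false
```

One difference worth knowing: the negated check accepts the empty string (there is no bad character in it), while the anchored \w+ version rejects it because + requires at least one character.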

Add HTML tags to this regex string

I'm using a tiny little JS plugin to truncate multiple lines of text on a site I'm working on.
The only problem is that the script is counting HTML tags for example in the character count which is throwing things off a little.
This is how the script currently excludes characters:
regex = /[!-\/:-@\[-`{-~]$/
Which basically just strips out certain punctuation characters.
I've tried changing it to this:
regex = [!-\/:-@\[-`{-~]$<[^>]*>
But, not being too familiar with regex, it didn't seem to work.
If someone could nudge me in the right direction that would be great.
In your initial regex you're looking for a single punctuation character at the tail of the string, whether that string is a character, a word, or a line. Note the dollar sign '$'.
regex = /[!-\/:-@\[-`{-~]$/
Now you want to match anything between < and >.
regex = /[!-\/:-@\[-`{-~]$|<[^>]*$/
Note that this will match <, <aaaa, or <aaaa< when they run until the end of the string that you are matching against.
greedy_regex = /[!-\/:-@\[-`{-~]$|<[^>]*/
non_greedy_regex = /[!-\/:-@\[-`{-~]$|<[^>]*?/
If you remove the second '$', as in greedy_regex, it will do a greedy match, matching <b in a<b>c</b>d (everything from < up to, but not including, the next >). Using the ? as in non_greedy_regex, it will match the '<' only.
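Here is a small sketch of just the tag part of those patterns, run against a<b>c</b>d. The last two lines use <[^>]*> with a closing >, which is my own assumption about what is needed to drop whole tags before counting characters; it is a simplistic approach and will not cope with a > inside an attribute value.

```javascript
const sample = "a<b>c</b>d";

// Greedy: [^>]* takes everything up to, but not including, the next ">".
const greedyOpen = sample.match(/<[^>]*/)[0];

// Non-greedy: [^>]*? takes as little as possible, i.e. nothing after "<".
const lazyOpen = sample.match(/<[^>]*?/)[0];

// Assumption: including the closing ">" matches complete tags.
const wholeTags = sample.match(/<[^>]*>/g);
const textOnly = sample.replace(/<[^>]*>/g, "");

console.log(greedyOpen); // "<b"
console.log(lazyOpen);   // "<"
console.log(wholeTags);  // [ '<b>', '</b>' ]
console.log(textOnly);   // "acd"
```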

match a string not after another string

This
var re = /[^<a]b/;
var str = "<a>b";
console.log(str.match(re)[0]);
matches >b.
However, I don't understand why this pattern /[^<a>]b/ doesn't match anything. I want to capture only the "b".
The reason why /[^<a>]b/ doesn't match anything is that the class excludes <, a, and > as individual characters, so rewriting it as /[^><a]b/ would do the same thing. I doubt this is what you want, though. Try the following:
var re = /<a>(b)/;
var str = "<a>b";
console.log(str.match(re)[1]);
This regex looks for a string that looks like <a>b first, but it captures the b with the parentheses. To access the b, simply use [1] when you call .match instead of [0], which would return the entire string (<a>b).
What you're using here is a match for a b preceded by any character that is not listed in the group. The syntax is [^a-z+-], where a-z+- is a set of characters (in this case, the range of the lowercase Latin letters, a plus sign and a minus sign). So, what your regex pattern matches is any b preceded by a character that is NOT < or a. Since > is not in that set, the pattern matches >b.
The range selector basically works the same as a list of characters that are separated by OR pipes: [abcd] matches the same as (a|b|c|d). Range selectors just have the extra functionality of also matching that same string via [a-d], using a dash in between character ranges. Putting a ^ at the start of a range automatically turns this positive range selector into a negative one, so it will match anything BUT the characters in that range.
What you are looking for is a negative lookahead. Those can exclude something from matching longer strings. Those work in this format: (?!do not match) where do not match uses the normal regex syntax. In this case, you want to test if the preceding string does not match <a>, so just use:
(?!<a>)(.{3}|^.{0,2})b
That will match the b when it is either preceded by three characters that are not <a>, or by fewer characters that are at the start of the line.
PS: what you are probably looking for is the "negative lookbehind", which sadly wasn't available in JavaScript regular expressions before ES2018. It is written (?<!<a>)b, the same as in other languages. On JavaScript engines without lookbehind support, you'll have to use the alternative regex above.
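On engines that implement ES2018 (current browsers and recent Node.js versions), the lookbehind form can be used directly. A minimal sketch, with the variable name chosen only for this example:

```javascript
// Match a "b" that is NOT immediately preceded by the literal text "<a>".
const notAfterTag = /(?<!<a>)b/;

console.log("<a>b".match(notAfterTag));     // null: the b follows <a>
console.log("xyzb".match(notAfterTag)[0]);  // "b"
console.log("b".match(notAfterTag)[0]);     // "b"
```

Unlike the lookahead workaround, the match here is only the b itself, with no preceding characters included.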
You could write a pattern to match the anchor tags and then replace them with an empty string (note that the / in </a> must be escaped inside a regex literal):
var str = "<a>b</a>";
str = str.replace(/((<a[\w\s=\[\]\'\"\-]*>)|<\/a>)/gi,'')
This will reduce each of the following strings to 'b':
<a>b</a>
<a class='link-l3'>b</a>
To get more familiar with regex patterns, you may find this website very useful: regExPal
Your code :
var re = /[^<a>]b/;
var str = "<a>b";
console.log(str.match(re));
Why is [^<a>]b not matching anything?
[^<a>]b means any character except <, a, or >, followed by b.
Here b is preceded by >, so it will not match.
If you want to match b, you need to write it like this:
var re = /(?:[\<a\>])(b)/;
var str = "<a>b";
console.log(str.match(re)[1]);
DEMO And EXPLANATION

Regex from character until end of string

Hey. First question here, probably extremely lame, but I totally suck in regular expressions :(
I want to extract the text from a series of strings that always have only alphabetic characters before and after a hyphen:
string = "some-text"
I need to generate separate strings that include the text before AND after the hyphen. So for the example above I would need string1 = "some" and string2 = "text"
I found this and it works for the text before the hyphen, now I only need the regex for the one after the hyphen.
Thanks.
You don't need regex for that, you can just split it instead.
var myString = "some-text";
var splitWords = myString.split("-");
splitWords[0] would then be "some", and splitWords[1] will be "text".
If you actually have to use regex for whatever reason, though - the $ character marks the end of a string in regex, so -(.*)$ is a regex that will match everything after the first hyphen it finds till the end of the string. That could actually be simplified to just -(.*) too, as the .* will match till the end of the string anyway.
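If you do go the regex route, one pattern can capture both halves at once. This is only a sketch; the helper name splitAtFirstHyphen and the pattern ^([^-]*)-(.*)$ are my own, not something from the question.

```javascript
// Hypothetical helper: returns [before, after] or null if there is no hyphen.
function splitAtFirstHyphen(s) {
  // [^-]* stops at the first hyphen; (.*) takes everything after it.
  const m = s.match(/^([^-]*)-(.*)$/);
  return m ? [m[1], m[2]] : null;
}

console.log(splitAtFirstHyphen("some-text")); // [ 'some', 'text' ]
console.log(splitAtFirstHyphen("a-b-c"));     // [ 'a', 'b-c' ]
console.log(splitAtFirstHyphen("nohyphen"));  // null
```

Note the difference from split("-"): with more than one hyphen, split returns every piece, while this keeps everything after the first hyphen together.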
