Regex delimit the start of a string and the end - javascript

I've been having trouble with regex, which I don't understand at all.
I have a string '#anything#that#i#say' and want the regex to detect one word per #, so the result will be [#anything, #that, #i, #say].
It needs to work with spaces too :(
The closest I came is [#\w]+, but this only gets 1 word and I want them separated.

You're close; [#\w] will match anything that is either a # or a word character. But what you want is to match a single # followed by one or more word characters, like this: #\w+ (without the brackets).
var str = "#anything#that#i#say";
var regexp = /#\w+/gi;
console.log(str.match(regexp));
It's possible to have this deal with spaces as well, but I'd need to see an example of what you mean to tell you how; there are lots of ways that "need to work with spaces" can be interpreted, and I'd rather not guess.

Use the expression /#\s*(\w+)/g
\s* : allows zero or more whitespace characters between # and the word
This will match 4 words in your string '#anything#that#i#say',
even if your string contains spaces, as in '#anything# that#i# say'.
sample to check: http://www.regextester.com/?fam=97638
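A minimal sketch of how the capture group in the expression above could be used, assuming the goal is to collect the word after each # whether or not whitespace follows the #. The helper name extractTags is made up for illustration.

```javascript
// Hypothetical helper: collects the word following each '#',
// tolerating optional whitespace between '#' and the word.
function extractTags(text) {
  const pattern = /#\s*(\w+)/g;
  // matchAll yields one match per '#'; index 1 holds the captured word.
  return Array.from(text.matchAll(pattern), (m) => "#" + m[1]);
}

console.log(extractTags("#anything#that#i#say"));
console.log(extractTags("#anything# that#i# say"));
```

Both calls should produce the same four tags, because the whitespace sits outside the capture group.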

Related

Replace a phrase in a string that is being broken up into 2 separate lines [duplicate]

Is there a simple way to ignore the white space in a target string when searching for matches using a regular expression pattern? For example, if my search is for "cats", I would want "c ats" or "ca ts" to match. I can't strip out the whitespace beforehand because I need to find the begin and end index of the match (including any whitespace) in order to highlight that match and any whitespace needs to be there for formatting purposes.
You can stick optional whitespace characters \s* in between every other character in your regex. Although granted, it will get a bit lengthy.
/cats/ -> /c\s*a\s*t\s*s/
While the accepted answer is technically correct, a more practical approach, if possible, is to just strip whitespace out of both the regular expression and the search string.
If you want to search for "my cats", instead of:
myString.match(/m\s*y\s*c\s*a\s*t\s*s\s*/g)
Just do:
myString.replace(/\s+/g,"").match(/mycats/g)
Warning: You can't automate this on the regular expression by just replacing all spaces with empty strings because they may occur in a negation or otherwise make your regular expression invalid.
Addressing Steven's comment to Sam Dufel's answer
Thanks, sounds like that's the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done?
This should do the trick:
/c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/
See this page for all the different variations of 'cats' that this matches.
You can also solve this using conditionals, but they are not supported in the javascript flavor of regex.
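As a quick check of the newline-only pattern above, the following sketch runs it against a few sample strings. The sample strings and variable names are chosen here for illustration only.

```javascript
// Optional "newline plus any following whitespace" between letters.
const newlinePattern = /c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/;
const newlineSamples = ["cats", "c\n ats", "ca\n\t ts", "c ats"];

for (const sample of newlineSamples) {
  // JSON.stringify makes the newline and tab characters visible.
  console.log(JSON.stringify(sample), newlinePattern.test(sample));
}
```

Only the last sample, which has a space without a preceding newline, should fail to match.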
You could put \s* in between every character in your search string, so if you were looking for cats you would use c\s*a\s*t\s*s
It's long but you could build the string dynamically of course.
You can see it working here: http://www.rubular.com/r/zzWwvppSpE
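Building the pattern dynamically might look roughly like the sketch below. It assumes the search term should be treated as literal text, so regex metacharacters are escaped first; buildSpacedRegex is a hypothetical name, not an existing API.

```javascript
// Hypothetical helper: builds a regex that matches `word` with optional
// whitespace allowed between any two of its characters.
function buildSpacedRegex(word) {
  // Escape regex metacharacters so the search term is matched literally.
  const escapeChar = (ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Drop whitespace from the term itself, then join the characters with \s*.
  const body = Array.from(word.replace(/\s+/g, ""), escapeChar).join("\\s*");
  return new RegExp(body, "g");
}

const haystack = "I like c ats and ca ts.";
for (const m of haystack.matchAll(buildSpacedRegex("cats"))) {
  // m.index is the start of the match in the original, unmodified text,
  // so the whitespace is preserved for highlighting.
  console.log(JSON.stringify(m[0]), m.index, m.index + m[0].length);
}
```

Because the search runs on the original string, the start and end indices can be used directly for highlighting.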
If you only want to allow spaces, then
\bc *a *t *s\b
should do it. To also allow tabs, use
\bc[ \t]*a[ \t]*t[ \t]*s\b
Remove the \b anchors if you also want to find cats within words like bobcats or catsup.
This approach can be automated
(the following example solution is in Python, although it can be ported to any language):
you can strip the whitespace beforehand AND save the positions of the non-whitespace characters, so that you can use them later to find the boundary positions of the matched string in the original string, like this:
import re
def regex_search_ignore_space(regex, string):
    no_spaces = ''
    char_positions = []
    for pos, char in enumerate(string):
        if re.match(r'\S', char):  # upper \S matches non-whitespace chars
            no_spaces += char
            char_positions.append(pos)
    match = re.search(regex, no_spaces)
    if not match or not match.group(0):
        return None  # no match, or an empty match with nothing to map back
    # match.start() and match.end() are indices of start and end
    # of the found string in the spaceless string
    # (as we have searched in it).
    start = char_positions[match.start()]  # in the original string
    # match.end() is exclusive, so map the last matched character and add 1;
    # this avoids an IndexError when the match reaches the end of the string
    # and keeps trailing whitespace out of the result.
    end = char_positions[match.end() - 1] + 1  # in the original string
    matched_string = string[start:end]
    # see: the match WITH spaces is returned.
    return matched_string
with_spaces = 'a li on and a cat'
print(regex_search_ignore_space('lion', with_spaces))
# prints 'li on'
If you want to go further you can construct the match object and return it instead, so the use of this helper will be more handy.
And the performance of this function can of course also be optimized, this example is just to show the path to a solution.
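Since the original question is about JavaScript and needs the start and end indices for highlighting, the same idea might be ported along these lines. This is only a sketch: searchIgnoringWhitespace is a hypothetical name, and it assumes a non-global regex is passed in.

```javascript
// Hypothetical helper: searches for `regex` in `text` as if whitespace were
// absent and returns the match boundaries in the ORIGINAL text, or null.
function searchIgnoringWhitespace(regex, text) {
  let stripped = "";
  const positions = []; // positions[i] = index in `text` of stripped[i]
  for (let i = 0; i < text.length; i++) {
    if (!/\s/.test(text[i])) {
      stripped += text[i];
      positions.push(i);
    }
  }
  const match = regex.exec(stripped);
  if (match === null || match[0].length === 0) return null;
  const start = positions[match.index];
  // Map the LAST matched character, then add 1 for an exclusive end index.
  const end = positions[match.index + match[0].length - 1] + 1;
  return { start, end, text: text.slice(start, end) };
}

console.log(searchIgnoringWhitespace(/lion/, "a li on and a cat"));
```

Returning an object with start and end keeps the whitespace inside the match available to the caller without a second search.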
The accepted answer will not work if and when you are passing a dynamic value (such as "current value" in an array loop) as the regex test value. You would not be able to input the optional white spaces without getting some really ugly regex.
Konrad Hoffner's solution is therefore better in such cases, as it strips whitespace from both the regex and the test string. The test is conducted as though neither contains any whitespace.

Javascript regex for tag in comments

I've been working on a web app in which users can comment and reply to comments, using a tagging system. Users are tagged by their name, which can contain several words, so I've decided to mark the tags like this:
&&John Doe&&
So a comment might look like this:
&&John Doe&&, are you sure that &&Alice Johnson&& is gone?
I'm trying to write a regex to use in JavaScript's string.replace() function, so the regex must match every single tag in the string.
So far I have:
^&&.+{2, 64}&&$
This isn't working, so I'm sure something is wrong. In case you didn't understand what I meant, the regex above is supposed to match strings like this:
&&anythingbetween2and64charslong&&.
Thanks in advance!
(.*?)&& means "everything until &&" :
var before = document.getElementById("before");
var after = document.getElementById("after");
var re = /&&(.*?)&&/g, b = "<b>$1</b>";
after.innerHTML = before.textContent.replace(re, b);
<p id="before">&&John Doe&&, are you sure that &&Alice Johnson&& is gone?</p>
<p id="after"></p>
Try &{2}.{2,64}&{2}
If you want to get the text in between, add parentheses for a capture group:
&{2}(.{2,64})&{2}
Right now you are only matching strings where the entire line matches:
the ^ character means beginning of line
the $ character means end of line
\A means beginning of entire string
\Z means end of entire string
(\A and \Z exist in other regex flavors; JavaScript does not support them, and without the m flag its ^ and $ already anchor to the whole string.)
Here's what you need:
str.match(/&&.{2,64}?&&/g)
You need to remove ^ and $ from the start and the end, since they match the start and the end of the string.
Add the g flag at the end so that all matches are returned.
The ? after the {} makes the match non-greedy, so it will match the shortest possible string between "&&" instead of the longest (it will give you "&&John Doe&&" instead of "&&John Doe&&, are you sure that &&Alice Johnson&&").
Read up on greediness: Repetition with Star and Plus
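The difference between the greedy and non-greedy quantifier can be seen by running both variants on the comment from the question. The variable names below are chosen for illustration.

```javascript
const comment = "&&John Doe&&, are you sure that &&Alice Johnson&& is gone?";

// Greedy: .{2,64} grabs as much as it can, so both tags merge into one match.
const greedy = comment.match(/&&.{2,64}&&/g);

// Lazy: .{2,64}? stops at the first closing &&, giving one match per tag.
const lazy = comment.match(/&&.{2,64}?&&/g);

console.log(greedy);
console.log(lazy);
```

The greedy version should return a single long match spanning both tags, while the lazy version should return the two tags separately.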
This regex will match names made of Unicode letters and digits between && signs:
str.match(/&&[\p{L}\p{N}]+(?:\s+[\p{L}\p{N}]+)*&&/gu);
Here,
\p{L} --> Any Unicode letter, so the names can be in any language
\p{N} --> Any Unicode numeric character
[\p{L}\p{N}]+ --> A word constructed from Unicode letters or digits
\s+ --> Gap between words, one or more whitespace characters
[\p{L}\p{N}]+(?:\s+[\p{L}\p{N}]+)* --> All word groups
Note that in JavaScript \p{...} only works with the u flag, and with that flag & must not be escaped.
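A short sketch of the Unicode-aware pattern in use. The sample names are invented for illustration, and the pattern is written the way JavaScript needs it: with the u flag and without backslashes before &.

```javascript
// \p{L} and \p{N} are only recognized when the u (unicode) flag is set.
// With the u flag, escaping '&' as '\&' is a syntax error, so it is left bare.
const tagPattern = /&&[\p{L}\p{N}]+(?:\s+[\p{L}\p{N}]+)*&&/gu;

const unicodeComment = "&&Łukasz Żółć&& replied to &&李 小龍&& and &&John Doe&&";
const unicodeTags = unicodeComment.match(tagPattern);
console.log(unicodeTags);

// A tag containing punctuation is not matched by this pattern.
console.log("&&John-Doe&&".match(tagPattern));
```

Unlike the .{2,64}? approach, this pattern restricts what may appear inside a tag, so names with hyphens or apostrophes would need the character class extended.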

Match word ending with either one of two special characters in the string

I'm trying to check if word. or word: is part of the string.
if (/word\b/.test(str) )
This is the best solution I came up with, but I'd like to match only for word. or word:.
I was trying something in the following lines, but can't get it to work:
if (/word\/(.|:)/i.test(str)
How to go about this?
You may leverage a character class [.:] to match either a dot or a colon, and then add a non-word boundary \B to make sure there is a non-word char after the dot/colon, or the end of string:
if (/word[.:]\B/i.test(str))
As an alternative, you may require a whitespace or the end of string after . or ::
if (/word[.:](?=\s|$)/i.test(str))
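The two alternatives behave differently when punctuation follows the dot or colon. The sketch below compares them on a few sample strings chosen for illustration.

```javascript
const nonWordBoundary = /word[.:]\B/i;
const whitespaceOrEnd = /word[.:](?=\s|$)/i;

const wordSamples = ["a word.", "word: next", "word.x", "word.)", "word"];
for (const sample of wordSamples) {
  console.log(
    JSON.stringify(sample),
    nonWordBoundary.test(sample),
    whitespaceOrEnd.test(sample)
  );
}
```

The sample "word.)" is where they diverge: \B accepts any non-word character after the dot, while the lookahead insists on whitespace or the end of the string.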

Regex get all text from # to quotation

Okay so I currently have:
/(#([\"]))/g;
I want to be able to check for a string like:
#23ad23"
What's wrong with my regex?
Your regex (/(#([\"]))/g) breaks down like this:
Without the start/end delimiters, the flags, and the capturing parentheses:
#[\"]
which just means #, followed by ", but the square brackets for the class are unnecessary, as there is only one item, so equivalent to...
#"
I think you want to match all characters between # and " inclusive (and captured exclusively).
Start with regex like this:
#.+?"
Which means # followed by anything (.) one or more times (+) un-greedily (?) followed by ".
So, with the capturing parentheses and the delimiters:
/(#(.+?)")/g
Is this how you mean?
/(#([^\"]+))/g;
This will include everything until it reaches the " char.
For minimum match count (bigger-length matches): #(.+)\"
For maximum match count (smaller-length matches): #(.+?)\"
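To make the trade-off concrete, here is a sketch that runs the greedy, lazy, and negated-class variants over a string containing two such tokens. The input string and variable names are made up for illustration.

```javascript
const hashInput = '#23ad23" and #ff00aa"';

// Greedy: one long match from the first # to the LAST quote.
const greedyHash = hashInput.match(/#(.+)"/g);

// Lazy: one match per token, each ending at the nearest quote.
const lazyHash = hashInput.match(/#(.+?)"/g);

// Negated class: stops before the quote, so the quote is not part of the match.
const negatedHash = hashInput.match(/#([^"]+)/g);

// Capture group 1 of the lazy version holds the text between # and ".
const captured = Array.from(hashInput.matchAll(/#(.+?)"/g), (m) => m[1]);

console.log(greedyHash, lazyHash, negatedHash, captured);
```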

Regex for number surrounded by slashes

Like the title says, I have a (faulty) Regex in JavaScript, that should check for a "2" character (in this case) surrounded by slashes. So if the URL was http://localhost/page/2/ the Regex would pass.
In my case I have something like http://localhost/?page=2 and the Regex still passes.
I'm not sure why. Could anyone tell me what's wrong with it?
/^(.*?)\b2\b(.*?$)/
(I'm going to tell you, I didn't write this code and I have no idea how it works, cause I'm really bad with Regex)
Seems too simple but shouldn't this work?:
/\/2\//
http://jsfiddle.net/QHac8/1/
Since it's JavaScript, you have to escape the forward slashes, as they are the delimiters of a regex literal.
or if you want to match any number:
/\/\d+\//
You don't check for a digit surrounded by slashes. The slashes you see are only your regex delimiters. You check for a 2 with a word boundary \b on each side. This is true for /2/ but also for =2.
If you want to allow only a 2 surrounded by slashes, try this:
/^(.*?)\/2\/(.*?)$/
^ means match from the start of the string
$ match till the end of the string
(.*?) those parts are matching everything before and after your 2 and those parts are stored in capturing groups.
If you don't need those parts, then Richard D is right and the regex /\/2\// is fine for you.
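The behaviour described above can be checked with a short sketch that runs the original word-boundary regex and the escaped-slash regex against a few URLs. The third URL is an extra case added here for illustration.

```javascript
const wordBoundaryRegex = /^(.*?)\b2\b(.*?$)/;
const slashRegex = /\/2\//;

const urls = [
  "http://localhost/page/2/",
  "http://localhost/?page=2",
  "http://localhost/page/12/",
];

for (const url of urls) {
  // \b only requires a non-word character (or string edge) next to the 2,
  // so "=2" at the end of the string passes the first regex as well.
  console.log(url, wordBoundaryRegex.test(url), slashRegex.test(url));
}
```

Only the first URL should pass the escaped-slash regex, whereas the word-boundary regex should also accept the query-string form.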
