Javascript Regex match any word that starts with '#' in a string - javascript

I'm very new to regex. I'm trying to match any word that starts with '#' in a string that contains no newlines (the content was already split at newlines).
Example (not working):
var string = "#iPhone should be able to compl#te and #delete items"
var matches = string.match(/(?=[\s*#])\w+/g)
// Want matches to contain [ 'iPhone', 'delete' ]
I am trying to match any instance of '#', and grab the thing right after it, so long as there is at least one letter, number, or symbol following it. A space or a newline should end the match. The '#' should either start the string or be preceded by spaces.
This PHP solution seems good, but it uses a lookbehind, and I don't know whether JS regex supports that:
regexp keep/match any word that starts with a certain character

var re = /(?:^|\W)#(\w+)(?!\w)/g, match, matches = [];
while (match = re.exec(s)) {
  matches.push(match[1]);
}
Check this demo.
let s = "#hallo, this is a test #john #doe",
    re = /(?:^|\W)#(\w+)(?!\w)/g,
    match, matches = [];
while (match = re.exec(s)) {
  matches.push(match[1]);
}
console.log(matches);

Try this:
var matches = string.match(/#\w+/g);
let string = "#iPhone should be able to compl#te and #delete items",
matches = string.match(/#\w+/g);
console.log(matches);

You actually need to match the hash too. Right now you're looking for word characters that follow a position that is immediately followed by one of several characters that aren't word characters. This fails, for obvious reasons. Try this instead:
string.match(/(?=[\s*#])[\s*#]\w+/g)
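Of course, the lookahead is redundant now, so you might as well remove it. Requiring the start of the string or whitespace before the hash, and stripping that prefix from each match afterwards, gives: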
string.match(/(^|\s)#(\w+)/g).map(function(v){return v.trim().substring(1);})
This returns the desired: [ 'iPhone', 'delete' ]
Here is a demonstration: http://jsfiddle.net/w3cCU/1/
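On the lookbehind point raised in the question: engines that implement ES2018 do support lookbehind assertions, so the "start of string or preceded by whitespace" rule can be written without consuming the preceding character. The following is only a sketch under that assumption; the helper name extractHashtags is made up for illustration, and it keeps the \w+ definition of a "word" used in the answers above.

```javascript
// Hypothetical helper: the name extractHashtags is illustrative, not from the answers above.
function extractHashtags(text) {
  // (?<!\S) is a negative lookbehind: the '#' must not be preceded by a
  // non-whitespace character, i.e. it is at the start of the string or after whitespace.
  const re = /(?<!\S)#(\w+)/g;
  // matchAll (ES2020) yields one match object per hit; index 1 is the captured word.
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(extractHashtags("#iPhone should be able to compl#te and #delete items"));
// → [ 'iPhone', 'delete' ]
```

Because the lookbehind consumes nothing, there is no leading whitespace to trim from the results.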

Related

Remove hashtag symbol js, by regex

I tried searching the forum but could not find anything precisely similar to what I need. I'm basically trying to remove the # symbol from the results that I'm receiving. Here is a dummy example of the regex.
let postText = 'this is a #test of #hashtags';
var regexp = new RegExp('#([^\\s])', 'g');
postText = postText.replace(regexp, '');
console.log(postText);
It gives the following result
this is a est of ashtags
What do I need to change so that it removes just the # symbols without cutting off the first letter of each word?
You need a backreference $1 as the replacement:
let postText = 'this is a #test of #hashtags';
var regexp = /#(\S)/g;
postText = postText.replace(regexp, '$1');
console.log(postText);
// Alternative with a lookahead:
console.log('this is a #test of #hashtags'.replace(/#(?=\S)/g, ''));
Note I suggest replacing the constructor notation with a regex literal notation to make the regex a bit more readable, and changing [^\s] with a shorter \S (any non-whitespace char).
Here, /#(\S)/g matches multiple occurrences (due to g modifier) of # and any non-whitespace char right after it (while capturing it into Group 1) and String#replace will replace the found match with that latter char.
Alternatively, to avoid using backreferences (also called placeholders) you may use a lookahead, as in .replace(/#(?=\S)/g, ''), where (?=\S) requires a non-whitespace char immediately to the right of the current location. If you need to remove # at the end of the string, too, replace (?=\S) with (?!\s) that will fail the match if the next char is a whitespace.
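To make the difference between the two lookaheads concrete, here is a small sketch. The sample string with a trailing # is an assumption added for illustration; it is not from the question.

```javascript
// Sample input with a trailing '#' (illustrative, not from the question).
const sample = 'this is a #test of #hashtags #';

// (?=\S) requires a non-whitespace char after '#', so the trailing '#' stays.
const keepTrailing = sample.replace(/#(?=\S)/g, '');

// (?!\s) only forbids whitespace after '#', so '#' at the end of the string is removed too.
const dropTrailing = sample.replace(/#(?!\s)/g, '');

console.log(JSON.stringify(keepTrailing)); // "this is a test of hashtags #"
console.log(JSON.stringify(dropTrailing)); // "this is a test of hashtags "
```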
It is probably easier to write your own function, which might look like this (it also covers the case where the symbol is repeated):
function replaceSymbol(symbol, string) {
  if (string.indexOf(symbol) < 0) {
    return string;
  }
  while (string.indexOf(symbol) > -1) {
    string = string.replace(symbol, '');
  }
  return string;
}
var a = replaceSymbol('#', '##s##u#c###c#e###ss is he#re'); // 'success is here'
You might be able to use the following:
let postText = 'this is a #test of #hashtags';
postText = postText.replace(/#\b/g, '');
It relies on the fact that a #hashtag contains a word-boundary between the # and the word that follows it. By matching that word-boundary with \b, we make sure not to match single #.
However, it might match a bit more than you would expect, because the definition of 'word character' in regex isn't obvious: it includes digits (so #123 would be matched) and, more confusingly, the _ character (so #___ would be matched).
I don't know if there's an authoritative source defining whether those are acceptable hashtags or not, so I'll let you judge whether this suits your needs.
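A quick way to see which cases /#\b/g touches is to run it on a string that mixes them. The input below is made up for illustration.

```javascript
// Illustrative input: a normal tag, a lone '#', digits, underscores, and '#' before a hyphen.
const mixed = '#tag # #123 #___ #-x';

// '#' is removed only where a word character (letter, digit or '_') follows it,
// because only then is there a word boundary right after the '#'.
const stripped = mixed.replace(/#\b/g, '');

console.log(stripped); // tag # 123 ___ #-x
```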
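You only need to match the #. The parenthesized part of your pattern matches the character after the #, which is why that character gets removed as well.
postText = postText.replace(/#/g, '');
This replaces every #. Note that passing the string '#' instead of a regex with the g flag would replace only the first occurrence.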
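For reference, String#replace with a string pattern replaces only the first occurrence; a regex with the g flag (or replaceAll, in engines that support ES2021) is needed to replace every one. A minimal comparison, reusing the sample text from the question:

```javascript
const text = 'this is a #test of #hashtags';

// A string pattern replaces only the first '#'.
const firstOnly = text.replace('#', '');

// A regex with the g flag replaces every '#'.
const viaRegex = text.replace(/#/g, '');

// replaceAll with a string pattern also replaces every '#' (ES2021).
const viaReplaceAll = text.replaceAll('#', '');

console.log(firstOnly);     // this is a test of #hashtags
console.log(viaRegex);      // this is a test of hashtags
console.log(viaReplaceAll); // this is a test of hashtags
```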

regex match not outputting the adjacent matches javascript

I was experimenting with regex in JavaScript and came across an issue. Consider the string str = "+d+a+". I was trying to output the characters in the string that are surrounded by +, so I used str.match(/\+[a-z]\+/ig). I expected ["+d+","+a+"], but I got just ["+d+"]; "+a+" does not show up in the output. Why?
.match(/.../g) returns all non-overlapping matches. Your regex requires a + sign on each side. Given your target string:
+d+a+
^^^
  ^^^
Your matches would have to overlap in the middle in order to return "+a+".
You can use look-ahead and a manual loop to find overlapping matches:
var str = "+d+a+";
var re = /(?=(\+[a-z]\+))/g;
var matches = [], m;
while (m = re.exec(str)) {
  matches.push(m[1]);
  // the match is zero-width, so advance manually to avoid looping forever
  re.lastIndex++;
}
console.log(matches);
With regex, once a character has been consumed by a match, it is not available to the next match.
For example, a regex like /aba/g wouldn't find 2 aba's in a string like "ababa", because the second "a" was already consumed by the first match.
However, that can be overcome by using a positive lookahead (?=...), because lookaheads just check what's ahead without actually consuming it.
So a regex like /(ab)(?=(a))/g would return 2 capture groups with 'ab' and 'a' for each 'aba'.
But in this case the match just needs to be followed by 1 fixed character '+', so it can be simplified: you don't really need capture groups for this one.
Example snippet:
var str = "+a+b+c+";
var matches = str.match(/\+[a-z]+(?=\+)/g).map(function(m){return m + '+'});
console.log(matches);
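The capturing-lookahead idea can also be combined with String#matchAll (ES2020), which advances past zero-width matches on its own, so no manual lastIndex handling is needed. This is a sketch only; the helper name overlappingMatches and its string-source parameter are assumptions made for illustration.

```javascript
// Hypothetical helper: wraps a pattern source in a capturing lookahead so that
// every match is zero-width and overlapping occurrences are all reported.
function overlappingMatches(text, source) {
  const re = new RegExp('(?=(' + source + '))', 'g');
  // Group 1 holds the text the lookahead saw at each position.
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(overlappingMatches('+d+a+', '\\+[a-z]\\+')); // [ '+d+', '+a+' ]
console.log(overlappingMatches('ababa', 'aba'));         // [ 'aba', 'aba' ]
```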

Regex to match all words except those in parentheses - javascript

I'm using the following regex to match all words:
mystr.replace(/([^\W_]+[^\s-]*) */g, function (match, p1, index, title) {...});
Note that words can contain special characters like German Umlauts.
How can I match all words excluding those inside parentheses?
If I have the following string:
here wäre c'è (don't match this one) match this
I would like to get the following output:
here
wäre
c'è
match
this
The trailing spaces don't really matter.
Is there an easy way to achieve this with regex in javascript?
EDIT:
I cannot remove the text in parentheses, as the final string "mystr" should also contain this text, whereas string operations will be performed on text that matches. The final string contained in "mystr" could look like this:
Here Wäre C'è (don't match this one) Match This
Try this:
var str = "here wäre c'è (don't match this one) match this";
str.replace(/\([^\)]*\)/g, '') // remove text inside parens (& parens)
.match(/(\S+)/g); // match remaining text
// ["here", "wäre", "c'è", "match", "this"]
Thomas, resurrecting this question because it had a simple solution that wasn't mentioned and that doesn't require replacing then matching (one step instead of two steps). (Found your question while doing some research for a general question about how to exclude patterns in regex.)
Here's our simple regex (see it at work on regex101, looking at the Group captures in the bottom right panel):
\(.*?\)|([^\W_]+[^\s-]*)
The left side of the alternation matches complete (parenthesized phrases). We will ignore these matches. The right side matches and captures words to Group 1, and we know they are the right words because they were not matched by the expression on the left.
This program shows how to use the regex (see the matches in the online demo):
<script>
var subject = 'here wäre c\'è (don\'t match this one) match this';
var regex = /\(.*?\)|([^\W_]+[^\s-]*)/g;
var group1Caps = [];
var match = regex.exec(subject);
// put Group 1 captures in an array
while (match != null) {
  // Group 1 is set only when the right side of the alternation matched
  if (match[1] != null) group1Caps.push(match[1]);
  match = regex.exec(subject);
}
document.write("<br>*** Matches ***<br>");
if (group1Caps.length > 0) {
  // declare the loop variable to avoid creating an implicit global
  for (var key in group1Caps) document.write(group1Caps[key], "<br>");
}
</script>
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
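Since the question ultimately wants to transform the matched words while leaving the parenthesized text in place, the same alternation can be used directly in String#replace with a callback. The sketch below assumes the transformation is "capitalize the first letter", inferred from the desired final string in the question; the function name capitalizeOutsideParens is made up for illustration.

```javascript
// Hypothetical helper: capitalizes words outside parentheses and leaves
// parenthesized phrases untouched.
function capitalizeOutsideParens(text) {
  return text.replace(/\(.*?\)|([^\W_]+[^\s-]*)/g, function (match, word) {
    // word is undefined when the left side (a parenthesized phrase) matched:
    // return the match unchanged in that case.
    if (word === undefined) return match;
    return word.charAt(0).toUpperCase() + word.slice(1);
  });
}

console.log(capitalizeOutsideParens("here wäre c'è (don't match this one) match this"));
// → Here Wäre C'è (don't match this one) Match This
```

One caveat: without the u flag, \W treats letters such as ä as non-word characters, so a word that starts with such a letter would not be matched from its first character by this pattern.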

Regex produces different result in javascript

Compared to the online regex tester at http://www.gskinner.com/RegExr/, why does this regex return an entirely different result in JavaScript?
var patt = new RegExp(/\D([0-9]*)/g);
"/144444455".match(patt);
The return in the console is:
["/144444455"]
The regexr tester, however, does return the correct group.
All I'm trying to do is extract the first amount inside a piece of text, regardless of whether that text starts with a "/" or contains a bunch of other useless information.
The regex does exactly what you tell it to:
\D matches a non-digit (in this case /)
[0-9]* matches a string of digits (144444455)
You will need to access the content of the first capturing group:
var match = patt.exec(subject);
if (match != null) {
  result = match[1];
}
Or simply drop the \D entirely - I'm not sure why you think you need it in the first place...
Then, you should probably remove the /g modifier if you only want to match the first number, not all numbers in your text. So,
result = subject.match(/\d+/);
should work just as well.
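The behaviour described above can be checked side by side. With the g flag, String#match returns only the full matches; without it, the result also exposes the capture groups. The helper name firstNumber and the second sample string are assumptions added for illustration.

```javascript
const input = "/144444455";

// With g: an array of full matches only, capture groups are not included.
const withGlobal = input.match(/\D([0-9]*)/g);

// Without g: index 0 is the full match, index 1 is the first capture group.
const withoutGlobal = input.match(/\D([0-9]*)/);

// Hypothetical helper: returns the first run of digits, or null if there is none.
function firstNumber(text) {
  const m = text.match(/\d+/);
  return m === null ? null : m[0];
}

console.log(withGlobal);                      // [ '/144444455' ]
console.log(withoutGlobal[1]);                // 144444455
console.log(firstNumber(input));              // 144444455
console.log(firstNumber("price: 42 or 17"));  // 42
```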

Javascript regular expression matching prior and trailing characters

I have this string in an object:
<FLD>dsfgsdfgdsfg;NEW-7db5-32a8-c907-82cd82206788</FLD><FLD>dsfgsdfgsd;NEW-480e-e87c-75dc-d70cd731c664</FLD><FLD>dfsgsdfgdfsgfd;NEW-0aad-440a-629c-3e8f7eda4632</FLD>
this.model.get('value_long').match(/[<FLD>\w+;](NEW[-|\d|\w]+)[</FLD>]/g)
Returns:
[";NEW-7db5-32a8-c907-82cd82206788<", ";NEW-480e-e87c-75dc-d70cd731c664<", ";NEW-0aad-440a-629c-3e8f7eda4632<"]
What is wrong with my regular expression that it is picking up the preceding ; and trailing < ?
Here is a link to the regex:
http://regexr.com?30k3m
Updated:
this is what I would like returned:
["NEW-7db5-32a8-c907-82cd82206788", "NEW-480e-e87c-75dc-d70cd731c664", "NEW-0aad-440a-629c-3e8f7eda4632"]
here is a JSfiddle for it
http://jsfiddle.net/mwagner72/HHMLK/
Square brackets create a character class, which you do not want here. Try changing your regex to the following:
<FLD>\w+;(NEW[-\d\w]+)</FLD>
Since it looks like you want to grab the capture group from each match, you can use the following code to construct an array with the capture group in it:
var regex = /<FLD>\w+;(NEW[\-\d\w]+)<\/FLD>/g;
var match = regex.exec(string);
var matches = [];
while (match !== null) {
  matches.push(match[1]);
  match = regex.exec(string);
}
[<FLD>\w+;] matches just one of the characters inside the square brackets, whereas I think you actually want to match all of them in sequence. In the other character class, [-|\d|\w], you can remove the |: alternation is already implied in a character class, where | is just a literal character. Use | only for alternation outside a character class.
Here is an updated link with the new regex: http://jsfiddle.net/RTkzx/1
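As a compact alternative to the exec loop, the same capture groups can be collected with String#matchAll (ES2020). This is a sketch that uses the first two fields of the sample data from the question; the variable names are illustrative.

```javascript
// First two fields of the sample string from the question.
const fields =
  '<FLD>dsfgsdfgdsfg;NEW-7db5-32a8-c907-82cd82206788</FLD>' +
  '<FLD>dsfgsdfgsd;NEW-480e-e87c-75dc-d70cd731c664</FLD>';

// Literal <FLD>, word chars, ';', then capture 'NEW' plus hyphens and word chars.
const ids = Array.from(
  fields.matchAll(/<FLD>\w+;(NEW[-\w]+)<\/FLD>/g),
  (m) => m[1]
);

console.log(ids);
// → [ 'NEW-7db5-32a8-c907-82cd82206788', 'NEW-480e-e87c-75dc-d70cd731c664' ]

// Inside a character class, '|' is a literal character, not alternation.
const pipeIsLiteral = /^[-|\d|\w]$/.test('|');
console.log(pipeIsLiteral); // true
```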
