JavaScript Regex Help! - javascript

I am trying to match the # tags within this string:
#sam #gt #channel:sam dfgfdh sam#sam
Now in regex testers this works #[\S]+ (with settings on JS testing) to pick out all strings starting with # so in them I get:
#sam #gt #channel:sam #sam
But then in browsers using this code:
function detect_extractStatusUsers(status){
var e = new RegExp('#[\S]+', 'i');
m = e.exec(status);
var s= "";
if (m != null) {
for (i = 0; i < m.length; i++) {
s = s + m[i] + "\n";
}
alert(s);
}
return true;
}
I can only get one single match of # (if I'm lucky, normally no match).
I must be missing something here and my eyes have just been looking at this for too long to see what it is.
Can anyone see what's wrong in this function?
Thanks,

You need to:
use the global search g setting
escape your \
use match instead of exec
var e = new RegExp('#[\\S]+', 'gi');
m = status.match(e);
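A minimal sketch of why the original pattern misbehaves, using the sample string from the question. The variable names sample, broken, fixed and tags are made up for illustration.

```javascript
var sample = '#sam #gt #channel:sam dfgfdh sam#sam';

// Inside a string literal, "\S" collapses to a plain "S" before RegExp sees it.
var broken = new RegExp('#[\S]+', 'i');
console.log(broken.source); // #[S]+

// Doubling the backslash keeps \S intact; the g flag makes match return every hit.
var fixed = new RegExp('#[\\S]+', 'gi');
var tags = sample.match(fixed);
console.log(tags.join('\n'));
```

With the broken pattern, only a # followed by the letter s (or S, because of the i flag) can match, which would explain getting a single short match or none at all.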

You need to call exec repeatedly, until it returns null. Each time it will return a match object, containing all the captures for that match.
I've taken the liberty of rewriting your function the way I would have written it:
function detect_extractStatusUsers(status){
var rx = /(#[\S]+)/gi,
match,
tags = [];
while (match = rx.exec(status)) {
tags.push(match[1]);
}
if (tags.length > 0) {
alert(tags.join('\n'));
}
}
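As a rough illustration of how repeated exec calls walk through the string, here is a sketch that also records where each match starts. The helper name findTags is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: collects each match together with its start index.
function findTags(text) {
  var rx = /#\S+/g; // with the g flag, exec resumes from rx.lastIndex
  var found = [];
  var match;
  while ((match = rx.exec(text)) !== null) {
    found.push({ tag: match[0], index: match.index });
  }
  return found;
}

var found = findTags('#sam #gt #channel:sam dfgfdh sam#sam');
console.log(found.map(function (t) { return t.index + ': ' + t.tag; }).join('\n'));
```

Note that the loop relies on the g flag: without it, exec starts from the beginning of the string on every call and the loop never ends once there is a match.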

Related

how to replace a char in string with many chars

I want to replace a character in a string with other characters.
I have a string like this:
date_format = "%m/%d/%Y";
And I want to replace every % with the character that follows it, so the date variable should look like this:
date_format="mm/dd/YY";
Here is what I tried so far, but I can't get it to work, so I need some help here:
function replaceon(str, index, chr) {
if (index > str.length - 1) return str;
return str.substr(0, index) + chr + str.substr(index + 1);
}
function locations(substring, string) {
var a = [],
i = -1;
while ((i = string.indexOf(substring, i + 1)) >= 0) a.push(i);
return a;
}
function corrent_format(date_format) {
var my_locations = locations('%', date_format);
console.log(my_locations.length);
for (var i = 0; i < my_locations.length; i++) {
replaceon(date_format, my_locations[i], date_format[my_locations[i] + 1]);
}
return date_format;
}
console.log(corrent_format(date_format));
You can try this:
"%m/%d/%Y".replace(/%([^%])/g,"$1$1")
Hope this helps.
You can use a regular expression for that:
var date_format="%m/%d/%Y";
var res = date_format.replace(/%(.)/g, "$1$1");
console.log(res);
function act(str) {
var res = "";
for (var i = 0; i < (str.length - 1); i++) {
if (str[i] === "%")
res += str[i + 1];
else
res += str[i];
}
res += str[i];
return res;
}
var date_format = "%m/%d/%Y";
console.log(act(date_format));
Your code is not working because the date_format variable is not being modified by the corrent_format function. The replaceon function returns a new string. If you assign the result to date_format, you should get the expected result:
for (var i = 0; i < my_locations.length; i++) {
date_format = replaceon(date_format, my_locations[i], date_format[my_locations[i]+1])
}
Alternatively, you could perform the replacement using String.replace and a regular expression:
date_format.replace(/%(.)/g, '$1$1');
For the regex-challenged among us, here's a translation of /%(.)/g, '$1$1':
/ means that the next part is going to be regex.
% find a %.
. any single character, so %. would match %m, %d, and/or %Y.
(.) putting it in parens means to capture the value to use later on.
/g get all the matches in the source string (instead of just the first one).
$1 references the value we captured before in (.).
$1$1 repeat the captured value twice.
So, replace every %. with whatever's in the ., times two.
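The same replacement can be sketched with a replacer function instead of the $1$1 string, which may make the role of the capture group easier to see. The names dateFormat and doubled are illustrative only.

```javascript
var dateFormat = '%m/%d/%Y';

// The function receives the whole match ("%m") and then each captured group ("m").
var doubled = dateFormat.replace(/%(.)/g, function (whole, ch) {
  return ch + ch; // same effect as the "$1$1" replacement string
});

console.log(doubled); // mm/dd/YY
```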
Now, this regex expression is the most concise and quickest way to do the job at hand. But maybe you can't use regular expressions. Maybe you have a dyslexic boss who has outlawed their use. (Dyslexia and regex are uneasy companions at best.) Maybe you haven't put in the 47 hours screaming at regular expressions that aren't doing what you want, that you're required to put in before you're allowed to use them. Or maybe you just hate regular expressions.
If any of these apply, you can also do this:
var x = '%m/%d/%y';
x = x.replace('%', 'm');
x = x.replace('%', 'd');
x = x.replace('%', 'y');
alert(x);
This takes advantage of the fact that the replace function only replaces the first match found.
But seriously, don't use this. Use regex. It's always better to invest that 20 hours working out a regex expression that condenses the 20 lines of code you wrote in 15 minutes down to one. Unless you have to get it done sometime tonight, and whatever you're trying just doesn't work, and it's getting close to midnight, and you're getting tired...well, no. Use regex. Really. Resist the temptation to avoid finding a regex solution. You'll be glad you did when you wake up at your desk in the morning, having missed your deadline, and get to go home and spend more time with your family, courtesy of your generous severance package.

Parsing BBCode in Javascript

I am using this (http://coursesweb.net/javascript/convert-bbcode-html-javascript_cs) as my script for parsing BBCode. I have extended the BBCodes that it can process, however I am encountering a problem when a newline immediately follows an opening tag, e.g.
[code]
code....
[/code]
The problem does not occur if the code is 'inline'
[code]code....[/code]
The regex being used to match what's inside these tags is (.*?) which I know does not match newlines. I have tried ([^\r\n]) to match newlines but this hasn't worked either.
I imagine it's a simple issue but I have little experience with regex so any help would be appreciated
EDIT: this is the full list of regex's I am using
var tokens = {
'URL' : '((?:(?:[a-z][a-z\\d+\\-.]*:\\/{2}(?:(?:[a-z0-9\\-._~\\!$&\'*+,;=:#|]+|%[\\dA-F]{2})+|[0-9.]+|\\[[a-z0-9.]+:[a-z0-9.]+:[a-z0-9.:]+\\])(?::\\d*)?(?:\\/(?:[a-z0-9\\-._~\\!$&\'*+,;=:#|]+|%[\\dA-F]{2})*)*(?:\\?(?:[a-z0-9\\-._~\\!$&\'*+,;=:#\\/?|]+|%[\\dA-F]{2})*)?(?:#(?:[a-z0-9\\-._~\\!$&\'*+,;=:#\\/?|]+|%[\\dA-F]{2})*)?)|(?:www\\.(?:[a-z0-9\\-._~\\!$&\'*+,;=:#|]+|%[\\dA-F]{2})+(?::\\d*)?(?:\\/(?:[a-z0-9\\-._~\\!$&\'*+,;=:#|]+|%[\\dA-F]{2})*)*(?:\\?(?:[a-z0-9\\-._~\\!$&\'*+,;=:#\\/?|]+|%[\\dA-F]{2})*)?(?:#(?:[a-z0-9\\-._~\\!$&\'*+,;=:#\\/?|]+|%[\\dA-F]{2})*)?)))',
'LINK' : '([a-z0-9\-\./]+[^"\' ]*)',
'EMAIL' : '((?:[\\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*(?:[\\w\!\#$\%\'\*\+\-\/\=\?\^\`{\|\}\~]|&)+#(?:(?:(?:(?:(?:[a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(?:\\d{1,3}\.){3}\\d{1,3}(?:\:\\d{1,5})?))',
'TEXT' : '(.*?)',
'SIMPLETEXT' : '([a-zA-Z0-9-+.,_ ]+)',
'INTTEXT' : '([a-zA-Z0-9-+,_. ]+)',
'IDENTIFIER' : '([a-zA-Z0-9-_]+)',
'COLOR' : '([a-z]+|#[0-9abcdef]+)',
'NUMBER' : '([0-9]+)',
'ALL' : '([^\r\n])',
};
EDIT 2: Full JS for matching
var token_match = /{[A-Z_]+[0-9]*}/ig;
var _getRegEx = function(str) {
var matches = str.match(token_match);
var nrmatches = matches.length;
var i = 0;
var replacement = '';
if (nrmatches <= 0) {
return new RegExp(preg_quote(str), 'g'); // no tokens so return the escaped string
}
for(; i < nrmatches; i += 1) {
// Remove {, } and numbers from the token so it can match the
// keys in tokens
var token = matches[i].replace(/[{}0-9]/g, '');
if (tokens[token]) {
// Escape everything before the token
replacement += preg_quote(str.substr(0, str.indexOf(matches[i]))) + tokens[token];
// Remove everything before the end of the token so it can be used
// with the next token. Doing this so that parts can be escaped
str = str.substr(str.indexOf(matches[i]) + matches[i].length);
}
}
replacement += preg_quote(str);
return new RegExp(replacement, 'gi');
};
var _getTpls = function(str) {
var matches = str.match(token_match);
var nrmatches = matches.length;
var i = 0;
var replacement = '';
var positions = {};
var next_position = 0;
if (nrmatches <= 0) {
return str; // no tokens so return the string
}
for(; i < nrmatches; i += 1) {
// Remove {, } and numbers from the token so it can match the
// keys in tokens
var token = matches[i].replace(/[{}0-9]/g, '');
var position;
// figure out what $# to use ($1, $2)
if (positions[matches[i]]) {
position = positions[matches[i]];
} else {
// token doesn't have a position so increment the next position
// and record this token's position
next_position += 1;
position = next_position;
positions[matches[i]] = position;
}
if (tokens[token]) {
replacement += str.substr(0, str.indexOf(matches[i])) + '$' + position;
str = str.substr(str.indexOf(matches[i]) + matches[i].length);
}
}
replacement += str;
return replacement;
};
This does the trick for me: (updated this one too to avoid confusion)
\[code\]([\s\S]*?)\[\/code\]
See regexpal and enter the following:
[code]
code....
[/code]
[code]code.... [/code]
Update:
Fixed the regex to the following and this works in the Chrome Console for me:
/\[code\]([\s\S]*?)\[\/code\]/g.exec("[code]hello world \n[/code]")
In JavaScript, . does not match line terminators, so (.*?) stops at the first newline. Instead you have to use the [\s\S] trick described in this SO answer. Perhaps?
/\[code\][\s\S]*?\[\/code\]/
Also, regular expressions probably aren't the best choice for parsing syntax; the patterns quickly become overcomplicated. I would suggest parsing the string, building an abstract syntax tree, and then rendering the HTML from that.
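To make the difference concrete, here is a small sketch comparing the two capture patterns on a block whose opening tag is followed by a newline. The variable names are made up for illustration, and the patterns are hard-coded rather than built from the tokens table in the question.

```javascript
var input = '[code]\ncode....\n[/code]';

// . cannot cross the newline, so the lazy (.*?) version never reaches [/code].
var dotPattern = /\[code\](.*?)\[\/code\]/;
// [\s\S] matches any character at all, newlines included.
var anyPattern = /\[code\]([\s\S]*?)\[\/code\]/;

var dotMatches = dotPattern.test(input);
var captured = anyPattern.exec(input)[1];

console.log(dotMatches);               // false
console.log(JSON.stringify(captured)); // "\ncode....\n"
```

If the parser in the question builds its patterns from strings, the TEXT token would presumably need the doubled-backslash form '([\\s\\S]*?)', but that is an assumption about how the tokens are used.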

Javascript Remove strings in beginning and end

Based on the following strings:
...here..
..there...
.their.here.
How can I remove the . at the beginning and end of each string, the way trim removes whitespace, using JavaScript?
The output should be:
here
there
their.here
These are the reasons why the RegEx for this task is /(^\.+|\.+$)/mg:
Inside /()/ is where you write the pattern of the substring you want to find in the string:
/(ol)/ This will find the substring ol in the string.
var x = "colt".replace(/(ol)/, 'a'); will give you x == "cat";
The ^\.+|\.+$ in /()/ is separated into 2 parts by the symbol | [means or]
^\.+ and \.+$
^\.+ means to find as many . as possible at the start.
^ means at the start; \ is to escape the character; adding + behind a character means to match one or more of that character
\.+$ means to find as many . as possible at the end.
$ means at the end.
The m behind /()/ is used to specify that if the string has newline or carriage return characters, the ^ and $ operators will now match against a newline boundary, instead of a string boundary.
The g behind /()/ is used to perform a global match, so it finds all matches rather than stopping after the first match.
To learn more about RegEx you can check out this guide.
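A quick sketch of what the m flag changes, using the three sample lines from the question joined with newlines. The variable names withM and withoutM are illustrative.

```javascript
var text = '...here..\n..there...\n.their.here.';

// With m, ^ and $ also match at every line break, so each line is trimmed.
var withM = text.replace(/(^\.+|\.+$)/mg, '');
// Without m, only the very start and very end of the whole string are trimmed.
var withoutM = text.replace(/(^\.+|\.+$)/g, '');

console.log(JSON.stringify(withM));    // "here\nthere\ntheir.here"
console.log(JSON.stringify(withoutM)); // "here..\n..there...\n.their.here"
```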
Try to use the following regex
var text = '...here..\n..there...\n.their.here.';
var replaced = text.replace(/(^\.+|\.+$)/mg, '');
Here is a working demo
Use Regex /(^\.+|\.+$)/mg
^ represent at start
\.+ one or many full stops
$ represents at end
so:
var text = '...here..\n..there...\n.their.here.';
alert(text.replace(/(^\.+|\.+$)/mg, ''));
Here is a non-regular-expression answer which extends String.prototype
String.prototype.strim = function(needle){
// both positions default to the end, so an all-needle string yields ""
var first_pos = this.length;
var last_pos = this.length;
//find first non needle char position
for(var i = 0; i<this.length;i++){
if(this.charAt(i) !== needle){
first_pos = i;
break;
}
}
//find last non needle char position (index 0 must be checked too)
for(var i = this.length-1; i>=first_pos;i--){
if(this.charAt(i) !== needle){
last_pos = i+1;
break;
}
}
return this.substring(first_pos,last_pos);
}
alert("...here..".strim('.'));
alert("..there...".strim('.'))
alert(".their.here.".strim('.'))
alert("hereagain..".strim('.'))
and see it working here : http://jsfiddle.net/cettox/VQPbp/
Slightly more code-golfy, if not readable, non-regexp prototype extension:
String.prototype.strim = function(needle) {
var out = this;
while (0 === out.indexOf(needle))
out = out.substr(needle.length);
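while (out.length >= needle.length && out.length === out.lastIndexOf(needle) + needle.length) // length check avoids an endless loop once out is shorter than the needle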
out = out.slice(0,out.length-needle.length);
return out;
}
var spam = "this is a string that ends with thisthis";
alert("#" + spam.strim("this") + "#");
Fiddle-ige
Use RegEx with javaScript Replace
var res = s.replace(/(^\.+|\.+$)/mg, '');
We can use the replace() method to remove unwanted substrings from a string. Note that replace() returns a new string, so the result has to be assigned.
Example:
var str = "<pre>I'm big fan of Stackoverflow</pre>";
str = str.replace(/<pre>/g, '').replace(/<\/pre>/g, '');
console.log(str);
output:
I'm big fan of Stackoverflow

URL extraction from string

I found a regular expression that is supposed to capture URLs, but it doesn't capture some URLs.
$("#links").change(function() {
//var matches = new array();
var linksStr = $("#links").val();
var pattern = new RegExp("^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$","g");
var matches = linksStr.match(pattern);
for(var i = 0; i < matches.length; i++) {
alert(matches[i]);
}
})
It doesn't capture this url (I need it to):
http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar
But it captures this
http://www.wupload.com
Several things:
1. The main reason it didn't work is that, when passing strings to RegExp(), you need to escape the backslashes. So this:
"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$"
Should be:
"^(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
2. Next, you said that FF reported, "Regular expression too complex". This suggests that linksStr is several lines of URL candidates. Therefore, you also need to pass the m flag to RegExp().
3. The existing regex is blocking legitimate values, eg: "HTTP://STACKOVERFLOW.COM". So, also use the i flag with RegExp().
4. Whitespace always creeps in, especially in multiline values. Use a leading \s* and $.trim() to deal with it.
5. Relative links, eg /file/63075291/LlMlTL355-EN6-SU8S.rar are not allowed?
Putting it all together (except for item 5), it becomes:
var linksStr = "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar \n"
+ " http://XXXupload.co.uk/fun.exe \n "
+ " WWW.Yupload.mil ";
var pattern = new RegExp (
"^\\s*(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
, "img"
);
var matches = linksStr.match(pattern);
for (var J = 0, L = matches.length; J < L; J++) {
console.log ( $.trim (matches[J]) );
}
Which yields:
http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar
http://XXXupload.co.uk/fun.exe
WWW.Yupload.mil
Why not just do:
URLS = str.match(/https?:[^\s]+/ig);
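A hedged sketch of this simpler approach, run against the sample input from the earlier answer. It assumes every wanted URL starts with http:// or https://, so bare hosts are skipped; the variable name urls is illustrative.

```javascript
var linksStr = 'http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar \n'
  + ' http://XXXupload.co.uk/fun.exe \n '
  + ' WWW.Yupload.mil ';

// match returns null when nothing is found, so fall back to an empty array.
var urls = linksStr.match(/https?:\/\/\S+/gi) || [];

for (var i = 0; i < urls.length; i++) {
  console.log(urls[i]);
}
```

WWW.Yupload.mil is not returned here because it has no scheme; whether that is acceptable depends on what the input is expected to contain.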
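(https?\:\/\/)([a-z\/\.0-9A-Z_\-\%\&\=]*)
This will locate http and https URLs in text. (The hyphen in the character class has to be escaped, otherwise _-\% is read as an invalid range.)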

Why is my RegExp ignoring start and end of strings?

I made this helper function to find single words that are not part of bigger expressions.
It works fine on any word that is NOT first or last in a sentence. Why is that?
Is there a way to add "" to regexp?
String.prototype.findWord = function(word) {
var startsWith = /[\[\]\.,-\/#!$%\^&\*;:{}=\-_~()\s]/ ;
var endsWith = /[^A-Za-z0-9]/ ;
var wordIndex = this.indexOf(word);
if (startsWith.test(this.charAt(wordIndex - 1)) &&
endsWith.test(this.charAt(wordIndex + word.length))) {
return wordIndex;
}
else {return -1;}
}
Also, any improvement suggestions for the function itself are welcome!
UPDATE: example: I want to find the word able in a string. I want it to work in cases like [able] able, #able1 etc., but not in cases where it is part of another word like disable, enable etc.
A different version:
String.prototype.findWord = function(word) {
return this.search(new RegExp("\\b"+word+"\\b"));
}
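Your if only evaluates to true if endsWith matches the character after the word. But when the word is the very last thing in the string, charAt returns an empty string, and a character class such as [^A-Za-z0-9] needs one character to match, so the test fails. The same thing happens with startsWith when the word is at index 0.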
Did you try word boundary -- \b?
There is also \w, which matches one word character ([A-Za-z0-9_]) -- this could help you too (depending on your definition of a word).
See RegExp docs for more details.
If you want your endsWith regexp to also match the empty string, you just need to append |^$ to it:
var endsWith = /[^A-Za-z0-9]|^$/ ;
Anyway, you can easily check if it is the beginning of the text with if (wordIndex == 0), and if it is the end with if (wordIndex + word.length == this.length).
It is also possible to eliminate this issue by operating on a copy of the input string, surrounded with non-alphanumerical characters. For example:
var s = "#" + this + "#";
var wordIndex = s.indexOf(word) - 1;
But I'm afraid there is another problem with your function:
it would never match "able" in a string like "disable able enable" since the call to indexOf would return 3, then startsWith.test(wordIndex) would return false and the function would exit with -1 without searching further.
So you could try:
String.prototype.findWord = function (word) {
var startsWith = "[\\[\\]\\.,-\\/#!$%\\^&\*;:{}=\\-_~()\\s]";
var endsWith = "[^A-Za-z0-9]";
// The match starts at the delimiter before the word. Because of the "#" padding,
// that position equals the word's index in the original string.
// search returns -1 when there is no match.
return ("#"+this+"#").search(new RegExp(startsWith + word + endsWith));
}
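For comparison, here is a sketch of a standalone version that avoids the padding trick and escapes the search word first. The names escapeRegExp and findWord (as a plain function rather than a prototype method) are hypothetical, and treating only letters and digits as word characters is an assumption taken from the endsWith pattern in the question.

```javascript
// Hypothetical helper: escapes characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the index of the first standalone occurrence of word, or -1.
function findWord(haystack, word) {
  // Either the start of the string or one non-alphanumeric character must come first;
  // the negative lookahead rejects a letter or digit right after the word.
  var rx = new RegExp('(^|[^A-Za-z0-9])' + escapeRegExp(word) + '(?![A-Za-z0-9])');
  var m = rx.exec(haystack);
  return m ? m.index + m[1].length : -1;
}

console.log(findWord('able', 'able'));                // 0
console.log(findWord('disable able enable', 'able')); // 8
console.log(findWord('[able]', 'able'));              // 1
console.log(findWord('enable', 'able'));              // -1
```

Unlike a single indexOf call, the regex keeps scanning past occurrences embedded in longer words, so the first and last words of the text are handled without special cases.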
