Regular expression to parse jQuery-selector-like string - javascript

text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
regex = /(.*?)\.filter\((.*?)\)/;
matches = text.match(regex);
log(matches);
// matches[1] is '#container a'
// matches[2] is '.top'
I expect to capture
matches[1] is '#container a'
matches[2] is '.top'
matches[3] is '.bottom'
matches[4] is '.middle'
One solution would be to split the string into #container a and rest. Then take rest and execute recursive exec to get item inside ().
Update: I am posting a solution that works, but I am looking for a better one. I don't really like the idea of splitting the string and then processing the pieces.
Here is a solution that works.
matches = [];
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var regex = /(.*?)\.filter\((.*?)\)/;
var match = regex.exec(text);
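// match[1] starts at match.index, so the head ends at match.index + match[1].length
firstPart = text.substring(match.index, match.index + match[1].length);
rest = text.substring(match.index + match[1].length);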
matches.push(firstPart);
regex = /\.filter\((.*?)\)/g;
while ((match = regex.exec(rest)) != null) {
matches.push(match[1]);
}
log(matches);
Looking for a better solution.

This will match the single example you posted:
<html>
<body>
<script type="text/javascript">
text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
matches = text.match(/^[^.]*|\.[^.)]*(?=\))/g);
document.write(matches);
</script>
</body>
</html>
which produces:
#container a,.top,.bottom,.middle
EDIT
Here's a short explanation:
^ # match the beginning of the input
[^.]* # match any character other than '.' and repeat it zero or more times
#
| # OR
#
\. # match the character '.'
[^.)]* # match any character other than '.' and ')' and repeat it zero or more times
(?= # start positive look ahead
\) # match the character ')'
) # end positive look ahead
EDIT part II
The regex looks for two types of character sequences:
zero or more characters from the start of the string up to the first ., the regex: ^[^.]*
or it matches a character sequence starting with a . followed by zero or more characters other than . and ), \.[^.)]*, but must have a ) ahead of it: (?=\)). This last requirement causes .filter not to match.

You have to iterate, I think.
var head, filters = [];
text.replace(/^([^.]*)(\..*)$/, function(_, h, rem) {
head = h;
rem.replace(/\.filter\(([^)]*)\)/g, function(_, f) {
filters.push(f);
});
});
console.log("head: " + head + " filters: " + filters);
The ability to use functions as the second argument to String.replace is one of my favorite things about JavaScript :-)

You need to do several matches repeatedly, starting where the last match ends (see while example at https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp/exec):
If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property. For example, assume you have this script:
var myRe = /ab*/g;
var str = "abbcdefabh";
var myArray;
while ((myArray = myRe.exec(str)) != null)
{
var msg = "Found " + myArray[0] + ". ";
msg += "Next match starts at " + myRe.lastIndex;
print(msg);
}
This script displays the following text:
Found abb. Next match starts at 3
Found ab. Next match starts at 9
However, this case would be better solved using a custom-built parser. Regular expressions are not an effective solution to this problem, if you ask me.
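Applied to the string from the question, the successive-match loop can be sketched as follows. This is only a sketch: parseFilters is a hypothetical helper name, and it assumes the filter arguments never contain a ')' character.

```javascript
// Hypothetical helper: returns the head of the selector followed by the
// argument of each chained .filter(...) call.
// Assumption: filter arguments never contain ')'.
function parseFilters(text) {
  var filterRe = /\.filter\(([^)]*)\)/g;
  // Everything before the first ".filter(" is the head.
  var first = text.search(/\.filter\(/);
  var head = first === -1 ? text : text.slice(0, first);
  var filters = [];
  var match;
  // With the g flag, each exec call resumes at filterRe.lastIndex.
  while ((match = filterRe.exec(text)) !== null) {
    filters.push(match[1]);
  }
  return [head].concat(filters);
}

var parsed = parseFilters('#container a.filter(.top).filter(.bottom).filter(.middle)');
console.log(parsed); // [ '#container a', '.top', '.bottom', '.middle' ]
```

A fresh regex is created on every call, so no lastIndex state leaks between calls.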

var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var result = text.split('.filter');
console.log(result[0]);
console.log(result[1]);
console.log(result[2]);
console.log(result[3]);

text.split() with regex does the trick.
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var parts = text.split(/(\.[^.()]+)/);
var matches = [parts[0]];
for (var i = 3; i < parts.length; i += 4) {
matches.push(parts[i]);
}
console.log(matches);

Related

Replace after char '-' or '/' match

I'm trying to use a regex replace to remove everything after a matched character, for example in 3674802/3 or 637884-ORG.
The id can take either form. How can I use a regex replace to remove everything from the matched character onward?
Input var id = 3674802/3 or 637884-ORG;
Expected Output 3674802 or 637884
You could use the substr method to take only the part of the string up to '/' or '-':
var input = "3674802/3";
var output = input.substr(0, input.indexOf('/'));
var input = "637884-ORG";
var output = input.substr(0, input.indexOf('-'));
var input = "3674802/3";
if (input.indexOf('/') > -1)
{
input = input.substr(0, input.indexOf('/'));
}
console.log(input);
var input = "637884-ORG";
if (input.indexOf('-') > -1)
{
input = input.substr(0, input.indexOf('-'));
}
console.log(input);
You can use a regex with a lookahead assertion
/(\d+)(?=[/-])/g
var id = "3674802/3"
console.log((id.match(/(\d+)(?=[/-])/g) || []).pop())
id = "637884-ORG"
console.log((id.match(/(\d+)(?=[/-])/g) || []).pop())
You don't need Regex for this. Regex is far more powerful than what you need.
You can get away with the String's substring and indexOf methods.
indexOf takes in a character/substring and returns an integer. The integer represents what character position the character/substring starts at.
substring takes in a starting position and ending position, and returns the new string from the start to the end.
If you are having trouble getting these to work, feel free to ask for more clarification.
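A minimal sketch of that approach, assuming the id should be cut at whichever of '/' or '-' comes first; stripSuffix is a made-up name for illustration:

```javascript
// Hypothetical helper: keeps the part of id before the first '/' or '-'.
// If neither character is present, the id is returned unchanged.
function stripSuffix(id) {
  var slash = id.indexOf('/');
  var dash = id.indexOf('-');
  var cut = id.length;
  if (slash !== -1) cut = Math.min(cut, slash);
  if (dash !== -1) cut = Math.min(cut, dash);
  return id.substring(0, cut);
}

console.log(stripSuffix('3674802/3'));  // 3674802
console.log(stripSuffix('637884-ORG')); // 637884
console.log(stripSuffix('12345'));      // 12345
```

Checking for -1 matters: indexOf returns -1 when the character is absent, and passing that straight to substring would yield an empty string.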
You can use the following script:
var str = '3674802/3 or 637884-ORG';
var id = str.replace(/(\d+)[-\/](?:\d+|[A-Z]+)/g, '$1');
Details concerning the regex:
(\d+) - A sequence of digits, the 1st capturing group.
[-\/] - Either a minus or a slash. Because / is the regex delimiter,
it must be escaped with a backslash.
(?: - Start of a non-capturing group, a "container" for alternatives.
\d+ - First alternative - a sequence of digits.
| - Alternative separator.
[A-Z]+ - Second alternative - a sequence of letters.
) - End of the non-capturing group.
g - global option.
The expression to replace with: $1 - replace the whole finding with
the first capturing group.
Thanks to everyone who responded to my question; it was really helpful in resolving my issue.
Here is the answer I came up with:
var str = ['8484683*ORG','7488575/2','647658-ORG'];
for(i=0;i<str.length;i++){
var regRep = /((\/\/[^\/]+)?\/.*)|(\-.*)|(\*.*)/;
var txt = str[i].replace(regRep,"");
console.log(txt);
}

match numbers without a prefix

I need help with regular expression.
Using javascript I am going through each line of a text file and I want to replace any match of [0-9]{6,9} with a '*', but, I don't want to replace numbers with prefix 100. So, a number like 1110022 should be replaced (matched), but 1004567 should not (no match).
I need a single expression that will do the trick (just the matching part). I can’t use ^ or $ because the number can appear in the middle of the line.
I have tried (?!100)[0-9]{6,9}, but it doesn't work.
More examples:
Don't match: 10012345
Match: 1045677
Don't match:
1004567
Don't match: num="10034567" test
Match just the middle number in the line: num="10048876" 1200476, 1008888
Thanks
You need to use a leading word boundary to check if a number starts with some specific digit sequence:
\b(?!100)\d{6,9}
Here, the 100 is checked right after a word boundary, not inside a number.
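To see why the attempt from the question fails, it may help to compare the two patterns on a number that should not match. This is a small sketch; the variable names are only for illustration:

```javascript
// Without \b the engine can retry from inside the number, where the
// lookahead no longer sees "100".
var withoutBoundary = /(?!100)[0-9]{6,9}/g;
var withBoundary = /\b(?!100)\d{6,9}/g;

var loose = '1004567'.match(withoutBoundary);
var strict = '1004567'.match(withBoundary);
var allowed = '1110022'.match(withBoundary);

console.log(loose);   // [ '004567' ]  (match starts at the second digit)
console.log(strict);  // null
console.log(allowed); // [ '1110022' ]
```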
If you need to replace the matches with just a single asterisk, just use the "*" as a replacement string (see snippet right below).
var re = /\b(?!100)\d{6,9}/g;
var str = 'Don\'t match: 10012345\n\nMatch: 1045677\n\nDon\'t match:\n\n1004567\n\nDon\'t match: num="10034567" test\n\nMatch just the middle number in the line: num="10048876" 1200476, 1008888';
document.getElementById("r").innerHTML = "<pre>" + str.replace(re, '*') + "</pre>";
<div id="r"/>
Or, if you need to replace each digit with *, you need to use a callback function inside a replace:
String.prototype.repeat = function (n, d) {
return --n ? this + (d || '') + this.repeat(n, d) : '' + this
};
var re = /\b(?!100)\d{6,9}/g;
var str = '123456789012 \nDon\'t match: 10012345\n\nMatch: 1045677\n\nDon\'t match:\n\n1004567\n\nDon\'t match: num="10034567" test\n\nMatch just the middle number in the line: num="10048876" 1200476, 1008888';
document.getElementById("r").innerHTML = "<pre>" + str.replace(re, function(m) { return "*".repeat(m.length); }) + "</pre>";
<div id="r"/>
The repeat function is borrowed from BitOfUniverse's answer.

String that doesn't contain character group

I wrote regex for finding urls in text:
/(http[^\s]+)/g
But now I need the same expression, except that the matches must not contain a certain substring. For instance, I want all the URLs that don't contain the word google.
How can I do that?
Here is a way to achieve that:
http:\/\/(?!\S*google)\S+
JS:
var re = /http:\/\/(?!\S*google)\S+/g;
var str = 'http://ya.ru http://yahoo.com http://google.com';
var m;
while ((m = re.exec(str)) !== null) {
document.getElementById("r").innerHTML += m[0] + "<br/>";
}
<div id="r"/>
Regex breakdown:
http:\/\/ - a literal sequence of http://
(?!\S*google) - a negative look-ahead that performs a forward check from the current position (i.e. right after http://); if it finds zero or more non-whitespace characters followed by google, the match fails.
\S+ - 1 or more non-whitespace symbols (this is necessary since the lookahead above does not really consume the characters it matches).
Note that if you have any punctuation after the URL, you may add \b right at the end of the pattern:
var re1 = /http:\/\/(?!\S*google)\S+/g;
var re2 = /http:\/\/(?!\S*google)\S+\b/g;
document.write(
JSON.stringify(
'http://ya.ru, http://yahoo.com, http://google.com'.match(re1)
) + "<br/>"
);
document.write(
JSON.stringify(
'http://ya.ru, http://yahoo.com, http://google.com'.match(re2)
)
);

Javascript Remove strings in beginning and end

Based on the following strings
...here..
..there...
.their.here.
How can I remove the . characters at the beginning and end of each string, the way trim removes surrounding spaces, using JavaScript?
the output should be
here
there
their.here
These are the reasons why the RegEx for this task is /(^\.+|\.+$)/mg:
Inside /()/ is where you write the pattern of the substring you want to find in the string:
/(ol)/ This will find the substring ol in the string.
var x = "colt".replace(/(ol)/, 'a'); will give you x == "cat";
The ^\.+|\.+$ in /()/ is separated into 2 parts by the symbol | [means or]
^\.+ and \.+$
^\.+ means to find as many . as possible at the start.
^ means at the start; \ escapes the following character; adding + after a character means to match one or more of that character
\.+$ means to find as many . as possible at the end.
$ means at the end.
The m behind /()/ is used to specify that if the string has newline or carriage return characters, the ^ and $ operators will now match against a newline boundary, instead of a string boundary.
The g behind /()/ is used to perform a global match, so it finds all matches rather than stopping after the first one.
To learn more about RegEx you can check out this guide.
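The effect of the m flag described above can be checked with a short sketch that runs the same pattern with and without it on the multi-line input from the question (variable names are illustrative):

```javascript
var text = '...here..\n..there...\n.their.here.';

// With m, ^ and $ also match at line breaks, so every line is trimmed.
var perLine = text.replace(/^\.+|\.+$/mg, '');

// Without m, only the very start and very end of the string are trimmed.
var wholeString = text.replace(/^\.+|\.+$/g, '');

console.log(JSON.stringify(perLine));     // "here\nthere\ntheir.here"
console.log(JSON.stringify(wholeString)); // "here..\n..there...\n.their.here"
```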
Try to use the following regex
var text = '...here..\n..there...\n.their.here.';
var replaced = text.replace(/(^\.+|\.+$)/mg, '');
Use Regex /(^\.+|\.+$)/mg
^ represent at start
\.+ one or many full stops
$ represents at end
so:
var text = '...here..\n..there...\n.their.here.';
alert(text.replace(/(^\.+|\.+$)/mg, ''));
Here is a non-regular-expression answer which extends String.prototype:
String.prototype.strim = function(needle){
  var first_pos = 0;
  var last_pos = this.length;
  //find first non needle char position
  while (first_pos < last_pos && this.charAt(first_pos) === needle) {
    first_pos++;
  }
  //find the position just past the last non needle char
  while (last_pos > first_pos && this.charAt(last_pos - 1) === needle) {
    last_pos--;
  }
  //a string made up only of needle chars yields ""
  return this.substring(first_pos,last_pos);
}
alert("...here..".strim('.'));
alert("..there...".strim('.'))
alert(".their.here.".strim('.'))
alert("hereagain..".strim('.'))
and see it working here : http://jsfiddle.net/cettox/VQPbp/
Slightly more code-golfy, if not readable, non-regexp prototype extension:
String.prototype.strim = function(needle) {
var out = this;
while (0 === out.indexOf(needle))
out = out.substr(needle.length);
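// the length check prevents an endless loop once out is shorter than needle (e.g. "...".strim("."))
while (out.length >= needle.length && out.lastIndexOf(needle) === out.length - needle.length)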
out = out.slice(0,out.length-needle.length);
return out;
}
var spam = "this is a string that ends with thisthis";
alert("#" + spam.strim("this") + "#");
Use a regex with JavaScript's replace:
var res = s.replace(/(^\.+|\.+$)/mg, '');
We can use the replace() method to remove an unwanted substring from a string. Note that replace() returns a new string, so the result must be assigned.
Example:
var str = "<pre>I'm big fan of Stackoverflow</pre>";
str = str.replace(/<pre>/g, '').replace(/<\/pre>/g, '');
console.log(str);
output:
I'm big fan of Stackoverflow

RegEx needed to split javascript string on "|" but not "\|"

We would like to split a string on instances of the pipe character |, but not if that character is preceded by an escape character, e.g. \|.
For example, we would like the following string to be split into the following components:
1|2|3\|4|5
1
2
3\|4
5
I'm expecting to be able to use the following javascript function, split, which takes a regular expression. What regex would I pass to split? We are cross platform and would like to support current and previous versions (1 version back) of IE, FF, and Chrome if possible.
Instead of a split, do a global match (the same way a lexical analyzer would):
match anything other than \\ or |
or match any escaped char
Something like this:
var str = "1|2|3\\|4|5";
var matches = str.match(/([^\\|]|\\.)+/g);
A quick explanation: ([^\\|]|\\.) matches either any character except '\' and '|' (pattern: [^\\|]) or, via the alternation |, any escaped character (pattern: \\.). The + after the group repeats it one or more times, so each match is a run of such characters. The g at the end of the regex literal tells the JavaScript regex engine to match the pattern globally instead of matching it just once.
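One caveat worth checking before relying on this approach: because the pattern requires at least one character per match, empty fields between adjacent pipes are dropped rather than returned as empty strings. A small sketch (the second input is a made-up example):

```javascript
var tokenRe = /([^\\|]|\\.)+/g;

// The string from the question: the escaped pipe stays inside its field.
var fields = '1|2|3\\|4|5'.match(tokenRe);
console.log(fields); // [ '1', '2', '3\\|4', '5' ]

// Hypothetical input with an empty field between the two pipes.
var withEmpty = '1||2'.match(tokenRe);
console.log(withEmpty); // [ '1', '2' ]  (the empty field is lost)
```

If empty fields are significant in your data, a split-based or hand-written approach may be a better fit.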
What you're looking for is a "negative look-behind matching regular expression".
This isn't pretty, but it should split the list for you:
var output = input.replace(/(\\)?\|/g, function($0,$1){ return $1 ? $0 : '\n'; });
This takes your input string and replaces every '|' character that is NOT immediately preceded by a '\' character with a '\n' character; escaped pipes are left as they are.
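For reference, engines that implement ES2018 lookbehind assertions can express the negative look-behind directly in split. This is a sketch under that assumption: it throws a SyntaxError in older engines, such as the IE versions mentioned in the question, and it also treats a pipe that follows an escaped backslash as escaped.

```javascript
// Assumption: the runtime supports lookbehind assertions (ES2018).
// (?<!\\) requires that the pipe is not immediately preceded by a backslash.
var input = '1|2|3\\|4|5';
var pieces = input.split(/(?<!\\)\|/);
console.log(pieces); // [ '1', '2', '3\\|4', '5' ]
```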
A regex solution was posted as I was looking into this. So I just went ahead and wrote one without it. I did some simple benchmarks and it is -slightly- faster (I expected it to be slower...).
Without using Regex, if I understood what you desire, this should do the job:
function doSplit(input) {
var output = [];
var currPos = 0,
prevPos = -1;
while ((currPos = input.indexOf('|', currPos + 1)) != -1) {
if (input[currPos-1] == "\\") continue;
var recollect = input.substr(prevPos + 1, currPos - prevPos - 1);
prevPos = currPos;
output.push(recollect);
}
var recollect = input.substr(prevPos + 1);
output.push(recollect);
return output;
}
doSplit('1|2|3\\|4|5'); //returns [ '1', '2', '3\\|4', '5' ]
