I cannot get my JavaScript regexp to work - javascript

I need to parse, in JavaScript, the value a user enters in an HTML text field.
This is my first experience with regexps.
Here is my code:
var s = 'research library "not available" author:"Bernard Shaw"';
var tableau = s.split(/(?:[^\s"]+|"[^"]*")/);
for (var i=0; i<tableau.length; i++) {
document.write("tableau[" + i + "] = " + tableau[i] + "<BR>");
}
I am expecting to see something like this:
tableau[0] = research
tableau[1] = library
tableau[2] = "not available"
tableau[3] = author:
tableau[4] = "Bernard Shaw"
But instead I got this:
tableau[0] =
tableau[1] =
tableau[2] =
tableau[3] =
tableau[4] =
tableau[5] =
Actually, what I really need is to split this value:
research library "not available" author:"Bernard Shaw"
into this array :
tableau[0] = research
tableau[1] = library
tableau[2] = "not available"
tableau[3] = author:"Bernard Shaw"
But I think there is a problem with positive lookbehind in JavaScript, or something like that.
I tried many things without success, including:
How do I split a string with multiple separators in javascript?
Regex split string preserving quotes
Positive look behind in JavaScript regular expression
javascript split string by space, but ignore space in quotes (notice not to split by the colon too)
I think I really need some help...

It seems like you want to split on the whitespace outside the double-quotes. In that case you can try this regex:
var tableau = s.split(/\s(?=(?:[^"]*"[^"]*")*[^"]*$)/);
This splits on whitespace that is followed by an even number of double quotes, i.e. whitespace that is not inside a quoted section.
Explanation:
\s # Split on whitespace
(?= # Followed by
(?: # Non-capture group with 2 quotes
[^"]* # 0 or more non-quote characters
" # 1 quote
[^"]* # 0 or more non-quote characters
" # 1 quote
)* # 0 or more repetitions of the previous group (a multiple of 2 quotes is even)
[^"]* # Finally 0 or more non-quotes
$ # All the way to the end of the string (this is necessary)
)
This will give you your final desired output:
tableau[0] = research
tableau[1] = library
tableau[2] = "not available"
tableau[3] = author:"Bernard Shaw"
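A minimal runnable sketch of the lookahead split above, wrapped in a hypothetical helper `splitOutsideQuotes` so it can be reused. It assumes the quotes in the input are balanced; with an unmatched quote the even-count test no longer lines up with the intended quoted sections. Each candidate space rescans the rest of the string, so the cost grows roughly quadratically with input length, which is negligible for a search box.

```javascript
// Hypothetical helper: split on whitespace that lies outside double quotes.
// Assumes the quotes in the input are balanced.
function splitOutsideQuotes(s) {
  // \s, then a lookahead requiring an even number of " from here to the end
  return s.split(/\s(?=(?:[^"]*"[^"]*")*[^"]*$)/);
}

var s = 'research library "not available" author:"Bernard Shaw"';
var tableau = splitOutsideQuotes(s);
for (var i = 0; i < tableau.length; i++) {
  console.log("tableau[" + i + "] = " + tableau[i]);
}
```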

Regex might not be the way to go. Instead, you might write a tiny parser that marches along a character at a time and builds an array. Something like this (http://jsfiddle.net/WTMct/1):
function parse(str) {
var arr = [];
var quote = false; // true means we're inside a quoted field
// iterate over each character (c), keeping track of the current field index (i)
// (declare c with var too; `var i = c = 0` would leak c as a global)
for (var i = 0, c = 0; c < str.length; c++) {
var cc = str[c]; // current character
arr[i] = arr[i] || ''; // create a new array value (start with empty string) if necessary
// If it's just one quotation mark, begin/end quoted field
if (cc == '"') { quote = !quote; continue; }
// If it's a space, and we're not in a quoted field, move on to the next field
if (cc == ' ' && !quote) { ++i; continue; }
// Otherwise, append the current character to the current field
arr[i] += cc;
}
return arr;
}
Then
parse('research library "not available" author:"Bernard Shaw"')
returns ["research", "library", "not available", "author:Bernard Shaw"].
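The parser above drops the quote characters, so it returns `author:Bernard Shaw` rather than the `author:"Bernard Shaw"` the question asked for, and two spaces in a row produce an empty field. A minimal variant that keeps the quotes and skips empty fields might look like this; `parseKeepQuotes` is a hypothetical name, and like the original it assumes fields are separated by plain spaces (not tabs).

```javascript
// Hypothetical variant of parse(): quote characters are kept in the
// output, and runs of spaces do not create empty fields.
function parseKeepQuotes(str) {
  var arr = [];
  var field = '';     // characters of the field being built
  var quote = false;  // true while inside a quoted section
  for (var c = 0; c < str.length; c++) {
    var cc = str[c];
    if (cc === '"') {
      quote = !quote; // toggle quoted state, but keep the quote character
    } else if (cc === ' ' && !quote) {
      if (field !== '') arr.push(field); // end of field; skip empty ones
      field = '';
      continue;
    }
    field += cc;
  }
  if (field !== '') arr.push(field); // the last field has no trailing space
  return arr;
}

console.log(parseKeepQuotes('research library "not available" author:"Bernard Shaw"'));
```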

You can also match the tokens instead of splitting:
var output=s.match(/"[^"]*"|\S+/g);
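On the sample input, `/"[^"]*"|\S+/g` returns `author:"Bernard` and `Shaw"` as two separate items, because `\S+` consumes `author:"Bernard` before the quoted alternative gets a chance. One way around that, sketched below, is to reuse the question's own pattern with `match` instead of `split` and let the two alternatives repeat with `+`, so a bare prefix and a quoted section join into one token. An unmatched quote is silently dropped by this pattern.

```javascript
var s = 'research library "not available" author:"Bernard Shaw"';

// Each token is one or more runs of either non-space, non-quote characters
// or a complete "quoted section", so author:"Bernard Shaw" stays together.
// match() returns null when nothing matches, hence the || [] fallback.
var tableau = s.match(/(?:[^\s"]+|"[^"]*")+/g) || [];

console.log(tableau);
```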

Related

Javascript Remove strings in beginning and end

Based on the following strings:
...here..
..there...
.their.here.
How can I remove the . characters at the beginning and end of each string, the way trim() removes spaces, using JavaScript?
The output should be:
here
there
their.here
Here is why the regex for this task is /(^\.+|\.+$)/mg:
Inside /()/ is where you write the pattern of the substring you want to find in the string:
/(ol)/ This will find the substring ol in the string.
var x = "colt".replace(/(ol)/, 'a'); will give you x == "cat";
The ^\.+|\.+$ inside /()/ is split into 2 parts by the | symbol (meaning "or"):
^\.+ and \.+$
^\.+ means to find as many . as possible at the start.
^ anchors the match at the start; \ escapes the ., which would otherwise match any character; a + after a character means to match one or more of that character.
\.+$ means to find as many . as possible at the end.
$ means at the end.
The m after /()/ is used so that, if the string contains newline or carriage return characters, the ^ and $ anchors match at the start and end of each line instead of only at the start and end of the whole string.
The g after /()/ performs a global match: it finds all matches rather than stopping after the first one.
To learn more about RegEx you can check out this guide.
Try to use the following regex
var text = '...here..\n..there...\n.their.here.';
var replaced = text.replace(/(^\.+|\.+$)/mg, '');
Use the regex /(^\.+|\.+$)/mg:
^ represents the start
\.+ one or more full stops
$ represents the end
So:
var text = '...here..\n..there...\n.their.here.';
alert(text.replace(/(^\.+|\.+$)/mg, ''));
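The same idea can be generalized to any single character by building the regex at runtime. This is a sketch with a hypothetical helper `trimChar`; the character must be escaped first, because characters such as `.` or `*` are regex metacharacters. It assumes `ch` is exactly one character.

```javascript
// Hypothetical helper: remove every leading and trailing occurrence of a
// single character `ch` from each line of `str`, like the /mg regex above.
function trimChar(str, ch) {
  // Escape regex metacharacters so that e.g. "." is matched literally.
  var esc = ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var re = new RegExp('^' + esc + '+|' + esc + '+$', 'mg');
  return str.replace(re, '');
}

console.log(trimChar('...here..\n..there...\n.their.here.', '.'));
console.log(trimChar('--x-y--', '-'));
```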
Here is a non-regex answer that extends String.prototype:
String.prototype.strim = function(needle){
var first_pos = this.length; // stays here if every character is the needle
var last_pos = 0;
//find first non-needle char position
for(var i = 0; i < this.length; i++){
if(this.charAt(i) !== needle){
first_pos = i;
break;
}
}
//find the position just past the last non-needle char
//(j >= 0 so that index 0 is checked too)
for(var j = this.length - 1; j >= 0; j--){
if(this.charAt(j) !== needle){
last_pos = j + 1;
break;
}
}
//slice (unlike substring) returns "" when first_pos > last_pos
return this.slice(first_pos, last_pos);
};
alert("...here..".strim('.'));
alert("..there...".strim('.'));
alert(".their.here.".strim('.'));
alert("hereagain..".strim('.'));
You can see it working here: http://jsfiddle.net/cettox/VQPbp/
A slightly more code-golfy, if less readable, non-regexp prototype extension:
String.prototype.strim = function(needle) {
var out = String(this);
if (!needle) return out; // an empty needle would loop forever
while (0 === out.indexOf(needle))
out = out.slice(needle.length);
// the length guard stops lastIndexOf's -1 from faking a match at the end
while (out.length >= needle.length && out.lastIndexOf(needle) === out.length - needle.length)
out = out.slice(0, out.length - needle.length);
return out;
};
var spam = "this is a string that ends with thisthis";
alert("#" + spam.strim("this") + "#");
Use a regex with JavaScript's replace():
var res = s.replace(/(^\.+|\.+$)/mg, '');
We can use the replace() method to remove an unwanted substring from a string.
Example:
var str = "<pre>I'm big fan of Stackoverflow</pre>";
// replace() returns a new string, so assign the result back
str = str.replace(/<pre>/g, '').replace(/<\/pre>/g, '');
console.log(str);
Output:
I'm big fan of Stackoverflow

Regular expression in Javascript to check for # symbol

I am trying to detect whether a block of text (from a textarea) contains words that are prefixed with the # sign.
For example, in the following text: Hey #John, I just saw #Smith
It should detect John and Smith, without the # symbol. I reckoned something like this would work:
#\w\w+
My question is: how do I make JavaScript filter the text, assuming it is stored in a variable called comment?
It should output only the names in the text that are prefixed with # without the # symbol.
Regards.
You use the g (global) flag, a capture group, and a loop calling RegExp#exec, like this:
var str = "Hi there #john, it's #mary, my email is mary#example.com.";
var re = /\B#(\w+)/g;
var m;
for (m = re.exec(str); m; m = re.exec(str)) {
console.log("Found: " + m[1]);
}
Output:
Found: john
Found: mary
With thanks to #Alex K for the boundary recommendation!
comment.match(/#\w+/g) will give you an array of the matches (["#John", "#Smith"]).
I added a check to the regex so that it won't match email addresses, in case you're interested.
var comment = "Hey #John, I just saw #Smith."
+ " (john#example.com)";
// Parse tags using ye olde regex.
var tags = comment.match(/\B#\w+/g);
// If no tags were found, turn "null" into
// an empty array.
if (!tags) {
tags = [];
}
// Remove the leading "#" manually.
// Can't incorporate this into the regex, as
// lookbehind is not always supported.
for (var i = 0; i < tags.length; i++) {
tags[i] = tags[i].slice(1);
}
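In engines that support `String.prototype.matchAll` (ES2020), the capture group can do the work of the manual `#`-stripping loop. A minimal sketch; `extractNames` is a hypothetical name, and it keeps the `\B` check from the answers above so that `john#example.com` is not treated as a tag.

```javascript
// Hypothetical helper: collect the names after "#" without the "#" itself.
// Requires String.prototype.matchAll (ES2020); the regex must have the g flag.
function extractNames(comment) {
  // \B before "#" rejects a "#" glued to a preceding word character,
  // so "john#example.com" is not treated as a tag.
  return Array.from(comment.matchAll(/\B#(\w+)/g), function (m) {
    return m[1]; // text captured by the first group
  });
}

console.log(extractNames("Hey #John, I just saw #Smith. (john#example.com)"));
```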
var re = /#(\w+)/g; //set the g flag to match globally
var names = [];
var match;
while ((match = re.exec(comment)) !== null) {
//match is an array representing how the regex matched the text.
//match.index is the position where the match starts.
//exec returns null when there are no more matches, ending the loop.
//match[0] is the text matched by the entire regex,
//match[1] is the text captured by the first capturing group.
//each pair of capturing parentheses is a capturing group.
names.push(match[1]);
}

Split string in javascript by lines, preserving newlines?

How would I split a JavaScript string such as foo\nbar\nbaz into an array of lines, while preserving the newlines? I'd like to get ['foo\n', 'bar\n', 'baz'] as output.
I'm aware there are numerous possible answers - I'm just curious to find a stylish one.
With Perl I'd use a zero-width lookbehind assertion: split /(?<=\n)/, but lookbehind is not supported in JavaScript regexes.
PS. Extra points for handling different line endings (at least \r\n) and handling the missing last newline (as in my example).
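JavaScript has since gained lookbehind assertions (ES2018), so in engines that support them the Perl idiom carries over directly. A minimal sketch with a hypothetical helper name: because the split happens right after each `\n`, a `\r\n` pair stays together and a missing final newline needs no special case. Old-style lone `\r` line endings are not split.

```javascript
// Split after every "\n" using a zero-width lookbehind (ES2018+).
// The "\n" (and any "\r" before it) stays at the end of its line.
function splitKeepingNewlines(s) {
  return s.split(/(?<=\n)/);
}

console.log(splitKeepingNewlines("foo\nbar\nbaz"));   // ["foo\n", "bar\n", "baz"]
console.log(splitKeepingNewlines("foo\r\nbar\r\n")); // ["foo\r\n", "bar\r\n"]
```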
You can perform a global match with this pattern: /[^\n]+(?:\r?\n|$)/g
It matches one or more non-newline characters, then either an optional \r followed by \n, or the end of the string. Note that empty lines are skipped.
var input = "foo\r\n\nbar\nbaz";
var result = input.match(/[^\n]+(?:\r?\n|$)/g);
Result: ["foo\r\n", "bar\n", "baz"]
How about this?
"foo\nbar\nbaz".split(/^/m);
Result:
["foo\n", "bar\n", "baz"]
The other answers and answers in comments are all flawed in different ways. I needed a function that works correctly on any string or file.
Here is a simple and correct answer:
function split_lines(s) {
return s.match(/[^\n]*\n|[^\n]+/g);
}
input = "foo\r\n\nbar\n\r\nba\rz\r\r\r";
a = split_lines(input);
Array(5) [ "foo\r\n", "\n", "bar\n", "\r\n", "ba\rz\r\r\r" ]
It effectively splits at each newline \n but includes the \n, and includes a final line without trailing \n if and only if it is not empty. It includes all input characters in the output. We don't need any special treatment for \r.
I've tested this on a large chunk of random data, it does preserve all input characters, and \n only occur at the end of the lines.
Here's a test script:
function split_lines(s) {
return s.match(/[^\n]*\n|[^\n]+/g);
}
function gen_random_string(n, ncharset=256, nlprob=0.05, crprob=0.05) {
var s = "";
for (let i = 0; i < n; ++i) {
var r = Math.random();
if (r < nlprob)
s += "\n";
else if (r < nlprob + crprob)
s += "\r";
else {
// map r from [nlprob + crprob, 1) onto a character code in [0, ncharset)
var cc = Math.floor((r - nlprob - crprob) / (1 - nlprob - crprob) * ncharset);
var c = String.fromCharCode(cc);
s += c;
}
}
return s;
}
function test(...args) {
var s = gen_random_string(...args);
console.log(`generated random string of length ${s.length} with args:`, ...args);
var ok = true, ok1;
var a = split_lines(s);
console.log(`split into ${a.length} lines`);
ok1 = s === a.join('');
ok = ok && ok1;
console.log("split lines combine to give the original string?", ok1 ? "OK" : "FAIL");
for (var i = 0; i < a.length; ++i) {
var s1 = a[i];
ok1 = s1.endsWith("\n") || i == a.length-1;
ok = ok && ok1;
ok1 = !s1.slice(0, -1).includes("\n");
ok = ok && ok1;
}
console.log("tested each line other than the last ends with \\n");
console.log("tested each line does not contain \\n before the last character");
console.log("Final result", ok ? "OK" : "FAIL");
}
test(10000, 256);
test(10000, 65536);
I'd stay away from split with regular expressions, since older versions of IE have a broken implementation of it. Use match instead:
"foo\nbar\nbaz".match(/^.*(\r?\n|$)/mg)
Result: ["foo\n", "bar\n", "baz"]
One simple but crude method is to first replace each "\n" with two special characters, split on the second one, and then replace the first one with "\n" in each piece. Not efficient and not elegant, but it definitely works.
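A simplified variant of this marker trick keeps the real `\n` and appends a single marker after it, so there is nothing to replace back afterwards. This is a sketch under the assumption that the hypothetical `MARKER` character (here `\u0000`) never occurs in the input. A string ending in `\n` yields a trailing empty string.

```javascript
// Variant of the "marker" trick: append a marker after each "\n", then split
// on the marker. Assumes the hypothetical MARKER character never occurs in
// the input.
var MARKER = '\u0000';

function splitWithMarker(s) {
  return s.replace(/\n/g, '\n' + MARKER).split(MARKER);
}

console.log(splitWithMarker("foo\nbar\nbaz")); // ["foo\n", "bar\n", "baz"]
```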

regex validating single occurrences of characters

I want to check an input string to validate a proper text. The validation will be done with javascript and right now I'm using this code:
keychar = String.fromCharCode(keynum);
var text = txtBox.value + keychar;
textcheck = /(?!.*(.)\1{1})^[fenFN,]*$/;
return textcheck.test(text);
Examples of allowed strings:
f
e
f,e
n,f,e,F,N
Examples of strings that are not allowed:
ff
fe
f,f
f,ee
f,e,n,f
n,, (although this could be OK)
Is this possible to solve with regex in Javascript?
Although it is possible using a regex, it produces a rather big regex that might be hard to comprehend (and therefore maintain). I'd go for a "manual" option, as Benjam suggested.
Using regex however, you could do it like this:
var tests = [
'f',
'e',
'f,e',
'n,f,e,F,N',
'ff',
'fe',
'f,f',
'f,ee',
'f,e,n,f',
'n,,',
'f,e,e'
];
for(var i = 0; i < tests.length; i++) {
var t = tests[i];
console.log(t + ' -> ' + (t.match(/^([a-zA-Z])(?!.*\1)(,([a-zA-Z])(?!.*\3))*$/) ? 'pass' : 'fail'));
}
which will print:
f -> pass
e -> pass
f,e -> pass
n,f,e,F,N -> pass
ff -> fail
fe -> fail
f,f -> fail
f,ee -> fail
f,e,n,f -> fail
n,, -> fail
f,e,e -> fail
as you can see on Ideone.
A small explanation:
^ # match the start of the input
([a-zA-Z]) # match a single ascii letter and store it in group 1
(?!.*\1) # make sure there's no character ahead of it that matches what is inside group 1
( # open group 2
,([a-zA-Z])(?!.*\3) # match a comma followed by a single ascii letter (in group 3) that is not repeated
)* # close group 2 and repeat it zero or more times
$ # match the end of the input
I don't think regexps alone are a good fit here, as they are not very good at looking around in the text for duplicates. It can probably be done, but it won't be pretty at all.
What you might want to do is parse the string character by character and store the current character in an array, and while you're parsing the string, check to see if that character has already been used, as follows:
function test_text(string) {
// split the string into individual pieces
var arr = string.split(',');
var used = [];
// look through the pieces for duplicates
for (var idx = 0; idx < arr.length; idx++) {
// check for duplicate letters (indexOf returns -1 when not found)
if (used.indexOf(arr[idx]) !== -1) {
return false;
}
// check for letters that did not have a comma between them
if (1 < arr[idx].length) {
return false;
}
used.push(arr[idx]);
}
return true;
}
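A compact sketch that combines both answers: a small regex checks the structure, restricted to the letters f, e, n, F, N from the question, and a `Set` (ES2015) detects duplicates. `isValidSelection` is a hypothetical name. It rejects a trailing comma, so if you validate on every keypress as the question's code does, you may want to accept a trailing comma while the user is still typing.

```javascript
// Hypothetical helper: validates a comma-separated list of the letters
// f, e, n, F, N where each letter may appear at most once.
function isValidSelection(text) {
  // Structure check: one allowed letter, then zero or more ",letter" pairs.
  if (!/^[fenFN](,[fenFN])*$/.test(text)) {
    return false;
  }
  // Duplicate check: a Set keeps only distinct letters.
  var letters = text.split(',');
  return new Set(letters).size === letters.length;
}

['f', 'f,e', 'n,f,e,F,N', 'fe', 'f,f', 'f,e,n,f', 'n,,'].forEach(function (t) {
  console.log(t + ' -> ' + (isValidSelection(t) ? 'pass' : 'fail'));
});
```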
You might also want to make sure that the browser you are running this on supports Array.prototype.indexOf (older browsers such as IE8 do not), for example by including the polyfill from Mozilla's indexOf documentation.

Regular expression to parse jQuery-selector-like string

text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
regex = /(.*?)\.filter\((.*?)\)/;
matches = text.match(regex);
log(matches);
// matches[1] is '#container a'
// matches[2] is '.top'
I expect to capture
matches[1] is '#container a'
matches[2] is '.top'
matches[3] is '.bottom'
matches[4] is '.middle'
One solution would be to split the string into '#container a' and the rest, then run exec repeatedly on the rest to get each item inside ().
Update: I am posting a solution that does work. However, I am looking for a better one; I don't really like the idea of splitting the string and then processing the parts.
Here is a solution that works.
var matches = [];
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var regex = /(.*?)\.filter\((.*?)\)/;
var match = regex.exec(text);
// match[1] is everything before the first ".filter("
var firstPart = match[1];
var rest = text.substring(firstPart.length);
matches.push(firstPart);
regex = /\.filter\((.*?)\)/g;
while ((match = regex.exec(rest)) != null) {
matches.push(match[1]);
}
console.log(matches);
Looking for a better solution.
This will match the single example you posted:
<html>
<body>
<script type="text/javascript">
text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
matches = text.match(/^[^.]*|\.[^.)]*(?=\))/g);
document.write(matches);
</script>
</body>
</html>
which produces:
#container a,.top,.bottom,.middle
EDIT
Here's a short explanation:
^ # match the beginning of the input
[^.]* # match any character other than '.' and repeat it zero or more times
#
| # OR
#
\. # match the character '.'
[^.)]* # match any character other than '.' and ')' and repeat it zero or more times
(?= # start positive look ahead
\) # match the character ')'
) # end positive look ahead
EDIT part II
The regex looks for two types of character sequences:
zero or more characters from the start of the string up to the first ., the regex: ^[^.]*
or it matches a character sequence starting with a . followed by zero or more characters other than . and ), \.[^.)]*, but must have a ) ahead of it: (?=\)). This last requirement causes .filter not to match.
You have to iterate, I think.
var head, filters = [];
text.replace(/^([^.]*)(\..*)$/, function(_, h, rem) {
head = h;
rem.replace(/\.filter\(([^)]*)\)/g, function(_, f) {
filters.push(f);
});
});
console.log("head: " + head + " filters: " + filters);
The ability to use functions as the second argument to String.prototype.replace is one of my favorite things about JavaScript :-)
You need to run the match several times, each time starting where the last match ended (see the while example at https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp/exec):
If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property. For example, assume you have this script:
var myRe = /ab*/g;
var str = "abbcdefabh";
var myArray;
while ((myArray = myRe.exec(str)) != null)
{
var msg = "Found " + myArray[0] + ". ";
msg += "Next match starts at " + myRe.lastIndex;
print(msg);
}
This script displays the following text:
Found abb. Next match starts at 3
Found ab. Next match starts at 9
However, this case would be better solved using a custom-built parser. Regular expressions are not an effective solution to this problem, if you ask me.
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var result = text.split('.filter');
console.log(result[0]);
console.log(result[1]);
console.log(result[2]);
console.log(result[3]);
text.split() with regex does the trick.
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var parts = text.split(/(\.[^.()]+)/);
var matches = [parts[0]];
for (var i = 3; i < parts.length; i += 4) {
matches.push(parts[i]);
}
console.log(matches);
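With `String.prototype.matchAll` (ES2020) the exec loop can be written without managing `lastIndex` by hand. A sketch with a hypothetical helper `parseFilters`, assuming filter arguments never contain `)`. Unlike `^[^.]*`, the head here is everything before the first `.filter(`, so a head such as `ul li.active` keeps its class.

```javascript
// Hypothetical helper: split 'head.filter(a).filter(b)' into [head, a, b].
// Assumes filter arguments contain no ")" characters.
function parseFilters(text) {
  var start = text.indexOf('.filter(');
  if (start === -1) return [text];      // no filters at all
  var result = [text.slice(0, start)];  // the part before the first .filter(
  var tail = text.slice(start);
  for (var m of tail.matchAll(/\.filter\(([^)]*)\)/g)) {
    result.push(m[1]);                  // text inside the parentheses
  }
  return result;
}

console.log(parseFilters('#container a.filter(.top).filter(.bottom).filter(.middle)'));
```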
