I have strings like this:
ab
rx'
wq''
pok'''
oyu,
mi,,,,
Basically, I want to split the string into two parts. The first part should have the alphabetical characters intact, the second part should have the non-alphabetical characters.
The alphabetical part is guaranteed to be 2-3 lowercase characters between a and z; the non-alphabetical part can be any length, and is gauranteed to only be the characters , or ', but not both in the one string (e.g. eex,', will never occur).
So the result should be:
[ab][]
[rx][']
[wq]['']
[pok][''']
[oyu][,]
[mi][,,,,]
How can I do this? I'm guessing a regular expression but I'm not particularly adept at coming up with them.
Regular expressions have is a nice special called "word boundary" (\b). You can use it, well, to detect the boundary of a word, which is a sequence of alpha-numerical characters.
So all you have to do is
foo.split(/\b/)
For example,
"pok'''".split(/\b/) // ["pok", "'''"]
If you can 100% guarantee that:
Letter-strings are 2 or 3 characters
There are always one or more primes/commas
There is never any empty space before, after or in-between the letters and the marks
(aside from line-break)
You can use:
/^([a-zA-Z]{2,3})('+|,+)$/gm
var arr = /^([a-zA-Z]{2,3})('+|,+)$/gm.exec("pok'''");
arr === ["pok'''", "pok", "'''"];
var arr = /^([a-zA-Z]{2,3})('+|,+)$/gm.exec("baf,,,");
arr === ["baf,,,", "baf", ",,,"];
Of course, save yourself some sanity, and save that RegEx as a var.
And as a warning, if you haven't dealt with RegEx like this:
If a match isn't found -- if you try to match foo','' by mixing marks, or you have 0-1 or 4+ letters, or 0 marks... ...then instead of getting an array back, you'll get null.
So you can do this:
var reg = /^([a-zA-Z]{2,3})('+|,+)$/gm,
string = "foobar'',,''",
result_array = reg.exec(string) || [string];
In this case, the result of the exec is null; by putting the || (or) there, we can return an array that has the original string in it, as index-0.
Why?
Because the result of a successful exec will have 3 slots; [*string*, *letters*, *marks*].
You might be tempted to just read the letters like result_array[1].
But if the match failed and result_array === null, then JavaScript will scream at you for trying null[1].
So returning the array at the end of a failed exec will allow you to get result_array[1] === undefined (ie: there was no match to the pattern, so there are no letters in index-1), rather than a JS error.
You could try something like that:
function splitString(string){
var match1 = null;
var match2 = null;
var stringArray = new Array();
match1 = string.indexOf(',');
match2 = string.indexOf('`');
if(match1 != 0){
stringArray = [string.slice(0,match1-1),string.slice(match1,string.length-1];
}
else if(match2 != 0){
stringArray = [string.slice(0,match2-1),string.slice(match2,string.length-1];
}
else{
stringArray = [string];
}
}
var str = "mi,,,,";
var idx = str.search(/\W/);
if(idx) {
var list = [str.slice(0, idx), str.slice(idx)]
}
You'll have the parts in list[0] and list[1].
P.S. There might be some better ways than this.
yourStr.match(/(\w{2,3})([,']*)/)
if (match = string.match(/^([a-z]{2,3})(,+?$|'+?$)/)) {
match = match.slice(1);
}
Related
I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.
Variables:
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
Code:
var match = string.match(reg);
All matched results I get:
A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y
Matched results I want:
A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y
In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.
Without modifying your regex, you can set it to start matching at the beginning of the second half of the match after each match using .exec and manipulating the regex object's lastIndex property.
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
matches.push(found[0]);
reg.lastIndex -= found[0].split(':')[1].length;
}
console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
Demo
As per Bergi's comment, you can also get the index of the last match and increment it by 1 so it instead of starting to match from the second half of the match onwards, it will start attempting to match from the second character of each match onwards:
reg.lastIndex = found.index+1;
Demo
The final outcome is the same. Though, Bergi's update has a little less code and performs slightly faster. =]
You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var arr;
var results = [];
while ((arr = regex.exec(input)) !== null) {
results.push(arr[0] + arr[1]);
}
I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
Actually, it is possible to abuse replace method to do achieve the same result:
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var results = [];
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
results.push($0 + $1);
return '';
});
However, since it is replace, it does extra useless replacement work.
Unfortunately, it's not quite as simple as a single string.match.
The reason is that you want overlapping matches, which the /g flag doesn't give you.
You could use lookahead:
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
But now you get:
string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:
// using re from above to get the overlapping matches
var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
while ((m = re.exec(string)) !== null) {
// m is a match object, which has the index of the current match
matches.push(string.substring(m.index).match(re2)[0]);
}
matches == [
"A1B1Y:A1B2Y",
"A1B2Y:A1B3Y",
"A1B5Y:A1B6Y",
"A1B6Y:A1B7Y",
"A1B9Y:A1B10Y",
"A1B10Y:A1B11Y"
];
Here's a fiddle of this in action. Open up the console to see the results
Alternatively, you could split the original string on :, then loop through the resulting array, pulling out the the ones that match when array[i] and array[i+1] both match like you want.
Remove strings with numbers and special characters using regular expression.Here is my code
var string = "[Account0].&[1]+[Account1].&[2]+[Account2].&[3]+[Account3].&[4]";
var numbers = string.match(/(\d+)/gi);
alert(numbers.join(','));
here output is : 0,1,1,2,2,3,3,4
But i want the following output 1,2,3,4
Can any one please help me.
Thanks,
Seemd what you want is [\d+], use exec like this,
var myRe = /\[(\d+)\]/gi;
var myArray, numbers = [];
while ((myArray = myRe.exec(string)) !== null) {
numbers.push(myArray[1]);
}
http://jsfiddle.net/xE265/
You can do:
string = "[Account0].&[1]+[Account1].&[2]+[Account2].&[3]+[Account3].&[4]";
repl = string.replace(/.*?\[(\d+)\][^\[]*/g, function($0, $1) { return $1 });
//=> "1234"
I guess the simplest solution in this case would be:
\[(\d+)\]
simply saying that you only want the digits enclosed by brackets.
Regards
I have a text which goes like this...
var string = '~a=123~b=234~c=345~b=456'
I need to extract the string such that it splits into
['~a=123~b=234~c=345','']
That is, I need to split the string with /b=.*/ pattern but it should match the last found pattern. How to achieve this using RegEx?
Note: The numbers present after the equal is randomly generated.
Edit:
The above one was just an example. I did not make the question clear I guess.
Generalized String being...
<word1>=<random_alphanumeric_word>~<word2>=<random_alphanumeric_word>..~..~..<word2>=<random_alphanumeric_word>
All have random length and all wordi are alphabets, the whole string length is not fixed. the only text known would be <word2>. Hence I needed RegEx for it and pattern being /<word2>=.*/
This doesn't sound like a job for regexen considering that you want to extract a specific piece. Instead, you can just use lastIndexOf to split the string in two:
var lio = str.lastIndexOf('b=');
var arr = [];
var arr[0] = str.substr(0, lio);
var arr[1] = str.substr(lio);
http://jsfiddle.net/NJn6j/
I don't think I'd personally use a regex for this type of problem, but you can extract the last option pair with a regex like this:
var str = '~a=123~b=234~c=345~b=456';
var matches = str.match(/^(.*)~([^=]+=[^=]+)$/);
// matches[1] = "~a=123~b=234~c=345"
// matches[2] = "b=456"
Demo: http://jsfiddle.net/jfriend00/SGMRC/
Assuming the format is (~, alphanumeric name, =, and numbers) repeated arbitrary number of times. The most important assumption here is that ~ appear once for each name-value pair, and it doesn't appear in the name.
You can remove the last token by a simple replacement:
str.replace(/(.*)~.*/, '$1')
This works by using the greedy property of * to force it to match the last ~ in the input.
This can also be achieved with lastIndexOf, since you only need to know the index of the last ~:
str.substring(0, (str.lastIndexOf('~') + 1 || str.length() + 1) - 1)
(Well, I don't know if the code above is good JS or not... I would rather write in a few lines. The above is just for showing one-liner solution).
A RegExp that will give a result that you may could use is:
string.match(/[a-z]*?=(.*?((?=~)|$))/gi);
// ["a=123", "b=234", "c=345", "b=456"]
But in your case the simplest solution is to split the string before extract the content:
var results = string.split('~'); // ["", "a=123", "b=234", "c=345", "b=456"]
Now will be easy to extract the key and result to add to an object:
var myObj = {};
results.forEach(function (item) {
if(item) {
var r = item.split('=');
if (!myObj[r[0]]) {
myObj[r[0]] = [r[1]];
} else {
myObj[r[0]].push(r[1]);
}
}
});
console.log(myObj);
Object:
a: ["123"]
b: ["234", "456"]
c: ["345"]
(?=.*(~b=[^~]*))\1
will get it done in one match, but if there are duplicate entries it will go to the first. Performance also isn't great and if you string.replace it will destroy all duplicates. It would pass your example, but against '~a=123~b=234~c=345~b=234' it would go to the first 'b=234'.
.*(~b=[^~]*)
will run a lot faster, but it requires another step because the match comes out in a group:
var re = /.*(~b=[^~]*)/.exec(string);
var result = re[1]; //~b=234
var array = string.split(re[1]);
This method will also have the with exact duplicates. Another option is:
var regex = /.*(~b=[^~]*)/g;
var re = regex.exec(string);
var result = re[1];
// if you want an array from either side of the string:
var array = [string.slice(0, regex.lastIndex - re[1].length - 1), string.slice(regex.lastIndex, string.length)];
This actually finds the exact location of the last match and removes it regex.lastIndex - re[1].length - 1 is my guess for the index to remove the ellipsis from the leading side, but I didn't test it so it might be off by 1.
I'm having trouble with removing all characters up to and including the 3 third slash in JavaScript. This is my string:
http://blablab/test
The result should be:
test
Does anybody know the correct solution?
To get the last item in a path, you can split the string on / and then pop():
var url = "http://blablab/test";
alert(url.split("/").pop());
//-> "test"
To specify an individual part of a path, split on / and use bracket notation to access the item:
var url = "http://blablab/test/page.php";
alert(url.split("/")[3]);
//-> "test"
Or, if you want everything after the third slash, split(), slice() and join():
var url = "http://blablab/test/page.php";
alert(url.split("/").slice(3).join("/"));
//-> "test/page.php"
var string = 'http://blablab/test'
string = string.replace(/[\s\S]*\//,'').replace(/[\s\S]*\//,'').replace(/[\s\S]*\//,'')
alert(string)
This is a regular expression. I will explain below
The regex is /[\s\S]*\//
/ is the start of the regex
Where [\s\S] means whitespace or non whitespace (anything), not to be confused with . which does not match line breaks (. is the same as [^\r\n]).
* means that we match anywhere from zero to unlimited number of [\s\S]
\/ Means match a slash character
The last / is the end of the regex
var str = "http://blablab/test";
var index = 0;
for(var i = 0; i < 3; i++){
index = str.indexOf("/",index)+1;
}
str = str.substr(index);
To make it a one liner you could make the following:
str = str.substr(str.indexOf("/",str.indexOf("/",str.indexOf("/")+1)+1)+1);
You can use split to split the string in parts and use slice to return all parts after the third slice.
var str = "http://blablab/test",
arr = str.split("/");
arr = arr.slice(3);
console.log(arr.join("/")); // "test"
// A longer string:
var str = "http://blablab/test/test"; // "test/test";
You could use a regular expression like this one:
'http://blablab/test'.match(/^(?:[^/]*\/){3}(.*)$/);
// -> ['http://blablab/test', 'test]
A string’s match method gives you either an array (of the whole match, in this case the whole input, and of any capture groups (and we want the first capture group)), or null. So, for general use you need to pull out the 1th element of the array, or null if a match wasn’t found:
var input = 'http://blablab/test',
re = /^(?:[^/]*\/){3}(.*)$/,
match = input.match(re),
result = match && match[1]; // With this input, result contains "test"
let str = "http://blablab/test";
let data = new URL(str).pathname.split("/").pop();
console.log(data);
I'm trying to extract a substring from a file with JavaScript Regex. Here is a slice from the file :
DATE:20091201T220000
SUMMARY:Dad's birthday
the field I want to extract is "Summary". Here is the approach:
extractSummary : function(iCalContent) {
/*
input : iCal file content
return : Event summary
*/
var arr = iCalContent.match(/^SUMMARY\:(.)*$/g);
return(arr);
}
function extractSummary(iCalContent) {
var rx = /\nSUMMARY:(.*)\n/g;
var arr = rx.exec(iCalContent);
return arr[1];
}
You need these changes:
Put the * inside the parenthesis as
suggested above. Otherwise your matching
group will contain only one
character.
Get rid of the ^ and $. With the global option they match on start and end of the full string, rather than on start and end of lines. Match on explicit newlines instead.
I suppose you want the matching group (what's
inside the parenthesis) rather than
the full array? arr[0] is
the full match ("\nSUMMARY:...") and
the next indexes contain the group
matches.
String.match(regexp) is
supposed to return an array with the
matches. In my browser it doesn't (Safari on Mac returns only the full
match, not the groups), but
Regexp.exec(string) works.
You need to use the m flag:
multiline; treat beginning and end characters (^ and $) as working
over multiple lines (i.e., match the beginning or end of each line
(delimited by \n or \r), not only the very beginning or end of the
whole input string)
Also put the * in the right place:
"DATE:20091201T220000\r\nSUMMARY:Dad's birthday".match(/^SUMMARY\:(.*)$/gm);
//------------------------------------------------------------------^ ^
//-----------------------------------------------------------------------|
Your regular expression most likely wants to be
/\nSUMMARY:(.*)$/g
A helpful little trick I like to use is to default assign on match with an array.
var arr = iCalContent.match(/\nSUMMARY:(.*)$/g) || [""]; //could also use null for empty value
return arr[0];
This way you don't get annoying type errors when you go to use arr
This code works:
let str = "governance[string_i_want]";
let res = str.match(/[^governance\[](.*)[^\]]/g);
console.log(res);
res will equal "string_i_want". However, in this example res is still an array, so do not treat res like a string.
By grouping the characters I do not want, using [^string], and matching on what is between the brackets, the code extracts the string I want!
You can try it out here: https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_match_regexp
Good luck.
(.*) instead of (.)* would be a start. The latter will only capture the last character on the line.
Also, no need to escape the :.
You should use this :
var arr = iCalContent.match(/^SUMMARY\:(.)*$/g);
return(arr[0]);
this is how you can parse iCal files with javascript
function calParse(str) {
function parse() {
var obj = {};
while(str.length) {
var p = str.shift().split(":");
var k = p.shift(), p = p.join();
switch(k) {
case "BEGIN":
obj[p] = parse();
break;
case "END":
return obj;
default:
obj[k] = p;
}
}
return obj;
}
str = str.replace(/\n /g, " ").split("\n");
return parse().VCALENDAR;
}
example =
'BEGIN:VCALENDAR\n'+
'VERSION:2.0\n'+
'PRODID:-//hacksw/handcal//NONSGML v1.0//EN\n'+
'BEGIN:VEVENT\n'+
'DTSTART:19970714T170000Z\n'+
'DTEND:19970715T035959Z\n'+
'SUMMARY:Bastille Day Party\n'+
'END:VEVENT\n'+
'END:VCALENDAR\n'
cal = calParse(example);
alert(cal.VEVENT.SUMMARY);