Is there a way to do a substring in JavaScript but use string characters as the parameters for what you want to select?

substring takes two parameters, the index to start at and the index to stop at (the character at the stop index is not included), like so:
var str="Hello beautiful world!";
document.write(str.substring(3,7));
But is there a way to designate the start and end points as strings to look for? Instead of the start being 3, I would want it to be "lo", and instead of the end being 7, I would want it to be "wo", so I would get "lo beautiful wo". Is there a JavaScript function that already does this?

Sounds like you want to use regular expressions and string.match() instead:
var str="Hello beautiful world!";
document.write(str.match(/lo.*wo/)[0]); // writes "lo beautiful wo"
Note that match() returns an array (the match followed by any captured groups), or null if there is no match, so you should include a null check before indexing into the result.
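A minimal sketch of that null check; matchOrNull is a hypothetical helper name, not a built-in:

```javascript
// Return the text of the first match of `re` in `str`, or null when
// match() finds nothing (match() itself returns null in that case).
function matchOrNull(str, re) {
  var m = str.match(re);
  return m !== null ? m[0] : null;
}

console.log(matchOrNull("Hello beautiful world!", /lo.*wo/)); // "lo beautiful wo"
console.log(matchOrNull("Goodbye", /lo.*wo/));                 // null
```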
If you're not familiar with regexes, this is a pretty good source:
http://www.w3schools.com/jsref/jsref_obj_regexp.asp

Use the indexOf method for both ends: document.write(str.substring(str.indexOf('lo'), str.indexOf('wo') + 2)); (the + 2 is the length of 'wo', so the end marker is included).

Yup, you can do this easily with regular expressions:
var substr = /lo.+wo/.exec( 'Hello beautiful world!' )[0];
console.log( substr ); //=> 'lo beautiful wo'

Use a regex, brother:
if (/(lo.+wo)/.test("Hello beautiful world!")) {
document.write(RegExp.$1);
}
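You need a backup plan in case the string does not match, hence the use of test. (Note that RegExp.$1 is a deprecated legacy feature; keeping the array returned by exec is more robust.)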
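If you'd rather not rely on RegExp.$1, here is a sketch that keeps the result of exec instead; captureLoWo is a hypothetical name:

```javascript
// exec returns null when there is no match, so the null check plays the
// same "backup plan" role that test() plays in the answer above.
function captureLoWo(text) {
  var m = /(lo.+wo)/.exec(text);
  return m !== null ? m[1] : null; // m[1] is the first capture group
}

console.log(captureLoWo("Hello beautiful world!")); // "lo beautiful wo"
console.log(captureLoWo("no match here"));          // null
```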

Regular expressions can achieve this to some extent, but there are several details you must be aware of.
For example, suppose you want to find all the substrings that start with "lo" and end with the nearest "wo" after that "lo". (If there is more than one match, each subsequent match starts at the first "lo" after the "wo" of the previous match.)
"Hello beautiful world!".match(/lo.*?wo/g);
Using the RegExp constructor, you can make it more flexible (you can substitute "lo" and "wo" with the actual string you want to find):
"Hello beautiful world!".match(new RegExp("lo" + ".*?" + "wo", "g"));
Important: the downside of the RegExp approach above is that you need to know which characters are special so that you can escape them; otherwise, the pattern will not match the literal substring you want to find.
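One common way to handle the escaping is a small helper like the escapeRegExp function shown in MDN's regular expressions guide. findAllBetween is a hypothetical name that wraps it around the lazy pattern from the example above:

```javascript
// Escape every character that has special meaning in a RegExp pattern,
// so the string is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// All shortest substrings from `start` to the nearest following `end`.
function findAllBetween(str, start, end) {
  var re = new RegExp(escapeRegExp(start) + ".*?" + escapeRegExp(end), "g");
  return str.match(re) || []; // match() returns null when nothing is found
}

console.log(findAllBetween("Hello beautiful world!", "lo", "wo")); // ["lo beautiful wo"]
console.log(findAllBetween("f(1) + f(22)", "f(", ")"));            // ["f(1)", "f(22)"]
```

Note that . does not match line breaks, so a start marker and an end marker on different lines will not be paired by this pattern.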
It can also be achieved with indexOf, albeit a little less cleanly. For the first substring:
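var startIndex = str.indexOf(startString);
// Look for endString only after the whole of startString
var endIndex = str.indexOf(endString, startIndex + startString.length);
var result = null;
if (startIndex >= 0 && endIndex >= 0)
  result = str.substring(startIndex, endIndex + endString.length);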
If you want the substring that starts with the first "lo" and ends with the last "wo" in the string, you can use indexOf and lastIndexOf (a small modification of the code above). A RegExp can also do it: change .*? to .* in the two examples above (there will then be at most one match, so the "g" flag at the end is redundant).
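A sketch of the indexOf/lastIndexOf variant, assuming you want null when there is no valid span; widestBetween is a hypothetical name:

```javascript
// Substring from the first `startString` to the last `endString`, inclusive.
function widestBetween(str, startString, endString) {
  var startIndex = str.indexOf(startString);
  var endIndex = str.lastIndexOf(endString);
  // Reject a missing marker, or a last endString that starts before startString ends.
  if (startIndex < 0 || endIndex < startIndex + startString.length) {
    return null;
  }
  return str.substring(startIndex, endIndex + endString.length);
}

console.log(widestBetween("lo one wo, lo two wo!", "lo", "wo")); // "lo one wo, lo two wo"
```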

Related

Extract specific chars from a string using a regex

I need to split an email address and take the first character and the first character after the '#'.
I can do this as follows:
'bar#foo'.split('#').map(function(a){ return a.charAt(0); }).join('')
--> bf
Now I was wondering if it can be done using a regex match, something like this:
'bar#foo'.match(/^(\w).*?#(\w)/).join('')
--> bar#fbf
Not really what I want, but I'm sure I'm missing something here! Any suggestions?
Why use a regex for this? Just use indexOf to find the '#' and read the characters at the positions you need:
var addr = 'foo#bar';
console.log(addr[0], addr[addr.indexOf('#')+1])
To ensure your code works on all browsers, you might want to use charAt instead of []:
console.log(addr.charAt(0), addr.charAt(addr.indexOf('#')+1));
Either way, it will work just fine, and it is almost certainly faster than a regex.
If you insist on a regex, then you should realize that in your case the match method returns an array containing 3 strings:
/^(\w).*?#(\w)/
["the whole match", // start of string + first char + .*?# + first char after #
"group 1 \w", // first char
"group 2 \w" // first char after #
]
So addr.match(/^(\w).*?#(\w)/).slice(1).join('') is probably what you want.
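A minimal sketch of that approach with a null guard, for input that may not contain a '#'; initials is a hypothetical name:

```javascript
// First character of the string plus the first character after '#'.
function initials(addr) {
  var m = addr.match(/^(\w).*?#(\w)/);
  // Drop m[0] (the whole match) and keep only the two captured groups.
  return m !== null ? m.slice(1).join("") : null;
}

console.log(initials("bar#foo")); // "bf"
console.log(initials("barfoo"));  // null
```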
If I understand correctly, you are quite close. Just don't join everything returned by match because the first element is the entire matched string.
'bar#foo'.match(/^(\w).*?#(\w)/).splice(1).join('')
--> bf
Using regex:
var matched = "";
'abc#xyz'.replace(/(?:^|#)(\w)/g, function($0, $1) { matched += $1; return $0; });
console.log(matched);
// ax
Without the g flag, match returns an array whose first element is the full text of the match, followed by each captured group. In your case, it returns this:
["bar#f", "b", "f"]
To get rid of the first item (the full match), use slice:
'bar#foo'.match(/^(\w).*?#(\w)/).slice(1).join('')
Use String.prototype.replace with regular expression:
'bar#foo'.replace(/^(\w).*#(\w).*$/, '$1$2'); // "bf"
Or, using a stricter regex that validates the whole address (groups 1 and 2 capture the two characters you want):
^([a-zA-Z0-9])[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+#([a-zA-Z0-9-])[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$

Moving index in JavaScript regex matching

I have this regex to extract pairs of consecutive words from text:
/[A-Za-z]+\s[A-Za-z]+/g
And this sample text
Mary had a little lamb
My output is this
[0] - Mary had; [1] - a little;
Whereas my expected output is this:
[0] - Mary had; [1] - had a; [2] - a little; [3] - little lamb
How can I achieve this output? As I understand it, the index of the search moves to the end of the first match. How can I move it back one word?
Abusing String.replace function
I use a little trick with the replace function. Since replace loops through the matches and lets us pass a callback, we can do anything we like with each match. The result will be in output.
var output = [];
var str = "Mary had a little lamb";
str.replace(/[A-Za-z]+(?=(\s[A-Za-z]+))/g, function ($0, $1) {
output.push($0 + $1);
return $0; // Actually we don't care. You don't even need to return
});
Since consecutive pairs overlap in the input string, we must not consume the next word while matching the current one; a look-ahead achieves that.
The regex /[A-Za-z]+(?=(\s[A-Za-z]+))/g does exactly that: the [A-Za-z]+ portion at the start consumes one word at a time, and the look-ahead (?=(\s[A-Za-z]+)) checks for the next word and captures it without consuming it.
The function passed to replace receives the matched string as its first argument and the captured groups as the following arguments (there are more arguments, see the documentation, but I don't need them here). Since the look-ahead is zero-width (it consumes no input), the whole match is conveniently just the first word, and the text captured inside the look-ahead arrives as the second argument.
Proper solution with RegExp.exec
Note that String.replace builds a replacement string that is never used, which is wasted work. If that is unacceptable, you can rewrite the code above with RegExp.exec in a loop:
var output = [];
var str = "Mary had a little lamb";
var re = /[A-Za-z]+(?=(\s[A-Za-z]+))/g;
var arr;
while ((arr = re.exec(str)) != null) {
output.push(arr[0] + arr[1]);
}
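A different technique, not used in the answers here: keep the question's original /[A-Za-z]+\s[A-Za-z]+/g pattern (with both words captured) and, after each match, move lastIndex back to the start of the second word, which literally "moves the index back one word". pairsByLastIndex is a hypothetical name:

```javascript
function pairsByLastIndex(str) {
  var re = /([A-Za-z]+)\s([A-Za-z]+)/g;
  var out = [];
  var m;
  while ((m = re.exec(str)) !== null) {
    out.push(m[0]);
    // Restart the next search at the second word of this pair.
    // It always lies after m.index, so the loop cannot stall.
    re.lastIndex = m.index + m[0].length - m[2].length;
  }
  return out;
}

console.log(pairsByLastIndex("Mary had a little lamb"));
// ["Mary had", "had a", "a little", "little lamb"]
```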
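Footnotes
In regex flavors that support variable-width look-behind, it is possible to retrieve the previous word instead. When this answer was written, JavaScript regex did not support look-behind at all; ES2018 added look-behind assertions ((?<=...) and (?<!...)), and they may be variable-width.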
(?=pattern) is syntax for look-ahead.
Appendix
String.match can't be used here, since it ignores capturing groups when the g flag is used. The capturing group is necessary because we rely on a look-ahead to avoid consuming input (so that overlapping text can be matched), and the look-ahead's text is only available through the group.
It can be done without a regexp:
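In engines that support ES2020, String.prototype.matchAll does return each match's capture groups (it requires the g flag), so the same look-ahead regex can be used without abusing replace. A sketch; pairsByMatchAll is a hypothetical name:

```javascript
function pairsByMatchAll(str) {
  // Each match is the first word; group 1 is the look-ahead's " nextword".
  return Array.from(str.matchAll(/[A-Za-z]+(?=(\s[A-Za-z]+))/g), function (m) {
    return m[0] + m[1];
  });
}

console.log(pairsByMatchAll("Mary had a little lamb"));
// ["Mary had", "had a", "a little", "little lamb"]
```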
"Mary had a little lamb".split(" ")
.map(function(item, idx, arr) {
if(idx < arr.length - 1){
return item + " " + arr[idx + 1];
}
}).filter(function(item) {return item;})
Here's a non-regex solution (it's not really a regular problem).
function pairs(str) {
var parts = str.split(" "), out = [];
for (var i=0; i < parts.length - 1; i++)
out.push([parts[i], parts[i+1]].join(' '));
return out;
}
Pass your string and you get an array back.
Side note: if you're worried about non-words in your input (making a case for regular expressions!), you can run tests on parts[i] and parts[i+1] inside the for loop. If the tests fail, don't push them onto out.
Here's an approach you might like:
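A sketch of that side note, assuming a "word" means a run of ASCII letters (adjust isWord to your own definition):

```javascript
function pairs(str) {
  var parts = str.split(" "), out = [];
  // Assumption: words are ASCII letters only. No g flag, so test() is stateless.
  var isWord = /^[A-Za-z]+$/;
  for (var i = 0; i < parts.length - 1; i++) {
    // Only keep the pair when both neighbours pass the test.
    if (isWord.test(parts[i]) && isWord.test(parts[i + 1])) {
      out.push(parts[i] + " " + parts[i + 1]);
    }
  }
  return out;
}

console.log(pairs("Mary had 42 little lambs")); // ["Mary had", "little lambs"]
```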
var s = "Mary had a little lamb";
// Break on each word and loop
s.match(/\w+/g).map(function(w) {
// Get the word, a space and another word
return s.match(new RegExp(w + '\\s\\w+'));
// At this point, there is one "null" value (the last word), so filter it out
}).filter(Boolean)
// There, we have an array of matches -- we want the matched value, i.e. the first element
.map(Array.prototype.shift.call.bind(Array.prototype.shift));
If you run this in your console, you'll see ["Mary had", "had a", "a little", "little lamb"].
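This way, you keep your original regex and can still do whatever else you need with it, though it takes some code around it to really work. One caveat: each pair is found by searching the whole string for the word again, so a repeated word (as in "the cat sat on the mat") always yields the pair from its first occurrence.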
By the way, this code is not cross-browser. The following functions are not supported in IE8 and below:
Array.prototype.filter
Array.prototype.map
Function.prototype.bind
But they're easily shimmable. Or the same functionality is easily achievable with for.
Here we go:
It seems you're not sure how the regular expression's internal pointer works, so let me explain with a small example.
Take Mary had a little lamb and the regex /[A-Za-z]+\s[A-Za-z]+/g.
The first part of the regex, [A-Za-z]+, matches Mary, so the pointer ends up just after the y:
Mary had a little lamb
    ^
The next part, \s[A-Za-z]+, matches a space followed by another word, so...
Mary had a little lamb
        ^
...the pointer is now where the word had ends. That's your problem: you are advancing the regex's internal pointer without meaning to. How do you solve it? Lookaround is your friend. Lookarounds (lookahead and lookbehind) let you check the surrounding text without advancing the regex's main internal pointer.
So, in the end, the regular expression that matches what you want is: ([A-Za-z]+(?=\s[A-Za-z]+))
Explanation:
The only part you may not know is (?=\s[A-Za-z]+). It means that [A-Za-z]+ must be followed by a space and another word, otherwise the regex won't match. That is exactly what you want: the internal pointer is not advanced past the next word, so every word matches except the last one, which isn't followed by a word.
Once you have that, you only have to adapt the code you have right now. Note that this pattern matches only the first word of each pair; to get the second word too, capture it inside the look-ahead, as in ([A-Za-z]+)(?=\s([A-Za-z]+)).
In full admiration of the concept of look-ahead, I still propose a pairwise function, since a regex's real job is to tokenize a character stream; deciding what to do with the tokens is up to the business logic. At least, that's my opinion.
It's a shame JavaScript doesn't have a pairwise yet, but this could do it:
function pairwise(a, f) {
for (var i = 0; i < a.length - 1; i++) {
f(a[i], a[i + 1]);
}
}
var str = "Mary had a little lamb";
pairwise(str.match(/\w+/g), function(a, b) {
document.write("<br>"+a+" "+b);
});

Javascript regex match for string "game_1"

I just can't get this to work in JavaScript. I have the text "game_1" (without the quotes) and I want to get the number out of it. I tried this:
var idText = "game_1";
re = /game_(.*?)/;
found = idText.match(re);
var ajdi = found[1];
alert( ajdi );
But it doesn't work. Please point out where I am going wrong.
If you're only matching a number, you may want to try
/game_([0-9]+)/
as your regular expression. That will match one or more digits, which seems to be what you need. Your regexp allows zero characters (*) and asks for the shortest possible match (?), so the group matches zero characters and found[1] is an empty string.
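A sketch of the suggested pattern with a guard for input that doesn't match; gameId is a hypothetical name, and converting with Number is a choice, not part of the answer:

```javascript
function gameId(idText) {
  var found = idText.match(/game_([0-9]+)/);
  return found !== null ? Number(found[1]) : null;
}

// The lazy pattern from the question captures an empty string:
console.log(JSON.stringify("game_1".match(/game_(.*?)/)[1])); // ""
console.log(gameId("game_1")); // 1
console.log(gameId("game_"));  // null
```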
If this is the complete text, then there is no need for regular expressions:
var id = +str.split('_')[1];
or
var id = +str.replace('game_', '');
(unary + is to convert the string to a number)
If you insist on a regular expression, you have to anchor the expression:
/^game_(.*?)$/
or make the * greedy by omitting the ?:
/game_(.*)/
Better still, make the expression more restrictive, as #Naltharial suggested.
Simple string manipulation:
var idText = "game_1",
adji = parseInt(idText.substring(5), 10);
* means zero or more occurrences. Following it with ? makes it lazy (as few as possible), so here it matches zero characters.
You could replace * with + (which means one or more occurrences), but as #Felix Kling notes, it would only match one digit.
Better to ditch the ? completely.
http://jsfiddle.net/G8Qt7/2/
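Try "game_1".replace(/^(game_)/, '')
This returns the number as a string ("1"); convert it with + or parseInt if you need a number.
You can simply use the regex /\d+/ to get the first run of digits in your string, e.g. "game_1".match(/\d+/)[0].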

Regex equivalent to str.substr(0, str.indexOf('foo'))

Given this string:
var str = 'A1=B2;C3,D0*E9+F6-';
I would like to retrieve the substring that goes from the beginning of the string up to 'D0*' (excluding), in this case:
'A1=B2;C3,'
I know how to achieve this using the combination of the substr and indexOf methods:
str.substr(0, str.indexOf('D0*'))
Live demo: http://jsfiddle.net/simevidas/XSu22/
However, this is obviously not the best solution since it contains a redundancy (the str name has to be written twice). This redundancy can be avoided by using the match method together with a regular expression that captures the substring:
str.match(/???/)[1]
Which regular expression literal do we have to pass into match to ensure that the correct substring is returned?
My guess is this: /(.*)D0\*/ (and that works), but my experience with regular expressions is rather limited, so I'm going to need a confirmation...
Try this:
/(.*?)D0\*/.exec(str)[1]
Or:
str.match(/(.*?)D0\*/)[1]
? directly following a quantifier makes the quantifier non-greedy (it matches the minimum instead of the maximum number of repetitions allowed).
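To see what the ? changes, here is a sketch using the question's string with a second "D0*" appended (that extra marker is only for illustration):

```javascript
var str = "A1=B2;C3,D0*E9+F6-D0*";

// Lazy: stops at the FIRST "D0*".
console.log(str.match(/(.*?)D0\*/)[1]);    // "A1=B2;C3,"
// Greedy: backtracks only as far as the LAST "D0*".
console.log(str.match(/(.*)D0\*/)[1]);     // "A1=B2;C3,D0*E9+F6-"
// Look-ahead version: no capture group, so use [0].
console.log(str.match(/^.*?(?=D0\*)/)[0]); // "A1=B2;C3,"
```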
/^(.+?)D0\*/
Try it here: http://rubular.com/r/TNTizJLSn9
/^.*(?=D0\*)/
You can use a numbered capture group, like in your example:
/^(.*?)foo/
It means something like:
Capture everything from the start of the string in a group (the 0 in substr)
Stop at the first foo, without capturing it (the indexOf)
Then call match and take the group:
'hello foo bar foo bar'.match(/^(.*?)foo/)[1]; // will return "hello "
This runs on the string and returns the first (and only) captured group. Using [0] instead of [1] would return the whole match, including foo.
Bye :)

Using javascript regexp to find the first AND longest match

I have a RegExp like the following simplified example:
var exp = /he|hell/;
When I run it on a string, it gives me the first match, e.g.:
var str = "hello world";
var match = exp.exec(str);
// match contains ["he"];
I want the first and longest possible match, i.e. sorted by index, then by length.
Since the expression is combined from an array of RegExps, I am looking for a way to find the longest match without having to rewrite the regular expression.
Is that even possible?
If it isn't, I am looking for a way to easily analyze the expression and arrange it in the proper order. But I can't figure out how, since the expressions could be a lot more complex, e.g.:
var exp = /h..|hel*/
How about /hell|he/ ?
All regex implementations I know of will (try to) match characters/patterns from left to right and terminate whenever they find an over-all match.
In other words: if you want to make sure you get the longest possible match, you'll need to try all your patterns (separately), store all matches and then get the longest match from all possible matches.
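A sketch of that idea, assuming you still have the original array of RegExps and that none of them uses the g or y flag (exec would otherwise start from lastIndex); firstLongestMatch is a hypothetical name. It orders candidates by index, then by length, as the question asks; within each single pattern, the engine's normal left-to-right rules still decide what that pattern returns.

```javascript
function firstLongestMatch(str, patterns) {
  var best = null;
  patterns.forEach(function (re) {
    var m = re.exec(str);
    if (m === null) return; // this pattern does not match at all
    // Prefer the earliest match; on a tie, prefer the longer one.
    if (best === null || m.index < best.index ||
        (m.index === best.index && m[0].length > best[0].length)) {
      best = m;
    }
  });
  return best !== null ? best[0] : null;
}

console.log(firstLongestMatch("hello world", [/he/, /hell/]));  // "hell"
console.log(firstLongestMatch("hello world", [/h../, /hel*/])); // "hell"
```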
You can do it. It's explained here:
http://www.regular-expressions.info/alternation.html
(In summary, reorder the alternatives so the longer one comes first, or group the optional second part and follow it with a question mark, as in /he(ll)?/.)
You cannot do "longest match" (or anything involving counting, minus look-aheads) with regular expressions.
Your best bet is to find all matches, and simply compare the lengths in the program.
I don't know if this is what you're looking for (considering this question is almost 8 years old...), but here are my two cents:
(Putting hell before he makes the engine try the longer alternative first.)
var exp = /hell|he/;
var str = "hello world";
var match = exp.exec(str);
if(match)
{
match.sort(function(a, b){return b.length - a.length;});
console.log(match[0]);
}
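Here match[0] is "hell" because that alternative is tried first. Note that exec returns only one match (plus any capture groups), so the sort does not compare different candidate matches; the order of the alternatives is what does the work.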
