search() string for multiple occurrences - javascript

Say you have the string, Black cat jack black cat jack black cat jack.
How would you use search() to find the 2nd occurrence of the word jack?
I'm guessing the code would look something like:
var str = "Black cat jack black cat jack black cat jack";
var jack = str.search('jack');
But that will only return the location of the first occurrence of jack in the string.

You can use the indexOf method in a loop:
var pos = str.indexOf("jack");
while (pos > -1) {
    // pos is the index of the current occurrence; use it here
    pos = str.indexOf("jack", pos + 1);
}
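To get at the 2nd occurrence specifically, the loop above can be stopped after a fixed number of hits. A minimal sketch; nthIndexOf is a hypothetical helper name, not a built-in:

```javascript
// Hypothetical helper: index of the n-th occurrence (1-based) of a fixed
// substring, or -1 if there are fewer than n occurrences.
function nthIndexOf(str, searchStr, n) {
  let pos = -1;
  for (let count = 0; count < n; count++) {
    pos = str.indexOf(searchStr, pos + 1);
    if (pos === -1) {
      return -1;
    }
  }
  return pos;
}

const str = "Black cat jack black cat jack black cat jack";
console.log(nthIndexOf(str, "jack", 2)); // 25
```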

Usage recommendation
Note that String.search works with RegExp: if you supply a string, it is implicitly converted into a RegExp. It serves more or less the same purpose as RegExp.test, where you only want to know whether the string contains a match, except that it returns the index of the first match (or -1) instead of a boolean.
If you want to search for a fixed string, I recommend that you stick with String.indexOf. If you really want to work with a pattern, use RegExp.exec instead to get the indices of all the matches.
String.indexOf
If you are searching for a fixed string, then you can supply the position to resume searching to String.indexOf:
str.indexOf(searchStr, lastMatch + searchStr.length);
I add searchStr.length to prevent overlapping matches, e.g. searching for abab in abababacccc, there will be only 1 match found if I add searchStr.length. Change it to + 1 if you want to find all matches, regardless of overlapping.
Full example:
var lastMatch;
var result = [];
if ((lastMatch = str.indexOf(searchStr)) >= 0) {
result.push(lastMatch);
while ((lastMatch = str.indexOf(searchStr, lastMatch + searchStr.length)) >= 0) {
result.push(lastMatch);
}
}
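To see the difference between the two step sizes on the abab example, here is a small sketch. findAllIndices and its allowOverlap parameter are hypothetical names introduced only for illustration:

```javascript
// Hypothetical helper: collect every index at which searchStr occurs.
// allowOverlap = false steps by searchStr.length; true steps by 1.
function findAllIndices(str, searchStr, allowOverlap) {
  const result = [];
  if (searchStr.length === 0) {
    return result; // an empty needle matches everywhere; treat it as no match
  }
  const step = allowOverlap ? 1 : searchStr.length;
  let lastMatch = str.indexOf(searchStr);
  while (lastMatch >= 0) {
    result.push(lastMatch);
    lastMatch = str.indexOf(searchStr, lastMatch + step);
  }
  return result;
}

console.log(findAllIndices("abababacccc", "abab", false)); // [0]
console.log(findAllIndices("abababacccc", "abab", true));  // [0, 2]
```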
RegExp.exec
This is to demonstrate the usage. For fixed string, use String.indexOf instead - you don't need the extra overhead with RegExp in fixed string case.
As an example for RegExp.exec:
// Need g flag to search for all occurrences
var re = /jack/g;
var arr;
var result = [];
while ((arr = re.exec(str)) !== null) {
result.push(arr.index);
}
Note that the example above will give you non-overlapping matches. You need to set re.lastIndex if you want to find overlapping matches (no such thing for "jack" as search string, though).
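For a pattern that can overlap with itself, a sketch of the lastIndex adjustment could look like this (reusing the abab example from the indexOf section; the variable names are only illustrative):

```javascript
// Collect match indices, rewinding lastIndex so that matches may overlap.
const re = /abab/g;
const input = "abababacccc";
const indices = [];
let m;
while ((m = re.exec(input)) !== null) {
  indices.push(m.index);
  // Resume one character after the start of this match, not after its end.
  re.lastIndex = m.index + 1;
}
console.log(indices); // [0, 2]
```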

I've figured out this solution: call a function that searches and replaces in the original string recursively, until no more occurrences are found:
function ReplaceUnicodeChars(myString) {
    var pos = myString.search("&#");
    if (pos != -1) {
        // alert("Found unicode char in string " + myString + ", position " + pos);
        // Assumes every reference is exactly 6 characters with 3 digits, e.g. "&#233;"
        var unicodeChars = myString.substr(pos, 6);
        var decimalChars = unicodeChars.substr(2, 3);
        myString = myString.replace(unicodeChars, String.fromCharCode(decimalChars));
    }
    if (myString.search("&#") != -1)
        // Keep calling the function until there are no more unicode chars
        myString = ReplaceUnicodeChars(myString);
    return myString;
}
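As an alternative to the recursion, the same decoding can probably be done in one pass with a global regex and a replacer callback. This is a different technique from the function above, sketched under the assumption that every reference is decimal and ends with a semicolon; decodeNumericEntities is a hypothetical name:

```javascript
// Swapped-in technique: one global replace with a callback instead of recursion.
// Handles any number of decimal digits; hex references (&#x..;) are not covered.
function decodeNumericEntities(text) {
  return text.replace(/&#(\d+);/g, function (match, digits) {
    return String.fromCodePoint(parseInt(digits, 10));
  });
}

console.log(decodeNumericEntities("caf&#233; &#8364;5")); // "café €5"
```

String.fromCodePoint throws a RangeError for values above 0x10FFFF, so untrusted input may need an extra range check.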

Related

Regex extracting multiple matches for string [duplicate]

I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.
Variables:
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
Code:
var match = string.match(reg);
All matched results I get:
A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y
Matched results I want:
A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y
In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.
Without modifying your regex, you can set it to start matching at the beginning of the second half of the match after each match using .exec and manipulating the regex object's lastIndex property.
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
matches.push(found[0]);
reg.lastIndex -= found[0].split(':')[1].length;
}
console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
As per Bergi's comment, you can also take the index of the last match and increment it by 1, so that instead of resuming from the second half of the match, the regex resumes from the second character of each match:
reg.lastIndex = found.index+1;
The final outcome is the same. Though, Bergi's update has a little less code and performs slightly faster. =]
You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var arr;
var results = [];
while ((arr = regex.exec(input)) !== null) {
results.push(arr[0] + arr[1]);
}
I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
Actually, it is possible to abuse the replace method to achieve the same result:
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var results = [];
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
results.push($0 + $1);
return '';
});
However, since it is replace, it does extra useless replacement work.
Unfortunately, it's not quite as simple as a single string.match.
The reason is that you want overlapping matches, which the /g flag doesn't give you.
You could use lookahead:
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
But now you get:
string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:
// using re from above to get the overlapping matches
var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
while ((m = re.exec(string)) !== null) {
// m is a match object, which has the index of the current match
matches.push(string.substring(m.index).match(re2)[0]);
}
// matches now contains:
// [
//   "A1B1Y:A1B2Y",
//   "A1B2Y:A1B3Y",
//   "A1B5Y:A1B6Y",
//   "A1B6Y:A1B7Y",
//   "A1B9Y:A1B10Y",
//   "A1B10Y:A1B11Y"
// ]
Alternatively, you could split the original string on :, then loop through the resulting array, pulling out the pairs where array[i] and array[i+1] both match the way you want.
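A rough sketch of that split-and-pair idea, assuming each colon-separated piece should match A<digits>B<digits>Y as a whole (the anchored regex and the variable names are my own additions):

```javascript
const input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
const part = /^A[0-9]+B[0-9]+Y$/; // one colon-separated piece, anchored at both ends
const pieces = input.split(':');
const pairs = [];
for (let i = 0; i + 1 < pieces.length; i++) {
  // Keep a pair only when both neighbours match on their own.
  if (part.test(pieces[i]) && part.test(pieces[i + 1])) {
    pairs.push(pieces[i] + ':' + pieces[i + 1]);
  }
}
console.log(pairs);
// ["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
```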

Find indexOf character after certain index

Pretty basic but I'm afraid I'm overlooking a simple solution. I have the following string ... IBAN: NL56INGB06716xxxxx ...
I need the account number, so I'm looking for indexOf("IBAN: "), but now I need to find the next space/whitespace char after that index.
I don't really think I would need a loop for this but it's the best I can come up with. Regex capture group maybe better? How would I do that?
From MDN String.prototype.indexOf
str.indexOf(searchValue[, fromIndex])
fromIndex
Optional. The location within the calling string to start the search from. It can be any integer. The default value is 0.
n.b. .indexOf will only look for a specific substring, if you want to find a choice from many characters, you will either need to loop and compare or use RegExp
Gracious example
var haystack = 'foo_ _IBAN: Bar _ _';
var needle = 'IBAN: ',
    i = haystack.indexOf(needle),
    j;
if (i === -1) {
    // no match, do something special
    console.warn('One cannot simply find a needle in a haystack');
} else {
    j = haystack.indexOf(' ', i + needle.length);
    // now we have both matches, we can do something fancy
    if (j === -1) {
        j = haystack.length; // no match, set to end?
    }
    haystack.slice(i + needle.length, j); // "Bar"
}
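For the "choice from many characters" case mentioned in the note above, one option is a global RegExp whose lastIndex is set to the starting position. A minimal sketch; indexOfPattern is a hypothetical helper, and the sample string assumes the needle is present:

```javascript
// Hypothetical helper: index of the first match of `pattern` at or after
// `fromIndex`, or -1. A fresh global copy is made so the caller's regex
// (and its lastIndex) is left untouched.
function indexOfPattern(str, pattern, fromIndex) {
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  re.lastIndex = fromIndex;
  const m = re.exec(str);
  return m === null ? -1 : m.index;
}

const haystack = 'foo IBAN: NL56INGB06716xxxxx bar';
const needle = 'IBAN: ';
const start = haystack.indexOf(needle) + needle.length; // assumes needle was found
let end = indexOfPattern(haystack, /\s/, start);
if (end === -1) {
  end = haystack.length; // no whitespace after the value: take the rest
}
console.log(haystack.slice(start, end)); // "NL56INGB06716xxxxx"
```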
While you can pass a starting index as Paul suggested, it would seem that a simple regex may just be easier.
var re = /IBAN:\s*(\S+)/
The capture group will hold the sequence of non-whitespace characters after the IBAN:
var match = re.exec(my_str)
if (match) {
console.log(match[1]);
}

Javascript Remove strings in beginning and end

Based on the following strings:
...here..
..there...
.their.here.
How can I remove the . at the beginning and end of each string, the way trim removes whitespace, using JavaScript?
The output should be:
here
there
their.here
These are the reasons why the RegEx for this task is /(^\.+|\.+$)/mg:
Inside /()/ is where you write the pattern of the substring you want to find in the string:
/(ol)/ This will find the substring ol in the string.
var x = "colt".replace(/(ol)/, 'a'); will give you x == "cat";
The ^\.+|\.+$ in /()/ is separated into 2 parts by the symbol | [means or]
^\.+ and \.+$
^\.+ means to find as many . as possible at the start.
^ means at the start; \ escapes the character that follows it; adding + after a character means to match one or more of that character.
\.+$ means to find as many . as possible at the end.
$ means at the end.
The m after /()/ enables multiline mode: if the string contains newline or carriage return characters, ^ and $ also match at the start and end of each line, instead of only at the start and end of the whole string.
The g after /()/ performs a global match, so it finds all matches rather than stopping after the first one.
To learn more about RegEx you can check out this guide.
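To make the effect of the m flag concrete, here is a small comparison on the strings from the question joined with newlines (JSON.stringify is only used so the newlines are visible in the console):

```javascript
const text = '...here..\n..there...\n.their.here.';

// Without m: ^ and $ only match at the very start and end of the whole string.
console.log(JSON.stringify(text.replace(/(^\.+|\.+$)/g, '')));
// "here..\n..there...\n.their.here"

// With m: ^ and $ also match at the start and end of every line.
console.log(JSON.stringify(text.replace(/(^\.+|\.+$)/mg, '')));
// "here\nthere\ntheir.here"
```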
Try to use the following regex
var text = '...here..\n..there...\n.their.here.';
var replaced = text.replace(/(^\.+|\.+$)/mg, '');
Use Regex /(^\.+|\.+$)/mg
^ represents the start
\.+ one or more full stops
$ represents the end
so:
var text = '...here..\n..there...\n.their.here.';
alert(text.replace(/(^\.+|\.+$)/mg, ''));
Here is a non-regular-expression answer which extends String.prototype (needle is a single character):
String.prototype.strim = function(needle){
    var first_pos = 0;
    var last_pos = this.length;
    // skip leading needle chars; stops at the first non-needle char
    while (first_pos < last_pos && this.charAt(first_pos) === needle) {
        first_pos++;
    }
    // skip trailing needle chars; stops just after the last non-needle char
    while (last_pos > first_pos && this.charAt(last_pos - 1) === needle) {
        last_pos--;
    }
    // a string made only of needle chars ends up as ""
    return this.substring(first_pos, last_pos);
}
alert("...here..".strim('.'));
alert("..there...".strim('.'))
alert(".their.here.".strim('.'))
alert("hereagain..".strim('.'))
and see it working here : http://jsfiddle.net/cettox/VQPbp/
Slightly more code-golfy, if not readable, non-regexp prototype extension:
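String.prototype.strim = function(needle) {
    var out = String(this);
    if (!needle) return out; // an empty needle would loop forever
    while (0 === out.indexOf(needle))
        out = out.substr(needle.length);
    // the length check stops the loop once out is shorter than needle
    // (lastIndexOf returning -1 would otherwise match on an empty string)
    while (out.length >= needle.length &&
           out.length === out.lastIndexOf(needle) + needle.length)
        out = out.slice(0, out.length - needle.length);
    return out;
}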
var spam = "this is a string that ends with thisthis";
alert("#" + spam.strim("this") + "#");
Use RegEx with javaScript Replace
var res = s.replace(/(^\.+|\.+$)/mg, '');
We can use the replace() method to remove an unwanted substring from a string.
Example:
var str = "<pre>I'm big fan of Stackoverflow</pre>";
// replace() returns a new string, so the result has to be assigned
str = str.replace(/<pre>/g, '').replace(/<\/pre>/g, '');
console.log(str);
output:
I'm big fan of Stackoverflow

Why is my RegExp ignoring start and end of strings?

I made this helper function to find single words, that are not part of bigger expressions
it works fine on any word that is NOT first or last in a sentence, why is that?
is there a way to add "" to regexp?
String.prototype.findWord = function(word) {
var startsWith = /[\[\]\.,-\/#!$%\^&\*;:{}=\-_~()\s]/ ;
var endsWith = /[^A-Za-z0-9]/ ;
var wordIndex = this.indexOf(word);
if (startsWith.test(this.charAt(wordIndex - 1)) &&
endsWith.test(this.charAt(wordIndex + word.length))) {
return wordIndex;
}
else {return -1;}
}
Also, any improvement suggestions for the function itself are welcome!
UPDATE: example: I want to find the word able in a string. I want it to work in cases like [able] able, #able1 etc., but not in cases where it is part of another word like disable, enable etc.
A different version:
String.prototype.findWord = function(word) {
return this.search(new RegExp("\\b"+word+"\\b"));
}
Your if will only evaluate to true if endsWith matches after the word. But the last word of a sentence ends with a full stop, which won't match your alphanumeric expression.
Did you try word boundary -- \b?
There is also \w, which matches one word character ([A-Za-z0-9_]) -- this could help you too (depends on your word definition).
See RegExp docs for more details.
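A quick sketch of how \b behaves on the cases from the question. findWordBoundary is a hypothetical standalone name, and word is assumed to contain only word characters, so it is not escaped:

```javascript
// Hypothetical variant using \b on both sides of the word.
function findWordBoundary(text, word) {
  return text.search(new RegExp('\\b' + word + '\\b'));
}

console.log(findWordBoundary('able to', 'able'));        // 0
console.log(findWordBoundary('[able]', 'able'));         // 1
console.log(findWordBoundary('disable enable', 'able')); // -1
console.log(findWordBoundary('disable able', 'able'));   // 8
console.log(findWordBoundary('#able1', 'able'));         // -1
```

The last case returns -1 because digits count as word characters for \b, so there is no boundary between "able" and "1".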
If you want your endsWith regexp to also match the empty string, you just need to append |^$ to it:
var endsWith = /[^A-Za-z0-9]|^$/ ;
Anyway, you can easily check if it is the beginning of the text with if (wordIndex == 0), and if it is the end with if (wordIndex + word.length == this.length).
It is also possible to eliminate this issue by operating on a copy of the input string, surrounded with non-alphanumerical characters. For example:
var s = "#" + this + "#";
var wordIndex = s.indexOf(word) - 1; // search the padded copy; subtracting 1 maps back to the original string
But I'm afraid there is another problem with your function:
it would never match "able" in a string like "disable able enable", since the call to indexOf would return 3, then startsWith.test(this.charAt(wordIndex - 1)) would return false (the preceding character is "s") and the function would exit with -1 without searching further.
So you could try:
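String.prototype.findWord = function (word) {
    var startsWith = "[\\[\\]\\.,-\\/#!$%\\^&\\*;:{}=\\-_~()\\s]";
    var endsWith = "[^A-Za-z0-9]";
    // Pad with "#" so a word at either end still has a neighbour to test.
    // The match starts on the character before the word in the padded copy,
    // which is the word's own index in the original string; search() returns
    // -1 when nothing matches.
    return ("#" + this + "#").search(new RegExp(startsWith + word + endsWith));
}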
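The point about indexOf only reporting the first occurrence can also be handled without building a regex from the word, by checking each occurrence in turn. A sketch under the assumption that a "word boundary" means "no letter or digit on that side", which is looser than the punctuation list in the question; findWholeWord is a hypothetical name:

```javascript
// Hypothetical standalone variant: walk every occurrence with indexOf and
// accept the first one that has no letter or digit on either side.
function findWholeWord(text, word) {
  const alnum = /[A-Za-z0-9]/;
  if (word.length === 0) {
    return -1;
  }
  let pos = text.indexOf(word);
  while (pos !== -1) {
    const before = text.charAt(pos - 1);          // "" at the start of the text
    const after = text.charAt(pos + word.length); // "" at the end of the text
    if (!alnum.test(before) && !alnum.test(after)) {
      return pos;
    }
    pos = text.indexOf(word, pos + 1);
  }
  return -1;
}

console.log(findWholeWord('disable able enable', 'able')); // 8
console.log(findWholeWord('able.', 'able'));               // 0
```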

cut out part of a string

Say, I have a string
"hello is it me you're looking for"
I want to cut part of this string out and return the new string, something like
s = string.cut(0,3);
s would now be equal to:
"lo is it me you're looking for"
EDIT: It may not be from 0 to 3. It could be from 5 to 7.
s = string.cut(5,7);
would return
"hellos it me you're looking for"
You're almost there. What you want is:
http://www.w3schools.com/jsref/jsref_substr.asp
So, in your example:
var string = "hello is it me you're looking for";
s = string.substr(3);
As only providing a start (the first arg) takes from that index to the end of the string.
Update, how about something like:
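function cut(str, cutStart, cutEnd){
    // removes the characters from cutStart up to, but not including, cutEnd,
    // so cut(str, 0, 3) gives "lo is it me you're looking for" as in the question
    return str.substring(0, cutStart) + str.substring(cutEnd);
}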
Use the substring function:
Returns a subset of a string between one index and another, or through the end of the string.
substring(indexA, [indexB]);
indexA
An integer between 0 and one less than the length of the string.
indexB
(optional) An integer between 0 and the length of the string.
substring extracts characters from indexA up to but not including indexB. In particular:
* If indexA equals indexB, substring returns an empty string.
* If indexB is omitted, substring extracts characters to the end of the string.
* If either argument is less than 0 or is NaN, it is treated as if it were 0.
* If either argument is greater than stringName.length, it is treated as if it were stringName.length.
If indexA is larger than indexB, then the effect of substring is as if the two arguments were swapped; for example, str.substring(1, 0) == str.substring(0, 1).
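The rules quoted above can be checked quickly in a console; the sample string here is just an illustration:

```javascript
const s = "hello";
console.log(s.substring(1, 3));   // "el"   (index 3 is excluded)
console.log(s.substring(2, 2));   // ""     (equal indices)
console.log(s.substring(2));      // "llo"  (indexB omitted)
console.log(s.substring(-4, 2));  // "he"   (negative treated as 0)
console.log(s.substring(NaN, 2)); // "he"   (NaN treated as 0)
console.log(s.substring(3, 99));  // "lo"   (clamped to s.length)
console.log(s.substring(3, 1));   // "el"   (arguments swapped)
```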
Some other more modern alternatives are:
Split and join
function cutFromString(oldStr, fullStr) {
return fullStr.split(oldStr).join('');
}
cutFromString('there ', 'Hello there world!'); // "Hello world!"
Adapted from MDN example
String.replace(), which uses regex. This means it can be more flexible with case sensitivity.
function cutFromString(oldStrRegex, fullStr) {
return fullStr.replace(oldStrRegex, '');
}
cutFromString(/there /i , 'Hello THERE world!'); // "Hello world!"
s = string.cut(5,7);
I'd prefer to do it as a separate function, but if you really want to be able to call it directly on a String from the prototype:
String.prototype.cut= function(i0, i1) {
return this.substring(0, i0)+this.substring(i1);
}
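For reference, the separate-function form mentioned above could look like the sketch below, checked against the two examples from the question (the name cut and the variable original are only illustrative):

```javascript
// Standalone version: remove the characters from index i0 up to,
// but not including, index i1.
function cut(str, i0, i1) {
  return str.substring(0, i0) + str.substring(i1);
}

const original = "hello is it me you're looking for";
console.log(cut(original, 0, 3)); // "lo is it me you're looking for"
console.log(cut(original, 5, 7)); // "hellos it me you're looking for"
```

Keeping it as a plain function avoids adding a property to String.prototype that other code might not expect.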
string.substring() is what you want.
Just as a reference for anyone looking for similar function, I have a String.prototype.bisect implementation that splits a string 3-ways using a regex/string delimiter and returns the before,delimiter-match and after parts of the string....
/*
Splits a string 3-ways along delimiter.
Delimiter can be a regex or a string.
Returns an array with [before,delimiter,after]
*/
String.prototype.bisect = function( delimiter){
var i,m,l=1;
if(typeof delimiter == 'string') i = this.indexOf(delimiter);
if(delimiter.exec){
    m = this.match(delimiter);
    if(m){ // match() returns null when the regex does not match
        i = m.index;
        l = m[0].length;
    }
}
if(!i) i = this.length/2;
var res=[],temp;
if(temp = this.substring(0,i)) res.push(temp);
if(temp = this.substr(i,l)) res.push(temp);
if(temp = this.substring(i+l)) res.push(temp);
if(res.length == 3) return res;
return null;
};
/* though one could achieve similar and more optimal results for above with: */
"my string to split and get the before after splitting on and once".split(/and(.+)/,2)
// outputs => ["my string to split ", " get the before after splitting on and once"]
As stated here: https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Objects/String/split
If separator is a regular expression that contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array. However, not all browsers support this capability.
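A small illustration of the capturing-parentheses behaviour described in that quote, as it works in current engines:

```javascript
// Without a capture group the separators are dropped.
console.log("a1b2c".split(/\d/));   // ["a", "b", "c"]

// With a capture group the captured separators are spliced into the result.
console.log("a1b2c".split(/(\d)/)); // ["a", "1", "b", "2", "c"]
```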
You need to do something like the following:
var s = "I am a string";
var sSubstring = s.substring(2); // sSubstring now equals "am a string".
You have two options about how to go about it:
http://www.quirksmode.org/js/strings.html#substring
http://www.quirksmode.org/js/strings.html#substr
Try the following:
var str="hello is it me you're looking for";
document.write(str.substring(3)+"<br />");
this works well
function stringCutter(str,cutCount,caretPos){
let firstPart = str.substring(0,caretPos-cutCount);
let secondPart = str.substring(caretPos,str.length);
return firstPart + secondPart;
}
