JS RegExp matched entries count - javascript

Is it possible to get a count of the matches a RegExp finds in JS?
Let's take the simplest example:
var pattern =/\bstring\b/g;
var str = "My the best string comes here. Do you want another one? That will be a string too!";
So how do I get the count? If I try the standard exec() method:
pattern.exec(str);
...it gives me an array containing a single matched entry:
["string"]
The length of this array is 1, but in reality there are 2 places where a match was found.

You can achieve this using .match():
var pattern =/\bstring\b/g;
var str = "My the best string comes here. Do you want another one? That will be a string too!";
alert(str.match(pattern).length);
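One caveat with the .match() approach: match() returns null rather than an empty array when nothing matches, so reading .length directly would throw in that case. A minimal sketch of a guard, where countMatches is a hypothetical helper name (not from the original answer) and the pattern is assumed to carry the g flag:

```javascript
// Hypothetical helper: counts matches of a global regex without throwing on zero matches.
// Assumes `pattern` has the g flag; without it, match() returns one match plus its groups.
function countMatches(str, pattern) {
  // String.prototype.match returns null (not []) when nothing matches.
  var matches = str.match(pattern);
  return matches === null ? 0 : matches.length;
}

var str = "My the best string comes here. Do you want another one? That will be a string too!";
console.log(countMatches(str, /\bstring\b/g));  // 2
console.log(countMatches(str, /\bmissing\b/g)); // 0
```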

Related

Extract Twitter handlers from string using regex in JavaScript

I would like to extract the Twitter handler names from a text string using a regex. I believe I am almost there, except for the ">" that ends up in my output. How can I improve my regex and drop the ">" from my output?
Here is an example of a text string value:
"PlaymakersZA, Absa, DiepslootMTB"
The desired output would be an array consisting of the following:
PlaymakersZA, Absa, DiepslootMTB
Here is an example of my regex:
var array = str.match(/>[a-z-_]+/ig)
Thank you!
You can use match groups in your regex to indicate the part you wish to extract.
Basically, you surround the part of the regex that you want to extract in parentheses: />([a-z-_]+)/ig, save the regex in a variable, and call .exec() repeatedly for as long as it returns a match. Index 1 of the resulting array holds the first match group's result. Index 0 is the whole match, and later indices hold subsequent match groups, if any.
var str = "PlaymakersZA, Absa, DiepslootMTB";
var regex = />([a-z-_]+)/ig
var array = regex.exec(str);
while (array != null) {
alert(array[1]);
array = regex.exec(str);
}
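The same loop can collect the captured names into an array instead of alerting them. This is only a sketch: the sample string in the question appears to have lost its HTML tags, so the anchor markup below is a hypothetical stand-in, and the names input, handleRegex and handles are illustrative.

```javascript
// Hypothetical input: the original markup is not shown, so these <a> tags are an assumption.
var input = '<a href="#">PlaymakersZA</a>, <a href="#">Absa</a>, <a href="#">DiepslootMTB</a>';
var handleRegex = />([a-z_-]+)/ig;
var handles = [];
var m;

// With the g flag, each exec() call resumes from regex.lastIndex and returns null at the end.
while ((m = handleRegex.exec(input)) !== null) {
  handles.push(m[1]); // m[0] is ">Name", m[1] is the captured group without ">"
}

console.log(handles); // ["PlaymakersZA", "Absa", "DiepslootMTB"]
```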
You could just strip all the HTML:
var str = "PlaymakersZA, Absa, DiepslootMTB";
$handlers = str.replace(/<[^>]*>|\s/g,'').split(",");

How do I get a list of strings ending in a newline or ending in the end of the string in javascript regex?

I'm pretty frustrated with regex right now. Given:
var text = "This is a sentence.\nThis is another sentence\n\nThis is the last sentence!"
I want regex to return to me:
{"This is a sentence.\n", "This is another sentence\n\n", "This is the last sentence!"}
I think I should use
var matches = text.match(/.+[\n+\Z]/)
but \Z doesn't seem to work. Does javascript have an end of string matcher?
You can use the following regex.
var matches = text.match(/.+\n*/g);
Or you could match a newline sequence "one or more" times or the end of the string.
var matches = text.match(/.+(?:\n+|$)/g);
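As a quick check, the sketch below runs both patterns against the sample text from the question. JavaScript regexes have no \Z anchor; without the m flag, $ matches only at the very end of the input, which is what the second pattern relies on. The variable names viaStar and viaAlternation are illustrative.

```javascript
var text = "This is a sentence.\nThis is another sentence\n\nThis is the last sentence!";

// "." never matches "\n", so ".+" consumes one line and "\n*" takes any newlines after it.
var viaStar = text.match(/.+\n*/g);

// Same idea, but requires either one or more newlines or the end of the input.
var viaAlternation = text.match(/.+(?:\n+|$)/g);

console.log(JSON.stringify(viaStar));
// ["This is a sentence.\n","This is another sentence\n\n","This is the last sentence!"]
console.log(JSON.stringify(viaStar) === JSON.stringify(viaAlternation)); // true
```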
Try this one: /(.+\n*)/g
See it here: http://regex101.com/r/wK8oX3/1
If you wanted an array and didn't want to keep the "\n" around you could do...
var strings = text.split("\n");
which would yield
["This is a sentence.", "This is another sentence", "", "This is the last sentence!"]
If you wanted to get rid of that empty string, chain a filter onto the split...
var strings = text.split("\n").filter(function(s){ return s !== ""; });
This may not be what you want, though, and it is also not as efficient as the regex options already proposed.
Edit: as torazaburo pointed out using Boolean as the filter function is cleaner than a callback.
var strings = text.split("\n").filter(Boolean);
Edit again: I keep getting one-upped; using the /\n+/ expression is even cooler.
var strings = text.split(/\n+/);
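One thing to watch with split(/\n+/): if the text ends with a newline, the result still contains a trailing empty string, so combining it with filter(Boolean) may be worthwhile. A small sketch with a made-up input (withTrailing is a hypothetical example string):

```javascript
// Hypothetical input that ends with newlines.
var withTrailing = "first line\nsecond line\n\n";

var raw = withTrailing.split(/\n+/);
console.log(JSON.stringify(raw));     // ["first line","second line",""]

// filter(Boolean) drops empty strings, including the trailing one.
var cleaned = raw.filter(Boolean);
console.log(JSON.stringify(cleaned)); // ["first line","second line"]
```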
To get an array of sentences:
var matches = text.match(/.+?(?:\n+|$)/g);
You can try this,
text.match(/.+/g)

Regular Expression to get the last word from TitleCase, camelCase

I'm trying to split a TitleCase (or camelCase) string into precisely two parts using javascript. I know I can split it into multiple parts by using the lookahead:
"StringToSplit".split(/(?=[A-Z])/);
And it will make an array ['String', 'To', 'Split']
But what I need is to break it into precisely TWO parts, to produce an array like this:
['StringTo', 'Split']
Where the second element is always the last word in the TitleCase, and the first element is everything else that precedes it.
Is this what you are looking for?
"StringToSplit".split(/(?=[A-Z][a-z]+$)/); // ["StringTo", "Split"]
Improved based on lolol's answer:
"StringToSplit".split(/(?=[A-Z][^A-Z]+$)/); // ["StringTo", "Split"]
Use it like this:
s = "StringToSplit";
last = s.replace(/^.*?([A-Z][a-z]+)(?=$)/, '$1'); // Split
first = s.replace(last, ''); // StringTo
tok = [first, last]; // ["StringTo", "Split"]
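A possible pitfall with the replace-based version: s.replace(last, '') removes the first occurrence of the last word, so an input such as "SplitToSplit" would yield "ToSplit" as the first part. A sketch of an alternative that captures both parts in a single match; splitAtLastWord is a hypothetical name, and returning null for input without an uppercase letter is an assumption about the desired behaviour:

```javascript
// Hypothetical helper: returns [everythingBefore, lastWord], or null if no uppercase letter exists.
function splitAtLastWord(s) {
  // Greedy (.*) takes as much as possible, so group 2 is the final
  // uppercase letter plus the non-uppercase characters that follow it.
  var m = s.match(/^(.*)([A-Z][^A-Z]*)$/);
  return m === null ? null : [m[1], m[2]];
}

console.log(splitAtLastWord("StringToSplit")); // ["StringTo", "Split"]
console.log(splitAtLastWord("SplitToSplit"));  // ["SplitTo", "Split"]
console.log(splitAtLastWord("lowercase"));     // null
```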
You could use
(function(){
return [this.slice(0,this.length-1).join(''), this[this.length-1]];
}).call("StringToSplit".split(/(?=[A-Z])/));
//=> ["StringTo", "Split"]
In other words:
create the Array using split from a String
join a slice of that Array without its last element
add that and the last element to a final Array
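The three steps above can also be written as an ordinary function, which avoids relying on this being bound to the array. A minimal sketch; splitLastWord is a hypothetical name:

```javascript
// Hypothetical helper following the three steps: split, join all but the last, pair them up.
function splitLastWord(s) {
  var words = s.split(/(?=[A-Z])/);      // "StringToSplit" -> ["String", "To", "Split"]
  var rest = words.slice(0, -1).join(""); // everything except the last word, rejoined
  var last = words[words.length - 1];     // the last word
  return [rest, last];
}

console.log(splitLastWord("StringToSplit"));   // ["StringTo", "Split"]
console.log(splitLastWord("camelCaseString")); // ["camelCase", "String"]
```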

javascript split to get part of the word

I tried to use JavaScript split to get part of a word: new from What#a_new%20day
I tried code like this:
<script>
var word="What#a_new%20day";
var newword = word.split("%20", 1).split("_", 2);
alert(newword);
</script>
But it caused:
Uncaught TypeError: Object What#a_new has no method 'split'
Maybe there is a wiser way to get the word I need. Can anyone help me? Thanks.
split returns an array, so the second split is trying to operate on the array returned by the first, rather than a string, which causes a TypeError. You'll also want to add the correct index after the second call to split, or newword will also be an array, not the String you're expecting. Change it to:
var newword = word.split("%20", 1)[0].split("_", 2)[1];
This splits word, then splits the string at index 0 of the resulting array, and assigns the value of the string at index 1 of the new array to newword.
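Spelled out one step at a time, the chained expression does the following. The intermediate variable names (beforeSpace, parts) are illustrative only:

```javascript
var word = "What#a_new%20day";

// Step 1: split on "%20", keeping at most 1 piece. The result is an array.
var beforeSpace = word.split("%20", 1); // ["What#a_new"]

// Step 2: take the string at index 0 and split it on "_", keeping at most 2 pieces.
var parts = beforeSpace[0].split("_", 2); // ["What#a", "new"]

// Step 3: the wanted word is the string at index 1.
var newword = parts[1];
console.log(newword); // "new"
```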
Regex to the rescue
var word="What#a_new%20day";
var newword = word.match(/_(.+)%/)[1];
alert(newword);
This returns the first ([1]) captured group ((...)) in the regex (_(.+)%), which is _ followed by any character (.) one or more times (+) followed by %.
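Note that .+ is greedy, so if the input contained more than one % the capture would run up to the last one. A sketch of the difference, using a made-up input with two escapes (twoEscapes is a hypothetical example string):

```javascript
// Hypothetical input with two percent-escapes.
var twoEscapes = "What#a_new%20day%21";

// Greedy: (.+) extends as far as possible, stopping before the LAST "%".
var greedy = twoEscapes.match(/_(.+)%/)[1];
console.log(greedy); // "new%20day"

// Lazy: (.+?) stops before the FIRST "%".
var lazy = twoEscapes.match(/_(.+?)%/)[1];
console.log(lazy); // "new"
```

For the single-escape string in the question both forms give "new".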
The result of a split is an array, not a string, so what you need to do is
<script>
var word="What#a_new%20day";
var newword = word.split("%20", 1)[0].split("_", 2);
alert(newword);
</script>
Notice the [0]. Here newword is still an array (["What#a", "new"]); index it with [1] to get the string "new".
split returns an array:
https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/split
word.split("%20", 1);
gives an array, so you cannot do:
(result from above).split("_", 2);
If split is what you're after, go for it, but performance-wise it would be better to do something like this:
var word="What#a_new%20day";
var newword = word.substr(word.indexOf('new'), 3);
alert(newword);
Live example: http://jsfiddle.net/qJ8wM/
Split searches for all instances of %20 in the text, whereas indexOf finds the first instance, and substr is fairly cheap performance wise as well.
JsPerf stats on split vs substring (a general case): http://jsperf.com/split-vs-substring

Regex to extract substring, returning 2 results for some reason

I need to do a lot of regex things in JavaScript, but I am having some issues with the syntax and I can't seem to find a definitive resource on this. For some reason, when I do:
var tesst = "afskfsd33j"
var test = tesst.match(/a(.*)j/);
alert (test)
it shows
"afskfsd33j, fskfsd33"
I'm not sure why it's giving this output of the original and the matched string. I am wondering how I can get it to give just the match (essentially extracting the part I want from the original string).
Thanks for any advice
match returns an array.
The default string representation of an array in JavaScript is the elements of the array separated by commas. In this case the desired result is in the second element of the array:
var tesst = "afskfsd33j"
var test = tesst.match(/a(.*)j/);
alert (test[1]);
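To make the array visible, the sketch below inspects the result directly instead of passing it to alert(). String(result) is the same comma-joined conversion that alert() applies. The variable name result is illustrative.

```javascript
var result = "afskfsd33j".match(/a(.*)j/);

console.log(Array.isArray(result)); // true
console.log(result.length);         // 2
console.log(String(result));        // "afskfsd33j,fskfsd33" (what alert() displays)
console.log(result[0]);             // "afskfsd33j" (the whole match)
console.log(result[1]);             // "fskfsd33"   (the first captured group)
```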
Each group defined by parentheses () is captured during processing, and each captured group's content is pushed into the result array in the same order as the groups start within the pattern. See more at http://www.regular-expressions.info/brackets.html and http://www.regular-expressions.info/refcapture.html (choose the right language to see the supported features).
var source = "afskfsd33j"
var result = source.match(/a(.*)j/);
result: ["afskfsd33j", "fskfsd33"]
The reason you received this exact result is the following:
The first value in the array is the first substring that matches the entire pattern. It starts with "a", followed by any number of any characters, and ends with "j". Because .* is greedy, the match extends to the last "j" that follows the starting "a".
The second value in the array is the captured group defined by the parentheses. In your case the group contains the entire pattern match minus the content defined outside the parentheses, so exactly "fskfsd33".
If you want to get rid of the second value in the array, you may define the pattern like this:
/a(?:.*)j/
where "?:" means that the text matched by the content in parentheses will not be part of the resulting array.
Another option in this simple case is to write the pattern without any group, because a group is not necessary at all:
/a.*j/
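A side-by-side sketch of the three pattern variants on the same input, showing how many entries each result array holds. The variable names are illustrative.

```javascript
var source = "afskfsd33j";

var capturing = source.match(/a(.*)j/);      // whole match + one captured group
var nonCapturing = source.match(/a(?:.*)j/); // group is used for grouping only
var noGroup = source.match(/a.*j/);          // no group at all

console.log(capturing.length);    // 2
console.log(nonCapturing.length); // 1
console.log(noGroup.length);      // 1
console.log(noGroup[0]);          // "afskfsd33j"
```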
If you just want to check whether the source text matches the pattern and do not care which text it found, then you may try:
var result = /a.*j/.test(source);
The result is then only true or false. For more info see http://www.javascriptkit.com/javatutors/re3.shtml
I think your problem is that the match method returns an array. The 0th item in the array is the full matched text, and the 1st through nth items correspond to the 1st through nth parenthesised groups. Your alert() call is showing the entire array.
Just get rid of the parentheses and that will give you an array with one element.
Change this line
var test = tesst.match(/a(.*)j/);
To this
var test = tesst.match(/a.*j/);
If you add parentheses, the match() function will return two entries: one for the whole expression and one for the expression inside the parentheses.
Also, according to the developer.mozilla.org docs:
If you only want the first match found, you might want to use RegExp.exec() instead.
You can use the below code:
RegExp(/a.*j/).exec("afskfsd33j")
I've just had the same problem.
You only get the text twice in your result if you include a match group (in brackets) and the 'g' (global) modifier.
The first item is always the first result, which is normally fine when using match(reg) on a short string. However, when using a construct like:
while ((result = reg.exec(string)) !== null){
console.log(result);
}
the results are a little different.
Try the following code:
var regEx = new RegExp('([0-9]+ (cat|fish))','g'), sampleString="1 cat and 2 fish";
var result = sampleString.match(regEx);
console.log(JSON.stringify(result));
// ["1 cat","2 fish"]
var reg = new RegExp('[0-9]+ (cat|fish)','g'), sampleString="1 cat and 2 fish";
while ((result = reg.exec(sampleString)) !== null) {
console.dir(JSON.stringify(result))
};
// '["1 cat","cat"]'
// '["2 fish","fish"]'
var reg = new RegExp('([0-9]+ (cat|fish))','g'), sampleString="1 cat and 2 fish";
while ((result = reg.exec(sampleString)) !== null){
console.dir(JSON.stringify(result))
};
// '["1 cat","1 cat","cat"]'
// '["2 fish","2 fish","fish"]'
(tested on recent V8 - Chrome, Node.js)
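In newer engines, String.prototype.matchAll (ES2020) offers a way to get every match together with its groups without writing the exec() loop by hand. A sketch, assuming an environment that supports matchAll; the object shape with whole, count and animal is purely illustrative:

```javascript
var sample = "1 cat and 2 fish";

// matchAll requires the g flag and returns an iterator of exec()-style match arrays.
var all = Array.from(sample.matchAll(/([0-9]+) (cat|fish)/g), function (m) {
  // m[0] is the whole match; m[1] and m[2] are the captured groups.
  return { whole: m[0], count: Number(m[1]), animal: m[2] };
});

console.log(JSON.stringify(all));
// [{"whole":"1 cat","count":1,"animal":"cat"},{"whole":"2 fish","count":2,"animal":"fish"}]
```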
The best answer is currently a comment which I can't upvote, so credit to #Mic.
