Regular expression in Javascript: table of positions instead of table of occurrences - javascript

Regular expressions are most powerful. However, the result they return is sometimes useless:
For example:
I want to manage a CSV string using semicolons.
I define a string like:
var data = "John;Paul;Pete;Stuart;George";
If I use the instruction:
var tab = data.match(/;/g)
after which "tab" contains an array of 4 ";":
tab[0]=";", tab[1]=";", tab[2]=";", tab[3]=";"
This array is not useful in the present case, because I knew it even before using the regular expression.
Indeed, what I want to do is 2 things:
1stly: Suppress the 4th element (not "Stuart" as "Stuart", but "Stuart" as 4th element)
2ndly: Replace the 3rd element by "Ringo" so as to get back (to where you once belonged!) the following result:
data == "John;Paul;Ringo;George";
In this case, I would greatly prefer to obtain an array giving the positions of semicolons:
tab[0]=4, tab[1]=9, tab[2]=14 tab[3]=21
instead of the useless (in this specific case)
tab[0]=";", tab[1]=";", tab[2]=";", tab[3]=";"
So, here's my question: Is there a way to obtain this numeric array using regular expressions?

To get tab[0]=4, tab[1]=9, tab[2]=14 tab[3]=21, you can do
var tab = [];
var startPos = 0;
var data = "John;Paul;Pete;Stuart;George";
while (true) {
var currentIndex = data.indexOf(";", startPos);
if (currentIndex == -1) {
break;
}
tab.push(currentIndex);
startPos = currentIndex + 1; // move past this match, otherwise the loop never ends
}
But if the result wanted is "John;Paul;Ringo;George", you can do
var tab = data.split(';'); // Split the string into an array of strings
tab.splice(3, 1); // Suppress the 4th element
tab[2] = "Ringo"; // Replace the 3rd element by "Ringo"
var str = tab.join(';'); // Join the elements of the array into a string
The second approach is probably better in your case.
String.split
Array.splice
Array.join
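The literal request in the question, an array of positions obtained through a regular expression, can also be sketched with String.prototype.matchAll, which yields one match object per occurrence, each carrying an index property. The helper name semicolonPositions is made up for this illustration, and matchAll assumes an ES2020-capable engine.

```javascript
// Collect the index of every semicolon using a global regular expression.
// matchAll requires the g flag and returns an iterator of match objects.
function semicolonPositions(text) {
  return [...text.matchAll(/;/g)].map(match => match.index);
}

const data = "John;Paul;Pete;Stuart;George";
console.log(semicolonPositions(data)); // [4, 9, 14, 21]
```

For the actual goal (dropping and replacing elements), the split/splice/join approach above is still the shorter route.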

You should try a different approach, using split.
tab = data.split(';') will return an array of the form
tab[0]="John", tab[1]="Paul", tab[2]="Pete", tab[3]="Stuart", tab[4]="George"
You should be able to achieve your goal with this array.

Why use a regex to perform this operation? You have a built-in function split, which can split your string based on the delimiter you pass.
var data = "John;Paul;Pete;Stuart;George";
var temp=data.split(';');
temp[0],temp[1]...

Related

How to check if one element of an array matches another element in same array?

Very new to javascript so bear with me...
I need to check one element of an array(arr[1]), which contains a string, against another element of the same array(arr[0]) to determine if any letters included in element arr[1] are included in arr[0]. Those letters can be in any order, upper or lower case, and don't have to occur the same number of times (i.e. arr[0]="hheyyy" and arr[1]="hey" is fine). This is what i have (which works) but I was curious if anyone has a better/more simple way of doing this? -thanks in advance.
function mutation(arr) {
//splits the array into two separate arrays of individual letters
var newArr0 = arr.join('').toLowerCase().split('').slice(0,arr[0].length);
var newArr1 = arr.join('').toLowerCase().split('').slice(arr[0].length);
var boolArr = [];
//checks each letter of arr1 to see if it is included in any letter of arr0
for(var i = 0; i < newArr1.length; i++)
boolArr.push(newArr0.includes(newArr1[i]));
//results are pushed into an array of boolean values
if (boolArr.indexOf(false) !==-1)
return false; //if any of those values are false return false
else return true;
}
mutation(["hello", "hey"]); //returns false
You could use a regular expression:
function mutationReg(arr) {
return !arr[1].replace(new RegExp('[' + arr[0].replace(/[\\\]^-]/g, '\\$&') + ']', 'gi'), '').length;
}
This escapes the characters of the first string that are special inside a character class (\, ], ^ and -) with a backslash, surrounds the result with square brackets, and uses that as a search pattern on the second string. Any matches (case-insensitive) are removed, so that only characters that don't occur in the first string are left over. The length of the result thus indicates whether there was success or not. Applying ! to it gives the correct boolean result.
This might not be the fastest solution.
Here is another ES6 alternative using a Set for good performance:
function mutation(arr) {
var chars = new Set([...arr[0].toLowerCase()]);
return [...arr[1].toLowerCase()].every (c => chars.has(c));
}
You can use Array.from() to convert a string to an array, then Array.prototype.every() and String.prototype.indexOf() to check whether every character of that array is contained in the string of the other array element.
var arr = ["abc", "cab"];
var bool = Array.from(arr[0]).every(el => arr[1].indexOf(el) > -1);
console.log(bool);

How to remove the last matched regex pattern in javascript

I have a text which goes like this...
var string = '~a=123~b=234~c=345~b=456'
I need to extract the string such that it splits into
['~a=123~b=234~c=345','']
That is, I need to split the string with /b=.*/ pattern but it should match the last found pattern. How to achieve this using RegEx?
Note: The numbers present after the equal is randomly generated.
Edit:
The above one was just an example. I did not make the question clear I guess.
Generalized String being...
<word1>=<random_alphanumeric_word>~<word2>=<random_alphanumeric_word>..~..~..<word2>=<random_alphanumeric_word>
All have random length and all wordi are alphabets, the whole string length is not fixed. the only text known would be <word2>. Hence I needed RegEx for it and pattern being /<word2>=.*/
This doesn't sound like a job for regexen considering that you want to extract a specific piece. Instead, you can just use lastIndexOf to split the string in two:
var lio = str.lastIndexOf('b=');
var arr = [];
arr[0] = str.substr(0, lio);
arr[1] = str.substr(lio);
http://jsfiddle.net/NJn6j/
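A slightly fuller sketch of the lastIndexOf idea, assuming the only known text is the key (word2 in the question) and that pairs are always written as ~key=value. The function name splitAtLastPair is hypothetical. It cuts out the last pair for the given key and returns what lies on either side of it.

```javascript
// Split a string of ~key=value pairs around the LAST pair with the given key.
// Assumes '~' never appears inside a key or a value.
function splitAtLastPair(str, key) {
  const marker = '~' + key + '=';
  const start = str.lastIndexOf(marker);
  if (start === -1) {
    return [str]; // key not present: nothing to split on
  }
  // The pair ends at the next '~' or at the end of the string.
  const next = str.indexOf('~', start + marker.length);
  const end = next === -1 ? str.length : next;
  return [str.slice(0, start), str.slice(end)];
}

console.log(splitAtLastPair('~a=123~b=234~c=345~b=456', 'b'));
// ['~a=123~b=234~c=345', '']
```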
I don't think I'd personally use a regex for this type of problem, but you can extract the last option pair with a regex like this:
var str = '~a=123~b=234~c=345~b=456';
var matches = str.match(/^(.*)~([^=]+=[^=]+)$/);
// matches[1] = "~a=123~b=234~c=345"
// matches[2] = "b=456"
Demo: http://jsfiddle.net/jfriend00/SGMRC/
Assuming the format is (~, alphanumeric name, =, and numbers) repeated arbitrary number of times. The most important assumption here is that ~ appear once for each name-value pair, and it doesn't appear in the name.
You can remove the last token by a simple replacement:
str.replace(/(.*)~.*/, '$1')
This works by using the greedy property of * to force it to match the last ~ in the input.
This can also be achieved with lastIndexOf, since you only need to know the index of the last ~:
str.substring(0, (str.lastIndexOf('~') + 1 || str.length + 1) - 1)
(Well, I don't know if the code above is good JS or not... I would rather write in a few lines. The above is just for showing one-liner solution).
A RegExp that will give a result that you could use is:
string.match(/[a-z]*?=(.*?((?=~)|$))/gi);
// ["a=123", "b=234", "c=345", "b=456"]
But in your case the simplest solution is to split the string before extracting the content:
var results = string.split('~'); // ["", "a=123", "b=234", "c=345", "b=456"]
Now it will be easy to extract the key and value and add them to an object:
var myObj = {};
results.forEach(function (item) {
if(item) {
var r = item.split('=');
if (!myObj[r[0]]) {
myObj[r[0]] = [r[1]];
} else {
myObj[r[0]].push(r[1]);
}
}
});
console.log(myObj);
Object:
a: ["123"]
b: ["234", "456"]
c: ["345"]
(?=.*(~b=[^~]*))\1
will get it done in one match, but if there are duplicate entries it will go to the first. Performance also isn't great and if you string.replace it will destroy all duplicates. It would pass your example, but against '~a=123~b=234~c=345~b=234' it would go to the first 'b=234'.
.*(~b=[^~]*)
will run a lot faster, but it requires another step because the match comes out in a group:
var re = /.*(~b=[^~]*)/.exec(string);
var result = re[1]; //~b=234
var array = string.split(re[1]);
This method will also have the same problem with exact duplicates. Another option is:
var regex = /.*(~b=[^~]*)/g;
var re = regex.exec(string);
var result = re[1];
// if you want an array from either side of the string:
var array = [string.slice(0, regex.lastIndex - re[1].length - 1), string.slice(regex.lastIndex, string.length)];
This actually finds the exact location of the last match and removes it. regex.lastIndex - re[1].length - 1 is my guess for the index to remove the ellipsis from the leading side, but I didn't test it so it might be off by 1.

Javascript / jQuery faster alternative to $.inArray when pattern matching strings

I've got a large array of words in Javascript (~100,000), and I'd like to be able to quickly return a subset of them based on a text pattern.
For example, I'd like to return all the words that begin with a pattern so typing hap should give me ["happy", "happiness", "happening", etc, etc], as a result.
If it's possible I'd like to do this without iterating over the entire array.
Something like this is not working fast enough:
// data contains an array of beginnings of words e.g. 'hap'
$.each(data, function(key, possibleWord) {
found = $.inArray(possibleWord, words);
// do something if found
});
Any ideas on how I could quickly reduce the set to possible matches without iterating over the whole word set? The word array is in alphabetical order if that helps.
If you just want to search for prefixes there are data structures just for that, such as the Trie and Ternary search trees
A quick Google search turns up some promising JavaScript Trie and autocomplete implementations:
http://ejohn.org/blog/javascript-trie-performance-analysis/
Autocomplete using a trie
http://odhyan.com/blog/2010/11/trie-implementation-in-javascript/
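To give an idea of what such a structure looks like, here is a minimal trie sketch. It is not taken from any of the linked implementations; the class and method names (Trie, insert, wordsWithPrefix) are chosen for this illustration only. A lookup walks one node per character of the prefix and then collects the words below that node, so the cost depends on the prefix length and the number of matches rather than on the size of the whole word list.

```javascript
// Minimal prefix tree: each node has a map of child nodes and a word flag.
class Trie {
  constructor() {
    this.root = { children: new Map(), isWord: false };
  }

  insert(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), isWord: false });
      }
      node = node.children.get(ch);
    }
    node.isWord = true;
  }

  // Returns every stored word that starts with prefix (in no particular order).
  wordsWithPrefix(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) {
        return []; // no stored word has this prefix
      }
    }
    const results = [];
    const stack = [[node, prefix]];
    while (stack.length > 0) {
      const [current, text] = stack.pop();
      if (current.isWord) {
        results.push(text);
      }
      for (const [ch, child] of current.children) {
        stack.push([child, text + ch]);
      }
    }
    return results;
  }
}

const trie = new Trie();
["happy", "happiness", "happening", "nothere"].forEach(w => trie.insert(w));
console.log(trie.wordsWithPrefix("hap").sort());
// ["happening", "happiness", "happy"]
```

Building the trie costs one pass over the word list, so it pays off only when many lookups follow.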
I have absolutely no idea if this is any faster (a jsperf test is probably in order...), but you can do it with one giant string and a RegExp search instead of arrays:
var giantStringOfWords = giantArrayOfWords.join(' ');
function searchForBeginning(beginning, str) {
var pattern = new RegExp('\\b' + beginning + '\\w*', 'g'), // g flag: return every matching word, not just the first
matches = str.match(pattern);
return matches;
}
var hapResults = searchForBeginning('hap', giantStringOfWords);
The best approach is to structure the data better. Make an object with keys like "hap". That member holds an array of words (or word suffixes if you want to save space) or a separated string of words for regexp searching.
This means you will have shorter objects to iterate/search. Another way is to sort the arrays and use a binary search pattern. There's a good conversation about techniques and optimizations here: http://ejohn.org/blog/revised-javascript-dictionary-search/
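The binary search idea can be sketched as follows, assuming the array is sorted with JavaScript's default string comparison (for example by calling words.sort() once). The function names lowerBound and wordsStartingWith are hypothetical. The search finds the first word that is not smaller than the prefix, and the matching words, if any, follow it contiguously.

```javascript
// Index of the first element that is >= target in a sorted array of strings.
function lowerBound(sortedWords, target) {
  let lo = 0;
  let hi = sortedWords.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedWords[mid] < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// All words beginning with prefix; they form one contiguous run in sorted order.
function wordsStartingWith(sortedWords, prefix) {
  const start = lowerBound(sortedWords, prefix);
  let end = start;
  while (end < sortedWords.length && sortedWords[end].startsWith(prefix)) {
    end++;
  }
  return sortedWords.slice(start, end);
}

const sorted = ["foo", "happening", "happiness", "happy", "hat"];
console.log(wordsStartingWith(sorted, "hap"));
// ["happening", "happiness", "happy"]
```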
I suppose that using raw javascript can help a bit, you can do:
var arr = ["happy", "happiness", "nothere", "notHereEither", "happening"], subset = [];
for(var i = 0, len = arr.length; i < len; i ++) {
if(arr[i].indexOf("hap") === 0) { // === 0 keeps only words that begin with "hap"
subset.push(arr[i]);
}
}
//subset === ["happy", "happiness","happening"]
Also, if the array is ordered you could break early if the first letter is bigger than the first of your search, instead of looping the entire array.
var data = ['foo', 'happy', 'happiness', 'foohap'];
jQuery.each(data, function(i, item) {
if(item.match(/^hap/))
console.log(item)
});
If you have the data in an array, you're going to have to loop through the whole thing.
A really simple optimization is on page load go through your big words array and make a note of what index ranges apply to each starting letter. E.g., in my example below the "a" words go from 0 to 2, "b" words go from 3 to 4, etc. Then when actually doing a pattern match only look through the applicable range. Although obviously some letters will have more words than others, a given search will only have to look through an average of 100,000/26 words.
// words array assumed to be lowercase and in alphabetical order
var words = ["a","an","and","be","blue","cast","etc."];
// figure out the index for the first and last word starting with
// each letter of the alphabet, so that later searches can use
// just the appropriate range instead of searching the whole array
var letterIndexes = {},
i,
l,
letterIndex = 0,
firstLetter;
for (i=0, l=words.length; i<l; i++) {
if (words[i].charAt(0) === firstLetter)
continue;
if (firstLetter)
letterIndexes[firstLetter] = {first : letterIndex, last : i-1};
letterIndex = i;
firstLetter = words[i].charAt(0);
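}
// the loop only records a range when the letter changes, so store the last letter's range here
if (firstLetter)
letterIndexes[firstLetter] = {first : letterIndex, last : l-1};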
function getSubset(pattern) {
pattern = pattern.toLowerCase()
var subset = [],
fl = pattern.charAt(0),
matched = false;
if (letterIndexes[fl])
for (var i = letterIndexes[fl].first, l = letterIndexes[fl].last; i <= l; i++) {
if (pattern === words[i].substr(0, pattern.length)) {
subset.push(words[i]);
matched = true;
} else if (matched) {
break;
}
}
return subset;
}
Note also that when searching through the (range within the) words array, once a match is found I set a flag, which indicates we've gone past all of the words that are alphabetically before the pattern and are now making our way through the matching words. That way as soon as the pattern no longer matches we can break out of the loop. If the pattern doesn't match at all we still end up going through all the words for that first letter though.
Also, if you're doing this as a user types, when letters are added to the end of the pattern you only have to search through the previous subset, not through the whole list.
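The narrowing step for search-as-you-type might look like the sketch below. The function name narrow is made up for this example, and it assumes the new pattern is the previous pattern with characters appended, so every new match is already in the previous result.

```javascript
// Filter the previous result instead of the full word list.
// Only valid when longerPattern extends the pattern that produced previousSubset.
function narrow(previousSubset, longerPattern) {
  return previousSubset.filter(word => word.startsWith(longerPattern));
}

const forHap = ["happening", "happiness", "happy"]; // result for "hap"
console.log(narrow(forHap, "happi")); // ["happiness"]
```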
P.S. Of course if you want to break the word list up by first letter you could easily do that server-side.

split string only on first instance of specified character

In my code I split a string based on _ and grab the second item in the array.
var element = $(this).attr('class');
var field = element.split('_')[1];
Takes good_luck and provides me with luck. Works great!
But, now I have a class that looks like good_luck_buddy. How do I get my javascript to ignore the second _ and give me luck_buddy?
I found this var field = element.split(new char [] {'_'}, 2); in a c# stackoverflow answer but it doesn't work. I tried it over at jsFiddle...
Use capturing parentheses:
'good_luck_buddy'.split(/_(.*)/s)
['good', 'luck_buddy', ''] // ignore the third element
They are defined as
If separator contains capturing parentheses, matched results are returned in the array.
So in this case we want to split at _.* (i.e. split separator being a sub string starting with _) but also let the result contain some part of our separator (i.e. everything after _).
In this example our separator (matching _(.*)) is _luck_buddy and the captured group (within the separator) is luck_buddy. Without the capturing parentheses, luck_buddy (matching .*) would not have been included in the result array, because with a simple split the separators are not included in the result.
We use the s regex flag to make . match on newline (\n) characters as well, otherwise it would only split to the first newline.
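If the trailing empty string is unwanted, the limit argument of split can be combined with the capturing group, since captured text counts toward the limit. This is a small sketch rather than part of the answer above; the function name splitOnFirstUnderscore is hypothetical, and the s flag assumes an ES2018-capable engine.

```javascript
// Split on the first underscore only, keeping the remainder as one piece.
// The limit of 2 stops after "before" and the captured "after" part.
function splitOnFirstUnderscore(text) {
  return text.split(/_(.*)/s, 2);
}

console.log(splitOnFirstUnderscore('good_luck_buddy')); // ['good', 'luck_buddy']
console.log(splitOnFirstUnderscore('good'));            // ['good']
```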
What do you need regular expressions and arrays for?
myString = myString.substring(myString.indexOf('_')+1)
var myString= "hello_there_how_are_you"
myString = myString.substring(myString.indexOf('_')+1)
console.log(myString)
I avoid RegExp at all costs. Here is another thing you can do:
"good_luck_buddy".split('_').slice(1).join('_')
With help of destructuring assignment it can be more readable:
let [first, ...rest] = "good_luck_buddy".split('_')
rest = rest.join('_')
A simple ES6 way to get both the first key and remaining parts in a string would be:
const [key, ...rest] = "good_luck_buddy".split('_')
const value = rest.join('_')
console.log(key, value) // good, luck_buddy
Nowadays String.prototype.split does indeed allow you to limit the number of splits.
str.split([separator[, limit]])
...
limit Optional
A non-negative integer limiting the number of splits. If provided, splits the string at each occurrence of the specified separator, but stops when limit entries have been placed in the array. Any leftover text is not included in the array at all.
The array may contain fewer entries than limit if the end of the string is reached before the limit is reached.
If limit is 0, no splitting is performed.
caveat
It might not work the way you expect. I was hoping it would just ignore the rest of the delimiters, but instead, when it reaches the limit, it splits the remaining string again, omitting the part after the split from the return results.
let str = 'A_B_C_D_E'
const limit_2 = str.split('_', 2)
limit_2
(2) ["A", "B"]
const limit_3 = str.split('_', 3)
limit_3
(3) ["A", "B", "C"]
I was hoping for:
let str = 'A_B_C_D_E'
const limit_2 = str.split('_', 2)
limit_2
(2) ["A", "B_C_D_E"]
const limit_3 = str.split('_', 3)
limit_3
(3) ["A", "B", "C_D_E"]
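The hoped-for behaviour can be written by hand with indexOf. The sketch below uses a hypothetical helper named splitKeepRest and assumes a non-empty separator and a limit of at least 1; the last entry receives whatever is left of the string.

```javascript
// Split into at most `limit` parts; the final part keeps the unsplit remainder.
function splitKeepRest(str, sep, limit) {
  const parts = [];
  let start = 0;
  while (parts.length < limit - 1) {
    const idx = str.indexOf(sep, start);
    if (idx === -1) {
      break; // no more separators
    }
    parts.push(str.slice(start, idx));
    start = idx + sep.length;
  }
  parts.push(str.slice(start));
  return parts;
}

console.log(splitKeepRest('A_B_C_D_E', '_', 2)); // ['A', 'B_C_D_E']
console.log(splitKeepRest('A_B_C_D_E', '_', 3)); // ['A', 'B', 'C_D_E']
```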
This solution worked for me
var str = "good_luck_buddy";
var index = str.indexOf('_');
var arr = [str.slice(0, index), str.slice(index + 1)];
//arr[0] = "good"
//arr[1] = "luck_buddy"
OR
var str = "good_luck_buddy";
var index = str.indexOf('_');
var [first, second] = [str.slice(0, index), str.slice(index + 1)];
//first = "good"
//second = "luck_buddy"
You can use the regular expression like:
var arr = element.split(/_(.*)/)
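You can use the second parameter, which limits the number of items returned. Note that it discards the rest of the string instead of keeping it, so for good_luck_buddy this yields only luck:
var field = element.split('_', 2)[1];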
Replace the first instance with a unique placeholder then split from there.
"good_luck_buddy".replace(/\_/,'&').split('&')
["good","luck_buddy"]
This is more useful when both sides of the split are needed.
I needed both parts of the string, and a regex lookbehind helped me with this.
const full_name = 'Maria do Bairro';
const [first_name, last_name] = full_name.split(/(?<=^[^ ]+) /);
console.log(first_name);
console.log(last_name);
Non-regex solution
I ran some benchmarks, and this solution won hugely:
str.slice(str.indexOf(delim) + delim.length)
// as function
function gobbleStart(str, delim) {
return str.slice(str.indexOf(delim) + delim.length);
}
// as polyfill
String.prototype.gobbleStart = function(delim) {
return this.slice(this.indexOf(delim) + delim.length);
};
Performance comparison with other solutions
The only close contender was the same line of code, except using substr instead of slice.
Other solutions I tried involving split or RegExps took a big performance hit and were about 2 orders of magnitude slower. Using join on the results of split, of course, adds an additional performance penalty.
Why are they slower? Any time a new object or array has to be created, the JS engine has to allocate memory for it (and later garbage-collect it). This is comparatively slow.
Here are some general guidelines, in case you are chasing benchmarks:
New dynamic memory allocations for objects {} or arrays [] (like the one that split creates) will cost a lot in performance.
RegExp searches are more complicated and therefore slower than string searches.
If you already have an array, destructuring arrays is about as fast as explicitly indexing them, and looks awesome.
Removing beyond the first instance
Here's a solution that will slice up to and including the nth instance. It's not quite as fast, but on the OP's question, gobble(element, '_', 1) is still >2x faster than a RegExp or split solution and can do more:
/*
`gobble`, given a positive, non-zero `limit`, deletes
characters from the beginning of `haystack` until `needle` has
been encountered and deleted `limit` times or no more instances
of `needle` exist; then it returns what remains. If `limit` is
zero or negative, delete from the beginning only until `-(limit)`
occurrences or less of `needle` remain.
*/
function gobble(haystack, needle, limit = 0) {
let remain = limit;
if (limit <= 0) { // set remain to count of delim - num to leave
let i = 0;
while (i < haystack.length) {
const found = haystack.indexOf(needle, i);
if (found === -1) {
break;
}
remain++;
i = found + needle.length;
}
}
let i = 0;
while (remain > 0) {
const found = haystack.indexOf(needle, i);
if (found === -1) {
break;
}
remain--;
i = found + needle.length;
}
return haystack.slice(i);
}
With the above definition, gobble('path/to/file.txt', '/') would give the name of the file, and gobble('prefix_category_item', '_', 1) would remove the prefix like the first solution in this answer.
Tests were run in Chrome 70.0.3538.110 on macOSX 10.14.
Use the string replace() method with a regex:
var result = "good_luck_buddy".replace(/.*?_/, "");
console.log(result);
This regex matches 0 or more characters before the first _, and the _ itself. The match is then replaced by an empty string.
Javascript's String.split unfortunately has no way of limiting the actual number of splits. It has a second argument that specifies how many of the actual split items are returned, which isn't useful in your case. The solution would be to split the string, shift the first item off, then rejoin the remaining items:
var element = $(this).attr('class');
var parts = element.split('_');
parts.shift(); // removes the first item from the array
var field = parts.join('_');
Here's one RegExp that does the trick.
'good_luck_buddy' . split(/^.*?_/)[1]
First it forces the match to start from the
start with the '^'. Then it matches any number
of characters which are not '_', in other words
all characters before the first '_'.
The '?' means a minimal number of chars
that make the whole pattern match are
matched by the '.*?' because it is followed
by '_', which is then included in the match
as its last character.
Therefore this split() uses such a matching
part as its 'splitter' and removes it from
the results. So it removes everything
up till and including the first '_' and
gives you the rest as the 2nd element of
the result. The first element is "" representing
the part before the matched part. It is
"" because the match starts from the beginning.
There are other RegExps that work as
well like /_(.*)/ given by Chandu
in a previous answer.
The /^.*?_/ has the benefit that you
can understand what it does without
having to know about the special role
capturing groups play with split().
if you are looking for a more modern way of doing this:
let raw = "good_luck_buddy"
raw.split("_")
.filter((part, index) => index !== 0)
.join("_")
Mark F's solution is awesome but it's not supported by old browsers. Kennebec's solution is awesome and supported by old browsers but doesn't support regex.
So, if you're looking for a solution that splits your string only once, that is supported by old browsers and supports regex, here's my solution:
String.prototype.splitOnce = function(regex)
{
var match = this.match(regex);
if(match)
{
var match_i = this.indexOf(match[0]);
return [this.substring(0, match_i),
this.substring(match_i + match[0].length)];
}
else
{ return [this, ""]; }
}
var str = "something/////another thing///again";
alert(str.splitOnce(/\/+/)[1]);
For beginners like me who are not used to regular expressions, this workaround worked:
var field = "Good_Luck_Buddy";
var newString = field.slice( field.indexOf("_")+1 );
slice() method extracts a part of a string and returns a new string and indexOf() method returns the position of the first found occurrence of a specified value in a string.
This should be quite fast
function splitOnFirst (str, sep) {
const index = str.indexOf(sep);
return index < 0 ? [str] : [str.slice(0, index), str.slice(index + sep.length)];
}
console.log(splitOnFirst('good_luck', '_')[1])
console.log(splitOnFirst('good_luck_buddy', '_')[1])
This worked for me on Chrome + FF:
"foo=bar=beer".split(/^[^=]+=/)[1] // "bar=beer"
"foo==".split(/^[^=]+=/)[1] // "="
"foo=".split(/^[^=]+=/)[1] // ""
"foo".split(/^[^=]+=/)[1] // undefined
If you also need the key try this:
"foo=bar=beer".split(/^([^=]+)=/) // Array [ "", "foo", "bar=beer" ]
"foo==".split(/^([^=]+)=/) // [ "", "foo", "=" ]
"foo=".split(/^([^=]+)=/) // [ "", "foo", "" ]
"foo".split(/^([^=]+)=/) // [ "foo" ]
//[0] = ignored (holds the string when there's no =, empty otherwise)
//[1] = hold the key (if any)
//[2] = hold the value (if any)
a simple es6 one statement solution to get the first key and remaining parts
let raw = 'good_luck_buddy'
raw.split('_')
.reduce((p, c, i) => i === 0 ? [c] : [p[0], [...p.slice(1), c].join('_')], [])
You could also use non-greedy match, it's just a single, simple line:
a = "good_luck_buddy"
const [,g,b] = a.match(/(.*?)_(.*)/)
console.log(g,"and also",b)

Grab url parameter with jquery and put into input

I am trying to grab a specific parameter from a url such as www.internets.com?param=123456
I am trying it with something like this..
$j.extend({
getUrlVars: function(){
return window.location.href.slice(window.location.href.indexOf('?')).split(/[&?]{1}[\w\d]+=/);
}
});
var allVars = $j.getUrlVars('param');
The weird thing is the variable is returning a comma so in this example it looks like ,123456
Where is that coming from?!
split returns an array of substrings, so the comma is coming from the serialization of the array.
Have a look here: http://jquery-howto.blogspot.com/2009/09/get-url-parameters-values-with-jquery.html
This seems to work for me.
You're asking the javascript to split the string into an array based on the rules in your regex, so the string "?param=123456" turns into an array where everything up to the = is simply a separator, so it sees two keys: an empty string and 123456.
EDIT - You can still use split, just use a different separator. The indexOf is telling it to look at the substring after the position of the '?', so if you split on '=' it would provide an array where one value is a parameter name (possibly with a '?' or '&', so just remove it) and the next value is the value sent in after the equal sign.
You can also get a little more in depth with your regex and processing like so:
var q = window.location.search; // returns everything after '?'
var regEx = /[^?& =]([\w]*)[^!?& =]/g;
var array = q.match(regEx);
var vars = new Array();
for (var i = 0; i < array.length; i++) {
if ((i % 2) == 0) {
vars[array[i]] = array[i + 1];
}
}
Which will leave you with an array where the keys are the param names, and their values are the associated values from the query string.
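In current browsers (and in Node.js) the built-in URLSearchParams class parses a query string without any hand-written regex. The sketch below is an alternative to the answers above, not what they use; the helper name getQueryParam is hypothetical.

```javascript
// Read one parameter from a query string such as "?param=123456".
// URLSearchParams ignores a leading "?" and returns null for a missing name.
function getQueryParam(queryString, name) {
  return new URLSearchParams(queryString).get(name);
}

console.log(getQueryParam('?param=123456', 'param')); // "123456"
// In a page, the query string would come from window.location.search.
```

The returned value is a plain string, so it can be assigned directly to an input's value.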
