JavaScript: indexOf vs. Match when Searching Strings? - javascript

Readability aside, are there any discernable differences (performance perhaps) between using
str.indexOf("src")
and
str.match(/src/)
I personally prefer match (and regexp) but colleagues seem to go the other way. We were wondering if it mattered ...?
EDIT:
I should have said at the outset that this is for functions that will be doing partial plain-string matching (to pick up identifiers in class attributes for JQuery) rather than full regexp searches with wildcards etc.
class='redBorder DisablesGuiClass-2345-2d73-83hf-8293'
So it's the difference between:
string.indexOf('DisablesGuiClass-');
and
string.match(/DisablesGuiClass-/)

RegExp is indeed slower than indexOf (you can see it here), though normally this shouldn't be an issue. With RegExp, you also have to make sure the string is properly escaped, which is an extra thing to think about.
Both of those issues aside, if two tools do exactly what you need them to, why not choose the simpler one?
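As a rough illustration of the escaping concern mentioned above, here is a minimal sketch. The helper name escapeRegExp and the sample strings are assumptions made up for this example; they are not part of the question.

```javascript
// Hypothetical helper: escape characters that have special meaning in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const needle = "a.b";       // the dot is a wildcard inside a regex
const haystack = "xxaxbxx"; // contains "axb" but not the literal "a.b"

// indexOf treats the needle literally, so nothing is found.
const plainIndex = haystack.indexOf(needle);
// Unescaped, the dot matches the "x" in "axb".
const unescapedHit = new RegExp(needle).test(haystack);
// Escaped, the regex behaves like the literal search again.
const escapedHit = new RegExp(escapeRegExp(needle)).test(haystack);

console.log(plainIndex, unescapedHit, escapedHit);
```

With indexOf there is nothing to escape, which is part of why it is the simpler tool for plain strings.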

Your comparison may not be entirely fair. indexOf is used with plain strings and is therefore very fast; match takes a regular expression - of course it may be slower in comparison, but if you want to do a regex match, you won't get far with indexOf. On the other hand, regular expression engines can be optimized, and have been improving in performance in the last years.
In your case, where you're looking for a verbatim string, indexOf should be sufficient. There is still one application for regexes, though: If you need to match entire words and want to avoid matching substrings, then regular expressions give you "word boundary anchors". For example:
indexOf('bar')
will find bar in all three of bar, fubar, and barmy, whereas
match(/\bbar\b/)
will only match bar when it is not part of a longer word.
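A small sketch of that difference, using the three words from the example (the variable names are chosen for illustration only):

```javascript
const words = ["bar", "fubar", "barmy"];

// Plain substring search: every word contains "bar" somewhere.
const substringHits = words.filter((word) => word.indexOf("bar") !== -1);

// Word-boundary search: only the standalone word matches.
const wholeWordHits = words.filter((word) => /\bbar\b/.test(word));

console.log(substringHits, wholeWordHits);
```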
As you can see in the comments, some comparisons have been done that show that a regex may be faster than indexOf - if it's performance-critical, you may need to profile your code.

Here are (more or less) all the ways to search for a substring:
// 1. includes (introduced in ES6)
var string = "string to search for substring",
substring = "sea";
string.includes(substring);
// 2. string.indexOf
var string = "string to search for substring",
substring = "sea";
string.indexOf(substring) !== -1;
// 3. RegExp: test
var string = "string to search for substring",
expr = /sea/; // no quotes here
expr.test(string);
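// 4. string.match
var string = "string to search for substring",
expr = /sea/; // a regex literal; the string "/sea/" would search for the slashes too
string.match(expr);
// 5. string.search
var string = "string to search for substring",
expr = /sea/; // a regex literal; the string "/sea/" would search for the slashes too
string.search(expr); // returns the index of the match, or -1 if there is none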
Source: https://koukia.ca/top-6-ways-to-search-for-a-string-in-javascript-and-performance-benchmarks-ce3e9b81ad31
The benchmarks there seem to be skewed, especially for ES6 includes; read the comments.
In summary:
If you don't need the matches themselves:
=> If you need a regex, use test. Otherwise use ES6 includes or indexOf. Even so, test and indexOf are close.
As for includes vs. indexOf:
They seem to perform the same: https://jsperf.com/array-indexof-vs-includes/4 (it would be weird if they differed; they do mostly the same work, apart from the differences in what they expose, see this)
As for my own benchmark, here it is: http://jsben.ch/fFnA0
You can run it yourself (results are browser dependent, so run it several times).
Here is how it performed: across multiple runs, indexOf and includes took turns beating each other, and the results were close. So they are effectively the same. [This uses the same test platform as the article above.]
And here is the long-text version (8 times longer):
http://jsben.ch/wSBA2
Tested in both Chrome and Firefox, with the same outcome.
Note that jsben.ch doesn't handle memory overflow (or its own limits) correctly, and it doesn't show any message, so results can be wrong if you duplicate the text more than 8 times (8 works well). The conclusion is that for very large text all three perform the same. For short text, indexOf and includes are the same and test is a little slower, or possibly the same, as it appeared in Chrome (in Firefox 60 it is slower).
A note on jsben.ch: don't freak out if you get inconsistent results. Run the test at different times and see whether the results are consistent. Try another browser; sometimes the runs are simply wrong, because of a bug, bad memory handling, or something else.
Here too is my benchmark on jsperf (better detail, and graphs for multiple browsers)
(Chrome is on top)
Normal text
https://jsperf.com/indexof-vs-includes-vs-test-2019
Summary: includes and indexOf have the same performance; test is slower.
(All three seem to perform the same in Chrome.)
Long text (12 times longer than normal)
https://jsperf.com/indexof-vs-includes-vs-test-2019-long-text-str/
Summary: all three perform the same (Chrome and Firefox).
Very short string
https://jsperf.com/indexof-vs-includes-vs-test-2019-too-short-string/
Summary: includes and indexOf perform the same, and test is slower.
Note about the benchmark above: for the very short string version, jsperf showed a large error for Chrome. Watching the run myself, around 60 samples were taken for both indexOf and includes, consistently across many repetitions, and slightly fewer for test, which makes it slower.
Don't be fooled by the wrong graph. It is clearly wrong. The same test works fine in Firefox, so it is surely a bug.
In the graphs (the first image was the Firefox run), indexOf suddenly appears far faster than everything else. But as I said, I ran the test and watched the number of samples: it was around 60 for both indexOf and includes, and they performed the same. This is a bug in jsperf. Except for this one case (maybe caused by a memory restriction), all the other results were consistent. jsperf gives more detail, and you can see how many samples are taken in real time.
Final summary
indexOf vs. includes => same performance
test => can be slower for short strings or text, and the same for long text. That makes sense given the overhead the regex engine adds. In Chrome it didn't seem to matter at all.

If you're trying to search for substring occurrences case-insensitively then match seems to be faster than a combination of indexOf and toLowerCase()
Check here - http://jsperf.com/regexp-vs-indexof/152
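For reference, the two case-insensitive approaches being compared look roughly like this. This is only a sketch with a made-up sample string; it says nothing about which one is faster, so benchmark in your own environment if it matters.

```javascript
const text = "An <IMG SRC='logo.png'> tag"; // sample input, assumed for illustration

// Approach 1: normalise the case first, then do a plain substring search.
const viaIndexOf = text.toLowerCase().indexOf("src") !== -1;

// Approach 2: let the regex engine ignore case via the i flag.
const viaMatch = text.match(/src/i) !== null;

// If only a yes/no answer is needed, test with the same flag also works.
const viaTest = /src/i.test(text);

console.log(viaIndexOf, viaMatch, viaTest);
```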

You ask whether str.indexOf('target') or str.match(/target/) should be preferred. As other posters have suggested, the use cases and return types of these methods are different. The first asks "where in str can I first find 'target'?" The second asks "does str match the regex and, if so, what are all of the matches for any associated capture groups?"
The issue is that neither one technically is designed to ask the simpler question "does the string contain the substring?" There is something that is explicitly designed to do so:
var doesStringContainTarget = /target/.test(str);
There are several advantages to using regex.test(string):
It returns a boolean, which is what you care about
It is more performant than str.match(/target/) (and rivals str.indexOf('target'))
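If, for some reason, str is undefined or null, test coerces it to the string "undefined" or "null", so you'll get false (the desired result) instead of a thrown TypeError, unless the pattern happens to match one of those words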
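A short sketch of these points; the function name containsTarget and the sample inputs are assumptions for illustration:

```javascript
// Hypothetical wrapper around regex.test for a plain yes/no answer.
const containsTarget = (str) => /target/.test(str);

const hit = containsTarget("my target string");
const miss = containsTarget("nothing to see");

// test converts its argument to a string, so these do not throw.
const onUndefined = containsTarget(undefined);
const onNull = containsTarget(null);

// Calling indexOf on a missing value throws instead.
let indexOfThrew = false;
try {
  const missing = undefined;
  missing.indexOf("target");
} catch (error) {
  indexOfThrew = error instanceof TypeError;
}

console.log(hit, miss, onUndefined, onNull, indexOfThrew);
```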

Using indexOf should, in theory, be faster than a regex when you're just searching for some plain text, but you should do some comparative benchmarks yourself if you're concerned about performance.
If you prefer match and it's fast enough for your needs then go for it.
For what it's worth, I agree with your colleagues on this: I'd use indexOf when searching for a plain string, and use match etc only when I need the extra functionality provided by regular expressions.

Performance wise indexOf will at the very least be slightly faster than match. It all comes down to the specific implementation. When deciding which to use ask yourself the following question:
Will an integer index suffice or do I need the functionality of a RegExp match result?
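To make that question concrete, here is a sketch using the class attribute from the question. Pulling out the part after the prefix is an assumed use case, shown only to illustrate what a match result offers beyond an index.

```javascript
const className = "redBorder DisablesGuiClass-2345-2d73-83hf-8293";

// indexOf answers "where does the prefix start?" with an integer.
const index = className.indexOf("DisablesGuiClass-");
const found = index !== -1;

// match can also hand back captured parts, here the identifier after the prefix.
const result = className.match(/DisablesGuiClass-(\S+)/);
const id = result ? result[1] : null;

console.log(index, found, id);
```

If only the yes/no answer is needed, the integer is enough; the regex earns its place once the captured identifier is needed.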

The return values are different
Aside from the performance implications, which are addressed by other answers, it is important to note that the return values for each method are different; so the methods cannot merely be substituted without also changing your logic.
Return value of .indexOf: integer
The index within the calling String object of the first occurrence of the specified value, starting the search at fromIndex. Returns -1 if the value is not found.
Return value of .match: array
An Array containing the entire match result and any parentheses-captured matched results. Returns null if there were no matches.
Because .indexOf returns 0 if the calling string begins with the specified value, a simple truthy test will fail.
For example:
Given this class…
class='DisablesGuiClass-2345-2d73-83hf-8293 redBorder'
…the return values for each would differ:
// returns `0`, evaluates to `false`
if (string.indexOf('DisablesGuiClass-')) {
… // this block is skipped.
}
vs.
// returns `["DisablesGuiClass-"]`, evaluates to `true`
if (string.match(/DisablesGuiClass-/)) {
… // this block is run.
}
The correct way to run a truthy test with the return from .indexOf is to test against -1:
if (string.indexOf('DisablesGuiClass-') !== -1) {
// ^returns `0` ^evaluates to `true`
… // this block is run.
}

Remember that Internet Explorer 8 doesn't support indexOf on arrays (Array.prototype.indexOf); indexOf on strings works there.
But if none of your users are on IE8 (Google Analytics would tell you), then ignore this answer.
A possible fix for IE8:
How to fix Array indexOf() in JavaScript for Internet Explorer browsers

Related

How do i allow only one (dash or dot or underscore) in a user form input using regular expression in javascript?

I'm trying to implement a username form validation in javascript where the username
can't start with numbers
can't have whitespaces
can't have any symbols but only One dot or One underscore or One dash
example of a valid username: the_user-one.123
example of invalid username: 1----- user
i've been trying to implement this for awhile but i couldn't figure out how to have only one of each allowed symbol:-
const usernameValidation = /(?=^[\w.-]+$)^\D/g
console.log(usernameValidation.test('1username')) //false
console.log(usernameValidation.test('username-One')) //true
How about using a negative lookahead at the start:
^(?!\d|.*?([_.-]).*\1)[\w.-]+$
This will check if the string
neither starts with digit
nor contains two [_.-] by use of capture and backreference
See this demo at regex101 (more explanation on the right side)
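A quick way to try the pattern in JavaScript; the first two sample usernames come from the question, the others are made up for illustration. The regex is used without the g flag so that repeated test calls do not depend on lastIndex.

```javascript
const usernamePattern = /^(?!\d|.*?([_.-]).*\1)[\w.-]+$/;

const checks = {
  "the_user-one.123": usernamePattern.test("the_user-one.123"), // one of each symbol
  "1----- user": usernamePattern.test("1----- user"),           // digit first, repeats, space
  "user--name": usernamePattern.test("user--name"),             // dash used twice
  "user name": usernamePattern.test("user name"),               // whitespace
  "a_b_c": usernamePattern.test("a_b_c"),                       // underscore used twice
};

console.log(checks);
```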
Preface: Due to my severe carelessness, I assumed the context was usage of the HTML pattern attribute instead of JavaScript input validation. I leave this answer here for posterity in case anyone really wants to do this with regex.
Although regex does have functionality to represent a pattern occurring consecutively within a certain number of times (via {<lower-bound>,<upper-bound>}), I'm not aware of regex having "elegant" functionality to enforce a set of patterns each occurring within a range of number of times but in any order and with other patterns possibly in between.
Some workarounds I can think of:
Make a regex that allows for one of each permutation of ordering of special characters (note: newlines added for readability):
^(?:
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*-?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*_?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*\.?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*_?[A-Za-z0-9]*\.?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*\.?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*-?[A-Za-z0-9]*\.?)
)[A-Za-z0-9]*$
Note that the above regex can be simplified if you don't want usernames to start with special characters either.
Friendly reminder to also make sure you use the HTML attributes to enforce a minimum and maximum input character length where appropriate.
If you feel that regex isn't well suited to your use-case, know that you can do custom validation logic using javascript, which gives you much more control and can be much more readable compared to regex, but may require more lines of code to implement. Seeing the regex above, I would personally seriously consider the custom javascript route.
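A minimal sketch of what such custom validation could look like. The function name isValidUsername is hypothetical, and the rules are read as "at most one dot, one underscore, and one dash each", which is an assumption based on the valid example in the question.

```javascript
// Hypothetical validator: each rule is a separate, readable check.
function isValidUsername(name) {
  if (typeof name !== "string" || name.length === 0) return false;

  // Rule 1: must not start with a digit.
  if (/^\d/.test(name)) return false;

  // Rule 2: only letters, digits, underscore, dot, and dash (so no whitespace).
  if (!/^[\w.-]+$/.test(name)) return false;

  // Rule 3: each allowed symbol may appear at most once.
  for (const symbol of [".", "_", "-"]) {
    const occurrences = name.split(symbol).length - 1;
    if (occurrences > 1) return false;
  }
  return true;
}

console.log(isValidUsername("the_user-one.123"), isValidUsername("1----- user"));
```

Each rule can return its own error message if the form needs to tell the user what went wrong, which is harder to do with a single regex.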
Note: I find https://regex101.com/ very helpful in learning, writing, and testing regex. Make sure to set the "flavour" to "JavaScript" in your case.
I have to admit that Bobble bubble's solution is the better fit. Here is a comparison of the different cases:
console.log("Comparison between mine and Bobble Bubble's solution:\n\nusername mine,BobbleBubble");
["valid-usrId1","1nvalidUsrId","An0therVal1d-One","inva-lid.userId","anot-her.one","test.-case"].forEach(u=>console.log(u.padEnd(20," "),chck(u)));
function chck(s){
return [!!s.match(/^[a-zA-Z][a-zA-Z0-9._-]*$/) && ( s.match(/[._-]/g) || []).length<2, // mine
!!s.match(/^(?!\d|.*?([_.-]).*\1)[\w.-]+$/)].join(","); // Bobble bubble
}
The differences can be seen in the last three test cases.

ES6 / JS: Regex for replacing delve with conditional chaining

How to replace delve with conditional chaining in a vs code project?
e.g.
delve(seo,'meta')
delve(item, "image.data.attributes.alternativeText")
desired result
seo?.meta
item?.image.data.attributes.alternativeText
Is it possible using find/replace in Visual Studio Code?
I propose the following RegEx:
delve\(\s*([^,]+?)\s*,\s*['"]([^.]+?)['"]\s*\)
and the following replacement format string:
$1?.$2
Explanation: Match delve(, a first argument up until the first comma (lazy match), and then a second string argument (no care is taken to ensure that the brackets match as this is rather quick'n'dirty anyways), then the closing bracket of the call ). Spacing at reasonable places is accounted for.
which will work for simple cases like delve(someVar, "key") but might fail for pathological cases; always review the replacements manually.
Note that this is explicitly made incapable of dealing with delve(var, "a.b.c") because as far as I know, VSC format strings don't support "joining" variable numbers of captures by a given string. As a workaround, you could explicitly create versions with two, three, four, five... dots and write the corresponding replacements. The version for two dots for example looks as follows:
delve\(([^,]+?)\s*,\s*['"]([^.]+?)\.([^.]+?)['"]\s*\)
and the format string is $1?.$2?.$3.
You write:
e.g.
delve(seo,'meta')
delve(item, "image.data.attributes.alternativeText")
desired result
seo?.meta
item?.image.data.attributes.alternativeText
but I highly doubt that this is intended, because delve(item, "image.data.attributes.alternativeText") is in fact equivalent to item?.image?.data?.attributes?.alternativeText rather than the desired result you describe. To make it handle it that way, simply replace [^.] with . to make it accept strings containing any characters (including dots).
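If running a small script over the source is an option instead of the editor's find/replace, a replacer function can handle any number of dots in one pass. This is only a sketch: the function name delveToOptionalChaining is hypothetical, it assumes the fully chained output (?. at every step), and like the regexes above it is quick and dirty, so review the results manually.

```javascript
// Hypothetical helper: rewrite delve(obj, "a.b.c") calls found in a source string.
function delveToOptionalChaining(source) {
  const pattern = /delve\(\s*([^,]+?)\s*,\s*['"]([^'"]+)['"]\s*\)/g;
  return source.replace(pattern, (wholeMatch, object, path) => {
    // Turn "a.b.c" into "?.a?.b?.c" and append it to the object expression.
    const chain = path.split(".").map((key) => "?." + key).join("");
    return object + chain;
  });
}

const simple = delveToOptionalChaining("delve(seo,'meta')");
const nested = delveToOptionalChaining(
  'delve(item, "image.data.attributes.alternativeText")'
);

console.log(simple);
console.log(nested);
```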

Get an example matched text from a regex pattern [duplicate]

Is there any way of generating random text which satisfies provided regular expression.
I am looking for a function which works like below
var reg = Some Regular Expression
var str = RandString(reg)
I have seen fairly good solutions in perl and ruby on github, but I think there are technical issues that make a complete solution impossible. For example, /[0-9]+/ has an infinite upper bound, which is not practical for selecting random numbers from.
Never seen it in JavaScript, but you could translate.
EDIT: After googling for a few seconds...
https://github.com/fent/randexp.js
If you know what the regular expression is, you can just generate random strings, then use a function that references the index of the letters and changes them as needed. Regular expressions vary widely, so it will be difficult to find one approach that works for every possible regex.
Your question is pretty open so hopefully this steers you to the right solution. Get the current time (in seconds), MD5 it, check it against a REGEX, return the match.
Running Example: http://jsfiddle.net/MattLo/3gKrb/
Usage: RandString(/([A-Za-z])/ig); // expected to be a string
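The generate-and-check idea can be sketched as simple rejection sampling. This is not how the linked fiddle or randexp.js work; the function name randomMatch and its parameters (alphabet, maxLength, maxTries) are assumptions for this sketch. The length cap sidesteps the unbounded patterns mentioned above, at the price of possibly finding nothing.

```javascript
// Hypothetical helper: draw random strings until one matches, or give up.
function randomMatch(regex, alphabet, maxLength, maxTries) {
  // Drop the g and y flags so that test() does not depend on lastIndex.
  const checker = new RegExp(regex.source, regex.flags.replace(/[gy]/g, ""));

  for (let attempt = 0; attempt < maxTries; attempt++) {
    const length = 1 + Math.floor(Math.random() * maxLength);
    let candidate = "";
    for (let i = 0; i < length; i++) {
      candidate += alphabet[Math.floor(Math.random() * alphabet.length)];
    }
    if (checker.test(candidate)) return candidate;
  }
  return null; // nothing matched within maxTries
}

const digits = randomMatch(/^[0-9]+$/, "0123456789", 8, 100);
const impossible = randomMatch(/^x$/, "ab", 3, 50);

console.log(digits, impossible);
```

For anything beyond trivial patterns this wastes most attempts, which is why a library that builds the string from the parsed regex is the more practical route.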
For JavaScript, the following modules can generate a random match to a regex:
pxeger
randexp.js
regexgen

Most Efficient Way to Find Last Occurrence of a Character in a Given Range

I’m fairly new to JavaScript and need to find an efficient way to get the index of the last occurrence of a space character within a given range. I’m not sure if this could be done with RegEx or not; I’m currently doing it with the built in string methods, however, building a new string with the substring method seems like a waste for what I need.
My current solution:
Let n be the end of the range
let spaceIndex = stringText.substr(0, n + 1).lastIndexOf(" ");
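Sure, just use the second argument to lastIndexOf -- fromIndex. The search runs backwards from fromIndex and includes that index, so pass n to cover the same range as substr(0, n + 1):
let spaceIndex = stringText.lastIndexOf(" ", n)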
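A small sketch comparing the two approaches on a made-up string. It assumes n is the last index of the range, inclusive. Since fromIndex is itself inclusive, passing n covers indices 0 through n; passing n + 1 would also accept a space at index n + 1, just outside the range.

```javascript
const stringText = "ab cd ef"; // spaces at indices 2 and 5

// Builds a temporary string covering indices 0..n, then searches it.
function lastSpaceViaSlice(text, n) {
  return text.slice(0, n + 1).lastIndexOf(" ");
}

// Searches the original string backwards, starting at index n.
function lastSpaceViaFromIndex(text, n) {
  return text.lastIndexOf(" ", n);
}

for (let n = 0; n < stringText.length; n++) {
  console.log(n, lastSpaceViaSlice(stringText, n), lastSpaceViaFromIndex(stringText, n));
}
```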
You should probably use slice or substring instead of substr
Although String.prototype.substr() is not strictly deprecated, it is considered a legacy function and should be avoided when possible. It is not part of the core JavaScript language and may be removed in the future.
Other than that, your code looks fine
You shouldn't change your code (except for the suggestion by Adiga). What you are doing is fine and "seems like a waste" is never a good reason to optimize code.
See this SO Software Engineering discussion on premature optimization.
Code should be clear first, and only optimized if and when you run into problems. Your code looks clear to me.

regular expressions difference between ((?:[^\"])*) and ([^\"]*)

What is the difference between these regular expressions? Are they interchangeable?
((?:[^\"])*)
([^\"]*)
background to this question:
The JavaScript WYSIWYG editor (tinymce) fails to parse my HTML code in Firefox (23.0.1 and 25.0a2) but works in Chrome.
I found the regular expression to blame:
attrRegExp = /([\w:\-]+)(?:\s*=\s*(?:(?:\"((?:[^\"])*)\")|(?:\'((?:[^\'])*)\')|([^>\s]+)))?/g;
which I modified, replacing
((?:[^\"])*)
with
([^\"]*)
and
((?:[^\'])*)
with
([^\']*)
the resulting regular expression is working in both browsers for my test case
attrRegExp = /([\w:\-]+)(?:\s*=\s*(?:(?:\"([^\"]*)\")|(?:\'([^\']*)\')|([^>\s]+)))?/g
Can someone shed some light on that?
my test data that only works with the modified regular expression is a big image >700 kb like:
var testdata = '<img alt="" src="data:image/jpeg;base64,/9j/4AAQSkZJRgA...5PmDk4FOGOHy6S3JW120W1uCJ5M0PBa54edOFAc8ePX/2Q==">'
doing something like that to test:
testdata.match(attrRegExp);
especially when the test data is big the unmodified regex is likely to fail in firefox.
You can find the jsfiddle example here:
There should be no difference in the result. So you should be fine.
However, there might be a big difference in how RegExp engines will process these two expressions, and in the case of Firefox/Safari you just proved there actually is ;)
Firefox makes use of WebKit/JavaScriptCore YARR.
YARR imposes an arbitrary, artificial limit, which hits in the non-capturing group variant
// The below limit restricts the number of "recursive" match calls in order to
// avoid spending exponential time on complex regular expressions.
static const unsigned matchLimit = 1000000;
As such Safari is affected as well.
See the relevant Webkit bug and relevant Firefox bug and the nice test case comparing different expression types somebody put together.
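The claim that both expressions give the same result can be checked on a small input. This sketch uses a short, made-up tag rather than the large image from the question, so it says nothing about the engine limit itself; the helper name collectMatches is an assumption.

```javascript
const original = /([\w:\-]+)(?:\s*=\s*(?:(?:\"((?:[^\"])*)\")|(?:\'((?:[^\'])*)\')|([^>\s]+)))?/g;
const modified = /([\w:\-]+)(?:\s*=\s*(?:(?:\"([^\"]*)\")|(?:\'([^\']*)\')|([^>\s]+)))?/g;

// Short sample input, assumed for illustration.
const sample = '<img alt="" src="data:image/png;base64,AAAA" class=\'x y\' width=10>';

// Hypothetical helper: collect every match with its capture groups.
function collectMatches(regex, text) {
  return Array.from(text.matchAll(regex), (match) => Array.from(match));
}

const fromOriginal = collectMatches(original, sample);
const fromModified = collectMatches(modified, sample);
const identical = JSON.stringify(fromOriginal) === JSON.stringify(fromModified);

console.log(identical, fromModified);
```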
