ES6 / JS: Regex for replacing delve with conditional chaining - javascript

How to replace delve with conditional chaining in a vs code project?
e.g.
delve(seo,'meta')
delve(item, "image.data.attributes.alternativeText")
desired result
seo?.meta
item?.image.data.attributes.alternativeText
Is it possible using find/replace in Visual Studio Code?

I propose the following RegEx:
delve\(\s*([^,]+?)\s*,\s*['"]([^.]+?)['"]\s*\)
and the following replacement format string:
$1?.$2
Explanation: Match delve(, a first argument up until the first comma (lazy match), and then a second string argument (no care is taken to ensure that the brackets match as this is rather quick'n'dirty anyways), then the closing bracket of the call ). Spacing at reasonable places is accounted for.
which will work for simple cases like delve(someVar, "key") but might fail for pathological cases; always review the replacements manually.
Note that this is explicitly made incapable of dealing with delve(var, "a.b.c") because as far as I know, VSC format strings don't support "joining" variable numbers of captures by a given string. As a workaround, you could explicitly create versions with two, three, four, five... dots and write the corresponding replacements. The version for two dots for example looks as follows:
delve\(([^,]+?)\s*,\s*['"]([^.]+?)\.([^.]+?)['"]\s*\)
and the format string is $1?.$2?.$3.
You write:
e.g.
delve(seo,'meta')
delve(item, "image.data.attributes.alternativeText")
desired result
seo?.meta
item?.image.data.attributes.alternativeText
but I highly doubt that this is intended, because delve(item, "image.data.attributes.alternativeText") is in fact equivalent to item?.image?.data?.attributes?.alternativeText rather than the desired result you describe. To make it handle it that way, simply replace [^.] with . to make it accept strings containing any characters (including dots).

Related

[Nearley]: how to parse matching opening and closing tag

I'm trying to parse a very simple language with nearley: you can put a string between matching opening and closing tags, and you can chain some tags. It looks like a kind of XML, but with[ instead of < , with tag always 2 chars long, and without nesting.
[aa]My text[/aa][ab]Another Text[/ab]
But I don't seem to be able to parse correctly this, as I get the grammar should be unambiguous as soon as I have more than one tag.
The grammar that I have right now:
#builtin "string.ne"
#builtin "whitespace.ne"
openAndCloseTag[X] -> "[" $X "]" string "[/" $X "]"
languages -> openAndCloseTag[[a-zA-Z] [a-zA-Z]] (_ openAndCloseTag[[a-zA-Z] [a-zA-Z]]):*
string -> sstrchar:* {% (d) => d[0].join("") %}
And related, Ideally I would like the tags to be case insensitive (eg. [bc]TESt[/BC] would be valid)
Has anyone any idea how we can do that? I wasn't able to find a nearley XML parser example .
Your language is almost too simple to need a parser generator. And at the same time, it is not context free, which makes it difficult to use a parser generator. So it is quite possible that the Nearly parser is not the best tool for you, although it is probably possible to make it work with a bit of hackery.
First things first. You have not actually provided an unambiguous definition of your language, which is why your parser reports an ambiguity. To see the ambiguity, consider the input
[aa]My text[/ab][ab]Another Text[/aa]
That's very similar to your test input; all I did was swap a pair of letters. Now, here's the question: Is that a valid input consisting of a single aa tag? Or is it a syntax error? (That's a serious question. Some definitions of tagging systems like this consider a tag to only be closed by a matching close tag, so that things which look like different tags are considered to be plain text. Such systems would accept the input as a single tagged value.)
The problem is that you define string as sstrchar:*, and if we look at the definition of sstrchar in string.ne, we see (leaving out the postprocessing actions, which are irrelevant):
sstrchar -> [^\\'\n]
| "\\" strescape
| "\\'"
Now, the first possibility is "any character other than a backslash, a single quote or a newline", and it's easy to see that all of the characters in [/ab] are in sstrchar. (It's not clear to me why you chose sstrchar; single quotes don't appear to be special in your language. Or perhaps you just didn't mention their significance.) So a string could extend up to the end of the input. Of course, the syntax requires a closing tag, and the Nearley parser is determined to find a match if there is one. But, in fact, there are two of them. So the parser declares an ambiguity, since it doesn't have any criterion to choose between the two close tags.
And here's where we come up against the issue that your language is not context-free. (Actually, it is context-free in some technical sense, because there are "only" 676 two-letter case-insensitive tags, and it would theoretically be possible to list all 676 possibilities. But I'm guessing you don't want to do that.)
A context-free grammar cannot express a language that insists that two non-terminals expand to the same string. That's the very definition of context-free: if one non-terminal can only match the same input as a previous non-terminal, then
the second non-terminals match is dependent on the context, specifically on the match produced by the first non-terminal. In a context-free grammar, a non-terminal expands to the same thing, regardless of the rest of the text. The context in which the non-terminal appears is not allowed to influence the expansion.
Now, you quite possibly expected that your macro definition:
openAndCloseTag[X] -> "[" $X "]" string "[/" $X "]"
is expressing a context-sensitive match by repeating the $X macro parameter. But it is not by accident that the Nearley documentation describes this construct as a macro. X here refers exactly to the string used in the macro invocation. So when you say:
openAndCloseTag[[a-zA-Z] [a-zA-Z]]
Nearly macro expands that to
"[" [a-zA-Z] [a-zA-Z] "]" string "[/" [a-zA-Z] [a-zA-Z] "]"
and that's what it will use as the grammar production. Observe that the two $X macro parameters were expanded to the same argument, but that doesn't mean that will match the same input text. Each of those subpatterns will independently match any two alphabetic characters. Context-freely.
As I alluded to earlier, you could use this macro to write out the 676 possible tag patterns:
tag -> openAndCloseTag["aa"i]
| openAndCloseTag["ab"i]
| openAndCloseTag["ac"i]
| ...
| openAndCloseTag["zz"i]
If you did that (and you managed to correctly list all of the possibilities) then the parser would not complain about ambiguity as long as you never use the same tag twice in the same input. So it would be ok with both your original input and my altered input (as long as you accept the interpretation that my input is a single tagged object). But it would still report the following as ambiguous:
[aa]My text[/aa][aa]Another Text[/aa]
That's ambiguous because the grammar allows it to be either a single aa tagged string (whose text includes characters which look like close and open tags) or as two consecutive aa tagged strings.
To eliminate the ambiguity you would have to write the string pattern in a way which does not permit internal tags, in the same way that sstrchar doesn't allow internal single quotes. Except, of course, it is not nearly so simple to match a string which doesn't contain a pattern, than to match a string which doesn't contain a single character. It could be done using Nearley, but I really don't think that it's what you want.
Probably your best bet is to use native Javascript regular expressions to match tagged strings. This will prove simpler because Javascript regular expressions are much more powerful than mathematical regular expressions, even allowing the possibility of matching (certain) context-sensitive constructions. You could, for example, use Javascript regular expressions with the Moo lexer, which integrates well into Nearley. Or you could just use the regular expressions directly, since once you match the tagged text, there isn't much else you need to do.
To get you started, here's a simple Javascript regular expression which matches tagged strings with matching case-insensitive labels (the i flag at the end):
/\[([a-zA-Z]{2})\].*?\[\/\1\]/gmi
You can play with it online using Regex 101

Match a word unless it is preceded by an equals sign?

I have the following string
class=use><em>use</em>
that when searched using us I want to transform into
class=use><em><b>us</b>e</em>
I've tried looking at relating answers but I can't quite get it working the way I want it to. I'm especially interested in this answer's callback approach.
Help appreciated
This is a good exercise for writing regular expressions, and here's a possible solution.
"useclass=use><em>use</em>".replace(/([^=]|^)(us)/g, "$1<b>$2</b>");
// returns "<b>us</b>eclass=use><em><b>us</b>e</em>"
([^=]|^) ensures that the prefix of any matched us is either not an equal sign, or it's the start of the string.
As #jamiec pointed out in the comments, if you are using this to parse/modify HTML, just stop right now. It's mathematically impossible to parse a CFG with a regular grammar (even with enhanced JS regexps you will have a bad time trying to achieve that.)
If you can make any assumptions about the structure of your document, you may be better off using an approach that operates on DOM elements directly rather than parsing the whole document with a regex.
Parsing HTML with a regex has certain problems that can be painful to deal with.
var element = document.querySelector('em');
element.innerHTML = element.innerHTML.replace('us', '<b>us</b>');
<div class=use><em>use</em>
</div>
I would first look for any character other than the equals sign [^=] and separate it by parentheses so that I can use it again in my replacement. Then another set of parentheses around the two characters us ought to do it:
var re = /([^=]|^)(us)/
That will give you two capture groups to work with (inside the parentheses), which you can represent with $1 and $2 in your replacement string.
str.replace( /([^=|^])(us)/, '$1<b>$2</b>' );

What is the function of .source in context of this new RegExp

I ran into the below monster of a regex in the wild today. The regex is meant to validate a url.
function superUrlValidation(url) {
return new RegExp(/^/.source + "((.+):\/\/)?" + /(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/.source, "i")
.test(url);
}
I've never seen .source used in a regex like this so I looked it up.
The MDN docs for RegExp.prototype.source states:
The source property returns a String containing the source text of the regexp object, and it doesn't contain the two forward slashes on both sides and any flags.
... and gives this example:
var regex = /fooBar/ig;
console.log(regex.source); // "fooBar", doesn't contain /.../ and "ig".
I understand the MDN example (you're getting the source text of the regex object after it is created, makes sense), but I dont understand how this is being used in the superUrlValidation regex above.
How is the source being used before the regex object is completed and what does this accomplish? I cant find any documentation showing .source being used in this way.
Note that .source is used twice in the regex, at the beginning and the end
Use of .source everywhere in your regex seems totally unnecessary, may be just a trick to avoid double escaping. In fact even use of new RegExp is not needed and you can get away with just the regex literal as this:
var re = /^((.+):\/\/)?(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i;
/^/ is a regex literal, meaning it's a valid regex object in it's own right. This means that /^/.source === "^".
This seems like an arbitrary example of using the source property as this means the author could have just placed a "^" in it's place, or even just put a ^ at the beginning of the next string, and it would have the same effect.
The .source property returns the content of the regex between the forward slashes as you say. so the result of the above is equivalent to this string:
/^((.+):\/\/)?(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i
In JavaScript you can write regexes like this: /matchsomething/ or using the RegExp function/constructor above. It looks like the code you found is the result of someone not know what they were doing. They seem to have taken a few regexes using the literal syntax (i.e /match_here/) and plugged it into the constructor version and stuck them all together.
I can't see any benefit in using the source property this way. I would just use the string version or the constructor version. Or better, find out what the original author intended and write it again or find a respected regex library with the criteria you need.
And, yeah, wow. It's massive.

regex replace on JSON is removing an Object from Array

I'm trying to improve my understanding of Regex, but this one has me quite mystified.
I started with some text defined as:
var txt = "{\"columns\":[{\"text\":\"A\",\"value\":80},{\"text\":\"B\",\"renderer\":\"gbpFormat\",\"value\":80},{\"text\":\"C\",\"value\":80}]}";
and do a replace as follows:
txt.replace(/\"renderer\"\:(.*)(?:,)/g,"\"renderer\"\:gbpFormat\,");
which results in:
"{"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}"
What I expected was for the renderer attribute value to have it's quotes removed; which has happened, but also the C column is completely missing! I'd really love for someone to explain how my Regex has removed column C?
As an extra bonus, if you could explain how to remove the quotes around any value for renderer (i.e. so I don't have to hard-code the value gbpFormat in the regex) that'd be fantastic.
You are using a greedy operator while you need a lazy one. Change this:
"renderer":(.*)(?:,)
^---- add here the '?' to make it lazy
To
"renderer":(.*?)(?:,)
Working demo
Your code should be:
txt.replace(/\"renderer\"\:(.*?)(?:,)/g,"\"renderer\"\:gbpFormat\,");
If you are learning regex, take a look at this documentation to know more about greedyness. A nice extract to understand this is:
Watch Out for The Greediness!
Suppose you want to use a regex to match an HTML tag. You know that
the input will be a valid HTML file, so the regular expression does
not need to exclude any invalid use of sharp brackets. If it sits
between sharp brackets, it is an HTML tag.
Most people new to regular expressions will attempt to use <.+>. They
will be surprised when they test it on a string like This is a
first test. You might expect the regex to match and when
continuing after that match, .
But it does not. The regex will match first. Obviously not
what we wanted. The reason is that the plus is greedy. That is, the
plus causes the regex engine to repeat the preceding token as often as
possible. Only if that causes the entire regex to fail, will the regex
engine backtrack. That is, it will go back to the plus, make it give
up the last iteration, and proceed with the remainder of the regex.
Like the plus, the star and the repetition using curly braces are
greedy.
Try like this:
txt = txt.replace(/"renderer":"(.*?)"/g,'"renderer":$1');
The issue in the expression you were using was this part:
(.*)(?:,)
By default, the * quantifier is greedy by default, which means that it gobbles up as much as it can, so it will run up to the last comma in your string. The easiest solution would be to turn that in to a non-greedy quantifier, by adding a question mark after the asterisk and change that part of your expression to look like this
(.*?)(?:,)
For the solution I proposed at the top of this answer, I also removed the part matching the comma, because I think it's easier just to match everything between quotes. As for your bonus question, to replace the matched value instead of having to hardcode gbpFormat, I used a backreference ($1), which will insert the first matched group into the replacement string.
Don't manipulate JSON with regexp. It's too likely that you will break it, as you have found, and more importantly there's no need to.
In addition, once you have changed
'{"columns": [..."renderer": "gbpFormat", ...]}'
into
'{"columns": [..."renderer": gbpFormat, ...]}' // remove quotes from gbpFormat
then this is no longer valid JSON. (JSON requires that property values be numbers, quoted strings, objects, or arrays.) So you will not be able to parse it, or send it anywhere and have it interpreted correctly.
Therefore you should parse it to start with, then manipulate the resulting actual JS object:
var object = JSON.parse(txt);
object.columns.forEach(function(column) {
column.renderer = ghpFormat;
});
If you want to replace any quoted value of the renderer property with the value itself, then you could try
column.renderer = window[column.renderer];
Assuming that the value is available in the global namespace.
This question falls into the category of "I need a regexp, or I wrote one and it's not working, and I'm not really sure why it has to be a regexp, but I heard they can do all kinds of things, so that's just what I imagined I must need." People use regexps to try to do far too many complex matching, splitting, scanning, replacement, and validation tasks, including on complex languages such as HTML, or in this case JSON. There is almost always a better way.
The only time I can imagine wanting to manipulate JSON with regexps is if the JSON is broken somehow, perhaps due to a bug in server code, and it needs to be fixed up in order to be parseable.

JavaScript: indexOf vs. Match when Searching Strings?

Readability aside, are there any discernable differences (performance perhaps) between using
str.indexOf("src")
and
str.match(/src/)
I personally prefer match (and regexp) but colleagues seem to go the other way. We were wondering if it mattered ...?
EDIT:
I should have said at the outset that this is for functions that will be doing partial plain-string matching (to pick up identifiers in class attributes for JQuery) rather than full regexp searches with wildcards etc.
class='redBorder DisablesGuiClass-2345-2d73-83hf-8293'
So it's the difference between:
string.indexOf('DisablesGuiClass-');
and
string.match(/DisablesGuiClass-/)
RegExp is indeed slower than indexOf (you can see it here), though normally this shouldn't be an issue. With RegExp, you also have to make sure the string is properly escaped, which is an extra thing to think about.
Both of those issues aside, if two tools do exactly what you need them to, why not choose the simpler one?
Your comparison may not be entirely fair. indexOf is used with plain strings and is therefore very fast; match takes a regular expression - of course it may be slower in comparison, but if you want to do a regex match, you won't get far with indexOf. On the other hand, regular expression engines can be optimized, and have been improving in performance in the last years.
In your case, where you're looking for a verbatim string, indexOf should be sufficient. There is still one application for regexes, though: If you need to match entire words and want to avoid matching substrings, then regular expressions give you "word boundary anchors". For example:
indexOf('bar')
will find bar three times in bar, fubar, barmy, whereas
match(/\bbar\b/)
will only match bar when it is not part of a longer word.
As you can see in the comments, some comparisons have been done that show that a regex may be faster than indexOf - if it's performance-critical, you may need to profile your code.
Here all possible ways (relatively) to search for string
// 1. includes (introduced in ES6)
var string = "string to search for substring",
substring = "sea";
string.includes(substring);
// 2. string.indexOf
var string = "string to search for substring",
substring = "sea";
string.indexOf(substring) !== -1;
// 3. RegExp: test
var string = "string to search for substring",
expr = /sea/; // no quotes here
expr.test(string);
// 4. string.match
var string = "string to search for substring",
expr = "/sea/";
string.match(expr);
//5. string.search
var string = "string to search for substring",
expr = "/sea/";
string.search(expr);
Here a src: https://koukia.ca/top-6-ways-to-search-for-a-string-in-javascript-and-performance-benchmarks-ce3e9b81ad31
Benchmarks seem to be twisted specially for es6 includes , read the comments.
In resume:
if you don't need the matches.
=> Either you need regex and so use test. Otherwise es6 includes or indexOf. Still test vs indexOf are close.
And for includes vs indexOf:
They seem to be the same : https://jsperf.com/array-indexof-vs-includes/4 (if it was different it would be wierd, they mostly perform the same except for the differences that they expose check this)
And for my own benchmark test. here it is http://jsben.ch/fFnA0
You can test it (it's browser dependent) [test multiple time]
here how it performed (multiple run indexOf and includes one beat the other, and they are close). So they are the same. [here using the same test platform as the article above].
And here for the a long text version (8 times longer)
http://jsben.ch/wSBA2
Tested both chrome and firefox, same thing.
Notice jsben.ch doesn't handle memory overflow (or there limits correctly. It doesn't show any message) so result can get wrong if you add more then 8 text duplication (8 work well). But the conclusion is for very big text all three perform the same way. Otherwise for short indexOf and includes are the same and test a little bit slower. or Can be the same as it seemed in chrome (firefox 60 it is slower).
Notice with jsben.ch: don't freak out if you get inconsistant result. Try different time and see if it's consistent or not. Change browser, sometimes they just run totally wrong. Bug or bad handling of memory. Or something.
ex:
Here too my benchmark on jsperf (better details, and handle graphs for multiple browsers)
(top is chrome)
normal text
https://jsperf.com/indexof-vs-includes-vs-test-2019
resume: includes and indexOf have same perofrmance. test slower.
(seem all three perform the same in chrom)
Long text (12 time longer then normal)
https://jsperf.com/indexof-vs-includes-vs-test-2019-long-text-str/
resume: All the three perform the same. (chrome and firefox)
very short string
https://jsperf.com/indexof-vs-includes-vs-test-2019-too-short-string/
resume: includes and indexOf perform the same and test slower.
Note: about the benchmark above. For the very short string version (jsperf) had an big error for chrome. Seeing by my eyes. around 60 sample was run for both indexOf and includes same way (repeated a lot of time). And test a little bit less and so slower.
don't be fooled with the wrong graph. It's clear wrong. Same test work ok for firefox, surely it's a bug.
Here the illustration: (the first image was the test on firefox)
waaaa. Suddenly indexOf became superman. But as i said i did the test, and looked at the number of samples it was around 60. Both indexOf and includes and they performed the same. A bug on jspref. Except for this one (maybe because of a memory restriction related problem) all the rest was consistent, it give more details. And you see how many simple happen in real time.
Final resume
indexOf vs includes => Same performance
test => can be slower for short strings or text. And the same for long texts. And it make sense for the overhead that the regex engine add. In chrome it seemed it doesn't matter at all.
If you're trying to search for substring occurrences case-insensitively then match seems to be faster than a combination of indexOf and toLowerCase()
Check here - http://jsperf.com/regexp-vs-indexof/152
You ask whether str.indexOf('target') or str.match(/target/) should be preferred. As other posters have suggested, the use cases and return types of these methods are different. The first asks "where in str can I first find 'target'?" The second asks "does str match the regex and, if so, what are all of the matches for any associated capture groups?"
The issue is that neither one technically is designed to ask the simpler question "does the string contain the substring?" There is something that is explicitly designed to do so:
var doesStringContainTarget = /target/.test(str);
There are several advantages to using regex.test(string):
It returns a boolean, which is what you care about
It is more performant than str.match(/target/) (and rivals str.indexOf('target'))
If for some reason, str is undefined or null, you'll get false (the desired result) instead of throwing a TypeError
Using indexOf should, in theory, be faster than a regex when you're just searching for some plain text, but you should do some comparative benchmarks yourself if you're concerned about performance.
If you prefer match and it's fast enough for your needs then go for it.
For what it's worth, I agree with your colleagues on this: I'd use indexOf when searching for a plain string, and use match etc only when I need the extra functionality provided by regular expressions.
Performance wise indexOf will at the very least be slightly faster than match. It all comes down to the specific implementation. When deciding which to use ask yourself the following question:
Will an integer index suffice or do I
need the functionality of a RegExp
match result?
The return values are different
Aside from the performance implications, which are addressed by other answers, it is important to note that the return values for each method are different; so the methods cannot merely be substituted without also changing your logic.
Return value of .indexOf: integer
The index within the calling String object of the first occurrence of the specified value, starting the search at fromIndex.Returns -1 if the value is not found.
Return value of .match: array
An Array containing the entire match result and any parentheses-captured matched results.Returns null if there were no matches.
Because .indexOf returns 0 if the calling string begins with the specified value, a simple truthy test will fail.
For example:
Given this class…
class='DisablesGuiClass-2345-2d73-83hf-8293 redBorder'
…the return values for each would differ:
// returns `0`, evaluates to `false`
if (string.indexOf('DisablesGuiClass-')) {
… // this block is skipped.
}
vs.
// returns `["DisablesGuiClass-"]`, evaluates to `true`
if (string.match(/DisablesGuiClass-/)) {
… // this block is run.
}
The correct way to run a truthy test with the return from .indexOf is to test against -1:
if (string.indexOf('DisablesGuiClass-') !== -1) {
// ^returns `0` ^evaluates to `true`
… // this block is run.
}
remember Internet Explorer 8 doesnt understand indexOf.
But if nobody of your users uses ie8 (google analytics would tell you) than omit this answer.
possible solution to fix ie8:
How to fix Array indexOf() in JavaScript for Internet Explorer browsers

Categories

Resources