Why does
[].reverse.call("string");
fails (error in both firefox and ie, returns the original string in chrome) while calling all other arrays methods on a string work ?
>>> [].splice.call("string",3)
["i", "n", "g"]
>>> [].map.call("string",function (a) {return a +a;} )
["ss", "tt", "rr", "ii", "nn", "gg"]
Because .reverse() modifies an Array, and strings are immutable.
You could borrow Array.prototype.slice to convert to an Array, then reverse and join it.
var s = "string";
var s2 = [].slice.call(s).reverse().join('');
Just be aware that in older versions of IE, you can't manipulate a string like an Array.
The following technique (or similar) is commonly used to reverse a string in JavaScript:
// Don’t use this!
var naiveReverse = function(string) {
return string.split('').reverse().join('');
}
In fact, all the answers posted so far are a variation of this pattern. However, there are some problems with this solution. For example:
naiveReverse('foo 𝌆 bar');
// → 'rab �� oof'
// Where did the `𝌆` symbol go? Whoops!
If you’re wondering why this happens, read up on JavaScript’s internal character encoding. (TL;DR: 𝌆 is an astral symbol, and JavaScript exposes it as two separate code units.)
But there’s more:
// To see which symbols are being used here, check:
// http://mothereff.in/js-escapes#1ma%C3%B1ana%20man%CC%83ana
naiveReverse('mañana mañana');
// → 'anãnam anañam'
// Wait, so now the tilde is applied to the `a` instead of the `n`? WAT.
A good string to test string reverse implementations is the following:
'foo 𝌆 bar mañana mañana'
Why? Because it contains an astral symbol (𝌆) (which are represented by surrogate pairs in JavaScript) and a combining mark (the ñ in the last mañana actually consists of two symbols: U+006E LATIN SMALL LETTER N and U+0303 COMBINING TILDE).
The order in which surrogate pairs appear cannot be reversed, else the astral symbol won’t show up anymore in the ‘reversed’ string. That’s why you saw those �� marks in the output for the previous example.
Combining marks always get applied to the previous symbol, so you have to treat both the main symbol (U+006E LATIN SMALL LETTER N) as the combining mark (U+0303 COMBINING TILDE) as a whole. Reversing their order will cause the combining mark to be paired with another symbol in the string. That’s why the example output had ã instead of ñ.
Hopefully, this explains why all the answers posted so far are wrong.
To answer your initial question — how to [properly] reverse a string in JavaScript —, I’ve written a small JavaScript library that is capable of Unicode-aware string reversal. It doesn’t have any of the issues I just mentioned. The library is called Esrever; its code is on GitHub, and it works in pretty much any JavaScript environment. It comes with a shell utility/binary, so you can easily reverse strings from your terminal if you want.
var input = 'foo 𝌆 bar mañana mañana';
esrever.reverse(input);
// → 'anañam anañam rab 𝌆 oof'
See #am not i am's answer for why it doesn't work. However, if you want to know how to accomplish this, convert it to an array first:
"string".split('').reverse().join(''); // "gnirts"
Related
First off I been searching the web for this solution.
How to:
<''.split('');
> ['','','']
Simply express of what I'll like to do. But also with other Unicode characters like poo.
As explained in JavaScript has a Unicode problem, in ES6 you can do this quite easily by using the new ... spread operator. This causes the string iterator (another new ES6 feature) to be used internally, and because that iterator is designed to deal with code points rather than UCS-2/UTF-16 code units, it works the way you want:
console.log([...'💩💩']);
// → ['💩', '💩']
Try it out here: https://babeljs.io/repl/#?experimental=true&evaluate=true&loose=false&spec=false&code=console.log%28%0A%20%20%5B%2e%2e%2e%27%F0%9F%92%A9%F0%9F%92%A9%27%5D%0A%29%3B
A more generic solution:
function splitStringByCodePoint(string) {
return [...string];
}
console.log(splitStringByCodePoint('💩💩'));
// → ['💩', '💩']
for ... of could loop through string contains unicode characters,
let string = "😀😃😄😁😆😅🤣😂🙂🙃😉😊😇"
for(var c of string)
console.log(c);
The above solutions work well for simple emojis, but not for the one from an extended set and the ones that use Surrogate Pairs
For example:
splitStringByCodePoint("❤️")
// Returns: [ "❤", "️" ]
To handle these cases properly you'll need a purpose-built library, like for example:
https://github.com/dotcypress/runes
https://github.com/essdot/spliddit
First off I been searching the web for this solution.
How to:
<''.split('');
> ['','','']
Simply express of what I'll like to do. But also with other Unicode characters like poo.
As explained in JavaScript has a Unicode problem, in ES6 you can do this quite easily by using the new ... spread operator. This causes the string iterator (another new ES6 feature) to be used internally, and because that iterator is designed to deal with code points rather than UCS-2/UTF-16 code units, it works the way you want:
console.log([...'💩💩']);
// → ['💩', '💩']
Try it out here: https://babeljs.io/repl/#?experimental=true&evaluate=true&loose=false&spec=false&code=console.log%28%0A%20%20%5B%2e%2e%2e%27%F0%9F%92%A9%F0%9F%92%A9%27%5D%0A%29%3B
A more generic solution:
function splitStringByCodePoint(string) {
return [...string];
}
console.log(splitStringByCodePoint('💩💩'));
// → ['💩', '💩']
for ... of could loop through string contains unicode characters,
let string = "😀😃😄😁😆😅🤣😂🙂🙃😉😊😇"
for(var c of string)
console.log(c);
The above solutions work well for simple emojis, but not for the one from an extended set and the ones that use Surrogate Pairs
For example:
splitStringByCodePoint("❤️")
// Returns: [ "❤", "️" ]
To handle these cases properly you'll need a purpose-built library, like for example:
https://github.com/dotcypress/runes
https://github.com/essdot/spliddit
I've recently discussed with a colleague how the separator of String.split is treated internally by JavaScript.
Is the separator always converted into a regular expression? E.g. will calling String.split(",", myvar) convert the "," into a regualar expression matching that string?
Well the answer for your question: "Is the separator always converted into a regular expression?" is:
It depends solely on the implementation. For example if you look at WebKit implementation http://svn.webkit.org/repository/webkit/trunk/Source/JavaScriptCore/runtime/StringPrototype.cpp (find stringProtoFuncSplit) then you see it is not always converted to RegEx. However, this does not imply anything, it is just a matter of implementation
Here's the official writeup over at ecma, but the relevant part is around this section:
8.If separator is a RegExp object (its [[Class]] is "RegExp"), let R = separator; otherwise let R = ToString(separator).
That being said it is the ecma spec, and as Anthony Grist mentioned in the comments, browsers can implement as they want, for instance V8 implements ecma262.
Edit: expanded thought on browser/js engines implementation, it appears the majority implement versions of ecma, as seen on this wiki
Yes, the javascript function split allow you to use regex:
EX:
var str = "I am confused";
str.split(/\s/g)
Str then contains ["I","am","confused"]
separator specifies the character(s) to use for separating the string. The separator is treated as a string or a regular expression. If separator is omitted, the array returned contains one element consisting of the entire string. If separator is an empty string, str is converted to an array of characters.
Please see the below link to know more about this, hope it will help you:
String.prototype.split()
If I give Closure Compiler something like this:
window.array = '0123456789'.split('');
It "compiles" it to this:
window.array="0,1,2,3,4,5,6,7,8,9".split(",");
Now as you can tell, that's bigger. Is there any reason why Closure Compiler is doing this?
I think this is what's going on, but I am by no means certain...
The code that causes the insertion of commas is tryMinimizeStringArrayLiteral in PeepholeSubstituteAlternateSyntax.java.
That method contains a list of characters that are likely to have a low Huffman encoding, and are therefore preferable to split on than other characters. You can see the result of this if you try something like this:
"a b c d e f g".split(" "); //Uncompiled, split on spaces
"a,b,c,d,e,f,g".split(","); //Compiled, split on commas (same size)
The compiler will replace the character you try to split on with one it thinks is favourable. It does so by iterating over the characters of the string and finding the most favourable splitting character that does not occur within the string:
// These delimiters are chars that appears a lot in the program therefore
// probably have a small Huffman encoding.
NEXT_DELIMITER: for (char delimiter : new char[]{',', ' ', ';', '{', '}'}) {
for (String cur : strings) {
if (cur.indexOf(delimiter) != -1) {
continue NEXT_DELIMITER;
}
}
String template = Joiner.on(delimiter).join(strings);
//...
}
In the above snippet you can see the array of characters the compiler claims to be optimal to split on. The comma is first (which is why in my space example above, the spaces have been replaced by commas).
I believe the insertion of commas in the case where the string to split on is the empty string may simply be an oversight. There does not appear to be any special treatment of this case, so it's treated like any other split call and each character is joined with the first appropriate character from the array shown in the above snippet.
Another example of how the compiler deals with the split method:
"a,;b;c;d;e;f;g".split(";"); //Uncompiled, split on semi-colons
"a, b c d e f g".split(" "); //Compiled, split on spaces
This time, since the original string already contains a comma (and we don't want to split on the comma character), the comma can't be chosen from the array of low-Huffman-encoded characters, so the next best choice is selected (the space).
Update
Following some further research into this, it is definitely not a bug. This behaviour is actually by design, and in my opinion it's a very clever little optimisation, when you bear in mind that the Closure compiler tends to favour the speed of the compiled code over size.
Above I mentioned Huffman encoding a couple of times. The Huffman coding algorithm, explained very simply, assigns a weight to each character appearing the the text to be encoded. The weight is based on the frequency with which each character appears. These frequencies are used to build a binary tree, with the most common character at the root. That means the most common characters are quicker to decode, since they are closer to the root of the tree.
And since the Huffman algorithm is a large part of the DEFLATE algorithm used by gzip. So if your web server is configured to use gzip, your users will be benefiting from this clever optimisation.
This issue was fixed on Apr 20, 2012 see revision:
https://code.google.com/p/closure-compiler/source/detail?r=1267364f742588a835d78808d0eef8c9f8ba8161
Ironically, split in the compiled code has nothing to do with split in the source. Consider:
Source : a = ["0","1","2","3","4","5"]
Compiled: a="0,1,2,3,4,5".split(",")
Here, split is just a way to represent long arrays (long enough for sum of all quotes + commas to be longer than split(","") ). So, what's going on in your example? First, the compiler sees a string function applied to a constant and evaluates it right away:
'0123456789'.split('') => ["0","1","2","3","4","5","6","7","8","9"]
At some later point, when generating output, the compiler considers this array to be "long" and writes it in the above "split" form:
["0","1","2","3","4","5","6","7","8","9"] => "0,1,2,3,4,5,6,7,8,9".split(",")
Note that all information about split('') in the source is already lost at this point.
If the source string were shorter, it would be generated in the array array form, without extra splitting:
Source : a = '0123'.split('')
Compiled: a=["0","1","2","3"]
Readability aside, are there any discernable differences (performance perhaps) between using
str.indexOf("src")
and
str.match(/src/)
I personally prefer match (and regexp) but colleagues seem to go the other way. We were wondering if it mattered ...?
EDIT:
I should have said at the outset that this is for functions that will be doing partial plain-string matching (to pick up identifiers in class attributes for JQuery) rather than full regexp searches with wildcards etc.
class='redBorder DisablesGuiClass-2345-2d73-83hf-8293'
So it's the difference between:
string.indexOf('DisablesGuiClass-');
and
string.match(/DisablesGuiClass-/)
RegExp is indeed slower than indexOf (you can see it here), though normally this shouldn't be an issue. With RegExp, you also have to make sure the string is properly escaped, which is an extra thing to think about.
Both of those issues aside, if two tools do exactly what you need them to, why not choose the simpler one?
Your comparison may not be entirely fair. indexOf is used with plain strings and is therefore very fast; match takes a regular expression - of course it may be slower in comparison, but if you want to do a regex match, you won't get far with indexOf. On the other hand, regular expression engines can be optimized, and have been improving in performance in the last years.
In your case, where you're looking for a verbatim string, indexOf should be sufficient. There is still one application for regexes, though: If you need to match entire words and want to avoid matching substrings, then regular expressions give you "word boundary anchors". For example:
indexOf('bar')
will find bar three times in bar, fubar, barmy, whereas
match(/\bbar\b/)
will only match bar when it is not part of a longer word.
As you can see in the comments, some comparisons have been done that show that a regex may be faster than indexOf - if it's performance-critical, you may need to profile your code.
Here all possible ways (relatively) to search for string
// 1. includes (introduced in ES6)
var string = "string to search for substring",
substring = "sea";
string.includes(substring);
// 2. string.indexOf
var string = "string to search for substring",
substring = "sea";
string.indexOf(substring) !== -1;
// 3. RegExp: test
var string = "string to search for substring",
expr = /sea/; // no quotes here
expr.test(string);
// 4. string.match
var string = "string to search for substring",
expr = "/sea/";
string.match(expr);
//5. string.search
var string = "string to search for substring",
expr = "/sea/";
string.search(expr);
Here a src: https://koukia.ca/top-6-ways-to-search-for-a-string-in-javascript-and-performance-benchmarks-ce3e9b81ad31
Benchmarks seem to be twisted specially for es6 includes , read the comments.
In resume:
if you don't need the matches.
=> Either you need regex and so use test. Otherwise es6 includes or indexOf. Still test vs indexOf are close.
And for includes vs indexOf:
They seem to be the same : https://jsperf.com/array-indexof-vs-includes/4 (if it was different it would be wierd, they mostly perform the same except for the differences that they expose check this)
And for my own benchmark test. here it is http://jsben.ch/fFnA0
You can test it (it's browser dependent) [test multiple time]
here how it performed (multiple run indexOf and includes one beat the other, and they are close). So they are the same. [here using the same test platform as the article above].
And here for the a long text version (8 times longer)
http://jsben.ch/wSBA2
Tested both chrome and firefox, same thing.
Notice jsben.ch doesn't handle memory overflow (or there limits correctly. It doesn't show any message) so result can get wrong if you add more then 8 text duplication (8 work well). But the conclusion is for very big text all three perform the same way. Otherwise for short indexOf and includes are the same and test a little bit slower. or Can be the same as it seemed in chrome (firefox 60 it is slower).
Notice with jsben.ch: don't freak out if you get inconsistant result. Try different time and see if it's consistent or not. Change browser, sometimes they just run totally wrong. Bug or bad handling of memory. Or something.
ex:
Here too my benchmark on jsperf (better details, and handle graphs for multiple browsers)
(top is chrome)
normal text
https://jsperf.com/indexof-vs-includes-vs-test-2019
resume: includes and indexOf have same perofrmance. test slower.
(seem all three perform the same in chrom)
Long text (12 time longer then normal)
https://jsperf.com/indexof-vs-includes-vs-test-2019-long-text-str/
resume: All the three perform the same. (chrome and firefox)
very short string
https://jsperf.com/indexof-vs-includes-vs-test-2019-too-short-string/
resume: includes and indexOf perform the same and test slower.
Note: about the benchmark above. For the very short string version (jsperf) had an big error for chrome. Seeing by my eyes. around 60 sample was run for both indexOf and includes same way (repeated a lot of time). And test a little bit less and so slower.
don't be fooled with the wrong graph. It's clear wrong. Same test work ok for firefox, surely it's a bug.
Here the illustration: (the first image was the test on firefox)
waaaa. Suddenly indexOf became superman. But as i said i did the test, and looked at the number of samples it was around 60. Both indexOf and includes and they performed the same. A bug on jspref. Except for this one (maybe because of a memory restriction related problem) all the rest was consistent, it give more details. And you see how many simple happen in real time.
Final resume
indexOf vs includes => Same performance
test => can be slower for short strings or text. And the same for long texts. And it make sense for the overhead that the regex engine add. In chrome it seemed it doesn't matter at all.
If you're trying to search for substring occurrences case-insensitively then match seems to be faster than a combination of indexOf and toLowerCase()
Check here - http://jsperf.com/regexp-vs-indexof/152
You ask whether str.indexOf('target') or str.match(/target/) should be preferred. As other posters have suggested, the use cases and return types of these methods are different. The first asks "where in str can I first find 'target'?" The second asks "does str match the regex and, if so, what are all of the matches for any associated capture groups?"
The issue is that neither one technically is designed to ask the simpler question "does the string contain the substring?" There is something that is explicitly designed to do so:
var doesStringContainTarget = /target/.test(str);
There are several advantages to using regex.test(string):
It returns a boolean, which is what you care about
It is more performant than str.match(/target/) (and rivals str.indexOf('target'))
If for some reason, str is undefined or null, you'll get false (the desired result) instead of throwing a TypeError
Using indexOf should, in theory, be faster than a regex when you're just searching for some plain text, but you should do some comparative benchmarks yourself if you're concerned about performance.
If you prefer match and it's fast enough for your needs then go for it.
For what it's worth, I agree with your colleagues on this: I'd use indexOf when searching for a plain string, and use match etc only when I need the extra functionality provided by regular expressions.
Performance wise indexOf will at the very least be slightly faster than match. It all comes down to the specific implementation. When deciding which to use ask yourself the following question:
Will an integer index suffice or do I
need the functionality of a RegExp
match result?
The return values are different
Aside from the performance implications, which are addressed by other answers, it is important to note that the return values for each method are different; so the methods cannot merely be substituted without also changing your logic.
Return value of .indexOf: integer
The index within the calling String object of the first occurrence of the specified value, starting the search at fromIndex.Returns -1 if the value is not found.
Return value of .match: array
An Array containing the entire match result and any parentheses-captured matched results.Returns null if there were no matches.
Because .indexOf returns 0 if the calling string begins with the specified value, a simple truthy test will fail.
For example:
Given this class…
class='DisablesGuiClass-2345-2d73-83hf-8293 redBorder'
…the return values for each would differ:
// returns `0`, evaluates to `false`
if (string.indexOf('DisablesGuiClass-')) {
… // this block is skipped.
}
vs.
// returns `["DisablesGuiClass-"]`, evaluates to `true`
if (string.match(/DisablesGuiClass-/)) {
… // this block is run.
}
The correct way to run a truthy test with the return from .indexOf is to test against -1:
if (string.indexOf('DisablesGuiClass-') !== -1) {
// ^returns `0` ^evaluates to `true`
… // this block is run.
}
remember Internet Explorer 8 doesnt understand indexOf.
But if nobody of your users uses ie8 (google analytics would tell you) than omit this answer.
possible solution to fix ie8:
How to fix Array indexOf() in JavaScript for Internet Explorer browsers