Special unicode string splitting in Javascript [duplicate]

First off, I've been searching the web for a solution to this.
How to:
<''.split('');
> ['','','']
That is simply what I'd like to do, but it should also work with other Unicode characters such as the pile of poo emoji.

As explained in JavaScript has a Unicode problem, in ES6 you can do this quite easily by using the new ... spread operator. This causes the string iterator (another new ES6 feature) to be used internally, and because that iterator is designed to deal with code points rather than UCS-2/UTF-16 code units, it works the way you want:
console.log([...'💩💩']);
// → ['💩', '💩']
Try it out here: https://babeljs.io/repl/#?experimental=true&evaluate=true&loose=false&spec=false&code=console.log%28%0A%20%20%5B%2e%2e%2e%27%F0%9F%92%A9%F0%9F%92%A9%27%5D%0A%29%3B
A more generic solution:
function splitStringByCodePoint(string) {
  return [...string];
}
console.log(splitStringByCodePoint('💩💩'));
// → ['💩', '💩']
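To make the difference between code units and code points concrete, here is a small sketch comparing split('') with the string iterator. The variable names are only illustrative, and Array.from is used here as an equivalent of the spread operator since it also goes through the string iterator.

```javascript
// '💩' is U+1F4A9, which UTF-16 stores as two code units (a surrogate pair).
const text = '💩💩';

// split('') cuts between code units, so each symbol is torn in half.
const codeUnits = text.split('');

// Array.from uses the string iterator, which walks code points instead.
const codePoints = Array.from(text);

console.log(text.length);        // 4 (code units)
console.log(codeUnits.length);   // 4 lone surrogates
console.log(codePoints.length);  // 2 whole symbols
console.log(text.codePointAt(0).toString(16)); // '1f4a9'
```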

A for...of loop also iterates over a string by code point, so it handles strings that contain such Unicode characters:
let string = "😀😃😄😁😆😅🤣😂🙂🙃😉😊😇"
for (var c of string) {
  console.log(c);
}

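The above solutions work well for emoji that are a single code point, including those stored as surrogate pairs, but not for emoji that are built from several code points, such as a base symbol followed by a variation selector or a sequence joined with zero-width joiners.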
For example:
splitStringByCodePoint("❤️")
// Returns: [ "❤", "️" ]
To handle these cases properly you need to split by grapheme cluster rather than by code point, for example with a purpose-built library such as:
https://github.com/dotcypress/runes
https://github.com/essdot/spliddit
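As an alternative to those libraries, and assuming your runtime ships Intl.Segmenter (recent browsers and Node.js releases do), something along these lines should split by grapheme cluster. The function name splitByGrapheme is just made up for this sketch.

```javascript
// Hypothetical helper: split a string into user-perceived characters.
// Assumes Intl.Segmenter is available in the runtime.
function splitByGrapheme(string) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  // Each item yielded by segment() has a .segment property holding the text.
  return Array.from(segmenter.segment(string), (part) => part.segment);
}

// U+2764 HEAVY BLACK HEART followed by U+FE0F VARIATION SELECTOR-16.
const heart = '\u2764\uFE0F';

console.log([...heart].length);             // 2 (code points)
console.log(splitByGrapheme(heart).length); // 1 (grapheme cluster)
```

The locale argument is left as undefined so the default locale is used; grapheme boundaries rarely depend on it.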

Related

How make a "SELECT LIKE" in arrays javascript [duplicate]

Possible Duplicate:
Emulating SQL LIKE in JavaScript
Is there an operator in JavaScript which is similar to the like operator in SQL? Explanations and examples are appreciated.

You can use regular expressions in Javascript to do pattern matching of strings.
For example:
var s = "hello world!";
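if (s.match(/^hello.*/)) {
  // do something
}
The match() test is much like WHERE s LIKE 'hello%' in SQL. The ^ anchor matters: without it the pattern matches anywhere in the string, which is closer to LIKE '%hello%'.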

No.
You want to use: .indexOf("foo") and then check the index. If it's >= 0, it contains that string.

Use the String object's match method:
// Match a string that ends with abc, similar to LIKE '%abc'
if (theString.match(/^.*abc$/))
{
/*Match found */
}
// Match a string that starts with abc, similar to LIKE 'abc%'
if (theString.match(/^abc.*$/))
{
/*Match found */
}

You can check the String.match() or the String.indexOf() methods.

No there isn't, but you can check out indexOf as a starting point to developing your own, and/or look into regular expressions. It would be a good idea to familiarise yourself with the JavaScript string functions.
EDIT: This has been answered before:
Emulating SQL LIKE in JavaScript

No, there isn't any.
The available comparison operators are listed here:
Comparison Operators
For your requirement the best option would be regular expressions.
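If you want something closer to LIKE itself, a rough sketch is to translate the pattern into an anchored regular expression. likeToRegExp and like are names I made up for this, and the sketch assumes case-sensitive matching and no ESCAPE clause, which may differ from what your database does.

```javascript
// Hypothetical helper: turn a SQL LIKE pattern into an anchored RegExp.
// Assumption: % means "any run of characters" and _ means "exactly one".
function likeToRegExp(pattern) {
  const source = pattern
    // Escape everything that is special in a regular expression.
    .replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    // Then translate the two LIKE wildcards.
    .replace(/%/g, '[\\s\\S]*')
    .replace(/_/g, '[\\s\\S]');
  return new RegExp('^' + source + '$');
}

// Hypothetical convenience wrapper around likeToRegExp.
function like(value, pattern) {
  return likeToRegExp(pattern).test(value);
}

console.log(like('hello world!', 'hello%')); // true
console.log(like('say hello', 'hello%'));    // false
console.log(like('xyzabc', '%abc'));         // true
```

[\s\S] is used instead of . so that line breaks inside the value are matched too.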

Javascripts String.split - how does it work internally?

I've recently discussed with a colleague how the separator of String.split is treated internally by JavaScript.
Is the separator always converted into a regular expression? E.g. will calling String.split(",", myvar) convert the "," into a regular expression matching that string?

The answer to your question, "Is the separator always converted into a regular expression?", is:
It depends solely on the implementation. For example, if you look at the WebKit implementation at http://svn.webkit.org/repository/webkit/trunk/Source/JavaScriptCore/runtime/StringPrototype.cpp (find stringProtoFuncSplit), you will see that it is not always converted to a RegExp. However, this does not imply anything about other engines; it is just a matter of implementation.

Here's the official writeup over at ECMA; the relevant part is this step:
8. If separator is a RegExp object (its [[Class]] is "RegExp"), let R = separator; otherwise let R = ToString(separator).
That being said, this is the ECMAScript spec, and as Anthony Grist mentioned in the comments, browsers can implement it as they want; V8, for instance, implements ECMA-262.
Edit: expanded thought on browser/JS engine implementations. It appears the majority implement versions of ECMAScript, as seen on this wiki.

Yes, the JavaScript split function allows you to use a regex.
Example:
var str = "I am confused";
str.split(/\s/g)
The call returns ["I","am","confused"]; str itself is left unchanged.

separator specifies the character(s) to use for separating the string. The separator is treated as a string or a regular expression. If separator is omitted, the array returned contains one element consisting of the entire string. If separator is an empty string, str is converted to an array of characters.
Please see the below link to know more about this, hope it will help you:
String.prototype.split()
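To illustrate the cases described above, here is a quick sketch of how the different kinds of separator behave from the caller's point of view. The variable names are only for illustration.

```javascript
// A string separator is matched literally, character for character.
const literalDot = 'a.b.c'.split('.');

// A RegExp separator is matched as a pattern; here every character is a separator.
const regexDot = 'a.b'.split(/./);

// With no separator, the whole string comes back as a single element.
const omitted = 'a,b'.split();

// With an empty string, the result has one element per UTF-16 code unit.
const emptySeparator = 'abc'.split('');

// The optional second argument limits the number of elements returned.
const limited = 'a,b,c'.split(',', 2);

console.log(literalDot);     // ['a', 'b', 'c']
console.log(regexDot);       // ['', '', '', '']
console.log(omitted);        // ['a,b']
console.log(emptySeparator); // ['a', 'b', 'c']
console.log(limited);        // ['a', 'b']
```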

Call [].reverse on a string

Why does
[].reverse.call("string");
fail (it throws an error in both Firefox and IE, and returns the original string in Chrome) while calling other array methods on a string works?
>>> [].splice.call("string",3)
["i", "n", "g"]
>>> [].map.call("string",function (a) {return a +a;} )
["ss", "tt", "rr", "ii", "nn", "gg"]

Because .reverse() modifies an Array, and strings are immutable.
You could borrow Array.prototype.slice to convert to an Array, then reverse and join it.
var s = "string";
var s2 = [].slice.call(s).reverse().join('');
Just be aware that in older versions of IE, you can't manipulate a string like an Array.

The following technique (or similar) is commonly used to reverse a string in JavaScript:
// Don’t use this!
var naiveReverse = function(string) {
  return string.split('').reverse().join('');
};
In fact, all the answers posted so far are a variation of this pattern. However, there are some problems with this solution. For example:
naiveReverse('foo 𝌆 bar');
// → 'rab �� oof'
// Where did the `𝌆` symbol go? Whoops!
If you’re wondering why this happens, read up on JavaScript’s internal character encoding. (TL;DR: 𝌆 is an astral symbol, and JavaScript exposes it as two separate code units.)
But there’s more:
// To see which symbols are being used here, check:
// http://mothereff.in/js-escapes#1ma%C3%B1ana%20man%CC%83ana
naiveReverse('mañana mañana');
// → 'anãnam anañam'
// Wait, so now the tilde is applied to the `a` instead of the `n`? WAT.
A good string to test string reverse implementations is the following:
'foo 𝌆 bar mañana mañana'
Why? Because it contains an astral symbol (𝌆) (which are represented by surrogate pairs in JavaScript) and a combining mark (the ñ in the last mañana actually consists of two symbols: U+006E LATIN SMALL LETTER N and U+0303 COMBINING TILDE).
The order in which surrogate pairs appear cannot be reversed, else the astral symbol won’t show up anymore in the ‘reversed’ string. That’s why you saw those �� marks in the output for the previous example.
Combining marks always get applied to the previous symbol, so you have to treat the main symbol (U+006E LATIN SMALL LETTER N) and the combining mark (U+0303 COMBINING TILDE) as a whole. Reversing their order will cause the combining mark to be paired with another symbol in the string. That’s why the example output had ã instead of ñ.
Hopefully, this explains why all the answers posted so far are wrong.
To answer your initial question — how to [properly] reverse a string in JavaScript —, I’ve written a small JavaScript library that is capable of Unicode-aware string reversal. It doesn’t have any of the issues I just mentioned. The library is called Esrever; its code is on GitHub, and it works in pretty much any JavaScript environment. It comes with a shell utility/binary, so you can easily reverse strings from your terminal if you want.
var input = 'foo 𝌆 bar mañana mañana';
esrever.reverse(input);
// → 'anañam anañam rab 𝌆 oof'
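If you would rather not pull in a library, a different approach (this is not how Esrever works, just a sketch) is to reverse by grapheme cluster using Intl.Segmenter, assuming your runtime provides it. The name reverseGraphemes is made up for this example.

```javascript
// Hypothetical helper: reverse a string one grapheme cluster at a time,
// so surrogate pairs and base-plus-combining-mark sequences stay intact.
// Assumes Intl.Segmenter is available in the runtime.
function reverseGraphemes(string) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  const clusters = Array.from(segmenter.segment(string), (part) => part.segment);
  return clusters.reverse().join('');
}

// 'man\u0303ana' spells mañana with n followed by U+0303 COMBINING TILDE.
const decomposed = 'man\u0303ana';
console.log(reverseGraphemes(decomposed) === 'anan\u0303am'); // true

// U+1D306 is the astral symbol used in the test string above.
console.log(reverseGraphemes('\u{1D306}ab') === 'ba\u{1D306}'); // true
```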

See @am not i am's answer for why it doesn't work. However, if you want to know how to accomplish this, convert it to an array first:
"string".split('').reverse().join(''); // "gnirts"

Javascript Regex: extracting variables from paths

I'm trying to extract variable names from paths. A variable is preceded by : and optionally enclosed in (), and the number of variables may vary:
"foo/bar/:firstVar/:(secondVar)foo2/:thirdVar"
Expected output should be:
['firstVar', 'secondVar', 'thirdVar']
I tried something like
"foo/bar/:firstVar/:(secondVar)foo2/:thirdVar".match(/\:([^/:]\w+)/g)
but it doesn't work (it somehow captures the colons and doesn't handle the optional parentheses). If there is a regex mage around, please help. Thanks a lot in advance!

var path = "foo/bar/:firstVar/:(secondVar)foo2/:thirdVar";
var matches = [];
path.replace(/:\(?(\w+)\)?/g, function(a, b){
  matches.push(b);
});
matches; // ["firstVar", "secondVar", "thirdVar"]
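In newer environments the same regex can be used with String.prototype.matchAll (ES2020), which avoids calling replace only for its side effect. A minimal sketch, with extractPathVariables being a name I picked for illustration:

```javascript
// Hypothetical helper: collect the names captured by the first group.
// Assumes String.prototype.matchAll is available (ES2020).
function extractPathVariables(path) {
  // matchAll requires the g flag and yields one match array per occurrence.
  return Array.from(path.matchAll(/:\(?(\w+)\)?/g), (match) => match[1]);
}

const path = 'foo/bar/:firstVar/:(secondVar)foo2/:thirdVar';
console.log(extractPathVariables(path));
// → ['firstVar', 'secondVar', 'thirdVar']
```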

What about this:
/\:\(?([A-Za-z0-9_\-]+)\)?/
matches:
:firstVar
:(secondVar)
:thirdVar
$1 contains:
firstVar
secondVar
thirdVar

May I recommend that you look into the URI template specification? It does exactly what you're trying to do, but more elegantly. I don't know of any current URI template parsers for JavaScript, since it's usually a server-side operation, but a minimal implementation would be trivial to write.
Essentially, instead of:
foo/bar/:firstVar/:(secondVar)foo2/:thirdVar
You use:
foo/bar/{firstVar}/{secondVar}foo2/{thirdVar}
Hopefully, it's pretty obvious why this format works better in the case of secondVar. Plus it has the added advantage of being a specification, albeit currently still a draft.
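As a rough idea of what such a minimal implementation might look like, here is a sketch that only pulls the names out of simple {name} expressions. It does not attempt the operators or modifiers the URI template specification defines, and extractTemplateVariables is a made-up name.

```javascript
// Hypothetical helper: list variable names in simple {name} expressions.
// Assumption: names consist of word characters only; anything else is ignored.
function extractTemplateVariables(template) {
  return Array.from(template.matchAll(/\{(\w+)\}/g), (match) => match[1]);
}

const template = 'foo/bar/{firstVar}/{secondVar}foo2/{thirdVar}';
console.log(extractTemplateVariables(template));
// → ['firstVar', 'secondVar', 'thirdVar']
```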
