Split JavaScript string into array of codepoints? (taking into account "surrogate pairs" but not "grapheme clusters")

Splitting a JavaScript string into "characters" can be done trivially but there are problems if you care about Unicode (and you should care about Unicode).
JavaScript natively treats characters as 16-bit entities (UCS-2 or UTF-16) but this does not allow for Unicode characters outside the BMP (Basic Multilingual Plane).
To deal with Unicode characters beyond the BMP, JavaScript must take into account "surrogate pairs", which it does not do natively.
I'm looking for how to split a js string by codepoint, whether the codepoints require one or two JavaScript "characters" (code units).
Depending on your needs, splitting by codepoint might not be enough, and you might want to split by "grapheme cluster", where a cluster is a base codepoint followed by all its non-spacing modifier codepoints, such as combining accents and diacritics.
For the purposes of this question I do not require splitting by grapheme cluster.

@bobince's answer has (luckily) become a bit dated; you can now simply use
var chars = Array.from( text )
to obtain a list of single-codepoint strings which does respect astral / 32bit / surrogate Unicode characters.

Along the lines of @John Frazer's answer, one can use this even more succinct form of string iteration:
const chars = [...text]
e.g., with:
const text = 'A\uD835\uDC68B\uD835\uDC69C\uD835\uDC6A'
const chars = [...text] // ["A", "𝑨", "B", "𝑩", "C", "𝑪"]

In ECMAScript 6 you'll be able to use a string as an iterator to get code points, or you could search a string for /./ug, or you could call codePointAt(i) repeatedly.
Unfortunately for..of syntax and regexp flags can't be polyfilled and calling a polyfilled codePointAt() would be super slow (O(n²)), so we can't realistically use this approach for a while yet.
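For engines that do support ES6, the regex variant mentioned above might look like the following. This is only a sketch: the function name splitByRegex is made up for illustration, and [\s\S] is used instead of . because . does not match line breaks unless the s flag is also set.

```javascript
// Hypothetical helper: split a string into code points with a "u" regex.
function splitByRegex(text) {
  // With the u flag, [\s\S] consumes a whole surrogate pair as one match.
  // match() returns null when nothing matches, so fall back to [].
  return text.match(/[\s\S]/gu) || [];
}

console.log(splitByRegex('A\uD835\uDC68B').length); // 3
console.log(splitByRegex('a\nb').length);           // 3
```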
So doing it the manual way:
String.prototype.toCodePoints= function() {
var chars = [];
for (var i= 0; i<this.length; i++) {
var c1= this.charCodeAt(i);
if (c1>=0xD800 && c1<0xDC00 && i+1<this.length) {
var c2= this.charCodeAt(i+1);
if (c2>=0xDC00 && c2<0xE000) {
chars.push(0x10000 + ((c1-0xD800)<<10) + (c2-0xDC00));
i++;
continue;
}
}
chars.push(c1);
}
return chars;
}
For the inverse to this see https://stackoverflow.com/a/3759300/18936
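The linked answer is not reproduced here. As a minimal ES5-style sketch of the inverse direction, assuming the input is an array of valid code point numbers, the surrogate arithmetic above can be run backwards. The name codePointsToString is hypothetical.

```javascript
// Hypothetical inverse of toCodePoints: code point numbers -> string.
function codePointsToString(codePoints) {
  var units = [];
  for (var i = 0; i < codePoints.length; i++) {
    var cp = codePoints[i];
    if (cp > 0xFFFF) {
      // Supplementary code point: emit a high and a low surrogate.
      var offset = cp - 0x10000;
      units.push(0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF));
    } else {
      units.push(cp);
    }
  }
  // Note: apply() with a very large array can exceed engine argument limits.
  return String.fromCharCode.apply(null, units);
}

console.log(codePointsToString([0x41, 0x1D468, 0x42]) === 'A\uD835\uDC68B'); // true
```

For long inputs it may be safer to build the result in chunks rather than in a single apply() call.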

Another method using codePointAt:
String.prototype.toCodePoints = function () {
var arCP = [];
for (var i = 0; i < this.length; i += 1) {
var cP = this.codePointAt(i);
arCP.push(cP);
if (cP >= 0x10000) {
i += 1;
}
}
return arCP;
}

Related

Trying to design a WORD SEARCH puzzle with Unicode Letters (TAMIL) Using HTML and JAVASCRIPT [duplicate]

How do I determine the width of the result of codePointAt?

I'm trying to loop over the Unicode characters in a Javascript string, that I assume is encoded with UTF-16.
It is my understanding that UTF-16 is variable width. That is, a single Unicode character may be split across multiple 16-bit code units. I can use s.codePointAt(i) to get the Unicode character beginning at a given code unit index. But once I have it, how do I know how far to advance i?
Roughly, what is getWidth here? Is it simply c > Math.pow(2, 16)?
for (var i = 0; i < s.length;) {
var c = s.codePointAt(i);
// do some operation with c
i = i + getWidth(c)
}
Is there a standard library function I can use to determine how far to advance? Or a way to iterate over the Unicode code points in a string?

Is there a standard […] way to iterate over the Unicode code points in a string?
Yes, since ES6 you can simply iterate any string to get its code points:
for (const character of string) {
const codepoint = character.codePointAt(0);
// do some operation with codepoint
}

A simple approach:
for (var i = 0; i < s.length; ++i) {
var c = s.codePointAt(i);
// do some operation with c
if( s.charCodeAt(i) != c) {
++i; // step past the next sixteen bits of the surrogate pair
}
}
(where the value of c is the Unicode codepoint, not the character).
If you want to split the string into an array of Unicode characters you can make use of the string iterator invoked by the spread operator introduced in ES6:
var array = [...s];
In pre-ES6 browsers the start of a surrogate pair can be identified in order to skip the second part:
for (var i = 0; i < s.length; ++i) {
var k = s.charCodeAt(i);
if( k < 0xD800 || k > 0xDBFF) {
var c = s[i]; // character in BMP
}
else {
c = s.substring( i,i+2); // use surrogate pair
++i;
}
// do something with c
console.log(c)
}

See: http://www.unicode.org/glossary/#supplementary_code_point
Basically, if your code point is 0x010000 or above, you are dealing with a supplementary character that takes two UTF-16 code units.
const MIN_SUPPLEMENTARY_CODE_POINT = 0x010000;
function charCount(codePoint) {
return codePoint >= MIN_SUPPLEMENTARY_CODE_POINT ? 2 : 1;
}
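To connect this back to the loop in the question, here is a sketch of how such a width function could drive the index. The names codeUnitWidth and forEachCodePoint are hypothetical and only used for illustration.

```javascript
// Width in UTF-16 code units of a code point (same idea as charCount above).
function codeUnitWidth(codePoint) {
  return codePoint >= 0x10000 ? 2 : 1;
}

// Calls callback(codePoint, index) once per code point in s.
function forEachCodePoint(s, callback) {
  for (let i = 0; i < s.length; ) {
    const c = s.codePointAt(i);
    callback(c, i);
    i += codeUnitWidth(c); // skip the low surrogate when there is one
  }
}

const seen = [];
forEachCodePoint('a\u{1F4A9}b', (c, i) => seen.push([c.toString(16), i]));
console.log(seen); // [ [ '61', 0 ], [ '1f4a9', 1 ], [ '62', 3 ] ]
```

A lone surrogate is reported by codePointAt as its own value, which is below 0x10000, so the loop still advances by one in that case.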

JavaScript predates UTF-16 and was designed around the older UCS-2 encoding, which is very similar but has no surrogate pairs and cannot represent any character that needs more than two bytes.
If you are stepping through a string looking at code points, you can look at the code point value itself: if the value is greater than 0xFFFF (that is, 2^16 or above), you have to advance 2 string characters; otherwise advance 1 string character.
You might try the newer ES6 syntax, which works really well at splitting strings into characters, even if those characters are outside the BMP.
// Character outside the BMP
const k = '💩';
// Takes four bytes, i.e. two 16-bit code units, so this logs 2
console.log(k.length);
const chars = [...k];
// But it's only one character, so this logs 1
console.log(chars.length);

Regex matching JS source that's not in a string or regex literal

Do there exist comprehensive regular expressions that, when applied to JavaScript source code, will match all valid string literals (such as "say \"Hello\"") and regex literals (such as /and\/or/)? The expressions would have to cover all edge cases, including line breaks and escape sequences.
Alternatively, does anyone know of regexes for matching patterns outside of string and regex literals?
My goal is to implement a simple JavaScript syntax extension that allows macros in delimiters (e.g. {{#foo.bar}} or ##foo.bar#) to be expanded by a preprocessor. However, I'd like the macros to be processed only outside of literals.
For now, I'm trying to accomplish this using just string replacement, without having to augment an existing JavaScript lexer/parser.
This JavaScript preprocessor will itself be implemented in JavaScript.

This is the regex that I've been using to match quoted strings. It is pretty good because it should work with almost all engines: it does not require backtracking, backreferences or any of that voodoo. It will match all text INSIDE literals.
"(\\.|[^"])*"
If the engine supports non-capturing groups, you can use
"(?:\\.|[^"])*"
instead, and it should be faster.

I think this is too much for regexes.
Consider var foo = "//" // /"(?:\\.|[^"])*"/. Where do the strings, comments and regex literals start and end? You would need to write a complete JavaScript parser to cover all edge cases. Of course, the parser will be using regexes...

I would probably go about doing something like the following. It will need to be improved for certain possible conditions, though.
var str = '"aaa \"sss \\t bbb" sss #3 ss# ((t sdsds)) ff ';
str += '/gg sdfd \/dsds/ {aaa bbb} {{ss}} {#sdsd#}';
var repeating = ['"','\\\'','/','\\~','\\#'];
// "example" 'example' /example/ ~example~ #example#
var enclosing = [];
enclosing.push(['\\{','\\}']);
enclosing.push(['\\{\\{','\\}\\}']);
enclosing.push(['\\[','\\]']);
enclosing.push(['\\(\\(','\\)\\)']);
// {example} {{example}} [example] ((example))
for (var forEnclosing='',i = 0 ; i < enclosing.length; i++) {
var e = enclosing[i];
var r = e[0]+'(\\\\['+e[0]+e[1]+']|[^'+e[0]+e[1]+'])*'+e[1];
forEnclosing += r + (i < enclosing.length-1 ? '|' : '');
}
for (var forRepeating='',i = 0; i < repeating.length; i++) {
var e = repeating[i];
var r = e+'(\\'+e+'|[^'+e+'])*'+e;
forRepeating += r + (i < repeating.length-1 ? '|' : '');
}
var rx = new RegExp('('+forEnclosing+'|'+forRepeating+')','g');
var m = str.match(rx);
try { for (var i = 0; i < m.length; i++) console.log(m[i]) }
catch(e) {}
Outputs:
"aaa "sss \t bbb"
#3 ss#
((t sdsds))
/gg sdfd /dsds/
{aaa bbb}
{{ss}}
{#sdsd#}

The closest you can get with a regex is to have one regex that matches EITHER a string literal (single- or double-quoted) OR a regex OR a comment (OR whatever else might contain bogus matches) OR one of your macro thingies:
"[^"\\]*(?:\\.[^"\\]*)*"
|
'[^'\\]*(?:\\.[^'\\]*)*'
|
/[^/\\]*(?:\\.[^/\\]*)*/[gim]*
|
/\*[^*]*(?:\*(?!/)[^*]*)*\*/
|
##(\w+\.\w+)#
If group #1 contains anything after the match, it must be what you're looking for. Otherwise, ignore this match and go on to the next one.
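As a rough sketch of how that alternation could be applied in a JavaScript preprocessor: the names TOKEN and expandMacros are hypothetical, the block comment alternative is tried before the regex literal so that /* ... */ is not cut short, and the regex literal branch is only a heuristic (it cannot tell a regex from a pair of division operators).

```javascript
// One alternative per token that could hide a false match; only the last
// alternative (the macro itself) has a capture group.
const TOKEN = new RegExp(
  [
    /"[^"\\]*(?:\\.[^"\\]*)*"/.source,           // double-quoted string
    /'[^'\\]*(?:\\.[^'\\]*)*'/.source,           // single-quoted string
    /\/\*[^*]*(?:\*(?!\/)[^*]*)*\*\//.source,    // block comment
    /\/[^\/\\]*(?:\\.[^\/\\]*)*\/[gim]*/.source, // regex literal (heuristic)
    /##(\w+\.\w+)#/.source,                      // macro, captured in group 1
  ].join('|'),
  'g'
);

// Replaces ##a.b# macros outside literals with expand('a.b').
function expandMacros(source, expand) {
  return source.replace(TOKEN, (match, macro) =>
    macro === undefined ? match : expand(macro)
  );
}

const input = 'var a = "##x.y#"; /* ##c.d# */ var b = ##foo.bar#;';
console.log(expandMacros(input, (name) => name.toUpperCase()));
// var a = "##x.y#"; /* ##c.d# */ var b = FOO.BAR;
```

Matches where group 1 is undefined are literals or comments and are written back unchanged, which is the "ignore this match and go on" step described above.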

How do I split a string into an array of characters? [duplicate]

var s = "overpopulation";
var ar = [];
ar = s.split();
alert(ar);
I want to string.split a word into array of characters.
The above code doesn't seem to work - it returns "overpopulation" as Object.
How do I split it into an array of characters, if the original string doesn't contain commas and whitespace?

You can split on an empty string:
var chars = "overpopulation".split('');
If you just want to access a string in an array-like fashion, you can do that without split:
var s = "overpopulation";
for (var i = 0; i < s.length; i++) {
console.log(s.charAt(i));
}
You can also access each character with its index using normal array syntax. Note, however, that strings are immutable, which means you can't set the value of a character using this method, and that it isn't supported by IE7 (if that still matters to you).
var s = "overpopulation";
console.log(s[3]); // logs 'r'

Old question but I should warn:
Do NOT use .split('')
You'll get weird results with non-BMP (non-Basic-Multilingual-Plane) characters.
The reason is that methods like .split() and .charCodeAt() work on 16-bit code units, so they only handle characters with a code point below 65536 correctly; higher code points are represented by a pair of (lower valued) "surrogate" pseudo-characters.
'𝟙𝟚𝟛'.length // —> 6
'𝟙𝟚𝟛'.split('') // —> ["�", "�", "�", "�", "�", "�"]
'😎'.length // —> 2
'😎'.split('') // —> ["�", "�"]
Use ES2015 (ES6) features where possible:
Using the spread operator:
let arr = [...str];
Or Array.from
let arr = Array.from(str);
Or split with the new u RegExp flag:
let arr = str.split(/(?!$)/u);
Examples:
[...'𝟙𝟚𝟛'] // —> ["𝟙", "𝟚", "𝟛"]
[...'😎😜🙃'] // —> ["😎", "😜", "🙃"]
For ES5, options are limited:
I came up with this function that internally uses the MDN example to get the correct code point of each character.
function stringToArray(str) {
var i = 0,
arr = [],
codePoint;
while (!isNaN(codePoint = knownCharCodeAt(str, i))) {
arr.push(String.fromCodePoint(codePoint));
i++;
}
return arr;
}
This requires the knownCharCodeAt() function and, for some browsers, a String.fromCodePoint() polyfill.
if (!String.fromCodePoint) {
// ES6 Unicode Shims 0.1 , © 2012 Steven Levithan , MIT License
String.fromCodePoint = function fromCodePoint () {
var chars = [], point, offset, units, i;
for (i = 0; i < arguments.length; ++i) {
point = arguments[i];
offset = point - 0x10000;
units = point > 0xFFFF ? [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)] : [point];
chars.push(String.fromCharCode.apply(null, units));
}
return chars.join("");
}
}
Examples:
stringToArray('𝟙𝟚𝟛') // —> ["𝟙", "𝟚", "𝟛"]
stringToArray('😎😜🙃') // —> ["😎", "😜", "🙃"]
Note: str[index] (ES5) and str.charAt(index) will also return weird results with non-BMP charsets. e.g. '😎'.charAt(0) returns "�".
UPDATE: Read this nice article about JS and unicode.

.split('') splits emojis in half.
Onur's solutions work for some emojis, but can't handle more complex languages or combined emojis.
Consider this emoji being ruined:
[..."🏳️‍🌈"] // returns ["🏳", "️", "‍", "🌈"] instead of ["🏳️‍🌈"]
Also consider this Hindi text अनुच्छेद which is split like this:
[..."अनुच्छेद"] // returns ["अ", "न", "ु", "च", "्", "छ", "े", "द"]
but should in fact be split like this:
["अ","नु","च्","छे","द"]
This happens because some of the characters are combining marks (think diacritics/accents in European languages).
You can use the grapheme-splitter library for this:
It does proper standards-based letter split in all the hundreds of exotic edge-cases - yes, there are that many.
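As an alternative to a library, newer engines ship Intl.Segmenter, which can split by grapheme cluster. This is a different technique from grapheme-splitter, shown here only as a sketch; it assumes an engine with Intl.Segmenter and full ICU data, and the name splitGraphemes is hypothetical. Exact cluster boundaries for some scripts can vary with the Unicode version the engine implements.

```javascript
// Hypothetical helper: split text into user-perceived characters.
function splitGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  // Each item yielded by segment() has a .segment property with the cluster text.
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// Rainbow flag: U+1F3F3, U+FE0F, U+200D, U+1F308
const flag = '\u{1F3F3}\uFE0F\u200D\u{1F308}';
console.log([...flag].length);            // 4 code points
console.log(splitGraphemes(flag).length); // 1 grapheme cluster
```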

It's as simple as:
s.split("");
The delimiter is an empty string, hence it will break up between each single character.

The split() method in javascript accepts two parameters: a separator and a limit.
The separator specifies the character to use for splitting the string. If you don't specify a separator, the entire string is returned, non-separated. But, if you specify the empty string as a separator, the string is split between each character.
Therefore:
s.split('')
will have the effect you seek.
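A short sketch of how the two parameters behave, using the string from the question (this splits by UTF-16 code unit, so it is only safe for BMP text):

```javascript
const word = 'overpopulation';

// No separator: the whole string comes back as the only element.
console.log(word.split());        // [ 'overpopulation' ]

// Empty-string separator: one element per UTF-16 code unit.
console.log(word.split('').length); // 14

// The optional limit caps the number of elements returned.
console.log(word.split('', 4));   // [ 'o', 'v', 'e', 'r' ]
console.log('a,b,c'.split(',', 2)); // [ 'a', 'b' ]
```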

A string in Javascript is already a character array.
You can simply access any character in the array as you would any other array.
var s = "overpopulation";
alert(s[0]) // alerts o.
UPDATE
As is pointed out in the comments below, the above method for accessing a character in a string is part of ECMAScript 5 which certain browsers may not conform to.
An alternative method you can use is charAt(index).
var s = "overpopulation";
alert(s.charAt(0)) // alerts o.

To support emojis use this
('Dragon 🐉').split(/(?!$)/u);
=> ['D', 'r', 'a', 'g', 'o', 'n', ' ', '🐉']

You can use the regular expression /(?!$)/:
"overpopulation".split(/(?!$)/)
The negative look-ahead assertion (?!$) will match right in front of every character.

Javascript: How to remove characters from end of string? [duplicate]

I have a string, 12345.00, and I would like it to return 12345.0.
I have looked at trim, but it looks like it is only trimming whitespace and slice which I don't see how this would work. Any suggestions?

You can use the substring function:
let str = "12345.00";
str = str.substring(0, str.length - 1);
console.log(str);
This is the accepted answer, but as per the conversations below, the slice syntax is much clearer:
let str = "12345.00";
str = str.slice(0, -1);
console.log(str);

You can use slice! You just have to make sure you know how to use it. Positive #s are relative to the beginning, negative numbers are relative to the end.
js>"12345.00".slice(0,-1)
12345.0

You can use the substring method of JavaScript string objects:
s = s.substring(0, s.length - 4)
It unconditionally removes the last four characters from string s.
However, if you want to conditionally remove the last four characters, only if they are exactly _bar:
var re = /_bar$/;
s = s.replace(re, "");

The easiest method is to use the slice method of the string, which allows negative positions (corresponding to offsets from the end of the string):
const s = "your string";
const withoutLastFourChars = s.slice(0, -4);
If you needed something more general to remove everything after (and including) the last underscore, you could do the following (so long as s is guaranteed to contain at least one underscore):
const s = "your_string";
const withoutLastChunk = s.slice(0, s.lastIndexOf("_"));
console.log(withoutLastChunk);

For a number like your example, I would recommend doing this over substring:
console.log(parseFloat('12345.00').toFixed(1));
Do note that this will actually round the number, though, which I would imagine is desired but maybe not:
console.log(parseFloat('12345.46').toFixed(1));

Be aware that String.prototype.{ split, slice, substr, substring } operate on UTF-16 encoded strings
None of the previous answers are Unicode-aware.
Strings are encoded as UTF-16 in most modern JavaScript engines, but higher Unicode code points require surrogate pairs, so older, pre-existing string methods operate on UTF-16 code units, not Unicode code points.
See: Do NOT use .split('').
const string = "ẞ🦊";
console.log(string.slice(0, -1)); // "ẞ\ud83e"
console.log(string.substr(0, string.length - 1)); // "ẞ\ud83e"
console.log(string.substring(0, string.length - 1)); // "ẞ\ud83e"
console.log(string.replace(/.$/, "")); // "ẞ\ud83e"
console.log(string.match(/(.*).$/)[1]); // "ẞ\ud83e"
const utf16Chars = string.split("");
utf16Chars.pop();
console.log(utf16Chars.join("")); // "ẞ\ud83e"
In addition, RegExp methods, as suggested in older answers, don’t match line breaks at the end:
const string = "Hello, world!\n";
console.log(string.replace(/.$/, "").endsWith("\n")); // true
console.log(string.match(/(.*).$/) === null); // true
Use the string iterator to iterate characters
Unicode-aware code utilizes the string’s iterator; see Array.from and ... spread.
string[Symbol.iterator]() can be called directly as well (e.g. its result can be passed to Array.from instead of string).
Also see How to split Unicode string to characters in JavaScript.
Examples:
const string = "ẞ🦊";
console.log(Array.from(string).slice(0, -1).join("")); // "ẞ"
console.log([
...string
].slice(0, -1).join("")); // "ẞ"
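If building an array of every character is too costly for long strings, one possible shortcut is to inspect only the last two code units. This is a sketch, not part of the answer above; the name dropLastCodePoint is hypothetical, and it removes one code point, not one grapheme cluster.

```javascript
// Hypothetical helper: remove the last code point without iterating the string.
function dropLastCodePoint(str) {
  if (str.length === 0) return str;
  const last = str.charCodeAt(str.length - 1);
  const prev = str.charCodeAt(str.length - 2); // NaN when length is 1
  // A well-formed pair is a high surrogate followed by a low surrogate.
  const isPair =
    last >= 0xDC00 && last <= 0xDFFF && prev >= 0xD800 && prev <= 0xDBFF;
  return str.slice(0, isPair ? -2 : -1);
}

console.log(dropLastCodePoint('\u1E9E\u{1F98A}') === '\u1E9E'); // true
console.log(dropLastCodePoint('12345.00'));                     // 12345.0
```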
Use the s and u flags on a RegExp
The dotAll or s flag makes . match line break characters, the unicode or u flag enables certain Unicode-related features.
Note that, when using the u flag, you eliminate unnecessary identity escapes, as these are invalid in a u regex, e.g. \[ is fine, as it would start a character class without the backslash, but \: isn’t, as it’s a : with or without the backslash, so you need to remove the backslash.
Examples:
const unicodeString = "ẞ🦊",
lineBreakString = "Hello, world!\n";
console.log(lineBreakString.replace(/.$/s, "").endsWith("\n")); // false
console.log(lineBreakString.match(/(.*).$/s) === null); // false
console.log(unicodeString.replace(/.$/su, "")); // ẞ
console.log(unicodeString.match(/(.*).$/su)[1]); // ẞ
// Now `split` can be made Unicode-aware:
const unicodeCharacterArray = unicodeString.split(/(?:)/su),
lineBreakCharacterArray = lineBreakString.split(/(?:)/su);
unicodeCharacterArray.pop();
lineBreakCharacterArray.pop();
console.log(unicodeCharacterArray.join("")); // "ẞ"
console.log(lineBreakCharacterArray.join("").endsWith("\n")); // false
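The note above about identity escapes under the u flag can be checked directly. In this sketch, compiles is a hypothetical helper that reports whether a pattern is accepted with the given flags.

```javascript
// Hypothetical helper: true if the pattern compiles with the given flags.
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (error) {
    return false; // invalid patterns throw a SyntaxError
  }
}

console.log(compiles('\\:', ''));  // true: \: is tolerated without the u flag
console.log(compiles('\\:', 'u')); // false: unnecessary escape is rejected
console.log(compiles('\\[', 'u')); // true: \[ is a meaningful escape
```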
Note that some graphemes consist of more than one code point, e.g. 🏳️‍🌈 which consists of the sequence 🏳 (U+1F3F3), VS16 (U+FE0F), ZWJ (U+200D), 🌈 (U+1F308).
Here, even Array.from will split this into four “characters”.
Matching those is made easier with the RegExp set notation and properties of strings proposal.

Using JavaScript's slice function:
let string = 'foo_bar';
string = string.slice(0, -4); // Slice off last four characters here
console.log(string);
This could be used to remove '_bar' at end of a string, of any length.

A regular expression is what you are looking for:
let str = "foo_bar";
console.log(str.replace(/_bar$/, ""));

Try this:
const myString = "Hello World!";
console.log(myString.slice(0, -1));

Performance
Today (2020.05.13) I performed tests of chosen solutions on Chrome v81.0, Safari v13.1 and Firefox v76.0 on macOS High Sierra v10.13.6.
Conclusions
slice(0,-1) (D) is fast or the fastest solution for both short and long strings, and it is recommended as a fast cross-browser solution
solutions based on substring (C) and substr (E) are fast
solutions based on regular expressions (A, B) are slow to medium fast
solutions B, F and G are slow for long strings
solution F is the slowest for short strings, G is the slowest for long strings
Details
I performed two tests for solutions A, B, C, D, E (ext), F, G (my):
for an 8-char short string (from the OP's question)
for a 1M-char long string
The solutions are presented in the snippet below
function A(str) {
return str.replace(/.$/, '');
}
function B(str) {
return str.match(/(.*).$/)[1];
}
function C(str) {
return str.substring(0, str.length - 1);
}
function D(str) {
return str.slice(0, -1);
}
function E(str) {
return str.substr(0, str.length - 1);
}
function F(str) {
let s= str.split("");
s.pop();
return s.join("");
}
function G(str) {
let s='';
for(let i=0; i<str.length-1; i++) s+=str[i];
return s;
}
// ---------
// TEST
// ---------
let log = (f)=>console.log(`${f.name}: ${f("12345.00")}`);
[A,B,C,D,E,F,G].map(f=>log(f));
This snippet only presents the solutions
Here are example results for Chrome for short string

Use regex:
let aStr = "12345.00";
aStr = aStr.replace(/.$/, '');
console.log(aStr);

How about:
let myString = "12345.00";
console.log(myString.substring(0, myString.length - 1));

1. (.*) captures any character, any number of times:
console.log("a string".match(/(.*).$/)[1]);
2. . matches the last character, in this case:
console.log("a string".match(/(.*).$/));
3. $ matches the end of the string; putting .{2} before it removes the last two characters instead:
console.log("a string".match(/(.*).{2}$/)[1]);

https://stackoverflow.com/questions/34817546/javascript-how-to-delete-last-two-characters-in-a-string
Just use trim if you don't want spaces
"11.01 °C".slice(0,-2).trim()

Here is an alternative that i don't think i've seen in the other answers, just for fun.
var strArr = "hello i'm a string".split("");
strArr.pop();
document.write(strArr.join(""));
Not as legible or simple as slice or substring but does allow you to play with the string using some nice array methods, so worth knowing.

const debris = string.split("_"); // explode the string into an array of strings, split at "_"
debris.pop(); // pop the last element off the array (which you didn't want)
const result = debris.join("_"); // fuse the remaining items together like the sun

If you want to do generic rounding of floats, instead of just trimming the last character:
var float1 = 12345.00,
float2 = 12345.4567,
float3 = 12345.982;
var MoreMath = {
/**
* Rounds a value to the specified number of decimals
* @param float value The value to be rounded
* @param int nrDecimals The number of decimals to round value to
* @return float value rounded to nrDecimals decimals
*/
round: function (value, nrDecimals) {
// Scale by 10^nrDecimals (multiplying 10 by nrDecimals is only right for 1)
var x = nrDecimals > 0 ? Math.pow(10, parseInt(nrDecimals, 10)) : 1;
return Math.round(value * x) / x;
}
}
MoreMath.round(float1, 1); // 12345
MoreMath.round(float2, 1); // 12345.5
MoreMath.round(float3, 1); // 12346
EDIT: Seems like there exists a built in function for this, as Paolo points out. That solution is obviously much cleaner than mine. Use parseFloat followed by toFixed

if(str.substring(str.length - 4) == "_bar")
{
str = str.substring(0, str.length - 4);
}
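The same conditional removal could be wrapped in a small helper for any suffix. This is a sketch assuming ES6's String.prototype.endsWith is available; the name removeSuffix is hypothetical.

```javascript
// Hypothetical helper: strip suffix from str only when str ends with it.
function removeSuffix(str, suffix) {
  // Guard against an empty suffix: slice(0, -0) would return "".
  if (suffix.length > 0 && str.endsWith(suffix)) {
    return str.slice(0, -suffix.length);
  }
  return str;
}

console.log(removeSuffix('foo_bar', '_bar')); // foo
console.log(removeSuffix('foo_baz', '_bar')); // foo_baz
```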

Via slice(indexStart, indexEnd) method - note, this does NOT CHANGE the existing string, it creates a copy and changes the copy.
console.clear();
let str = "12345.00";
let a = str.slice(0, str.length -1)
console.log(a, "<= a");
console.log(str, "<= str is NOT changed");
Via Regular Expression method - note, this does NOT CHANGE the existing string, it creates a copy and changes the copy.
console.clear();
let regExp = /.$/g
let b = str.replace(regExp,"")
console.log(b, "<= b");
console.log(str, "<= str is NOT changed");
Via array.splice() method -> this only works on arrays, and it CHANGES, the existing array (so careful with this one), you'll need to convert a string to an array first, then back.
console.clear();
let str = "12345.00";
let strToArray = str.split("")
console.log(strToArray, "<= strToArray");
let spliceMethod = strToArray.splice(str.length-1, 1)
str = strToArray.join("")
console.log(str, "<= str is changed now");

In cases where you want to remove something that is close to the end of a string (in case of variable sized strings) you can combine slice() and substr().
I had a string with markup, dynamically built, with a list of anchor tags separated by comma. The string was something like:
var str = "<a>text 1,</a><a>text 2,</a><a>text 2.3,</a><a>text abc,</a>";
To remove the last comma I did the following:
str = str.slice(0, -5) + str.substr(-4);

You can, in fact, remove the last arr.length - 2 items of an array using arr.length = 2, which if the array length was 5, would remove the last 3 items.
Sadly, this does not work for strings, but we can use split() to split the string, and then join() to join the string after we've made any modifications.
var str = 'string'
String.prototype.removeLast = function(n) {
var string = this.split('')
string.length = string.length - n
return string.join('')
}
console.log(str.removeLast(3))

Try to use toFixed
const str = "12345.00";
console.log((+str).toFixed(1)); // "12345.0"

Try this:
<script>
var x="foo_foo_foo_bar";
for (var i=0; i<x.length; i++) {
if (x[i]=="_" && x[i+1]=="b") {
break;
}
else {
document.write(x[i]);
}
}
</script>
You can also try the live working example on http://jsfiddle.net/informativejavascript/F7WTn/87/.

@Jason S:
You can use slice! You just have to make sure you know how to use it. Positive #s are relative to the beginning, negative numbers are relative to the end.
js>"12345.00".slice(0,-1)
12345.0
Sorry for being long-winded, but this post was tagged 'jquery' earlier. Note that on a jQuery object, slice() is a jQuery method that operates on sets of DOM elements, not on substrings.
In other words, the answer by @Jon Erickson suggests a really good solution.
However, your method works on plain JavaScript strings, outside of jQuery objects.
It should be said, given the recent discussion in the comments, that jQuery is updated much more often than ECMAScript, the standard underlying JavaScript itself.
There are also two other methods:
the one used here:
string.substring(from, to), where omitting the 'to' index returns the rest of the string, so:
string.substring(from) (negative indices are treated as 0)
and another one, substr(), which takes a start index and a length; 'length' can only be positive:
string.substr(start, length)
Also, some maintainers suggest that the last method, string.substr(start, length), does not work or works incorrectly in MSIE.

Use substring to get everything to the left of _bar. But first you have to find the index of _bar in the string:
str.substring(0, str.indexOf("_bar"));
0 is the start index and the second argument is the end index (exclusive), not a length.
