JavaScript Unicode-aware string slice - javascript

I am trying to slice a string containing Unicode characters, but it returns a replacement character. Here is my sample code:
let str = '𝒽𝑒𝓁𝓁𝑜 𝓌𝑜𝓇𝓁𝒹';
str = str.slice(0, -1);
console.log(str);
which gives me below result
"๐’ฝ๐‘’๐“๐“๐‘œ ๐“Œ๐‘œ๐“‡๐“๏ฟฝ"
how can I get rid of the replacement character?

Try this; it won't split a 4-byte character in two:
let str = '𝒽𝑒𝓁𝓁𝑜 𝓌𝑜𝓇𝓁𝒹';
str = [...str].slice(0, -1).join('');
console.log(str);

That's because your 𝒹 is stored as a surrogate pair, which means it takes four bytes in UTF-16 (two code units of 2 bytes each). Since .slice works on code units (like most other string methods), you need to slice away the whole pair:
let str = '𝒽𝑒𝓁𝓁𝑜 𝓌𝑜𝓇𝓁𝒹';
str = str.slice(0, -2);
console.log(str);
To work with code points instead of code units, you can use the string iterator, which iterates over code points (so one iterated "character" may itself be a string of length 2):
let str = '𝒽𝑒𝓁𝓁𝑜 𝓌𝑜𝓇𝓁𝒹';
for (const char of str)
  console.log(char, char.length);
You can use the iterator to build up an array, work on that, and turn the array back into a string as the other answer shows.
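The array approach can be wrapped in a small helper. This is only a minimal sketch: sliceCodePoints is a hypothetical name, not a built-in, and it counts code points only, so sequences made of several code points (for example emoji joined with zero-width joiners) could still be cut apart.

```javascript
// Hypothetical helper: slice by code points instead of UTF-16 code units.
function sliceCodePoints(str, start, end) {
  // Array.from uses the string iterator, so each element is one code point.
  return Array.from(str).slice(start, end).join('');
}

// "wor" followed by two script letters that are each a surrogate pair.
const sample = 'wor\u{1D4C1}\u{1D4B9}';

console.log(sample.length);                  // 7 (code units)
console.log(Array.from(sample).length);      // 5 (code points)
console.log(sliceCodePoints(sample, 0, -1)); // drops the whole last letter
```

Negative indices behave as they do for Array.prototype.slice, so the call mirrors str.slice(0, -1) from the question.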

Related

How to remove specific multiple string and then remain another?

let result = 'Apple%00Juice%02';
const removeOne = result.slice(5, 8); // get %00
const removeTwo = result.slice(13, 16); // get %02
slice gets the parts I want to remove, not the part I want to keep.
Is there a function that would turn the result into 'Apple Juice'?
You can get the desired result using a regex to match the parts you want to remove from the string and then replace them using the replace() method on strings.
const str = "Apple%00Juice%02";
const regex = /%\d+/g;
const result = str.replace(regex, " ").trim()
console.log(result);
Explanation of regex:
% - match the character % literally
\d+ - match any digit 0 to 9 one or more times
%\d+ - match % character followed by one or more digits
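If the markers are percent-encoded bytes rather than arbitrary digits (an assumption about the data, not something the question states), they may contain hex letters such as %0A, which \d+ would only partly match. A sketch with a stricter pattern, using made-up sample input:

```javascript
// Assumed sample input: the second marker contains a hex letter.
const input = 'Apple%00Juice%0A';

// Match "%" followed by exactly two hex digits.
const percentByte = /%[0-9A-Fa-f]{2}/g;

const cleaned = input.replace(percentByte, ' ').trim();
console.log(cleaned); // "Apple Juice"

// For comparison, the digit-only pattern leaves the "A" behind.
const digitOnly = input.replace(/%\d+/g, ' ').trim();
console.log(digitOnly); // "Apple Juice A"
```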
You can achieve this by using .replace() function
Example:
let result = 'Apple%00Juice%02';
result = result.replace('%00', ' ');
result = result.replace('%02', '');
console.log(result);
Read more about the .replace() function in the MDN docs.
Edit:
A shorter version of @Yousaf's answer:
let result = "Apple%00Juice%02";
result = result.replace(/%\d+/g, " ").trim()
console.log(result);
It is possible with replace(), but that is not a sustainable solution. The URL encoding "%00" stands for the NUL character (code point 0). This suggests that the string was already in the wrong character encoding when it was URL-encoded. So you have to check which character encoding is used when your database or file is read, for example UTF-8 or ISO 8859-1.
Once the string is encoded with the correct character encoding, you can decode it in JavaScript using the decodeURIComponent(str) method.
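As a small illustration of what decodeURIComponent does with such input (the strings here are just examples), note that %00 and %02 decode to control characters, not to spaces:

```javascript
// A correctly encoded space decodes as expected.
const good = decodeURIComponent('Apple%20Juice');
console.log(good); // "Apple Juice"

// %00 and %02 decode to control characters (code points 0 and 2).
const decoded = decodeURIComponent('Apple%00Juice%02');
console.log(decoded.length);         // 12
console.log(decoded.charCodeAt(5));  // 0
console.log(decoded.charCodeAt(11)); // 2

// If the source data cannot be fixed, control characters can be
// replaced after decoding.
const stripped = decoded.replace(/[\u0000-\u001F]/g, ' ').trim();
console.log(stripped); // "Apple Juice"
```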

How to get characters from a text like LEFT() but using google apps script [duplicate]

This may duplicate previous topics, but I can't find what I really need.
I want to get the first three characters of a string. For example:
var str = '012123';
console.info(str.substring(0,3)); //012
I want the output '012', but I don't want to use substring or anything similar, because I need to keep the original string for appending more characters ('45'). With substring the result would be 01245, but what I need is 01212345.
var str = '012123';
var strFirstThree = str.substring(0,3);
console.log(str); //shows '012123'
console.log(strFirstThree); // shows '012'
Now you have access to both.
slice(begin, end) works on strings, as well as arrays. It returns a string representing the substring of the original string, from begin to end (end not included) where begin and end represent the index of characters in that string.
const string = "0123456789";
console.log(string.slice(0, 2)); // "01"
console.log(string.slice(0, 8)); // "01234567"
console.log(string.slice(3, 7)); // "3456"
See also:
What is the difference between String.slice and String.substring?
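A short sketch of both points: strings are immutable, so taking a prefix never changes the original, and slice and substring differ mainly in how they treat negative or swapped indices. The variable names are only for illustration.

```javascript
const original = '012123';
const firstThree = original.slice(0, 3); // new string, original untouched
const extended = original + '45';        // append to the full original

console.log(firstThree); // "012"
console.log(original);   // "012123"
console.log(extended);   // "01212345"

const s = 'abcdef';
console.log(s.slice(-2));       // "ef"      (negative counts from the end)
console.log(s.substring(-2));   // "abcdef"  (negative is treated as 0)
console.log(s.slice(4, 1));     // ""        (start after end gives empty)
console.log(s.substring(4, 1)); // "bcd"     (arguments are swapped)
```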

How can I split the word by numbers but also keep the numbers in Node.js?

I would like to split a word at its numbers, but keep the numbers, in Node.js.
For example, take the following string:
var a = "shuan3jia4";
What I want is:
"shuan3 jia4"
However, if you use split() with a regular expression, the digits that act as separators are dropped, for example:
a.split(/[0-9]/)
The result is:
[ 'shuan', 'jia', '' ]
So is there any way to keep the numbers that are used on the split?
You can use match to actually split it per your requirement:
var a = "shuan3jia4";
console.log(a.match(/[a-z]+[0-9]/ig));
Use parentheses (a capturing group) around the part of the separator you want to keep.
See further details at "Javascript and regex: split string and keep the separator".
var s = "shuan3jia4";
var arr = s.split(/([0-9])/);
console.log(arr);
var s = "shuan3jia4";
var arr = s.split(/(?<=[0-9])/);
console.log(arr);
This will work as per your requirements. This answer was curated from @arhak and "C# split string but keep split chars / separators".
As @codybartfast said, (?<=PATTERN) is a positive look-behind for PATTERN. It matches at any position where the preceding text fits PATTERN, so there is a match (and a split) after each occurrence of any of the characters.
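To get the single string "shuan3 jia4" that the question asks for, the look-behind split can be joined with spaces. The sketch below also shows a replace() variant for engines without look-behind support (look-behind arrived with ES2018); both are illustrations, not the only way to do it.

```javascript
const word = 'shuan3jia4';

// Split after every digit, then join the pieces with a space.
const viaSplit = word.split(/(?<=[0-9])/).join(' ');
console.log(viaSplit); // "shuan3 jia4"

// Alternative without look-behind: add a space after each digit
// unless that digit is the last character of the string.
const viaReplace = word.replace(/([0-9])(?!$)/g, '$1 ');
console.log(viaReplace); // "shuan3 jia4"
```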
Split, map, join, trim.
const a = 'shuan3jia4';
const splitUp = a.split('').map(function(char) {
  // Test for a digit directly; parseInt('0') is 0, which is falsy
  if (/[0-9]/.test(char)) return `${char} `;
  return char;
});
const joined = splitUp.join('').trim();
console.log(joined);

Splitting Nucleotide Sequences in JS with Regexp

I'm trying to split up a nucleotide sequence into amino acid strings using a regular expression. I have to start a new string at each occurrence of the string "ATG", but I don't want to actually stop the first match at the "ATG". Valid input is any ordering of a string of As, Cs, Gs, and Ts.
For example, given the input string: ATGAACATAGGACATGAGGAGTCA
I should get two strings: ATGAACATAGGACATGAGGAGTCA (the whole thing) and ATGAGGAGTCA (the first match of "ATG" onward). A string that contains "ATG" n times should result in n results.
I thought the expression /(?:[ACGT]*)(ATG)[ACGT]*/g would work, but it doesn't. If this can't be done with a regexp it's easy enough to just write out the code for, but I always prefer an elegant solution if one is available.
If you really want to use regular expressions, try this:
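var str = "ATGAACATAGGACATGAGGAGTCA",
    re = /ATG.*/g, match, matches = [];
while ((match = re.exec(str)) !== null) {
  // match[0] is the matched text; match itself is the result array
  matches.push(match[0]);
  // resume the search just after this "ATG"
  re.lastIndex = match.index + 3;
}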
But be careful with exec and changing the index. You can easily make it an infinite loop.
Otherwise you could use indexOf to find the indices and slice to get the substrings:
var str = "ATGAACATAGGACATGAGGAGTCA",
    offset = 0, match = str, matches = [];
while ((offset = match.indexOf("ATG", offset)) > -1) {
  match = match.slice(offset);
  matches.push(match);
  // match now starts at "ATG", so resume the search just past it
  offset = 3;
}
I think what you want is
var subStrings = inputString.split('ATG');
KISS :)
Splitting a string before each occurrence of ATG is simple, just use
result = subject.split(/(?=ATG)/i);
(?=ATG) is a positive lookahead assertion, meaning "Assert that you can match ATG starting at the current position in the string".
This will split GGGATGTTTATGGGGATGCCC into GGG, ATGTTT, ATGGGG and ATGCCC.
So now you have an array of (in this case four) strings. I would now take those, discard the first one if it does not start with ATG (which happens whenever the input itself does not begin with ATG), and then join strings no. 2 + ... + n, then 3 + ... + n, etc., until you have exhausted the list.
Of course, this regex doesn't validate whether the string only contains ACGT characters, as it only matches positions between characters, so that should be done beforehand, i.e. by checking that the input string matches /^[ACGT]*$/i.
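The steps described above could be sketched as follows. atgSuffixes is a hypothetical helper name, and throwing on invalid input is just one possible way to handle the validation.

```javascript
// Hypothetical helper: every substring that runs from an "ATG" to the end.
function atgSuffixes(sequence) {
  // Validate first, since the split itself checks nothing.
  if (!/^[ACGT]*$/i.test(sequence)) {
    throw new Error('not a nucleotide sequence');
  }
  // Split before each ATG; drop a leading piece that has no ATG.
  const parts = sequence
    .split(/(?=ATG)/i)
    .filter(part => /^ATG/i.test(part));
  // Join piece i with everything after it to rebuild each suffix.
  return parts.map((_, i) => parts.slice(i).join(''));
}

console.log(atgSuffixes('GGGATGTTTATGGGGATGCCC'));
// [ 'ATGTTTATGGGGATGCCC', 'ATGGGGATGCCC', 'ATGCCC' ]
console.log(atgSuffixes('ATGAACATAGGACATGAGGAGTCA'));
// [ 'ATGAACATAGGACATGAGGAGTCA', 'ATGAGGAGTCA' ]
```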
Since you want to capture from every "ATG" to the end, split isn't right for you. You can, however, use replace, and abuse the callback function:
var matches = [];
seq.replace(/atg/gi, function(m, pos){ matches.push(seq.substr(pos)); });
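A similar result can be had without a side-effecting callback by putting a capturing group inside a lookahead and using matchAll (available since ES2020). This is a sketch under the assumption that the input contains only A, C, G and T; the variable names are illustrative.

```javascript
const seq = 'ATGAACATAGGACATGAGGAGTCA';

// The lookahead consumes nothing, so matches may overlap; the group
// inside it captures everything from each ATG to the end of the input.
const found = Array.from(
  seq.matchAll(/(?=(ATG[ACGT]*))/gi),
  m => m[1]
);

console.log(found);
// [ 'ATGAACATAGGACATGAGGAGTCA', 'ATGAGGAGTCA' ]
```

Because each match is zero-width, matchAll advances the search position by itself, which avoids the manual lastIndex handling needed with exec.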
This isn't with regex, and I don't know if this is what you consider "elegant," but...
var sequence = 'ATGAACATAGGACATGAGGAGTCA';
var matches = [];
do {
  matches.push('ATG' + (sequence = sequence.slice(sequence.indexOf('ATG') + 3)));
} while (sequence.indexOf('ATG') !== -1); // also continue when the rest starts with ATG
I'm not completely sure if this is what you're looking for. For example, with an input string of ATGabcdefghijATGklmnoATGpqrs, this returns ATGabcdefghijATGklmnoATGpqrs, ATGklmnoATGpqrs, and ATGpqrs.
