How to detect wide characters in JavaScript? - javascript

I am writing a small parser for a custom query language that contains Chinese characters. When it detects a syntax error, it outputs an error message like the following:
語法錯誤:應為數,但為字串。
索引 = '3213茂訊'"
^
The last line contains a single '^' character that marks the position of the offending token. Because a Chinese character is visually as wide as two ordinary characters, I need to detect wide characters to calculate where to put the '^' so that it points at the right token. Does anyone know of a function that can detect wide characters in JavaScript?

I’m not sure I understand you correctly, but you might want to try the https://www.npmjs.com/package/wcwidth package, which can be used as follows:
import wcwidth from 'wcwidth';
const getCharAtPosition = (str, position) => {
let currPos = 0;
return [...str].find(char => {
const charWidth = wcwidth(char);
const isPosition =
currPos === position || (charWidth === 2 && currPos === position - 1);
currPos += charWidth;
return isPosition;
});
};
const indicatorPos = ' ^'.indexOf('^');
console.log(getCharAtPosition(`索引 = '3213茂訊'"`, indicatorPos));
// will log: '
I didn’t test it, but something like this might work.
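If the goal is only to place the '^' under the right token, one option is to pad the marker line by the display width of everything that precedes the token. The following is a minimal sketch, assuming the parser reports the error position as a code-point index. The names `WIDE_RANGES`, `charWidth`, `displayWidth` and `caretLine` are hypothetical, and the range table is only a rough approximation of East Asian wide characters, not a complete Unicode table; a library such as the wcwidth package mentioned above could stand in for `charWidth`.

```javascript
// Approximate ranges of characters that occupy two terminal columns.
// This table is an assumption for illustration, not a full Unicode table.
const WIDE_RANGES = [
  [0x1100, 0x115f],   // Hangul Jamo
  [0x2e80, 0xa4cf],   // CJK radicals through Yi (approximate)
  [0xac00, 0xd7a3],   // Hangul syllables
  [0xf900, 0xfaff],   // CJK compatibility ideographs
  [0xfe30, 0xfe4f],   // CJK compatibility forms
  [0xff00, 0xff60],   // Fullwidth forms
  [0xffe0, 0xffe6],   // Fullwidth signs
  [0x20000, 0x3fffd], // Supplementary ideographic planes
];

// Width of a single character (one code point): 2 if wide, otherwise 1.
const charWidth = (ch) => {
  const cp = ch.codePointAt(0);
  return WIDE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi) ? 2 : 1;
};

// Total display width of a string, iterating by code point.
const displayWidth = (str) =>
  [...str].reduce((width, ch) => width + charWidth(ch), 0);

// Build the marker line for a token that starts at code-point index `index`.
const caretLine = (line, index) =>
  ' '.repeat(displayWidth([...line].slice(0, index).join(''))) + '^';

const line = `索引 = '3213茂訊'"`;
console.log(line);
console.log(caretLine(line, 5)); // marker under the opening quote of the string token
```

Combining marks and emoji sequences are not handled here; they would need a proper width library.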

Related

Add line break in string on last character before n characters

To start off, this is probably worded badly, as I am not sure how to put what I want into words. Let's say I have this canvas (I'm using node-canvas) and I want to make it display text from a user input. However, the way I am doing it limits the number of characters to 36-38 (not looking for a solution to this). So I made a script using the regex textstr.match(/.{1,32}/g) that splits the string every 32 characters (just to be safe), calculates a new canvas height, and then does join("\n") when it comes time to print the string. However, when receiving feedback on this, I realized it would be better to split at the last space in the string and add a line break there, but I am confused about how to do this.
My current code is this:
var textstr = "123456789 01234567890 123456789012 34567890"
var splitStr
if(textstr.length > 32){
if(textstr.substring(1,32).includes(" ")){ //1,32 so it won't bug out if the first character is a space
//splitStr = textstr.something(test)
} else {
splitStr = textstr.match(/.{1,32}/g)
}
}
//canvas initialization blah blah blah
//load fonts yada yada yada
ctx.fillText(splitStr.join("\n"), 20, 55)
I was wondering if there is some sort of regular expression that I could use. Any help/feedback/common sense is appreciated.
This solution is a bit complex and could do with some simplification. However, it should get you most of the way there.
const input = "123456789 01234567890 123456789012 34567890 11444444444 424124 1234124124121 4444444444444444444444444444444444444444444444444444444444444444444444";
const split = (value, width) => {
const stack = value.split(' ').reverse();
const results = [];
let builder = "";
while (stack.length > 0) {
const item = stack.pop();
if (item.length > width) { // is the current chunk already larger than our desired width?
if (builder !== "") { // we have to push our buffer too
results.push(builder);
builder = "";
}
results.push(item);
} else {
const line = builder === ""
? item
: `${builder} ${item}`;
if (line.length > width) { // is our new line greater than our width?
stack.push(item); // push the item back, since consuming it would make our line too long; we let the next iteration consume it.
results.push(builder); // push the buffer into our results.
builder = "";
} else if (stack.length === 0) { // is this the last element? just add it to the results.
results.push(line);
} else {
builder = line; // update our buffer to the current appended chunk.
}
}
}
return results;
};
split(input, 32).forEach((c) => console.log(c, c.length));
split(input, 32).join("\n");
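Since the question asks for a regular expression, here is a possible regex-only sketch of the same idea: replace the last whitespace character that fits within the width by a newline, as long as the rest of the text does not already fit on one line. The function name `wrapAtSpaces` is hypothetical, and unlike a hard split this version assumes it is acceptable to leave words longer than the width unbroken.

```javascript
// Insert "\n" at the last whitespace within each `width`-character window.
// The negative lookahead stops the replacement once the remaining text
// already fits on a single line.
const wrapAtSpaces = (text, width) => {
  const pattern = new RegExp(
    `(?![^\\n]{1,${width}}$)([^\\n]{1,${width}})\\s`,
    'g'
  );
  return text.replace(pattern, '$1\n');
};

const sample = '123456789 01234567890 123456789012 34567890';
console.log(wrapAtSpaces(sample, 32));
// 123456789 01234567890
// 123456789012 34567890
```

The result is a single string, so it can be split on "\n" to count lines for the canvas height.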

Emojis to/from codepoints in Javascript

In a hybrid Android/Cordova game that I am creating I let users provide an identifier in the form of an Emoji + an alphanumeric - i.e. 0..9,A..Z,a..z - name. For example
🙋‍️Stackoverflow
Server-side, the user identifiers are stored with the Emoji and Name parts separated, with only the Name part required to be unique. From time to time the game displays a "league table" so the user can see how well they are performing compared to other players. For this purpose the server sends back a sequence of ten "high score" values consisting of Emoji, Name and Score.
This is then presented to the user in a table with three columns - one each for Emoji, Name and Score. And this is where I have hit a slight problem. Initially I had quite naively assumed that I could figure out the Emoji by simply looking at handle.codePointAt(0). When it dawned on me that an Emoji could in fact be a sequence of one or more 16 bit Unicode values I changed my code as follows
Part I: Dissecting the user-supplied "handle"
var i = 0, username,
codepoints = [],
handle = "🙋‍️StackOverflow",
len = handle.length;
while ((i < len) && (255 < handle.codePointAt(i)))
{codepoints.push(handle.codePointAt(i));i += 2;}
username = handle.substring(codepoints.length + 1);
At this point I have the "dissected" handle with
codepoints = [128587, 8205, 65039];
username = 'Stackoverflow';
A note of explanation for the i += 2 and the use of handle.length above. This article suggests that
handle.codePointAt(n) will return the code point for the full surrogate pair if you hit the leading surrogate. In my case since the Emoji has to be first character the leading surrogates for the sequence of 16 bit Unicodes for the emoji are at 0,2,4....
From the same article I learnt that String.length in Javascript will return the number of 16 bit code units.
Part II: Regenerating the Emojis for the "league table"
Suppose the league table data squirted back to the app by my servers has the entry {emoji: [128583, 8205, 65039],username:"Stackexchange",points:100} for the emoji character 🙇‍️. Now here is the bothersome thing. If I do
var origCP = [],
i = 0,
origEmoji = '🙇‍️',
origLen = origEmoji.length;
while ((i < origLen) && (255 < origEmoji.codePointAt(i)))
{origCP.push(origEmoji.codePointAt(i));i += 2;}
I get
origLen = 5, origCP = [128583, 8205, 65039]
However, if I regenerate the emoji from the provided data
var reEmoji = String.fromCodePoint.apply(String,[128583, 8205, 65039]),
reEmojiLen = reEmoji.length;
I get
reEmoji = '🙇‍️'
reEmojiLen = 4;
So while reEmoji has the correct emoji its reported length has mysteriously shrunk down to 4 code units in place of the original 5.
If I then extract code points from the regenerated emoji
var reCP = [],
i = 0;
while ((i < reEmojiLen) && (255 < reEmoji.codePointAt(i)))
{reCP.push(reEmoji.codePointAt(i));i += 2;}
which gives me
reCP = [128583, 8205];
Even curiouser, origEmoji.codePointAt(3) gives the trailing surrogate pair value of 9794, while reEmoji.codePointAt(3) gives the value of the next full surrogate pair, 65039.
I could at this point just say
Do I really care?
After all, I just want to show the league table emojis in a separate column, so as long as I am getting the right emoji the niceties of what is happening under the hood do not matter. However, this might well be storing up problems for the future.
Can anyone here shed any light on what is happening?
Emojis are more complicated than single chars: they come in "sequences", e.g. a ZWJ sequence (combining multiple emojis into one image) or a presentation sequence (providing different variations of the same symbol), and some more; see Unicode TR51 for all the nasty details.
If you "dump" your string like this
str = "🙋‍️StackOverflow"
console.log(...[...str].map(x => x.codePointAt(0).toString(16)))
you'll see that it's actually an (incorrectly formed) zwj-sequence wrapped in a presentation sequence.
So, to slice emojis accurately, you need to iterate the string as an array of code points (not code units!) and extract supplementary-plane code points (> 0xffff) plus ZWJs and variation selectors. Example:
function sliceEmoji(str) {
let res = ['', ''];
for (let c of str) {
let n = c.codePointAt(0);
let isEmoji = n > 0xffff || n === 0x200d || (0xfe00 <= n && n <= 0xfe0f); // supplementary planes, ZWJ, variation selectors
res[1 - isEmoji] += c;
}
return res;
}
function hex(str) {
return [...str].map(x => x.codePointAt(0).toString(16))
}
myStr = "🙋‍️StackOverflow"
console.log(sliceEmoji(myStr))
console.log(sliceEmoji(myStr).map(hex))
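The length mismatch in the question can be reproduced directly. The question reports origEmoji.codePointAt(3) as 9794, which is U+2642 (the male sign), so the original string appears to have been U+1F647, ZWJ, U+2642, U+FE0F: five code units. A loop that steps by two code units lands on indices 0, 2 and 4 and never visits index 3, so U+2642 is dropped from the stored list. The sketch below, with the hypothetical helper names `toCodePoints` and `fromCodePoints`, iterates by code point instead, so the round trip preserves the string.

```javascript
// Iterating a string with Array.from (or for...of) yields whole code points,
// so surrogate pairs are never split and no BMP code point is skipped.
const toCodePoints = (s) => Array.from(s, (ch) => ch.codePointAt(0));
const fromCodePoints = (cps) => String.fromCodePoint(...cps);

// Assumed original emoji: person bowing + ZWJ + male sign + variation selector-16.
const original = '\u{1F647}\u200D\u2642\uFE0F';

const cps = toCodePoints(original);
console.log(original.length); // 5 code units
console.log(cps);             // [128583, 8205, 9794, 65039]

const rebuilt = fromCodePoints(cps);
console.log(rebuilt === original, rebuilt.length); // true 5

// Rebuilding from the list that is missing 9794 gives the 4-unit string
// observed in the question.
console.log(fromCodePoints([128583, 8205, 65039]).length); // 4
```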

The most efficient way to trim a file name in JavaScript

I was wondering how to trim a file name in JS to show "..." or any appendix for that matter after a certain number of characters, the most efficient way to handle all possible test cases.
Rules
Show the actual file extension and not the last character after splitting the string name with "."
The function should take the input file name (string), the number of characters to trim (integer) and appendix (string) as the parameter.
By efficient, I mean I expect to write fewer lines of code and handle all possible edge cases.
Sample Inputs
myAwesomeFile.min.css
my Awesome File.tar.gz
file.png
Expected output (say I want to trim after 5 characters)
myAwe....min.css
my Aw....tar.gz
file.png
Editing the question to show my attempt
function trimFileName(str, noOfChars, appendix) {
let nameArray = str.split(".");
let fileType = `.${nameArray.pop()}`;
let fileName = nameArray.join(" ");
if (fileName.length >= noOfChars) {
fileName = fileName.substr(0, noOfChars) + appendix;
};
return (fileName + fileType);
}
console.log(trimFileName("myAwesomeFile.min.css", 5, "..."));
console.log(trimFileName("my Awesome File.tar.gz", 5, "..."));
console.log(trimFileName("file.png", 5, "..."));
Edit #2: Feel free to go ahead and edit the question if you think it's not the standard expectation and add more edge cases to the sample inputs and expected outputs.
Edit #3: Added a few more details to the question after the new comments. I know my attempt doesn't fulfill my expected outputs (and I am unsure whether the output I have listed above is a standard expectation or not).
Edit #4 (Final): Removed the rule of not breaking a word in the middle after a continuous backlash in the comments and changed the rules to cater to more realistic and practical use cases.
If we treat the dot character . as a separator for file extensions, what you ask for can be solved with a single line of JavaScript:
name.replace(new RegExp('(^[^\\.]{' + chars + '})[^\\.]+'), '$1' + subst);
Demo code in the following snippet:
function f(name, chars, subst) {
return name.replace(
new RegExp('(^[^\\.]{' + chars + '})[^\\.]+'), '$1' + subst);
}
test('myAwesomeFile.min.css', 5, '...', 'myAwe....min.css');
test('my Awesome File.tar.gz', 5, '...', 'my Aw....tar.gz');
test('file.png', 5, '...', 'file.png');
function test(filename, length, subst, expected) {
let actual = f(filename, length, subst);
console.log(actual, actual === expected ? 'OK' : 'expected: ' + expected);
}
On Windows, AFAIK, the file extension is only what follows the last dot. Thus, technically, the file extension of "myAwesomeFile.min.css" is just css, and the file extension of "my Awesome File.tar.gz" is just gz.
In this case, what you ask for can still be solved with one line of JavaScript:
name.replace(new RegExp('(^.{' + chars + '}).+(\\.[^\\.]*$)'), '$1' + subst + '$2');
Demo code in the following snippet:
function f(name, chars, subst) {
return name.replace(
new RegExp('(^.{' + chars + '}).+(\\.[^\\.]*$)'), '$1' + subst + '$2');
}
test('myAwesomeFile.min.css', 5, '...', 'myAwe....css');
test('my Awesome File.tar.gz', 5, '...', 'my Aw....gz');
test('file.png', 5, '...', 'file.png');
function test(filename, length, subst, expected) {
let actual = f(filename, length, subst);
console.log(actual, actual === expected ? 'OK' : 'expected: ' + expected);
}
If you really want to allow edge cases with specific multiple extensions, you probably need to define a comprehensive list of all allowed multiple extensions to know how to deal with cases like "my.awesome.file.min.css". You would need to provide a list of all cases you want to include before it would be possible to determine how efficient any solution could be.
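As a rough illustration of the "list of allowed multiple extensions" idea, the sketch below checks a small, hypothetical list (`KNOWN_MULTI`) first and falls back to the last dot otherwise. The list contents and the function name `trimWithKnownExtensions` are assumptions made for the example; a real list would have to be chosen for the application.

```javascript
// Hypothetical list of multi-part extensions to treat as a single unit.
const KNOWN_MULTI = ['.tar.gz', '.min.css', '.min.js'];

function trimWithKnownExtensions(name, chars, subst) {
  const lower = name.toLowerCase();
  const multi = KNOWN_MULTI.find((ext) => lower.endsWith(ext));
  const lastDot = name.lastIndexOf('.');
  // Prefer a known multi-part extension, then the last dot, then no extension.
  const extStart = multi
    ? name.length - multi.length
    : (lastDot > 0 ? lastDot : name.length);
  const base = name.slice(0, extStart);
  const ext = name.slice(extStart);
  // Only trim when the base name is longer than the allowed length.
  return base.length > chars ? base.slice(0, chars) + subst + ext : name;
}

console.log(trimWithKnownExtensions('myAwesomeFile.min.css', 5, '...'));   // myAwe....min.css
console.log(trimWithKnownExtensions('my.awesome.file.min.css', 5, '...')); // my.aw....min.css
console.log(trimWithKnownExtensions('my Awesome File.tar.gz', 5, '...'));  // my Aw....tar.gz
console.log(trimWithKnownExtensions('file.png', 5, '...'));                // file.png
```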
It is really hard to account for all extensions (including edge cases). See this list, for example, of common extensions: https://www.computerhope.com/issues/ch001789.htm. Even with that many extensions, the list is not exhaustive.
Your function is OK but to account for more cases it could be re-written to this:
function trimFileName(filename, limit = 5, spacer = '.') {
const split = filename.indexOf(".");
const name = filename.substring(0, split);
const ext = filename.substring(split);
let trimName = name.substring(0, limit);
if (name.length > trimName.length)
trimName = trimName.padEnd(limit + 3, spacer);
return trimName + ext;
}
console.log(trimFileName("myAwesomeFile.min.css"));
console.log(trimFileName("my Awesome File.tar.gz"));
console.log(trimFileName("file.png"));
Below is a pretty simple approach to achieve shortening in the fashion you desire. Comments are in the code, but let me know if anything needs additional explanation:
//A simple object to hold some configs about how we want to shorten the file names
const config = {
charsBeforeTrim: 5,
seperator: '.',
replacementText: '....'
};
//Given file names to shorten
const files = ['myAwesomeFile.min.css', 'my Awesome File.tar.gz', 'file.png'];
//Function to do the actual file name shortening
const shorten = s =>
s.length > config.charsBeforeTrim ? `${s.substring(0, config.charsBeforeTrim)}${config.replacementText}` : s;
//Function to generate a short file name with extension(s)
const generateShortName = (file, config) => {
//ES6 Array destructuring makes it easy to get the file name in a unique variable while keeping the remaining elements (the extensions) in their own array:
const [name, ...extensions] = file.split(config.seperator);
//Simply append all remaining extension chunks to the shortName
return extensions.reduce((accum, extension) => {
accum += `${config.seperator}${extension}`;
return accum;
}, shorten(name));
};
//Demonstrate usage
const shortFileNames = files.map(file => generateShortName(file, config));
console.log(shortFileNames);
const parse = (filename, extIdx = filename.lastIndexOf('.')) => ({
name: filename.substring(0, extIdx),
extension: filename.substring(extIdx + 1),
})
const trimFileName = (
filename, size = 5, fill = '...',
file = parse(filename),
head = file.name.substring(0, size)
) => file.name.length >= size ? `${head}${fill}${file.extension}` : filename
/* - - - - - - - - - - - - - - - - - - - - - - - - - */
;[
'myAwesomeFile.min.css',
'my.Awesome.File.min.css',
'my Awesome File.tar.gz',
'file.png',
].forEach(f => console.log(trimFileName(f)))
You can fairly straightforwardly pull out your extension condition (easily replaced with a list of valid extensions) and use a regex to capture the last part. Then you just add a check on the filename (starting at the beginning of the filename) to trim the result.
const trim = (string, x) => {
// We assume that up to the last two dot-delimited alphanumeric substrings are the extension
const extensionRegex = /(?:(\.[a-zA-Z0-9]+){0,2})$/g;
const { index } = extensionRegex.exec(string);
// No point in trimming since result string would be longer than input
if (index + 2 < x) {
return string;
}
return string.substr(0, x) + ".." + string.substr(index);
};
/* Assert that we keep the extension */
console.log("cat.tar.gz", trim("cat.tar.gz", 100) == "cat.tar.gz");
console.log("cat.zip", trim("cat.zip", 100) == "cat.zip");
/* Assert that we keep x characters before trim */
console.log("1234567890cat.tar.gz",!trim("1234567890cat.tar.gz",10).includes("cat"));
console.log("1234567890cat.zip", !trim("1234567890cat.zip", 10).includes("cat"));

Look for substring in a string with at most one different character - javascript

I am new to programming and am working on a program that needs to find a substring in a string and return the index where the match starts. I know I can use indexOf for an exact match, but it is not that simple: I want to find substrings with at most one differing character.
I was thinking about regular expressions, but I don't really know how to use them here, because I would need a regular expression for every element of the string. Here is some code which will probably clarify what I want to do:
var A= "abbab";
var B= "ba";
var tb=[];
console.log(A.indexOf(B));
for (var i=0;i<B.length; i++){
var D=B.replace(B[i],"[a-z]");
tb.push(A.indexOf(D));
}
console.log(tb);
I know that substring B and string A consist only of lowercase letters. It would be nice to get any advice on how to do this using regular expressions. Thanks.
Sample Input:
A B
1) abbab ba
2) hello world
3) banana nan
Expected Output:
1) 1 2
2) No Match!
3) 0 2
While it is probably theoretically possible, I think it would be very complicated to try this kind of search while attempting to incorporate all possible search query options in one long, complex regular expression. I think a better approach is to use JavaScript to dynamically create various simpler options and then search with each separately.
The following code sequentially replaces each character in the initial query string with a regular expression wild card (i.e. a period, '.') and then searches the target string with that. For example, if the initial query string is 'nan', it will search with '.an', 'n.n' and 'na.'. It will only add the position of the hit to the list of hits if that position has not already been hit on a previous search. i.e. It ensures that the list of hits contains only unique values, even if multiple query variations found a hit at the same location. (This could be implemented even better with ES6 sets, but I couldn't get the Stack Overflow code snippet tool to cooperate with me while trying to use a set, even with the Babel option checked.) Finally, it sorts the hits in ascending order.
Update: The search algorithm has been updated/corrected. Originally, some hits were missed because the exec search for any query variation would only iterate as per the JavaScript default, i.e. after finding a match, it would start the next search at the next character after the end of the previous match, e.g. it would find 'aa' in 'aaaa' at positions 0 and 2. Now it starts the next search at the next character after the start of the previous match, e.g. it now finds 'aa' in 'aaaa' at positions 0, 1 and 2.
const findAllowingOneMismatch = (target, query) => {
const numLetters = query.length;
const queryVariations = [];
for (let variationNum = 0; variationNum < numLetters; variationNum += 1) {
queryVariations.push(query.slice(0, variationNum) + "." + query.slice(variationNum + 1));
};
let hits = [];
queryVariations.forEach(queryVariation => {
const re = new RegExp(queryVariation, "g");
let searchResult; // declared here so the loop below does not create an implicit global
while ((searchResult = re.exec(target)) !== null) {
re.lastIndex = searchResult.index + 1;
const hit = searchResult.index;
// console.log('found a hit with ' + queryVariation + ' at position ' + hit);
if (hits.indexOf(hit) === -1) {
hits.push(searchResult.index);
}
}
});
hits = hits.sort((a,b)=>(a-b));
console.log('Found "' + query + '" in "' + target + '" at positions:', JSON.stringify(hits));
};
[
['abbab', 'ba'],
['hello', 'world'],
['banana', 'nan'],
['abcde abcxe abxxe xbcde', 'abcd'],
['--xx-xxx--x----x-x-xxx--x--x-x-xx-', '----']
].forEach(pair => {findAllowingOneMismatch(pair[0], pair[1])});
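For comparison, the same result can be obtained without regular expressions by sliding the query over the target and counting mismatches at each position. This is only a sketch of that alternative (the function name `findWithAtMostOneMismatch` is hypothetical); it returns the positions instead of logging them.

```javascript
// Return every start index where `query` matches `target`
// with at most one differing character.
const findWithAtMostOneMismatch = (target, query) => {
  const hits = [];
  for (let start = 0; start + query.length <= target.length; start += 1) {
    let mismatches = 0;
    // Stop comparing this window as soon as a second mismatch is seen.
    for (let k = 0; k < query.length && mismatches <= 1; k += 1) {
      if (target[start + k] !== query[k]) mismatches += 1;
    }
    if (mismatches <= 1) hits.push(start);
  }
  return hits;
};

console.log(findWithAtMostOneMismatch('abbab', 'ba'));    // [1, 2]
console.log(findWithAtMostOneMismatch('hello', 'world')); // []
console.log(findWithAtMostOneMismatch('banana', 'nan'));  // [0, 2]
```

Because the query is compared character by character, it does not need to be escaped the way it would inside a RegExp.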

Javascript NETMASK and CIDR conversion

I was expecting to find hundreds of examples of functions to convert to and from CIDR and NETMASK for javascript, but was unable to find any.
I need to convert to and from CIDR and NETMASKS on a nodejs page which sets and retrieves the IP address for a machine using NETCTL.
Are there any easy solutions for doing this in JavaScript / Node.js?
This code could provide a solution:
var mask = "255.255.248.0";
var maskNodes = mask.match(/(\d+)/g);
var cidr = 0;
for(var i in maskNodes)
{
cidr += (((maskNodes[i] >>> 0).toString(2)).match(/1/g) || []).length;
}
console.log(cidr); // 21
Here's one that doesn't check if the netmask is valid:
const netmaskToCidr = n => n
.split('.')
.reduce((c, o) => c - Math.log2(256 - +o), 32)
NETMASK BINARY CIDR
255.255.248.0 11111111.11111111.11111000.00000000 /21
255.255.0.0 11111111.11111111.00000000.00000000 /16
255.192.0.0 11111111.11000000.00000000.00000000 /10
This is how the CIDR prefix is calculated: it is the number of 1 bits in the second column. With that in mind, I wrote a readable algorithm:
const masks = ['255.255.255.224', '255.255.192.0', '255.0.0.0'];
/**
* Count char in string
*/
const countCharOccurences = (string , char) => string.split(char).length - 1;
const decimalToBinary = (dec) => (dec >>> 0).toString(2);
const getNetMaskParts = (nmask) => nmask.split('.').map(Number);
const netmask2CIDR = (netmask) =>
countCharOccurences(
getNetMaskParts(netmask)
.map(part => decimalToBinary(part))
.join(''),
'1'
);
masks.forEach((mask) => {
console.log(`Netmask =${mask}, CIDR = ${netmask2CIDR(mask)}`)
})
I know it's been long since this question was asked, but I just wanted to add checks to ensure that the netmask is valid:
function mask2cidr(mask){
var cidr = ''
for (const m of mask.split('.')) {
if (parseInt(m)>255) {throw 'ERROR: Invalid Netmask'} // Check each group is 0-255
if (parseInt(m)>0 && parseInt(m)<128) {throw 'ERROR: Invalid Netmask'}
cidr+=(m >>> 0).toString(2)
}
// Condition to check for validity of the netmask
if (cidr.includes('0') && cidr.substring(cidr.search('0'),32).search('1') !== -1) { // skip the check for an all-ones mask (255.255.255.255)
throw 'ERROR: Invalid Netmask ' + mask
}
return cidr.split('1').length-1
}
As the mask is only valid when the 1 bits run contiguously from the left, the condition checks that no bit is 1 after the first 0 bit. It also checks that each group is 0 or 128-255.
The method of conversion is mostly the same as the other answers
Given that you have mentioned using node.js to implement this, I'm assuming you're looking for a way to run this server side in javascript, as opposed to client side. If that's correct, does the netmask npm module cover what you need to do?
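The answers above cover netmask to CIDR only; the reverse direction asked about in the question can be sketched with 32-bit shifts. The function name `cidrToNetmask` is hypothetical, and the prefix is assumed to be passed as an integer between 0 and 32.

```javascript
// Convert a CIDR prefix length (0-32) to a dotted-decimal netmask.
const cidrToNetmask = (prefix) => {
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new RangeError('CIDR prefix must be an integer from 0 to 32');
  }
  // JavaScript shift counts wrap modulo 32, so a prefix of 0 needs a special case.
  const bits = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  // Extract the four octets from the most significant byte down.
  return [24, 16, 8, 0].map((shift) => (bits >>> shift) & 0xff).join('.');
};

console.log(cidrToNetmask(21)); // 255.255.248.0
console.log(cidrToNetmask(16)); // 255.255.0.0
console.log(cidrToNetmask(10)); // 255.192.0.0
```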
