Ignoring characters after backslash "\" - javascript

I have an application where I need to split a string on ",", but I want to ignore "\," (written "\\," in a string literal).
Is there any one line solution?
Sample input: "This is first\\, this is still first, this is second"
Sample output: ["This is first, this is still first", " this is second"]

If you can use a negative look-behind regexp, this could work:
const input = "This is first \\, this is still first, this is second";
// Split on all commas that aren't preceded by a backslash.
const result = input.split( /(?<!\\),/ );
console.log( result );
Just .map() with .trim() if you do not want the leading spaces.
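Note that the look-behind split keeps the backslash in the first element. A minimal sketch of a follow-up step that unescapes and trims each piece is shown here; the helper name splitUnescaped is made up for illustration, and it assumes a runtime that supports look-behind assertions:

```javascript
// Hypothetical helper: split on unescaped commas, then tidy each piece.
function splitUnescaped(input) {
  return input
    .split(/(?<!\\),/)                       // commas not preceded by a backslash
    .map(part => part.replace(/\\,/g, ","))  // turn "\," back into ","
    .map(part => part.trim());               // drop leading/trailing spaces
}

const parts = splitUnescaped("This is first\\, this is still first, this is second");
console.log(parts); // ["This is first, this is still first", "this is second"]
```

Drop the last .map() if the leading spaces should be preserved, as in the sample output from the question.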

If there's a character sequence that can never appear in the original string, you can replace all the \, with it, split the string, then undo the replacement.
let input = "This is first \\, this is still first, this is second";
let output = input.replace(/\\,/g, '-comma-').split(',').map(s => s.replace(/-comma-/g, ','));
console.log(output);
This isn't a perfect solution if the input is user-generated, since a user could type -comma-, but you can make the replacement string arbitrarily complex so that a collision is improbable.
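If the placeholder collision is a concern, one alternative is to skip the placeholder and scan the string once. This is only a sketch under the assumption that "\," is the sole escape sequence that matters; the function name splitEscaped is hypothetical:

```javascript
// Hypothetical helper: walk the string once, treating "\," as a literal comma.
function splitEscaped(input) {
  const parts = [];
  let current = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "\\" && input[i + 1] === ",") {
      current += ",";      // escaped comma: keep the comma, drop the backslash
      i++;                 // skip the comma that was just consumed
    } else if (ch === ",") {
      parts.push(current); // unescaped comma ends the current piece
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

const scanned = splitEscaped("This is first \\, this is still first, this is second, -comma-");
console.log(scanned);
// ["This is first , this is still first", " this is second", " -comma-"]
```

Because nothing is substituted, text such as -comma- in the input passes through untouched.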

You can try this
.*?[^\\]+?(?:,|$)
let splitByComma = (str) =>
str.match(/.*?[^\\]+?(?:,|$)/g).map(v => v.replace(/(\\,)|,$/g, (m, g1) => g1 ? ',' : ''))
console.log(splitByComma('\\, some, more some\\123'))
console.log(splitByComma("This is first \\, this is still first, this is second"))
console.log(splitByComma("hello\\,123, some text"))
console.log(splitByComma("\\,\\,\\,123, 123-123\\,1232,"))

Related

Remove last occurence of invisible unicode character using regex?

I have the string my string 󠀀, which has an invisible character \u{E0000} at the end. How can I use a regex to remove this character so that splitting the string with .split(' ') gives a length of 2 instead of the 3 it shows right now?
This is the regex I am currently using to remove the character; however, when I split the string it still shows a length of 3 and not 2. The result I want from the split is ['my', 'string'].
.replace(/[\u034f\u2800(\u{E0000})\u180e\ufeff\u2000-\u200d\u206D]/gu, '');
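The invisible character is U+E0000, a single code point that JavaScript stores as two UTF-16 code units (the surrogate pair \uDB40\uDC00), which is why it adds 2 to .length. With the u flag, \u{E0000} in the character class matches it; the snippets below also list \u{dc00}, the trailing half of that pair, which is harmless but not required.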
However, you also seem to be misunderstanding the way split works. If you have a space at the end of the string, it will still try to split it into a separate element. See below example where there is no special character following:
// Removing the special character leaves "my string " with length 10 (it is 12 before the replace)
console.log(
"my string 󠀀".length,
"my string 󠀀".replace(/[\u034f\u2800(\u{e0000}\u{dc00})\u180e\ufeff\u2000-\u200d\u206D]/gu, '')
.length
);
console.log(
// use trim to remove trailing space so that it behaves the way you want
"my string 󠀀".replace(/[\u034f\u2800(\u{e0000}\u{dc00})\u180e\ufeff\u2000-\u200d\u206D]/gu, '')
.trim().split(' ')
);
// notice that it still tries to split the final into a 3rd element.
console.log( //\u0020 is the hex code for space
("my string" + "\u0020").split(' ')
);
Note that you may need to adjust your Regex. I haven't checked, but it is highly likely that the unicode characters you are using are not correct, and do not take into account multi-codepoint characters.
I've created a function below for extracting full escape sequences.
var codePoints = (char, pos, end) => Array(char.length).fill(0).map((_,i)=>char.codePointAt(i)).slice(pos||0, end)
//some code point values stop iterator; use length instead
var escapeSequence = (codes, pos, end) => codePoints(codes, pos,end).map(p=>`\\u{${p.toString(16)}}`).join('')
document.getElementById('btn').onclick=()=>{
const text = document.getElementById('text').value
const start = +document.getElementById('start').value
const end = document.getElementById('end').value||undefined
document.getElementById('result').innerHTML = escapeSequence(text,start,end)
}
console.log(
escapeSequence('1️⃣')
)
console.log(
escapeSequence("󠀀"),
)
console.log(
escapeSequence("my string 󠀀",10)
)
<label for="text">unicode text: </label><input type="text" id="text"><br>
<label for="start">start position to retrieve from: </label><input type="number" id="start"><br>
<label for="end">end position to retrieve from: </label><input type="number" id="end"><br>
<button id="btn">get unicode escaped code points</button><br>
<div id="result"></div>
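As a smaller check of the length and split behaviour discussed in this answer, here is a minimal sketch that writes the invisible character as the escape \u{E0000} so it is visible in the source. The variable names are illustrative only:

```javascript
// "my string " followed by the invisible character U+E0000.
const text = "my string \u{E0000}";

console.log(text.length);       // 12: counts UTF-16 code units
console.log([...text].length);  // 11: string iteration counts code points

// With the u flag, \u{E0000} matches the whole character.
const cleaned = text.replace(/\u{E0000}/gu, "");
console.log(cleaned.length);    // 10: the trailing space is still there

// Without trim(), the trailing space produces an empty third element.
const untrimmed = cleaned.split(" ");
const words = cleaned.trim().split(" ");
console.log(untrimmed.length, words); // 3 ["my", "string"]
```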

How to check if a string contains a WORD in javascript? [duplicate]

This question already has answers here:
How to check if a string contain specific words?
(11 answers)
Closed 3 years ago.
So, you can easily check if a string contains a particular substring using the .includes() method.
I'm interested in finding if a string contains a word.
For example, if I apply a search for "on" for the string, "phones are good", it should return false. And, it should return true for "keep it on the table".
You first need to convert it into an array using split() and then use includes()
string.split(" ").includes("on")
Just need to pass whitespace " " to split() to get all words
This is a job for a regex (regular expression).
You can use the regex101 website to experiment with them (it helps). The function below also handles words with custom separators.
function checkWord(word, str) {
const allowedSeparator = '\\\s,;"\'|';
const regex = new RegExp(
`(^.*[${allowedSeparator}]${word}$)|(^${word}[${allowedSeparator}].*)|(^${word}$)|(^.*[${allowedSeparator}]${word}[${allowedSeparator}].*$)`,
// Case insensitive
'i',
);
return regex.test(str);
}
[
'phones are good',
'keep it on the table',
'on',
'keep iton the table',
'keep it on',
'on the table',
'the,table,is,on,the,desk',
'the,table,is,on|the,desk',
'the,table,is|the,desk',
].forEach((x) => {
console.log(`Check: ${x} : ${checkWord('on', x)}`);
});
Explanation:
I am creating multiple capturing groups here, one for each possibility:
(^.*\son$) on is the last word
(^on\s.*) on is the first word
(^on$) on is the only word
(^.*\son\s.*$) on is an in-between word
\s means a space or a new line
const regex = /(^.*\son$)|(^on\s.*)|(^on$)|(^.*\son\s.*$)/i;
console.log(regex.test('phones are good'));
console.log(regex.test('keep it on the table'));
console.log(regex.test('on'));
console.log(regex.test('keep iton the table'));
console.log(regex.test('keep it on'));
console.log(regex.test('on the table'));
You can .split() your string by spaces (\s+) into an array, and then use .includes() to check if the array of strings has your word within it:
const hasWord = (str, word) =>
str.split(/\s+/).includes(word);
console.log(hasWord("phones are good", "on"));
console.log(hasWord("keep it on the table", "on"));
If you are worried about punctuation, you can remove it first using .replace() (as shown in this answer) and then split():
const hasWord = (str, word) =>
str.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g,"").split(/\s+/).includes(word);
console.log(hasWord("phones are good son!", "on"));
console.log(hasWord("keep it on, the table", "on"));
You can split and then try to find:
const str = 'keep it on the table';
const res = str.split(/[\s,\?\,\.!]+/).some(f=> f === 'on');
console.log(res);
In addition, the .some() method is efficient because it stops and returns true as soon as the predicate is true for one element.
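The early exit can be observed with a counter. This is just an illustrative sketch; the names calls, tokens and found are not from the answer:

```javascript
// Count how many times the predicate runs before .some() stops.
let calls = 0;
const tokens = "keep it on the table".split(/\s+/);
const found = tokens.some(token => {
  calls++;
  return token === "on";
});

console.log(found, calls); // true 3 ("the" and "table" are never visited)
```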
You can use .includes() and check for the word. To make sure it is a word and not part of another word, verify that the place you found it in is followed by a space, comma, period, etc and also has one of those before it.
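A minimal sketch of that idea follows. It uses indexOf rather than includes so that the position of each hit is known. The helper name includesWord and the exact separator list are assumptions made for illustration:

```javascript
// Assumed separator set; extend it to match the input you expect.
const SEPARATORS = " \t\n,.;:!?\"'()";

// Hypothetical helper: true if `word` occurs with a separator
// (or the string boundary) on both sides.
function includesWord(str, word) {
  if (word === "") return false;
  let from = 0;
  while (true) {
    const at = str.indexOf(word, from);
    if (at === -1) return false;
    const end = at + word.length;
    const okBefore = at === 0 || SEPARATORS.includes(str[at - 1]);
    const okAfter = end === str.length || SEPARATORS.includes(str[end]);
    if (okBefore && okAfter) return true;
    from = at + 1; // keep looking for a later occurrence
  }
}

console.log(includesWord("phones are good", "on"));      // false
console.log(includesWord("keep it on the table", "on")); // true
console.log(includesWord("turn it on.", "on"));          // true
```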
A simple version could just be splitting on the whitespace and looking through the resulting array for the word:
"phones are good".split(" ").find(word => word === "on") // undefined
"keep it on the table".split(" ").find(word => word === "on") // "on"
This just splits by whitespace though. When you need to parse text (depending on your input) you'll encounter more word delimiters than whitespace. In that case you could use a regex to account for these characters.
Something like:
"Phones are good, aren't they? They are. Yes!".split(/[\s,\?\,\.!]+/)
I would go with the following assumptions:
Words at the start of a sentence always have a trailing space.
Words at the end of a sentence always have a preceding space.
Words in the middle of a sentence always have a trailing and preceding space.
Therefore, I would write my code as follows:
function containsWord(word, sentence) {
return (
sentence.startsWith(word.trim() + " ") ||
sentence.endsWith(" " + word.trim()) ||
sentence.includes(" " + word.trim() + " "));
}
console.log(containsWord("test", "This is a test of the containsWord function."));
Try the following -
var mainString = 'codehandbook'
var substr = /hand/
var found = substr.test(mainString)
if(found){
console.log('Substring found !!')
} else {
console.log('Substring not found !!')
}

How to convert a string of camelCase identifiers to a string with space-separated words, while replacing the separator?

I have studied the answers to "how to use regular expressions to insert space into a camel case string" and several related questions, and the code below will produce the string
Normal Word Double Word A Triple Word UPPER Case Word
Unfortunately, it's necessary to have a separator where {TOKEN} appears in the input. Ideally, the result would have comma separators
Normal Word, Double Word, A Triple Word, UPPER Case Word
Is there a way to do that with a single regex? (It would be okay for the regex replacement to result in a string with a leading comma.)
Here's the code that I have so far:
const regex = /({TOKEN})|([A-Z])(?=[A-Z][a-z])|([a-z])(?=[A-Z])/g;
const str = '{TOKEN}NormalWord{TOKEN}DoubleWord{TOKEN}ATripleWord{TOKEN}UPPERCaseWord';
const subst = '$2$3 ';
const result = str.replace(regex, subst);
It does not look pretty, but you may use it like
const regex = /(^(?:{TOKEN})+|(?:{TOKEN})+$)|{TOKEN}|([A-Z])(?=[A-Z][a-z])|([a-z])(?=[A-Z])/g;
const str = '{TOKEN}NormalWord{TOKEN}DoubleWord{TOKEN}ATripleWord{TOKEN}UPPERCaseWord';
const result = str.replace(regex, (g0, g1, g2, g3) =>
g1 ? "" : g2 ? `${g2} ` : g3 ? `${g3} ` : ", "
);
console.log(result); // => Normal Word, Double Word, A Triple Word, UPPER Case Word
The (^(?:{TOKEN})+|(?:{TOKEN})+$) alternative will capture {TOKEN}s at the start and end of the string, and will remove them completely (see g1 ? "" in the replacement callback method). {TOKEN} will signal a normal token that must be replaced with a comma and space. The rest is the same as in the original regex.
Note that in the callback, g0 stands for Group 0 (the whole match), g1 for Group 1, etc.
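If a single regex is not a hard requirement, a two-step variant may be easier to read. The sketch below splits on the literal {TOKEN} first and then applies only the word-spacing part of the original regex. The variable names are illustrative, and this is an alternative to the answer above rather than a restatement of it:

```javascript
const source = "{TOKEN}NormalWord{TOKEN}DoubleWord{TOKEN}ATripleWord{TOKEN}UPPERCaseWord";

// Same two alternatives as the question's regex, minus the {TOKEN} branch.
const spacing = /([A-Z])(?=[A-Z][a-z])|([a-z])(?=[A-Z])/g;

const joined = source
  .split("{TOKEN}")                         // one entry per identifier
  .filter(Boolean)                          // drop the empty entry before the first token
  .map(id => id.replace(spacing, "$1$2 ")) // the unmatched group expands to ""
  .join(", ");

console.log(joined); // Normal Word, Double Word, A Triple Word, UPPER Case Word
```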

Split string by all spaces except those in parentheses

I'm trying to split text like the following on spaces:
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}"
but I want it to ignore the spaces within parentheses. This should produce an array with:
var words = ["Text", "(what is)|what's", "a", "story|fable", "called|named|about", "{Search}|{Title}"];
I know this should involve some sort of regex with line.match(). Bonus points if the regex removes the parentheses. I know that word.replace() would get rid of them in a subsequent step.
Use the following approach with a specific regex pattern (based on negative lookahead assertions):
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}",
words = line.split(/(?!\(.*)\s(?![^(]*?\))/g);
console.log(words);
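(?!\(.*) is a negative lookahead evaluated where the separator \s starts. Since the next character there has to be whitespace, it can never be "(", so this part never rejects a match and could be dropped.
(?![^(]*?\)) ensures that the separator \s is not followed by a closing ")" with no "(" in between, i.e. that it does not sit inside parentheses.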
Not a single regexp but does the job. Removes the parentheses and splits the text by spaces.
var words = line.replace(/[\(\)]/g,'').split(" ");
One approach which is useful in some cases is to replace spaces inside parens with a placeholder, then split, then unreplace:
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}";
var result = line.replace(/\((.*?)\)/g, m => m.replace(/ /g, 'SPACE'))
.split(' ')
.map(x => x.replace(/SPACE/g, ' '));
console.log(result);
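For comparison with the regex approaches above, here is a minimal sketch of a plain scanner that tracks parenthesis depth and only splits on spaces at depth 0. The function name splitOutsideParens and the stripParens flag are hypothetical. The flag covers the "bonus" of removing the parentheses in the same pass:

```javascript
// Hypothetical helper: split on spaces that are not inside parentheses.
function splitOutsideParens(text, stripParens = false) {
  const words = [];
  let current = "";
  let depth = 0;
  for (const ch of text) {
    if (ch === "(") depth++;
    if (ch === ")" && depth > 0) depth--;
    if (ch === " " && depth === 0) {
      if (current !== "") words.push(current); // ignore runs of spaces
      current = "";
    } else if (!(stripParens && (ch === "(" || ch === ")"))) {
      current += ch;
    }
  }
  if (current !== "") words.push(current);
  return words;
}

const sampleLine = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}";
console.log(splitOutsideParens(sampleLine));
// ["Text", "(what is)|what's", "a", "story|fable", "called|named|about", "{Search}|{Title}"]
console.log(splitOutsideParens(sampleLine, true)[1]); // "what is|what's"
```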

Separating words with Regex

I am trying to get this result: 'Summer-is-here'. Why does the code below generate extra spaces? (Current result: '-Summer--Is- -Here-').
function spinalCase(str) {
var newA = str.split(/([A-Z][a-z]*)/).join("-");
return newA;
}
spinalCase("SummerIs Here");
You are using a variety of split where the regexp contains a capturing group (inside parentheses), which has a specific meaning, namely to include all the splitting strings in the result. So your result becomes:
["", "Summer", "", "Is", " ", "Here", ""]
Joining that with - gives you the result you see. But you can't just remove the unnecessary capture group from the regexp, because then the split would give you
["", "", " ", ""]
because the words themselves are now the separators and are discarded, leaving only what lies between them. So this doesn't really work.
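The two arrays shown above can be reproduced directly. This is a small sketch with illustrative variable names:

```javascript
const sample = "SummerIs Here";

// Capturing group: the matched separators are included in the result.
const withGroup = sample.split(/([A-Z][a-z]*)/);
console.log(withGroup); // ["", "Summer", "", "Is", " ", "Here", ""]

// No capturing group: only the text between the matches is kept.
const withoutGroup = sample.split(/[A-Z][a-z]*/);
console.log(withoutGroup); // ["", "", " ", ""]

console.log(withGroup.join("-")); // -Summer--Is- -Here-
```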
If you want to use split, try splitting on zero-width or space-only matches looking ahead to an uppercase letter:
> "SummerIs Here".split(/\s*(?=[A-Z])/)
^^^^^^^^^ LOOK-AHEAD
< ["Summer", "Is", "Here"]
Now you can join that to get the result you want, but without the lowercase mapping, which you could do with:
"SummerIs Here" .
split(/\s*(?=[A-Z])/) .
map(function(elt, i) { return i ? elt.toLowerCase() : elt; }) .
join('-')
which gives you what you want.
Using replace as suggested in another answer is also a perfectly viable solution. In terms of best practices, consider the following code from Ember:
var DECAMELIZE_REGEXP = /([a-z\d])([A-Z])/g;
var DASHERIZE_REGEXP = /[ _]/g;
function decamelize(str) {
return str.replace(DECAMELIZE_REGEXP, '$1_$2').toLowerCase();
}
function dasherize(str) {
return decamelize(str).replace(DASHERIZE_REGEXP, '-');
}
First, decamelize puts an underscore _ in between two-character sequences of lower-case letter (or digit) and upper-case letter. Then, dasherize replaces the underscore with a dash. This works perfectly except that it lower-cases the first word in the string. You can sort of combine decamelize and dasherize here with
var SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;
function spinalCase(str) {
return str.replace(SPINALIZE_REGEXP, '$1-$2').toLowerCase();
}
You want to separate capitalized words, but you are splitting the string on the capitalized words themselves. That's why you get those empty strings and spaces.
I think you are looking for this:
var newA = str.match(/[A-Z][a-z]*/g).join("-");
([A-Z][a-z]*) *(?!$|[a-z])
You can simply replace each match with $1-. See the demo:
https://regex101.com/r/nL7aZ2/1
var re = /([A-Z][a-z]*) *(?!$|[a-z])/g;
var str = 'SummerIs Here';
var subst = '$1-';
var result = str.replace(re, subst);
var newA = str.split(/ |(?=[A-Z])/).join("-");
You can change the regex like:
/ |(?=[A-Z])/ or /\s*(?=[A-Z])/
Result:
Summer-Is-Here
