I would like to split a string like this:
"'Hi, how are you?' he said."
into this array:
["'", "Hi", ",", " ", "how", " ", "are", " ", "you", "?", "'", " ", "he", " ", "said", "."]
in my JS script. I tried some regexes, but I'm not very good at writing them. Can anyone help me?
This is what I'd probably use:
"'Hi, how are you?' he said.".match(/\w+|./g);
It performs a global match for runs of word characters (\w+) and for any other single character (.) in the given string. Note that . does not match line terminators such as \n.
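A minimal sketch wrapping the same pattern in a function. It adds the s (dotAll) flag, which is not in the answer above and is available since ES2018, so that . also matches newlines and they come back as tokens. The name tokenize is only illustrative.

```javascript
// Split a string into runs of word characters and single other characters.
// The "s" (dotAll) flag lets "." also match "\n", so newlines are kept as tokens.
function tokenize(text) {
  // match() returns null when nothing matches (e.g. an empty string), so fall back to [].
  return text.match(/\w+|./gs) || [];
}

console.log(tokenize("'Hi, how are you?' he said."));
// ["'", "Hi", ",", " ", "how", " ", "are", " ", "you", "?", "'", " ", "he", " ", "said", "."]
console.log(tokenize("a\nb")); // ["a", "\n", "b"]
```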
"'Hi, how are you?' he said.".match(/\w+|\W/g)
//output
["'", "Hi", ",", " ", "how", " ", "are", " ", "you", "?", "'", " ", "he", " ", "said", "."]
Explanation
\w+ - matches a run of one or more word characters (letters, digits, underscore)
\W - matches a single non-word character
| - alternation: match either of the two patterns above
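One caveat with \w and \W: without the u flag, \w only covers ASCII letters, digits and the underscore, so an accented letter is split off as a non-word character. The sketch below, assuming an ES2018+ engine, swaps in the Unicode property escapes \p{L} (letters) and \p{N} (numbers); which characters should count as part of a word is an assumption you may want to adjust.

```javascript
const phrase = "caf\u00e9 au lait"; // "café au lait", with "é" as a single code point

// ASCII-only \w: "é" is treated as a non-word character.
const ascii = phrase.match(/\w+|\W/g);
console.log(ascii); // ["caf", "é", " ", "au", " ", "lait"]

// Unicode-aware variant (needs the "u" flag): letters, numbers and "_" form words.
const unicode = phrase.match(/[\p{L}\p{N}_]+|[^\p{L}\p{N}_]/gu);
console.log(unicode); // ["café", " ", "au", " ", "lait"]
```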
I've used the formula below to remove all whitespace before and after each double quote ("), but it doesn't work.
var test_string = '{" 1 ": "b.) some pasta salad", " 10": " a.) Yes, some bread", "11 ": " a.) eggs and toast " }'
var test_string_format = test_string.replace(/^[ '"]+|[ '"]+$|( ){2,}/g,'$1')
console.log(test_string_format)
How can I use regex to get the desired output?
{"1":"b.) some pasta salad","10":"a.) Yes, some bread","11":"a.) eggs and toast" }
const regex = /(\s*["]\s*)/gm;
const test_string = '{" 1 ": "b.) some pasta salad", " 10": " a.) Yes, some bread", "11 ": " a.) eggs and toast " }';
console.log(test_string.replace(regex, '"'))
Assuming that the string content is always valid JSON, you can do it like this:
var str = '{" 1 ": "b.) some pasta salad", " 10": " a.) Yes, some bread", "11 ": " a.) eggs and toast " }'
// Parse, trim every key and value, rebuild the object, then serialize it again.
const res = JSON.stringify(Object.fromEntries(
  Object.entries(JSON.parse(str)).map(entry => entry.map(part => part.trim()))))
console.log(res)
I know, this is kind of a "goofy" solution, but it works for the given sample string.
The snippet parses the JSON string, then converts the resulting object into an array of [key, value] pairs. Each key and value is then trimmed with .trim(), and finally the object is rebuilt with Object.fromEntries() and turned back into a JSON string with JSON.stringify().
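Since the one-liner calls .trim() on every value, it throws a TypeError as soon as a value is a number, boolean, null or nested object. A minimal sketch that trims the keys and only the string values, assuming a flat JSON object like the one in the question (trimJsonObject and the extra " n " key are hypothetical, for illustration only):

```javascript
// Parse a flat JSON object, trim its keys and any string values, and re-serialize it.
function trimJsonObject(jsonText) {
  const entries = Object.entries(JSON.parse(jsonText)).map(([key, value]) => [
    key.trim(),
    // Only strings have .trim(); numbers, booleans, null, etc. are left untouched.
    typeof value === "string" ? value.trim() : value,
  ]);
  return JSON.stringify(Object.fromEntries(entries));
}

const input = '{" 1 ": "b.) some pasta salad", " 10": " a.) Yes, some bread", "11 ": " a.) eggs and toast ", " n ": 3 }';
console.log(trimJsonObject(input));
// {"1":"b.) some pasta salad","10":"a.) Yes, some bread","11":"a.) eggs and toast","n":3}
```

Working on the parsed object rather than the raw text also avoids touching whitespace next to escaped quotes inside the values.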
Replace zero or more spaces, followed by a quote, followed by zero or more spaces, with just the quote:
replaceAll(/ *" */g, '"')
const test_string = '{" 1 ": "b.) some pasta salad", " 10": " a.) Yes, some bread", "11 ": " a.) eggs and toast " }'
const trimmed = test_string.replaceAll(/ *" */g, '"');
console.log(trimmed);
const test_string = '{" 1 ": "b.) some pasta salad", " 10": " a.) Yes, some bread", "11 ": " a.) eggs and toast " }';
const test_string_format = test_string.replace(/\s*"\s*/g, '"');
console.log(test_string_format);
\s: matches any whitespace character
*: matches zero or more occurrences of the preceding token (\s)
": matches the double quote character
// Drop spaces after a quote ( *" +) or before a quote ( +"), keeping the quote itself.
console.log('{" 1 ": "b.) some pasta salad", " 10": " a.) Yes, some bread", "11 ": " a.) eggs and toast " }'
.replaceAll(/ *" +| +"/g,'"'))
I'm trying to split a string in infix notation into a tokenized list, ideally with regex.
e.g. ((10 + 4) ^ 2) * 5 would return ['(', '(', '10', '+', '4', ')', '^', '2', ')', '*', '5']
At the moment I'm just splitting it up by character, but this doesn't allow me to represent numbers with more than one digit.
I tried tokens = infixString.split("(\d+|[^ 0-9])"); which I found online for this very same problem (I think it was for Java), but it simply returns a list with only one element: the entire infixString.
I know next to nothing about regex, so any tips would be appreciated. Thanks!
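It's because you're passing a string to split: a string separator is matched literally, not as a regular expression, and since the text (\d+|[^ 0-9]) never occurs in your input, you get the whole string back as a single element. If you use a regex literal, the output is closer to what you'd expect: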
infixString.split(/(\d+|[^ 0-9])/)
// Array(23) [ "", "(", "", "(", "", "10", " ", "+", " ", "4", … ]
However, there are a number of empty strings and whitespace elements that you might want to filter out:
infixString.split(/(\d+|[^ 0-9])/).filter(e => e.trim().length > 0)
// Array(11) [ "(", "(", "10", "+", "4", ")", "^", "2", ")", "*", … ]
Depending on the version of JavaScript/ECMAScript you're targeting here, the syntax in the filter (or the filter function itself) might need to be adapted.
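As an alternative to split plus filter, match with the g flag returns only the matched tokens, so there is nothing to filter out. A sketch under the same assumptions as the question (non-negative integers, single-character operators and parentheses); tokenizeInfix is a hypothetical name:

```javascript
// Numbers are runs of digits; every other non-whitespace character is its own token.
function tokenizeInfix(expr) {
  // match() returns null for an input with no tokens, so fall back to [].
  return expr.match(/\d+|[^\s\d]/g) || [];
}

console.log(tokenizeInfix("((10 + 4) ^ 2) * 5"));
// ["(", "(", "10", "+", "4", ")", "^", "2", ")", "*", "5"]
```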
let test = "((10 + 4) ^ 2) * 5 * -1.5";
let arr = test.replace(/\s+/g, "").match(/(?:(?<!\d)-)?\d+(?:\.\d+)?|./g);
console.log(arr);
(?:(?<!\d)-)?\d+(?:\.\d+)?
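(?:(?<!\d)-)? - optional non-capturing group with a negative lookbehind: it takes a minus sign as part of the number only if the minus is not preceded by a digit (i.e. it is not a subtraction)
\d+ - one or more digits
(?:\.\d+)? - optional non-capturing group: a dot followed by one or more digits (the decimal part)
|. - otherwise, any single character (an operator or a parenthesis)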
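Because the lookbehind only checks for a digit, a minus sign right after a closing parenthesis is still read as a negative number, so (1) - 2 comes out as ["(", "1", ")", "-2"]. One possible tweak, assuming a - directly after ) should always be subtraction, is to add ) to the lookbehind's character class:

```javascript
const original = /(?:(?<!\d)-)?\d+(?:\.\d+)?|./g;
const tweaked = /(?:(?<![\d)])-)?\d+(?:\.\d+)?|./g;

// Remove whitespace first, as in the answer, then tokenize.
const tokens = (expr, re) => expr.replace(/\s+/g, "").match(re);

console.log(tokens("(1) - 2", original)); // ["(", "1", ")", "-2"]  (subtraction misread)
console.log(tokens("(1) - 2", tweaked));  // ["(", "1", ")", "-", "2"]
console.log(tokens("((10 + 4) ^ 2) * 5 * -1.5", tweaked));
// ["(", "(", "10", "+", "4", ")", "^", "2", ")", "*", "5", "*", "-1.5"]
```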
Take the following text:
This is a sentence. This is a sentence... This is a sentence! This is a sentence? This is a sentence.This is a sentence. This is a sentence
I'd like to match this so I have an array like the following:
[
"This is a sentence.",
" ",
"This is a sentence...",
" ",
"This is a sentence!",
" ",
"This is a sentence?",
" ",
"This is a sentence.",
"",
"This is a sentence.",
" ",
"This is a sentence",
]
With my current regex, however:
str.match(/[^.!?]+[.!?]*(\s*)/g);
I get the following:
[
"This is a sentence. ",
"This is a sentence... ",
"This is a sentence! ",
"This is a sentence? ",
"This is a sentence.",
"This is a sentence. ",
"This is a sentence"
]
How can I achieve this with a JS RegExp?
Thanks in advance!
Just add [^\s] at the beginning and change (\s*) to |\s+.
The final regex will be like:
str.match(/[^\s][^.!?]+[.!?]*|\s+/g)
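[^\s] makes each sentence match start with a non-whitespace character, so whitespace is no longer absorbed into a sentence
|\s+ matches each run of whitespace as a separate element (a match can never be empty, so there is no "" element between "This is a sentence." and the sentence glued to it)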
Here is a solution that uses the regex from your question and then post-processes the array to keep the whitespace as separate elements: each match is split at the whitespace at its end (a positive lookahead for $), empty strings are filtered out, and the result is flattened again. This gives the output you want, except that there is no "" element between the two sentences that have no space between them.
const baseStr = "This is a sentence. This is a sentence... This is a sentence! This is a sentence? This is a sentence.This is a sentence. This is a sentence";
var result = baseStr.match(/[^.!?]+[.!?]*(\s*)/g).map( str => str.split(/(\s*)(?=$)/).filter(_=>_)).flat();
console.log(result);
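Neither answer produces the empty string the question expects between the two sentences that have no whitespace between them. A sketch that does, assuming the question's regex still defines what a sentence is: iterate with matchAll, push each sentence without its trailing whitespace, then push the captured whitespace (possibly "") after every sentence except the last. splitSentences is a hypothetical helper name.

```javascript
function splitSentences(text) {
  const result = [];
  // Same regex as in the question; group 1 captures the whitespace after a sentence.
  const matches = [...text.matchAll(/[^.!?]+[.!?]*(\s*)/g)];
  matches.forEach((m, i) => {
    const whitespace = m[1];
    // Sentence text = full match minus its trailing whitespace.
    result.push(m[0].slice(0, m[0].length - whitespace.length));
    // Keep the separator (even "") between sentences; whitespace after the last one is dropped.
    if (i < matches.length - 1) result.push(whitespace);
  });
  return result;
}

const sentences = "This is a sentence. This is a sentence... This is a sentence! This is a sentence? This is a sentence.This is a sentence. This is a sentence";
console.log(splitSentences(sentences));
```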
Normal split works like this:
var a = " a #b c "
console.log(a.split(" "))
["", "a", "#b", "c", ""]
But my expected output is [" a", "#b", "c "]. Is it possible? And how?
One option is to use a regular expression and require word boundaries before and after the space:
var a = " a b c "
console.log(a.split(/\b \b/));
If non-word characters are allowed as well, you can use match instead - either match spaces at the beginning of the string, followed by non-spaces, or match non-spaces followed by spaces and the end of the string, or match non-spaces without restriction:
const a = " foo #bar c "
console.log(
a.match(/^ *\S+|\S+ *$|\S+/g)
);
Lookbehind is another option, but older engines do not support it (Safari, for example, only added it in version 16.4), so check the environments you target.
How about
a.split(/(?!^) (?!$)/)
If there may be more than one space and lookbehind is supported, then:
a.split(/(?<!^ *) +(?! *$)/)
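A quick check of both suggestions on a single-space and a multi-space input (the second regex needs lookbehind support):

```javascript
// One space between tokens: don't split at a space at index 0 or at the very end.
const oneSpace = " a #b c ";
console.log(oneSpace.split(/(?!^) (?!$)/)); // [" a", "#b", "c "]

// Several spaces: the first regex splits on each single space, leaving "" and " " pieces.
const manySpaces = "  a  #b  c  ";
console.log(manySpaces.split(/(?!^) (?!$)/)); // [" ", "a", "", "#b", "", "c", " "]

// The lookbehind version skips the leading and trailing runs and splits on inner runs.
console.log(manySpaces.split(/(?<!^ *) +(?! *$)/)); // ["  a", "#b", "c  "]
```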
You can trim the string before the split, for example:
var a = " a b c ";
a = a.trim();
console.log(a.split(" "));
Update
I misread the expected output; my suggested code returns
["a", "b", "c"], not [" a", "b", "c "].
I have a string which I'd like to split into items contained in an array as the following example:
var text = "I like grumpy cats. Do you?"
// to result in:
var wordArray = ["I", " ", "like", " ", "grumpy", " ", "cats", ".", " ", "Do", " ", "you", "?" ]
I've tried the following expression (and similar variations) without success:
var wordArray = text.split(/(\S+|\W)/)
//this disregards spaces and doesn't separate punctuation from words
In Ruby there's a regex anchor (\b) that splits at any word boundary, preserving spaces and punctuation, but I can't find anything similar for JavaScript. I'd appreciate your help.
Use String#match method with regex /\w+|\s+|[^\s\w]+/g.
\w+ - for any word match
\s+ - for whitespace
[^\s\w]+ - for matching combination of anything other than whitespace and word character.
var text = "I like grumpy cats. Do you?";
console.log(
text.match(/\w+|\s+|[^\s\w]+/g)
)
FYI: if you only want to match a single special character at a time, you can use \W or . instead of [^\s\w]+.
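A small comparison on a made-up sample string shows the difference: [^\s\w]+ keeps a run of punctuation together, while \W returns each punctuation character on its own. Because \s+ comes before \W in the alternation, whitespace is still grouped in both cases.

```javascript
const sample = "Really?! Yes...";

// A run of punctuation becomes one token.
const grouped = sample.match(/\w+|\s+|[^\s\w]+/g);
console.log(grouped); // ["Really", "?!", " ", "Yes", "..."]

// Each punctuation character becomes its own token.
const perChar = sample.match(/\w+|\s+|\W/g);
console.log(perChar); // ["Really", "?", "!", " ", "Yes", ".", ".", "."]
```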
The word boundary \b should work fine.
Example
"I like grumpy cats. Do you?".split(/\b/)
// ["I", " ", "like", " ", "grumpy", " ", "cats", ". ", "Do", " ", "you", "?"]
Edit
To separate the . from the following space, we can also split just before any . or whitespace character:
Example
"I like grumpy cats. Do you?".split(/(?=[.\s]|\b)/)
// ["I", " ", "like", " ", "grumpy", " ", "cats", ".", " ", "Do", " ", "you", "?"]
(?=[.\s]|\b) - positive lookahead; splits just before a . or whitespace character, or at a word boundary
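Apart from word boundaries, the lookahead only singles out . and whitespace, so punctuation that directly follows another non-word character stays attached (the ! in "?!", for example, since there is no word boundary between ? and !). A more general sketch, assuming lookbehind support, splits both before and after every non-word character; splitAroundNonWord is a hypothetical name:

```javascript
// Zero-width split: before any non-word character, or right after one.
const splitAroundNonWord = (s) => s.split(/(?=\W)|(?<=\W)/);

console.log(splitAroundNonWord("I like grumpy cats. Do you?"));
// ["I", " ", "like", " ", "grumpy", " ", "cats", ".", " ", "Do", " ", "you", "?"]
console.log(splitAroundNonWord("Really?!")); // ["Really", "?", "!"]
```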
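var text = "I like grumpy cats. Do you?"
// Note: \s consumes the spaces, so they are dropped from the result:
// ["I", "like", "grumpy", "cats", ".", "Do", "you", "?"]
var arr = text.split(/\s|\b/);
console.log(arr);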