remove extra spaces in string in javascript - javascript

I have some text. After removing special characters (!##$%^&*()-=+`";:'><.?/) so that only letters and numbers (including floats like 23.4) remain, the result contains extra spaces:
const input = 'this is a signal , entry : 24.30 and side is short';
const text = input.replace(/\.(?!\d)|[^\w.]/g, " ").toUpperCase();
console.log(text.split(" "))
the output :
[
'THIS', 'IS', 'A',
'SIGNAL', '', '',
'', 'ENTRY', '',
'', '24.30', 'AND',
'SIDE', 'IS', 'SHORT'
]
but I want it to be this:
[
'THIS', 'IS', 'A',
'SIGNAL', 'ENTRY', '24.30',
'AND', 'SIDE', 'IS',
'SHORT'
]
And when I replace spaces and line breaks with an empty string, it returns this:
[ 'THISISASIGNALENTRY24.30ANDSIDEISSHORT' ]
What is the problem with my code?

Instead of replacing, consider matching all the sorts of characters you want to produce the array of words. It looks like you want something like:
const input = 'this is a signal , entry : 24.30 and side is short';
const matches = input.toUpperCase().match(/[\w.]+/g);
console.log(matches);
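The empty strings in the question's output come from splitting on a single space when the replaced text contains runs of several spaces. As a minimal sketch that keeps the original replace call, one could split on runs of whitespace and drop any empty entries; the variable names `cleaned` and `words` are only illustrative:

```javascript
const input = 'this is a signal , entry : 24.30 and side is short';

// Same replacement as in the question: every unwanted character becomes a space.
const cleaned = input.replace(/\.(?!\d)|[^\w.]/g, ' ').toUpperCase();

// Split on one or more whitespace characters, then drop empty strings
// that could appear at the very start or end of the text.
const words = cleaned.split(/\s+/).filter(Boolean);

console.log(words);
```

The filter step only matters when the text starts or ends with a replaced character; for the sample input the split alone already avoids the empty entries.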

The second parameter in the replace method needs to be an empty string and not a space as you have it in your code.
Just do:
...
const text = input.replace(/\.(?!\d)|[^\w.]/g, "").toUpperCase();
...

Related

How to lowercase field data in MongoDB find()

In my database collection I have two Objects.
[
{"_id" : 1, name: "notLowercased"},
{"_id" : 2, name: "lowercased"},
]
I'm using find and $regex to find name that includes some string.
data = await CatalogModel.find({name: {$regex : searcher.toString().toLowerCase()}})
For example my input string is "lowercased".
In result I'm getting an array
[
{"_id" : 2, name: "lowercased"},
]
But I want to get in result this:
[
{"_id" : 1, name: "notLowercased"},
{"_id" : 2, name: "lowercased"},
]
I understand that this happens because the name "notLowercased" is not lowercase.
How to lowercase name fields in this request?
You can add the $options parameter, like this: $options: "i".
As explained in the docs:
i: Case insensitivity to match upper and lower cases. For an example, see Perform Case-Insensitive Regular Expression Match.
You can even drop toLowerCase():
data = await CatalogModel.find({name: {$regex : searcher.toString(), "$options": "i" }})
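Because $regex treats the search string as a pattern, user input containing characters such as "." or "(" may match more than intended or fail to compile. The following is a sketch, not part of the original answer: `escapeRegExp` and `buildNameFilter` are hypothetical helper names, and the filter is only built here, not sent to a database.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive "contains" filter for the name field.
// (Hypothetical helper; the field name comes from the question.)
function buildNameFilter(searcher) {
  return { name: { $regex: escapeRegExp(String(searcher)), $options: 'i' } };
}

const filter = buildNameFilter('lowercased');
console.log(filter);

// Local check of the same pattern and flag against the two sample names.
const localPattern = new RegExp(filter.name.$regex, filter.name.$options);
console.log(['notLowercased', 'lowercased'].filter((name) => localPattern.test(name)));

// Intended usage, assuming the CatalogModel from the question:
// const data = await CatalogModel.find(filter);
```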

use named regex groups to output an array of matches

I'm trying to get the hang of named capturing groups.
Given a string
var a = '{hello} good {sir}, a [great] sunny [day] to you.';
I'd like to output an array which maintains the integrity of the sentence (complete with punctuation, spaces, etc) so I can reassemble the sentence at a later time:
[
{
group: "braces",
word: "hello"
},
{
group: "other",
word: " good " <-- space on either side is maintained
},
{
group: "braces",
word: "sir"
},
{
group: "other",
word: ", a "
},
{
group: "brackets",
word: "great"
},
{
group: "other",
word: " sunny "
},
{
group: "brackets",
word: "day"
},
{
group: "other",
word: " to you."
},
]
I'm using named capturing groups to try and output this. <braces> captures any text within {}, <brackets> captures any text within [], and <others> captures anything else (\s,.\w+):
var regex = /(?<braces>\{(.*?)\})(?<brackets>\[(.*?)\])(?<others>\s,.\w+)?/g;
console.log(a.match(regex)); outputs nothing.
If I remove <others> group,
var regex = /(?<braces>\{(.*?)\})(?<brackets>\[(.*?)\])?/g;
console.log(a.match(regex)); outputs ["{hello}", "{sir}"]
Question: How do I use capturing groups to find all instances of named groups and output them like the above desired array?
A regex match object will only contain one string for a given named capture group. For what you're trying to do, you'll have to do it in two steps: first separate out the parts of the input, then map it to the array of objects while checking which group was captured to identify the sort of group it needs:
const str = '{hello} good {sir}, a [great] sunny [day] to you.';
// Alternatives: text inside {}, text inside [], or a run of anything else
const matches = [...str.matchAll(/{([^}]+)}|\[([^\]]+)\]|([^[{]+)/g)]
  .map(match => ({
    group: match[1] ? 'braces' : match[2] ? 'brackets' : 'other',
    word: match[1] || match[2] || match[3]
  }));
console.log(matches);
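Since the question asks specifically about named capturing groups, here is a sketch of the same idea with the three alternatives named. Each match fills exactly one named group, so the group name can be read from `match.groups` instead of being derived from numeric indexes. The names `sentence`, `pattern`, and `parts` are illustrative only.

```javascript
const sentence = '{hello} good {sir}, a [great] sunny [day] to you.';

// One named group per alternative; only one of them is defined per match.
const pattern = /\{(?<braces>[^}]+)\}|\[(?<brackets>[^\]]+)\]|(?<other>[^[{]+)/g;

const parts = [...sentence.matchAll(pattern)].map(({ groups }) => {
  // Pick the single named group that actually captured something.
  const [group, word] = Object.entries(groups).find(([, value]) => value !== undefined);
  return { group, word };
});

console.log(parts);
```

Note that the braces and brackets themselves are not kept in `word`, so reassembling the sentence later means adding them back based on `group`.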

Convert user input string to an object to be accessed by function

I have data in the format (input):
doSomething({
type: 'type',
Unit: 'unit',
attributes: [
{
attribute: 'attribute',
value: form.first_name
},
{
attribute: 'attribute2',
value: form.family_name
}
],
groups: [
{
smth: 'string1',
smth2: 'string2',
start: timeStart.substring(0, 9)
}
]
})
I managed to take out the doSomething part with the parenthesis as to load the function from the corresponding module with
expression.split('({',1)[0]
However using the loaded function with the rest, obtained with:
expression.split(temp+'(')[1].trim().replace(/\n+/g, '').slice(0, -1)
does not work because it should be an object and not a string. Hardcoding the data in does work as it is automatically read as an object.
My question is whether there is any way to convert the string that I get from the user into an object. I have tried to convert it to a JSON object with JSON.parse, but I get an unexpected character t at position 3. I have also tried new Object(myString), but that did not work either.
What I would like is to have the body of the provided function as an object as if I would hard code it, so that the function can evaluate the different fields properly.
Is there any way to easily achieve that?
EDIT: the "output" would be:
{
type: 'type',
Unit: 'unit',
attributes: [
{
attribute: 'attribute',
value: form.first_name
},
{
attribute: 'attribute2',
value: form.family_name
}
],
groups: [
{
smth: 'string1',
smth2: 'string2',
start: timeStart.substring(0, 9)
}
]
}
as an object. This is the critical part because I have this already but as a string. However the function that uses this, is expecting an object. Like previously mentioned, hard coding this would work, as it is read as an object, but I am getting the input mentioned above as a string from the user.
Aside: I know eval is evil, because it lets the user inject arbitrary code. This is only one way to do it; there are others.
I cut "doSomething(" and the final ")" from the input string and prepended "output = ". That leaves an ordinary statement that eval can execute.
I strongly recommend against using eval this way, especially since you don't know what the user will enter, so you can't know what might happen to your code and data.
let form = {first_name: 'Mickey', family_name: 'Mouse'};
let timeStart = (new Date()).toString();
let input = `doSomething({
type: 'type',
Unit: 'unit',
attributes: [
{
attribute: 'attribute',
value: form.first_name
},
{
attribute: 'attribute2',
value: form.family_name
}
],
groups: [
{
smth: 'string1',
smth2: 'string2',
start: timeStart.substring(0, 9)
}
]
})`;
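let output; // declared up front so the eval'd assignment does not create an implicit global
const pos = "doSomething(".length;
// Drop the leading "doSomething(" and the trailing ")", then turn the rest into an assignment.
input = 'output = ' + input.slice(pos, -1);
eval(input);
console.log(output);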
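A variation on the same idea, offered only as a sketch: the Function constructor can evaluate the argument text with just the names it needs passed in explicitly, so the evaluated text cannot see the caller's local variables. It still executes whatever code the user supplies and can still reach globals, so it is not a safe way to handle untrusted input. The shortened `expression` and the names `functionName`, `argumentSource`, and `buildArgument` are assumptions made for this example.

```javascript
const form = { first_name: 'Mickey', family_name: 'Mouse' };
const timeStart = new Date(0).toISOString();

// Shortened version of the user-supplied text from the question.
const expression = `doSomething({
  type: 'type',
  attributes: [{ attribute: 'attribute', value: form.first_name }],
  groups: [{ start: timeStart.substring(0, 9) }]
})`;

// Split "name(args)" into the function name and the argument source text.
const openParen = expression.indexOf('(');
const functionName = expression.slice(0, openParen).trim();
const argumentSource = expression.slice(openParen + 1, expression.lastIndexOf(')'));

// Compile the argument text as the body of a function that receives
// `form` and `timeStart` as parameters and returns the resulting object.
const buildArgument = new Function('form', 'timeStart', `"use strict"; return (${argumentSource});`);
const argument = buildArgument(form, timeStart);

console.log(functionName, argument);
```

The resulting `argument` is a real object, so it can be passed to whichever function `functionName` resolves to in the corresponding module.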

how to split a string which has multiple repeated keywords in it to an array in javascript?

I have a string like this:
const string = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples';
and now I want to split the string by following keywords:
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];
now I want to get result like this:
const result = [
{isKeyword: true, text: 'John Smith'},
{isKeyword: false, text: 'I want to buy '},
{isKeyword: true, text: '100'},
{isKeyword: true, text:'apples'},
{isKeyword: false, text:'\r\nI want to buy'},
{isKeyword: true, text:'200'},
{isKeyword: true, text:'oranges'},
{isKeyword: false, text:'\r\n, and add'},
{isKeyword: true, text:'300'},
{isKeyword: true, text:'apples'}];
Keywords could be lowercase or uppercase, I want to keep the string in array just the same as string.
I also want to keep the array order as the same as the string but identify the string piece in array whether it is a keyword.
How could I get it?
I would start by finding the indexes of all your keywords. From these you know where every keyword in the sentence starts and stops. You can then sort them by the index at which each keyword starts.
Then it's just a matter of taking substrings up to the start of the keywords -- these will be the keyword: false substrings, then add the keyword substring. Repeat until you are done.
const string = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples Thanks';
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];
// find all indexes of a keyword
function getInd(kw, str) {
  // case-insensitive, global search for one keyword in the string passed in
  const regex = new RegExp(kw, 'gi'), pos = []
  let result
  while ((result = regex.exec(str)) !== null)
    pos.push([result.index, result.index + kw.length])
  return pos
}
// find all index of all keywords
let positions = keywords.reduce((a, word) => a.concat(getInd(word, string)), [])
positions.sort((a, b) => a[0] - b[0])
// go through the string and make the array
let start = 0, res = []
for (let next of positions) {
if (start + 1 < next[0])
res.push({ isKeyword: false,text: string.slice(start, next[0]).trim()})
res.push({isKeyword: true, text: string.slice(next[0], next[1])})
start = next[1]
}
// get any remaining text
if (start < string.length) res.push({isKeyword: false, text: string.slice(start, string.length).trim()})
console.log(res)
I'm trimming whitespace as I go, but you may want to do something different.
If you are willing to pick a delimiter
Here's a much more succinct way to do this, if you are willing to pick a set of delimiters that can't appear in your text. For example, {} is used below.
Here we simply wrap the keywords with the delimiter and then split them out. Grabbing the keyword with the delimiter makes it easy to tell which parts of the split are your keywords:
const string = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples Thanks';
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];
let res = keywords.reduce((str, k) => str.replace(new RegExp(`(${k})`, 'ig'), '{$1}'), string)
  .split(/({.*?})/).filter(i => i.trim())
  .map(s => s.startsWith('{')
    ? {isKeyword: true, text: s.slice(1, -1)}
    : {isKeyword: false, text: s.trim()})
console.log(res)
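Use a regular expression with a capturing group, so that split keeps the matched keywords in the result. The 'i' flag makes the match case-insensitive:
rx = new RegExp('(' + keywords.join('|') + ')', 'i')
and then:
string.split(rx)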
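To get from a split on a keyword pattern to the array of objects the question asks for, the pieces still have to be labeled. The following is a sketch under a few assumptions: `escapeRegExp`, `keywordPattern`, `lowerKeywords`, and `pieces` are illustrative names, keywords are escaped in case they contain regex metacharacters, and whitespace between keywords is kept as its own non-keyword piece rather than trimmed.

```javascript
const text = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples';
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];

// Escape regex metacharacters so keywords are matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// The capturing group makes split() keep the matched keywords in its result.
const keywordPattern = new RegExp(`(${keywords.map(escapeRegExp).join('|')})`, 'i');
const lowerKeywords = new Set(keywords.map((k) => k.toLowerCase()));

const pieces = text
  .split(keywordPattern)
  .filter((piece) => piece !== '') // split() yields '' where the text starts or ends with a keyword
  .map((piece) => ({
    isKeyword: lowerKeywords.has(piece.toLowerCase()),
    text: piece, // original casing and whitespace are preserved
  }));

console.log(pieces);
```

Because nothing is trimmed, joining the `text` fields back together reproduces the original string exactly.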

Convert Java tokenizing regex into Javascript

As an answer to my question Tokenizing an infix string in Java, I got the regex (?<=[^\.a-zA-Z\d])|(?=[^\.a-zA-Z\d]). However, now I'm writing the same code in JavaScript, and I'm stuck as to how I would get a JavaScript regex to do the same thing.
For example, if I have the string sin(4+3)*2, I would need it parsed into ["sin","(","4","+","3",")","*","2"]
What regex would I use to tokenize the string into each individual part?
Before, what I did is I just did a string replace of every possible token, and put a space around it, then split on that whitespace. However, that code quickly became very bloated.
The operators I would need to split on would be the standard math operators (+,-,*,/,^), as well as function names (sin,cos,tan,abs,etc...), and commas
What is a fast, efficient way to do this?
You can take advantage of regular expression grouping to do this. You need a regex that combines the different possible tokens, and you apply it repeatedly.
I like to separate out the different parts; it makes it easier to maintain and extend:
var tokens = [
"sin",
"cos",
"tan",
"\\(",
"\\)",
"\\+",
"-",
"\\*",
"/",
"\\d+(?:\\.\\d*)?"
];
You glue those all together into a big regular expression with | between each token:
var rtok = new RegExp( "\\s*(?:(" + tokens.join(")|(") + "))\\s*", "g" );
You can then tokenize using regex operations on your source string:
function tokenize( expression ) {
  var toks = [], p;
  rtok.lastIndex = p = 0; // reset the regex
  while (rtok.lastIndex < expression.length) {
    var match = rtok.exec(expression);
    // Make sure we found a token, and that we found
    // one without skipping garbage
    if (!match || rtok.lastIndex - match[0].length !== p)
      throw new Error("Oops - syntax error");
    // Figure out which token we matched by finding the non-null group
    for (var i = 1; i < match.length; ++i) {
      if (match[i]) {
        toks.push({
          type: i,
          txt: match[i]
        });
        // remember the new position in the string
        p = rtok.lastIndex;
        break;
      }
    }
  }
  return toks;
}
That just repeatedly matches the token regex against the string. The regular expression was created with the "g" flag, so the regex machinery will automatically keep track of where to start matching after each match we make. When it doesn't see a match, or when it does but has to skip invalid stuff to find it, we know there's a syntax error. When it does match, it records in the token array which token it matched (the index of the non-null group) and the matched text. By remembering the matched token index, it saves you the trouble of having to figure out what each token string means after you've tokenized; you just have to do a simple numeric comparison.
Thus calling tokenize( "sin(4+3) * cos(25 / 3)" ) returns:
[ { type: 1, txt: 'sin' },
{ type: 4, txt: '(' },
{ type: 10, txt: '4' },
{ type: 6, txt: '+' },
{ type: 10, txt: '3' },
{ type: 5, txt: ')' },
{ type: 8, txt: '*' },
{ type: 2, txt: 'cos' },
{ type: 4, txt: '(' },
{ type: 10, txt: '25' },
{ type: 9, txt: '/' },
{ type: 10, txt: '3' },
{ type: 5, txt: ')' } ]
Token type 1 is the sin function, type 4 is left paren, type 10 is a number, etc.
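The "without skipping garbage" check can also be left to the regex engine. As a sketch that is not part of the original answer, the sticky flag "y" makes exec match only at lastIndex, so a failed match directly means invalid input at that position. Named groups stand in for the numeric token types here; `tokenPattern`, `tokenizeSticky`, and the group names are assumptions made for this example.

```javascript
// Sticky ("y") pattern: a match must start exactly at lastIndex.
const tokenPattern = /\s*(?:(?<name>[A-Za-z_]\w*)|(?<number>\d+(?:\.\d*)?)|(?<op>[-+*\/^(),]))\s*/y;

function tokenizeSticky(expression) {
  const tokens = [];
  tokenPattern.lastIndex = 0; // reset the shared regex before each run
  while (tokenPattern.lastIndex < expression.length) {
    const position = tokenPattern.lastIndex;
    const match = tokenPattern.exec(expression);
    if (!match) {
      throw new SyntaxError(`Unexpected input at position ${position}`);
    }
    // Exactly one named group is defined per match; its name is the token type.
    const [type, text] = Object.entries(match.groups).find(([, value]) => value !== undefined);
    tokens.push({ type, text });
  }
  return tokens;
}

console.log(tokenizeSticky('sin(4+3) * cos(25 / 3)'));
```

In this sketch an input consisting only of whitespace is reported as a syntax error, since the pattern always requires a token.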
edit — if you want to match identifiers like "x" and "y", then I'd probably use a different set of token patterns, with one just to match any identifiers. That'd mean that the parser would not find out directly about "sin" and "cos" etc. from the lexer, but that's OK. Here's an alternative list of token patterns:
var tokens = [
"[A-Za-z_][A-Za-z_\\d]*",
"\\(",
"\\)",
"\\+",
"-",
"\\*",
"/",
"\\d+(?:\\.\\d*)?"
];
Now any identifier will be a type 1 token.
I don't know if this will do everything of what you want to achieve, but it works for me:
'sin(4+3)*2'.match(/\d+\.?\d*|[a-zA-Z]+|\S/g);
// ["sin", "(", "4", "+", "3", ")", "*", "2"]
You may replace [a-zA-Z]+ part with sin|cos|tan|etc to support only math functions.
Just to offer up another possibility:
[a-zA-Z]+|\d+(?:\.\d+)?|.
