JavaScript regex to remove whitespace fails, why?

I use text.replace(/\s/g, '') to remove all whitespace characters from a String.
I'm trying this on Russian text. An alert(text) shows me the correct string, but the replace call throws this error: Bad Argument /\s/g
I'm creating .jsx files for Adobe InDesign scripting. The replace method works for some strings but fails sometimes. Any idea why?
Thanks.
EDIT
for (var i=0; i<arr.length; i++) {
// If there is no text for the current entry, remove it
alert(arr[i].text);
if (arr[i].text == undefined || arr[i].text === "") {
arr.splice(i,1);
i--;
continue;
}
var trimmed = arr[i].text.replace(/\s/g, '');
if (trimmed.text === "") {
entries.splice(i,1);
i--;
}
.
.
.
}

You need to escape (with "\\") any regex special characters such as $ or ^ in your text.
Try posting a fiddle or pasting the failing text so we can check the issue.

My bad - this is my edited answer.
var str = "Hello this is my test string";
var newStr = str.replace(/ /g, '');
alert(newStr) // "Hellothisismyteststring";

I was using .text() to populate the text objects. I learned that this function converts spaces to non-breaking spaces (character 160).
I had to strip those too:
text = text.replace(/\u00A0|\s+/g, '');
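A minimal sketch of the fix described above. The helper name stripAllWhitespace is hypothetical, and the assumption that an older script engine might not count U+00A0 (character 160) as part of \s is only that, an assumption; listing it explicitly is harmless either way.

```javascript
// Hypothetical helper (name is illustrative, not from the original post).
// Removes ordinary whitespace plus the non-breaking space U+00A0,
// which is listed explicitly in case the engine's \s does not cover it.
function stripAllWhitespace(text) {
  return String(text).replace(/[\s\u00A0]+/g, '');
}

console.log(stripAllWhitespace('Привет\u00A0мир \t!')); // "Приветмир!"
```

Note that replace returns a new string, so the result has to be assigned or returned; the original string is not modified.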

Related

Escaping apostrophes and the like in JavaScript [duplicate]

I want to remove all special characters except space from a string using JavaScript.
For example,
abc's test#s
should output as
abcs tests.
You should use the string replace function, with a single regex.
Assuming by special characters, you mean anything that's not letter, here is a solution:
const str = "abc's test#s";
console.log(str.replace(/[^a-zA-Z ]/g, ""));
You can do it specifying the characters you want to remove:
string = string.replace(/[&\/\\#,+()$~%.'":*?<>{}]/g, '');
Alternatively, to change all characters except numbers and letters, try:
string = string.replace(/[^a-zA-Z0-9]/g, '');
The first solution does not work for non-Latin alphabets. (It will strip text such as Привіт.) I wrote a function that does not use RegExp and relies on the JavaScript engine's Unicode-aware case conversion. The idea is simple: if a symbol is identical in uppercase and lowercase, it is treated as a special character. The only exception is whitespace.
function removeSpecials(str) {
var lower = str.toLowerCase();
var upper = str.toUpperCase();
var res = "";
for(var i=0; i<lower.length; ++i) {
if(lower[i] != upper[i] || lower[i].trim() === '')
res += str[i];
}
return res;
}
Update: Please note that this solution works only for languages that have lowercase and uppercase letters. For languages like Chinese, it won't work.
Update 2: I arrived at the original solution while working on a fuzzy search. If you are also removing special characters to implement search functionality, there is a better approach: use a transliteration library to convert the string to Latin characters only, and then a simple regex will remove the special characters. (This also works for Chinese, with the side benefit of making Tromsø == Tromso.)
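Another option, not mentioned in the answer above, is a regex with Unicode property escapes. This is a sketch that assumes an engine supporting the u flag with \p{...} (added in ES2018); the function name removeSpecialsUnicode is hypothetical. Unlike the case-comparison trick, it also keeps digits and scripts without letter case.

```javascript
// Hypothetical helper: keep letters (\p{L}) and digits (\p{N}) from any
// script, plus the plain space; remove everything else.
// Requires the "u" flag so that \p{...} is understood.
function removeSpecialsUnicode(str) {
  return str.replace(/[^\p{L}\p{N} ]/gu, '');
}

console.log(removeSpecialsUnicode("Привіт, світ! abc's test#s 中文"));
// "Привіт світ abcs tests 中文"
```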
Search for everything that is not a word character or a space (the g flag is needed to replace every match, not just the first):
str.replace(/[^\w ]/g, '')
I don't know JavaScript, but isn't it possible using regex?
Something like [^\w\d\s] will match anything but digits, characters and whitespaces. It would be just a question to find the syntax in JavaScript.
I tried Seagul's very creative solution, but found it treated numbers also as special characters, which did not suit my needs. So here is my (failsafe) tweak of Seagul's solution...
//return true if char is a number
function isNumber (text) {
if(text) {
var reg = new RegExp('[0-9]+$');
return reg.test(text);
}
return false;
}
function removeSpecial (text) {
if(text) {
var lower = text.toLowerCase();
var upper = text.toUpperCase();
var result = "";
for(var i=0; i<lower.length; ++i) {
if(isNumber(text[i]) || (lower[i] != upper[i]) || (lower[i].trim() === '')) {
result += text[i];
}
}
return result;
}
return '';
}
const str = "abc's#thy#^g&test#s";
console.log(str.replace(/[^a-zA-Z ]/g, ""));
Try to use this one
var result= stringToReplace.replace(/[^\w\s]/g, '')
[^...] negates the character class, \w matches the word characters [a-zA-Z0-9_], and \s matches whitespace.
The g flag in /[...]/g makes the replacement global.
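One consequence of the explanation above is that \w includes the underscore, so /[^\w\s]/g leaves underscores in place. A small sketch with hypothetical helper names that shows both behaviours:

```javascript
// Hypothetical helpers for illustration only.
// keepWordChars: removes everything except [a-zA-Z0-9_] and whitespace.
function keepWordChars(s) {
  return s.replace(/[^\w\s]/g, '');
}

// keepAlphanumerics: same, but also removes the underscore.
function keepAlphanumerics(s) {
  return s.replace(/[^\w\s]|_/g, '');
}

console.log(keepWordChars('a_b-c!'));     // "a_bc"
console.log(keepAlphanumerics('a_b-c!')); // "abc"
```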
With regular expression
let string = "!#This tool removes $special *characters* /other/ than! digits, characters and spaces!!!$";
var NewString= string.replace(/[^\w\s]/gi, '');
console.log(NewString);
Result //This tool removes special characters other than digits characters and spaces
Live Example : https://helpseotools.com/text-tools/remove-special-characters
dot (.) may not be considered special. I have added an OR condition to Mozfet's & Seagull's answer:
function isNumber (text) {
var reg = new RegExp('[0-9]+$');
if(text) {
return reg.test(text);
}
return false;
}
function removeSpecial (text) {
if(text) {
var lower = text.toLowerCase();
var upper = text.toUpperCase();
var result = "";
for(var i=0; i<lower.length; ++i) {
if(isNumber(text[i]) || (lower[i] != upper[i]) || (lower[i].trim() === '') || (lower[i].trim() === '.')) {
result += text[i];
}
}
return result;
}
return '';
}
Try this:
const strippedString = htmlString.replace(/(<([^>]+)>)/gi, "");
console.log(strippedString);
const input = `#if_1 $(PR_CONTRACT_END_DATE) == '23-09-2019' #
Test27919<alerts#imimobile.com> #elseif_1 $(PR_CONTRACT_START_DATE) == '20-09-2019' #
Sender539<rama.sns#gmail.com> #elseif_1 $(PR_ACCOUNT_ID) == '1234' #
AdestraSID<hello#imimobile.co> #else_1#Test27919<alerts#imimobile.com>#endif_1#`;
const replaceString = input.split('$(').join('->').split(')').join('<-');
console.log(replaceString.match(/(?<=->).*?(?=<-)/g));
Prepare a list of the special characters you want to remove, then use the JavaScript replace function to remove them. Note that replace with a string pattern removes only the first occurrence of each character.
var str = "abc'de#;:sfjkewr47239847duifyh";
alert(str.replace("'","").replace("#","").replace(";","").replace(":",""));
Or you can loop over the whole string, compare each character against its ASCII code, and build a new string.
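The loop approach mentioned above could look roughly like this. It is a sketch: removeChars and its parameters are hypothetical names, and it compares against a list of unwanted characters rather than raw ASCII codes.

```javascript
// Hypothetical helper: builds a new string, skipping every character
// that appears in the "unwanted" list. Removes all occurrences.
function removeChars(str, unwanted) {
  var result = '';
  for (var i = 0; i < str.length; i++) {
    var ch = str.charAt(i);
    if (unwanted.indexOf(ch) === -1) {
      result += ch;
    }
  }
  return result;
}

console.log(removeChars("abc'de#;:sf'jk#", "'#;:")); // "abcdesfjk"
```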

How to replace all newlines after reading a file

How can I replace a newline in a string with a ','? I have a string that is read from a file:
const fileText = (<FileReader>fileLoadedEvent.target).result.toString();
file.readCSV(fileText);
It takes a string from a file:
a,b,c,d,e,f
,,,,,
g,h,i,j,k,l
I'm able to detect the newline with this:
if (char === '\n')
But replacing \n like this doesn't work
str = csvString.replace('/\n/g');
I want to get the string to look like this:
a,b,c,d,e,f,
,,,,,,
g,h,i,j,k,l,
You can add , at end of each line like this
$ - Matches end of line
let str = `a,b,c,d,e,f
,,,,,
g,h,i,j,k,l`
let op = str.replace(/$/mg, "$&"+ ',')
console.log(op)
Try replacing the pattern $ with ,, comma:
var input = 'a,b,c,d,e,f';
input = input.replace(/$/mg, ",");
console.log(input);
Since you intend to retain the newlines/carriage returns, we can just take advantage of $ to represent the end of each line.
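For context, here is a short sketch of why the attempt in the question has no effect: the pattern was passed as a quoted string, so replace looks for that literal text instead of treating it as a regex, and no replacement argument was given. The variable names are illustrative.

```javascript
var csv = 'a,b\n,,\ng,h';

// A quoted pattern is a plain string: replace searches for the literal
// characters "/", newline, "/g", which do not occur in the input.
var unchanged = csv.replace('/\n/g', ',');

// A regex literal (no quotes) with the g flag matches every newline.
// Here each newline is replaced by a comma followed by the newline.
var withCommas = csv.replace(/\n/g, ',\n');

console.log(unchanged === csv); // true
console.log(withCommas);        // the last line gets no trailing comma this way
```

Because this variant only touches newlines, the final line stays without a comma unless the text ends with a newline; the $ based answers above handle that case.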
let text = `a,b,c,d,e,f
,,,,,
g,h,i,j,k,l`;
let edited = text.replace(/\s+/g, '');
console.log( edited )
You can also try this solution. \s matches whitespace, including newlines, so this removes the line breaks rather than adding commas.
You may try out like,
// Let us have some sentences with line breaks written as \n.
let statements = " Programming is so cool. \n We love to code. \n We can built what we want. \n :)";
// Log it to confirm that the line breaks are there.
console.log(statements);
// We may replace the string via various methods which are as follows,
// FIRST IS USING SPLIT AND JOIN
let statementsWithComma1 = statements.split("\n").join(",");
// RESULT
console.log("RESULT1 : ", statementsWithComma1);
// SECOND IS USING REGEX
let statementsWithComma2 = statements.replace(/\n/gi, ',');
// RESULT
console.log("RESULT2 : ", statementsWithComma2);
// THIRD IS USING FOR LOOP
let statementsWithComma3 = "";
for(let i=0; i < statements.length; i++){
if(statements[i] === "\n")
statementsWithComma3 += ','
else
statementsWithComma3 += statements[i]
}
// RESULT
console.log("RESULT3 : ", statementsWithComma3);
I believe on some systems a newline is \r\n or just \r, so give /\r?\n|\r/g a shot.
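A sketch that combines the line-ending advice above with the goal from the question. The function name appendCommaToLines is hypothetical. Splitting on all three line-ending styles avoids a pitfall of the multiline $ approach: with \r\n endings, $ can match both before the \r and before the \n.

```javascript
// Hypothetical helper: appends a comma to every line, accepting
// \r\n, \r or \n as line endings and normalising them to \n.
function appendCommaToLines(text) {
  var lines = text.split(/\r\n|\r|\n/);
  for (var i = 0; i < lines.length; i++) {
    lines[i] = lines[i] + ',';
  }
  return lines.join('\n');
}

console.log(appendCommaToLines('a,b,c,d,e,f\r\n,,,,,\r\ng,h,i,j,k,l'));
```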

Regex to validate a textarea input which must be URLs separated by new lines

I am trying to create a regex which will ultimately be used with Google Forms to validate a textarea input.
The rules are:
The input area can have one or more URLs (http or https)
URLs must be separated by one or more new lines
Each line which has text must be a single valid URL
The last URL may or may not have newline characters after it
So far, I have written this regex ^(https?://.+[\r\n]+)*(https?://.+[\r\n]+?)$ but the problem is that it also validates a line that contains more than one URL.
Here is my testing playground: http://goo.gl/YPdvBH.
Here is what you are looking for
function validate(ele) {
var str = ele.value;
str = str.replace(/\r/g, "");
while (/\s\n/.test(str)) {
str = str.replace(/\s\n/g, "\n");
}
while (/\n\n/.test(str)) {
str = str.replace(/\n\n/g, "\n");
}
ele.value = str;
str = str.replace(/\n/g, "_!_&_!_").split("_!_&_!_")
var result = [], counter = 0;
for (var i = 0; i < str.length; i++) {
str[i] = str[i].replace(/(?:(?:^|\n)\s+|\s+(?:$|\n))/g, '').replace(/\s+/g, ' ');
if(str[i].length !== 0){
if (isValidAddress(str[i])) {
result.push(str[i]);
}
counter += 1;
}
}
function isValidAddress(s) {
return /^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i.test(s)
}
return (result.length === str.length);
}
var ele = document.getElementById('urls');
validate(ele);
This is closer to the regex you are looking for:
^(https?://[\S]+[\r\n]+)*(https?://[\S]+[\r\n]+?)$
The difference between your regex and this one is that you use .+, which matches any character except a newline, whereas I use [\S]+ (note the capital S), which matches only non-whitespace characters. So this doesn't match more than one token per line: each line can match at most one token, and it must have the form you defined.
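As a quick way to try the idea in JavaScript, here is a sketch of the same pattern with the slashes escaped. One deliberate change, which is my assumption about the stated rules rather than part of the answer: the final [\r\n]+? is written as [\r\n]* so that a trailing newline after the last URL is optional. The names are illustrative, and Google Forms may use a different regex dialect.

```javascript
// Zero or more "URL + newline(s)" lines, then a final URL with
// optional trailing newlines. \S+ cannot cross a space, so a line
// holding two URLs separated by a space is rejected.
var urlLines = /^(https?:\/\/\S+[\r\n]+)*https?:\/\/\S+[\r\n]*$/;

function isUrlList(text) {
  return urlLines.test(text);
}

console.log(isUrlList('http://a.com\nhttps://b.org'));    // true
console.log(isUrlList('http://a.com http://b.com'));      // false
console.log(isUrlList('http://a.com\n\nhttps://b.org\n')); // true
```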
For a regex to match a single URL, look at this question on StackOverflow:
What is the best regular expression to check if a string is a valid URL?
I don't know whether Google Forms has a length limit, but if it does, such a regex is almost sure to hit it.
If I understand correctly, your regex is missing the m (multiline) flag, so you need something like this:
/^(https?://.+this your reg exp for one url)$/m
A sample using the regex from "Javascript URL validation regex":
/^(ht|f)tps?:\/\/[a-z0-9-\.]+\.[a-z]{2,4}\/?([^\s<>\#%"\,\{\}\\|\\\^\[\]`]+)?$/m
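One thing to keep in mind with the m flag: ^ and $ then match at every line boundary, so a test succeeds as soon as any single line matches. A sketch of the difference, using a deliberately simplified URL pattern and hypothetical names, with a per-line check as one way to require that every line is a URL:

```javascript
var singleUrl = /^https?:\/\/\S+$/;   // no flags: must match the whole string
var anyLineUrl = /^https?:\/\/\S+$/m; // m flag: matches if ANY line is a URL

// Hypothetical helper: true only if every non-empty line is a URL.
// Note: text with no non-empty lines also returns true in this sketch.
function everyLineIsUrl(text) {
  var lines = text.split(/[\r\n]+/);
  for (var i = 0; i < lines.length; i++) {
    if (lines[i] === '') continue; // skip empties from leading/trailing newlines
    if (!singleUrl.test(lines[i])) return false;
  }
  return true;
}

console.log(anyLineUrl.test('not a url\nhttp://a.com')); // true
console.log(everyLineIsUrl('not a url\nhttp://a.com'));  // false
```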

removing BBcode from textarea with Javascript

I'm creating a small JavaScript for a phpBB3 forum that counts how many characters you have typed.
But I need to remove the special characters (which I managed to do) and one BBCode: quote.
My problem lies with the quote, and the fact that I don't know much about regex.
This is what I have managed so far, but I'm stuck:
http://jsfiddle.net/emjkc/
var text = '';
var char = 0;
text = $('textarea').val();
text = text.replace(/[&\/\\#,+()$~%.'":*?<>{}!?(\r\n|\n|\r)]/gm, '');
char = text.length;
$('div').text(char);
$('textarea').bind('input propertychange', function () {
text = $(this).val();
text = text.replace(/[&\/\\#,+()$~%.'":*?<>{}!?\-\–_;(\r\n|\n|\r)]/gm, '');
char = text.length;
$('div').text(char);
});
You'd better write a parser for that, however if you want to try with regexes, this should do the trick:
text = $('textarea').val();
while (text.match(/\[quote.*\[\/quote\]/i) != null) {
// remove the innermost quote block found
text = text.replace(/^(.*)\[quote.*?\[\/quote\](.*)$/gmi, '\$1\$2');
}
// now strip anything non-character
text = text.replace(/[^a-z0-9]/gmi, '');
I'm not sure if this would work, but I think you can replace all bbcodes with a regex like this:
var withoutBBCodes = message.replace(/\[[^\]]*\]/g,"");
It just replaces everything like [any char != ']' goes here]
EDIT: sorry, didn't see that you only want to replace [quote] and not all bbcodes:
var withoutBBQuote = message.replace(/\[[\/]*quote[^\]]*\]/g,"");
EDIT: ok, you also want quoted content removed:
while (message.indexOf("[quote") != -1) {
message = message.replace(/\[quote[^\]]*\]((?!\[[[\/]*quote).)*\[\/quote\]/g,"");
}
I know you already got a solution thanks to @guido, but I didn't want to leave this answer wrong.
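The two answers above share one idea: repeatedly remove the innermost [quote]...[/quote] block until none is left. A self-contained sketch of that idea follows; the function name stripQuotes is hypothetical and the regex is my own variant, not either answerer's exact pattern. The loop stops as soon as a pass changes nothing, so unbalanced tags cannot cause an endless loop.

```javascript
// Hypothetical helper: removes [quote ...] ... [/quote] blocks,
// including nested ones and their content.
function stripQuotes(message) {
  // An opening quote tag, then any characters that do not start another
  // "[quote", then the closing tag: this can only match an innermost block.
  var innermost = /\[quote[^\]]*\](?:(?!\[quote)[\s\S])*?\[\/quote\]/gi;
  var previous;
  do {
    previous = message;
    message = message.replace(innermost, '');
  } while (message !== previous);
  return message;
}

console.log(stripQuotes('a[quote="x"]b[quote]c[/quote]d[/quote]e')); // "ae"
```

The character count can then be taken from the returned string after the special characters have been stripped as in the question.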
