Replace multiple characters by one character with regex - javascript

I have this string:
var str = '#this #is____#a test###__'
I want to replace every run of the characters (#, _) with a single (#), so the expected output is:
'#this #is#a test#'
Note:
I do not know in advance how long each sequence of (#) or (_) in the string will be.
What I tried:
I tried to write:
var str = '#this #is__ __#a test###__'
str = str.replace(/[#_]/g,'#')
alert(str)
But the output was :
#this #is## ###a test#####
I tried to use (*) to match a sequence, but that did not work:
var str = '#this #is__ __#a test###__'
str = str.replace(/[#_]*/g,'#')
alert(str)
So how can I get my expected output?

A well written RegEx can handle your problem rather easily.
Quoting Mohit's answer as a starting point:
var str = '#this #is__ __#a test###__';
var formattedStr = str.replace(/[#_,]+/g, '#');
console.log( formattedStr );
Line 2:
Put in formattedStr the result of the replace method on str.
How does replace work? The first parameter is a string or a RegEx.
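Note: regular expressions in JavaScript are objects of type RegExp, not strings. So writing
/yourRegex/
or
new RegExp('yourRegex')
creates an equivalent object. Note that the constructor takes the pattern without the surrounding slashes, and any backslash in the string has to be doubled.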
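As a small sketch of that equivalence (the variable names are only illustrative), the literal and the constructor behave the same as long as the constructor receives the pattern without slashes:

```javascript
// Two ways to build the same regular expression object.
const fromLiteral = /[#_]+/g;
const fromConstructor = new RegExp('[#_]+', 'g'); // no slashes, flags passed separately

const sample = '#this #is__ __#a test###__';
console.log(sample.replace(fromLiteral, '#'));     // #this #is# #a test#
console.log(sample.replace(fromConstructor, '#')); // #this #is# #a test#

// Wrapping the pattern in slashes inside the string changes its meaning:
// the slashes become literal characters that must appear in the input.
const withSlashes = new RegExp('/[#_]+/', 'g');
console.log(sample.replace(withSlashes, '#'));     // unchanged, there is no "/" in the input
```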
Now let's discuss this particular RegEx itself.
The leading and trailing slashes are used to surround the pattern, and the g at the end means "globally" - do not stop after the first match.
The square brackets describe a set of characters, any one of which can match at that position, while the + sign means "1 or more of the preceding item".
So ### will match, but so will # or #####_#, because _ and # belong to the same set.
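A different behavior, collapsing each run to its own character, is given by (#|_)\1*
This means "# or _, then, once one is found, keep looking forward for more of that same character" (\1 is a backreference to whatever the parentheses matched).
So ___ would match, as would ####, but __## would be 2 distinct matches (the former being __, the latter ##). Note that plain (#|_)+ would not do this: it matches exactly the same text as [#_]+, so __## would be a single match.
Another problem is not knowing whether to replace the pattern found with a _ or a #.
Luckily, using parentheses allows us to use a thing called capturing groups. You can basically store any pattern you found in temporary variables, which can be used in the replacement pattern.
Invoking them is easy: prepend $ to the position of the matched (pattern).
/(foo)textnotgetting(bar)captured(baz)/ for example would fill the capturing group "variables" this way:
$1 = foo
$2 = bar
$3 = baz
In our case, we want to replace 1+ characters with the first occurrence only, and the \1* is not included in the parentheses!
So we can simply write
str = str.replace(/(#|_)\1*/g, "$1");
to make it work. Note that the regex is written as a literal, not inside quotes; a quoted "/(#|_)\1*/g" would be searched for as plain text. Also note that this keeps a single _ for each run of underscores; to turn every run into # as the question asks, use [#_]+ with '#' as the replacement.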
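A minimal sketch of the capture-group idea, assuming the goal is to keep one copy of whichever character started each run. It relies on the backreference \1, and the regex is written as a literal rather than a quoted string. The second call is included only for comparison with the output the question asks for.

```javascript
const input = '#this #is__ __#a test###__';

// (#|_) captures one # or _; \1* then consumes further copies of that same character.
const perCharacter = input.replace(/(#|_)\1*/g, '$1');
console.log(perCharacter); // #this #is_ _#a test#_

// For comparison: every mixed run of # and _ becomes a single #.
const allToHash = input.replace(/[#_]+/g, '#');
console.log(allToHash);    // #this #is# #a test#
```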
Have a nice day!

Your regex replaces each single instance of a matched character with the character that you specified, i.e. #. You need to add the + quantifier to tell it that any number of consecutive matching characters (_, #) should be replaced as a whole instead of each character individually. The + quantifier means that 1 or more occurrences of the specified pattern are matched in one go. You can read more about quantifiers on this page:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
var str = '#this #is__ __#a test###__';
var formattedStr = str.replace(/[#_,]+/g, '#');
console.log( formattedStr );

You should use + to match one or more occurrences of the preceding character class.
var str = '#this #is__ __#a test###__'
str = str.replace(/[#_]+/g,'#')
alert(str)
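To see the three variants from this thread side by side, here is a short sketch using the string from the question (the variable names are illustrative):

```javascript
const text = '#this #is__ __#a test###__';

// No quantifier: each # or _ is replaced on its own.
const oneByOne = text.replace(/[#_]/g, '#');
// *: also matches the empty string, so extra # characters are inserted between ordinary characters.
const zeroOrMore = text.replace(/[#_]*/g, '#');
// +: each run of one or more # or _ is replaced exactly once.
const oneOrMore = text.replace(/[#_]+/g, '#');

console.log(oneByOne);   // #this #is## ###a test#####
console.log(zeroOrMore); // much longer than the input
console.log(oneOrMore);  // #this #is# #a test#
```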

Related

How to use regex with an array of keywords to replace?

I am trying to create a loop that will replace certain words with their uppercase version. However I cannot seem to get it to work with capture groups as I need to only uppercase words surrounded by whitespace or a start-line marker. If I understand correctly \b is the boundary matcher? The list below is shortened for convenience.
raw_text = 'crEate Alter Something banana'
var lower_text = raw_text.toLowerCase();
var sql_keywords = ['ALTER', 'ANY', 'CREATE']
for (i = 0; i < sql_keywords.length; i++){
search_key = '(\b)' + sql_keywords[i].toLowerCase() + '(\b)';
replace_key = sql_keywords[i].toUpperCase();
lower_text = lower_text.replace(search_key, '$1' + replace_key + '$2');
}
It loops fine but the replace fails. I assume I have formatted it incorrectly but I cannot work out how to correctly format it. To be clear, it is searching for a word surrounded by either line start or a space, then replacing the word with the upper case version while keeping the boundaries preserved.
Several issues:
A backslash inside a string literal is an escape character, so if you intend to have a literal backslash (for the purpose of generating regex syntax), you need to double it. As written, '\b' is a backspace character, not a word boundary.
You did not create a regular expression. A dynamic regular expression is created with a call to RegExp.
You would want to provide regex option flags, including g for global, and you might as well ease things by adding the i (case insensitive) flag.
There is no reason to make a capture group of a \b as it represents no character from the input. So even if your code worked, $1 and $2 would just resolve to empty strings; they serve no purpose.
You are casting the input to all lower case, so you will lose the capitalisation on words that are not matched.
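As a rough sketch of how the loop from the question could look with those points addressed (the camelCase names are mine, not from the question):

```javascript
// In a string literal, '\b' is a single backspace character; '\\b' is backslash + b.
console.log('\b'.length, '\\b'.length); // 1 2

var rawText = 'crEate Alter Something banana';
var sqlKeywords = ['ALTER', 'ANY', 'CREATE'];

var fixedText = rawText; // keep the original capitalisation of non-keywords
for (var i = 0; i < sqlKeywords.length; i++) {
  // Doubled backslashes, a real RegExp object, and the g and i flags.
  var keywordRegex = new RegExp('\\b' + sqlKeywords[i] + '\\b', 'gi');
  fixedText = fixedText.replace(keywordRegex, sqlKeywords[i].toUpperCase());
}
console.log(fixedText); // CREATE ALTER Something banana
```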
It will be easier when you create one regular expression for all at the same time, and use the callback argument of replace:
var raw_text = 'crEate Alter Something banana';
var sql_keywords = ['ALTER','ANY','CREATE'];
var regex = RegExp("\\b(" + sql_keywords.join("|") + ")\\b", "gi");
var result = raw_text.replace(regex, word => word.toUpperCase());
console.log(result);
BTW, you probably also want to match reserved words when they are followed by punctuation, such as a comma. \b will match any switch between a word character (letter, digit or underscore) and a non-word character, and vice versa, so that seems fine.
You can use the RegExp constructor.
Then make a function:
const listRegexp = list => new RegExp(list.map(word => `(${word})`).join("|"), "gi");
Then use it:
const re = listRegexp(sql_keywords);
Then replace:
const output = raw_text.replace(re, x => x.toUpperCase())
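One thing to watch with a plain alternation is that it also matches inside longer words. The sketch below compares it with a variant that adds \b on both sides; boundedListRegexp is a hypothetical helper, not part of the answer above, and the sample sentence is made up.

```javascript
const keywords = ['ALTER', 'ANY', 'CREATE'];

// Plain alternation, and a variant wrapped in word boundaries (an assumption for illustration).
const unboundedListRegexp = list => new RegExp(list.join('|'), 'gi');
const boundedListRegexp = list => new RegExp('\\b(?:' + list.join('|') + ')\\b', 'gi');

const sentence = 'create a company table';
const upper = word => word.toUpperCase();

const unboundedResult = sentence.replace(unboundedListRegexp(keywords), upper);
const boundedResult = sentence.replace(boundedListRegexp(keywords), upper);

console.log(unboundedResult); // CREATE a compANY table
console.log(boundedResult);   // CREATE a company table
```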

Javascript: Remove trailing chars from string if they are non-numeric

I am passing codes to an API. These codes are alphanumeric, like this one: M84.534D
I just found out that the API does not use the trailing letters. In other words, the API is expecting M84.534, no letter D at the end.
The problem I am having is that the format is not the same for the codes.
I may have M84.534DAC, or M84.534.
What I need to accomplish before sending the code is to remove any non-numeric characters from the end of the code, so in the examples:
M84.534D -> I need to pass M84.534
M84.534DAC -> I also need to pass M84.534
Is there any function or regex that will do that?
Thank you in advance to all.
You can use the regex below. It will remove anything from the end of the string that is not a number
let code = 'M84.534DAC'
console.log(code.replace(/[^0-9]+?$/, ""));
[^0-9] matches anything that is not a number
+? matches one or more times, as few as possible (lazy); because of the $ that follows, the match still has to extend to the end of the string
$ matches the end of the string
So linked together, it will match any non numbers at the end of the string, and replace them with nothing.
You could use the following expression:
\D*$
As in:
var somestring = "M84.534D".replace(/\D*$/, '');
console.log(somestring);
Explanation:
\D stands for not \d, the star * means zero or more times (greedily) and the $ anchors the expression to the end of the string.
Given your limited data sample, this simple regular expression does the trick. You just replace the match with an empty string.
I've used document.write just so we can see the results. You use this whatever way you want.
var testData = [
'M84.534D',
'M84.534DAC'
]
var regex = /\D+$/
testData.forEach((item) => {
var cleanValue = item.replace(regex, '')
document.write(cleanValue + '<br>')
})
RegEx breakdown:
\D = Anything that's not a digit
+ = One or more occurrences
$ = End of line/input
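Wrapped up as a function, the same idea might look like the sketch below. stripTrailingNonDigits is a hypothetical name, and the last call shows a caveat that follows from \D also covering the period.

```javascript
// Hypothetical helper: strip every trailing character that is not a digit.
function stripTrailingNonDigits(code) {
  return code.replace(/\D+$/, '');
}

['M84.534D', 'M84.534DAC', 'M84.534'].forEach(function (code) {
  console.log(code + ' -> ' + stripTrailingNonDigits(code));
});
// M84.534D -> M84.534
// M84.534DAC -> M84.534
// M84.534 -> M84.534

// Caveat: the period is a non-digit too, so "M84." becomes "M84".
console.log(stripTrailingNonDigits('M84.'));
```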

Regex match Array words with dash

I want to match some keywords in the url
var parentURL = document.referrer;
var greenPictures = /redwoods-are-big.jpg|greendwoods-are-small.jpg/;
var existsGreen = greenPictures.test(parentURL);
existsGreen becomes true when it finds greendwoods-are-small.jpg, but also when it finds small.jpg
What can I do so that it only becomes true if there is exactly greendwoods-are-small.jpg?
You can use ^ to match the beginning of a string and $ to match the end:
var greenPictures = /^(redwoods-are-big.jpg|greendwoods-are-small.jpg)$/;
var existsGreen = greenPictures.test(parentURL);
But of course document.referrer is not equal to either redwoods-are-big.jpg or greendwoods-are-small.jpg, so I would match a slash, then the file name, then the end of the string:
var greenPictures = /\/(redwoods-are-big\.jpg|greendwoods-are-small\.jpg)$/; // <-- See how I escaped the / and the . there? (\/ and \.)
var existsGreen = greenPictures.test(parentURL);
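A quick way to check the anchored pattern is to run it against a few referrer values. The URLs below are made up for illustration.

```javascript
// Slash, one of the two file names, then end of string. Dots are escaped so they match literally.
const greenPictures = /\/(redwoods-are-big\.jpg|greendwoods-are-small\.jpg)$/;

// Hypothetical referrer values, for illustration only.
const referrers = [
  'http://example.com/img/greendwoods-are-small.jpg',
  'http://example.com/img/small.jpg',
  'http://example.com/img/not-greendwoods-are-small.jpg',
  'http://example.com/img/greendwoods-are-small.jpg?size=2'
];

const results = referrers.map(url => greenPictures.test(url));
console.log(results); // [ true, false, false, false ]
```

The last case shows the cost of the $ anchor: a query string after the file name prevents the match, which may or may not be what you want.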
Try this regex:
/(redwoods-are-big|greendwoods-are-small)\.jpg/i
I used the i flag for ignoring the character cases in parentURL variable.
Demo
http://regex101.com/r/aI4yJ6
Dashes do not have any special meaning outside character classes, e.g.:
[a-f], [^x-z] etc.
The characters with special meaning in your regexp are | and .
/redwoods-are-big.jpg|greendwoods-are-small.jpg/
| denotes either or.
. matches any character except the newline characters \n \r \u2028 or \u2029.
In other words: There is something else iffy going on in your code.
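This is easy to confirm with a few test strings (the URLs are hypothetical): the original pattern does not match small.jpg on its own, although the unescaped dot is more permissive than intended.

```javascript
const original = /redwoods-are-big.jpg|greendwoods-are-small.jpg/;

const matchesShortName = original.test('http://example.com/small.jpg');
const matchesFullName = original.test('http://example.com/greendwoods-are-small.jpg');
console.log(matchesShortName); // false
console.log(matchesFullName);  // true

// The unescaped dot accepts any character in that position.
const looseDot = original.test('greendwoods-are-smallXjpg');
const strictDot = /greendwoods-are-small\.jpg/.test('greendwoods-are-smallXjpg');
console.log(looseDot);  // true
console.log(strictDot); // false
```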
More on RegExp.
Pages like these can be rather helpful if you struggle with writing regexp's:
regex101 (with sample)
RegexPlanet
RegExr
Debuggex
etc.

Regular expression with asterisk quantifier

This documentation states this about the asterisk quantifier:
Matches the preceding character 0 or more times.
It works in something like this:
var regex = /<[A-Za-z][A-Za-z0-9]*>/;
var str = "<html>";
console.log(str.match(regex));
The result of the above is : <html>
But when tried on the following code to get all the "r"s in the string below, it only returns the first "r". Why is this?
var regex = /r*/;
var str = "rodriguez";
console.log(str.match(regex));
Why, in the first example does it cause "the preceding" character/token to be repeated "0 or more times" but not in the second example?
var regex = /r*/;
var str = "rodriguez";
The regex engine will first try to match r in rodriguez from left to right and since there is a match, it consumes this match.
The regex engine then tries to match another r, but the next character is o, so it stops there.
Without the global flag g (used like so: var regex = /r*/g;), the regex engine will stop looking for more matches once the regex is satisfied.
Try using:
var regex = /a*/;
var str = "cabbage";
The match will be an empty string, despite there being a's in the string! This is because at first, the regex engine tries to find a in cabbage from left to right, but the first character is c. Since this doesn't match, the engine settles for matching a 0 times at that position. The regex is thus satisfied and the matching ends here.
It might be worth pointing out that * alone is greedy, which means it will first try to match as many as possible (the 'or more' part from the description) before trying to match 0 times.
To get all r from rodriguez, you will need the global flag as mentioned earlier:
var regex = /r*/g;
var str = "rodriguez";
You'll get all the r, plus all the empty strings inside, since * also matches 'nothing'.
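The behavior described above can be checked directly; this is only a sketch, and the variable names are illustrative.

```javascript
// Without g, match() returns only the first match (plus index information).
console.log('rodriguez'.match(/r*/)[0]); // r
console.log('cabbage'.match(/a*/)[0]);   // (empty string: zero a's at position 0)

// With g, every match is returned, including the empty ones.
const starMatches = 'rodriguez'.match(/r*/g);
console.log(starMatches);

// Dropping the empty strings leaves just the r's.
const onlyRs = starMatches.filter(m => m !== '');
console.log(onlyRs); // [ 'r', 'r' ]
```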
Use + together with the global flag to match 1 or more r anywhere in the string:
var regex = /r+/g;
In your other regex:
var regex = /<[A-Za-z][A-Za-z0-9]*>/;
You're matching a literal < followed by a letter, followed by 0 or more letters or digits, and then a literal >, so it will match <html> perfectly.
But if your input is <foo>:<bar>:<abc>, it will match only <foo> and not the other segments. To match all segments you need to use /<[A-Za-z][A-Za-z0-9]*>/g with the global flag.
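Both points can be seen in a short sketch (variable names are illustrative):

```javascript
const plusMatches = 'rodriguez'.match(/r+/g);
console.log(plusMatches); // [ 'r', 'r' ]

const tagRegex = /<[A-Za-z][A-Za-z0-9]*>/;
const tagRegexGlobal = /<[A-Za-z][A-Za-z0-9]*>/g;
const segments = '<foo>:<bar>:<abc>';

const firstTag = segments.match(tagRegex)[0];
const allTags = segments.match(tagRegexGlobal);
console.log(firstTag); // <foo>
console.log(allTags);  // [ '<foo>', '<bar>', '<abc>' ]
```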

how to regex a string between two tokens in Javascript?

Asked many times, but I can't get it to work...
I have strings like:
"text!../tmp/widgets/tmp_widget_header.html"
and am trying like this to extract widget_header:
var temps[i] = "text!../tmp/widgets/tmp_widget_header.html";
var thisString = temps[i].regexp(/.*tmp_$.*\.*/) )
but that does not work.
Can someone tell me what I'm doing wrong here?
Thanks!
This prints widget_header:
var s = "text!../tmp/widgets/tmp_widget_header.html";
var matches = s.match(/tmp_(.*?)\.html/);
console.log(matches[1]);
var s = "text!../tmp/widgets/tmp_widget_header.html",
re = /\/tmp_([^.]+)\./;
var match = re.exec(s);
if (match)
alert(match[1]);
This will match:
a / character
the characters tmp_
one or more of any character that is not the . character. These are captured.
a . character
If a match was found, the captured text will be at index 1 of the resulting Array.
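The same pattern can be wrapped in a small function that also handles the no-match case. extractTemplateName is a hypothetical name, and the second path is made up to show a non-matching input.

```javascript
// Hypothetical helper: returns the part between "/tmp_" and the next ".", or null.
function extractTemplateName(path) {
  const match = /\/tmp_([^.]+)\./.exec(path);
  return match ? match[1] : null; // match[0] is the whole match, match[1] the capture
}

console.log(extractTemplateName('text!../tmp/widgets/tmp_widget_header.html')); // widget_header
console.log(extractTemplateName('text!../tmp/widgets/header.html'));            // null
```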
In your code:
var temps[i] = "text!../tmp/widgets/tmp_widget_header.html";
var thisString = temps[i].regexp(/.*tmp_$.*\.*/) )
You are saying:
"Match any string that starts with any number of any characters, followed by "tmp_", followed by the end of input, followed by any number of any characters, followed by any number of periods."
.* : Any number of any character (except newline)
tmp_ : Literally "tmp_"
$ : End of input/newline - for this input it can never be true in this position, because "tmp_" is followed by more text
.* : As above, any number of any character
\. : " . ", a period
\.* : Any number of periods
Plus, strings have no regexp() method; to run a regex against a string, use match() on the string or exec() on the regex. If you build the regex with the RegExp constructor, you need to pass a string, using string notation like var re = new RegExp("ab+c") or var re = new RegExp('ab+c'), not regex notation with slashes. You also have an unbalanced parenthesis, and no characters are actually being captured.
What you want to do is:
"Find a string that is preceded by the beginning of input, one or more of any character, and "tmp_", and that is followed by a single period, one or more of any character, and the end of input. The string itself consists of one or more of any character. Capture that string."
So:
var string = "text!../tmp/widgets/tmp_widget_header.html";
var re = /^.+tmp_(.+)\..+$/; // I use the simpler slash notation
var out = re.exec(string); // execute the regex
console.log(out[1]); // Note that out is an array; the first (here only) captured string is at index 1
This regex /^.+tmp_(.+)\..+$/ means:
^ : Match beginning of input/line
.+ : One or more of any character (except newline), "+" is one or more
tmp_ : Constant "tmp_"
(.+) : One or more of any character, captured as group 1
\. : A single period
.+ : As above
$ : End of input/line
You could also write this as new RegExp('^.+tmp_(.+)\\..+$'); note that when we use new RegExp() we do not have the slash marks; instead we use quote marks (single or double will work) to pass the pattern as a string, and the backslash has to be doubled inside the string.
Now this would also match var string = "Q%$#^%$^%$^%$^43etmp_ebeb.45t4t#$^g", giving out[1] == 'ebeb'. So depending on the specific use you may wish to replace any " . " used to signify any character (except newline) with bracketed "[ ]" character lists, as this may filter out unwanted results. Your mileage may vary.
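As one possible sketch of such a tightening, the stricter pattern below assumes the name consists only of word characters and the file always ends in .html; both are assumptions, not requirements stated in the question.

```javascript
const loose = /^.+tmp_(.+)\..+$/;
// Stricter sketch: a slash before "tmp_", only word characters in the name, and a ".html" ending.
const strict = /\/tmp_(\w+)\.html$/;

const good = 'text!../tmp/widgets/tmp_widget_header.html';
const noise = 'Q%$#^%$^%$^%$^43etmp_ebeb.45t4t#$^g';

console.log(loose.exec(good)[1]);  // widget_header
console.log(loose.exec(noise)[1]); // ebeb
console.log(strict.exec(good)[1]); // widget_header
console.log(strict.exec(noise));   // null
```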
For more information visit: https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Regular_Expressions
