regex to remove certain characters at the beginning and end of a string - javascript

Let's say I have a string like this:
...hello world.bye
But I want to remove the first three dots and replace .bye with !
So the output should be
hello world!
It should only match if both conditions apply (... at the beginning and .bye at the end).
I'm trying to use the JS replace method. Could you please help? Thanks

First match the dots, capture and lazy-repeat any character until you get to .bye, and match the .bye. Then, you can replace with the first captured group, plus an exclamation mark:
const str = '...hello world.bye';
console.log(str.replace(/\.\.\.(.*?)\.bye/, '$1!'));
The lazy-repeat is there to ensure you don't match too much, for example:
const str = `...hello world.bye
...Hello again! Goodbye.`;
console.log(str.replace(/\.\.\.(.*?)\.bye/g, '$1!'));
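If the dots have to sit at the very start of the string and .bye at the very end, as the question describes, anchoring the pattern is one option. A minimal sketch under that assumption; stripDotsAndBye is a hypothetical name used only for illustration:

```javascript
// Hypothetical helper: only rewrites strings that start with "..." AND end with ".bye".
function stripDotsAndBye(str) {
  // ^ and $ anchor the match to the whole string; [\s\S] also matches line breaks.
  return str.replace(/^\.\.\.([\s\S]*)\.bye$/, '$1!');
}

console.log(stripDotsAndBye('...hello world.bye')); // hello world!
console.log(stripDotsAndBye('...hello world'));     // unchanged: no ".bye" at the end
console.log(stripDotsAndBye('say ...hi.bye'));      // unchanged: dots are not at the start
```

With both anchors in place the choice between greedy and lazy no longer changes the result, because the group has to stretch up to the final .bye either way.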

You don't actually need a regex to do this. Although it's a bit inelegant, the following should work fine (obviously the function can be called whatever makes sense in the context of your application):
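function manipulate(string) {
  // Only rewrite when the string starts with "..." and ends with ".bye".
  if (string.slice(0, 3) === "..." && string.slice(-4) === ".bye") {
    // Drop the three leading dots and the four-character ".bye" suffix.
    return string.slice(3, -4) + "!";
  }
  return string;
}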
(Apologies if I made any stupid errors with indexing there, but the basic idea should be obvious.)
This, to me at least, has the advantage of being easier to reason about than a regex. Of course if you need to deal with more complicated cases you may reach the point where a regex is best - but I personally wouldn't bother for a simple use-case like the one mentioned in the OP.
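The same check can be written so that the slice indices come from the affix lengths instead of hand-counted numbers. A sketch assuming ES2015 startsWith/endsWith is available; manipulateWithAffixes is a hypothetical name, not part of the answer above:

```javascript
// Hypothetical variant that derives the slice indices from the affixes themselves.
function manipulateWithAffixes(string) {
  const prefix = '...';
  const suffix = '.bye';
  // Require room for both affixes so they cannot overlap in a short string.
  if (string.length >= prefix.length + suffix.length &&
      string.startsWith(prefix) && string.endsWith(suffix)) {
    return string.slice(prefix.length, -suffix.length) + '!';
  }
  return string;
}

console.log(manipulateWithAffixes('...hello world.bye')); // hello world!
console.log(manipulateWithAffixes('hello world.bye'));    // hello world.bye
```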

Your regex would be
const rx = /\.\.\.([\s\S]*?)\.bye/g
const out = '\n\nfoobar...hello world.bye\nfoobar...ok.bye\n...line\nbreak.bye\n'.replace(rx, `$1!`)
console.log(out)
In English: find three dots, lazily capture anything (including line breaks) in a group, and end with .bye.
The replacement uses the first capture group, $1, and appends ! using a template literal.

An arguably simpler solution:
const str = '...hello world.bye'
const newStr = /^\.\.\.(.+)\.bye$/.exec(str)
const formatted = newStr ? newStr[1] + '!' : str
console.log(formatted)
If the string doesn't match the regex it will just return the string.

Related

Finding punctuation marks in text with string-methods

how can I find out when a punctuation(?!;.) or "<" character comes in the string. I don’t want to use an array or compare any letter, but try to solve it with string methods. Something like that:
var text = corpus.substr(0, corpus.indexOf("."));
Ok, if I explicitly specify a character like a punct, it works fine. The problem with my parsing is that with a long text in a loop, I no longer know how a sentence ends, whether with question marks or exclamation points. I tried following, but it doesn’t work:
var text = corpus.substr(0, corpus.indexOf(corpus.search(".")));
I want to loop through a long string and use every punctuation found to use it as the end-of-sentence character.
Do you know how can I solve my problem?
You can start with a RegExp and weigh it against going character by character and essentially comparing ASCII codes. Split is another way (just posted above).
RegExp solution
function getTextUpToPunc( text ) {
const regExp = /^.+(\!|\?|\.)/mg;
for (let match; (match = regExp.exec( text )) !== null;) {
console.log(match);
}
}
getTextUpToPunc(
"what a chunky funky monkey! this is really someting else"
)
The key advantage here is that you do not need to loop through the entire string yourself; you keep control over the iteration by calling regExp.exec( text ).
The split solution posted earlier would work, but split will loop over the entire string. Typically that would not be an issue, but if your strings are thousands upon thousands of characters long and you do this operation a lot, then it would make sense to think about performance.
And if this function will be run many, many times, a small performance improvement would be to memoize the RegExp creation:
const regExp = /^.+(\!|\?|\.)/mg;
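into something like this:
function getTextUpToPunc( text ) {
  // Caches the RegExp on `this`, which is the global object when the function is called plainly in non-strict code.
  if( !this._regExp ) this._regExp = /^.+(\!|\?|\.)/mg;
  const regExp = this._regExp;
  for (let match; (match = regExp.exec( text )) !== null;) {
    console.log(match);
  }
}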
Use a regular expression:
var text = corpus.split(/[?!;.<]/); // splits on ?, !, ;, . and <
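Since the question asks to know which character ended each sentence, one possible approach is to capture the terminator next to the text instead of discarding it. A rough sketch; splitSentences and the { text, end } result shape are assumptions made for illustration, and any trailing text without a terminator is ignored:

```javascript
// Hypothetical helper: returns each sentence together with the character that ended it.
function splitSentences(corpus) {
  const sentences = [];
  // Group 1: text without terminators. Group 2: the terminator itself.
  const re = /([^?!;.<]*)([?!;.<])/g;
  let match;
  while ((match = re.exec(corpus)) !== null) {
    sentences.push({ text: match[1].trim(), end: match[2] });
  }
  return sentences;
}

console.log(splitSentences('Is it late? Yes! Go home.'));
```

Every match consumes at least the terminator, so the exec loop always advances and stops at the end of the string.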

How to allow only certain words consecutively with Regex in javascript

I'm trying to write a regex that will return true if it matches the format below, otherwise, it should return false. It should only allow words as below:
Positive match (return true)
UA-1234-1,UA-12345-2,UA-34578-2
Negative match (return false or null)
Note: A is missing after U
UA-1234-1,U-12345-2
It should always give me true when the string passed to regex is
UA-1234-1,UA-12345-2,UA-34578-2,...........
Below is what I am trying to do but it is matching only the first element and not returning null.
var pattern=/^UA-[0-9]+(-[0-9]+)?/g;
"UA-1234-1,UA-12345-2,UA-34578-2".match(pattern);
pattern.exec("UA-1234-1,UA-12345-2,UA-34578-2");
Thanks in advance. Help is greatly appreciated.
The pattern you need is enclosed in anchors (^ for the start of the string and $ for the end). It first matches your pattern once (the initial "block") and then matches 0 or more occurrences of a , followed by the block pattern.
It looks like /^BLOCK(?:,BLOCK)*$/. You may allow optional whitespace in between, e.g. /^BLOCK(?:,\s*BLOCK)*$/.
In the end, the pattern looks like ^UA-[0-9]+(?:-[0-9]+)?(?:,UA-[0-9]+(?:-[0-9]+)?)*$. It is best to build it dynamically to keep it readable and easy to maintain:
const block = "UA-[0-9]+(?:-[0-9]+)?";
let rx = new RegExp(`^${block}(?:,${block})*$`); // RegExp("^" + block + "(?:," + block + ")*$") // for non-ES6
let tests = ['UA-1234-1,UA-12345-2,UA-34578-2', 'UA-1234-1,U-12345-2'];
for (var s of tests) {
console.log(s, "=>", rx.test(s));
}
Alternatively, split the string by commas and test each element instead.
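A minimal sketch of that split-and-test idea, reusing the block pattern from the previous answer; isValidList is a hypothetical name:

```javascript
// Hypothetical helper: every comma-separated item must be a complete UA block.
function isValidList(input) {
  // No g flag, so test() keeps no state between calls.
  const block = /^UA-[0-9]+(?:-[0-9]+)?$/;
  return input.split(',').every((item) => block.test(item));
}

console.log(isValidList('UA-1234-1,UA-12345-2,UA-34578-2')); // true
console.log(isValidList('UA-1234-1,U-12345-2'));             // false
```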

How to split a string by a character not directly preceded by a character of the same type?

Let's say I have a string: "We.need..to...split.asap". What I would like to do is to split the string by the delimiter ., but I only wish to split by the first . and include any recurring .s in the succeeding token.
Expected output:
["We", "need", ".to", "..split", "asap"]
In other languages, I know that this is possible with a look-behind /(?<!\.)\./ but Javascript unfortunately does not support such a feature.
I am curious to see your answers to this question. Perhaps there is a clever use of look-aheads that presently evades me?
I was considering reversing the string, then re-reversing the tokens, but that seems like too much work for what I am after... plus controversy: How do you reverse a string in place in JavaScript?
Thanks for the help!
Here's a variation of the answer by guest271314 that handles more than two consecutive delimiters:
var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;
var items = text.split(re).filter(function(val) { return val.length > 0; });
It relies on the fact that if the split expression includes a capture group, the captured items are included in the returned array. These captured items are actually the only thing we are interested in; apart from the final token, the tokens between them are all empty strings, which we filter out.
EDIT: Unfortunately there's perhaps one slight bug with this. If the text to be split starts with a delimiter, that will be included in the first token. If that's an issue, it can be remedied with:
var re = /(?:^|(\.*[^.]+))\./;
var items = text.split(re).filter(function(val) { return !!val; });
(I think this regex is ugly and would welcome an improvement.)
You can do this without any lookaheads:
var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g;
var matches, output = [];
while(matches = regex.exec(subject)) {
output.push(matches[1]);
}
document.write(JSON.stringify(output));
It seemed like it'd work in one line, as it did on https://regex101.com/r/cO1dP3/1, but had to be expanded in the code above because the /g option by default prevents capturing groups from returning with .match (i.e. the correct data was in the capturing groups, but we couldn't immediately access them without doing the above).
See: JavaScript Regex Global Match Groups
An alternative solution with the original one liner (plus one line) is:
document.write(JSON.stringify(
"We.need.to....split.asap".match(/\.?(\.*[^.]+)/g)
.map(function(s) { return s.replace(/^\./, ''); })
));
Take your pick!
Note: This answer can't handle more than 2 consecutive delimiters, since it was written according to the example in the revision 1 of the question, which was not very clear about such cases.
var text = "We.need.to..split.asap";
// split "." if followed by "."
var res = text.split(/\.(?=\.)/).map(function(val, key) {
// if `val[0]` does not begin with "." split "."
// else split "." if not followed by "."
return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/)
});
// concat arrays `res[0]` , `res[1]`
res = res[0].concat(res[1]);
document.write(JSON.stringify(res));
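The question notes that look-behind was not available in JavaScript when it was asked. ES2018 added look-behind assertions, so in an engine that implements them the pattern from the question can be used directly. A sketch under that assumption; splitOnFirstDot is a hypothetical name:

```javascript
// Requires an engine with look-behind support (ES2018 or later).
function splitOnFirstDot(text) {
  // Split on a dot only when the character before it is not a dot.
  return text.split(/(?<!\.)\./);
}

console.log(splitOnFirstDot('We.need..to...split.asap'));
// ["We", "need", ".to", "..split", "asap"]
```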

Embed comments within JavaScript regex like in Perl

Is there any way to embed a comment in a JavaScript regex, like you can do in Perl? I'm guessing there is not, but my searching didn't find anything stating you can or can't.
You can't embed a comment in a regex literal.
You may insert comments in a string construction that you pass to the RegExp constructor :
var r = new RegExp(
"\\b" + // word boundary
"A=" + // A=
"(\\d+)"+ // what is captured : some digits
"\\b" // word boundary again
, 'i'); // case insensitive
But a regex literal is so much more convenient (notice how I had to escape the \) that I'd rather separate the regex from the comments: just put some comments before your regex, not inside.
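One way to keep per-part comments without double-escaping the backslashes is to write each part as its own regex literal and join their source strings. This is a sketch of that idea, not something the answer above proposes; the names parts and r are arbitrary:

```javascript
// Each part is a regex literal, so backslashes need no double escaping.
const parts = [
  /\b/,    // word boundary
  /A=/,    // literal "A="
  /(\d+)/, // what is captured: some digits
  /\b/,    // word boundary again
];
// Join the sources into one pattern; 'i' makes it case insensitive.
const r = new RegExp(parts.map((p) => p.source).join(''), 'i');

console.log(r.source);               // \bA=(\d+)\b
console.log('x a=42 y'.match(r)[1]); // 42
```

Each part has to be a valid regex on its own, so a group cannot be opened in one part and closed in another.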
EDIT 2018: This question and answer are very old. ECMAScript now offers new ways to handle this, in particular template strings.
For example I now use this simple utility in node:
module.exports = function(tmpl){
let [, source, flags] = tmpl.raw.toString()
.replace(/\s*(\/\/.*)?$\s*/gm, "") // remove comments and spaces at both ends of lines
.match(/^\/?(.*?)(?:\/(\w+))?$/); // extracts source and flags
return new RegExp(source, flags);
}
which lets me do things like this:
const regex = rex`
^ // start of string
[a-z]+ // some letters
bla(\d+)
$ // end
/ig`;
console.log(regex); // /^[a-z]+bla(\d+)$/gi
console.log("Totobla58".match(regex)); // [ 'Totobla58' ]
Now with the grave backticky things, you can do inline comments with a little finagling. Note that in the example below there are some assumptions being made about what won't appear in the strings being matched, especially regarding the whitespace. But I think often you can make intentional assumptions like that, if you write the process() function carefully. If not, there are probably creative ways to define the little "mini-language extension" to regexes in such a way as to make it work.
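function process() {
  var regex = new RegExp("\\s*([^#]*?)\\s*#.*$", "mg");
  var output = "";
  var result; // declared locally so it does not leak as an implicit global
  // arguments[0] is the array of template strings; exec() coerces it to a string
  while ((result = regex.exec(arguments[0])) !== null ){
    output += result[1];
  }
  return output;
}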
var a = new RegExp(process `
^f # matches the first letter f
.* # matches stuff in the middle
h # matches the letter 'h'
`);
console.log(a);
console.log(a.test("fish"));
console.log(a.test("frog"));
Also, to the OP, just because I feel a need to say this, this is neato, but if your resulting code turns out just as verbose as the string concatenation or if it takes you 6 hours to figure out the right regexes and you are the only one on your team who will bother to use it, maybe there are better uses of your time...
I hope you know that I am only this blunt with you because I value our friendship.

Regex: match word (but delete commas after OR before)

I have tried to delete an item from a string divided with commas:
var str="this,is,unwanted,a,test";
If I do a simple str.replace('unwanted',''); I end up with two consecutive commas.
If I do a more complex str.replace('unwanted','').replace(',,','');
it might work.
But the problem comes when the str is like this:
var str="unwanted,this,is,a,test"; // or "...,unwanted"
However, I could do an 'if char at [0 or str.length] == comma' check and then remove it.
But I really think this is not the way to go; it is absurd that I need 2 replaces and 2 ifs to achieve what I want.
I have heard that regex can do powerful stuff, but I simply can't understand it no matter how hard I try.
Important Notes:
It should match after OR before (not both), or we will end with
"this,is,,a,test"
There are no spaces between commas
How about something less flaky than a regex for this sort of replacement?
str = str
.split(',')
.filter(function(token) { return token !== 'unwanted' })
.join(',');
However if you are convinced a regex is the best way...
str = str.replace(/(^|,)?unwanted(,|$)?/g, function(all, leading, trailing) {
return leading && trailing ? ',' : '';
});
(thanks Logan F. Smyth.)
Since Alex hasn't fixed this in his solution, I wanted to get a fully functional version up somewhere.
var unwanted = 'unwanted';
var regex = new RegExp('(^|,)' + unwanted + '(,|$)', 'g');
str = str.replace(regex, function(a, pre, suf) {
return pre && suf ? ',' : '';
});
The only thing to be careful of when dynamically building a regex is that the 'unwanted' variable can't contain anything that could be interpreted as a regex pattern.
There are way easier ways to parse this though, as Alex mentioned. Don't resort to regular expressions unless you have to.
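If the token may contain regex metacharacters, one common precaution is to escape it before building the pattern. A sketch under that assumption; escapeRegExp and removeToken are hypothetical helpers, not part of either answer:

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical wrapper around the dynamic-regex replacement shown above.
function removeToken(str, unwanted) {
  const regex = new RegExp('(^|,)' + escapeRegExp(unwanted) + '(,|$)', 'g');
  // Keep a single comma only when the token sat between two other items.
  return str.replace(regex, (all, pre, suf) => (pre && suf ? ',' : ''));
}

console.log(removeToken('this,is,a.b,a,test', 'a.b')); // this,is,a,test
console.log(removeToken('a+b,this,is', 'a+b'));        // this,is
```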
