Javascript lookbehind stopped at the first match


I have to parse (in JS) an e-mail address that, instead of a #, originally has a dot. Obviously I want to show the # instead of the dot.
var mail = "name.domain.xx"
We have two cases:
(1) the name contains some dots itself, and they are backslashed:
name\.surname.domain.xx
(2) the name contains only regular characters (no dots)
In this topic I found a way to emulate a negative lookbehind, and this is what I did:
mail = mail.replace(/(\\)?\./, function ($0, $1) { return $1?$0:"#"; });
but it doesn't work: in case (1) it finds the \., leaves it untouched, and then stops, because without the g flag replace() only handles the first match.
On the other hand, if I use the g flag, it also substitutes the third dot, giving name\.surname#domain#xx
Now, is there a way to say:
I want to search the whole string, but I want to stop at the first match?
I hope I explained myself.
Cheers

I misunderstood your question when I first answered it, so I have changed my answer.
Without the g flag, replace() only replaces the first match, which means you can look for the first dot that does not have a \ in front of it. Then, as a second step, you can replace every \. belonging to the user part of the e-mail address with a plain dot.
http://jsfiddle.net/GLVNY/2/
var emailSingle = 'myname.domain.com',
    emailDouble = 'my\\.name\\.another\\.name.domain.com',
    // a non-backslash character followed by a dot; no g flag, so only the first such dot
    regAt = /([^\\])\./,
    repAt = '$1#',
    // a backslash followed by a literal dot (\.), everywhere in the string
    regPunct = /\\\./g,
    repPunct = '.';
emailSingle = emailSingle.replace(regAt, repAt).replace(regPunct, repPunct);
emailDouble = emailDouble.replace(regAt, repAt).replace(regPunct, repPunct);
alert('emailSingle: ' + emailSingle + '\nemailDouble: ' + emailDouble);
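Lookbehind assertions were added to JavaScript in ES2018, so on engines that support them (current browsers and Node.js) the "first dot without a backslash in front" rule can be written directly as (?<!\\)\. instead of being emulated. The sketch below assumes such an engine; the function name decodeAddress is hypothetical. It keeps the same two-step idea as the answer above: no g flag for the # replacement, g flag for unescaping.

```javascript
// Sketch assuming ES2018 lookbehind support.
// Step 1: replace the first dot NOT preceded by a backslash with "#" (no g flag).
// Step 2: turn every remaining escaped dot (\.) back into a plain dot (g flag).
function decodeAddress(raw) {
  return raw
    .replace(/(?<!\\)\./, '#')
    .replace(/\\\./g, '.');
}

console.log(decodeAddress('name.domain.xx'));           // name#domain.xx
console.log(decodeAddress('name\\.surname.domain.xx')); // name.surname#domain.xx
```

Because the negative lookbehind rejects the escaped dot instead of consuming it, the single non-global replace() skips past it and lands on the first unescaped dot, which is exactly the behavior the question asks for.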

Related

Mixed results with White Spaces, and add a dash in Javascript?

How do you replace whitespace and special characters with only a single '-' character?
Here's a little background:
When I publish a job to my company's careers section, the ATS turns the job title into a URL slug. For example, the job title
Olympia, WA: SLP Full or Part Time
becomes
olympia-wa-slp-full-or-part-time
I've experimented with answers to other similar questions, but have only come close with this bit of code:
function newTitle(str) {
  var x = str.replace(/[\W]/g, '-').toLowerCase();
  return x;
}
Now if I run it, the output is olympia--wa--slp-full-or-part-time
(it has double dashes wherever a comma or colon is followed by a space). What am I not getting right?
I've also tried the following:
str.replace(/\s+/g, '');
and
str.replaceAll("[^a-zA-Z]+", " ");
but neither gets close to the desired format.
Thanks!

You were pretty close with your first example: just add + after [\W] so that it matches one or more consecutive non-word characters. You can also try it out in RegExr.
function newTitle(str) {
var x = str.replace(/[\W]+/g, '-').toLowerCase();
return x;
}
alert(newTitle('Olympia, WA: SLP Full or Part Time'));

It looks like what you actually want is to create a slug from a string.
Here is a nice reusable function that also takes care of multiple dashes:
function slugify(s) {
s = s.replace(/[^\w\s-]/g, '').trim().toLowerCase();
s = s.replace(/[-\s]+/g, '-');
return s;
}
console.log(
slugify("Olympia, WA: SLP Full or Part Time")
);

Your last example, [^a-zA-Z]+, almost works if you pass it as a regular expression with the g flag (replaceAll with a string argument searches for that literal text) and use a dash as the replacement. It uses a negated character class, which matches anything you did not list, so that includes whitespace and special characters.
Note that if a job title contains, for example, a digit or an underscore, that would also be replaced. You could add whatever you don't want replaced to the character class, like [^a-zA-Z0-9]+, or, if you also want to keep the underscore, use \W+, which is equivalent to [^a-zA-Z0-9_]+.
function newTitle(str) {
return str.replace(/[^a-zA-Z]+/g, '-').toLowerCase();
}
console.log(newTitle("Olympia, WA: SLP Full or Part Time"));
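Following the note about digits, here is a minimal sketch that uses [^a-zA-Z0-9]+ and also trims dashes left at the start or end when a title begins or ends with punctuation (for example a trailing "!" or ")"). The name newTitleKeepDigits and the second sample title are made up for illustration.

```javascript
// Variant of newTitle that keeps digits and strips dashes at either end.
function newTitleKeepDigits(str) {
  return str
    .replace(/[^a-zA-Z0-9]+/g, '-') // collapse each run of other characters into one dash
    .replace(/^-+|-+$/g, '')        // remove a dash produced by leading/trailing punctuation
    .toLowerCase();
}

console.log(newTitleKeepDigits('Olympia, WA: SLP Full or Part Time')); // olympia-wa-slp-full-or-part-time
console.log(newTitleKeepDigits('Seattle, WA: RN II (2nd Shift)!'));    // seattle-wa-rn-ii-2nd-shift
```

Without the second replace(), the second title would end in "-shift-" because ")!" is turned into a single dash.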

Finding characters with spaces

Last week I was trying to find the parts of a text that contain specific words and are delimited by punctuation characters. That works well:
[^.?!:]*\b(why|how)\b[^.?!]*[.?!]
On the sentence "How did you do it? bla bla bla! why did you do it?", it gives me the following output:
"How did you do it?"
"why did you do it?"
Now I am trying to add the hyphen character: I want to detect a hyphen with spaces around it (as a new sentence delimiter):
"The man went walking upstairs - why was he there?"
That should return: "why was he there?"
It should follow these rules:
hello - bye -> this would be the only one to be matched
hello-bye -> not matched
hello -bye -> not matched
hello- bye -> not matched
Using negation, I tried to add this part:
[^.?!:\\s\\-\\s] => ignore everything that ends with a "." or a "?" or a "!" or a ":" or a " - "
It doesn't work, but as I am pretty bad at regex, I am probably missing something obvious.
var regex = /[^.?!:\\s\\-\\s]*\b(why|how)\b[^.?!]*[.?!]/igm
var text = "Here I am - why did you want to see me?"
var match;
while ((match = regex.exec(text)) != null) {
console.log(match);
}
Output:
Here I am - why did you want to see me?
Expected output:
why did you want to see me?

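There are two issues that I see:
backslashes: use a single backslash inside a regex literal, and double them only when the pattern is a string passed to the RegExp constructor; and
a sequence is used inside a character class: [^.?!:\s\-\s] matches one character, not the " - " sequence, so replace it with (?:(?!\s-\s)[^.?!:])*.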
You may use
var regex = /(?:(?!\s-\s)[^.?!:])*\b((?:why|how)\b[^.?!]*)[.?!]/ig
where (?:(?!\s-\s)[^.?!:])* is a tempered greedy token: it matches any character other than ., ?, ! and : as long as that character does not start a whitespace, hyphen, whitespace sequence. The text you want is in capturing group 1.
var regex = /(?:(?!\s-\s)[^.?!:])*\b((?:why|where|pourquoi|how)\b[^.?!]*)[.?!]/ig;
var text = "L'Inde a déjà acheté nos rafales, pourquoi la France ne le -dirait-elle pas ?";
var match;
while ((match = regex.exec(text)) != null) {
console.log(match[1]);
}

[ ] is always a character class, which means that at one position it matches exactly one character. The "negation" in your example is probably not doing what you think it does.
What you probably want to match is either the beginning of the string, the end of a sentence, or a dash with a space on each side, so just replace it with (^|[.?!]| - )\b((why|how)...etc). You will need some post-processing of the result, since JavaScript did not support lookbehind assertions before ES2018.
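On an engine with ES2018 lookbehind support (which also allows variable-length lookbehind), the "start of text, end of a sentence, or a spaced dash" condition can be checked inside the regex, so the delimiter is not part of the match and no post-processing is needed. A sketch under that assumption; findQuestions is a hypothetical name, and ":" is included as a delimiter following the question's own pattern.

```javascript
// A sentence may start at the beginning of the text, after ., ?, ! or :
// (plus optional whitespace), or after " - ". The lookbehind checks this
// without consuming the delimiter.
const sentenceRe = /(?<=^|[.?!:]\s*|\s-\s)\b(?:why|how)\b[^.?!]*[.?!]/gi;

function findQuestions(text) {
  // String.prototype.match with a g regex returns all matches, or null
  return text.match(sentenceRe) || [];
}

console.log(findQuestions('Here I am - why did you want to see me?'));
// → ['why did you want to see me?']
console.log(findQuestions('How did you do it? bla bla bla! why did you do it?'));
// → ['How did you do it?', 'why did you do it?']
```

"hello-why", "hello -why" and "hello- why" are not matched, because none of them has whitespace on both sides of the hyphen.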

Given your 4 examples, this works.
/\s-\s(\w*)/g
Test it here - https://regex101.com/r/YQhRBI/1
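I'm matching any word characters right after the delimiter, so only the first word of the question is captured. If you want to match specific keywords, you'd swap the (\w*) with (why|how|who|what|where|when); square brackets would create a character class that matches a single character.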
I think if you had a paragraph, you'd have to be sure to find a way to terminate the answer portion with a specific delimiter. If this was more along the lines of a question/answer per new line, then you'd need only to end the regex with an end-of-line anchor.
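For the one-question-per-line case, a sketch of the end-of-line idea: with the m flag, $ matches at the end of every line, and since . does not match newlines, the captured question stops at the line end. The sample text is made up, and matchAll assumes an ES2020 engine.

```javascript
// Keyword alternation (not a character class) after a " - " delimiter,
// captured up to the end of the line.
const re = /\s-\s((?:why|how|who|what|where|when)\b.*)$/gim;
const text = 'The man went walking upstairs - why was he there?\nhello-bye\nhello - how are you?';

const questions = [...text.matchAll(re)].map(m => m[1]);
console.log(questions); // → ['why was he there?', 'how are you?']
```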

Regex with multiple start and end characters that must be the same

I would like to be able to search for strings inside a special tag in JavaScript code. Strings in JavaScript can start with either the " or the ' character.
Here is an example to illustrate what I want to do. My custom tag is called <my-tag>. My regex is /('|")*?<my-tag>((.|\n)[^"']*?)<\/my-tag>*?('|")/g. I use this regex pattern on the following strings:
var a = '<my-tag>Hello World</my-tag>'; //is found as expected
var b = "<my-tag>Hello World" + '</my-tag>'; //is NOT found, this is good!
var c = "<my-tag>Hello World</my-tag>"; //is found as expected
var d = '<my-tag>something "special"</my-tag>'; //here the " char causes a problem
var e = "<my-tag>something 'special'</my-tag>"; //here the ' char causes a problem
It works well with a and also c, where it finds the tag with the contained text. It also does not find the text in b, which is what I want. But in cases d and e the tag with its content is not found because of the " and ' characters. What I want is a regex where " is allowed inside the tag if the string starts with ', and vice versa.
Is it possible to achieve this with one regex, or is the only option to work with two separate regular expressions like
/(")*?<my-tag>((.|\n)[^']*?)<\/my-tag>*?(")/g and /(')*?<my-tag>((.|\n)[^"]*?)<\/my-tag>*?(')/g ?

It's not pretty, but I think this would work:
/("<my-tag>((.|\n)[^"]*?)<\/my-tag>"|'<my-tag>((.|\n)[^']*?)<\/my-tag>')/g

You should be able to capture the opening quote with ('|") and reuse it for the closing quote with the backreference \1. Something like the following:
/('|")<my-tag>.*?<\/my-tag>\1/g
This should make sure to match the same character at the beginning and the end.
But you really shouldn't use regex for parsing HTML.
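The backreference version can still match a concatenation such as '<my-tag>x' + '</my-tag>', because .*? is free to run past the closing quote. One way to tighten it, sketched below, is to combine \1 with a tempered token, (?:(?!\1).)*?, so the tag body may contain the other kind of quote but never the opening one. The variable names and the extra line f are illustrative assumptions, and matchAll assumes an ES2020 engine.

```javascript
// \1 forces the closing quote to equal the opening one; (?:(?!\1).)*?
// forbids that same quote inside the tag body.
const tagInString = /(['"])<my-tag>((?:(?!\1).)*?)<\/my-tag>\1/g;

const source = [
  `var a = '<my-tag>Hello World</my-tag>';`,
  `var b = "<my-tag>Hello World" + '</my-tag>';`,
  `var c = "<my-tag>Hello World</my-tag>";`,
  `var d = '<my-tag>something "special"</my-tag>';`,
  `var e = "<my-tag>something 'special'</my-tag>";`,
  `var f = '<my-tag>x' + '</my-tag>';`,
].join('\n');

// Group 2 holds the text between the tags.
const contents = [...source.matchAll(tagInString)].map(m => m[2]);
console.log(contents);
```

This finds the tag contents of a, c, d and e and skips b and f. The caveat above still applies: it works on simple cases, not as a general parser.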

Javascript Regex match everything after last occurrence of string

I am trying to match everything after (but not including!) the last occurrence of a string in JavaScript.
The search, for example, is:
[quote="user1"]this is the first quote[/quote]\n[quote="user2"]this is the 2nd quote and some url https://www.google.com/[/quote]\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.
Edit: I'm looking to match everything after the last quote block, so I was trying to match everything after the last occurrence of "quote]". I don't know if this is the best solution, but it's what I've been trying.
I'll be honest, I'm bad at regex. Here is what I've been trying, with the results:
regex = /(quote\].+)(.*)/ig; // Returns null
regex = /.+((quote\]).+)$/ig // Returns null
regex = /( .* (quote\]) .*)$/ig // Returns null
I have made a JSfiddle for anyone to have a play with here:
https://jsfiddle.net/au4bpk0e/

One option would be to match everything up until the last [/quote], and then capture anything following it.
/.*\[\/quote\](.*)$/i
This works since .* is greedy, so it matches everything up until the last \[\/quote\].
Based on the string you provided, this would be the first capturing group match:
\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.
But since your string contains new lines, and . doesn't match newlines, you could use [\s\S] in place of . in order to match anything.
/[\s\S]*\[\/quote\]([\s\S]*)$/i
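On ES2018+ engines there is also the s (dotAll) flag, which lets . match line terminators, so the first pattern works on multi-line text without the [\s\S] workaround. A sketch under that assumption; the function name and the fallback of returning the whole text when there is no [/quote] are my own choices.

```javascript
// The greedy .* (with s, it also crosses newlines) backtracks to the LAST [/quote];
// group 1 is everything after it.
function textAfterLastQuote(str) {
  const m = str.match(/.*\[\/quote\](.*)$/s);
  // Assumption: with no [/quote] at all, keep the whole text.
  return m ? m[1] : str;
}

const post = '[quote="user1"]first quote[/quote]\n[quote="user2"]second quote[/quote]\nMy own text.\n\nLook at me ma.';
console.log(JSON.stringify(textAfterLastQuote(post))); // "\nMy own text.\n\nLook at me ma."
```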
You could also avoid regex and use the .lastIndexOf() method along with .slice():
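var marker = '[/quote]';
var index = str.lastIndexOf(marker);
// lastIndexOf() returns -1 when there is no quote block; keep the whole text in that case
var textAfterLastQuote = index === -1 ? str : str.slice(index + marker.length);
document.getElementById('res').innerHTML = "Results: " + textAfterLastQuote;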
Alternatively, you could also use .split() and then get the last value in the array:
var textAfterLastQuote = str.split('[\/quote]').pop();
document.getElementById('res').innerHTML = "Results: " + textAfterLastQuote;

Remove everything after the first instance of one of several characters

Say I have a string like:
var str = "Good morningX Would you care for some tea?"
Where the X could be one of several characters, like a ., ?, or !.
How can I remove everything after that character?
If it could only be one type of character, I would use indexOf and substr, but it looks like I need a different method to find the position in this case. Perhaps a regular expression?
Clarification: I do not know what character X is. I'd like to cut the string off at the first occurrence of any one of the specified characters.
Ok, further clarification:
What I'm actually doing is scrubbing posts from a website. I'm taking the first bit from each post and stitching them together. By 'bit', I mean characters before the first piece of punctuation. I need to cut everything off after that punctuation. Does that make sense?

Just put your delimiters inside the [ and ], escaping them if necessary.
var str = "Good morning! Would you care for some tea?";
var beginning = str.split(/[.?!]/)[0];
// "Good morning"
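Applied to the workflow described in the question (taking the first bit of each post and stitching the bits together), the same split() call can be mapped over a list of posts. This is a sketch; the function name stitchFirstBits, the sample posts and the " ... " separator are made up.

```javascript
// Take the text before the first ., ? or ! of each post and join the pieces.
// A post without any of those characters is kept whole, since split() then
// returns the entire string as element 0.
function stitchFirstBits(posts) {
  return posts
    .map(post => post.split(/[.?!]/)[0].trim())
    .join(' ... ');
}

console.log(stitchFirstBits([
  'Good morning! Would you care for some tea?',
  'No thanks. I prefer coffee.',
  'Is it ready yet? Not sure',
]));
// Good morning ... No thanks ... Is it ready yet
```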

Try this: if X is the ',' character, use:
var s = 'Good morning, would you care for some tea?';
s = s.substring(0, s.indexOf(','));
document.write(s);
Demo : http://jsfiddle.net/L4hna/490/
and if X is '!', use:
var s = 'Good morning! would you care for some tea?';
s = s.substring(0, s.indexOf('!'));
document.write(s);
Demo : http://jsfiddle.net/L4hna/491/
Adapt this to the character in your own string.
Both return "Good morning".

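The code below will do what you expect:
var s = "Good morning! Would you care for some tea?";
// search() returns the index of the first ., ? or !, or -1 if there is none
var n = s.search(/[.?!]/);
s = s.substring(0, n != -1 ? n : s.length);
document.write(s);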

http://jsfiddle.net/JEFnY/
The regex would be
str.replace(/(.*?)([\.\?\!])(.*)/i, '$1$2');
The first capturing group is a lazy expression to match everything before the next capturing group.
The second capturing group only looks for the characters that you specify - which in this case are .!?, all escaped.
The last capturing group is discarded. Hence the substitution string is $1$2, or the first two capturing groups together.
