Finding characters with spaces - javascript

I was trying last week to find parts of a text containing specific words delimited by punctuation characters. That works well.
[^.?!:]*\b(why|how)\b[^.?!]*[.?!]
On the following sentence "How did you do it? bla bla bla! why did you do it?", it's giving me the following output :
"How did you do it?"
"why did you do it?"
Now I am trying to add the hyphen character: I want to detect a hyphen with spaces around it (a new sentence delimiter):
"The man went walking upstairs - why was he there?"
That would return me : "why was he there?"
It would follow the following rules:
hello - bye -> this would be the only one to be matched
hello-bye -> not matched
hello -bye -> not matched
hello- bye -> not matched
Using the negation, I tried to add that part :
[^.?!:\\s\\-\\s] => ignore everything that ends with a "." or a "?" or a "!" or a ":" or a " - "
It doesn't work, but as I am pretty bad at regex, I am probably missing something obvious.
var regex = /[^.?!:\\s\\-\\s]*\b(why|how)\b[^.?!]*[.?!]/igm
var text = "Here I am - why did you want to see me?"
var match;
while ((match = regex.exec(text)) != null) {
console.log(match);
}
Output :
Here I am - why did you want to see me?
Expected output :
why did you want to see me?

There are two issues that I see:
backslashes (use single backslashes inside a regex literal, double ones in a RegExp constructor string), and
a sequence is used inside a character class, which only ever matches one character (replace [^.?!:\s\-\s] with (?:(?!\s-\s)[^.?!:])*).
You may use
var regex = /(?:(?!\s-\s)[^.?!:])*\b((?:why|how)\b[^.?!]*)[.?!]/ig
where (?:(?!\s-\s)[^.?!:])* is a tempered greedy token matching any character other than ., ?, ! or : that does not start a whitespace + hyphen + whitespace sequence.
var regex = /(?:(?!\s-\s)[^.?!:])*\b((?:why|where|pourquoi|how)\b[^.?!]*)[.?!]/ig;
var text = "L'Inde a déjà acheté nos rafales, pourquoi la France ne le -dirait-elle pas ?";
var match;
while ((match = regex.exec(text)) != null) {
console.log(match[1]);
}
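To see how the tempered greedy token behaves on the inputs from the question, here is a small sketch. The helper name findQuestions is made up for illustration, and String.prototype.matchAll (ES2020) is assumed to be available; the regex is the one from the answer, restricted to why|how.

```javascript
// Hypothetical helper: collects every match of the answer's regex.
// "full" is the whole match, "fromKeyword" is capture group 1
// (the text from the keyword up to, but not including, the final punctuation).
function findQuestions(text) {
  const regex = /(?:(?!\s-\s)[^.?!:])*\b((?:why|how)\b[^.?!]*)[.?!]/gi;
  const results = [];
  for (const match of text.matchAll(regex)) {
    results.push({ full: match[0], fromKeyword: match[1] });
  }
  return results;
}

// " - " stops the leading part of the match, so the full match begins at the hyphen.
console.log(findQuestions("Here I am - why did you want to see me?"));
// A hyphen without surrounding spaces is not a delimiter, so the full match keeps "hello-".
console.log(findQuestions("hello-why not."));
```

Note that capture group 1 always starts at the keyword; the hyphen rule only changes where the full match begins.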

[ ] is always a character class, which means that at one position, you can match one character. The "negation" in your example is in fact probably not even doing what you think it does.
What you probably want to match is either the beginning of a string, the end of a sentence, or a dash with two spaces around it, so just replace it with (^|[.?!]| - )\b((why|how)...etc). You will need some post-processing of the result, as JavaScript did not support look-behind assertions before ES2018.
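In engines that implement ES2018 look-behind, the prefix can be asserted instead of consumed, which removes the post-processing step. The following is only a sketch under that assumption; the function name questionsAfterDelimiter is invented here, and the pattern requires the keyword to come directly after the delimiter, as in the suggestion above.

```javascript
// Assumes ES2018 look-behind support (for example a recent Node.js or browser).
// The look-behind accepts: start of string, sentence punctuation (plus optional
// whitespace), or a hyphen with whitespace on both sides.
function questionsAfterDelimiter(text) {
  const regex = /(?<=^\s*|[.?!]\s*|\s-\s)\b(?:why|how)\b[^.?!]*[.?!]/gi;
  return text.match(regex) || [];
}

console.log(questionsAfterDelimiter("Here I am - why did you want to see me?"));
console.log(questionsAfterDelimiter("hello-why not.")); // no delimiter before "why"
```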

Given your 4 examples, this works.
/\s-\s(\w*)/g
Test it here - https://regex101.com/r/YQhRBI/1
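I'm matching any run of word characters after the " - " delimiter. If you want to match specific keywords, you'd swap the (\w*) with (why|how|who|what|where|when). Note that square brackets around that list would create a character class matching a single letter rather than whole words.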
I think if you had a paragraph, you'd have to be sure to find a way to terminate the answer portion with a specific delimiter. If this was more along the lines of a question/answer per new line, then you'd need only to end the regex with an end-of-line anchor.

Related

How to add letters / words / characters to a special word in a string in javascript?

Let us consider we have a string str and a function addify() and we can do something like this with it :
var str = "I am #java";
console.log(addify(str, "script"));
//=> I am #javascript
So, you may understand what happened ! The addify() finds all the words with the special character # and then adds our desired words or letter or any character to it. Another example :
var str = "I wrote a #s in #javas";
console.log(addify(str, "cript"));
//=> I wrote a #script in #javascript
So, can anyone teach me how to make the addify() function ?
Thanks in advance
Find substring in Javascript and prepend/append some characters
StackOverflow to the rescue!
At the provided link you can find a regex example on how you can identify a special character within a provided string and then edit the result using the string .replace() method...
Quick breakdown of the regex: find the (\w+) word after the string # which is then represented as $1 as the second parameter in the string .replace() method where you can modify the string into a new format.
Bonus points: this will only find instances where the # is directly attached to a word. If your targeted identifier (#) stands alone, it is left untouched.
function addify( str, ending ){
return str.replace(/#(\w+)/g, `#$1${ending}`);
}
console.log( addify( 'i like #cheese', 'burgers' ) );
console.log( addify( 'party # my place', 'not!' ) );
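One caveat with building the replacement as a string: sequences such as $1 or $& inside the ending are interpreted as replacement patterns. A possible variant, sketched below, passes a replacer function instead, whose return value is inserted literally. The name addifySafe is an assumption made for this example.

```javascript
// Variant of addify() that uses a replacer function, so "$" sequences
// in the ending are inserted as-is instead of being treated as patterns.
function addifySafe(str, ending) {
  return str.replace(/#(\w+)/g, (match) => match + ending);
}

console.log(addifySafe("I am #java", "script"));
console.log(addifySafe("I wrote a #s in #javas", "cript"));
console.log(addifySafe("party # my place", "not!")); // lone # is left alone
```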

JS conditional RegEx that removes different parts of a string between two delimiters

I have a string of text with HTML line breaks. Some of the <br> immediately follow a number between two delimiters «...» and some do not.
Here's the string:
var str = ("«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>");
I’m looking for a conditional regex that’ll remove the number and delimiters (ex. «1») as well as the line break itself without removing all of the line breaks in the string.
So for instance, at the beginning of my example string, when the script encounters »<br> it’ll remove everything between and including the first « to the left, to »<br> (ex. «1»<br>). However it would not remove «2»some text<br>.
I’ve had some help removing the entire number/delimiters (ex. «1») using the following:
var regex = new RegExp(UsedKeys.join('|'), 'g');
var nextStr = str.replace(/«[^»]*»/g, " ");
I sure hope that makes sense.
Just to be super clear, when the string is rendered in a browser, I’d like to go from this…
«1»
«2»some text
«3»
«4»more text
«5»
«6»even more text
To this…
«2»some text
«4»more text
«6»even more text
Many thanks!
Maybe I'm missing a subtlety here, if so I apologize. But it seems that you can just replace with the regex: /«\d+»<br>/g. This will replace all occurrences of a number between « & » followed by <br>
var str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\d+»<br>/g, '')
console.log(newStr)
To match letters and digits you can use \w instead of \d
var str = "«a»<br>«b»some text<br>«hel»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\w+?»<br>/g, '')
console.log(newStr)
This snippet assumes that the input within the brackets will always be a number but I think it solves the problem you're trying to solve.
const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";
console.log(str.replace(/(«(\d+)»<br>)/g, ""));
/(«(\d+)»<br>)/g
«(\d+)» Will match any brackets containing 1 or more digits in a row
If you would prefer to match alphanumeric you could use «(\w+)» or for any characters including symbols you could use «([^»]+)»
<br> Will match a line break
/.../g The g flag matches globally so that it can find every instance of the substring
Basically we are only removing the bracketed numbers if they are immediately followed by a line break.
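Putting the pieces above together, a small sketch that accepts any content between the delimiters (the «([^»]+)» variant mentioned above) could look like this. The function name dropEmptyEntries is hypothetical.

```javascript
// Removes every «...» marker that is immediately followed by <br>,
// together with that <br>. Markers followed by text are kept.
function dropEmptyEntries(html) {
  return html.replace(/«[^»]+»<br>/g, "");
}

const input = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";
console.log(dropEmptyEntries(input));
```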

Javascript Regex match everything after last occurrence of string

I am trying to match everything after (but not including!) the last occurrence of a string in JavaScript.
The search, for example, is:
[quote="user1"]this is the first quote[/quote]\n[quote="user2"]this is the 2nd quote and some url https://www.google.com/[/quote]\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.
Edit: I'm looking to match everything after the last quote block. So I was trying to match everything after the last occurrence of "quote]". I don't know if this is the best solution, but it's what I've been trying.
I'll be honest, I suck at this regex stuff. Here is what I've been trying, with the results:
regex = /(quote\].+)(.*)/ig; // Returns null
regex = /.+((quote\]).+)$/ig // Returns null
regex = /( .* (quote\]) .*)$/ig // Returns null
I have made a JSfiddle for anyone to have a play with here:
https://jsfiddle.net/au4bpk0e/
One option would be to match everything up until the last [/quote], and then get anything following it. (example)
/.*\[\/quote\](.*)$/i
This works since .* is inherently greedy, so it will match everything up until the last \[\/quote\].
Based on the string you provided, this would be the first capturing group match:
\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.
But since your string contains new lines, and . doesn't match newlines, you could use [\s\S] in place of . in order to match anything.
Updated Example
/[\s\S]*\[\/quote\]([\s\S]*)$/i
You could also avoid regex and use the .lastIndexOf() method along with .slice():
Updated Example
var match = '[\/quote]';
var textAfterLastQuote = str.slice(str.lastIndexOf(match) + match.length);
document.getElementById('res').innerHTML = "Results: " + textAfterLastQuote;
Alternatively, you could also use .split() and then get the last value in the array:
Updated Example
var textAfterLastQuote = str.split('[\/quote]').pop();
document.getElementById('res').innerHTML = "Results: " + textAfterLastQuote;
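One detail worth handling in the .lastIndexOf() approach: when the marker is missing, lastIndexOf() returns -1, and slicing from -1 + marker.length would silently drop the first few characters. The sketch below returns the whole string in that case, which is also what .split().pop() does. The function name textAfterLast is an assumption for illustration.

```javascript
// Returns everything after the last occurrence of marker,
// or the whole string when the marker does not occur at all.
function textAfterLast(str, marker) {
  const index = str.lastIndexOf(marker);
  if (index === -1) return str;
  return str.slice(index + marker.length);
}

const sample = '[quote="user1"]first[/quote]\n[quote="user2"]second[/quote]\nMy own text.';
console.log(textAfterLast(sample, "[/quote]"));
```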

Remove everything after the first instance of one of several characters

Say I have a string like:
var str = "Good morningX Would you care for some tea?"
Where the X could be one of several characters, like a ., ?, or !.
How can I remove everything after that character?
If it could only be one type of character, I would use indexOf and substr, but it looks like I need a different method to find the position in this case. Perhaps a regular expression?
Clarification: I do not know what character X is. I'd like to cut the string off at the first occurrence of any one of the specified characters.
Ok, further clarification:
What I'm actually doing is scrubbing posts from a website. I'm taking the first bit from each post and stitching them together. By 'bit', I mean characters before the first piece of punctuation. I need to cut everything off after that punctuation. Does that make sense?
Just replace everything within the [ and ] with your delimiters. Escape if necessary.
var str = "Good morning! Would you care for some tea?";
var beginning = str.split(/[.?!]/)[0];
// "Good morning"
Try this. If X is the ',' character, then try the code below:
var s = 'Good morning, would you care for some tea?';
s = s.substring(0, s.indexOf(','));
document.write(s);
Demo : http://jsfiddle.net/L4hna/490/
And if X is '!', then try the code below:
var s = 'Good morning! would you care for some tea?';
s = s.substring(0, s.indexOf('!'));
document.write(s);
Demo : http://jsfiddle.net/L4hna/491/
Try it this way with your own string.
Both snippets return "Good morning".
The below code will do as you expect:
var s = "Good morningX Would you care for some tea?";
var n = s.indexOf("X"); // "X" stands in for the delimiter character
s = s.substring(0, n != -1 ? n : s.length);
document.write(s);
http://jsfiddle.net/JEFnY/
The regex would be
str.replace(/(.*?)([\.\?\!])(.*)/i, '$1$2');
The first capturing group is a lazy expression to match everything before the next capturing group.
The second capturing group only looks for the characters that you specify - which in this case are .!?, all escaped.
The last capturing group is discarded. Hence the substitution string is $1$2, or the first two capturing groups together.
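The approaches in this thread differ mainly in whether the punctuation mark itself is kept. A minimal sketch that supports both, using String.prototype.search to locate the first delimiter, might look like this. The function name cutAtFirstPunctuation and its keepPunctuation parameter are made up for the example, and the delimiter set is the ., ?, ! from the question.

```javascript
// Cuts the string at the first ., ? or !.
// With keepPunctuation = true the delimiter itself is kept.
// If no delimiter is found, the string is returned unchanged.
function cutAtFirstPunctuation(str, keepPunctuation = false) {
  const index = str.search(/[.?!]/);
  if (index === -1) return str;
  return str.slice(0, keepPunctuation ? index + 1 : index);
}

console.log(cutAtFirstPunctuation("Good morning! Would you care for some tea?"));
console.log(cutAtFirstPunctuation("Good morning! Would you care for some tea?", true));
```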

Need a regex that finds "string" but not "[string]"

I'm trying to build a regular expression that parses a string and skips things in brackets.
Something like
string = "A bc defg hi [hi] jkl mnop.";
The .match() should return "hi" but not [hi]. I've spent 5 hours running through RE's but I'm throwing in the towel.
Also this is for javascript or jquery if that matters.
Any help is appreciated. Also I'm working on getting my questions formatted correctly : )
EDIT:
OK, I just had a eureka moment and figured out that the original RegExp I was using actually did work. But when I was replacing the matches with the [matches], it simply replaced the first match in the string... over and over. I thought this was my regex refusing to skip the brackets, but after much time spent trying almost all of the solutions below, I realized that I was derping hardcore.
When .replace was working its magic it was on the first match, so I quite simply added a space to the end of the result word as follows:
var result = string.match(regex);
var modifiedResult = '[' + result[0].toString() + ']';
string = string.replace(result[0].toString() + ' ', modifiedResult + ' '); // strings are immutable, so keep the returned value
This got it to stop targeting the original word in the string and stop adding a new set of brackets to it with every match. Thank you all for your help. I am going to give answer credit to the post that prodded me in the right direction.
Preprocess the target string by removing everything between brackets before trying to match your RE:
string = "A bc defg hi [hi] jkl mnop."
tmpstring = string.replace(/\[.*\]/, "")
Then apply your RE to tmpstring.
Correction: made the match for brackets lazy (non-greedy) per nhahtd's comment below, and also made the RE global:
string = "A bc defg hi [hi] jkl mnop."
tmpstring = string.replace(/\[.*?\]/g, "")
You don't necessarily need regex for this. Simply use string manipulation:
var arr = string.split("[");
var final = arr[0] + arr[1].split("]")[1];
If there are multiple bracketed expressions, use a loop:
while (string.indexOf("[") != -1){
var arr = string.split("[");
string = arr[0] + arr.slice(1).join("[").split("]").slice(1).join("]");
}
Using only Regular Expressions, you can use:
hi(?!])
as an example.
Look here about negative lookahead: http://www.regular-expressions.info/lookaround.html
Unfortunately, JavaScript did not support negative lookbehind before ES2018.
I used http://regexpal.com/ to test, abcd[hi]jkhilmnop as test data, hi(?!]) as the regex to find. It matched 'hi' without matching '[hi]'. Basically it matched the 'hi' so long as there was not a following ']' character.
This of course, can be expanded if needed. This has a benefit of not requiring any pre-processing for the string.
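Where ES2018 lookbehind is available, the lookahead can be paired with a negative lookbehind so that a word is skipped when it has a bracket on either side. The sketch below applies that to the asker's goal of wrapping unbracketed matches in brackets. It is an assumption-laden illustration: the name bracketWord is invented, and the word is escaped before being placed in the pattern.

```javascript
// Wraps every standalone occurrence of word in [ ], skipping occurrences
// that already have "[" directly before or "]" directly after them.
// Assumes ES2018 look-behind support.
function bracketWord(text, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp("(?<!\\[)\\b" + escaped + "\\b(?!\\])", "g");
  return text.replace(pattern, (match) => "[" + match + "]");
}

console.log(bracketWord("A bc defg hi [hi] jkl mnop.", "hi"));
```

Because already bracketed words are skipped, running the function a second time on its own output leaves the string unchanged.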
r"\[(.*)\]"
Just play around with this if you want to use regular expressions.
What do you want to do with it? If you want to selectively replace parts like "hi" except when it's "[hi]", then I often use a system where I match what I want to avoid first and then what I want to match; if it matches what I want to avoid then I return the match unchanged, otherwise I return the processed match.
Like this:
return string.replace(/(\[\w+\])|(\w+)/g, function(all, m1, m2) {return m1 || m2.toUpperCase()});
which, with the given string, returns:
"A BC DEFG HI [hi] JKL MNOP."
Thus: it replaces every word with uppercase (m1 is empty), except if the word is between square brackets (m1 is not empty).
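The same "match what to avoid first" idea, wrapped in a function so it can be run directly. The function and parameter names are chosen here for readability and are not part of the answer.

```javascript
// The first alternative captures bracketed words (to be left alone),
// the second captures ordinary words (to be processed).
function upperCaseOutsideBrackets(text) {
  return text.replace(/(\[\w+\])|(\w+)/g, (all, bracketed, word) =>
    bracketed || word.toUpperCase()
  );
}

console.log(upperCaseOutsideBrackets("A bc defg hi [hi] jkl mnop."));
```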
This builds an array of all the strings contained in [ ]:
var regex = /\[([^\]]*)\]/g; // the g flag is required, otherwise exec() never advances and the loop does not end
var string = "A bc defg hi [hi] [jkl] mnop.";
var results=[], result;
while(result = regex.exec(string))
results.push(result[1]);
edit
To answer the question: this replace returns the string without anything inside [ ], and also drops the whitespace that follows each removed bracket:
"A bc defg [hi] mnop [jkl].".replace(/(\s{0,1})\[[^\]]*\](\s{0,1})/g,'$1')
Instead of skipping the match you can probably try something different - match everything but do not capture the string within square brackets (inclusive) with something like this:
var s = "A [bc] [defg] hi [hi] j[kl]u m[no]p."; // sample input
var r = /(?:\[.*?[^\[\]]\])|(.)/g;
var result;
var str = [];
while((result = r.exec(s)) !== null){
if(result[1] !== undefined){ // true only when a character outside [string] was captured
str.push(result[1]);
}
}
console.log(str.join(''));
The last line will print parts of the string which do not match the [string] pattern. For example, when called with the input "A [bc] [defg] hi [hi] j[kl]u m[no]p." the code prints "A hi ju mp." with whitespaces intact.
You can try different things with this code e.g. replacing etc.
