On JavaScript regex alternation and character sets - javascript

Maybe I'm stupid, but this problem really got me baffled.
var text = "aaa\naaa\naaa";
console.log("A: " + text.match(/(.|[\r\n])+/)[0]);
console.log("B: " + text.match(/[\r\n.]+/)[0]);
Output:
A: aaa
aaa
aaa
B:
I really don't see why. I think they should do the same thing (besides grouping stuff).
Another question which might be related:
I have a string read from a file created in a Windows system. I tried to match everything.
/[\n\r.]+/ matches the entire string.
/[\n\r.]+/g does not (I got a lot of '\r\n' in the returned array).
but both /[\s\S]+/ and /[\s\S]+/g match the entire string.
What's the problem?

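(.|[\r\n]) means: any character except a line terminator (.), or a carriage return or newline ([\r\n]). Together the two alternatives cover every character, so pattern A matches the whole string.
[\r\n.] means: a carriage return, a newline, or a literal dot. Inside a character class the dot loses its special meaning, so pattern B only finds the first run of line breaks or dots, which here is the single \n after the first aaa. That is also why /[\n\r.]+/g returns a lot of '\r\n' entries for a Windows file, while [\s\S] matches every character.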
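A minimal sketch of the difference, using the sample text from the question. The variable names are only illustrative, and the last line assumes an engine that supports the s (dotAll) flag from ES2018:

```javascript
// Sample text from the question: three lines separated by \n.
const text = "aaa\naaa\naaa";

// Alternation: "." covers everything except line terminators,
// [\r\n] covers the line breaks, so the whole text matches.
const viaAlternation = text.match(/(.|[\r\n])+/)[0];

// Character class: "." is a literal dot here, so the first match
// is just the first line break.
const viaClass = text.match(/[\r\n.]+/)[0];

// Two common ways to match "any character including newlines".
const viaSpaceClass = text.match(/[\s\S]+/)[0];
const viaDotAll = text.match(/.+/s)[0]; // "s" (dotAll) flag, ES2018+

console.log(JSON.stringify(viaAlternation)); // "aaa\naaa\naaa"
console.log(JSON.stringify(viaClass));       // "\n"
console.log(viaSpaceClass === text, viaDotAll === text); // true true
```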

Related

Split string into array but still keep its comma when applying join. - Javascript

I'm having a hard time remembering how to split a string while still keeping the comma in the string and splitting special cases as well.
Example of what I'm trying to do is this:
> Input: "Welcome, <p>how<b>are you</p>do-ing</b>?"
> Output: ["Welcome,", " ", "<p>", "how", "<b>", "are you", "</b>", "</p>", "do-ing", "</b>", "?"]
What I have tried:
var str = "Welcome, <p>how<b>are you</p>doing</b>?",
arr = str.split(/([,\s])/);
Unfortunately the only way I can think of to split the special cases is to replace them with commas before and after them, but all this does is cause problems when trying to keep the original comma. I've been scratching my head at this and I know it's right in front of me. I have looked all over for examples or answers, and I'm drawing a blank on what to search for.
Use .match instead of .split:
var str = "Welcome, <p>how<b>are you</p>doing</b>?",
arr = str.match(/<[^>]+>|[^,<]+,?/g);
console.log(arr);
The pattern <[^>]+>|[^,<]+,? means:
Alternate between
<[^>]+> - Match < followed eventually by a >, or
[^,<]+,? - Match characters other than , and <, optionally followed by a ,
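As a quick check of the pattern above, here is a small sketch that verifies the pieces join back into the original string. The variable names are illustrative; the second sample ("a,,b") is an assumption added here to show one limitation, namely that a comma with no text before it is not captured by either alternative:

```javascript
const str = "Welcome, <p>how<b>are you</p>doing</b>?";
const pattern = /<[^>]+>|[^,<]+,?/g;

// Every character of this sample ends up in exactly one piece,
// so joining the pieces restores the original string.
const parts = str.match(pattern);
console.log(parts);
console.log(parts.join("") === str); // true

// Limitation: the second of two consecutive commas has no text
// before it, so neither alternative matches it and it is dropped.
const lossy = "a,,b".match(pattern);
console.log(lossy); // ["a,", "b"]
```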

Is it possible to make 2 replace operations in 1 Regex expression?

I have text data like below
aaa
bbb
ccc
And I should return 'aaa','bbb','ccc'
Currently I achieve this in 2 steps:
Replace [\r\n]+ with ','
Replace ($|^) with '
Is it possible to do the same via only 1 step?
Ok, if this is just for curiosity, you can do this with a function replacement value. I would never suggest actually using it for this purpose, but this should do it:
const s = `aaa
bbb
ccc`
const s2 = s.replace(/$|^|\n/g, (s) => s === '\n' ? "','" : "'")
console.log(s2)
But of course, in some sense, I'm cheating, not really using the RegEx fundamentals here, only putting the logic in a callback function.
So, yes, this can be done.
But please don't use this anywhere.
(Oh, and I simplified to only checking for '\n'. Obviously you could extend it to your [\r\n]+ as you like.)
You may use a more robust solution with the same approach as Scott suggested in his answer, but my suggestion is based on capturing groups that will be used to decide which replacement logic to use rather than listing all possible chars you need to replace.
So, here is the slightly modified solution:
const s = `aaa
bbb
ccc`
const s2 = s.replace(/$|^|([\r\n]+)/g, ($0,$1) => $1 ? "','" : "'")
console.log(s2)
POIs:
/$|^|([\r\n]+)/g pattern matches multiple occurrences of start or end of string or matches and captures 1+ occurrences of CRs or LFs into Group 1
Now, with ($0,$1), we have access to the whole match and Group 1 ($1)
If Group 1 is not undefined, if it matched, we replace with ',', else, we replace with ' ($1 ? "','" : "'").
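A short sketch of the same callback applied to Windows line endings, which is the case the capturing group is meant to handle. The sample strings and variable names are assumptions for illustration only:

```javascript
const pattern = /$|^|([\r\n]+)/g;

// The second callback argument is Group 1: defined only when a run
// of CR/LF characters matched, undefined for the ^ and $ matches.
const quote = (text) =>
  text.replace(pattern, (match, lineBreak) => (lineBreak ? "','" : "'"));

const windowsText = "aaa\r\nbbb\r\nccc";
const quoted = quote(windowsText);
console.log(quoted); // 'aaa','bbb','ccc'

// A trailing line break is treated as a separator too, which leaves
// an empty quoted item at the end; trimming the input first avoids it.
const withTrailingBreak = quote("aaa\n");
console.log(withTrailingBreak); // 'aaa',''
```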
No. A single replace call takes only one replacement string.

Regex with multiple start and end characters that must be the same

I would like to be able to search for strings inside a special tag in a string in JavaScript. Strings in JavaScript can start with either " or ' character.
Here is an example to illustrate what I want to do. My custom tag is called <my-tag>. My regex is /('|")*?<my-tag>((.|\n)[^"']*?)<\/my-tag>*?('|")/g. I use this regex pattern on the following strings:
var a = '<my-tag>Hello World</my-tag>'; //is found as expected
var b = "<my-tag>Hello World" + '</my-tag>'; //is NOT found, this is good!
var c = "<my-tag>Hello World</my-tag>"; //is found as expected
var d = '<my-tag>something "special"</my-tag>'; //here the " char causes a problem
var e = "<my-tag>something 'special'</my-tag>"; //here the ' char causes a problem
It works well with a and c, where it finds the tag with the containing text. It also does not find the text in b, which is what I want. But in cases d and e the tag with its content is not found because of the " and ' characters. What I want is a regex where " is allowed inside the tag if the string starts with ', and vice versa.
Is it possible to achieve this with one regex, or is the only thing I can do is to work with two separate regex expressions like
/(")*?<my-tag>((.|\n)[^']*?)<\/my-tag>*?(")/g and /(')*?<my-tag>((.|\n)[^"]*?)<\/my-tag>*?(')/g ?
It's not pretty, but I think this would work:
/("<my-tag>((.|\n)[^"]*?)<\/my-tag>"|'<my-tag>((.|\n)[^']*?)<\/my-tag>')/g
You should be able to capture the opening quote with ('|") and reuse it at the end with a backreference (\1). Something like the following:
/('|")<my-tag>.*?<\/my-tag>\1/g
This should make sure to match the same character at the beginning and the end.
But you really shouldn't use regex for parsing HTML.
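To see the backreference at work, here is a sketch that runs the pattern against the five cases from the question. Each case is written as source text inside a template literal, the names rxSameQuote and samples are illustrative, and the g flag is dropped because test() on a global regex keeps state between calls:

```javascript
// \1 must repeat whatever quote character Group 1 captured.
const rxSameQuote = /('|")<my-tag>.*?<\/my-tag>\1/;

// The question's five cases, as the source text the regex would scan.
const samples = {
  a: `'<my-tag>Hello World</my-tag>'`,
  b: `"<my-tag>Hello World" + '</my-tag>'`,
  c: `"<my-tag>Hello World</my-tag>"`,
  d: `'<my-tag>something "special"</my-tag>'`,
  e: `"<my-tag>something 'special'</my-tag>"`,
};

const results = {};
for (const [name, source] of Object.entries(samples)) {
  results[name] = rxSameQuote.test(source);
}
console.log(results); // { a: true, b: false, c: true, d: true, e: true }
```

Note that . does not match newlines, so content spanning several lines would need [\s\S]*? in place of .*?.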

Javascript Regex match everything after last occurrence of string

I am trying to match everything after (but not including!) the last occurrence of a string in JavaScript.
The search, for example, is:
[quote="user1"]this is the first quote[/quote]\n[quote="user2"]this is the 2nd quote and some url https://www.google.com/[/quote]\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.
Edit: I'm looking to match everything after the last quote block. So I was trying to match everything after the last occurrence of "quote]". I don't know if this is the best solution, but it's what I've been trying.
I'll be honest, I suck at this regex stuff. Here is what I've been trying, with the results:
regex = /(quote\].+)(.*)/ig; // Returns null
regex = /.+((quote\]).+)$/ig // Returns null
regex = /( .* (quote\]) .*)$/ig // Returns null
I have made a JSfiddle for anyone to have a play with here:
https://jsfiddle.net/au4bpk0e/
One option would be to match everything up until the last [/quote], and then get anything following it. (example)
/.*\[\/quote\](.*)$/i
This works since .* is inherently greedy, and it will match everything up until the last \[\/quote\].
Based on the string you provided, this would be the first capturing group match:
\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.
But since your string contains new lines, and . doesn't match newlines, you could use [\s\S] in place of . in order to match anything.
Updated Example
/[\s\S]*\[\/quote\]([\s\S]*)$/i
You could also avoid regex and use the .lastIndexOf() method along with .slice():
Updated Example
var match = '[\/quote]';
var textAfterLastQuote = str.slice(str.lastIndexOf(match) + match.length);
document.getElementById('res').innerHTML = "Results: " + textAfterLastQuote;
Alternatively, you could also use .split() and then get the last value in the array:
Updated Example
var textAfterLastQuote = str.split('[\/quote]').pop();
document.getElementById('res').innerHTML = "Results: " + textAfterLastQuote;
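The three approaches can be compared side by side in a sketch that runs outside the browser. The sample string is a shortened stand-in for the one in the question, and afterLast is a hypothetical helper name; its guard for a missing marker is an addition, since lastIndexOf() returns -1 in that case and the plain slice() would then cut at the wrong position:

```javascript
const str =
  '[quote="user1"]first[/quote]\n' +
  '[quote="user2"]second https://www.google.com/[/quote]\n' +
  'My own text.\n\nLook at me ma.';
const marker = '[/quote]';

// Hypothetical helper: returns the text after the last marker,
// or the whole text when the marker does not occur at all.
function afterLast(text, marker) {
  const index = text.lastIndexOf(marker);
  return index === -1 ? text : text.slice(index + marker.length);
}

// Greedy [\s\S]* runs to the last [/quote]; Group 1 holds the rest.
const viaRegex = str.match(/[\s\S]*\[\/quote\]([\s\S]*)$/i)[1];
const viaSlice = afterLast(str, marker);
const viaSplit = str.split(marker).pop();

console.log(JSON.stringify(viaSlice)); // "\nMy own text.\n\nLook at me ma."
console.log(viaRegex === viaSlice && viaSplit === viaSlice); // true
```

When the marker is absent, the regex version returns null from match(), while the split() version returns the whole string.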

How not to match a certain regexp in javascript?

I want to check if a variable does not match this regexp:
DEMO
So this is the pattern that match the regexp in my code:
rxAfterPrint = new RegExp(/^ *\+ *("(?:[^"]*)"|(?:[a-zA-Z]\w*)) *(.*)$/);
and in this way I check for matching:
var t2 = t[2].match(rxAfterPrint);
and now I want to create a variable t3 that doesn't match this pattern.
How can I do this? can you please help me?
(Admittedly I have an unfair advantage because I know where this problem came from: How can I interpret strings in textarea with JavaScript/jQuery?)
So my guess is you want to implement String concatenation as part of a print statement as follows:
<string> ::= '"' <character>* '"' | <variable>
<print> ::= 'print' <string> ('+' <string>)*
<print> ::= 'print' (<string> '+')* <string>
The two <print> rules express the same thing. Using the 2nd version, you can first (after matching /^ *print */) try to apply the pattern rxConcat as many times as possible, and when it no longer matches, apply the 2nd expression rxStringValEOL to match the remainder (if that fails, it's an invalid statement):
rxConcat = new RegExp(/ *(?:"([^"]*)"|([a-zA-Z]\w*)) *\+ */);
rxStringValEOL = new RegExp(/ *(?:"([^"]*)"|([a-zA-Z]\w*)) *$/);
This also shows that it is pretty difficult to design a language that is easy for the programmers and for those who write the compilers.
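The two-step matching described in this answer could be sketched roughly as follows. The function name parsePrint and the shape of the returned objects are hypothetical, and the ^ anchors on rxConcat and rxStringValEOL are an assumption added here so that each pattern consumes input from the current position:

```javascript
const rxPrint = /^ *print */;
// Assumption: anchored with ^ so matching always starts at the front.
const rxConcat = /^ *(?:"([^"]*)"|([a-zA-Z]\w*)) *\+ */;
const rxStringValEOL = /^ *(?:"([^"]*)"|([a-zA-Z]\w*)) *$/;

// Group 1 is a string literal's content, Group 2 a variable name.
const toPart = (m) =>
  m[1] !== undefined
    ? { type: "string", value: m[1] }
    : { type: "variable", name: m[2] };

// Hypothetical helper: returns the operands, or null if invalid.
function parsePrint(line) {
  const head = line.match(rxPrint);
  if (!head) return null;
  let rest = line.slice(head[0].length);
  const parts = [];
  let m;
  // Consume "<string> +" as many times as possible.
  while ((m = rest.match(rxConcat))) {
    parts.push(toPart(m));
    rest = rest.slice(m[0].length);
  }
  // The remainder must be exactly one final <string>.
  m = rest.match(rxStringValEOL);
  if (!m) return null;
  parts.push(toPart(m));
  return parts;
}

console.log(parsePrint('print "a" + x + "b"')); // three operands
console.log(parsePrint('print "a" +'));         // null
```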
It's really unclear what you mean by "I want to create a variable that don't match this pattern". Since t2 is your match, it seems like you want t3 to be objects that don't match.
Because you're anchoring to the start of the string (^), this is a really great place to use a negative lookahead with an almost identical regex. Literally, all I did was surround it with (?! and ) and add .* at the end.
output1.value = input.value.match(/^(?! *\+ *("(?:[^"]*)"|(?:[a-zA-Z]\w*)) *(.*)).*$/gm).join("\r\n")
An alternative is to use replace() like so, but I would believe match() is the better option.
output2.value = input.value.replace(/(^ *\+ *("(?:[^"]*)"|(?:[a-zA-Z]\w*)) *(.*)$\s*)+/gm,"")
For both cases, I added the global and multiline to easily test several lines at once. If you're only testing one, remove both the g and the m, otherwise it could cause bugs by incorrectly telling you a string passed or failed when it didn't.
Demo: JSFiddle
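For a version that runs without the textarea elements, here is a sketch with made-up input lines. The first part shows a plain negated test() for a single string, which is a simpler technique than the lookahead and is added here as an alternative; the second part uses the lookahead pattern from this answer. The sample lines and variable names are assumptions:

```javascript
// Single string: negating test() is enough.
const rxAfterPrint = /^ *\+ *("(?:[^"]*)"|(?:[a-zA-Z]\w*)) *(.*)$/;
const doesNotMatch = (text) => !rxAfterPrint.test(text);

console.log(doesNotMatch('+ "hello"'));    // false
console.log(doesNotMatch("no plus here")); // true

// Several lines at once: the negative lookahead keeps only the
// lines that do NOT start with the original pattern.
const rxNotAfterPrint = /^(?! *\+ *("(?:[^"]*)"|(?:[a-zA-Z]\w*)) *(.*)).*$/gm;
const input = ['+ "hello" rest', "no plus here", "  + name", "+ 123"].join("\n");

// match() returns null when nothing matches, so fall back to [].
const nonMatchingLines = input.match(rxNotAfterPrint) || [];
console.log(nonMatchingLines); // ["no plus here", "+ 123"]
```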
