How can I use regex to match a string without double characters - javascript

I've got a block of text that I want to run some regex on in JavaScript, to match [code]. I know I can use /\[code\]/g to do this.
However, I want to ignore cases where double brackets are used, as in [[code]]. In other words, in the string [code] [[code]] [code], only the first and last occurrences should match.
Is this possible?
http://regexr.com/395kr

Older JS engines do not support negative lookbehind assertions (they were added in ES2018), but a negative lookahead seems to be enough in your case:
'[code] [[code]] [code]'.match(/\[code\](?!\])/g)
This regex ensures that the character right after a matched [code] is not a ].
UPD:
It could be improved to
'[code] [[code]] [code]'.match(/\[(?!\[)code\](?!\])/g)
thanks to Felix Kling.
A note: it will behave oddly with unpaired brackets. For example, in [[code] the inner [code] still matches.
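For engines that do support lookbehind (ES2018 and later), a stricter variant can also reject a preceding [. The following is only a sketch under that assumption; the sample string and the variable names are made up for illustration:

```javascript
// Assumes an ES2018+ engine, where lookbehind (?<!...) is available.
var text = '[code] [[code]] [code] [[code] [code]]';

// Lookahead only: still accepts the inner [code] of the unpaired "[[code]".
var lookaheadOnly = text.match(/\[(?!\[)code\](?!\])/g);

// Lookbehind plus lookahead: rejects a "[" before and a "]" after.
var strict = text.match(/(?<!\[)\[code\](?!\])/g);

console.log(lookaheadOnly.length); // 3
console.log(strict.length);        // 2
```

The lookbehind version only accepts the two stand-alone [code] occurrences, while the lookahead-only version also accepts the one that follows an unpaired [.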

You can use filter() for this:
var data = "[code] [[code]] [code]";
data = data.match(/\[+code\]+/g) // gives us ["[code]", "[[code]]", "[code]"]
    .filter(function(x) { return !x.match(/\[\[code\]\]/); });
// data is now ["[code]", "[code]"]
Felix's method also works nicely.

Related

Search for a string with a point in JavaScript

When I try to search for a string containing a dot (e.g. '1.'), JS also matches substrings that have a comma in place of the dot. An example shows it best:
'1,'.search('1.'); // 0
'xxx1,xxx'.search('1.'); // 3
// Normal behaviour
'1.'.search('1,'); // -1
Does anyone know why JavaScript behaves this way?
Is there a way to search for exactly the string that was passed?
Per the docs:
The search() method executes a search for a match between a regular expression and this String object.
. has a special meaning in Regular Expressions. You need to escape the . before matching it. Try the following:
console.log('xxx1,xxx'.search('1\\.'));
Use indexOf().
let str = "abc1,231.4";
console.log(str.indexOf("1."));
The indexOf() method should work fine in this case:
'1,'.indexOf('1.');
The code above returns -1.
String.search takes a regex as its parameter (a string argument is converted to one).
In a regex, . matches any character; you have to escape it with a double backslash (\\.) when the pattern is written as a string literal:
console.log('1,'.search('1\\.'));
console.log('xxx1,xxx'.search('1\\.'));
console.log('xxx1.xxx'.search('1\\.'));
console.log('1.'.search('1,'));
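If the search text comes from a variable and a regex is still needed (for example, to combine it with other pattern parts), one common approach is to escape every regex metacharacter first. A minimal sketch, where escapeRegExp is a hypothetical helper name and not a built-in:

```javascript
// Hypothetical helper: backslash-escape the characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var needle = new RegExp(escapeRegExp('1.'));

console.log('xxx1,xxx'.search(needle)); // -1, the comma no longer matches
console.log('xxx1.xxx'.search(needle)); // 3, the literal dot does
```

When no regex features are needed at all, indexOf() as suggested above remains the simpler choice.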

Why would the replace with regex not work even though the regex does?

There may be a very simple answer to this, probably because of my familiarity (or possibly lack thereof) with the replace method and how it works with regex.
Let's say I have the following string: abcdefHellowxyz
I just want to strip the first six characters and the last four, to return Hello, using regex... Yes, I know there may be other ways, but I'm trying to explore the boundaries of what these methods are capable of doing...
Anyway, I've tinkered on http://regex101.com and got the following Regex worked out:
/^(.{6}).+(.{4})$/
Which seems to parse the string well and shows that abcdef is captured as group 1, and wxyz is captured as group 2. But when I try to run the following:
"abcdefHellowxyz".replace(/^(.{6}).+(.{4})$/,"")
to replace those captured groups with "" I receive an empty string as my final output... Am I doing something wrong with this syntax? And if so, how does one correct it, keeping my original stance on wanting to use Regex in this manner...
Thanks so much everyone in advance...
The code below does what you want:
"abcdefHellowxyz".replace(/^.{6}(.+).{4}$/,"$1")
Use () to capture only the text you want to keep; in the second parameter of replace(), $1, $2, ... stand for group 1, group 2, and so on.
You can also pass a function as the second parameter of replace() and transform the captured text into whatever you want inside it.
For more detail, as @Akxe recommends, see the documentation at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace.
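As a rough illustration of the function form, the sketch below upper-cases the captured middle part. The variable names and the choice of toUpperCase() are only examples:

```javascript
var input = "abcdefHellowxyz";

// The callback receives the whole match first, then each captured group.
var result = input.replace(/^.{6}(.+).{4}$/, function (match, middle) {
  return middle.toUpperCase();
});

console.log(result); // HELLO
```

Whatever the function returns replaces the entire match, which here is the whole string.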
You are replacing any substring that matches /^(.{6}).+(.{4})$/, with this line of code:
"abcdefHellowxyz".replace(/^(.{6}).+(.{4})$/,"")
The regex matches the whole string "abcdefHellowxyz"; thus, the whole string is replaced. Instead, if you are strictly stripping by the lengths of the extraneous substrings, you could simply use substring or slice.
Edit
The answer you're probably looking for is capturing the middle token, instead of the outer ones:
var str = "abcdefHellowxyz";
var matches = str.match(/^.{6}(.+).{4}$/);
str = matches[1]; // index 0 is entire match
console.log(str);
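For comparison, the length-based alternative mentioned earlier in this answer needs no regex at all. A small sketch, assuming the prefix is always six characters and the suffix always four:

```javascript
var str = "abcdefHellowxyz";

// slice(6, -4): start after the first six characters, stop four before the end.
var middle = str.slice(6, -4);

// Equivalent call using substring, which does not accept negative indexes.
var sameMiddle = str.substring(6, str.length - 4);

console.log(middle);     // Hello
console.log(sameMiddle); // Hello
```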

Why does this regexp return a match?

http://jsfiddle.net/sqee98xr/
var reg = /^(?!managed).+\.coffee$/
var match = '20150212214712-test-managed.coffee'.match(reg)
console.log(match) // logs a match array containing '20150212214712-test-managed.coffee'
I want the regexp to match only if the word "managed" is not present anywhere in the string. How can I do that?
Negative lookaheads are weird. You have to match more than just the word you are looking for. It's weird, I know.
var reg = /^(?!.*managed).+\.coffee$/
http://jsfiddle.net/sqee98xr/3/
EDIT: It seems I really got under some people's skin with the "weird" descriptor and lay description. It's weird because on a surface level the term "negative lookahead" implies "look ahead and make sure the stuff in these parentheses isn't up there, then come back and continue matching". As a lover of regex, I still proclaim this naming is weird, especially to first-time users of the assertion. To me it's easier to think of it as a "not" operator as opposed to something which actually crawls forward and "looks ahead". In order to get behavior that resembles an actual "look ahead", you have to match everything before the search term, hence the .*.
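Note that simply removing the start-of-string (^) assertion does not work: the lookahead is then retried at every position, so the pattern below still matches the test string, and it even matches "managed.coffee" starting from its second character. Again, to me it's easier to read ?! as "not".
var reg = /(?!managed).+\.coffee$/ // too permissive, shown only for comparison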
While @RyanWheale's solution is correct, the explanation isn't. The reason, essentially, is that a string that contains the word "managed" (such as "test-managed") can still count as not "managed". To see why, first let's look at the regular expression:
/^(?!managed).+\.coffee$/
// (Not "managed")(one or more characters)(".")("coffee")
So first the string must not start with the text "managed", then we can have one or more characters, then a dot, followed by the text "coffee". Here is an example that fulfills this:
"Hello.coffee" [ PASS ]
Makes sense, "Hello" certainly is not "managed". Here is another example that works from your string:
"20150212214712-test-managed.coffee" [ PASS ]
Why? Because "20150212214712-test-managed" is not the string "managed", even though it contains it; the computer does not know that's what you mean. It treats "20150212214712-test-managed" as a string that isn't "managed", in the same way that "andflaksfj" isn't "managed". So the only way it fails is if "managed" is at the start of the string:
"managed.coffee" [ FAIL ]
This isn't just because the text "managed" is there. Suppose the computer decided that "managed." was not "managed". That would indeed pass the (?!managed) part, but the rest of the string would just be "coffee", and the match would fail because there is no "." left.
Finally the solution to this is as suggested by the other answer:
/^(?!.*managed).+\.coffee$/
Now the string "20150212214712-test-managed.coffee" fails, because the .* inside the lookahead lets it skip any prefix ("20150212214712-test-" here) and still find "managed", so the assertion rejects the string no matter where the word appears. A string without "managed" passes the lookahead and is then checked against the rest of the regexp ( .+\.coffee$ ) as before.
Hopefully this long explanation shows that negative lookaheads are not weird; they just take your request very literally.
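The difference between the two patterns can be checked directly. The sketch below runs both against the three file names discussed above; the variable names are made up for illustration:

```javascript
var names = [
  'Hello.coffee',
  '20150212214712-test-managed.coffee',
  'managed.coffee'
];

var leadingOnly = /^(?!managed).+\.coffee$/; // rejects "managed" only at the start
var anywhere = /^(?!.*managed).+\.coffee$/;  // rejects "managed" at any position

var results = names.map(function (name) {
  return {
    name: name,
    leadingOnly: leadingOnly.test(name),
    anywhere: anywhere.test(name)
  };
});

console.log(results);
// Hello.coffee:                        leadingOnly true,  anywhere true
// 20150212214712-test-managed.coffee:  leadingOnly true,  anywhere false
// managed.coffee:                      leadingOnly false, anywhere false
```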

JavaScript regexp lets undesirable characters through

I'm using a RegExp in my project, but somehow I'm getting some undesirable characters.
My RegExp looks like this:
new RegExp("[א-ת,A-z,',','(',')','.','-',''']");
It is supposed to reject characters like \ or [],
but let me use one or more of (, ), -, letters, etc.
Unfortunately, that doesn't happen.
Which pattern handles both the desirable and the undesirable characters?
Thanks for your help.
Well your regular expression just says to match one "good" character (and incorrectly at that).
I think something closer to this would be what you want, though I'm not sure about the higher-page Unicode characters:
var regexp = /^[א-תA-Za-z,()\-']*$/;
If the alefbet part doesn't work (it looks backwards to me, but I guess that's kind of a conundrum :-), try:
var regexp = /^[\u05D0-\u05EAA-Za-z,()\-']*$/;
Might be good to tack an "i" (ignore case) modifier on the end too:
var regexp = /^[\u05D0-\u05EAA-Za-z,()\-']*$/i;
This also does not handle the various diacritical marks; I don't know if you need those matched or not.
First of all, you don't need all those single quotes and commas. Second, you want A-Za-z, not A-z. The latter also includes the ASCII characters between "Z" and "a" ([, \, ], ^, _ and `).
var re = new RegExp("[א-תA-Za-z,().'\\s-]"); // in a string literal, \s must be written as \\s
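To check a whole input rather than a single character, the class has to be anchored and repeated, as the first answer points out. A sketch that combines both answers, assuming whitespace and the period should be allowed; isValid is a hypothetical helper name:

```javascript
// Anchored with ^ and $ so every character must belong to the class.
// \u05D0-\u05EA covers the Hebrew letters alef through tav.
var allowed = /^[\u05D0-\u05EAA-Za-z,().'\s-]*$/;

// Hypothetical helper for illustration.
function isValid(text) {
  return allowed.test(text);
}

// "\u05E9\u05DC\u05D5\u05DD" is the Hebrew word shalom, written with escapes.
console.log(isValid('\u05E9\u05DC\u05D5\u05DD (abc), d-e.')); // true
console.log(isValid('a\\b'));   // false, contains a backslash
console.log(isValid('[abc]'));  // false, contains square brackets

// Why A-z is too wide: it also covers [ \ ] ^ _ and the backtick.
console.log(/[A-z]/.test('\\')); // true
```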

Regex to match all instances not inside quotes

From this q/a, I deduced that matching all instances of a given regex not inside quotes, is impossible. That is, it can't match escaped quotes (ex: "this whole \"match\" should be taken"). If there is a way to do it that I don't know about, that would solve my problem.
If not, however, I'd like to know if there is any efficient alternative that could be used in JavaScript. I've thought about it a bit, but can't come with any elegant solutions that would work in most, if not all, cases.
Specifically, I just need the alternative to work with .split() and .replace() methods, but if it could be more generalized, that would be the best.
For Example:
An input string of: +bar+baz"not+or\"+or+\"this+"foo+bar+
replacing + with #, not inside quotes, would return: #bar#baz"not+or\"+or+\"this+"foo#bar#
Actually, you can match all instances of a regex not inside quotes for any string in which each opening quote is closed again. Say, as in your example above, you want to match \+.
The key observation here is that a word is outside quotes if there is an even number of quotes following it. This can be modeled as a look-ahead assertion:
\+(?=([^"]*"[^"]*")*[^"]*$)
Now, you'd like to not count escaped quotes. This gets a little more complicated. Instead of [^"]*, which advances to the next quote, you need to consider backslashes as well and use [^"\\]*. After you arrive at either a backslash or a quote, you need to skip the next character if you hit a backslash, or else advance to the next unescaped quote. That looks like (\\.|"([^"\\]*\\.)*[^"\\]*"). Combined, you arrive at
\+(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)
I admit it is a little cryptic. =)
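Both look-ahead patterns can be tried with replace(). The sketch below is only an illustration: the first sample string is made up, the second is the example from the question, written with String.raw so the backslashes stay literal.

```javascript
// Simple version: assumes quotes are balanced and never escaped.
var outsideQuotes = /\+(?=([^"]*"[^"]*")*[^"]*$)/g;
var simple = '+a"b+c"d+'.replace(outsideQuotes, '#');
console.log(simple); // #a"b+c"d#

// Escape-aware version, run on the example from the question.
var outsideQuotesEscaped = /\+(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)/g;
var input = String.raw`+bar+baz"not+or\"+or+\"this+"foo+bar+`;
var escaped = input.replace(outsideQuotesEscaped, '#');
console.log(escaped); // #bar#baz"not+or\"+or+\"this+"foo#bar#
```

Because the look-ahead rescans the rest of the string for every +, this approach may become slow on long inputs.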
Azmisov, resurrecting this question because you said you were looking for any efficient alternative that could be used in JavaScript and any elegant solutions that would work in most, if not all, cases.
There happens to be a simple, general solution that wasn't mentioned.
Compared with alternatives, the regex for this solution is amazingly simple:
"[^"]+"|(\+)
The idea is that we match but ignore anything within quotes to neutralize that content (on the left side of the alternation). On the right side, we capture all the + that were not neutralized into Group 1, and the replace function examines Group 1. Here is full working code:
<script>
var subject = '+bar+baz"not+these+"foo+bar+';
var regex = /"[^"]+"|(\+)/g;
var replaced = subject.replace(regex, function(m, group1) {
    // group1 is only set when a + outside quotes was matched
    if (!group1) return m;
    else return "#";
});
document.write(replaced);
</script>
You can use the same principle to match or split. See the question and article in the reference, which will also point you to code samples.
Hope this gives you a different idea of a very general way to do this. :)
What about Empty Strings?
The above is a general answer to showcase the technique. It can be tweaked depending on your exact needs. If you worry that your text might contain empty strings, just change the quantifier inside the string-capture expression from + to *:
"[^"]*"|(\+)
What about Escaped Quotes?
Again, the above is a general answer to showcase the technique. Not only can the "ignore this match" regex be refined to your needs, you can also add multiple expressions to ignore. For instance, if you want to make sure escaped quotes are adequately ignored, you can start by adding an alternation \\"| in front of the other two in order to match (and ignore) straggling escaped double quotes.
Next, within the section "[^"]*" that captures the content of double-quoted strings, you can add an alternation to ensure escaped double quotes are matched before their " has a chance to turn into a closing sentinel, turning it into "(?:\\"|[^"])*"
The resulting expression has three branches:
\\" to match and ignore
"(?:\\"|[^"])*" to match and ignore
(\+) to match, capture and handle
Note that in other regex flavors, we could do this job more easily with lookbehind, but JS did not support it until ES2018.
The full regex becomes:
\\"|"(?:\\"|[^"])*"|(\+)
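Putting the three branches together, a sketch of the replacement could look like this. The sample string is the one from the question, written with String.raw so the backslashes stay literal, and the helper name is hypothetical:

```javascript
// Hypothetical helper: replace "+" with "#" outside double-quoted strings.
function replacePlusOutsideQuotes(text) {
  var regex = /\\"|"(?:\\"|[^"])*"|(\+)/g;
  return text.replace(regex, function (m, group1) {
    // group1 is undefined when one of the two "ignore" branches matched
    return group1 === undefined ? m : "#";
  });
}

var subject = String.raw`+bar+baz"not+or\"+or+\"this+"foo+bar+`;
var replaced = replacePlusOutsideQuotes(subject);
console.log(replaced); // #bar#baz"not+or\"+or+\"this+"foo#bar#
```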
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
You can do it in three steps:
1. Use a regex global replace to extract all string body contents into a side-table.
2. Do your comma translation.
3. Use a regex global replace to swap the string bodies back.
Code below:
// Step 1
var sideTable = [];
myString = myString.replace(
    /"(?:[^"\\]|\\.)*"/g,
    function (_) {
        var index = sideTable.length;
        sideTable[index] = _;
        return '"' + index + '"';
    });
// Step 2, replace commas with newlines
myString = myString.replace(/,/g, "\n");
// Step 3, swap the string bodies back
myString = myString.replace(/"(\d+)"/g,
    function (_, index) {
        return sideTable[index];
    });
If you run that after setting
myString = '{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}';
you should get
{:a "ab,cd, efg"
:b "ab,def, egf,"
:c "Conjecture"}
It works, because after step 1,
myString = '{:a "0", :b "1", :c "2"}'
sideTable = ['"ab,cd, efg"', '"ab,def, egf,"', '"Conjecture"']; // each entry keeps its quotes
so the only commas in myString are outside strings. Step 2, then turns commas into newlines:
myString = '{:a "0"\n :b "1"\n :c "2"}'
Finally we replace the strings that only contain numbers with their original content.
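The three steps can be wrapped in one function so that any transformation is applied only outside the strings. This is a sketch under the same assumptions as the code above (balanced double quotes, backslash escapes); transformOutsideStrings and its transform parameter are hypothetical names:

```javascript
// Hypothetical wrapper: `transform` only ever sees the text outside
// double-quoted strings, plus numeric placeholders such as "0".
function transformOutsideStrings(text, transform) {
  var sideTable = [];
  // Step 1: park every string literal in the side table.
  var stripped = text.replace(/"(?:[^"\\]|\\.)*"/g, function (literal) {
    sideTable.push(literal);
    return '"' + (sideTable.length - 1) + '"';
  });
  // Step 2: transform what is left.
  var transformed = transform(stripped);
  // Step 3: put the parked literals back.
  return transformed.replace(/"(\d+)"/g, function (_, index) {
    return sideTable[index];
  });
}

var commas = transformOutsideStrings(
  '{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}',
  function (s) { return s.replace(/,/g, "\n"); }
);
console.log(commas);

var pluses = transformOutsideStrings(
  String.raw`+bar+baz"not+or\"+or+\"this+"foo+bar+`,
  function (s) { return s.replace(/\+/g, "#"); }
);
console.log(pluses); // #bar#baz"not+or\"+or+\"this+"foo#bar#
```

The transformation must leave the quoted numeric placeholders intact, otherwise step 3 cannot restore the original strings.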
Although the answer by zx81 seems to be the cleanest and best-performing one, it needs these fixes to correctly handle escaped quotes:
var subject = '+bar+baz"not+or\\"+or+\\"this+"foo+bar+';
and
var regex = /"(?:[^"\\]|\\.)*"|(\+)/g;
Also, the already mentioned check, "group1 === undefined" or "!group1", is needed.
The second fix in particular seems important in order to cover everything asked in the original question.
It should be mentioned, though, that this method implicitly requires the string to have no escaped quotes outside of unescaped quote pairs.
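Since the question also asks about .split(), the same corrected regex can drive a split by cutting the string only where the capture group matched. A sketch with a hypothetical helper name, under the same restriction on escaped quotes outside quote pairs:

```javascript
// Hypothetical helper: split on "+" characters that are outside double quotes.
function splitOutsideQuotes(text) {
  var regex = /"(?:[^"\\]|\\.)*"|(\+)/g;
  var parts = [];
  var last = 0;
  var m;
  while ((m = regex.exec(text)) !== null) {
    if (m[1] !== undefined) {
      // An unquoted "+": cut here and skip over the separator.
      parts.push(text.slice(last, m.index));
      last = m.index + m[0].length;
    }
    // Quoted strings are matched and skipped, so their "+" are never cut.
  }
  parts.push(text.slice(last));
  return parts;
}

var plainParts = splitOutsideQuotes('a+"b+c"+d');
console.log(plainParts); // [ 'a', '"b+c"', 'd' ]

var escapedParts = splitOutsideQuotes(String.raw`x+"a\"+b"+y`);
console.log(escapedParts);
```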
