Escape a white space character using Javascript - javascript

I have the following jQuery statement, and I want to escape certain characters in the selector. For example, if I have a word like #Operating/System, I would like the end result to be #Operating\/System (i.e. with an escape sequence).
But if I have #Operating/System test, then I want #Operating\/System followed by an escape sequence for the space. The .replace(/ /,'') part is incorrect, but .replace("/","\\/") works well for my requirements.
Please help!
$("#word" + lbl.eq(i).text().replace("/","\\/").replace(/ /,'')).hide();

$( "#word" + lbl.eq(i).text().replace(/([ /])/g, '\\$1') ).hide();
This matches all spaces and slashes in a string (and saves the respective char in group $1):
/([ /])/g
replacement with
'\\$1'
means a backslash plus the original char in group $1.
"#Operating/System test".replace(/([ /])/g, '\\$1');
-->
"#Operating\/System\ test"
Side advantage: there is only a single call to replace().
EDIT: As requested by the OP, a short explanation of the regular expression /([ /])/g. It breaks down as follows:
/ # start of regex literal
( # start of match group $1
[ /] # a character class (spaces and slashes)
) # end of group $1
/g # end of regex literal + "global" modifier
When used with replace() as above, all spaces and slashes are replaced with themselves, preceded by a backslash.
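As a rough sketch of the same idea outside jQuery, the replacement can be wrapped in a small helper. The name escapeSpacesAndSlashes is made up for this illustration and is not part of jQuery or the original code. The last lines show why a string pattern such as "/" only handles the first slash, whereas the /g regex handles all of them.

```javascript
// Hypothetical helper (name invented for this example):
// prefix every space and every slash with a backslash.
function escapeSpacesAndSlashes(text) {
  return text.replace(/([ /])/g, '\\$1');
}

const one = escapeSpacesAndSlashes('#Operating/System');
const two = escapeSpacesAndSlashes('#Operating/System test');
console.log(one); // #Operating\/System
console.log(two); // #Operating\/System\ test

// A string pattern (instead of a /g regex) replaces only the first occurrence.
const firstOnly = 'a/b/c'.replace('/', '\\/');
console.log(firstOnly); // a\/b/c
```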

Related

Regular Expression to match text between # and only if # is not preceded by '

Hello I'm trying to find a regular expression that can help me find all matches inside a string when they're inside # and only if # are not preceded by an apostrophe "'".
Basically I need to bold the text just as here when we use double * to bold text like this, but the apostrophe should work as an escape character.
For example
#Hello my name is Noé# should look like Hello my name is Noé
#Hello this has an escape apostrophe '# so I'll match until here# should look like Hello this has an escape apostrophe '# so I'll match until here
Inside a long text there might or might not be several matches:
"Hello I'm a text #I'm bold#, and I need to know how to match my text that's inside two '#, and #I will not match either 'cause I got no end"
So I can print it like
"Hello I'm a text I'm bold, and I need to know how to match my text that's inside two '#, and #I will not match either 'cause I got no end"
If that's not possible with a RegExp I could program a finite state machine, but I was hoping it was possible. Thank you in advance, God bless you!
Note: I will handle the escape characters later; for now I just need to know how to match this
/(?<!')#.*(?<!')#/gim
This was the only thing I could come up with, but honestly, I have no idea how negative look behind works :(, with this regexp it would match wrong. For example, if I type:
"I'm a text #and I should be a match# and this should not #But this should as well# and I'm just some random extra text"
matches from the first # occurrence until the last one, like so:
"I'm a text #and I should be a match# and this should not #But this should as well# and I'm just some random extra text"
I think this should work:
(?<!')#(.*?)(?<!')#
Here you can see the regexp working with your examples: https://regex101.com/r/wnguiA/1
(?<!') is Negative Lookbehind, it tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a b that is not preceded by an a.
Simpler is the (.*?), which matches any character (except for line terminators); the ? makes the quantifier non-greedy, so it stops at the first occurrence of the token that follows.
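The pattern above can be tried directly in JavaScript. This is a minimal sketch, assuming an engine with lookbehind support (for example a recent Node.js or browser); the helper name boldHashes and the choice of <b> tags as the "bold" output are assumptions made for the illustration.

```javascript
// Assumes a JavaScript engine that supports lookbehind assertions.
const hashPair = /(?<!')#(.*?)(?<!')#/g;

// Hypothetical helper: wrap each unescaped #...# pair in <b> tags.
function boldHashes(text) {
  return text.replace(hashPair, '<b>$1</b>');
}

const sample = "I'm a text #and I should be a match# and this should not #But this should as well# and I'm just some random extra text";
const bolded = boldHashes(sample);
console.log(bolded);
// I'm a text <b>and I should be a match</b> and this should not <b>But this should as well</b> and I'm just some random extra text

const escaped = boldHashes("#Hello this has an escape apostrophe '# so I'll match until here#");
console.log(escaped);
// <b>Hello this has an escape apostrophe '# so I'll match until here</b>
```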
To avoid evaluating the negative lookbehind at every position in the string, you can also match # first and do the assertion after it.
#(?<!'#)(.*?)#(?<!'#)
Another option, instead of using the non-greedy .*?, is to use a negated character class that matches any char except #.
Then, when you encounter a #, only match it if there is a ' before it, using a positive lookbehind.
#(?<!'#)([^#\n]*(?:#(?<='#)[^#\n]*)*)#(?<!'#)
#(?<!'#) Match # not directly preceded by '
( Capture group 1
[^#\n]* Optionally match any char except # or a newline
(?: Non capture group
#(?<='#) Match # only if it is directly preceded by '
[^#\n]* Match optional repetitions of any char except # or a newline
)* Close non capture group and optionally repeat it to match all occurrences
) Close group 1
#(?<!'#) Match # not directly preceded by '
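To see what capture group 1 holds, here is a small sketch that runs the pattern with matchAll on two of the question's example strings. It assumes lookbehind support in the engine; the variable names are invented for the example.

```javascript
// Assumes a JavaScript engine that supports lookbehind assertions.
const unrolled = /#(?<!'#)([^#\n]*(?:#(?<='#)[^#\n]*)*)#(?<!'#)/g;

const longText = "Hello I'm a text #I'm bold#, and I need to know how to match my text that's inside two '#, and #I will not match either 'cause I got no end";
const fromLongText = Array.from(longText.matchAll(unrolled), (m) => m[1]);
console.log(fromLongText); // [ "I'm bold" ]

const withEscape = "#Hello this has an escape apostrophe '# so I'll match until here#";
const fromEscape = Array.from(withEscape.matchAll(unrolled), (m) => m[1]);
console.log(fromEscape); // [ "Hello this has an escape apostrophe '# so I'll match until here" ]
```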

Create a regex to extract a string that contains normal characters and escaped characters without DoS

I have a string like this:
///////AB?\a\b\c\d\d\e\\f\a\a\b\cd\ed\fmnopqrstuvwxy\z\a\a\a\a\a\a\a\a\a///imgy
It starts with /// and ends with ///imgy (i and/or m and/or g and/or y), and the characters between the beginning and the end are either normal characters like a or escaped characters like \a.
Here is my regex:
/^\/{3}((?:\\?[\s\S])+?)\/{3}([imgy]{0,4})(?!\w)/
But the problem is that it is reported as "vulnerable to denial-of-service attacks". The main part that has the problem is
(?:\\?[\s\S])+
How can I write a correct one that handles both a and \a? Thank you!
Update:
I just found that I can use the following regex:
(?:\\[\s\S]+?)|(?:(?<!\\)[\s\S]+?)|(?:(?<=\\\\)[\s\S]+?)
to replace the old problematic part (?:\\?[\s\S])+?. This way it can avoid needing exponential time to match certain inputs, and so avoid being vulnerable to denial-of-service attacks.
The details:
(?:\\[\s\S]+?) matches any \a
(?:(?<!\\)[\s\S]+?) matches any a, but not one following \.
(?:(?<=\\\\)[\s\S]+?) matches any a, but it must follow \\. This makes sure that the f following \\ is matched.
So the whole regex will look like this:
^\/{3}((?:\\[\s\S]+?)|(?:(?<!\\)[\s\S]+?)|(?:(?<=\\\\)[\s\S]+?))\/{3}([imgy]{0,4})(?!\w)
You could list the allowed characters in a character class, and optionally repeat an escaped character (a backslash followed by a-z or another backslash):
^\/{3,}[A-Za-z?]+(?:\\[a-z\\][A-Za-z?]*)*\/\/\/[imgy]{0,4}$
The pattern matches:
^ Start of string
\/{3,}[A-Za-z?]+ Match 3 or more / and 1 or more times any of the listed allowed chars
(?: Non capture group
\\[a-z\\] Match an escaped char a-z or \\
[A-Za-z?]* Optionally match any of the listed
)* Close and optionally repeat the group
\/\/\/[imgy]{0,4} Match /// and 0-4 times any of i m g or y. If there should be at least a single char, you can use {1,4}
$ End of string
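A quick way to try this pattern is to run it against the string from the question. This is only a sketch: String.raw is used so the backslashes stay exactly as typed, and the second input (named hostile here, an invented example) is a long string that cannot match and is expected to be rejected promptly; that is an expectation, not a benchmark.

```javascript
const pattern = /^\/{3,}[A-Za-z?]+(?:\\[a-z\\][A-Za-z?]*)*\/\/\/[imgy]{0,4}$/;

// String.raw keeps every backslash of the sample exactly as typed.
const sample = String.raw`///////AB?\a\b\c\d\d\e\\f\a\a\b\cd\ed\fmnopqrstuvwxy\z\a\a\a\a\a\a\a\a\a///imgy`;
const sampleMatches = pattern.test(sample);
console.log(sampleMatches); // true

// Invented non-matching input: many plain characters, then a dangling backslash.
const hostile = '///' + 'a'.repeat(5000) + '\\';
const hostileMatches = pattern.test(hostile);
console.log(hostileMatches); // false
```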

Regular expression to match line separated size strings

I am writing a regular expression to validate an input string, which is a line-separated list of sizes ([width]x[height]).
Valid input example:
300x200
50x80
100x100
The regular expression I initially came up with is (https://regex101.com/r/H9JDjA/1):
^(\d+x\d+[\r\n|\r|\n]*)+$
This regular expression matches my input but also matches this invalid input (size can't be 100x100x200):
300x200
50x80
100x100x200
Adding a word boundary at the end seems to have fixed this issue:
^(\d+x\d+[\r\n|\r|\n]*\b)+$
My questions:
Why does the initial regular expression without the word boundary fail? It looks like I am matching one or more instances of a \d+ (number), followed by the character 'x', followed by a \d+ (number), followed by zero or more newlines from various operating systems.
How do I validate input that has multiple trailing newline characters? The following doesn't work for some inputs like this:
500x500\n100x100\n\n\n384384
^(\d+x\d+[\r\n|\r|\n]\b)+|[\r\n|\r|\n]$
Isolate the problem with this target 100x100x200
For now, forget about the anchors in the regex.
The minimum regex is \d+x\d+ since it only has to be satisfied once
for a match to take place.
The maximum is something like this \d+x\d+ (?: (?:\r?\n | \r)* \d+x\d+ )*
Since \r?\n|\r is optional, it can be reduced to this \d+x\d+ (?: \d+x\d+ )*
The result, when applied to the target string, is:
100x100x200 matches.
But, since you've anchored the regex ^$, it is forced to break up
the middle 100 to make it match.
100x10 from \d+x\d+
0x200 from (?: \d+x\d+ )*
So, that is why the first regex seemingly matches 100x100x200.
To avoid all of that, just require a line break between them, and
make the trailing linebreaks optional (if you need to validate the whole
string, otherwise leave it and the end anchor off).
^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$
A better view of it
^
\d+ x \d+
(?:
(?: \r? \n | \r )+
\d+ x \d+
)*
(?: \r? \n | \r )*
$
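The behaviour described above can be checked with a few lines of JavaScript. In this sketch the names original and strict are just labels for the question's first pattern and the pattern from this answer; the capture group of the original pattern shows how the middle number gets split.

```javascript
const original = /^(\d+x\d+[\r\n|\r|\n]*)+$/;
const strict = /^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$/;

// The original pattern accepts the bad line by splitting the middle number.
const m = '100x100x200'.match(original);
console.log(m[0]); // 100x100x200
console.log(m[1]); // 0x200 (the last repetition of the group)

const results = [
  strict.test('300x200\n50x80\n100x100'),      // valid list
  strict.test('300x200\r\n50x80\n\n'),         // trailing line breaks allowed
  strict.test('300x200\n50x80\n100x100x200'),  // extra dimension
  strict.test('500x500\n100x100\n\n\n384384'), // last line has no x
];
console.log(results); // [ true, true, false, false ]
```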
Your initial regular expression "fails" because of the +:
^(\d+x\d+[\r\n|\r|\n]*)+$
-----------------------^ here
Your parenthesized pattern (\d+x\d+[\r\n|\r|\n]*) says: match one or more digits, followed by an "x", followed by one or more digits, followed by zero or more newlines. The + after that says to match one or more repetitions of the entire parenthesized pattern. Because the newlines are optional, two repetitions can sit directly next to each other: for an input like 100x200x300 the engine backtracks and matches 100x20 and then 0x300, so it matches the entire line.
If you're simply trying to extract dimensions from a newline-separated string, I would use the following regular expression with a multiline flag:
^(\d+x\d+)$
https://regex101.com/r/H9JDjA/2
Side note: In your expression, [\r\n|\r|\n] is actually saying match any one instance of \r, \n, |, \r, |, or \n (i.e. it's quite redundant, and you probably aren't meaning to match |). If you want to match a sequential set of any combination of \r or \n, you can simply use [\r\n]+.
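Both points can be checked quickly. The sketch below first shows that | inside square brackets is a literal character, then uses the multiline pattern (without the capture group, since match() with the g flag returns whole matches anyway) on an invented input.

```javascript
// Inside [...], | is a literal character, not alternation.
const pipeAccepted = /^[\r\n|\r|\n]$/.test('|');
const pipeRejected = /^[\r\n]+$/.test('|');
console.log(pipeAccepted, pipeRejected); // true false
console.log(/^[\r\n]+$/.test('\r\n\n')); // true

// With the g and m flags, ^ and $ match at every line boundary.
const input = '300x200\n50x80\n100x100x200';
const sizes = input.match(/^\d+x\d+$/gm);
console.log(sizes); // [ '300x200', '50x80' ]
```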
You can use the multiline modifier, which should make life easier:
var input = "\n\
300x200x400\n\
50x80\n\
\n\
\n\
300x200\n\
50x80\n\
100x100x200x100\n";
var allSizes = input.match(/^\d+x\d+/gm) || []; // m flag: ^ matches at the start of every line; match() returns null when nothing matches
for (var i = 0; i < allSizes.length; i++)
console.log(allSizes[i]);
Prints:
300x200
50x80
300x200
50x80
100x100
Try this regex out
^[0-9]{1,4}x[0-9]{1,4}|[(\r\n|\r|\n)]+$
It'll match these inputs.
1x1
10x10
100x100
2000x2938
\n
\r
\r\n
but not this 100x100x200

Match everything but not quoted strings

I want to match everything except quoted strings.
I can match all quoted strings with this: /(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/
So I tried to match everything except quoted strings with this: /[^(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))]/ but it doesn't work.
I would like to use only regex because I want to replace the unquoted text and get the quoted text back unchanged afterwards.
string.replace(regex, function(a, b, c) {
// return after a lot of operations
});
A quoted string is for me something like this "bad string" or this 'cool string'
So if I input:
he\'re is "watever o\"k" efre 'dder\'4rdr'?
It should output these matches:
["he\'re is ", " efre ", "?"]
And then I want to replace them.
I know my question is very difficult but it is not impossible! Nothing is impossible.
Thanks
EDIT: Rewritten to cover more edge cases.
This can be done, but it's a bit complicated.
result = subject.match(/(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g);
will return
, he said.
, she replied.
, he reminded her.
,
from this string (line breaks added and enclosing quotes removed for clarity):
"Hello", he said. "What's up, \"doc\"?", she replied.
'I need a 12" crash cymbal', he reminded her.
"2\" by 4 inches", 'Back\"\'slashes \\ are OK!'
Explanation: (sort of, it's a bit mindboggling)
Breaking up the regex:
(?:
(?= # Assert even number of (relevant) single quotes, looking ahead:
(?:
(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*
'
(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*
'
)*
(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*
$
)
(?= # Assert even number of (relevant) double quotes, looking ahead:
(?:
(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*
"
(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*
"
)*
(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*
$
)
(?:\\.|[^\\'"]) # Match text between quoted sections
)+
First, you can see that there are two similar parts. Both these lookahead assertions ensure that there is an even number of single/double quotes in the string ahead, disregarding escaped quotes and quotes of the opposite kind. I'll show it with the single quotes part:
(?= # Assert that the following can be matched:
(?: # Match this group:
(?: # Match either:
\\. # an escaped character
| # or
"(?:\\.|[^"\\])*" # a double-quoted string
| # or
[^\\'"] # any character except backslashes or quotes
)* # any number of times.
' # Then match a single quote
(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*' # Repeat once to ensure even number,
# (but don't allow single quotes within nested double-quoted strings)
)* # Repeat any number of times including zero
(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])* # Then match the same until...
$ # ... end of string.
) # End of lookahead assertion.
The double quotes part works the same.
Then, at each position in the string where these two assertions succeed, the next part of the regex actually tries to match something:
(?: # Match either
\\. # an escaped character
| # or
[^\\'"] # any character except backslash, single or double quote
) # End of non-capturing group
The whole thing is repeated once or more, as many times as possible. The /g modifier makes sure we get all matches in the string.
See it in action here on RegExr.
Here is a tested function that does the trick:
function getArrayOfNonQuotedSubstrings(text) {
/* Regex with three global alternatives to section the string:
('[^'\\]*(?:\\[\S\s][^'\\]*)*') # $1: Single quoted string.
| ("[^"\\]*(?:\\[\S\s][^"\\]*)*") # $2: Double quoted string.
| ([^'"\\]*(?:\\[\S\s][^'"\\]*)*) # $3: Un-quoted string.
*/
var re = /('[^'\\]*(?:\\[\S\s][^'\\]*)*')|("[^"\\]*(?:\\[\S\s][^"\\]*)*")|([^'"\\]*(?:\\[\S\s][^'"\\]*)*)/g;
var a = []; // Empty array to receive the goods;
text = text.replace(re, // "Walk" the text chunk-by-chunk.
function(m0, m1, m2, m3) {
if (m3) a.push(m3); // Push non-quoted stuff into array.
return m0; // Return this chunk unchanged.
});
return a;
}
This solution uses the String.replace() method with a replacement callback function to "walk" the string section by section. The regex has three global alternatives, one for each section: $1 for single-quoted, $2 for double-quoted, and $3 for non-quoted substrings. Each non-quoted chunk is pushed onto the return array. It correctly handles all escaped characters, including escaped quotes, both inside and outside quoted strings. Single-quoted substrings may contain any number of double quotes and vice versa. Illegal orphan quotes are removed and serve to divide a non-quoted section into two chunks. Note that this solution requires no lookaround and only one pass. It also implements Friedl's "Unrolling-the-Loop" efficiency technique and is quite efficient.
Additional: Here is some code to test the function with the original test string:
// The original test string (with necessary escapes):
var s = "he\\'re is \"watever o\\\"k\" efre 'dder\\'4rdr'?";
alert(s); // Show the test string without the extra backslashes.
console.log(getArrayOfNonQuotedSubstrings(s).toString());
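Since the question is ultimately about replacing the unquoted parts while keeping the quoted strings intact, the same walking idea can be sketched with a transform callback. The helper replaceOutsideQuotes below is hypothetical (it does not come from the answer above) and uses a simpler, non-unrolled regex; upper-casing is just a stand-in for the "lot of operations" the question mentions.

```javascript
// Hypothetical helper: run `transform` on every chunk outside quotes and
// leave quoted strings untouched. Orphan quotes are left as they are.
function replaceOutsideQuotes(text, transform) {
  // Alternatives: single-quoted string | double-quoted string | unquoted run.
  const re = /'(?:[^'\\]|\\[\s\S])*'|"(?:[^"\\]|\\[\s\S])*"|(?:[^'"\\]|\\[\s\S])+/g;
  return text.replace(re, (chunk) => {
    const isQuoted = chunk[0] === "'" || chunk[0] === '"';
    return isQuoted ? chunk : transform(chunk);
  });
}

// String.raw keeps the backslashes of the question's test string as typed.
const s = String.raw`he\'re is "watever o\"k" efre 'dder\'4rdr'?`;
const unquoted = [];
const result = replaceOutsideQuotes(s, (chunk) => {
  unquoted.push(chunk); // remember each unquoted chunk
  return chunk.toUpperCase();
});
console.log(unquoted); // the three chunks outside the quotes
console.log(result);   // HE\'RE IS "watever o\"k" EFRE 'dder\'4rdr'?
```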
You can't invert a regex like that. What you tried was to turn it into a character class and negate that, but inside a character class the groups and alternations lose their meaning, and you would also have to escape all closing brackets as "\]".
EDIT: I would have started with
/(^|" |' ).+?($| "| ')/
This matches anything between the beginning or the end of a quoted string (very simple: a quotation mark plus a blank) and the end of the string or the start of a quoted string (a blank plus a quotation mark). Of course this doesn't handle any escape sequences or quotations which don't follow the scheme / ['"].*['"] /. See above answers for more detailed expressions :-)

Why does this regex match incorrect characters?

I need to match these characters. This quote is from an API documentation (external to our company):
Valid characters: 0-9 A-Z a-z & # - . , ( ) / : ; ' # "
I used this Regex to match characters:
^[0-9a-z&#-\.,()/:;'""#]*$
However, this wrongly matches characters like %, $, and many other characters. What's wrong?
You can test this regular expression online using http://regexhero.net/tester/, and this regular expression is meant to work in both .NET and JavaScript.
You are not escaping the dash -, which is a reserved character inside a character class. If you replace the dash with \- then the regex no longer matches the extra characters that fall in the range from # to the dot.
Move the literal - to the front of the character set:
^[-0-9a-z&#\.,()/:;'""#]*$
otherwise it is taken as specifying a range like when you use it in 0-9.
The - sign, when not escaped, has a special meaning inside square brackets. #-\. is interpreted as #-. (by the way, the backslash before the dot is not necessary inside square brackets), which means "any character between # (ASCII 0x23) and . (ASCII 0x2E)". The correct notation is
^[0-9a-z&#\-.,()/:;'"#]*$
The special characters in a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-).
As such, you should either escape them with a backslash (\), or put them in a position where there is no ambiguity and they do not need escaping. In the case of a hyphen, this would be the first or last position.
You also do not need to escape the dot (.).
Your regex thus becomes:
^[-0-9a-z&#.,()/:;'"#]*$
As a side note, there are many available regex evaluators which provide code hinting. This way, you can simply hover your mouse over your regular expression and it can be explained in English words.
One such free one is RegExr.
Typing your original regex in it and hovering over the hyphen shows:
Matches characters in the range '#-\'
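To make the accidental range concrete, this sketch lists the printable ASCII characters that the original pattern accepts but the corrected one rejects. Two small liberties are taken for JavaScript regex literals: the / inside the class is written as \/, and the doubled "" from the question is written as a single ". The names broken and fixed are just labels.

```javascript
const broken = /^[0-9a-z&#-\.,()\/:;'"#]*$/; // "#-\." is the range from # to .
const fixed = /^[-0-9a-z&#.,()\/:;'"#]*$/;   // a leading - is a literal hyphen

// Collect printable ASCII characters accepted only by the broken pattern.
const extra = [];
for (let code = 0x20; code < 0x7f; code++) {
  const ch = String.fromCharCode(code);
  if (broken.test(ch) && !fixed.test(ch)) extra.push(ch);
}
console.log(extra.join(' ')); // $ % * +

console.log(fixed.test('a-b'), fixed.test('50%')); // true false
```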
Try this:
^[0-9a-zA-Z\&\#\-\.\,\(\)\/\:\;\'\"\#]*$
