regexp problem, the dot selects all text - javascript

I use some jQuery to highlight search results. For some reason, if I enter a plain dot, all of the text gets selected. I use a regex and replace to wrap the matches in a tag that gives them a color.
The code that I use:
var pattern = new RegExp('('+$.unique(text.split(" ")).join("|")+")","gi");
How can I prevent the dot from selecting all of the text? I want the dot to be treated as a literal character with no special meaning.

You may be able to get there by doing this:
var pattern = new RegExp('('+$.unique(text.replace(/\./g, '\\.').split(" ")).join("|")+")","gi");
The idea here is to escape every period, which otherwise acts as a wildcard in regex. Note the /\./g in the replace call: passing the string '.' would escape only the first period.

This will replace all special RegExp characters (except for | since you're using that to join the terms) with their escaped version so you won't get unwanted matches or syntax errors:
var str = $.unique(text.split(" ")).join("|"),
pattern;
str = str.replace(/[\\\.\+\*\?\^\$\[\]\(\)\{\}\/\'\#\:\!\=]/ig, "\\$&");
pattern = new RegExp('('+str+')', 'gi');

The dot is supposed to match almost everything (any character except a line terminator). If you want to match a literal period, escape it as \. (or '\\.' when the pattern is built from a string, as with new RegExp).

If you have a period in your RegExp it's supposed to match any character besides newline characters. If you don't want that functionality you need to escape the period.
Example RegExp with period escaped /word\./
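A minimal sketch of the difference between an unescaped and an escaped period. The variable names loose and strict are only illustrative:

```javascript
// Unescaped dot: matches any character except a line terminator.
const loose = /word./;
// Escaped dot: matches only a literal period.
const strict = /word\./;

console.log(loose.test("words"));   // true
console.log(strict.test("words"));  // false
console.log(strict.test("word."));  // true
```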

You need to escape the text you're putting into the regex, so that special characters don't have unwanted meanings. My code is based on some from phpjs.org:
var words = $.map($.unique(text.split(" ")), function (word) {
    return word.replace(/[.\\+*?\[\^\]$(){}=!<>|:\\-]/g, '\\$&'); // escape regex special chars in each word
}).join("|"); // join after escaping so the | separators stay unescaped
var pattern = new RegExp('(' + words + ")","gi");
This escapes the following characters: .\+*?[^]$(){}=!<>|:- with a backslash \ so you can safely insert them into your new RegExp construction.
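For comparison, here is a sketch of the same idea without jQuery. The helper names escapeRegExp and buildSearchPattern are hypothetical, and a Set stands in for $.unique as an assumption about what the de-duplication is meant to do. Each term is escaped before the terms are joined, so the | separators keep their meaning:

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in one term.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one case-insensitive pattern from a search string.
function buildSearchPattern(text) {
  const terms = text.split(" ").filter(Boolean); // drop empty terms from repeated spaces
  const unique = Array.from(new Set(terms));     // de-duplicate without jQuery
  if (unique.length === 0) return null;          // nothing to search for
  return new RegExp("(" + unique.map(escapeRegExp).join("|") + ")", "gi");
}

const pattern = buildSearchPattern("a.b foo");
console.log("a.b axb FOO".replace(pattern, "<b>$1</b>"));
// <b>a.b</b> axb <b>FOO</b>
```

Dropping empty terms matters: an empty alternative would match at every position in the text.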

Related

regex custom length but no whitespace allowed [duplicate]

I have a username field in my form. I want to not allow spaces anywhere in the string. I have used this regex:
var regexp = /^\S/;
This works for me if there are spaces between the characters. That is if username is ABC DEF. It doesn't work if a space is in the beginning, e.g. <space><space>ABC. What should the regex be?
While you have specified the start anchor and the first letter, you have not done anything for the rest of the string. You seem to want repetition of that character class until the end of the string:
var regexp = /^\S*$/; // a string consisting only of non-whitespaces
Use the + quantifier (match one or more of the preceding item), which also rejects an empty string:
var regexp = /^\S+$/
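A small sketch comparing the three patterns discussed so far on a few sample usernames. The constant names and the sample values are illustrative only:

```javascript
const startsNonSpace = /^\S/;    // only checks the first character
const noSpacesOrEmpty = /^\S*$/; // no whitespace anywhere; empty string allowed
const noSpaces = /^\S+$/;        // no whitespace anywhere; at least one character

for (const name of ["ABCDEF", "ABC DEF", "  ABC", ""]) {
  console.log(
    JSON.stringify(name),
    startsNonSpace.test(name),
    noSpacesOrEmpty.test(name),
    noSpaces.test(name)
  );
}
// "ABCDEF" true true true
// "ABC DEF" true false false
// "  ABC" false false false
// "" false true false
```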
If you're using a plugin that takes a string and passes it to new RegExp() to build the regex object, then the string below will work:
'^\\S*$'
It's the same regex @Bergi mentioned, just in the string form that the RegExp constructor expects.
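A short sketch of why the backslash has to be doubled in the string form. The variable names are illustrative:

```javascript
const fromLiteral = /^\S*$/;
const fromString = new RegExp("^\\S*$"); // "\\S" in the string becomes \S in the pattern

console.log(fromLiteral.source === fromString.source); // true

// With a single backslash the string escape "\S" collapses to "S",
// so the pattern silently becomes ^S*$ and means something else.
const broken = new RegExp("^\S*$");
console.log(broken.source);      // ^S*$
console.log(broken.test("abc")); // false
```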
This will help to find the spaces in the beginning, middle and ending:
var regexp = /\s/g
This one will only match the input field or string if there are no spaces. If there are any spaces, it will not match at all.
/^([A-z0-9!@#$%^&*().,<>{}[\]<>?_=+\-|;:\'\"\/])*[^\s]\1*$/
Matches from the beginning of the line to the end. Accepts letters, digits, and most special characters.
If you want just letters, then change what is in the [] like so:
/^([A-z])*[^\s]\1*$/
Note that the range A-z also covers the six ASCII characters between Z and a ([, \, ], ^, _ and `); write A-Za-z to match letters only.

Split string on spaces except for in quotes, but include incomplete quotes

I am trying to split a string in JS on spaces except when the space is in a quote. However, an incomplete quote should be maintained. I'm not skilled in regex wizardry, and have been using the below regex:
var list = text.match(/[^\s"]+|"([^"]*)"/g)
However, if I provide input like sdfj "sdfjjk this will become ["sdfj", "sdfjjk"] rather than ["sdfj", "\"sdfjjk"].
You can use
var re = /"([^"]*)"|\S+/g;
By using \S (=[^\s]) we just drop the " from the negated character class.
By placing the "([^"]*)" pattern before \S+, we make sure substrings in quotes are not torn if they come before. This should work if the string contains well-paired quoted substrings and the last is unpaired.
Demo:
var re = /"([^"]*)"|\S+/g;
var str = 'sdfj "sdfjjk';
document.body.innerHTML = JSON.stringify(str.match(re));
Note that to get the captured texts in-between quotes, you will need to use RegExp#exec in a loop (as String#match "drops" submatches).
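A minimal sketch of such an exec loop. The function name splitKeepingQuotes is hypothetical. It returns the text between the quotes for well-paired substrings and the raw token otherwise:

```javascript
function splitKeepingQuotes(str) {
  const re = /"([^"]*)"|\S+/g; // fresh regex per call, so lastIndex starts at 0
  const tokens = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    // m[1] is defined only when the quoted alternative matched.
    tokens.push(m[1] !== undefined ? m[1] : m[0]);
  }
  return tokens;
}

console.log(splitKeepingQuotes('abc "def ghi" lmn "opq'));
// [ 'abc', 'def ghi', 'lmn', '"opq' ]
```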
UPDATE
No idea what downvoter thought when downvoting, but let me guess. The quotes are usually used around word characters. If there is a "wild" quote, it is still a quote right before/after a word.
So, we can utilize word boundaries like this:
"\b[^"]*\b"|\S+
Here, "\b[^"]*\b" matches a " that is followed by a word character, then matches zero or more characters other than " and then is followed with a " that is preceded with a word character.
Moving further in this direction, we can make it as far as:
\B"\b[^"\n]*\b"\B|\S+
With \B" we require that " should be preceded with a non-word character, and "\B should be followed with a non-word character.
A lot depends on what specific issue you have with your specific input!
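To see what the word boundaries change, here is a small sketch that runs the plain pattern and the first word-boundary pattern on an input with a stray quote. The sample string is made up for illustration:

```javascript
const sample = 'x " y "b c"';

const plain = /"([^"]*)"|\S+/g;
const bounded = /"\b[^"]*\b"|\S+/g;

// The plain pattern pairs the stray quote with the next quote it finds.
console.log(sample.match(plain));
// [ 'x', '" y "', 'b', 'c"' ]

// The bounded pattern requires a word character right after the opening
// quote and right before the closing one, so the stray quote stays alone.
console.log(sample.match(bounded));
// [ 'x', '"', 'y', '"b c"' ]
```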
Try the following:
text.match(/".*?"|[^\s]+/g).map(s => s.replace(/^"(.*)"$/, "$1"))
This repeatedly finds either properly quoted substrings (first), or other sequences of non-whitespace. The map part is to remove the quotes around the quoted substrings.
> text = 'abc "def ghi" lmn "opq'
< ["abc", "def ghi", "lmn", "\"opq"]

Regex for both newline and backslash for Replace function

I am using a replace function to escape some characters (both newline and backslash) from a string.
Here is my code:
var str = strElement.replace(/\\/\n/g, "");
I am trying to use regex, so that I can add more special characters if needed. Is this a valid regex or can someone tell me what am I doing wrong here?
You're ending the regex early with an unescaped forward slash. You also want to use a set to match individual characters. Additionally you might want to add "\r" (carriage return) in as well as "\n" (new line).
This should work:
var str = strElement.replace(/[\\\n\r]/g, "");
This is not a valid regex as the slash is a delimiter and ends the regex. What you probably wanted is the pipe (|), which is an alternation:
var str = strElement.replace(/\\|\n/g, "");
In case you need to extend it in the future it may be helpful to use a character class to improve readability:
var str = strElement.replace(/[\\\nabcx]/g, "");
A character class matches a single character from its body.
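A quick sketch comparing the alternation and a character class that also includes the carriage return. The sample string is made up for illustration:

```javascript
// a, backslash, b, newline, c, carriage return, newline, d
const input = "a\\b\nc\r\nd";

// Alternation: removes backslashes and newlines, leaves the carriage return.
console.log(JSON.stringify(input.replace(/\\|\n/g, "")));    // "abc\rd"

// Character class: one more character in the set also removes the carriage return.
console.log(JSON.stringify(input.replace(/[\\\n\r]/g, ""))); // "abcd"
```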
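This should work if the text contains escaped sequences, that is, a literal backslash followed by n or r rather than real newline characters. The regular expression removes those two-character sequences and any remaining backslashes from escaped HTML text: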
var str = strElement.replace(/\\n|\\r|\\/g, '');
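A small sketch of the distinction between an escaped sequence and a real newline, using that last pattern. The sample strings are made up for illustration:

```javascript
const escaped = "line1\\nline2"; // contains a backslash followed by the letter n
const real = "line1\nline2";     // contains an actual newline character

const re = /\\n|\\r|\\/g;

console.log(escaped.replace(re, ""));       // line1line2
console.log(real.replace(re, "") === real); // true: nothing matched
```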

JS & Regex: how to replace punctuation pattern properly?

Given an input text in which every space has been replaced by one or more underscores (_):
Hello_world_?. Hello_other_sentenc3___. World___________.
I want to keep the _ between words, but I want to attach each punctuation mark to the last word of its sentence, with no space between the word and the punctuation. I want to use the punctuation as the pivot of my regex.
I wrote the following JS-Regex:
str = str.replace(/(_| )*([:punct:])*( |_)/g, "$2$3");
This fails, since it returns :
Hello_world_?. Hello_other_sentenc3_. World_._
Why doesn't it work? How can I delete all "_" between the last word and the punctuation?
http://jsfiddle.net/9c4z5/
Try the following regex, which makes use of a positive lookahead:
str = str.replace(/_+(?=\.)/g, "");
It replaces every run of underscores that is immediately followed by a period with the empty string, thus removing it.
If you want to match other punctuation characters than just the period, replace the \. part with an appropriate character class.
JavaScript doesn't have :punct: in its regex implementation. I believe you'd have to list out the punctuation characters you care about, perhaps something like this:
str = str.replace(/(_| )+([.,?])/g, "$2");
That is, replace any group of _ or space that is immediately followed by punctuation with just the punctuation.
Demo: http://jsfiddle.net/9c4z5/2/
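Both approaches can be tried on the sample input from the question. This is a sketch that assumes the punctuation of interest is the set . , ? ! (adjust the character class as needed):

```javascript
const input = "Hello_world_?. Hello_other_sentenc3___. World___________.";

// Lookahead version: drop underscores that sit right before a punctuation mark.
console.log(input.replace(/_+(?=[.,?!])/g, ""));
// Hello_world?. Hello_other_sentenc3. World.

// Capture version: replace the run plus the punctuation with the punctuation alone.
console.log(input.replace(/[_ ]+([.,?!])/g, "$1"));
// Hello_world?. Hello_other_sentenc3. World.
```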

Javascript word boundary unicode space issue

I want to write a regex pattern that matches full words or phrases, even if they contain Unicode characters, so that I can wrap them in some HTML. So I use this pattern:
var pattern=new RegExp('(^|\\s)'+phrase+'(?=\\s|$)', "gi");
It works perfectly even on multi-word phrases, except for one issue. If the phrase isn't at the start of the string, the match includes the space before the phrase. So after I wrap it I lose that space. I only want to wrap the phrase variable and not the spaces.
For example:
var string="This is a nice sentence.";
var phrase="is a nice";
/*OUTPUT: Thisis a nicesentence*/
/*HTML OUTPUT: This<span>is a nice</span>sentence*/
/*What I want: This <span>is a nice</span> sentence*/
Of course this pattern could work:
var pattern=new RegExp(phrase, "gi");
But I'm not looking for those strings that are substrings of another.
Is it possible to solve my issue with a better regex pattern?
Simply write back what you captured in group 1:
output = string.replace(pattern, '$1<span>' + phrase + '</span>');
If you are not using replace but match or exec and do the replacement manually, you can still access the capturing group in the returned array and insert the space or empty string before your span.
By the way, if you capture the phrase as well, you don't need any string concatenation in the replacement:
var pattern = new RegExp('(^|\\s)('+phrase+')(?=\\s|$)', "gi");
output = string.replace(pattern, '$1<span>$2</span>');
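Putting the two-group version into a function gives a sketch like the following. The names wrapPhrase and escapeRegExp are hypothetical, and escaping the phrase is an added assumption (it is not part of the pattern above) to keep characters such as . or ? in the phrase from being read as regex syntax:

```javascript
// Hypothetical helper: backslash-escape regex metacharacters in the phrase.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap whole-phrase matches in a span, keeping the
// whitespace captured in group 1 outside the span.
function wrapPhrase(text, phrase) {
  const pattern = new RegExp("(^|\\s)(" + escapeRegExp(phrase) + ")(?=\\s|$)", "gi");
  return text.replace(pattern, "$1<span>$2</span>");
}

console.log(wrapPhrase("This is a nice sentence.", "is a nice"));
// This <span>is a nice</span> sentence.

console.log(wrapPhrase("This is a nice sentence.", "his"));
// This is a nice sentence.
```

Because the span is built from $2, the original casing of the matched text is preserved even with the i flag.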
