Regex - Match escape characters inside attributes - javascript

Well, I'm not an expert with regex.
My problem is simple: I'm trying to match some characters that need escaping in a string with the following format (it is still a string at this point, before I parse it into a DOM):
<info type="map" name="Double quotes test name" author="Escape < character"></info>
<info type="map" name='Test name with single quotes' author='Escape < character'></info>
As you can see, there are two attributes whose values I'm trying to match: name and author.
I want to convert the < character to &lt;, but my pattern is not matching properly.
My pattern currently matches the whole attribute value. It even matches attributes that aren't author or name.
/(?!author|name\s*=\s*)(?:\'[^']*\')/g
I hope you can give me a hand with this. Thanks for reading, and best regards.

You can try matching every < that is not preceded by a newline, the start of the string, or >:
(?<=[^\n>])<
Check the demo here.
If you want to make sure the < is found within the value of either the name or author attribute, you can use:
(?<=(?:author|name)=(?:"[^<"]+?|'[^<']+?))<
where the < is preceded by:
(?:author|name): either the "author" or the "name" keyword
=: an equals sign
(?:"[^<"]+?|'[^<']+?):
"[^<"]+?: a " followed by as few characters as possible, none of which is " or <
|: or
'[^<']+?: a ' followed by as few characters as possible, none of which is ' or <
Check the demo here.
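As a minimal sketch of how the second pattern could be applied, the snippet below replaces each matched < with &lt; via String.prototype.replace. The name escapeAttrLt is made up for illustration, and the pattern needs an engine with lookbehind support (ES2018). Because the lookbehind does not allow a < before the one being matched, a single pass only replaces the first < in each attribute value; repeating the replacement until nothing changes is an assumption about what is wanted, not part of the pattern above. The pattern also does not match a < that is the very first character of a value.

```javascript
// Pattern from the answer above, with the global flag added.
var ltInAttr = /(?<=(?:author|name)=(?:"[^<"]+?|'[^<']+?))</g;

// Hypothetical helper: escape "<" inside name/author attribute values.
function escapeAttrLt(markup) {
  var previous;
  do {
    previous = markup;
    // Each pass replaces at most the first remaining "<" per attribute value.
    markup = markup.replace(ltInAttr, "&lt;");
  } while (markup !== previous);
  return markup;
}

var doubleQuoted = '<info type="map" name="Double quotes test name" author="Escape < character"></info>';
var singleQuoted = "<info type=\"map\" name='Test name with single quotes' author='Escape < character'></info>";

var escapedDouble = escapeAttrLt(doubleQuoted);
var escapedSingle = escapeAttrLt(singleQuoted);
```

The tag delimiters themselves (the leading < of info and the < of the closing tag) are left alone, since they are not preceded by an author or name value.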

Related

regex exact match multiple search words using jQuery

I'm using jQuery. I have to check whether any of a given list of words is in a paragraph. I want an exact match of a word or phrase (whole-word match), i.e. if I search for 'be' in 'Be a bee', there is only one match. I have done it like this:
var searchText="tool,media,be,team";
var regexExactMatch = new RegExp('\^' + searchText.split(",").join("|") + '\$');
if (regexExactMatch.test(item.Name))
{
//Found
}
It works for one search term, i.e. without any comma (e.g. media).
But for a comma-separated search, it breaks.
How do I do an exact match search for multiple search terms? I'm very new to regex. I also have to do the same search for integers and dates (MM/dd/yyyy). Thanks in advance.
For a full input string match, wrap the alternatives in a (?:...) group so that the anchors apply to all of them:
new RegExp('^(?:' + searchText.split(",").join("|") + ')$');
For a whole word search, replace ^ and $ with \b:
new RegExp('\\b(?:' + searchText.split(",").join("|") + ')\\b');
Without the group, the anchors apply only to the first and last alternatives respectively (i.e. your regex looks like /^tool|media|be|team$/, which looks for tool only at the beginning, media and be anywhere in the string, and team only at the end of the string).
Note I am using a (?:...) non-capturing group, since only grouping is necessary here, not capturing (no storing of the submatch). If you need to access the matched text, you can access the 0th group, which equals the whole match.
Also, you do not need the backslashes before ^ and $. In a string literal passed to the constructor, \^ and \$ are not escape sequences, so the backslashes are simply dropped.
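The whole-word approach above can be sketched as a small helper. The names escapeRegExp and buildWordRegex are hypothetical, and two things here are assumptions beyond the answer: search terms are escaped so that characters such as . or ( are treated literally, and the i flag is passed so that 'be' also finds 'Be'.

```javascript
// Hypothetical helper: escape regex metacharacters in a search term.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a whole-word regex from a comma-separated list.
function buildWordRegex(searchText, flags) {
  var alternatives = searchText.split(",").map(function (term) {
    return escapeRegExp(term.trim());
  });
  // \b on both sides applies to every alternative thanks to the group.
  return new RegExp("\\b(?:" + alternatives.join("|") + ")\\b", flags);
}

var wordRegex = buildWordRegex("tool,media,be,team", "gi");
var found = "Be a bee".match(wordRegex); // only "Be" is a whole word; "bee" is not

// The same helper works for a date term, since it starts and ends with digits.
var dateRegex = buildWordRegex("02/22/1999", "g");
var dateFound = dateRegex.test("Due 02/22/1999.");
```

Note that \b only behaves as a word boundary next to word characters (letters, digits, underscore), so terms that begin or end with punctuation would need different handling.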
Remove the ^ from the beginning and the $ from the end of the RegExp, like this:
var regexExactMatch = new RegExp(searchText.split(",").join("|"));
Reason
^ requires the matched text to be at the beginning of the string and $ requires it to be at the end, which can only happen if the string contains nothing but that text. Note that without any anchors or word boundaries, the terms are also matched inside longer words.

Add HTML tags to this regex string

I'm using a tiny little JS plugin to truncate multiple lines of text on a site I'm working on.
The only problem is that the script counts HTML tags, for example, in the character count, which throws things off a little.
This is how the script currently excludes characters:
regex = /[!-\/:-@\[-`{-~]$/
Which basically just strips out certain punctuation characters.
I've tried changing it to this:
regex = [!-\/:-@\[-`{-~]$<[^>]*>
But, not being too familiar with regex, it didn't seem to work.
If someone could nudge me in the right direction that would be great.
In your initial regex you're looking for a single punctuation character at the very end of the string. Note the dollar sign '$'.
regex = /[!-\/:-@\[-`{-~]$/
Now you want to match anything between < and >.
regex = /[!-\/:-@\[-`{-~]$|<[^>]*$/
Note that the second alternative matches an unclosed tag running to the end of the string that you are matching against, e.g. <, <aaaa or <aaaa<.
greedy_regex = /[!-\/:-@\[-`{-~]$|<[^>]*/
non_greedy_regex = /[!-\/:-@\[-`{-~]$|<[^>]*?/
If you remove the second '$', as in greedy_regex, the match is greedy: against a<b>c</b>d it matches <b, i.e. everything from < up to (but not including) the next >. With the ?, as in non_greedy_regex, it matches the < only. To match a complete tag, include the closing bracket: <[^>]*>.
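To make the greedy and non-greedy behaviour concrete, here is a small sketch that runs only the tag part of the patterns against the sample string a<b>c</b>d. The helper visibleLength is hypothetical: it shows one way the character count could ignore tags, assuming simple markup where > never appears inside an attribute value.

```javascript
var sample = "a<b>c</b>d";

// Greedy: "<" followed by as many non-">" characters as possible.
var greedyMatch = /<[^>]*/.exec(sample)[0];

// Non-greedy: "<" followed by as few non-">" characters as possible (none).
var lazyMatch = /<[^>]*?/.exec(sample)[0];

// Hypothetical helper: count characters with complete tags removed.
function visibleLength(html) {
  return html.replace(/<[^>]*>/g, "").length;
}

var count = visibleLength(sample); // counts "acd"
```

Stripping tags with a regex is only an approximation of HTML parsing, so this would suit a truncation script that deals with short, well-formed snippets.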

Regular expression to check contains only

EDIT: Thank you all for your input. Everything you answered was right, but I don't think I explained it clearly enough.
I want to check the input value while it is being typed. If the user enters any character that is not in the list, the entered character should be rolled back.
(I am not concerned with checking once the entire input has been entered.)
I want to validate a date input field which should contain only the characters 0-9 (digits), - (hyphen), . (dot), and / (forward slash). The date may look like 22/02/1999 or 22.02.1999 or 22-02-1999. No validation needs to be done on either occurrence or position. A plain validation is enough to check whether it has any character other than those listed above.
[I am not good at regular expressions.]
Here is what I thought should work, but it does not.
var reg = new RegExp('[0-9]./-');
Here is jsfiddle.
Your expression only tests whether anywhere in the string, a digit is followed by any character (. is a meta character) and /-. For example, 5x/- or 42%/-foobar would match.
Instead, you want to put all the characters into the character class and test whether every single character in the string is one of them:
var reg = /^[0-9.\/-]+$/
^ matches the start of the string
[...] matches if the character is contained in the character class (i.e. any digit, ., / or -).
The / has to be escaped because it also denotes the end of a regex literal.
- between two characters describes a range of characters (between them, e.g. 0-9 or a-z). If - is at the beginning or end of the class, it has no special meaning and is interpreted literally as a hyphen.
+ is a quantifier and means "one or more of the preceding pattern". This allows us (together with the anchors) to test whether every character of the string is in the character class.
$ matches the end of the string
Alternatively, you can check whether there is any character that is not one of the allowed ones:
var reg = /[^0-9.\/-]/;
The ^ at the beginning of the character class negates it. Here we don't have to test every character of the string, because the existence of a single disallowed character already invalidates the string.
You can use it like so:
if (reg.test(str)) { // !reg.test(str) for the first expression
// str contains an invalid character
}
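Since the question asks for invalid characters to be rolled back while typing, the negated character class can also be used with replace. This is a sketch under assumptions: stripDisallowed is a made-up name, and the commented-out wiring refers to a hypothetical dateInput element that is not part of the original fiddle.

```javascript
// Negated class from the answer above, with the global flag so that
// every disallowed character is removed, not just the first.
var disallowed = /[^0-9.\/-]/g;

// Hypothetical helper: drop everything except digits, ".", "/" and "-".
function stripDisallowed(value) {
  return value.replace(disallowed, "");
}

// In a browser, this could be attached to a (hypothetical) input element:
// dateInput.addEventListener("input", function () {
//   dateInput.value = stripDisallowed(dateInput.value);
// });

var cleaned = stripDisallowed("22/02/19x99");
```

Handling the input event rather than individual key events also covers text that is pasted into the field.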
Try this:
([0-9]{2}[/\-.]){2}[0-9]{4}
If you are not concerned about the validity of the date, you can easily use the regex:
^[0-9]{1,2}[./-][0-9]{1,2}[./-][0-9]{4}$
The character class [./-] allows any one of the characters within the square brackets, and the quantifiers allow 1 or 2 digits for the day and the month, but only 4 digits for the year.
You can also group the first few groups like so:
^([0-9]{1,2}[./-]){2}[0-9]{4}$
Updated your fiddle with the first regex.
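A quick sketch of what the grouped pattern accepts and rejects. The second pattern, sameSeparator, is an optional variation that is not in the answers above: it uses a backreference (\1) so that both separators must be the same character.

```javascript
// Grouped pattern from the answer above: checks the shape, not the validity.
var dateShape = /^([0-9]{1,2}[./-]){2}[0-9]{4}$/;

var results = {
  slashes: dateShape.test("22/02/1999"),    // matches
  dots: dateShape.test("22.02.1999"),       // matches
  shortParts: dateShape.test("2-2-1999"),   // matches: 1-digit day and month
  shortYear: dateShape.test("22/02/99"),    // no match: year needs 4 digits
  mixed: dateShape.test("22/02-1999"),      // matches: separators may differ
  nonsense: dateShape.test("99/99/9999")    // matches: values are not checked
};

// Optional variation (an assumption, not from the answers):
// \1 refers back to whichever separator the first group captured.
var sameSeparator = /^[0-9]{1,2}([./-])[0-9]{1,2}\1[0-9]{4}$/;
var mixedRejected = !sameSeparator.test("22/02-1999");
var consistentAccepted = sameSeparator.test("22-02-1999");
```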

Using regular expression in Javascript

I need to check whether the information entered is 3 characters long; the first one should be 0-9, the second A-Z and the third 0-9 again.
I have written the pattern as below:
var pattern = '^[A-Z]+[0-9]+[A-Z]$';
var valid = str.match(pattern);
I got confused with the usage of regex for selecting, matching and replacing.
In this case, does [A-Z] check only one character or the whole string?
Does + separate (split?) out characters?
1) + matches one or more. You want exactly one
2) declare your pattern as a REGEX literal, inside forward slashes
With these two points in mind, and given the stated requirement (digit, letter, digit), your pattern should be
/^[0-9][A-Z][0-9]$/
Note also you can make the pattern slightly shorter by replacing [0-9] with the \d shortcut (which matches any digit).
3) Optionally, add the case-insensitive i flag after the final trailing slash if you want to allow either case.
4) If you want to merely test a string matches a pattern, rather than retrieve a match from it, use test(), not match() - it's more efficient.
var valid = pattern.test(str); //true or false
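Putting the points above together, a minimal sketch might look like this. It assumes the requirement as stated in the question (digit, uppercase letter, digit); isValidCode is a made-up name. The loose pattern is the one from the question, included only to show what + allows.

```javascript
// Exactly three characters: digit, uppercase letter, digit.
var pattern = /^[0-9][A-Z][0-9]$/;

// Hypothetical helper wrapping test(), which returns true or false.
function isValidCode(str) {
  return pattern.test(str);
}

var ok = isValidCode("1A2");        // digit, letter, digit
var wrongOrder = isValidCode("A1B"); // letter, digit, letter
var tooLong = isValidCode("12A3");   // four characters
var lowerCase = isValidCode("1a2");  // lowercase letter, no i flag

// The pattern from the question: + lets each class repeat.
var loose = /^[A-Z]+[0-9]+[A-Z]$/;
var looseLong = loose.test("ABCD1234E");
```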
+ means one or more characters, so a possible string would be ABCD1234E or A3B; invalid strings would be 3B or A 6B.
This is the regex you need :
^[0-9][A-Z][0-9]$
In this case, does [A-Z] check only one character or the whole string?
It checks just one character, but that character can occur many times in a string.
You should add ^ and $ in order to match the whole string, like I did.
Does + separate (split?) out characters?
No.
The + sign just means that the preceding character can repeat one or more times.
"+" means one or more. In your case you should use an exact quantity match ({1} is also the default when no quantifier is given):
/^\d{1}[A-Z]{1}\d{1}$/

New to Regular Expressions need help

I need a form with one button and an input field
that will check an array via a regular expression
and find an exact match of letters + numbers. Example: wxyz [some space btw] 0960000
or a mix of numbers and letters [some space btw] + numbers 01xg [some space btw] 0960000
The array has four objects for now.
Once a match is found, I need a function that will open a new page or window.
Thank you for your help.
Michael
To answer the Javascript part, here's one way to "grep" through the array to find matching elements:
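var matches = [];
var re = /whatever/; // substitute the actual pattern
// foo is the array to search
foo.forEach(function (el) {
  // test() is enough here, since only a yes/no answer is needed
  if (re.test(el)) {
    matches.push(el);
  }
});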
To attempt to answer the regular expression part: I don't know what "exact match" means to you, and I'm assuming "some space" belongs only in between the other terms, and I'm assuming letters means the English alphabet from 'a' to 'z' in lower and upper case and the digits should be 0-9 (otherwise, other language characters might be matched).
The first pattern would be /[a-zA-Z0-9]+\s*0960000/. Change "\s*" to "\s+" if there is at least one space, instead of zero or more space characters. Change "\s" to " " if matching the tab character (and some lesser-used space chars) is not desirable.
For the second pattern, I don't know what "numbers 01xg" means, but if it means numbers followed by that string, then the pattern would be /[a-zA-Z0-9]+\s*[0-9]+\s*01xg\s*0960000/. The same caveats apply as above.
Additionally, this will match a partial string. If the string must be matched in its entirety (that is, nothing may exist in the string except what is matched), add "^" to the beginning of the pattern to anchor it to the beginning of the string, and "$" at the end to anchor it to the end of the string. For example, /[a-zA-Z0-9]+\s*0960000/ matches "foo_bar 5 0960000", but /^[a-zA-Z0-9]+\s*0960000$/ does not.
For more on regular expressions in Javascript, take a look at developer.mozilla.org's article on the RegExp object (the link takes you to JS version 1.5 reference, which should apply to all JS-capable browsers).
(edited to add): To match either situation, since they have overlapping parts, you could use the following pattern: /[a-zA-Z0-9]+(?:\s*[0-9]+\s*01xg)?\s*0960000/. The question mark says to match the part that differs, wrapped in a non-capturing group (?:foo), once or zero times. (?:foo)? and (?:foo|) do the same thing in this case, but I'm not sure whether there is a performance difference; I would recommend using the one that makes the most sense to you, so you can read it later.
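As a rough end-to-end sketch of the pieces above, the snippet below filters an array with the anchored first pattern (using \s+ so that at least one space is required). The array contents, the name findMatches and the URL in the commented-out line are all made up for illustration; the question does not say what the four items actually are.

```javascript
// Hypothetical data standing in for the four items mentioned in the question.
var items = ["wxyz 0960000", "01xg   0960000", "abc 123", "foo_bar 5 0960000"];

// Anchored pattern: letters/digits, at least one space, then 0960000.
var re = /^[a-zA-Z0-9]+\s+0960000$/;

// Hypothetical helper: return the elements that match the pattern.
function findMatches(list, pattern) {
  return list.filter(function (el) {
    return pattern.test(el);
  });
}

var matches = findMatches(items, re);

// In a browser, a match could then trigger a new window
// (the URL is a placeholder, not from the question):
// if (matches.length > 0) { window.open("result.html"); }
```

Because the pattern is anchored, "foo_bar 5 0960000" is rejected here even though the unanchored version would find a partial match in it.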
