Regular expression for subpattern match [duplicate]


This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
I want a regular expression that returns true if any three consecutive characters match.
For example, /[money]{3,}/g
It should return true for mon, one, ney and false for mny.

Regular expressions search for a pattern of characters. Your use case would require taking a base string and dynamically building an unwieldy regex with many alternations and lookaheads/lookbehinds. Instead, write a function that uses indexOf:
function stringContainsSubstr(sourceStr, subStr) {
return sourceStr.indexOf(subStr) !== -1;
}
var exampleStrs = ["mon", "one", "ney", "mny"];
var str = "money";
for (var i = 0; i < exampleStrs.length; i++) {
console.log(stringContainsSubstr(str, exampleStrs[i]));
}
http://plnkr.co/edit/2mtV1NeD1MYta5v49oWr
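If a regex is still wanted, the dynamic construction mentioned above could look roughly like the sketch below. It assumes that "match" means "contains at least one window of three consecutive characters from the base string". The names escapeRegExp and buildWindowRegex are made up for this illustration.

```javascript
// Escape regex metacharacters so each window is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds /mon|one|ney/ from "money" with size 3.
function buildWindowRegex(base, size) {
  var windows = [];
  for (var i = 0; i + size <= base.length; i++) {
    windows.push(escapeRegExp(base.slice(i, i + size)));
  }
  // If the base is shorter than the window size, return a pattern that never matches.
  return windows.length ? new RegExp(windows.join("|")) : /(?!)/;
}

var windowRegex = buildWindowRegex("money", 3);
console.log(windowRegex.source);      // mon|one|ney
console.log(windowRegex.test("one")); // true
console.log(windowRegex.test("mny")); // false
```

For a five-letter word this stays small, but the alternation grows with the base string, which is why the indexOf version is usually easier to maintain.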

I would not use a regex here. Why not use indexOf? It is less code and easier to read.
Something like "money".indexOf("mon") > -1
Here is a demo with all the listed examples:
let values = ["mon","one", "ney", "mny"];
let shouldMatch = "money";
for (let idx = 0; idx<values.length;idx++){
console.info(values[idx], "=", shouldMatch.indexOf(values[idx])>-1);
}
But if you want to use RegExp, you could do it like this:
(BTW: this is only a "fancy" way to write the example above.)
let values = ["mon","one", "ney", "mny"];
function matcher(word, value){
return (new RegExp(value)).test(word);
}
for (let idx = 0; idx<values.length;idx++){
console.info(values[idx], "=", matcher("money", values[idx]));
}
What the code basically does:
It creates a new regular expression, new RegExp("mon") (equivalent to /mon/), and then tests whether that pattern matches the word "money": (new RegExp("mon")).test("money") returns true.
Note that the roles are reversed here: we are checking whether the word money contains the (sub)pattern mon. If a value contains regex metacharacters, it would need to be escaped first.

Related

Javascript regex error " /?/: nothing to repeat " It worked fine earlier [duplicate]

This question already has answers here:
What does the "Nothing to repeat" error mean when using a regex in javascript?
(7 answers)
Closed 4 years ago.
I'm trying to strip a string of any characters that are invalid in a directory name.
I tried a number of methods, and this one (a custom encoding) eventually worked, but now it doesn't: the console says "nothing to repeat". What does that mean? I'm using Chrome.
Here's the code (using a random string):
var someTitle = "wa?";
var cleanTitle = cleanTitle(someTitle);
function cleanTitle(title){
var obstructions = ['\\','/',':','*','?','"','<','>','|'];
var solutions = [92,47,58,42,63,34,60,62,124];
var encodedTitle = title;
for (var obstruction = 0; obstruction < obstructions.length; obstruction++){
var char = obstructions[obstruction];
if (encodedTitle.includes(char)){
var enCode = "__i!__"+solutions[obstruction]+"__!i__";
var rEx = new RegExp(char,"g");
encodedTitle = encodedTitle.replace(rEx,enCode);
}
}
console.log("CLEAN: "+title);
console.log("ENCODED: "+encodedTitle);
return encodedTitle;
}
Here's the error:
Uncaught SyntaxError: Invalid regular expression: /?/: Nothing to repeat
It points to this line -> var rEx = new RegExp(char,"g");

You need to escape some characters when using them as literals in a regular expression. Among those are most of the characters you have in your array.
Given your function replaces the obstruction characters with their ASCII code (and some wrapping __i!__), I would suggest to make your function a bit more concise, by performing the replacement with one regular expression, and a callback passed to .replace():
function cleanTitle(title){
return title.replace(/[\\/:*?"<>|]/g, function (ch) {
return "__i!__"+ch.charCodeAt(0)+"__!i__";
});
}
var someTitle = "wh*r* is |his?";
var result = cleanTitle(someTitle);
console.log(result);
...and if you are in an ES6 compatible environment:
var cleanTitle = t=>t.replace(/[\\/:*?"<>|]/g, c=>"__i!__"+c.charCodeAt(0)+"__!i__");
var someTitle = "wh*r* is |his?";
var result = cleanTitle(someTitle);
console.log(result);
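Since the replacement embeds the character code, the encoding can be reversed later. The sketch below assumes the wrapper format shown above and that the original title never contains the literal text __i!__<digits>__!i__ on its own. The names encodeTitle and decodeTitle are illustrative and not part of the original code.

```javascript
// Same encoding as above: each reserved character becomes __i!__<code>__!i__.
function encodeTitle(title) {
  return title.replace(/[\\/:*?"<>|]/g, function (ch) {
    return "__i!__" + ch.charCodeAt(0) + "__!i__";
  });
}

// Hypothetical inverse: turn each wrapped code back into its character.
function decodeTitle(encoded) {
  return encoded.replace(/__i!__(\d+)__!i__/g, function (match, code) {
    return String.fromCharCode(Number(code));
  });
}

var original = "wh*r* is |his?";
var roundTrip = decodeTitle(encodeTitle(original));
console.log(roundTrip === original); // true
```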

The ? is a regex quantifier. When you want to search for it literally (and build a regex with it), you need to escape it.
That being said, an unnecessary escape is harmless for the other characters in your array, and several of them are quantifiers or reserved characters that need escaping anyway. So go with
var char = '\\' + obstructions[obstruction];
so that each of them is passed to the regex as an escaped literal.

/?/ is not a valid regex. For it to be a regex, you need /\?/.
Regex here would be awkward, as most of the characters need escaping. Instead, consider using a literal string replacement until it is no longer found:
while( encodedTitle.indexOf(char) > -1) {
encodedTitle = encodedTitle.replace(char,enCode);
}
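Another way to do a literal, regex-free replacement without the loop is to split on the search string and join with the replacement. This is only a sketch, and replaceAllLiteral is a made-up helper name.

```javascript
// Hypothetical helper: replaces every literal occurrence of `search` in `text`.
function replaceAllLiteral(text, search, replacement) {
  return text.split(search).join(replacement);
}

var encodedTitle = replaceAllLiteral("wa?", "?", "__i!__63__!i__");
console.log(encodedTitle); // wa__i!__63__!i__
```

Because the search string is never interpreted as a pattern, characters such as ? or * need no escaping here.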

How to remove string between two characters every time they occur [duplicate]

This question already has answers here:
Strip HTML from Text JavaScript
(44 answers)
removing html tags from string
(3 answers)
Closed 7 years ago.
I need to get rid of any text inside < and >, including the two delimiters themselves.
So for example, from string
<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>
I would like to get this one
that
This is what i've tried so far:
var str = annotation.split(' ');
str.substring(str.lastIndexOf("<") + 1, str.lastIndexOf(">"))
But it doesn't work for every < and >.
I'd rather not use RegEx if possible, but I'm happy to hear if it's the only option.

You can simply use the replace method with /<[^>]*>/g. It matches < followed by [^>]* (any number of characters other than >) up to the closing >, globally.
var str = '<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>';
str = str.replace(/<[^>]*>/g, "");
alert(str);

Using a RegExp for string removal is fine:
"<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>".replace(/<\/?[^>]+>/g, "")
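The two patterns suggested so far differ in one small way: [^>]* also removes an empty <> pair, while [^>]+ requires at least one character between the brackets. A short comparison, where the sample string is made up for illustration:

```javascript
var starPattern = /<[^>]*>/g;    // zero or more characters between < and >
var plusPattern = /<\/?[^>]+>/g; // at least one character between < and >

var sample = "a<>b<i>c</i>";
console.log(sample.replace(starPattern, "")); // abc
console.log(sample.replace(plusPattern, "")); // a<>bc
```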

Since the text you want always follows a > character, you could split the string at that point, and then the text before the next < in each piece of the array is the text you need. For example (in Java):
String[] parts = stringName.split(">");
StringBuilder word = new StringBuilder();
for (String part : parts) {
    // Keep only what comes before the next '<' (or the whole piece if there is none).
    int lt = part.indexOf('<');
    word.append(lt == -1 ? part : part.substring(0, lt));
}
I haven't tested this thoroughly, but I think it would work. You don't need to actually remove the text between the "<>", just collect the text right after each '>'.

Using a regular expression is not the only option, but it's a pretty good option.
You can easily parse the string to remove the tags, for example by using a state machine where the < and > characters turn a state of ignoring characters on and off. There are other methods of course, some shorter, some more efficient, but they will all be a few lines of code, while a regular expression solution is just a single replace.
Example:
function removeHtml1(str) {
return str.replace(/<[^>]*>/g, '');
}
function removeHtml2(str) {
var result = '';
var ignore = false;
for (var i = 0; i < str.length; i++) {
var c = str.charAt(i);
switch (c) {
case '<': ignore = true; break;
case '>': ignore = false; break;
default: if (!ignore) result += c;
}
}
return result;
}
var s = "<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>";
console.log(removeHtml1(s));
console.log(removeHtml2(s));

There are several ways to do this, and some are better than others. I haven't written one lately for these two specific characters, so I took a minute and wrote some code that may work. Here is how it works. Create a function with a loop that copies an incoming string, character by character, to an outgoing string. Give the function a string return type so it returns the modified string. Have the loop scan the incoming string from index 0 while the index is less than string.length(). Within the loop, add an if statement: when it sees a "<" character in the incoming string, it stops copying but keeps looking at every character until it sees the ">" character. When the ">" is found, it starts copying again. It's that simple.
The following C++ code may need some refinement, but it should get you started on the method described above. It's not the fastest or the most elegant, but the basic idea is there. It compiled and ran correctly here, and produced the correct output in my test program. However, you may need to test it further in the context of your program.
string filter_on_brackets(string str1)
{
string str2 = "";
int copy_flag = 1;
for (size_t i = 0 ; i < str1.length();i++)
{
if(str1[i] == '<')
{
copy_flag = 0;
}
if(str1[i] == '>')
{
copy_flag = 2;
}
if(copy_flag == 1)
{
str2 += str1[i];
}
if(copy_flag == 2)
{
copy_flag = 1;
}
}
return str2;
}

RegExp (in Javascript) minimum length and minimum length after pipes, if pipes are present

I'm trying to write a regular expression that requires a minimum of 4 characters, and, if separated by pipes, that there are at least 4 characters present after each pipe.
For example, these entries would be valid:
weather|bronco|flock
weather
Whereas, these ones would not:
red
weather|br||flock|red
What I have so far almost works, except that it allows users to add only 1 alphanumeric character before entering another pipe:
^((?:(?!([|][|])|^[|]|[|]\\s|[|]$).)*)

It's easier to do without a regex:
'foo|bar||baz'.split('|').every(function(elem) {
return elem.length >= 4;
});
But if you insist on a regex, this should work:
/^[^|]{4,}(?:\|[^|]{4,})*$/
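As a quick check, the pattern above can be run against the four examples from the question. The variable names are illustrative.

```javascript
var pipePattern = /^[^|]{4,}(?:\|[^|]{4,})*$/;
var samples = ["weather|bronco|flock", "weather", "red", "weather|br||flock|red"];

for (var i = 0; i < samples.length; i++) {
  console.log(samples[i], "=", pipePattern.test(samples[i]));
}
// weather|bronco|flock = true
// weather = true
// red = false
// weather|br||flock|red = false
```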

If you really need a regex, you may use the following:
^(\w{4,}\|)*\w{4,}$
I assumed you need \w between pipes. It could be changed to other symbols. (You may replace it with [^|], for example, if you want to allow anything except a pipe.)
Without regex:
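var valid = true;
var words = "string|with|pipes".split("|");
// Every pipe-separated word must have at least 4 characters.
for (var i = 0; i < words.length; i++) {
    if (words[i].length < 4) {
        valid = false;
        break;
    }
}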

var array = "weather|br||flock|red".split('|'); //["weather", "br", "", "flock", "red"]
var valid = true;
//now iterate array and check if any item is < 4 length
for(var i=0; i < array.length; i++){
if(array[i].length < 4 ) valid = false;
}
console.log(valid);

You can do it with this pattern:
var regex = /^(?:\w{4,}\|?)+$/
or, with an added constraint that rejects a trailing pipe:
var regex = /^(?:\w{4,}(?:\|(?!$))?)+$/
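The difference between the two patterns shows up on input that ends with a pipe. A short comparison, where the variable names are made up for this sketch:

```javascript
var loosePattern = /^(?:\w{4,}\|?)+$/;
var strictPattern = /^(?:\w{4,}(?:\|(?!$))?)+$/;

console.log(loosePattern.test("weather|"));        // true, the trailing pipe is accepted
console.log(strictPattern.test("weather|"));       // false
console.log(strictPattern.test("weather|bronco")); // true
```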

regular expression javascript returning unexpected results

In the code below, I want to validate messageText with the first validationPattern and display the corresponding message from the validationPatterns array. The pattern and the message are separated by a pipe "|" character.
For this I am using the code below, and I always get the wrong result. Can someone look at this and help me?
var messageText = "Message1234";
var validationPatterns = [
['\/^.{6,7}$/|message one'],
['\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b|message two']
];
for (var i = 0; i < validationPatterns.length; i++) {
var validationvalues = validationPatterns[i].toString();
var expr = validationvalues.split("|")[0];
console.log(expr.constructor);
if(expr.test(messageText)) {
console.log("yes");
} else {
console.log("no");
}
}
I know that we cannot use pipe as separator as pipe is also part of regular expression. However I will change that later.

Your validation patterns are strings. That means:
The backslashes get eaten, because inside a string literal they escape the following character. "\d" is equivalent to "d", and "\b" is a backspace character rather than a word boundary. You would need to double-escape them: "\\b"
You cannot call the test method on them. You would need to construct RegExp objects out of them.
While it's possible to fix this, it would be better if you just used regex literals and separated them from the message as distinct properties of an object (or in an array).
var inputText = "Message1234";
var validationPatterns = [
[/^.{6,7}$/, 'message one'],
[/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/, 'message two']
];
for (var i = 0; i < validationPatterns.length; i++) {
var expr = validationPatterns[i][0],
message = validationPatterns[i][1];
console.log(expr.constructor); // RegExp now, not String
if(expr.test(inputText)) {
console.log(message+": yes");
} else {
console.log(message+": no");
}
}

Your expr variable is still just a string (validationvalues.split("|")[0] will return a string). That's the reason it does not work as a regular expression.
You need to add a line after the initial definition of expr.
expr = new RegExp(expr, 'i');
The 'i' is just an example of how you would use a case-insensitive flag or other flags. Use an empty string if you want a case-sensitive search (the default).
Also, you need to take out the / and / which are surrounding your first pattern. They are only needed when using regular expression literals in JavaScript code, and are not needed when converting strings into regular expressions.
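Put together, the conversion could be sketched as follows. The helper name toRegExp is hypothetical, and the sketch assumes a pattern string either has no delimiters or is wrapped in exactly one leading and one trailing slash.

```javascript
// Hypothetical helper: turns "/^.{6,7}$/" or "^.{6,7}$" into a RegExp object.
function toRegExp(patternText, flags) {
  // Strip the surrounding slashes if the text looks like a regex literal.
  var literal = /^\/(.*)\/$/.exec(patternText);
  var source = literal ? literal[1] : patternText;
  return new RegExp(source, flags || "");
}

var expr = toRegExp("/^.{6,7}$/");
console.log(expr.test("Message1234")); // false, the text has 11 characters
console.log(expr.test("Message"));     // true
// Backslashes must be doubled inside a string literal.
console.log(toRegExp("\\d{1,3}").test("Message1234")); // true
```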

How do I split a string into an array of characters? [duplicate]

This question already has answers here:
How to get character array from a string?
(14 answers)
Closed 5 years ago.
var s = "overpopulation";
var ar = [];
ar = s.split();
alert(ar);
I want to use string.split to split a word into an array of characters.
The above code doesn't seem to work: it returns "overpopulation" as an Object.
How do I split it into an array of characters if the original string doesn't contain commas or whitespace?

You can split on an empty string:
var chars = "overpopulation".split('');
If you just want to access a string in an array-like fashion, you can do that without split:
var s = "overpopulation";
for (var i = 0; i < s.length; i++) {
console.log(s.charAt(i));
}
You can also access each character with its index using normal array syntax. Note, however, that strings are immutable, which means you can't set the value of a character using this method, and that it isn't supported by IE7 (if that still matters to you).
var s = "overpopulation";
console.log(s[3]); // logs 'r'

An old question, but I should warn:
Do NOT use .split('')
You'll get weird results with non-BMP (non-Basic-Multilingual-Plane) characters.
The reason is that methods like .split() and .charCodeAt() work on UTF-16 code units, so they only handle characters with a code point below 65536 correctly; higher code points are represented by a pair of (lower-valued) "surrogate" pseudo-characters.
'𝟙𝟚𝟛'.length // —> 6
'𝟙𝟚𝟛'.split('') // —> ["�", "�", "�", "�", "�", "�"]
'😎'.length // —> 2
'😎'.split('') // —> ["�", "�"]
Use ES2015 (ES6) features where possible:
Using the spread operator:
let arr = [...str];
Or Array.from
let arr = Array.from(str);
Or split with the new u RegExp flag:
let arr = str.split(/(?!$)/u);
Examples:
[...'𝟙𝟚𝟛'] // —> ["𝟙", "𝟚", "𝟛"]
[...'😎😜🙃'] // —> ["😎", "😜", "🙃"]
For ES5, options are limited:
I came up with this function, which internally uses the MDN example knownCharCodeAt() to get the correct code point of each character.
function stringToArray(str) {
var i = 0,
arr = [],
codePoint;
while (!isNaN(codePoint = knownCharCodeAt(str, i))) {
arr.push(String.fromCodePoint(codePoint));
i++;
}
return arr;
}
This requires the knownCharCodeAt() function and, for some browsers, a String.fromCodePoint() polyfill.
if (!String.fromCodePoint) {
// ES6 Unicode Shims 0.1 , © 2012 Steven Levithan , MIT License
String.fromCodePoint = function fromCodePoint () {
var chars = [], point, offset, units, i;
for (i = 0; i < arguments.length; ++i) {
point = arguments[i];
offset = point - 0x10000;
units = point > 0xFFFF ? [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)] : [point];
chars.push(String.fromCharCode.apply(null, units));
}
return chars.join("");
}
}
Examples:
stringToArray('𝟙𝟚𝟛') // —> ["𝟙", "𝟚", "𝟛"]
stringToArray('😎😜🙃') // —> ["😎", "😜", "🙃"]
Note: str[index] (ES5) and str.charAt(index) will also return weird results with non-BMP charsets. e.g. '😎'.charAt(0) returns "�".
UPDATE: Read this nice article about JS and unicode.

.split('') splits emojis in half.
Onur's solutions work for some emojis, but can't handle more complex languages or combined emojis.
Consider this emoji being ruined:
[..."🏳️‍🌈"] // returns ["🏳", "️", "‍", "🌈"] instead of ["🏳️‍🌈"]
Also consider this Hindi text अनुच्छेद which is split like this:
[..."अनुच्छेद"] // returns ["अ", "न", "ु", "च", "्", "छ", "े", "द"]
but should in fact be split like this:
["अ","नु","च्","छे","द"]
This happens because some of the characters are combining marks (think diacritics/accents in European languages).
You can use the grapheme-splitter library for this:
It does proper standards-based letter split in all the hundreds of exotic edge-cases - yes, there are that many.
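In environments that provide Intl.Segmenter (recent browsers and Node.js versions), grapheme splitting is also available without a library. The sketch below assumes that API is present, and splitGraphemes is a made-up helper name.

```javascript
// Split a string into user-perceived characters (grapheme clusters).
// Assumes Intl.Segmenter exists in the runtime.
function splitGraphemes(text) {
  var segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), function (part) {
    return part.segment;
  });
}

// Rainbow flag: U+1F3F3, U+FE0F, U+200D (zero-width joiner), U+1F308.
var flag = "\u{1F3F3}\uFE0F\u200D\u{1F308}";
console.log([...flag].length);            // 4 code points
console.log(splitGraphemes(flag).length); // 1 grapheme cluster
```

How scripts such as Devanagari are clustered depends on the Unicode version the runtime implements, so results for the Hindi example may vary between environments.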

It's as simple as:
s.split("");
The delimiter is an empty string, hence it will break up between each single character.

The split() method in javascript accepts two parameters: a separator and a limit.
The separator specifies the character(s) to use for splitting the string. If you don't specify a separator, the result is an array containing the entire string, unsplit. But if you specify the empty string as the separator, the string is split between each character.
Therefore:
s.split('')
will have the effect you seek.
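A short illustration of the two parameters, using the word from the question:

```javascript
var word = "overpopulation";

console.log(word.split());          // ["overpopulation"], no separator given
console.log(word.split("").length); // 14, one entry per character
console.log(word.split("", 4));     // ["o", "v", "e", "r"], limit of 4 entries
```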
More information here

A string in JavaScript can already be indexed like a character array.
You can access any character by its index, just as you would with an array element.
var s = "overpopulation";
alert(s[0]) // alerts o.
UPDATE
As is pointed out in the comments below, the above method for accessing a character in a string is part of ECMAScript 5 which certain browsers may not conform to.
An alternative method you can use is charAt(index).
var s = "overpopulation";
alert(s.charAt(0)) // alerts o.

To support emojis use this
('Dragon 🐉').split(/(?!$)/u);
=> ['D', 'r', 'a', 'g', 'o', 'n', ' ', '🐉']

You can use the regular expression /(?!$)/:
"overpopulation".split(/(?!$)/)
The negative look-ahead assertion (?!$) will match right in front of every character.
