Matching css selectors with RegExp doesn't work in browser - javascript

I am trying to match CSS selectors, as can be seen here:
https://regex101.com/r/kI3rW9/1
It matches the test string as desired; however, when I load a .js file to test it in the browser, it fails in both Firefox and Chrome.
The .js file:
window.onload = function() {
main();
}
main = function() {
var regexSel = new RegExp('([\.|#][a-zA-Z][a-zA-Z0-9.:_-]*) ?','g');
var text = "#left_nav .buildings #rfgerf .rtrgrgwr .rtwett.ww-w .tw:ffwwwe";
console.log(regexSel.exec(text));
}
In the browser it returns:
["#left_nav ", "#left_nav", index: 0, input: "#left_nav .buildings #rfgerf .rtrgrgwr .rtwett.ww-w .tw:ffwwwe"]
So it appears to capture only the first selector, once with and once without the whitespace, despite the whitespace being outside the () and the global flag being set.
Edit:
So either looping over RegExp.exec(text) or just using String.match(str) leads to the correct solution. Thanks to Wiktor's answer I was able to implement a convenient way of calling this functionality:
function Selector(str){
this.str = str;
}
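// Inherit from String.prototype without the with statement (which is forbidden in strict mode)
Selector.prototype = Object.create(String.prototype);
Selector.prototype.toString = Selector.prototype.valueOf = function () {
    return this.str;
};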
Selector.prototype.constructor = Selector;
Selector.prototype.parse = function() {
return this.match(/([\.|#][a-zA-Z][a-zA-Z0-9.:_-]*) ?/g);
}
//Using it the following way:
var text = new Selector("#left_nav .buildings #rfgerf .rtrgrgwr .rtwett.ww-w .tw:ffwwwe");
console.log(text.parse());
However, I decided to use
/([\.|#][a-zA-Z][a-zA-Z0-9.:_-]*) ?/g over the suggested
/([.#][a-zA-Z][a-zA-Z0-9.:_-]*)(?!\S)/g because it matches my test string in 44 steps instead of 60 on regex101.com.

You ran exec once, so you got one match object. You'd need to run it inside a loop.
var regexSel = new RegExp('([\.|#][a-zA-Z][a-zA-Z0-9.:_-]*) ?','g');
var text = "#left_nav .buildings #rfgerf .rtrgrgwr .rtwett.ww-w .tw:ffwwwe";
var m; // declared explicitly so it does not become an implicit global
while ((m = regexSel.exec(text)) !== null) {
    console.log(m[1]);
}
A regex with a (?!\S) negative lookahead at the end (which fails the match if a non-whitespace character immediately follows your main consuming pattern) allows simpler code:
var text = "#left_nav .buildings #rfgerf .rtrgrgwr .rtwett.ww-w .tw:ffwwwe";
console.log(text.match(/[.#][a-zA-Z][a-zA-Z0-9.:_-]*(?!\S)/g));
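One caveat with the match shortcut: with the g flag it returns only the full matches and drops capture groups. The following is a minimal sketch, assuming an environment that supports String.prototype.matchAll (ES2020), of how to keep group 1 without writing the exec loop by hand. The variable names are illustrative only.

```javascript
const selectorText = "#left_nav .buildings #rfgerf .rtrgrgwr .rtwett.ww-w .tw:ffwwwe";
const selectorRe = /([.#][a-zA-Z][a-zA-Z0-9.:_-]*) ?/g;

// match() with the g flag returns full matches only, so the trailing space is included
const fullMatches = selectorText.match(selectorRe);

// matchAll() yields one match object per hit, so group 1 is still available
const groups = Array.from(selectorText.matchAll(selectorRe), m => m[1]);

console.log(fullMatches);
console.log(groups);
```

If the capture group is not needed, the lookahead variant with match stays the shorter option.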
Note that you should consider using regex literal notation when defining your static regexps. Prefer the RegExp constructor only when your patterns are dynamic, contain variables, or contain so many / characters that escaping them becomes tedious.
Also note [.#]: the dot does not have to be escaped inside a character class, and | inside it is treated as a literal pipe symbol (not as the alternation operator).
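Both points are easy to verify in a console. The snippet below is a small illustrative check, not part of the original answer; the test strings are made up.

```javascript
// Inside a character class, | is a literal pipe, not alternation
const withPipe = /^[\.|#]$/;
const withoutPipe = /^[.#]$/;

console.log(withPipe.test("|"));    // true: the pipe itself is accepted
console.log(withoutPipe.test("|")); // false
console.log(withoutPipe.test("."));  // true: an unescaped dot in a class is literal

// In a string passed to the RegExp constructor, a backslash must be doubled
// to survive string escaping and reach the regex engine
const fromString = new RegExp("a\\.b");
console.log(fromString.source);      // a\.b
console.log(fromString.test("axb")); // false
```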

Related

JavaScript regex escape multiple characters

Is it possible to escape a parameterized regex when a parameter contains multiple symbols that need to be escaped?
const _and = '&&', _or = '||';
let reString = `^(${_and}|${_or})`; // ${_or} needs to be escaped
const reToken = new RegExp(reString);
Working but not optimal:
_or = '\\|\\|';
Or:
let reString = `^(${_and}|\\|\\|)`;
I would prefer to reuse the _or variable and keep the regex parameterized.
You can write your own function that escapes your parameters so that they work in the final regexp. To save you time, I found one already written in this answer. With that function, you can write clean parameters without escaping everything by hand. I would, however, avoid modifying built-in classes (RegExp) and make a wrapper around it or something separate instead. In the example below I use the exact function I found in the other answer, which extends the built-in RegExp.
RegExp.escape = function(s) {
return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
};
const and = RegExp.escape('&&');
const or = RegExp.escape('||');
const andTestString = '1 && 2';
const orTestString = '1 || 2';
const regexp = `${and}|${or}`;
console.log(new RegExp(regexp).test(andTestString)); // true
console.log(new RegExp(regexp).test(orTestString)); // true
EDITED
https://jsfiddle.net/ao4t0pzr/1/
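For readers who would rather not extend the built-in RegExp, as the answer itself recommends, here is a minimal sketch of a standalone variant. The names escapeRegExp and tokenPattern are hypothetical helpers introduced for illustration; the character class is the same one used above.

```javascript
// Hypothetical helper: escapes regex metacharacters without touching RegExp itself
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Hypothetical helper: builds one anchored alternation from a list of literal tokens
function tokenPattern(tokens) {
  return new RegExp("^(" + tokens.map(escapeRegExp).join("|") + ")");
}

const _and = "&&", _or = "||";
const reToken = tokenPattern([_and, _or]);

console.log(reToken.source);
console.log(reToken.test("|| rest"));
console.log(reToken.test("& rest"));
```

This keeps _and and _or as plain, unescaped strings, which is what the question asked for, and the escaping happens in exactly one place.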
You can use a tagged template function to escape the characters within the string using a regular expression. You can then use that string to build a new RegExp from the escaped characters:
function escape(s) {
return s[0].replace(/[-&\/\\^$*+?.()|[\]{}]/g, '\\$&');
};
var or = escape`||`;
var and = escape`&&`;
console.log(new RegExp(`${and}|${or}`)); // "/\&\&|\|\|/"
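The tag above escapes only the literal text of the template. A possible variation, sketched here under the assumption that the literal parts should stay regex syntax while interpolated values are treated as plain text, lets the original ^(${_and}|${_or}) pattern be written almost unchanged. The tag name rx is hypothetical.

```javascript
// Hypothetical tag: literal parts are used as regex source, interpolated values are escaped
function rx(strings, ...values) {
  const escapeValue = v => String(v).replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
  let source = strings.raw[0];
  values.forEach((value, i) => {
    // strings.raw keeps backslashes exactly as typed in the template
    source += escapeValue(value) + strings.raw[i + 1];
  });
  return new RegExp(source);
}

const _and = "&&", _or = "||";
const reToken = rx`^(${_and}|${_or})`;

console.log(reToken.source);
console.log(reToken.test("|| 2"));
```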

Javascript regex error " /?/: nothing to repeat " It worked fine earlier [duplicate]

This question already has answers here:
What does the "Nothing to repeat" error mean when using a regex in javascript?
(7 answers)
Closed 4 years ago.
I'm trying to clear a string of any characters that are invalid in a directory name.
I tried a number of methods and this one (a custom encoding) eventually worked, but now it doesn't: the console says "nothing to repeat". What does that mean? I'm using Chrome.
Here's the code (using a random string):
var someTitle = "wa?";
var cleanTitle = cleanTitle(someTitle);
function cleanTitle(title){
var obstructions = ['\\','/',':','*','?','"','<','>','|'];
var solutions = [92,47,58,42,63,34,60,62,124];
var encodedTitle = title;
for (var obstruction = 0; obstruction < obstructions.length; obstruction++){
var char = obstructions[obstruction];
if (encodedTitle.includes(char)){
var enCode = "__i!__"+solutions[obstruction]+"__!i__";
var rEx = new RegExp(char,"g");
encodedTitle = encodedTitle.replace(rEx,enCode);
}
}
console.log("CLEAN: "+title);
console.log("ENCODED: "+encodedTitle);
return encodedTitle;
}
Here's the error:
Uncaught SyntaxError: Invalid regular expression: /?/: Nothing to repeat
It points to this line: var rEx = new RegExp(char,"g");
You need to escape some characters when using them as literals in a regular expression. Among those are most of the characters you have in your array.
Given your function replaces the obstruction characters with their ASCII code (and some wrapping __i!__), I would suggest to make your function a bit more concise, by performing the replacement with one regular expression, and a callback passed to .replace():
function cleanTitle(title){
return title.replace(/[\\/:*?"<>|]/g, function (ch) {
return "__i!__"+ch.charCodeAt(0)+"__!i__";
});
}
var someTitle = "wh*r* is |his?";
var result = cleanTitle(someTitle);
console.log(result);
...and if you are in an ES6 compatible environment:
var cleanTitle = t=>t.replace(/[\\/:*?"<>|]/g, c=>"__i!__"+c.charCodeAt(0)+"__!i__");
var someTitle = "wh*r* is |his?";
var result = cleanTitle(someTitle);
console.log(result);
The ? is a regex quantifier. When you want to match it literally (and build a regex from it), you need to escape it.
That being said, escaping a character that does not strictly need it is harmless for the punctuation in your array, and it makes the other entries usable too, since several of them (\, *, ?, |) are also special in a regex. So escape the character where the regex is built, which leaves the includes(char) check working on the unescaped character:
var rEx = new RegExp('\\' + char, "g");
That way every character is escaped in the string handed to the RegExp constructor.
/?/ is not a valid regex. To match a literal question mark, you need /\?/.
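The failure and the fix can be reproduced in isolation. This is a small illustrative sketch; tryCompile is a hypothetical helper, and the replacement string is the encoding the question's code would produce for ? (character code 63).

```javascript
// Hypothetical helper: returns the compiled regex, or the error name if compilation fails
function tryCompile(source) {
  try {
    return new RegExp(source, "g");
  } catch (err) {
    return err.name;
  }
}

console.log(tryCompile("?"));           // SyntaxError: a bare quantifier has nothing to repeat
console.log(tryCompile("\\?").source);  // \?

const encoded = "wa?".replace(tryCompile("\\?"), "__i!__63__!i__");
console.log(encoded);                   // wa__i!__63__!i__
```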
Regex here would be awkward, as most of the characters need escaping. Instead, consider using a literal string replacement until it is no longer found:
while( encodedTitle.indexOf(char) > -1) {
encodedTitle = encodedTitle.replace(char,enCode);
}
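A related way to do literal replacement without a loop per occurrence is split followed by join. The sketch below assumes the same obstruction list and encoding scheme as the question; the function name cleanTitleNoRegex is made up for this example.

```javascript
function cleanTitleNoRegex(title) {
  var obstructions = ['\\', '/', ':', '*', '?', '"', '<', '>', '|'];
  var encodedTitle = title;
  obstructions.forEach(function (ch) {
    var enCode = "__i!__" + ch.charCodeAt(0) + "__!i__";
    // split/join treats ch as plain text, so nothing needs escaping
    encodedTitle = encodedTitle.split(ch).join(enCode);
  });
  return encodedTitle;
}

var cleaned = cleanTitleNoRegex("wh*r* is |his?");
console.log(cleaned);
```

Because the encoding markers contain none of the obstruction characters, a later pass never re-encodes the output of an earlier one.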

regex in js, match pattern except keywords

I'm trying to find a regex pattern in JS:
Any_Function() //match : Any_Function(
butnotthis() //I don't want to match butnotthis(
I have this pattern : /([a-zA-Z_]+\()/ig
and would like something like
/(not:butnotthis)|([a-zA-Z_]+\()/ig (don't try this)
Demo here :
http://regexr.com/38qag
Is it possible to exclude keywords from the match?
The way I interpreted your question, you wanted to be able to create a blacklist of ignored functions. As far as I know, you cannot do this with regular expressions; however, you could do it with a bit of JavaScript.
I created a JSFiddle: http://jsfiddle.net/DQN79/
var str = "Any_Function();butnotthis();",
matches = [],
blacklist = { butnotthis: true };
str.replace(/([a-zA-Z_]+\()/ig, function (match) {
if (!blacklist[match.substr(0, match.length - 1)])
matches.push(match);
});
console.log(matches);
In this example, I abused the String#replace() method because it accepts a callback that will be fired for each match. I used this callback to check for blacklisted function names - if the function is not blacklisted, it will be added to the matches array.
I used a hashmap for the blacklist because it is programmatically easier, but you could also use a string, array, etc.
You can establish a convention that distinguishes functions from keywords, where function names start with an uppercase letter. In that case the regex would be (without the i flag, which would defeat the uppercase check):
/(^[A-Z][a-zA-Z_]+\()/g
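Here is one working version:
/^(?!(butnotthis\())([a-zA-Z_]+\()/ig
Specify the list of functions to be ignored inside the parentheses of the lookahead. Because of the ^ anchor, each call is expected at the start of the input, or at the start of a line when the m flag is added.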
http://regexr.com/38qb8
For Javascript:
var str = "Any_Function();butnotthis();",
matches = [],
blacklist = ["butnotthis"];
// Uses the built-in Array.prototype.filter (no jQuery needed)
matches = (str.match(/([a-zA-Z_]+\()/ig) || []).filter(
function (e) {
    // Keep the match only if it equals none of the blacklisted names plus "("
    return blacklist.every(function (name) {
        return e !== name + "(";
    });
});
console.log(matches)
jsBin : http://jsbin.com/vevip/1/edit
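The lookahead idea and the blacklist array can also be combined, so that the list of ignored names lives in one place and the regex is generated from it. This is a sketch under the assumption that the names are plain identifiers (no regex escaping is applied); callMatcher is a hypothetical helper.

```javascript
// Hypothetical helper: builds a matcher that skips calls to the listed function names
function callMatcher(ignoredNames) {
  // Assumes the names are plain identifiers, so they are joined without escaping
  var ignored = ignoredNames.join("|");
  // \b ensures matching starts at the beginning of an identifier,
  // so the lookahead is always checked against the whole name
  return new RegExp("\\b(?!(?:" + ignored + ")\\()[a-zA-Z_]+\\(", "g");
}

var source = "Any_Function();butnotthis();";
var calls = source.match(callMatcher(["butnotthis"])) || [];
console.log(calls); // ["Any_Function("]
```

Without the \b, the engine could skip the first letter of butnotthis( and match utnotthis( instead, which is why the boundary matters here.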

Node.JS - How would I do this regex?

Well I have this:
var regex = /convertID\s(\d+)/
var match = regex.exec(message);
if(match != null)
{
//do stuff here
}
That works fine and it recognizes if someone writes "convertID NumbersHere".
However I want to have another one under it as well checking if there's a specific link, for example:
var regex = /convertID\shttp://anysitehere dot com/id/[A-Z]
var match = regex.exec(message);
if(match != null)
{
//do stuff here
}
So how would I make it check for a specific site with any letters after /id/?
You can use this:
var regex = /convertID\shttp:\/\/thesite\.com\/id\/[A-Za-z]+/;
Slashes must be escaped because the slash delimits the pattern in a regex literal (the dot is escaped as well, so that it matches only a literal dot). You can avoid escaping the slashes by explicitly creating an instance of the RegExp class:
var regex = new RegExp("convertID\\shttp://thesite\\.com/id/[A-Za-z]+");
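If the letters after /id/ are needed later, a capture group around them makes them available as match[1], just like the digits in the question's first regex. The sketch below is illustrative; thesite.com is the placeholder domain from the answer and the message is made-up input.

```javascript
var literalRe = /convertID\shttp:\/\/thesite\.com\/id\/([A-Za-z]+)/;
var constructedRe = new RegExp("convertID\\shttp://thesite\\.com/id/([A-Za-z]+)");

var message = "convertID http://thesite.com/id/abcDEF"; // hypothetical input
var match = literalRe.exec(message);
if (match !== null) {
  console.log(match[1]); // abcDEF
}

console.log(constructedRe.test(message));                               // true
console.log(literalRe.test("convertID http://thesiteXcom/id/abc"));     // false
```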

Use JavaScript string operations to cut out exact text

I'm trying to cut out some text from a scraped site and I'm not sure what functions or libraries I can use to make this easier.
Example of the code I run from PhantomJS:
var latest_release = page.evaluate(function () {
// everything inside this function is executed inside our
// headless browser, not PhantomJS.
var links = $('[class="interesting"]');
var releases = {};
for (var i=0; i<links.length; i++) {
releases[links[i].innerHTML] = links[i].getAttribute("href");
}
// its important to take note that page.evaluate needs
// to return simple object, meaning DOM elements won't work.
return JSON.stringify(releases);
});
The elements with class interesting contain what I need, surrounded by newlines, tabs and whatnot.
Here it is:
{"\n\t\t\t\n\t\t\t\tI_Am_Interesting\n\t\t\t\n\t\t":null,"\n\t\t\t\n\t\t\t\tI_Am_Interesting\n\t\t\t\n\t\t":null,"\n\t\t\t\n\t\t\t\tI_Am_Interesting\n\t\t\t\n\t\t":null}
I tried string.slice("\n"); and nothing happened. I really want an effective way to cut out strings like this, based on their position relative to those \n's and \t's.
By the way, this was my split code:
var x = latest_release.split('\n');
Cheers.
It's a simple case of stripping out all whitespace. A job that regexes do beautifully.
var s = " \n\t\t\t\n\t\t\t\tI Am Interesting\n\t\t \t \n\t\t";
s = s.replace(/[\r\t\n]+/g, ''); // remove tabs and line breaks, keep plain spaces
s = s.replace(/^\s+/, ''); // remove all whitespace from the front
s = s.replace(/\s+$/, ''); // remove all whitespace at the end :)
console.log(s);
Further reading: https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/RegExp
var interesting = {
"\n\t\t\t\n\t\t\t\tI_Am_Interesting1\n\t\t\t\n\t\t":null,
"\n\t\t\t\n\t\t\t\tI_Am_Interesting2\n\t\t\t\n\t\t":null,
"\n\t\t\t\n\t\t\t\tI_Am_Interesting3\n\t\t\t\n\t\t":null
}
var found = [];
for (var x in interesting) {
    // each key yields an array with its run of word characters
    found.push(x.match(/\w+/g));
}
alert(found);
Could you try "\\n" as the pattern? The \n in your string may be a literal backslash followed by n rather than an actual newline character.
new_string = string.replace(/\n/g, "").replace(/\t/g, ""); // a string pattern would replace only the first occurrence
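Since only leading and trailing whitespace surrounds the interesting text, String.prototype.trim may be all that is needed. The sketch below assumes the keys look like the ones shown in the question and cleans them before the object is serialized, which avoids dealing with escaped \n sequences in the JSON text afterwards. The sample data is shortened from the question.

```javascript
var releases = {
  "\n\t\t\t\n\t\t\t\tI_Am_Interesting1\n\t\t\t\n\t\t": null,
  "\n\t\t\t\n\t\t\t\tI_Am_Interesting2\n\t\t\t\n\t\t": null
};

var cleaned = {};
Object.keys(releases).forEach(function (key) {
  // trim() removes leading and trailing spaces, tabs and line breaks
  cleaned[key.trim()] = releases[key];
});

console.log(JSON.stringify(cleaned)); // {"I_Am_Interesting1":null,"I_Am_Interesting2":null}
```

In the question's loop, the same effect could probably be had by calling .trim() on links[i].innerHTML before using it as a key.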
