Regex to capture variables - javascript

I am trying to use an HTML form and JavaScript (I mention this because some advanced regex features are not available in JavaScript) to accomplish the following:
feed the form some text, and use a regex to look into it and "capture" certain parts of it to be used as variables.
For example, the text is:
"abcde email: asdf#gfds.com email: fake#mail.net sdfsdaf..."
My problem is that I cannot think of an elegant way of capturing both emails as the variables e1 and e2, for example.
The regex I have so far is something like this: /email: (\b\w+\b)/g but for some reason this is not giving back the 2 matches; it only gives back asdf#gfds.com.
Suggestions?

You can use RegExp.exec() to repeatedly apply a regex to a string, returning a new match each time:
var entry = "[...]"; //Whatever your data entry is
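var regex = /email: (\b\w+\b)/g;
var emails = [];
var match; // declared here so the loop does not create an implicit global
while ((match = regex.exec(entry)) !== null) {
  emails[emails.length] = match[1]; // match[1] is the first capture group
}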
I stored all the e-mails in an array (so as to make this work for arbitrary input). It looks like your regex might be a little off, too: \w+ stops at the first non-word character (here the #), so you'll have to change it if you want to capture the full e-mail.
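As a rough sketch of the regex change mentioned above, assuming an address is simply a run of non-whitespace characters after "email: " (the \S+ pattern and the helper name captureEmails are assumptions for illustration, not part of the original answer):

```javascript
// Hypothetical helper: collects every value that follows "email: ".
// Assumption: an address is any run of non-whitespace characters (\S+).
function captureEmails(entry) {
  var regex = /email: (\S+)/g; // the g flag lets exec() resume after each match
  var emails = [];
  var match;
  while ((match = regex.exec(entry)) !== null) {
    emails.push(match[1]); // match[1] is the first capture group
  }
  return emails;
}

var found = captureEmails("abcde email: asdf#gfds.com email: fake#mail.net sdfsdaf...");
var e1 = found[0];
var e2 = found[1];
console.log(e1, e2);
```

If the input may contain more or fewer than two addresses, it is probably safer to keep working with the array than with fixed variables.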

Related

How can I split a string but keep the delimiters in javascript?

I have searched online, but what I found doesn't make much sense to me.
I am trying to split a string and keep the delimiters too.
For example, I have this string:
var str = '45612+54721/121*124.2';
And, I would like to split it based on the operations & keep the operations.
So the output must look like this
[45612], [+], [54721], [/], [121], [*], [124.2];
If you use strings’ split with a regex, any captured groups become part of the resulting array.
var str = '45612+54721/121*124.2';
var tokens = str.split(/([+/*])/);
console.log(tokens);
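A small sketch of the same idea wrapped in a function, in case the input can also contain a minus sign or start with an operator. The helper name tokenize and the choice of operators are assumptions, not part of the answer above:

```javascript
// Hypothetical helper: splits an arithmetic string into operands and operators.
// Assumption: the operators are + - * / and the operands contain none of them.
function tokenize(expression) {
  // The capture group makes split() keep each operator in the result array.
  return expression.split(/([+\-*\/])/).filter(function (token) {
    return token !== ""; // drop empty strings left by leading or adjacent operators
  });
}

console.log(tokenize('45612+54721/121*124.2'));
// → [ '45612', '+', '54721', '/', '121', '*', '124.2' ]
```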

Get base url from string with Regex and Javascript

I'm trying to get the base url from a string (So no window.location).
It needs to remove the trailing slash
It needs to be regex (No New URL)
It needs to work with query parameters and anchor links
In other words all the following should return https://apple.com or https://www.apple.com for the last one.
https://apple.com?query=true&slash=false
https://apple.com#anchor=true&slash=false
http://www.apple.com/#anchor=true&slash=true&whatever=foo
These are just examples, urls can have different subdomains like https://shop.apple.co.uk/?query=foo should return https://shop.apple.co.uk - It could be any url like: https://foo.bar
The closer I got is with:
const baseUrl = url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/,'$1').replace(/\/$/, ""); // Base Path & Trailing slash
But this doesn't work with anchor links and queries which start right after the url without the / before
Any idea how I can get it to work on all cases?
You could add # and ? to your negated character class. You don't need .* because that will match until the end of the string.
For your example data, you could match:
^https?:\/\/[^#?\/]+
const strings = [
"https://apple.com?query=true&slash=false",
"https://apple.com#anchor=true&slash=false",
"http://www.apple.com/#anchor=true&slash=true&whatever=foo",
"https://foo.bar/?q=true"
];
strings.forEach(s => {
console.log(s.match(/^https?:\/\/[^#?\/]+/)[0]);
})
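Note that match() returns null when the pattern does not match, so indexing the result with [0] throws on input that is not an http(s) URL. A minimal sketch of a guarded version, where the helper name getBaseUrl is an assumption:

```javascript
// Hypothetical helper around the pattern above.
// Returns the scheme and host, or null when the input does not look like an http(s) URL.
function getBaseUrl(url) {
  var match = url.match(/^https?:\/\/[^#?\/]+/);
  return match ? match[0] : null;
}

console.log(getBaseUrl("https://shop.apple.co.uk/?query=foo")); // → https://shop.apple.co.uk
console.log(getBaseUrl("not a url"));                           // → null
```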
You could use Web API's built-in URL for this. URL will also provide you with other parsed properties that are easy to get to, like the query string params, the protocol, etc.
Regex is a painful way to do something that the browser makes otherwise very simple.
I know that you asked about using regex, but in the event that you (or someone coming here in the future) really just cares about getting the information out and isn't committed to using regex, maybe this answer will help.
let one = "https://apple.com?query=true&slash=false"
let two = "https://apple.com#anchor=true&slash=false"
let three = "http://www.apple.com/#anchor=true&slash=true&whatever=foo"
let urlOne = new URL(one)
console.log(urlOne.origin)
let urlTwo = new URL(two)
console.log(urlTwo.origin)
let urlThree = new URL(three)
console.log(urlThree.origin)
const baseUrl = url.replace(/(.*?:\/\/.*?)[?\/#].*/, '$1'); // lazy quantifiers stop at the first ?, / or # after the host, so no trailing slash is kept
This will get you everything up to the .com part. You will have to append .com once you pull out the first part of the url.
^http.*?(?=\.com)
Or maybe you could do:
myUrl.replace(/(#|\?|\/#).*$/, "")
To remove everything after the host name.

Extract characters in URL after certain character up to certain character

I'm trying to extract certain piece of a URL using regex (JavaScript) and having trouble excluding characters after a certain piece. Here's what I have so far:
URL: http://www.somesite.com/state-de
Using url.match(/\/[^\/]+$/)[0] I can extract the state-de like I want.
However, when the URL becomes http://www.somesite.com/state-de?page=r and I use the same regex, it pulls everything including the "?page=r", which I don't want. I want to extract only the state-de regardless of what's after it (usually a "?" seems to follow it).
This might work:
var arr = url.split("/")
arr[arr.length - 1].split("?")[0]
I'd recommend reading up on regular expressions in general. What you want to do here is make the regular expression stop when it hits the ? in the URL.
Using capturing groups to select which part of the match that you want might also be useful here.
Example:
url.match(/(\/[^\/?]+)(?:\?.*)?$/)[1]
I avoid overly complex RegExs when possible, so I tend to do this in multiple steps (with .replace()):
var stripped = url.replace(/[?#].*/, ''); // Strips anything after ? or #
You can now do the simpler transform to get the state, e.g.:
var state = stripped.split('/').pop()
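The two steps above can be combined into one small function. This is only a sketch: the name lastPathSegment is an assumption, and it assumes the URL does not end in a trailing slash (in that case pop() would return an empty string):

```javascript
// Hypothetical helper combining the two steps above.
function lastPathSegment(url) {
  var stripped = url.replace(/[?#].*/, ''); // drop the query string and fragment
  return stripped.split('/').pop();         // keep what follows the last slash
}

console.log(lastPathSegment('http://www.somesite.com/state-de'));              // → state-de
console.log(lastPathSegment('http://www.somesite.com/state-de?page=r#mark4')); // → state-de
```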
If you want to do it with a regex, try this one (the state is in the second capture group):
url.match(/https?:\/\/([a-z0-9-]+\.)+[a-z]+\/([a-z0-9_-]+)\/?(\?.*)?/)[2]
Or you could do it using jQuery:
var url = 'http://www.somesite.com/state-de?page=r#mark4';
// Create a special anchor element, set the URL to it
var a = $('<a>', { href:url } )[0]; // [0] is the underlying DOM element
console.log(a.hostname);
console.log(a.pathname);
console.log(a.search);
console.log(a.hash);

How to strip comments from Javascript using PHP

I want to remove the comments from these kind of scripts:
var stName = "MyName"; //I WANT THIS COMMENT TO BE REMOVED
var stLink = "http://domain.com/mydomain";
var stCountry = "United State of America";
What is the best way to accomplish this using PHP?
The best way is to use an actual parser or write at least a lexer yourself.
The problem with Regex is that it gets enormously complex if you take everything into account that you have to.
For example, Cagatay Ulubay's suggested regexes /\/\/[^\n]?/ and /\/\*(.*)\*\// will match comments, but they will also match a lot more, like
var a = '/* the contents of this string will be matched */';
var b = '// and here you will even get a syntax error, because the entire rest of the line is removed';
var c = 'and actually, the regex that matches multiline comments will span across lines, removing everything between the first "/*" and here: */';
/*
this comment, however, will not be matched.
*/
While it is rather unlikely that strings contain such sequences, the problem is real with inline regex:
var regex = /^something.*/; // You see the fake "*/" here?
The current scope matters a lot, and you can't possibly know the current scope unless you parse the script from the beginning, character for character.
So you essentially need to build a lexer.
You need to split the code into three different sections:
Normal code, which you need to output again, and where the start of a comment could be just one character away.
Comments, which you discard.
Literals, which you also need to output, but where a comment cannot start.
Now the only literals I can think of are strings (single- and double-quoted), inline regex and template strings (backticks), but those might not be all.
And of course you also have to take escape sequences inside those literals into account, because you might encounter an inline regex like
/^file:\/\/\/*.+/
in which a single-character based lexer would only see the regex /^file:\/ and incorrectly parse the following /*.+ as the start of a multiline comment.
Therefore, upon encountering the second /, you have to look back and check whether the last character you passed was a \ (and that this \ was not itself escaped). The same goes for all kinds of quotes for strings.
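To make the three sections concrete, here is a minimal lexer sketch. It is written in JavaScript rather than PHP purely for illustration, and the function name stripComments is an assumption. It handles string and template literals but deliberately does not try to recognise inline regex literals or nested template expressions, so it only shows the overall loop. Instead of looking back for a \, it consumes each escape together with the character that follows it, which also copes with an escaped backslash.

```javascript
// Sketch only: strips // and /* */ comments while copying string and
// template literals through unchanged.
// Assumption: the input contains no regex literals and no nested template
// expressions, which this sketch does not try to recognise.
function stripComments(source) {
  var output = "";
  var i = 0;
  while (i < source.length) {
    var ch = source[i];
    var next = source[i + 1];
    if (ch === '"' || ch === "'" || ch === "`") {
      // Literal: copy through to the matching quote.
      var start = i;
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === "\\") i++; // an escape consumes the following character too
        i++;
      }
      i++; // step past the closing quote
      output += source.slice(start, i);
    } else if (ch === "/" && next === "/") {
      // Line comment: discard up to, but not including, the newline.
      while (i < source.length && source[i] !== "\n") i++;
    } else if (ch === "/" && next === "*") {
      // Block comment: discard through the closing "*/".
      var end = source.indexOf("*/", i + 2);
      i = end === -1 ? source.length : end + 2;
    } else {
      output += ch; // normal code is copied as-is
      i++;
    }
  }
  return output;
}

console.log(stripComments('var stLink = "http://domain.com/mydomain"; // removed'));
```

Supporting inline regex would need an extra state plus some way to tell a regex literal from a division operator, which is the hard part described above.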
I would go with preg_replace(). Assuming all comments are single line comments (// Comment here) you can start with this:
$JsCode = 'var stName = "MyName isn\'t \"Foobar\""; //I WANT THIS COMMENT TO BE REMOVED
var stLink = "http://domain.com/mydomain"; // Comment
var stLink2 = \'http://domain.com/mydomain\'; // This comment goes as well
var stCountry = "United State of America"; // Comment here';
$RegEx = '/(["\']((?>[^"\']+)|(?R))*?(?<!\\\\)["\'])(.*?)\/\/.*$/m';
echo preg_replace($RegEx, '$1$3', $JsCode);
Output:
var stName = "MyName isn't \"Foobar\"";
var stLink = "http://domain.com/mydomain";
var stLink2 = 'http://domain.com/mydomain';
var stCountry = "United State of America";
This solution is far from perfect and might have issues with strings containing "//" in them.

Matching invisible characters in JavaScript RegEx

I've got some strings that contain invisible characters, but they are in somewhat predictable places. Typically they surround the piece of text I want to extract, and then after the 2nd occurrence I want to keep the rest of the text.
I can't seem to figure out how to both key off of the invisible characters, and exclude them from my result. To match invisibles I've been using this regex: /\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F/ which does seem to work.
Here's an example: [invisibles]Keep as match 1[invisibles]Keep as match 2
Here's what I've been using so far without success:
/([\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)(.+)([\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)/(.+)
I've got the capture groups in there, but it's been a while since I've had to use regexes in this way, so I know I'm missing something important. I was hoping to just make the invisible matches non-capturing groups, but it seems that JavaScript does not support this.
Something like this seems like what you want. The second regex you have pretty much works, but the / is in totally the wrong place. Perhaps you weren't properly reading out the group data.
var s = "\x0EKeep as match 1\x0EKeep as match 2";
var r = /[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+(.+)[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+(.+)/;
var match = s.match(r);
var part1 = match[1];
var part2 = match[2];
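For what it's worth, JavaScript regular expressions do support non-capturing groups with the (?: ... ) syntax. A short sketch of the same pattern written that way follows; the lazy (.+?) in the first group is an assumption about the intended behaviour, chosen so that the first piece of text stops at the first run of invisible characters:

```javascript
var s = "\x0EKeep as match 1\x0EKeep as match 2";
// (?: ... ) groups without capturing, so only the two text parts are numbered.
var r = /(?:[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)(.+?)(?:[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)(.+)/;
var m = s.match(r);
console.log(m[1]); // → Keep as match 1
console.log(m[2]); // → Keep as match 2
```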
