Regex - Extract digits from a url

I have this url:
http://example.com/things/stuff/532453?morethings&stuff=things&ver=1
I need just that number in the middle there. Closest I got was
(\d*)?\?
but this includes the question mark. Basically, I want all the digits that come before the ?, back to the preceding slash, so the output is 532453.

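Try the following regex, (?!\/)\d+(?=\?):
var url = "http://example.com/things/stuff/532453?morethings&stuff=things";
url.match(/(?!\/)\d+(?=\?)/); // returns ["532453"]
This regex matches a run of digits that is immediately followed by ?, using a positive lookahead so that the ? is not returned as part of the match. Note that the leading (?!\/) is a negative lookahead, not a lookbehind: it only asserts that the next character is not /, which \d+ already guarantees, so it does not actually require a slash before the digits.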
A quick test within developer tools:
// create a list of example urls to test against (only the first should match the regex)
var urls = ["http://example.com/things/stuff/532453?morethings&stuff=things",
"http://example.com/things/stuff?morethings&stuff=things",
"http://example.com/things/stuff/123a?morethings&stuff=things"];
urls.forEach(function(value) {
console.log(value.match(/(?!\/)\d+(?=\?)/));
});
// logs the following:
["532453"]
null
null
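Because (?!\/) looks ahead rather than behind, the pattern above would also match digits that follow letters, as in /abc123?. If the digits must sit directly between a slash and the question mark, a capture group is one way to enforce that. This is a minimal sketch under that assumption; extractId is a hypothetical helper name, not part of the original answer.

```javascript
// Hypothetical helper: returns the digits that sit directly between a "/"
// and a "?", or null when there is no such purely numeric segment.
function extractId(url) {
  // The "/" and "?" are part of the overall match; only the digits are captured.
  var m = url.match(/\/(\d+)\?/);
  return m ? m[1] : null;
}

console.log(extractId("http://example.com/things/stuff/532453?morethings&stuff=things&ver=1")); // "532453"
console.log(extractId("http://example.com/things/stuff?morethings&stuff=things"));              // null
console.log(extractId("http://example.com/things/stuff/123a?morethings&stuff=things"));         // null
console.log(extractId("http://example.com/things/abc123?x=1"));                                 // null
```

The last URL is the case where the two patterns differ: the lookahead-only pattern returns 123 for it, while the capture-group version returns null.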

Just use this:
([\d]+)
You can check this link out: https://regex101.com/r/hR2eY7/1
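If you use JavaScript:
/([\d]+)/g
Note that with the g flag, match returns every run of digits in the string, so for the URL in the question it also returns the 1 from ver=1; the number you want is the first element.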

Try this :
url = "http://example.com/things/stuff/532453?morethings&stuff=things"
number = url.match(/(\d+)\?/g)[0].slice(0,-1)
Though the approach is slightly naive, it works. It grabs numbers with ? at the end then removes the ? from the end using slice.

Related

javascript regex insert new element into expression

I am passing a URL to a block of code in which I need to insert a new element into the regex. Pretty sure the regex is valid and the code seems right but no matter what I can't seem to execute the match for regex!
//** Incoming url's
//** url e.g. api/223344
//** api/11aa/page/2017
//** Need to match to the following
//** dir/api/12ab/page/1999
//** Hence the need to add dir at the front
var url = req.url;
//** pass in: /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var re = myregex.toString();
//** Insert dir into regex: /^dir\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var regVar = re.substr(0, 2) + 'dir' + re.substr(2);
var matchedData = url.match(regVar);
matchedData === null ? console.log('NO') : console.log('Yay');
I hope I am just missing the obvious but can anyone see why I can't match and always returns NO?
Thanks

Let's break down your regex
^\/api\/ matches the beginning of the string, followed by exactly the string "/api/".
([a-zA-Z0-9-_~ %]+) is a capturing group: it captures one or more of the characters listed inside the brackets (the + means one or more), so for example this section will match abAB25-_ %
(?:\/page\/([a-zA-Z0-9-_~ %]+)) also groups multiple tokens together, but does not create a capturing group of its own (the ?: makes it non-capturing). It first matches exactly the string "/page/", followed by a capturing group like the one described above (one or more of a-z, A-Z, 0-9, etc.).
?$ is at the end: the ? makes the preceding group optional (zero or one occurrence), and the $ matches the end of the string.
This regex will match this string, for example: /api/abAB25-_ %/page/abAB25-_ %
You may be able to take advantage of capturing groups, however, and use something like this instead to get similar results: ^\/api\/([a-zA-Z0-9-_~ %]+)\/page\/\1?$. Here, we are using \1 to reference that first capturing group and match exactly the same tokens it is matching. EDIT: actually, this probably won't work, since the text after /api/ and the text after /page/ will most likely be different, carrying on...
Afterwards, you are adding "dir" to the beginning of your pattern, so you can now match something like this: dir/api/abAB25-_ %/page/abAB25-_ %
You have also converted the regex to a string with toString(), which keeps the leading and trailing / delimiters, so, as Crayon Violent pointed out in their comment, this breaks your expected functionality: .match() turns that string into a pattern that looks for those literal slashes. You can fix this by building the string from myregex.source, which omits the delimiters, inserting dir after the ^, and passing the result to new RegExp() before calling url.match(): https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/source
Now you can properly match a string like this: dir/api/11aa/page/2017 see this example: https://repl.it/Mj8h
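A minimal sketch of the .source approach, assuming myregex holds the pattern quoted in the question. The variable names src and withDir are made up for illustration.

```javascript
// Assumed input: the pattern quoted in the question.
var myregex = /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/;

// .source returns the pattern text without the surrounding "/" delimiters.
var src = myregex.source;

// Insert "dir" right after the leading "^", then compile a new RegExp
// with the same flags as the original.
var withDir = new RegExp(src.slice(0, 1) + 'dir' + src.slice(1), myregex.flags);

console.log(withDir.test('dir/api/11aa/page/2017')); // true
console.log(withDir.test('dir/api/223344'));         // true
console.log(withDir.test('api/223344'));             // false
```

Compiling the RegExp once and reusing it avoids relying on .match() to convert a string implicitly.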

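As mentioned by Crayon Violent in the comments, it seems you're passing a String rather than a regular expression to the .match() function. Maybe try the following:
url.match(new RegExp(regVar, "i"));
to convert the string to a regular expression. Note that regVar must hold only the pattern text, without the leading and trailing / delimiters that toString() adds; otherwise those slashes are matched literally. The "i" is for ignore case; I don't know whether that's what you want. Learn more here: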
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp

What RegEx would clean up this set of inputs?

I'm trying to figure out a RegEx that would match the following:
.../string-with-no-spaces -> string-with-no-spaces
or
string-with-no-spaces:... -> string-with-no-spaces
or
.../string-with-no-spaces:... -> string-with-no-spaces
where ... can be anything in these example strings:
example.com:8080/string-with-no-spaces:latest
string-with-no-spaces:latest
example.com:8080/string-with-no-spaces
string-with-no-spaces
and a bonus would be
http://example.com:8080/string-with-no-spaces:latest
and all would match string-with-no-spaces.
Is it possible for a single RegEx to cover all those cases?
So far I've gotten as far as /\/.+(?=:)/ but that not only includes the slash, but only works for case 3. Any ideas?
Edit: Also I should mention that I'm using Node.js, so ideally the solution should pass all of these: https://jsfiddle.net/ys0znLef/

How about:
(?:.*/)?([^/:\s]+)(?::.*|$)
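To check this pattern against the inputs from the question, it can be run in Node.js as sketched below. The only change is that the / inside the pattern is escaped so it can be written as a JavaScript regex literal; the names re, inputs and results are illustrative.

```javascript
// The pattern from the answer above, written as a JavaScript literal
// (the "/" inside the pattern has to be escaped).
const re = /(?:.*\/)?([^\/:\s]+)(?::.*|$)/;

const inputs = [
  "example.com:8080/string-with-no-spaces:latest",
  "string-with-no-spaces:latest",
  "example.com:8080/string-with-no-spaces",
  "string-with-no-spaces",
  "http://example.com:8080/string-with-no-spaces:latest",
];

// Group 1 holds the cleaned-up name; the result is null if nothing matched.
const results = inputs.map(function (s) {
  const m = s.match(re);
  return m ? m[1] : null;
});

console.log(results); // every entry is "string-with-no-spaces"
```

The greedy .*\/ consumes everything up to the last slash, which is why the host, port and scheme are all skipped.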

Consider the following solution using specific regex pattern and String.match function:
var re = /(?:[/]|^)([^/:.]+?)(?:[:][^/]|$)/,
// (?:[/]|^) - non-capturing group: the wanted string is preceded by '/' or is at the start of the text
// (?:[:][^/]|$) - non-capturing group: the wanted string is followed by ':' plus a character other than '/', or is at the end of the text
searchString = function(str){
var result = str.match(re);
// match returns null when nothing matches
return result ? result[1] : null;
};
console.log(searchString("example.com:8080/string-with-no-spaces"));
console.log(searchString("string-with-no-spaces:latest"));
console.log(searchString("string-with-no-spaces"));
console.log(searchString("http://example.com:8080/string-with-no-spaces:latest"));
The output for all the cases above will be string-with-no-spaces

Here's the expression I've got... just trying to tweak to use the slash but not include it.
Updated result works in JS
\S([a-zA-Z0-9.:/\-]+)\S
//works on regexr, regex storm, & regex101 - tested with a local html file to confirm JS matches strings
var re = /\S([a-zA-Z0-9.:/\-]+)\S/;

How to find in javascript with regular expression string from url?

Good evening. How can I use a regular expression in JavaScript to extract a string from a URL? For example, I have the URL http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/ and I need only the string between the last two slashes (http://something.cz/something/string/). In this example the word I need is mikronebulizer. Thank you very much for your help.

You could use a regex match with a group.
Use this:
/([\w\-]+)\/$/.exec("http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/")[1];
Here's a jsfiddle showing it in action
This part: ([\w\-]+)
Means at least 1 or more of the set of alphanumeric, underscore and hyphen and use it as the first match group.
Followed by a /
And then finally the: $
Which means the line should end with this
The .exec() call returns an array where the first value is the full match (i.e. "mikronebulizer/"), followed by the value of each match group.
So .exec()[1] returns your value: mikronebulizer

Simply:
url.match(/([^\/]*)\/$/);
Should do it.
If you want to match (optionally) without a trailing slash, use:
url.match(/([^\/]*)\/?$/);
See it in action here: http://regex101.com/r/cL3qG3
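As a quick check of the optional-slash variant, the sketch below wraps it in a hypothetical helper, lastSegment, and runs it on the URL from the question with and without the trailing slash.

```javascript
// Hypothetical helper wrapping the second pattern above.
function lastSegment(url) {
  // Group 1 is the run of non-slash characters just before the optional
  // trailing slash at the end of the string.
  var m = url.match(/([^\/]*)\/?$/);
  return m ? m[1] : null;
}

console.log(lastSegment("http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/")); // "mikronebulizer"
console.log(lastSegment("http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer"));  // "mikronebulizer"
```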

If you have the url provided, then you can do it this way:
var url = 'http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/';
var urlsplit = url.split('/');
var urlEnd = urlsplit[urlsplit.length- (urlsplit[urlsplit.length-1] == '' ? 2 : 1)];
This will match either everything after the last slash, if there's any content there, and otherwise, it will match the part between the second-last and the last slash.

Something else to consider - yes a pure RegEx approach might be easier (heck, and faster), but I wanted to include this simply to point out window.location.pathname.
function getLast(){
// Strip trailing slash if present
var path = window.location.pathname.replace(/\/$/, '');
return path.split('/').pop();
}
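window.location only exists in a browser and only describes the current page. For a URL held in a string, the same idea can be sketched with the built-in URL constructor, which is available in browsers and in Node.js. The helper name getLastFromUrl is an assumption made for this example.

```javascript
// Hypothetical helper: same idea as getLast(), but for a URL passed in as a string.
function getLastFromUrl(urlString) {
  // The URL constructor throws a TypeError if urlString is not a valid absolute URL.
  var path = new URL(urlString).pathname;
  // Strip one trailing slash if present, then take the last path segment.
  return path.replace(/\/$/, '').split('/').pop();
}

console.log(getLastFromUrl("http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/")); // "mikronebulizer"
```

Because pathname excludes the query string and fragment, this also works for URLs that end in ?a=b or #top.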

Alternatively you could get using split:
var pieces = "http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/".split("/");
var lastSegment = pieces[pieces.length - 2];
// lastSegment == mikronebulizer

var url = 'http://www.odsavacky.cz/blog/wpcproduct/mikronebulizer/';
if (url.slice(-1)=="/") {
url = url.substr(0,url.length-1);
}
var lastSegment = url.split('/').pop();
document.write(lastSegment+"<br>");

Regex multiple matches for HTML attributes

I want to match multiple data-i18n attributes with a JavaScript regexp.
I tried the following regexp :
var regexp = /(data\-i18n="[^"]+")/g;
which in my head seemed rather straight forward, but it ended up not working.
If you try to match on the following HTML tag :
<a random-attr="ok" data-i18n="first match" data-i18n="second match">my text</a>
doing an exec like this :
/(data\-i18n="[^"]+")/g.exec('<a random-attr="ok" data-i18n="first match" data-i18n="second match">my text</a>')
will raise the following issue :
There are two matches, but they are actually duplicate matches.
The result is :
[ 'data-i18n="first match"',
'data-i18n="first match"',
index: 20,
input: '<a random-attr="ok" data-i18n="first match" data-i18n="second match">my text</a>' ]
Any ideas on how to have multiple matches for my attribute ?
Thanks in advance !

The problem isn't with your regex; it's with how you're expecting exec to behave. The return value of exec has the full match at position 0, and then the match of each capture group following that. Since you wrapped the whole regex in a capturing group, you're seeing the same string at positions 0 and 1 of the array.
The right way to use a global regex with exec is to keep calling exec on the same regex object until it returns null; it will return the next match each time. However, if you call String.prototype.match with the global regex, it will return what you expect - an array containing all of the full matches (without capture groups).
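Both approaches can be sketched on the tag from the question. The second pattern moves the capturing group to the attribute value only, which is an assumption about what is ultimately wanted.

```javascript
var html = '<a random-attr="ok" data-i18n="first match" data-i18n="second match">my text</a>';

// With the g flag, String.prototype.match returns every full match (no groups).
var attributes = html.match(/data-i18n="[^"]+"/g);
console.log(attributes);
// [ 'data-i18n="first match"', 'data-i18n="second match"' ]

// exec tracks its position in lastIndex, so the same RegExp object must be reused.
var re = /data-i18n="([^"]+)"/g;
var values = [];
var m;
while ((m = re.exec(html)) !== null) {
  values.push(m[1]); // m[1] is the attribute value captured by the group
}
console.log(values); // [ 'first match', 'second match' ]
```

Writing the regex literal inside the loop condition instead would create a fresh object on every iteration and loop forever on the first match.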

Replace all besides the Regex group?

I was given a task that would take a long time to do by hand.
I have the same kind of code block repeated about 100 times (shown as a screenshot in the original post), and I need to extract only one value from each block.
How can I capture the value?
I managed it with this regex:
DbCommand.*?\("(.*?)"\);
It does match, and after the replace function (replacing with $1) I do get the pure value, but the problem is that I need only the pure values, without the rest of the unmatched text.
In other words, how can I get a purified result like:
Eservices_Claims_Get_Pending_Claims_List
Eservices_Claims_Get_Pending_Claims_Step1
Here is my code at Online regexer
Is there any way of replacing "all besides the matched group" ?
p.s. I know there are other ways of doing it but I prefer a regex solution ( which will also help me to understand regex better)

When this answer was written, JavaScript didn't support lookbehind (it was added to the language in ES2018). With lookbehind, you could change your regular expression to match .*? preceded (lookbehind) by DbCommand.*?\(" and followed (lookahead) by "\);.
Without lookbehind, I believe the cleanest solution is to perform two matches:
// you probably want to build the regexps dynamically
var regexG = /DbCommand.*?\("(.*?)"\);/g;
var regex = /DbCommand.*?\("(.*?)"\);/;
var matches = str.match(regexG).map(function(item){ return item.match(regex)[1] });
console.log(matches);
// ["Eservices_Claims_Get_Pending_Claims_List", "Eservices_Claims_Get_Pending_Claims_Step1"]
DEMO: http://jsbin.com/aqaBobOP/2/edit
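In engines that support ES2018 lookbehind, the single-regex version described above can be sketched as follows. The sample input is an assumption: the original code was only shown as an image, so the surrounding C#-style lines (including db.SomeMethod) are made up, and only the two procedure names come from the question.

```javascript
// Assumed sample input; the real source text was not included in the question.
var str =
  'DbCommand cmd = db.SomeMethod("Eservices_Claims_Get_Pending_Claims_List");\n' +
  'DbCommand cmd = db.SomeMethod("Eservices_Claims_Get_Pending_Claims_Step1");';

// (?<=DbCommand.*?\(")  lookbehind: the value must be preceded by DbCommand ... ("
// .*?                   the value itself, matched lazily
// (?="\);)              lookahead: the value must be followed by ");
var names = str.match(/(?<=DbCommand.*?\(").*?(?="\);)/g);

console.log(names);
// [ 'Eservices_Claims_Get_Pending_Claims_List',
//   'Eservices_Claims_Get_Pending_Claims_Step1' ]
```

Since neither the lookbehind nor the lookahead consumes text, each match is the bare value and no second pass is needed.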

You should be able to do a global replace of:
public static DataTable.*?{.*?DbCommand.*?\("(.*?)"\);.*?}
All I've done is changed it to match the whole block including the function definition using a bunch of .*?s.
Note: Make sure your regex settings are such that the dot (.) matches all characters, including newlines.
In fact if you want to close up all whitespace, you can slap a \s* on the front and replace with $1\n:
\s*public static DataTable.*?{.*?DbCommand.*?\("(.*?)"\);.*?}
Using your test case: http://regexr.com?37ibi

You can use this (without the ignore case and multiline option, with a global search):
pattern: (?:[^D]+|\BD|D(?!bCommand ))+|DbCommand [^"]+"([^"]+)
replace: $1\n

Try replacing every match of this expression with nothing, across the whole document:
^(?: |\t)*(?:(?!DbCommand).)*$
You will then be left with only the lines that contain the string DbCommand
You can then remove the spaces in between by replacing:
\r?\n\s* with \n globally.
Here is an example of this working: http://regexr.com?37ic4
