Grabbing URL sans the last segment/file with a regular expression

Looking for a regex so that I may grab everything BUT the last segment + extension.
So for example
http://stackoverflow.com/stuff/code/apple.jpg
I need
http://stackoverflow.com/stuff/code/
I'm able to grab the last segment, but with a myriad of possible directories these images could be under, I'm unsure how to get everything sans the last segment.

Code
https?:\/{2}.*\/
You may even be able to simply use .*\/ (if you don't need to ensure the string starts with http or https). If that's the case, you may as well just cut the string at the last occurrence of /, as in the second snippet below.
var s = "http://stackoverflow.com/stuff/code/apple.jpg";
var r = /https?:\/{2}.*\//;
console.log(r.exec(s)[0]); // http://stackoverflow.com/stuff/code/
Substring on last occurrence method:
var s = "http://stackoverflow.com/stuff/code/apple.jpg";
// keep everything up to and including the last "/" (substr is a deprecated legacy method)
console.log(s.slice(0, s.lastIndexOf('/') + 1));
Explanation
https? Match http or https (s is made optional by ?)
:\/{2} Match the colon character : literally, followed by two forward slash characters / literally
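.* Match any character except a line break, any number of times; it is greedy, so it extends as far as possible and the following \/ ends up matching the last slash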
\/ Match the forward slash character / literally
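A quick way to compare the regex from this answer with the lastIndexOf method is to wrap each in a function, as in the sketch below. The function names are hypothetical, and the third function swaps in a technique that neither answer uses: resolving "." against the URL with the standard URL class, which also drops any query string or fragment.

```javascript
// Hypothetical helpers: each returns the URL up to and including the last "/".
function dirByRegex(url) {
  // Greedy .* backtracks to the LAST "/", so the match ends at the final slash.
  const m = /https?:\/{2}.*\//.exec(url);
  return m ? m[0] : null; // null when there is no http:// or https:// prefix
}

function dirBySlice(url) {
  // lastIndexOf returns -1 when there is no "/", so this yields "" in that case.
  return url.slice(0, url.lastIndexOf("/") + 1);
}

function dirByUrl(url) {
  // Swapped-in alternative (not from the answers): resolve "." against the URL
  // with the WHATWG URL API. Throws a TypeError if url is not an absolute URL.
  return new URL(".", url).href;
}

const s = "http://stackoverflow.com/stuff/code/apple.jpg";
console.log(dirByRegex(s)); // http://stackoverflow.com/stuff/code/
console.log(dirBySlice(s)); // http://stackoverflow.com/stuff/code/
console.log(dirByUrl(s));   // http://stackoverflow.com/stuff/code/
```

The difference shows up when a / appears after the file name, for example in a query string: for http://stackoverflow.com/stuff/code/apple.jpg?next=/a/b the first two functions stop at the last slash inside the query (http://stackoverflow.com/stuff/code/apple.jpg?next=/a/), while dirByUrl still returns http://stackoverflow.com/stuff/code/.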

If you're able to manipulate the string, you can use this to remove the end piece:
var str = "http://stackoverflow.com/stuff/code/apple.jpg"
str = str.replace(/\/[^/]+$/, '/')
Regex explanation:
\/ match a slash
[^/]+$ match one or more characters other than / that continue until the end of the string
This will replace /apple.jpg with /
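The replace approach can be wrapped in a small function for reuse; stripLastSegment is a hypothetical name, and the second call shows an edge case worth knowing about.

```javascript
// Replace "/" plus one or more non-slash characters at the end with a single "/".
function stripLastSegment(str) {
  return str.replace(/\/[^/]+$/, "/");
}

console.log(stripLastSegment("http://stackoverflow.com/stuff/code/apple.jpg"));
// http://stackoverflow.com/stuff/code/

// A string that already ends in "/" has nothing after the last slash for [^/]+
// to match, so it is returned unchanged.
console.log(stripLastSegment("http://stackoverflow.com/stuff/code/"));
```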

EDIT: I just noticed that you need the trailing / character, and I have updated my answer to meet that requirement. See the # comment lines in the code.
Here's a Unix shell solution:
var="http://stackoverflow.com/stuff/code/apple.jpg"
echo "${var%/*}/"
#--------------^-- we've deleted everything from that last / to end
#--------------+ just append a replacement char /
output
http://stackoverflow.com/stuff/code/
${var%pattern} is a parameter expansion that removes the smallest suffix matching pattern; here /* matches the last / and everything after it.
See POSIX definitions for Parameter Expansion for detailed info on this and similar features. (Go to the bottom of the page).
IHTH

Related

Is it possible to replace only in a match group - REGEX

I have always tried to avoid regex because I simply can't get my head around how it really works. Most of the time I manage to get the expected result by luck more than actual skill.
However, I am trying to replace any whitespace character in a bundled webpack source with the string-replace-loader or the String-Replace-Plugin (whichever turns out easier). But before I try to do this on the actual source, I want to understand the regex I am trying to use.
The problem
I have query strings which always start with dqlParse followed by \n then maybe some \t and other whitespace characters. I have already managed to get my whitespace characters removed in a test string if I match this
/\s+\s/g
and simply replace it with " ".
Since I don't have control over all the strings within my bundle, I thought I could mark which strings should be processed by adding dqlParse in front of them, and then match and replace by groups. Unfortunately, no luck so far.
What I have tried
So far I have tried something like this
/(^dqlParse)(.*)/g
which basically does what it should since match group $1 is dqlParse and match group $2 is the rest of the string where I would like to do the replacement.
Is it possible to replace only in the second match group?
Thanks! Any help appreciated!

Yes, you can do that with String#replace:
text = text.replace(/^(dqlParse)(.*)/g, function(_, x, y) {return x + y.replace(/\s{2,}/g, ' ');})
This will match and capture dqlParse into Group 1 (the x variable in the callback function), and the rest of the line will be captured into Group 2 (y in the callback function); since . does not match line breaks, Group 2 stops at the end of that line. Once the match is found, the replacement is the concatenation of x and y, with every run of two or more whitespace characters in y replaced with a single space.
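One caveat, stated as an assumption: if the processed text contains real line breaks right after dqlParse (rather than escaped \n sequences inside a string literal), Group 2 in the pattern above is empty, because . stops at the first line break. The sketch below is a variant that uses [\s\S]* so Group 2 runs to the end of the string; collapseAfterMarker is a hypothetical name, and it assumes the marker sits at the very start of the text.

```javascript
// Collapse runs of 2+ whitespace characters, but only in the text after "dqlParse".
// [\s\S]* (unlike .*) also matches line breaks, so multi-line queries are covered.
function collapseAfterMarker(text) {
  return text.replace(/^(dqlParse)([\s\S]*)/, (_, marker, rest) =>
    marker + rest.replace(/\s{2,}/g, " ")
  );
}

console.log(JSON.stringify(collapseAfterMarker("dqlParse\n\t\tSELECT *\n\t\tFROM things")));
// "dqlParse SELECT * FROM things"

// Text without the marker at the start is returned unchanged.
console.log(JSON.stringify(collapseAfterMarker("other\n\t\ttext")));
```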

RegEx match end character only when other character is not present

I'm having quite some trouble defining a regex that I need.
Basically the idea is to detect all lines that end with a , or a ; character. For this I have defined the following regex:
(,|;)$
This works fine, but there is an exception: if there's a * character anywhere in that line (not necessarily at the start), then I don't want that line to match. Based on this sample:
/**
* Here there's a comment I don't want to find,
* but after this comment I do
*/
detectMe;
other,
I would expect to find two matches, the first one:
/**
* Here there's a comment I don't want to find,
* but after this comment I do
*/
detectMe;
And the second one
other,
I've tried many things such as non-capturing groups, negative lookaheads, and also matching the start of the string with [^\s*\*], with no success. Is there a way to do this?
Some of the regexes I've tried:
^[^\*](.*?)(,|;)$
^[^\s*\*](.*?)(,|;)$

To match an optional C comment and the following line ending with ; or , you may use
/(?:\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/\r?\n)?.*[;,]$/gm
Details
(?:\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/\r?\n)? - an optional (as there is a ? quantifier after the group) non-capturing group matching 1 or 0 occurrences of
\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/ - a C comment pattern
\r?\n - a CRLF or LF ending
.*[;,]$ - a whole line that ends with ; or , ($ is the end of a line anchor here due to m modifier).
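To check the pattern in JavaScript, the sketch below runs it with String.prototype.match on the question's sample, rebuilt with \n line endings (an assumption; a file with \r\n endings is also covered by the \r? in the pattern).

```javascript
// An optional C comment followed by a line break, then a whole line ending in
// ";" or ",". The m flag makes $ match at the end of each line.
const re = /(?:\/\*+[^*]*\*+(?:[^\/*][^*]*\*+)*\/\r?\n)?.*[;,]$/gm;

// Sample from the question, rebuilt with "\n" line endings.
const sampleCode = [
  "/**",
  "* Here there's a comment I don't want to find,",
  "* but after this comment I do",
  "*/",
  "detectMe;",
  "other,",
].join("\n");

const matches = sampleCode.match(re); // with the g flag: an array of whole matches
matches.forEach((m, i) => console.log(`match ${i + 1}:\n${m}`));
```

The comment lines ending in "," are consumed as part of the first match, so they are not reported on their own.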

You can use this regex:
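/^[^*\r\n]*?[,;]$/gm
It will start by matching any number of characters other than '*' or a line break, then match ',' or ';' at the end of the line. It uses the global and multiline flags to match all lines. (A plain [^*] would also match line breaks, so a match could start on one line and end on a later one.)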
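A minimal sketch of this line-by-line approach, assuming the negated class also excludes line breaks ([^*\r\n]); with plain [^*], the text foo\nbar; would come back as one two-line match instead of just bar;. findTerminatedLines is a hypothetical helper name.

```javascript
// ^ and $ work per line because of the m flag; the class excludes "*" and line breaks.
const lineRe = /^[^*\r\n]*?[,;]$/gm;

function findTerminatedLines(text) {
  return text.match(lineRe) || []; // match() returns null when nothing matches
}

const sampleLines = [
  "/**",
  "* Here there's a comment I don't want to find,",
  "* but after this comment I do",
  "*/",
  "detectMe;",
  "other,",
].join("\n");

console.log(findTerminatedLines(sampleLines)); // [ 'detectMe;', 'other,' ]
console.log(findTerminatedLines("foo\nbar;"));  // [ 'bar;' ]
```

Note that, unlike the C-comment pattern, this returns detectMe; on its own rather than attached to the comment above it.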

How can I match the last part of an email via JavaScript? [duplicate]

Using a regular expression (replaceregexp in Ant) how can I match (and then replace) everything from the start of a line, up to and including the last occurrence of a slash?
What I need is to start with any of these:
../../replace_this/keep_this
../replace_this/replace_this/Keep_this
/../../replace_this/replace_this/Keep_this
and turn them into this:
what_I_addedKeep_this
It seems like it should be simple but I'm not getting it. I've made regular expressions that will identify the last slash and match from there to the end of the line, but what I need is one that will match everything from the start of a line until the last slash, so I can replace it all.
This is for an Ant build file that's reading a bunch of .txt files and transforming any links it finds in them. I just want to use replaceregexp, not variables or properties, if possible.

You can match this:
.*\/
and replace with your text.

What you want is a greedy match, i.e. the longest possible match of the pattern (usually the default), which runs up to the last instance of '/'.
That would be something like this:
.*\/
Explanation:
. any character (except a line break)
* repeated zero or more times, as many as possible (greedy)
\/ the slash, escaped; because the preceding .* is greedy, this matches the last instance of '/'
You can see it in action here: http://regex101.com/r/pI4lR5
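The role of greediness is easy to see by comparing .* with its lazy counterpart .*? on one of the question's paths (shown here in JavaScript rather than Ant):

```javascript
const path = "../replace_this/replace_this/Keep_this";

// Greedy: .* first runs to the end of the line, then backtracks to the LAST "/".
const greedy = path.match(/.*\//)[0];
// Lazy (for contrast): .*? stops at the FIRST "/".
const lazy = path.match(/.*?\//)[0];

console.log(greedy); // ../replace_this/replace_this/
console.log(lazy);   // ../
```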

Option 1
Search: ^.*/
Replace: Empty string
Because the * quantifier is greedy, ^.*/ will match from the start of the line to the very last slash. So you can directly replace that with an empty string, and you are left with your desired text.
Option 2
Search: ^.*/(.*)
Replace: Group 1 (typically, the syntax would be $1 or \1, not sure about Ant)
Again, ^.*/ matches to the last slash. You then capture the end of the line to Group 1 with (.*), and replace the whole match with Group 1.
In my view, there's no reason to choose this option, but it's good to understand it.
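Both options can be tried out in JavaScript as a rough stand-in for Ant. Ant's replaceregexp normally uses Java regular expressions, where ^.*/ behaves the same way on these single-line paths, but the replacement syntax for groups may differ, so treat the "$1" below as JavaScript-specific. The function names and the PREFIX constant are illustrative only.

```javascript
const PREFIX = "what_I_added"; // hypothetical replacement text

// Option 1: ^.*\/ greedily matches up to the last "/", and the match is replaced.
function option1(line) {
  return line.replace(/^.*\//, PREFIX);
}

// Option 2: also capture the rest of the line in Group 1 and put it back with $1.
function option2(line) {
  return line.replace(/^.*\/(.*)/, PREFIX + "$1");
}

const paths = [
  "../../replace_this/keep_this",
  "../replace_this/replace_this/Keep_this",
  "/../../replace_this/replace_this/Keep_this",
];
for (const p of paths) {
  console.log(option1(p), option2(p));
}
```

For the first path the result is what_I_addedkeep_this with a lowercase k, simply because that input spells keep_this in lowercase.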

Add HTML tags to this regex string

I'm using a tiny little JS plugin to truncate multiple lines of text on a site I'm working on.
The only problem is that the script counts HTML tags toward the character count, which throws things off a little.
This is how the script currently excludes characters:
regex = /[!-\/:-@\[-`{-~]$/
Which basically just strips out certain punctuation characters.
I've tried changing it to this:
regex = [!-\/:-@\[-`{-~]$<[^>]*>
But, not being too familiar with regex, it didn't seem to work.
If someone could nudge me in the right direction that would be great.

In your initial regex you're looking for a single punctuation character at the very end of the string (note the dollar sign $).
regex = /[!-\/:-@\[-`{-~]$/
Now you also want to match a tag opened with < that has not been closed with > by the end of the string.
regex = /[!-\/:-@\[-`{-~]$|<[^>]*$/
Note that you'll match: <, <aaaa, <aaaa< until the end of the string that you are matching against.
greedy_regex = /[!-\/:-@\[-`{-~]$|<[^>]*/
non_greedy_regex = /[!-\/:-@\[-`{-~]$|<[^>]*?/
If you remove the second $ (greedy_regex), <[^>]* matches from a < up to, but not including, the next >, because [^>] can never match >; in a<b>c</b>d it matches <b. With the ? (non_greedy_regex), it matches the < only.
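The following sketch checks the behaviour described above in JavaScript and adds one swapped-in idea that the answer does not use: removing complete tags with /<[^>]*>/g before counting characters. visibleLength is a hypothetical helper, not part of the truncation plugin, and the punctuation class uses the :-@ range (: through @) so that every range in it is in ascending order.

```javascript
// Punctuation class: ASCII ! through /, : through @, [ through `, and { through ~.
const trailingPunct = /[!-\/:-@\[-`{-~]$/;

// Hypothetical helper: delete complete tags like <b> and </b>, then count.
function visibleLength(html) {
  return html.replace(/<[^>]*>/g, "").length;
}

const html = "a<b>c</b>d";
console.log(html.match(/<[^>]*/)[0]);  // logs <b  ([^>] never consumes ">")
console.log(html.match(/<[^>]*?/)[0]); // logs <   (lazy: stops right after "<")
console.log(visibleLength(html));       // logs 3  (a, c and d)
console.log(trailingPunct.test("Read more..."), trailingPunct.test("Read more")); // logs true false
```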

Is this regex the most efficient way of parsing my string?

First off, here are the rules for the string I allow the user to input:
If there is a slash, it has to appear at the start of the string and nowhere else; it is limited to 1, is optional, and must be followed by [a-zA-Z].
If there is a tilde, it has to appear after a space " " and nothing else, is optional, and must be followed by [a-zA-Z]. Also, this expression is limited to 2 (i.e. ~exa ~mple passes, but ~exa ~mp ~le does not).
The slash followed by a word is an instruction, like /get or /post.
The tilde followed by a word is a parameter like ~now or ~later.
String format:
[instruction] (optional) [query] [extra parameters] (optional)
[instruction] - Must contain / followed by [a-zA-Z] only
[query] - Can contain [\w\s()'-] (alphanumeric, whitespace, parentheses, apostrophe, dash)
[extra parameters] - ~ preceded by whitespace and followed by [a-zA-Z] only
String examples that should work:
/get D0cUm3nt ex4Mpl3' ~now
D0cUm3nt ex4Mpl3'
/post T(h)(i5 s(h)ou__ld w0rk t0-0'
String examples that shouldn't work:
//get document~now
~later
example ~now~later
Before I pass the string through the regex I trim any whitespace at the start and end of the string (before any text is seen) but I don't trim double whitespaces within the string as some queries require them.
Here is the regex I used:
^(/{0,1}[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$
To break it down slightly:
[instruction check] - (/{0,1}[a-zA-Z])?
[query check] - [\w\s()'-]*
[parameter check] - ((\s~[a-zA-Z]*){0,2})?
This is the first time I've actually done any serious regex away from a tutorial so I'm wondering is there anything I can change within my regex to make it more compact/efficient?
All fresh perspectives are appreciated!
Thanks.

From your regex: ^(/{0,1}[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$,
you can change {0,1} to ? that is a shortcut to say 0 or 1 times:
^(/?[a-zA-Z])?[\w\s()'-]*((\s~[a-zA-Z]*){0,2})?$
The last group is already allowed 0, 1, or 2 times, so the ? after it is superfluous:
^(/?[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$
The first part may be simplified too; the ? just after the / is superfluous, because a leading letter without a slash is matched by [\w\s()'-]* anyway:
^(/[a-zA-Z])?[\w\s()'-]*(\s~[a-zA-Z]*){0,2}$
If you don't use the captured groups, you can change them to non-capturing groups (?: ), which are more efficient:
^(?:/[a-zA-Z])?[\w\s()'-]*(?:\s~[a-zA-Z]*){0,2}$
You can also use the case-insensitive modifier (?i) in flavors that support it, such as PCRE; JavaScript does not accept a leading (?i), so there you would drop it and add the i flag (/.../i) instead:
^(?i)(?:/[a-z])?[\w\s()'-]*(?:\s~[a-z]*){0,2}$
Finally, as stated in the question, ~ must be followed by [a-zA-Z], so change the last * to +:
^(?i)(?:/[a-z])?[\w\s()'-]*(?:\s~[a-z]+){0,2}$
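To try the final pattern in JavaScript, the i flag has to replace (?i), which JavaScript does not support. The sketch below runs it against the example strings from the question; the two strings starting with "query" are added here only to exercise the limit of two parameters.

```javascript
// Final pattern with the i flag instead of (?i); "/" is escaped for the regex literal.
const inputRe = /^(?:\/[a-z])?[\w\s()'-]*(?:\s~[a-z]+){0,2}$/i;

const shouldPass = [
  "/get D0cUm3nt ex4Mpl3' ~now",
  "D0cUm3nt ex4Mpl3'",
  "/post T(h)(i5 s(h)ou__ld w0rk t0-0'",
  "query ~exa ~mple",
];
const shouldFail = [
  "//get document~now",
  "~later",
  "example ~now~later",
  "query ~exa ~mp ~le", // three parameters: one too many
];

for (const str of shouldPass) console.log("accepted:", inputRe.test(str), str);
for (const str of shouldFail) console.log("rejected:", !inputRe.test(str), str);
```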

This looks slightly better:
^(?:/?[a-zA-Z]*\s)?[\w\s()'-]*(?:\s~[a-zA-Z]*)*$
https://codereview.stackexchange.com/ is a better place for this kind of question.

Assuming that capture groups are useful to you:
^((?:\/|\s~)[a-z]+)?([\w\s()'-]+)(~[a-z]+)?$

Maybe this is what you're looking for:
var regex = /^((\/)?[a-zA-Z]+)?[\w\s()'-]*((\s~)?[a-zA-Z]+){0,2}$/;
