Remove part of a string which matches a RegEx - javascript

I have a string made up of a page title and a page URL, separated by " - ". I would like to return a string which contains everything except the parameters in the URL.
Input
This is a page title with a - and a ? - http://subdomain.example.com/subfolder/numbers134/?utm_medium=email&utm_source=a_source&utm_campaign=a_campaign_name
Desired Output
This is a page title with a - and a ? - http://subdomain.example.com/subfolder/numbers134/
What I have tried
I have tried .split() but because it is possible for the page title to have a "?" in it, this doesn't work.
x.replace(x.match(/http:[a-z/.0-9-]*(\?.+)$/),"") where x is the string. This feels like it should have worked. I'm not sure if it is possible to use a capture group like this?
So far I have a regex to match the part I want removed: http:[a-z/.0-9-]*(\?.+)$
I'm not sure how to turn that around and return the string minus that part.
Points
Page title and URL separated by " - "
1. Page title may contain "-" or "?"
2. URL may contain parameters
The solution should work for any combination of 1. or 2. being present or not for any page title and URL.
The URL will always be http (Don't ask)

If you control the separator, change it to something unique that is unlikely to appear in a page title.
That way you can split on the new separator and pass the rest of the string to the URL constructor:
const SEPARATOR = '^^^';
const input = 'This is a page title with a - and a ? ^^^ http://subdomain.example.com/subfolder/numbers134/?utm_medium=email&utm_source=a_source&utm_campaign=a_campaign_name';
// Take everything after the separator; the URL parser ignores the leading space.
const url = new URL(input.substring(input.indexOf(SEPARATOR) + SEPARATOR.length));
// or, equivalently:
// const url = new URL(input.split(SEPARATOR)[1]);
console.log(url.origin + url.pathname);
// logs http://subdomain.example.com/subfolder/numbers134/
This way you also get URL validation for free: the URL constructor throws a TypeError if the string is not a valid URL.
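To make use of that validation, the constructor call can be wrapped in a try/catch. A minimal sketch; parseUrlOrNull is a hypothetical helper name, not part of any API:

```javascript
// Hypothetical helper: returns a URL object, or null if the text is not a valid URL.
function parseUrlOrNull(text) {
  try {
    return new URL(text.trim());
  } catch (e) {
    return null; // the URL constructor throws on invalid input
  }
}

const parsed = parseUrlOrNull(" http://subdomain.example.com/subfolder/numbers134/?utm_medium=email");
console.log(parsed ? parsed.origin + parsed.pathname : "invalid URL");
console.log(parseUrlOrNull("not a url")); // null
```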

Could you not achieve it using lastIndexOf and substring like this?
var url = 'This is a page title with a - and a ? - http://subdomain.example.com/subfolder/numbers134/?utm_medium=email&utm_source=a_source&utm_campaign=a_campaign_name';
var output = url.substring(0, url.lastIndexOf('?'));
console.log(output);
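One caveat with this approach: when the string contains no "?" at all, lastIndexOf returns -1 and substring(0, -1) yields an empty string, and when the URL has no parameters it finds a "?" in the title instead. A minimal sketch that searches only inside the URL part, assuming (as the question states) that the URL always starts with http:// and follows the last " - "; stripUrlParams is a hypothetical name:

```javascript
// Hypothetical helper: removes the query string from the URL part only.
function stripUrlParams(input) {
  // Assumption from the question: the URL starts with "http://" and
  // comes after the last " - " separator.
  const urlStart = input.lastIndexOf(" - http://");
  if (urlStart === -1) return input; // no URL found, leave the input untouched

  // Look for "?" only from the start of the URL, so a "?" in the title is ignored.
  const queryStart = input.indexOf("?", urlStart);
  return queryStart === -1 ? input : input.substring(0, queryStart);
}

const input = "This is a page title with a - and a ? - http://subdomain.example.com/subfolder/numbers134/?utm_medium=email&utm_source=a_source&utm_campaign=a_campaign_name";
console.log(stripUrlParams(input));
```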

Related

How to validate/block shortened URLs in a string

I need to block/validate shortened URLs in a string. The string below contains a shortened URL; how can I detect and block it?
Hi #first_name# This is Mondi from Novato Cleaners. May I ask for a favor ? Our google https://bit.ly requires reviews. Could you provide one ?Thank you
To do this, follow these steps:
1- Extract all URLs from the string.
2- Request each URL and get its original location. This is explained well here:
How to get domain name from shortened URL with Javascript?
3- Once you have originalUrl, check whether url != originalUrl; if so, it is a shortened URL.
Use a regex to find whether there is a URL in your string; if there is, replace it with whatever you need in its place.
/(https?:\/\/[^\s]+)/g
var string = "Hi Vignesh This is Mondi from Novato Cleaners. May I ask for a favor ? Our google https://bit.ly requires reviews. Could you provide one ?Thank you";
var protomatch = /(https?:\/\/[^\s]+)/g;
var b = string.replace(protomatch, '');
console.log(b)
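If requesting every URL is not an option, a cheaper alternative (a different technique from following redirects) is to compare each URL's host against a list of known shortener domains. A minimal sketch; SHORTENER_HOSTS and findShortenedUrls are hypothetical names, and the list itself is an illustrative assumption that you would need to maintain yourself:

```javascript
// Illustrative assumption: a hand-maintained list of shortener hosts.
const SHORTENER_HOSTS = new Set(["bit.ly", "tinyurl.com", "t.co", "goo.gl"]);

// Returns the URLs in `text` whose host is on the shortener list.
function findShortenedUrls(text) {
  const urls = text.match(/https?:\/\/[^\s]+/g) || [];
  return urls.filter(function (candidate) {
    try {
      return SHORTENER_HOSTS.has(new URL(candidate).hostname.toLowerCase());
    } catch (e) {
      return false; // not parseable as a URL, so ignore it
    }
  });
}

const message = "Our google https://bit.ly requires reviews. See https://example.com/reviews too.";
const shortened = findShortenedUrls(message);
console.log(shortened);
```

A host list cannot catch shorteners it does not know about, so it complements rather than replaces the redirect check.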

Get base url from string with Regex and Javascript

I'm trying to get the base url from a string (So no window.location).
It needs to remove the trailing slash
It needs to be regex (No New URL)
It needs to work with query parameters and anchor links
In other words all the following should return https://apple.com or https://www.apple.com for the last one.
https://apple.com?query=true&slash=false
https://apple.com#anchor=true&slash=false
http://www.apple.com/#anchor=true&slash=true&whatever=foo
These are just examples, urls can have different subdomains like https://shop.apple.co.uk/?query=foo should return https://shop.apple.co.uk - It could be any url like: https://foo.bar
The closer I got is with:
const baseUrl = url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/,'$1').replace(/\/$/, ""); // Base Path & Trailing slash
But this doesn't work when the anchor or query starts right after the host, with no / in between.
Any idea how I can get it to work on all cases?
You could add # and ? to your negated character class. You don't need .* because that will match until the end of the string.
For your example data, you could match:
^https?:\/\/[^#?\/]+
const strings = [
  "https://apple.com?query=true&slash=false",
  "https://apple.com#anchor=true&slash=false",
  "http://www.apple.com/#anchor=true&slash=true&whatever=foo",
  "https://foo.bar/?q=true"
];
// Print the scheme and host of each URL (everything before the first /, ? or #).
strings.forEach(s => {
  console.log(s.match(/^https?:\/\/[^#?\/]+/)[0]);
});
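Note that match returns null when the string does not start with http:// or https://, so indexing [0] directly would throw. A small sketch that wraps the same pattern with a guard; getBaseUrl is a hypothetical name:

```javascript
// Hypothetical wrapper around the pattern from this answer.
function getBaseUrl(url) {
  // Match the scheme plus everything up to the first "/", "?" or "#".
  const match = url.match(/^https?:\/\/[^#?\/]+/);
  return match ? match[0] : null; // null when the input is not an http(s) URL
}

console.log(getBaseUrl("https://shop.apple.co.uk/?query=foo")); // https://shop.apple.co.uk
console.log(getBaseUrl("not a url")); // null
```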
You could use Web API's built-in URL for this. URL will also provide you with other parsed properties that are easy to get to, like the query string params, the protocol, etc.
Regex is a painful way to do something that the browser makes otherwise very simple.
I know that you asked about using regex, but in the event that you (or someone coming here in the future) really just cares about getting the information out and isn't committed to using regex, maybe this answer will help.
let one = "https://apple.com?query=true&slash=false"
let two = "https://apple.com#anchor=true&slash=false"
let three = "http://www.apple.com/#anchor=true&slash=true&whatever=foo"
let urlOne = new URL(one)
console.log(urlOne.origin)
let urlTwo = new URL(two)
console.log(urlTwo.origin)
let urlThree = new URL(three)
console.log(urlThree.origin)
const baseUrl = url.replace(/^(.*?:\/\/[^?\/#]+).*$/, '$1'); // keep the scheme and host, drop everything from the first /, ? or #
This will get you everything up to the .com part, so it only works for .com domains. You will have to append .com once you pull out the first part of the URL.
^http.*?(?=\.com)
Or maybe you could do:
myUrl.replace(/\/?[#?].*$/, "")
to remove the query string or anchor that follows the host name, along with the slash before it.

Removing letters located between two specific strings

I want to make sure that the URL I get from window.location does not already contain a specific fragment identifier. If it does, I must remove it. So I must search the URL and find the string that starts with mp- and continues until the end of the URL or the next # (just in case the URL contains more than one fragment identifier).
Examples of inputs and outputs:
www.site.com/#mp-1 --> www.site.com/
www.site.com#mp-1 --> www.site.com
www.site.com/#mp-1#pic --> www.site.com/#pic
My code:
(that obviously does not work correctly)
var url = window.location;
if(url.toLowerCase().indexOf("#mp-") >= 0){
var imgString = url.substring(url.indexOf('#mp-') + 4,url.indexOf('#'));
console.log(imgString);
}
Any idea how to do it?
Something like this? This uses a regular expression to filter the unwanted string.
var inputs = [
"www.site.com/#mp-1",
"www.site.com#mp-1",
"www.site.com/#mp-1#pic"
];
inputs = inputs.map(function(input) {
return input.replace(/#mp-1?/, '');
});
console.log(inputs);
Output:
["www.site.com/", "www.site.com", "www.site.com/#pic"]
jsfiddle: https://jsfiddle.net/tghuye75/
The regex I used, /#mp-1?/, removes only #mp- or #mp-1. For a string of unknown length up to the next #, you can use /#mp-[^#]*/, which removes #mp-, #mp-1, and #mp-somelongstring.
Use regular expressions:
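var url = window.location.href; // href is the URL as a string; window.location itself is a Location object
var imgString = url.replace(/(#mp-[^#\s]+)/, "");
It removes the #mp- fragment from the URL, up to (but not including) the next # or the end of the string.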
You can use .replace to replace a regular expression matching ("#mp-" followed by 0 or more non-# characters) with the empty string. If it's possible there are multiple segments you want to remove, just add a g flag to the regex.
url = url.replace(/#mp-[^#]*/, '');
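A short sketch of the difference the g flag makes, using a made-up input with two mp- fragments (the sample string is an assumption, not taken from the question):

```javascript
// Made-up input containing two "#mp-" fragments.
const sample = "www.site.com/#mp-1#pic#mp-22";

// Without the g flag only the first match is removed.
const firstOnly = sample.replace(/#mp-[^#]*/, "");
// With the g flag every match is removed.
const allRemoved = sample.replace(/#mp-[^#]*/g, "");

console.log(firstOnly);  // www.site.com/#pic#mp-22
console.log(allRemoved); // www.site.com/#pic
```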
window.location has a hash property, so you can work with window.location.hash directly.
The most primitive way is to declare
var char_start, char_end
and find the two "#" characters; if there is only one, the end of the input serves as the second.
With those positions you can do what you want. Note that changing window.location.hash will normally update the browser's address bar.
Good luck!
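The index-based approach described in this answer can be sketched as follows, using plain strings rather than window.location so that it runs anywhere. removeFragment is a hypothetical name, and the "#mp-" marker comes from the question:

```javascript
// Hypothetical helper: removes the first fragment that starts with `marker`.
function removeFragment(url, marker) {
  const charStart = url.indexOf(marker);
  if (charStart === -1) return url; // marker not present, nothing to remove

  // The fragment ends at the next "#", or at the end of the input.
  let charEnd = url.indexOf("#", charStart + 1);
  if (charEnd === -1) charEnd = url.length;

  return url.substring(0, charStart) + url.substring(charEnd);
}

console.log(removeFragment("www.site.com/#mp-1#pic", "#mp-")); // www.site.com/#pic
```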

How to pull an unknown URL out of a String

I'm writing a Node/Express app and I have a text string in a JSON object that I need to pull a URL out of. The URL is different every time, and the string itself has two very similar URLs, and I only want to pull out one.
The only thing I do know is that in the string, the url will always be preceded with the same text.
String:
The following new or updated things match your search criteria.
Link I Need
<http://randomurl.com/Junk/Yay/ThisView.aspx?r=164241242186&s=J
WD&t=JWD>
Link I don't Need
<http://randomurl.com/Junk/Yay/ThisView.aspx?r=164241242186&s=J
WD&t=JWD&m=true>
Search was last updated on April 12th, 2013 # 14:43
If you wish to unsubscribe from this update...
Out of this string all I need to pull out is the URL under Link I Need, http://randomurl.com/Junk/Yay/ThisView.aspx?r=164241242186&s=J
WD&t=JWD and nothing else. I'm not quite sure how to go about this, any help would be greatly appreciated!
Something like this should work:
var s = "The following new or updated ...";
var regex = /Link I Need\s*<([^>]*)>/;
var match = s.match(regex);
var theUrl = match && match[1];
This works even if the URL is split across lines, because [^>] also matches newlines, but the captured URL will then contain the line breaks. To remove them after you find the match, do:
theUrl = theUrl && theUrl.replace(/\s+/g, '')
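Putting the pieces together, here is a runnable sketch against a shortened copy of the text from the question, including the line break inside the URL. It assumes, as the question states, that the label "Link I Need" always precedes the wanted URL:

```javascript
// Shortened version of the text from the question; the URL wraps onto a second line.
const s = [
  "Link I Need",
  "<http://randomurl.com/Junk/Yay/ThisView.aspx?r=164241242186&s=J",
  "WD&t=JWD>",
  "Link I don't Need",
  "<http://randomurl.com/Junk/Yay/ThisView.aspx?r=164241242186&s=J",
  "WD&t=JWD&m=true>"
].join("\n");

// Capture everything between "<" and ">" after the known label.
const match = s.match(/Link I Need\s*<([^>]*)>/);
// Remove every whitespace run (the g flag matters if the URL wraps more than once).
const theUrl = match ? match[1].replace(/\s+/g, "") : null;

console.log(theUrl);
```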

How to use href.replace in extjs

How do I use href.replace in ExtJS?
This is my sample:
'iconCls': 'icon_' + href.replace(/[^.]+\./, '')
href= http://localhost:1649/SFM/Default.aspx#/SFM/config/release_history.png
Now I want to get the text "release_history.png". How do I get it?
Thanks
If you just want the filename, it's probably easier to do:
var href = "http://localhost:1649/SFM/Default.aspx#/SFM/config/release_history.png";
var iconCls = 'icon_' + href.split('/').pop();
Update
To get the filename without the extension, you can do something similar:
var filename = "release_history.png";
var without_ext = filename.split('.');
// Get rid of the extension
without_ext.pop()
// Join the filename back together, in case
// there were any other periods in the filename
// and to get a string
without_ext = without_ext.join('.')
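Both steps can be combined into one function. A minimal sketch; iconClassFromHref is a hypothetical name, and it uses lastIndexOf rather than split/join to drop the extension, which gives the same result for names like release_history.png:

```javascript
// Hypothetical helper: builds the icon class from the last path segment.
function iconClassFromHref(href, keepExtension) {
  const filename = href.split("/").pop(); // "release_history.png" for the sample href
  if (keepExtension) return "icon_" + filename;

  const dot = filename.lastIndexOf(".");
  // Drop the extension only if there is one (dot > 0 also leaves dotfiles intact).
  const baseName = dot > 0 ? filename.substring(0, dot) : filename;
  return "icon_" + baseName;
}

const href = "http://localhost:1649/SFM/Default.aspx#/SFM/config/release_history.png";
console.log(iconClassFromHref(href, true));  // icon_release_history.png
console.log(iconClassFromHref(href, false)); // icon_release_history
```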
Some regex solutions (the regexes are shown with their / delimiters).
As in your example code, match the start of the URL so that it can be dropped:
href.replace(/^.*\//, '')
Or use a regex to match the last part of the URL that you want to keep:
/(?<=\/)[^.\/]+\.[^.]+$/
Update
Or get the icon name without .png (this uses the lookbehind and lookahead features of regex):
(?<=\/)[^.\/]+(?=\.png)
Not all regex flavors support all lookaround features. JavaScript engines that predate ES2018 support only lookahead, so if you need to support them, your solution is probably this:
[^.\/]+(?=\.png)
code examples here:
http://www.myregextester.com/?r=6acb5d23
http://www.myregextester.com/?r=b0a88a0a
