JS RegEx to remove part of a URL? - javascript

I am using the GoogleBooks API to search for particular titles by name and retrieve a cover image URL. For example, searching for "The Great Gatsby" will return the following image link:
http://books.google.com/books/content?id=HestSXO362YC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api
If you look at the following image, you can see that there is a small fold on the bottom right corner. Some image URLs will have the fold and others won't. If you remove edge=curl from the URL link, the fold is removed.
Is there any way to use a regex to find and delete the curled portion?
Further, is there any way to use regex to change the img=1 value to img=2?

You can use the .replace() method:
let url = "some random URL you have"
console.log(url.replace('&edge=curl',''))
This replaces the first "&edge=curl" it finds in the string with '' (an empty string), which effectively removes it.
You can also use the same .replace() method to replace any static URL parameter, like "img=1":
console.log(url.replace('img=1','img=2'))
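One detail worth checking: when the first argument to .replace() is a plain string, only the first occurrence is replaced. A small sketch with a made-up query string (the sample value is an assumption, not taken from the API) shows the difference between a string pattern and a regex with the g flag:

```javascript
// Made-up query string in which the same parameter appears twice.
const sample = "id=1&edge=curl&zoom=1&edge=curl";

// String pattern: only the first occurrence is removed.
const firstOnly = sample.replace("&edge=curl", "");

// Regex with the g flag: every occurrence is removed.
const all = sample.replace(/&edge=curl/g, "");

console.log(firstOnly); // id=1&zoom=1&edge=curl
console.log(all);       // id=1&zoom=1
```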

Don't use regex to parse URLs. Use URL object:
var u = new URL("http://books.google.com/books/content?id=HestSXO362YC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api");
u.searchParams.delete("edge");
u.searchParams.set("img", "2");
console.log(u.href);
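If this is needed in more than one place, the same calls could be wrapped in a small helper. This is only a sketch; the function name cleanCoverUrl and its default img value are made up for illustration:

```javascript
// Hypothetical helper: drops the "edge" parameter and sets "img".
function cleanCoverUrl(rawUrl, imgValue = "2") {
  const u = new URL(rawUrl);
  u.searchParams.delete("edge");       // no-op if the parameter is absent
  u.searchParams.set("img", imgValue); // updates the value, or adds the parameter
  return u.toString();
}

const cover = "http://books.google.com/books/content?id=HestSXO362YC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api";
console.log(cleanCoverUrl(cover));
```

Because the URL is parsed rather than pattern-matched, it does not matter whether a parameter follows ? or &.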

To obtain an updated url where the &edge=curl pattern is replaced and the &img= and &zoom= parameters are updated, you could achieve this by chaining multiple .replace() calls as shown below:
const url = "http://books.google.com/books/content?id=HestSXO362YC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api"
// New values for img and zoom parameters
const img = 300;
const zoom = 22;
console.log(
url
.replace(/&img=(\w+)/,`&img=${img}`)
.replace(/&zoom=\w+/,`&zoom=${zoom}`)
.replace(/&edge=curl/,"")
)
Here the &img= and &zoom= parameters are updated with the regular expressions &img=\w+ and &zoom=\w+, where \w+ matches one or more word characters (letters, digits, or underscore) that appear after the parameter name.
The advantage of this approach (over explicitly specifying img=1 and replacing it with img=2) is that you can update those parameter/value substrings of the input url without having to know the actual value of those parameters prior to replacement (i.e. that img has a value of 1).
Note that this approach assumes the parameters being updated are prefixed with & (and not ?).
Hope that helps!
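The limitation about ? could be lifted by capturing the separator and putting it back. A rough sketch, assuming the parameter name contains no regex special characters (the helper name setParam is made up):

```javascript
// Hypothetical helper: replaces the value of one query parameter,
// whether it is preceded by "?" or "&".
// Assumption: `name` contains no regex special characters.
function setParam(url, name, value) {
  const pattern = new RegExp("([?&])" + name + "=[^&#]*");
  // A replacer function avoids "$" in `value` being treated specially.
  return url.replace(pattern, (match, separator) => separator + name + "=" + value);
}

const bookUrl = "http://books.google.com/books/content?id=HestSXO362YC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api";
console.log(setParam(setParam(bookUrl, "img", 300), "zoom", 22));
```

If the parameter is not present, the url is returned unchanged.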

Try this:
let url = 'http://books.google.com/books/content?id=HestSXO362YC&printsec=frontcover&img=1&zoom=1&edge=curl&source=gbs_api';
// remove &edge=curl
url = url.replace('&edge=curl', '');
// replace img=1 with img=2
url = url.replace('img=1', 'img=2');

Related

Get base url from string with Regex and Javascript

I'm trying to get the base url from a string (So no window.location).
It needs to remove the trailing slash
It needs to be regex (No New URL)
It needs to work with query parameters and anchor links
In other words all the following should return https://apple.com or https://www.apple.com for the last one.
https://apple.com?query=true&slash=false
https://apple.com#anchor=true&slash=false
http://www.apple.com/#anchor=true&slash=true&whatever=foo
These are just examples, urls can have different subdomains like https://shop.apple.co.uk/?query=foo should return https://shop.apple.co.uk - It could be any url like: https://foo.bar
The closest I got is with:
const baseUrl = url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/,'$1').replace(/\/$/, ""); // Base Path & Trailing slash
But this doesn't work with anchor links and queries that start right after the host, without a / before them
Any idea how I can get it to work on all cases?
You could add # and ? to your negated character class. You don't need .* because that will match until the end of the string.
For your example data, you could match:
^https?:\/\/[^#?\/]+
strings = [
"https://apple.com?query=true&slash=false",
"https://apple.com#anchor=true&slash=false",
"http://www.apple.com/#anchor=true&slash=true&whatever=foo",
"https://foo.bar/?q=true"
];
strings.forEach(s => {
console.log(s.match(/^https?:\/\/[^#?\/]+/)[0]);
})
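Note that .match() returns null when nothing matches, so indexing [0] directly would throw on an input that does not start with http:// or https://. A guarded variant might look like this (the function name getBaseUrl is an assumption for illustration):

```javascript
// Hypothetical wrapper around the regex above; returns null instead of throwing.
function getBaseUrl(url) {
  const match = url.match(/^https?:\/\/[^#?\/]+/);
  return match ? match[0] : null;
}

console.log(getBaseUrl("https://shop.apple.co.uk/?query=foo")); // https://shop.apple.co.uk
console.log(getBaseUrl("not a url"));                           // null
```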
You could use Web API's built-in URL for this. URL will also provide you with other parsed properties that are easy to get to, like the query string params, the protocol, etc.
Regex is a painful way to do something that the browser makes otherwise very simple.
I know that you asked about using regex, but in the event that you (or someone coming here in the future) really just cares about getting the information out and isn't committed to using regex, maybe this answer will help.
let one = "https://apple.com?query=true&slash=false"
let two = "https://apple.com#anchor=true&slash=false"
let three = "http://www.apple.com/#anchor=true&slash=true&whatever=foo"
let urlOne = new URL(one)
console.log(urlOne.origin)
let urlTwo = new URL(two)
console.log(urlTwo.origin)
let urlThree = new URL(three)
console.log(urlThree.origin)
const baseUrl = url.replace(/^(.*?:\/\/[^?\/#]+).*$/, '$1');
This pattern matches everything up to, but not including, the .com part, so you will have to append .com once you pull out the first part of the url:
^http.*?(?=\.com)
Or maybe you could do:
myUrl.replace(/(#|\?|\/#).*$/, "")
To remove everything after the host name.

Removing letters located between two specific strings

I want to make sure that the URL I get from window.location does not already contain a specific fragment identifier. If it does, I must remove it. So I must search the URL and find the string that starts with mp- and continues until the end of the URL or the next # (just in case the URL contains more than one fragment identifier).
Examples of inputs and outputs:
www.site.com/#mp-1 --> www.site.com/
www.site.com#mp-1 --> www.site.com
www.site.com/#mp-1#pic --> www.site.com/#pic
My code:
(that obviously does not work correctly)
var url = window.location;
if(url.toLowerCase().indexOf("#mp-") >= 0){
var imgString = url.substring(url.indexOf('#mp-') + 4,url.indexOf('#'));
console.log(imgString);
}
Any idea how to do it?
Something like this? This uses a regular expression to filter the unwanted string.
var inputs = [
"www.site.com/#mp-1",
"www.site.com#mp-1",
"www.site.com/#mp-1#pic"
];
inputs = inputs.map(function(input) {
return input.replace(/#mp-1?/, '');
});
console.log(inputs);
Output:
["www.site.com/", "www.site.com", "www.site.com/#pic"]
jsfiddle: https://jsfiddle.net/tghuye75/
The regex I used, /#mp-1?/, removes strings like #mp- or #mp-1. For a string of unknown length up to the next hash, you can use /#mp-[^#]*/, which removes #mp-, #mp-1, and #mp-somelongstring.
Use regular expressions:
var url = window.location.href;
var imgString = url.replace(/(#mp-[^#\s]+)/, "");
This removes the first fragment identifier that starts with #mp-, up to but not including the next # or whitespace character.
You can use .replace to replace a regular expression matching ("#mp-" followed by 0 or more non-# characters) with the empty string. If it's possible there are multiple segments you want to remove, just add a g flag to the regex.
url = url.replace(/#mp-[^#]*/, '');
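A quick sketch of the g flag variant, run against the inputs from the question plus one made-up URL with two mp- fragments (the function name stripMpFragments is an assumption):

```javascript
// Hypothetical helper: removes every "#mp-..." fragment, up to the next "#" or the end.
function stripMpFragments(url) {
  return url.replace(/#mp-[^#]*/g, "");
}

console.log(stripMpFragments("www.site.com/#mp-1"));           // www.site.com/
console.log(stripMpFragments("www.site.com#mp-1"));            // www.site.com
console.log(stripMpFragments("www.site.com/#mp-1#pic"));       // www.site.com/#pic
console.log(stripMpFragments("www.site.com/#mp-1#pic#mp-22")); // www.site.com/#pic
```

In a browser, the input would be window.location.href (a string) rather than window.location itself.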
window.location has a hash property: window.location.hash
The most primitive way is to declare
var char_start, char_end
and search for two "#" characters; if there is only one, the end of the input serves as the second boundary.
With those two positions you can cut out the part you want. Note that changing window.location.hash will normally update the address shown in the browser.
Good luck!

Extract characters in URL after certain character up to certain character

I'm trying to extract certain piece of a URL using regex (JavaScript) and having trouble excluding characters after a certain piece. Here's what I have so far:
URL: http://www.somesite.com/state-de
Using url.match(/\/[^\/]+$/)[0] I can extract the state-de like I want.
However, when the URL becomes http://www.somesite.com/state-de?page=r and I apply the same regex, it pulls in everything, including the "?page=r", which I don't want. I want to extract only the state-de regardless of what's after it (usually a "?" follows it).
This might work:
var arr = url.split("/")
arr[arr.length - 1].split("?")[0]
I'd recommend reading up on regular expressions in general. What you want to do here is make the regular expression stop when it hits the ? in the URL.
Using capturing groups to select which part of the match that you want might also be useful here.
Example:
url.match(/(\/[^\/?]+)(?:\?.*)?$/)[1]
I avoid overly complex RegExs when possible, so I tend to do this in multiple steps (with .replace()):
var stripped = url.replace(/[?#].*/, ''); // Strips anything after ? or #
You can now do the simpler transform to get the state, e.g.:
var state = stripped.split('/').pop()
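Put together, the two steps could be wrapped like this (a sketch; the function name lastPathSegment is made up):

```javascript
// Hypothetical helper combining the two steps above.
function lastPathSegment(url) {
  const stripped = url.replace(/[?#].*/, ""); // drop the query string and fragment
  return stripped.split("/").pop();           // take whatever follows the last "/"
}

console.log(lastPathSegment("http://www.somesite.com/state-de"));              // state-de
console.log(lastPathSegment("http://www.somesite.com/state-de?page=r#mark4")); // state-de
```

One caveat: if the path ends with a trailing slash, the last segment is an empty string.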
If you want to do it by regex, try this one:
url.match(/https?:\/\/(?:[a-z0-9-]+\.)+[a-z]+\/([a-z0-9_-]+)\/?(?:\?.*)?/)[1]
Or you could do it using jQuery:
var url = 'http://www.somesite.com/state-de?page=r#mark4';
// Create a special anchor element, set the URL to it
var a = $('<a>', { href:url } )[0];
console.log(a.hostname);
console.log(a.pathname);
console.log(a.search);
console.log(a.hash);

RegExp - If first part of search string is found then replace with the full search string value

Is there a RegExp to find and replace a value based on the criteria, "if first part of search string is in the target string then replace the part that matches with the search string."
This is a special search and replace because the replacement is also used as the search string.
For example, I have this URL:
http://www.domain.com/path/something/more/something/
Search for any part of the following and replace with the whole:
/path/user/
Since, "/path/" is in both the replacement string and the target string the results would be:
http://www.domain.com/path/user/something/more/something/
NOTE: The search / replacement value can be anything.
I don't know what the replacement and search string is at the time I make a replacement so I can't use something that hard codes the search string. For example, this won't work because the term is hard coded:
s.replace(/(\/path\/)/, "$1value/");
Another example:
Here is the sentence, "Thank you Susan for your order."
Here is the search and replacement, "Susan Summers"
Here is the desired sentence, "Thank you Susan Summers for your order."
Use Case:
Lets say you are given 1 million text documents that are letters to customers but when they created the documents they used the customers first name only when they were supposed to use the full name. Now it's your job to find and replace every occurrence of their first name with their full name. You only have their full name to work with not first name.
Just realized this may not work as a RegEx and might require code.
You can use:
s = 'http://www.domain.com/path/something/more/something/';
r = s.replace(/(\/path\/)/, "$1user/");
//=> "http://www.domain.com/path/user/something/more/something/"
You don't need to use regular expression for this case:
var url = 'http://www.domain.com/path/something/more/something/';
url.replace('/path/', '/path/user/');
// => "http://www.domain.com/path/user/something/more/something/"
I'm not quite sure if I understand the problem correctly. The following replaces any part of /path/user/ (-> part 1: 'path', part 2: 'user') with the whole /path/user/:
var url1 = "http://www.domain.com/path/something/more/something/";
var url2 = "http://www.domain.com/user/something/more/something/";
url1.replace(/\/path\/|\/user\//, '/path/user/');
url2.replace(/\/path\/|\/user\//, '/path/user/');
results in:
http://www.domain.com/path/user/something/more/something/
http://www.domain.com/path/user/something/more/something/
I hope this is what you need, otherwise, please add another example.
EDIT:
Here is the regex in action: http://regex101.com/r/jL6tK6
split + join alternative :
url = url.split('/path/').join('/path/user/');
Although your requirements are not clear, here is a guess that raises a few extra questions :
var sub = '/path/user/';
var parts = sub.match(/[^\/]+/g);
url = url.replace(new RegExp(
'\\/(' + [parts.join('\\/')].concat(parts).join('|') + ')\\/'
), sub);
The resulting regular expression is as follows :
/\/(path\/user|path|user)\// // "/path/user/" OR "/path/" OR "/user/"
Let's check some urls assuming we live in the best of worlds :
'http://domain/' -> 'http://domain/'
'http://path/user/' -> 'http://path/user/'
'http://path/' -> 'http://path/user/'
'http://user/' -> 'http://path/user/'
Now, what do you think about the following ones?
'http://path/user' -> 'http://path/user/user'
'http://user/path/' -> 'http://path/user/path/'
'http://path/user/path/' -> 'http://path/user/path/'
The remaining questions are :
Is this what you are looking for?
What to do when there is no trailing slash?
What to do in the reverse order case?
What to do with recurrent parts?
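For the name example from the question, one possible reading is "replace the first part with the full value unless the full value is already there". A sketch under that assumption follows; the helper names, the space separator, and the use of word boundaries are all assumptions, and it only fits values that start and end with word characters:

```javascript
// Escapes regex special characters so arbitrary text can be used in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: expands the first part of `full` to `full`,
// skipping places where the full value already appears.
function expandToFull(text, full, separator = " ") {
  const first = full.split(separator)[0];
  const rest = full.slice(first.length);
  const pattern = new RegExp(
    "\\b" + escapeRegExp(first) + "\\b(?!" + escapeRegExp(rest) + ")",
    "g"
  );
  // A replacer function keeps "$" in `full` from being treated specially.
  return text.replace(pattern, () => full);
}

console.log(expandToFull("Thank you Susan for your order.", "Susan Summers"));
// Thank you Susan Summers for your order.
```

The negative lookahead is what prevents "Susan Summers" from turning into "Susan Summers Summers" when a document already uses the full name.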

Regex to get a specific query string variable in a URL

I have a URL like
server/area/controller/action/4/?param=2
in which the server can be
http://localhost/abc
https://test.abc.com
https://abc.om
I want to get the first character after "action/" which is 4 in the above URL, with a regex. Is it possible with regex in js, or is there any way?
Use regex \d+(?=\/\?)
var url = "server/area/controller/action/4/?param=2";
var param = url.match(/\d+(?=\/\?)/);
Using this regex in JavaScript:
action/(.)
lets you access the first capturing group, which will contain the first character after action/.
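As a sketch of how that pattern could be used (the variable names are made up, and the null check is an addition for URLs that contain no action/ segment):

```javascript
const actionUrl = "https://test.abc.com/area/controller/action/4/?param=2";

// In a regex literal, the "/" must be escaped.
const actionMatch = actionUrl.match(/action\/(.)/);
const firstChar = actionMatch ? actionMatch[1] : null;

console.log(firstChar); // 4
```

If the value can be longer than one character, (\d+) in place of (.) would capture the whole number.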
This way splits the URL on the / characters and extracts the last but one element
var url = "server/area/controller/action/4/?param=2".split('/').slice(-2, -1)[0];
