Why can't I update the URL protocol when the old protocol includes '+' in Node.js - javascript

Why can't I update the URL protocol when the old protocol includes '+'?
Here is my demo test code:
let u = new URL( 'git+https://url-fake-hostname/zh-TW/scripts')
console.log(u)
u.protocol = 'http:';
console.assert(u.protocol !== 'git+https:', u.protocol)

URL is a special object in Node.js, because Node.js wants it to be browser-compatible.
There are two ways to build a URL object:
WHATWG URL API new URL(url) - used by web browsers
Legacy API require('url').parse(url) - Node.js specific
As the documentation mentions:
The WHATWG URL Standard considers a handful of URL protocol schemes to be special in terms of how they are parsed and serialized. When a URL is parsed using one of these special protocols, the url.protocol property may be changed to another special protocol but cannot be changed to a non-special protocol, and vice versa.
Here are some examples of the same behavior you ran into:
const u = new URL('http://example.org');
u.protocol = 'https';
console.log(u.href);
// https://example.org
const u = new URL('http://example.org');
u.protocol = 'fish';
console.log(u.href);
// http://example.org
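To make the rule concrete, here is a minimal sketch that checks a scheme against the six schemes the WHATWG URL Standard lists as special (ftp, file, http, https, ws, wss). The helper names isSpecialScheme and canSwitchProtocol are hypothetical and not part of any Node.js API; the check is only a necessary condition, since the standard applies further restrictions.

```javascript
// Schemes the WHATWG URL Standard treats as "special".
const SPECIAL_SCHEMES = new Set(['ftp', 'file', 'http', 'https', 'ws', 'wss']);

// Hypothetical helper: accepts 'http', 'http:' or 'HTTP:'.
function isSpecialScheme(protocol) {
  return SPECIAL_SCHEMES.has(protocol.replace(/:$/, '').toLowerCase());
}

// Hypothetical helper: the protocol setter can only take effect when both
// schemes are on the same side of the special/non-special divide
// (a necessary condition, not a sufficient one).
function canSwitchProtocol(from, to) {
  return isSpecialScheme(from) === isSpecialScheme(to);
}

console.log(canSwitchProtocol('http:', 'https:'));     // true
console.log(canSwitchProtocol('http:', 'fish:'));      // false
console.log(canSwitchProtocol('git+https:', 'http:')); // false
```

This is why the assignment in the question is silently ignored: git+https is non-special, while http is special.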
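You can work around this by using the legacy API (note that the Node.js documentation marks it as deprecated):
const url = require('url');
let u = url.parse('git+https://url-fake-hostname/zh-TW/scripts');
u.protocol = 'http:';
console.log(u.protocol); // 'http:'
// u.href is not recomputed automatically; use url.format() to serialize.
console.log(url.format(u)); // 'http://url-fake-hostname/zh-TW/scripts'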
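If you would rather stay with the WHATWG API, one possible approach (a sketch, not something the documentation prescribes) is to swap the scheme at the string level and parse the result as a new URL, which sidesteps the setter restriction entirely:

```javascript
const original = new URL('git+https://url-fake-hostname/zh-TW/scripts');

// Drop the old scheme (original.protocol includes the trailing ':')
// and prepend the new one, then parse the result as a fresh URL.
const rebuilt = new URL('http:' + original.href.slice(original.protocol.length));

console.log(rebuilt.protocol); // 'http:'
console.log(rebuilt.href);     // 'http://url-fake-hostname/zh-TW/scripts'
```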

Related

How to split domain with http or https in nodejs

Can anyone help me extract the domain name, together with http or https, from a URL string?
URL : https://www.test.com/abc/?a=1&b=1
Expected Output : https://www.test.com
Thanks in advance.
I strongly recommend you avoid using a home-grown regexp. Instead, use the node URL class:
https://nodejs.org/api/url.html
I'm not exactly sure which parts you want to keep (do you want to include the port? Do you want to decode IDNs?), but origin may be the way to go. Here's the example straight from the docs:
const { URL } = require('url');
const myURL = new URL('https://example.org/foo/bar?baz');
console.log(myURL.origin);
// Prints https://example.org
Otherwise, you could use the protocol and host or hostname components.
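As a rough illustration of the difference between those components, the sketch below uses the URL from the question with a port (8080) added purely as an assumption, so that host and hostname produce different results:

```javascript
// The :8080 port is an illustrative assumption, not part of the question.
const myURL = new URL('https://www.test.com:8080/abc/?a=1&b=1');

// host keeps a non-default port, hostname drops it.
const withPort = myURL.protocol + '//' + myURL.host;
const withoutPort = myURL.protocol + '//' + myURL.hostname;

console.log(withPort);     // https://www.test.com:8080
console.log(withoutPort);  // https://www.test.com
console.log(myURL.origin); // https://www.test.com:8080
```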
You can also use the url-parse package to get the origin from a URL.
See: https://www.npmjs.com/package/url-parse
var URL = require('url-parse');
const url_obj = new URL('https://test.com/abc/?a=1');
console.log(url_obj.origin); // https://test.com

How to check if url scheme is present in a url string javascript

I am trying to solve an issue where I need to know if there is a URL scheme (not limited to http, https) prepended to my url string.
I could do link.indexOf('://') and then take the substring before the "://", but consider a case such as:
example.com?url=http://www.eg.com
in this case, the substring will be
example.com?url=http, which is incorrect. It should be "", since my URL does not have a protocol prepended.
I need to find out whether the url is prepended with a protocol or not.
You can do it quite easily with a little bit of regex. The pattern /^[a-z0-9]+:\/\// will be able to extract it.
If you just want to test if it has it, use pattern.test() to get a boolean:
/^[a-z0-9]+:\/\//.test(url); // true
If you want what it is, use url.match() and wrap the protocol portion in parentheses:
url.match(/^([a-z0-9]+):\/\//)[1] // https
Here is a runnable example with a few example URLs.
const urls = ['file://test.com', 'http://test.com', 'https://test.com', 'example.com?http'];
console.log(
  urls.map(url => (url.match(/^([a-z0-9]+):\/\//) || [])[1])
);
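One caveat with the pattern above: it is case-sensitive and rejects schemes containing +, - or ., such as git+https. A slightly broader variant, following the scheme grammar in RFC 3986 (a letter followed by letters, digits, +, - or .), might look like the sketch below. The names SCHEME_PREFIX and getScheme are made up for this example.

```javascript
// Scheme grammar from RFC 3986: a letter, then letters, digits, '+', '-' or '.'.
const SCHEME_PREFIX = /^([a-z][a-z0-9+.-]*):\/\//i;

// Hypothetical helper: returns the lowercased scheme, or '' if none is prepended.
function getScheme(url) {
  const match = url.match(SCHEME_PREFIX);
  return match ? match[1].toLowerCase() : '';
}

console.log(getScheme('git+https://example.com/repo'));      // git+https
console.log(getScheme('HTTP://example.com'));                // http
console.log(getScheme('example.com?url=http://www.eg.com')); // (empty string)
```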
You could use the URL API which is supported in most browsers.
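function getProtocol(str) {
  try {
    var u = new URL(str);
    // protocol includes the trailing ':', so strip it
    return u.protocol.slice(0, -1);
  } catch (e) {
    // new URL() throws when str is not an absolute URL
    return '';
  }
}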
Usage
getProtocol('example.com?url=http://www.eg.com'); // returns ""
getProtocol('https://example.com?url=http://www.eg.com'); // returns "https"

JS - baseURLString when sending pathnames only

I am trying to create a new URL in JS so it can be manipulated for an async request. As nothing is cross-origin (I think this is the correct usage of that term), the URLs I send for async request look like /MyLoginUrl or /MyUpdateDataUrl, etc. (i.e. I am only sending the pathname).
My attempt to create a new URL from an existing url looked basically like this:
// Actually I set the url as an argument in a function,
// but for demonstration it will be a variable
var url = '/myPathname';
// Much later...
url = new URL (url);
However, this was raising a syntax error. Once I looked at the docs, I found out why.
Per the docs, the syntax for a new URL looks like this:
url = new URL(urlString, [baseURLstring])
url = new URL(urlString, baseURLobject)
The docs also say:
baseURLstring: is a DOMString representing the base URL to use in case urlString is a relative URL. If not specified, and no baseURLobject is passed in parameters, it default to 'about:blank'. If it is an invalid absolute URL, the constructor will raise a DOMException of type SYNTAX_ERROR
A couple of examples from the docs that use a baseURLstring:
var a = new URL("/", "https://developer.mozilla.org"); // Creates a URL pointing to 'https://developer.mozilla.org/'
var b = new URL("https://developer.mozilla.org"); // Creates a URL pointing to 'https://developer.mozilla.org/'
var c = new URL('en-US/docs', b); // Creates a URL pointing to 'https://developer.mozilla.org/en-US/docs'
Thus, I am trying to figure out how to emulate a baseURLstring for, currently, localhost and eventually when this gets hosted by the main server I will use for my network, the baseURLstring for that. I'm guessing it would involve in some way getting the IP address of the computer I have/of the server on the network, or maybe not...
You can try this:
var base_url = location.protocol + '//' + location.host + '/';
baseURLstring is the URL of your website. Take Google as an example:
the base URL of Google is https://www.google.com. Similarly, your baseURLstring will be something like https://www.yourwebsiteaddress.com, and the first parameter in url = new URL(urlString, [baseURLstring]) is the path of a resource on your server, relative to the root folder where your default index file is placed.
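Putting the two answers together, a minimal sketch could resolve path-only URLs against location.origin. The helper name toAbsoluteUrl, the next query parameter, and the http://localhost:3000 fallback are assumptions made for illustration; the fallback exists only so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: resolve a path-only URL against a base.
function toAbsoluteUrl(path, base) {
  return new URL(path, base);
}

// In a browser, location.origin is the usual base; the localhost address
// below is only a placeholder for environments without a location object.
const base = typeof location !== 'undefined' ? location.origin : 'http://localhost:3000';

const loginUrl = toAbsoluteUrl('/MyLoginUrl', base);
// The resulting URL object can now be manipulated before the async request.
loginUrl.searchParams.set('next', 'home');

console.log(loginUrl.pathname); // /MyLoginUrl
console.log(loginUrl.search);   // ?next=home
```

Because the base comes from the page itself, the same code works on localhost and on the production server without hard-coding an IP address.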

Ensure URL is relative before navigating via JavaScript's location.replace()

I have a login page https://example.com/login#destination where destination is the target URL the user was trying to navigate to when they were required to log in.
(i.e. https://example.com/destination)
The JavaScript I was thinking about using was
function onSuccessfulLogin() {
  location.replace(location.hash.substring(1) || 'default')
}
This would result in an XSS vulnerability, by an attacker providing the link
https://example.com/login#javascript:..
Also I need to prevent navigation to a lookalike site after login.
https://example.com/login#https://looks-like-example.com
or https://example.com/login#//looks-like-example.com
How can I adjust onSuccessfulLogin to ensure the URL provided in the hash # portion is a relative URL, and not starting with javascript:, https:, // or any other absolute navigation scheme?
One thought is to evaluate the URL, and see if location.origin remains unchanged before navigating. Can you suggest how to do this, or a better approach?
From OWASP recommendations on Preventing Unvalidated Redirects and Forwards:
It is recommended that any such destination input be mapped to a value, rather than the actual URL or portion of the URL, and that server side code translate this value to the target URL.
So a safe approach would be mapping some keys to actual URLs:
// https://example.com/login#destination
var keyToUrl = {
  destination: 'https://example.com/destination',
  defaults: 'https://example.com/default'
};
function onSuccessfulLogin() {
  var hash = location.hash.substring(1);
  var url = keyToUrl[hash] || keyToUrl.defaults;
  location.replace(url);
}
You could also consider accepting only the path part of the URL and prepending the hostname in the code:
// https://example.com/login#destination
function onSuccessfulLogin() {
  var path = location.hash.substring(1);
  var url = 'https://example.com/' + path;
  location.replace(url);
}
I would stick to the mapping though.
That is a very good point about the XSS vulnerability.
Most common schemes use only ASCII letters, so a regex like /^[a-z]+:/i would catch those (although RFC 3986 also allows digits, +, - and . after the first letter, as in git+https:). Alternatively, if we want to be more inclusive, /^[^:\/?]+:/ matches any run of characters other than :, / or ? followed by a :. Then we can combine that with /^\/\// to catch protocol-relative URLs (those starting with //), which gives us:
// Either
var rexIsProtocol = /(?:^[a-z]+:)|(?:^\/\/)/i;
// Or
var rexIsProtocol = /(?:^[^:\/?]+:)|(?:^\/\/)/i;
Then the test is like this:
var url = location.hash.substring(1).trim(); // trim to deal with whitespace
if (rexIsProtocol.test(url)) {
  // It starts with a protocol
} else {
  // It doesn't
}
That said, the only one I think you need to be particularly bothered by is the javascript: pseudo-protocol, so you might just test for that.
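The origin-comparison idea raised in the question can also be sketched with the URL constructor: resolve the hash value against the current origin and navigate only if the resolved origin is unchanged. The function name safeRedirectTarget and its parameters are hypothetical; in a browser you would pass location.hash.substring(1) and location.origin.

```javascript
// Hypothetical helper: returns an absolute same-origin URL, or the fallback.
function safeRedirectTarget(hashValue, currentOrigin, fallback) {
  if (!hashValue) return fallback;
  try {
    // Relative values resolve against the current origin; absolute ones
    // (javascript:, https://other-site, //other-site) keep their own origin.
    const resolved = new URL(hashValue, currentOrigin + '/');
    if (resolved.origin === currentOrigin) return resolved.href;
  } catch (e) {
    // Unparseable input falls through to the fallback.
  }
  return fallback;
}

const origin = 'https://example.com';
const fallback = 'https://example.com/default';

console.log(safeRedirectTarget('destination', origin, fallback));
// https://example.com/destination
console.log(safeRedirectTarget('javascript:alert(1)', origin, fallback));
// https://example.com/default
console.log(safeRedirectTarget('//looks-like-example.com', origin, fallback));
// https://example.com/default
```

Letting the URL parser do the resolution avoids having to enumerate every scheme or prefix by hand, though the key-to-URL mapping above remains the stricter option.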

Perfect URL Checking Regular Expression for MOST URL's

I am working on a project where I need to validate my URLs and stumbled upon the following RegEx pattern:
/(((http|ftp|https):\/{2})+(([0-9a-z_-]+\.)+(aero|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cu|cv|cx|cy|cz|cz|de|dj|dk|dm|do|dz|ec|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mn|mn|mo|mp|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|nom|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ra|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw|arpa)(:[0-9]+)?((\/([~0-9a-zA-Z\#\+\%#\.\/_-]+))?(\?[0-9a-zA-Z\+\%#\/&\[\];=_-]+)?)?))\b/imuS$/ # https://mathiasbynens.be/demo/url-regex
This lets me check URLs that always have a protocol in front (http, https or ftp). I would also like to allow the user to leave out the protocol and still have the URL be valid. How do I do this?
Are there any other RegEx patterns that are better or more accurate that I can use to validate my URLs? Thanks for all answers!
I'm currently working on a module that validates inputs. One of the validations required me to parse domains (hostnames) per:
RFC 952
RFC 1123
Trailing dots in domain names
To validate a domain I took a few steps; one of them was to use the browser's parsing logic with this cool trick:
function parseURI( str ) {
  var a = document.createElement( "a" );
  // If the string doesn't contain a protocol, the browser
  // will default to the current document location.
  a.href = /^(https?:\/\/)/i.test( str ) === false ? ( "http://" + str ) : str;
  // Since I can't overwrite a[property] - return an object I control ( Muahahah ).
  return {
    hash: a.hash,
    hostname: a.hostname,
    href: a.href,
    origin: a.origin,
    pathname: a.pathname,
    port: a.port,
    protocol: a.protocol,
    search: a.search,
    // When parsing the URL by the browser fails, the browser will
    // set the hostname based on the current document.location value.
    // Compare against location.hostname (a string), not the Location object.
    valid: a.hostname !== document.location.hostname
  };
}
If validating a hostname | domain is what you are after, I can share my insights on the topic as well.
I suggest using character classes to compact the extension (TLD) part of your regex, like this:
(aero|asia|arpa|a[c-gil-oq-uwxz]|biz|b[abd-jmnorstv-z]|cat|com|coop|c[acdf-ik-oruvxyz]|
d[ejkmoz]|edu|e[cegr-u]|f[ijkmor]|gov|g[abd-ilmnp-uwy]|h[kmnrtu]|info|int|i[del-oq-t]|
jobs|j[emop]|k[eghimnprwyz]|l[abcikr-vy]|mil|mobi|museum|m[acdeghklnopr-z]|
name|net|nom|n[acefgilopruz]|org|pro|p[ae-hk-nrstwy]|qa|r[easuw]|s[a-eg-ortuvyz]|
tel|travel|t[cdfghj-prtvwz]|u[agksyz]|v[aceginu]|w[fs]|y[etu]|z[amw])
I modified it to the following so that the user can leave out the protocol;
/(((http|ftp|https):\/{2})?(([0-9a-z_-]+\.)+(aero|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cu|cv|cx|cy|cz|cz|de|dj|dk|dm|do|dz|ec|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mn|mn|mo|mp|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|nom|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ra|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw|arpa)(:[0-9]+)?((\/([~0-9a-zA-Z\#\+\%#\.\/_-]+))?(\?[0-9a-zA-Z\+\%#\/&\[\];=_-]+)?)?))\b$/
by making the first part (the protocol) optional using the ? operator.
http://www.regexr.com/ is a great tool to use for testing RegEx patterns and learning about how they work.
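As a different technique from the regex above, the optional-protocol idea can also be sketched with the built-in URL parser: prepend a default scheme when none of http, https or ftp is present, then let the parser decide. The function name looksLikeUrl and the "hostname must contain a dot" rule are assumptions for this example; unlike the regex, this check does not verify the TLD against a list.

```javascript
// Hypothetical helper: accepts URLs with or without an http/https/ftp prefix.
function looksLikeUrl(input) {
  // Make the protocol optional by supplying a default one.
  const candidate = /^(https?|ftp):\/\//i.test(input) ? input : 'http://' + input;
  try {
    const parsed = new URL(candidate);
    // Assumed rule: require at least one dot in the hostname.
    return parsed.hostname.includes('.');
  } catch (e) {
    // new URL() throws on input it cannot parse.
    return false;
  }
}

console.log(looksLikeUrl('https://example.com/path?x=1')); // true
console.log(looksLikeUrl('example.com/path?x=1'));         // true
console.log(looksLikeUrl('not a url'));                    // false
```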
