What sites are using my GitHub hosted script?

Suppose I have a JavaScript script named foo.js in a GitHub repo. I need to know what sites (domains) are using this script. Thus, for instance, if a website www.example.com is referencing my script...
<html>
<head>
<script src="https://myGitHubRepo/foo.js"></script>
</head>
etc...
</html>
I'd like to get, track or list example.com as a domain. To be more clear, I don't want to track actual users visiting www.example.com nor their IPs nor anything like this, I just want to track or make a list of the sites (domains) referencing my script in their HTMLs. Is that possible?
PS: some hypothetical solutions and their problems:
The first idea that comes to mind is using an analytics tool; however, despite being the owner of my code, I'm not the owner of the site containing the repo: GitHub is the owner. Therefore, using an analytics tool seems to be impossible.
I can't make calls to my server: again, I don't have a server, it's a GitHub repo.
A simple window.location.hostname in the script would get what I want, but it would get it on the client side. I don't know if it's possible to send that information back to me... actually, I don't even know if that is legal.

Don't do it. Telemetry is tricky, and people will opt not to use your script.
Also, without a place to gather this information (a server or endpoint of your own), you cannot do it on GitHub alone.
You can try leveraging source-code search engines such as:
https://publicwww.com/
https://www.nerdydata.com/
and similar services.

Without addressing the legal aspect, you could embed a PAT (Personal Access Token) in your script, which would enable said script to make GitHub API calls.
Typically: "Create or update a file (PUT /repos/:owner/:repo/contents/:path)" (I mentioned it here)
You would replace the content of a file in a dedicated user/repository with the domain name you get from the script.
Each version of that file would represent one instance of the script execution, with the associated domain written in it.
The drawback is that anyone could use that key for accessing the repository, so you need to monitor its content and usage carefully (again, using a dedicated user account/repository just for that one usage).
As noted below by bk2204, this is too insecure.
Instead of a PAT, you can adopt a workflow similar to a GitHub webhook: your script would call a dedicated URL with a JSON event, and the service behind that URL would register the call.
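A minimal sketch of that webhook-style idea, assuming you control some collection endpoint. The URL https://collector.example.com/hit and the function names are hypothetical, not part of any GitHub API. The event carries only the hostname of the embedding page, nothing about the visitor:

```javascript
// Hypothetical endpoint; replace with a URL you actually control.
const COLLECT_URL = 'https://collector.example.com/hit';

// Build the JSON event: only the script name and the embedding site's hostname.
function buildEvent(hostname) {
  return JSON.stringify({ script: 'foo.js', domain: String(hostname).toLowerCase() });
}

// Send the event without blocking the page; errors are ignored on purpose.
function reportDomain(hostname) {
  const body = buildEvent(hostname);
  if (typeof navigator !== 'undefined' && typeof navigator.sendBeacon === 'function') {
    return navigator.sendBeacon(COLLECT_URL, body);
  }
  if (typeof fetch === 'function') {
    fetch(COLLECT_URL, { method: 'POST', body, keepalive: true }).catch(() => {});
    return true;
  }
  return false;
}

// Report only when running inside a browser page.
if (typeof window !== 'undefined' && window.location) {
  reportDomain(window.location.hostname);
}
```

The endpoint still has to be hosted somewhere other than the GitHub repo, and the legal question raised above is not addressed by this sketch.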

Related

how to limit access to my iframe widget using CSP cookies and http referer

I am developing a web application (like a widget) that my potential clients will use on their websites for the benefit of their users. I was thinking about the best way to deliver the application to them and at the same time be able to control who is using my widget so that I can bill them correctly.
I checked a few previous posts like iframe for a widget and iframe best practices limitations and JS to load iframe but they are 7-10yr old and not exactly what I'm trying to do.
That being said, so far ... the best way to deliver seems to be a combination of:
iframe
Content-Security-Policy frame-ancestors HTTP header
cookies + $http_referer checks on the server side to avoid sneaky users
On load, I'm going to send a secret key in the URL to deliver a customized/branded version, and I'm planning to rely on cookies for subsequent calls.
I have a few questions here:
Should I use an iframe tag with specific URL directly, like
<iframe src="https://superwidget.com/SecretKey=12345678"></iframe>
or should I use JavaScript to load/create the iframe element using the same URL? Is there any benefit to one over the other, apart from being able to defer loading the iframe in the JS version?
So I'm planning to use the iframe / CSP / HTTP referer / cookie combo... Is there any other (better) way to deliver a widget and make sure only the allowed audience is using it?
Is there anything else I'm missing here?
Any help appreciated!

My recommendation would be to use JavaScript.
That way, in your JavaScript, you can check whether the domain name of the page the script is called from is authorized for that client's token.
If it is, load the iframe with the custom content.
This will also allow you to have greater control over user experience.
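A rough sketch of such a loader, under stated assumptions: the AUTHORIZED table, the function names, and the ?key= URL format are all hypothetical. Because this check runs in the browser, the embedding site can edit or bypass it, so it is best treated as a convenience and not as the only line of defence.

```javascript
// Hypothetical mapping of client tokens to hostnames allowed to embed the widget.
const AUTHORIZED = {
  '12345678': ['www.example.com', 'example.com'],
};

// True when the hostname is listed for the given token.
function isAuthorized(token, hostname) {
  const known = Object.prototype.hasOwnProperty.call(AUTHORIZED, token);
  const hosts = known ? AUTHORIZED[token] : [];
  return hosts.includes(String(hostname).toLowerCase());
}

// Create the iframe only for authorized pages (browser only).
function loadWidget(token, container) {
  if (!isAuthorized(token, window.location.hostname)) return null;
  const frame = document.createElement('iframe');
  // Hypothetical URL format for the widget page.
  frame.src = 'https://superwidget.com/?key=' + encodeURIComponent(token);
  container.appendChild(frame);
  return frame;
}
```

A page would call something like loadWidget('12345678', document.getElementById('widget')) once the DOM is ready.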

If I were you, I would use a simple iframe. The page should be retrieved with a key (e.g. ?key=some-special-key-in-base-16-58-or-64).
Your backend should then verify that the Referer: not-your-site.com header is whitelisted for that specific API key.
If, instead, you need to use a JS widget, you could pass the key as a parameter when requesting the JS file and let the backend run the same check on the Referer: not-your-site.com header of that request (the Host: header names your own server, not the embedding site).
If the key or the referer is not valid, you could serve a custom widget that asks them to pay for or renew the service. Some people visiting the site might not like this, so think carefully before implementing it. If you are not the one who makes the final call on the team, let someone with more responsibility decide.
The advantage of an iframe widget over a JS one is that it is sandboxed and therefore cannot be accessed by the parent site. Please note that this might be a disadvantage if you want to let your customers modify the widget with their own JS.
Please note that CSP always has to be set correctly if you want all of this to work.
Last tip: Using the hosts file to fake two sites on the same machine won't work, on Windows 10 at least, so you'll have to use two different machines.
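The server-side Referer check from this answer could be sketched roughly as follows. The ALLOWED_BY_KEY table and both function names are hypothetical, and the second helper builds the frame-ancestors value mentioned in the question. Keep in mind that the Referer header can be missing, and that non-browser clients can send any value they like.

```javascript
// Hypothetical whitelist: API key -> hostnames allowed to embed the widget.
const ALLOWED_BY_KEY = new Map([
  ['some-special-key', ['not-your-site.com', 'www.not-your-site.com']],
]);

// True when the Referer header's hostname is whitelisted for the key.
function isRefererAllowed(apiKey, refererHeader) {
  const allowed = ALLOWED_BY_KEY.get(apiKey);
  if (!allowed || !refererHeader) return false;
  let hostname;
  try {
    hostname = new URL(refererHeader).hostname; // URL lowercases the hostname
  } catch (err) {
    return false; // malformed or non-absolute Referer
  }
  return allowed.includes(hostname);
}

// Value for the Content-Security-Policy header on the widget page.
function frameAncestorsHeader(apiKey) {
  const allowed = ALLOWED_BY_KEY.get(apiKey) || [];
  const sources = allowed.map((host) => 'https://' + host);
  return 'frame-ancestors ' + (sources.length ? sources.join(' ') : "'none'");
}
```

The backend would call isRefererAllowed with the key from the query string and the request's Referer header, and send the result of frameAncestorsHeader as the Content-Security-Policy response header.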

protect data file access in static app

In a static web app (nothing except html, css and javascript),
I'm searching for a method to protect a file (e.g. json) from being accessed.
That file should only be accessible by authenticated and approved users. (I don't know yet how authentication will be handled.)
I can hide the content from view in the application (with userapp.io, for example), but I can't prevent someone from reading the file if they want to.
Would this be possible?
I thought of putting the protected file on www.firebase.com, but I could not find any practical example.
I also found solutions with .htaccess, but I need to avoid server dependent solutions.
P.S.: Not asking for code here ;-), just advice to point me in the right direction will do.
Thanks in advance!

You can limit access through the web server (.htaccess), server-side code, or a third party solution. If you want to keep your app static and want to avoid modifying .htaccess, then your best bet is to find a third party file host that offers authentication. Would something like Box work for you?
If you're interested in putting your website on something like Weebly, then you can password protect certain pages.

Javascript injected into URL

We have a relatively popular website, and recently we started seeing some strange URLs popping up in our logs. Our pages reference jQuery, and we started seeing pieces of those scripts being inserted into URLs. So we have log entries like this:
/js/,data:c,complete:function(a,b,c){c=a.responseText,a.isResolved()&&(a.done(function(a){c=a}),i.html(g?d(
The User-Agent string of the request is Java/1.6.0_06, so I think we can safely assume it's a bot that's probably written in Java. Also, I can find the appended piece of code in the jQuery file.
Now, my question is why would a bot try to insert referenced Javascript into the URL?

It may not be specifically targeted at your site -- it may be a shotgun attempt to find XSS-able sites so that an attacker later can figure out what's stealable and craft an attack and write a web-page to deploy it against real users.
In that case, the attacker may use bots to collect HTML from sites, and then pass that HTML to instances of IE running on zombie machines to see what messages get out.
I don't see any active payload here, so I assume you've truncated some of the code, but it looks like JSCompiled jQuery code that probably uses jQuery's postMessage. It is therefore probably an attempt to XSS your code in order to exfiltrate user data or credentials, install a JavaScript keylogger, etc.
I would grep through your JavaScript looking for code that does something like
eval(location.substring(...));
or anything that uses a regexp or substring call to grab part of the location and uses eval or new Function to unpack it.
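As a rough aid for that search, here is a small line-based scanner. It is only a heuristic sketch (the function name and both patterns are my own assumptions): it flags lines where eval or new Function appears together with a reference to location, and it will miss cases where the location value is first stored in a variable on another line.

```javascript
// A "sink" that executes a string as code.
const SINK = /\beval\s*\(|\bnew\s+Function\s*\(/;
// A reference to the page location on the same line.
const SOURCE = /\blocation\b/;

// Return the 1-based line numbers and text of suspicious lines.
function findSuspiciousLines(source) {
  const hits = [];
  source.split(/\r?\n/).forEach((text, index) => {
    if (SINK.test(text) && SOURCE.test(text)) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}
```

Running it over each of your JavaScript files gives a short list of places to review by hand.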

Checking for Cross Site Scripting vulnerabilities, maybe.
If the bot detects a successful injection, it might inject dangerous code (e.g. stealing your users' passwords or redirecting them to malicious sites).

identify the user from cross-site post request

In my app, I will provide my clients with a JavaScript plugin, which will collect some HTML data and send it to my server. I wonder what the best way to identify my client is. Say, for example, someone copied the JavaScript and put it into their own website. A similar case is a live chat plugin.

Your question is not very clear to me. I have been watching it from the beginning, and since no one has answered, I can say the following:
1. If your JavaScript plugin is meant to be embedded in websites, like a jQuery plugin, then you cannot be sure of anything, because the code can easily be modified to remove any security procedure.
2. If your JavaScript plugin is meant to be installed in browsers, like a Firefox add-on, it can be modified too, but in most cases you can track users simply with cookies or a login procedure.
That said, if your case is the first one (embedded in websites), you could identify the websites by requiring an authentication token that is stored on the website's own server (requested via AJAX) and added to the HTML data that is sent to your server.
Hopefully you can understand my English :) and I'm not talking pure garbage.

programmatically determining if someone owns a website?

I need to figure out the best way to determine if someone is the actual owner of a website. I don't just mean the domain although in a lot of cases that might be the case.
My first inclination was to have them put a special comment in their HTML that my program can scrape. e.g.:
<!-- #webcode:1234 -->
One possible problem with that approach is someone in theory could add it in the comments on their page or some other way to add content. Although I'm not sure anything I have them do couldn't be gotten that way.
My other idea, since I was also planning to offer a JavaScript widget, was to just scrape for that, although I didn't necessarily want to force them to add the widget.
<script type="text/javascript" src="http://yoursite.com/widget/widget/A4923D2342JF"></script>
What other mechanisms could be employed to determine ownership/control of a website?

Here are the options that Google uses for domain verification:
Create a CNAME or TXT record in your domain's DNS settings. These methods require accessing DNS settings for your domain at your domain host's website. Which method you can choose (CNAME or TXT record) depends on what's offered in your Google Apps control panel. We're currently rolling out the TXT record method but still ask many customers to create a CNAME record, instead.
Upload an HTML file to your domain's web server. This method requires being able to upload files to your domain's web server. Try doing this if you don't have access to your domain's DNS settings.
Add a tag to your home page. This method is available only for some customers (it's another new method we're rolling out). It requires accessing your domain's web server but not uploading to it. Try doing this if you have write access to files on the server but can't upload new files.
CNAME/TXT or uploading an HTML file to the root of the domain is the most secure, since it requires full control of the domain. If you want to be a bit more lax you could use a Meta tag in the head node, which would prevent someone from adding a comment to a page. All depends on how secure you want to be.

Do what Google does for their Webmaster Tools. Generate a unique key, and have them put it in a meta tag in the head of their front page. It's pretty unlikely that a user who does not own the site will be able to change the contents within the <head></head> tags. If they can, the site is vulnerable to almost any kind of vandalism, and is hopeless.
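A minimal sketch of the scraper-side check for such a meta tag. The meta name site-verification and the function name are hypothetical, and the regular expressions are an approximation; a production scraper would more likely use a real HTML parser.

```javascript
// True when the page's <head> contains
// <meta name="site-verification" content="<token>">.
function headHasToken(html, token) {
  // Match <head> or <head ...>, but not <header>.
  const head = /<head(?:\s[^>]*)?>([\s\S]*?)<\/head>/i.exec(html);
  if (!head) return false;
  const metaTags = head[1].match(/<meta\b[^>]*>/gi) || [];
  return metaTags.some((tag) => {
    const name = /\bname\s*=\s*["']([^"']*)["']/i.exec(tag);
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag);
    return !!name && !!content &&
      name[1].toLowerCase() === 'site-verification' &&
      content[1] === token;
  });
}
```

The check deliberately ignores anything outside the head element, so a token pasted into a comment form in the page body does not count.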

You could keep your original idea but only accept the comment in, say, the <head> tag of the website. This way you prevent someone from pasting the comment into a 'comments' section, which was the problem you originally raised.
In fact, I subscribed to a service that did just that: it had you include a special comment in the head section of your page.

Make it part of the requirement that the comment be inside the <head> tag. Typically, even user-generated content wouldn't make its way into the head.
Also, your concerns about the comment hack are probably unnecessary. Any comment system worth its salt knows to escape comments so that the comment is not rendered as actual HTML markup.

Have them put a file with a hard to guess name on the server?
such as http://www.example.com/5gdbadcab234g3.txt
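One way to produce such a name, sketched for Node.js (the function names are hypothetical): generate it from cryptographically random bytes, store it against the claimed domain, and later request the resulting URL to confirm the file exists.

```javascript
const crypto = require('crypto');

// 16 random bytes -> 32 hex characters, impractical to guess.
function verificationFileName() {
  return crypto.randomBytes(16).toString('hex') + '.txt';
}

// URL the verifier would request once the owner has uploaded the file.
function verificationUrl(domain, fileName) {
  return 'http://' + domain + '/' + fileName;
}
```

The verifier should treat only a successful response for that exact path as proof, since some servers answer every unknown path with a generic page.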

The only true way is to be able to access their fileserver. Anything transferred through HTTP can be reproduced.
If you don't have access to their server, then the best way would be to have an encrypted string embedded on the page (or in an image or some binary file on that page).
The string should be comprised of the URI, author, and timestamp. That way, even if someone does copy this string to their website, you would still be able to determine the author and the page. An added bonus is you'll be able to determine if there was a theft.
Granted, this is only as good as the algorithm that encrypts the page/author combination; attackers who are good at breaking encryption could get around it. Additionally, a dishonest author could create their own key for their page, so you'd need to host the encryption yourself so that no one could tamper with the timestamp. Also, this requires that all authors place the code on their pages.

I know you mentioned that it isn't necessarily domain dependent, but that would help. You could hash the domain (as domains are unique) and send the person that string to put somewhere on their site, either in a .txt file or in the head as others have mentioned.
Then you store all their domains and their hashes in a database, and your scraper checks that the domain it is scraping matches the hashed string found on the site. If it checks out, then it's fine.
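A sketch of that scheme for Node.js, with one deliberate substitution: it uses a keyed hash (HMAC-SHA256) instead of the plain hash the answer describes, because anyone can compute a plain hash of a public domain name. The SECRET value and the function names are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical server-side secret; it never leaves your server.
const SECRET = 'replace-with-a-long-random-secret';

// Code that the site owner is asked to publish on their site.
function domainCode(domain) {
  return crypto
    .createHmac('sha256', SECRET)
    .update(domain.toLowerCase())
    .digest('hex');
}

// Compare the scraped code with the expected one in constant time.
function codeMatches(domain, scrapedCode) {
  const expected = Buffer.from(domainCode(domain), 'hex');
  const actual = Buffer.from(String(scrapedCode), 'hex');
  return expected.length === actual.length &&
    crypto.timingSafeEqual(expected, actual);
}
```

With a keyed hash the database of stored hashes becomes optional, since the expected code can be recomputed from the domain at scrape time.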
