Obfuscate email address to hide from scrapers? - javascript

I want to hide my personal data (email and phone) from scrapers and bots. This data is in the href of anchor tags. Of course, actual users should still be able to have functional clickable links.
I was thinking of using a simple JavaScript function to encrypt and decrypt the data, so that pattern matchers (*@*.* etc.) that only see the HTML source won't find the actual email address.
My encryption function converts the string to a list of character codes, increments every element by 1, and converts the list back to a string. See below for the code.
My question: Is this an adequate way to hide data from scrapers? Or does every scraper render JS nowadays?
The code:
function stringToCharCodes(string) {
  // Returns a list of the character codes of a string
  return [...string].map(c => c.charCodeAt(0));
}
function deobfuscate(obfString) {
  // String to character codes
  let obfCharCodes = stringToCharCodes(obfString);
  // Deobfuscate function (-1)
  let deobfCharCodes = obfCharCodes.map(e => e - 1);
  // Character codes back to string
  // Use spread operator ...
  return String.fromCharCode(...deobfCharCodes);
}
// Result of obfuscate("example@example.com")
let obfEmail = "fybnqmfAfybnqmf/dpn";
document.getElementById("email").href = "mailto:" + deobfuscate(obfEmail);
// Result of obfuscate("31612345678")
let obfPhone = "42723456789";
document.getElementById("whatsapp").href = "https://wa.me/" + deobfuscate(obfPhone);
function obfuscate(string) {
  // Obfuscate - Use developer tools F12 to run this once and then use the obfuscated string in your website
  // String to character codes
  let charCodes = stringToCharCodes(string);
  // Obfuscate function (+1)
  let obfCharCodes = charCodes.map(e => e + 1);
  // Character codes back to string
  // Use spread operator ...
  return String.fromCharCode(...obfCharCodes);
}
<h1>Obfuscate Email And Phone</h1>
<p>Scrapers without JavaScript will not be able to harvest your personal data.</p>
<ul>
  <li><a id="email">Mail</a></li>
  <li><a id="whatsapp">WhatsApp</a></li>
</ul>

The question is hard to answer because there is no absolute truth, but let me try.
You'll never hide your email address with 100% certainty. Anything that renders the address so that a user can read it can also be rendered by a sophisticated scraper.
Once we accept that, what remains is finding a reasonable balance between the effort spent hiding the address and the damage a scraped address causes.
In my experience, obfuscating the email address and the mailto: href by HTML character encoding a few characters is extremely simple but still effective in most cases. In addition, it works without JavaScript.
Example:
peter.pan#neverland.org
may become something like
peter.pan#neverland.de
Supposedly it's even enough to hide just the mailto: and the @.
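To make the character-encoding idea concrete, here is a minimal sketch of a one-off helper you could run in the browser console to produce the encoded markup. The name toHtmlEntities is my own invention, and the sketch encodes every character, although (as said above) a few characters may already be enough.

```javascript
// Hypothetical helper (not part of the original answer): encode each
// character as a decimal HTML character reference, e.g. "@" -> "&#64;".
function toHtmlEntities(text) {
  return Array.from(text)
    .map((ch) => "&#" + ch.codePointAt(0) + ";")
    .join("");
}

// Example address taken from the answer above.
const address = "peter.pan@neverland.org";

// Browsers decode character references in attribute values and in text,
// so the link still works without JavaScript on the page itself.
const markup =
  '<a href="' + toHtmlEntities("mailto:" + address) + '">' +
  toHtmlEntities(address) +
  "</a>";

console.log(markup);
```

You would paste the printed markup into your static HTML. The raw source then contains neither "mailto:" nor "@", while the browser renders a normal clickable link.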
I would guess that, because so many email addresses can be collected far too easily, email scrapers don't tend to use highly sophisticated techniques. It's just not necessary.
Remember, however well you hide your email address on a publicly accessible website, if it's in one of the many data leaks, you've lost anyway. I use a separate custom email address for each service, and only for that service, and I still get spam at some of these addresses, so I'm sure they were leaked in some way.
With regard to your approach, I'd say yes, it's good enough.

Related

Remove last 3 letters of div (hide also in browser page source)

this is my HTML
<div id="remove">Username</div>
and this is my JS code
function slice() {
  var t = document.getElementById("remove");
  t.textContent = t.textContent.slice(0, -3);
}
slice();
The username is loaded from a foreach loop:
{foreach from=$last_user item=s}
{$s.date}
{$s.username}
{/foreach}
This code works and removes the 3 letters, but when I right-click in the browser and view the page source, I can still see "Username"!
I need to remove the three letters for privacy and security reasons.
Something like
*** name or usern ***
Thanks for your help!
The only secure way to make sure the client can't see a particular piece of information is to never send it to the client in the first place. Otherwise, there will always be a way for the client to examine the raw payloads of the network requests and figure out the information they aren't supposed to know.
You'll need to fix this on your backend - either hard-code in
<div id="remove">Usern</div>
or, for a more dynamic approach, use a template engine (or whatever's generating the HTML) and look up how to change strings with it. For example, in EJS, if user is an object with a username property, you could do
<div id="remove"><%= user.username.slice(0, -3) %></div>
Changing the content only with client-side JavaScript will not be sufficient, if you wish to keep some things truly private.
With Smarty, you can define a modifier that takes a string and returns all but the last three characters of it.
// Save as modifier.truncate_three.php in your Smarty plugins directory
function smarty_modifier_truncate_three($string)
{
    return substr($string, 0, -3);
}
and then in your template, replace
{$s.username}
with
{$s.username|truncate_three}
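If you want only the first three characters, it's easier because you can use the built-in truncate. Pass an empty string as the second argument so that no "..." is appended, and true as the third so that the cut is made at exactly three characters.
{$s.username|truncate:3:"":true}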
JavaScript doesn't change the page source; it can only change the DOM. You could keep the element empty and fill in the value with JS, but remember that JS runs on the client side, so it's better to send the string from the server already without the last 3 characters.
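Since the question asks for output like "usern ***", here is a small sketch of a masking function. It is written in JavaScript only for illustration and would have to run on the server (for example in Node.js), not in the browser. The name maskUsername and the default of 3 hidden characters are my own assumptions.

```javascript
// Hypothetical server-side helper: replace the last `hidden` characters of a
// username with asterisks before the value is written into the HTML.
function maskUsername(username, hidden = 3) {
  const chars = Array.from(username);
  // Never keep a negative number of characters for very short names.
  const visible = Math.max(chars.length - hidden, 0);
  return chars.slice(0, visible).join("") + "*".repeat(chars.length - visible);
}

console.log(maskUsername("Username"));
```

Note that the number of asterisks reveals the original length. If that matters, append a fixed number of asterisks instead.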

How do websites code those "What Character Are You" Facebook links?

Here is an example of what I am referring to: Facebook Example
I don't understand how this is coded. Is it simply some code that says if your name begins with the letter "A" then you are this person, "B" then that person, etc., or is it more complex than that? I have seen people whose names both begin with "A" get different results, so could it be just a random result? And how would this all be coded on the website's end, since Facebook just pulls up an image/text preview of the site? (Which raises another question: how can so many "sites/name" pages exist for every possible name?)
Any insight would be greatly appreciated!
There are many different ways this could be achieved, but speaking generally, I would assume the results are chosen at random from an object, array, database, etc. An example of this using a JavaScript array:
const la = ["Goofy", "Bugs Bunny", "Yosemite Sam", "Porky Pig"]
const generateRandomCharacter = () => {
  // Pick a random index from 0 to la.length - 1
  return `Your character is: ${la[Math.floor(Math.random() * la.length)]}`
}
alert(generateRandomCharacter()) /* shows your random character */
Running generateRandomCharacter() returns your random character.
Again, this could be achieved in many other ways; this is just an example.
As for your question about how that many sites could exist: from my very minimal experience with PHP, I once created a site that wrote a new file each time a user loaded it. I speculate that whenever you click the button to generate your character, the site writes a file containing your randomly chosen character, with your Facebook name as the filename. But again, my PHP knowledge is very minimal.
Hope this helped somehow.
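If a site wanted the same name to always get the same character (which would also explain why two names starting with "A" get different results), one possible approach is to hash the full name instead of using Math.random(). This is only a sketch of that idea, not how any particular site does it. The functions hashName and characterFor are hypothetical names.

```javascript
const characters = ["Goofy", "Bugs Bunny", "Yosemite Sam", "Porky Pig"];

// Simple illustrative string hash (not cryptographic). Case and surrounding
// whitespace are ignored so "Alice" and " alice" hash to the same value.
function hashName(name) {
  let hash = 0;
  for (const ch of name.trim().toLowerCase()) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// The same name always maps to the same entry of the array.
function characterFor(name) {
  return characters[hashName(name) % characters.length];
}

console.log(`Your character is: ${characterFor("Alice")}`);
```

Unlike the random version, nothing has to be stored per user: the result can be recomputed from the name whenever the preview page is requested.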

Can I get robust XSS protection in CF11 that I can apply to an entire site without touching every query or input?

I'm currently using CF11 and CFWheels 1.1. The "Global Script Protection" (GSP) server feature does an awful job of covering the XSS bases. I would like to extend it to block any and all JS tags/vectors from being inserted into the database.
CF11 offers AntiSamy protection via the getSafeHTML() function, which applies an XML policy file specified in application.cfc, but I would still need to modify every single varchar cfqueryparam in the application to use it, right?
Is there a way to enable the AntiSamy features server-wide or application-wide in CF11, similar to how the GSP feature works? By this I mean that GSP automatically strips tags out of input submitted to the app without my having to modify all the queries/form actions. I'd like a way to apply the AntiSamy policy file or getSafeHTML() in the same way.
Thanks!
Why would you have to apply it to every one? You would only need to do it for string (varchar) inputs and only when inserting. And even then, you wouldn't use it everywhere. For example, if you ask for my name and bio, there is no reason why you would want html, even "good" html, in my name. So I'm sure you already use something there to escape all html or simply remove it all. Only for a field like bio would you use getSafeHTML.
Validation is work. You (typically) don't want an "all at once" solution, imo. Just bite the bullet and do it.
If you did want to do it, you can use onRequestStart to automatically process all keys in the form and url scopes. This is written from memory, so it may have typos, but here is an example:
function onRequestStart(string req) {
    for(var key in form) { form[key] = getSafeHTML(form[key]); }
    for(var key in url) { url[key] = getSafeHTML(url[key]); }
}
I agree with Ray: validation is work, and it is very important work. A server-wide setting would be way too generalized to fit all situations. When you do your own validation for specific fields, you can really narrow down the attack surface. For example, assume you have a form with three fields: name, credit card number, and social security number. A single server-wide setting would need to be general enough to allow all three types of input. With your own validation you can be very specific for each field and allow only a certain set of characters: name allows only alphabetic characters and spaces; credit card number allows only digits, spaces, and dashes, and must conform to the mod 10 (Luhn) rule; social security number allows only digits and dashes, in 3-2-4 format. Nothing else is allowed.
That being said, I just wanted to point out that the "Global Script Protection" rules can be customized. That setting works by applying a regular expression that is defined in the cf_root/lib/neo-security.xml file in the server configuration, or the cf_root/WEB-INF/cfusion/lib/neo-security.xml file in the JEE configuration to the variable value. You can customize the patterns that ColdFusion replaces by modifying the regular expression in the CrossSiteScriptPatterns variable.
The default regular expression is defined as:
<var name='CrossSiteScriptPatterns'>
    <struct type='coldfusion.server.ConfigMap'>
        <var name='&lt;\s*(object|embed|script|applet|meta)'>
            <string>&lt;InvalidTag</string>
        </var>
    </struct>
</var>
Which means, by default, the Global Script Protection mechanism is only looking for strings containing <object or <embed or <script or <applet or <meta and replacing them with <InvalidTag. You can enhance that regular expression to look for more cases if you want.
See Protecting variables from cross-site scripting attacks section on this page
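To illustrate the per-field allowlist idea from this answer (name, credit card number, social security number), here is a rough sketch. It is written in JavaScript purely for illustration. In a ColdFusion app the same checks would have to be implemented server-side in CFML. The names passesLuhn, validators, and invalidFields are my own, as are the exact patterns.

```javascript
// Luhn (mod 10) checksum over a string of digits.
function passesLuhn(digits) {
  let sum = 0;
  let doubleIt = false; // every second digit from the right is doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = Number(digits[i]);
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// One narrow rule per field; anything not explicitly allowed is rejected.
const validators = {
  name: (v) => /^[A-Za-z ]+$/.test(v),
  cardNumber: (v) =>
    /^[0-9 -]+$/.test(v) && passesLuhn(v.replace(/[ -]/g, "")),
  ssn: (v) => /^\d{3}-\d{2}-\d{4}$/.test(v),
};

// Returns the names of all fields that are missing or fail their rule.
function invalidFields(fields) {
  return Object.keys(validators).filter(
    (key) => typeof fields[key] !== "string" || !validators[key](fields[key])
  );
}
```

For example, invalidFields({ name: "<script>", cardNumber: "4111 1111 1111 1111", ssn: "123-45-6789" }) flags only the name field, because a tag can never satisfy the letters-and-spaces rule.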
The solution as implemented for a cfwheels 1.1 app:
I used the slashdot file from https://code.google.com/p/owaspantisamy/downloads/list
This goes in application.cfc:
<cfcomponent output="false">
<cfset this.security.antisamypolicy="antisamy-slashdot-1.4.4.xml">
<cfinclude template="wheels/functions.cfm">
</cfcomponent>
This goes in the /ProjectRoot/events/onrequeststart.cfm file
function xssProtection(){
    var CFversion = ListToArray(SERVER.ColdFusion.productversion);
    if(CFversion[1] GTE 11){
        for(var key in form) {
            if(not IsJSON(form[key])){
                form[key] = getSafeHTML(form[key]);
            }
        }
        for(var key in url) {
            if(not IsJSON(url[key])){
                url[key] = getSafeHTML(url[key]);
            }
        }
    }
}
xssProtection();

How to validate an e-mail address against php e-mail injection in javascript using regex?

I want to prevent a % sign from occurring in the e-mail address, so that the user cannot add additional headers to the e-mail.
However, I am totally overwhelmed by regex and cannot find the solution.
So far I have
/[%]+/
But in my whole code, an e-mail address like test@example.com% still validates as true.
This was due to Firefox and Chrome applying their own internal e-mail check when type="email" is specified for the input!
function validateEmail(sEmail) {
  // Overall shape: local part, "@", dot-separated domain labels, 2-4 letter TLD
  var filter = /^[a-z0-9]+([-._][a-z0-9]+)*@([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,4}$/;
  // Length limits for the local part, the domain, and the whole address
  var filter2 = /^(?=.{1,64}@.{4,64}$)(?=.{6,100}$).*/;
  // Reject any % sign
  var filter3 = /[%]+/;
  return filter.test(sEmail) && filter2.test(sEmail) && !filter3.test(sEmail);
}
Btw., since I am so far unable to write this myself, I found two solutions and am not sure which one is better:
the one above (1), or this one (2):
/^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$/
What do you think?
You shouldn't be using regular expressions to protect against injection. Instead, use a prepared SQL statement:
$stmt = $dbh->prepare("INSERT INTO REGISTRY (name, value) VALUES (:name, :value)");
$stmt->bindParam(':name', $name);
$stmt->bindParam(':value', $value);
I wouldn't use regex unless you fully understand what a valid email address is. Instead, you should parameterize all values so that even if malicious code is inserted, it'll just treat it as a string. See OWASP:
https://www.owasp.org/index.php/Query_Parameterization_Cheat_Sheet
I see that you changed the question. You're actually interested in fixing PHP email header injection. This is considerably more complex than simply filtering out a single character in email addresses. You need to check to see if hackers/bots are trying to inject multipart/form data (especially newline/carriage returns which delimit header from body of a multipart message), manipulate form and session variables, and filter names and other fields (which shouldn't contain special characters).
Email address filtering just isn't sufficient to prevent attacks. The machine itself needs to be properly configured as well. Follow this guide to do what you can to secure against php injection (includes examples for filtering out newlines and for software configuration): http://static.askapache.com/pdf/xss-csrl-injection.pdf
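As a small illustration of the newline filtering mentioned above, here is a sketch of a check for values that will end up in mail headers. It is written in JavaScript because that is what the question uses, but a client-side check can be bypassed, so the same test has to be repeated in the PHP code that builds the mail. The function name isSafeHeaderValue is hypothetical.

```javascript
// Hypothetical check for any value that is placed into a mail header
// (address, name, subject). Rejects raw CR/LF, which separate headers,
// and the % sign, which covers URL-encoded forms such as %0a and %0d.
function isSafeHeaderValue(value) {
  return typeof value === "string" && !/[\r\n%]/.test(value);
}

console.log(isSafeHeaderValue("test@example.com"));
console.log(isSafeHeaderValue("test@example.com%"));
console.log(isSafeHeaderValue("a@example.com\r\nBcc: victim@example.org"));
```

This only blocks the header-splitting characters. It says nothing about whether the address itself is well formed, which is a separate check.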

Search for embedded email and phone numbers

I need to use a JavaScript form validation routine to scan various input text fields for embedded phone numbers and email addresses. This is for a classifieds system that is free to post but 'pay to connect' with buyers, so the intent is to prevent (as much as possible) users who post an ad from simply embedding their phone and/or email contact information to bypass the system.
I've been googling for a while now, and RegEx is not my strong suit, so I'm having a hard time finding a good snippet of code to help. All I want is a pass/fail for a text field (pass if it does not appear to contain an embedded email address and/or phone number, fail if it does).
Does anyone already have a good javascript solution for this?
Try this:
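var text = textArea.value;
// test() returns true/false (search() returns an index, with -1 for "not found").
// The patterns are unanchored so they match anywhere inside the text.
if (/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i.test(text)) {
  // Contains email
}
// Matches 5 digits optionally followed by 4 more; adjust to the phone formats you expect
if (/[+]?(?!0{5})(\d{5})(?!-?0{4})(-?\d{4})?/.test(text)) {
  // Contains phone
}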
Thanks to all for the input. Here is the version I ended up with, hope it helps someone else. Note: I removed the actual 'bad' words for this posting so that it would pass this site's filters. You can replace 'badword1', 'badword2', etc. with actual 'bad' words (you know, like nukular, calender, ekcetera):
function isAllowed(varField) {
  var msg = '';
  var pass = true;
  // Disallowed words and symbols
  var regex0 = /\b(@|www|WWW|http|hotmail|gmail|badword1|badword2|badword3)\b/i;
  if (regex0.test(varField)) {
    msg += "Text appears to have disallowed words (e.g. profanity, email, web address, @ symbol, etc.)\n";
    pass = false;
  }
  // Email addresses
  var regex1 = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i;
  if (regex1.test(varField)) {
    msg += "Text appears to have an email address in it (not allowed)\n";
    pass = false;
  }
  // Phone numbers such as (123) 456-7890 or 123.456.7890
  var regex2 = /\b\(?\d{3}\)?[-\s.]?\d{3}[-\s.]\d{4}\b/i;
  if (regex2.test(varField)) {
    msg += "Text appears to have a phone number in it (not allowed)\n";
    pass = false;
  }
  if (msg != '') {
    alert(msg);
  }
  return pass;
}
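This will find email addresses: \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
and this will find phone numbers: \b(\()?\d{2,3}(?(1)\))(?:-?\d{3}-?\d{4}|\d{11})\b
(the second pattern uses a conditional, which JavaScript's regex engine does not support, so it only works in engines that do)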
You'll be able to get some, but don't expect to get most (especially if people are aware of the requirement, or get more than one chance to fill the form).
People are already really good at circumventing bot detection of email addresses by writing things like "myaddresses at hotmail dot com", and there are a million variations of this. Also, phone number formats vary by region.
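To show what catching one of those variations might look like, here is a rough sketch that rewrites spelled-out "at" and "dot" (optionally in brackets) before testing for an email address. It only covers this one trick and will miss plenty of others, which is exactly the point made above. The names normalizeObfuscation and containsEmail are made up for this example.

```javascript
// Rewrite " at ", "(at)", "[at]" to "@" and " dot ", "(dot)", "[dot]" to ".".
function normalizeObfuscation(text) {
  return text
    .replace(/\s*[\[(]?\s*\bat\b\s*[\])]?\s*/gi, "@")
    .replace(/\s*[\[(]?\s*\bdot\b\s*[\])]?\s*/gi, ".");
}

// Pass/fail: true if the raw or the normalized text looks like it
// contains an email address.
function containsEmail(text) {
  const emailPattern = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i;
  return emailPattern.test(text) || emailPattern.test(normalizeObfuscation(text));
}

console.log(containsEmail("write to myaddresses at hotmail dot com"));
console.log(containsEmail("Selling a bike, meet at noon"));
```

Ordinary sentences containing "at" are not flagged unless a dot-separated domain follows, but expect both false positives and misses with any heuristic of this kind.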
You don't say what server side technology you're using, but it might be preferable to do this type of processing on the server. I always favor server side in my own work (ASP.NET), because the flexibility and power of an object oriented server side framework will trump that of JavaScript just about every time. This case is no exception, as it appears that JavaScript regular expression support is lacking several key features.
Regardless of whether you choose to go server side or client side, I've found that writing regular expressions is much simpler with a tool such as Espresso. If you're running on a Mac, consider Reggy. These tools usually come with several "stock" expressions for common queries (e.g. phone numbers, email addresses) that usually work with minimal modification.
