using regular expression to hide email address from spam bots

I am dynamically rendering multiple email addresses (mailto: links) on a webpage.
I obviously need to hide these from spam bots.
The simplest solution that I found is this:
link
This involves putting a fake character, "X", within the email address and then removing it once the link is clicked, copied or pasted.
It works; however, the drawback is that it removes all "x" characters from the address. Since I cannot guarantee that my dynamically rendered emails will not contain "x", this solution as-is is not right for me.
A better solution would be to put 3 or more 'X' at the start and end of each email address and then use the above code to remove them once the link is clicked,
i.e.:
<a href="mailto:XXXcontact@domain.comXXX"
onmouseover="this.href=this.href.replace(/x/g,'');">link</a>
What I now need is a regular expression that removes only the first 3 'x' from the email address when it is clicked.
I tried the below but it did not work:
<a href="mailto:xxxcontact@domain.comXXX"
onmouseover="this.href=this.href.replace(^[\s\S]{0,3});">link</a>

The replace method expects two parameters: first the regex you're matching against, and second the value you want to replace matches with. Your attempt has neither: the pattern is not wrapped in /.../ delimiters and no replacement value is passed. The regex can also carry flags that control how it matches. For instance, g replaces every match in the string rather than only the first, and i matches in a case-insensitive manner.
The regex you're after here would probably be more along the lines of:
^(mailto\:)x{3}(.*)x{3}$
That is, you're aiming to capture mailto:, which is expected at the beginning of the string, then to discard 3 x or X chars, followed by capturing the email address, but not the 3 x or X chars that are expected at the end of the string.
This would fit into the replace method in the following manner:
.replace(/^(mailto\:)x{3}(.*)x{3}$/i, '$1$2')
That said, would it not be fair to say that an email address could be inclined to include x or X characters consecutively? If so, you should either replace each occurrence of x{3} and the corresponding matches that you're prepending/appending to the email address with something less likely to be contained in an email address, or devise an alternative approach to the problem.
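As a quick check of the pattern above, here is a minimal sketch that applies it to a plain string instead of an element's href. The function name stripPadding and the sample addresses are made up for illustration.

```javascript
// Pattern from the answer above: group 1 keeps "mailto:", group 2 keeps the
// address, and the three padding characters on each side are dropped.
const paddedMailto = /^(mailto:)x{3}(.*)x{3}$/i;

// Hypothetical helper: returns the href with the padding removed.
// An href that does not match the pattern is returned unchanged.
function stripPadding(href) {
  return href.replace(paddedMailto, '$1$2');
}

console.log(stripPadding('mailto:XXXcontact@domain.comXXX'));
// The "x" inside "max" is part of the address and is left alone.
console.log(stripPadding('mailto:xxxmax@example.comxxx'));
```

In the page itself this could be called from the handler, for example as this.href = stripPadding(this.href).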

You could try something along the lines of
link
It would basically replace the occurrences of ^$^ instead of something as common as X or XXX.
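The snippet behind the link is not shown here, so the following is only a sketch of the idea, assuming the marker ^$^ is inserted somewhere inside the address. Both ^ and $ are special characters in a regex and have to be escaped. The name removeMarker and the sample address are hypothetical.

```javascript
// Matches the literal three-character marker ^$^ anywhere in the string.
// Unescaped, ^ and $ would be treated as anchors instead of characters.
const marker = /\^\$\^/g;

// Hypothetical helper: strips every occurrence of the marker.
function removeMarker(href) {
  return href.replace(marker, '');
}

console.log(removeMarker('mailto:con^$^tact@dom^$^ain.com'));
```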

I would avoid adding more or less common characters in your mail address for obfuscation purposes. Rather try some kind of very basic encryption, such as toggling the bits or taking the string char by char, and increasing the char code by a fixed value.
Example:
var mailto = "mailto:contact@domain.com";
var obfuscated = "";
for (var i = 0; i < mailto.length; i++) {
  // shift every character code up by a fixed value (7)
  obfuscated += String.fromCharCode(mailto.charCodeAt(i) + 7);
}
// obfuscated now looks like this: "thps{vAjvu{hj{Gkvthpu5jvt"
// To reverse the process, do the same thing and subtract 7.
// You could extract the code into a function that you simply call from "onmouseover".
Hope this helps, despite not precisely answering your question :)
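The round trip described in the comments above could be sketched like this. The function name shift is made up; the offset of 7 is the one used in the example.

```javascript
// Hypothetical helper: moves every character code by a fixed offset.
// A positive offset obfuscates, the matching negative offset restores.
function shift(text, offset) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) + offset);
  }
  return out;
}

const hidden = shift('mailto:contact@domain.com', 7);
const restored = shift(hidden, -7);

console.log(hidden);
console.log(restored);
```

The obfuscated string would be what gets rendered into the page, and the handler would call shift with -7 to restore the real href.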

Related

Regex in Google Apps Script practical issue. Forms doesn't read regex as it should

I hope it's just something I'm not doing right.
I've been using a simple script to create a form out of a spreadsheet. The script seems to be working fine. The output form is going to get some inputs from third parties so I can analyze them in my consulting activity.
Creating the form was not a big deal; the structure is good to go. However, after getting the form creator script working, I started working on its validations, and that's where I'm stuck.
For text validations, I will need to use specific regexes. Many of the inputs my clients need to give me are going to be places' and/or people's names, so I should only allow A-Z, single spaces, apostrophes and dashes.
My resulting regexes are:
//Regex allowing a **single name** with the first letter capitalized and the occasional use of "apostrophes" or "dashes".
const reg1stName = /^[A-Z]([a-z\'\-])+/
//Should allow (a single name/surname) like Paul, D'urso, Mac'arthur, Saint-Germaine etc.
//Regex allowing **composite names and place names** with the first letter capitalized and the occasional use of "apostrophes" or "dashes". It must avoid double spaces, however.
const regNamesPlaces = /^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$/
//This should allow (names/surnames/places' names) like Giulius Ceasar, Joanne D'arc, Cosimo de'Medici, Cosimo de Medici, Jean-jacques Rousseau, Firenze, Friuli Venezia-giulia, L'aquila etc.
Further in the script, these regexes are used as validation patterns for the form's text items, according to each case.
//Validation for single names
var val1stName = FormApp.createTextValidation()
.setHelpText("Only the person First Name Here! Use only (A-Z), a single apostrophe (') or a single dash (-).")
.requireTextMatchesPattern(reg1stName)
.build();
//Validation for composite names and places names
var valNamesPlaces = FormApp.createTextValidation()
.setHelpText(("Careful with double spaces, ok? Use only (A-Z), a single apostrophe (') or a single dash (-)."))
.requireTextMatchesPattern(regNamesPlaces)
.build();
Further on, I have a "for" loop that creates the form based on the spreadsheet's fields. Up to this point, things are working just fine.
for (var i = 0; i < numberRows; i++) {
  var questionType = data[i][0];
  if (questionType == '') {
    continue;
  }
  else if (questionType == 'TEXTNamesPlaces') {
    form.addTextItem()
      .setTitle(data[i][1])
      .setHelpText(data[i][2])
      .setValidation(valNamesPlaces)
      .setRequired(false);
  }
  else if (questionType == 'TEXT1stName') {
    form.addTextItem()
      .setTitle(data[i][1])
      .setHelpText(data[i][2])
      .setValidation(val1stName)
      .setRequired(false);
  }
}
The problem appears when I run the script and test the resulting form.
Both validation types get imported just fine (as can be seen in the form's edit mode), but when testing it in preview mode I get an error, as if the regex wasn't matching (sorry, the error message is in Portuguese; I forgot to translate it as I did with the code above):
A screenshot of the form in edit mode
A screenshot of the form in preview mode
However, if I manually remove the slashes ("/ /") around this regex, it starts working!
A screenshot of the form in edit mode, regex without slashes
A screenshot of the form in preview mode, regex without slashes
What am I doing wrong? I'm no professional dev, but in my understanding it makes no sense to write a regex without slashes.
If this is some Google Forms convention for reading regexes, I still need all of this to be read by the Apps Script that creates the form. If I try to pass the regex without the slashes there, the script will not be able to read it.
const reg1stName = ^[A-Z]([a-z\'])+
const regNamesPlaces = ^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$
//Can't even be saved. Returns: SyntaxError: Unexpected token '^' (line 29, file "Code.gs")
Passing manually all the validations is not an option. Can anybody help me?
Thanks so much

This
/^[A-Z]([a-z\'\-])+/
will not work because the pattern reaches the form with its surrounding slashes, and the form then tries to match each / as a literal character.
This
^[A-Z]([a-z\'\-])+
also will not work, because if the name is hyphenated, you will only match up to the hyphen. This will match the 'Some-' in 'Some-Name', for example. Also, perhaps you want a name like 'Saint John' to pass also?
I recommend the following :)
^[A-Z][a-z]*[-\.' ]?[A-Z]?[a-z]*
^ anchors to the start of the string
[A-Z] matches exactly 1 capital letter
[a-z]* matches zero or more lowercase letters (this enables you to match a name like D'Urso)
[-\.' ]? matches zero or 1 instances of - (hyphen), . (period), ' (apostrophe) or a single space (the . (period) needs to be escaped with a backslash because . is special to regex)
[A-Z]? matches zero or 1 capital letter (in case there's a second capital in the name, like D'Urso, St John, Saint-Germaine)
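One possible way to keep the regex literals in the script and still hand the form a pattern without slashes is the standard source property of a RegExp, which holds the pattern text without the delimiters. This sketch assumes requireTextMatchesPattern expects the pattern as a plain string, which is consistent with the form working once the slashes are removed by hand. It only shows the string conversion and does not call FormApp.

```javascript
// The asker's regex, kept as a normal literal so the script still parses.
const reg1stName = /^[A-Z]([a-z\'\-])+/;

// Converting the RegExp object to a string keeps the delimiters.
const withSlashes = String(reg1stName);

// The source property holds only the pattern text between the delimiters.
const withoutSlashes = reg1stName.source;

console.log(withSlashes);
console.log(withoutSlashes);
```

In the script, the validation could then be built with .requireTextMatchesPattern(reg1stName.source) instead of passing the RegExp object itself.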

RegEx match only final domain name from any email address

I want to match only parent domain name from an email address, which might or might not have a subdomain.
So far I have tried this:
new RegExp(/.+@(:?.+\..+)/);
The results:
Input: abc@subdomain.maindomain.com
Output: ["abc@subdomain.maindomain.com", "subdomain.maindomain.com"]
Input: abc@maindomain.com
Output: ["abc@maindomain.com", "maindomain.com"]
I am interested in the second match (the group).
My objective is that in both cases, I want the group to match and give me only maindomain.com
Note: before the down vote, please note that neither have I been able to use existing answers, nor the question matches existing ones.

One simple regex you can use to get only the last 2 parts of the domain name is
/[^.]+\.[^.]+$/
It matches a sequence of non-period characters, followed by a period and another sequence of non-periods, all at the end of the string. This regex doesn't ensure that this domain name comes after an "@". If you want a regex that also does that, you could use lazy matching with "*?":
/@.*?([^.]+\.[^.]+)$/
However, I think that trying to do everything at once tends to make regexes more complicated and hard to read. In this problem I would prefer to do things in two steps: first check that the email has an "@" in it, then take the part after the "@" and pass it to the simple regex, which will extract the domain name.
One advantage of separating things is that some changes are easier. For example, if you want to make sure that your email has only a single "@" in it, it's very easy to do in a separate step but would be tricky to achieve in the "do everything" regex.
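The two-step approach could look roughly like the sketch below. The function name parentDomain and the choice to return null for anything unexpected are assumptions made for the example.

```javascript
// Hypothetical helper following the two-step idea:
// 1. require exactly one "@", 2. run the simple regex on the part after it.
function parentDomain(email) {
  const parts = email.split('@');
  if (parts.length !== 2 || parts[1] === '') {
    return null; // no "@", more than one "@", or nothing after it
  }
  // [^.]+ on both sides of the dot, anchored to the end of the string,
  // so only the last two dot-separated labels are matched.
  const match = parts[1].match(/[^.]+\.[^.]+$/);
  return match ? match[0] : null;
}

console.log(parentDomain('abc@subdomain.maindomain.com'));
console.log(parentDomain('abc@maindomain.com'));
console.log(parentDomain('abc@def@maindomain.com'));
```

Keeping the "@" check outside the regex is what makes the single-"@" rule a one-line condition here.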

You can use this regex:
/@(?:[^.\s]+\.)*([^.\s]+\.[^.\s]+)$/gm
Use captured group #1 for your result.
It matches @ followed by 0 or more instances of non-dot text and a dot, i.e. (?:[^.\s]+\.)*.
Then ([^.\s]+\.[^.\s]+)$ matches and captures the last 2 components separated by a dot.
RegEx Demo

With the following maindomain should always return the maindomain.com bit of the string.
var pattern = new RegExp(/(?:[\.@])(\w[\w-]*\w\.\w*)$/);
var str = "abc@subdomain.maindomain.com";
var maindomain = str.match(pattern)[1];
http://codepen.io/anon/pen/RRvWkr
EDIT: tweaked to disallow domains starting with a hyphen i.e - '-yahoo.com'

Javascript string validation. How to write a character only once in string and only in the start?

I am writing validation for phone numbers. I need to allow users to type the + character only at the beginning of the input field and prevent them from typing it anywhere later in the field.
In other words:
+11111111 - right,
111111111 - right,
+111+111+ - false,
1111+111+ - false
The problem is that I need to perform validation while typing. As a result I cannot analyse the whole string after submission, thus it is not possible to fetch the position of the + character because 'keyup' always returns 0.
I have tried many approaches; this is one of them:
$('#signup-form').find('input[name="phone"]').on('keyup', function(e) {
// prevent from typing letters
$(this).val($(this).val().replace(/[^\d.+]/g, ''));
var textVal = $(this).val();
// check if + character occurs
if(textVal === '+'){
// remove + from occurring twice
// check if + character is not the first
if(textVal.indexOf('+') > 0){
var newValRem = textVal.replace(/\+/, '');
$(this).val(newValRem);
}
}
});
When I try to replace the + character with an empty string, it is replaced only once, which is not enough, because the user might type it a couple of times by mistake.
Here is the link to the fiddle: https://jsfiddle.net/johannesMt/rghLowxq/6/
Please give me any hint in this situation. Thanks!

To help you with the current code fix (@Thomas Mauduit-Blin is right that there is a lot more to do here than just allowing the plus symbol at the beginning only), you may remove any plus symbols that are preceded by any character. Just capture that character and restore it with a backreference in the replacement pattern:
$(this).val($(this).val().replace(/[^\d.+]|(.)\++/g, '$1'));
See the updated fiddle and the regex demo.
The pattern is updated with a (.)\++ alternative. (.) captures any character but a newline into Group 1 that is followed with one or more plus symbols, and the contents of Group 1 is placed back during the replacement with the help of $1 backreference.
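To try the pattern outside the jQuery handler, the replacement can be wrapped in a plain function. The name cleanPhone is made up for this sketch; the regex is the one from the answer.

```javascript
// First alternative: any character that is not a digit, "." or "+" is removed
// (group 1 is empty for that alternative, so '$1' inserts nothing).
// Second alternative: a character followed by one or more "+" is replaced by
// the character alone, which drops every "+" that is not at the start.
function cleanPhone(value) {
  return value.replace(/[^\d.+]|(.)\++/g, '$1');
}

console.log(cleanPhone('+11111111'));
console.log(cleanPhone('+111+111+'));
console.log(cleanPhone('1111+111+'));
console.log(cleanPhone('12a3'));
```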

For better validation, why not use the jQuery maskedinput library? It takes care of many additional tasks for you without much overhead. Pick the mask that fits your format:
$("#phone").mask("+999-999-9999");
$("#phone").mask("+9999-999-9999");
$("#phone").mask("+99999999999");

If you want to do the validation on your own, you must use a regex.
But, as described in another related thread here:
don't use a regular expression to validate complex real-world data like phone numbers or URLs. Use a specialized library.
You must let the user enter an invalid phone number and perform the check later, on form submit and/or on the server side, for example. Here, you want to take care of the "+" character, but there is a lot of other stuff to do to get trustworthy validation.

If your textVal has a +, indexOf will only report the first occurrence. You need to make sure the first character is not checked by indexOf, so use substring to take the first character out of the equation.
Simply replace
if(textVal.indexOf('+') > 0){
with
if(textVal.substring(1).indexOf('+') > -1){
Demo
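Combined with a global replace (the g flag removes every match, not just the first), the check above could be turned into a small helper like this. The name stripInnerPlus is hypothetical, and the function works on a plain string rather than on the input element.

```javascript
// Keeps the first character as it is and removes every "+" after it.
function stripInnerPlus(value) {
  if (value.substring(1).indexOf('+') > -1) {
    return value.charAt(0) + value.substring(1).replace(/\+/g, '');
  }
  return value; // no "+" beyond the first character, nothing to do
}

console.log(stripInnerPlus('+111+111+'));
console.log(stripInnerPlus('1111+111+'));
console.log(stripInnerPlus('+11111111'));
```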

JavaScript regex valid name

I want to make a JavaScript regular expression that checks for valid names.
minimum 2 chars (spaces don't count)
spaces and some special chars allowed (éàëä...)
I know how to write these rules separately, but not combined.
If I use /^([A-Za-z éàë]{2,40})$/, the user could input 2 spaces as a name
If I use /^([A-Za-z]{2,40}[ éàë]{0,40})$/, the user must use 2 letters first and after using space or special char, can't use letters again.
Searched around a bit, but hard to formulate search string for my problem. Any ideas?

Please, please pretty please, don't do this. You will only end up upsetting people by telling them their name is not valid. Several examples of surnames that would be rejected by your scheme: O'Neill, Sørensen, Юдович, 李. Trying to cover all these cases and more is doomed to failure.
Just do something like this:
strip leading and trailing blanks
collapse consecutive blanks into one space
check if the result is not empty
In JavaScript, that would look like:
name = name.replace(/^\s+/, "").replace(/\s+$/, "").replace(/\s+/g, " ");
if (name == "") {
// show error
} else {
// valid: maybe put trimmed name back into form
}
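The three steps can also be written as one small function. This is only a sketch: the name normalizeName is made up, and trim() is used for the first two steps.

```javascript
// Strips leading and trailing whitespace, then collapses every run of
// whitespace inside the name into a single space (note the g flag).
function normalizeName(raw) {
  return raw.trim().replace(/\s+/g, ' ');
}

const cleaned = normalizeName('  Jean   Luc  Picard ');
console.log(cleaned === '' ? 'show error' : 'valid: ' + cleaned);
```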

Most solutions don't consider the many different names there might be. There can be names with only two character like Al or Bo or someone that writes his name like F. Middlename Lastname.
This RegExp will validate most names but you can optimize it to whatever you want:
/^[a-z\u00C0-\u02AB'´`]+\.?\s([a-z\u00C0-\u02AB'´`]+\.?\s?)+$/i
This will allow:
Li Huang Wu
Cevahir Özgür
Yiğit Aydın
Finlay Þunor Boivin
Josué Mikko Norris
Tatiana Zlata Zdravkov
Ariadna Eliisabet O'Taidhg
sergej lisette rijnders
BRIANA NORMINA HAUPT
BihOtZ AmON PavLOv
Eoghan Murdo Stanek
Filimena J. Van Der Veen
D. Blair Wallace
But will not allow:
Shirley24
66Bryant Hunt88
http://stackoverflow.com
laoise_ibtihaj
hippolyte@example.com
Cy4n 4ur0r4 Blyth3 3ll1
Justisne
Danny
If the name needs to be capitalized, uppercase, lowercase, trimmed or single spaced, that's a task a formatter should do, not the user.
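A few of the listed names can be checked against the pattern with a short script. The variable names are made up; the regex is copied from the answer. It has no g flag, so test carries no state between calls.

```javascript
const nameRe = /^[a-z\u00C0-\u02AB'´`]+\.?\s([a-z\u00C0-\u02AB'´`]+\.?\s?)+$/i;

// A handful of samples taken from the two lists above.
const samples = [
  'Li Huang Wu',
  'D. Blair Wallace',
  'Filimena J. Van Der Veen',
  'Shirley24',
  '66Bryant Hunt88',
  'Danny',
];

for (const sample of samples) {
  console.log(sample, nameRe.test(sample));
}
```

Note that the pattern requires at least two words, which is why a single word such as Danny is rejected.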

I would like to propose a RegEx that would match all latin based languages with their special characters:
/^([ áàíóúéëöüñÄĞİŞȘØøğışÐÝÞðýþA-Za-z'-]*)$/
P.S. I've included all characters I could find, but please feel free to edit the answer in case I've missed any.

Why not
var reg= /^([A-Za-z]{2}[ éàëA-Za-z]*)$/;
2 letters, then as many spaces, letters or special characters as you want.
I wouldn't allow spaces in usernames though - it's begging for trouble when you have usernames like
ab ba
who's going to remember how many spaces they used?

You could do this:
/^([A-Za-zéàë]{2,40} ?)+$/
2-40 characters, and then optionally a space, repeated at least once. This will allow a space at the end, but you could trim it off separately.

After trimming the input value, the following will match your request, but only for Latin surnames.
rn = new RegExp("([\\w\u00C0-\u02AB']+ ?)+","gi");
m = ln.match(rn); // ln holds the trimmed input
valid = (m && m.length)? true: false;
Note that I am using '+', instead of '{2,}', that is because some surnames uses just one letter in a separated word like "Ortega y Gasset"
You can see I am not using RegExp.test. With the g flag, test keeps its position in lastIndex between calls, so calling it repeatedly on the same regex object can give alternating results.
In my country, people from countries with non-Latin scripts usually transliterate their names, so the previous RegExp would be enough. However, if you attempt to match any surname in the world, you may add more ranges of \u#### characters, taking care not to include symbols, numbers or other types. Or perhaps the xregexp library may help you.
And, please, do not forget to validate the input on the server side and to escape it before using it in SQL statements (if you have them).
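The behavior of test on a regex with the g flag can be reproduced in a few lines. This is a generic illustration, not the code from the answer.

```javascript
const re = /a/g;

// Each successful test() on a global regex moves lastIndex past the match,
// and the next call starts searching from there.
const first = re.test('a');  // search starts at index 0
const second = re.test('a'); // search starts at index 1, then lastIndex resets
const third = re.test('a');  // search starts at index 0 again

console.log(first, second, third);
```

Dropping the g flag, or creating a new regex for each check, avoids this.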

Need a regex for acceptable file names

I'm using Fancy Upload 3 and onSelect of a file I need to run a check to make sure the user doesn't have any bad characters in the filename. I'm currently getting people uploading files with hieroglyphics and such in the names.
What I need is to check if the filename only contains:
A-Z
a-z
0-9
_ (underscore)
- (minus)
SPACE
ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü (as single and double byte)
Obviously you can see the difficult thing there. The non-english single and double byte chars.
I've seen this:
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]
And this:
[\x80-\xA5]
But neither of them fully cover the situation right.
Examples that should work:
fást.zip
abc.zip
ABC.zip
Über.zip
Examples that should NOT work:
∑∑ø∆.zip
¡wow!.zip
•§ªº¶.zip
The following is close, but I'm NO RegEx'pert, not even close.
var filenameReg = /^[A-Za-z0-9-_]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF]+$/;
Thanks in advance.
Solution from Zafer mostly works, but it does not catch all of the other symbols, see below.
Uncaught:
¡£¢§¶ª«ø¨¥®´åß©¬æ÷µç
Caught:
™∞•–≠'"πˆ†∑œ∂ƒ˙∆˚…≥≤˜∫√≈Ω
Regex:
var filenameReg = /^([A-Za-z0-9\-_. ]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])+$/;

Alternation between two character classes (ie. [abc]|[def]) can be simplified to a single character class ([abcdef]) -- the first can be read as "(a or b or c) OR (d or e or f)"; the second as "(a or b or c or d or e or f)". What probably tripped up your regular expression is the unescaped dash in the first class -- if you want a literal dash, it should be the last character in the class.
So we'll modify your expression to get it working:
var filenameReg = /^[A-Za-z0-9_\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF-]+$/;
The problem now is that you're not accounting for the file extension, but that is an easy modification (assuming you're always getting .zip files):
var filenameReg = /^[A-Za-z0-9_\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF-]+\.zip$/;
Replace zip with another pattern if the extension differs.

It looks like it is the character ranges that are causing the problem, because they include some unallowable characters in between. Since you already have the list of allowable characters, the best thing would be to just use that directly:
var filenameReg = /^[A-Za-z0-9_\-\ ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü]+$/;
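A sketch of the explicit-list approach, tested against the examples from the question. Two things are assumptions here: a literal . is added to the list so that names with an extension pass, and "single and double byte" is read as precomposed versus decomposed accented characters, which normalize('NFC') brings into one form. The names allowed and isAcceptable are made up.

```javascript
// Explicit list of allowed characters, plus "." for the file extension.
const allowed = /^[A-Za-z0-9_\- .ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü]+$/;

// Normalizes to NFC first so that "a" + combining accent is compared
// as the single precomposed character from the list.
function isAcceptable(filename) {
  return allowed.test(filename.normalize('NFC'));
}

['fást.zip', 'abc.zip', 'ABC.zip', 'Über.zip', '∑∑ø∆.zip', '¡wow!.zip', '•§ªº¶.zip']
  .forEach(function (name) {
    console.log(name, isAcceptable(name));
  });
```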

The following should work:
var filenameReg = /^([A-Za-z0-9\-_. ]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])+$/;
I've escaped the - with a backslash and grouped the two expressions; otherwise the + sign doesn't apply to the first expression.
EDIT 1: I've also added . to the expression.

We have different rules for different platforms. But I think you mean long file names in Windows. For that you can use the following regex (written here as a C# verbatim string):
var longFilenames = @"^[^\./:*\?\""<>\|]{1}[^\/:*\?\""<>\|]{0,254}$";
NOTE: Instead of saying which characters are allowed, you need to say which ones are not allowed!
But keep in mind that this is not a 100% complete regex. If you really want to make it complete you have to add exceptions for reserved names as well.
You can find more information about filename rules here:
http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx
