Hashtag linking in AngularJS and JS - javascript

I'm new to regular expressions and don't really understand them. I'm getting comments from a PHP script that may or may not include hashtags. I need to create a link out of each hashtag (not including URLs, or hashtags that have a comma or a space in them).
So far I've looked online and found this:
string = string.replace(/(^|\s)(#[a-z\d-]+)/ig, "$1$2");
However, the link generated is:
#thenameofhashtag
I need to be able to exclude the hashtag symbol (#) from the tag= variable in the link. How can I modify the expression to achieve this, and are there any AngularJS ways of doing this? Additionally, will languages (Chinese, Japanese, etc.) or characters that are not UTF-8 encoded create problems?

You can move the # out of the capturing group so that it is not captured in $2:
/(^|\s)#([a-z\d-]+)/ig
In #([a-z\d-]+), the # sits outside the group, so only [a-z\d-]+ is captured.
Example
string.replace(/(^|\s)#([a-z\d-]+)/ig, "$1#$2");
// => #thenameofhashtag
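As a rough sketch of how the pieces might fit together: the helper name linkHashtags and the /search?tag= URL below are made up for illustration and are not from the question. The pattern also swaps [a-z\d-] for Unicode property escapes (\p{L}, \p{N}, with the u flag), which is one possible way to cover Chinese or Japanese hashtags, assuming an engine that supports them.

```javascript
// Hypothetical helper: wraps each hashtag in a link whose tag= value omits the "#".
function linkHashtags(text, baseUrl) {
  // (^|\s) keeps the leading whitespace; \p{L} and \p{N} need the "u" flag
  // and match letters and digits in any script.
  var pattern = /(^|\s)#([\p{L}\p{N}_-]+)/gu;
  return text.replace(pattern, function (match, lead, tag) {
    // encodeURIComponent keeps non-ASCII tags valid inside the URL.
    return lead + '<a href="' + baseUrl + encodeURIComponent(tag) + '">#' + tag + '</a>';
  });
}

// The "#top" fragment is left alone because it is not preceded by whitespace.
var linked = linkHashtags("great #day here, see http://x.test/#top", "/search?tag=");
// linked === 'great <a href="/search?tag=day">#day</a> here, see http://x.test/#top'
```

In AngularJS the same function could be wrapped in a filter, but the output would then need to go through ng-bind-html (and be sanitized) rather than plain interpolation.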

Related

Regex in Google Apps Script practical issue. Forms doesn't read regex as it should

I hope it's just something I'm not doing right.
I've been using a simple script to create a form out of a spreadsheet. The script seems to be working fine. The output form is going to get some inputs from third parties so I can analyze them in my consulting activity.
Creating the form was not a big deal; the structure is good to go. However, after getting the form creator script working, I started working on its validations, and that's where I'm stuck.
For text validations, I will need to use specific regexes. Many of the inputs my clients need to give me are going to be places' and/or people's names; therefore, I should only allow them to use A-Z, single spaces, apostrophes and dashes.
My resulting regexes are:
// Regex allowing a **single name** with the first letter capitalized and the occasional use of "apostrophes" or "dashes".
const reg1stName = /^[A-Z]([a-z\'\-])+/
// Should allow (a single name/surname) like Paul, D'urso, Mac'arthur, Saint-Germaine, etc.
// Regex allowing **composite names and place names** with the first letter capitalized and the occasional use of "apostrophes" or "dashes". It must avoid double spaces, however.
const regNamesPlaces = /^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$/
// This should allow (names/surnames/places' names) like Giulius Ceasar, Joanne D'arc, Cosimo de'Medici, Cosimo de Medici, Jean-jacques Rousseau, Firenze, Friuli Venezia-giulia, L'aquila, etc.
Further on in the script, these regexes are used as the validation patterns for the form's text items, as appropriate for each case.
//Validation for single names
var val1stName = FormApp.createTextValidation()
.setHelpText("Only the person First Name Here! Use only (A-Z), a single apostrophe (') or a single dash (-).")
.requireTextMatchesPattern(reg1stName)
.build();
//Validation for composite names and places names
var valNamesPlaces = FormApp.createTextValidation()
.setHelpText(("Careful with double spaces, ok? Use only (A-Z), a single apostrophe (') or a single dash (-)."))
.requireTextMatchesPattern(regNamesPlaces)
.build();
Further on still, I have a "for" loop that creates the form based on the spreadsheet's fields. Up to this point, things are working just fine.
for (var i = 0; i < numberRows; i++) {
  var questionType = data[i][0];
  if (questionType == '') {
    continue;
  }
  else if (questionType == 'TEXTNamesPlaces') {
    form.addTextItem()
      .setTitle(data[i][1])
      .setHelpText(data[i][2])
      .setValidation(valNamesPlaces)
      .setRequired(false);
  }
  else if (questionType == 'TEXT1stName') {
    form.addTextItem()
      .setTitle(data[i][1])
      .setHelpText(data[i][2])
      .setValidation(val1stName)
      .setRequired(false);
  }
}
The problem appears when I run the script and test the resulting form.
Both validation types get imported just fine (as can be seen in the form's edit mode), but when testing it in preview mode I get an error, as if the regex wasn't matching (sorry, the error message is in Portuguese; I forgot to translate it as I did with the code above):
A screenshot of the form in edit mode
A screenshot of the form in preview mode
However, if I manually remove the slashes ("/ /") from the regex in the form, it starts working!
A screenshot of the form in edit mode, regex without slashes
A screenshot of the form in preview mode, regex without slashes
What am I doing wrong? I'm no professional dev, but in my understanding it makes no sense to write a regex without slashes.
If this is some Google Forms convention for reading regexes, I still need all of this to be read by the Apps Script that creates the form. If I try to pass the regex without the slashes there, the script will not be able to read it.
const reg1stName = ^[A-Z]([a-z\'])+
const regNamesPlaces = ^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$
//Can't even be saved. Returns: SyntaxError: Unexpected token '^' (line 29, file "Code.gs")
Entering all the validations manually is not an option. Can anybody help me?
Thanks so much
This
/^[A-Z]([a-z\'\-])+/
will not work because the pattern is read as plain text, so the parser tries to match your / characters literally.
This
^[A-Z]([a-z\'\-])+
also will not work, because if the name is hyphenated, you will only match up to the hyphen. This will match the 'Some-' in 'Some-Name', for example. Also, perhaps you want a name like 'Saint John' to pass also?
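On the slash issue, one possible workaround is to keep the regex literal in the script and hand the form only its pattern text through the standard RegExp source property. This is a sketch under the assumption that requireTextMatchesPattern expects the pattern as a string, which is consistent with the form working once the slashes are removed by hand. The function name buildFirstNameValidation is made up for illustration.

```javascript
// The literal stays valid JavaScript, so the script can still be saved.
const reg1stName = /^[A-Z]([a-z\'\-])+/;

// .source is the pattern text between the slashes, without the slashes or flags.
const pattern1stName = reg1stName.source;

// Assumption: the form validator wants the pattern as a string.
// FormApp only exists inside Apps Script, so this function is defined but not called here.
function buildFirstNameValidation() {
  return FormApp.createTextValidation()
    .setHelpText("Only the person First Name Here!")
    .requireTextMatchesPattern(pattern1stName)
    .build();
}
```

Converting the RegExp object itself to a string would include the surrounding slashes, which may be why the form then expects literal / characters in the answer.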
I recommend the following :)
^[A-Z][a-z]*[-\.' ]?[A-Z]?[a-z]*
^ anchors to the start of the string
[A-Z] matches exactly 1 capital letter
[a-z]* matches zero or more lowercase letters (this enables you to match a name like D'Urso)
[-\.' ]? matches zero or 1 instances of - (hyphen), . (period), ' (apostrophe) or a single space (the . (period) needs to be escaped with a backslash because . is special to regex)
[A-Z]? matches zero or 1 capital letter (in case there's a second capital in the name, like D'Urso, St John, Saint-Germaine)
[a-z]* matches zero or more lowercase letters after that optional second capital
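A quick way to sanity-check the recommended pattern is to run it over a few sample names in plain JavaScript. The trailing $ below is an addition, not part of the pattern above: without it, test() accepts any string that merely starts with a valid name. The sample names are only illustrative.

```javascript
// Pattern from the answer, kept as a string (no slashes), plus an assumed trailing "$".
const namePattern = "^[A-Z][a-z]*[-.' ]?[A-Z]?[a-z]*$";
const nameRegex = new RegExp(namePattern);

const accepted = ["Paul", "D'Urso", "Saint-Germaine", "St John"];
const rejected = ["paul", "Saint  John", "Paul123"];

// Every name in `accepted` should pass; every name in `rejected` should fail.
const allAccepted = accepted.every(function (name) { return nameRegex.test(name); });
const noneRejected = rejected.every(function (name) { return !nameRegex.test(name); });
```

Note that this pattern allows at most one separator, so a three-part name such as "Friuli Venezia-giulia" would need a repeated group instead.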

javascript regex replace some words with links, but not within existing links

Trying to replace certain words in HTML pages with the same word but as a URL linking to that resource.
For example, replace the word 'MySQL' with the same word, MySQL, as a link pointing to mysql.com.
Using the JS replace function with regex, and it's doing the replacing just fine.
BUT it's also replacing words that are already part of URLs... which is the problem.
For the MySQL example, it's replacing BOTH the "MySQL" text that's already linked, AND the URL leading to mysql.com, so breaking the already existing link.
Is there a way to update the inline regex (in the .replace call) to NOT do replacing in existing links, i.e. <a> elements?
Here's the replace code:
var NewHTML = OriginalHTML
.replace(/\bJavaScript\b/gi, "$&")
.replace(/\bMySQL\b/gi, "$&")
;
Here's the full sample code (tried to paste it inline but wasn't looking right with the backticks):
http://pastie.org/private/v4l2s2c42aqduqlopurpw
Went through the JS regexp reference, and tried various other permutations in the regex matching, like the following, but all that does is make it not match ANY words on the page...
.replace(/\b(\<a\>*!\>)JavaScript\b/i,xxxxx
The following regex DOES prevent the match from happening wherever the word is literally touching a slash or a dash... but that's not the solution (and it does not fix the mysql example above):
.replace(/\b(?!\>)(?!\-)(?!\/)MySQL\b(?!\-)(?!\/)/gi, "$&")
I've read through the related threads on stackoverflow and elsewhere, but can't seem to find this particular scenario, not in JavaScript anyway.
Any help would be greatly appreciated. :-)
Thanks!
You could change your regex to exclude keywords that precede the end anchor tag, </a>:
.replace(/\bMySQL\b(?![^<]*?<\/a>)/gi, "$&")
See jsfiddle for example.
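A small sketch of that lookahead in use. The helper name linkKeyword and the example.com URL are invented for illustration, and the lookahead only recognises link text that contains no nested tags before </a>.

```javascript
// Hypothetical helper: links a keyword unless the next tag after it is a closing </a>.
function linkKeyword(html, keyword, url) {
  // Escape regex metacharacters in the keyword before building the pattern.
  var escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?![^<]*?<\/a>) fails when only non-"<" characters separate the match from "</a>".
  var re = new RegExp("\\b" + escaped + "\\b(?![^<]*?<\\/a>)", "gi");
  return html.replace(re, '<a href="' + url + '">$&</a>');
}

var sample = 'Try MySQL or <a href="http://www.mysql.com">MySQL</a> today.';
var result = linkKeyword(sample, "MySQL", "https://example.com/mysql");
// result === 'Try <a href="https://example.com/mysql">MySQL</a> or <a href="http://www.mysql.com">MySQL</a> today.'
```

Both the "mysql" inside the existing href and the already linked text are skipped, because each is followed by </a> with no other tag in between.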
A negative lookahead should be sufficient:
.replace(/\bMySQL(?!\.com)\b/gi, "$&")

Regular expression for detecting hyperlinks

I've got this regex pattern from WMD showdown.js file.
/<((https?|ftp|dict):[^'">\s]+)>/gi
and the code is:
text = text.replace(/<((https?|ftp|dict):[^'">\s]+)>/gi,"$1");
But when I set text to http://www.google.com, it does not anchor it, it returns the original text value as is (http://www.google.com).
P.S: I've tested it with RegexPal and it does not match.
Your code is searching for a url wrapped in <> like: <http://www.google.com>: RegexPal.
Just change it to /((https?|ftp|dict):[^'">\s]+)/gi if you don't want it to search for the <>: RegexPal
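To make the difference concrete, here is a small comparison of the two patterns. The replacement string is an assumed anchor template, since the original one is not shown here.

```javascript
// Pattern from showdown.js: only matches URLs wrapped in < >.
var bracketed = /<((https?|ftp|dict):[^'">\s]+)>/gi;
// Same pattern without the angle brackets: matches bare URLs too.
var bare = /((https?|ftp|dict):[^'">\s]+)/gi;

// Assumed replacement template; $1 is the captured URL.
var toAnchor = '<a href="$1">$1</a>';

var unchanged = "http://www.google.com".replace(bracketed, toAnchor);
var fromBracketed = "<http://www.google.com>".replace(bracketed, toAnchor);
var fromBare = "http://www.google.com".replace(bare, toAnchor);
// unchanged     === 'http://www.google.com'
// fromBracketed === '<a href="http://www.google.com">http://www.google.com</a>'
// fromBare      === '<a href="http://www.google.com">http://www.google.com</a>'
```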
As long as you know your URLs start with http:// or https:// or whatever, you can use:
/((https?|s?ftp|dict|www)(:\/\/)?[A-Za-z0-9.\-]+)/gi
The expression will match until it encounters a character not allowed in the host name, i.e. one that is not in A-Za-z0-9.\-. However, it will not detect anything of the form google.com, nor anything that comes after the domain name, such as parameters or subdirectory paths. If you need those, you can simply reuse the terminating condition you already have in your regex.
I know it seems pointless but it may be useful if you want the display name to be something abbreviated rather than the whole url in case of complex urls.
You could use:
var re = /(http|https|ftp|dict)(:\/\/\S+?)(\.?\s|\.?$)/gi;
with:
el.innerHTML = el.innerHTML.replace(re, '<a href=\'$1$2\'>$1$2<\/a>$3');
to also match URLs at the end of sentences.
But you need to be very careful with this technique, make sure the content of the element is more or less plain text and not complex markup. Regular expressions are not meant for, nor are they good at, processing or parsing HTML.
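A short sketch of that expression applied to plain text rather than to innerHTML. The function name linkUrls and the sample sentence are made up for illustration.

```javascript
// $1 = scheme, $2 = rest of the URL (lazy), $3 = optional trailing period plus whitespace or end.
var re = /(http|https|ftp|dict)(:\/\/\S+?)(\.?\s|\.?$)/gi;

// Hypothetical helper around the replace call shown above.
function linkUrls(text) {
  return text.replace(re, "<a href='$1$2'>$1$2</a>$3");
}

var out = linkUrls("See http://example.com. Or https://example.org/a.b now");
// out === "See <a href='http://example.com'>http://example.com</a>. Or <a href='https://example.org/a.b'>https://example.org/a.b</a> now"
```

The sentence-ending period stays outside the link, while the period inside a.b is kept because it is not followed by whitespace or the end of the string.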

Building a Hashtag in Javascript without matching Anchor Names, BBCode or Escaped Characters

I would like to convert any instances of a hashtag in a String into a linked URL:
#hashtag -> should have "#hashtag" linked.
This is a #hashtag -> should have "#hashtag" linked.
This is a [url=http://www.mysite.com/#name]named anchor[/url] -> should not be linked.
This isn&#39;t a pretty way to use quotes -> should not be linked.
Here is my current code:
String.prototype.parseHashtag = function() {
return this.replace(/[^&][#]+[A-Za-z0-9-_]+(?!])/, function(t) {
var tag = t.replace("#","")
return t.link("http://www.mysite.com/tag/"+tag);
});
};
Currently, this appears to fix escaped characters (by excluding matches preceded by the ampersand) and handles named anchors, but it doesn't link the #hashtag if it's the first thing in the message, and it seems to include the 1-2 characters prior to the "#" in the link.
Halp!
How about the following:
/(^|[^&])#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g
matches the hashtags in your example. Since JavaScript didn't support lookbehind before ES2018, it tries to match either the start of the string or any character except & before the hashtag. It captures that character so it can be put back in the replacement. It also captures the name of the hashtag.
So, for example:
subject.replace(/(^|[^&])#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g, "$1http://www.mysite.com/tag/$2");
will transform
#hashtag
This is a #hashtag and this one #too.
This is a [url=http://www.mysite.com/#name]named anchor[/url]
This isn&#39;t a pretty way to use quotes
into
http://www.mysite.com/tag/hashtag
This is a http://www.mysite.com/tag/hashtag and this one http://www.mysite.com/tag/too.
This is a [url=http://www.mysite.com/#name]named anchor[/url]
This isn&#39;t a pretty way to use quotes
This probably isn't what t.link() (which I don't know) would have returned, but I hope it's a good starting point.
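To get an actual link instead of a bare URL, the same regex can be combined with an anchor template. This is only a sketch: the markup is an assumption about what the asker wants, and the escaped-quote sample uses the entity &#39; as an example of an escaped character.

```javascript
// Hypothetical rewrite of the asker's parseHashtag as a plain function.
function parseHashtag(text) {
  // $1 restores the character before "#" (or nothing at string start); $2 is the tag name.
  return text.replace(
    /(^|[^&])#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g,
    '$1<a href="http://www.mysite.com/tag/$2">#$2</a>'
  );
}

var first = parseHashtag("#hashtag");
// first === '<a href="http://www.mysite.com/tag/hashtag">#hashtag</a>'
var bbcode = parseHashtag("This is a [url=http://www.mysite.com/#name]named anchor[/url]");
var escapedQuote = parseHashtag("This isn&#39;t a pretty way to use quotes");
// bbcode and escapedQuote come back unchanged.
```

The BBCode anchor is skipped because #name is followed by ], and the entity is skipped because its # is preceded by &.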
There is an open-source Ruby gem to do this sort of thing (hashtags and @usernames) called twitter-text. You might get some ideas and regexes from that, or try out this JavaScript port.
Using the JavaScript port, you'll want to just do:
var linked = TwitterText.auto_link_hashtags(text, {hashtag_url_base: "http://www.mysite.com/tag/"});
Tim, your solution was almost perfect. Here's what I ended up using:
subject.replace(/(^| )#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g, "$1#$2");
The only change is the first conditional, changed it to match the beginning of the string or a space character. (I tried \s, but that didn't work at all.)

match image tags with regEx

I am having some trouble with this regex:
<img(.+)src="_image/([0-9]*)/(.+)/>
The global and case-insensitive flags are on.
The problem is that it also grabs Image n (see string below), but I want it only to match the image tags in the string.
<p>Image 1:<img width="199" src="_image/12/label" alt=""/> Image 2: <img width="199" src="_image/12/label" alt=""/><img width="199" src="_image/12/label" alt=""/></p>
It works if I put a newline before Image n :)
Can anyone point out for me what I am doing wrong?
Thanks in advance
bob
Use a non-greedy regexp:
<img .*? src="_image/(\d+)/(.+?)/.*?>
If I interpret your regex correctly, it looks like you're after the directory name in the first group and the file path in the second group?
<IMG.*?SRC="/_image/(\d+?)/([^"]*?)".*?/>
Don't forget to use the regex options CaseInsensitive which wraps the regex with (?i:[regex])
In the second group, you want everything up to the closing quote of the attribute. Right now you're matching any character, but you don't need to: match everything that isn't the closing " instead.
Also, don't forget to close your SRC string which you're missing, and that the SRC attribute may not be the last in the tag - for instance border, width, height etc. Also, there may be any number of spaces after the closure of the last attribute and the end of tag />
From this regex, your first match group will hold the subdirectory name and the second match group will hold everything after the / of the subdirectory - including nested subdirectories. If you've got nested subdirectories, you may need to expand this slightly:
<IMG.*?SRC="/_image/((\d+?)/)+?([^"]*?)".*?/>
In this case, each of the leading groups will hold each of the nested directory names, and the last group will hold the file name.
Have you tried lazy evaluation? That worked sometime back when I tried something similar.
Regexes are fundamentally bad at parsing HTML (see Can you provide some examples of why it is hard to parse XML and HTML with a regex? for why). What you need is an HTML parser. See Can you provide an example of parsing HTML with your favorite parser? for examples using a variety of parsers.
You're using a greedy quantifier (+) without much restriction. A greedy quantifier tells the regex engine: "Grab every character that qualifies and only back off enough to complete the regex." That means a single match can run from the first <img all the way to the last sequence of characters that looks like "_image/nnnnnn/something/>".
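The greedy behaviour is easy to see by running the original pattern and a more restricted one against the sample string from the question. The restricted pattern below is one possible variant using negated character classes, not any answerer's exact regex. For real HTML, a parser remains the safer choice.

```javascript
var html = '<p>Image 1:<img width="199" src="_image/12/label" alt=""/> Image 2: ' +
  '<img width="199" src="_image/12/label" alt=""/><img width="199" src="_image/12/label" alt=""/></p>';

// Original pattern: (.+) is greedy, so one match spans from the first <img to the last />.
var greedy = /<img(.+)src="_image\/([0-9]*)\/(.+)\/>/gi;

// Assumed alternative: [^>] and [^"] cannot cross a tag or attribute boundary.
var restricted = /<img[^>]*?src="_image\/(\d+)\/([^"]*)"[^>]*\/>/gi;

var greedyMatches = html.match(greedy);
var restrictedMatches = html.match(restricted);
// greedyMatches.length === 1, restrictedMatches.length === 3

// Capture groups per tag: [directory number, rest of the path].
var parts = Array.from(html.matchAll(restricted), function (m) { return [m[1], m[2]]; });
// parts[0] is ["12", "label"]
```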
