What I'm trying to accomplish is to auto-generate tags/keywords for a file upload, basing these keywords on the filename.
I have accomplished auto-generating titles for each upload, as shown here:
But I have now moved on to trying to auto-generate keywords. It's similar to the titles, but with more formatting. First, I run the string through this to remove commonly used words from the filename (such as this, that, there, etc.).
I am happy with it, but I need to exclude words that contain numbers. I have not found a way to remove a word entirely if it contains a number. The solutions I have found, like this one, only work for a specific match, while this one removes only the numbers. I would like to remove the entire word if it contains ANY numeric digit.
To remove all words which contain a number, use:
string = string.replace(/[a-z]*\d+[a-z]*/gi, '');
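The one-line replace removes the words but not the spaces that separated them. A minimal sketch that also collapses the leftover whitespace might look like this; `removeDigitWords` is a hypothetical helper name, and the whitespace clean-up is an addition, not part of the answer:

```javascript
// Hypothetical helper: drop every word containing a digit, then tidy spaces.
function removeDigitWords(str) {
  return str
    .replace(/[a-z]*\d+[a-z]*/gi, '') // the regex from the answer
    .replace(/\s+/g, ' ')             // collapse the gaps left behind
    .trim();                          // strip leading/trailing spaces
}

console.log(removeDigitWords('holiday 2019 photo IMG1234')); // "holiday photo"
```

Because `[a-z]` only covers ASCII letters, a word such as `file_2` is only partly removed: the `file_` part survives.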
Try this expression:
var regex = /\b[^\s]*\d[^\s]*\b/g;
Example:
var str = "normal 5digit dig555it digit5 555";
console.log( str.replace(regex,'') ); //Result-> "normal" followed by the leftover spaces
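Because this pattern uses `[^\s]*` around the digit rather than a word-character class, it treats any run of non-space characters as one word. A quick check (the second input is made up for illustration) shows both the leftover spaces and that a hyphenated token is removed whole:

```javascript
var regex = /\b[^\s]*\d[^\s]*\b/g;

// The separating spaces survive the replacement.
console.log(JSON.stringify('normal 5digit dig555it digit5 555'.replace(regex, ''))); // "normal    "

// [^\s] also matches '-', so "photo-2019" disappears as a single token.
console.log('photo-2019 beach'.replace(regex, '').trim()); // "beach"
```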
Apply a simple regular expression to your current filename strings, replacing all matches with the empty string. The regular expression matches "words" containing any digits.
JavaScript example:
'asdf 8bit jawesome234 mayhem 234'.replace(/\s*\b\w*\d\w*\b/g, '')
Evaluates to:
"asdf mayhem"
Here the regular expression is /\s*\b\w*\d\w*\b/g, which matches maximal sequences consisting of zero or more whitespace characters (\s*), followed by a word boundary (\b), followed by zero or more word characters (\w*, i.e. letters, digits, or underscore), followed by a digit (\d), followed by zero or more word characters, followed by a word boundary (\b). \b matches the empty string at a position between a word character and either a non-word character or the beginning or end of the string. The g after the final / of the regular expression means replace all occurrences, not just the first.
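Two consequences of the pattern, shown on made-up inputs: `\w` does not include `-`, so only the digit part of a hyphenated token such as `photo-2019` is removed; and since `\s*` only consumes whitespace *before* a match, a digit-word at the very start of the string leaves the following space behind.

```javascript
var re = /\s*\b\w*\d\w*\b/g;

// '-' is not a word character, so \b sits between '-' and '2019'.
console.log('photo-2019 beach'.replace(re, '')); // "photo- beach"

// \s* consumes whitespace before a match, not after it.
console.log(JSON.stringify('8bit asdf'.replace(re, ''))); // " asdf"
```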
Once the digit-words are removed, you can split the string into keywords however you want (by whitespace, for example).
"asdf mayhem".split(/\s+/);
Evaluates to:
["asdf", "mayhem"]
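Putting the two steps together, a keyword extractor could be sketched like this. `filenameToKeywords` is a hypothetical name, and the `filter(Boolean)` step is an addition: when the string starts or ends with whitespace (for example after a leading digit-word is removed), `split(/\s+/)` produces empty strings you probably don't want as keywords.

```javascript
// Hypothetical helper combining the removal and split steps.
function filenameToKeywords(name) {
  return name
    .replace(/\s*\b\w*\d\w*\b/g, '') // drop words containing a digit
    .split(/\s+/)                    // split on runs of whitespace
    .filter(Boolean);                // discard empty strings at the edges
}

console.log(filenameToKeywords('8bit asdf jawesome234 mayhem 234')); // [ 'asdf', 'mayhem' ]
```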
('Apple Cover Photo 23s423 of your 543634 moms').match(/\b([^\d]+)\b/g)
returns
Apple Cover Photo , of your , moms
http://jsfiddle.net/awBPX/2/
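The result is an array whose pieces keep their surrounding spaces. Trimming each piece (an addition, not part of the answer) gives cleaner phrases; the `|| []` guards against `match()` returning `null` when nothing matches:

```javascript
var pieces = 'Apple Cover Photo 23s423 of your 543634 moms'.match(/\b([^\d]+)\b/g);
console.log(pieces); // [ 'Apple Cover Photo ', ' of your ', ' moms' ]

// Trim each piece; match() returns null when nothing matches, hence the || [].
var phrases = (pieces || []).map(function (s) { return s.trim(); });
console.log(phrases); // [ 'Apple Cover Photo', 'of your', 'moms' ]
```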
Use this to remove the numeric characters:
string.replace(/[0-9]/g, "");
Note that this removes only the digits themselves, not the whole word that contains them. Hope this helps.
Edit: check this:
var str = 'one 2two three3 fo4ur 5 six';
var result = (str.match(/(^[\D]+\s|\s[\D]+\s|\s[\D]+$|^[\D]+$)+/g) || []).join(''); // match() returns null if nothing matches
I have written a regex that returns true or false depending on whether the text provided is a valid first/last name:
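// Wrapped in a function so the return statement is valid; the function name is illustrative.
function isValidName(name) {
  let letters = `a-zA-Z`;
  letters += `àáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšž`;
  letters += `ÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð`;
  const re = new RegExp(`^[${letters}][${letters} ,.'’-]+[${letters}.]$`, 'u');
  return re.test(name); // true or false
}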
So far, I'm able to ensure it only validates names that actually start with a letter and do not contain numerals or any special characters other than dot, hyphen, or comma. However, it still rejects names like Jo and Xi. I understand it's due to the three separate ${letters} character classes: each must match at least one character, so the pattern requires at least three characters. But those classes are there to ensure the name doesn't start with a non-letter or end in a non-letter other than a dot. How should I modify my expression to accommodate this?
Also, is there any way to shorten this expression without compromising its range? I still need it to cover extended Latin characters.
If the minimum length of the name is 2 characters, you could use a negative lookahead ^(?!.*[ ,'’]$) to assert that the string does not end with any of the characters you don't allow, and leave out the last [${letters}.].
If the minimum length is 1, you could use another negative lookahead, (?![ .,'’]), which also includes the dot so that the string cannot start with a dot, and then use a single character class that contains all the allowed characters.
^(?!.*[ ,'’]$)(?![ .,'’])[a-zA-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð ,.'’-]+$
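A sketch of the minimum-length-1 variant as a reusable check, reusing the `letters` string from the question. `isValidName` is a hypothetical name, and one change is an assumption: `-` is added to both lookaheads so that a leading or trailing hyphen is also rejected, since the question asks for names that start with a letter and end with a letter or dot.

```javascript
// Extended Latin ranges copied from the question.
let letters = `a-zA-Z`;
letters += `àáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšž`;
letters += `ÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð`;

// (?!.*[ ,'’-]$) : must not end with space, comma, apostrophe or hyphen (assumption: '-' added)
// (?![ .,'’-])   : must not start with space, dot, comma, apostrophe or hyphen (assumption: '-' added)
// [...]+         : one or more allowed characters, so "J", "Jo" and "Xi" pass
const nameRe = new RegExp(`^(?!.*[ ,'’-]$)(?![ .,'’-])[${letters} ,.'’-]+$`, 'u');

// Hypothetical helper name.
function isValidName(name) {
  return nameRe.test(name);
}

console.log(['Jo', 'Xi', "O'Brien", 'Jo3', '-Jo'].map(isValidName));
// [ true, true, true, false, false ]
```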
I've been having trouble with regex, which I don't understand at all.
I have a string '#anything#that#i#say' and want the regex to detect one word per #, so the result will be [#anything, #that, #i, #say].
It needs to work with spaces too :(
The closest I came is [#\w]+, but this only gets one word, and I want them separated.
You're close: [#\w] will match anything that is either a # or a word character. What you want instead is to match a single # followed by one or more word characters, like this: #\w+ (without the brackets).
var str = "#anything#that#i#say";
var regexp = /#\w+/gi;
console.log(str.match(regexp));
It's possible to have this deal with spaces as well, but I'd need to see an example of what you mean to tell you how; there are lots of ways that "need to work with spaces" can be interpreted, and I'd rather not guess.
Use the expression /#\s*(\w+)/g
\s* : allows zero or more spaces between # and the word
This will match the 4 words in your string '#anything#that#i#say',
even if your string contains spaces between # and a word, e.g. '#anything# that#i# say'.
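Since /#\s*(\w+)/g puts the word in a capture group, `String.prototype.matchAll()` (ES2020) can return just the words even when spaces follow the #. A sketch; `extractTags` is a hypothetical name, and re-adding the `#` only reproduces the [#anything, #that, ...] shape from the question:

```javascript
// Hypothetical helper: collect "#word" tags, tolerating spaces after '#'.
function extractTags(str) {
  // matchAll requires the g flag; m[1] is the captured word without '#' or spaces.
  return Array.from(str.matchAll(/#\s*(\w+)/g), function (m) { return '#' + m[1]; });
}

console.log(extractTags('#anything# that#i# say'));
// [ '#anything', '#that', '#i', '#say' ]
```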
sample to check: http://www.regextester.com/?fam=97638
I've been working on a web app in which users can comment and reply to comments, and it uses a tagging system. Users are tagged by their name, which can contain several words, so I've decided to mark the tags like this:
&&John Doe&&
So a comment might look like this:
&&John Doe&&, are you sure that &&Alice Johnson&& is gone?
I'm trying to write a regex to use in a JavaScript string.replace() call, so the regex must match every single tag in the string.
So far I have:
^&&.+{2, 64}&&$
This isn't working, so something must be wrong. In case it's unclear what I mean, the regex above is supposed to match strings like this:
&&anythingbetween2and64charslong&&.
Thanks in advance!
(.*?)&& means "everything up to the next &&":
var before = document.getElementById("before");
var after = document.getElementById("after");
var re = /&&(.*?)&&/g, b = "<b>$1</b>";
after.innerHTML = before.textContent.replace(re, b);
<p id="before">&&John Doe&&, are you sure that &&Alice Johnson&& is gone?</p>
<p id="after"></p>
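The same /&&(.*?)&&/g pattern also works outside the DOM. A sketch runnable in Node (the helper name `taggedNames` is hypothetical) that wraps the tags in `<b>` and also lists the tagged names through the capture group:

```javascript
var comment = '&&John Doe&&, are you sure that &&Alice Johnson&& is gone?';
var re = /&&(.*?)&&/g;

// In the replacement string, $1 is the text captured between the && pairs.
console.log(comment.replace(re, '<b>$1</b>'));
// <b>John Doe</b>, are you sure that <b>Alice Johnson</b> is gone?

// Hypothetical helper: list only the tagged names.
function taggedNames(text) {
  return Array.from(text.matchAll(/&&(.*?)&&/g), function (m) { return m[1]; });
}
console.log(taggedNames(comment)); // [ 'John Doe', 'Alice Johnson' ]
```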
Try &{2}.{2,64}&{2}
If you want to capture the text in between, add parentheses for a capture group:
&{2}(.{2,64})&{2}
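Right now you are only matching strings where the entire input matches:
the ^ character means beginning of the string (or of each line, with the m flag)
the $ character means end of the string (or of each line, with the m flag)
\A and \Z (beginning and end of the entire string) exist in other regex flavors such as PCRE, Python, and Ruby, but JavaScript does not support them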
Here's what you need:
str.match(/&&.{2,64}?&&/g)
You need to remove ^ and $ from the start and the end, since they anchor the match to the start and the end of the string.
Add the g flag at the end so that all matches are returned, not just the first.
The ? after the {} makes the quantifier non-greedy (lazy), so it matches the shortest possible string between "&&" pairs instead of the longest (giving you "&&John Doe&&" instead of "&&John Doe&&, are you sure that &&Alice Johnson&&").
Read up on greediness: Repetition with Star and Plus
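A quick comparison on the comment from the question shows the difference the `?` makes:

```javascript
var comment = '&&John Doe&&, are you sure that &&Alice Johnson&& is gone?';

// Greedy: .{2,64} grabs as much as it can, so one match spans both tags.
console.log(comment.match(/&&.{2,64}&&/g));
// [ '&&John Doe&&, are you sure that &&Alice Johnson&&' ]

// Lazy: .{2,64}? stops at the first closing &&.
console.log(comment.match(/&&.{2,64}?&&/g));
// [ '&&John Doe&&', '&&Alice Johnson&&' ]
```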
This regex matches tags made of Unicode letters and digits between && signs (the u flag is required for the \p{...} escapes, and with u set, & must not be escaped):
str.match(/&&[\p{L}\p{N}]+(?:\s+[\p{L}\p{N}]+)*&&/gu);
Here,
\p{L} --> any Unicode letter, so names can be in any language and script
\p{N} --> any Unicode numeric character (digits and other numerals)
[\p{L}\p{N}]+ --> a word made of Unicode letters or digits
\s+ --> one or more whitespace characters between words
[\p{L}\p{N}]+(?:\s+[\p{L}\p{N}]+)* --> all the words of the name
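A check with made-up non-English names, using the u flag that `\p{...}` property escapes require (without it, `\p` is read as a plain `p`). The last line shows a limitation of this pattern as written: names containing punctuation such as `-` are not matched.

```javascript
var re = /&&[\p{L}\p{N}]+(?:\s+[\p{L}\p{N}]+)*&&/gu;

console.log('&&José Núñez&& and &&李小龍&& said hi'.match(re));
// [ '&&José Núñez&&', '&&李小龍&&' ]

// '-' is neither \p{L}, \p{N} nor whitespace, so this tag is not found.
console.log('&&Anne-Marie&&'.match(re)); // null
```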
EDIT: Thank you all for your input. Everything you answered was right, but I don't think I explained it clearly enough.
I want to check the input value while the user is typing. If the user enters any character that is not in the list, that character should be rolled back.
(I am not concerned with checking the input once it has been fully entered.)
I want to validate a date input field, which should contain only the characters 0-9 (digits), - (hyphen), . (dot), and / (forward slash). The date may look like 22/02/1999, 22.02.1999, or 22-02-1999. No validation needs to be done on occurrence or position; a plain check for any character other than those listed above is enough.
[I am not good at regular expressions.]
Here is what I thought should work, but it doesn't:
var reg = new RegExp('[0-9]./-');
Here is the jsfiddle.
Your expression only tests whether, anywhere in the string, a digit is followed by any character (. is a metacharacter) and then /-. For example, 5x/- or 42%/-foobar would match.
Instead, you want to put all the characters into the character class and test whether every single character in the string is one of them:
var reg = /^[0-9.\/-]+$/
^ matches the start of the string
[...] matches if the character is contained in the group (i.e. any digit, ., / or -).
The / has to be escaped because it also denotes the end of a regex literal.
- between two characters describes a range of characters (between them, e.g. 0-9 or a-z). If - is at the beginning or end it has no special meaning though and is literally interpreted as hyphen.
+ is a quantifier and means "one or more of the preceding pattern". This allows us (together with the anchors) to test whether every character of the string is in the character class.
$ matches the end of the string
Alternatively, you can check whether there is any character that is not one of the allowed ones:
var reg = /[^0-9.\/-]/;
The ^ at the beginning of the character class negates it. Here we don't have to test every character of the string, because the presence of even one disallowed character already invalidates the string.
You can use it like so:
if (reg.test(str)) { // !reg.test(str) for the first expression
// str contains an invalid character
}
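For the "roll back while typing" requirement from the edit, one option is to strip disallowed characters on every `input` event, using the negated class from this answer with the g flag. `sanitizeDateInput` and the element id `date` are hypothetical; the browser wiring is left as a comment so the function can be tried in Node. Note that reassigning `input.value` may move the caret to the end in some browsers.

```javascript
// Hypothetical helper: keep only digits, '.', '/' and '-'.
function sanitizeDateInput(value) {
  return value.replace(/[^0-9.\/-]/g, '');
}

// Browser wiring (hypothetical element id "date"):
// var input = document.getElementById('date');
// input.addEventListener('input', function () {
//   input.value = sanitizeDateInput(input.value);
// });

console.log(sanitizeDateInput('22/02/1999'));   // "22/02/1999"
console.log(sanitizeDateInput('22a.02.19x99')); // "22.02.1999"
```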
Try this:
([0-9]{2}[/\-.]){2}[0-9]{4}
If you are not concerned about the validity of the date, you can easily use the regex:
^[0-9]{1,2}[./-][0-9]{1,2}[./-][0-9]{4}$
The character class [./-] allows any one of the characters within the square brackets, and the quantifiers allow 1- or 2-digit days and months but only 4-digit years.
You can also group the first few groups like so:
^([0-9]{1,2}[./-]){2}[0-9]{4}$
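Testing the anchored pattern on a few sample strings also shows that it does not require the two separators to be the same. If that matters, a backreference (`\1`) to the first separator is one way to enforce it; the `strict` variant below is an addition, not part of the answer.

```javascript
var loose = /^[0-9]{1,2}[./-][0-9]{1,2}[./-][0-9]{4}$/;
// Variant: capture the first separator and require the same one again via \1.
var strict = /^[0-9]{1,2}([./-])[0-9]{1,2}\1[0-9]{4}$/;

['22/02/1999', '22.02.1999', '22-02-1999', '22/02-1999', '1999/02/22'].forEach(function (s) {
  console.log(s, loose.test(s), strict.test(s));
});
// 22/02/1999 true true
// 22.02.1999 true true
// 22-02-1999 true true
// 22/02-1999 true false
// 1999/02/22 false false
```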
Updated your fiddle with the first regex.
I've been learning some Javascript regular expressions today and I'm failing to understand how the following code works.
var toswop = 'last first\nlast first\nlast first';
var swapped = toswop.replace(/([\w]+)\b([\w ]+)/g,'$2 $1');
alert(swapped);
It correctly alerts the words swapped round into the correct order. However, the following code (note the missing space after the second \w) doesn't work; it just prints them in the original order.
var toswop = 'last first\nlast first\nlast first';
var swapped = toswop.replace(/([\w]+)\b([\w]+)/g,'$2 $1');
alert(swapped);
From the MDN:
\w
Matches any alphanumeric character including the underscore. Equivalent to [A-Za-z0-9_].
For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
When you add a space inside the brackets, you change the character class from alphanumerics and underscore to alphanumerics, underscore, and space.
I think you are incorrectly using \b to match a space, but in JavaScript regular expressions \b matches the (zero-width) beginning or end of a word, not a character.
Therefore the /([\w]+)\b/ part of the regular expression matches only up to the end of the word 'last'. The remaining string is ' first' (note the space at the beginning).
To match the remainder you then need ([\w ]+), which translates to 'one or more occurrences of any word character or space character'. That is exactly what we need to match the remaining string ' first'.
Notice that even when the words are swapped, there is a space before the word 'first'.
To demonstrate this further, imagine you changed your input to:
var toswop = 'last first another\nlast first another\nlast first another';
You can see your swapped text becomes
first another last
first another last
first another last
That is because the last segment of the regular expression, ([\w ]+), kept matching both spaces and word characters and included the word 'another' in the match.
But if you remove the space from the square brackets, it won't match the remainder ' first', because that is not a string of word characters but a space followed by a string of word characters. In fact, ([\w]+)\b([\w]+) can never match at all, because \b cannot occur between two word characters, so replace() returns the string unchanged.
That is why the space is significant here.
But if you change your regex like this:
swapped = toswop.replace(/([\w]+)\s([\w]+)/g,'$2 $1');
Then it works without the space, because the \s in the middle matches the space between the two words.
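Running the three variants side by side on the input from the question makes the difference visible (`JSON.stringify` is used only so that leading spaces and newlines show up in the output):

```javascript
var toswop = 'last first\nlast first\nlast first';

// Space inside the class: the second group picks up ' first', leading space included.
console.log(JSON.stringify(toswop.replace(/([\w]+)\b([\w ]+)/g, '$2 $1')));
// " first last\n first last\n first last"

// No space: \b can never sit between two word characters, so nothing matches.
console.log(JSON.stringify(toswop.replace(/([\w]+)\b([\w]+)/g, '$2 $1')));
// "last first\nlast first\nlast first"

// \s consumes the space between the words explicitly.
console.log(JSON.stringify(toswop.replace(/([\w]+)\s([\w]+)/g, '$2 $1')));
// "first last\nfirst last\nfirst last"
```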
Hope this clarifies your question.
See here for JavaScript RegEx syntax: http://www.w3schools.com/jsref/jsref_regexp_begin.asp
See here for my fiddle if you want to experiment more: http://jsfiddle.net/BuddhiP/P5Jqm/
It goes through the expression like this. It will look for all word characters until a non-word character is found. That catches the first word.
Then it looks for the next part, which is one or more spaces or word characters. So without the space in the square brackets, the space in the name isn't matched. That is why the alternative without the space fails.
I think it's better to write this explicitly, putting the space in rather than the \b.