Matching all expressions using JS regex - javascript

I need to match all expressions (example: Laugh at Loud (LoL)) made up of 2, 3, or more words. My regex works only for expressions whose acronym is 3 characters long. How do I make the regex generic (without specifying the length as 3) so that expressions are selected whatever their length?
The link shared provides an overview of it.
The last expression
light amplification by stimulated emission of radiation (LASER)
Green Skill Development Programme (GSDP) are not selected using the below regex
\b(\w)[\w']*[^a-zA-Z()]* (\w)[\w']*[^a-zA-Z()]* (\w)[\w']*[^a-zA-Z()]* \(\1\2\3\)
\b(?:\w[\w']* [^a-zA-Z]*){3} ?\([A-Z]{3}\)
https://regex101.com/r/QPMo5M/1

You can try the following:
/\b(\w)[-'\w]* (?:[-'\w]* ){1,}\(\1[A-Z]{1,}\)/gi
UPDATE
As @ikegami commented, this sloppy regex also matches things like Bring some drinks (beer) and Bring something to put on the grill (BBQ). I think these cases can be filtered out with ordinary JavaScript code after the regex matching. For Bring some drinks (beer), we can detect it because (beer) has no uppercase letters. For Bring something to put on the grill (BBQ), we can detect it because no words in Bring something to put on the grill supply initials for the second B and the Q.
UPDATE 2
When we match the following string by using the regex above:
We need to use technologies from Natural Language Processing (NLP).
It matches "need to use technologies from Natural Language Processing (NLP)", not "Natural Language Processing (NLP)". These problems should be tackled also.
UPDATE 3
The following regex matches acronyms whose length is from 2 to 5 and it doesn't have the issues mentioned above. And I think it can be quite easily extended to support longer length as you want:
/\b(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* )?)?)?) *\(\1\2\3\4\5\)/gi
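The filtering idea from the updates above can also be done mostly in plain JavaScript, which removes the length limit altogether. The following is only a sketch under stated assumptions: the helper name findAcronyms and the SKIPPABLE word list are made up for illustration, and acronyms are assumed to be two or more uppercase letters in parentheses (so a mixed-case form such as (LoL) is not covered).

```javascript
// Hypothetical list of connecting words that may appear in an expansion
// without contributing a letter to the acronym (an assumption, extend as needed).
const SKIPPABLE = new Set(["by", "of", "the", "and"]);

// Hypothetical helper: returns every "Expansion Words (ACRONYM)" found in text.
function findAcronyms(text) {
  const results = [];
  // Step 1: locate each parenthesised run of 2+ uppercase letters.
  for (const match of text.matchAll(/\(([A-Z]{2,})\)/g)) {
    const acronym = match[1];
    // Step 2: split the text before the parenthesis into words.
    const words = text.slice(0, match.index).trim().split(/\s+/).filter(Boolean);
    const expansion = [];
    let letter = acronym.length - 1;
    let w = words.length - 1;
    // Step 3: walk backwards, pairing word initials with acronym letters.
    while (letter >= 0 && w >= 0) {
      const word = words[w];
      if (word[0].toUpperCase() === acronym[letter]) {
        expansion.unshift(word);
        letter--;
      } else if (SKIPPABLE.has(word.toLowerCase())) {
        expansion.unshift(word); // keep the connecting word, consume no letter
      } else {
        break; // initials do not line up, so this is not an expansion
      }
      w--;
    }
    // Step 4: accept only if every acronym letter found a word.
    if (letter < 0) {
      results.push(expansion.join(" ") + " (" + acronym + ")");
    }
  }
  return results;
}

console.log(findAcronyms("We need to use technologies from Natural Language Processing (NLP)."));
console.log(findAcronyms("light amplification by stimulated emission of radiation (LASER)"));
console.log(findAcronyms("Bring something to put on the grill (BBQ)"));
```

Because the walk starts at the parenthesis and stops as soon as the letters run out, the match cannot swallow unrelated leading words such as "need to use technologies from".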

\b(\w)[-'\w]* (?:[-`."?,~=#!/\\|+:;%°*#£&^€$¢¥§'\w]* ){2,}\(\1[A-Z]{2,}\)
I placed some special characters in between

Related

How do i allow only one (dash or dot or underscore) in a user form input using regular expression in javascript?

I'm trying to implement a username form validation in javascript where the username
can't start with numbers
can't have whitespaces
can't have any symbols but only One dot or One underscore or One dash
example of a valid username: the_user-one.123
example of invalid username: 1----- user
I've been trying to implement this for a while, but I couldn't figure out how to allow only one of each permitted symbol:
const usernameValidation = /(?=^[\w.-]+$)^\D/g
console.log(usernameValidation.test('1username')) //false
console.log(usernameValidation.test('username-One')) //true
How about using a negative lookahead at the start:
^(?!\d|.*?([_.-]).*\1)[\w.-]+$
This will check if the string
neither starts with digit
nor contains two [_.-] by use of capture and backreference
See this demo at regex101 (more explanation on the right side)
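A quick way to sanity-check the lookahead pattern above is to run it against the examples from the question. This is a minimal sketch; the wrapper name isValidUsername is made up for illustration, and the pattern is used without the g flag so that repeated test() calls do not carry state between them.

```javascript
// The negative lookahead rejects a leading digit and any symbol from [_.-]
// that occurs a second time later in the string (capture + backreference).
const usernamePattern = /^(?!\d|.*?([_.-]).*\1)[\w.-]+$/;

// Hypothetical wrapper around the pattern.
function isValidUsername(name) {
  return usernamePattern.test(name);
}

console.log(isValidUsername("the_user-one.123")); // true
console.log(isValidUsername("1----- user"));      // false
console.log(isValidUsername("user--name"));       // false
```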
Preface: Due to my severe carelessness, I assumed the context was usage of the HTML pattern attribute instead of JavaScript input validation. I leave this answer here for posterity in case anyone really wants to do this with regex.
Although regex can represent a pattern occurring consecutively a bounded number of times (via {<lower-bound>,<upper-bound>}), I'm not aware of any "elegant" regex feature that enforces a set of patterns each occurring a bounded number of times, in any order, with other patterns possibly in between.
Some workarounds I can think of:
Make a regex that allows for one of each permutation of ordering of special characters (note: newlines added for readability):
^(?:
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*-?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*_?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*\.?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*_?[A-Za-z0-9]*\.?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*\.?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*-?[A-Za-z0-9]*\.?)
)[A-Za-z0-9]*$
Note that the above regex can be simplified if you don't want usernames to start with special characters either.
Friendly reminder to also make sure you use the HTML attributes to enforce a minimum and maximum input character length where appropriate.
If you feel that regex isn't well suited to your use-case, know that you can do custom validation logic using javascript, which gives you much more control and can be much more readable compared to regex, but may require more lines of code to implement. Seeing the regex above, I would personally seriously consider the custom javascript route.
Note: I find https://regex101.com/ very helpful in learning, writing, and testing regex. Make sure to set the "flavour" to "JavaScript" in your case.
I have to admit that Bobble bubble's solution is the better fit. Here is a comparison of the different cases:
console.log("Comparison between mine and Bobble Bubble's solution:\n\nusername mine,BobbleBubble");
["valid-usrId1","1nvalidUsrId","An0therVal1d-One","inva-lid.userId","anot-her.one","test.-case"].forEach(u=>console.log(u.padEnd(20," "),chck(u)));
function chck(s){
return [!!s.match(/^[a-zA-Z][a-zA-Z0-9._-]*$/) && ( s.match(/[._-]/g) || []).length<2, // mine
!!s.match(/^(?!\d|.*?([_.-]).*\1)[\w.-]+$/)].join(","); // Bobble bubble
}
The differences can be seen in the last three test cases.

Remove all parentheses, commas etc from a string and then extract the first 5 words

I have a string, eg:
Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone
I want to remove all parentheses and contents within the parentheses, commas etc and then extract the first five words only so that I get the result as
Lenovo K6 Power smart phone
Is there any method to apply regex to get this result?
Here's one way of doing it:
var str = 'Lenovo K6 Power (Silver, 32 GB)(4 GB RAM)';
document.write(str.match(/\w+/g).slice(0,5).join(' '));
It gets all words into an array (match(/\w+/g)), then takes the first five (slice(0,5)) and joins them back into a string separated by spaces (join(' ')).
(And... Considering the question is tagged with regex, I believe a word could be defined as consisting of regex word characters, i.e. \w.)
Edit
The question has changed so the answer isn't correct anymore. Here's an update snippet that works with the new criteria:
var str = 'Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone';
document.write(str.split(/(?:\W*\([^)]*\))*\W+/).slice(0,5).join(' '));
This one splits the string instead, using the regex (?:\W*\([^)]*\))*\W+, which matches runs of non-word characters (\W) together with any parenthesised groups in front of them (everything inside the parentheses is matched too).
Splitting on that gives an array with only the desired words. From there the logic is the same.
var s1 = "Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone";
var s2 = s1.replace(/\([^)]*\)|, /g,'')
console.log(s2) //Output : "Lenovo K6 Power smart phone"
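The two ideas shown so far (strip the parenthesised groups, then keep the first five \w+ runs) can be combined into one small function that also runs outside the browser. This is a sketch only; the name firstWords and its count parameter are made up for illustration.

```javascript
// Hypothetical helper: drop "(...)" groups, then keep the first `count` words.
function firstWords(text, count = 5) {
  // Replace each parenthesised group with a space so neighbours don't fuse.
  const withoutParens = text.replace(/\([^)]*\)/g, " ");
  // \w+ skips commas and other punctuation; match() returns null if no words.
  const words = withoutParens.match(/\w+/g) || [];
  return words.slice(0, count).join(" ");
}

console.log(firstWords("Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone"));
// Lenovo K6 Power smart phone
```

Removing the groups before looking for words keeps the result correct even when the string ends with a closing parenthesis.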
var myString = "Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone";
while (/\(.*\)/.test(myString)) {
myString = myString.replace(/\(.*?\)/.exec(myString)[0],'');
}
console.log(myString.match(/\w+/g));
The first snippet removes parenthesis pairs for as long as any remain, then matches all the remaining words.
Output: Obj... ["Lenovo", "K6", "Power", "smart", "phone"]
This is a general solution, to always only get the first 5 Elements change the console log to
var obj = myString.match(/\w+/g);
for (var i = 0; i < 5; i ++)
{
console.log(obj[i]);
}
Your question is extremely trivial and can be answered with the most basic JavaScript skills. I strongly suggest you go back and review your tutorials and intros, and try solving your problem yourself.
To remove something, you simply do
string.replace(WHAT, '')
In other words, you replace something with nothing (the empty string ''). In this case, a simple Google search for something like "javascript remove regexp" will give you plenty of pointers; one of the first results is actually about removing parentheses.
In your case, I guess you finally decided you want to remove parentheses and what's inside them. In about the first five minutes of learning regexp, you should have learned to write
/\(.*?\)/g
 ^^          an actual left paren
   ^^^       any number of characters
      ^^     an actual right paren
         ^   match this over and over again
If you need help with this, try an online regexp tester such as regex101.com. It will also give you a readable version of your regexp.
The only thing moderately advanced about this is the .*?, where the ? means "non-greedy"--in other words, take characters up only to the next right paren.
I'm sure you already learned why you have to write \( to match a left parenthesis, right? The \ escapes the parentheses, because by itself the parentheses would have a special meaning to regexp. You know the g flag too, right? That means replace all the matches.
To find the first five tokens, you first need to split your string into tokens. I'm sure you recall from your studies the basic Array methods, including--drum roll--split! Split your string with string.split(' '). That will split on single spaces. If you want to split on any whitespace, you could try string.split(/\s+/).
Now go back and read the documentation for split real carefully, although I know you already have. Look carefully at the second argument, called limit. It does exactly what you want. It splits into segments, but no more than specified by limit.
The solution to your problem, which you could easily have come up with if you had spent about five minutes studying the documentation and experimenting, is
input.replace(/\(.*?\)/g, '').split(/\s+/, 5)
Unfortunately, your approach of posting on Stack Overflow is not going to scale well at all. You can't post here every time there is some minor problem you cannot figure out yourself. To be perfectly frank, if you cannot learn how to learn, then you better give up on being a programmer and try brick-laying instead. You need to learn how to figure out things yourself. Before anything else, you need to learn how to read (and digest) the documentation. Very early in your career, you're also going to need to learn how to debug your programs, since Stack Overflow is no better a way to get your programs debugged than it is to get them written for you in the first place. If you simply cannot bring yourself to read documents or learn by yourself, and can only work by asking other people how to do every little thing, then find a chat room or forum where there are people with nothing better to do than answer such questions. That is not what Stack Overflow is.
console.log('Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone'.replace(/\(.*?\)/g, '').split(/\s+/, 5));
Fixing this code so as to remove the comma is left as an exercise for you to use your new-found learning powers on. Hint: you may want to use the regexp feature called "alternation", which is represented by the vertical bar or pipe |. You may also find yourself needing to use character classes, which is another thing you should have learned about very early in your regexp studies.
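One possible way to finish that exercise, shown as a sketch rather than the only answer: add the comma as a second alternative and trim before splitting. The function name firstFiveTokens is hypothetical.

```javascript
// Hypothetical helper: remove "(...)" groups and commas, then split with a limit.
function firstFiveTokens(input) {
  return input
    .replace(/\(.*?\)|,/g, "") // alternation: a parenthesised group OR a comma
    .trim()                    // avoid an empty first token on leading whitespace
    .split(/\s+/, 5);          // second argument caps the number of tokens
}

console.log(firstFiveTokens("Lenovo K6 Power (Silver, 32 GB)(4 GB RAM), smart phone"));
// [ 'Lenovo', 'K6', 'Power', 'smart', 'phone' ]
```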
None of this has anything to do with TypeScript, or Angular, as you seem to have thought when you initially posted the question. It's a little concerning that you seem to think that doing basic regexp or string or array manipulation would somehow be a TypeScript or Angular issue. TypeScript is merely a typing layer on top of JavaScript. Angular is a framework for building web apps. Neither replaces JavaScript, or provides any new basic language capability. In fact, to use either effectively, you must know JavaScript well.

JS RegEx challenge: replacing recurring patterns of commas and numbers

I am fiddling with a program to convert Japanese addresses into romaji (latin alphabet) for use in an emergency broadcast system for foreigners living in a Japanese city.
Emergency evacuation warnings are sent out to lists of areas all at once. I would like to be able to copy/paste this Japanese list of areas and spit out the romanized equivalent.
example Japanese input:
3条4~12丁目、15~18条12丁目、2、3条5丁目
(this list is of three areas, where 条(jo) and 丁目(chome) indicate block numbers in north-south and east-west directions, respectively)
The numbers are fine as they are, and I have already written code to replace the characters 条 and 丁目 with their romanized equivalents. My program currently outputs the first two areas (correctly) as "3-jo 4~12-chome" and "15~18-jo 12-chome"
However, I would like to replace patterns like the one in the last area, "2、3条5丁目" (meaning blocks 2 and 3 of 5-chome), such that the output reads "2&3-jo 5-chome"
The regular expression that denotes this pattern is \d*、\d* (note the Japanese format comma)
I am still getting used to regex - how can I replace the comma found in all \d*、\d* patterns with an "&"? Note that I can't simply replace all commas because they are also used to separate areas.
The easiest way is to isolate sequences like 15、18 and replace all commas in them.
text = "3条4~12丁目、15~18条12丁目、2、3条5丁目";
text.
replace(/(?:\d+、)+\d+/g, function(match) {
return match.replace(/、/g, "&");
}).
replace(/条/g, '-jō ').
replace(/丁目/g, '-chōme').
replace(/~/g, '-').
replace(/、/g, ', ')
// => "3-jō 4-12-chōme, 15-18-jō 12-chōme, 2&3-jō 5-chōme"
(Also... Where the heck do you live that has 丁 well-ordered by cardinal directions? Where I live, addresses are a mess... :P )
(Also also, thanks to sainaen for nitpicking my regexps into perfection :) )
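The isolate-then-replace idea above can also be packaged so that each area comes back as a separate string. This is a sketch under assumptions: the function name romanizeAreas is made up, it uses the plain "jo" and "chome" spellings from the question rather than the macron forms, and it assumes every area ends in 丁目 so that a digit never sits directly on both sides of an area-separating comma.

```javascript
// Hypothetical helper: returns one romanized string per area.
function romanizeAreas(text) {
  return text
    // Commas inside a digit run (e.g. 2、3) join block numbers, so turn them into "&".
    .replace(/\d+(?:、\d+)+/g, (run) => run.replace(/、/g, "&"))
    // Every comma that is left separates two areas.
    .split("、")
    .map((area) => area.replace(/条/g, "-jo ").replace(/丁目/g, "-chome"));
}

console.log(romanizeAreas("3条4~12丁目、15~18条12丁目、2、3条5丁目"));
```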

Phone number validation - excluding non repeating separators

I have the following regex for phone number validation
function validatePhonenumber(phoneNum) {
var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
return regex.test(phoneNum);
}
However, I would like to make sure it doesn't pass when the separators differ, as in
111-222.3333
Any ideas how to make sure the separators are the same always?
Just make sure beforehand that there is at most one kind of separator, then pass the string through the regex as you were doing.
function validatePhonenumber(phoneNum) {
var separators = extractSeparators(phoneNum);
if(separators.length > 1) return false;
// same regex as in the question (the last group is four digits)
var regex = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/;
return regex.test(phoneNum);
}
function extractSeparators(str){
// Return an array with all the distinct chars
// that are present in the passed string
// and are not numeric (0-9)
return Array.from(new Set(str.replace(/[0-9]/g, "")));
}
You can use the following regex instead:
\d{3}([-\s\.])?\d{3}\1?\d{4}
Here is a working example:
http://regex101.com/r/nN9nT7/1
It matches the inputs marked ok below and rejects the rest:
111-222-3333 --> ok
111.222.3333 --> ok
111 222 3333 --> ok
111-222.3333
111.222-3333
111-222 3333
111 222-3333
EDIT: after Alan Moore's suggestion:
Also matches 111-2223333. That's because you made the \1 optional,
which isn't necessary. One of JavaScript's stranger quirks is that a
backreference to a group that did not participate in the match,
succeeds anyway. So if there's no first separator, ([-\s.])? succeeds
because the ? made it optional, and \1 succeeds because it's
JavaScript. But I would have used ([-\s.]?) to capture the first
separator (which might be nothing), and \1 to match the same thing
again. This works in any flavor, including JavaScript.
We can improve the regex to:
^\d{3}([-\s\.]?)\d{3}\1\d{4}$
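To see the effect of capturing the (possibly empty) separator and requiring the same thing again with \1, the improved pattern can be run against the cases discussed above. A minimal sketch; the function name hasConsistentSeparators is hypothetical.

```javascript
// ([-\s.]?) captures the first separator, which may be the empty string;
// \1 then demands exactly the same text between the second and third groups.
const phonePattern = /^\d{3}([-\s.]?)\d{3}\1\d{4}$/;

// Hypothetical wrapper around the pattern.
function hasConsistentSeparators(phoneNum) {
  return phonePattern.test(phoneNum);
}

console.log(hasConsistentSeparators("111-222-3333")); // true
console.log(hasConsistentSeparators("1112223333"));   // true
console.log(hasConsistentSeparators("111-222.3333")); // false
console.log(hasConsistentSeparators("111-2223333"));  // false
```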
You'll need at least two passes to keep this maintainable and extensible.
JS' RegEx doesn't allow for creating variables for use later in the RegEx, if you want to support older browsers.
If you are only supporting modern browsers, Fede's answer is just fine...
As such, with ghetto-support, you aren't going to be able to reliably check that one separator is the same value every time, without writing a really, really, really, stupidly-long RegEx, using | to basically write out the RegEx 3 times.
A better way might be to grab all of the separators, and use a reduction or a filter to check that they all have the same value.
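var numRegEx = /^[1-9]{3}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/; // the format regex from the question
var userEnteredNumber = "999.231 3055";
var validNumber = numRegEx.test(userEnteredNumber);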
var separators = userEnteredNumber.replace(/\d+/g, "").split("");
var firstSeparator = separators[0];
var uniformSeparators = separators.every(function (separator) { return separator === firstSeparator; });
if (!uniformSeparators) { /* also not valid */ }
You could make that a little neater, using closures and some applied functions, but that's the idea.
Alternatively, here's the big, ugly RegEx that would allow you to test exactly what the user entered.
var separatorTest = /^(?:([0-9]{3}\.[0-9]{3}\.[0-9]{3,4})|([0-9]{3}-[0-9]{3}-[0-9]{3,4})|([0-9]{3} [0-9]{3} [0-9]{3,4})|([0-9]{9,10}))$/;
Notice I had to include the exact same number-test three times, wrap each one in parens (to be treated as a single group), and then separate each group with an | to check each group, like an if, else if, else... ...and then plug in a separate special case for having no separator at all...
...not pretty.
I'm also not using \d. In JavaScript it is equivalent to [0-9], but spelling the class out makes it obvious exactly which characters are accepted when trying to maintain one of these abominations.
Now, a word or two of warning:
1. People are liable to enter all kinds of crap; if this is for a commercial site, it's likely better to just strip separators entirely and validate the number is the right size, and conforms to some specifics (eg: doesn't start with /^555555/).
2. If not given any instruction about number format, people will happily use either no separator or a formal number, like (555) 555-5555 (or +1 (555) 555-5555 for the really pedantic), which is obviously going to fail hard, in this system (see point #1).
3. Be prepared to trim what you get, before validating.
4. Depending on your country/region/etc laws about data-security and consumer-vs-transaction record-keeping (again, may or may not be more important in a commercial setting), it's likely better to store both a "user-given" ugly number, and a system-usable number, which you either clean on the back-end, or submit along with the user-entered text.
5. From a user-interaction perspective, either forcing the number to conform, explicitly (placeholders showing them xxx-xxx-xxxx right above the input, in bold), or accepting any text, and prepping it yourself, is going to be 1000x better than accepting certain forms, but not bothering to tell the user up-front, and instead telling them what they did was wrong, after they try. It's not cool for relationships; it's equally not cool, here.
6. You've got 9-digit and 10-digit numbers, so if you're trying for an international solution, be prepared to deal with all international separators (, \.\-\(\)\+) etc... again, why stripping is more useful, because THAT RegEx would be insane.
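The "strip separators and validate the size" advice above could be sketched as follows. Everything here is an assumption made for illustration: the function name normalizePhoneNumber, the 10-digit target length, and the rule that an 11-digit number starting with 1 carries a leading country code.

```javascript
// Hypothetical helper: keeps what the user typed and a cleaned, digits-only form.
function normalizePhoneNumber(raw) {
  // Drop everything that is not a digit: spaces, dots, dashes, parens, plus signs.
  const digits = raw.trim().replace(/\D/g, "");
  // Assumption: 11 digits with a leading 1 means "+1" was included.
  const national =
    digits.length === 11 && digits.startsWith("1") ? digits.slice(1) : digits;
  if (national.length !== 10) return null; // wrong size, reject
  return { entered: raw, digits: national };
}

console.log(normalizePhoneNumber("(555) 555-5555"));
console.log(normalizePhoneNumber("+1 (555) 555-5555"));
console.log(normalizePhoneNumber("12345")); // null
```

Returning both fields follows the suggestion to store the user-given text alongside the system-usable number.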

Regex for parsing some medical data

I have been looking for a few hours how to do this particular regular expression magic with little to no luck.
I have been playing around with parsing some of my own medical data (why not?) which unfortunately comes in the form of a very unstructured text document with no tags (XML or HTML).
Specifically, as a prototype, I only want to match what my LDL delta (cholesterol change) is as a percentage.
In the form it shows up in a few different ways:
LDL change since last visit: 10%
or
LDL change since last visit:
10%
or
LDL change since last visit:
10%
I have been trying to do this in JavaScript using the native RegExp engine for a few hours (more than I want to admit) with little success. I am by no means a RegExp expert but I have been looking at an expression like such:
(?<=LDL change since last visit)*(0*(100\.00|[0-9]?[0-9]\.[0-9]{0,2})%)
Which I know does not work in JS because of the lack of support for ?<=. I tested these in Ruby, but even then they were not successful. Could anybody walk me through some ways of doing this?
EDIT:
Since this particular metric shows up a few times in different areas, I would like the regex to match them all and have them be accessible in multiple groups. Say matching group 0 corresponds to the Lipid Profile section and matching group 1 corresponds to the Summary.
Lipid profile
...
LDL change since last visit:
10%
...
Summary of Important Metrics
...
LDL change since last visit: 10%
...
A lookbehind solution is complicated because most languages only support fixed or finite length lookbehind assertions. Therefore it's easier to use a capturing group instead. (Also, the * quantifier after the lookbehind that you used makes no sense).
And since you don't really need to validate the number (right?), I would simply do
regexp = /LDL change since last visit:\s*([\d.]+)%/
match = regexp.match(subject)
if match
match = match[1]
else
match = nil
end
If you expect multiple matches per string, use .scan():
subject.scan(/LDL change since last visit:\s*([\d.]+)%/)
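The snippets above are Ruby; since the question asks about JavaScript, a rough equivalent of the capturing-group approach might look like this. It is a sketch only: the function name findLdlChanges is hypothetical, and it relies on String.prototype.matchAll, which needs a reasonably recent JavaScript engine.

```javascript
// Hypothetical helper: returns every LDL percentage found, in document order.
function findLdlChanges(text) {
  // \s* absorbs spaces and line breaks between the label and the number;
  // the capturing group holds the digits, and the g flag is required by matchAll.
  const pattern = /LDL change since last visit:\s*([\d.]+)%/g;
  return Array.from(text.matchAll(pattern), (m) => Number(m[1]));
}

const report = [
  "Lipid profile",
  "LDL change since last visit:",
  "   10%",
  "Summary of Important Metrics",
  "LDL change since last visit: 10%",
].join("\n");

console.log(findLdlChanges(report)); // [ 10, 10 ]
```

Index 0 of the returned array then corresponds to the first section in which the metric appears, index 1 to the second, and so on.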
