Match a string if it comes after certain string - javascript

I need a regular expression for JavaScript to match John (case insensitive) after Name:
I know how to do it, but I don't know how to get string from a different line like so (from a textarea):
Name
John
This is what I tried: var str = /\s[a-zA-Z0-9](?= Name)/;
The logic: get a string with letter/numbers on a linespace followed by Name.
Then, I would use the .test(); method.
EDIT:
I tried to make the question simpler than it should have been. What I don't quite understand is how to isolate "John" (really, anything) on a new line that follows a specific string (in this case Name).
E.g., IF John comes after Name {dosomething} else{dosomethingelse}

Unfortunately, JavaScript didn't support look-behinds when this answer was written (they arrived later, in ES2018). For something this simple, you can just match both parts of the string like this:
var str = /Name\s+([a-zA-Z0-9]+)/;
You then just have to extract the first capture group if you want to get John. For example:
"Name\n John".match(/Name\s+([a-zA-Z0-9]+)/)[1]; // John
However if you're just using .test, the capture group isn't necessary. For example:
var input = "Name\n John";
if (/Name\s+[a-zA-Z0-9]+/.test(input)) {
// dosomething
} else{
// dosomethingelse
}
Also, if you need to ensure that Name and John appear on separate lines with nothing but whitespace in between, you can use this pattern with the multi-line (m) flag.
var str = /Name\s*^\s*([a-zA-Z0-9]+)/m;
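Where an ES2018-capable engine is available, a lookbehind can express "John after Name" directly, so no capture group is needed. This is only a sketch under that assumption; the variable names are illustrative:

```javascript
// Assumes an engine with ES2018 lookbehind support.
var input = "Name\n John";

// (?<=Name\s+) requires "Name" plus whitespace right before the match,
// without making that text part of the match itself.
var found = input.match(/(?<=Name\s+)[a-z0-9]+/i);

// match returns null when nothing matches, so guard before indexing.
var firstName = found ? found[0] : null;
console.log(firstName); // John
```

On older engines the capture-group version above remains the portable choice.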

You do not need a lookahead here, simply place Name before the characters you want to match. And to enable case-insensitive matching, place the i modifier on the end of your regular expression.
var str = 'Name\n John'
var re = /Name\s+[a-z0-9]+/i
if (re.test(str)) {
// do something
} else {
// do something else
}
Use the String.match method if you want to extract the name from the string.
'Name\n John'.match(/Name\s+([a-z0-9]+)/i)[1];
The [1] here refers back to what was matched/captured in capturing group #1
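One caveat with indexing [1] directly: match returns null when nothing matches, so the expression throws on input that has no name after Name. A small guarded sketch (extractName is a made-up helper name, not part of any API):

```javascript
// Hypothetical helper: returns the word that follows "Name", or null.
function extractName(text) {
  var result = text.match(/Name\s+([a-z0-9]+)/i);
  // result is null when the pattern does not match at all.
  return result ? result[1] : null;
}

console.log(extractName("Name\n John")); // John
console.log(extractName("no label here")); // null
```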

Related

Extracting a complicated part of the string with plain Javascript

I have the following string:
https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx
Using JavaScript, I want to extract 'pl' or 'pl_company_com' from this string.
There are a few variables:
jan_kowalski is a first name and surname; it can change, and sometimes it even has 3 elements
the country code (in this example 'pl') will change to others such as en / de / fr (this is the part of the string I want to get)
the rest of the string remains the same in every case (the beginning, plus everything from _company_com onwards)
P.S. I tried to do it with split, but my knowledge of JS is very basic and I can't get what I want. Please help.
An alternative to Randy Casburn's solution using regex
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_(.*_company_com)')[1];
console.log(out);
Or if you want to just get that string with those country codes you specified
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_((en|de|fr|pl)_company_com)')[1];
console.log(out);
A proof of concept that this solution also works for other combinations
let urls = [
new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx'),
new URL('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')
]
urls.forEach(url => console.log(url.href.match('.*_(en|de|fr|pl).*')[1]))
I have been very successful before with this kind of problem using regular expressions:
var string = 'https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx';
var regExp = /([\w]{2})_company_com/;
var find = string.match(regExp);
console.log(find); // array with found matches
console.log(find[1]); // first group of regexp = country code
First you have your given string. Second you have a regular expression, which is delimited by a slash at the beginning and at the end. Regular expressions are mostly used for string searches (you can even replace complicated text with them in all major editors, which can be VERY useful).
In this case it matches exactly two word characters, [\w]{2}, followed directly by _company_com (\w indicates a word character, the [] groups all wanted character types, here only word characters, and the {} indicates the number of characters to be found). To find the wanted part, call string.match(regExp). It returns an array with the whole matched string followed by all capture groups within the regExp (which are denoted by ()), or null if nothing matches. So in this case you get the country code with find[1], which is the first and only capture group of the regular expression.
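To try the idea on more than one input, here is a sketch that wraps it in a function. getCountryCode is a hypothetical name, the third URL is invented for illustration, and [a-z]{2} with a leading underscore is a slightly stricter variation on [\w]{2}, not the answer's exact pattern:

```javascript
// Hypothetical helper: returns the two-letter code before "_company_com".
function getCountryCode(url) {
  var found = url.match(/_([a-z]{2})_company_com/);
  return found ? found[1] : null; // null when the URL has no such part
}

var sampleUrls = [
  "https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx",
  "https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx",
  // Invented example to show a different country code.
  "https://my.domain.com/personal/anna_schmidt_de_company_com/Documents/Forms/All.aspx"
];

sampleUrls.forEach(function (url) {
  console.log(getCountryCode(url));
});
```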

Javascript: get username substring(s) beginning with # symbol

I'm looking for a good way to get one or more usernames preceded by a # symbol in a string.
Some example strings could be:
"#username this is just a test string"
"Blabla bla #username this is just a test string"
"Message for #username1, #username2 and #username_three: this is just a test string! - #username4"
Any solutions? I've not been able to find anything.
One occurrence
To match a single occurrence use a regular expression without any modifier and then get rid of the # using .substr(1):
const myString = "Hey #username this is a test string";
const username = myString.match(/#\w+/)[0].substr(1);
// This will be "username"
NOTE: the \w in the regular expression is equivalent to [A-Za-z0-9_].
Multiple occurrences
For more than one occurrence use a regular expression with the g modifier to match more than one string, then use the .map() method to remove the # characters at the beginning of each username using .substr(1) to throw away the first character:
const myString = "Hey #username1 and #Username_Two, this is just a test string! #username3";
const usernames = myString.match(/#\w+/g).map(x => x.substr(1));
// This will be ["username1", "Username_Two", "username3"]
NOTE: I am using the regular expression /#\w+/g since usernames can contain only letters, numbers and underscores (at least on the most common sites, like Twitter etc).
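Note that match returns null when the string contains no usernames, so calling .map on the result would throw. One possible variation, assuming a runtime that provides String.prototype.matchAll (ES2020); extractUsernames is an illustrative name:

```javascript
// Assumes String.prototype.matchAll is available (ES2020).
function extractUsernames(text) {
  // The capture group holds the name without the leading "#",
  // so no substr call is needed afterwards.
  return Array.from(text.matchAll(/#(\w+)/g), function (m) {
    return m[1];
  });
}

var message = "Message for #username1, #username2 and #username_three: this is just a test string! - #username4";
console.log(extractUsernames(message));
console.log(extractUsernames("nothing to see here")); // empty array, no exception
```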
Try regular expression:
var m = "#username This is just a test string".match(/#(.+?)\b/);
console.log(m); // ["#username", "username"]
The other answers work fine (mostly). I'm adding this as an alternative for matching multiple instances of usernames:
var str = "random #foo string #bar test #baz";
var usernames = str.split(/#(\w+)/).filter(function(_, i) { return i % 2; });
// [ "foo", "bar", "baz" ]
This works because if you place a capture group inside the pattern when you call .split, it will include that matched group in the result array. Then you just have to take every other array element.
Note also that .filter was added in ECMAScript 5, so it may not be supported in older browsers. If this is a concern, either use the polyfill technique described in the MDN article, or a simple for loop.
Just do:
var user = "#username";
var user2 = user.split("#")[1];

javascript regexp match tag names

I can't remember the name of it, but I believe you can reference already matched strings within a RegExp object. What I want to do is match all tags within a given string eg
<ul><li>something in the list</li></ul>
the RegExp should be able to match only the same tags, then I will use a recursive function to put all the individual matches in an array. The regex that should work if I can reference the first match would be.
var reg = /(?:<(.*)>(.*)<(?:FIRST_MATCH)\/>)/g;
The matched array should then contain
match[0] = "<ul><li>something in the list</li></ul>";
match[1] = "ul";
match[2] = ""; // no text to match
match[3] = "li";
match[4] = "something in the list";
thanks for any help
It seems like you mean backreference (\1, \2):
var s = '<ul><li>something in the list</li></ul>';
s.match(/<([^>]+)><([^>]+)>(.*?)<\/\2><\/\1>/)
// => ["<ul><li>something in the list</li></ul>",
// "ul",
// "li",
// "something in the list"]
The result is not exactly the same as what you want, but the point is that the backreferences \1 and \2 match the same string that was matched by the earlier groups.
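A smaller sketch of the same backreference idea, limited to a single pair of attribute-free tags (the variable names are illustrative). It shows that \1 only accepts a closing tag with the same name that group 1 captured:

```javascript
// \1 must repeat exactly what the first group (the tag name) captured.
var pairPattern = /<(\w+)>(.*?)<\/\1>/;

var matched = "<li>something in the list</li>".match(pairPattern);
console.log(matched[1]); // li
console.log(matched[2]); // something in the list

// The closing tag differs from the opening tag, so nothing matches.
var mismatched = "<li>something in the list</ul>".match(pairPattern);
console.log(mismatched); // null
```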
It is not possible to reliably parse HTML using regular expressions (if you're interested in the specifics, it is because arbitrarily nested markup requires a stronger type of automaton than the finite state automaton that a regular expression can express - look up finite state automata vs pushdown automata for more info).
You might be able to get away with some hack for a specific problem, but if you want to reliably parse HTML using Javascript then there are other ways to do this. Search the web for: parse html javascript and you'll get plenty of pointers on how to do this.
I made a dirty workaround. It still needs work, though.
var str = '<div><ul id="list"><li class="something">this is the text</li></ul></div>';
function parseHTMLFromString(str){
var structure = [];
var matches = [];
var reg = /(<(.+)(?:\s([^>]+))*>)(.*)<\/\2>/;
str.replace(reg, function(){
//console.log(arguments);
matches.push(arguments[4]);
structure.push(arguments[1], arguments[4]);
});
while(matches.length){
matches.shift().replace(reg, function(){
console.log(arguments);
structure.pop();
structure.push(arguments[1], arguments[4]);
matches.push(arguments[4]);
});
}
return structure;
}
// parseHTMLFromString(str); // ["<div>", "<ul id="list">", "<li class="something">", "this is the text"]

Using regex to match part of a word or words

I'm new to regex and having difficulty with some basic stuff.
var name = "robert johnson";
var searchTerm = "robert johnson";
if (searchTerm.match(name)) {
console.log("MATCH");
}
I'd like to try and find something that matches any of the following:
rob, robert, john, johnson, robertjohnson
To make the regex simpler, I've already added a .toLowerCase() to both the "name" and the "searchTerm" vars.
What regex needs to be added to searchTerm.match(name) to make this work?
Clarification: I'm not just trying to get a test to pass with the 5 examples I gave, I'm trying to come up with some regex where any of those tests will pass. So, for example:
searchTerm.match(name)
...needs to change to something like:
searchTerm.match("someRegexVoodooHere"+name+"someMoreRegexVoodooHere")
So, if I edit
var searchTerm = "robert johnson";
...to be
var searchTerm = "rob";
...the same function searchTerm.match directive would work.
Again, I'm new to regex so I hope I'm asking this clearly. Basically I need to write a function that takes any searchTerm (it's not included here, but elsewhere I'm requiring that at least 3 characters be entered) and can check to see if those 3 letters are found, in sequence, in a given string of "firstname lastname".
"robert johnson".match(/\b(john(son)?|rob(ert(johnson)?)?)\b/)
This returns the first match together with its capture groups (add the g flag to get every match, since there is more than one). If you only need to find out whether the input string contains any of the words:
/\b(john(son)?|rob(ert(johnson)?)?)\b/.test("robert johnson")
This returns true if the string has any match. It is better to use this inside a condition, because you don't need the matches themselves.
\b - means word boundary.
() - capturing group.
? - quantifier "one or none".
| - logical "or".
You could create an array of the test terms and loop over that array. This method means less complicated regex to build in a dynamic environment
var name = "robert johnson";
var searchTerm = "robert johnson";
var tests = ['rob', 'robert', 'john', 'johnson', 'robertjohnson'];
var isMatch = false;
for (var i = 0; i < tests.length; i++) {
if (new RegExp(tests[i]).test(searchTerm)) { // test() is a RegExp method, not a String method
isMatch = true;
}
}
alert(isMatch)
Regular expressions look for patterns to make a match. The answer to your question somewhat depends on what you are hoping to accomplish - That is, do you actually want matched groups or just to test for the existence of a pattern to execute other code.
To match the values in your string, you would need to use boolean OR matching with a | - using the i flag will cause a case insensitive match so you don't need to call toLowerCase() -
var regex = /(rob|robert|john|johnson|robertjohnson)/i;
name.match(regex); // match() is a String method
If you want a more complex regex to match on all of these variations -
var names = "rob, robert, john, johnson, robertjohnson, paul";
var regex = /\b((rob(ert)?)?\s?(john(son)?)?)\b/i;
var matches = names.match(regex);
Without the g flag, match returns only the first match ("rob") together with its capture groups, not one element per name. Note also that every part of this pattern is optional, so it can match an empty string at a word boundary, and that it matches additional combinations such as "rob johnson" and "rob john", which may not be desired.
You can also just test if your string contains any of those terms using test() -
var name = "rob johnson";
var regex = /\b((rob(ert)?)?\s?(john(son)?)?)\b/i;
if (regex.test(name)){
alert('matches!');
}
/^\s*(rob(ert)?|john(son)?|robert *johnson)\s*$/i.test(str)
will return true if str matches any of:
rob
robert
john
johnson
robertjohnson
robert johnson
robert  johnson (the number of spaces between the two does not matter)
and it doesn't care whether there is leading or trailing whitespace. If you don't want that, delete \s* from the beginning and the end of the pattern. Also, the space and asterisk between the name and surname allow 0 or more spaces between the two. If you don't want it to allow any space, just get rid of that space and asterisk.
The caret ^ indicates beginning of string and the dollar sign $ indicates end of string. Finally, the i flag at the end makes it search case insensitively.
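Since the question asks for something that works for any search term rather than a hand-written list, here is one possible sketch. It assumes that spaces should be ignored on both sides and that a term shorter than 3 characters never matches; matchesName and escapeRegExp are hypothetical helper names:

```javascript
// Escape characters that have a special meaning in a regular expression,
// so user input is always treated as literal text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if searchTerm occurs, in sequence, in fullName.
function matchesName(fullName, searchTerm) {
  var compactName = fullName.toLowerCase().replace(/\s+/g, "");
  var compactTerm = searchTerm.toLowerCase().replace(/\s+/g, "");
  if (compactTerm.length < 3) {
    return false; // assumption: at least 3 characters are required
  }
  return new RegExp(escapeRegExp(compactTerm)).test(compactName);
}

["rob", "robert", "john", "johnson", "robertjohnson", "paul"].forEach(function (term) {
  console.log(term, matchesName("robert johnson", term));
});
```

Because spaces are dropped before comparing, a term such as "ertjoh" that spans both names would also match; whether that is acceptable depends on the use case.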

How to detect a series of characters in a string?

For example, I have a string:
"This is the ### example"
I would like to substring the ### out of the above string?
The number of Hash keys may vary, so I would like to find out and replace the ### pattern with, say, 001 for example.
Can anybody help?
You can also do a replace. I am familiar with the C# version of this,
string stringValue = "This is the ### example";
stringValue = stringValue.Replace("###", ""); // strings are immutable, so keep the result
This would remove ### completely from the above string. Again you would have to know the exact string.
In JavaScript, it's similar - .replace (with a lowercase r) is used. So:
var stringValue = "This is the ### example";
var replacedValue = stringValue.replace('###', '');
You'll want to investigate either "Regular Expressions" for this, or, if you know the precise position and length of the characters you are interested in, you can simply use String's .substring method.
If you want to capture multiple # characters, then you'll need regular expressions:
var myString = "This is #### the example";
var result = myString.replace(/#+/g, '');
If you want to remove the space too, you can use the regex /#+\s|\s#+|#+/.
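For the original goal of turning ### into something like 001 when the number of hashes varies, a replacement function can use the length of each matched run. A minimal sketch, assuming String.prototype.padStart is available (ES2017); fillHashes is an illustrative name:

```javascript
// Hypothetical helper: replaces each run of "#" with `number`,
// zero-padded to the same width as the run.
function fillHashes(text, number) {
  return text.replace(/#+/g, function (run) {
    // run is the matched text, e.g. "###"; its length sets the width.
    return String(number).padStart(run.length, "0");
  });
}

console.log(fillHashes("This is the ### example", 1)); // This is the 001 example
console.log(fillHashes("This is #### the example", 42)); // This is 0042 the example
```

If the number has more digits than there are hashes, padStart leaves it unpadded rather than truncating it.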
If the rest of the string is known, just get the part that you need:
var example = str.substr(12, str.length - 20);
The javascript match method will return an array of substrings matching a regular expression. You can use this to determine the number of matching characters to be replaced. Assuming you want to replace each octothorpe with a random digit, you could use code like this:
var exampleStr = "This is the ### example";
var swapThese = exampleStr.match(/#/g);
if (swapThese) {
for (var i=0;i<swapThese.length;i++) {
var swapThis = new RegExp(swapThese[i]);
exampleStr = exampleStr.replace(swapThis,Math.floor(Math.random()*10)); // random digit 0-9
}
}
alert(exampleStr); // or whatever you want to do with it
Note that the code only loops the length of the array if it's present: if (swapThese) {
This check is necessary because if the match method finds no matches, it returns null rather than an empty array. Trying to read the length of a null value will throw an error.
