regular expression to extract two items from a long string - javascript

There are some strings having the following type of format,
{abc=1234457, cde=3, label=3352-4e9a-9022-1067ca63} <chve> abc? 123.456.789, http=appl.com
I would like to extract 1234457 and 3352-4e9a-9022-1067ca63, which correspond to abc and label respectively.
This is the javascript I have been trying to use, but it does not work. I think the regular expression part is wrong.
var headerPattern = new RegExp("\{abc=([\d]*),,label=(.*)(.*)");
if (headerPattern.test(row)) {
abc = headerPattern.exec(row)[0];
label = headerPattern.exec(row)[1];
}

Try: abc=(\d*).*?label=([^}]*)
Explanation
abc= literal match
(\d*) catch some numbers
.*? Lazy match
label= literal match
([^}]*) catch all the things that aren't the closing brace

Here is what I came up with:
\{abc=(\d+).*label=(.+)\}.*
Your have two problems in \{abc=([\d]*),,label=(.*)(.*):
Using abc=([\d]*),,, you are looking for abc=([\d]*) followed by the literal ,,. You should use .* instead. Since .* is nongreedy be default, it will not match past the label.
By using label=(.*)(.*), the first .* captures all the remaining text. You want to only catch text until the edge of the braces, so use (.*)}.*.
Disclaimer: Made with a Java-based regex tester. If anything in JavaScript regexes would invalidate this, feel free to comment.

You can do it the following way:
var row = '{abc=1234457, cde=3, label=3352-4e9a-9022-1067ca63} <chve> abc? 123.456.789, http=appl.com';
var headerPatternResult = /{abc=([0-9]+),.*?label=([a-z0-9\-]+)}/.exec(row);
if (headerPatternResult !== null) {
var abc = headerPatternResult[1];
var label = headerPatternResult[2];
console.log('abc: ' + abc);
console.log('label: ' + label);
}

Related

Regexp to capture comma separated values

I have a string that can be a comma separated list of \w, such as:
abc123
abc123,def456,ghi789
I am trying to find a JavaScript regexp that will return ['abc123'] (first case) or ['abc123', 'def456', 'ghi789'] (without the comma).
I tried:
^(\w+,?)+$ -- Nope, as only the last repeating pattern will be matched, 789
^(?:(\w+),?)+$ -- Same story. I am using non-capturing bracket. However, the capturing just doesn't seem to happen for the repeated word
Is what I am trying to do even possible with regexp? I tried pretty much every combination of grouping, using capturing and non-capturing brackets, and still not managed to get this happening...
If you want to discard the whole input when there is something wrong, the simplest way is to validate, then split:
if (/^\w+(,\w+)*$/.test(input)) {
var values = input.split(',');
// Process the values here
}
If you want to allow empty value, change \w+ to \w*.
Trying to match and validate at the same time with single regex requires emulation of \G feature, which assert the position of the last match. Why is \G required? Since it prevents the engine from retrying the match at the next position and bypass your validation. Remember than ECMA Script regex doesn't have look-behind, so you can't differentiate between the position of an invalid character and the character(s) after it:
something,=bad,orisit,cor&rupt
^^ ^^
When you can't differentiate between the 2 positions, you can't rely on the engine to do a match-all operation alone. While it is possible to use a while loop with RegExp.exec and assert the position of last match yourself, why would you do so when there is a cleaner option?
If you want to savage whatever available, torazaburo's answer is a viable option.
Live demo
Try this regex :
'/([^,]+)/'
Alternatively, strings in javascript have a split method that can split a string based on a delimeter:
s.split(',')
Split on the comma first, then filter out results that do not match:
str.split(',').filter(function(s) { return /^\w+$/.test(s); })
This regex pattern separates numerical value in new line which contains special character such as .,,,# and so on.
var val = [1234,1213.1212, 1.3, 1.4]
var re = /[0-9]*[0-9]/gi;
var str = "abc123,def456, asda12, 1a2ass, yy8,ghi789";
var re = /[a-z]{3}\d{3}/g;
var list = str.match(re);
document.write("<BR> list.length: " + list.length);
for(var i=0; i < list.length; i++) {
document.write("<BR>list(" + i + "): " + list[i]);
}
This will get only "abc123" code style in the list and nothing else.
May be you can use split function
var st = "abc123,def456,ghi789";
var res = st.split(',');

Need a regex that finds "string" but not "[string]"

I'm trying to build a regular expression that parses a string and skips things in brackets.
Something like
string = "A bc defg hi [hi] jkl mnop.";
The .match() should return "hi" but not [hi]. I've spent 5 hours running through RE's but I'm throwing in the towel.
Also this is for javascript or jquery if that matters.
Any help is appreciated. Also I'm working on getting my questions formatted correctly : )
EDIT:
Ok I just had a eureka moment and figured out that the original RegExp I was using actually did work. But when I was replaces the matches with the [matches] it simply replaced the first match in the string... over and over. I thought this was my regex refusing to skip the brackets but after much time of trying almost all of the solutions below, I realized that I was derping Hardcore.
When .replace was working its magic it was on the first match, so I quite simply added a space to the end of the result word as follows:
var result = string.match(regex);
var modifiedResult = '[' + result[0].toString() + ']';
string.replace(result[0].toString() + ' ', modifiedResult + ' ');
This got it to stop targeting the original word in the string and stop adding a new set of brackets to it with every match. Thank you all for your help. I am going to give answer credit to the post that prodded me in the right direction.
preprocess the target string by removing everything between brackets before trying to match your RE
string = "A bc defg hi [hi] jkl mnop."
tmpstring = string.replace(/\[.*\]/, "")
then apply your RE to tmpstring
correction: made the match for brackets eager per nhahtd comment below, and also, made the RE global
string = "A bc defg hi [hi] jkl mnop."
tmpstring = string.replace(/\[.*?\]/g, "")
You don't necessarily need regex for this. Simply use string manipulation:
var arr = string.split("[");
var final = arr[0] + arr[1].split("]")[1];
If there are multiple bracketed expressions, use a loop:
while (string.indexOf("[") != -1){
var arr = string.split("[");
string = arr[0] + arr.slice(1).join("[").split("]").slice(1).join("]");
}
Using only Regular Expressions, you can use:
hi(?!])
as an example.
Look here about negative lookahead: http://www.regular-expressions.info/lookaround.html
Unfortunately, javascript does not support negative lookbehind.
I used http://regexpal.com/ to test, abcd[hi]jkhilmnop as test data, hi(?!]) as the regex to find. It matched 'hi' without matching '[hi]'. Basically it matched the 'hi' so long as there was not a following ']' character.
This of course, can be expanded if needed. This has a benefit of not requiring any pre-processing for the string.
r"\[(.*)\]"
Just play arounds with this if you wanto to use regular expressions.
What do yo uwant to do with it? If you want to selectively replace parts like "hi" except when it's "[hi]", then I often use a system where I match what I want to avoid first and then what I want to watch; if it matches what I want to avoid then I return the match, otherwise I return the processed match.
Like this:
return string.replace(/(\[\w+\])|(\w+)/g, function(all, m1, m2) {return m1 || m2.toUpperCase()});
which, with the given string, returns:
"A BC DEFG HI [hi] JKL MNOP."
Thus: it replaces every word with uppercase (m1 is empty), except if the word is between square brackets (m1 is not empty).
This builds an array of all the strings contained in [ ]:
var regex = /\[([^\]]*)\]/;
var string = "A bc defg hi [hi] [jkl] mnop.";
var results=[], result;
while(result = regex.exec(string))
results.push(result[1]);
edit
To answer to the question, this regex returns the string less all is in [ ], and trim whitespaces:
"A bc defg [hi] mnop [jkl].".replace(/(\s{0,1})\[[^\]]*\](\s{0,1})/g,'$1')
Instead of skipping the match you can probably try something different - match everything but do not capture the string within square brackets (inclusive) with something like this:
var r = /(?:\[.*?[^\[\]]\])|(.)/g;
var result;
var str = [];
while((result = r.exec(s)) !== null){
if(result[1] !== undefined){ //true if [string] matched but not captured
str.push(result[1]);
}
}
console.log(str.join(''));
The last line will print parts of the string which do not match the [string] pattern. For example, when called with the input "A [bc] [defg] hi [hi] j[kl]u m[no]p." the code prints "A hi ju mp." with whitespaces intact.
You can try different things with this code e.g. replacing etc.

Javascript Regular expression to remove unwanted <br>,

I have a JS stirng like this
<div id="grouplogo_nav"><br> <ul><br> <li><a class="group_hlfppt" target="_blank" href="http://www.hlfppt.org/">&nbsp;</a></li><br> </ul><br> </div>
I need to remove all <br> and $nbsp; that are only between > and <. I tried to write a regular expression, but didn't got it right. Does anybody have a solution.
EDIT :
Please note i want to remove only the tags b/w > and <
Avoid using regex on html!
Try creating a temporary div from the string, and using the DOM to remove any br tags from it. This is much more robust than parsing html with regex, which can be harmful to your health:
var tempDiv = document.createElement('div');
tempDiv.innerHTML = mystringwithBRin;
var nodes = tempDiv.childNodes;
for(var nodeId=nodes.length-1; nodeId >= 0; --nodeId) {
if(nodes[nodeId].tagName === 'br') {
tempDiv.removeChild(nodes[nodeId]);
}
}
var newStr = tempDiv.innerHTML;
Note that we iterate in reverse over the child nodes so that the node IDs remain valid after removing a given child node.
http://jsfiddle.net/fxfrt/
myString = myString.replace(/^( |<br>)+/, '');
... where /.../ denotes a regular expression, ^ denotes start of string, ($nbsp;|<br>) denotes " or <br>", and + denotes "one or more occurrence of the previous expression". And then simply replace that full match with an empty string.
s.replace(/(>)(?: |<br>)+(\s?<)/g,'$1$2');
Don't use this in production. See the answer from Phil H.
Edit: I try to explain it a bit and hope my english is good enough.
Basically we have two different kinds of parentheses here. The first pair and third pair () are normal parentheses. They are used to remember the characters that are matched by the enclosed pattern and group the characters together. For the second pair, we don't need to remember the characters for later use, so we disable the "remember" functionality by using the form (?:) and only group the characters to make the + work as expected. The + quantifier means "one or more occurrences", so or <br> must be there one or more times. The last part (\s?<) matches a whitespace character (\s), which can be missing or occur one time (?), followed by the characters <. $1 and $2 are kind of variables that are replaces by the remembered characters of the first and third parentheses.
MDN provides a nice table, which explains all the special characters.
You need to replace globally. Also don't forget that you can have the being closed . Try this:
myString = myString.replace(/( |<br>|<br \/>)/g, '');
This worked for me, please note for the multi lines
myString = myString.replace(/( |<br>|<br \/>)/gm, '');
myString = myString.replace(/^( |<br>)+/, '');
hope this helps

Javascript regular expression to replace word but not within curly brackets

I have some content, for example:
If you have a question, ask for help on StackOverflow
I have a list of synonyms:
a={one typical|only one|one single|one sole|merely one|just one|one unitary|one small|this solitary|this slight}
ask={question|inquire of|seek information from|put a question to|demand|request|expect|inquire|query|interrogate}
I'm using JavaScript to:
Split synonyms based on =
Looping through every synonym, if found in content replace with {...|...}
The output should look like:
If you have {one typical|only one|one single|one sole|merely one|just one|one unitary|one small|this solitary|this slight} question, {question|inquire of|seek information from|put a question to|demand|request|expect|inquire|query|interrogate} for help on StackOverflow
Problem:
Instead of replacing the entire word, it's replacing every character found. My code:
for(syn in allSyn) {
var rtnSyn = allSyn[syn].split("=");
var word = rtnSyn[0];
var synonym = (rtnSyn[1]).trim();
if(word && synonym){
var match = new RegExp(word, "ig");
postProcessContent = preProcessContent.replace(match, synonym);
preProcessContent = postProcessContent;
}
}
It should replace content word with synonym which should not be in {...|...}.
When you build the regexps, you need to include word boundary anchors at both the beginning and the end to match whole words (beginning and ending with characters from [a-zA-Z0-9_]) only:
var match = new RegExp("\\b" + word + "\\b", "ig");
Depending on the specific replacements you are making, you might want to apply your method to individual words (rather than to the entire text at once) matched using a regexp like /\w+/g to avoid replacing words that themselves are the replacements for others. Something like:
content = content.replace(/\w+/g, function(word) {
for(var i = 0, L = allSyn.length; i < L; ++i) {
var rtnSyn = allSyn[syn].split("=");
var synonym = (rtnSyn[1]).trim();
if(synonym && rtnSyn[0].toLowerCase() == word.toLowerCase()) return synonym;
}
});
Regular expressions include something called a "word-boundary", represented by \b. It is a zero-width assertion (it just checks something, it doesn't "eat" input) that says in order to match, certain word boundary conditions have to apply. One example is a space followed by a letter; given the string ' X', this regex would match it: / \bX/. So to make your code work, you just have to add word boundaries to the beginning and end of your word regex, like this:
for(syn in allSyn) {
var rtnSyn = allSyn[syn].split("=");
var word = rtnSyn[0];
var synonym = (rtnSyn[1]).trim();
if(word && synonym){
var match = new RegExp("\\b"+word+"\\b", "ig");
postProcessContent = preProcessContent.replace(match, synonym);
preProcessContent = postProcessContent;
}
}
[Note that there are two backslashes in each of the word boundary matchers because in javascript strings, the backslash is for escape characters -- two backslashes turns into a literal backslash.]
For optimization, don't create a new RegExp on each iteration. Instead, build up a big regex like [^{A-Za-z](a|ask|...)[^}A-Za-z] and an hash with a value for each key specifying what to replace it with. I'm not familiar enough with JavaScript to create the code on the fly.
Note the separator regex which says the match cannot begin with { or end with }. This is not terribly precise, but hopefully acceptable in practice. If you genuinely need to replace words next to { or } then this can certainly be refined, but I'm hoping we won't have to.

Regex to get string between curly braces

Unfortunately, despite having tried to learn regex at least one time a year for as many years as I can remember, I always forget as I use them so infrequently. This year my new year's resolution is to not try and learn regex again - So this year to save me from tears I'll give it to Stack Overflow. (Last Christmas remix).
I want to pass in a string in this format {getThis}, and be returned the string getThis. Could anyone be of assistance in helping to stick to my new year's resolution?
Related questions on Stack Overflow:
How can one turn regular quotes (i.e. ', ") into LaTeX/TeX quotes (i.e. `', ``'')
Regex: To pull out a sub-string between two tags in a string
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Try
/{(.*?)}/
That means, match any character between { and }, but don't be greedy - match the shortest string which ends with } (the ? stops * being greedy). The parentheses let you extract the matched portion.
Another way would be
/{([^}]*)}/
This matches any character except a } char (another way of not being greedy)
/\{([^}]+)\}/
/ - delimiter
\{ - opening literal brace escaped because it is a special character used for quantifiers eg {2,3}
( - start capturing
[^}] - character class consisting of
^ - not
} - a closing brace (no escaping necessary because special characters in a character class are different)
+ - one or more of the character class
) - end capturing
\} - the closing literal brace
/ - delimiter
If your string will always be of that format, a regex is overkill:
>>> var g='{getThis}';
>>> g.substring(1,g.length-1)
"getThis"
substring(1 means to start one character in (just past the first {) and ,g.length-1) means to take characters until (but not including) the character at the string length minus one. This works because the position is zero-based, i.e. g.length-1 is the last position.
For readers other than the original poster: If it has to be a regex, use /{([^}]*)}/ if you want to allow empty strings, or /{([^}]+)}/ if you want to only match when there is at least one character between the curly braces. Breakdown:
/: start the regex pattern
{: a literal curly brace
(: start capturing
[: start defining a class of characters to capture
^}: "anything other than }"
]: OK, that's our whole class definition
*: any number of characters matching that class we just defined
): done capturing
}: a literal curly brace must immediately follow what we captured
/: end the regex pattern
Try this:
/[^{\}]+(?=})/g
For example
Welcome to RegExr v2.1 by #{gskinner.com}, #{ssd.sd} hosted by Media Temple!
will return gskinner.com, ssd.sd.
Try this
let path = "/{id}/{name}/{age}";
const paramsPattern = /[^{}]+(?=})/g;
let extractParams = path.match(paramsPattern);
console.log("extractParams", extractParams) // prints all the names between {} = ["id", "name", "age"]
Here's a simple solution using javascript replace
var st = '{getThis}';
st = st.replace(/\{|\}/gi,''); // "getThis"
As the accepted answer above points out the original problem is easily solved with substring, but using replace can solve the more complicated use cases
If you have a string like "randomstring999[fieldname]"
You use a slightly different pattern to get fieldname
var nameAttr = "randomstring999[fieldname]";
var justName = nameAttr.replace(/.*\[|\]/gi,''); // "fieldname"
This one works in Textmate and it matches everything in a CSS file between the curly brackets.
\{(\s*?.*?)*?\}
selector {.
.
matches here
including white space.
.
.}
If you want to further be able to return the content, then wrap it all in one more set of parentheses like so:
\{((\s*?.*?)*?)\}
and you can access the contents via $1.
This also works for functions, but I haven't tested it with nested curly brackets.
You want to use regex lookahead and lookbehind. This will give you only what is inside the curly braces:
(?<=\{)(.*?)(?=\})
i have looked into the other answers, and a vital logic seems to be missing from them . ie, select everything between two CONSECUTIVE brackets,but NOT the brackets
so, here is my answer
\{([^{}]+)\}
Regex for getting arrays of string with curly braces enclosed occurs in string, rather than just finding first occurrence.
/\{([^}]+)\}/gm
var re = /{(.*)}/;
var m = "{helloworld}".match(re);
if (m != null)
console.log(m[0].replace(re, '$1'));
The simpler .replace(/.*{(.*)}.*/, '$1') unfortunately returns the entire string if the regex does not match. The above code snippet can more easily detect a match.
Try this one, according to http://www.regextester.com it works for js normaly.
([^{]*?)(?=\})
This one matches everything even if it finds multiple closing curly braces in the middle:
\{([\s\S]*)\}
Example:
{
"foo": {
"bar": 1,
"baz": 1,
}
}
You can use this regex recursion to match everythin between, even another {} (like a JSON text) :
\{([^()]|())*\}
Even this helps me while trying to solve someone's problem,
Split the contents inside curly braces ({}) having a pattern like,
{'day': 1, 'count': 100}.
For example:
#include <iostream>
#include <regex>
#include<string>
using namespace std;
int main()
{
//string to be searched
string s = "{'day': 1, 'count': 100}, {'day': 2, 'count': 100}";
// regex expression for pattern to be searched
regex e ("\\{[a-z':, 0-9]+\\}");
regex_token_iterator<string::iterator> rend;
regex_token_iterator<string::iterator> a ( s.begin(), s.end(), e );
while (a!=rend) cout << " [" << *a++ << "]";
cout << endl;
return 0;
}
Output:
[{'day': 1, 'count': 100}] [{'day': 2, 'count': 100}]
Your can use String.slice() method.
let str = "{something}";
str = str.slice(1,-1) // something

Categories

Resources