Extract content of code which starts with a curly bracket and ends with a curly bracket followed by a closing parenthesis - javascript

I'm completely lost with regular expressions right now (lack of practice).
I'm writing a Node script that goes through a bunch of JS files. Each file calls a function, and one of its arguments is a JSON-like object. The aim is to collect all those arguments and place them in one file. The problem I'm facing at the moment is extracting the argument from the code; here is the function call part of that string:
$translateProvider.translations('de', {
WASTE_MANAGEMENT: 'Abfallmanagement',
WASTE_TYPE_LIST: 'Abfallarten',
WASTE_ENTRY_LIST: 'Abfalleinträge',
WASTE_TYPE: 'Abfallart',
TREATMENT_TYPE: 'Behandlungsart',
TREATMENT_TYPE_STATUS: 'Status Behandlungsart',
DUPLICATED_TREATMENT_TYPE: 'Doppelte Behandlungsart',
TREATMENT_TYPE_LIST: 'Behandlungsarten',
TREATMENT_TARGET_LIST: 'Ziele Behandlungsarten',
TREATMENT_TARGET_ADD: 'Ziel Behandlungsart hinzufügen',
SITE_TARGET: 'Gebäudeziel',
WASTE_TREATMENT_TYPES: 'Abfallbehandlungsarten',
WASTE_TREATMENT_TARGETS: '{{Abfallbehandlungsziele}}',
WASTE_TREATMENT_TYPES_LIST: '{{Abfallbehandlungsarten}}',
WASTE_TYPE_ADD: 'Abfallart hinzufügen',
UNIT_ADD: 'Einheit hinzufügen'
})
So I'm trying to write a regular expression which matches the segment of the JS code that starts with "'de', {" and ends with "})", and which can have any characters in between (single/double curly brackets included).
I tried something like this \'de'\s*,\s*{([^}]*)})\ , but that doesn't work. The furthest I got was with this \'de'\s*,\s*{([^})]*)}\ , but this ends at the first closing curly bracket within the JSON, which is not what I want.
It seems that I have now completely forgotten even the regular expression concepts I understood before.
Any help is much appreciated.

You did not state the desired output. Here is a solution that parses the text, and creates an array of arrays. You can easily transform that to a desired output.
const input = `$translateProvider.translations('de', {
WASTE_MANAGEMENT: 'Abfallmanagement',
WASTE_TYPE_LIST: 'Abfallarten',
WASTE_ENTRY_LIST: 'Abfalleinträge',
WASTE_TYPE: 'Abfallart',
TREATMENT_TYPE: 'Behandlungsart',
TREATMENT_TYPE_STATUS: 'Status Behandlungsart',
DUPLICATED_TREATMENT_TYPE: 'Doppelte Behandlungsart',
TREATMENT_TYPE_LIST: 'Behandlungsarten',
TREATMENT_TARGET_LIST: 'Ziele Behandlungsarten',
TREATMENT_TARGET_ADD: 'Ziel Behandlungsart hinzufügen',
SITE_TARGET: 'Gebäudeziel',
WASTE_TREATMENT_TYPES: 'Abfallbehandlungsarten',
WASTE_TREATMENT_TARGETS: '{{Abfallbehandlungsziele}}',
WASTE_TREATMENT_TYPES_LIST: '{{Abfallbehandlungsarten}}',
WASTE_TYPE_ADD: 'Abfallart hinzufügen',
UNIT_ADD: 'Einheit hinzufügen'
})`;
const regex1 = /\.translations\([^{]*\{\s+(.*?)\s*\}\)/s;
const regex2 = /',[\r\n]+\s*/;
const regex3 = /: +'/;
let result = [];
let m = input.match(regex1);
if(m) {
// Drop the closing quote of the last value, which neither split removes.
result = m[1].replace(/'$/, '').split(regex2).map(line => line.split(regex3));
}
console.log(result);
Explanation of regex1:
\.translations\( -- literal .translations(
[^{]* -- anything not {
\{\s+ -- { and all whitespace
(.*?) -- capture group 1 with non-greedy scan up to:
\s*\}\) -- whitespace, followed by })
s flag to make . match newlines
Explanation of regex2:
',[\r\n]+\s* -- ',, followed by newlines and space (to split lines)
Explanation of regex3:
: +' -- literal : ' (to split key/value)
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
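Since the question asks for the arguments to end up in one file, here is a minimal sketch of turning an array of [key, value] pairs into JSON text. The sample pairs are shortened from the question, and the `de` wrapper key is only an assumption about the desired output shape, not something the question specifies.

```javascript
// Sample pairs in the shape of an array of [key, value] arrays (shortened here).
const pairs = [
  ['WASTE_MANAGEMENT', 'Abfallmanagement'],
  ['WASTE_TREATMENT_TARGETS', '{{Abfallbehandlungsziele}}'],
  ['UNIT_ADD', 'Einheit hinzufügen'],
];

// Object.fromEntries builds a plain object from [key, value] pairs.
const translations = Object.fromEntries(pairs);

// Wrapping under the language code is an assumed layout for the combined file.
const json = JSON.stringify({ de: translations }, null, 2);
console.log(json);
```

The resulting string could then be written out with Node's fs module once all files have been processed.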

This can be done with lookahead, lookbehind, and boundary-type assertions:
/(?<=^\$translateProvider\.translations\('de', {)[\s\S]*(?=}\)$)/
(?<=^\$translateProvider\.translations\('de', {) is a lookbehind assertion that checks for '$translateProvider.translations('de', {' at the beginning of the string.
(?=}\)$) is a lookahead assertion that checks for '})' at the end of the string.
[\s\S]* is a character class that matches any sequence of space and non-space characters between the two assertions.
Here is the regex101 link for you to test
Hope this helps.
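If the regex approaches above turn out to be too fragile (for example when the call is not the only content of the file), an alternative is to count braces by hand. The following is only a sketch: `extractBraceBlock` and its `marker` parameter are hypothetical names, and it assumes the source contains no comments or regex literals with stray braces or quotes.

```javascript
// Hypothetical helper: returns the text of the first {...} block that follows
// `marker`, counting braces and skipping over quoted strings.
function extractBraceBlock(source, marker) {
  const from = source.indexOf(marker);
  if (from === -1) return null;
  const start = source.indexOf('{', from + marker.length);
  if (start === -1) return null;

  let depth = 0;
  let quote = null; // quote character of the string we are currently inside
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (quote) {
      if (ch === '\\') i++;                 // skip the escaped character
      else if (ch === quote) quote = null;  // string ends
    } else if (ch === "'" || ch === '"' || ch === '`') {
      quote = ch;                           // string starts
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) return source.slice(start, i + 1);
    }
  }
  return null; // braces never balanced
}

const sample = "$translateProvider.translations('de', {\n  A: '{{x}}',\n  B: 'y})'\n});";
console.log(extractBraceBlock(sample, "translations('de'"));
```

Because braces inside quoted values are ignored, values such as '{{Abfallbehandlungsziele}}' or a value containing '})' do not end the block early.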

Related

Or Condition in a regular expression

I have a string in which I need to get the value between either "[ValueToBeFetched]" or "{ValueToBeFetched}".
var test = "I am \"{now}\" doing \"[well]\"";
test.match(/"\[.*?]\"/g)
the above regex serves the purpose and gets the value between square brackets and I can use the same for curly brackets also.
test.match(/"\{.*?}\"/g)
Is there a way to keep only one regex and do this, something like an or {|[ operator in regex.
I tried some scenarios but they don't seem to work.
Thanks in advance.
You could try following regex:
(?:{|\[).*?(?:}|\])
Details:
(?:{|\[): Non-capturing group, gets character { or [
.*?: gets as few as possible
(?:}|\]): Non-capturing group, gets character } or ]
Demo
Code in JavaScript:
var test = "I am \"{now}\" doing \"[well]\"";
var result = test.match(/"(?:{|\[).*?(?:}|\])"/g);
console.log(result);
Result:
["{now}", "[well]"]
As you said, there is an or operator which is |:
[Edited as suggested] Let's catch all sentences that begin with an "a" or a "b":
/^((a|b).*)/gm
In this example, if the parsed line begins with a or b, the entire sentence is captured in the first result group.
You may test your regex with an online regex tester
For your special case, try something like this, and use the online regex tester I mentioned before to understand how it works:
((\[|\{)\w*(\]|\}))
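The patterns above accept mismatched pairs such as "{now]". As a sketch of one way to avoid that, the alternation can be placed around whole bracket pairs, with one capture group per pair; the variable names are illustrative only.

```javascript
const test = 'I am "{now}" doing "[well]"';

// Each alternative pairs an opening bracket with its own closing bracket.
const pattern = /"(?:\{([^}]*)\}|\[([^\]]*)\])"/g;

// Exactly one of the two groups is defined for every match.
const values = Array.from(test.matchAll(pattern), m =>
  m[1] !== undefined ? m[1] : m[2]
);

console.log(values);
```

This returns the values without the surrounding quotes and brackets, and a mixed pair like "{oops]" is not matched.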

Convert escaped unicode sequence to Emoji in JS

I have a string in JS as follows. I am having a hard time converting these surrogate pairs to emoji's. Can someone help?
I have tried to get a solution online by searching almost everything that I could, but in vain.
var text = 'CONGRATS! Your task has been completed! Tell us how we did \\uD83D\\uDE4C \\uD83D\\uDC4D \\uD83D\\uDC4E'
This is a node.js code. Is there any easy way to convert these codes to emojis without using an external helper utility?
EDIT:
I updated my code and the regex as follows:
var text = 'CONGRATS! Your task has been completed! Tell us how we did {2722} {1F44D} {1F44E}';
text.replace(/\{[^}]*\}/ig, (_, g) => String.fromCodePoint(`0x${g}`))
What am I doing wrong?
One option could be to replace all Unicode escape sequences with their HEX representations and use String.fromCharCode() to replace it with its associated character:
const text = 'CONGRATS! Your task has been completed! Tell us how we did \\uD83D\\uDE4C \\uD83D\\uDC4D \\uD83D\\uDC4E';
const res = text.replace(/\\u([0-9A-F]{4})/ig, (_, g) => String.fromCharCode(`0x${g}`));
console.log(res);
As for your edit, your issue is with your regular expression. You can change it to be /\{([^}]*)\}/g, which means:
\{ - match an open curly brace.
([^}]*) - match and group the contents after the open curly brace which is not a closed curly brace }.
\} - match a closed curly brace.
g - match the expression globally (so all occurrences of the expression, not just the first)
The entire regular expression will match {CONTENTS}, whereas the group will contain only the contents between the two curly braces, so CONTENTS. The match is the first argument provided to the .replace() callback function whereas the group (g) is provided as the second argument and is what we use:
const text = 'CONGRATS! Your task has been completed! Tell us how we did {2722} {1F44D} {1F44E}';
const res = text.replace(/\{([^}]*)\}/g, (_, g) => String.fromCodePoint(`0x${g}`));
console.log(res);
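One caveat with [^}]* is that it also captures non-hex content, and String.fromCodePoint throws a RangeError for values that are not valid code points. A slightly more defensive sketch, with the hypothetical name `expandCodePoints`, restricts the group to hex digits and leaves anything else untouched:

```javascript
// Hypothetical helper: replaces {HEX} placeholders with the matching character.
function expandCodePoints(text) {
  return text.replace(/\{([0-9a-f]{1,6})\}/gi, (match, hex) => {
    const codePoint = parseInt(hex, 16);
    // Code points above U+10FFFF are invalid, so keep the original text.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(expandCodePoints('Tell us how we did {1F44D} {1F44E} {not hex}'));
```

Using parseInt with an explicit radix also avoids relying on the implicit conversion of the `0x...` string to a number.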

Regular Expression confused by use of double and single quotes

I have this JavaScript (running in Chrome 48.0.2564.103 m):
var s1 = 'label1="abc" label2=\'def\' ';
var s2 = 'label1="abc" label2=\'def\' label3="ghi"';
var re = /\b(\w+)\b=(['"]).*?def.*?\2/;
re.exec(s1); // --> ["label2='def'", "label2", "'"]
re.exec(s2); // --> ["label1="abc" label2='def' label3="", "label1", """]
The first exec() matches label2, as I intended. However, the second gets confused by the double quote after 'label3=' and matches label1 instead.
I had expected the use of .*? to tell the regular expression to make the match as tightly as possible, but clearly it doesn't always. Is there a way to tighten up my regular expression?
Just exclude what was seen as a quote
/\b(\w+)\b=(['"])(?:.(?!\2))*def(?:.(?!\2))*.?\2/
So the change was replacing your .*? with (?:.(?!\2))*.
Break down:
(?!) is negative look ahead, non-capturing
(?:) is non-capturing group.
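The group (?:.(?!\2))* stops one character short of the closing quote, because that last character is followed by the quote. When the value does not end in def, the extra .? is needed to consume that character.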
This allows you to combine other rules when you want to allow a='\'' or a="\"" or further a="\\\"":
/\b(\w+)\b=(['"])(?:\\\\|\\\2|.(?!\2))*def(?:\\\\|\\\2|.(?!\2))*.?\2/
The reason s2 gives a different result is that you add a " on the right side of the "def" after label2, which allows the pattern to correctly match everything between the first and last double quote in the string.
I can only guess that the reason the lazy quantifier (?) doesn't have any effect is that at that point the regex engine has already decided to match " rather than '. Regex does its thing left-to-right after all.
The "simplest" way of solving this is to match only non-quotes, rather than using ., between the quotes:
var re = /\b(\w+)\b=(['"])[^'"]*def[^'"]*\2/;
re.exec(s1); // --> ["label2='def'", "label2", "'"]
re.exec(s2); // --> ["label2='def'", "label2", "'"]
The problem with this is that now you can't put any kind of quotes in the value, even if they are perfectly legal:
// This won't match because of the " after def
var s2 = 'label1="abc" label2=\'def"\' label3="ghi"'
// This won't match because there's an escaped single quote in the value
var s2 = 'label1="abc" label2=\'def\\\'\' label3="ghi"'
But basically, regex isn't made for parsing HTML, so if these limitations are a problem you should look into proper parsing.
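A middle ground between the two approaches is to first split the text into label/value pairs and only then test each value. This is a sketch under the assumption that quotes inside a value are escaped with a backslash; `findLabels` is a hypothetical name.

```javascript
// Hypothetical helper: returns the labels whose quoted value contains `needle`.
function findLabels(text, needle) {
  // Group 2 is the opening quote; the value may contain escaped characters
  // or any character other than that same quote.
  const pair = /\b(\w+)=(['"])((?:\\.|(?!\2).)*)\2/g;
  const labels = [];
  for (const [, label, , value] of text.matchAll(pair)) {
    if (value.includes(needle)) labels.push(label);
  }
  return labels;
}

const sample = 'label1="abc" label2=\'def\' label3="ghi"';
console.log(findLabels(sample, 'def'));
```

Because every pair is matched with its own closing quote before the value is inspected, a double quote later in the string cannot pull the match back to label1.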

Restrict action of toLowerCase to part of a string?

I want to convert most of a string to lower case, except for those characters inside of brackets. After converting everything outside the brackets to lower case, I then want to remove the brackets. So giving {H}ell{o} World as input should give Hello world as output. Removing the brackets is simple, but is there a way to selectively make everything outside the brackets lower case with regular expressions? If there's no simple regex solution, what's the easiest way to do this in javascript?
You can try this:
var str='{H}ell{o} World';
str = str.replace(/{([^}]*)}|[^{]+/g, function (m,p1) {
return (p1)? p1 : m.toLowerCase();} );
console.log(str);
The pattern match:
{([^}]*)} # all that is between curly brackets
# and put the content in the capture group 1
| # OR
[^{]+ # anything until the regex engine meet a {
# since the character class is all characters but {
the callback function has two arguments:
m the complete match
p1 the first capturing group
it returns p1 if p1 is not empty
else the whole match m in lowercase.
Details:
"{H}" p1 contains H (first part of the alternation)
p1 is returned as is. Note that since the curly brackets are
not captured, they are not in the result. -->"H"
"ell" (second part of the alternation) p1 is empty, the full match
is returned in lowercase -->"ell"
"{o}" (first part) -->"o"
" World" (second part) -->" world"
I think this is probably what you are looking for:
Change case using Javascript regex
Detect on the first curly brace instead of a hyphen.
Assuming that all curly braces are well balanced, the parts that should be lower cased are contained like this:
Left hand side is either the start of your string or }
Right hand side is either the end of your string or {
This is the code that would work:
var str = '{H}ELLO {W}ORLD';
str.replace(/(?:^|})(.*?)(?:$|{)/g, function($0, $1) {
return $1.toLowerCase();
});
// "Hello World"
I would amend @Jack's solution as follows:
var str = '{H}ELLO {W}ORLD';
str = str.replace (/(?:^|\})(.*?)(?:\{|$)/g, function($0, $1) {
return $1.toLowerCase ();
});
Which performs both the lower casing and the bracket removal in one operation!

Regex to get string between curly braces

Unfortunately, despite having tried to learn regex at least one time a year for as many years as I can remember, I always forget as I use them so infrequently. This year my new year's resolution is to not try and learn regex again - So this year to save me from tears I'll give it to Stack Overflow. (Last Christmas remix).
I want to pass in a string in this format {getThis}, and be returned the string getThis. Could anyone be of assistance in helping to stick to my new year's resolution?
Related questions on Stack Overflow:
How can one turn regular quotes (i.e. ', ") into LaTeX/TeX quotes (i.e. `', ``'')
Regex: To pull out a sub-string between two tags in a string
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Try
/{(.*?)}/
That means, match any character between { and }, but don't be greedy - match the shortest string which ends with } (the ? stops * being greedy). The parentheses let you extract the matched portion.
Another way would be
/{([^}]*)}/
This matches any character except a } char (another way of not being greedy)
/\{([^}]+)\}/
/ - delimiter
\{ - opening literal brace escaped because it is a special character used for quantifiers eg {2,3}
( - start capturing
[^}] - character class consisting of
^ - not
} - a closing brace (no escaping necessary because special characters in a character class are different)
+ - one or more of the character class
) - end capturing
\} - the closing literal brace
/ - delimiter
If your string will always be of that format, a regex is overkill:
>>> var g='{getThis}';
>>> g.substring(1,g.length-1)
"getThis"
substring(1 means to start one character in (just past the first {) and ,g.length-1) means to take characters until (but not including) the character at the string length minus one. This works because the position is zero-based, i.e. g.length-1 is the last position.
For readers other than the original poster: If it has to be a regex, use /{([^}]*)}/ if you want to allow empty strings, or /{([^}]+)}/ if you want to only match when there is at least one character between the curly braces. Breakdown:
/: start the regex pattern
{: a literal curly brace
(: start capturing
[: start defining a class of characters to capture
^}: "anything other than }"
]: OK, that's our whole class definition
*: any number of characters matching that class we just defined
): done capturing
}: a literal curly brace must immediately follow what we captured
/: end the regex pattern
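As a small sketch of how the capture group from this breakdown is read in JavaScript (the function names are made up for illustration), the first variant returns the first captured value and the second collects every occurrence:

```javascript
// Returns the text inside the first {...}, or null when there is none.
function firstBetweenBraces(input) {
  const match = /\{([^}]*)\}/.exec(input);
  return match ? match[1] : null;
}

// Returns the text inside every {...}; the g flag is required by matchAll.
function allBetweenBraces(input) {
  return Array.from(input.matchAll(/\{([^}]*)\}/g), m => m[1]);
}

console.log(firstBetweenBraces('{getThis}'));
console.log(allBetweenBraces('/{id}/{name}/{}'));
```

With * the empty pair {} yields an empty string; switching to + would skip it instead.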
Try this:
/[^{\}]+(?=})/g
For example
Welcome to RegExr v2.1 by #{gskinner.com}, #{ssd.sd} hosted by Media Temple!
will return gskinner.com, ssd.sd.
Try this
let path = "/{id}/{name}/{age}";
const paramsPattern = /[^{}]+(?=})/g;
let extractParams = path.match(paramsPattern);
console.log("extractParams", extractParams) // prints all the names between {} = ["id", "name", "age"]
Here's a simple solution using javascript replace
var st = '{getThis}';
st = st.replace(/\{|\}/gi,''); // "getThis"
As the accepted answer above points out the original problem is easily solved with substring, but using replace can solve the more complicated use cases
If you have a string like "randomstring999[fieldname]"
You use a slightly different pattern to get fieldname
var nameAttr = "randomstring999[fieldname]";
var justName = nameAttr.replace(/.*\[|\]/gi,''); // "fieldname"
This one works in Textmate and it matches everything in a CSS file between the curly brackets.
\{(\s*?.*?)*?\}
selector {.
.
matches here
including white space.
.
.}
If you want to further be able to return the content, then wrap it all in one more set of parentheses like so:
\{((\s*?.*?)*?)\}
and you can access the contents via $1.
This also works for functions, but I haven't tested it with nested curly brackets.
You want to use regex lookahead and lookbehind. This will give you only what is inside the curly braces:
(?<=\{)(.*?)(?=\})
I have looked into the other answers, and a vital piece of logic seems to be missing from them: select everything between two CONSECUTIVE brackets, but NOT the brackets themselves.
So here is my answer:
\{([^{}]+)\}
Regex for getting an array of all the strings enclosed in curly braces that occur in a string, rather than just the first occurrence:
/\{([^}]+)\}/gm
var re = /{(.*)}/;
var m = "{helloworld}".match(re);
if (m != null)
console.log(m[0].replace(re, '$1'));
The simpler .replace(/.*{(.*)}.*/, '$1') unfortunately returns the entire string if the regex does not match. The above code snippet can more easily detect a match.
Try this one; according to http://www.regextester.com it works for JS normally.
([^{]*?)(?=\})
This one matches everything even if it finds multiple closing curly braces in the middle:
\{([\s\S]*)\}
Example:
{
"foo": {
"bar": 1,
"baz": 1,
}
}
You can use this regex recursion to match everything in between, even another {} (like a JSON text):
\{([^()]|())*\}
This also helped me while trying to solve someone's problem:
split out the contents inside curly braces ({}) that follow a pattern like
{'day': 1, 'count': 100}.
For example:
#include <iostream>
#include <regex>
#include<string>
using namespace std;
int main()
{
//string to be searched
string s = "{'day': 1, 'count': 100}, {'day': 2, 'count': 100}";
// regex expression for pattern to be searched
regex e ("\\{[a-z':, 0-9]+\\}");
regex_token_iterator<string::iterator> rend;
regex_token_iterator<string::iterator> a ( s.begin(), s.end(), e );
while (a!=rend) cout << " [" << *a++ << "]";
cout << endl;
return 0;
}
Output:
[{'day': 1, 'count': 100}] [{'day': 2, 'count': 100}]
You can use the String.slice() method.
let str = "{something}";
str = str.slice(1,-1) // something
