Match a specific syntax with regex

I need to split this text up and grab each object separately.
object {
child {
}
}
object {
}
I am no regex expert, but the best pattern I have come up with so far is:
/(.)*{(.|\n)*}/ig
But when applied to the text above, it matches everything. I can see why, but I don't know what else to do to make it split the results into separate sections.
Edit:
To be clearer: in the text I provided, I'd like matched groups running from 'object {' to the closing '}', including everything inside.
And to visualize it:
Matched group #1:
object {
child {
}
}
Matched group #2:
object {
}
*Just to clarify, 'object' and 'child' are only examples. I want the pattern to match any names, with the option of a child having the same name as its parent.

If I understand your question correctly, you want to match this:
object {
child {
}
}
and this:
object {
}
as two separate matches. In that case, you just need to make your quantifier non-greedy:
(.)*{(.|\n)*?}
The ? makes the * non-greedy, so instead of taking as much as possible, it'll take as little as possible.
Your original matches everything from the first { to the last } because it's greedy and that, of course, ends up grabbing everything.
The problem with the above is that it misses the last closing bracket on the first object because of nesting. You can fix this for the first level of nesting like this:
(.)*{({(.|\n)*?}|.|\n)*?}
By adding the clause {(.|\n)*?} as another alternative you now match the nested child correctly. But of course, the problem is that if you have another nested object then it'll be broken again!
Unfortunately, JavaScript's regex engine doesn't support recursion (some engines do), so you might need to take a different approach.
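One such different approach is to leave the nesting to a small scanner that counts braces instead of a regex. The following is only a minimal sketch, not a full parser: the function name splitTopLevelBlocks is made up for illustration, and it assumes the braces are balanced and never appear inside strings or comments.

```javascript
// Hypothetical helper: returns every top-level "name { ... }" block as a string.
// Assumes balanced braces and no braces inside strings or comments.
function splitTopLevelBlocks(text) {
  var blocks = [];
  var depth = 0; // current brace nesting level
  var start = 0; // index where the current top-level block begins

  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) {
        // A top-level block just closed: keep everything since the previous one.
        blocks.push(text.slice(start, i + 1).trim());
        start = i + 1;
      }
    }
  }
  return blocks;
}

var input = 'object {\n child {\n }\n}\nobject {\n}';
var blocks = splitTopLevelBlocks(input);
console.log(blocks.length); // 2
console.log(blocks[0]);     // the first object, including its nested child
```

Because the scanner only tracks depth, it handles any level of nesting and any names, including a child named the same as its parent.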

object\s*{(?:(?!\bobject\b)[\s\S])*}
Try this. See the demo:
https://regex101.com/r/sH8aR8/16
var re = /object\s*{(?:(?!\bobject\b)[\s\S])*}/g;
var str = 'object {\n child {\n\n }\n}\nobject {\n\n}';
var m;
while ((m = re.exec(str)) != null) {
if (m.index === re.lastIndex) {
re.lastIndex++;
}
// View your result using the m-variable.
// eg m[0] etc.
}

Javascript Unnecessarily Compact Array Operations

So, for my own knowledge, and because I love looking at things from a different perspective...
I have the following bit of JavaScript code, which for the purposes of this problem will only ever receive strings formatted like this: "wordone wordtwo":
function inName(inputName) {
return inputName.split(" ")[1].toUpperCase();
}
However, this only returns half of what I want ("WORDTWO"). I desire to return the original string with a single change: the space-separated second word returned through the toUpperCase(); and then re-concatenated to the untouched first word.
I also want to unnecessarily run all of the operations on the return line. My brain says this is possible, given that the compiler reads the line from left to right and adjusts the available member functions based on what has resolved. Also, everything in JavaScript is an object, correct?
Help me out for my own curiosity's sake, or bash me over the head with my own misconceptions.
Here is a solved version of the above question using 'normal' statements:
function inName(inputName) {
var nameArray=inputName.split(" ");
nameArray[1]=nameArray[1].toUpperCase();
return nameArray.join(" ");
}

One line with substr, indexOf and a variable on the fly ;-)
function inName(inputName) {
return inputName.substr(0, (index = inputName.indexOf(' '))) + inputName.substr(index).toUpperCase();
}

Here's another option which avoids the regular expression:
function inName(inputName) {
return inputName.split(' ').map(function(v,i){return i?v.toUpperCase():v;}).join(' ');
}
This does the same split as the original code, then maps the parts to a function which returns the value at index 0 unchanged but the value at index 1 in upper case. Then the two results are joined back together with a space.
As others have said, a longer, clearer version is better in practice than trying to come up with a clever one-liner. Defining a function inside the return statement feels like cheating anyway ;-)

Something like this almost seems like it belongs on Code Golf, but here's my take:
function inName(inputName) {
return inputName.replace(/ .*/,function(m) {return m.toUpperCase();});
}

Interesting. Here is my take on the problem
function justDoIt(str){
return [str = str.split(" ") , str.pop().toUpperCase()].join(" ");
}
This creates a new array literal. Its first element is the result of splitting str, which is also reassigned to str; its second element pops the last word off that same array and upper-cases it. The outer join then effectively evaluates [["wordone"], "WORDTWO"].join(" "), with the inner array converted to the string "wordone".

CodeMirror - Using RegEx with overlay

I can't seem to find an example of anyone using RegEx matches to create an overlay in CodeMirror. The Moustaches example matching one thing at a time seems simple enough, but in the API, it says that the RegEx match returns the array of matches and I can't figure out what to do with it in the context of the structure in the moustaches example.
I have a regular expression which finds all the elements I need to highlight: I've tested it and it works.
Should I be loading up the array outside of the token function and then matching each one? Or is there a way to work with the array?
The other issue is that I want to apply different styling depending on the (biz|cms) option in the regex - one for 'biz' and another for 'cms'. There will be others but I'm trying to keep it simple.
This is as far as I have got. The comments show my confusion.
CodeMirror.defineMode("tbs", function(config, parserConfig) {
var tbsOverlay = {
token: function(stream, state) {
tbsArray = match("^<(biz|cms).([a-zA-Z0-9.]*)(\s)?(\/)?>");
if (tbsArray != null) {
for (i = 0; i < tbsArray.length; i++) {
var result = tbsArray[i];
//Do I need to stream.match each element now to get hold of each bit of text?
//Or is there some way to identify and tag all the matches?
}
}
//Obviously this bit won't work either now - even with regex
while (stream.next() != null && !stream.match("<biz.", false)) {}
return null;
}
};
return CodeMirror.overlayMode(CodeMirror.getMode(config, parserConfig.backdrop || "text/html"), tbsOverlay);
});

It returns the array as produced by RegExp.exec or String.prototype.match (see for example https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/String/match), so you probably don't want to iterate through it, but rather pick out the specific elements that correspond to groups in your regexp (if (result[1] == "biz") ...).
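To make that concrete, here is a rough sketch of an overlay token function that branches on the first capture group. It assumes CodeMirror 5 behaviour, where stream.match(regexp) returns the match array and only succeeds when the match starts at the current position. The style names "tbs-biz" and "tbs-cms" are invented, and makeStubStream is just a hypothetical stand-in so the snippet runs without CodeMirror loaded.

```javascript
// Overlay token function: style <biz.x> and <cms.x> tags differently.
var tbsOverlay = {
  token: function (stream) {
    // A RegExp passed to stream.match yields the same array as RegExp.exec.
    var m = stream.match(/^<(biz|cms)\.([a-zA-Z0-9.]*)\s?\/?>/);
    if (m) {
      // m[1] is the first capture group: "biz" or "cms".
      return m[1] === "biz" ? "tbs-biz" : "tbs-cms"; // invented style names
    }
    // Otherwise skip ahead to the next "<" so it is tested on the next call.
    while (stream.next() != null && !stream.match("<", false)) {}
    return null;
  }
};

// Hypothetical stand-in for CodeMirror's stream, just enough for this demo.
function makeStubStream(text) {
  return {
    string: text,
    pos: 0,
    next: function () {
      return this.pos < this.string.length ? this.string.charAt(this.pos++) : undefined;
    },
    match: function (pattern, consume) {
      var rest = this.string.slice(this.pos);
      if (typeof pattern === "string") {
        if (rest.indexOf(pattern) !== 0) return false;
        if (consume !== false) this.pos += pattern.length;
        return true;
      }
      var m = rest.match(pattern);
      if (!m || m.index > 0) return null;
      if (consume !== false) this.pos += m[0].length;
      return m;
    }
  };
}

var stream = makeStubStream('<biz.name/> and <cms.page>');
var styles = [];
while (stream.pos < stream.string.length) {
  styles.push(tbsOverlay.token(stream));
}
console.log(styles); // [ 'tbs-biz', null, 'tbs-cms' ]
```

In a real mode the object would be passed to CodeMirror.overlayMode as in the question, and the returned strings would map to CSS classes prefixed with cm-.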

Look at the implementation of CodeMirror's match() method and you'll see that it handles two parameter types: string and RegExp.
The constant in
stream.match("<biz.")
is a string.
Define it as a RegExp instead:
tbsArray = /<biz./g
That way, the stream will be matched against a RegExp.

What unicode character I can use to "flag" a string?

I want to represent an object that has several text properties, each holding the same text in a different language. When the user modifies a single field, the other fields should be revised. I'm thinking of adding a single Unicode character at the beginning of the string in each of the other fields, so that to find the fields that need attention I just have to check the value at obj.text_prop[0].
Which Unicode character can I use for this purpose? Ideally, it would be non-printable, supported in JS and JSON.

Such flagging should be done some other way, at a protocol level rather than at the character level. For example, consider making each language version an object rather than just a string; the object could then have properties such as needsAttention in addition to the property that contains the string.
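As a rough illustration of that suggestion, each language version could be a small object carrying its own flag. The property name text and the helper setText are assumptions made up for this sketch, as are the sample strings.

```javascript
// Each language version is an object, so the flag lives beside the text.
var greeting = {
  en: { text: "Hello", needsAttention: false },
  de: { text: "Hallo", needsAttention: false },
  fr: { text: "Bonjour", needsAttention: false }
};

// Hypothetical helper: update one language and flag all the others for review.
function setText(entry, langId, newText) {
  for (var id in entry) {
    if (Object.prototype.hasOwnProperty.call(entry, id)) {
      entry[id].needsAttention = (id !== langId);
    }
  }
  entry[langId].text = newText;
}

setText(greeting, "en", "Hi");
console.log(JSON.stringify(greeting));
```

The structure is plain data, so it survives JSON.stringify and JSON.parse without any special characters in the strings.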
But if you do need to embed such information in a string, you could use ZERO WIDTH SPACE, U+200B. Strictly speaking it marks a line break opportunity, but that should not cause problems here. The main drawback is probably that old versions of IE may display it as a small rectangle.
Alternatively, you could use a noncharacter code point such as U+FFFF, if you can make sure that the string is never sent anywhere from the program without removing this code point. As described in Ch. 16 of the Unicode Standard, Special Areas and Format Characters, noncharacter code points are reserved for internal use in an application and should never be used in text interchange.
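If the flag does have to live in the string, setting, checking and stripping it might look like the following sketch. The helper names flag, isFlagged and unflag are hypothetical, and the marker is written as the escape \u200B so that it stays visible in the source code.

```javascript
var FLAG = "\u200B"; // ZERO WIDTH SPACE used as an invisible marker

// Hypothetical helpers for adding, testing and removing the marker.
function flag(s) { return isFlagged(s) ? s : FLAG + s; }
function isFlagged(s) { return s.charAt(0) === FLAG; }
function unflag(s) { return isFlagged(s) ? s.slice(1) : s; }

var obj = { text_en: "Hi", text_de: flag("Hallo") };

// The marker survives a JSON round trip.
var roundTripped = JSON.parse(JSON.stringify(obj));
console.log(isFlagged(roundTripped.text_de)); // true
console.log(isFlagged(roundTripped.text_en)); // false
console.log(unflag(roundTripped.text_de));    // Hallo
```

Keep in mind that a flagged string is one character longer than it looks, so anything that measures or compares the text should call unflag first.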

I would suggest not using strange characters at the beginning of the string. You can implement something like this instead:
<script type="text/javascript">
// Per-instance storage, so separate LocalizationSet objects do not share state.
function LocalizationSet() {
this.localizationItems = {};
this.itemsNeedAttention = {};
}
LocalizationSet.prototype.setLocalization = function(langId, text)
{
this.localizationItems[langId] = text;
this.itemsNeedAttention[langId] = true;
}
LocalizationSet.prototype.getLocalization = function(langId)
{
return this.localizationItems[langId];
}
LocalizationSet.prototype.needsAttention = function(langId)
{
if(this.itemsNeedAttention[langId] == null)
{
return false;
}
return this.itemsNeedAttention[langId];
}
LocalizationSet.prototype.unsetAttentionFlags = function()
{
for(var it in this.itemsNeedAttention)
{
this.itemsNeedAttention[it] = false;
}
}
//Example
var set = new LocalizationSet();
set.setLocalization("en","Hello");
set.setLocalization("de","Willkommen");
alert(set.needsAttention("en"));
alert(set.needsAttention("de"));
set.unsetAttentionFlags();
alert(set.needsAttention("en"));
set.setLocalization("en","Hi");
alert(set.needsAttention("en"));
//Shows true,true,false,true
</script>

JS - jQuery inarray ignoreCase() and contains()

Well, I am more of a PHP person, and my JS skills are close to none beyond simple design-related operations, so excuse me if I am asking the obvious.
The following operations would be a breeze in PHP (and might be in JS too, but I am fighting with unfamiliar syntax here).
It is a form of input validation:
var ar = ["BRS201103-0783-CT-S", "MAGIC WORD", "magic", "Words", "Magic-Word"];
jQuery(document).ready(function() {
jQuery("form#searchreport").submit(function() {
if (jQuery.inArray(jQuery("input:first").val(), ar) != -1){
jQuery("#contentresults").delay(800).show("slow");
return false;
}
});
});
This question has two parts.
1 - How can I make the array lookup case-insensitive?
E.g. BRS201103-0783-CT-S should give the same result as brs201103-0783-ct-s and Brs201103-0783-CT-s, and likewise MAGIC, magic, Magic and MaGIc.
Basically I need something like ignoreCase() for an array, but I could not find any reference to that in jQuery or JS.
I tried toLowerCase(), but it does not work on the array (do I need to iterate?), and would it even handle mixed case?
2 - How can I make the function recognize only parts or combinations of the elements?
E.g. if one types only "word", I would like it to pass as "words", and if someone types "some word" it should also pass (as it contains "word").

Part 1
You can process your array to be entirely lowercase, and lowercase your input so indexOf() will work like it's performing a case insensitive search.
You can lowercase a string with toLowerCase() as you've already figured out.
To do an array, you can use...
arr = arr.map(function(elem) { return elem.toLowerCase(); });
Part 2
You could check for a substring, for example...
// Assuming you've already transformed the input and array to lowercase.
var input = "word";
var words = ["word", "words", "wordly", "not"];
var found = words.some(function(elem) { return elem.indexOf(input) != -1; });
Alternatively, in this instance you could skip transforming the array to lowercase by calling toLowerCase() on each elem before you check indexOf().
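A sketch of that alternative, folding both parts into a single check. The function name matchesAny is made up here; the array is the one from the question.

```javascript
var ar = ["BRS201103-0783-CT-S", "MAGIC WORD", "magic", "Words", "Magic-Word"];

// True when some element contains the input, ignoring case.
function matchesAny(input, list) {
  var needle = input.toLowerCase();
  return list.some(function (elem) {
    return elem.toLowerCase().indexOf(needle) !== -1;
  });
}

console.log(matchesAny("brs201103-0783-ct-s", ar)); // true
console.log(matchesAny("word", ar));                // true
console.log(matchesAny("missing", ar));             // false
```

This only tests whether an element contains the input. Accepting an input such as "some word" because it contains an element would need the reverse test as well.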
some() and map() aren't supported in older IEs, but are trivial to polyfill. An example of a polyfill for each is available at the linked documentation.
As Fabrício Matté also pointed out, you can use the jQuery equivalents here, $.map() for Array.prototype.map() and $.grep() with length property for Array.prototype.some(). Then you will get the browser compatibility for free.

To check if an array contains an element, case-insensitive, I used this code:
ret = $.grep( array, function (n,i) {
return ( n && n.toLowerCase().indexOf(elem.toLowerCase())!=-1 );
}) ;

Best way to store JS Regex capturing groups in array?

Exactly what the title asks. I'll provide some examples while explaining my question.
Test string:
var test = "#foo# #foo# bar #foo#";
Say, I want to extract all text between # (all foos but not bar).
var matches = test.match(/#(.*?)#/g);
Using .match as above, it'll store all matches but it'll simply throw away the capturing groups it seems.
var matches2 = /#(.*?)#/g.exec(test);
The .exec method apparently returns only the first result's matched string in the position 0 of the array and my only capturing group of that match in the position 1.
I've exhausted SO, Google and MDN looking for an answer to no avail.
So, my question is, is there any better way to store only the matched capturing groups than looping through it with .exec and calling array.push to store the captured groups?
My expected array for the test above should be:
[0] => (string) foo
[1] => (string) foo
[2] => (string) foo
Pure JS and jQuery answers are accepted, extra cookies if you post JSFiddle with console.log. =]

You can use .exec too like following to build an array
var arr = [],
s = "#foo# #bar# #test#",
re = /#(.*?)#/g,
item;
while (item = re.exec(s))
arr.push(item[1]);
alert(arr.join(' '));
Well, it still has a loop. If you don't want a loop, then I think you have to go with .replace(), in which case the code would be:
var arr = [];
var str = "#foo# #bar# #test#"
str.replace(/#(.*?)#/g, function(s, match) {
arr.push(match);
});
These lines from the MDN docs explain, I think, your query about how exec updates the lastIndex property:
If your regular expression uses the "g" flag, you can use the exec
method multiple times to find successive matches in the same string.
When you do so, the search starts at the substring of str specified by
the regular expression's lastIndex property (test will also advance
the lastIndex property).
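A small sketch of what that passage describes, recording lastIndex after each exec call on the question's test string:

```javascript
var re = /#(.*?)#/g;
var test = "#foo# #foo# bar #foo#";
var groups = [];
var positions = [];
var m;

while ((m = re.exec(test)) !== null) {
  groups.push(m[1]);            // first capture group of this match
  positions.push(re.lastIndex); // where the next search will start
}

console.log(groups);       // [ 'foo', 'foo', 'foo' ]
console.log(positions);    // [ 5, 11, 21 ]
console.log(re.lastIndex); // 0, reset after the final failed match
```

Without the g flag, lastIndex would stay at 0 and the loop would find the first match forever.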

I'm not sure if this is the answer you are looking for but you may try the following code:
var matches = [];
var test = "#foo# #foo# bar #foo#";
test.replace(/#(.*?)#/g, function (string, match) {
matches.push(match);
});
alert(JSON.stringify(matches));
Hope it helps.

data.replace(/.*?#(.*?#)/g, '$1').split(/#/)
No loops, no functions. Note that the resulting array ends with an empty string, because the replaced text ends in #.

In case somebody arrives with a similar need to mine, I needed a matching function for a Django-style URL config handler that could pass path "arguments" to a controller. I came up with this. Naturally it wouldn't work very well if matching '$' but it wouldn't break on '$1.00'. It's a little bit more explicit than necessary. You could just return matchedGroups from the else statement and not bother with the for loop test but ;; in the middle of a loop declaration freaks people out sometimes.
var url = 'http://www.somesite.com/calendar/2014/june/6/';
var calendarMatch = /^http\:\/\/[^\/]*\/calendar\/(\d*)\/(\w*)\/(\d{1,2})\/$/;
function getMatches(str, matcher){
var matchedGroups = [];
for(var i=1,groupFail=false;groupFail===false;i++){
var group = str.replace(matcher,'$'+i);
var groupFailTester = new RegExp('^\\$'+i+'$');
if(!groupFailTester.test(group) ){
matchedGroups.push(group);
}
else {
groupFail = true;
}
}
return matchedGroups;
}
console.log( getMatches(url, calendarMatch) );

Another thought, though exec is as efficient.
var s= "#foo# #foo# bar #foo#";
s= s.match(/#([^#])*#/g).join('#').replace(/^#+|#+$/g, '').split(/#+/);
