I looked around on StackOverflow but could not find the answer!
Anyway, I was wondering how do I get following in JavaScript with regexp.
I have a string
bla().func1($(ele).attr("data")).func2.func3("test")... ;
and I want extract func1, func2, func3... from the given string, anybody have an example of how to do that?
Bear in mind, that the function names changes, so they are not func1, func2 ... they can be load, past, moon ... whatever
try this:
var str = 'bla().func1($(ele).attr("data")).func2("test")';
alert(str.split(/(?:\.)\w+\d+(?=\()/g));
There's the fiddle:
http://jsfiddle.net/cv6avhfb/3/
This code would split the string in parts while using as separators substrings like '.func1(', '.func2(', '.abc3(' and other. If the function names have different structure you just have to change the \w+\d+ part of the regex.
Here's the result of this code:
bla(),($(ele).attr("data")),("test")
And if you like to know more about regex in javascript:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
Repeatedly strip out innermost parenthesized expressions:
var str = 'bla().func1($(ele).attr("data")).func2.func3("test")... ';
var expression = /\([^(]*?\)/g;
while (expression.test(str)) { str = str.replace(expression, ''); }
document.write(str.split('.').slice(1).join(' '));
However, you shouldn't do this (why do you want to?). JS should be parsed by, well, JS parsers. For instance, the above will fail miserably if there's a string literal containing a parenthesis. You should parse this using Esprima and analyze the resulting parse tree.
Related
I'm trying to write a regex which gets a particular substring from a string. However, the substring could be broken across multiple lines.
I've tried using the multiline flag, like this:
"foo\nbar".match(/foobar/m)
But that returns null.
I've also seen a number of posts suggesting I use [\S\s]. However, as far as I can tell, this only works if you know where the break line will be, like this:
'foo\nbar'.match(/foo[\S\s]bar/m)
Is there a way to find all instaces of foobar in a string when the line break could anywhere in the string?
Is there a way to find all instances of foobar in a string when the line break could anywhere in the string?
Remove all line-breaks from subject before comparing with your regex.
See this simple demo:
const arr = ["foo\nbar", "\nfoobar", "fo\nobar", "foobar\n", "foobar"];
const val = 'foobar';
arr.forEach(function(el) {
console.log(el.replace(/\n/, '') == val)
});
So, I recently found this example on trimming whitespace, but I've found that it also affects strings in code. For instance, say I'm doing a lesson on string comparison, and to demonstrate that "Hello World!" and "Hello World!" are different, I need the code compression to not have any effect on those two strings.
I'm using the whitespace compression so that people with different formatting styles won't be punished for using something that I don't use. For instance, I like to format my functions like this:
function foo(){
return 0;
};
While others may format it like this:
function foo()
{
return 0;
};
So I use whitespace compression around punctuation to make sure it always comes out the same, but I don't want it to affect anything within a string. Is there a way to add exceptions in JavaScript's replace() function?
UPDATE:
check this jsfiddle
var str='dfgdfg fdgfd fd gfd g print("Hello World!"); sadfds dsfgsgdf'
var regex=/(?:(".*"))|(\s+)/g;
var newStr=str.replace(regex, '$1 ');
console.log(newStr);
console.log(str);
In this code it will process everything except the quoted strings
to play with the code more comfortably you can see how the regex is working :
https://regex101.com/r/tG5qH2/1
I made a jsfiddle here: https://jsfiddle.net/cuywha8t/2/
var stringSplitRegExp = /(".+?"|'.+?')/g;
var whitespaceRegExp = /\s+\{/g;
var whitespaceReplacement = "{"
var exampleCode = `var str = "test test test" + 'asdasd "sd"';\n`+
`var test2 = function()\n{\nconsole.log("This is a string with 'single quotes'")\n}\n`+
`console.log('this is a string with "double quotes"')`;
console.log(exampleCode)
var separatedStrings =(exampleCode.split(stringSplitRegExp))
for(var i = 0; i < separatedStrings.length; i++){
if (i%2 === 1){
continue;
}
var oldString = separatedStrings[i];
separatedStrings[i] = oldString.replace(whitespaceRegExp, whitespaceReplacement)
}
console.log(separatedStrings.join(""))
I believe this is what you are looking for. it handles cases where a string contains the double quotes, etc. without modifying. This example just does the formatting of the curly-braces as you mentioned in your post.
Basically, the behavior of split allows the inclusion of the splitter in the array. And since you know the split is always between two non-string elements you can leverage this by looping over and modifying only every even-indexed array element.
If you want to do general whitespace replacement you can of course modify the regexp or do multiple passes, etc.
What are the actual uses of String.raw Raw String Access introduced in ECMAScript 6?
// String.raw(callSite, ...substitutions)
function quux (strings, ...values) {
strings[0] === "foo\n"
strings[1] === "bar"
strings.raw[0] === "foo\\n"
strings.raw[1] === "bar"
values[0] === 42
}
quux `foo\n${ 42 }bar`
String.raw `foo\n${ 42 }bar` === "foo\\n42bar"
I went through the below docs.
http://es6-features.org/#RawStringAccess
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
http://www.2ality.com/2015/01/es6-strings.html
https://msdn.microsoft.com/en-us/library/dn889830(v=vs.94).aspx
The only the thing that I understand, is that it is used to get the raw string form of template strings and used for debugging the template string.
When this can be used in real time development? They were calling this a tag function. What does that mean?
What concrete use cases am I missing?
The best, and very nearly only, use case for String.raw I can think of is if you're trying to use something like Steven Levithan's XRegExp library that accepts text with significant backslashes. Using String.raw lets you write something semantically clear rather than having to think in terms of doubling your backslashes, just like you can in a regular expression literal in JavaScript itself.
For instance, suppose I'm doing maintenance on a site and I find this:
var isSingleUnicodeWord = /^\w+$/;
...which is meant to check if a string contains only "letters." Two problems: A) There are thousands of "word" characters across the realm of human language that \w doesn't recognize, because its definition is English-centric; and B) It includes _, which many (including the Unicode consortium) would argue is not a "letter."
So if we're using XRegExp on the site, since I know it supports \pL (\p for Unicode categories, and L for "letter"), I might quickly swap this in:
var isSingleUnicodeWord = XRegExp("^\pL+$"); // WRONG
Then I wonder why it didn't work, facepalm, and go back and escape that backslash, since it's being consumed by the string literal.
Easy enough in that simple regex, but in something complicated, remembering to double all those backslashes is a maintenance pain. (Just ask Java programmers trying to use Pattern.)
Enter String.raw:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`);
Example:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`); // L: Letter
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Now I just kick back and write what I mean. I don't even really have to worry about ${...} constructs used in template literals to do substitution, because the odds of my wanting to apply a quantifier {...} to the end-of-line assertion ($) are...low. So I can happily use substitutions and still not worry about backslashes. Lovely.
Having said that, though, if I were doing it a lot, I'd probably want to write a function and use a tagged template instead of String.raw itself. But it's surprisingly awkward to do correctly:
// My one-time tag function
function xrex(strings, ...values) {
let raw = strings.raw;
let max = Math.max(raw.length, values.length);
let result = "";
for (let i = 0; i < max; ++i) {
if (i < raw.length) {
result += raw[i];
}
if (i < values.length) {
result += values[i];
}
}
console.log("Creating with:", result);
return XRegExp(result);
}
// Using it, with a couple of substitutions to prove to myself they work
let category = "L"; // L: Letter
let maybeEol = "$";
let isSingleUnicodeWord = xrex`^\p${category}+${maybeEol}`;
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Maybe the hassle is worth it if you're using it in lots of places, but for a couple of quick ones, String.raw is the simpler option.
First, a few things:
Template strings is old name for template literals.
A tag is a function.
String.raw is a method.
String.raw `foo\n${ 42 }bar\` is a tagged template literal.
Template literals are basically fancy strings.
Template literals can interpolate.
Template literals can be multi-line without using \.
String.raw is required to escape the escape character \.
Try putting a string that contains a new-line character \n through a function that consumes newline character.
console.log("This\nis\nawesome"); // "This\nis\nawesome"
console.log(String.raw`This\nis\nawesome`); // "This\\nis\\nawesome"
If you are wondering, console.log is not one of them. But alert is. Try running these through http://learnharmony.org/ .
alert("This\nis\nawesome");
alert(String.raw`This\nis\nawesome`);
But wait, that's not the use of String.raw.
Possible uses of String.raw method:
To show string without interpretation of backslashed characters (\n, \t) etc.
To show code for the output. (As in example below)
To be used in regex without escaping \.
To print windows director/sub-directory locations without using \\ to much. (They use \ remember. Also, lol)
Here we can show output and code for it in single alert window:
alert("I printed This\nis\nawesome with " + Sring.raw`This\nis\nawesome`);
Though, it would have been great if It's main use could have been to get back the original string. Like:
var original = String.raw`This is awesome.`;
where original would have become: This\tis \tawesome.. This isn't the case sadly.
References:
http://exploringjs.com/es6/ch_template-literals.html
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
Template strings can be useful in many situations which I will explain below. Considering this, the String.raw prevents escapes from being interpreted. This can be useful in any template string in which you want to contain the escape character but do not want to escape it. A simple example could be the following:
var templateWithBackslash = String.raw `someRegExp displayed in template /^\//`
There are a few things inside that are nice to note with template strings.
They can contain unescaped line breaks without problems.
They can contain "${}". Inside these curly braces the javascript is interpreted instead.
(Note: running these will output the result to your console [in browser dev tools])
Example using line breaks:
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
{
asdf: "and some pretty printed json"
}
</pre>
</div>
`
console.log(myTemplate)
If you wanted to do the above with a normal string in Javascript it would look like the following:
var myTemplate = "\
<div class="myClass">\
<pre>\
My formatted text\
with multiple lines\
{\
asdf: "and some pretty printed json"\
}\
</pre>\
</div>"
console.log(myTemplate)
You will notice the first probably looks much nicer (no need to escape line breaks).
For the second I will use the same template string but also insert the some pretty printed JSON.
var jsonObj = {asdf: "and some pretty printed json", deeper: {someDeep: "Some Deep Var"}}
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
${JSON.stringify(jsonObj, null, 2)}
</pre>
</div>
`
console.log(myTemplate)
In NodeJS it is extremely handy when it comes to filepath handling:
var fs=require('fs');
var s = String.raw`C:\Users\<username>\AppData\Roaming\SomeApp\someObject.json`;
var username = "bob"
s=s.replace("<username>",username)
fs.readFile(s,function(err,result){
if (err) throw error;
console.log(JSON.parse(result))
})
It improves readability of filepaths on Windows. \ is also a fairly common separator, so I can definitely see why it would be useful in general. However it is pretty stupid how \ still escapes `... So ultimately:
String.raw`C:\Users\` //#==> C:\Users\`
console.log(String.raw`C:\Users\`) //#==> SyntaxError: Unexpected end of input.
In addition to its use as a tag, String.raw is also useful in implementing new tag functions as a tool to do the interleaving that most people do with a weird loop. For example, compare:
function foo(strs, ...xs) {
let result = strs[0];
for (let i = 0; i < xs.length; ++i) {
result += useFoo(xs[i]) + strs[i + 1];
}
return result;
}
with
function foo(strs, ...xs) {
return String.raw({raw: strs}, ...xs.map(useFoo));
}
The Use
(Requisite knowledge: tstring §.)
Instead of:
console.log(`\\a\\b\\c\\n\\z\\x12\\xa9\\u1234\\u00A9\\u{1234}\\u{00A9}`);
.you can:
console.log(String.raw`\a\b\c\n\z\x12\xa9\u1234\u00A9\u{1234}\u{00A9}`);
"Escaping"
<\\u> is fine, yet <\u> needs "escaping", eg:
console.log(String.raw`abc${'\\u'}abc`);
.Dit <\\x>, <\x>,
<console.log(String.raw`abc${`\\x`}abc`)>;
.<\`>, <`>, <console.log(String.raw`abc${`\``}abc`)>;
.<\${>, <${&>, <console.log(String.raw`abc${`$\{`}abc`)>;
.<\\1> (till <\\7>), <\1>, <console.log(String.raw`abc${`\\1`}abc`)>;
.<\\>, endunit <\>, <console.log(String.raw`abc${`\\`}`)>.
Nb
There's also a new "latex" string. Cf §.
I've found it to be useful for testing
my RegExps. Say I have a RegExp which
should match end-of-line comments because
I want to remove them. BUT, it must not
match source-code for a regexp like /// .
If your code contains /// it is not the
start of an EOL comment but a RegExp, as
per the rules of JavaScript syntax.
I can test whether my RegExp in variable patEOLC
matches or doesn't /// with:
String.raw`/\//` .match (patEOLC)
In other words it is a way to let my
code "see" code the way it exists in
source-code, not the way it exists
in memory after it has been read
into memory from source-code, with
all backslashes removed.
It is a way to "escape escaping" but
without having to do it separately
for every backslash in a string, but
for all of them at the same time.
It is a way to say that in a given
(back-quoted) string backslash
shall behave just like any other
character, it has no special
meaning or interpretation.
I'm having difficulties with constructing some regular expressions using Javascript.
What I need:
I have a string like: Woman|{Man|Boy} or {Girl|Woman}|Man or Woman|Man etc.
I need to split this string by '|' separator, but I don't want it to be split inside curly brackets.
Examples of strings and desired results:
// Expample 1
string: 'Woman|{Man|Boy}'
result: [0] = 'Woman', [1] = '{Man|Boy}'
// Example 2
string '{Woman|Girl}|{Man|Boy}'
result: [0] = '{Woman|Girl}', [1] = '{Man|Boy}'
I can't change "|" symbol to another inside the brackets because the given strings are the result of a recursive function. For example, the original string could be
'Nature|Computers|{{Girls|Women}|{Boys|Men}}'
try this:
var reg=/\|(?![^{}]+})/g;
Example results:
var a = 'Woman|{Man|Boy}';
var b = '{Woman|Girl}|{Man|Boy}';
a.split(reg)
["Woman", "{Man|Boy}"]
b.split(reg)
["{Woman|Girl}", "{Man|Boy}"]
for your another question:
"Now I have another, but a bit similar problem. I need to parse all containers from the string. Syntax of the each container is {sometrash}. The problem is that container can contain another containers, but I need to parse only "the most relative" container. mystring.match(/\{+.+?\}+/gi); which I use doesn't work correctly. Could you correct this regex, please? "
you can use this regex:
var reg=/\{[^{}]+\}/g;
Example results:
var a = 'Nature|Computers|{{Girls|Women}|{Boys|Men}}';
a.match(reg)
["{Girls|Women}", "{Boys|Men}"]
You can use
.match(/[^|]+|\{[^}]*\}/g)
to match those. However, if you have a nesting of arbitrary depth then you'll need to use a parser, [javascript] regex won't be capable of doing that.
Test this:
([a-zA-Z0-9]*\|[a-zA-Z0-9]*)|{[a-zA-Z0-9]*\|[a-zA-Z0-9]*}
How can I split the following string?
var str = "test":"abc","test1":"hello,hi","test2":"hello,hi,there";
If I use str.split(",") then I won't be able to get strings which contain commas.
Whats the best way to split the above string?
I assume it's actually:
var str = '"test":"abc","test1":"hello,hi","test2":"hello,hi,there"';
because otherwise it wouldn't even be valid JavaScript.
If I had a string like this I would parse it as an incomplete JSON which it seems to be:
var obj = JSON.parse('{'+str+'}');
and then use is as a plain object:
alert(obj.test1); // says: hello,hi
See DEMO
Update 1: Looking at other answers I wonder whether it's only me who sees it as invalid JavaScript?
Update 2: Also, is it only me who sees it as a JSON without curly braces?
Though not clear with your input. Here is what I can suggest.
str.split('","');
and then append the double quotes to each string
str.split('","'); Difficult to say given the formatting
if Zed is right though you can do this (assuming the opening and closing {)
str = eval(str);
var test = str.test; // Returns abc
var test1 = str.test1; // returns hello,hi
//etc
That's a general problem in all languages: if the items you need contain the delimiter, it gets complicated.
The simplest way would be to make sure the delimiter is unique. If you can't do that, you will probably have to iterate over the quoted Strings manually, something like this:
var arr = [];
var result = text.match(/"([^"]*"/g);
for (i in result) {
arr.push(i);
}
Iterate once over the string and replace commas(,) following a (") and followed by a (") with a (%) or something not likely to find in your little strings. Then split by (%) or whatever you chose.