I am trying to figure out how to extract substrings that are enclosed in either single quotes (') or double quotes (").
Example:
Input: The "quick" brown "fox" 'jumps' over the 'lazy dog'
Output: ['quick', 'fox', 'jumps', 'lazy dog']
I have tried doing this with a regex but fell flat.
const string = `The "quick" brown "fox" 'jumps' over the 'lazy dog'`;
const pattern = /(?:'([^']*)')|(?:"([^"]*)")/;
console.log(string.match(pattern));
But it only returns the first single- or double-quoted word.
Add the global flag, g, after the last / in the pattern, and change the function from match to matchAll. So: pattern = /(?:'([^']*)')|(?:"([^"]*)")/g;. matchAll returns an iterator of match arrays, so you'll need to process it to get the flat array that you want.
const string = `The "quick" brown "fox" 'jumps' over the 'lazy dog'`; // Uses backticks since we use " and '
const pattern = /(?:'([^']*)')|(?:"([^"]*)")/g; // Pattern has the global flag "g" at the end so it allows multiple matches
const matches = [...string.matchAll(pattern)] // Convert RegExpStringIterator into array with the spread operator "..."
.map(([_, first, second]) => first ?? second); // Convert the array of arrays into something sensible.
console.log(matches);
Without mapping, matches would look like this (the unmatched group is actually undefined; it prints as null in JSON-style output):
[
[
"\"quick\"",
null,
"quick"
],
[
"\"fox\"",
null,
"fox"
],
[
"'jumps'",
"jumps",
null
],
[
"'lazy dog'",
"lazy dog",
null
]
]
So with this line:
.map(([_, first, second]) => first ?? second)
We destructure each inner array, discarding index 0 (the whole match; non-capturing groups (?:...) still consume the text they match, so it includes the opening and closing quotes) and extracting indices 1 and 2. first ?? second returns first unless it is null or undefined, in which case it returns second.
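The choice of ?? over || matters when a quoted string is empty. A minimal sketch, assuming the same pattern as above; the helper name extractQuoted and the sample text are made up for illustration:

```javascript
const pattern = /(?:'([^']*)')|(?:"([^"]*)")/g;

// extractQuoted is a hypothetical helper name used only in this sketch.
function extractQuoted(text) {
  // Skip index 0 (the whole match); exactly one of the two groups is defined.
  return [...text.matchAll(pattern)].map(([, inSingle, inDouble]) => inSingle ?? inDouble);
}

const quoted = extractQuoted(`say '' then "it's" and 'a "b" c'`);
console.log(quoted); // the empty '' match is kept as an empty string
```

With || instead of ??, the empty match from '' would fall through to the second group and come out as undefined.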
Here's what I would like to be able to do:
function convertVersionToNumber(line) {
const groups = line.matchAll(/^# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/g);
return parseInt(groups[1] + groups[2] + groups[3]);
}
convertVersionToNumber("# 1.03.00")
This doesn't work because groups is an IterableIterator<RegExpMatchArray>, not an array. Array.from doesn't seem to turn it into an array of groups either. Is there an easy way (ideally something that can fit on a single line) that can convert groups into an array?
The API of that IterableIterator<RegExpMatchArray> is a little inconvenient, and I don't know how to skip the first element in a for...of. I mean, I do know how to use both of these, it just seems like it's going to add 4+ lines so I'd like to know if there is a more concise way.
I am using TypeScript, so if it has any syntactic sugar for this, I'd be happy to use it.
1) matchAll returns an iterator (a RegExp String Iterator), not an array.
result holds that iterator, and spreading it into an array gives you every match. Since there is only one match here, the array contains a single element:
[ '# 1.03.00', '1', '03', '00', index: 0, input: '# 1.03.00', groups: undefined ]
So we use the spread operator to collect all the values into an array:
[...result]
function convertVersionToNumber(line) {
const result = line.matchAll(/^# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/g);
const groups = [...result][0];
return parseInt(groups[1] + groups[2] + groups[3]);
}
console.log(convertVersionToNumber("# 1.03.00"));
Since your regex, /^# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/, is anchored to the start of the string with ^, it can match at most once.
2) If there are multiple matches, you can spread the results into an array and then use for...of to loop over the matches
function convertVersionToNumber(line) {
const iterator = line.matchAll(/# ([0-9]).([0-9][0-9]).([0-9][0-9])\s*/g);
const results = [...iterator];
for (let arr of results) {
const [match, g1, g2, g3] = arr;
console.log(match, g1, g2, g3);
}
}
convertVersionToNumber("# 1.03.00 # 1.03.00");
Alternative solution: you can also get the same result with a plain match that collects every digit
function convertVersionToNumber(line) {
const result = line.match(/\d/g);
return +result.join("");
}
console.log(convertVersionToNumber("# 1.03.00"));
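Since only one match is expected, a non-global match is another option: without the g flag, match returns the capture groups directly, so destructuring fits on one line. A sketch, assuming escaped dots are wanted and that NaN is an acceptable "no match" result (both are assumptions, not part of the question):

```javascript
function convertVersionToNumber(line) {
  // Without the g flag, match() returns [wholeMatch, group1, group2, group3] or null.
  const [, major, minor, patch] = line.match(/^# (\d)\.(\d{2})\.(\d{2})/) ?? [];
  // Returning NaN when nothing matched is a choice made for this sketch.
  return major === undefined ? NaN : parseInt(major + minor + patch, 10);
}

console.log(convertVersionToNumber("# 1.03.00")); // 10300
```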
You do not need .matchAll in this concrete case. You simply want to match a string in a specific format and re-format it by only keeping the three captured substrings.
You may do it with .replace:
function convertVersionToNumber(line) {
return parseInt(line.replace(/^# (\d)\.(\d{2})\.(\d{2})[\s\S]*/, '$1$2$3'));
}
console.log( convertVersionToNumber("# 1.03.00") );
If you need to know whether there was a match at all, compare the string before and after replacing: if they are equal, nothing matched.
Note you need to escape dots to match them as literal chars.
The ^# (\d)\.(\d{2})\.(\d{2})[\s\S]* pattern matches
^ - start of string
# - a # followed by a space
(\d) - Group 1: a digit
\. - a dot
(\d{2}) - Group 2: two digits
\. - a dot
(\d{2}) - Group 3: two digits
[\s\S]* - the rest of the string (zero or more chars, as many as possible).
The $1$2$3 replacement pattern is the concatenated Group 1, 2 and 3 values.
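The before/after comparison can be folded into the function. A minimal sketch, where the name convertVersionOrNull and the choice of null for "no match" are assumptions made for illustration:

```javascript
const VERSION_LINE = /^# (\d)\.(\d{2})\.(\d{2})[\s\S]*/;

// convertVersionOrNull is a hypothetical name used only in this sketch.
function convertVersionOrNull(line) {
  const digits = line.replace(VERSION_LINE, '$1$2$3');
  // A successful match always drops the leading "# ", so an unchanged
  // string means the pattern did not match.
  return digits === line ? null : parseInt(digits, 10);
}

console.log(convertVersionOrNull("# 1.03.00")); // 10300
console.log(convertVersionOrNull("# 1x03x00")); // null, because the dots are escaped
```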
I have this string
a = "This is just an example";
If I used
split(" ",1)
it will print the first word as an array.
My question is: how can I get just the second word as an array?
Use a limit of 2, then slice starting from the second element.
a = "This is just an example";
console.log(a.split(" ", 2).slice(1));
Or split the string with a limit of 2, then use an array literal containing just the second element.
a = "This is just an example";
console.log([a.split(" ", 2)[1]]);
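Both variants assume the words are separated by exactly one space. A sketch that also tolerates runs of whitespace and inputs with fewer than two words; the helper name secondWordAsArray and the empty-array fallback are assumptions:

```javascript
// secondWordAsArray is a hypothetical helper name used only in this sketch.
function secondWordAsArray(text) {
  // Split on runs of whitespace and keep at most two pieces.
  const words = text.trim().split(/\s+/, 2);
  // Fall back to an empty array when there is no second word.
  return words.length > 1 ? [words[1]] : [];
}

console.log(secondWordAsArray("This is just an example")); // [ 'is' ]
console.log(secondWordAsArray("single")); // []
```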
Another approach is to split on a space, which gives you an array of words, then select the index you want and split that word further.
Note: array indexes start at 0, not 1, so if you want the second word you have to use index 1, not 2.
const a = "This is just an example";
const secondWordArr = a.split(' ')[1].split('');
// secondWordArr is the array of characters of the second word
console.log(secondWordArr); // Output [ 'i', 's' ]
Explanation :
a.split(' ') // this splits the string into an array of strings/words
a.split(' ')[1] // Access the second string in the array of split strings/words
a.split(' ')[1].split('') // splits the second word into an array of its characters
I have this string (notice the multi-line syntax):
var str = ` Number One: Get this
Number Two: And this`;
And I want a regex that returns (with match):
[str, 'Get this', 'And this']
So I tried str.match(/Number (?:One|Two): (.*)/g);, but that's returning:
["Number One: Get this", "Number Two: And this"]
There can be any whitespace/line-breaks before any "Number" word.
Why doesn't it return only what is inside the capturing group? Am I misunderstanding something? And how can I achieve the desired result?
Per the MDN documentation for String.match:
If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned. If there were no matches, the method returns null.
(emphasis mine).
So what you want is not possible with match and the g flag.
The same page adds:
if you want to obtain capture groups and the global flag is set, you need to use RegExp.exec() instead.
So if you're willing to give up on using match, you can write your own function that repeatedly applies the regex, gets the captured substrings, and builds an array.
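Such a function might look like the following sketch. The name matchWithGroups is made up, and it assumes a regex with the g flag and a single capture group:

```javascript
// matchWithGroups is a hypothetical helper name used only in this sketch.
function matchWithGroups(str, regex) {
  if (!regex.global) throw new TypeError("regex needs the g flag");
  const result = [str]; // the desired output starts with the whole string
  regex.lastIndex = 0; // start from the beginning even if the regex was used before
  let m;
  while ((m = regex.exec(str)) !== null) {
    if (m[0] === "") regex.lastIndex++; // avoid looping forever on empty matches
    result.push(m[1]); // keep only the first capture group
  }
  return result;
}

const str = " Number One: Get this\n   Number Two: And this";
console.log(matchWithGroups(str, /Number (?:One|Two): (.*)/g));
// [ str, 'Get this', 'And this' ]
```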
Or, for your specific case, you could write something like this:
var these = str.split(/(?:^|\n)\s*Number (?:One|Two): /);
these[0] = str;
Replace and store the result in a new string, like this:
var str = ` Number One: Get this
Number Two: And this`;
var output = str.replace(/Number (?:One|Two): (.*)/g, "$1");
console.log(output);
which outputs:
Get this
And this
If you want the match array like you requested, you can try this:
var getMatch = function(string, split, regex) {
var match = string.replace(regex, "$1" + split);
match = match.split(split);
match = match.reverse();
match.push(string);
match = match.reverse();
match.pop();
return match;
}
var str = ` Number One: Get this
Number Two: And this`;
var regex = /Number (?:One|Two): (.*)/g;
var match = getMatch(str, "#!SPLIT!#", regex);
console.log(match);
which displays the array as desired:
[ ' Number One: Get this\n Number Two: And this',
' Get this',
'\n And this' ]
Here split (#!SPLIT!# in this example) should be a unique string that never occurs in the input, used to separate the matches. Note that this only works for a single group. For multiple groups, add a parameter giving the number of groups and a for loop that builds "$1 $2 $3 $4 ..." + split.
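For several groups, String.prototype.matchAll (ES2020) avoids the separator trick altogether. A sketch, with a made-up helper name matchAllGroups and a two-group variant of the regex chosen for illustration:

```javascript
// matchAllGroups is a hypothetical helper name used only in this sketch.
function matchAllGroups(string, regex) {
  // matchAll throws a TypeError if the regex lacks the g flag.
  // m.slice(1) drops the whole match and keeps every capture group.
  const groups = [...string.matchAll(regex)].flatMap((m) => m.slice(1));
  return [string, ...groups];
}

const input = " Number One: Get this\n   Number Two: And this";
console.log(matchAllGroups(input, /Number (One|Two): (.*)/g));
// [ input, 'One', 'Get this', 'Two', 'And this' ]
```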
Try
var str = " Number One: Get this\n Number Two: And this";
// /\w+\s+\w+(?=\s|$)/g matches one or more word characters,
// followed by one or more whitespace characters,
// followed by one or more word characters,
// but only if whitespace or the end of input comes next; the g flag finds every match.
// res is the array ["Get this", "And this"]
var res = str.match(/\w+\s+\w+(?=\s|$)/g);
document.write(JSON.stringify(res));
I am trying something like this
^(.*)[\s]*(?:\[[\s]*(.*)[\s]*\])?$
My idea is that the first group captures everything except an optional trailing [...], whose contents go into the second group. The incoming string is already trimmed.
For instance
'aaaaa [] [ddd]' -> returns 'aaaaa []' plus 'ddd'
'[] [ddd]' -> returns '[]' plus 'ddd'
'aaaaaaaa' -> returns 'aaaaaaaa' plus NULL
'aaaaaaaa []' -> returns 'aaaaaaaa' plus ''
'aaaaaa [' -> returns 'aaaaaa [' plus NULL
'aaaa [] ddd' -> returns 'aaaa [] ddd' plus NULL
'[a] [b] [c] [d]' -> returns '[a] [b] [c]' plus 'd' instead of '' plus 'a] [b] [c] [d'
'[fff]' -> returns '' plus 'fff' <- This case is special, since the first match can never be null
My main problems are due to the first match, since both .* (swallows everything) and .*? (swallows only up to the first ] if there are several) give an undesired result
Pseudocode for algorithm would be something like:
If the last char is a ']', the second match is everything inside, back to the closest '[' (if one exists) -> this can be null, or '' if the input string ends with '[]'
The rest is the first match, which cannot be NULL, only ''
Any suggestion?
If there is no nesting, you can use this regex:
^(.*?)\s*(?:\[([^\]]*)\])?$
Otherwise, if you can have nested [] in the main [], then the regex will have to be revised. You can make a regex for nested [] but only up to a certain level of nesting; if you have up to 2 levels of nesting, you make a regex for 2, if you have up to 5 levels of nesting, you make a more complex one for 5, etc.
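For the non-nested case, the regex can be checked against the examples from the question. A sketch, with a made-up helper name splitTrailingBracket that reports a missing bracket part as null:

```javascript
const TRAILING_BRACKET = /^(.*?)\s*(?:\[([^\]]*)\])?$/;

// splitTrailingBracket is a hypothetical helper name used only in this sketch.
function splitTrailingBracket(text) {
  const m = text.match(TRAILING_BRACKET);
  // The pattern can fail on multi-line input, since . does not match newlines.
  if (m === null) return [text, null];
  // Group 2 is undefined when there is no trailing [...]; report that as null.
  return [m[1], m[2] ?? null];
}

console.log(splitTrailingBracket("aaaaa [] [ddd]"));  // [ 'aaaaa []', 'ddd' ]
console.log(splitTrailingBracket("[a] [b] [c] [d]")); // [ '[a] [b] [c]', 'd' ]
console.log(splitTrailingBracket("aaaaaa ["));        // [ 'aaaaaa [', null ]
console.log(splitTrailingBracket("[fff]"));           // [ '', 'fff' ]
```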
I think regular expressions are not the answer here, especially because you give a simple algorithm to solve the problem. Just translate your algorithm into code.
Regular expressions are also a poor fit because, as you state in your comments, you have unbalanced and nested [], which makes a regex impractical.
Try some JavaScript like this:
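function parse(text) {
  // inside stays undefined when there is no trailing [...] part
  var first = text, inside;
  if (text.endsWith(']')) {
    var pos = text.lastIndexOf('[');
    if (pos !== -1) {
      // everything before the last '[' (minus trailing whitespace) is the first part
      first = text.slice(0, pos).trimEnd();
      // the text between the last '[' and the closing ']'
      inside = text.slice(pos + 1, -1);
    }
  }
  return [first, inside];
}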
I'm not sure I understand what you want to do, but here is a try: /(.*?)\[(.*?)\]$/.
Another try, allowing the second group to remain undefined : /(.*?)(?:\[(.*?)\])?$/.
I have never used Scriptular but here is what Chrome's console says :
// result : [full match, group 1, group 2]
'abc'.match(/(.*?)(?:\[(.*?)\])?$/) // ["abc", "abc", undefined]
'[abc]'.match(/(.*?)(?:\[(.*?)\])?$/) // ["[abc]", "", "abc"]
What about this one : /(.*?)(?:\[([^\[]*?)\])?$/?
'aze[[[rty]'.match(/(.*?)(?:\[([^\[]*?)\])?$/) // ["aze[[[rty]", "aze[[", "rty"]
Last try : /(.+?)(?:\[([^\[]*?)\])?$/.
test            result
-------------------------------------------
''              null
'aze'           ["aze", "aze", undefined]
'[rty]'         ["[rty]", "[rty]", undefined]
'aze[rty]'      ["aze[rty]", "aze", "rty"]
'aze[]'         ["aze[]", "aze", ""]
'aze[][rty]'    ["aze[][rty]", "aze[]", "rty"]
'aze[[]rty]'    ["aze[[]rty]", "aze[", "]rty"]
Here's what I need in what I guess must be the right order:
The contents of each square-bracketed section of the string (each of which comes after the rest of the original string) need to be extracted and stored, and the original string returned without them.
If there is a recognized string followed by a colon at the start of a given extracted section, then I need that identified and removed.
For what's left (comma delimited), I need it dumped into an array.
Do not attempt to parse nested brackets.
What is a good way to do this?
Edit: Here's an example of a string:
hi, i'm a string [this: is, how] [it: works, but, there] [might be bracket, parts, without, colons ] [[nested sections should be ignored?]]
Edit: Here's what might be the results:
After extraction: 'hi, i'm a string'
Array identified as 'this': ['is', 'how']
Array identified as 'it': ['works', 'but', 'there']
Array identified without a label: ['might be bracket', 'parts', 'without', 'colons']
Array identified without a label: []
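// s holds the input string
var results = [];
s = s.replace(/\[+(?:(\w+):)?(.*?)\]+/g,
  function(g0, g1, g2){
    // g1 is undefined in modern engines when there is no label; store "" instead
    results.push([g1 || "", g2.split(',')]);
    return "";
  });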
Gives the results:
>> results =
[["this", [" is", " how"]],
["it", [" works", " but", " there"]],
["", ["might be bracket", " parts", " without", " colons "]],
["", ["nested sections should be ignored?"]]
]
>> s = "hi, i'm a string "
Note that it leaves the spaces around tokens in place. As written, this code captures [[...]] sections like any other group; to ignore them, remove them at an earlier stage by calling s = s.replace(/\[\[.*?\]\]/g, '');.
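Putting those notes together, here is a sketch that drops [[...]] sections first, trims each item, and returns the cleaned string. The function name extractBracketSections, the { label, items } result shape, and the decision to skip nested sections entirely are assumptions, not part of the question:

```javascript
// extractBracketSections is a hypothetical name used only in this sketch.
function extractBracketSections(input) {
  const sections = [];
  // Assumption: double-bracketed sections are discarded, not reported.
  const withoutNested = input.replace(/\[\[.*?\]\]/g, "");
  const rest = withoutNested.replace(/\[(?:(\w+):)?([^\[\]]*)\]/g, (whole, label, body) => {
    // Split on commas, trim each item, and drop empty items.
    const items = body.split(",").map((item) => item.trim()).filter((item) => item !== "");
    sections.push({ label: label ?? null, items });
    return "";
  });
  return { text: rest.trim(), sections };
}

const parsed = extractBracketSections(
  "hi, i'm a string [this: is, how] [it: works, but, there] " +
  "[might be bracket, parts, without, colons ] [[nested sections should be ignored?]]"
);
console.log(parsed.text); // hi, i'm a string
console.log(parsed.sections);
```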