Extract strings between occurrences of a specific character - javascript

I'm attempting to extract strings between occurrences of a specific character in a larger string.
For example:
The initial string is:
var str = "http://www.google.com?hello?kitty?test";
I want to be able to store all of the substrings between the question marks as their own variables, such as "hello", "kitty" and "test".
How would I target substrings between different indexes of a specific character using either JavaScript or Regular Expressions?

You could split on ? and call slice(1) to drop the first element (the part before the first question mark).
That gives you an array with your values. If you want separate variables, you can read each value by its index, for example var1 = parts[0]:
var str = "http://www.google.com?hello?kitty?test";
var parts = str.split('?').slice(1);
console.log(parts);
var var1 = parts[0],
var2 = parts[1],
var3 = parts[2];
console.log(var1);
console.log(var2);
console.log(var3);

Quick note: that URL is unconventional. A question mark ? denotes the beginning of a query string, and key/value pairs are generally provided in the form key=value and delimited with an ampersand &.
That being said, if this isn't a problem then why not split on the question mark to obtain an array of values?
var split_values = str.split('?');
//result: [ 'http://www.google.com', 'hello', 'kitty', 'test' ]
Then you could simply grab the individual values from the array, skipping the first element.

I believe this will do it:
var components = "http://www.google.com?hello?kitty?test".split("?");
components.slice(1-components.length) // Returns: [ "hello", "kitty", "test" ]

using Regular Expressions
var reg = /\?([^\?]+)/g;
var s = "http://www.google.com?hello?kitty?test";
var results = null;
while( results = reg.exec(s) ){
console.log(results[1]);
}
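The exec loop above can also be wrapped in a small reusable function. The following is only a sketch: the name segmentsAfter is hypothetical, and it assumes an engine that supports String#matchAll (ES2020).

```javascript
// Hypothetical helper: returns every non-empty run of characters that follows `ch`.
function segmentsAfter(str, ch) {
  // Escape the character so it is treated literally inside the pattern.
  const esc = ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(esc + "([^" + esc + "]+)", "g");
  // Each match is an array; index 1 holds the captured segment.
  return Array.from(str.matchAll(re), (m) => m[1]);
}

const segments = segmentsAfter("http://www.google.com?hello?kitty?test", "?");
console.log(segments); // ["hello", "kitty", "test"]
```

Empty segments (two delimiters in a row) are skipped by this sketch because the capture group requires at least one character.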

The general case is to use RegExp:
var regex1 = new RegExp(/\?.*?(?=\?|$)/,'g'); regex1.lastIndex=0;
str.match(regex1)
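Note that this will also get you the leading ? in each clause. (Look-behind was not available in JavaScript regular expressions before ES2018; engines that support it can use (?<=\?) to leave the ? out of the match.)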
Alternatively you can use the sticky flag and run it in a loop:
var regex1 = new RegExp(/.*?\?(.*?)(?=\?|$)/,'y'); regex1.lastIndex=0;
while(str.match(regex1)) {...}
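In engines that support look-behind assertions (ES2018 and later), the leading ? can be kept out of each match. This is a minimal sketch under that assumption, not a replacement for the patterns above; the variable names are illustrative only.

```javascript
const url = "http://www.google.com?hello?kitty?test";
// (?<=\?) asserts that a "?" precedes the match without including it.
const clauses = url.match(/(?<=\?)[^?]+/g) || [];
console.log(clauses); // ["hello", "kitty", "test"]
```

The `|| []` fallback is there because match returns null, not an empty array, when nothing matches.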

You can take the substring starting from the first question mark, then split by question mark
const str = "http://www.google.com?hello?kitty?test";
const matches = str.substring(str.indexOf('?') + 1).split(/\?/g);
console.log(matches);

Related

Extract part of a string which start with a certain word in Javascript

I have the following string
"sis":4,"sct":15,"ssu":"89c4eef0-3a0d-47ae-a97f-42adafa7cf8f","ssv":384,"siw":96554,"scx":1049,
I need to get the string after "ssu":". The result should be 89c4eef0-3a0d-47ae-a97f-42adafa7cf8f. How do I do this in JavaScript in a very simple way? I am thinking of collecting the 36 characters after "ssu":".
You could build a valid JSON string and parse it and get the wanted property ssu.
var string = '"sis":4,"sct":15,"ssu":"89c4eef0-3a0d-47ae-a97f-42adafa7cf8f","ssv":384,"siw":96554,"scx":1049,',
object = JSON.parse(`{${string.slice(0, -1)}}`), // slice for removing the last comma
ssu = object.ssu;
console.log(ssu);
One solution would be to use the following regular expression:
/\"ssu\":\"([\w-]+)\"/
This pattern basically means:
\"ssu\":\" , start searching from the first instance of "ssu":"
([\w-]+) , collect a "group" of one or more word characters \w and hyphens -
\", look for a " at the end of the group
Using a group allows you to extract the portion of the matched pattern that is of interest to you via the String#match method; in your case that is the GUID captured by ([\w-]+).
A working example of this would be:
const str = `"sis":4,"sct":15,"ssu":"89c4eef0-3a0d-47ae-a97f-42adafa7cf8f","ssv":384,"siw":96554,"scx":1049,`
const value = str.match(/\"ssu\":\"([\w-]+)\"/)[1]
console.log(value);
Update: Extract multiple groupings that occur in the string
To extract values for multiple occurrences of the "ssu" key in your input string, you could use the String#matchAll() method as shown:
const str = `"sis":4,"sct":15,"ssu":"89c4eef0-3a0d-47ae-a97f-42adafa7cf8f","ssv":384,"siw":96554,"scx":1049,"ssu":"value-of-second-ssu","ssu":"value-of-third-ssu"`;
const values =
/* Obtain array of matches for pattern */
[...str.matchAll(/\"ssu\":\"([\w-]+)\"/g)]
/* Extract only the value from pattern group */
.map(([,value]) => value);
console.log(values);
Note that for this to work as expected, the /g flag must be added to the end of the original pattern. Hope that helps!
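Use this regExp: /(?<="ssu":")[\w-]+/ . The look-behind (?<="ssu":") anchors the match to the key without including it in the result (it requires an ES2018+ engine).
const str = '"sis":4,"sct":15,"ssu":"89c4eef0-3a0d-47ae-a97f-42adafa7cf8f","ssv":384,"siw":96554,"scx":1049,';
const re = /(?<="ssu":")[\w-]+/;
const res = str.match(re)[0];
console.log(res);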
You can use regular expressions.
var str = '"sis":4,"sct":15,"ssu":"89c4eef0-3a0d-47ae-a97f-42adafa7cf8f","ssv":384,"siw":96554,"scx":1049,'
var minhaRE = new RegExp("[a-z0-9]*-[a-z0-9-]*"); // first token that contains a hyphen
minhaRE.exec(str)
Output: Array [ "89c4eef0-3a0d-47ae-a97f-42adafa7cf8f" ]
Looks almost like a JSON string.
So with a small change it can be parsed to an object.
var str = '"sis":4,"sct":15,"ssu":"89c4eef0-3a0d-47ae-a97f-42adafa7cf8f","ssv":384,"siw":96554,"scx":1049, ';
var obj = JSON.parse('{'+str.replace(/[, ]+$/,'')+'}');
console.log(obj.ssu)
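The JSON-based approach can be made a little more defensive. The sketch below assumes the fragment really is a comma-separated list of JSON members; the helper name extractValue is hypothetical, and it returns null instead of throwing when the wrapped text does not parse.

```javascript
// Hypothetical helper: returns the value stored under `key`, or null.
function extractValue(fragment, key) {
  // Strip trailing commas/whitespace and wrap in braces to form a JSON object.
  const json = "{" + fragment.replace(/[,\s]+$/, "") + "}";
  try {
    const obj = JSON.parse(json);
    return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : null;
  } catch (err) {
    return null; // the fragment was not valid JSON after wrapping
  }
}

const fragment = '"sis":4,"sct":15,"ssu":"89c4eef0-3a0d-47ae-a97f-42adafa7cf8f","ssv":384,';
console.log(extractValue(fragment, "ssu")); // "89c4eef0-3a0d-47ae-a97f-42adafa7cf8f"
```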

Get all characters except hyphen and brackets from string using javascript regex

I have a string like this:
var myString = "MyString-[ADDAAD]-isGreat";
I want to extract this string into 3 parts:
var stringOne = "MyString-";
var stringTwo = "ADDAAD";
var stringThree = "-isGreat";
I know how to get the string between the two square brackets:
var matches = myString.match(/\[(.*?)\]/);
now matches[1] contains ADDAAD
But how can I get the other two parts?
Select every run of characters other than -, [ and ] using the regex below. Note that the hyphens themselves are dropped from the results.
var myString = "MyString-[ADDAAD]-isGreat";
var parts = myString.match(/[^-\[\]]+/g);
console.log(parts);
If you want to store the values in separate variables, use the code below:
var stringOne = parts[0];
var stringTwo = parts[1];
var stringThree = parts[2];
You may split the string with your regex. Note that all the capturing group contents will be also part of the resulting array. To avoid empty items, you may add .filter(Boolean) after split().
See a JS demo below:
var myString = "MyString-[ADDAAD]-isGreat";
console.log(myString.split(/\[(.*?)]/).filter(Boolean));
console.log("s1-[s2]".split(/\[(.*?)]/).filter(Boolean));
Note you do not have to escape a ] used outside character classes, it is always parsed as a literal closing bracket if there is no corresponding [ before it.
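If the three parts should keep their hyphens exactly as in the question, a single match with three capture groups is another option. This is only a sketch; it assumes the input contains exactly one bracketed section, and the variable name input is illustrative.

```javascript
const input = "MyString-[ADDAAD]-isGreat";
// Group 1: text before "[", group 2: text inside the brackets, group 3: the rest.
const m = input.match(/^(.*?)\[(.*?)\](.*)$/);
// Fall back to empty strings when the input has no bracketed section.
const [stringOne, stringTwo, stringThree] = m ? m.slice(1) : ["", "", ""];
console.log(stringOne, stringTwo, stringThree); // MyString- ADDAAD -isGreat
```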

Separate value from string using javascript

I have a string in which every value is between [] and it has a . at the end. How can I separate all values from the string?
This is the example string:
[value01][value02 ][value03 ]. [value04 ]
//want something like this
v1 = value01;
v2 = value02;
v3 = value03;
v4 = value04
The number of values is not constant. How can I get all values separately from this string?
Use regular expressions to specify multiple separators. Please check the following posts:
How do I split a string with multiple separators in javascript?
Split a string based on multiple delimiters
var str = "[value01][value02 ][value03 ]. [value04 ]"
var arr = str.split(/[\[\]\.\s]+/);
arr.shift(); arr.pop(); //discard the first and last "" elements
console.log( arr ); //output: ["value01", "value02", "value03", "value04"]
How This Works
.split(/[\[\]\.\s]+/) splits the string wherever it finds one or more of the following characters: [, ], ., or whitespace. Since these characters are also found at the beginning and end of the string, .shift() discards the first element and .pop() discards the last element, both of which are empty strings. Alternatively, you may want to use .filter(), in which case you can replace lines 2 and 3 with:
var arr = str.split(/[\[\]\.\s]+/).filter(function(elem) { return elem.length > 0; });
Now you can use jQuery/JS to iterate through the values:
$.each( arr, function(i,v) {
console.log( v ); // outputs the i'th value;
});
And arr.length will give you the number of elements you have.
If you want to get the characters between "[" and "]" and the data is regular and always has the pattern:
'[chars][chars]...[chars]'
then you can get the chars using match to get sequences of characters that aren't "[" or "]":
var values = '[value01][value02 ][value03 ][value04 ]'.match(/[^\[\]]+/g)
which returns an array, so values is:
["value01", "value02 ", "value03 ", "value04 "]
Match is very widely supported, so no cross browser issues.
Here's a fiddle: http://jsfiddle.net/5xVLQ/
Regex pattern: /(\w)+/ig
Matches all words using \w (letters, digits and underscore). Whitespace, dots and square brackets are all non-matching, so they don't get returned.
What I do is create a object to hold results in key/value pairs such as v1:'value01'. You can iterate through this object, or you can access the values directly using objRes.v1
var str = '[value01][value02 ][value03 ]. [value04 ]';
var myRe = /(\w)+/ig;
var res;
var objRes = {};
var i=1;
while ( ( res = myRe.exec(str) ) != null )
{
objRes['v'+i] = res[0];
i++;
}
console.log(objRes);
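When only the text inside the brackets matters, the brackets themselves can be matched and their contents captured. The following is a sketch assuming String#matchAll is available (ES2020); the variable names are illustrative.

```javascript
const raw = "[value01][value02 ][value03 ]. [value04 ]";
// Capture whatever sits between "[" and "]", then trim surrounding whitespace.
const bracketValues = Array.from(raw.matchAll(/\[([^\]]*)\]/g), (m) => m[1].trim());
console.log(bracketValues); // ["value01", "value02", "value03", "value04"]
```

Unlike the \w-based pattern, this keeps values that contain spaces or punctuation in one piece.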

How to remove the last matched regex pattern in javascript

I have a text which goes like this...
var string = '~a=123~b=234~c=345~b=456'
I need to extract the string such that it splits into
['~a=123~b=234~c=345','']
That is, I need to split the string with /b=.*/ pattern but it should match the last found pattern. How to achieve this using RegEx?
Note: The numbers present after the equal is randomly generated.
Edit:
The above one was just an example. I did not make the question clear I guess.
Generalized String being...
<word1>=<random_alphanumeric_word>~<word2>=<random_alphanumeric_word>..~..~..<word2>=<random_alphanumeric_word>
All parts have random length and every word_i is alphabetic; the length of the whole string is not fixed. The only known text is <word2>. Hence I need a RegEx for it, the pattern being /<word2>=.*/
This doesn't sound like a job for regexen considering that you want to extract a specific piece. Instead, you can just use lastIndexOf to split the string in two:
var str = '~a=123~b=234~c=345~b=456';
var lio = str.lastIndexOf('b=');
var arr = [];
arr[0] = str.substr(0, lio); // "~a=123~b=234~c=345~"
arr[1] = str.substr(lio);    // "b=456"
http://jsfiddle.net/NJn6j/
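To get exactly the two-element result from the question, the lastIndexOf idea can be extended to drop the last pair together with its leading ~. This is a sketch that assumes every pair is preceded by ~ as in the example string; the helper name splitAroundLast is hypothetical.

```javascript
// Hypothetical helper: splits `str` around the last "~key=value" pair.
function splitAroundLast(str, key) {
  const marker = "~" + key + "=";
  const start = str.lastIndexOf(marker);
  if (start === -1) return [str]; // key not present
  // The pair ends at the next "~" or at the end of the string.
  const next = str.indexOf("~", start + 1);
  const end = next === -1 ? str.length : next;
  return [str.slice(0, start), str.slice(end)];
}

console.log(splitAroundLast("~a=123~b=234~c=345~b=456", "b"));
// ["~a=123~b=234~c=345", ""]
```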
I don't think I'd personally use a regex for this type of problem, but you can extract the last option pair with a regex like this:
var str = '~a=123~b=234~c=345~b=456';
var matches = str.match(/^(.*)~([^=]+=[^=]+)$/);
// matches[1] = "~a=123~b=234~c=345"
// matches[2] = "b=456"
Demo: http://jsfiddle.net/jfriend00/SGMRC/
This assumes the format is (~, alphanumeric name, =, and numbers) repeated an arbitrary number of times. The most important assumption here is that ~ appears once for each name-value pair and does not appear in the name.
You can remove the last token by a simple replacement:
str.replace(/(.*)~.*/, '$1')
This works by using the greedy property of * to force it to match the last ~ in the input.
This can also be achieved with lastIndexOf, since you only need to know the index of the last ~:
str.substring(0, (str.lastIndexOf('~') + 1 || str.length + 1) - 1)
(Well, I don't know if the code above is good JS or not... I would rather write in a few lines. The above is just for showing one-liner solution).
A RegExp that gives a result you could use is:
string.match(/[a-z]*?=(.*?((?=~)|$))/gi);
// ["a=123", "b=234", "c=345", "b=456"]
But in your case the simplest solution is to split the string before extracting the content:
var results = string.split('~'); // ["", "a=123", "b=234", "c=345", "b=456"]
Now it is easy to extract each key and value and add them to an object:
var myObj = {};
results.forEach(function (item) {
if(item) {
var r = item.split('=');
if (!myObj[r[0]]) {
myObj[r[0]] = [r[1]];
} else {
myObj[r[0]].push(r[1]);
}
}
});
console.log(myObj);
Object:
a: ["123"]
b: ["234", "456"]
c: ["345"]
(?=.*(~b=[^~]*))\1
will get it done in one match, but if there are duplicate entries it will go to the first. Performance also isn't great and if you string.replace it will destroy all duplicates. It would pass your example, but against '~a=123~b=234~c=345~b=234' it would go to the first 'b=234'.
.*(~b=[^~]*)
will run a lot faster, but it requires another step because the match comes out in a group:
var re = /.*(~b=[^~]*)/.exec(string);
var result = re[1]; //~b=234
var array = string.split(re[1]);
This method will also have the same problem with exact duplicates. Another option is:
var regex = /.*(~b=[^~]*)/g;
var re = regex.exec(string);
var result = re[1];
// if you want an array from either side of the string:
var array = [string.slice(0, regex.lastIndex - re[1].length), string.slice(regex.lastIndex, string.length)];
This actually finds the exact location of the last match and removes it. regex.lastIndex - re[1].length is the index where the captured group (including its leading ~) begins, so slicing up to it removes the tilde as well. For '~a=123~b=234~c=345~b=456' the result is ['~a=123~b=234~c=345', ''].

Javascript split only once and ignore the rest

I am parsing some key value pairs that are separated by colons. The problem I am having is that in the value section there are colons that I want to ignore but the split function is picking them up anyway.
sample:
Name: my name
description: this string is not escaped: i hate these colons
date: a date
On the individual lines I tried this line.split(/:/, 1) but it only matched the value part of the data. Next I tried line.split(/:/, 2) but that gave me ['description', 'this string is not escaped'] and I need the whole string.
Thanks for the help!
a = line.split(/:/);
key = a.shift();
val = a.join(':');
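Use a capturing group with a greedy .+ so that everything after the first ": " stays in one piece (the trailing ? only makes the group optional), and limit the result to two elements.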
line.split(/: (.+)?/, 2);
If you prefer an alternative to regexp consider this:
var split = line.split(':');
var key = split[0];
var val = split.slice(1).join(":");
Reference: split, slice, join.
Slightly more elegant:
a = line.match(/(.*?):(.*)/);
key = a[1];
val = a[2];
Maybe this approach is the best for this purpose:
var a = line.match(/([^:\s]+)\s*:\s*(.*)/);
var key = a[1];
var val = a[2];
This way you can use tabs in config/data files with this structure and not worry about spaces before or after the name-value delimiter ':'.
Or you can use primitive and fast string functions indexOf and substr to reach your goal in, I think, the fastest way (by CPU and RAM)
for ( ... line ... ) {
var delimPos = line.indexOf(':');
if (delimPos <= 0) {
continue; // Something wrong with this "line"
}
var key = line.substr(0, delimPos).trim();
var val = line.substr(delimPos + 1).trim();
// Do all you need with this key: val
}
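The loop above leaves the iteration over lines open. One way to fill it in is sketched below; the helper name parseKeyValueLines is hypothetical, and it assumes the input is a single string with one "key: value" pair per line.

```javascript
// Hypothetical helper: turns "key: value" lines into an object.
function parseKeyValueLines(text) {
  const result = {};
  for (const line of text.split(/\r?\n/)) {
    const delimPos = line.indexOf(":");
    if (delimPos <= 0) continue; // no key before the colon, skip the line
    const key = line.slice(0, delimPos).trim();
    // Everything after the first colon is the value, later colons included.
    result[key] = line.slice(delimPos + 1).trim();
  }
  return result;
}

const config = parseKeyValueLines(
  "Name: my name\ndescription: this string is not escaped: i hate these colons\ndate: a date"
);
console.log(config.description); // "this string is not escaped: i hate these colons"
```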
Split string in two at first occurrence
To split a string that contains multiple colons (:) only at the first colon,
use a positive lookbehind (?<=)
const a = "Description: this: is: nice";
const b = "Name: My Name";
console.log(a.split(/(?<=^[^:]*):/)); // ["Description", " this: is: nice"]
console.log(b.split(/(?<=^[^:]*):/)); // ["Name", " My Name"]
It basically consumes, from the start of the string ^, everything that is not a colon [^:], zero or more times *. Once the positive lookbehind is satisfied, it finally matches the colon :.
If you additionally want to remove any spaces following the colon,
use /(?<=^[^:]*): */
Explanation on Regex101.com
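function splitOnce(str, sep) {
const idx = str.indexOf(sep);
if (idx === -1) return [str]; // separator not found: nothing to split
return [str.slice(0, idx), str.slice(idx + sep.length)]; // also works for multi-character separators
}
splitOnce("description: this string is not escaped: i hate these colons", ":")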
