How to split a string based on space and a limit? - javascript

For example, I have a string:
var s = "Tyrannosaurus Rex";
My limit is 4, so I want to split this string into an array of substrings of at most 4 characters each. That can be done using:
var arr = s.match(/.{1,4}/g);
But the troubling thing is that it should also treat a space or \n as a split point, so that the final output is:
["Tyra", "nnos", "auru", "s", "Rex"]
and not this:
["Tyra", "nnos", "auru", "s Re", "x"]
Any clean solution would be helpful!

Simply modify your regex to not match spaces or new lines:
var s = "Tyrannosaurus Rex";
console.log(s.match(/[^ \n]{1,4}/g));

Use /\S{1,4}/g. \S matches any character that is not whitespace, so spaces, tabs and newlines all act as split points.
var s = "Tyrannosaurus Rex";
console.log(s.match(/\S{1,4}/g));
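If the limit needs to vary, the same pattern can be built at run time. The following is a minimal sketch of that idea; the helper name chunkWords is hypothetical and does not come from the answers above.

```javascript
// Hypothetical helper: split `text` into chunks of at most `limit`
// non-whitespace characters, also breaking at any whitespace.
function chunkWords(text, limit) {
  // Build the \S{1,limit} pattern from the numeric limit.
  const pattern = new RegExp(`\\S{1,${limit}}`, "g");
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(pattern) || [];
}

const chunks = chunkWords("Tyrannosaurus Rex", 4);
console.log(chunks); // ["Tyra", "nnos", "auru", "s", "Rex"]
```

The `|| []` fallback means an empty or all-whitespace input yields an empty array rather than null.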

Start by splitting it into words then applying your regexp:
var s = "Tyrannosaurus Rex";
var words = s.split(' ');
// match() returns null for an empty word (e.g. after a double space), so fall back to []
var arr = [].concat(...words.map(w => w.match(/.{1,4}/g) || []));
console.log(arr);
EDIT: Leaving this here, but @le_m's answer is provably better.

Related

JavaScript get first name and last name from string as array

I have a string that has the following format: <strong>FirstName LastName</strong>
How can I change this into an array with the first element firstName and second lastName?
I did this, but no luck, it won't produce the right result:
var data = [myString.split('<strong>')[1], myString.split('<strong>')[2]]
How can I produce ["firstName", "lastName"] for any string with that format?
In order to parse HTML, use the best HTML parser out there, the DOM itself!
// create a random element, it doesn't have to be 'strong' (e.g., it could be 'div')
var parser = document.createElement('strong');
// set the innerHTML to your string
parser.innerHTML = "<strong>FirstName LastName</strong>";
// get the text inside the element ("FirstName LastName")
var fullName = parser.textContent;
// split it into an array, separated by the space in between FirstName and LastName
var data = fullName.split(" ");
// voila!
console.log(data);
EDIT
As @RobG pointed out, you could also explicitly use a DOM parser rather than that of an element:
var parser = new DOMParser();
var doc = parser.parseFromString("<strong>FirstName LastName</strong>", "text/html");
console.log(doc.body.textContent.split(" "));
However, both methods work perfectly fine; it all comes down to preference.
Just match everything between <strong> and </strong>.
var matches = "<strong>FirstName LastName</strong>".match(/<strong>(.*)<\/strong>/);
console.log(matches[1].split(' '));
The preferred approach would be to use DOM methods: create an element, get its .textContent, and then match one or more word characters or split on the space character.
let str = '<strong>FirstName LastName</strong>';
let [,first, last] = str.split(/<[/\w\s-]+>|\s/g);
console.log(first, last);
/<[/\w\s-]+>|\s/g
Splits on < followed by one or more slash, word, space or dash characters followed by >, or on a whitespace character, which matches the space between the words in the string.
The leading comma in the destructuring pattern is an elision: it skips the first element of the .split() result ["", "FirstName", "LastName", ""], which is an empty string.
This is my approach to your problem. Hope it helps!
var str = "<strong>FirstName LastName</strong>";
var result = str.slice(0, -9).substr(8).split(" ");
Edit: it will only work for this specific example.
Another way to do this, in case you have something other than HTML:
var string = "<strong>FirstName LastName</strong>";
string = string.slice(0, -9); // remove last 9 chars
string = string.substr(8); // remove first 8 chars
string = string.split(" "); // split into an array at space
console.log(string);

How to match a number at the start of a string

I would like to match a number at the start of each string:
1000_lang sorting_1 ghhgf_1002
1001_lang
100_abcdefg_sgdga_10001_321gg hjdshjdg
So, I will have numbers: 1000, 1001, 100 respectively. Basically, I want to match a number from a string until that number meets first underscore. But numbers can be any length, so if it is 12345_eyquyewuq_32136 df_1999 I need 12345. Don't need any other numbers coming after the first underscore.
^\d+
Get all numbers from the start of the line up to the first non-number
str = "123456_wibble";
patt = /^\d+/;
result = str.match( patt);
result is an array of matches (or null if nothing matched), so as long as it is not null, you've found something
See Mozilla Regular Expressions
This answer is JavaScript only, but it may be useful if you don't care about regex:
var str = "1000_lang sorting_1 ghhgf_1002";
var result = str.split("_")[0];
result will hold the first number.
Something like this....
var str = '1000_lang sorting_1 ghhgf_1002',
matches = str.match(/^\d+/)
console.log(matches)
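Because match() and exec() return null when a line does not start with a digit, it can help to wrap the lookup. A minimal sketch, assuming a hypothetical helper named leadingNumber:

```javascript
// Hypothetical helper: return the run of digits at the start of `line`,
// or null when the line does not begin with a digit.
function leadingNumber(line) {
  const match = /^\d+/.exec(line);
  return match ? match[0] : null;
}

const lines = [
  "1000_lang sorting_1 ghhgf_1002",
  "1001_lang",
  "100_abcdefg_sgdga_10001_321gg hjdshjdg",
];
console.log(lines.map(leadingNumber)); // ["1000", "1001", "100"]
console.log(leadingNumber("lang_1000")); // null
```

Note that the split("_")[0] approach would return "lang" for the last input, since it does not check that the first piece is numeric.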

How to remove the last matched regex pattern in javascript

I have a text which goes like this...
var string = '~a=123~b=234~c=345~b=456'
I need to extract the string such that it splits into
['~a=123~b=234~c=345','']
That is, I need to split the string with /b=.*/ pattern but it should match the last found pattern. How to achieve this using RegEx?
Note: The numbers present after the equal is randomly generated.
Edit:
The above one was just an example. I did not make the question clear I guess.
Generalized String being...
<word1>=<random_alphanumeric_word>~<word2>=<random_alphanumeric_word>..~..~..<word2>=<random_alphanumeric_word>
All have random length and all wordi are alphabets, the whole string length is not fixed. the only text known would be <word2>. Hence I needed RegEx for it and pattern being /<word2>=.*/
This doesn't sound like a job for regexen considering that you want to extract a specific piece. Instead, you can just use lastIndexOf to split the string in two:
var lio = str.lastIndexOf('b=');
var arr = [];
arr[0] = str.substr(0, lio); // everything before the last 'b='
arr[1] = str.substr(lio); // the last 'b=...' pair
http://jsfiddle.net/NJn6j/
I don't think I'd personally use a regex for this type of problem, but you can extract the last option pair with a regex like this:
var str = '~a=123~b=234~c=345~b=456';
var matches = str.match(/^(.*)~([^=]+=[^=]+)$/);
// matches[1] = "~a=123~b=234~c=345"
// matches[2] = "b=456"
Demo: http://jsfiddle.net/jfriend00/SGMRC/
Assuming the format is (~, alphanumeric name, =, and numbers) repeated arbitrary number of times. The most important assumption here is that ~ appear once for each name-value pair, and it doesn't appear in the name.
You can remove the last token by a simple replacement:
str.replace(/(.*)~.*/, '$1')
This works by using the greedy property of * to force it to match the last ~ in the input.
This can also be achieved with lastIndexOf, since you only need to know the index of the last ~:
str.substring(0, (str.lastIndexOf('~') + 1 || str.length + 1) - 1)
(Well, I don't know if the code above is good JS or not... I would rather write in a few lines. The above is just for showing one-liner solution).
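Written over a few lines, the lastIndexOf idea might look like the sketch below. The function name dropLastPair is hypothetical, and the sketch keeps the same assumption that every name-value pair is introduced by exactly one ~.

```javascript
// Hypothetical helper: remove the last "~name=value" pair from `str`.
function dropLastPair(str) {
  const lastTilde = str.lastIndexOf("~");
  // lastIndexOf returns -1 when there is no "~"; leave the string unchanged then.
  return lastTilde === -1 ? str : str.substring(0, lastTilde);
}

console.log(dropLastPair("~a=123~b=234~c=345~b=456")); // "~a=123~b=234~c=345"
console.log(dropLastPair("no-tilde-here")); // "no-tilde-here"
```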
A RegExp that gives a result you could use is:
string.match(/[a-z]*?=(.*?((?=~)|$))/gi);
// ["a=123", "b=234", "c=345", "b=456"]
But in your case the simplest solution is to split the string before extracting the content:
var results = string.split('~'); // ["", "a=123", "b=234", "c=345", "b=456"]
Now it is easy to extract each key and value and add them to an object:
var myObj = {};
results.forEach(function (item) {
if(item) {
var r = item.split('=');
if (!myObj[r[0]]) {
myObj[r[0]] = [r[1]];
} else {
myObj[r[0]].push(r[1]);
}
}
});
console.log(myObj);
Object:
a: ["123"]
b: ["234", "456"]
c: ["345"]
(?=.*(~b=[^~]*))\1
will get it done in one match, but if there are duplicate entries it will go to the first. Performance also isn't great and if you string.replace it will destroy all duplicates. It would pass your example, but against '~a=123~b=234~c=345~b=234' it would go to the first 'b=234'.
.*(~b=[^~]*)
will run a lot faster, but it requires another step because the match comes out in a group:
var re = /.*(~b=[^~]*)/.exec(string);
var result = re[1]; //~b=234
var array = string.split(re[1]);
This method will also have the same problem with exact duplicates. Another option is:
var regex = /.*(~b=[^~]*)/g;
var re = regex.exec(string);
var result = re[1];
// if you want an array from either side of the string:
var array = [string.slice(0, regex.lastIndex - re[1].length), string.slice(regex.lastIndex, string.length)];
This finds the exact location of the last match and removes it. Because the pattern has the g flag, regex.lastIndex is the index just past the match, so regex.lastIndex - re[1].length is the index of the leading ~ of the last pair, which is where the first slice ends.

How to Split string with multiple rules in javascript

I have this string for example:
str = "my name is john#doe oh.yeh";
the end result I am seeking is this Array:
strArr = ['my','name','is','john','&#doe','oh','&yeh'];
which means 2 rules apply:
split after each space " " (I know how)
if there are special characters ("." or "#") then also split but add the character "&" before the word with the special character.
I know I can strArr = str.split(" ") for the first rule. but how do I do the other trick?
thanks,
Alon
Assuming the result should be '&doe' and not '&#doe', a simple solution is to replace every . and # with ' &' (a space followed by &) and then split on whitespace:
strArr = str.replace(/[.#]/g, ' &').split(/\s+/)
/\s+/ matches consecutive white spaces instead of just one.
If the result should be '&#doe' and '&.yeh', use the same regex with a capture group:
strArr = str.replace(/([.#])/g, ' &$1').split(/\s+/)
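As a rough illustration, the replace-then-split idea can be wrapped in a function that also tolerates leading or trailing whitespace. The name splitWithMarkers is hypothetical, and the sketch assumes "." and "#" are the only special characters.

```javascript
// Hypothetical helper: split on whitespace, and also before "." or "#",
// prefixing each such piece with "&".
function splitWithMarkers(str) {
  return str
    .replace(/([.#])/g, " &$1") // "john#doe" becomes "john &#doe"
    .split(/\s+/)
    .filter(Boolean); // drop empty strings caused by leading/trailing whitespace
}

console.log(splitWithMarkers("my name is john#doe oh.yeh"));
// ["my", "name", "is", "john", "&#doe", "oh", "&.yeh"]
```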
You have to use a Regular expression, to match all special characters at once. By "special", I assume that you mean "no letters".
var pattern = /([^ a-z]?)[a-z]+/gi; // Pattern
var str = "my name is john#doe oh.yeh"; // Input string
var strArr = [], match; // output array, temporary var
while ((match = pattern.exec(str)) !== null) { // <-- For each match
strArr.push( (match[1]?'&':'') + match[0]); // <-- Add to array
}
// strArr is now:
// strArr = ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
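It does not match consecutive special characters; the pattern has to be modified for that. E.g., to include all consecutive special characters, use ([^ a-z]*) for the group.
Also, it does not include a trailing special character that is not followed by letters. Changing [a-z]+ to [a-z]* would allow that, but the pattern could then match the empty string, so the loop would also have to advance pattern.lastIndex on empty matches to avoid running forever.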
use split() method. That's what you need:
http://www.w3schools.com/jsref/jsref_split.asp
OK, I see you already found it. I think you can:
1) first use split on the whitespace
2) iterate through your array and split each member again when you find # or .
3) iterate through your array again and apply str.replace("#", "&#") and str.replace(".", "&.") where those characters occur
I would think a combination of split() and replace() is what you are looking for:
str = "my name is john#doe oh.yeh";
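// a regex literal is needed here: the string '\W' would only match a literal "W",
// and \W alone would also match the spaces
strArr = str.replace(/[^\w\s]/g, ' &');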
strArr = strArr.split(' ');
That should be close to what you asked for.
This works:
array = string.replace(/#|\./g, ' &$&').split(' ');
Take a look at demo here: http://jsfiddle.net/M6fQ7/1/

Remove leading comma from a string

I have the following string:
",'first string','more','even more'"
I want to transform this into an Array but obviously this is not valid due to the first comma. How can I remove the first comma from my string and make it a valid Array?
I’d like to end up with something like this:
myArray = ['first string','more','even more']
To remove the first character you would use:
var myOriginalString = ",'first string','more','even more'";
var myString = myOriginalString.substring(1);
I'm not sure this will be the result you're looking for though because you will still need to split it to create an array with it. Maybe something like:
var myString = myOriginalString.substring(1);
var myArray = myString.split(',');
Keep in mind, the ' character will be a part of each string in the split here.
In this specific case (there is always a single character at the start you want to remove) you'll want:
str.substring(1)
However, if you want to be able to detect if the comma is there and remove it if it is, then something like:
if (str[0] == ',') {
str = str.substring(1);
}
One-liner
str = str.replace(/^,/, '');
I'll be back.
var s = ",'first string','more','even more'";
var array = s.split(',').slice(1);
That's assuming the string you begin with is in fact a String, like you said, and not an Array of strings.
Assuming the string is called myStr:
// Strip start and end quotation mark and possible initial comma
myStr=myStr.replace(/^,?'/,'').replace(/'$/,'');
// Split stripping quotations
myArray=myStr.split("','");
Note that if a string can be missing in the list without even having its quotation marks present and you want an empty spot in the corresponding location in the array, you'll need to write the splitting manually for a robust solution.
var s = ",'first string','more','even more'";
s.split(/'?,'?/).filter(function(v) { return v; });
Results in:
["first string", "more", "even more'"]
First split with commas possibly surrounded by single quotes,
then filter the non-truthy (empty) parts out.
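An alternative that avoids the stray trailing quote is to extract the quoted items directly instead of splitting. This is only a sketch: the helper name parseQuotedList is hypothetical, and it assumes no item contains a single quote itself.

```javascript
// Hypothetical helper: collect the text inside every '...' pair.
function parseQuotedList(str) {
  // Each match is one quoted item; capture group 1 is the text between the quotes.
  return Array.from(str.matchAll(/'([^']*)'/g), (m) => m[1]);
}

const myArray = parseQuotedList(",'first string','more','even more'");
console.log(myArray); // ["first string", "more", "even more"]
```

Because only the quoted items are collected, a leading comma (or any other text outside the quotes) is ignored without special handling.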
To turn a string into an array I usually use split()
> var s = ",'first string','more','even more'"
> s.split("','")
[",'first string", "more", "even more'"]
This is almost what you want. Now you just have to strip the first two and the last character:
> s.slice(2, s.length-1)
"first string','more','even more"
> s.slice(2, s.length-1).split("','");
["first string", "more", "even more"]
To extract a substring from a string I usually use slice() but substr() and substring() also do the job.
s=s.substring(1);
I like to keep stuff simple.
You can use the replace function directly with a regex, or define helper functions like PHP's ltrim (left) and rtrim (right):
1) With replace:
var myArray = ",'first string','more','even more'".replace(/^\s+/, '').split(/'?,?'/);
2) Helper functions:
if (!String.prototype.ltrim) String.prototype.ltrim = function() {
return this.replace(/^\s+/, '');
};
if (!String.prototype.rtrim) String.prototype.rtrim = function() {
return this.replace(/\s+$/, '');
};
var myArray = ",'first string','more','even more'".ltrim().split(/'?,?'/).filter(function(el) {return el.length != 0});
You could also add a parameter to the helper functions to specify which characters to strip, etc.
This removes a leading comma (with any whitespace before it) and a trailing comma (with any whitespace after it):
var str = ",'first string','more','even more'";
var trim = str.replace(/(^\s*,)|(,\s*$)/g, '');
remove leading or trailing characters:
function trimLeadingTrailing(inputStr, toRemove) {
// use a regex to match toRemove at the start (^)
// and at the end ($) of inputStr
// toRemove is used as regex source; the g flag lets both ends be replaced
const re = new RegExp(`^${toRemove}|${toRemove}$`, 'g');
return inputStr.replace(re, '');
}
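One caveat with building a pattern from a string is that characters such as "." or "*" have special meaning in a regex. The sketch below escapes them first and strips repeated occurrences; the names escapeRegExp and trimBothEnds are hypothetical, not part of the answer above.

```javascript
// Hypothetical helper: escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: remove every run of `toRemove` from both ends of `inputStr`.
function trimBothEnds(inputStr, toRemove) {
  const escaped = escapeRegExp(toRemove);
  // The g flag is required so that both the leading and trailing runs are replaced.
  const re = new RegExp(`^(?:${escaped})+|(?:${escaped})+$`, "g");
  return inputStr.replace(re, "");
}

console.log(trimBothEnds(",'first string','more',", ",")); // "'first string','more'"
console.log(trimBothEnds("..x.y..", ".")); // "x.y"
```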
