Split a string into an array of 36 characters in each line without breaking words - javascript

I am trying to split a string into an array with at most 36 characters in each element, without breaking words. The code below splits without breaking words, but it looks for the next space after position 36 is reached. My requirement is the opposite: if a line would exceed 36 characters, the code should look for the previous space in that line and move the last word to the next array element.
For example if I have a string "This is the new content for developing" This should be split into two lines as
[0]- This is the new content for
[1]-developing
Currently the below code splits this in a single line like
[0]- This is the new content for developing
var count; var len=36;
var curr = len; var prev = 0;
while (data[curr]) {
if (data[curr++] == ' ') {
splitArr.push(data.substring(prev,curr));
prev = curr;
curr += len;
}
}
splitArr.push(data.substr(prev));
What can I use instead of data[curr++] in the if condition to find the whitespace before the 36th character?
Thanks for your help in advance.

Personally I would break the string into words, and add them to batches until we'd pass the max size (36), at which point we start a new batch.
We split the string into words with .split(). I use regex instead of a regular .split(" ") because I want to include the spaces when I split the string.
As we iterate through the items, we look at the last item. Would adding this string to that last item be more than 36 characters? If so, it starts a new item. If not, it adds it to the previous one.
To iterate and combine the items, I elected to use Array.reduce().
const str = "This is a demonstration of how your code might work with a longer text string. ";
const charLimit = 36;
let result = str
.split(/(\s+)/)
.reduce((output, item) => {
let last = output.pop() || ""; //get the last item
return last.length + item.length > charLimit //would adding the current item to it exceed 36 chars?
? [...output, last, item] //Yes: start a new item
: [...output, last + item] //No: add to previous item
}, []);
console.log("Result:");
console.log(result);
console.log("With lengths:");
console.log(result.map(i => ({string: i, length: i.length})));
console.log("Trimmed:");
console.log(result.map(i => i.trim()));

The simplest way I can think of uses word boundaries:
(?:^|\b)[\w .]{1,36}(?:\b|$)
let str = `This is a demonstration of how your code might work with a longer text string`
let op = str.match(/(?:^|\b)[\w .]{1,36}(?:\b|$)/gi)
console.log(op)
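The loop in the question can also be adapted directly by searching backward for the previous space. The following is a minimal sketch rather than a definitive implementation: the function name wrapAtLastSpace and the decision to hard-break a word that is longer than the limit are assumptions made for illustration.

```javascript
// Hypothetical helper: wrap text so that no line exceeds maxLen characters,
// breaking at the last space at or before the limit.
function wrapAtLastSpace(text, maxLen) {
  const lines = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    // A space exactly at index maxLen is acceptable: the line before it
    // then has exactly maxLen characters.
    let cut = rest.lastIndexOf(' ', maxLen);
    if (cut <= 0) {
      cut = maxLen; // assumption: no space found, so hard-break the long word
    }
    lines.push(rest.slice(0, cut).trimEnd());
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) {
    lines.push(rest);
  }
  return lines;
}

console.log(wrapAtLastSpace("This is the new content for developing", 36));
// [ 'This is the new content for', 'developing' ]
```

Because lastIndexOf takes a starting position and searches toward the beginning of the string, it plays the role that data[curr++] could not: it finds the space before the limit instead of after it.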

Related

JS Regex returning -1 & 0

I was tasked with the following:
take a string
print each of the vowels on a new line (in order) then...
print each of the consonants on a new line (in order)
The problem I found was with the regex. I originally used...
/[aeiouAEIOU\s]/g
But this would return 0 with a vowel and -1 with a consonant (so everything happened in reverse).
I really struggled to understand why and couldn't for the life of me find the answer. In the end it was simple enough to just invert the string but I want to know why this is happening the way it is. Can anyone help?
let i;
let vowels = /[^aeiouAEIOU\s]/g;
let array = [];
function vowelsAndConsonants(s) {
for(i=0;i<s.length;i++){
//if char is vowel then push to array
if(s[i].search(vowels)){
array.push(s[i]);
}
}
for(i=0;i<s.length;i++){
//if char is cons then push to array
if(!s[i].search(vowels)){
array.push(s[i]);
}
}
for(i=0;i<s.length;i++){
console.log(array[i]);
}
}
vowelsAndConsonants("javascript");
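Use if(vowels.test(s[i])){, which returns true or false depending on whether the pattern matches (drop the g flag if you do, because test() on a global regex remembers lastIndex between calls), or
if(s[i].search(vowels) !== -1){ and if(s[i].search(vowels) === -1){
if you want to fix your existing code.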
-1 is not falsy, and 0 (the index of a match at the start of a one-character string) is, so your if statements behave the opposite of what you expect. -1 is what search() returns if it doesn't find a match. It has to do this because search() returns the index position of the match, and a valid index is never negative, so only negative numbers are available to indicate a non-existent index:
MDN search() reference
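To make the return values concrete, here is a small sketch. The helper name describeSearch is made up for this illustration; search() itself is the standard string method.

```javascript
const vowelPattern = /[aeiou]/i; // no g flag needed for a yes/no check

// Hypothetical helper: show what search() returns for one character
// and how that value behaves inside an if condition.
function describeSearch(ch) {
  const index = ch.search(vowelPattern);
  return { index, truthy: Boolean(index), found: index !== -1 };
}

console.log(describeSearch('a')); // { index: 0, truthy: false, found: true }
console.log(describeSearch('j')); // { index: -1, truthy: true, found: false }
```

The truthy column is what a bare if(s[i].search(...)) sees, which is why the two branches appeared to be swapped.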
Below is a RegEx that matches vowel OR any letter OR other, effectively separating out vowel, consonant, everything else into 3 capture groups. This makes it so you don't need to test character by character and separate them out manually.
Then iterates and pushes them into their respective arrays with a for-of loop.
const consonants = [], vowels = [], other = [];
const str = ";bat cat set rat. let ut cut mut,";
// capture group 1 = vowel, group 2 = any other letter, group 3 = everything else
for(const [,vow,cons,etc] of str.matchAll(/([aeiouAEIOU])|([a-zA-Z])|(.)/g))
cons&&consonants.push(cons) || vow&&vowels.push(vow) || typeof etc === 'string'&&other.push(etc)
console.log(
consonants.join('') + '\n' + vowels.join('') + '\n' + other.join('')
)
There are a couple of inbuilt functions available:
let some_string = 'Mary had a little lamb';
let vowels = [...some_string.match(/[aeiouAEIOU\s]/g)];
let consonants = [...some_string.match(/[^aeiouAEIOU\s]/g)];
console.log(vowels);
console.log(consonants);
I think you have misunderstood how your regular expression works. The brackets define a set of characters to match, as in /[^aeiouAEIOU\s]/g, and by putting the caret first inside the brackets ([^...]) you say that you want to match everything except the characters listed in the brackets. Sadly you don't provide an example of input and expected output, so I am only guessing, but I think you could do the following:
let s = "avndexleops";
let keep_vowels = s.replace(/[^aeiouAEIOU\s]/g, '');
console.log(keep_vowels);
let keep_consonants = s.replace(/[aeiouAEIOU\s]/g, '');
console.log(keep_consonants);
Please provide example of expected input and output.
You used:
/[^aeiouAEIOU\s]/g
Instead of:
/[aeiouAEIOU\s]/g
^ means "not", so your regex /[^aeiouAEIOU\s]/g matches everything except vowels and whitespace, which in your input means the consonants.
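Putting the answers together, the original task could be sketched as follows. This is one possible version, not the asker's code: the name vowelsThenConsonants is hypothetical, and treating only the letters a to z as candidates for consonants is an assumption.

```javascript
// Hypothetical helper: return the vowels of s in order, followed by
// the consonants of s in order.
function vowelsThenConsonants(s) {
  const isVowel = /[aeiou]/i;  // no g flag, so test() keeps no state
  const isLetter = /[a-z]/i;   // assumption: only ASCII letters count
  const chars = [...s];
  const vowels = chars.filter(c => isVowel.test(c));
  const consonants = chars.filter(c => isLetter.test(c) && !isVowel.test(c));
  return [...vowels, ...consonants];
}

console.log(vowelsThenConsonants("javascript").join('\n'));
```

Using test() with a non-global regex avoids both the 0 versus -1 confusion and the lastIndex state that a g flag would introduce.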

Splitting a string based on max character length, but keep words into account

In my program I can receive strings of all kinds of lengths and send them on their way to get translated. If a string exceeds a certain character length I receive an error, so I want to check and, if necessary, split those strings before that. But I can't just split the string in the middle of a word; the words themselves also need to stay intact.
So for example:
let str = "this is an input example of one sentence that contains a bit of words and must be split"
let splitStringArr = [];
// If string is larger than X (for testing make it 20) characters
if(str.length > 20) {
// Split string sentence into smaller strings, keep words intact
//...
// example of result would be
// splitStringArr = ['this is an input', 'example of one sentence', 'that contains...', '...']
// instead of ['this is an input exa', 'mple of one senten', 'ce that contains...']
}
But I'm not sure how to split a sentence and still keep into account the sentence length.
Would a solution for this be to iterate over the string, add every word to it and check every time if it is over the maximum length, otherwise start a new array index, or are there better/existing methods for this?
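You can use match with a lookahead and word boundaries; the |.+$ alternative takes care of the text at the end of the string that is shorter than the length. Note that {20,}? sets a minimum rather than a maximum, so each chunk here is at least 20 characters long and ends at the next whitespace.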
let str = "this is an input example of one sentence that contains a bit of words and must be split"
console.log(str.match(/\b[\w\s]{20,}?(?=\s)|.+$/g))
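If the limit has to be a true maximum, a greedy quantifier with an upper bound can be used instead. This is a sketch under stated assumptions: chunkByMaxLength is a hypothetical name, the text is assumed to contain no line breaks (. does not match them), and no single word may be longer than maxLen, otherwise the start of that word is skipped.

```javascript
// Hypothetical helper: greedy match of up to maxLen characters that starts
// on a non-space and ends right before whitespace or the end of the string.
function chunkByMaxLength(text, maxLen) {
  const pattern = new RegExp(`(?=\\S).{1,${maxLen}}(?=\\s|$)`, 'g');
  return text.match(pattern) || [];
}

const sentence = "this is an input example of one sentence that contains a bit of words and must be split";
console.log(chunkByMaxLength(sentence, 20));
```

The regex engine tries the longest candidate first and backtracks one character at a time until the lookahead sees whitespace, which is the same "previous space" idea expressed as a pattern.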
Here's an example using reduce.
const str = "this is an input example of one sentence that contains a bit of words and must be split";
// Split up the string and use `reduce`
// to iterate over it
const temp = str.split(' ').reduce((acc, c) => {
// Get the number of nested arrays
const currIndex = acc.length - 1;
// Join up the last array and get its length
const currLen = acc[currIndex].join(' ').length;
// If the length of that content and the new word
// in the iteration exceeds 20 chars push the new
// word to a new array
if (currLen + c.length > 20) {
acc.push([c]);
// otherwise add it to the existing array
} else {
acc[currIndex].push(c);
}
return acc;
}, [[]]);
// Join up all the nested arrays
const out = temp.map(arr => arr.join(' '));
console.log(out);
What you are looking for is lastIndexOf
In this example, maxOkayStringLength is the max length the string can be before causing an error.
myString.lastIndexOf(/\s/,maxOkayStringLength);
-- edit --
lastIndexOf doesn't take a regex argument, but there's another post on SO that has code to do this:
Is there a version of JavaScript's String.indexOf() that allows for regular expressions?
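A regex-capable variant can be sketched in a few lines. This is not the code from the linked post: lastIndexOfRegex is a hypothetical helper, and it assumes the pattern does not rely on lookahead past fromIndex, because the text is truncated before matching.

```javascript
// Hypothetical helper: index of the last match of pattern that lies
// entirely at or before fromIndex, or -1 if there is none.
function lastIndexOfRegex(text, pattern, fromIndex = text.length - 1) {
  // matchAll requires a global regex, so add the g flag if it is missing.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const globalPattern = new RegExp(pattern.source, flags);
  let last = -1;
  for (const match of text.slice(0, fromIndex + 1).matchAll(globalPattern)) {
    last = match.index;
  }
  return last;
}

console.log(lastIndexOfRegex('this is an input example', /\s/, 20)); // 16
```

When the separator is always a plain space, the built-in myString.lastIndexOf(' ', maxOkayStringLength) does the same job without a regex.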
I would suggest:
1) split string by space symbol, so we get array of words
2) starting to create string again selecting words one by one...
3) if next word makes the string exceed the maximum length we start a new string with this word
Something like this:
const splitString = (str, lineLength) => {
const arr = ['']
str.split(' ').forEach(word => {
if (arr[arr.length - 1].length + word.length > lineLength) arr.push('')
arr[arr.length - 1] += (word + ' ')
})
return arr.map(v => v.trim())
}
const str = "this is an input example of one sentence that contains a bit of words and must be split"
console.log(splitString(str, 20))

Getting line number from index of character in file

I have a string input which consists of words. I am using regex.exec (g) to get all the words by function getWord(input)
So my input may look like this:
word word2
someword blah
What I get from exec is an object containing the index of the match. So it is an array like:
[ 'word', index: 0, input: "..."]
...
[ 'someword', index: 11, input: "..."]
...
What I need is an easy way to calculate that the word "someword" is on line 2 using the index (11), as I don't have any other value telling me the line number.
Here is what I came up with: match '\n's until a \n has a higher index than the word's index. I'm not sure whether this would be a problem in a 10k-line file.
Snippet for idea:
getLineFromIndex: (index, input) ->
  regex = /\n/g
  line = 1
  loop
    match = regex.exec(input)
    break if not match? or match.index > index
    line++
  return line
A fairly big optimization is possible here. I can save the regex and the last match, so I won't iterate over the whole input every time I want to check a line number. The regex is then executed only when the last match has a lower index than the current index.
This is the final idea with optimization:
###
#variable content [String] is input content
###
getLineFromIndex: (index) ->
  @lineMatcher = @lineMatcher || /\n/g
  @lastLine = @lastLine || 1
  if @eof isnt true
    @lastMatch = @lastMatch || @lineMatcher.exec(@content)
  if @eof or index < @lastMatch.index
    return @lastLine
  else
    match = @lineMatcher.exec(@content)
    if not @eof and match is null
      @eof = true
    else
      @lastMatch = match
    @lastLine++
  return @lastLine
Cut input (a.substr(0, 11)).
Split it (a.substr(0, 11).split('\n')).
Count it (a.substr(0, 11).split('\n').length).
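The three steps above can be wrapped in a function, and for a large file with many lookups the line starts can be computed once and searched with a binary search. Both functions below are sketches; the names lineNumberAt, buildLineStarts and lineFromStarts are made up for this example, and lines are assumed to be separated by '\n'.

```javascript
// Cut, split, count: fine for occasional lookups.
function lineNumberAt(input, index) {
  return input.slice(0, index).split('\n').length;
}

// For many lookups: record the offset at which each line starts, once.
function buildLineStarts(input) {
  const starts = [0];
  for (let i = input.indexOf('\n'); i !== -1; i = input.indexOf('\n', i + 1)) {
    starts.push(i + 1);
  }
  return starts;
}

// Binary search for the last line start that is <= index (1-based result).
function lineFromStarts(starts, index) {
  let lo = 0;
  let hi = starts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (starts[mid] <= index) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo + 1;
}

const sample = 'word word2\nsomeword blah';
console.log(lineNumberAt(sample, 11));                    // 2
console.log(lineFromStarts(buildLineStarts(sample), 11)); // 2
```

The first version rescans the text on every call; the second pays for one pass up front and then answers each lookup in logarithmic time.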
Your pseudo-code seems to do the job.
But I do not see how you can infer the line number by the offset of the searched word.
I would split the input text by lines, then look over the array for the searched word, and if found return the line index.
var input= "word word2 \n"+
"someword blah";
function getLinesNumberOf( input, word){
var line_numbers=[];
input.split("\n").forEach(function(line, index){
if( line.indexOf(word)>=0 ) line_numbers.push(index);
});
return line_numbers;
}
console.log(getLinesNumberOf(input,"someword"));
I have added support for multiple occurrences of the searched word.
edit
To avoid excessive memory consumption with large inputs, you can parse sequentially (for the same advantages as SAX over DOM):
function getLinesNumberOf( word, input ){
  input += "\n"; // make sure not to miss the last line
  var line_numbers = [], current_line = 0;
  var startline_offset = 0;
  // get the offset of the next line break
  var endline_offset = input.indexOf("\n", startline_offset);
  // continue while a line break is found
  while( endline_offset >= 0 ){
    // look for the searched word inside the current line only
    var word_offset = input.substring(startline_offset, endline_offset).indexOf(word);
    if( word_offset >= 0 ) {
      // the word is on current_line, so store it
      line_numbers.push(current_line);
    }
    // the next line starts just after the line break
    startline_offset = endline_offset + 1;
    endline_offset = input.indexOf("\n", startline_offset);
    current_line++;
  }
  console.log(line_numbers);
}

Extract strings in a .txt file with javascript

I have a .txt file with this structure:
chair 102
file 38
green 304
... ...
It has 140.000 elements.
Before introducing the numbers I used javascript and jQuery:
$(function () {
  $.get('/words.txt', function (data) {
    words = data.split('\n');
  });
});
Now that each line also contains a number, how can I handle the words and the numbers separately?
Since this helped, I'll post as an answer:
Your format is <word><space><num>\n
You split on new line, so now you have an array of <word><space><num> which you should be able to split on space.
Then you can get the word part as myarray[0] and the number part as myarray[1].
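As a sketch of those two splits, assuming the file content has already been loaded into a string: the function name parseWordCounts and the { word, count } result shape are choices made for this example, not part of the question.

```javascript
// Hypothetical helper: turn "word number" lines into objects.
function parseWordCounts(data) {
  return data
    .split('\n')                       // one entry per line
    .map(line => line.trim())
    .filter(line => line.length > 0)   // skip blank lines, e.g. a trailing newline
    .map(line => {
      const [word, num] = line.split(/\s+/); // split on the space(s)
      return { word, count: Number(num) };
    });
}

console.log(parseWordCounts('chair 102\nfile 38\ngreen 304\n'));
```

Keeping each word together with its number avoids having to track odd and even positions in a flat array later.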
You could split at each new line and then split each element at the space, but this gives you an array of arrays of words.
Alternatively, you could replace each line break with a space and then split at the space, i.e.:
words = data.replace(/\n/g,' ').split(' ');
An efficient way of handling this problem is to replace all the line breaks with spaces, then split the resulting string by the spaces. Then use what you know about the position of the elements to determine whether you're dealing with a number or a string:
var resultArr = data.replace(/\n/g, " ").split(" ")
for(var i = 0; i < resultArr.length; i++) {
if(i % 2 === 0) {
// even indexes represent the word
console.info("word = " + resultArr[i]);
} else {
// odd indexes represent the number
console.info("number = " + resultArr[i]);
}
}
Depending on whether or not there's a line break at the end of the set, you may need to handle that case by looking for an empty string.

split string only on first instance of specified character

In my code I split a string based on _ and grab the second item in the array.
var element = $(this).attr('class');
var field = element.split('_')[1];
Takes good_luck and provides me with luck. Works great!
But, now I have a class that looks like good_luck_buddy. How do I get my javascript to ignore the second _ and give me luck_buddy?
I found this var field = element.split(new char [] {'_'}, 2); in a c# stackoverflow answer but it doesn't work. I tried it over at jsFiddle...
Use capturing parentheses:
'good_luck_buddy'.split(/_(.*)/s)
['good', 'luck_buddy', ''] // ignore the third element
They are defined as
If separator contains capturing parentheses, matched results are returned in the array.
So in this case we want to split at _.* (i.e. split separator being a sub string starting with _) but also let the result contain some part of our separator (i.e. everything after _).
In this example our separator (matching _(.*)) is _luck_buddy and the captured group (within the separator) is luck_buddy. Without the capturing parentheses, luck_buddy (matching .*) would not have been included in the result array, because with a simple split the separators are not included in the result.
We use the s regex flag to make . match on newline (\n) characters as well, otherwise it would only split to the first newline.
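The effect of the s flag is easiest to see on a string that contains a line break. The input below is made up for this illustration.

```javascript
const multiline = 'a_b\nc_d';

// Without s, . stops at the newline, so the pattern matches twice.
const withoutS = multiline.split(/_(.*)/);
// With s, . also matches the newline, so everything after the first _ is captured.
const withS = multiline.split(/_(.*)/s);

console.log(JSON.stringify(withoutS)); // ["a","b","\nc","d",""]
console.log(JSON.stringify(withS));    // ["a","b\nc_d",""]
```

In both cases the trailing empty string is the text after the last match, which is why the answer says to ignore the final element.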
What do you need regular expressions and arrays for?
myString = myString.substring(myString.indexOf('_')+1)
var myString= "hello_there_how_are_you"
myString = myString.substring(myString.indexOf('_')+1)
console.log(myString)
I avoid RegExp at all costs. Here is another thing you can do:
"good_luck_buddy".split('_').slice(1).join('_')
With help of destructuring assignment it can be more readable:
let [first, ...rest] = "good_luck_buddy".split('_')
rest = rest.join('_')
A simple ES6 way to get both the first key and remaining parts in a string would be:
const [key, ...rest] = "good_luck_buddy".split('_')
const value = rest.join('_')
console.log(key, value) // good, luck_buddy
Nowadays String.prototype.split does indeed allow you to limit the number of splits.
str.split([separator[, limit]])
...
limit Optional
A non-negative integer limiting the number of splits. If provided, splits the string at each occurrence of the specified separator, but stops when limit entries have been placed in the array. Any leftover text is not included in the array at all.
The array may contain fewer entries than limit if the end of the string is reached before the limit is reached.
If limit is 0, no splitting is performed.
caveat
It might not work the way you expect. I was hoping it would just ignore the rest of the delimiters, but instead, when it reaches the limit, it splits the remaining string again, omitting the part after the split from the return results.
let str = 'A_B_C_D_E'
const limit_2 = str.split('_', 2)
limit_2
(2) ["A", "B"]
const limit_3 = str.split('_', 3)
limit_3
(3) ["A", "B", "C"]
I was hoping for:
let str = 'A_B_C_D_E'
const limit_2 = str.split('_', 2)
limit_2
(2) ["A", "B_C_D_E"]
const limit_3 = str.split('_', 3)
limit_3
(3) ["A", "B", "C_D_E"]
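The hoped-for behavior can be sketched with a small wrapper. splitWithRemainder is a hypothetical name, and the sketch assumes a plain string separator and a limit of at least 1.

```javascript
// Hypothetical helper: like split(sep, limit), but the last entry keeps
// the unsplit remainder instead of discarding it.
function splitWithRemainder(str, sep, limit) {
  const parts = str.split(sep);
  if (parts.length <= limit) {
    return parts;
  }
  return [...parts.slice(0, limit - 1), parts.slice(limit - 1).join(sep)];
}

console.log(splitWithRemainder('A_B_C_D_E', '_', 2)); // [ 'A', 'B_C_D_E' ]
console.log(splitWithRemainder('A_B_C_D_E', '_', 3)); // [ 'A', 'B', 'C_D_E' ]
```

For the original question, splitWithRemainder('good_luck_buddy', '_', 2)[1] gives luck_buddy.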
This solution worked for me
var str = "good_luck_buddy";
var index = str.indexOf('_');
var arr = [str.slice(0, index), str.slice(index + 1)];
//arr[0] = "good"
//arr[1] = "luck_buddy"
OR
var str = "good_luck_buddy";
var index = str.indexOf('_');
var [first, second] = [str.slice(0, index), str.slice(index + 1)];
//first = "good"
//second = "luck_buddy"
You can use the regular expression like:
var arr = element.split(/_(.*)/)
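You can use the second parameter, which limits the number of entries returned. Note that it discards the rest of the string instead of keeping it in the last entry, so it only helps when you want the leading parts:
i.e.:
var field = element.split('_', 2)[1]; // "luck" for "good_luck_buddy", not "luck_buddy"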
Replace the first instance with a unique placeholder then split from there.
"good_luck_buddy".replace(/\_/,'&').split('&')
["good","luck_buddy"]
This is more useful when both sides of the split are needed.
I needed both parts of the string, so a regex lookbehind helped me with this.
const full_name = 'Maria do Bairro';
const [first_name, last_name] = full_name.split(/(?<=^[^ ]+) /);
console.log(first_name);
console.log(last_name);
Non-regex solution
I ran some benchmarks, and this solution won by a wide margin:
str.slice(str.indexOf(delim) + delim.length)
// as function
function gobbleStart(str, delim) {
return str.slice(str.indexOf(delim) + delim.length);
}
// as polyfill
String.prototype.gobbleStart = function(delim) {
return this.slice(this.indexOf(delim) + delim.length);
};
Performance comparison with other solutions
The only close contender was the same line of code, except using substr instead of slice.
Other solutions I tried involving split or RegExps took a big performance hit and were about 2 orders of magnitude slower. Using join on the results of split, of course, adds an additional performance penalty.
Why are they slower? Any time a new object or array has to be created, the engine has to allocate memory for it and later garbage-collect it, which adds overhead.
Here are some general guidelines, in case you are chasing benchmarks:
New dynamic memory allocations for objects {} or arrays [] (like the one that split creates) will cost a lot in performance.
RegExp searches are more complicated and therefore slower than string searches.
If you already have an array, destructuring arrays is about as fast as explicitly indexing them, and looks awesome.
Removing beyond the first instance
Here's a solution that will slice up to and including the nth instance. It's not quite as fast, but on the OP's question, gobble(element, '_', 1) is still >2x faster than a RegExp or split solution and can do more:
/*
`gobble`, given a positive, non-zero `limit`, deletes
characters from the beginning of `haystack` until `needle` has
been encountered and deleted `limit` times or no more instances
of `needle` exist; then it returns what remains. If `limit` is
zero or negative, delete from the beginning only until `-(limit)`
occurrences or less of `needle` remain.
*/
function gobble(haystack, needle, limit = 0) {
let remain = limit;
if (limit <= 0) { // set remain to count of delim - num to leave
let i = 0;
while (i < haystack.length) {
const found = haystack.indexOf(needle, i);
if (found === -1) {
break;
}
remain++;
i = found + needle.length;
}
}
let i = 0;
while (remain > 0) {
const found = haystack.indexOf(needle, i);
if (found === -1) {
break;
}
remain--;
i = found + needle.length;
}
return haystack.slice(i);
}
With the above definition, gobble('path/to/file.txt', '/') would give the name of the file, and gobble('prefix_category_item', '_', 1) would remove the prefix like the first solution in this answer.
Tests were run in Chrome 70.0.3538.110 on macOSX 10.14.
Use the string replace() method with a regex:
var result = "good_luck_buddy".replace(/.*?_/, "");
console.log(result);
This regex matches 0 or more characters before the first _, and the _ itself. The match is then replaced by an empty string.
JavaScript's String.split unfortunately has no way of limiting the actual number of splits. It has a second argument that specifies how many of the actual split items are returned, which isn't useful in your case. The solution would be to split the string, shift the first item off, then rejoin the remaining items:
var element = $(this).attr('class');
var parts = element.split('_');
parts.shift(); // removes the first item from the array
var field = parts.join('_');
Here's one RegExp that does the trick.
'good_luck_buddy' . split(/^.*?_/)[1]
First, the '^' forces the match to start at the beginning of the string. Then '.*?' matches as few characters as possible, which here means all characters before the first '_', because the '_' that follows it in the pattern must also match and is included in the match as its last character.
split() therefore uses that matched part as its 'splitter' and removes it from the results. So it removes everything up to and including the first '_' and gives you the rest as the 2nd element of the result. The first element is "" representing the part before the matched part. It is "" because the match starts from the beginning.
There are other RegExps that work as well, like /_(.*)/ given by Chandu in a previous answer. The /^.*?_/ has the benefit that you can understand what it does without having to know about the special role capturing groups play with split().
If you are looking for a more modern way of doing this:
let raw = "good_luck_buddy"
raw.split("_")
.filter((part, index) => index !== 0)
.join("_")
Mark F's solution is awesome but it's not supported by old browsers. Kennebec's solution is awesome and supported by old browsers but doesn't support regex.
So, if you're looking for a solution that splits your string only once, that is supported by old browsers and supports regex, here's my solution:
String.prototype.splitOnce = function(regex)
{
var match = this.match(regex);
if(match)
{
var match_i = this.indexOf(match[0]);
return [this.substring(0, match_i),
this.substring(match_i + match[0].length)];
}
else
{ return [this, ""]; }
}
var str = "something/////another thing///again";
alert(str.splitOnce(/\/+/)[1]);
For beginners like me who are not used to regular expressions, this workaround worked:
var field = "Good_Luck_Buddy";
var newString = field.slice( field.indexOf("_")+1 );
slice() method extracts a part of a string and returns a new string and indexOf() method returns the position of the first found occurrence of a specified value in a string.
This should be quite fast
function splitOnFirst (str, sep) {
const index = str.indexOf(sep);
return index < 0 ? [str] : [str.slice(0, index), str.slice(index + sep.length)];
}
console.log(splitOnFirst('good_luck', '_')[1])
console.log(splitOnFirst('good_luck_buddy', '_')[1])
This worked for me on Chrome + FF:
"foo=bar=beer".split(/^[^=]+=/)[1] // "bar=beer"
"foo==".split(/^[^=]+=/)[1] // "="
"foo=".split(/^[^=]+=/)[1] // ""
"foo".split(/^[^=]+=/)[1] // undefined
If you also need the key try this:
"foo=bar=beer".split(/^([^=]+)=/) // Array [ "", "foo", "bar=beer" ]
"foo==".split(/^([^=]+)=/) // [ "", "foo", "=" ]
"foo=".split(/^([^=]+)=/) // [ "", "foo", "" ]
"foo".split(/^([^=]+)=/) // [ "foo" ]
//[0] = ignored (holds the string when there's no =, empty otherwise)
//[1] = hold the key (if any)
//[2] = hold the value (if any)
A simple ES6 one-statement solution to get the first key and the remaining parts:
let raw = 'good_luck_buddy'
raw.split('_')
.reduce((p, c, i) => i === 0 ? [c] : [p[0], [...p.slice(1), c].join('_')], [])
You could also use non-greedy match, it's just a single, simple line:
a = "good_luck_buddy"
const [,g,b] = a.match(/(.*?)_(.*)/)
console.log(g,"and also",b)
