Split string by regex in JavaScript

I'm trying to split a string by the following array of characters:
"!", "%", "$", "#"
I thought about using regex, so I developed the following method which I thought would split the string by the characters:
var splitted = string.split(/\!|%|\$|#*/);
However, when I run the following code, the output is split by every character, not what I was hoping for:
var toSplit = "abc%123!def$456#ghi";
var splittedArray = toSplit.split(/\!|%|\$|#*/);
How could I make it so that splittedArray contains the following elements?
"abc", "123", "def", "456", "ghi"
Any help appreciated.

#* matches the empty string and there's an empty string between any two characters, so the string is split at every single character. Use + instead:
/\!|%|\$|#+/
Also if you meant the + to apply to every character and not just # then group them up:
/(\!|%|\$|#)+/
Or better yet, use a character class. This lets you omit the backslashes since none of these characters are special inside square brackets.
/[!%$#]+/

Use the following:
var splittedArray = toSplit.split(/[!%$#]+/);
Your current code will split between every character because #* will match empty strings. I am assuming since you used #* that you want to consider consecutive characters a single delimiter, which is why the + is at the end of the regex. This will only match one or more characters, so it will not match empty strings.
The [...] syntax is a character class, which is like alternation with the | character except that it only works for single characters, so [!%$#] will match either !, %, $, or #. Inside of the character class the escaping rules change a little bit, so you can just use $ instead of \$.
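To see the difference concretely, here is a small sketch that runs the question's input through the original pattern and through the character class. The variable names perCharacter, byDelimiter and withEmpties are made up for this illustration.

```javascript
// Input string from the question.
const toSplit = "abc%123!def$456#ghi";

// Original pattern: #* can match the empty string, so split() cuts between every character.
const perCharacter = toSplit.split(/\!|%|\$|#*/);

// Character class with +: one or more delimiter characters count as a single separator.
const byDelimiter = toSplit.split(/[!%$#]+/);

// Without the +, consecutive delimiters leave empty strings in the result.
const withEmpties = "abc%%123".split(/[!%$#]/);

console.log(perCharacter.length); // 15
console.log(byDelimiter);         // ["abc", "123", "def", "456", "ghi"]
console.log(withEmpties);         // ["abc", "", "123"]
```

Whether you want the + depends on whether an empty field between two delimiters is meaningful in your data.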

Related

Split string on spaces except for in quotes, but include incomplete quotes

I am trying to split a string in JS on spaces except when the space is in a quote. However, an incomplete quote should be maintained. I'm not skilled in regex wizardry, and have been using the below regex:
var list = text.match(/[^\s"]+|"([^"]*)"/g)
However, if I provide input like sdfj "sdfjjk, this becomes ["sdfj","sdfjjk"] rather than ["sdfj","\"sdfjjk"].
You can use
var re = /"([^"]*)"|\S+/g;
By using \S (=[^\s]) we just drop the " from the negated character class.
By placing the "([^"]*)" pattern before \S+, we make sure a properly quoted substring is matched as a whole before \S+ gets a chance to break it up. This should work if the string contains well-paired quoted substrings and only the last quote is unpaired.
Demo:
var re = /"([^"]*)"|\S+/g;
var str = 'sdfj "sdfjjk';
document.body.innerHTML = JSON.stringify(str.match(re));
Note that to get the captured texts in-between quotes, you will need to use RegExp#exec in a loop (as String#match "drops" submatches).
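A minimal sketch of that RegExp#exec loop, assuming you want the text inside paired quotes without the quotes and every other token as-is. The function name tokenize is hypothetical, not something from the question.

```javascript
// Collect tokens: quoted substrings (quotes stripped) or runs of non-whitespace.
function tokenize(text) {
  const re = /"([^"]*)"|\S+/g;
  const tokens = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    // m[1] is defined only when the quoted alternative matched.
    tokens.push(m[1] !== undefined ? m[1] : m[0]);
  }
  return tokens;
}

console.log(tokenize('abc "def ghi" lmn "opq'));
// ["abc", "def ghi", "lmn", "\"opq"]
```

The unpaired quote in the last token is kept because only the \S+ alternative can match it.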
UPDATE
No idea what downvoter thought when downvoting, but let me guess. The quotes are usually used around word characters. If there is a "wild" quote, it is still a quote right before/after a word.
So, we can utilize word boundaries like this:
"\b[^"]*\b"|\S+
Here, "\b[^"]*\b" matches a " that is followed by a word character, then matches zero or more characters other than " and then is followed with a " that is preceded with a word character.
Moving further in this direction, we can make it as far as:
\B"\b[^"\n]*\b"\B|\S+
With \B" we require that " should be preceded with a non-word character, and "\B should be followed with a non-word character.
A lot depends on what specific issue you have with your specific input!
Try the following:
text.match(/".*?"|[^\s]+/g).map(s => s.replace(/^"(.*)"$/, "$1"))
This repeatedly finds either properly quoted substrings (first), OR other sequences of non-whitespace. The map part is to remove the quotes around the quoted substrings.
> text = 'abc "def ghi" lmn "opq'
< ["abc", "def ghi", "lmn", "\"opq"]

Regex to get the string between a character and a whitespace and excluding the first delimiter

In the following text what Regex (Javascript) would match "user" (user is a random name), excluding the "#" character?
I want to tag this #user here
and this #user
#user
I have looked at the following solutions and made the following regexes that did not work
RegEx pattern to match a string between two characters, but exclude the characters
\#(.*)\s
Regular Expression to find a string included between two characters while EXCLUDING the delimiters
(?!\#)(.*?)(?=\s)
Regex: Matching a character and excluding it from the results?
^#[^\s]+
Finally I made this regex that works but returns "#user" instead of "user":
#[^\s\n]+
The Javascript used to execute the regex is:
string.match(/#[^\s\n]+/)
I see I need to post a clarification.
If one knows a pattern beforehand in JS, i.e. if you do not build a regex from separate variables, one should be using a RegExp literal notation (e.g. /<pattern>/<flag(s)>).
In this case, you need a capturing group to get a submatch from a match that starts with a # and goes on until the next whitespace character. You cannot use String#match if you have multiple values inside one input string, because with a global regexp that method returns only the full matches and drops the captured texts. You need to use RegExp#exec:
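var s = "I want to tag this #user here\nand this #user\n#user";
var arr = [];
var re = /#(\S+)\b/g;
var m; // declare the match variable so it is not an implicit global
while ((m = re.exec(s)) !== null) {
    arr.push(m[1]); // Group 1 holds the name without the #
}
document.write(JSON.stringify(arr));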
The regex I suggest is #(\S+)\b:
# - matches a literal #
(\S+) - matches and captures into Group 1 one or more non-whitespace characters that finish with
\b - word boundary (remove if you have Unicode letters inside the names).
If you execute it this way, it should work:
var str = "I want to tag this #user here";
var patt = new RegExp("#([^\\s\\n]+)");
var result = patt.exec(str)[1];
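Note that patt.exec(str)[1] throws a TypeError when the string contains no tag, because exec returns null. As a hedged alternative to the exec loop, here is a sketch that uses String#matchAll, assuming an environment that supports it (ES2020 or later). The function name extractTags is made up for this example.

```javascript
// Return every name that follows a # in the text, without the # itself.
function extractTags(text) {
  const tags = [];
  // matchAll requires the g flag and yields one match object per occurrence.
  for (const m of text.matchAll(/#(\S+)/g)) {
    tags.push(m[1]); // Group 1: the non-whitespace run after the #
  }
  return tags;
}

console.log(extractTags("I want to tag this #user here\nand this #user\n#user"));
// ["user", "user", "user"]
console.log(extractTags("no tags here"));
// []
```

This pattern keeps trailing punctuation (for example "#user," yields "user,"); add the \b from the answer above if that matters for your input.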

Remove entire word from string if it contains numeric value

What I'm trying to accomplish is to auto-generate tags/keywords for a file upload, basing these keywords from the filename.
I have accomplished auto-generating titles for each upload, as shown here:
But I have now moved on to trying to auto-generate keywords. Similar to titles, but with more formatting. First, I run the string through this to remove commonly used words from the filename (such as this,that,there... etc)
I am happy with it, but I need to not include words that have numbers in it. I have not found a solution on how to remove a word entirely if it contains a number. The solutions I have found like here only works for a certain match, while this one removes numbers alone. I would like to remove the entire word if it contains ANY numeric digit.
To remove all words which contain a number, use:
string = string.replace(/[a-z]*\d+[a-z]*/gi, '');
Try this expression:
var regex = /\b[^\s]*\d[^\s]*\b/g;
Example:
var str = "normal 5digit dig555it digit5 555";
console.log(str.replace(regex, '')); // Result -> "normal" plus the spaces that separated the removed words
Apply a simple regular expression to your current filename strings, replacing all occurrences with the empty string. The regular expression matches "words" containing any digits.
Javascript example:
'asdf 8bit jawesome234 mayhem 234'.replace(/\s*\b\w*\d\w*\b/g, '')
Evaluates to:
"asdf mayhem"
Here the regular expression is /\s*\b\w*\d\w*\b/g, which matches zero or more whitespace characters (\s*), followed by a word boundary (\b), zero or more word characters (\w*, meaning letters, digits, or underscore), a digit (\d), zero or more word characters again, and a closing word boundary (\b). \b matches the empty string between a word character and a non-word character, or between a word character and the start or end of the string. The g after the final / of the regular expression means replace all occurrences, not just the first.
Once the digit-words are removed, you can split the string into keywords however you want (by whitespace, for example).
"asdf mayhem".split(/\s+/);
Evaluates to:
["asdf", "mayhem"]
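The two steps above can be combined into one helper. This is only a sketch: the name keywordsFrom is hypothetical, and the trim and filter calls are additions to handle leading whitespace and inputs where every word contains a digit.

```javascript
// Remove every word that contains a digit, then split what is left on whitespace.
function keywordsFrom(name) {
  return name
    .replace(/\s*\b\w*\d\w*\b/g, "") // drop digit-words and the whitespace before them
    .trim()                          // a removed first word leaves a leading space
    .split(/\s+/)
    .filter(Boolean);                // "".split(/\s+/) gives [""], so drop empty strings
}

console.log(keywordsFrom("asdf 8bit jawesome234 mayhem 234")); // ["asdf", "mayhem"]
console.log(keywordsFrom("8bit asdf"));                        // ["asdf"]
console.log(keywordsFrom("123 456"));                          // []
```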
('Apple Cover Photo 23s423 of your 543634 moms').match(/\b([^\d]+)\b/g)
returns
["Apple Cover Photo ", " of your ", " moms"]
http://jsfiddle.net/awBPX/2/
Use this to remove words containing digits:
string = string.replace(/\S*[0-9]\S*/g, "");
hope this helps.
Edited :
check this :
var str = 'one 2two three3 fo4ur 5 six';
var result = str.match(/(^[\D]+\s|\s[\D]+\s|\s[\D]+$|^[\D]+$)+/g).join('');

Regex from character until end of string

Hey. First question here, probably extremely lame, but I totally suck in regular expressions :(
I want to extract the text from a series of strings that always have only alphabetic characters before and after a hyphen:
string = "some-text"
I need to generate separate strings that include the text before AND after the hyphen. So for the example above I would need string1 = "some" and string2 = "text"
I found this and it works for the text before the hyphen, now I only need the regex for the one after the hyphen.
Thanks.
You don't need regex for that, you can just split it instead.
var myString = "some-text";
var splitWords = myString.split("-");
splitWords[0] would then be "some", and splitWords[1] will be "text".
If you actually have to use regex for whatever reason though - the $ character marks the end of a string in regex, so -(.*)$ is a regex that will match everything after the first hyphen it finds till the end of the string. That could actually be simplified to just -(.*) too, as the .* will match till the end of the string anyway (as long as the string contains no line breaks).
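Both approaches side by side, as a small sketch. The variable names are illustrative, and the anchored pattern ^([^-]*)-(.*)$ is one possible way to capture both halves at once, not the only one.

```javascript
const myString = "some-text";

// Approach 1: split on the hyphen.
const [string1, string2] = myString.split("-");

// Approach 2: one regex with two capturing groups; exec returns null if there is no hyphen.
const match = /^([^-]*)-(.*)$/.exec(myString);
const before = match ? match[1] : null;
const after = match ? match[2] : null;

console.log(string1, string2); // some text
console.log(before, after);    // some text

// The two differ when there is more than one hyphen.
console.log("a-b-c".split("-")[1]);       // "b"
console.log(/-(.*)$/.exec("a-b-c")[1]);   // "b-c"
```

The regex keeps everything after the first hyphen in one piece, while split breaks on every hyphen.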

Javascript split regex question

Hello, I am trying what I thought would be a rather easy regex in JavaScript, but it is giving me lots of trouble.
I want to be able to split a date in JavaScript on any of '-', '.', '/' or ' '.
var date = "02-25-2010";
var myregexp2 = new RegExp("-.");
dateArray = date.split(myregexp2);
What is the correct regex for this? Any and all help would be great.
You need to put the characters you wish to split on in a character class, which tells the regular expression engine "any of these characters is a match". For your purposes, this would look like:
date.split(/[.,\/ -]/)
Although dashes have special meaning in character classes as a range specifier (ie [a-z] means the same as [abcdefghijklmnopqrstuvwxyz]), if you put it as the last thing in the class it is taken to mean a literal dash and does not need to be escaped.
To explain why your pattern didn't work, /-./ tells the regular expression engine to match a literal dash character followed by any character (dots are wildcard characters in regular expressions). With "02-25-2010", it would split each time "-2" is encountered, because the dash matches and the dot matches "2".
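A quick sketch of both behaviours, using only the four delimiters listed in the question. The helper name splitDate is made up for this example.

```javascript
// Split on any single '.', '/', ' ' or '-'; the dash goes last so it is taken literally.
function splitDate(date) {
  return date.split(/[.\/ -]/);
}

console.log(splitDate("02-25-2010")); // ["02", "25", "2010"]
console.log(splitDate("02.25.2010")); // ["02", "25", "2010"]
console.log(splitDate("02/25/2010")); // ["02", "25", "2010"]
console.log(splitDate("02 25 2010")); // ["02", "25", "2010"]

// The pattern from the question: a dash followed by any one character.
console.log("02-25-2010".split(new RegExp("-."))); // ["02", "5", "010"]
```

The last line shows how the original pattern swallows the digit after each dash.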
or just (anything but numbers):
date.split(/\D/);
you could just use
date.split(/-/);
or
date.split('-');
Say your string is:
let str = `word1
word2;word3,word4,word5;word7
word8,word9;word10`;
You want to split the string by the following delimiters:
Comma
Semicolon
New line
You could split the string like this:
let rawElements = str.split(new RegExp('[,;\n]', 'g'));
Finally, you may need to trim the elements in the array:
let elements = rawElements.map(element => element.trim());
Then split it on anything but numbers:
date.split(/[^0-9]/);
or, for date strings like 2015-05-20 or 2015.05.20, just use
date.split(/\.|-/);
try this instead
date.split(/\W+/)
