Given a comma-separated string such as
'UserName,Email,[a,b,c]'
I want an array of the outermost elements, so the expected result is
['UserName', 'Email', '[a,b,c]']
string.split(',') splits on every comma, including those inside the brackets, so it doesn't work here. Any suggestions? This is breaking a CSV reader I have.
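I wrote two similar answers, so I might as well make this a third instead of referring you there. It's a stateful split. Note that it strips the brackets from the bracketed token (it returns 'a,b,c' rather than '[a,b,c]') and doesn't support nested arrays, but it can easily be extended to do so.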
var str = 'UserName,Email,[a,b,c]'
function tokenize(str) {
  var state = "normal";
  var tokens = [];
  var current = "";
  for (var i = 0; i < str.length; i++) {
    var c = str[i]; // declared locally so it doesn't leak as a global
    if (state == "normal") {
      if (c == ',') {
        // A comma outside brackets ends the current token
        if (current) {
          tokens.push(current);
          current = "";
        }
        continue;
      }
      if (c == '[') {
        // Enter the bracketed section; the '[' itself is dropped
        state = "quotes";
        current = "";
        continue;
      }
      current += c;
    }
    if (state == "quotes") {
      if (c == ']') {
        // Leave the bracketed section; the ']' itself is dropped
        state = "normal";
        tokens.push(current);
        current = "";
        continue;
      }
      current += c;
    }
  }
  if (current) {
    tokens.push(current);
    current = "";
  }
  return tokens;
}
console.log(tokenize(str))
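The answer above mentions that nesting isn't handled. One way to extend the same idea is to replace the two-state flag with a depth counter. This is only a minimal sketch: splitTopLevel is a made-up name, it keeps the brackets in the token (as the question asks), and unlike tokenize it keeps empty fields.

```javascript
// Hypothetical helper (not from the answer above): split on commas
// that are not inside any [ ... ] pair, at any nesting depth.
function splitTopLevel(str) {
  const tokens = [];
  let depth = 0;    // number of '[' currently open
  let current = "";
  for (const c of str) {
    if (c === "[") depth++;
    if (c === "]" && depth > 0) depth--;
    if (c === "," && depth === 0) {
      // Top-level comma: finish the current token
      tokens.push(current);
      current = "";
    } else {
      current += c;
    }
  }
  tokens.push(current);
  return tokens;
}

console.log(splitTopLevel("UserName,Email,[a,b,c]"));
console.log(splitTopLevel("a,[b,[c,d]],e"));
```

Because the counter only tracks [ and ], quoted fields or escaped commas would still need separate handling.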
You can do this by matching the string to this Regex:
/(^|(?<=,))\[[^[]+\]|[^,]+((?=,)|$)/
let string = '[a,b,c],UserName,[1,2],Email,[a,b,c],password'
let regex = /(^|(?<=,))\[[^[]+\]|[^,]+((?=,)|$)/g
let output = string.match(regex);
console.log(output)
The regex can be summarized as:
Match either an array or a string that's enclosed by commas or at the start/end of our input
The key token we're using is the alternation operator |, which works as "either this or that". Since the regex engine is eager, once one alternative matches it moves on. So if we match an array, we move on and don't consider what's inside it.
We can break it down to 3 main sections:
(^|(?<=,))
^ Match from the beginning of our string
| Alternatively
(?<=,) Match a string that's preceded by a comma without returning the comma. Read more about positive lookaround here.
\[[^[]+\] | [^,]+
\[[^[]+\] Match a string that starts with [ and ends with ] and contains one or more characters that aren't [
This is because, without that restriction, in [1,2],[a,b] the pattern could match the whole string at once, since it starts with [ and ends with ]. Excluding [ rules out matches that run on into the second array.
| Alternatively
[^,]+ Match a string of one or more characters that doesn't contain a comma, for the same reason as the brackets above: in ,asd,qwe, technically all of asd,qwe is enclosed by commas.
((?=,)|$)
(?=,) Match any string that's followed by a comma
| Alternatively
$ Match a string that ends with the end of the main string. Read here for a better explanation.
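If lookbehind is not available in your target environment, a different technique is to split on the delimiter itself and use only a negative lookahead. The sketch below is not the regex from this answer; it assumes brackets are balanced and never nested, and the variable names are made up for illustration.

```javascript
// Split on a comma only when it is NOT followed by a ']' that comes
// before the next '[' (i.e. the comma is not inside a bracket pair).
// Assumption: brackets are balanced and not nested.
const topLevelComma = /,(?![^\[]*\])/;

const input = "[a,b,c],UserName,[1,2],Email,[a,b,c],password";
const fields = input.split(topLevelComma);
console.log(fields);
```

With nested brackets such as [a,[b,c]] this pattern splits in the wrong place, so a counter-based loop is the safer choice there.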
Related
Regex to fetch all spaces as long as they are not enclosed in braces
This is for a javascript mention system
ex: "Speak #::{Joseph Empyre}{b0268efc-0002-485b-b3b0-174fad6b87fc}, all right?"
Need to get:
[ "Speak ", "#::{Joseph Empyre}{b0268efc-0002-485b-b3b0-174fad6b87fc}", ",", "all ", "right?" ]
[Edit]
Solved in: https://codesandbox.io/s/rough-http-8sgk2
Sorry for my bad english
I interpreted your question literally: fetch all spaces as long as they are not enclosed in braces. Your example result isn't what I would expect from that, though: it contains a space after "Speak", as well as a separate match for the , after the {} groups. My output below shows what I would expect for what I think you are asking for, a list of strings split on just the spaces outside of braces.
const str =
"Speak #::{Joseph Empyre}{b0268efc-0002-485b-b3b0-174fad6b87fc}, all right?";
// This regex matches both pairs of {} with things inside and spaces
// It will not properly handle nested {{}}
// It does this such that instead of capturing the spaces inside the {},
// it instead captures the whole of the {} group, spaces and all,
// so we can discard those later
var re = /(?:\{[^}]*?\})|( )/g;
var match;
var matches = [];
while ((match = re.exec(str)) != null) {
matches.push(match);
}
var cutString = str;
var splitPieces = [];
for (var len=matches.length, i=len - 1; i>=0; i--) {
match = matches[i];
// Since we have matched both groups of {} and spaces, ignore the {} matches
// just look at the matches that are exactly a space
if(match[0] == ' ') {
// Note that if there is a trailing space at the end of the string,
// we will still treat it as delimiter and give an empty string
// after it as a split element
// If this is undesirable, check if match.index + 1 >= cutString.length first
splitPieces.unshift(cutString.slice(match.index + 1));
cutString = cutString.slice(0, match.index);
}
}
splitPieces.unshift(cutString);
console.log(splitPieces)
Console:
["Speak", "#::{Joseph Empyre}{b0268efc-0002-485b-b3b0-174fad6b87fc},", "all", "right?"]
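For comparison, the same pieces can be collected by matching the tokens instead of locating the separators. This is just a sketch under the same assumption as above (no nested braces); the names mention and pieces are made up. One difference: runs of several spaces produce no empty strings here.

```javascript
const mention =
  "Speak #::{Joseph Empyre}{b0268efc-0002-485b-b3b0-174fad6b87fc}, all right?";

// A token is a run of either whole {...} groups (spaces allowed inside)
// or single non-space characters. The brace alternative is tried first,
// so spaces inside braces never end a token.
const pieces = mention.match(/(?:\{[^}]*\}|[^ ])+/g) || [];
console.log(pieces);
```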
Input = ABCDEF ((3) abcdef),GHIJKLMN ((4)(5) Value),OPQRSTUVW((4(5)) Value (3))
Expected Output = ABCDEF,GHIJKLMN,OPQRSTUVW
Tried so far
Output = Input.replace(/ *\([^)]*\)*/g, "");
Using a regex here probably won't work, or scale, because you expect nested parentheses in your input string. Regex works well when there is a known and fixed structure to the input. Instead, I would recommend that you approach this using a parser. In the code below, I iterate over the input string one character at a time, and I use a counter to keep track of how many open parentheses there are. If we are inside a parenthesized term, we don't record those characters. I also have one simple replacement at the end to remove whitespace, an additional step that your expected output implies but that you never explicitly mentioned.
var pCount = 0;
var Input = "ABCDEF ((3) abcdef),GHIJKLMN ((4)(5) Value),OPQRSTUVW((4(5)) Value (3))";
var Output = "";
for (var i=0; i < Input.length; i++) {
if (Input[i] === '(') {
pCount++;
}
else if (Input[i] === ')') {
pCount--;
}
else if (pCount == 0) {
Output += Input[i];
}
}
Output = Output.replace(/ /g,'');
console.log(Output);
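One caveat with the counter: a stray ) drives it below zero, after which nothing more is copied. A possible variation, wrapped in a function, clamps the counter at zero. The name stripParens is made up, and dropping a stray ) silently is an assumption about what you would want.

```javascript
// Hypothetical wrapper around the counter approach above.
function stripParens(input) {
  let depth = 0;   // number of '(' currently open
  let out = "";
  for (const ch of input) {
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      // Clamp at zero so an unmatched ')' cannot hide the rest of the input
      depth = Math.max(0, depth - 1);
    } else if (depth === 0) {
      out += ch;
    }
  }
  // Same final step as above: remove the leftover spaces
  return out.replace(/ /g, "");
}

console.log(stripParens("ABCDEF ((3) abcdef),GHIJKLMN ((4)(5) Value),OPQRSTUVW((4(5)) Value (3))"));
```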
If you need to remove nested parentheses, you may use a trick from Remove Nested Patterns with One Line of JavaScript.
var Input = "ABCDEF ((3) abcdef),GHIJKLMN ((4)(5) Value),OPQRSTUVW((4(5)) Value (3))";
var Output = Input;
while (Output != (Output = Output.replace(/\s*\([^()]*\)/g, "")));
console.log(Output);
Or, you could use a recursive function:
function remove_nested_parens(s) {
let new_s = s.replace(/\s*\([^()]*\)/g, "");
return new_s == s ? s : remove_nested_parens(new_s);
}
console.log(remove_nested_parens("ABCDEF ((3) abcdef),GHIJKLMN ((4)(5) Value),OPQRSTUVW((4(5)) Value (3))"));
Here, \s*\([^()]*\) matches 0+ whitespaces, (, 0+ chars other than ( and ) and then a ), and the replace operation is repeated until the string does not change.
I need to get rid of any text inside < and >, including the two delimiters themselves.
So for example, from string
<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>
I would like to get this one
that
This is what i've tried so far:
var str = annotation.split(' ');
str.substring(str.lastIndexOf("<") + 1, str.lastIndexOf(">"))
But it doesn't work for every < and >.
I'd rather not use RegEx if possible, but I'm happy to hear if it's the only option.
You can simply use the replace method with /<[^>]*>/g. It matches < followed by [^>]*, any number of non-> characters, up to the closing >, globally.
var str = '<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>';
str = str.replace(/<[^>]*>/g, "");
alert(str);
A regular expression is fine for removing these substrings:
"<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>".replace(/<\/?[^>]+>/g, "")
Since the text you want always comes right after a > character, you could split the string at each > and keep, from every piece, whatever precedes the next <. For example:
String[] parts = stringName.split(">");
StringBuilder word = new StringBuilder();
for (String part : parts) {
    // Keep only the text before the next tag opens
    int lt = part.indexOf('<');
    word.append(lt >= 0 ? part.substring(0, lt) : part);
}
This is probably glitchy right now, but I think it would work. You don't need to actually remove the text between the "<>"; just collect the text right after each '>'.
Using a regular expression is not the only option, but it's a pretty good option.
You can easily parse the string to remove the tags, for example by using a state machine where the < and > characters turn a state of ignoring characters on and off. There are other methods of course, some shorter, some more efficient, but they will all be a few lines of code, while a regular expression solution is just a single replace.
Example:
function removeHtml1(str) {
return str.replace(/<[^>]*>/g, '');
}
function removeHtml2(str) {
var result = '';
var ignore = false;
for (var i = 0; i < str.length; i++) {
var c = str.charAt(i);
switch (c) {
case '<': ignore = true; break;
case '>': ignore = false; break;
default: if (!ignore) result += c;
}
}
return result;
}
var s = "<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>";
console.log(removeHtml1(s));
console.log(removeHtml2(s));
There are several ways to do this, some better than others. I haven't written one lately for these two specific characters, so I took a minute and wrote some code that may work. Here is how it works. Create a function with a loop that copies an incoming string, character by character, to an outgoing string. Give the function a string return type so it returns the modified string. Have the loop scan the incoming string from index 0 while the index is less than string.length(). Within the loop, add an if statement: when it sees a "<" character in the incoming string, it stops copying but keeps looking at every character until it sees the ">" character. When the ">" is found, it starts copying again. It's that simple.
The following code may need some refinement, but it should get you started on the method described above. It's not the fastest and not the most elegant but the basic idea is there. This did compile, and it ran correctly, here, with no errors. In my test program it produced the correct output. However, you may need to test it further in the context of your program.
#include <string>
using std::string;
string filter_on_brackets(string str1)
{
string str2 = "";
int copy_flag = 1;
for (size_t i = 0 ; i < str1.length();i++)
{
if(str1[i] == '<')
{
copy_flag = 0;
}
if(str1[i] == '>')
{
copy_flag = 2;
}
if(copy_flag == 1)
{
str2 += str1[i];
}
if(copy_flag == 2)
{
copy_flag = 1;
}
}
return str2;
}
I have a number of strings concatenated together
"[thing 1,thing 2,cat in the hat,Dr. Suese]"
I would like to traverse this string to stop at a specific comma (given an index) and return the substring immediately after the comma and before the next comma. The problem is I need to do it in JavaScript. I assume it would be something like this
function returnSubstring(i,theString){
var j,k = 0;
while(theString.charCodeAt(k) != ','){
while(i > 0){
if (theString.charCodeAt(j) == ','){
i--;
}
j++;
}
k++;
}
return theString.substring(j,k);
}
Is this what it should look like, or is there some syntax issue here?
I would like to traverse this string to stop at a specific comma (given an index) and return the substring immediately after the comma and before the next comma.
--> Let's assume the given comma index is 8, i.e. the index of the first comma. You can do:
var givenCommaIndex = 8;
var value = "[thing 1,thing 2,cat in the hat,Dr. Suese]";
var subString = value.substring(givenCommaIndex+1, value.indexOf(",", givenCommaIndex+1));
console.log(subString);
// Output :
"thing 2"
You can write a reusable function like the one below; it works not just for commas but for other delimiters as well:
function getSubString(str, delimiter, indexOfDelimiter) {
// TODO : handle specific cases like str is undefined or delimiter is null
return str.substring(indexOfDelimiter+1, str.indexOf(delimiter, indexOfDelimiter+1));
}
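One case the function above does not cover is the last delimiter: indexOf then returns -1, and substring treats a negative end as 0 and swaps the arguments, so you get the start of the string back instead of the last token. A possible guard, as a sketch (getSubStringSafe is a made-up name):

```javascript
// Hypothetical variant of getSubString that handles the last delimiter.
function getSubStringSafe(str, delimiter, indexOfDelimiter) {
  const start = indexOfDelimiter + 1;
  const next = str.indexOf(delimiter, start);
  // No further delimiter: return everything up to the end of the string
  return next === -1 ? str.slice(start) : str.slice(start, next);
}

const value = "[thing 1,thing 2,cat in the hat,Dr. Suese]";
console.log(getSubStringSafe(value, ",", 8));
console.log(getSubStringSafe(value, ",", value.lastIndexOf(",")));
```

Note that the last token still carries the closing ]; strip the brackets first if that matters.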
You may split :
var token = "[thing 1,thing 2,cat in the hat,Dr. Suese]"
.slice(1,-1) // remove [ and ]
.split(',')
[2]; // the third token
Or use a regular expression :
var token = "[thing 1,thing 2,cat in the hat,Dr. Suese]"
.match(/([^\]\[,]+)/g)
[2];
Based on the following strings
...here..
..there...
.their.here.
How can I remove the . at the beginning and end of each string, the way trim removes spaces, using JavaScript?
The output should be
here
there
their.here
These are the reasons why the RegEx for this task is /(^\.+|\.+$)/mg:
Inside /()/ is where you write the pattern of the substring you want to find in the string:
/(ol)/ This will find the substring ol in the string.
var x = "colt".replace(/(ol)/, 'a'); will give you x == "cat";
The ^\.+|\.+$ in /()/ is separated into 2 parts by the symbol | [means or]
^\.+ and \.+$
^\.+ means to find as many . as possible at the start.
^ means at the start; \ escapes the character that follows; adding + after a character means to match one or more of that character
\.+$ means to find as many . as possible at the end.
$ means at the end.
The m behind /()/ specifies that, if the string contains newline or carriage return characters, the ^ and $ operators match at line boundaries instead of only at the string boundaries.
The g behind /()/ performs a global match, so it finds all matches rather than stopping after the first match.
To learn more about RegEx you can check out this guide.
Try to use the following regex
var text = '...here..\n..there...\n.their.here.';
var replaced = text.replace(/(^\.+|\.+$)/mg, '');
Use Regex /(^\.+|\.+$)/mg
^ represent at start
\.+ one or many full stops
$ represents at end
so:
var text = '...here..\n..there...\n.their.here.';
alert(text.replace(/(^\.+|\.+$)/mg, ''));
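If the character to trim is not always a dot, the same pattern can be built at run time. The sketch below is only an illustration: trimChar is a made-up name, and the escaping step is there because characters such as . or * have a special meaning inside a regex.

```javascript
// Hypothetical helper: trim a given character (or substring) from the
// start and end of every line in the text.
function trimChar(text, ch) {
  // Escape regex metacharacters so ch is matched literally
  const escaped = ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Same shape as /(^\.+|\.+$)/mg, with the escaped character inserted
  const re = new RegExp("^(?:" + escaped + ")+|(?:" + escaped + ")+$", "mg");
  return text.replace(re, "");
}

console.log(trimChar("...here..\n..there...\n.their.here.", "."));
console.log(trimChar("**a*b**", "*"));
```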
Here is a non-regular-expression answer that extends String.prototype:
String.prototype.strim = function(needle){
  // Default to "nothing left", which covers empty and all-needle strings
  var first_pos = this.length;
  var last_pos = this.length;
  //find first non needle char position
  for(var i = 0; i < this.length; i++){
    if(this.charAt(i) !== needle){
      first_pos = i;
      break;
    }
  }
  //find last non needle char position (index 0 included)
  for(var j = this.length - 1; j >= first_pos; j--){
    if(this.charAt(j) !== needle){
      last_pos = j + 1;
      break;
    }
  }
  return this.substring(first_pos, last_pos);
}
alert("...here..".strim('.'));
alert("..there...".strim('.'))
alert(".their.here.".strim('.'))
alert("hereagain..".strim('.'))
and see it working here : http://jsfiddle.net/cettox/VQPbp/
Slightly more code-golfy, if not readable, non-regexp prototype extension:
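String.prototype.strim = function(needle) {
  var out = String(this);
  if (!needle) return out; // an empty needle would never stop matching
  while (0 === out.indexOf(needle))
    out = out.slice(needle.length);
  // The length check stops the loop once out is shorter than the needle;
  // otherwise lastIndexOf's -1 could satisfy the comparison forever
  while (out.length >= needle.length && out.length === out.lastIndexOf(needle) + needle.length)
    out = out.slice(0, out.length - needle.length);
  return out;
}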
var spam = "this is a string that ends with thisthis";
alert("#" + spam.strim("this") + "#");
Use a regex with JavaScript's replace:
var res = s.replace(/(^\.+|\.+$)/mg, '');
We can use the replace() method to remove an unwanted substring from a string. Note that replace() returns a new string, so the result has to be assigned.
Example:
var str = "<pre>I'm big fan of Stackoverflow</pre>";
str = str.replace(/<pre>/g, '').replace(/<\/pre>/g, '');
console.log(str);
output:
I'm big fan of Stackoverflow