I have a JavaScript string like dog=1,cat=2,horse=3. The names and values can be anything. I want to remove dog and whatever value is associated with it from the string. So in this example I would end up with cat=2,horse=3. There may not be an entry for dog in the string, and it could be anywhere within the string, e.g. cat=22,dog=17,horse=3 which would end up as cat=22,horse=3.
The names and values will just be alphanumeric with no special characters like quotes and equals signs within them.
What is the best way of going about this in JavaScript?
Simplest solution:
str.split(",").filter(function(kv) {
    return kv.slice(0, 4) != "dog=";
}).join(",")
You can do some regex magic as well, but that's not going to be as clear (or as maintainable):
str.replace(/(,|^)dog=[^,]*/g, "").replace(/^,/, "")
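For a quick check, both approaches can be wrapped in small functions and run against the question's examples. The names removeDogSplit and removeDogRegex are hypothetical, and "dog" is hard-coded as in the question; this is a sketch, not a general key-removal utility.

```javascript
// Hypothetical wrappers around the two approaches above.
function removeDogSplit(str) {
  // Split into "name=value" pairs, drop the pair whose name is "dog", rejoin.
  return str.split(",").filter(function (kv) {
    return kv.slice(0, 4) !== "dog=";
  }).join(",");
}

function removeDogRegex(str) {
  // Remove "dog=..." together with the comma before it (or the start of the
  // string), then strip the leading comma left behind when "dog" came first.
  return str.replace(/(,|^)dog=[^,]*/g, "").replace(/^,/, "");
}

const inputs = ["dog=1,cat=2,horse=3", "cat=22,dog=17,horse=3", "cat=2,horse=3"];
for (const s of inputs) {
  console.log(removeDogSplit(s), removeDogRegex(s));
}
```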
You could do this, although it may not be the best way:
Convert the string to an array, since it is comma-separated.
Remove the dog entry from the array.
Join the array back into a string.
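Following those three steps literally might look like the sketch below. removeKey is a hypothetical helper name; it removes only the first matching entry, which is enough when each name appears at most once.

```javascript
// Hypothetical helper that follows the three steps above.
function removeKey(str, key) {
  // 1. Convert the comma-separated string to an array of "name=value" items.
  const items = str.split(",");
  // 2. Find the item whose name is exactly `key` and remove it, if present.
  const index = items.findIndex(function (item) {
    return item.split("=")[0] === key;
  });
  if (index !== -1) {
    items.splice(index, 1);
  }
  // 3. Join the array back into a string.
  return items.join(",");
}

console.log(removeKey("cat=22,dog=17,horse=3", "dog")); // cat=22,horse=3
```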
I am trying to write a function to calculate how likely two strings are to mean the same thing. In order to do this I am converting to lower case and removing special characters from the strings before I compare them. Currently I am removing the strings '.com' and 'the' using String.replace(substring, '') and special characters using String.replace(regex, '').
str = str.toLowerCase()
.replace('.com', '')
.replace('the', '')
.replace(/[&\/\\#,+()$~%.'":*?<>{}]/g, '');
Is there a better regex that I can use to remove the common patterns like '.com' and 'the' as well as the special characters? Or some other way to make this more efficient?
As my dataset grows I may find other common meaningless patterns that need to be removed before trying to match strings and would like to avoid the performance hit of chaining more replace functions.
Examples:
Fish & Chips? => fish chips
stackoverflow.com => stackoverflow
The Lord of the Rings => lord of rings
You can combine the replace calls into a single one with a regexp like this:
str = str.toLowerCase().replace(/\.com|the|[&\/\\#,+()$~%.'":*?<>{}]/g, '');
The different strings to remove are separated by pipes |, which makes it easy enough to add more strings to the regexp.
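Running the combined pattern over the question's examples suggests it does the job, with two side effects worth knowing about: removed characters leave their surrounding spaces behind, and "the" is also stripped from inside words such as "other". The helper name normalizeOnce is hypothetical.

```javascript
// The single combined pattern from the answer above.
const pattern = /\.com|the|[&\/\\#,+()$~%.'":*?<>{}]/g;

// Hypothetical helper: lowercase, then remove every match of the pattern.
function normalizeOnce(str) {
  return str.toLowerCase().replace(pattern, "");
}

// JSON.stringify makes leftover spaces visible in the output.
console.log(JSON.stringify(normalizeOnce("Fish & Chips?")));         // "fish  chips"
console.log(JSON.stringify(normalizeOnce("stackoverflow.com")));     // "stackoverflow"
console.log(JSON.stringify(normalizeOnce("The Lord of the Rings"))); // " lord of  rings"
console.log(JSON.stringify(normalizeOnce("Other")));                 // "or"
```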
If you are storing the words to remove in an array, you can generate the regex using the RegExp constructor, e.g.:
var words = ["\\.com", "the"];
var rex = new RegExp(words.join("|") + "|[&\\/\\\\#,+()$~%.'\":*?<>{}]", "g");
Then reuse rex for each string:
str = str.toLowerCase().replace(rex, "");
Note the additional escaping: because we're building the pattern from strings instead of a regular expression literal, the backslashes (in the words array and in the final part) need to be escaped, as does the " (because I used " for the string quotes).
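If escaping each word by hand gets tedious as the list grows, one option is to escape the words programmatically before joining them. The escapeRegExp helper is a hypothetical addition (a common idiom, not part of the answer above), and the final whitespace cleanup is an extra step that tidies up the spaces left behind by the removals; treat this as a sketch.

```javascript
// Hypothetical helper (common idiom, not from the answer): escape regex
// metacharacters so the words can be listed as plain text.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Plain, unescaped words; append new ones here as the dataset grows.
const words = [".com", "the"];
// Same special-character class as in the answer, written as a string.
const specialChars = "[&\\/\\\\#,+()$~%.'\":*?<>{}]";
const rex = new RegExp(words.map(escapeRegExp).join("|") + "|" + specialChars, "g");

function normalize(str) {
  // Remove words and special characters, then (extra step) collapse the
  // leftover runs of whitespace and trim both ends.
  return str.toLowerCase().replace(rex, "").replace(/\s+/g, " ").trim();
}

console.log(normalize("Fish & Chips?"));         // fish chips
console.log(normalize("stackoverflow.com"));     // stackoverflow
console.log(normalize("The Lord of the Rings")); // lord of rings
```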
The problem with this question is that I'm sure you have a very concrete idea in your mind of what you want to do, but the solution you have arrived at (removing uninformative letters before making an is-identical comparison) may not be the best for the comparison you want to do.
I think perhaps a better idea would be to use a different comparison method and a different data structure than a string. A very simple example would be to condense your strings to sets with set('string') (new Set('string') in JavaScript) and then compare set similarity/difference. Another method might be to create a Directed Acyclic Graph, or a substring trie. The main point is that it's probably OK to reduce the information from the original string and store/compare that; however, don't underestimate the value of storing the original string, as it will help you down the road if you want to change the way you compare.
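In JavaScript, the set-comparison idea might look like the sketch below, which computes the Jaccard similarity (shared distinct characters divided by all distinct characters) of the two strings' character sets. charSetSimilarity is a hypothetical name, and Jaccard is just one possible set-similarity measure.

```javascript
// Hypothetical helper: Jaccard similarity of the two strings' character sets,
// |A ∩ B| / |A ∪ B|, a number between 0 (nothing shared) and 1 (same set).
function charSetSimilarity(a, b) {
  const setA = new Set(a.toLowerCase());
  const setB = new Set(b.toLowerCase());
  let intersection = 0;
  for (const ch of setA) {
    if (setB.has(ch)) intersection++;
  }
  const union = setA.size + setB.size - intersection;
  // Two empty strings are treated as identical.
  return union === 0 ? 1 : intersection / union;
}

console.log(charSetSimilarity("abc", "abc")); // 1
console.log(charSetSimilarity("abc", "def")); // 0
```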
Finally, if your strings are really really really long, you might want to use a perceptual hash - which is like an MD5 hash except similar strings have similar hashes. However, you will most likely have to roll your own for short strings, and define what you think is important data, and what is superfluous.
In this JavaScript code, if the variable data does not contain the character '.', what will split return?
x = data.split('.');
Will it be an array of the original string?
Yes, as per ECMA262 15.5.4.14 String.prototype.split (separator, limit), if the separator is not in the string, it returns a one-element array with the original string in it. The outcome can be inferred from:
Returns an Array object into which substrings of the result of converting this object to a String have been stored. The substrings are determined by searching from left to right for occurrences of separator; these occurrences are not part of any substring in the returned array, but serve to divide up the String value.
If you're not happy inferring that, you can follow the rather voluminous steps at the bottom and you'll see that's what it does.
Testing it, if you type in the code:
alert('paxdiablo'.split('.')[0]);
you'll see that it outputs paxdiablo, the first (and only) array element. Running:
alert('pax.diablo'.split('.')[0]);
alert('pax.diablo'.split('.')[1]);
on the other hand will give you two alerts, one for pax and one for diablo.
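The same behaviour can be checked in Node.js or a browser console with console.log instead of alert. The empty-string case is included because it also yields a one-element array.

```javascript
// split() always returns an array; when the separator does not occur,
// the array holds exactly one element: the whole original string.
console.log(JSON.stringify("paxdiablo".split(".")));  // ["paxdiablo"]
console.log(JSON.stringify("pax.diablo".split("."))); // ["pax","diablo"]
console.log(JSON.stringify("".split(".")));           // [""]
```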
.split() will return an array. However,
The value you are splitting needs to be a string.
If the value you are splitting doesn't contain the separator, and the value ends up being an integer (or something other than a string) the call to .split() will throw an error:
Uncaught TypeError: values.split is not a function.
For example, if you are loading a comma-separated list of IDs and the record has only one ID (e.g. 42), attempting to split that list gives the above error, because the value you are splitting is a number, not a string.
You may want to preface the value you are splitting with .toString():
aValueToSplit.toString().split('.');
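A slightly more defensive variant converts with String() and treats missing values as an empty list, since calling .toString() on null or undefined would itself throw a TypeError. splitIds is a hypothetical helper name, and the comma separator is assumed from the ID-list example.

```javascript
// Hypothetical helper: accept a string or a single number (e.g. 42) and
// always return an array of ID strings.
function splitIds(value) {
  // Treat a missing value as "no IDs" instead of throwing.
  if (value === null || value === undefined) return [];
  // String(42) is "42", so a lone numeric ID splits into a one-element array.
  return String(value).split(",");
}

console.log(JSON.stringify(splitIds("17,42,99"))); // ["17","42","99"]
console.log(JSON.stringify(splitIds(42)));         // ["42"]
```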
Quick. I have a string: #user 9#I'm alive! and I want to be able to pull out "user 9".
So far I'm doing:
if(variable.match(/\#/g)){
console.log(variable);
}
But the output is still the full line.
Use .split(), in order to pull out your desired item.
var variable = variable.split("#");
console.log(variable[1]);
.split() turns your string into an array, using its first argument as the separator.
Of course, if you just want regex alone, you could do:
console.log(variable.match(/([^#]+)/g));
This again gives you an array of the items, but a shorter one, because it doesn't include the empty value before the first hash. Further, as stated by @Stephen P, you will need to use a capture group (()) to capture the items you want.
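Side by side, the two approaches from this answer produce the following for the question's string; JSON.stringify is used only to make the printed arrays unambiguous.

```javascript
const variable = "#user 9#I'm alive!";

// split("#") keeps the empty string that precedes the first "#".
const viaSplit = variable.split("#");
console.log(JSON.stringify(viaSplit)); // ["","user 9","I'm alive!"]

// match() with the g flag returns every run of non-"#" characters.
const viaMatch = variable.match(/([^#]+)/g);
console.log(JSON.stringify(viaMatch)); // ["user 9","I'm alive!"]
```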
Try something more along these lines...
var thetext="#user 9#I'm alive!";
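thetext.match(/\#([^#]+)\#/)[1]; // "user 9"
You want to introduce a capturing group (the part in the parentheses) to collect the text between the pound signs. Leave off the g flag here: with it, match() returns only the whole matches (such as "#user 9#") and ignores capture groups.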
You may want to use .exec() instead of .match() depending on what you're doing.
Also see this answer to the SO question "How do you access the matched groups in a javascript regex?"
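As a sketch of the .exec() route, the hypothetical extractTagged function below loops over a global regex and collects capture group 1 from every #...# section. With the g flag, each exec() call resumes from the regex's lastIndex and returns null once there are no more matches.

```javascript
// Hypothetical helper: collect the text between each pair of "#" characters.
function extractTagged(text) {
  const re = /#([^#]+)#/g;
  const results = [];
  let match;
  // Each exec() call resumes from re.lastIndex and returns null when done.
  while ((match = re.exec(text)) !== null) {
    results.push(match[1]); // match[0] is "#...#", match[1] is the group
  }
  return results;
}

console.log(JSON.stringify(extractTagged("#user 9#I'm alive!")));      // ["user 9"]
console.log(JSON.stringify(extractTagged("#user 9#hi #user 10#bye"))); // ["user 9","user 10"]
```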
How do I check if a variable contains Chinese or Japanese characters? I know that this line works:
if (document.body.innerText.match(/[\u3400-\u9FBF]/))
I need to do the same thing not for the document but for a single variable.
.match is a string method. You can apply it to any string, including one stored in an arbitrary variable.
In case you have something that is not a string, most objects define a .toString() method that converts their content to some reasonable string form. When you retrieve the selection from a page, you get a Selection object; convert it to a string and then use match on it: sel.toString().match(...).
AFAIK you can do the same with a variable: document.body.innerText just returns the text of the body, so you can just do
myvar.match(...)
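For a single variable, RegExp.prototype.test is a convenient fit because it returns a plain boolean. The function names are hypothetical, and String(text) converts non-string values (such as a selection object) first. The question's range covers CJK ideographs only; the kana ranges in the second function are an assumption about what "Japanese characters" should include.

```javascript
// Hypothetical helpers. The first uses exactly the range from the question
// (U+3400 to U+9FBF, CJK ideographs). Japanese hiragana (U+3040 to U+309F)
// and katakana (U+30A0 to U+30FF) lie outside it, so the second adds them.
function containsCJK(text) {
  return /[\u3400-\u9FBF]/.test(String(text));
}

function containsCJKOrKana(text) {
  return /[\u3040-\u30FF\u3400-\u9FBF]/.test(String(text));
}

const myvar = "Tokyo 東京";
console.log(containsCJK(myvar));           // true
console.log(containsCJK("ひらがな"));       // false
console.log(containsCJKOrKana("ひらがな")); // true
```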
Here's an example: http://snipplr.com/view/15357/
So I'm trying to split a string in JavaScript, something that looks like this:
"foo","super,foo"
Now, if I use .split(",") it will turn the string into an array containing [0]"foo" [1]"super [2]foo"
However, I only want to split on a comma that sits between two quoted values, and if I use .split('","'), it turns into [0]"foo [1]super,foo"
Is there a way I can split a string on a delimiter but keep part of that delimiter (the quotes) in the resulting elements, WITHOUT having to write code to concatenate it back onto each value?
EDIT:
I'm looking to get [0]"foo",[1]"super,foo" as my result. Essentially, the way I need to edit certain data, I need what is in [0] to never get changed, but the contents of [1] will get changed depending on what it contains. It will get concatenated back to look like "foo", "I WAS CHANGED", or it will stay the same if the contents of [1] were not something that required a change.
Try this:
'"foo","super,foo"'.replace('","', '"",""').split('","');
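For reference, this is what that one-liner produces. Because String.prototype.replace with a string pattern rewrites only the first occurrence, only the first '","' boundary is adjusted, so input with more than two quoted fields would still be split incorrectly.

```javascript
// The one-liner from the answer above, assigned so the result can be inspected.
const parts = '"foo","super,foo"'.replace('","', '"",""').split('","');
console.log(parts.length); // 2
console.log(parts[0]);     // "foo"
console.log(parts[1]);     // "super,foo"
```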
For the sake of discussion and the benefit of everyone finding this page in search of help, here is a more flexible solution:
var text = '"foo","super,foo"';
console.log(text.match(/"[^"]+"/g));
outputs
['"foo"', '"super,foo"']
This works by passing the 'g' (global) flag to the RegExp, which causes match to return an array of all occurrences of /"[^"]+"/. That catches "foo,bar", "foo", and even "['foo', 'bar']", although a value with embedded double quotes breaks it: "["foo", "bar"]" returns ['"["', '", "', '"]"'].
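Applied to the workflow in the question's EDIT, the match-based split could be combined with a rejoin roughly as below. editSecondField is a hypothetical name; the sketch assumes fields contain no embedded double quotes and are never empty (the + in the pattern skips ""), and it rejoins with a bare comma rather than the ", " shown in the question.

```javascript
// Hypothetical helper: split the quoted fields, change the second one,
// then join them back with a comma.
function editSecondField(text, newValue) {
  // Each element keeps its surrounding quotes, e.g. '"foo"'.
  const fields = text.match(/"[^"]+"/g) || [];
  if (fields.length > 1) {
    fields[1] = '"' + newValue + '"';
  }
  return fields.join(",");
}

console.log(editSecondField('"foo","super,foo"', "I WAS CHANGED"));
// "foo","I WAS CHANGED"
console.log(editSecondField('"foo","super,foo"', "super,foo"));
// "foo","super,foo"
```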