I'd like to know if I can parse and filter JSON text data based on a regular expression. For example, say I have the following:
{"key":"some:xx:yy", "value": 72311}
{"key":"some:xx:zz", "value": 72311}
{"key":"some:xx:qq", "value": 72311}
I want to select all tuples whose key field has the same "some:xx:" part. How can I achieve this with JSON in an 'elegant' way?
The example you gave contains three different objects, so you can use JavaScript to look for text in a property:
var obj1 = {"key":"some:xx:yy", "value": 72311};
if (obj1.key.indexOf("xx") !== -1) { // obj1.key contains "xx"
//do something
}
If you have an array with those values, then you can simply loop through the array and look for "xx" just like above for each element of array. And when found, you can assign that element to another array. So at the end of the loop, "another array" will contain all elements that contain "xx".
If you don't insist on using RegEx, I can show you example code for the loop. If you insist on RegEx, let me know and I will help you... just kidding, let me know and I will delete my answer and silently leave this question :)
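For reference, a minimal sketch of the loop described above, assuming the three objects from the question are already in an array. The `records` and `matches` names, and the extra non-matching entry, are illustrative only:

```javascript
// The three objects from the question, assumed to be in an array already,
// plus one hypothetical entry that should not be selected.
const records = [
  { key: "some:xx:yy", value: 72311 },
  { key: "some:xx:zz", value: 72311 },
  { key: "some:xx:qq", value: 72311 },
  { key: "other:aa:bb", value: 1 }
];

// Walk the array and copy every element whose key contains "xx"
// into a second array.
const matches = [];
for (let i = 0; i < records.length; i++) {
  if (records[i].key.indexOf("xx") !== -1) {
    matches.push(records[i]);
  }
}

console.log(matches.length); // 3
```

The same result can be written as `records.filter(r => r.key.indexOf("xx") !== -1)`. To require the exact "some:xx:" prefix rather than "xx" anywhere in the key, test `r.key.indexOf("some:xx:") === 0` instead.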
I'm going to give you a straight answer to the question you asked, but hopefully the complexity and the raft of caveats will convince you that JSON.parse is a better alternative.
You can write a regular expression to match a single such tuple, but you can't write a regular expression to match all such tuples.
To explain why, consider the regular expression that matches one:
var stringBody = '(?:[^"\\\\]|\\\\.)*';
var string = '"' + stringBody + '"';
var space = '[ \t\r\n\f]*';
var colon = space + ':' + space;
var comma = space + ',' + space;
var uglyRegex = '^(?:[^"]|' + string + ')*?'
+ '"key"' + colon + '"(some:xx:' + stringBody + ')"' + comma
+ '"value"' + colon + '((?:[^\},"]|' + string + ')*)';
This works by finding the minimal number of non-string or full-string tokens that precede a key whose value starts with some:xx:, and then looking for a value.
It leaves the key in matching group 1 and the value in matching group 2.
Since it has to match starting at the beginning to correctly identify string token boundaries, it cannot be used in a 'g' flag match.
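To try the pattern, the string has to be compiled with `new RegExp` and applied to one object's text at a time, without the `g` flag. The sketch below repeats the definitions from above so that it runs on its own; the second input line is a hypothetical non-matching example:

```javascript
// Building blocks from the answer above.
const stringBody = '(?:[^"\\\\]|\\\\.)*';
const string = '"' + stringBody + '"';
const space = '[ \t\r\n\f]*';
const colon = space + ':' + space;
const comma = space + ',' + space;
const uglyRegex = '^(?:[^"]|' + string + ')*?'
  + '"key"' + colon + '"(some:xx:' + stringBody + ')"' + comma
  + '"value"' + colon + '((?:[^},"]|' + string + ')*)';

// No 'g' flag: each match has to start at the beginning of the text.
const re = new RegExp(uglyRegex);

const lines = [
  '{"key":"some:xx:yy", "value": 72311}',
  '{"key":"other:xx:yy", "value": 5}' // hypothetical, should not match
];

for (const line of lines) {
  const m = re.exec(line);
  // Group 1 holds the key, group 2 the raw text of the value.
  console.log(m ? [m[1], m[2]] : null);
}
```

Note that group 2 is the value's source text (`"72311"` as a string), not a parsed number.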
Caveats
It assumes certain characters in "key"'s property value are not \uABCD escaped.
It assumes characters in the property names "key" and "value" are not \uABCD escaped.
It requires the key and value to occur in that order.
It cannot tell what other properties occur in the same object.
Each of these problems could be worked around by making the regex more complex, but with regular expressions the only way to handle a corner case is often to make the regex much bigger.
When incremental improvements to code explode the size, the code is unmaintainable.
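For comparison, a sketch of the JSON.parse route recommended at the start of this answer, assuming the input is one JSON object per line as in the question. The regex is then applied only to the already-decoded `key` value, so escaping, property order and extra properties stop being problems. The non-matching line is a hypothetical example:

```javascript
// Input assumed to be newline-delimited JSON, one object per line.
const text = [
  '{"key":"some:xx:yy", "value": 72311}',
  '{"key":"some:xx:zz", "value": 72311}',
  '{"value": 1, "key":"other:xx:qq"}', // hypothetical non-matching line
  '{"key":"some:xx:qq", "value": 72311}'
].join("\n");

const prefix = /^some:xx:/; // keys that start with "some:xx:"

const selected = text
  .split("\n")
  .filter(line => line.trim() !== "")  // skip blank lines
  .map(line => JSON.parse(line))        // decode each object
  .filter(obj => typeof obj.key === "string" && prefix.test(obj.key));

console.log(selected.map(obj => obj.key).join(", ")); // some:xx:yy, some:xx:zz, some:xx:qq
```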
Related
I can't seem to get my head around javascript regex, so I need your help!
I need to transform the following:
1234567891230
Into:
urn:epc:id:sgln:12345678.9123.0
I already did it with a normal JavaScript algorithm (see below), but we need to be able to configure this transformation. I just need it for the default configuration value!
Using slice it would be:
var result = "urn:epc:id:sgln:" + myString.slice(0, 8) + "." +
myString.slice(8, 12) + "." + myString.slice(12);
If you can include an explanation in your answer I would be grateful :)
If you want to use regex for this try the following:
var regex = /(\d{8})(\d{4})(\d)/;
var splittedNumber = regex.exec("1234567891230");
var result = "urn:epc:id:sgln:" + splittedNumber[1] + "." + splittedNumber[2] + "." + splittedNumber[3];
console.log(result);
But I would recommend the slice-based approach you already have.
You could use a regex to capture 3 groups for 12345678, 9123 and 0, with a word boundary \b at the beginning and at the end.
Then, using slice(1), you could drop the first element of the array returned by match, because it contains the full match that we don't need.
After that you could join the remaining elements using a dot as the separator.
\b(\d{8})(\d{4})(\d)\b
let str = "1234567891230";
let prefix = "urn:epc:id:sgln:";
console.log(prefix + str.match(/\b(\d{8})(\d{4})(\d)\b/).slice(1).join("."));
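Since the question asks for a configurable transformation, one option is to keep both the pattern and the output template as plain strings and pass them to `String.prototype.replace`, where `$1`, `$2` and `$3` in the template refer to the capture groups. The `defaultConfig` object and `transform` function are hypothetical names; this is a sketch, not a finished design:

```javascript
// Hypothetical configuration: both parts are plain strings, so they could be
// loaded from a config file instead of being hard-coded.
const defaultConfig = {
  pattern: "^(\\d{8})(\\d{4})(\\d)$",
  template: "urn:epc:id:sgln:$1.$2.$3"
};

function transform(input, config = defaultConfig) {
  const re = new RegExp(config.pattern);
  if (!re.test(input)) {
    throw new Error('input "' + input + '" does not match ' + config.pattern);
  }
  // $1, $2, $3 in the template are replaced by the pattern's capture groups.
  return input.replace(re, config.template);
}

console.log(transform("1234567891230")); // urn:epc:id:sgln:12345678.9123.0
```

Changing the format then means changing only the two strings, not the code.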
I would like to split a spreadsheet cell reference (e.g. A10, AB100, ABC5) into two string parts: the column reference and the row reference.
A10 => A and 10
AB100 => AB and 100 ...
Does anyone know how to do this by string functions?
var res = "AA123";
//Method 1
var arr = res.match(/[a-z]+|[^a-z]+/gi);
document.write(arr[0] + "<br>" + arr[1]);
//Method 2 (as deceze suggested)
var arr = res.match(/([^\d]+)(\d+)/);
document.write("<br>" + arr[1] + "<br>" + arr[2]);
//Note here [^\d] is the same as \D
This is easiest to do with a regular expression (regex). For example:
var ref = "AA100";
var matches = ref.match(/^([a-zA-Z]+)([1-9][0-9]*)$/);
if (matches) {
var column = matches[1];
var row = Number(matches[2]);
console.log(column); // "AA"
console.log(row); // 100
} else {
throw new Error('invalid ref "' + ref + '"');
}
The important part here is the regex literal, /^([a-zA-Z]+)([1-9][0-9]*)$/. I'll walk you through it.
^ anchors the regex to the start of the string. Otherwise you might match something like "123ABC456".
[a-zA-Z]+ matches one or more characters from a-z or A-Z.
[1-9][0-9]* matches exactly one character from 1-9, and then zero or more characters from 0-9. This makes sure that the number you are matching never starts with zero (i.e. "A001" is not allowed).
$ anchors the regex to the end of the string, so that you don't match something like "ABC123DEF".
The parentheses around ([a-zA-Z]+) and ([1-9][0-9]*) "capture" the strings inside them, so that we can later find them using matches[1] and matches[2].
This example is strict about only matching valid cell references. If you trust the data you receive to always be valid then you can get away with a less strict regex, but it is good practice to always validate your data anyway in case your data source changes or you use the code somewhere else.
It is also up to you to decide what you want to do if you receive invalid data. In this example I make the script throw an error, but there might be better choices in your situation (e.g. prompt the user to enter another value).
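If throwing is not the right choice, one variation is to return `null` for invalid input and let the caller decide what to do. The sketch below (the `parseCellRef` name is hypothetical) uses the same pattern, but with named capture groups, available since ES2018, instead of `matches[1]` and `matches[2]`:

```javascript
// Returns { column, row } for a valid reference, or null otherwise.
function parseCellRef(ref) {
  const m = /^(?<column>[a-zA-Z]+)(?<row>[1-9][0-9]*)$/.exec(ref);
  if (m === null) {
    return null; // caller decides: throw, re-prompt the user, skip, ...
  }
  return { column: m.groups.column, row: Number(m.groups.row) };
}

console.log(parseCellRef("AB100")); // column "AB", row 100
console.log(parseCellRef("A001"));  // null (leading zero rejected)
```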
With the code below, I have converted the following names into URLs:
Love & Relationships to http://domain.org/love-relationships
Career & Guidance to http://domain.org/career-guidance
filter('ampToDash', function(){
return function(text){
return text ? String(text).replace(/ & /g,'-'): '';
};
}).filter('dashToAmp', function(){
return function(text){
return text ? String(text).replace(/-/g,' & '): '';
};
})
However, I have a new set of names and I can't figure out how to do both at the same time.
Being Human to http://domain.org/being-human
Competitive Exams to http://domain.org/competitive-exams
filter('ampToDash', function(){
return function(text){
return text ? String(text).replace(/ /g,'-'): '';
};
}).filter('dashToAmp', function(){
return function(text){
return text ? String(text).replace(/-/g,' '): '';
};
})
How do I combine both the regex codes so it can work hand in hand?
You may also want to extend your replacement criteria to cover all "non-word" characters, instead of just accounting for the ones you're currently aware of (& and space). This would be more future-proof, and perhaps easier to reason about:
String(text).replace(/\W+/g, '-')
(\W+ means any sequence of non-word characters.)
Example:
'Jack & Jill went up the #$%#! hill'.replace(/\W+/g, '-')
Yields:
Jack-Jill-went-up-the-hill
And because there's loss of information (i.e. you don't know what exactly leads to a '-' by looking at the transformed string), a way you can find the original string is to simply store it and look up by the transformed string. To elaborate: You're probably going to be looking up some document from this new string (a "slug", as others pointed out). Store the slug along with the document and just look up the document (and its original title) from your database.
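A minimal sketch of that idea, with an in-memory `Map` standing in for the database. The `slugify` helper and `titlesBySlug` map are hypothetical names; unlike the one-liner above, this version also lower-cases the text and trims stray hyphens, to match the URLs in the question:

```javascript
// Hypothetical slugify: lower-case, collapse runs of non-word characters
// into "-", then trim leading/trailing hyphens.
function slugify(title) {
  return title.toLowerCase().replace(/\W+/g, "-").replace(/^-+|-+$/g, "");
}

// Store each original title under its slug instead of trying to reverse it.
const titlesBySlug = new Map();
for (const title of ["Love & Relationships", "Career & Guidance", "Being Human", "Competitive Exams"]) {
  titlesBySlug.set(slugify(title), title);
}

console.log(titlesBySlug.get("love-relationships")); // Love & Relationships
```

Note that `\W` also treats accented letters such as "é" as non-word characters, so titles in other languages may need a different rule.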
It looks like you simply want to replace any ampersand surrounded by whitespace, or any run of whitespace, with a single hyphen. If so, you could use the following expression:
// Replace " & " (ampersand with surrounding whitespace) or any run of whitespace with "-"
return text ? String(text).replace(/(\s+&\s+|\s+)/g, '-') : '';
Example
var input = ['Love & Relationships', 'Career & Guidance', 'Being Human', 'Competitive Exams'];
for (var i = 0; i < input.length; i++) {
var phrase = input[i];
console.log(phrase + ' -> ' + phrase.replace(/(\s+&\s+|\s+)/g, '-'));
}
I think you are looking for a lib that converts a string into a slug.
You can do this manually, but you'll probably have a hard time covering other edge cases.
I would suggest using something like:
https://github.com/dodo/node-slug
Or check out this gist if you really want to stay with the regex way : https://gist.github.com/mathewbyrne/1280286
You have two separate problems:
how to 'slugify' a string
how to undo / reverse the slugify.
To answer 1: A generic slugify method would be something like: text.replace(/\W+/g, '-')
To answer 2: you can't. You have a function (ampToDash) that can produce the same output given different inputs. i.e. there is NO equivalent of dashToAmp any more.
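A quick way to see why point 2 is impossible: the slugify rule from point 1 maps different titles to the same slug (the two inputs below are illustrative):

```javascript
// The slugify rule from point 1.
const slugify = text => text.replace(/\W+/g, "-");

// Two different inputs...
const a = slugify("Love & Relationships");
const b = slugify("Love Relationships");

// ...produce the same slug, so no function can tell which one was meant.
console.log(a, b, a === b); // Love-Relationships Love-Relationships true
```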
Let's say I have a string: "We.need..to...split.asap". What I would like to do is to split the string by the delimiter ., but I only wish to split by the first . and include any recurring .s in the succeeding token.
Expected output:
["We", "need", ".to", "..split", "asap"]
In other languages, I know that this is possible with a look-behind /(?<!\.)\./ but Javascript unfortunately does not support such a feature.
I am curious to see your answers to this question. Perhaps there is a clever use of look-aheads that presently evades me?
I was considering reversing the string, then re-reversing the tokens, but that seems like too much work for what I am after... plus controversy: How do you reverse a string in place in JavaScript?
Thanks for the help!
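Lookbehind assertions were added to JavaScript in ECMAScript 2018, so on current Node.js and modern browsers the look-behind from the question works as written (older engines, for example Safari before 16.4, lack it):

```javascript
const text = "We.need..to...split.asap";

// Split on a "." that is NOT preceded by another "." (negative lookbehind).
const tokens = text.split(/(?<!\.)\./);

console.log(JSON.stringify(tokens)); // ["We","need",".to","..split","asap"]
```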
Here's a variation of the answer by guest271314 that handles more than two consecutive delimiters:
var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;
var items = text.split(re).filter(function(val) { return val.length > 0; });
It uses the detail that if the split expression includes a capture group, the captured items are included in the returned array. These capture groups are actually the only thing we are interested in; the tokens are all empty strings, which we filter out.
EDIT: Unfortunately there's perhaps one slight bug with this. If the text to be split starts with a delimiter, that will be included in the first token. If that's an issue, it can be remedied with:
var re = /(?:^|(\.*[^.]+))\./;
var items = text.split(re).filter(function(val) { return !!val; });
(I think this regex is ugly and would welcome an improvement.)
You can do this without any lookaheads:
var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g;
var matches, output = [];
while(matches = regex.exec(subject)) {
output.push(matches[1]);
}
document.write(JSON.stringify(output));
It seemed like it would work in one line, as it did on https://regex101.com/r/cO1dP3/1, but it had to be expanded in the code above because with the /g flag, .match returns only the full matches, not the capturing groups (i.e. the correct data was in the capturing groups, but we couldn't access them directly without doing the above).
See: JavaScript Regex Global Match Groups
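On runtimes that have `String.prototype.matchAll` (ES2020), the capture groups of a global regex can be collected without the explicit loop. A sketch with the same pattern:

```javascript
const subject = "We.need.to....split.asap";

// matchAll requires the g flag and yields every match object, capture groups
// included, so group 1 can be collected directly.
const output = Array.from(subject.matchAll(/\.?(\.*[^.]+)/g), m => m[1]);

console.log(JSON.stringify(output)); // ["We","need","to","...split","asap"]
```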
An alternative solution with the original one-liner (plus one line) is:
document.write(JSON.stringify(
"We.need.to....split.asap".match(/\.?(\.*[^.]+)/g)
.map(function(s) { return s.replace(/^\./, ''); })
));
Take your pick!
Note: This answer can't handle more than 2 consecutive delimiters, since it was written according to the example in revision 1 of the question, which was not very clear about such cases.
var text = "We.need.to..split.asap";
// split "." if followed by "."
var res = text.split(/\.(?=\.)/).map(function(val, key) {
// if `val[0]` does not begin with "." split "."
// else split "." if not followed by "."
return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/)
});
// concat arrays `res[0]` , `res[1]`
res = res[0].concat(res[1]);
document.write(JSON.stringify(res));
I've grown spoiled by ColdFusion's lists, and have run across a situation or two where a comma delimited list shows up in Javascript. Is there an equivalent of listFindNoCase('string','list'), or a performant way to implement it in Javascript?
Oh, and it should be able to handle list items with commas, like:
( "Smith, John" , "Doe, Jane" , "etc..." )
That's what is really tripping me up.
FYI, jList's implementation: https://github.com/davidwaterston/jList
Note, though, that it will fail your requirement that "it should be able to handle list items with commas".
listFind : function (list, value, delimiter) {
delimiter = (typeof delimiter === "undefined") ? "," : delimiter;
var i,
arr = list.split(delimiter);
if (arr.indexOf !== undefined) {
return arr.indexOf(value) + 1;
}
for (i = 0; i < arr.length; i += 1) {
if (arr[i] === value) {
return i + 1;
}
}
return 0;
},
listFindNoCase : function (list, value, delimiter) {
delimiter = (typeof delimiter === "undefined") ? "," : delimiter;
list = list.toUpperCase();
value = String(value).toUpperCase();
return this.listFind(list, value, delimiter);
},
One relevant observation here is that CF lists themselves don't support the delimiter char also being part of the data. Your sample "list" of '"Smith, John", "Doe, Jane"' is a four-element comma-delimited list: '"Smith', 'John"', '"Doe', 'Jane"'. So you don't actually want a JS equivalent of CF's listFindNoCase(), because listFindNoCase() does not fulfil your requirement from the CF perspective either, and nothing native to CF does. To handle elements that have embedded commas, you need to use a different char as a delimiter.
TBH, CF lists are a bit rubbish (for the reason cited above), as they're only really useful in very mundane situations, which a) don't often crop up and b) aren't better served by an array anyhow. Also note that you are asking for a performant solution: not using string-based lists would be the first step towards performance (this applies equally to CF and to JS: CF's string-based lists are not at all performant).
So my first answer here would be: I think you ought to revise your requirement away from using lists, and look to use arrays instead.
With that in mind, how is the data getting to JS? Are you somehow stuck with using a string-based list? If not, simply don't use one. If your source data is a string-based list, are you in a position to convert it to an array first? As I mentioned before, the "schema" of your example list is a problem: from CF's perspective you can't have a comma be both a delimiter and data. And you have a bit of work ahead of you writing code to identify that a quoted comma is data and a non-quoted comma is a delimiter. You should have a look around at CSV-parsing algorithms to deal with that sort of thing.
However, if you can change the delimiter (to, say, a pipe or a semicolon or something else that will not show up in the data), then it's easy enough to turn that into an array (listToArray() in CF, or split() in JS). Then you can just use indexOf() as others have said.
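A minimal sketch of that route, assuming the data has been re-delimited with a pipe. The helper name only mirrors CF's function; it is not a JavaScript built-in. Because indexOf() compares case-sensitively, findIndex() with toLowerCase() is used to get the no-case behaviour:

```javascript
// Hypothetical helper mirroring CF's listFindNoCase(): returns the 1-based
// position of `value` in the list, or 0 if it is not there.
function listFindNoCase(list, value, delimiter = "|") {
  const items = list.split(delimiter);
  const target = value.toLowerCase();
  // findIndex returns -1 when nothing matches, so +1 gives CF's 0.
  return items.findIndex(item => item.toLowerCase() === target) + 1;
}

// Pipe-delimited, so the commas inside each name are just data.
const names = "Smith, John|Doe, Jane|etc...";
console.log(listFindNoCase(names, "doe, jane")); // 2
console.log(listFindNoCase(names, "Smith"));     // 0
```

If the data already arrives as an array, skip the split() and call findIndex() on it directly.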
For the sake of sh!ts 'n' giggles, if you were stuck with a string - provided you could change the delimiter - you could do this, I think:
Use search() to find the position of the first match of the substring in the string. (Unlike indexOf(), search() accepts a regex, and you will need one: it must match the substring when it is delimited by your delimiter char, or runs from the beginning of the string to a delimiter char, or from a delimiter char to the end of the string, with no intermediary delimiter chars.) I could come up with the regex for this if needs must. This will not be list-aware yet, but we know where it'll be in the string.
Take a substring of the original string from the beginning to the position search() returned.
Use split() on that, splitting on the delimiter
The length of the ensuing array will be the position in the original list that the match was at.
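For illustration only (see the caveat that follows), those steps could look roughly like this. The helper names are hypothetical; the regex uses a lookbehind (ES2018+) so that the match starts at the element itself rather than at the preceding delimiter, and the `i` flag for case-insensitivity:

```javascript
// Escape regex metacharacters so the value and delimiter are matched literally.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function listFindBySearch(list, value, delimiter) {
  const d = escapeRegExp(delimiter);
  // Step 1: the value, preceded by start-of-string or a delimiter and
  // followed by a delimiter or end-of-string.
  const re = new RegExp("(?<=^|" + d + ")" + escapeRegExp(value) + "(?=" + d + "|$)", "i");
  const pos = list.search(re);
  if (pos === -1) {
    return 0; // not in the list
  }
  // Steps 2-4: split everything before the match on the delimiter; the number
  // of pieces is the 1-based list position of the match.
  return list.slice(0, pos).split(delimiter).length;
}

console.log(listFindBySearch("Smith, John|Doe, Jane|etc...", "DOE, JANE", "|")); // 2
```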
However I stress you should not do that. Use an array instead of a string from the outset.
You can use indexOf combined with .toLowerCase()
var list = '"Smith, John" , "Doe, Jane" , "etc..."';
if (list.toLowerCase().indexOf('"smith, john"') !== -1) {
  // found: indexOf returns -1 when absent, and 0 is a valid position
}
If you need an exact match, so that searching for "Smith" does not match "Smithson", just pad the strings with your delimiter. For example, say your delimiter is a semicolon (because you have commas in your data); pad the left and right sides of your string like so:
";Smith, John;Doe, Jane;"
Also pad the search value, so if you're looking for Smith, the value would become:
";Smith;"
After lowercasing both strings, searching for ";smith;" with indexOf() returns -1 (not found), whereas searching for ";smith, john;" returns 0.
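The padding trick can be wrapped in a small helper. This is a sketch: `listContainsNoCase` is a hypothetical name, it pads the list itself rather than expecting it pre-padded, and it assumes the delimiter never occurs inside an item:

```javascript
// Exact, case-insensitive membership test using delimiter padding.
function listContainsNoCase(list, value, delimiter = ";") {
  const padded = (delimiter + list + delimiter).toLowerCase();
  const needle = (delimiter + value + delimiter).toLowerCase();
  return padded.indexOf(needle) !== -1;
}

const people = "Smith, John;Doe, Jane";
console.log(listContainsNoCase(people, "Smith"));       // false
console.log(listContainsNoCase(people, "smith, john")); // true
```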