Regex: Match all double quotes in string and add comma - javascript

I need to turn a string like this:
' query: "help me" distance: "25" count: "50" '
into a javascript object or json string that looks like this:
'{ query: "help me", distance: "25", count: "50" }'

Something like this, perhaps:
var query = ' query: "help me" distance: "25" count: "50"';
query = '{' + query.replace(/"(?=\s)/g, '",') + '}';
console.log(query);
The lookahead (?=\s) makes the replace insert a comma after every double quote that is followed by a whitespace character. The closing quote of the last value is followed by the end of the string, so it gets no comma.
Having said that, I'd strongly suggest reconsidering how you construct your params: I suspect you could get away with simply JSON.stringify-ing them. That would be much more robust, and easier to parse too.
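A minimal sketch of the JSON.stringify route, assuming every value in the input is a double-quoted string with no escaped quotes inside. The helper name parsePairs is made up for this illustration; it collects the pairs into a plain object first and leaves the serialization to JSON.stringify.

```javascript
// Hypothetical helper: turn `key: "value"` pairs into a plain object.
// Assumes keys are word characters and values are double-quoted strings
// without escaped quotes.
function parsePairs(input) {
  const result = {};
  for (const [, key, value] of input.matchAll(/(\w+)\s*:\s*"([^"]*)"/g)) {
    result[key] = value;
  }
  return result;
}

const params = parsePairs(' query: "help me" distance: "25" count: "50" ');
console.log(JSON.stringify(params));
// {"query":"help me","distance":"25","count":"50"}
```

Unlike the string-patching approach, the output here is valid JSON (quoted keys), so JSON.parse can read it back.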

Related

Regex to split with multiple separators of one or several characters

I want to split a string with separators ' or .. WHILE KEEPING them:
"'TEST' .. 'TEST2' ".split(/([' ] ..)/g);
to get:
["'", "TEST", "'", "..", "'", "TEST2", "'" ]
but it doesn't work. Do you know how to fix this?
The [' ] .. pattern matches a ' or space followed with a space and any two chars other than line break chars.
You may use
console.log("'TEST' .. 'TEST2' ".trim().split(/\s*('|\.{2})\s*/).filter(Boolean))
Here,
.trim() - remove leading/trailing whitespace
.split(/\s*('|\.{2})\s*/) - splits the string on ' or a double dot, optionally surrounded by whitespace; because the delimiter is in a capturing group, it is kept in the resulting array
.filter(Boolean) - removes empty items.
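To make the role of the capturing group concrete, here is a small sketch comparing the same split with and without it. The function names are made up for this illustration.

```javascript
// With a capturing group, split() includes the matched delimiter in the result.
function splitKeepingDelimiters(input) {
  return input.trim().split(/\s*('|\.{2})\s*/).filter(Boolean);
}

// With a non-capturing group (?:...), the delimiters are dropped.
function splitDroppingDelimiters(input) {
  return input.trim().split(/\s*(?:'|\.{2})\s*/).filter(Boolean);
}

console.log(splitKeepingDelimiters("'TEST' .. 'TEST2' "));
// [ "'", 'TEST', "'", '..', "'", 'TEST2', "'" ]
console.log(splitDroppingDelimiters("'TEST' .. 'TEST2' "));
// [ 'TEST', 'TEST2' ]
```

The filter(Boolean) step matters in both cases, because adjacent delimiters produce empty strings between them.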
I'm not sure it will work in every situation, but you can try this:
"'TEST' .. 'TEST2' ".replace(/(\'|\.\.)/g, ' $1 ').trim().split(/\s+/)
It returns:
["'", "TEST", "'", "..", "'", "TEST2", "'"]
Splitting while keeping the delimiters can often be reduced to a matchAll. In this case, /(?:'|\.\.|\S[^']+)/g seems to do the job on the example. The idea is to alternate between literal single quote characters, two literal periods, or any sequence up to a single quote that starts with a non-space.
const result = [..."'TEST' .. 'TEST2' ".matchAll(/(?:'|\.\.|\S[^']+)/g)].flat();
console.log(result);
Another idea that might be more robust even if it's not a single shot regex is to use a traditional, non-clever "stuff between delimiters" pattern like /'([^']+)'/g, then flatMap to clean up the result array to match your format.
const s = "'TEST' .. 'TEST2' ";
const result = [...s.matchAll(/'([^']+)'/g)].flatMap(e =>
["'", e[1], "'", ".."]
).slice(0, -1);
console.log(result);

JSON Remove trailing comma from last object

This JSON data is being dynamically inserted into a template I'm working on. I'm trying to remove the trailing comma from the list of objects.
The CMS I'm working in uses Velocity, which I'm not too familiar with yet. So I was looking to write a snippet of JavaScript that detects that trailing comma on the last object (ITEM2) and removes it. Is there a REGEX I can use to detect any comma before that closing bracket?
[
  {
    "ITEM1":{
      "names":[
        "nameA"
      ]
    }
  },
  {
    "ITEM2":{
      "names":[
        "nameB",
        "nameC"
      ]
    }
  }, // need to remove this comma!
]
You need to find every comma (,) that is not followed by a new attribute, object or array.
New attribute could start either with quotes (" or ') or with any word-character (\w).
New object could start only with character {.
New array could start only with character [.
New attribute, object or array could be placed after a bunch of space-like symbols (\s).
So, the regex will be like this:
const regex = /\,(?!\s*?[\{\[\"\'\w])/g;
Use it like this:
// javascript
const json = input.replace(regex, ''); // remove all trailing commas (`input` variable holds the erroneous JSON)
const data = JSON.parse(json); // build a new JSON object based on correct string
Another approach is to find every comma (,) that is followed by a closing bracket.
Closing brackets in this case are } and ].
Again, closing brackets might be placed after a bunch of space-like symbols (\s).
Hence the regexp:
const regex = /\,(?=\s*?[\}\]])/g;
Usage is the same.
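As a rough sketch of the second approach end to end (the function name stripTrailingCommas is an assumption for this example, and the sample input is a shortened version of the data in the question):

```javascript
// Remove every comma that is followed, after optional whitespace,
// by a closing brace or bracket.
// Caveat: this works on raw text, so a comma inside a string value that
// happens to sit right before "]" or "}" would be removed as well.
function stripTrailingCommas(text) {
  return text.replace(/,(?=\s*[}\]])/g, '');
}

const broken =
  '[{"ITEM1":{"names":["nameA"]}},{"ITEM2":{"names":["nameB","nameC",]}},\n]';
const data = JSON.parse(stripTrailingCommas(broken));
console.log(data[1].ITEM2.names); // [ 'nameB', 'nameC' ]
```

If the CMS can be fixed to stop emitting the comma in the first place, that is likely the more robust option; the regex is a cleanup step for input you cannot control.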
For your specific example, you can do a simple search/replace like this:
,\n]$
Replacement string:
\n]
Code
var re = /,\n]$/;
var str = '[ \n { \n "ITEM1":{ \n "names":[ \n "nameA"\n ]\n }\n },\n { \n "ITEM2":{ \n "names":[ \n "nameB",\n "nameC"\n ]\n }\n },\n]';
var subst = '\n]';
var result = str.replace(re, subst);
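Suppose the JSON input contains no whitespace: input = [{"ITEM1":{"names":["nameA"]}},{"ITEM2":{"names":["nameB","nameC"]}},]
I suggest a simple approach using substring: drop the last two characters (the comma and the closing bracket), then append the bracket again.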
input = input.substring(0, input.length-2);
input = input + "]";
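The substring approach removes the last two characters unconditionally, so it would corrupt input that has no trailing comma. A slightly more defensive sketch, under the same no-whitespace assumption (the name dropFinalComma is made up here):

```javascript
// Only rewrite the ending when the text really ends with ",]".
function dropFinalComma(input) {
  return input.endsWith(',]') ? input.slice(0, -2) + ']' : input;
}

console.log(dropFinalComma('[{"a":1},{"b":2},]')); // [{"a":1},{"b":2}]
console.log(dropFinalComma('[{"a":1},{"b":2}]'));  // [{"a":1},{"b":2}]
```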
I developed a simple but useful approach for this: when building the string, append the comma after every item except the last one.
Integer Cnt = 5;
String StrInput = "[";
for (int i = 1; i < Cnt; i++) {
    StrInput += " {"
        + " \"ITEM" + i + "\":{ "
        + " \"names\":["
        + " \"nameA\""
        + "]"
        + "}";
    if (i == (Cnt - 1)) {
        StrInput += "}";
    } else {
        StrInput += "},";
    }
}
StrInput += "]";
System.out.println(StrInput);

regex help in extracting values from a string

I have a string in javascript like
"some text #[14cd3:+Seldum Kype] things are going good for #[7f8ef3:+Kerry Williams] so its ok"
From this I want to extract the id and name of the two people, so the data looks like:
[ { id: 14cd3, name : Seldum Kype},
{ id: 7f8ef3, name : Kerry Williams} ]
How can I use a regex to extract this?
var text = "some text #[14cd3:+Seldum Kype] things are going " +
    "good for #[7f8ef3:+Kerry Williams] so its ok";
var data = text.match(/#\[.+?\]/g).map(function(m) {
    var match = m.substring(2, m.length - 1).split(':+');
    return {id: match[0], name: match[1]};
});
// => [ { id: '14cd3', name: 'Seldum Kype' },
//      { id: '7f8ef3', name: 'Kerry Williams' } ]
// For demo
document.getElementById('output').innerText = JSON.stringify(data);
<pre id="output"></pre>
Get the id from Group index 1 and name from group index 2.
#\[([a-z\d]+):\+([^\[\]]+)\]
Explanation:
# Matches a literal # symbol.
\[ Matches a literal [ symbol.
([a-z\d]+) Captures one or more lowercase letters or digits (the id).
:\+ Matches :+ literally.
([^\[\]]+) Captures one or more characters other than [ or ] (the name).
\] A literal ] symbol.
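The pattern explained above can be applied with matchAll to read both capture groups directly. This is a sketch, and the function name extractMentions is an assumption for the example:

```javascript
// Group 1 is the id, group 2 is the name.
function extractMentions(text) {
  const pattern = /#\[([a-z\d]+):\+([^\[\]]+)\]/g;
  return Array.from(text.matchAll(pattern), ([, id, name]) => ({ id, name }));
}

const text = "some text #[14cd3:+Seldum Kype] things are going " +
  "good for #[7f8ef3:+Kerry Williams] so its ok";
console.log(extractMentions(text));
// [ { id: '14cd3', name: 'Seldum Kype' },
//   { id: '7f8ef3', name: 'Kerry Williams' } ]
```

Because matchAll returns an empty iterator when nothing matches, this version returns an empty array for text without mentions, whereas text.match(...) returns null in that case.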
Try the following, the key is to properly escape reserved special symbols:
#\[([\d\w]+):\+([\s\w]+)\]

Parsing "relaxed" JSON without eval

What is the easiest method to parse "relaxed" JSON but avoid evil eval?
The following throws an error:
JSON.parse("{muh: 2}");
since proper JSON should have keys quoted: {"muh": 2}
My use case is a simple test interface I use to write JSON commands to my node server. So far I simply used eval as it's just a test application anyway. However, using JSHint on the whole project keeps bugging me about that eval. So I'd like a safe alternative that still allows relaxed syntax for keys.
PS: I don't want to write a parser myself just for the sake of the test application :-)
You could sanitize the JSON using a regular expression replace:
var badJson = "{muh: 2}";
var correctJson = badJson.replace(/(['"])?([a-z0-9A-Z_]+)(['"])?:/g, '"$2": ');
JSON.parse(correctJson);
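A quick sketch of where this replace works and where it breaks, wrapped in hypothetical helpers (quoteKeys and tryParse are names invented for this example). The regex treats anything that looks like a word followed by a colon as a key, so a colon inside a string value confuses it:

```javascript
// Quote bare (or single-quoted) keys so that JSON.parse accepts them.
function quoteKeys(relaxed) {
  return relaxed.replace(/(['"])?([a-z0-9A-Z_]+)(['"])?:/g, '"$2": ');
}

// Report success or failure instead of throwing.
function tryParse(relaxed) {
  try {
    return { ok: true, value: JSON.parse(quoteKeys(relaxed)) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

console.log(tryParse('{muh: 2}').ok);                // true
console.log(tryParse('{url: "http://x.io"}').ok);    // false: "http:" is mistaken for a key
```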
You already know this, since you referred me here, but I figure it might be good to document it here:
I'd long had the same desire to be able to write "relaxed" JSON that was still valid JS, so I took Douglas Crockford's eval-free json_parse.js and extended it to support ES5 features:
https://github.com/aseemk/json5
This module is available on npm and can be used as a drop-in replacement for the native JSON.parse() method. (Its stringify() outputs regular JSON.)
Hope this helps!
This is what I ended up having to do. I extended @ArnaudWeil's answer and added support for : appearing in the values:
var badJSON = '{one : "1:1", two : { three: \'3:3\' }}';
var fixedJSON = badJSON
    // Replace ":" with "#colon#" if it's between double-quotes
    .replace(/:\s*"([^"]*)"/g, function(match, p1) {
        return ': "' + p1.replace(/:/g, '#colon#') + '"';
    })
    // Replace ":" with "#colon#" if it's between single-quotes
    .replace(/:\s*'([^']*)'/g, function(match, p1) {
        return ': "' + p1.replace(/:/g, '#colon#') + '"';
    })
    // Add double-quotes around any tokens before the remaining ":"
    .replace(/(['"])?([a-z0-9A-Z_]+)(['"])?\s*:/g, '"$2": ')
    // Turn "#colon#" back into ":"
    .replace(/#colon#/g, ':');
console.log('Before: ' + badJSON);
console.log('After: ' + fixedJSON);
console.log(JSON.parse(fixedJSON));
It produces this output:
Before: {one : "1:1", two : { three: '3:3' }}
After: {"one": "1:1", "two": { "three": "3:3" }}
{
"one": "1:1",
"two": {
"three": "3:3"
}
}
If you can't quote keys when writing the string, you can insert quotes before using JSON.parse:
var s = "{muh: 2,mah:3,moh:4}";
s = s.replace(/([a-z][^:]*)(?=\s*:)/g, '"$1"');
var o = JSON.parse(s);
/* returned value: [object Object] */
JSON.stringify(o);
/* returned value: (String) {"muh":2,"mah":3,"moh":4} */
You can also use Flamenco's really-relaxed-json (https://www.npmjs.com/package/really-relaxed-json) that goes a step further and allows no commas, dangling commas, comments, multiline strings, etc.
Here's the specification
http://www.relaxedjson.org
And some online parsers:
http://www.relaxedjson.org/docs/converter.html
Preloaded with the 'bad json'
{one : "1:1", two : { three: '3:3' }}
Bad JSON
Preloaded with even 'worse json' (no commas)
{one : '1:1' two : { three: '3:3' }}
Worse JSON
Preloaded with 'terrible json' (no commas, no quotes, and escaped colons)
{one : 1\:1 two : {three : 3\:3}}
Terrible JSON
[EDIT: This solution only serves for pretty simple objects and arrays, but does not do well with more complicated scenarios like nested objects. I recommend using something like jsonrepair to handle more interesting cases.]
I've modified Arnaud's solution slightly to handle periods in the keys, colons in the key values and arbitrary whitespace (although it doesn't deal with JSON object key values):
var badJson = `{
firstKey: "http://fdskljflksf",
second.Key: true, thirdKey:
5, fourthKey: "hello"
}`;
/*
\s*
any amount of whitespace
(['"])?
group 1: optional quotation
([a-z0-9A-Z_\.]+)
group 2: at least one key character
(['"])?
group 3: optional quotation
\s*
any amount of whitespace
:
a colon literal
([^,\}]+)
group 4: at least one character of the key value (strings assumed to be quoted), ends on the following comma or closing brace
(,)?
group 5: optional comma
*/
var correctJson = badJson.replace(/\s*(['"])?([a-z0-9A-Z_\.]+)(['"])?\s*:([^,\}]+)(,)?/g, '"$2": $4$5');
JSON.parse(correctJson);

Regex remove everything thats outside { }

Regex to remove everything outside the { }
for example:
before:
|loader|1|2|3|4|5|6|7|8|9|{"data" : "some data" }
after:
{"data" : "some data" }
With @Marcelo's regex this works, but not if there are other {} inside the {}, like here:
"|loader|1|2|3|4|5|6|7|8|9|
{'data':
[
{'data':'some data'}
],
}"
This seems to work. What language are you using on the server side? Then I can put it into a statement for you.
{(.*)}
You want to do:
Regex.Replace("|loader|1|2|3|4|5|6|7|8|9|{\"data\" : \"some data\" }", ".*?({.*?}).*?", "$1");
(C# syntax, regex should be fine in most languages afaik)
In JavaScript you can try:
s = '|loader|1|2|3|4|5|6|7|8|9|{"data" : "some data" }';
s = s.replace(/[^{]*({[^}]*})/g,'$1');
alert(s);
Of course, this will not work if "some data" contains curly braces, so the solution depends heavily on your input data.
I hope this helps.
Jerome Wagner
You can do something like this in Java:
String[] tests = {
"{ in in in } out", // "{ in in in }"
"out { in in in }", // "{ in in in }"
" { in } ", // "{ in }"
"pre { in1 } between { in2 } post", // "{ in1 }{ in2 }"
};
for (String test : tests) {
System.out.println(test.replaceAll("(?<=^|\\})[^{]+", ""));
}
The regex is:
(?<=^|\})[^{]+
Basically, we match any run of characters that is "outside": it starts either at the beginning of the string (^) or right after a literal }, and extends until it reaches a literal {, which is what [^{]+ matches. We replace each matched "outside" run with an empty string.
See also
regular-expressions.info/Lookarounds
(?<=...) is a positive lookbehind
A non-regex Javascript solution, for nestable but single top-level {...}
Depending on the problem specification (it isn't exactly clear), you can also do something like this:
var s = "pre { { { } } } post";
s = s.substring(s.indexOf("{"), s.lastIndexOf("}") + 1);
This does exactly what it says: given an arbitrary string s, it takes its substring starting from the first { to the last } (inclusive).
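If the input may contain several top-level {...} blocks that are themselves nested, neither the regexes nor indexOf/lastIndexOf can separate them cleanly. One possible alternative is a small depth counter; the sketch below uses a made-up function name (extractBalancedBlocks) and assumes braces never appear inside quoted string values.

```javascript
// Collect every balanced top-level {...} block by tracking nesting depth.
// Assumption: "{" and "}" do not occur inside string literals.
function extractBalancedBlocks(text) {
  const blocks = [];
  let depth = 0;
  let start = -1;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '{') {
      if (depth === 0) start = i; // remember where a top-level block opens
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) blocks.push(text.slice(start, i + 1));
    }
  }
  return blocks;
}

console.log(extractBalancedBlocks("pre { in1 } between { { in2 } } post"));
// [ '{ in1 }', '{ { in2 } }' ]
```

An unmatched closing brace is ignored, and a block that is never closed is simply not reported.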
For those searching for a PHP solution, only this one worked for me:
preg_replace("/.*({.*}).*/","$1",$input);
