Regex to remove everything outside the { }
for example:
before:
|loader|1|2|3|4|5|6|7|8|9|{"data" : "some data" }
after:
{"data" : "some data" }
With @Marcelo's regex this works, but not if there are other {} inside the {}, like here:
"|loader|1|2|3|4|5|6|7|8|9|
{'data':
[
{'data':'some data'}
],
}"
This seems to work. What language are you using? Obviously regex, but what server-side language? Then I can put it into a statement for you.
{(.*)}
You want to do:
Regex.Replace("|loader|1|2|3|4|5|6|7|8|9|{\"data\" : \"some data\" }", ".*?({.*?}).*?", "$1");
(C# syntax, regex should be fine in most languages afaik)
in javascript you can try
s = '|loader|1|2|3|4|5|6|7|8|9|{"data" : "some data" }';
s = s.replace(/[^{]*({[^}]*})/g,'$1');
alert(s);
of course this will not work if "some data" has curly braces so the solution highly depends on your input data.
I hope this will help you
Jerome Wagner
You can do something like this in Java:
String[] tests = {
"{ in in in } out", // "{ in in in }"
"out { in in in }", // "{ in in in }"
" { in } ", // "{ in }"
"pre { in1 } between { in2 } post", // "{ in1 }{ in2 }"
};
for (String test : tests) {
System.out.println(test.replaceAll("(?<=^|\\})[^{]+", ""));
}
The regex is:
(?<=^|\})[^{]+
Basically, we match any string that is "outside", defined as something that follows a literal }, or that starts at the beginning of the string (^), and runs until it reaches a literal {; i.e. we match [^{]+. We replace each matched "outside" string with an empty string.
See also
regular-expressions.info/Lookarounds
(?<=...) is a positive lookbehind
A non-regex Javascript solution, for nestable but single top-level {...}
Depending on the problem specification (it isn't exactly clear), you can also do something like this:
var s = "pre { { { } } } post";
s = s.substring(s.indexOf("{"), s.lastIndexOf("}") + 1);
This does exactly what it says: given an arbitrary string s, it takes its substring starting from the first { to the last } (inclusive).
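For input that may contain nested braces and more than one top-level {...} block, a depth counter is one alternative to a regex. The following is a minimal sketch, not a full parser: the function name extractTopLevelBlocks is hypothetical, and it assumes braces never appear inside quoted strings.

```javascript
// Collect every top-level {...} block, tracking nesting depth.
// Assumption: braces inside quoted strings are not handled specially.
function extractTopLevelBlocks(text) {
  const blocks = [];
  let depth = 0;
  let start = -1;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '{') {
      if (depth === 0) start = i; // a new top-level block begins here
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) blocks.push(text.slice(start, i + 1));
    }
  }
  return blocks;
}

console.log(extractTopLevelBlocks('pre { in1 } between { { in2 } } post'));
```

Joining the returned array with an empty string gives the "remove everything outside the braces" result.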
For those searching for a PHP version, only this one worked for me:
preg_replace("/.*({.*}).*/","$1",$input);
Related
Basically, I am working with a string that is a JSON string in Python, but when used in JavaScript it has single quotes (') instead of double quotes. I would like to turn it into real JSON (using JSON.parse()), but there are quotation marks in the middle of some values (because I replaced the single quotes with double quotes).
Example: '{"author": "Jonah D"Almeida", ... }'
(I want to replace the one in between D and Almeida)
Since the whole value is already wrapped in quotation marks, JavaScript throws an error because it can't parse the JSON. To solve that, I want to replace the quotation mark in the middle of the value with a ' (single quote), but only if it has letters preceding and following it.
My thought: myString.replace('letter before ... " ... letter after', "'")
Any idea how I can get the right expression for this? Basically, I just want to know the regex to check whether the " quote has letters before and after it and, if so, change it to a single quote (').
The OP writes: "Basically I am working with a string that is a json string"
The example above is not a JSON string, though. The OP's example data is already invalid JSON.
Thus the first thing to do is fix the process that generates such data.
Because ...
"parsing valid JSON data will return a perfectly valid object, and in the OP's use case a correctly escaped string value as well."
... proof ...
const testSample_A = { author: "Jonah D'Almeida" };
const testSample_B = { author: 'Jonah D"Almeida' };
const testSample_C = { author: 'Jonah D\'Almeida' };
const testSample_D = { author: "Jonah D\"Almeida" };
console.log({
testSample_A,
testSample_B,
testSample_C,
testSample_D,
});
console.log('JSON.stringify(...) ... ', {
testSample_A: JSON.stringify(testSample_A),
testSample_B: JSON.stringify(testSample_B),
testSample_C: JSON.stringify(testSample_C),
testSample_D: JSON.stringify(testSample_D),
});
console.log('JSON.parse(JSON.stringify(...)) ... ', {
testSample_A: JSON.parse(JSON.stringify(testSample_A)),
testSample_B: JSON.parse(JSON.stringify(testSample_B)),
testSample_C: JSON.parse(JSON.stringify(testSample_C)),
testSample_D: JSON.parse(JSON.stringify(testSample_D)),
});
Edit
A sanitizing task that follows the OP's requirements exactly can nevertheless be achieved with a regex that uses both a positive lookbehind and a positive lookahead: either /(?<=\w)"(?=\w)/gm for basic Latin only (note that \w also matches digits and the underscore), or, for broader international support, with Unicode property escapes: /(?<=\p{L})"(?=\p{L})/gmu
console.log('Letter unicode escapes ...', `
{"author": "Jonah D"Almeida", ... }
{"author": "Jon"ah D"Almeida", ... }
{"author": "Jon"ah D"Alme"ida", ... }`
.replace(/(?<=\p{L})"(?=\p{L})/gmu, '\\"')
);
console.log('Basic Latin support ...', `
{"author": "Jonah D"Almeida", ... }
{"author": "Jon"ah D"Almeida", ... }
{"author": "Jon"ah D"Alme"ida", ... }`
.replace(/(?<=\w)"(?=\w)/gm, '\\"')
);
console.log(
'sanitized and parsed string data ...',
JSON.parse(`[
{ "author": "Jonah D"Almeida" },
{ "author": "Jon"ah D"Almeida" },
{ "author": "Jon"ah D"Alme"ida" }
]`.replace(/(?<=\p{L})"(?=\p{L})/gmu, '\\"'))
);
I may just be being thick here but I don't understand why I am receiving this error. Outside of the function the .test() works fine. But inside, I get the error. Was thinking it was something to do with the scope of the .test() function but am I just missing something blindingly obvious here?
function cFunctionfirst() {
firstField = document.getElementById("sname_input_first").value;
document.getElementById("demo").innerHTML = "first: " + firstField;
console.log(firstField);
var regex = "!@#$£%^&*()+=[]\\\';,./{}|\":<>?";
if(regex.test(firstField)){
console.log('illegal characters used');
} else {
console.log('Character okay');
};
};
That's because regex is not a RegExp object but just a string, so it has no .test() method. Declare it as a regular expression literal instead (remember to escape special characters using \):
var regex = /[!@#\$£%\^&\*\(\)\+=\[\]\\\';,\.\/\{\}\|":<>\?]/;
I have escaped the special regex characters, and the entire selection is wrapped in unescaped [ and ] brackets so that you test against a set of characters.
P.S. These are the characters that have a special meaning in a regex and need to be escaped to match literally: \ ^ $ * + ? . ( ) | { } [ ]
See proof-of-concept example:
function cFunctionfirst(value) {
var regex = /[!@#\$£%\^&\*\(\)\+=\[\]\\\';,\.\/\{\}\|":<>\?]/;
if(regex.test(value)){
console.log('illegal characters used');
} else {
console.log('Character okay');
};
};
cFunctionfirst('Legal string');
cFunctionfirst('Illegal string #$%');
Alternatively, if you don't want to escape the characters manually, you can either use a utility method to do it or use an ES6 non-regex approach, which is probably a lot less efficient (check out the JSPerf test I have made). Simply add the blacklisted characters literally to a string, split it, and then use Array.prototype.some to check whether the incoming string contains any of the blacklisted characters:
function cFunctionfirst(value) {
var blacklist = '!@#$£%^&*()+=[]\\\';,./{}|":<>?'.split('');
if (blacklist.some(char => value.includes(char))) {
console.log('illegal characters used');
} else {
console.log('Character okay');
};
};
cFunctionfirst('Legal string');
cFunctionfirst('Illegal string #$%');
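The "utility method" option mentioned above could look roughly like the sketch below. The helper names escapeRegExp and makeBlacklistRegex are hypothetical, and the blacklist is a shortened example list rather than the exact one from the question.

```javascript
// Escape every character that has a special meaning in a regex.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a character class from a plain string of forbidden characters.
// Inside a class "-" can form a range, so it is escaped as well.
function makeBlacklistRegex(chars) {
  return new RegExp('[' + escapeRegExp(chars).replace(/-/g, '\\-') + ']');
}

// Hypothetical, shortened blacklist for illustration.
var illegal = makeBlacklistRegex('!#$%^&*()+=[]{}|<>?');

console.log(illegal.test('Legal string'));
console.log(illegal.test('Illegal string #$%'));
```

This keeps the blacklist readable as a plain string while still testing with a single regex.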
I'm trying to parse a QML file with Javascript, and make a JSON out of it.
I've encountered a problem that I cannot solve.
I'm trying to take every string in the file that isn't already between double quotes and wrap it in double quotes.
So if I have some strings like
Layout.fillHeight: true
height: 200
color: "transparent"
should become
"Layout.fillHeight": "true"
"height": 200"
"color": "transparent"
Here's the regex I've written, failing miserably:
/((\S\.\S)|\w+?)(?![^"]*\")/g
(\S\.\S)|\w+? takes every string (also considering words with a . between them).
Two problems:
If a line contains any string between two ", none of the words on that line are considered.
With replace() I cannot replace the string because $1 or $2 are not containing the exact string I want to replace.
I'm not great with Regex, so if you guys could help me would be appreciated.
Here is a Notepad++ solution using two replacements. First, double-quote the keys where necessary:
Find:
^([^":]+):
Replace:
"$1":
Then quote the values, again where necessary:
Find:
:\s+([^"]+)$
Replace:
: "$1"
Something like this?
It strips all the quotes, reinserts one at the start and end of each line, and then replaces each colon followed by a space with ": " so that the key and the value end up quoted separately.
let string = `Layout.fillHeight: true
height: 200
color: "transparent"`
console.log(string.replace(/\"/g, "").replace(/^|$/gm, "\"").replace(/\:\ /gm, "\": \""))
As an alternative, if they are in an array format to begin with:
function quotify(string) {
let regex = /^\s*"?(.*?)"?:\s*"?(.*?)"?$/,
matches = regex.exec(string);
return `"${matches[1]}": "${matches[2]}"`;
}
let strings = ['Layout.fillHeight: true',
'height: 200',
'color: "transparent"'
],
quotedStrings = [];
strings.forEach((string) => {
quotedStrings.push(quotify(string))
})
let jsonString = JSON.parse(`{${quotedStrings}}`);
console.log(jsonString)
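A variant that works on the whole multi-line string and leaves already-quoted parts untouched can be sketched with two line-anchored replacements. The function name quoteLines is hypothetical, and the sketch assumes one "key: value" pair per line with no colons inside keys or unquoted values.

```javascript
// Quote unquoted keys, then unquoted values, one line at a time.
// Assumption: one "key: value" pair per line, no colons in unquoted parts.
function quoteLines(text) {
  return text
    // Step 1: a key is everything before the first colon, if it has no quotes.
    .replace(/^([^":\n]+):/gm, '"$1":')
    // Step 2: a value is everything after the colon, if it has no quotes.
    .replace(/:[ \t]+([^"\n]+)$/gm, ': "$1"');
}

console.log(quoteLines('Layout.fillHeight: true\nheight: 200\ncolor: "transparent"'));
```

Because both patterns exclude the double quote, a part that is already quoted simply does not match and is left as it is.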
Here is a more involved solution that quotes only strings that aren't already quoted.
var str = `Layout.fillHeight: true
height: 200
color: "transparent"`;
var result = str.split(/\n/).map((v) => {
return v.split(/\s*\:\s*/).map((vv) => {
if(!isNaN(vv) || vv == "true" || vv == "false" || (vv[0] == '"' && vv[vv.length - 1] == '"')){
return vv;
}
return `"${vv}"`;
}).join(":");
}).join(",");
console.log(JSON.parse(`{${result}}`));
I've probably made things more complicated than they have to be, and it will most probably fail on constructs that I haven't considered.
I tried the regex @Tushar provided:
(\S+)\s*:\s*(\S+)
It's the one I was looking for. Thank you for your contribution.
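For reference, here is one way that regex might be applied with replace(). This is only a sketch: the function name quotePairs and the helper strip are hypothetical, and it assumes keys and values contain no whitespace.

```javascript
// Apply (\S+)\s*:\s*(\S+) and wrap both captures in double quotes.
// Assumption: keys and values contain no whitespace.
function quotePairs(text) {
  // Drop one leading and one trailing quote so existing quotes are not doubled.
  const strip = (s) => s.replace(/^"|"$/g, '');
  return text.replace(/(\S+)\s*:\s*(\S+)/g, (match, key, value) => {
    return '"' + strip(key) + '": "' + strip(value) + '"';
  });
}

console.log(quotePairs('Layout.fillHeight: true\nheight: 200\ncolor: "transparent"'));
```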
I have a file the contents of which are formatted as follows:
{
"title": "This is a test }",
"date": "2017-11-16T20:47:16+00:00"
}
This is a test }
I'd like to extract just the JSON heading via regex (which I think is the most sensible approach here). However, in the future the JSON might change, (e.g. it might have extra fields added, or the current fields could change), so I'd like to keep the regex flexible. I have tried with the solution suggested here, however, that seems a bit too simplistic for my use case: in fact, the above "tricks" the regex, as shown in this regex101.com example.
Since my knowledge of regex is not that advanced, I'd like to know whether there's a regex approach that is able to cover my use case.
Thank you!
You can check for the first index of \n} to get the sub-string:
s = `{
"title": "This is a test }",
"date": "2017-11-16T20:47:16+00:00"
}
This is a test }
}`
i = s.indexOf('\n}')
if (i > 0) {
o = JSON.parse(s = s.slice(0, i + 2))
console.log(s); console.log(o)
}
or a bit shorter with RegEx:
s = `{
"title": "This is a test }",
"date": "2017-11-16T20:47:16+00:00"
}
This is a test }
}`
s.replace(/.*?\n}/s, function(m) {
o = JSON.parse(m)
console.log(m); console.log(o)
})
If the JSON always starts with { at the left margin and ends with } at the right margin, with everything else indented as you show, you can use the regular expression
/^{.*?^}$/ms
The m modifier makes ^ and $ match the beginning and end of lines, not the whole string. The s modifier allows . to match newlines.
var str = `{
"title": "This is a test }",
"date": "2017-11-16T20:47:16+00:00"
}
This is a test }
`;
var match = str.match(/^{.*?^}$/ms);
if (match) {
var data = JSON.parse(match[0]);
}
console.log(data);
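If the indentation cannot be relied on, another option is to let JSON.parse decide where the heading ends: try each } in turn as the candidate end until the prefix parses. This is a sketch under the assumption that the JSON object starts at the very beginning of the text; the function name extractLeadingJson is hypothetical.

```javascript
// Try every "}" as a possible end of the leading JSON object.
// Assumption: the JSON object starts at the first character of the text.
function extractLeadingJson(text) {
  let end = text.indexOf('}');
  while (end !== -1) {
    const candidate = text.slice(0, end + 1);
    try {
      // The first prefix that parses is taken as the heading.
      return { value: JSON.parse(candidate), rest: text.slice(end + 1) };
    } catch (e) {
      end = text.indexOf('}', end + 1); // not valid yet, try the next brace
    }
  }
  return null; // no prefix parsed as JSON
}

var sample = '{\n"title": "This is a test }",\n"date": "2017-11-16T20:47:16+00:00"\n}\nThis is a test }\n';
console.log(extractLeadingJson(sample));
```

A } inside a string value is skipped naturally, because the prefix ending there is not valid JSON.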
My Goal
What I want to do is something similar to this:
let json_obj = {
hello: {
to: 'world'
},
last_name: {
john: 'smith'
},
example: 'a ${type}', // ${type} -> json_obj.type
type: 'test'
}
// ${hello.to} -> json_obj.hello.to -> "world"
let sample_text = 'Hello ${hello.to}!\n' +
// ${last_name.john} -> json_obj.last_name.john -> "smith"
'My name is John ${last_name.john}.\n' +
// ${example} -> json_obj.example -> "a test"
'This is just ${example}!';
function replacer(text) {
return text.replace(/\${([^}]+)}/g, (m, gr) => {
gr = gr.split('.');
let obj = json_obj;
while(gr.length > 0)
obj = obj[gr.shift()];
/* I know there is no validation but it
is just to show what I'm trying to do. */
return replacer(obj);
});
}
console.log(replacer(sample_text));
Until now this is pretty easy to do.
But if $ is preceded by a backslash (\), I don't want to replace the thing between the braces. For example, \${hello.to} would not be replaced.
The problem gets harder when I also want to be able to escape the backslashes. What I mean by escaping the backslashes is, for example:
\${hello.to} would become: ${hello.to}
\\${hello.to} would become: \world
\\\${hello.to} would become: \${hello.to}
\\\\${hello.to} would become: \\${hello.to}
etc.
What I've tried?
I haven't tried much so far because I have absolutely no idea how to achieve this; as far as I know, there is no lookbehind in JavaScript regular expressions.
I hope the way I explained it is clear enough to be understood, and I hope someone has a solution.
I recommend you to solve this problem in separate steps :)
1) First step:
Simplify the backslashes in your text by replacing all occurrences of "\\" with "". This removes each escaped-backslash pair entirely (it does not collapse the pair to a single backslash) and makes the token replacement part easier.
text = text.replace(/\\\\/g, '');
2) Second step:
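To replace the tokens in the text, use this regex: /(^|[^\\])\${([^}]+)}/g. It skips tokens that have a \ directly before them, e.g. \${hello.to}. The first group captures the character before the token (or the start of the string) so that it can be put back, and the g flag replaces every token rather than only the first. Because the preceding character is consumed, a token that immediately follows another token is not matched.
Here is your code with the new expression:
function replacer(text) {
return text.replace(/(^|[^\\])\${([^}]+)}/g, (m, before, gr) => {
gr = gr.split('.');
let obj = json_obj;
while(gr.length > 0)
obj = obj[gr.shift()];
/* I know there is no validation but it
is just to show what I'm trying to do. */
// Put back the character that was matched before the token.
return before + replacer(obj);
});
}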
If you still have any problems, let me know :)
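A single-pass alternative is to capture the run of backslashes in front of each token and decide by its length. The sketch below assumes the rule that each pair of backslashes collapses to one, that an odd count leaves the token as literal text, and that an even count substitutes it. The names render, lookup and data are hypothetical.

```javascript
// Hypothetical data for illustration.
var data = { hello: { to: 'world' }, example: 'a ${type}', type: 'test' };

// Resolve a dotted path such as "hello.to" against an object.
function lookup(path, root) {
  return path.split('.').reduce(function (obj, key) {
    return obj == null ? undefined : obj[key];
  }, root);
}

// Assumed rule: each "\\" pair becomes one "\"; a leftover "\" escapes the token.
function render(text, root) {
  return text.replace(/(\\*)\$\{([^}]+)\}/g, function (match, slashes, path) {
    var kept = '\\'.repeat(Math.floor(slashes.length / 2));
    if (slashes.length % 2 === 1) {
      return kept + '${' + path + '}'; // escaped token: emit it literally
    }
    var value = lookup(path, root);
    // Unknown paths are left as they are; known values are rendered recursively.
    return kept + (value === undefined ? '${' + path + '}' : render(String(value), root));
  });
}

console.log(render('Hello ${hello.to}! This is just ${example}!', data));
```

Since the backslashes are part of the match, no lookbehind is needed and adjacent tokens are handled as well.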