I'm running a Node server that receives a plain utf8 text and parses the content to JSON. Part of the JSON will be the body of an HTML document.
The problem is that when the input has characters such as "ä" or "'", the HTML document gets garbled. I guess it has to do with how the parser encodes/decodes these special characters.
Any ideas regarding this?
[EDIT]
The parsing and JSON object are basically this:
var string = "<mail_body><html> html code here...<html><mail_body>";
var mail_body = string.split("<mail_body>")[1];
var obj = {
"subject": "subject 123",
"mail_body": mail_body
}
You can escape the "'" like this:
var escapedText = text.replace(/'/g, "\\'");
and use a Unicode escape for the "ä" (a with diaeresis),
like this -> \u00E4
https://mathiasbynens.be/notes/javascript-escapes
The most important thing you need to do is to escape the incoming string to eliminate quotes that will break your JSON, which is the only significant problem I would expect to see with Node - browsers have a slightly harder time. From your input you're looking at something like this:
var string = "<mail_body><html> html code here...<html><mail_body>";
var mail_body = string.split("<mail_body>")[1];
mail_body = mail_body.replace(/\"/g, '\\"'); // regex for global replace, have to escape quotes
That should get you a mail body that doesn't unexpectedly end and break the rest of your JSON.
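As an alternative to escaping by hand, here is a minimal sketch, assuming the goal is a JSON string built from the object shown in the question. JSON.stringify escapes quotes and backslashes itself, and characters like "ä" pass through unchanged as long as the text is handled as UTF-8 end to end. The sample input and the names input, mailBody and roundTrip are made up for illustration.

```javascript
// Hypothetical sample input; the real text would come from the request body.
const input = '<mail_body><html><p class="x">It\'s "ä"</p></html><mail_body>';

// Same split as in the question: take what sits between the two markers.
const mailBody = input.split("<mail_body>")[1];

// JSON.stringify escapes the double quotes, so no manual replace is needed.
const json = JSON.stringify({ subject: "subject 123", mail_body: mailBody });

// Parsing the result returns the original text unchanged.
const roundTrip = JSON.parse(json).mail_body;
console.log(json);
console.log(roundTrip === mailBody);
```

If the HTML still looks wrong after this, the encoding of the incoming bytes or of the served page is the more likely cause than the JSON step.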
I want to replace the smart quotes like ‘, ’, “ and ” to regular quotes. Also, I wanted to replace the ©, ® and ™. I used the following code. But it doesn't help.
Kindly help me to resolve this issue.
str.replace(/[“”]/g, '"');
str.replace(/[‘’]/g, "'");
Use:
str = str.replace(/[“”]/g, '"');
str = str.replace(/[‘’]/g, "'");
or to do it in one statement:
str = str.replace(/[“”]/g, '"').replace(/[‘’]/g,"'");
In JavaScript (as in many other languages) strings are immutable - string "replacement" methods actually just return the new string instead of modifying the string in place.
The MDN JavaScript reference entry for replace states:
Returns a new string with some or all matches of a pattern replaced by a replacement.
…
This method does not change the String object it is called on. It simply returns a new string.
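The question also asked about ©, ® and ™, which the answers above do not cover. A minimal sketch of one way to handle all of them in a single pass follows. The replacement texts "(c)", "(r)" and "(tm)" and the names replacements and normalizeTypography are assumptions chosen for illustration, not something the question specifies.

```javascript
// Hypothetical replacement table; adjust the right-hand sides as needed.
const replacements = {
  "\u201C": '"', "\u201D": '"',   // double smart quotes
  "\u2018": "'", "\u2019": "'",   // single smart quotes
  "\u00A9": "(c)", "\u00AE": "(r)", "\u2122": "(tm)"
};

function normalizeTypography(str) {
  // replace returns a new string; the caller must use the return value.
  return str.replace(/[\u201C\u201D\u2018\u2019\u00A9\u00AE\u2122]/g,
    function (ch) { return replacements[ch]; });
}

const original = "\u201CHi\u201D \u2018there\u2019 \u00A9 \u00AE \u2122";
const cleaned = normalizeTypography(original);
console.log(cleaned);
console.log(original === cleaned); // false: the original string is untouched
```

Writing the characters as \u escapes keeps the script independent of the encoding the source file is saved in.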
replace returns the resulting string:
str = str.replace(/["']/, '');
The OP doesn't say why it isn't working, but there seem to be problems related to the encoding of the file. If I have an ANSI encoded file and I do:
var s = "“This is a test” ‘Another test’";
s = s.replace(/[“”]/g, '"').replace(/[‘’]/g,"'");
document.writeln(s);
I get:
"This is a test" "Another test"
I converted the encoding to UTF-8, fixed the smart quotes (which broke when I changed encoding), then converted back to ANSI and the problem went away.
Note that when I copied and pasted the double and single smart quotes off this page into my test document (ANSI encoded) and ran this code:
var s = "“This is a test” ‘Another test’";
for (var i = 0; i < s.length; i++) {
document.writeln(s.charAt(i) + '=' + s.charCodeAt(i));
}
I discovered that all the smart quotes showed up as ? = 63.
So, to the OP, determine where the smart quotes are originating and make sure they are the character codes you expect them to be. If they are not, consider changing the encoding of the source so they arrive as “ = 8220, ” = 8221, ‘ = 8216 and ’ = 8217. Use my loop to examine the source, if the smart quotes are showing up with any charCodeAt() values other than those I've listed, replace() will not work as written.
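The inspection loop above relies on document.writeln. A minimal sketch of the same check as a plain function that also runs outside a browser is shown here; the names charCodes and sample are hypothetical.

```javascript
// Returns one "char=code" entry per UTF-16 code unit in the string.
function charCodes(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    out.push(s.charAt(i) + "=" + s.charCodeAt(i));
  }
  return out;
}

// Escapes keep the test string independent of the file's encoding.
const sample = "\u201Ca\u201D \u2018b\u2019";
console.log(charCodes(sample).join(", "));
```

If the real input reports 63 or 65533 where 8216 to 8221 are expected, the characters were already lost before replace() ran.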
To replace smart quotes with regular quotes, I am using a similar function. You must specify the characters by their CharCode, because the default settings of different computers and browsers may interpret typed quote characters differently.
Using the CharCode calls up the exact character, which eliminates room for error across different browsers and operating systems. This is also helpful for bilingual use (accents, etc.).
To replace smart quotes with SINGLE QUOTES
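function unSmartQuotify(n){
var apos = String.fromCharCode(39); // plain ASCII apostrophe
// 8216 and 8217 are the left and right single smart quotes
var smart = new RegExp("[" + String.fromCharCode(8216, 8217) + "]", "g");
// a global regex replaces every occurrence in one call, no loop needed
return n.replace(smart, apos);
}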
To find the other character codes you may need, check an ASCII or Unicode table.
I have this JSON string:
{\"text\":\"Line 1\\nLine 2\",\"color\":\"black\"}
I can parse it when I do this:
pg = JSON.parse(myJSONString.replace(/\\/g, ""));
But when I access pg.text the value is:
Line 1nLine 2.
But I want the value to be exactly:
Line 1\nLine 2
The JSON string is valid in terms of the target program which interprets it as part of a larger command. It's Minecraft actually. Minecraft will render this as you would expect with Line 1 and Line 2 on separate lines.
But I'm making an editor that needs to read the \n back in as is, so that it can be displayed in an HTML input field.
Just as some context here is the full command which contains some JSON code.
/summon zombie ~ ~1 ~ {HandItems:[{id:"minecraft:written_book",Count:1b,tag:{title:"",author:"",pages:["{\"text\":\"Line 1\\nLine 2\",\"color\":\"black\"}"]}},{}]}
You do not need replace here. When the JSON is written as a JavaScript string literal, the JavaScript parser already turns \" into " and \\ into \ before JSON.parse sees the string, so what is left is valid JSON and the \n stays an escape inside it. (The pattern /\\[1]/g only matches a backslash followed by the character 1, so it would leave this string unchanged anyway.)
var myString = '{\"text\":\"Line 1\\nLine 2\",\"color\":\"black\"}';
console.log(myString); // {"text":"Line 1\nLine 2","color":"black"}
var parsed = JSON.parse(myString);
console.log(parsed.text); // prints Line 1 and Line 2 on separate lines
Your string is not valid JSON, and ideally you should fix the code that generates it, or contact the provider of it.
If the issue is that there is always one backslash too many, then you could do this:
// Need to escape the backslashes in this string literal to get the actual input:
var myJSONString = '{\\"text\\":\\"Line 1\\\\nLine 2\\",\\"color\\":\\"black\\"}';
console.log(myJSONString);
// Only replace backslashes that are not preceded by another:
var fixedJSON = myJSONString.replace(/([^\\])\\/g, "$1");
console.log(fixedJSON);
var pg = JSON.parse(fixedJSON);
console.log(pg);
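Once the string parses, pg.text contains a real line break rather than the two characters \n that the editor wants to show. A minimal sketch of one way to get the escaped form back, assuming the field only needs the escapes that JSON itself defines; the names forInput and restored are hypothetical.

```javascript
// The JSON after the extra backslashes have been removed.
const fixedJSON = '{"text":"Line 1\\nLine 2","color":"black"}';
const pg = JSON.parse(fixedJSON);

// pg.text now holds a real line break. Stringifying the value and
// dropping the surrounding quotes gives back the escaped form.
const forInput = JSON.stringify(pg.text).slice(1, -1);
console.log(forInput);

// Reading the field back: wrap in quotes and parse to get the line break again.
const restored = JSON.parse('"' + forInput + '"');
console.log(restored === pg.text);
```

The read-back step throws if the user types an unescaped double quote or a lone backslash into the field, so a real editor would need to validate or re-escape the input first.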
Suppose I have an object variable:
var obj = {
key: '\"Hello World\"'
}
Then I tried to convert it to a string using JSON.stringify in the Chrome devtools console:
JSON.stringify(obj) // "{"key":"\"Hello World\""}"
I get the result "{"key":"\"Hello World\""}". Then I assign it to a string:
var str = '{"key":"\"Hello World\""}'
Finally I try to convert it back to an object:
JSON.parse(str);
but the browser reports Uncaught SyntaxError.
What confuses me is why this is wrong. I got the string from the original object and I just want to turn it back.
How can I fix this problem? If I want to convert obj to a string and then back again, how can I do it?
You tried to convert your JSON into a string literal by wrapping it in ' characters, but \ characters have special meaning inside JavaScript string literals, and \" gets converted to " by the JavaScript parser before it reaches the JSON parser.
You need to escape the \ characters too.
var str = '{"key":"\\"Hello World\\""}'
That said, in general, it is better to not try to embed JSON in JavaScript string literals only to parse them with JSON.parse in the first place. JSON syntax is a subset of JavaScript so you can use it directly.
var result = {"key":"\"Hello World\""};
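To illustrate the point, a minimal sketch: the round trip works as long as the JSON text stays in a variable and is never retyped as a literal. If the text really has to appear in source code, String.raw (ES2015) keeps the backslashes intact. The names back and fromLiteral are hypothetical.

```javascript
const obj = { key: '"Hello World"' };

// Round trip through a variable: no string literal is involved, so nothing is lost.
const str = JSON.stringify(obj);
const back = JSON.parse(str);
console.log(back.key === obj.key);

// If the JSON must appear in source code, String.raw keeps the backslashes.
const fromLiteral = JSON.parse(String.raw`{"key":"\"Hello World\""}`);
console.log(fromLiteral.key);
```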
try:
var str = '{"key":"\\"Hello World\\""}';
I am facing a very weird problem with Javascript. When I extract text from DOM and try to decode HTML entities, it's not working. However, when I assign the value directly in the code, it's working just fine.
I just don't get why the string is treated differently in the two cases. I have tested in Firefox and Chrome and both produce the same result.
Update:
The correct output should be %7B (after decoding the string). That means that when I assign the value directly to the variable it's working correctly, but when extracted from DOM, it's not. How can I extract the text from DOM and decode it so it produces "%7B" ?
DEMO: jsFiddle
HTML:
<div class="myclass">\u00257B</div>
Javascript Code:
$(document).ready(function(){
//Extracting the text from DOM
var myText = $(".myclass").html();
//decoding HTML entities
var decodedText = $("<div />").html(myText).text();
//alerting the decoded text
alert(decodedText); // output: \u00257B
//assigning the value directly to the variable
var myText2 = "\u00257B";
//decoding HTML entities
var decodedText2 = $("<div />").html(myText2).text();
//alerting decoded text
alert(decodedText2); // output: %7B
});
The reason myText2 produces a different result is that the backslash in a string literal is an escape character, so "\u0025" is read as the single character "%".
To get a literal backslash, simply use it twice:
myText2 = "\\u00257b";
Here is some further information about escape characters in JavaScript
EDIT
There's probably a better way, but this will work: (eval is generally frowned upon and has security implications if the value from your text is uncontrolled input)
myText = eval("\"" + decodedText + "\"")
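A minimal sketch of a way to avoid eval, assuming the text only contains four-digit \uXXXX sequences that need decoding; decodeUnicodeEscapes and fromDom are hypothetical names.

```javascript
// Turns literal \uXXXX sequences in a string into the characters they name.
function decodeUnicodeEscapes(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// "\\u00257B" is the eight characters \u00257B, as read from the DOM.
const fromDom = "\\u00257B";
console.log(decodeUnicodeEscapes(fromDom));
```

Because nothing is evaluated as code, this stays safe even when the element's text comes from uncontrolled input.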
I think this is because when you extract the string from the DOM, the "\u" is just a backslash followed by a "u", not an escape sequence.
If you do var myText2 = "\\u00257B"; you'll get the same result:
http://jsfiddle.net/9n6t5qxr/1/
If you do console.log('\u0025') it prints %, which is why you are seeing %7B.
I am running into an odd little problem with parsing some JSON which has quotes in it. I am using the native JSON.stringify and JSON.parse functions to do this. If I stringify an object which has quote marks in it, they are escaped as one would expect. If I then parse this back into an object, it again works fine.
The problem is occurring where I stringify, then print the object to a page, then parse the resulting string. If I try to do this, the parse function fails as stringify has only put single slashes before each of the offending quote marks.
The reason I need to achieve this is I am working on an application that dynamically loads content stored in a database as JSON strings. The strings at some point need to be printed onto the page somewhere so that the javascript can find them and build the page based on their contents. I need some way of robustly passing the object into and out of strings which will not fail if a user inputs the wrong characters!
I can solve this for the moment by inserting extra slashes into the code with a replace call, but I was wondering if there is a better way to handle this?
I have put together a couple of jsfiddles to illustrate what I am trying to describe:
http://jsfiddle.net/qwUAJ/ (Stringify then parse back)
var ob = {};
ob["number1"] = 'Number "1"';
ob["number2"] = 'Number 2';
ob["number3"] = 'Number 3';
var string = JSON.stringify(ob);
var reOb = JSON.parse('{"number1":"Number \"1\"","number2":"Number 2","number3":"Number 3"}');
$('div').html(string);
http://jsfiddle.net/a3gBf/4/ (Stringify, then print, then parse back)
// Make an object
var ob = {};
ob["number1"] = 'Number "1"';
ob["number2"] = 'Number 2';
ob["number3"] = 'Number 3';
// Turn the object into a JSON string
var string = JSON.stringify(ob);
// Printing the string outputs
// {"number1":"Number \"1\"","number2":"Number 2","number3":"Number 3"}
$('.stringified').html(string);
// Attempt to turn the printed string back into an object
var reOb = JSON.parse('{"number1":"Number \"1\"","number2":"Number 2","number3":"Number 3"}');
// This fails due to the single escaped quote marks.
Thank you for any help in advance!
This is a problem which arises from re-evaluating a String without first converting it back into a string literal, so the meaning changes if it is even still valid.
You need to consider what does '\"' as a literal actually mean? The answer is ", without the \. Why?
\" resolves to "
If you want to have \" as the result of the literal, you need to write '\\\"'
\\ resolves to \
\" resolves to "
So basically, the extra slashes are required to escape any characters with special meaning in string literals.
If you did var reOb = JSON.parse($('.stringified').html()); it would work fine as is.
Consider further
str = '\\\"\\\''; // \"\'
str = '\"\''; // "'
str = '"''; // SyntaxError: Unexpected token ILLEGAL
As far as I'm aware, JavaScript offers no native implementation to convert strings as desired, so the easiest method I know of is using a replace
function toLiteral(str) {
var dict = {'\b': 'b', '\t': 't', '\n': 'n', '\v': 'v', '\f': 'f', '\r': 'r'};
return str.replace(/([\\'"\b\t\n\v\f\r])/g, function ($0, $1) {
return '\\' + (dict[$1] || $1);
});
}
toLiteral('foo\\bar'); // "foo\\bar"
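As a possible alternative to a hand-written replace, a minimal sketch: applying JSON.stringify to the JSON text itself produces a quoted, escaped string that JavaScript accepts as a string literal (for U+2028 and U+2029 this holds in ES2019 and later engines). The sketch assumes the literal ends up in generated script source; new Function is used here only to simulate the page running that source, and the names literal and source are hypothetical.

```javascript
const ob = { number1: 'Number "1"', number2: "Number 2" };
const json = JSON.stringify(ob);

// Stringifying the JSON text a second time yields a quoted, escaped
// string literal that can be pasted into generated source code.
const literal = JSON.stringify(json);
const source = "return JSON.parse(" + literal + ");";
console.log(source);

// Simulate the page running the generated code.
const reOb = new Function(source)();
console.log(reOb.number1);
```

If the generated code is placed inside an HTML script element, sequences such as </script> in the data still need separate handling.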
If you generate JS with PHP code you should escape the quotes in your JSON string:
//PHP code generating js code
echo "var myJSONString = \"". str_replace("\"","\\\"",$mySqlJSON)."\";";