Asian-language characters being messed up through transfer - javascript

OK, I have a web app that uses PHP, MySQL and JavaScript. When a user types words in Korean, Chinese or Japanese into an input box, the text gets mangled.
For example, ヘビーローテーション comes out as mojibake, something like ãƒ˜ãƒ“ãƒ¼ãƒ­ãƒ¼ãƒ†ãƒ¼ã‚·ãƒ§ãƒ³.
The value is sent via an AJAX call and passed through encodeURIComponent() in JavaScript, so maybe that's it? I don't know. It shows up mangled in the MySQL database, too!
My charset encoding on my webpage is iso-8859-1. Help?

My charset encoding on my webpage is iso-8859-1
That won't work. ISO-8859-1 only covers Western European characters; for Korean, Chinese or Japanese you need UTF-8, and you need it consistently end to end: the page, the AJAX request, the PHP/MySQL connection, and the table definitions.
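For what it's worth, a minimal client-side sketch (the save.php endpoint and the text field id are hypothetical) that sends the input explicitly as UTF-8:

// Hypothetical sketch: POST the input as UTF-8 form data.
// encodeURIComponent always produces UTF-8 percent-escapes,
// so declare that charset on the request body.
var xhr = new XMLHttpRequest();
xhr.open('POST', 'save.php'); // hypothetical endpoint
xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8');
xhr.send('text=' + encodeURIComponent(document.getElementById('text').value));

The server side then has to decode the body as UTF-8 and talk to MySQL over a utf8 connection, otherwise the bytes get reinterpreted somewhere along the way.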

Related

Newline characters disappear after uploading a txt to a server

Can't parse any data from a txt file (not a csv, for a reason) when it's uploaded to a server, because all the newline characters are apparently gone. The d3.js parser I'm using, parseRows, does not work properly without them.
On a local server everything seems to be fine.
d3.text('fileName.txt', 'text/plain', function(fileContent) {
    console.log(/\n/.test(fileContent));
});
[localserver]: true
[onlineserver]: false
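A slightly extended version of the same check (same d3.text call as above, just more logging) can narrow down which line endings, if any, survive the upload:

d3.text('fileName.txt', 'text/plain', function(fileContent) {
    // check both Unix and Windows line endings, and peek at the raw text
    console.log('\\n found:', /\n/.test(fileContent));
    console.log('\\r found:', /\r/.test(fileContent));
    console.log(JSON.stringify(fileContent.slice(0, 80)));
});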
Using free hosting on Hostinger; it's an Apache server according to Wappalyzer. I don't know much about it.
Tried different encodings. No luck.
Update:
I downloaded the txt back from the server and opened it in Sublime Text. No newline characters in it. The exact local copy is fine.
Solved by avoiding: I decided to save some time and nerves and uploaded my txts to Dropbox instead. In case someone has the same problem, here is a little trick to get direct links to Dropbox files: http://techapple.net/2014/04/trick-obtain-direct-download-links-dropbox-files-dropbox-direct-link-maker-tool-cloudlinker/
Also solved by berserking: changing the extension of the file (to csv, for example) also helps, lol
Your server is probably trying to sanitize the strings it receives from the UI in order to prevent things like cross-site scripting attacks.
Try escaping the string you send to the server with encodeURI(str), and decode it on the way back with decodeURI(str) if you need to.
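A quick sanity check of that round trip, using nothing beyond standard JavaScript:

var raw = 'line1\nline2';
var encoded = encodeURI(raw);            // "line1%0Aline2" - the newline survives as %0A
console.log(decodeURI(encoded) === raw); // true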

Submitting UTF-8 to ISO-8859-1 from server A to server B issue

I don't really know how else to ask this; here is the scenario.
Server A is in UTF-8, and I have a search form on it which submits to server B, which is in ISO-8859-1. So when I submit a French word like "Carrières", by the time it gets to server B it comes out as "Carrières".
I cannot change the encoding on server A or B, so is there a way to submit it from A and still get the right query on B? I can only edit server A files.
I tried converting it to HTML entities before submitting, but that comes through as-is. I tried adding an enctype to the form element (enctype="application/x-www-form-urlencoded; charset=ISO-8859-1"), but it does not seem to work. I also tried accept-charset, which seems to work in everything except IE (accept-charset="ISO-8859-1").
Is there anything else I can try other than those? Maybe something for IE only, so I can use the accept-charset method everywhere else?
I can use jQuery 1.4.2.
I ended up writing a little server-side script in Java to solve this; I did not find any JS solution.
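For the record, one IE-only workaround commonly suggested for this situation (an untested sketch; the form id is hypothetical) relies on old IE honoring document.charset at submit time even though it ignores accept-charset:

$('#searchForm').submit(function() {
    // old IE lets scripts override the charset used to encode the submission
    if (document.charset) {
        document.charset = 'ISO-8859-1';
    }
});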

Why does this character keep popping up in my db: Â when saving other special characters?

It seems that when I add any special characters (like © and ®) to my database, they get saved with this extra character in front: Â, so © is saved as Â©. It's popping up more and more now.
The database might be doing this when I save the data (though it seems to have started only recently), or maybe it's the browser? I don't HTMLEncode the data when saving it to the database, but I haven't been able to reproduce the issue myself, so I'm wondering if someone else has run into it.
Do you think it's the web browser doing this? Maybe it's happening because the form is submitted via jQuery? What could be the culprit?
That's caused by mismatched encodings, typically one part of the application using ISO-8859-1 and another part using UTF-8. UTF-8 encodes © as the two bytes 0xC2 0xA9; read those bytes as ISO-8859-1 and they display as Â©.
Check your database and individual table definitions, your web page content encoding, and your form encoding, and make sure they consistently use the same encoding. We use UTF-8 for everything, end to end.
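The effect is easy to reproduce at the byte level (Node.js here, purely as an illustration):

// UTF-8 bytes for "©" reinterpreted as Latin-1 produce the stray Â
var bytes = Buffer.from('©', 'utf8');   // <Buffer c2 a9>
console.log(bytes.toString('latin1'));  // "Â©"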
We're using MySQL, not SQL Server, so I don't know the particulars there. For MySQL, our DDL files defining tables generally end with something like:
    SOMEDATA VARCHAR(100),
    INDEX (USER_ID)
)
engine = InnoDB
default character set utf8
collate = utf8_general_ci
and in a common header that's included in all of our generated pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
In the <form> tag you can set the accept-charset explicitly
<form accept-charset="utf-8" (etc)>
See this not-entirely-relevant question here on SO

Delphi Indy10 HTTP server and ExtJS form submit

I have a problem I don't know how to solve.
I have an Indy10 HTTP server. I have used both Indy9 and Indy10 HTTP servers in many applications and never had any problems. But now I am using the Indy10 HTTP server with the ExtJS JavaScript RIA framework.
The problem appears when I submit data that contains non-ASCII characters. For instance, when I submit the letter "č", which is a letter in code page 1250 (Slovenian, Croatian...), I get the following in Indy under "unparsed params": "%C4%8D". This is the correct hexadecimal representation of the letter "č" in UTF-8 encoding. All my pages are UTF-8, and I never had any problems submitting form data to Indy. I debugged the code and saw that I actually get a sequence of bytes like this: [37, 67, 52, 37, 56, 68]. This is the byte representation of the string "%C4%8D". But of course Indy cannot decode this correctly to UTF-16. So, as an example, the actual form field:
FirstName=črt
comes out like this when submitted:
FirstName=%C4%8Drt
I don't know how to solve this. I looked at ExtJS forums, but there is nothing on this topic. Anybody know anything about this kind of problem?
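(As a side note, you can see in the browser console where the %C4%8D comes from; encodeURIComponent always emits UTF-8 percent-escapes, which is consistent with the EDIT below:

console.log(encodeURIComponent('č')); // "%C4%8D"

)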
EDIT:
If I encode the params as JSON they arrive correctly. I also tried to URL-decode the params, but the result is not correct. Maybe I missed something; I will look at this again. And yes, it seems that ExtJS URL-encodes the params.
EDIT2:
Ok, I have discovered more. I compared the actual content of the post data. It is like this:
Delphi 2006 (Indy10): FirstName=%C4%8D
Delphi 2010 (Indy10): FirstName=%C4%8D
In both cases the unparsed params are identical. I have ParseParams turned on, and in BDS 2006 they are correctly parsed, but under 2010 they are not. This is the Indy10 bundled with Delphi. Is there a bug in this version, or am I doing something wrong?
EDIT3:
I downloaded the latest nightly build of Indy10. Still the same issue.
EDIT4:
I am forced to accept my own answer.
So, to answer my own question:
This is definitely not working as it should under Unicode. Indy uses Unicode strings internally. The problem is in how the parameters are decoded into a TStringList, specifically this line:
Params.Add(TIdURI.URLDecode(s));
found in "TIdHTTPRequestInfo.DecodeAndSetParams". It does not decode the params correctly, probably because it operates on Unicode strings.
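(The same byte-vs-character confusion is easy to demonstrate in JavaScript: decoding each %XX escape to a character of its own, instead of treating the byte pair as UTF-8, gives the wrong result.

console.log(unescape('%C4%8D'));           // "Ä" plus a control character - wrong
console.log(decodeURIComponent('%C4%8D')); // "č" - the bytes decoded as UTF-8

)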
The workaround I found is to use "HTTPDecode" from "HTTPApp.pas".
Params := TStringList.Create;
try
  Params.StrictDelimiter := True;
  Params.Delimiter := '&';
  // parse the parameters and store them in a temporary string list
  Params.DelimitedText := UTF8ToString(HTTPDecode(UTF8String(Request.UnparsedParams)));
  // do something with params...
finally
  Params.Free;
end;
But I cannot believe that such a common task is not working correctly. Can someone confirm this is really a bug or am I just doing something wrong?
It appears the string is URL-encoded, so you can use the following code to decode it:
uses
  IdURI;

value := TIdURI.URLDecode(value);
edit
It appears there is a case where the decoder does not properly decode the double-byte sequence as a single character. Looking at the source, it does appear that it would decode properly if the character were coded like %UC48D, but in my testing this still does not decode properly. What is interesting is that the TIdURI.ParamsEncode function generates the proper encoding, but that encoding is not reversible using the corresponding routines in the latest version of Indy 10.
I'm using Delphi 7 and migrated to Indy 10. I ran into what is likely the same problem with Portuguese characters, and solved it by changing the source as below:
procedure TIdHTTPRequestInfo.DecodeAndSetParams(const AValue: String);
...
  //Params.Add(TIdURI.URLDecode(s)); //-- assumed UTF-8
  Params.Add(TIdURI.URLDecode(s, TIdTextEncoding.Default)); //-- ASCII worked
...
end;

Javascript convert data from utf-8 to iso-8859-1

I work on a website that is all done in ISO-8859-1 encoding, using old ASP 3.0. I use Yahoo YQL to request data (XML) from external websites, but I request that it be returned as JSON-P (JSON with a callback function, so I can retrieve the data).
The problem I am facing is that YQL seems to always return data encoded in UTF-8, which is bad for me when I try to display any textual data retrieved from that query. Characters like é, à and ô get garbled in IE6 and IE7 since the encodings do not match.
Does anyone know how to convert UTF-8 data retrieved via JSON-P with YQL to ISO-8859-1 so that it displays correctly?
I already tried that solution, but it does not work. Server-side functions are not an option either; ASP 3.0 does not include functions such as utf8_decode.
Thank you
I have no idea whether this will work, but here's something you can try if you want.
A <script> tag can have a charset attribute specified when referencing a remote JS file. See the theory here. This definitely works for content that is stored inside the JavaScript file and, for example, output using document.write.
Whether the implicit conversion also works for data fetched through JSONP by a routine defined in that file, I have no idea. My guess is: probably not. I can't test it right now, but if you do, I'd be very interested in the outcome.
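If you do try it, a minimal sketch of injecting the JSONP <script> with an explicit charset (the query string and callback name are placeholders):

var s = document.createElement('script');
s.src = 'http://query.yahooapis.com/v1/public/yql?q=...&format=json&callback=handleData'; // placeholder URL
s.charset = 'utf-8'; // asks the browser to decode the fetched script as UTF-8
document.getElementsByTagName('head')[0].appendChild(s);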
