What is the best way to encode special characters in a URL? Let's say I pass a variable from JavaScript to a PHP script like this:
http://example.com/my sp3c!al var!a$le
What is the best way to encode those special characters (like whitespace, !, $, /, \ etc.)? Is there a method in JavaScript to encode them, with a corresponding function in PHP to decode them there?
You need to make use of encodeURIComponent(yourvar);
You can do it in three ways
encodeURI(yourvar)
output http://example.com/my%20sp3c!al%20var!a$le
and
encodeURIComponent(yourvar)
output http%3A%2F%2Fexample.com%2Fmy%20sp3c!al%20var!a%24le
escape()
output http%3A//example.com/my%20sp3c%21al%20var%21a%24le
Using escape() is not recommended because it has been deprecated since ECMAScript v3.
The escape() function is for JavaScript escaping, not HTTP.
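A minimal sketch of the usual pattern: encode only the value with encodeURIComponent() and append it to the URL. The script name script.php and the parameter name value are made up for illustration. On the PHP side, values read from $_GET arrive already decoded, and rawurldecode() reverses this encoding for a string you decode by hand.

```javascript
// Value taken from the question; endpoint and parameter name are hypothetical.
const rawValue = "my sp3c!al var!a$le";
const endpoint = "http://example.com/script.php";

// Encode only the component that carries user data, not the whole URL.
const requestUrl = endpoint + "?value=" + encodeURIComponent(rawValue);
console.log(requestUrl);

// encodeURI() is meant for a complete URL and leaves separators such as / and $ alone.
const wholeUrl = encodeURI("http://example.com/" + rawValue);
console.log(wholeUrl);

// Decoding reverses the component encoding exactly.
const roundTrip = decodeURIComponent(encodeURIComponent(rawValue));
console.log(roundTrip === rawValue);
```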
Related
I need to parse URLs from JavaScript source code using a different language (e.g. PHP or Java). The issue is that I don't know what kind of encoding is used on certain URLs (see below). Is there a standard (e.g. an RFC or specification) for this kind of encoding that I can use to implement it in the language I need? Initially I thought it simply backslash-escapes the forward slashes, but it seems to be more than that, as there are escaped characters such as \x3d...
var _F_jsUrl = 'https:\/\/www.example.com\/accounts\/static\/_\/js\/k\x3dgaia.gaiafe_glif.en.nnMHsIffkD4.O\/m\x3dglifb,identifier,unknownerror\/am\x3dggIgAAAAAAEKBCEImA2CYiCxoQo\/rt\x3dj\/d\x3d1\/rs\x3dABkqax3Fc8CWFtgWOYXlvHJI_bE3oVSwgA';
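The \/ and \x3d sequences in that string look like ordinary JavaScript string-literal escapes, which are defined in the ECMAScript specification (\x3d is the hex escape for "="). Below is a rough sketch of a decoder for the common escape forms. The function name decodeJsEscapes is made up, and the sketch deliberately ignores legacy octal escapes and line continuations, so treat it as a starting point for a port to PHP or Java rather than a complete implementation.

```javascript
// Hypothetical helper: decodes a subset of JavaScript string-literal escapes.
function decodeJsEscapes(raw) {
  // Single-character escapes with a special meaning.
  const simple = { n: "\n", r: "\r", t: "\t", b: "\b", f: "\f", v: "\v", 0: "\0" };
  const pattern = /\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|u\{([0-9a-fA-F]+)\}|([\s\S]))/g;
  return raw.replace(pattern, (match, hex2, hex4, codePoint, other) => {
    if (hex2 !== undefined) return String.fromCharCode(parseInt(hex2, 16)); // \xHH
    if (hex4 !== undefined) return String.fromCharCode(parseInt(hex4, 16)); // \uHHHH
    if (codePoint !== undefined) return String.fromCodePoint(parseInt(codePoint, 16)); // \u{H...}
    if (Object.prototype.hasOwnProperty.call(simple, other)) return simple[other];
    return other; // any other escaped character stands for itself, e.g. \/ or \\
  });
}

// String.raw keeps the backslashes, as they would appear in scraped source text.
const scraped = String.raw`https:\/\/www.example.com\/js\/k\x3dgaia`;
console.log(decodeJsEscapes(scraped));
```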
What is the difference between JavaScript encoding and URL encoding? I am not able to figure out the exact difference between them. Also, what are the following types of encoding called?
%HH
\u00HH
&#HH;
\\
\HH
Any more encoding schemes used in Web technologies?
%HH
Percent encoding as used in URIs.
\u00HH
Unicode escape sequence. For JavaScript specific things, read the spec on String literals.
&#HH;
HTML entity, or numeric character reference.
\\
Could be anything. Usually escaping the backslash when the backslash is used for escaping things.
\HH
Maybe you're referring to a named escape sequence here. See escape sequences in C-based languages.
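To make the list concrete, here is a small sketch that writes the same character ("=", code point 0x3D) in several of these notations and decodes each one. The helper decodeNumericCharRefs is made up for this example and only handles numeric references, where the decimal form is written &#61; and the hex form &#x3D;. A real HTML parser also handles named entities.

```javascript
// Percent encoding, as used in URIs.
const fromPercent = decodeURIComponent("%3D");

// Escape sequences inside a JavaScript string literal.
const fromUnicodeEscape = "\u003D";
const fromHexEscape = "\x3D";

// Hypothetical helper: decodes HTML numeric character references only.
function decodeNumericCharRefs(html) {
  return html.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, (match, hex, dec) =>
    String.fromCodePoint(hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10))
  );
}
const fromHtmlRefs = decodeNumericCharRefs("&#61;&#x3D;");

console.log(fromPercent, fromUnicodeEscape, fromHexEscape, fromHtmlRefs);
```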
Is there a uniform method in both PHP and JS to convert Unicode characters, so that the result is the same on both ends after encoding the Unicode string?
Is there any encoding/decoding technique that uses the same mechanism on both ends?
I have been trying bin2hex() and hex2bin() in PHP with corresponding hand-written functions in JS, but it doesn't work for Unicode characters.
Yes, it's called UTF-8. If you encode all documents as UTF-8 and store all data as UTF-8, you should be in the clear. The problem is that PHP, by default, expects strings to be Latin-1, so you need to change a few things (such as sending a Content-Type header with the right charset).
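For the hex round trip from the question, one possible JS counterpart is to go through the UTF-8 bytes explicitly with TextEncoder and TextDecoder. The function names utf8ToHex and hexToUtf8 are made up here. Assuming the PHP string is UTF-8 as well, bin2hex() on the PHP side should yield the same hex digits, since it works on the raw bytes.

```javascript
// Hypothetical helper: UTF-8 encode a string and render the bytes as lowercase hex.
function utf8ToHex(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Hypothetical helper: parse hex digits back into bytes and decode them as UTF-8.
function hexToUtf8(hex) {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(utf8ToHex("€"));
console.log(hexToUtf8(utf8ToHex("héllo €")));
```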
I referred to this question:
How do I convert special UTF-8 chars to their iso-8859-1 equivalent using javascript?
The following functions helped me:
fixed_string = decodeURIComponent(escape(utf_string));
utf_string = unescape(encodeURIComponent(original_string));
The escape and unescape functions, used for encoding and decoding query strings, are defined for ISO characters, whereas the newer encodeURIComponent and decodeURIComponent, which do the same thing, are defined for UTF-8 characters.
This is a hack but it works.
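To see what the hack does, here is a short sketch with wrapper names that are made up (toUtf8ByteString, fromUtf8ByteString). The first direction turns a string into one character per UTF-8 byte, the second reverses it. It relies on the deprecated escape() and unescape(), so it is shown only to explain the trick.

```javascript
// Hypothetical wrapper: each character of the result holds one UTF-8 byte (0-255).
function toUtf8ByteString(text) {
  return unescape(encodeURIComponent(text));
}

// Hypothetical wrapper: reinterpret a one-byte-per-character string as UTF-8.
function fromUtf8ByteString(byteString) {
  return decodeURIComponent(escape(byteString));
}

const byteString = toUtf8ByteString("é");
console.log(byteString.length); // two characters, one per UTF-8 byte
console.log(fromUtf8ByteString(byteString));
```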
PHP
rawurlencode($theString)
JS
decodeURIComponent(theString)
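A small sketch of how the two sides line up. The literal below is what PHP's rawurlencode() should produce for the string from the first question, since it percent-encodes everything except letters, digits and -_.~ characters. For the opposite direction, encodeURIComponent() leaves !'()* unencoded, so the made-up helper rfc3986Encode encodes those as well to get the same output from JS.

```javascript
// Output of rawurlencode("my sp3c!al var!a$le") on the PHP side (assumed input).
const fromPhp = "my%20sp3c%21al%20var%21a%24le";
const decoded = decodeURIComponent(fromPhp);
console.log(decoded);

// Hypothetical helper: encodeURIComponent plus the five characters it skips.
function rfc3986Encode(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}
console.log(rfc3986Encode(decoded) === fromPhp);
```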
I'm working with JMeter, and I need to send an encoded parameter along with the HTTP request.
I know that I can encode special characters using JavaScript's escape(). But I can't use JavaScript here, as I'm using JMeter's Regular Expression Extractor. I need a regular expression pattern that does the same as escape(). Please help. Thanks in advance.
The upcoming version 2.10 of JMeter will have a new function that does this:
https://issues.apache.org/bugzilla/show_bug.cgi?id=54991
Regular expressions can't do this. escape() (which is deprecated anyway and has been superseded by encodeURI()) takes ASCII control characters and non-ASCII characters and encodes them using %xx or %uxxxx hexadecimal notation. Regular expressions can only work with existing text, not convert it.
Why do encodeURI and encodeURIComponent encode spaces as hex values, when I see other encodings using the plus sign? There's something I'm obviously missing.
thanks!
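IIRC + is the form encoding (application/x-www-form-urlencoded) of a space, while %20 is the standard URI encoding.
They are interchangeable only in query strings that the server decodes as form data; elsewhere in a URL, such as the path, + is a literal plus sign, so %20 is the safe choice.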
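A quick sketch of the two conventions side by side, using the standard URLSearchParams class for the form-style encoding. The parameter name q is made up. Note that decodeURIComponent() does not turn + back into a space, so the decoder has to match the encoder.

```javascript
const text = "a b+c";

// URI-style percent encoding: space becomes %20, a literal + becomes %2B.
const percentStyle = encodeURIComponent(text);

// Form-style encoding: space becomes +, a literal + still becomes %2B.
const formStyle = new URLSearchParams({ q: text }).toString();

// A form-aware parser maps + back to a space...
const parsedAsForm = new URLSearchParams(formStyle).get("q");

// ...but decodeURIComponent leaves + untouched.
const parsedAsUri = decodeURIComponent("a+b%2Bc");

console.log(percentStyle, formStyle, parsedAsForm, parsedAsUri);
```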
"+" is allowed as a substitute for spaces, however there are lots of other special characters that need escaping as hex values (in the form %nn). Presumably the authors of encodeURI and encodeURIComponent decided to use hex values for everything including space, since it made their code simpler and they didn't think the extra two characters for each space in a uri was really that important.
Look here for a discussion of the differences between escape, encodeURI, and encodeURIComponent, with interactive examples for all three of them:
http://xkr.us/articles/javascript/encode-compare/
To summarize:
The escape() method does not encode the + character which is interpreted as a space on the server side as well as generated by forms with spaces in their fields. Due to this shortcoming and the fact that this function fails to handle non-ASCII characters correctly, you should avoid use of escape() whenever possible. The best alternative is usually encodeURIComponent().
escape() will not encode: @*/+
Use of the encodeURI() method is a bit more specialized than escape() in that it encodes for URIs as opposed to the querystring, which is part of a URL. Use this method when you need to encode a string to be used for any resource that uses URIs and needs certain characters to remain un-encoded. Note that this method does not encode the ' character, as it is a valid character within URIs.
encodeURI() will not encode: ~!@#$&*()=:/,;?+'
Lastly, the encodeURIComponent() method should be used in most cases when encoding a single component of a URI. This method will encode certain chars that would normally be recognized as special chars for URIs so that many components may be included. Note that this method does not encode the ' character, as it is a valid character within URIs.
encodeURIComponent() will not encode: ~!*()'
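The "will not encode" lists can be checked directly. The sketch below runs each function over one set of punctuation characters and keeps those that come back unchanged. The helper name leftUnencoded and the choice of test characters are made up for this example.

```javascript
// Punctuation to probe; letters and digits are never encoded by any of the three.
const punctuation = "~!@#$&*()=:/,;?+'";

// Hypothetical helper: returns the characters the given encoder leaves as-is.
function leftUnencoded(encoder) {
  return Array.from(punctuation)
    .filter((ch) => encoder(ch) === ch)
    .join("");
}

console.log("escape:            ", leftUnencoded(escape));
console.log("encodeURI:         ", leftUnencoded(encodeURI));
console.log("encodeURIComponent:", leftUnencoded(encodeURIComponent));
```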