How long is too long for an attribute value in HTML?
I'm using HTML5 style data attributes (data-foo="bar") in a new application, and in one place it would be really handy to store a fair whack of data (upwards of 100 characters). While I suspect that this amount is fine, it raises the question of how much is too much?
HTML5 has no limits on the length of attribute values.
As the spec says, "This version of HTML thus returns to a non-SGML basis."
Later on, when describing how to parse HTML5, the following passage appears (emphasis added):
The algorithm described below places
no limit on the depth of the DOM tree
generated, or on the length of tag
names, attribute names, attribute
values, text nodes, etc. While
implementors are encouraged to avoid
arbitrary limits, it is recognized
that practical concerns will likely
force user agents to impose nesting
depth constraints.
Therefore, (theoretically) there is no limit to the length/size of HTML5 attributes.
See revision history for original answer covering HTML4.
I've just written a test (Note! see update below) which puts a string of length 10 million into an attribute and then retrieves it again, and it works fine (Firefox 3.5.2 & Internet Explorer 7)
50 million makes the browser hang with the "This script is taking a long time to complete" message.
Update: I've fixed the script: it previously set the innerHTML to a long string and now it's setting a data attribute. https://output.jsbin.com/wikulamuni It works for me with a length of 100 million. YMMV.
el.setAttribute('data-test', <<a really long string>>)
I really don't think there is any limit. I know now you can do
<a onclick=" //...insert 100KB of javascript code here">
and it works fine. Albeit a little unreadable.
From HTML5 syntax doc
9.1.2.3 Attributes
Attributes for an element are
expressed inside the element's start
tag.
Attributes have a name and a value.
Attribute names must consist of one or
more characters other than the space
characters, U+0000 NULL, U+0022
QUOTATION MARK ("), U+0027 APOSTROPHE
('), U+003E GREATER-THAN SIGN (>),
U+002F SOLIDUS (/), and U+003D EQUALS
SIGN (=) characters, the control
characters, and any characters that
are not defined by Unicode. In the
HTML syntax, attribute names may be
written with any mix of lower- and
uppercase letters that are an ASCII
case-insensitive match for the
attribute's name.
Attribute values are a mixture of text
and character references, except with
the additional restriction that the
text cannot contain an ambiguous
ampersand.
Attributes can be specified in four
different ways:
Empty attribute syntax
Unquoted attribute value syntax
Single-quoted attribute value syntax
Double-quoted attribute value syntax
Here there hasn't mentioned a limit on the size of the attribute value. So I think there should be none.
You can also validate your document against the
HTML5 Validator(Highly Experimental)
I've never heard of any limit on the length of attributes.
In the HTML 4.01 specifications, in the section on Attributes there is nothing that mention any limitation on this.
Same in the HTML 4.01 DTD -- in fact, as far as I know, DTD don't allow you to specify a length to attributes.
If there is nothing about that in HTML 4, I don't imagine anything like that would appear for HTML 5 -- and I actually don't see any length limitation in the 9.1.2.3 Attributes section for HTML 5 either.
Tested recently in Edge (Version 81.0.416.58 (64 bits)), and data-attributes seem to have a limit of 64k.
From http://dev.w3.org/html5/spec/Overview.html#embedding-custom-non-visible-data:
Every HTML element may have any number of custom data attributes specified, with any value.
That which is used to parse/process these data-* attribute values will have
limitations.
Turns out the data-attributes and values are placed in a DOMStringMap object.
This has no inherent limits.
From http://dev.w3.org/html5/spec/Overview.html#domstringmap:
Note: The DOMStringMap interface definition here is only intended for JavaScript
environments. Other language bindings will need to define how DOMStringMap is to be
implemented for those languages
DOMStringMap is an interface with a getter, setter, greator and deleter.
The setter has two parameters of type DOMString, name and value.
The value is of type DOMString that is is mapped directly to a JavaScript String.
From https://bytes.com/topic/javascript/answers/92088-max-allowed-length-javascript-string:
The maximum length of a JavaScript String is implementation specific.
The SGML Defines attributes with a limit set of 65k characters, seen here:
http://www.highdots.com/forums/html/length-html-attribute-175546.html
Although for what you are doing, you should be fine.
As for the upper limits, I have seen jQuery use data attributes hold a few k of data personally as well.
Related
I want to display an ansi document (Unix terminal transcripts with escape codes to apply colors) in CodeMirror, but there is no mode for that.
I could try to make my own mode, and although it seems quite complicated, I haven't found any mode to use as a template that actually removes certain characters from the original content.
I need to remove the escape characters and use them to mark the start or end of a color.
Is this at all possible? Or does CodeMirror by design lack the possibility to do this (remove characters in a mode)?
Any other ideas?
This is possible, but awkward. A mode won't do this (as you noticed, it doesn't remove characters -- you could style them display: none, but that'll cause other problems.
An addon that listens for "change" events and uses markText 1 to style and hide pieces of the document could work. But making it clever (only updating the parts that change, rather than re-coloring the whole document on every change) requires some complexity.
The string returned by .toString() on a range created by document.createRange(...) will contain things like the inner part of script and style tags. (At least using current version of Chrome.)
Is there a way to get just the visible text?
I found a solution that seems reasonable and at least tentative standard compliant. (My guess, without checking, is that the standards perhaps does not handle all details in a case like this yet, but that the current implementation in Chrome seems useful and might become standard.)
The solution is simply to first create a document fragment from the range:
var fragment = r.cloneContents();
Then just walk the fragment the way you would walk a sub tree in the DOM. Do not enter nodes like "SCRIPT" and "STYLE". Collect the "#text" nodes.
I'm working on a multi-language website. I have a problem with the color of the Chinese characters. My text color is #333333 but the Chinese characters appear darker than the occidental chars. My content comes from a database.
I thought to do it with Javascript / jQuery. The script detects the Unicode from the paragraph with the .fromCharCode() function. But what I read was that function expects an integer and the Unicode for Chinese chars are not integers. And that should be the reason my function is not working.
EDIT
Here's an image from what I got:
My function to check for the Unicode:
if($('#container p').fromCharCode(4E00)){
alert('Chinese');
}
Any help?
The screenshot suggests that different characters have been taken from different fonts. This often happens when the primary font does not contain all the relevant characters. So the odds are that you are trying to solve the wrong problem. Perhaps you should just consider making a font suggestion that is suitable for all the characters that will appear in the content.
The code snippet is in error in several ways. For example, 4E00 should be 0x4E00. And even that way, you would check for a single character only.
You need to post the full code, or a URL, or both, to get more constructive help.
Your problem is that you are displaying Simplified Chinese in a font that was designed for Traditional Chinese. So when the display engine hits a character that's Simplified (and thus not in the Traditional font), it takes the default simplified font and uses that instead. Then it reverses back to the Traditional font. Hence the unseemly look.
You need to look into what would be the most common Simplified Chinese font (or font family) and use that specifically for Simplified Chinese texts. Something like Heiti TC and Heiti SC.
I am about to import SVG - Fonts with Javascript to be able to animate even single letters. Thereby i am storing the glyphs in an Javascript Object, where the Unicode-value of the glyph defines the key and the glyph itself becomes the value. So when i create text from a given string i use the each character from the string to get the suitable glyph for it.
And here comes my question:
The unicode-value of the glyphs unicode attribute (specified here) can be an XML character reference in hexadecimal (unicode="ffl") or decimal (unicode="ffl") notation. For the hkern- and vkern-elements (representing the kerning table) the characters given as Unicode Range (specified here and here). Is there any Library which could do the conversion from all these possible variations? Does anybody know a resource where i can find further information which could help me solving this problem?
The overall problem is to convert all possible variations of Input into a consistent list of unicode values i can use as Key for the glyph map.
Perhaps you could create a hidden element with automatic width (css width:auto), place your string into it, and then measure (perhaps with Mootools) the width of that element?
I am in the need of defining html classes from content, so pretty much every char could be used. According to html reference I may use cdata, so I should not run into problems. I figure though, that css and/or javascript/jquery won't play nicely with that.
Anyone has experience with what chars can be used without problems or if there is a function/plugin/.. that tidies the class names, so that they are usable?
css classnames must be the usual identifiers (http://www.w3.org/TR/CSS21/syndata.html#characters)
In CSS 2.1, identifiers (including
element names, classes, and IDs in
selectors) can contain only the
characters [A-Za-z0-9] and ISO 10646
characters U+00A1 and higher, plus the
hyphen (-) and the underscore (_);
they cannot start with a digit.
Javascript doesn't mind, since you will use classnames as Strings. So you can use any character as far as javascript is concerned.
If you want to strip your classnames down to the usable css subset, a simple regexp should be enough. If you want to encode your classnames into the same subset, it will be a little tougher, but I suppose you can try to Base64-encode them. Here are some jQuery plugins for base64 encoding/decoding.
As far as the class attribute is concerned you will run into problems using chars other than [a-z], [A-Z] and [_-].
For arbitrary data I would recommend the (upcoming with HTML5) data-x attribute.
See http://ejohn.org/blog/html-5-data-attributes/ for an example.
Cheers