How to get the range of characters

I am importing SVG fonts with JavaScript so that I can animate individual letters. I store the glyphs in a JavaScript object, where the Unicode value of the glyph is the key and the glyph itself is the value. When I create text from a given string, I use each character of the string to look up the matching glyph.
And here comes my question:
The Unicode value in the glyph's unicode attribute (specified here) can be an XML character reference in hexadecimal (unicode="&#x66;&#x66;&#x6c;") or decimal (unicode="&#102;&#102;&#108;") notation. For the hkern and vkern elements (which represent the kerning table), the characters are given as a Unicode range (specified here and here). Is there a library that can convert all of these variations? Does anybody know of a resource with further information that could help me solve this problem?
The overall problem is to convert all possible variations of input into a consistent list of Unicode values that I can use as keys for the glyph map.
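
For what it's worth, here is a minimal sketch of that normalization in plain JavaScript, not a library recommendation. The function names (decodeCharRefs, toCodePoints, parseUnicodeRange) are made up for illustration. It assumes numeric character references of the form &#x66; or &#102; and a single range written like U+0041-005A or U+4??; a comma-separated list of ranges would have to be split first.

```javascript
// Decode XML numeric character references ("&#x66;" hex, "&#102;" decimal)
// found in a raw attribute string. Named entities are not handled here.
function decodeCharRefs(raw) {
  return raw.replace(/&#(x[0-9a-fA-F]+|[0-9]+);/g, (match, body) => {
    const codePoint = body[0] === 'x'
      ? parseInt(body.slice(1), 16)
      : parseInt(body, 10);
    return String.fromCodePoint(codePoint);
  });
}

// Split a string into its Unicode code points (astral characters stay whole).
function toCodePoints(text) {
  return Array.from(text, (ch) => ch.codePointAt(0));
}

// Parse one range such as "U+0041", "U+0041-005A" or "U+4??"
// into an inclusive [start, end] pair of code points.
function parseUnicodeRange(range) {
  const m = /^U\+([0-9A-F?]{1,6})(?:-([0-9A-F]{1,6}))?$/i.exec(range.trim());
  if (!m) {
    throw new Error('Not a Unicode range: ' + range);
  }
  if (m[1].includes('?')) {
    // Wildcard form: "?" stands for any hex digit.
    return [
      parseInt(m[1].replace(/\?/g, '0'), 16),
      parseInt(m[1].replace(/\?/g, 'F'), 16),
    ];
  }
  const start = parseInt(m[1], 16);
  const end = m[2] ? parseInt(m[2], 16) : start;
  return [start, end];
}

console.log(toCodePoints(decodeCharRefs('&#x66;&#x66;&#x6c;'))); // [ 102, 102, 108 ]
console.log(parseUnicodeRange('U+0041-005A'));                   // [ 65, 90 ]
```

Note that if the SVG is read through the browser's XML parser, getAttribute('unicode') already returns the decoded characters, so decodeCharRefs is only needed when working on the raw file text.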

Perhaps you could create a hidden element with automatic width (CSS width: auto), place your string into it, and then measure the width of that element (perhaps with MooTools)?

Related

Javascript wrongly decodes unicode

I'm trying to get the ID of a Font Awesome icon. It is defined in the ::before style. When I use
window.getComputedStyle(document.querySelector("[id='5']"), '::before').getPropertyValue('content')
to get it, instead of "\f458", "\"\"" is returned.
I assume that JavaScript is trying to convert the code into a char but fails. Is there any way to prevent this?

The decoding works; the problem is the font. If the font you use has no match for this character, it will be shown mangled or as the Unicode value in a box.
Since the character is in the Private Use Area, depending on the font it might resolve to a glyph, be mangled, or be empty (a space).
According to the Font Awesome Cheatsheet, it looks like an icon for quidditch.
Perhaps there's a text-to-image/SVG map somewhere on the internet (and if not, just copy and paste locally and create one) that you could use if the font itself isn't good or you are decoding in a problematic environment (can't install fonts, etc.).
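
If the goal is just to recover the icon's code as text, one option is to read the code point back out of the returned string. This is only a sketch: contentToHexCodes is a made-up helper name, and it assumes the computed value is a quoted string wrapping the actual character, which is what browsers typically return for a quoted content value.

```javascript
// Convert a computed "content" value such as '"\uf458"' (quotes included)
// into an array of hexadecimal code point strings.
function contentToHexCodes(content) {
  // Drop one leading and one trailing quote character, if present.
  const text = content.replace(/^["']|["']$/g, '');
  return Array.from(text, (ch) => ch.codePointAt(0).toString(16));
}

console.log(contentToHexCodes('"\uf458"')); // [ 'f458' ]
```

In the page, the argument would be the result of the getComputedStyle(...).getPropertyValue('content') call from the question.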

JS: Text to Image

I'm currently making a toy project that converts a string of text into a series of GitHub commits. The end result should look something like this:
The solution I am currently working on is to take a string of text and convert each character to a 7x7 array of boolean values, where true is a green dot, and false is not. Then I'll iterate over that to come up with an array of commits to send to GitHub.
For the first part, I've been searching for an npm package that takes text, like ascii characters and returns a bmp or similar image representing the text, but I haven't had any luck.
Does anybody know of a library that will do something like that? My main requirement is that I can set the 'resolution' of the output, so I can get a 7x7 image from it. Alternatively, if there is an entirely different solution, I'd be happy to hear that too.
Thanks

Since you are using Node.js, you could use opencv4nodejs. It has features for writing text onto an image, which are normally used to annotate images but should work well for your case.
The image could be whatever grid size you prefer. Then you could iterate over the pixels and use their coordinates to determine the commits.
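
The iteration step could look roughly like the sketch below. It does not use OpenCV; the 7x7 bitmap for "T" is hand-written and purely hypothetical, standing in for whatever pixels the rendering step produces. It assumes the usual contribution-graph layout, where each column is a week and each row is a day, so a pixel at (column, row) maps to day offset column * 7 + row from the start date.

```javascript
// Hypothetical 7x7 bitmap for the letter "T"; '#' marks a filled pixel.
const GLYPH_T = [
  '#######',
  '...#...',
  '...#...',
  '...#...',
  '...#...',
  '...#...',
  '...#...',
];

// Turn the text rows into a grid of booleans (true = green dot).
function toBooleanGrid(rows) {
  return rows.map((row) => Array.from(row, (ch) => ch === '#'));
}

// Map every filled pixel to a day offset from the first day of the graph.
// startWeek shifts the glyph to the right by that many columns.
function gridToDayOffsets(grid, startWeek = 0) {
  const offsets = [];
  for (let row = 0; row < grid.length; row++) {
    for (let col = 0; col < grid[row].length; col++) {
      if (grid[row][col]) {
        offsets.push((startWeek + col) * 7 + row);
      }
    }
  }
  return offsets.sort((a, b) => a - b);
}

const offsets = gridToDayOffsets(toBooleanGrid(GLYPH_T));
console.log(offsets.length); // 13
```

Each offset can then be added to the chosen start date to get the day on which a commit should be made.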

Can a font omit the space character?

I'm working on a font detection library that needs to be very, very small, suitable for including inline on every page of a website. I've already gotten it pretty small (417 bytes gzipped). You can check it out at the GitHub repo.
Upon further thinking, I can reduce the size of the library significantly again (around another 10%) if I can make a few basic assumptions:
It is impossible to define a working font file (for browsers, anyway) without a space character/definition.
OR, if such a space-omitting font could be created, that all modern browsers and IE9+ will choose the fallback font's space character when a space character is needed. That is, there is otherwise nothing special about the space character in terms of font-fallback.
And, tangentially, all modern browsers and IE9+ will have a different width for the space character in serif and monospace generic fonts provided by the browser (probable, but I will need to test further).
I just attempted to explicitly define a font without a space character using FontSquirrel's font generator. This was done by explicitly omitting the space from the selected Subset and disabling the 'Fix Missing Glyphs' for space. FontSquirrel still generated a font with a space character with a width differing from both serif and monospace.
I understand that some languages do not have a space character in the traditional sense, but due to the nature of font file formats and definitions, I do not think that fonts tailored to such a language could or would omit a space character.
If these assumptions all hold, the library could remove the need to support custom text checking and reduce the number of tests from 3 to 2, also speeding up the library and reducing its memory usage. The new size would be around 380 bytes or less when gzipped.
So how about it, font experts? Is it possible to define a valid font without a space character definition? If there is such a font, can you provide an example?

It appears that while it is possible, it is virtually unheard-of to create a font without a space character. Even icon font services appear to include it. Other than a special case for a font detection library, there are no known fonts that omit the space character.
This research led to an improvement to my onfontready library. Details here. Thanks to Mike 'Pomax' Kamermans for his contributions.

Is there a size limit for data- attributes? [duplicate]

How long is too long for an attribute value in HTML?
I'm using HTML5 style data attributes (data-foo="bar") in a new application, and in one place it would be really handy to store a fair whack of data (upwards of 100 characters). While I suspect that this amount is fine, it raises the question of how much is too much?

HTML5 has no limits on the length of attribute values.
As the spec says, "This version of HTML thus returns to a non-SGML basis."
Later on, when describing how to parse HTML5, the following passage appears (emphasis added):
The algorithm described below places no limit on the depth of the DOM tree generated, or on the length of tag names, attribute names, attribute values, text nodes, etc. While implementors are encouraged to avoid arbitrary limits, it is recognized that practical concerns will likely force user agents to impose nesting depth constraints.
Therefore, (theoretically) there is no limit to the length/size of HTML5 attributes.
See revision history for original answer covering HTML4.

I've just written a test (Note! see update below) which puts a string of length 10 million into an attribute and then retrieves it again, and it works fine (Firefox 3.5.2 & Internet Explorer 7)
50 million makes the browser hang with the "This script is taking a long time to complete" message.
Update: I've fixed the script: it previously set the innerHTML to a long string and now it's setting a data attribute. https://output.jsbin.com/wikulamuni It works for me with a length of 100 million. YMMV.
el.setAttribute('data-test', <<a really long string>>)
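
A test along these lines can be sketched as follows. This is not the script behind the link above, just a rough equivalent; attributeRoundTrips is a made-up name and the length passed in is whatever you want to try.

```javascript
// Write a string of the given length into a data attribute and read it back.
// Returns true if the value survived unchanged.
function attributeRoundTrips(el, length) {
  const value = 'x'.repeat(length);
  el.setAttribute('data-test', value);
  return el.getAttribute('data-test') === value;
}

// Browser-only usage; skipped when no DOM is available (e.g. in Node.js).
if (typeof document !== 'undefined') {
  const el = document.createElement('div');
  console.log(attributeRoundTrips(el, 10000000));
}
```

Results will depend on the browser and the available memory, so treat any number you get as a data point rather than a guarantee.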

I really don't think there is any limit. I know now you can do
<a onclick=" //...insert 100KB of javascript code here">
and it works fine. Albeit a little unreadable.

From HTML5 syntax doc
9.1.2.3 Attributes
Attributes for an element are expressed inside the element's start tag.
Attributes have a name and a value. Attribute names must consist of one or more characters other than the space characters, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute names may be written with any mix of lower- and uppercase letters that are an ASCII case-insensitive match for the attribute's name.
Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.
Attributes can be specified in four different ways:
Empty attribute syntax
Unquoted attribute value syntax
Single-quoted attribute value syntax
Double-quoted attribute value syntax
No limit on the size of the attribute value is mentioned here, so I think there is none.
You can also validate your document against the HTML5 Validator (Highly Experimental).

I've never heard of any limit on the length of attributes.
In the HTML 4.01 specification, the section on attributes says nothing about any limitation on this.
The same goes for the HTML 4.01 DTD. In fact, as far as I know, DTDs don't allow you to specify a length for attributes.
If there is nothing about that in HTML 4, I don't imagine anything like that would appear for HTML 5 -- and I actually don't see any length limitation in the 9.1.2.3 Attributes section for HTML 5 either.

Tested recently in Edge (Version 81.0.416.58 (64 bits)), and data-attributes seem to have a limit of 64k.

From http://dev.w3.org/html5/spec/Overview.html#embedding-custom-non-visible-data:
Every HTML element may have any number of custom data attributes specified, with any value.
That which is used to parse/process these data-* attribute values will have limitations.
Turns out the data-attributes and values are placed in a DOMStringMap object.
This has no inherent limits.
From http://dev.w3.org/html5/spec/Overview.html#domstringmap:
Note: The DOMStringMap interface definition here is only intended for JavaScript environments. Other language bindings will need to define how DOMStringMap is to be implemented for those languages
DOMStringMap is an interface with a getter, setter, creator and deleter.
The setter has two parameters of type DOMString, name and value.
The value is of type DOMString, which is mapped directly to a JavaScript String.
From https://bytes.com/topic/javascript/answers/92088-max-allowed-length-javascript-string:
The maximum length of a JavaScript String is implementation specific.

SGML defines attributes with a limit of 65k characters, as seen here:
http://www.highdots.com/forums/html/length-html-attribute-175546.html
For what you are doing, though, you should be fine.
As for the upper limits, I have personally seen jQuery data attributes hold a few kilobytes of data as well.

Chinese characters encoding

I'm working on a multi-language website. I have a problem with the color of the Chinese characters. My text color is #333333, but the Chinese characters appear darker than the Latin characters. My content comes from a database.
I thought I could handle it with JavaScript / jQuery. The script detects the Unicode value in the paragraph with the .fromCharCode() function. But from what I read, that function expects an integer, and the Unicode values for Chinese characters are not integers, which should be why my function is not working.
EDIT
Here's an image from what I got:
My function to check for the Unicode:
if($('#container p').fromCharCode(4E00)){
alert('Chinese');
}
Any help?

The screenshot suggests that different characters have been taken from different fonts. This often happens when the primary font does not contain all the relevant characters. So the odds are that you are trying to solve the wrong problem. Perhaps you should just consider making a font suggestion that is suitable for all the characters that will appear in the content.
The code snippet is in error in several ways. For example, 4E00 should be 0x4E00. And even that way, you would check for a single character only.
You need to post the full code, or a URL, or both, to get more constructive help.
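
To illustrate the 0x4E00 point, a check that looks at every character rather than a single one could be sketched like this. The function name containsChinese is made up, and the range U+4E00 to U+9FFF covers only the main CJK Unified Ideographs block; extension blocks are ignored, and the same characters are also used in Japanese and Korean text.

```javascript
// Return true if any character of the text falls in the main
// CJK Unified Ideographs block (U+4E00 to U+9FFF).
function containsChinese(text) {
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint >= 0x4E00 && codePoint <= 0x9FFF) {
      return true;
    }
  }
  return false;
}

console.log(containsChinese('Hello'));              // false
console.log(containsChinese('Hello \u4E2D\u6587')); // true
```

With jQuery, the text to test could come from something like $('#container p').text(). This still would not fix the color difference if the real cause is font fallback.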

Your problem is that you are displaying Simplified Chinese in a font that was designed for Traditional Chinese. When the display engine hits a character that's Simplified (and thus not in the Traditional font), it takes the default Simplified font and uses that instead. Then it reverts to the Traditional font. Hence the unseemly look.
You need to find out which Simplified Chinese font (or font family) is most common and use that specifically for Simplified Chinese text. Something like Heiti TC and Heiti SC.
