Urdu characters joining problem - javascript

I'm trying to write Urdu characters into a div using JavaScript. The problem is that they don't change shape when I write two characters that should take a different shape when written together. For example, ﺝ and ا written together should look like جا, but they don't join. The same problem occurs with other characters. Please help!

I found the answer. I was copying characters that looked the same but were not the ones I needed. For example, one character from Urdu and another from Arabic will not join properly. So whenever you copy characters, even if they look the same, keep in mind that they may have different Unicode code points in different languages.
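One way to spot such look-alike characters is to print their code points. The sketch below is only an illustration, not the original poster's code: the helper names `codePoints` and `toBaseLetters` are made up, and it assumes the stray character is an Arabic presentation form such as U+FE9D (the isolated form of jeem), which never joins. For that particular case, `String.prototype.normalize("NFKC")` maps the presentation form back to the base letter U+062C, which the browser can then shape. It will not help when the two characters are simply different letters from different alphabets.

```javascript
// List the code points of a string as U+XXXX labels.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Map compatibility characters (including Arabic presentation forms)
// back to their base letters. Assumption: the text contains such forms.
function toBaseLetters(text) {
  return text.normalize("NFKC");
}

const pasted = "\uFE9D\u0627"; // isolated-form jeem + alef: does not join
const fixed = toBaseLetters(pasted); // plain jeem + alef: joins when rendered

console.log(codePoints(pasted)); // [ 'U+FE9D', 'U+0627' ]
console.log(codePoints(fixed));  // [ 'U+062C', 'U+0627' ]
```

Comparing the two lists makes it obvious when a pasted character is not the one that was intended.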

Related

How to translate Japanese Kanji to Katakana

The requirement is to:
As the user types his/her Japanese Kanji first and last names,
automatically fill in the corresponding Japanese Katakana first and
last names.
I have been searching for a while now, but I haven't found anything yet. There seem to be several jQuery plugins that convert Hiragana to Katakana or Romaji or vice versa, but that is not what we need here.
There is one that claims to translate from Kanji to Kana, but I don't think the code matches its description (it only executes the code if the input is Kana, but that is supposed to be the output!).
Anyway, I need to translate a person's first/last names from Kanji to Kana.
How do I do this?
As this needs to happen while the user is filling in the form, I would prefer a JavaScript solution (or any pointers to one), but if there are any pointers on how to do this in .NET, I'd very much appreciate those too.
It seems like there are several JavaScript solutions online to convert from Kanji, Romaji, Hiragana and Katakana. Check these out and see if they work for you:
JQuery Auto Kana Input
Kuroshiro
jp-conversion
WanaKanaJS

RegExp must have \w+ and \s+ characters

I've been trying to create a RegExp that makes sure a user has entered at least one word and at least one space. I tried to use this:
/\w+\s+/
But that makes sure that there is a space AFTER a word. I just want to make sure both are present in the string. They don't need to be in the order of the above RegExp.
How can I make the RegExp work, but without matching the order?
/(?=.*?\w)(?=.*?\s)/
(?=...) is a look-ahead, and .*? means "any number of characters, as few as possible".
So the pattern reads: "find any number of characters and then a \w", and "find any number of characters and then a \s".
Another thing to note about how this works: look-aheads do not consume any characters, so both checks start from the same position and can succeed in any order.
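As a quick illustration, the look-ahead pattern can be wrapped in a small function and tried on a few inputs. The function name `hasWordAndSpace` is made up for this sketch.

```javascript
// True when the string contains at least one word character (\w)
// and at least one whitespace character (\s), in any order.
function hasWordAndSpace(str) {
  return /(?=.*?\w)(?=.*?\s)/.test(str);
}

console.log(hasWordAndSpace("hello world")); // true
console.log(hasWordAndSpace(" hello"));      // true (space comes first)
console.log(hasWordAndSpace("hello"));       // false (no whitespace)
console.log(hasWordAndSpace("   "));         // false (no word character)
```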
You have two things:
Is there a word character?
Is there a space?
Two things.
str.match(/\w/)
str.match(/\s/)
So why are you trying to do them as one step?
if( str.match(/\w/) && str.match(/\s/))
There are a lot of answers to my question. However, I do not want to simply pick the one that is upvoted. Please give a detailed explanation of why your regex works, and maybe why mine doesn't.
My answer provides the simplest solution. It is very clear to anyone reading it that we are checking "if it has a word character, and if it contains a space character". It is also very easy to expand on, such as if you want to add another check.
zyklus' answer (/(?=.*?\w)(?=.*?\s)/) is the fastest when speed-tested on a 50 KB string of input. In more common cases (i.e. 100 characters at most), this speed difference will be practically non-existent. It is twice as fast as my answer, but "2 * very small number = very small number". It's easy enough to add new test cases (just add another (?=.*something) block), but it is less obvious to a human reader what it does.
Jacob's answer ((\w+.*\s+)|(\s+.*\w+)) does quite literally what you asked: it first checks for a word character followed by a space character, then checks the other way around before failing. It works, but it is slower. Furthermore, if you decide to add a new test case, you'd get something like (\w+.*\s+.*\d+)|(\w+.*\d+.*\s+)|(\s+.*\w+.*\d+)|(\s+.*\d+.*\w+)|(\d+.*\w+.*\s+)|(\d+.*\s+.*\w+). It only gets worse if you add a fourth test (24 arrangements to check), and it is unreadably ugly. Do not use this answer.
Other answers are variants of existing ones.
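To show what "easy to expand on" can look like for the separate-tests approach, here is a minimal sketch that keeps one small pattern per requirement and checks them all. The function name `passesAll` and the extra digit check are made up for illustration.

```javascript
// Each requirement is its own small pattern; the order in the string does not matter.
// Do not add the "g" flag here: it would make .test() stateful between calls.
function passesAll(str, patterns) {
  return patterns.every((re) => re.test(str));
}

const wordAndSpace = [/\w/, /\s/];
const wordSpaceAndDigit = [/\w/, /\s/, /\d/]; // adding a check is one more entry

console.log(passesAll("hello world", wordAndSpace));      // true
console.log(passesAll("hello world", wordSpaceAndDigit)); // false (no digit)
console.log(passesAll("hello 42", wordSpaceAndDigit));    // true
```

Compared with the alternation approach, adding a requirement here adds one pattern instead of multiplying the number of orderings.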
If you need to do it in one RegEx for some reason:
(\w+.*\s+)|(\s+.*\w+)
Can be handy if you're working with a library that only enables you to use a single regular expression.

regex question mark in javascript

It is probably quite simple, but I do not know how to do it.
I have this regex :
new RegExp("^[A-Za-z\\u00C0-\\u017F][\\- ]?+$");
It validates a first name. The name has to begin with a letter (the range is in Unicode and works fine) and then continue with letters, - or space. But it can be just letters, as in most names.
I have searched but I didn't find the right way to do it.
I don't want to duplicate the character range. It is just to have a code more "proper".
If you could help, it would be great :)
Thanks in advance
Right now you only allow one letter and then one or more dashes/spaces. You probably want
new RegExp("^[A-Za-z\\u00C0-\\u017F][A-Za-z\\u00C0-\\u017F -]*$");
But in general, trying to validate a name with regexes isn't such a good idea.
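For reference, the suggested pattern can be tried on a few sample names. This is only a sketch: the names `NAME_PATTERN` and `isValidFirstName` are made up, and the sample inputs are illustrative. As noted above, real names can easily fall outside any such pattern.

```javascript
// First character: a letter from the allowed ranges.
// Remaining characters: letters, spaces or hyphens (zero or more).
const NAME_PATTERN = new RegExp(
  "^[A-Za-z\\u00C0-\\u017F][A-Za-z\\u00C0-\\u017F -]*$"
);

function isValidFirstName(name) {
  return NAME_PATTERN.test(name);
}

console.log(isValidFirstName("Jean-Pierre"));  // true
console.log(isValidFirstName("\u00C9lodie"));  // true (starts with É, U+00C9)
console.log(isValidFirstName("Anne Marie"));   // true
console.log(isValidFirstName("-Jean"));        // false (must start with a letter)
console.log(isValidFirstName("Jean2"));        // false (digits are not allowed)
```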

Chinese characters encoding

I'm working on a multi-language website. I have a problem with the color of the Chinese characters. My text color is #333333 but the Chinese characters appear darker than the occidental chars. My content comes from a database.
I thought of doing it with JavaScript / jQuery. The script tries to detect the Unicode values in the paragraph with the .fromCharCode() function. But from what I read, that function expects an integer, and the Unicode values for Chinese characters are not integers. That would be the reason my function is not working.
EDIT
Here's an image from what I got:
My function to check for the Unicode:
if($('#container p').fromCharCode(4E00)){
alert('Chinese');
}
Any help?
The screenshot suggests that different characters have been taken from different fonts. This often happens when the primary font does not contain all the relevant characters. So the odds are that you are trying to solve the wrong problem. Perhaps you should just consider making a font suggestion that is suitable for all the characters that will appear in the content.
The code snippet is in error in several ways. For example, 4E00 should be 0x4E00. And even that way, you would check for a single character only.
You need to post the full code, or a URL, or both, to get more constructive help.
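If detecting Chinese characters is still wanted, a minimal sketch could look like the following. It is not a fix for the font problem described above. The function names are made up, and the check assumes that only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) matters; characters in extension blocks are not covered.

```javascript
// Code points are ordinary integers; in source code they are written
// as hexadecimal literals such as 0x4E00.
function isCjkIdeograph(ch) {
  const code = ch.codePointAt(0);
  return code >= 0x4E00 && code <= 0x9FFF;
}

// Check a whole string instead of a single character.
function containsCjkIdeograph(text) {
  return /[\u4E00-\u9FFF]/.test(text);
}

console.log(isCjkIdeograph("\u4E2D"));                    // true
console.log(containsCjkIdeograph("Hello \u4E2D\u6587"));  // true
console.log(containsCjkIdeograph("Hello"));               // false
```

With jQuery, the text to check could be obtained with something like $('#container p').text() and passed to containsCjkIdeograph.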
Your problem is that you are displaying Simplified Chinese in a font that was designed for Traditional Chinese. So when the display engine hits a character that is Simplified (and thus not in the Traditional font), it takes the default Simplified font and uses that instead. Then it reverts to the Traditional font. Hence the uneven look.
You need to look into what would be the most common Simplified Chinese font (or font family) and use that specifically for Simplified Chinese texts. Something like Heiti TC and Heiti SC.

Backreference each character

For the sake of simplicity & learning something new, please don't suggest using two separate replace functions. I know that's an option but I would rather also know how to do this (or if it's not possible).
'<test></test>'.replace(/<|>/g,'$&'.charCodeAt(0))
This is what I've got so far. This sample code is, as you can tell, for another piece of code to escape HTML entities while still using innerHTML (because I do intend to include a few HTML entities such as small images, so again please don't suggest textContent).
Since I'm trying to replace both < and >, the problem is converting each individual one to their respective character codes. Since regular expressions allow for this "OR" condition as well as backreferences to each one, I'm hoping there's a way to get the reference of each individual character as they're replaced. $& will return <><> (because they're replaced in that order), but I don't know how to get them as they're replaced and take their character codes for the HTML entities. The problem is, I don't know what to use in this case if anything.
If that explanation wasn't clear enough, I want it to be something like this (and this is obviously not going to work, it'll best convey what I mean):
Assuming x is the index of the character being replaced,
'<test></test>'.replace(/<|>/g,'$&'.charCodeAt(x))
Hopefully that makes more sense. So, is this actually possible in some way?
'<test></test>'.replace(/[<>]/g,function(a) {return '&#'+a.charCodeAt(0)+';';});
I've put the characters in a square-bracket-thing (don't know its proper name). That way you can add whatever characters you want.
The above will return:
&#60;test&#62;&#60;/test&#62;
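The same idea can be wrapped in a reusable function. In the original attempt, '$&'.charCodeAt(0) is evaluated before replace() runs, so it always yields 36, the code of "$". Passing a function as the second argument lets each matched character be handled as it is replaced. The function name `toNumericEntities` and the extra "&" in the bracket list are additions made for this sketch.

```javascript
// Replace each listed character with its numeric HTML entity.
// The callback receives the matched character on every replacement.
function toNumericEntities(text) {
  return text.replace(/[<>&]/g, (ch) => "&#" + ch.charCodeAt(0) + ";");
}

console.log(toNumericEntities("<test></test>")); // &#60;test&#62;&#60;/test&#62;
console.log(toNumericEntities("a & b"));         // a &#38; b
```") !== "&#60;test&#62;&#60;/test&#62;") throw new Error("angle brackets not converted as expected");
if (toNumericEntities("a & b") !== "a &#38; b") throw new Error("ampersand not converted as expected");
if (toNumericEntities("plain") !== "plain") throw new Error("plain text should be unchanged");
</test>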
