What is the best way to make a Romaji to Japanese keyboard?

I am sorry if the below info seems irrelevant or lengthy.
TL;DR I am wondering what would be the best way to code a keyboard that takes English input and produces Japanese output.
So I recently started learning Japanese, and with that I started using websites and apps like WaniKani and Anki to help me remember Kanji (borrowed Chinese characters). WaniKani has a simple and really good interface for answering questions, while Anki is a flashcard system that lets you make and manage cards of whatever you want to memorize.
So I thought of an idea that would get me the best of both systems, and now I am making a website that takes data from my Anki deck and uses it to quiz me with a WaniKani-like interface.
For those who don't know, each Japanese character can be written in Latin letters based on its pronunciation,
e.g. つ --> tsu
This is called romaji (romanization). Google Translate, WaniKani, and a lot of other websites use it to convert standard English keyboard input to Japanese.
So, back to my question, I was wondering what would be the best way to convert English input
like, tsu to つ. At first I thought of going with regex and came up with this.
/((?:[aeiou])|(?:[kstpgzdnhpmr][aeiou])|(?:shi|chi|ji|tsu|fu|ya|yu|yo|wa|wo))/gm
The above regex just looks for one of three patterns: a lone vowel, a consonant + vowel, or one of a few special syllables.
This one isn't complete yet, since it can't detect double consonants
kko --> っこ
Which is different from ko --> こ
Another thing it can't do is find patterns like
cho or chyo --> ちょ
This is a combination character. Which complicates things a lot more.
If there's a better way of making this happen, or a way to improve the regex, please point it out.
Also, there is one more thing,
Characters can carry, let's say, a kind of accent mark called the dakuten that changes the sound of the character. For example,
は, which is usually pronounced ha, can be turned into ば, which sounds like ba. Notice those two little lines; many characters can take them.
Another form of it is the handakuten: は ha --> ぱ pa.
The combinations and double consonants above apply to these accented characters too.
So I came up with a way to convert a plain character to its accented form: assign a key that cycles through the different forms of the character before the caret.
An example,
const keyboard = $(".keyboard");
keyboard.on("keydown", function(event) {
if (event.key == " ") {
// Disables the space key.
event.preventDefault();
var string = $(this).val();
// Position of the text cursor.
var caretPos = $(this)[0].selectionStart;
// If there is no character before the caret.
if (caretPos == 0) return false;
// String Operation
const invert = (char) => (/[a-z]/.test(char) ? char.toUpperCase() : char.toLowerCase());
var newString = string.slice(0, caretPos - 1) + invert(string[caretPos - 1]) + string.slice(caretPos, string.length);
$(this).val(newString);
// Sets the cursor after the changed character (its older position after being reset by the val()).
$(this)[0].setSelectionRange(caretPos, caretPos);
}
});
body {
display: flex;
justify-content: center;
align-items: center;
height: 100vh;
}
input {
text-align: center;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/js/materialize.min.js"></script>
<link href="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/css/materialize.min.css" rel="stylesheet" />
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
</head>
<body>
<div class="container">
<div class="row">
<form class="col s12">
<div class="row">
<div class="input-field col s12">
<input id="first_name" type="text" class="validate keyboard" autocomplete="off" autocapitalize="off" />
<label for="first_name">Kanji</label>
</div>
Set the caret after any character in the text box and press the Space key.
</div>
</form>
</div>
</div>
</body>
</html>
All it does is find the character before the caret and toggle it between uppercase and lowercase.
In the real version, instead of changing case, it will cycle through an array of the different forms of the character that is before the caret.
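A minimal sketch of that cycling idea, written as pure string functions so it can replace the `invert` step in the handler above. The `FORM_CYCLES` table and the names `nextForm` and `cycleBeforeCaret` are hypothetical, and the table only covers a few characters for illustration.

```javascript
// Hypothetical, deliberately incomplete table:
// each row lists the forms that one key press cycles through.
const FORM_CYCLES = [
  ['は', 'ば', 'ぱ'],
  ['ひ', 'び', 'ぴ'],
  ['か', 'が'],
  ['つ', 'づ'],
];

// Returns the next form in the character's cycle,
// or the character unchanged if it has no other forms.
function nextForm(char) {
  const cycle = FORM_CYCLES.find((forms) => forms.includes(char));
  if (!cycle) return char;
  return cycle[(cycle.indexOf(char) + 1) % cycle.length];
}

// Pure-string version of the "String Operation" step:
// swaps the character just before the caret for its next form.
function cycleBeforeCaret(text, caretPos) {
  if (caretPos === 0) return text;
  return text.slice(0, caretPos - 1) +
    nextForm(text[caretPos - 1]) +
    text.slice(caretPos);
}
```

Pressing the key repeatedly would then walk は, ば, ぱ and back to は, while characters without other forms stay as they are.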
This project just uses jQuery 3.6 and Materialize for CSS.
Thanks for reading through all that. I appreciate any and all replies.
Please tell me if I missed something.

Found exactly what I needed.
wanakana.js
It is sponsored by WaniKani creators.
wanakana is also available as an npm package:
npm i wanakana
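For anyone curious how such a conversion can work without regex alternation, here is a rough table-driven sketch. It is not how wanakana is implemented; the table `ROMAJI_TO_HIRAGANA` and the function `toHiragana` are hypothetical names, and the table is deliberately tiny. It tries the longest romaji chunk first (so cho wins over a partial match) and treats a doubled consonant as a small っ, which covers the kko and cho cases from the question.

```javascript
// Deliberately small, hypothetical table; a real one needs every kana
// and every accepted spelling variant.
const ROMAJI_TO_HIRAGANA = {
  a: 'あ', i: 'い', u: 'う', e: 'え', o: 'お',
  ka: 'か', ki: 'き', ku: 'く', ke: 'け', ko: 'こ',
  sa: 'さ', shi: 'し', su: 'す', se: 'せ', so: 'そ',
  ta: 'た', chi: 'ち', tsu: 'つ', te: 'て', to: 'と',
  na: 'な', ni: 'に', nu: 'ぬ', ne: 'ね', no: 'の',
  ha: 'は', hi: 'ひ', fu: 'ふ', he: 'へ', ho: 'ほ',
  ba: 'ば', pa: 'ぱ',
  ya: 'や', yu: 'ゆ', yo: 'よ', wa: 'わ', wo: 'を', n: 'ん',
  kya: 'きゃ', kyu: 'きゅ', kyo: 'きょ',
  sha: 'しゃ', shu: 'しゅ', sho: 'しょ',
  cha: 'ちゃ', chu: 'ちゅ', cho: 'ちょ',
};
const MAX_KEY_LENGTH = 3;
// Two identical consonants in a row (n is excluded on purpose).
const DOUBLED_CONSONANT = /^([bcdfghjkmpqrstvwxyz])\1/;

function toHiragana(romaji) {
  const input = romaji.toLowerCase();
  let output = '';
  let i = 0;
  while (i < input.length) {
    // kko --> っこ: emit the small tsu and consume only the first consonant.
    if (DOUBLED_CONSONANT.test(input.slice(i, i + 2))) {
      output += 'っ';
      i += 1;
      continue;
    }
    // Longest match first: try 3 letters, then 2, then 1.
    let matched = false;
    for (let len = MAX_KEY_LENGTH; len >= 1 && !matched; len--) {
      const chunk = input.slice(i, i + len);
      const kana = ROMAJI_TO_HIRAGANA[chunk];
      if (kana) {
        output += kana;
        i += chunk.length;
        matched = true;
      }
    }
    // Unknown or incomplete input is passed through untouched.
    if (!matched) {
      output += input[i];
      i += 1;
    }
  }
  return output;
}
```

A library like wanakana covers far more cases (katakana, long vowels, alternative spellings), so this is only meant to show the shape of the problem.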

Related

Include space from input

My user wants an input field where he can enter a message, which I will then have displayed as a scrolling marquee across the page. He wants to be able to include empty spaces, for example,
"Employees please look up."
My textarea, when I get the text, doesn't notice the space. Any way to get that?
I know this is an odd request - google only tells me how to remove whitespace, not include it.
var text = $('textarea').val();
<textarea class='messageInput'></textarea>

I am pretty sure that you are getting the value with all the white-spaces right, the problem is in the displaying of the value in your (probably) custom div. The best way would be to set white-space: pre on that element which (as the option suggests) will preserve white-space. ;) Example:
<div id="foo" style="white-space: pre"><!-- insert text here --></div>
And ignore all the &nbsp; suggestions, which essentially modify the text content. CSS is the right way to do this!

You need to convert space characters into non-breaking spaces, which can be done using String.prototype.replace() with a regular expression:
$('#text').html($('textarea').val().replace(/ /g, '&nbsp;'));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea class='messageInput'>employees hello</textarea>
<p id="text"></p>
You need / /g so that it will match all spaces, not just the first one, which .replace(' '...) would do.
Note that smajl's suggestion is less invasive and probably better, but I'll leave this for posterity.
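To make the point about the g flag concrete, here is a small sketch with two hypothetical helper functions. It only builds the strings; inserting the result into the page is left to the caller.

```javascript
// String pattern: replaces only the first space.
function replaceFirstSpace(text) {
  return text.replace(' ', '&nbsp;');
}

// Regex with the g flag: replaces every space.
function replaceAllSpaces(text) {
  return text.replace(/ /g, '&nbsp;');
}

const message = 'Employees   please';
console.log(replaceFirstSpace(message)); // Employees&nbsp;  please
console.log(replaceAllSpaces(message));  // Employees&nbsp;&nbsp;&nbsp;please
```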

The spaces in the value are not ignored. It's that the browser by default "compresses" duplicate spaces. Use white-space:pre;
<input type="text" id="input">
<button id="button">
TEST
</button>
<div id="result"></div>
document.getElementById("button").onclick = function() {
var content = document.getElementById("input").value;
console.log(content);
document.getElementById("result").innerHTML = content;
}
#result {
white-space:pre;
}
EDIT: too late ;)

How can I see my code in my browser?

Hi genius programmers, how can I display my code on my website? I'm currently making a web page tutorial where you can see code, but the code I put in always gets executed instead of shown. Can anyone help me with this?
Thanks a bunch.
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
<body>
<h1>This is a Heading</h1>
<p>This is a paragraph.</p>
</body>
</html>

You'll need to convert all your < to &lt; and all your > to &gt; and wrap everything in <pre> and </pre> tags. Like so:
<pre>
&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Page Title&lt;/title&gt;
&lt;body&gt;
&lt;h1&gt;This is a Heading&lt;/h1&gt;
&lt;p&gt;This is a paragraph.&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</pre>

The old-school solution (deprecated since HTML3.2 and removed/discouraged in HTML5) that still works is to wrap everything inside an <xmp></xmp> (example) tag.
The idea is that it can hold any string except the string representing the start of the xmp-closing tag: </xmp
Besides it being deprecated since forever, there are browser inconsistencies and other nuisances (like copying text from an xmp in Firefox removing line endings, or how many spaces a tab character represents, and the like).
The correct way is to just escape the & and < characters (in this order)!
No need to escape the >: that's just a myth, since we already took out the open-tag character < (and I thoroughly checked current and historical specs regarding this)!!
The & characters must be escaped before the < characters because otherwise < becomes &#60; and then becomes &#38;#60; (displayed as &#60; instead of the intended <)!
You can then wrap the result inside any tag, like <pre>, <code>, or ...
That leaves you with new-lines:
You either replace them with <br> or set the appropriate css-styling rules to control white-spacing (if they were not already set as default like for a <pre>-tag).
Live example:
<textarea style="width:99%;height:150px;"></textarea>
<button onclick="
var txt=document.getElementsByTagName('textarea')[0];
txt.value=txt.value.replace(/\u0026/g, '\u0026#38;').replace(/\u003c/g, '\u0026#60;');
">HTML-Escape (&) and (<)</button>
Fine print:
Maximum cross-browser (read: ancient) compatibility is obtained by using decimal escapes, since hexadecimal escapes were added and implemented later. Named entity escapes (like &amp;) had similar problems (also in embedded JavaScript) because of the serious status these characters have/had in pre-parsing the HTML!
Theoretically there is a bigger rule-set regarding when & must absolutely be replaced and when it's not required (yes theoretically you don't need to escape all & characters, mainly just the ones that would render a valid escape and this is an ever-growing list since the living standard).
Thus the simplest and fastest way is to just replace them all! No need for special algorithms utilizing lookup-lists etc..
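As a rough sketch of the ordering rule described in this answer, the two hypothetical functions below differ only in which character they escape first. The function names are made up for illustration.

```javascript
// Escape & first, then <, using decimal escapes as in the live example.
function escapeForDisplay(source) {
  return source.replace(/&/g, '&#38;').replace(/</g, '&#60;');
}

// Wrong order, shown only for contrast: the & introduced by the
// first replacement gets escaped again by the second one.
function escapeWrongOrder(source) {
  return source.replace(/</g, '&#60;').replace(/&/g, '&#38;');
}

console.log(escapeForDisplay('<p>a & b</p>')); // &#60;p>a &#38; b&#60;/p>
console.log(escapeWrongOrder('<'));            // &#38;#60;
```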

You can try this.
function show_html(){
var val = document.getElementsByTagName('textarea')[0].value
val = val.replace(/</g, "&lt;").replace(/>/g, "&gt;");
val = '<pre>'+val+'</pre>';
document.getElementById('result').innerHTML = val
}
<textarea id="html" onchange="javascript:show_html()"></textarea>
<p id="result">sdf</p>

The textContent and innerText properties can be used to display HTML code as a plain string. You can use something like the following:
<textarea style="width:99%;height:150px;"></textarea>
<button onclick="
var htmlText= document.getElementsByTagName('textarea')[0].value;
var result = document.getElementById('result');
result.innerText = htmlText;"
>Convert HTML to plain text</button>
<div id="result" style="width:99%;height:150px;"></div>

Hit f12 for the developer tools or press Ctrl+U for the source code

Why do Arabic characters behave as separate characters when styling single Arabic character?

Basically, what I am trying to build is a highlighter for misused Arabic characters!
To make it easy to understand, I will explain similar functionality for English.
Imagine a string with wrong capitalization that has to be rewritten correctly. The user rewrites the string in an input box and submits it; the JS checks whether any character was left uncorrected, then displays the whole string with those letters corrected and highlighted in red;
i.e. [test ] becomes [Test ]
To do so, I check those characters, and if a faulty character is detected, it gets wrapped in a span so it is colored red.
So far so good.
Now, when I try to replicate this for Arabic, the faulty character gets separated from the word, making it unreadable.
Demo: jsfiddle
function check1() {
englishanswer.innerHTML = englishWord.value.replace(/t/, '<span style="color:red">T</span>');
}
function check2() {
arabicanswer.innerHTML =
arabicWord.value.replace(/\u0647/, '<span style="color:red">' +
unescape("%u0629") + '</span>') +
'<br>' + arabicWord.value.replace(/\u0647/, unescape('%u0629'));
}
fieldset {
border: 2px groove threedface;
border-image: initial;
width: 75%;
}
input {
padding: 5px;
margin: 5px;
font-size: 1.25em;
}
p {
padding: 5px;
font-size: 2em;
}
<fieldset>
<legend>English:</legend>
<input id='englishWord' value='test' />
<input type='submit' value='Check' onclick='check1()' />
<p id='englishanswer'></p>
</fieldset>
<fieldset style="direction:rtl">
<legend>عربي</legend>
<input id='arabicWord' value='بطله' />
<input type='submit' value='Check' onclick='check2()' />
<p id='arabicanswer'></p>
</fieldset>
Notice when testing the Arabic word, the spanned char [first preview] is separated from the rest of the word, while the non-spanned char [second preview] appears normally.
Edit: Preview for the problem [Chrome UA]

This is a longstanding bug in WebKit browsers (Chrome, Safari): HTML markup breaks joining behavior. Explicit use of ZWJ (zero-width joiner) used to help (see question Partially colored Arabic word in HTML), but it seems that the bug has become worse.
As a clumsy (but probably the only) workaround, you could use contextual forms for Arabic letters. This can be tested first using just static HTML markup and CSS, e.g.
بطﻠ<span style="color:red">ﺔ</span>
Here I am using, inside the span element, ﺔ U+FE94 ARABIC LETTER TEH MARBUTA FINAL FORM instead of the normal U+0629 ARABIC LETTER TEH MARBUTA and ﻠ U+FEE0 ARABIC LETTER LAM MEDIAL FORM instead of U+0644 ARABIC LETTER LAM.
To implement this in JavaScript, when inserting markup into a word of Arabic letters you would need to change the characters before and after the break (caused by the markup) to their initial, medial, or final presentation forms, according to their position in the word.
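A rough sketch of that idea follows. The `FORMS` table is hypothetical and covers only five letters (a full implementation needs the complete presentation-forms table), and `contextualForm` and `highlightLetter` are made-up names. Only the letters touching the markup boundary are reshaped, mirroring the static markup above.

```javascript
// Presentation forms for a handful of letters only (incomplete on purpose).
// Each entry: [isolated, final, initial, medial];
// right-joining letters have no initial or medial form.
const FORMS = {
  '\u0628': ['\uFE8F', '\uFE90', '\uFE91', '\uFE92'], // beh
  '\u0629': ['\uFE93', '\uFE94', null, null],         // teh marbuta
  '\u0637': ['\uFEC1', '\uFEC2', '\uFEC3', '\uFEC4'], // tah
  '\u0644': ['\uFEDD', '\uFEDE', '\uFEDF', '\uFEE0'], // lam
  '\u0647': ['\uFEE9', '\uFEEA', '\uFEEB', '\uFEEC'], // heh
};

// Dual-joining letters connect to the letter that follows them.
const joinsToNext = (ch) => Boolean(FORMS[ch] && FORMS[ch][2]);
// Every letter in the table connects to a preceding joiner.
const joinsToPrev = (ch) => Boolean(FORMS[ch]);

// Picks the form of chars[i] from its neighbours.
function contextualForm(chars, i) {
  const forms = FORMS[chars[i]];
  if (!forms) return chars[i];
  const prev = i > 0 && joinsToNext(chars[i - 1]);
  const next = i < chars.length - 1 &&
    joinsToPrev(chars[i + 1]) && forms[2] !== null;
  if (prev && next) return forms[3];
  if (prev) return forms[1];
  if (next) return forms[2];
  return forms[0];
}

// Replaces the letter at `index`, wraps it in a red span, and reshapes
// the letters adjacent to the markup so they still look joined.
function highlightLetter(word, index, replacement) {
  const chars = Array.from(word);
  chars[index] = replacement;
  const shaped = chars.map((ch, i) =>
    Math.abs(i - index) <= 1 ? contextualForm(chars, i) : ch);
  return shaped.slice(0, index).join('') +
    '<span style="color:red">' + shaped[index] + '</span>' +
    shaped.slice(index + 1).join('');
}
```

Calling `highlightLetter('\u0628\u0637\u0644\u0647', 3, '\u0629')` on the word from the question yields the same markup as the static example: lam in medial form, then teh marbuta in final form inside the span.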

I know that the solution I'm giving you is not very elegant, but it kind of works, so tell me what you think:
<script>
function check1(){
englishanswer.innerHTML = englishWord.value.replace(/t/,'<span style="color:red">T</span>');
}
function check2(){
arabicanswer.innerHTML =
arabicWord.value.replace(/\u0647/,'<span style="color:red">'+
unescape("%u0640%u0629")+'</span>')+
'<br>'+arabicWord.value.replace(/\u0647/,unescape('%u0629'));
}
</script>
<fieldset>
<legend>English:</legend>
<input id='englishWord' value='test'/>
<input type='submit' value='Check' onclick='check1()'/>
<p id='englishanswer'></p>
</fieldset>
<fieldset style="direction:rtl">
<legend>عربي</legend>
<input id='arabicWord' value='بطلـه'/>
<input type='submit' value='Check' onclick='check2()'/>
<p id='arabicanswer'></p>
</fieldset>

You need to account for the initial, medial, final, and isolated forms of each character. The complete list is available here
Use U+FE94 instead of U+0629:
arabicWord.value.replace(/\u0647/,'<span style="color:red">'+ unescape("%ufe94")+'</span>')+

As Jukka K. Korpela indicated, this is mostly a bug in WebKit-based browsers (Chrome, Safari, etc.).
A simple hack, other than the tatweel character or getting contextual forms for Arabic letters, would be to put the zero-width joiner (&zwj; or &#8205;) before/after the letter you want to be treated as part of a single Arabic ligature (two chars making up another one), e.g.
<p>عرب‍<span style="color: Red;">‍ي</span></p>
demo: jsfiddle
see also the webkit bug report.
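A small sketch of how that markup could be generated, assuming a hypothetical helper named `highlightWithZwj`. It places a zero-width joiner on both sides of each markup boundary that has neighbouring text; whether the browser then joins the letters depends on the browser version.

```javascript
const ZWJ = '\u200D'; // zero-width joiner

// Wraps the letter at `index` in a red span and adds joiners
// on each side of the span boundary where there is adjacent text.
function highlightWithZwj(word, index) {
  const chars = Array.from(word);
  const before = chars.slice(0, index).join('');
  const after = chars.slice(index + 1).join('');
  const joinBefore = before ? ZWJ : '';
  const joinAfter = after ? ZWJ : '';
  return before + joinBefore +
    '<span style="color: red;">' +
    joinBefore + chars[index] + joinAfter +
    '</span>' +
    joinAfter + after;
}
```

For the word in the example above, highlighting the last letter produces the same structure: a joiner before the span and another one at the start of the span.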

Instead of using span, use the HTML5 ruby element and add the Arabic tatweel character "ـ" (U+0640), i.e. the character that extends letters (Shift+J).
So your code becomes:
arabicanswer.innerHTML =
(arabicWord.value).replace(/\u0647/,'ـ<ruby style="color:red"> ـ'+
unescape("%u0629")+'</ruby>')+
'<br>'+arabicWord.value.replace(/\u0647/,unescape('%u0629'));
}
and here is an updated fiddle: http://jsfiddle.net/fjz5C/28/

I would try adding a ligature/tatweel to the character before and after. It won't actually fix the problem, but it will make it difficult to notice, since it will force the lam into medial form and the taa marbuta into final form. If it works, that would be a lot less brittle than actually converting the letters to their medial or final forms.
You seem to have other problems, though. I went to your website and put in a misspelling of hadha , just to see what it would do with it, and it caused the ha to disconnect in both words, which doesn't make sense if the only problem is the formatting tags. (I'm using Firefox on a Mac.)
Good luck!

Restrict characters entered into a text_field

I have a text field where users enter a URL string, which cannot contain spaces or non-alphanumeric characters (if that's an accurate way of putting it).
Is there a way in Rails to restrict entry into the text_field itself so that spaces and characters like /":}{#$^# can be avoided?
Thanks a lot.
To clarify, the only characters that should be possible are letters and numbers.

The problem here is that URL strings can have slashes (/) and hash marks (#). So your regex is going to be quite complex to ensure the right portion of the field is filtered properly. But for plain character filtering, you can use simple regex to remove any non alpha-numeric characters.
Not sure about anything Ruby-specific, but in plain JavaScript:
<html>
<body>
<form>
<input type="text" name="whatever" id="form-field" value="" />
</form>
</body>
<script type="text/javascript">
var oFormField = document.getElementById('form-field');
oFormField.onkeyup = function() {
oFormField.value = oFormField.value.replace(/[^a-zA-Z0-9]/g, '');
}
</script>
</html>

You may use jQuery Tools Validator, which uses HTML5 attributes to validate your forms.
This is a great way to validate your forms unobtrusively, without putting JS all over them :-).
Look at the HTML5 "pattern" attribute, which lets you validate a field against a regexp.
http://flowplayer.org/tools/validator/index.html

Validating HTML text box with javascript and regex

The below code works to only allow alphanumerics and spaces. However, I would like to also allow an accented character (Ã). How should the regex be modified?
Thanks
<html>
<head>
<script type='text/javascript' src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>
<script type="text/JavaScript">
$(function() {
$("#sub").bind("click",
function() {
$('#addText').val($('#addText').val().replace(new RegExp("[^a-zA-Z0-9 ]","g"), ''));
});
});
</script>
</head><body>
<div>Enter Text:</div>
<input id="addText" type=text/>
<input id="sub" type="button" value="Submit" />
</body></html>

If you only care about Latin-1 (Western European) letters, this should work:
[A-Za-z\xc0-\xd6\xd8-\xf6\xf8-\xff]
For other scripts (eg: Greek, Cyrillic, Thai letters, CJK characters, etc.) things get much more complicated, and it becomes safer to just forbid things like control characters, rather than trying to keep track of which characters are "letters".
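As a hedged sketch, here is that character class dropped into the question's "strip everything else" approach, next to an alternative for other scripts. The function names are hypothetical. The second function relies on Unicode property escapes (`\p{L}`, `\p{N}` with the `u` flag), which are part of ES2018 and therefore only available in reasonably modern engines.

```javascript
// Keeps ASCII letters, digits, spaces, and Latin-1 letters;
// removes everything else.
function keepLatin1Letters(text) {
  return text.replace(/[^A-Za-z0-9 \xc0-\xd6\xd8-\xf6\xf8-\xff]/g, '');
}

// Keeps letters and digits from any script, plus spaces.
function keepAnyLetters(text) {
  return text.replace(/[^\p{L}\p{N} ]/gu, '');
}

console.log(keepLatin1Letters('S\u00e3o Paulo #1!')); // São Paulo 1
console.log(keepAnyLetters('Привет, мир!'));          // Привет мир
```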

If you just want to add this one character, add it to the regex:
[^a-zA-Z0-9Ã ]
Keep in mind, though, that there might be complications, as mentioned in the other answer, so do follow the suggestion by Laurence Gonsalves.
http://unicode.org/reports/tr18/
http://www.regular-expressions.info/unicode.html
