Mixing European numbers and units with Persian text - JavaScript

I am working on a localized version of our app's frontend for the Iranian market, and we are stuck on mixing European numerals and units with Persian text. For example, the desired format for one string is:
Result: <value>s (e.g. Result: 50s)
But when I compose this string in JavaScript, the number 50 ends up before the text (that is, after it in Persian reading order), like this:
50 :persiantext s
Is there a way to mix these together, or does it make no sense to mix them at all, so that everything should be in Persian?
Thank you for all help/suggestions.

Use placeholders in your text and then replace them with the number. If you have
Result: {$d}s
for english, and
{$d}s :نتیجه
for persian, then you can replace {$d} with 50 and get the correct text in both cases.
A few libraries, such as Underscore or Lodash, can help with replacing variables (though they use a slightly different placeholder syntax); see their template function.
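If you would rather not pull in a library, the placeholder idea can be sketched in a few lines. The messages table and the fillTemplate helper below are hypothetical names made up for illustration; only the {$d} placeholder syntax comes from the answer above.

```javascript
// Hypothetical message table; each language stores its own word order.
const messages = {
  en: "Result: {$d}s",
  fa: "{$d}s :نتیجه",
};

// Replace every {$name} placeholder with the matching value.
// Unknown placeholders are left untouched so they are easy to spot.
function fillTemplate(template, values) {
  return template.replace(/\{\$(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : match
  );
}

console.log(fillTemplate(messages.en, { d: 50 })); // "Result: 50s"
console.log(fillTemplate(messages.fa, { d: 50 }));
```

Because the translator decides where the placeholder goes, the code never has to know which language it is formatting.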

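When concatenating mixed RTL/LTR text, you should wrap the embedded run in the corresponding Unicode directional formatting characters (below, U+202A LEFT-TO-RIGHT EMBEDDING and U+202C POP DIRECTIONAL FORMATTING), as shown in this Java question - String concatenation containing Arabic and Western characters.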
myArabicString + "\u202A" + englishNumberAndText + "\u202C" + moreArabic
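In JavaScript the same idea could look like the sketch below. The embedLtr helper and the sample label are assumptions for illustration; the two code points are the ones used in the line above.

```javascript
const LRE = "\u202A"; // LEFT-TO-RIGHT EMBEDDING
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING

// Wrap a left-to-right run (digits, Latin units) so it keeps its internal
// order when placed inside right-to-left text. Hypothetical helper name.
function embedLtr(text) {
  return LRE + text + PDF;
}

const label = "نتیجه: "; // sample Persian label, assumed for the example
const message = label + embedLtr("50s");

console.log(message);
```

The control characters are invisible, so the string is two characters longer than it looks; keep that in mind if you measure or truncate it later.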
Alternatively, most RTL languages have their own native digits, which sit more naturally in RTL text. To use that approach you will need to write your own code to replace each individual digit, similar to the code in "convert numbers from arabic to english".
Mixing in Latin punctuation such as ":" and "!" needs to be done carefully: you may need to wrap it in directional formatting characters as well. Make sure to review the results with people who actually know how the text should look.
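A minimal sketch of the digit replacement, assuming you want the Persian digits from the Extended Arabic-Indic block (U+06F0 to U+06F9). The function name is made up for this example.

```javascript
// Map each ASCII digit 0-9 to the Persian digit with the same value.
// U+06F0 is the Persian zero; the other nine digits follow it in order.
function toPersianDigits(text) {
  return String(text).replace(/[0-9]/g, (digit) =>
    String.fromCharCode(0x06f0 + Number(digit))
  );
}

console.log(toPersianDigits(50)); // "۵۰"
```

Anything that is not an ASCII digit passes through unchanged, so units and punctuation still need their own handling.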
Side note: you may want to check out JavaScript equivalent to printf/string.format if you need a lot of formatting.

Related

Matching all expressions using JS regex

I need to match all expressions like "Laugh out Loud (LoL)" that consist of two, three, or more words. My regex only works for expressions whose acronym is exactly 3 characters long. How do I make the regex generic (without hard-coding the length as 3) so that expressions of any length are selected?
The link shared provides an overview of it.
The last expressions,
light amplification by stimulated emission of radiation (LASER)
Green Skill Development Programme (GSDP)
are not selected by the regexes below:
\b(\w)[\w']*[^a-zA-Z()]* (\w)[\w']*[^a-zA-Z()]* (\w)[\w']*[^a-zA-Z()]* \(\1\2\3\)
\b(?:\w[\w']* [^a-zA-Z]*){3} ?\([A-Z]{3}\)
https://regex101.com/r/QPMo5M/1
You can try the following:
/\b(\w)[-'\w]* (?:[-'\w]* ){1,}\(\1[A-Z]{1,}\)/gi
UPDATE
As @ikegami commented, this sloppy regex also matches things like Bring some drinks (beer) and Bring something to put on the grill (BBQ). I think these cases can be filtered out with ordinary JavaScript code after the regex match. Bring some drinks (beer) can be detected because (beer) has no uppercase letters. Bring something to put on the grill (BBQ) can be detected because no later word in the phrase supplies an initial for the second B or the Q.
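One possible way to write that post-filter is sketched below. The function name and the exact rules are my assumptions: it rejects a candidate when the parenthesised part has no uppercase letter, or when its letters cannot be found, in order, among the initials of the preceding words. It only targets the two false positives mentioned above.

```javascript
// Hypothetical post-filter for strings shaped like "some words (ACRONYM)".
function isPlausibleAcronym(candidate) {
  const parts = /^(.*?)\s*\(([^()]+)\)$/.exec(candidate.trim());
  if (!parts) return false;
  const [, phrase, acronym] = parts;

  // Rule 1: "(beer)" has no uppercase letter, so it is not an acronym.
  if (!/[A-Z]/.test(acronym)) return false;

  // Rule 2: every acronym letter must match a word initial, in order.
  // Words such as "by" or "of" may be skipped.
  const initials = phrase
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => word[0].toLowerCase());
  let i = 0;
  for (const letter of acronym.toLowerCase()) {
    while (i < initials.length && initials[i] !== letter) i++;
    if (i === initials.length) return false;
    i++;
  }
  return true;
}

console.log(isPlausibleAcronym("Bring some drinks (beer)")); // false
console.log(isPlausibleAcronym("Bring something to put on the grill (BBQ)")); // false
console.log(isPlausibleAcronym("Green Skill Development Programme (GSDP)")); // true
```

You would run it over the array returned by the regex match and keep only the candidates for which it returns true.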
UPDATE 2
When we match the following string by using the regex above:
We need to use technologies from Natural Language Processing (NLP).
It matches "need to use technologies from Natural Language Processing (NLP)", not "Natural Language Processing (NLP)". This problem should be tackled as well.
UPDATE 3
The following regex matches acronyms that are 2 to 5 letters long and does not have the issues mentioned above. I think it can easily be extended to support longer acronyms if needed:
/\b(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* )?)?)?) *\(\1\2\3\4\5\)/gi
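For reference, here is how that regex might be applied to a piece of text. The findAcronyms wrapper is a hypothetical helper; the pattern itself is copied unchanged from above.

```javascript
const acronymPattern =
  /\b(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* (?:(?:by |of )?(\w)\S* )?)?)?) *\(\1\2\3\4\5\)/gi;

// Return every "expanded phrase (ACRONYM)" found in the text.
// String.prototype.match with a global regex returns null when nothing
// matches, so fall back to an empty array.
function findAcronyms(text) {
  return text.match(acronymPattern) || [];
}

console.log(
  findAcronyms("We need to use technologies from Natural Language Processing (NLP).")
);
// [ "Natural Language Processing (NLP)" ]
```

Backreferences to optional groups that did not take part in the match are treated as empty in JavaScript, which is what lets one pattern cover 2 to 5 letters.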
\b(\w)[-'\w]* (?:[-`."?,~=#!/\\|+:;%°*#£&^€$¢¥§'\w]* ){2,}\(\1[A-Z]{2,}\)
I added some special characters to the character class so that words containing them are still matched.

RegEx that works in Javascript won't do so in PHP

I will try to keep my question short yet understandable. I have a simple regex that I use in JavaScript to check for characters that aren't alphanumeric (i.e. symbols): /[$-/:-?{-~!"^_`\[\]]/
In javascript, doing
if(/[$-/:-?{-~!"^_`\[\]]/.test( string ))
just works: if any of those characters is in the string, it gives true; otherwise it gives false. I tried to do the same in PHP, the following way:
if(preg_match('/[$-/:-?{-~!"^_`\[\]]/', $string ))
Other regexes work when done this way, but this particular one always gives false when run in PHP.
Is there any reason for this? Am I doing something wrong? Does PHP interpret regexes differently? What should I change to make it work?
Thanks for your time.
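Since PHP uses PCRE with delimited pattern strings, using / as the delimiter without escaping the / inside the pattern gives a pattern error, as seen here http://regex101.com/r/3ILGgE/1
So the delimiter character must be escaped wherever it appears inside the pattern.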
Using / as the delimiter, the string is
'/[$-\/:-?{-~!"^_`\[\]]/'
Using ~ as the delimiter, the string is
'~[$-/:-?{-\~!"^_`\[\]]~'
Also, be aware that you have a few ranges in the class: $-/ and :-? and {-~
Each range includes the characters between its two endpoints as well as the endpoints themselves.
The - inside each range is an operator, not a literal character (although a literal - is still matched here, because it happens to fall inside the $-/ range).
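To see exactly what those ranges pull in, you can enumerate the printable ASCII characters the class accepts. The sketch below does this in JavaScript, where the character class behaves the same way; the matchedAscii helper is a made-up name.

```javascript
// Same class as in the question, with the / escaped for the regex literal.
const symbolClass = /[$-\/:-?{-~!"^_`\[\]]/;

// Collect every printable ASCII character (0x20 to 0x7E) the pattern accepts.
function matchedAscii(pattern) {
  let accepted = "";
  for (let code = 0x20; code <= 0x7e; code++) {
    const ch = String.fromCharCode(code);
    if (pattern.test(ch)) accepted += ch;
  }
  return accepted;
}

console.log(matchedAscii(symbolClass));
// !"$%&'()*+,-./:;<=>?[]^_`{|}~
```

Note what is missing: #, @ and the backslash are not covered by any range or literal in the class, so strings containing only those symbols will not match.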

JS RegEx challenge: replacing recurring patterns of commas and numbers

I am fiddling with a program to convert Japanese addresses into romaji (latin alphabet) for use in an emergency broadcast system for foreigners living in a Japanese city.
Emergency evacuation warnings are sent out to lists of areas all at once. I would like to be able to copy/paste this Japanese list of areas and spit out the romanized equivalent.
example Japanese input:
3条4~12丁目、15~18条12丁目、2、3条5丁目
(this list is of three areas, where 条(jo) and 丁目(chome) indicate block numbers in north-south and east-west directions, respectively)
The numbers are fine as they are, and I have already written code to replace the characters 条 and 丁目 with their romanized equivalents. My program currently outputs the first two areas (correctly) as "3-jo 4~12-chome" and "15~18-jo 12-chome"
However, I would like to replace patterns like the one in the last area, "2、3条5丁目" (meaning 2-jo and 3-jo of 5-chome), such that the output reads "2&3-jo 5-chome"
The regular expression that denotes this pattern is \d*、\d* (note the Japanese format comma)
I am still getting used to regex - how can I replace the comma found in all \d*、\d* patterns with an "&"? Note that I can't simply replace all commas because they are also used to separate areas.
The easiest way is to isolate sequences like 15、18 and replace all commas in them.
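const text = "3条4~12丁目、15~18条12丁目、2、3条5丁目";
const romanized = text
  // Join numbers listed within one area ("2、3") with "&" before touching the other commas
  .replace(/(?:\d+、)+\d+/g, (match) => match.replace(/、/g, "&"))
  .replace(/条/g, "-jō ")
  .replace(/丁目/g, "-chōme")
  .replace(/~/g, "-")
  // The remaining commas separate areas
  .replace(/、/g, ", ");
// => "3-jō 4-12-chōme, 15-18-jō 12-chōme, 2&3-jō 5-chōme"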
(Also... Where the heck do you live that has 丁 well-ordered by cardinal directions? Where I live, addresses are a mess... :P )
(Also also, thanks to sainaen for nitpicking my regexps into perfection :) )

Remove space between percentage symbol and number in Kendo

In Kendo I use kendo.toString(value, "p0") to format a string to include a percentage symbol.
kendo.toString(12, "p0") renders as 12 %. Is there a way to avoid the space between the number and the percent sign? I would like to render it as 12% instead.
I can of course take care of it manually, but I was wondering if there is a built in way to prevent manual formatting here.
You can use something like this.
kendo.format("{0:######.#####%}", 22.33)
More info about the format method can be found here.
The Kendo formats are stored as definitions in the "cultures" object. The default culture is "en-US" (US English), and you can replace the percentage format used throughout by doing this at document ready time:
kendo.cultures["en-US"].numberFormat.percent.pattern = ["-n%", "n%"];
I was puzzled about that odd space too, it looks particularly disconcerting in chart axis labels.
You can use a built-in JavaScript regular expression.
var yourstring = "12 %";
yourstring = yourstring.replace(/\s+/g, ''); // strings are immutable, so keep the returned value
\s+ matches whitespace, including several whitespace characters in a row
g replaces every occurrence in the string, not just the first one
'' is the replacement for each match; in this case it is nothing
kendo.toString(kendo.format('{0:P1}', percentage)).replace(' ','')
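If you go the manual route, a slightly narrower replacement may be safer than stripping every space. The sketch below only removes whitespace that sits directly before a percent sign; it works on the already formatted string and does not call Kendo at all. The helper name is hypothetical.

```javascript
// Remove whitespace (including a non-breaking space) directly before "%".
// Other spaces, such as group separators in some cultures, are kept.
function tightenPercent(formatted) {
  return formatted.replace(/\s+%/g, "%");
}

console.log(tightenPercent("12 %")); // "12%"
```

You would pass it whatever kendo.toString returns, for example the "12 %" from the question.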

JavaScript automatically converts some special characters

I need to extract a HTML-Substring with JS which is position dependent. I store special characters HTML-encoded.
For example:
HTML
<div id="test"><p>l&ouml;sen &amp; gr&uuml;&szlig;en</p></div>
Text
lösen & grüßen
My problem lies in the JS part, for example when I try to extract the fragment
l&ouml;, which has the HTML-dependent starting position of 3 and the end position of 9 inside the <div> block. JS seems to convert some special characters internally, so that the range from 3 to 9 is wrongly interpreted as "lösen " and not "l&ouml;". Other special characters like &amp; are not affected by this.
So my question is whether someone knows why JS behaves this way. Entities like &auml; or &ouml; are converted, while entities like &amp; stay as they are. Is there any way to avoid this conversion?
I've set up a fiddle to demonstrate this: JSFiddle
Thanks for any help!
EDIT:
Maybe my explanation was a bit confusing, sorry for that. What I want is the HTML:
<p>l&ouml;sen &amp; gr&uuml;&szlig;en</p> .
Every special character should stay unconverted, except the HTML tags, like in the HTML above.
But JS automatically converts &ouml; or &uuml; into ö or ü, which is what I need to avoid.
That's because the browser (and not JavaScript) turns entities that don't need to be escaped in HTML into their respective Unicode characters (e.g. it skips &amp;, &lt; and &gt;).
So by the time you inspect .innerHTML, it no longer contains exactly what was in the original page source; you could reverse this process, but it involves the full map of character <-> entity pairs which is just not practical.
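If stable, ASCII-only positions matter more than getting the original named entities back, one workaround is to re-encode every non-ASCII character as a numeric character reference, which needs no entity map. This is only a sketch under that assumption: it produces &#246; rather than &ouml;, so offsets will still differ from the original page source. The function name is hypothetical.

```javascript
// Replace each non-ASCII character with a decimal numeric character
// reference. The "u" flag makes the regex work on whole code points,
// so characters outside the Basic Multilingual Plane stay in one piece.
function encodeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/gu, (ch) => `&#${ch.codePointAt(0)};`);
}

console.log(encodeNonAscii("l\u00F6sen")); // "l&#246;sen"
```

In the browser you would apply it to the string read from .innerHTML.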
If I understand you correctly, then try using innerHTML, or .html('your html code') with jQuery, on the target element.
