String comparison with a collation in javascript - javascript

I use jquery.autocomplete, which uses a JavaScript regexp to highlight substrings in the list of suggestions that match the autocomplete key string. So if the user types "Beat" and one of the autocomplete suggestions the server returns is "The Beatles", the plugin displays that suggestion as "The Beatles" with "Beat" highlighted.
I'm trying to think of ways to make this work with string matching that isn't sensitive to accents, diacritics and the rest. So if the user typed "Huske" and the server suggested "Hüsker Dü", this would be displayed as "Hüsker Dü" with "Hüske" highlighted.
The principle is the same as string comparison with specified collations, such as in MySQL or ICU, or with Oracle's sorts. In SphinxSearch a charset_table works for this. A collation such as utf8_general_ci would be ideal for my purposes.
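For reference, JavaScript's built-in Intl.Collator offers this kind of collation-style comparison: with sensitivity "base", accents and case are ignored. It only compares whole strings, though, so finding the substring to highlight needs a small search loop. A sketch, assuming the "en" locale (some locales treat accented letters as separate base letters) and precomposed text; the helper name collatedIndexOf is hypothetical:

```javascript
// Compare at "base" strength: base letters must match, accents and case are ignored.
// The "en" locale is an assumption.
const baseCollator = new Intl.Collator("en", { usage: "search", sensitivity: "base" });

// Hypothetical helper: index of the first window of `text` that collates
// equal to `query`, or -1. Assumes precomposed (NFC) input, so a match
// has the same length as the query.
function collatedIndexOf(text, query) {
  for (let i = 0; i + query.length <= text.length; i++) {
    if (baseCollator.compare(text.slice(i, i + query.length), query) === 0) {
      return i;
    }
  }
  return -1;
}

console.log(baseCollator.compare("Hüsker", "husker")); // 0
console.log(collatedIndexOf("Hüsker Dü", "Huske")); // 0
```

The loop is quadratic in the worst case, which is acceptable for short autocomplete suggestions.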

The only thing I can think of is pretty brute-force. If any character in the input string is known to have one or more accented forms, replace it with a character class containing all of the forms when you create the regex. For example, for the input string Huske, the regex might be /H[uùúûü]sk[eèéêë]/.
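A minimal sketch of that brute-force approach. The ACCENT_CLASSES table is a hypothetical, incomplete hand-written map; only lowercase variants are listed and the "i" flag takes care of uppercase. The <strong> wrapping is just an example of how a plugin might highlight the match:

```javascript
// Hypothetical, incomplete map from a base letter to its accented forms.
const ACCENT_CLASSES = {
  a: "aàáâãäå",
  c: "cç",
  e: "eèéêë",
  i: "iìíîï",
  n: "nñ",
  o: "oòóôõöø",
  u: "uùúûü",
};

// Escape regex metacharacters so the rest of the input is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each known letter with a character class of all its forms.
function accentInsensitiveRegExp(input) {
  let source = "";
  for (const ch of input) {
    const variants = ACCENT_CLASSES[ch.toLowerCase()];
    source += variants ? "[" + variants + "]" : escapeRegExp(ch);
  }
  return new RegExp(source, "i");
}

const re = accentInsensitiveRegExp("Huske"); // /H[uùúûü]sk[eèéêë]/i
console.log("Hüsker Dü".replace(re, m => "<strong>" + m + "</strong>"));
// <strong>Hüske</strong>r Dü
```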

Related

How do I allow only one dash, dot, or underscore in a user form input using a regular expression in JavaScript?

I'm trying to implement username form validation in JavaScript where the username
can't start with a number
can't contain whitespace
can't contain any symbols other than at most one dot, one underscore and one dash
example of a valid username: the_user-one.123
example of an invalid username: 1----- user
I've been trying to implement this for a while, but I couldn't figure out how to allow only one of each symbol:
const usernameValidation = /(?=^[\w.-]+$)^\D/ // no g flag: with g, test() keeps lastIndex between calls
console.log(usernameValidation.test('1username')) //false
console.log(usernameValidation.test('username-One')) //true
How about using a negative lookahead at the start:
^(?!\d|.*?([_.-]).*\1)[\w.-]+$
This checks that the string
neither starts with a digit
nor contains the same [_.-] character twice (using a capture group and a backreference)
See this demo at regex101 (more explanation on the right side)
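A short usage sketch of the pattern from this answer, wrapped in a hypothetical isValidUsername helper. It deliberately omits the g flag so that test() does not carry lastIndex over between calls:

```javascript
// (?!\d|.*?([_.-]).*\1) rejects a leading digit and any repeated symbol;
// [\w.-]+$ then allows only letters, digits, "_", "." and "-".
const USERNAME_RE = /^(?!\d|.*?([_.-]).*\1)[\w.-]+$/;

function isValidUsername(name) {
  return USERNAME_RE.test(name);
}

for (const name of ["the_user-one.123", "1----- user", "username-One", "user..name"]) {
  console.log(name, isValidUsername(name));
}
```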
Preface: Due to my severe carelessness, I assumed the context was usage of the HTML pattern attribute instead of JavaScript input validation. I leave this answer here for posterity in case anyone really wants to do this with regex.
Although regex does have functionality to represent a pattern occurring consecutively a certain number of times (via {<lower-bound>,<upper-bound>}), I'm not aware of regex having "elegant" functionality to enforce a set of patterns, each occurring a bounded number of times, in any order and possibly with other patterns in between.
Some workarounds I can think of:
Make a regex that allows for one of each permutation of ordering of special characters (note: newlines added for readability):
^(?:
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*-?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*\.?)|\.)[A-Za-z0-9]*_?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*\.?[A-Za-z0-9]*_?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*-?)|-)[A-Za-z0-9]*_?[A-Za-z0-9]*\.?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*\.?[A-Za-z0-9]*-?)|
(?:(?:(?:[A-Za-z][A-Za-z0-9]*_?)|_)[A-Za-z0-9]*-?[A-Za-z0-9]*\.?)
)[A-Za-z0-9]*$
Note that the above regex can be simplified if you don't want usernames to start with special characters either.
Friendly reminder to also make sure you use the HTML attributes (minlength and maxlength) to enforce a minimum and maximum input length where appropriate.
If you feel that regex isn't well suited to your use case, you can write custom validation logic in JavaScript, which gives you much more control and can be much more readable than regex, but may take more lines of code. Seeing the regex above, I would personally seriously consider the custom JavaScript route.
Note: I find https://regex101.com/ very helpful in learning, writing, and testing regex. Make sure to set the "flavour" to "JavaScript" in your case.
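A minimal sketch of the custom-JavaScript route mentioned above, checking the question's rules one at a time instead of in one regex. The function name validateUsername is hypothetical, and "letters" here means ASCII letters only:

```javascript
// Returns true if `name` is non-empty, doesn't start with a digit, and
// contains only ASCII letters/digits plus at most one ".", one "_" and one "-".
function validateUsername(name) {
  if (name.length === 0) return false;

  const isDigit = ch => ch >= "0" && ch <= "9";
  const isLetter = ch => (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z");

  if (isDigit(name[0])) return false; // can't start with a number

  const usedSymbols = new Set();
  for (const ch of name) {
    if (isLetter(ch) || isDigit(ch)) continue;
    if (ch !== "." && ch !== "_" && ch !== "-") return false; // whitespace or another symbol
    if (usedSymbols.has(ch)) return false; // the same symbol appears twice
    usedSymbols.add(ch);
  }
  return true;
}

console.log(validateUsername("the_user-one.123")); // true
console.log(validateUsername("1----- user")); // false
```

Each rule maps to one line, which makes it easy to return a specific error message per rule later.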
I have to admit that Bobble Bubble's solution is the better fit. Here is a comparison of the different cases:
console.log("Comparison between mine and Bobble Bubble's solution:\n\nusername mine,BobbleBubble");
["valid-usrId1", "1nvalidUsrId", "An0therVal1d-One", "inva-lid.userId", "anot-her.one", "test.-case"]
  .forEach(u => console.log(u.padEnd(20, " "), chck(u)));
function chck(s) {
  return [
    !!s.match(/^[a-zA-Z][a-zA-Z0-9._-]*$/) && (s.match(/[._-]/g) || []).length < 2, // mine: at most one symbol in total
    !!s.match(/^(?!\d|.*?([_.-]).*\1)[\w.-]+$/) // Bobble Bubble: at most one of each symbol
  ].join(",");
}
The differences can be seen in the last three test cases.

JS: Check if word "handover" contains "hand"

I'm working on this simple, straightforward text content filtering mechanism on our post commenting module where people are prohibited from writing foul, expletive words.
So far I'm able to compare comment contents (word by word, using .includes()) against the blacklisted words we have in the database. But to save space, time and effort in entering database entries for each word such as 'Fucking' and 'Fuck', I want to create a mechanism where we check if a word contains a blacklisted word.
This way, we just enter 'Fuck' in the database. And when a visitor's comment contains 'Fucking' or 'Motherfucker', the function will automatically detect that a word in the comment contains 'fuck' and then perform the necessary actions.
I've been thinking of integrating .substring() but I guess that's not what I need.
Btw, I'm using React (in case you know of any built-in functions). As much as possible, I want to avoid using libraries for this mechanism.
Thanks a heap!
"handover".indexOf("hand")
It returns the index of the first occurrence if it exists, otherwise -1.
To ignore case, you can define all your blacklisted words in lower case and then use this:
"HANDOVER".toLowerCase().indexOf("hand")
To detect whether a string contains another string, you can simply use the .includes method. It does not work on a word-by-word basis but checks for a sequence of characters, so it should meet your requirements. It returns a boolean indicating whether one string is found inside the other.
var sentence = 'Stackoverflow';
console.log(sentence.includes("flow"));
You were on the right track with .includes()
console.log('handover'.includes('hand'));
Returns true
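Putting the answers together for the blacklist use case: lowercase the comment once, then check every blacklisted entry with .includes(). The blacklist below uses harmless placeholder words and is hypothetical, as is the findBlacklisted name:

```javascript
// Hypothetical blacklist, stored in lower case as suggested above.
const blacklist = ["darn", "heck"];

// Returns the blacklisted entries that occur anywhere in the comment,
// including inside longer words (substring match, not whole-word match).
function findBlacklisted(comment, words) {
  const lower = comment.toLowerCase();
  return words.filter(word => lower.includes(word));
}

console.log(findBlacklisted("What the HECK, darnit!", blacklist)); // [ 'darn', 'heck' ]
```

Because this is a substring match, it will also flag innocent words that happen to contain a blacklisted string, so some manual review or an allowlist may still be needed.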

Get a string between two strings in Javascript

I have the below string that I need help pulling an ID from in Presto. Presto uses the javascript regex. I've searched multiple options including:
JavaScript text between double quotes
Javascript regex to extract all characters between quotation marks following a specific word
I need to pull the GA Client ID which looks like this:
75714ae471df63202106404675dasd800097erer1849995367
Below is a snippet showing where it sits in the string.
The struggle is that the "s:38:" is not constant. The number can be anything. For example, it could be s:40: or s:1000: etc. I need it to return just the alphanumeric id.
String Snippet
"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";
Full string listed below
99524";s:9:"FirstName";s:2:"John";s:8:"LastName";s:8:"Doe";s:7:"Company";s:10:"Sample";s:5:"Email";s:20:"xxxxx#gmail.com";s:5:"Phone";s:10:"8888888888";s:7:"Country";s:13:"United States";s:5:"Title";s:8:"Creative";s:5:"State";s:2:"NC";s:13:"Last_Asset__c";s:40:"White Paper: Be a More Strategic Partner";s:16:"Last_Campaign__c";s:18:"70160000000q6TgAAI";s:16:"Referring_URL__c";s:8:"[direct]";s:19:"leadPriorityMarketo";s:2:"P2";s:18:"ProductInterest__c";s:9:"sample";s:14:"landingpageurl";s:359:"https://www.sample.com;mkt_tok=samplesamplesamplesample";s:14:"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";s:13:"Drupal_SID__c";s:36:"e1380c07-0258-47de-aaf8-82d4d8061e1a";s:4:"form";s:4:"1046";}
This works for your sample
"GA_ClientID__c";[^"]*"([^"]*)"
https://regex101.com/r/Q4Orj6/1
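The same pattern used from JavaScript: [^"]* skips the variable-length s:NN: prefix, and the capture group holds the ID. In Presto the pattern would typically go into regexp_extract with group 1. The helper name extractGaClientId is hypothetical:

```javascript
// Match the key, skip the s:NN: length prefix ([^"]* stops at the next quote),
// then capture everything up to the closing quote.
const GA_CLIENT_ID_RE = /"GA_ClientID__c";[^"]*"([^"]*)"/;

function extractGaClientId(serialized) {
  const match = serialized.match(GA_CLIENT_ID_RE);
  return match ? match[1] : null; // null when the key is absent
}

const snippet = '"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";';
console.log(extractGaClientId(snippet)); // 75714ae471df63202106404675dasd800097erer1849995367
```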

what kind of encoding is this?

I've got some data from DBpedia using Jena, and since Jena's output is XML-based, there are some cases where XML characters need to be treated differently, like the following:
Guns n &#039; Roses
I just want to know what kind of encoding this is.
I want to decode/encode my input with the same encoding in JavaScript and send it back to a servlet.
(Edit: if you remove the space between & and amp, you will get the correct character. I couldn't find another way to show it on Stack Overflow, so I wrote it like that.)
Seems to be XML entity encoding, and a numeric character reference (decimal).
A numeric character reference refers to a character by its Universal Character Set/Unicode code point, and uses the format &#nnnn; or &#xhhhh;
You can get some info here: List of XML and HTML character entity references on Wikipedia.
Your character is number 39, being the apostrophe: ', which can also be referenced with a character entity reference: &apos;.
To decode this using Javascript, you could use for example php.js, which has an html_entity_decode() function (note that it depends on get_html_translation_table()).
UPDATE, in reply to your edit: it is basically the same; the only difference is that it was encoded twice (possibly by mistake). &amp; is the ampersand: &.
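As a library-free alternative to php.js, a small decoder/encoder can be written by hand. This sketch only handles decimal and hexadecimal numeric references plus the five predefined XML entities (&amp;, &lt;, &gt;, &quot;, &apos;), not HTML-only names such as &nbsp;; the function names are hypothetical. Since the data in the edit is encoded twice, one decoding pass only turns &amp; back into &, so it takes a second pass:

```javascript
const XML_NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Decode &#39;-style (decimal), &#x27;-style (hex) and the five named XML entities.
function decodeXmlEntities(text) {
  return text.replace(/&(#(\d+)|#x([0-9a-fA-F]+)|(amp|lt|gt|quot|apos));/g,
    (whole, ref, dec, hex, name) => {
      if (dec !== undefined) return String.fromCodePoint(parseInt(dec, 10));
      if (hex !== undefined) return String.fromCodePoint(parseInt(hex, 16));
      return XML_NAMED[name];
    });
}

// Escape the characters that are special in XML text and attribute values.
// "&" is replaced first so the other replacements are not escaped again.
function encodeXmlEntities(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(decodeXmlEntities("Guns n &#039; Roses")); // Guns n ' Roses
console.log(decodeXmlEntities(decodeXmlEntities("Guns n &amp;#039; Roses"))); // Guns n ' Roses
```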
This is an SGML/HTML/XML numeric character entity reference.
In this case, it encodes an apostrophe (').

Javascript - parse formatted text and extract values in order?

I have a field with wiki style rendering on it that I'd like to bust up in Javascript.
The text I'm trying to parse looks like this:
{color:#47B}_name1_{color}
{color:#555}description1{color}
---
{color:#47B}_name2_{color}
{color:#555}description2{color}
---
{color:#47B}_name3_{color}
{color:#555}description3{color}
---
etc
Where name1 and description1 belong together, name2 and description2 belong together, and so forth. The values for name and description are user supplied values, with description potentially spanning multiple lines.
My end goal is to be able to extract the values of each name and each description from the text (and be able to reliably associated name1 with description1, etc).
My question is: if I used a regex to match all the names into an array and all the descriptions into an array, can I be sure that the items in the arrays are in the correct order? That is, will names[0] always be the first name in the parsed text (assuming I did a JavaScript regex match into the names array)? Also, is this bad practice, or should I do this another way?
The regular expression I'm trying to use to match names is:
/^(\{color\:#47B\})(_)(\s*?)(.*?)(\s*?)(_)(\{color\})$/
And the regular expression I'm using to match descriptions is:
/(\{color\:#555\})(.*?)(\{color\})/
A regex search will always return matches in source order (i.e. in the order in which they occur in the source text.)
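A quick illustration of that ordering guarantee, using matchAll with the g flag on a shortened version of the sample text (the name pattern is a simplified form of the question's):

```javascript
const text = [
  "{color:#47B}_name1_{color}",
  "{color:#555}description1{color}",
  "---",
  "{color:#47B}_name2_{color}",
  "{color:#555}description2{color}",
].join("\n");

// matchAll yields matches left to right, i.e. in source order.
const names = [...text.matchAll(/\{color:#47B\}_(.*?)_\{color\}/g)].map(m => m[1]);
console.log(names); // [ 'name1', 'name2' ]
```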
I assume you are asking this question because you're hoping to do two regex matches (one for name, one for description) and then get two result arrays, and guarantee that namesmatch[i] always goes with descriptionmatch[i]. However, this will only be true if your source text is always exactly perfect.
In this case it may be better or safer either to use a single regex that matches both at once, or to split your source up by those --- delimiters and then match within each block. The reason it may be safer is that your source text may contain errors, and this way you can at least detect them and keep as much good data as possible.
A note about your regexes. The . does not match newlines, so if the text between your {color} braces might have a newline you need to include newlines explicitly. [\s\S] and [^] are common idioms for this. Alternatively, if all . in a regex should match newlines, set the dotAll flag (s).
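A sketch of the "split first, then match within each block" approach, assuming blocks are separated by a line containing only --- and each block holds one name and one description. It uses [\s\S] so descriptions may span several lines; the parseEntries name and its return shape are hypothetical:

```javascript
// Split on "---" lines (m flag: ^ and $ match at line boundaries),
// then extract one name and one description per block.
function parseEntries(text) {
  const entries = [];
  const errors = [];
  text.split(/^---$/m).forEach((block, index) => {
    const name = block.match(/\{color:#47B\}_\s*([\s\S]*?)\s*_\{color\}/);
    // [\s\S]*? lets the description span several lines.
    const description = block.match(/\{color:#555\}([\s\S]*?)\{color\}/);
    if (name && description) {
      entries.push({ name: name[1], description: description[1] });
    } else if (block.trim() !== "") {
      errors.push(index); // malformed block: report it instead of misaligning pairs
    }
  });
  return { entries, errors };
}

const sample =
  "{color:#47B}_name1_{color}\n{color:#555}description1{color}\n---\n" +
  "{color:#47B}_name2_{color}\n{color:#555}line one\nline two{color}\n---\n" +
  "{color:#47B}_name3_{color}\n---\n";
console.log(parseEntries(sample));
```

Here the third block has no description, so it is reported in errors instead of silently pairing name3 with the wrong text.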
