I want to be clear that I'm not looking for solutions; I'm really trying to understand what is being done. With that said, all pointers and recommendations are welcome. I am working through the freecodecamp.com task "Check for Palindromes". Below is the description.
Return true if the given string is a palindrome. Otherwise, return
false.
A palindrome is a word or sentence that's spelled the same way both
forward and backward, ignoring punctuation, case, and spacing.
Note You'll need to remove all non-alphanumeric characters
(punctuation, spaces and symbols) and turn everything lower case in
order to check for palindromes.
We'll pass strings with varying formats, such as "racecar", "RaceCar",
and "race CAR" among others.
We'll also pass strings with special symbols, such as "2A3*3a2", "2A3
3a2", and "2_A3*3#A2".
This is what I have for code right now. Again, I'm working through this and using Chrome dev tools to figure out what works and what doesn't.
function palindrome(str) {
// Good luck!
str = str.toLowerCase();
//str = str.replace(/\D\S/i);
str = str.replace(/\D\s/g, "");
for (var i = str.length -1; i >= 0; i--)
str += str[i];
}
palindrome("eye");
What I do not understand is why, when the code below is run in dev tools on "race car", the "e" is missing.
str = str.replace(/\D\s/g, "");
"raccar"
So my question is: what part of the regex am I misunderstanding? From my understanding, the regex should only be getting rid of spaces and integers.
/\D\s/g replaces any non-digit character that is followed by a whitespace character with "".
So, in "race car", the regex matches "e " and replaces it with "", making the string "raccar".
For a digit, you need to use \d (lowercase); \D means "not a digit". I think using an OR would get you what you want, so you may try something like /\d|\s/g to match a digit or a whitespace character.
Hope this helps in some way in your understanding!
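To see the difference in the console, here is a small sketch. The variable names are only illustrative, and the last pattern, /[^a-z0-9]/g, is one assumed way of meeting the challenge's "remove all non-alphanumeric characters" note, not the only one:

```javascript
var input = "race car";

// \D\s matches a non-digit FOLLOWED BY a whitespace character: here "e ".
var pairRemoved = input.replace(/\D\s/g, "");

// \d|\s matches a single digit OR a single whitespace character.
var digitOrSpaceRemoved = input.replace(/\d|\s/g, "");

// A negated character class drops everything that is not a lowercase
// letter or a digit, which also covers punctuation and underscores.
var cleaned = "2_A3*3#A2".toLowerCase().replace(/[^a-z0-9]/g, "");

console.log(pairRemoved);         // "raccar"
console.log(digitOrSpaceRemoved); // "racecar"
console.log(cleaned);             // "2a33a2"
```

Note that /\d|\s/g still leaves punctuation in place, which is why the challenge's symbol-laden inputs need the broader character class.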
I was pointed to this post, which does not seem to cover the criteria I have:
Replace a Regex capture group with uppercase in Javascript
I am trying to make a regex that will:
Format a string by uppercasing the first letter of each word and lowercasing the rest of the characters
Ignore HTML markup
Accept Swedish characters (åäöÅÄÖ)
Say I've got this string:
<b>app</b>le store östersund
Then I want it to be (changes marked by uppercase characters)
<b>App</b>le Store Östersund
I've been playing around with it and the closest I've got is the following:
(?!([^<])*?>)[åäöÅÄÖ]|\s\b\w
Resulted in
<b>app</b>le Store Östersund
Or this
/(?!([^<])*?>)[åäöÅÄÖ]|\S\b\w/g
Resulted in
<B>App</B>Le store Östersund
Here's a fiddle:
http://refiddle.com/refiddles/598aabef75622d4a531b0000
Any help or advice is much appreciated.
It is not possible to do this with regexp alone, since regexp doesn't understand HTML structure. [*] Instead, we need to process each text node, and carry through our logic for what is the beginning of the word in case a word continues across different text nodes. A character is at start of the word if it is preceded by a whitespace, or if it is at the start of the string and it is either the first text node, or the previous text node ended in whitespace.
function htmlToTitlecase(html, letters) {
let div = document.createElement('div');
let re = new RegExp("(^|\\s)([" + letters + "])", "gi");
div.innerHTML = html;
let treeWalker = document.createTreeWalker(div, NodeFilter.SHOW_TEXT);
let startOfWord = true;
while (treeWalker.nextNode()) {
let node = treeWalker.currentNode;
node.data = node.data.replace(re, function(match, space, letter) {
if (space || startOfWord) {
return space + letter.toUpperCase();
} else {
return match;
}
});
startOfWord = node.data.match(/\s$/);
}
return div.innerHTML;
}
console.log(htmlToTitlecase("<b>app</b>le store östersund", "a-zåäö"));
// <b>App</b>le Store Östersund
[*] Maybe possible, but even if so, it would be horribly ugly, since it would need to cover an awful amount of corner cases. Also might need a stronger RegExp engine than JavaScript's, like Ruby's or Perl's.
EDIT:
Even if just specifying really simple html tags? The only ones I am actually in need of covering is <b> and </b> at the moment.
This was not specified in the question. The solution is general enough to work for any markup (including simple tags). But...
function simpleHtmlToTitlecaseSwedish(html) {
return html.replace(/(^|\s)(<\/?b>|)([a-zåäö])/gi, function(match, space, tag, letter) {
return space + tag + letter.toUpperCase();
});
}
console.log(simpleHtmlToTitlecaseSwedish("<b>app</b>le store östersund"));
// <b>App</b>le Store Östersund
I have a solution which uses almost only regex. It may not be the most intuitive way to do it, but it should be effective and I find it fun :)
You have to append to the end of your string every lowercase character followed by its uppercase counterpart, like this (the list must also be preceded by a space for my regex):
aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZåÅäÄöÖ
(I don't know which letters are missing; I know nothing about the Swedish alphabet, sorry... I'm counting on you to correct that!)
Then you can use the following regex :
(?![^<]*>)(\s<[^/]*?>|\s|^)([\wåäö])(?=.*\2(.)\S*$)|[\wåÅäÄöÖ]+$
Replace by :
$1$3
Here is working JavaScript code:
// Initialization
var regex = /(?![^<]*>)(\s<[^/]*?>|\s|^)([\wåäö])(?=.*\2(.)\S*$)|[\wåÅäÄöÖ]+$/g;
var string = "test <b when=\"2>1\">ap<i>p</i></b>le store östersund";
// Processing
var result = string + " aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZåÅäÄöÖ";
result = result.replace(regex, "$1$3");
// Display result
console.log(result);
Edit: I forgot to handle the first word of the string; it's corrected now :)
The use case is I want to compare a query string of characters to an array of words, and return the matches. A match is when a word contains all the characters in the query string, order doesn't matter, repeated characters are okay. Regex seems like it may be too powerful (a sledgehammer where only a hammer is needed). I've written a solution that compares the characters by looping through them and using indexOf, but it seems consistently slower. (http://jsperf.com/indexof-vs-regex-inside-a-loop/10) Is Regex the fastest option for this type of operation? Are there ways to make my alternate solution faster?
var query = "word",
words = ['word', 'wwoorrddss', 'words', 'argument', 'sass', 'sword', 'carp', 'drowns'],
reStoredMatches = [],
indexOfMatches = [];
function match(word, query) {
var len = word.length,
charMatches = [],
charMatch,
char;
while (len--) {
char = word[len];
charMatch = query.indexOf(char);
if (charMatch !== -1) {
charMatches.push(char);
}
}
return charMatches.length === query.length;
}
function linearIndexOf(words, query) {
var wordsLen = words.length,
wordMatch,
word;
while (wordsLen--) {
word = words[wordsLen];
wordMatch = match(word, query);
if (wordMatch) {
indexOfMatches.push(word);
}
}
}
function linearRegexStored(words, query) {
var wordsLen = words.length,
re = new RegExp('[' + query + ']', 'g'),
match,
word;
while (wordsLen--) {
word = words[wordsLen];
match = word.match(re);
if (match !== null) {
if (match.length >= query.length) {
reStoredMatches.push(word);
}
}
}
}
Note that your regex is wrong, that's most certainly why it goes so fast.
Right now, if your query is "word" (as in your example), the regex is going to be:
/[word]/g
This means: look for any one of the characters 'w', 'o', 'r', or 'd'. With the 'g' flag, match() returns an array of every such character found in the word, so checking match.length >= query.length only counts how many characters of the word belong to the set; it never verifies that each character of the query is present. A word such as "wwww" would pass. That is definitely a lot less work than a test that really checks every character of the query.
Also, you mention the idea/concept of any number of characters, I suppose as shown here:
'word', 'wwoorrddss'
The indexOf() version will definitely not handle that properly if you really mean "any number" of each and every character, because you would have to match an infinite number of cases. As a regex that would be something like:
/w+o+r+d+s+/g
You will certainly have a hard time writing that correctly in plain JavaScript without a regex. Either way, it is going to be somewhat slow.
From the comment below, all the letters of the word are required. To check that with a single regex, you need 3! (3 factorial) alternatives for a 3-letter word:
/(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/
Obviously, a factorial is going to very quickly grow your number of possibilities and blow away your memory in a super long regex (although you can simplify if a word has the same letter multiple times, you do not have to test that letter more than once).
1! = 1
2! = 2
3! = 6
4! = 24
5! = 120
6! = 720
...
That's probably why your properly written test in plain JavaScript is much slower.
Also, in your case you should store the words nearly as Scrabble dictionaries do: each letter once, in alphabetical order (Scrabble keeps duplicates). So the word "word" would be "dorw", and, as you showed in your example, the word "wwoorrddss" would be "dorsw". You can have a backend tool generate your table of words (so you still write them as "word" and "words", and the tool converts them to "dorw" and "dorsw"). Then you sort the letters of the query in alphabetical order as well, and the result is that you do not need a silly factorial for the regex; you can simply do this:
/d.*o.*r.*w/
And that will match any word that contains all the letters of "word", such as "password" (once its letters are sorted).
One easy way to sort the letters is to split your word into an array of letters and then sort the array. You will still get duplicates, because the default JavaScript sort does not remove them; you have to filter them out yourself.
One more detail: if your test is supposed to be case insensitive, then you want to transform your strings to lowercase before running the test. So something like:
query = query.toLowerCase();
early on in your top function.
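A minimal sketch of the sorted-letters idea, assuming duplicates are dropped and the test is case insensitive. The helper names (sortedUniqueLetters, escapeForRegex, buildLetterRegex, findMatches) are made up for illustration and do not come from the question:

```javascript
// Reduce a word to its distinct letters in alphabetical order,
// e.g. "wwoorrddss" -> "dorsw".
function sortedUniqueLetters(word) {
  var seen = Object.create(null);
  return word.toLowerCase().split("").filter(function (ch) {
    if (seen[ch]) { return false; }
    seen[ch] = true;
    return true;
  }).sort().join("");
}

// Escape characters that have a special meaning inside a regex.
function escapeForRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// "word" -> /d.*o.*r.*w/
function buildLetterRegex(query) {
  var letters = sortedUniqueLetters(query).split("").map(escapeForRegex);
  return new RegExp(letters.join(".*"));
}

// Keep the words whose sorted letters contain every letter of the query.
function findMatches(words, query) {
  var re = buildLetterRegex(query);
  return words.filter(function (word) {
    return re.test(sortedUniqueLetters(word));
  });
}

var words = ['word', 'wwoorrddss', 'words', 'argument', 'sass', 'sword', 'carp', 'drowns'];
console.log(findMatches(words, "word"));
// [ 'word', 'wwoorrddss', 'words', 'sword', 'drowns' ]
```

In a real application the sorted form of each word would be computed once and stored, so only the query has to be processed per search.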
You are trying to speed up the algorithm "chars in word are a subset of the chars of query." You can short-circuit this check and avoid some assignments (which are more readable but not strictly needed). Try the following version of match:
function match(word, query) {
var len = word.length;
while (len--) {
if (query.indexOf(word[len]) === -1) { // found a missing char
return false;
}
}
return true; // couldn't find any missing chars
}
This gives a 4-5x improvement.
Depending on the application you could try presorting words and presorting each word in words as another optimization.
The regexp match algorithm constructs a finite state automaton and makes its decisions based on the current state and the character read, from left to right. This involves reading each character once and making a decision.
For static strings (looking for a fixed string in a body of text) there are better algorithms, like Knuth-Morris-Pratt, which never re-reads text characters after a mismatch (and others, such as Boyer-Moore, that can skip characters altogether), but you must understand that these algorithms are not for matching regular expressions, just plain strings.
If you are interested in Knuth-Morris-Pratt (there are several other algorithms), just have a look at Wikipedia: http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
A good thing you can do is to investigate whether your regexp match routines use a DFA or an NFA: NFAs occupy less memory and are easier to construct, while DFAs match faster but come with some compilation penalty and more memory use.
The Knuth-Morris-Pratt algorithm also needs to compile the string into an automaton before working, so perhaps it doesn't pay off for your problem if you are using it just to find one word in some string.
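For reference, a compact sketch of Knuth-Morris-Pratt for plain strings. The function names (buildFailureTable, kmpSearch) are illustrative; in practice String.prototype.indexOf is usually the right tool, and this is not meant to describe how any particular engine implements it:

```javascript
// Precompute the failure table: table[i] is the length of the longest
// proper prefix of pattern[0..i] that is also a suffix of it.
function buildFailureTable(pattern) {
  var table = [0];
  var k = 0;
  for (var i = 1; i < pattern.length; i++) {
    while (k > 0 && pattern[i] !== pattern[k]) {
      k = table[k - 1];
    }
    if (pattern[i] === pattern[k]) {
      k++;
    }
    table[i] = k;
  }
  return table;
}

// Return the index of the first occurrence of pattern in text, or -1.
// The text index i only moves forward; on a mismatch only k falls back.
function kmpSearch(text, pattern) {
  if (pattern.length === 0) { return 0; }
  var table = buildFailureTable(pattern);
  var k = 0; // number of pattern characters matched so far
  for (var i = 0; i < text.length; i++) {
    while (k > 0 && text[i] !== pattern[k]) {
      k = table[k - 1];
    }
    if (text[i] === pattern[k]) {
      k++;
    }
    if (k === pattern.length) {
      return i - k + 1;
    }
  }
  return -1;
}

console.log(kmpSearch("password", "word")); // 4
console.log(kmpSearch("abababc", "ababc")); // 2
console.log(kmpSearch("sass", "word"));     // -1
```

The table is the "compilation" step mentioned above: it costs time proportional to the pattern length, which only pays off when the same pattern is searched for repeatedly or the text is long.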
I'm working on a pretty crude sanitizer for string input in Node (Express):
I have glanced at some plugins and libraries, but it seems most of them are either too complex or too heavy. Therefore I decided to write a couple of simple sanitizer functions on my own.
One of them is this one, for hard-sanitizing most strings (not numbers...):
function toSafeString( str ){
str = str.replace(/[^a-öA-Ö0-9\s]+/g, '');
return str;
}
I'm from Sweden, therefore I need the åäö letters. I have noticed that this regex also accepts other characters as well, for example á or é.
Question 1)
Is there some kind of list or similar where I can see WHICH characters are actually accepted by, say, this regex: /[^a-ö]+/g
Question 2)
I'm working in Node and Express. I'm thinking this simple function is going to stop attacks through input fields. Am I wrong?
Question 1: Find out. :)
var accepted = [];
for (var i = 0; i <= 0xFFFF /* the Unicode BMP */; i++) {
var s = String.fromCharCode(i);
// no "g" flag needed: we only test one character at a time
if (/[a-ö]/.test(s)) accepted.push(s);
}
console.log(accepted.join(""));
outputs
abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³
´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö
on my system.
Question 2: What attacks are you looking to stop? Either way, the answer is "No, probably not".
Instead of mangling user data (I'm sure your, say, French or Japanese customers will have some beef with your validation), make sure to escape your data at the point where it leaves your code: HTML escaping when it goes into a page, SQL parameter escaping when it goes into a query, and so on.
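As an illustration of escaping on output rather than stripping on input, here is a minimal sketch of an HTML-escaping helper. The name escapeHtml is hypothetical; in an Express app the template engine normally does this for you, and for SQL the equivalent is your database driver's parameterized queries rather than hand-written escaping:

```javascript
// Replace the characters that are significant in HTML with entities.
// "&" must be handled first so that the entities added by the later
// replacements are not escaped a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

var userInput = '<img src=x onerror="alert(1)"> Åsa & José';
console.log(escapeHtml(userInput));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt; Åsa &amp; José
```

The user's text, including the non-Swedish é, survives intact; only the characters that could be interpreted as markup are neutralised.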
[x-y] matches characters whose unicode numbers are between that of x and that of y:
charsBetween = function(a, b) {
var a = a.charCodeAt(0), b = b.charCodeAt(0), r = "";
while(a <= b)
r += String.fromCharCode(a++);
return r
}
charsBetween("a", "ö")
> "abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö"
See character tables for the reference.
For your validation, you probably want something like this instead:
[^a-zA-Z0-9ÅÄÖåäö\s]
This matches ranges of latin letters and digits + individual characters from a list.
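A quick comparison of the two character classes, as a sketch. The function names are made up for illustration, and the sample string is assumed to use precomposed characters (é as a single code point):

```javascript
// The pattern from the question: the ranges a-ö and A-Ö span many more
// characters than the Swedish alphabet.
function toSafeStringOriginal(str) {
  return str.replace(/[^a-öA-Ö0-9\s]+/g, "");
}

// Explicit Latin ranges plus the six Swedish letters listed one by one.
function toSafeStringStrict(str) {
  return str.replace(/[^a-zA-Z0-9ÅÄÖåäö\s]+/g, "");
}

var sample = "Åsa köper 3 äpplen, José!";
console.log(toSafeStringOriginal(sample)); // "Åsa köper 3 äpplen José"
console.log(toSafeStringStrict(sample));   // "Åsa köper 3 äpplen Jos"
```

The second result also shows the cost of a strict whitelist: the é in a perfectly legitimate name is silently dropped.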
There are a lot of characters that we tend to forget about, like Japanese or Russian ones and many more.
So to take them into account we need to use Unicode ranges rather than ASCII ranges in regular expressions.
I came up with this regular expression, which covers almost all written letters of the whole Unicode table, plus a bit more, like numbers and a few punctuation characters (Chinese punctuation is already included in the Unicode ranges).
It is hard to cover everything, and these ranges probably include too many characters, including "exotic" ones (symbols):
/^[\u0040-\u1FE0\u2C00-\uFFC00-9 ',.?!]+$/i
I was using it this way to test (the string must not be empty):
function validString(str) {
return str && typeof(str) == 'string' && /^[\u0040-\u1FE0\u2C00-\uFFC00-9 ',.?!]+$/i.test(str);
}
Bear in mind that this is missing characters like:
:*()&#'\-:%
And many more others.
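As an alternative to hand-picked code point ranges, engines that support ES2018 Unicode property escapes can match "any letter" and "any number" directly. This is a different technique from the ranges above, shown only as a sketch; the function name validStringUnicode and the choice of allowed punctuation are assumptions:

```javascript
// \p{L} matches any Unicode letter, \p{N} any Unicode number.
// Property escapes require the "u" flag.
function validStringUnicode(str) {
  return typeof str === "string" && /^[\p{L}\p{N} ',.?!]+$/u.test(str);
}

console.log(validStringUnicode("Östersund 2024"));  // true
console.log(validStringUnicode("日本語, привет!")); // true
console.log(validStringUnicode("<b>"));             // false
console.log(validStringUnicode(""));                // false
```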
Greetings JavaScript and regular expression gurus,
I want to return all matches in an input string that are 6-digit hexadecimal numbers with any amount of white space in between. For example, "333333 e1e1e1 f4f435" should return an array:
array[0] = 333333
array[1] = e1e1e1
array[2] = f4f435
Here is what I have, but it isn't quite right-- I'm not clear how to get the optional white space in there, and I'm only getting one match.
colorValuesArray = colorValues.match(/[0-9A-Fa-f]{6}/);
Thanks for your help,
-NorthK
Use the g flag to match globally:
/[0-9A-Fa-f]{6}/g
Another good enhancement would be adding word boundaries:
/\b[0-9A-Fa-f]{6}\b/g
If you like you could also set the i flag for case insensitive matching:
/\b[0-9A-F]{6}\b/gi
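A short sketch of how the three variants behave. The input string is made up for illustration and deliberately ends with a 7-digit value to show what the word boundaries change:

```javascript
var colorValues = "333333   e1e1e1 f4f435 abcdef0";

// Without "g", match() stops at the first hit.
var first = colorValues.match(/[0-9A-Fa-f]{6}/);

// With "g", match() returns every hit, including the first six
// characters of the 7-digit value.
var all = colorValues.match(/[0-9A-Fa-f]{6}/g);

// With \b on both sides, only runs of exactly six hex digits match.
var bounded = colorValues.match(/\b[0-9A-F]{6}\b/gi);

console.log(first[0]); // "333333"
console.log(all);      // [ '333333', 'e1e1e1', 'f4f435', 'abcdef' ]
console.log(bounded);  // [ '333333', 'e1e1e1', 'f4f435' ]
```

The whitespace between the values needs no special handling: the pattern simply does not match it, so any amount of it is skipped.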
Alternatively to the answer above, a more direct approach might be:
/\p{Hex_Digit}{6}/ug
You can read more about Unicode Properties here.
It depends on the situation, but I usually want to make sure my code can't silently accept (and ignore, or misinterpret) incorrect input. So I would normally do something like this.
var arr = s.trim().split(/\s+/); // split on runs of white space
for (var i = 0; i < arr.length; i++) {
if (!arr[i].match(/^[0-9A-Fa-f]{6}$/))
throw new Error("unexpected junk in string: " + arr[i]);
arr[i] = parseInt(arr[i], 16);
}
try:
colorValues.match(/[0-9A-Fa-f]{6}/g);
Note the g flag to Globally match.
result = subject.match(/\b[0-9A-Fa-f]{6}\b/g);
gives you an array of all 6-digit hexadecimal numbers in the given string subject.
The \b word boundaries are necessary to avoid matching parts of longer hexadecimal numbers.
For people who are looking for hex color with alpha code, the following regex works:
/\b[0-9A-Fa-f]{6}[0-9A-Fa-f]{0,2}\b/g
The pattern accepts hex values both with and without the alpha code.
Friends,
I'm new to both Javascript and Regular Expressions and hope you can help!
Within a Javascript function I need to check to see if a comma(,) appears 1 or more times. If it does then there should be one or more numbers either side of it.
e.g.
1,000.00 is ok
1,000,00 is ok
,000.00 is not ok
1,,000.00 is not ok
If these conditions are met I want the comma to be removed so 1,000.00 becomes 1000.00
What I have tried so far is:
var x = '1,000.00';
var regex = new RegExp("[0-9]+,[0-9]+", "g");
var y = x.replace(regex,"");
alert(y);
When run, the alert shows ".00", which is not what I was expecting or want!
Thanks in advance for any help provided.
Edit
Thanks all for the input so far and the 3 answers given. Unfortunately I don't think I explained my question well enough.
What I am trying to achieve is:
If there is a comma in the text and there are one or more numbers either side of it then remove the comma but leave the rest of the string as is.
If there is a comma in the text and there is not at least one number either side of it then do nothing.
So using my examples from above:
1,000.00 becomes 1000.00
1,000,00 becomes 100000
,000.00 is left as ,000.00
1,,000.00 is left as 1,,000.00
Apologies for the confusion!
Your regex isn't going to be very flexible with higher orders than 1000 and it has a problem with inputs which don't have the comma. More problematically you're also matching and replacing the part of the data you're interested in!
Better to have a regex which matches the forms which are a problem and remove them.
The following matches (in order) a comma at the beginning of the input, a comma at the end of the input, a comma preceded by one or more non-digits, or a comma followed by one or more non-digits.
var y = x.replace(/^,|,$|[^0-9]+,|,[^0-9]+/g,'');
As an aside, all of this is much easier if you can use lookbehind, but older JavaScript engines don't support it (it was only added to the language in ES2018).
Edit based on question update:
Ok, I won't attempt to understand why your rules are as they are, but the regex gets simpler to solve it:
var y = x.replace(/(\d),(\d)/g, '$1$2');
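Running that replacement over the four examples from the question gives the requested results. The sketch below also shows one edge case worth knowing about: because each match consumes the digit after the comma, single-digit groups such as "1,2,3" are only partly cleaned. The lookahead variant is an assumed alternative, not part of the answer above, and the function names are illustrative:

```javascript
// The replacement from the answer: both digits are captured and put back.
function stripInnerCommas(value) {
  return value.replace(/(\d),(\d)/g, "$1$2");
}

// Variant with a lookahead: the digit after the comma is checked but not
// consumed, so it can serve as the "digit before" of the next comma.
function stripInnerCommasLookahead(value) {
  return value.replace(/(\d),(?=\d)/g, "$1");
}

var inputs = ["1,000.00", "1,000,00", ",000.00", "1,,000.00"];
console.log(inputs.map(stripInnerCommas));
// [ '1000.00', '100000', ',000.00', '1,,000.00' ]

console.log(stripInnerCommas("1,2,3"));          // "12,3"
console.log(stripInnerCommasLookahead("1,2,3")); // "123"
```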
I would use something like the following:
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$
[0-9]{1,3}: 1 to 3 digits
(,[0-9]{3})*: [Optional] More digit triplets separated by commas
(\.[0-9]+)?: [Optional] Dot + more digits
If this regex matches, you know that your number is valid. Just replace all commas with the empty string afterwards.
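A sketch of the validate-then-strip approach, assuming the decimal part is meant to be optional as the breakdown says. The name parseGroupedNumber and the choice to return null for invalid input are assumptions for illustration:

```javascript
var groupedNumber = /^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$/;

// Return the number without its commas, or null if the grouping is invalid.
function parseGroupedNumber(text) {
  if (!groupedNumber.test(text)) {
    return null;
  }
  return text.replace(/,/g, "");
}

console.log(parseGroupedNumber("1,000.00"));  // "1000.00"
console.log(parseGroupedNumber("12,345"));    // "12345"
console.log(parseGroupedNumber(",000.00"));   // null
console.log(parseGroupedNumber("1,,000.00")); // null
console.log(parseGroupedNumber("1,000,00"));  // null
```

This pattern is stricter than the rules in the question: it requires groups of exactly three digits, so "1,000,00" is rejected, and so is an ungrouped "1000.00".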
It seems to me you have three error conditions
",1000"
"1000,"
"1,,000"
If any one of these is true then you should reject the field; if they are all false then you can strip the commas in the normal way and move on. This can be a simple alternation:
^,|,,|,$
I would just remove anything except digits and the decimal separator ([^0-9.]) and send the output through parseFloat():
var y = parseFloat(x.replace(/[^0-9.]+/g, ""));
// invalid cases:
// - standalone comma at the beginning of the string
// - comma next to another comma
// - standalone comma at the end of the string
var i,
inputs = ['1,000.00', '1,000,00', ',000.00', '1,,000.00'],
invalid_cases = /(^,)|(,,)|(,$)/;
for (i = 0; i < inputs.length; i++) {
if (inputs[i].match(invalid_cases) === null) {
// strip everything except digits and the dot
inputs[i] = inputs[i].replace(/[^\d.]+/g, '');
}
}
console.log(inputs); // ["1000.00", "100000", ",000.00", "1,,000.00"]