How to generate easy anagrams preserving punctuation? - javascript

I'm writing the hint system for a quiz. The hints are to be anagrams of the answers. To make the anagrams easier, I keep the first and last letters the same.
var _ = require('underscore');
var easy = function(s) {
if (s.length <= 1) {
return s;
}
return s[0] + _.shuffle(s.slice(1, -1)).join("") + s.slice(-1);
};
For multiple word answers, I want to anagram each word separately. I wrote:
var peasy = function(s) {
return s.split(/\W/).map(easy).join(" ");
}
However this loses any punctuation in the answer (replacing it with a space). I'd like to keep the punctuation in its original position. How can I do that?
Here are three examples to test on:
console.log(peasy("mashed potatoes"));
console.log(peasy("computer-aided design"));
console.log(peasy("sophie's choice"));
My function peasy above fails the second and third examples because it loses hyphen and apostrophe.

Splitting by word separator does the trick:
var peasy = function(s) {
return s.split(/\b/).map(easy).join("");
}
Explanation:
"computer-aided design".split(/\b/) results in ["computer", "-", "aided", " ", "design"]. Then you shuffle each element with easy and join them, getting something like "ctemopur-adeid diegsn" back...

It's simpler to use String.prototype.replace instead of splitting:
you don't need to slice off the first and last letters.
the replacement occurs only on word characters (other characters stay untouched).
words with fewer than 4 characters are skipped.
it's shorter and doesn't need the easy function.
var _ = require('underscore');
var peasy = function(s) {
return s.replace(/\B\w{2,}\B/g, function (m) {
return _.shuffle(m).join('');
});
}
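For anyone who wants to try the replace approach without underscore, here is a minimal self-contained sketch. The shuffle helper is my own stand-in for _.shuffle (a plain Fisher-Yates shuffle), not part of the original answer; the regex is the same one used above.

```javascript
// Fisher-Yates shuffle of an array of characters.
// Hypothetical stand-in for _.shuffle so the sketch has no dependencies.
function shuffle(chars) {
  const a = chars.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Shuffle only the interior of each word. \B on both sides means the match
// never includes the first or last character of a word, and punctuation
// and spaces are never part of a match at all.
function peasy(s) {
  return s.replace(/\B\w{2,}\B/g, function (m) {
    return shuffle(m.split("")).join("");
  });
}

console.log(peasy("mashed potatoes"));
console.log(peasy("computer-aided design"));
console.log(peasy("sophie's choice"));
```

The output is random, but the hyphen, apostrophe and spaces should always stay where they were, and each word should keep its first and last letter.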

Related

Separating words with Regex

I am trying to get this result: 'Summer-is-here'. Why does the code below generate extra spaces? (Current result: '-Summer--Is- -Here-').
function spinalCase(str) {
var newA = str.split(/([A-Z][a-z]*)/).join("-");
return newA;
}
spinalCase("SummerIs Here");
You are using a variety of split where the regexp contains a capturing group (inside parentheses), which has a specific meaning, namely to include all the splitting strings in the result. So your result becomes:
["", "Summer", "", "Is", " ", "Here", ""]
Joining that with - gives you the result you see. But you can't just remove the unnecessary capture group from the regexp, because then the split would give you
["", "", " ", ""]
because the words themselves are now the separators, and split discards separators unless they are captured. So this doesn't really work.
If you want to use split, try splitting on zero-width or space-only matches looking ahead to an uppercase letter:
> "SummerIs Here".split(/\s*(?=[A-Z])/)
^^^^^^^^^ LOOK-AHEAD
< ["Summer", "Is", "Here"]
Now you can join that to get the result you want, but without the lowercase mapping, which you could do with:
"SummerIs Here" .
split(/\s*(?=[A-Z])/) .
map(function(elt, i) { return i ? elt.toLowerCase() : elt; }) .
join('-')
which gives you what you want.
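To make the three behaviours easy to compare, here is a small sketch that runs each variant of split on the question's input. The variable names are just for illustration.

```javascript
const input = "SummerIs Here";

// 1. Capturing group: the matched words are kept in the result, together
//    with the (mostly empty) text found between them.
const withCapture = input.split(/([A-Z][a-z]*)/);
console.log(withCapture);    // ["", "Summer", "", "Is", " ", "Here", ""]

// 2. No capturing group: the words are the separators, so they are dropped
//    and only the text between them is left.
const withoutCapture = input.split(/[A-Z][a-z]*/);
console.log(withoutCapture); // ["", "", " ", ""]

// 3. Look-ahead: split just before each capital letter, consuming any
//    whitespace in front of it. Nothing from the words is consumed.
const withLookahead = input.split(/\s*(?=[A-Z])/);
console.log(withLookahead);  // ["Summer", "Is", "Here"]

console.log(withLookahead.join("-")); // Summer-Is-Here
```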
Using replace as suggested in another answer is also a perfectly viable solution. In terms of best practices, consider the following code from Ember:
var DECAMELIZE_REGEXP = /([a-z\d])([A-Z])/g;
var DASHERIZE_REGEXP = /[ _]/g;
function decamelize(str) {
return str.replace(DECAMELIZE_REGEXP, '$1_$2').toLowerCase();
}
function dasherize(str) {
return decamelize(str).replace(DASHERIZE_REGEXP, '-');
}
First, decamelize inserts an underscore _ between a lower-case letter (or digit) and the upper-case letter that follows it. Then, dasherize replaces underscores and spaces with dashes. This works well, except that it lower-cases the whole string, including the first word. You can more or less combine decamelize and dasherize here with
var SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;
function spinalCase(str) {
return str.replace(SPINALIZE_REGEXP, '$1-$2').toLowerCase();
}
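As a quick check, here is a sketch that runs the functions above on the question's input. The function bodies are copied from the snippets in this answer; only the console output is added.

```javascript
const DECAMELIZE_REGEXP = /([a-z\d])([A-Z])/g;
const DASHERIZE_REGEXP = /[ _]/g;

function decamelize(str) {
  return str.replace(DECAMELIZE_REGEXP, "$1_$2").toLowerCase();
}

function dasherize(str) {
  return decamelize(str).replace(DASHERIZE_REGEXP, "-");
}

const SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;

function spinalCase(str) {
  return str.replace(SPINALIZE_REGEXP, "$1-$2").toLowerCase();
}

console.log(decamelize("SummerIs Here"));    // summer_is here
console.log(dasherize("SummerIs Here"));     // summer-is-here
console.log(spinalCase("SummerIs Here"));    // summer-is-here
console.log(spinalCase("thisIsSpinalCase")); // this-is-spinal-case
```

All three lower-case their result, so if you want to keep the capitals (Summer-Is-Here), you would presumably drop the toLowerCase() call in spinalCase.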
You want to separate capitalized words, but you are splitting the string on the capitalized words themselves. That's why you get those empty strings and spaces.
I think you are looking for this:
var newA = str.match(/[A-Z][a-z]*/g).join("-");
([A-Z][a-z]*) *(?!$|[a-z])
You can simply replace each match with $1-. See the demo:
https://regex101.com/r/nL7aZ2/1
var re = /([A-Z][a-z]*) *(?!$|[a-z])/g;
var str = 'SummerIs Here';
var subst = '$1-';
var result = str.replace(re, subst);
var newA = str.split(/ |(?=[A-Z])/).join("-");
You can change the regex like:
/ |(?=[A-Z])/ or /\s*(?=[A-Z])/
Result:
Summer-Is-Here

Find all matches in a concatenated string of same-length words?

I have a long JavaScript string of letters like:
"aapaalaakaaiartaxealpyaaraa"
This string is actually a chained list of 3-letter words: "aap", "aal", "aak", "aai", "art", "axe", "alp", "yaa" and "raa".
In reality I have many of these strings, with different word lengths, and they can be up to 2000 words long, so I need the fastest way to get all the words that start with a certain string. So when searching for all words that start with "aa" it should return:
"aap", "aal", "aak" and "aai"
Is there a way to do this with a regex? It's very important that it only matches whole 3-letter words, so matches that straddle two words should not be counted: "aar" should not be returned, and neither should "yaa" or "raa".
The simple way:
var results = [];
for (var i = 0; i < str.length; i += 3) {
if (str.substring(i, i + 2) === "aa") {
results.push(str.substring(i, i + 3));
}
}
Don’t ask whether it’s the fastest – just check whether it’s fast enough, first. :)
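Since the question mentions strings with different word lengths, the same loop can be wrapped in a function. This is only a sketch; the name wordsWithPrefix and its parameters are made up for illustration, and it assumes the prefix is no longer than one word.

```javascript
// Return every fixed-width word in `str` that starts with `prefix`.
// Only positions that are multiples of `wordLength` are checked, so
// matches that straddle two words are never reported.
function wordsWithPrefix(str, wordLength, prefix) {
  const results = [];
  if (prefix.length > wordLength) {
    return results; // a longer prefix could only match across word borders
  }
  for (let i = 0; i + wordLength <= str.length; i += wordLength) {
    // startsWith with a position avoids creating a substring per word
    if (str.startsWith(prefix, i)) {
      results.push(str.slice(i, i + wordLength));
    }
  }
  return results;
}

console.log(wordsWithPrefix("aapaalaakaaiartaxealpyaaraa", 3, "aa"));
// ["aap", "aal", "aak", "aai"]
```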
How about:
var str = 'aapaalaakaaiartaxealpyaaraa';
var pattern = /^aa/;
var result = str.match(/.{3}/g).filter(function(word) {
return pattern.test(word);
});
console.log(result); //=> ["aap","aal","aak","aai"]
"aapaalaakaaiartaxealpyaaraa".replace(/\w{3}|\w+/g,function(m){return m.match(/^aa/)?m+',':','}).split(',').filter(Boolean)

Why is my RegExp ignoring start and end of strings?

I made this helper function to find single words that are not part of bigger expressions.
It works fine on any word that is NOT first or last in a sentence. Why is that?
Is there a way to add "" (the empty string) to the regexp?
String.prototype.findWord = function(word) {
var startsWith = /[\[\]\.,-\/#!$%\^&\*;:{}=\-_~()\s]/ ;
var endsWith = /[^A-Za-z0-9]/ ;
var wordIndex = this.indexOf(word);
if (startsWith.test(this.charAt(wordIndex - 1)) &&
endsWith.test(this.charAt(wordIndex + word.length))) {
return wordIndex;
}
else {return -1;}
}
Also, any improvement suggestions for the function itself are welcome!
UPDATE: example: I want to find the word able in a string. I want it to work in cases like [able] able, #able1 etc., but not in cases where it is part of another word like disable, enable etc.
A different version:
String.prototype.findWord = function(word) {
return this.search(new RegExp("\\b"+word+"\\b"));
}
Your if will only evaluate to true if endsWith matches after the word. But the last word of a sentence ends with a full stop, which won't match your alphanumeric expression.
Did you try word boundary -- \b?
There is also \w, which matches one word character ([A-Za-z0-9_]). This could help you too (depending on your definition of a word).
See RegExp docs for more details.
If you want your endsWith regexp to also match the empty string, you just need to append |^$ to it:
var endsWith = /[^A-Za-z0-9]|^$/ ;
Anyway, you can easily check if it is the beginning of the text with if (wordIndex == 0), and if it is the end with if (wordIndex + word.length == this.length).
It is also possible to eliminate this issue by operating on a copy of the input string, surrounded with non-alphanumerical characters. For example:
var s = "#" + this + "#";
var wordIndex = s.indexOf(word) - 1;
But I'm afraid there is another problem with your function:
it would never match "able" in a string like "disable able enable" since the call to indexOf would return 3, then startsWith.test(wordIndex) would return false and the function would exit with -1 without searching further.
So you could try:
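String.prototype.findWord = function (word) {
var startsWith = "[\\[\\]\\.,-\\/#!$%\\^&\\*;:{}=\\-_~()\\s]";
var endsWith = "[^A-Za-z0-9]";
// Pad the string so a word at the very start or end still has neighbours.
// The match begins one character before the word, which cancels out the
// one-character shift caused by the leading "#", so the result of search
// is already the index of the word in the original string (or -1).
return ("#" + this + "#").search(new RegExp(startsWith + word + endsWith));
}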
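If you would rather not build a regex from the word at all, the boundary checks mentioned above can be done with indexOf in a loop. This is a sketch under my own assumptions: it is a standalone function rather than a String.prototype method, and it treats any non-alphanumeric neighbour (or the start/end of the text) as a valid boundary, which is looser than the startsWith character class in the question.

```javascript
// Find the first occurrence of `word` that is not glued to a letter or
// digit on either side. Returns its index, or -1 if there is none.
function findWord(text, word) {
  if (word === "") {
    return -1; // nothing sensible to search for
  }
  const isAlnum = (ch) => /[A-Za-z0-9]/.test(ch);
  let from = 0;
  while (true) {
    const index = text.indexOf(word, from);
    if (index === -1) {
      return -1;
    }
    // charAt returns "" outside the string, which is not alphanumeric,
    // so the start and end of the text count as boundaries.
    const before = text.charAt(index - 1);
    const after = text.charAt(index + word.length);
    if (!isAlnum(before) && !isAlnum(after)) {
      return index;
    }
    from = index + 1; // keep searching after a rejected occurrence
  }
}

console.log(findWord("disable able enable", "able")); // 8
console.log(findWord("[able] able", "able"));         // 1
console.log(findWord("able", "able"));                // 0
console.log(findWord("disable", "able"));             // -1
```

Unlike a single indexOf call, this keeps looking after a rejected occurrence, so the "disable able enable" case is handled.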

split string only on first instance of specified character

In my code I split a string based on _ and grab the second item in the array.
var element = $(this).attr('class');
var field = element.split('_')[1];
Takes good_luck and provides me with luck. Works great!
But, now I have a class that looks like good_luck_buddy. How do I get my javascript to ignore the second _ and give me luck_buddy?
I found this var field = element.split(new char [] {'_'}, 2); in a c# stackoverflow answer but it doesn't work. I tried it over at jsFiddle...
Use capturing parentheses:
'good_luck_buddy'.split(/_(.*)/s)
['good', 'luck_buddy', ''] // ignore the third element
They are defined as
If separator contains capturing parentheses, matched results are returned in the array.
So in this case we want to split at _.* (i.e. split separator being a sub string starting with _) but also let the result contain some part of our separator (i.e. everything after _).
In this example our separator (matching _(.*)) is _luck_buddy and the captured group (within the separator) is luck_buddy. Without the capturing parentheses, luck_buddy (matching .*) would not have been included in the result array, since with a plain split the separators are not included in the result.
We use the s regex flag to make . match on newline (\n) characters as well, otherwise it would only split to the first newline.
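Here is a short sketch of what the capturing split returns with and without the s flag. The wrapper name splitOnFirstUnderscore is hypothetical, and the s (dotAll) flag needs an ES2018-capable engine.

```javascript
// Split once on the first "_" using a capturing group. The third element
// of the raw result is always "" when there is a match, so it is ignored.
function splitOnFirstUnderscore(str) {
  const [head, tail = ""] = str.split(/_(.*)/s);
  return [head, tail];
}

console.log(splitOnFirstUnderscore("good_luck_buddy")); // ["good", "luck_buddy"]
console.log(splitOnFirstUnderscore("good"));            // ["good", ""]

// Without the s flag, "." stops at a newline, so the remainder is cut short
// and the text after the newline ends up in a separate element.
console.log("a_b\nc".split(/_(.*)/));  // ["a", "b", "\nc"]
console.log("a_b\nc".split(/_(.*)/s)); // ["a", "b\nc", ""]
```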
What do you need regular expressions and arrays for?
myString = myString.substring(myString.indexOf('_')+1)
var myString= "hello_there_how_are_you"
myString = myString.substring(myString.indexOf('_')+1)
console.log(myString)
I avoid RegExp at all costs. Here is another thing you can do:
"good_luck_buddy".split('_').slice(1).join('_')
With help of destructuring assignment it can be more readable:
let [first, ...rest] = "good_luck_buddy".split('_')
rest = rest.join('_')
A simple ES6 way to get both the first key and remaining parts in a string would be:
const [key, ...rest] = "good_luck_buddy".split('_')
const value = rest.join('_')
console.log(key, value) // good, luck_buddy
Nowadays String.prototype.split does indeed allow you to limit the number of splits.
str.split([separator[, limit]])
...
limit Optional
A non-negative integer limiting the number of splits. If provided, splits the string at each occurrence of the specified separator, but stops when limit entries have been placed in the array. Any leftover text is not included in the array at all.
The array may contain fewer entries than limit if the end of the string is reached before the limit is reached.
If limit is 0, no splitting is performed.
caveat
It might not work the way you expect. I was hoping it would just ignore the rest of the delimiters, but instead, when it reaches the limit, it splits the remaining string again, omitting the part after the split from the return results.
let str = 'A_B_C_D_E'
const limit_2 = str.split('_', 2)
limit_2
(2) ["A", "B"]
const limit_3 = str.split('_', 3)
limit_3
(3) ["A", "B", "C"]
I was hoping for:
let str = 'A_B_C_D_E'
const limit_2 = str.split('_', 2)
limit_2
(2) ["A", "B_C_D_E"]
const limit_3 = str.split('_', 3)
limit_3
(3) ["A", "B", "C_D_E"]
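If you want the behaviour I was hoping for, it is easy enough to build on top of split and join. A minimal sketch follows; splitKeepRest is a made-up name, and it assumes the separator is a plain string rather than a regex.

```javascript
// Split `str` on `sep` into at most `limit` pieces, keeping everything
// after the (limit - 1)th separator intact in the last piece.
function splitKeepRest(str, sep, limit) {
  const parts = str.split(sep);
  if (limit <= 0 || parts.length <= limit) {
    return parts; // nothing to merge
  }
  const head = parts.slice(0, limit - 1);
  head.push(parts.slice(limit - 1).join(sep)); // glue the remainder back together
  return head;
}

console.log(splitKeepRest("A_B_C_D_E", "_", 2)); // ["A", "B_C_D_E"]
console.log(splitKeepRest("A_B_C_D_E", "_", 3)); // ["A", "B", "C_D_E"]
```

This splits the whole string first and then rejoins the tail, so it is not the fastest option for very long strings.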
This solution worked for me
var str = "good_luck_buddy";
var index = str.indexOf('_');
var arr = [str.slice(0, index), str.slice(index + 1)];
//arr[0] = "good"
//arr[1] = "luck_buddy"
OR
var str = "good_luck_buddy";
var index = str.indexOf('_');
var [first, second] = [str.slice(0, index), str.slice(index + 1)];
//first = "good"
//second = "luck_buddy"
You can use the regular expression like:
var arr = element.split(/_(.*)/)
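You can use the second parameter, which limits the number of items returned. Note that it only truncates the result, so it gives you the part before the first _ and discards the remainder:
var first = element.split('_', 1)[0];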
Replace the first instance with a unique placeholder then split from there.
"good_luck_buddy".replace(/\_/,'&').split('&')
["good","luck_buddy"]
This is more useful when both sides of the split are needed.
I needed both parts of the string, so a regex lookbehind helped me with this.
const full_name = 'Maria do Bairro';
const [first_name, last_name] = full_name.split(/(?<=^[^ ]+) /);
console.log(first_name);
console.log(last_name);
Non-regex solution
I ran some benchmarks, and this solution won hugely (test environment noted at the end of this answer):
str.slice(str.indexOf(delim) + delim.length)
// as function
function gobbleStart(str, delim) {
return str.slice(str.indexOf(delim) + delim.length);
}
// as polyfill
String.prototype.gobbleStart = function(delim) {
return this.slice(this.indexOf(delim) + delim.length);
};
Performance comparison with other solutions
The only close contender was the same line of code, except using substr instead of slice.
Other solutions I tried involving split or RegExps took a big performance hit and were about 2 orders of magnitude slower. Using join on the results of split, of course, adds an additional performance penalty.
Why are they slower? Any time a new object or array has to be created, the engine has to allocate memory for it (and later garbage-collect it). This adds noticeable overhead compared with a plain index lookup and slice.
Here are some general guidelines, in case you are chasing benchmarks:
New dynamic memory allocations for objects {} or arrays [] (like the one that split creates) will cost a lot in performance.
RegExp searches are more complicated and therefore slower than string searches.
If you already have an array, destructuring arrays is about as fast as explicitly indexing them, and looks awesome.
Removing beyond the first instance
Here's a solution that will slice up to and including the nth instance. It's not quite as fast, but on the OP's question, gobble(element, '_', 1) is still >2x faster than a RegExp or split solution and can do more:
/*
`gobble`, given a positive, non-zero `limit`, deletes
characters from the beginning of `haystack` until `needle` has
been encountered and deleted `limit` times or no more instances
of `needle` exist; then it returns what remains. If `limit` is
zero or negative, delete from the beginning only until `-(limit)`
occurrences or less of `needle` remain.
*/
function gobble(haystack, needle, limit = 0) {
let remain = limit;
if (limit <= 0) { // set remain to count of delim - num to leave
let i = 0;
while (i < haystack.length) {
const found = haystack.indexOf(needle, i);
if (found === -1) {
break;
}
remain++;
i = found + needle.length;
}
}
let i = 0;
while (remain > 0) {
const found = haystack.indexOf(needle, i);
if (found === -1) {
break;
}
remain--;
i = found + needle.length;
}
return haystack.slice(i);
}
With the above definition, gobble('path/to/file.txt', '/') would give the name of the file, and gobble('prefix_category_item', '_', 1) would remove the prefix like the first solution in this answer.
Tests were run in Chrome 70.0.3538.110 on macOSX 10.14.
Use the string replace() method with a regex:
var result = "good_luck_buddy".replace(/.*?_/, "");
console.log(result);
This regex matches 0 or more characters before the first _, and the _ itself. The match is then replaced by an empty string.
JavaScript's String.split unfortunately has no way of limiting the actual number of splits. It has a second argument that specifies how many of the actual split items are returned, which isn't useful in your case. The solution would be to split the string, shift the first item off, then rejoin the remaining items:
var element = $(this).attr('class');
var parts = element.split('_');
parts.shift(); // removes the first item from the array
var field = parts.join('_');
Here's one RegExp that does the trick.
'good_luck_buddy' . split(/^.*?_/)[1]
First it forces the match to start from the
start with the '^'. Then it matches any number
of characters which are not '_', in other words
all characters before the first '_'.
The '?' means a minimal number of chars
that make the whole pattern match are
matched by the '.*?' because it is followed
by '_', which is then included in the match
as its last character.
Therefore this split() uses such a matching
part as its 'splitter' and removes it from
the results. So it removes everything
up till and including the first '_' and
gives you the rest as the 2nd element of
the result. The first element is "" representing
the part before the matched part. It is
"" because the match starts from the beginning.
There are other RegExps that work as
well like /_(.*)/ given by Chandu
in a previous answer.
The /^.*?_/ has the benefit that you
can understand what it does without
having to know about the special role
capturing groups play with split().
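A few sample inputs may make the behaviour clearer. This is just a sketch; the variable name splitter is mine.

```javascript
const splitter = /^.*?_/;

// Everything up to and including the first "_" is the separator, so the
// first element is "" and the second is the remainder.
console.log("good_luck_buddy".split(splitter)); // ["", "luck_buddy"]
console.log("good_luck".split(splitter)[1]);    // luck

// The "^" anchor stops the pattern from matching again later in the string,
// which is why the second "_" is left alone.

// With no "_" at all there is no match, so there is no second element.
console.log("goodluck".split(splitter));        // ["goodluck"]
console.log("goodluck".split(splitter)[1]);     // undefined
```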
If you are looking for a more modern way of doing this:
let raw = "good_luck_buddy"
raw.split("_")
.filter((part, index) => index !== 0)
.join("_")
Mark F's solution is awesome but it's not supported by old browsers. Kennebec's solution is awesome and supported by old browsers but doesn't support regex.
So, if you're looking for a solution that splits your string only once, that is supported by old browsers and supports regex, here's my solution:
String.prototype.splitOnce = function(regex)
{
var match = this.match(regex);
if(match)
{
var match_i = this.indexOf(match[0]);
return [this.substring(0, match_i),
this.substring(match_i + match[0].length)];
}
else
{ return [this, ""]; }
}
var str = "something/////another thing///again";
alert(str.splitOnce(/\/+/)[1]);
For beginners like me who are not used to regular expressions, this workaround worked:
var field = "Good_Luck_Buddy";
var newString = field.slice( field.indexOf("_")+1 );
slice() method extracts a part of a string and returns a new string and indexOf() method returns the position of the first found occurrence of a specified value in a string.
This should be quite fast
function splitOnFirst (str, sep) {
const index = str.indexOf(sep);
return index < 0 ? [str] : [str.slice(0, index), str.slice(index + sep.length)];
}
console.log(splitOnFirst('good_luck', '_')[1])
console.log(splitOnFirst('good_luck_buddy', '_')[1])
This worked for me on Chrome + FF:
"foo=bar=beer".split(/^[^=]+=/)[1] // "bar=beer"
"foo==".split(/^[^=]+=/)[1] // "="
"foo=".split(/^[^=]+=/)[1] // ""
"foo".split(/^[^=]+=/)[1] // undefined
If you also need the key try this:
"foo=bar=beer".split(/^([^=]+)=/) // Array [ "", "foo", "bar=beer" ]
"foo==".split(/^([^=]+)=/) // [ "", "foo", "=" ]
"foo=".split(/^([^=]+)=/) // [ "", "foo", "" ]
"foo".split(/^([^=]+)=/) // [ "foo" ]
//[0] = ignored (holds the string when there's no =, empty otherwise)
//[1] = hold the key (if any)
//[2] = hold the value (if any)
A simple ES6 one-statement solution to get the first key and the remaining parts:
let raw = 'good_luck_buddy'
raw.split('_')
.reduce((p, c, i) => i === 0 ? [c] : [p[0], [...p.slice(1), c].join('_')], [])
You could also use non-greedy match, it's just a single, simple line:
a = "good_luck_buddy"
const [,g,b] = a.match(/(.*?)_(.*)/)
console.log(g,"and also",b)

How to evaluate "Who's on first?" as being equal to "whos on first." in JavaScript?

I need to evaluate two strings as being equal even if they have minor punctuation differences that would not make them different for the purposes of a Google search.
For example, these pairs would be considered equal (along with any other minor grammatical/spelling mistakes you can think might work in Google):
Who's on first?
whos on first.
Where's the beef/problem?
wheres the beef problem
Is there a library function in JavaScript that would do this?
This is actually not a simple task, to do it right you need to look up stemming.
This is a really naive way since it obviously doesn't handle a whole range of issues like misspellings:
var a = "some text totest....ok";
var b = "sometext totest ok";
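function testRoughEquality(a, b) {
// Strip everything except letters before comparing.
var ax = a.replace(/[^a-z]/gi, "");
var bx = b.replace(/[^a-z]/gi, "");
if(ax === bx)
{
alert('These strings were roughly the same: "' + a + '" and "' + b + '"');
}
// Report the actual result instead of always returning true.
return ax === bx;
};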
The simplest answer is to remove characters that don't matter (the apostrophes and punctuation in your example), normalize other characters to word separators (the slash in your example), and downcase the lot.
var strs = ["Who's on first?","whos on first."];
for (var i=0,len=strs.length;i<len;++i){
strs[i] = strs[i].replace(/['?.]/g,'').replace(/[\/]/g,' ').toLowerCase();
}
console.log( strs[0] == strs[1] );
// true
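The same idea can be wrapped in a reusable function. This is only a sketch: normalize and roughlyEqual are names I made up, and it only handles ASCII letters and digits, so accented or non-Latin characters would be treated as punctuation.

```javascript
// Reduce a string to lower-case words separated by single spaces.
function normalize(s) {
  return s
    .toLowerCase()
    .replace(/['’]/g, "")        // drop apostrophes so "who's" becomes "whos"
    .replace(/[^a-z0-9]+/g, " ") // any other run of punctuation or spaces becomes one space
    .trim();
}

function roughlyEqual(a, b) {
  return normalize(a) === normalize(b);
}

console.log(roughlyEqual("Who's on first?", "whos on first."));                    // true
console.log(roughlyEqual("Where's the beef/problem?", "wheres the beef problem")); // true
console.log(roughlyEqual("Who's on first?", "What's on second?"));                 // false
```

It does nothing about misspellings or word forms; for that you would still need something like the stemming mentioned earlier.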
"who's on First?".replace(/[\?' ]/g,'').toLowerCase()
Gets you closer, but it's not the best way to do it.
If it was only the punctuation and capitalisation issue (like the examples above), a simple solution would be to pass both through a regular expression to remove certain punctuation characters, then convert to lower case and compare.
Something like:
function stringCompare(str1, str2)
{
var test = /[\?\'\/]/g;
var s1 = str1.replace(test,"").toLowerCase();
var s2 = str2.replace(test,"").toLowerCase();
// Compare the normalized copies, not the original strings.
return s1 === s2;
}
