JavaScript - Matching alphanumeric patterns with RegExp - javascript

I'm new to RegExp and to JS in general (Coming from Python), so this might be an easy question:
I'm trying to code an algebraic calculator in Javascript that receives an algebraic equation as a string, e.g.,
string = 'x^2 + 30x -12 = 4x^2 - 12x + 30';
The algorithm is already able to break the string in a single list, with all values on the right side multiplied by -1 so I can equate it all to 0, however, one of the steps to solve the equation involves creating a hashtable/dictionary, having the variable as key.
The string above results in a list eq:
eq = ['x^2', '+30x', '-12', '-4x^2', '+12x', '-30'];
I'm currently planning on iterating through this list, and using RegExp to identify both variables and the respective multiplier, so I can create a hashTable/Dictionary that will allow me to simplify the equation, such as this one:
hashTable = {
'x^2': [1, -4],
'x': [30, 12],
' ': [-12, -30]
}
I plan on using some kind of for loop to iterate through the array and applying a match on each string to get the values I need, but I'm, quite frankly, stumped.
I have already used RegExp to separate the string into the individual parts of the equation and to remove eventual spaces, but I can't imagine a way to separate -4 from x^2 in '-4x^2'.

You can try this
(-?\d+)x\^\d+.
When you execute match function :
var res = "-4x^2".match(/(-?\d+)x\^\d+/)
You will get res as an array : [ "-4x^2", "-4" ]
You have your '-4' in res[1].
By adding another group on the second \d+ (numeric char), you can retrieve the x power.
var res = "-4x^2".match(/(-?\d+)x\^(\d+)/) //res = [ "-4x^2", "-4", "2" ]
Hope it helps

If you know that the key of the hashtable (the variable part) is always at the end of the string, as with x in '4x' or x^2 in '-4x^2', then you can split on it to get the number in front:
var exp = '-4x^2'
exp.split('x^2')[0] // returns the string '-4'; convert it with Number() or parseInt() if you need a number
I hope this is what you were looking for.

function splitTerm(term) {
  var regex = /([+-]?)([0-9]*)?([a-z](\^[0-9]+)?)?/
  var match = regex.exec(term);
  return {
    constant: parseInt((match[1] || '') + (match[2] || 1)),
    variable: match[3]
  }
}
splitTerm('x^2'); // => {constant: 1, variable: "x^2"}
splitTerm('+30x'); // => {constant: 30, variable: "x"}
splitTerm('-12'); // => {constant: -12, variable: undefined}
Additionally, these tools may help you analyze and understand regular expressions:
https://regexper.com/
https://regex101.com/
http://rick.measham.id.au/paste/explain.pl
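To connect the term-splitting idea back to the hash table in the question, here is a minimal sketch. It assumes single-letter variables, integer coefficients, and the question's convention of using ' ' as the key for plain constants. The names groupTerms and coefficient are illustrative and not taken from any answer above.

```javascript
// Split one term such as '-4x^2' into its numeric coefficient and variable part.
function splitTerm(term) {
  const match = /^([+-]?)(\d*)([a-z](?:\^\d+)?)?$/.exec(term);
  if (match === null || (match[2] === '' && match[3] === undefined)) {
    throw new Error('Unrecognized term: ' + term);
  }
  // A missing number means an implicit coefficient of 1, as in 'x^2' or '-x'.
  const digits = match[2] === '' ? '1' : match[2];
  return {
    coefficient: parseInt(match[1] + digits, 10),
    variable: match[3] || ' ' // ' ' is the key the question uses for constants
  };
}

// Group the coefficients of all terms by their variable part.
function groupTerms(terms) {
  const table = {};
  terms.forEach(function (term) {
    const parts = splitTerm(term);
    if (!Object.prototype.hasOwnProperty.call(table, parts.variable)) {
      table[parts.variable] = [];
    }
    table[parts.variable].push(parts.coefficient);
  });
  return table;
}

const eq = ['x^2', '+30x', '-12', '-4x^2', '+12x', '-30'];
console.log(groupTerms(eq));
// => { 'x^2': [ 1, -4 ], x: [ 30, 12 ], ' ': [ -12, -30 ] }
```

Summing each array then gives the simplified coefficient for that variable.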

Related

Look for substring in a string with at most one different character-javascript

I am new to programming and am working on a program that needs to find a substring in a string and return the index where the match starts. I know I can use indexOf for an exact match, but it is not that simple: I want to find substrings that differ in at most one character.
I was thinking about regular expressions, but I don't really know how to use them here, because I would need a regular expression for every element of the string. Here is some code which will probably clarify what I want to do:
var A= "abbab";
var B= "ba";
var tb=[];
console.log(A.indexOf(B));
for (var i=0;i<B.length; i++){
var D=B.replace(B[i],"[a-z]");
tb.push(A.indexOf(D));
}
console.log(tb);
I know that the substring B and the string A consist of lowercase letters. It would be nice to get any advice on how to do this using regular expressions. Thanks.
Sample Input:
   A       B
1) abbab   ba
2) hello   world
3) banana  nan
Expected Output:
1) 1 2
2) No Match!
3) 0 2
While it is probably theoretically possible, I think it would be very complicated to incorporate all possible search query options in one long, complex regular expression. I think a better approach is to use JavaScript to dynamically create various simpler options and then search with each separately.
The following code sequentially replaces each character in the initial query string with a regular expression wild card (i.e. a period, '.') and then searches the target string with that. For example, if the initial query string is 'nan', it will search with '.an', 'n.n' and 'na.'. It will only add the position of the hit to the list of hits if that position has not already been hit on a previous search. i.e. It ensures that the list of hits contains only unique values, even if multiple query variations found a hit at the same location. (This could be implemented even better with ES6 sets, but I couldn't get the Stack Overflow code snippet tool to cooperate with me while trying to use a set, even with the Babel option checked.) Finally, it sorts the hits in ascending order.
Update: The search algorithm has been updated/corrected. Originally, some hits were missed because the exec search for any query variation would only iterate as per the JavaScript default, i.e. after finding a match, it would start the next search at the next character after the end of the previous match, e.g. it would find 'aa' in 'aaaa' at positions 0 and 2. Now it starts the next search at the next character after the start of the previous match, e.g. it now finds 'aa' in 'aaaa' at positions 0, 1 and 2.
const findAllowingOneMismatch = (target, query) => {
  const numLetters = query.length;
  const queryVariations = [];
  for (let variationNum = 0; variationNum < numLetters; variationNum += 1) {
    // Replace one character of the query with the regex wildcard '.'.
    queryVariations.push(query.slice(0, variationNum) + "." + query.slice(variationNum + 1));
  }
  let hits = [];
  queryVariations.forEach(queryVariation => {
    const re = new RegExp(queryVariation, "g");
    let searchResult;
    while ((searchResult = re.exec(target)) !== null) {
      // Resume one character after the start of this match so overlapping hits are found.
      re.lastIndex = searchResult.index + 1;
      const hit = searchResult.index;
      // console.log('found a hit with ' + queryVariation + ' at position ' + hit);
      if (hits.indexOf(hit) === -1) {
        hits.push(hit);
      }
    }
  });
  hits = hits.sort((a, b) => (a - b));
  console.log('Found "' + query + '" in "' + target + '" at positions:', JSON.stringify(hits));
};
[
  ['abbab', 'ba'],
  ['hello', 'world'],
  ['banana', 'nan'],
  ['abcde abcxe abxxe xbcde', 'abcd'],
  ['--xx-xxx--x----x-x-xxx--x--x-x-xx-', '----']
].forEach(pair => {findAllowingOneMismatch(pair[0], pair[1])});
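For comparison, the same "at most one different character" search can be sketched without regular expressions by sliding a window over the target and counting mismatches directly. This is a different technique from the wildcard approach above, not a rewrite of it. The function name findWithAtMostOneMismatch is made up for this example, and it assumes a non-empty query.

```javascript
// Return every start index where `query` matches `target` with at most one
// differing character, by comparing character by character.
function findWithAtMostOneMismatch(target, query) {
  const hits = [];
  for (let start = 0; start + query.length <= target.length; start++) {
    let mismatches = 0;
    // Stop comparing this window as soon as a second mismatch is seen.
    for (let i = 0; i < query.length && mismatches <= 1; i++) {
      if (target[start + i] !== query[i]) mismatches++;
    }
    if (mismatches <= 1) hits.push(start);
  }
  return hits;
}

console.log(findWithAtMostOneMismatch('abbab', 'ba'));    // [ 1, 2 ]
console.log(findWithAtMostOneMismatch('hello', 'world')); // []
console.log(findWithAtMostOneMismatch('banana', 'nan'));  // [ 0, 2 ]
```

Because every window is visited once, the hits come out unique and already sorted, and characters that are special in regular expressions need no escaping.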

Searching for multiple partial phrases so that one original phrase can not match multiple search phrases

Given a predefined set of phrases, I'd like to perform a search based on user's query. For example, consider the following set of phrases:
index  phrase
-----------------------------------------
0      Stack Overflow
1      Math Overflow
2      Super User
3      Webmasters
4      Electrical Engineering
5      Programming Jokes
6      Programming Puzzles
7      Geographic Information Systems
The expected behaviour is:
query      result
------------------------------------------------------------------------
s          Stack Overflow, Super User, Geographic Information Systems
web        Webmasters
over       Stack Overflow, Math Overflow
super u    Super User
user s     Super User
e e        Electrical Engineering
p          Programming Jokes, Programming Puzzles
p p        Programming Puzzles
To implement this behaviour I used a trie. Every node in the trie has an array of indices (empty initially).
To insert a phrase into the trie, I first break it into words. For example, Programming Puzzles has index = 6. Therefore, I add 6 to all the following nodes:
p
pr
pro
prog
progr
progra
program
programm
programmi
programmin
programming
pu
puz
puzz
puzzl
puzzle
puzzles
The problem is, when I search for the query prog p, I first get a list of indices for prog which is [5, 6]. Then, I get a list of indices for p which is [5, 6] as well. Finally, I calculate the intersection between the two, and return the result [5, 6], which is obviously wrong (should be [6]).
How would you fix this?
Key Observation
We can use the fact that two words in a query can match the same word in a phrase only if one query word is a prefix of the other (or if they are the same). So if we process the query words in descending lexicographic order (prefixes come after their "superwords"), we can safely remove words from the phrases at the first match. Doing so leaves no possibility of matching the same phrase word twice. It is safe because a prefix matches a superset of the phrase words that its "superwords" can match, and a pair of query words where neither is a prefix of the other always matches disjoint sets of phrase words.
We don't have to remove words from phrases or the trie "physically", we can do it "virtually".
Implementation of the Algorithm
var PhraseSearch = function () {
var Trie = function () {
this.phraseWordCount = {};
this.children = {};
};
Trie.prototype.addPhraseWord = function (phrase, word) {
if (word !== '') {
var first = word.charAt(0);
if (!this.children.hasOwnProperty(first)) {
this.children[first] = new Trie();
}
var rest = word.substring(1);
this.children[first].addPhraseWord(phrase, rest);
}
if (!this.phraseWordCount.hasOwnProperty(phrase)) {
this.phraseWordCount[phrase] = 0;
}
this.phraseWordCount[phrase]++;
};
Trie.prototype.getPhraseWordCount = function (prefix) {
if (prefix !== '') {
var first = prefix.charAt(0);
if (this.children.hasOwnProperty(first)) {
var rest = prefix.substring(1);
return this.children[first].getPhraseWordCount(rest);
} else {
return {};
}
} else {
return this.phraseWordCount;
}
}
this.trie = new Trie();
}
PhraseSearch.prototype.addPhrase = function (phrase) {
var words = phrase.trim().toLowerCase().split(/\s+/);
words.forEach(function (word) {
this.trie.addPhraseWord(phrase, word);
}, this);
}
PhraseSearch.prototype.search = function (query) {
var answer = {};
var phraseWordCount = this.trie.getPhraseWordCount('');
for (var phrase in phraseWordCount) {
if (phraseWordCount.hasOwnProperty(phrase)) {
answer[phrase] = true;
}
}
var prefixes = query.trim().toLowerCase().split(/\s+/);
prefixes.sort();
prefixes.reverse();
var prevPrefix = '';
var superprefixCount = 0;
prefixes.forEach(function (prefix) {
if (prevPrefix.indexOf(prefix) !== 0) {
superprefixCount = 0;
}
phraseWordCount = this.trie.getPhraseWordCount(prefix);
function phraseMatchedWordCount(phrase) {
return phraseWordCount.hasOwnProperty(phrase) ? phraseWordCount[phrase] - superprefixCount : 0;
}
for (var phrase in answer) {
if (answer.hasOwnProperty(phrase) && phraseMatchedWordCount(phrase) < 1) {
delete answer[phrase];
}
}
prevPrefix = prefix;
superprefixCount++;
}, this);
return Object.keys(answer);
}
function test() {
var phraseSearch = new PhraseSearch();
var phrases = [
'Stack Overflow',
'Math Overflow',
'Super User',
'Webmasters',
'Electrical Engineering',
'Programming Jokes',
'Programming Puzzles',
'Geographic Information Systems'
];
phrases.forEach(phraseSearch.addPhrase, phraseSearch);
var queries = {
's': 'Stack Overflow, Super User, Geographic Information Systems',
'web': 'Webmasters',
'over': 'Stack Overflow, Math Overflow',
'super u': 'Super User',
'user s': 'Super User',
'e e': 'Electrical Engineering',
'p': 'Programming Jokes, Programming Puzzles',
'p p': 'Programming Puzzles'
};
for(var query in queries) {
if (queries.hasOwnProperty(query)) {
var expected = queries[query];
var actual = phraseSearch.search(query).join(', ');
console.log('query: ' + query);
console.log('expected: ' + expected);
console.log('actual: ' + actual);
}
}
}
test();
One can test this code here: http://ideone.com/RJgj6p
Possible Optimizations
Storing the phrase word count in each trie node is not very memory efficient. But by implementing a compressed trie it is possible to reduce the worst-case memory complexity to O(n m), where n is the number of different words in all the phrases and m is the total number of phrases.
For simplicity I initialize answer by adding all the phrases. A more time-efficient approach is to initialize answer with the phrases matched by the query word that matches the fewest phrases, then intersect with the phrases of the query word matching the second fewest, and so on.
Relevant Differences from the Implementation Referenced in the Question
In each trie node I store not only the phrase references (ids) matched by the subtrie, but also the number of matched words in these phrases. So the result of a match is not only the matched phrase references, but also the number of matched words in these phrases.
I process query words in descending lexicographic order.
I subtract the number of superprefixes (query words of which the current query word is a prefix) from current match results (by using variable superprefixCount), and a phrase is considered matched by the current query word only when the resulting number of matched words in it is greater than zero. As in the original implementation, the final result is the intersection of the matched phrases.
As one can see, changes are minimal and asymptotic complexities (both time and memory) are not changed.
If the set of phrases is defined and does not contain long phrases, maybe you can create not 1 trie, but n tries, where n is the maximum number of words in one phrase.
In i-th trie store i-th word of the phrase. Let's call it the trie with label 'i'.
To process query with m words let's consider the following algorithm:
For each phrase we will store the lowest label of a trie, where the word from this phrase was found. Let's denote it as d[j], where j is the phrase index. At first for each phrase j, d[j] = -1.
Search the first word in each of n tries.
For each phrase j find the label of a trie that is greater than d[j] and where the word from this phrase was found. If there are several such labels, pick the smallest one. Let's denote such label as c[j].
If there is no such index, this phrase can not be matched. You can mark this case with d[j] = n + 1.
If there is such c[j] that c[j] > d[j], then assign d[j] = c[j].
Repeat for every word left.
Every phrase with -1 < d[j] < n is matched.
This is not very optimal. To improve performance you should store only usable values of d array. After first word, store only phrases, matched with this word. Also, instead of assignment d[j] = n + 1, delete index j. Process only already stored phrase indexes.
You can solve it as a Graph Matching Problem in a Bipartite Graph.
For each document, query pair define the graph:
G=(V,E) Where
V = {t1 | for each term t1 in the query} U { t2 | for each term t2 in the document}
E = { (t1,t2) | t1 is a match for t2 }
Intuitively: you have a vertex for each term in the query, a vertex for each term in the document, and an edge between a document term and a query term, only if the query term matches the document term. You have already solved this part with your trie.
You got yourself a bipartite graph, there are only edges between the "query vertices" and the "document vertices" (and not between two vertices of the same type).
Now, invoke a matching problem for bipartite graph, and get an optimal matching {(t1_1,t2_1), ... , (t1_k,t2_k)}.
Your algorithm should return a document d for a query q with m terms in the query, if (and only if) all m terms are satisfied, which means - you have maximal matching where k=m.
In your example, for query="prog p" and document="Programming Jokes", both query terms can only be matched to Programming, so the maximum matching has a single edge: (prog, Programming), or (p, Programming), it doesn't matter which.
For the same query and document="Programming Puzzles", the maximum matching has two edges: (prog, Programming) and (p, Puzzles).
As you can see, in the first example there is no matching that covers all the query terms, so you "reject" the document. In the second example you are able to match all the terms, so you return it.
For performance issues, you can do the suggested algorithm only on a subset of the phrases, that were already filtered out by your initial approach (intersection of documents that have matching for all terms).
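The per-phrase check described above can be sketched with a simple augmenting-path search for a maximum bipartite matching. This is only an illustration under the assumption that "t1 is a match for t2" means the query word is a prefix of the phrase word. The function name phraseMatchesQuery is hypothetical, and the sketch tests one phrase at a time rather than using the trie.

```javascript
// Returns true when every query word can be paired with a distinct phrase word
// that it is a prefix of (maximum bipartite matching via augmenting paths).
function phraseMatchesQuery(phrase, query) {
  const phraseWords = phrase.trim().toLowerCase().split(/\s+/);
  const queryWords = query.trim().toLowerCase().split(/\s+/);
  // matchedQuery[p] is the index of the query word paired with phrase word p, or -1.
  const matchedQuery = new Array(phraseWords.length).fill(-1);

  // Try to pair query word q, re-pairing earlier query words if that frees a slot.
  function tryAssign(q, visited) {
    for (let p = 0; p < phraseWords.length; p++) {
      if (visited[p] || !phraseWords[p].startsWith(queryWords[q])) continue;
      visited[p] = true;
      if (matchedQuery[p] === -1 || tryAssign(matchedQuery[p], visited)) {
        matchedQuery[p] = q;
        return true;
      }
    }
    return false;
  }

  // The phrase matches only if all m query words end up paired (k = m).
  return queryWords.every(function (_, q) {
    return tryAssign(q, new Array(phraseWords.length).fill(false));
  });
}

console.log(phraseMatchesQuery('Programming Jokes', 'prog p'));   // false
console.log(phraseMatchesQuery('Programming Puzzles', 'prog p')); // true
console.log(phraseMatchesQuery('Super User', 'user s'));          // true
```

In practice this check would only run on the candidate phrases that survive the trie intersection, as suggested above.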
After some thought I came up with a similar idea to dened's - in addition to the index of a matched phrase, each prefix will refer to how many words it is a prefix of in that phrase - then that number can be reduced in the query process by the number of its superfixes among other query words, and the returned results include only those with at least the same number of matched words as the query.
We can implement an additional small tweak to avoid large cross-checks by adding (for the English language) a maximum of approximately 26 choose 2 + 26 choose 3 and even an additional 26 choose 4 special elements to the trie that refer to ordered first-letter intersections. When a phrase is inserted, the special elements in the trie referring to the 2 and 3 first-letter combinations will receive its index. Then match results from larger query words can be cross-checked against these. For example, if our query is "Geo i", the match results for "Geo" would be cross-checked against the special trie element, "g-i", which hopefully would have significantly less match results than "i".
Also, depending on the specific circumstances, large cross-checks could at times be more efficiently handled in parallel (for example, via a bitset &).

I need a JavaScript procedure that reverses the following procedure

I have the following function that encrypts a string and I was hoping for a function that reverses the process.
function encryptStr(thisString)
{
retString = "";
/* Make retString a string of the 8-bit representations of
the ASCII values of its characters in order.
EXAMPLE: "abc" --> "011000010110001001100011"
since the ASCII values for 'a', 'b' and 'c'
are 97=01100001, 98=01100010 and 99=01100011
respectively
*/
for (i = 0, j = thisString.length; i < j; i++)
{
bits = thisString.charCodeAt(i).toString(2);
retString += new Array(8-bits.length+1).join('0') + bits;
}
/* Compress retString by taking each substring of 3, 4, ..., 9
consecutive 1's or 0's and replacing it by the number of such
consecutive characters followed by the character.
EXAMPLES:
"10101000010111" --> "10101401031"
"001100011111111111111" --> "0011309151"
*/
retString = retString.replace(/([01])\1{2,8}/g, function($0, $1) { return ($0.length + $1);});
return retString;
}
I tried to make a function and I'm probably doing it wrong because it's 50 lines already. I'm realizing that there's tons of error checking that needs to go on. For instance, I just realized a potential problem because JavaScript characters don't span the entire 127 ASCII values. Should I just give up? Is this a futile problem?
First, find the numbers in the string which are not 0 or 1. Then, expand them in the opposite way that the original function collapsed them. You can again use String.prototype.replace() here with a replacement function...
str.replace(/([2-9])([01])/g,
function(all, replacementCount, bit) {
return Array(+replacementCount + 1).join(bit);
});
Then, simply decode the bit stream back into characters with String.fromCharCode(). You'd need to chunk the stream into 8 bit chunks, and then perform the conversion. I chose to use Array.prototype.reduce() as it's quite suited to this task. Alternatively, you could use String.fromCharCode.apply(String, chunks.map(function(byte) { return parseInt(byte, 2); })) to get the resulting string.
Something like...
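// match() returns just the 8-bit chunks; split() with a capturing group
// would also return empty strings between them, which decode to NUL characters.
str.match(/.{8}/g).reduce(function(str, byte) {
  return str + String.fromCharCode(parseInt(byte, 2));
}, "");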
Put it together, and you get a function like...
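function decryptStr(thisString) {
  // Expand each count-and-bit pair back into a run of that bit.
  var bits = thisString.replace(/([2-9])([01])/g,
    function (all, replacementCount, bit) {
      return Array(+replacementCount + 1).join(bit);
    });
  // Decode each 8-bit chunk; the empty-array fallback covers an empty input string.
  return (bits.match(/.{8}/g) || []).reduce(function (str, byte) {
    return str + String.fromCharCode(parseInt(byte, 2));
  }, "");
}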
Also, remember to place var in front of your variable declarations, otherwise those variable identifiers will leak to the containing scope until they're resolved (which is usually the global object).
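A quick way to gain confidence in a decoder like this is a round-trip check. The sketch below restates both directions with local variables under the illustrative names encodeBits and decodeBits, which do not appear in the question or answer. It assumes every character code is below 256 so that it fits in 8 bits.

```javascript
// Encode: 8 bits per character, then run-length compress runs of 3 to 9 equal bits.
function encodeBits(text) {
  var bitString = '';
  for (var i = 0; i < text.length; i++) {
    var bits = text.charCodeAt(i).toString(2);
    bitString += new Array(8 - bits.length + 1).join('0') + bits; // left-pad to 8 bits
  }
  return bitString.replace(/([01])\1{2,8}/g, function (run, bit) {
    return run.length + bit;
  });
}

// Decode: expand the runs again, then turn each 8-bit chunk back into a character.
function decodeBits(encoded) {
  var bitString = encoded.replace(/([3-9])([01])/g, function (all, count, bit) {
    return new Array(+count + 1).join(bit);
  });
  return (bitString.match(/.{8}/g) || []).reduce(function (result, byte) {
    return result + String.fromCharCode(parseInt(byte, 2));
  }, '');
}

['abc', 'Hello, World!', ''].forEach(function (sample) {
  console.log(JSON.stringify(sample), decodeBits(encodeBits(sample)) === sample);
});
```

Decoding is unambiguous because the digits 3 to 9 can only ever be run counts, while the payload consists solely of 0 and 1.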

Eval alternative

This code works as a calculator, but the scratch pad at Codecademy tells me that eval is evil. Is there another way to do the same thing without using eval?
var calculate = prompt("Enter problem");
alert(eval(calculate));
eval evaluates the string input as JavaScript and coincidentally JavaScript supports calculations and understands 1+1, which makes it suitable as a calculator.
If you don't want to use eval, which is good, you have to parse that string yourself and, finally, do the computation yourself (not exactly yourself though). Have a look at this math processor, which does what you want.
Basically what you do is:
Read the input string char by char (with this kind of problem it's still possible)
Build a tree of the actions you want to perform
At the end of the string, evaluate the tree and do the calculations
For example, "1+2/3" could be parsed into the following data structure:
    "+"
   /   \
 "1"   "/"
       /  \
     "2"  "3"
You could then traverse that structure from top to bottom and do the computations.
At first you've got the "+", which has a 1 on the left side and some expression on the right side, so you have to evaluate that expression first. So you go to the "/" node, which has two numeric children. Knowing that, you can now compute 2/3 and replace the whole "/" node with the result. Now you can go up again and compute the result of the "+" node: 1 + 0.666... Now you replace that node with the result, and all you've got left is the result of the expression.
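Here is roughly how this might look in your code:
function calculation(operator, leftValue, rightValue) {
  switch (operator) {
    case '+': return leftValue + rightValue;
    case '-': return leftValue - rightValue;
    case '*': return leftValue * rightValue;
    case '/': return leftValue / rightValue;
  }
}
function action(node) {
  // A leaf holds a number in node.value; an inner node holds an operator and two subtrees.
  if (node.operator === undefined) return node.value;
  return calculation(node.operator, action(node.left), action(node.right));
}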
As you might have noticed, the tree is designed in such a way that it honors operator precedence. The / sits lower in the tree than the +, which means it gets evaluated first.
However you do this in detail, that's basically the way to go.
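To make the tree-building step concrete, here is a minimal recursive-descent sketch. It assumes the input contains only non-negative decimal numbers, the operators + - * /, parentheses, and unary minus. The names tokenize, parse and evaluate are illustrative, and this is one possible way to build the tree, not the method used by the math processor linked above.

```javascript
// Split the input into number, operator and parenthesis tokens; reject anything else.
function tokenize(input) {
  const tokens = input.match(/\d+(?:\.\d+)?|[-+*/()]/g) || [];
  if (tokens.join('') !== input.replace(/\s+/g, '')) {
    throw new SyntaxError('Unexpected character in: ' + input);
  }
  return tokens;
}

// Build the expression tree. Products are parsed below sums, so * and / end up
// deeper in the tree than + and - and are therefore evaluated first.
function parse(input) {
  const tokens = tokenize(input);
  let pos = 0;

  function parsePrimary() {
    const token = tokens[pos++];
    if (token === '(') {
      const node = parseSum();
      if (tokens[pos++] !== ')') throw new SyntaxError('Expected ")"');
      return node;
    }
    if (token === '-') { // unary minus, represented as 0 - operand
      return { operator: '-', left: { value: 0 }, right: parsePrimary() };
    }
    if (token === undefined || isNaN(token)) throw new SyntaxError('Expected a number');
    return { value: parseFloat(token) };
  }

  function parseProduct() {
    let node = parsePrimary();
    while (tokens[pos] === '*' || tokens[pos] === '/') {
      const operator = tokens[pos++];
      node = { operator: operator, left: node, right: parsePrimary() };
    }
    return node;
  }

  function parseSum() {
    let node = parseProduct();
    while (tokens[pos] === '+' || tokens[pos] === '-') {
      const operator = tokens[pos++];
      node = { operator: operator, left: node, right: parseProduct() };
    }
    return node;
  }

  const tree = parseSum();
  if (pos !== tokens.length) throw new SyntaxError('Unexpected token: ' + tokens[pos]);
  return tree;
}

// Walk the tree bottom-up: leaves are numbers, inner nodes combine their children.
function evaluate(node) {
  if (node.operator === undefined) return node.value;
  const left = evaluate(node.left);
  const right = evaluate(node.right);
  switch (node.operator) {
    case '+': return left + right;
    case '-': return left - right;
    case '*': return left * right;
    case '/': return left / right;
  }
}

console.log(evaluate(parse('2 * (3 + 4)'))); // 14
```

Since nothing is ever passed to eval, input such as alert(1) is simply rejected with a SyntaxError by the tokenizer.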
You can use the expression parser that is included in the math.js library:
http://mathjs.org
Example usage:
mathjs.evaluate('1.2 / (2.3 + 0.7)'); // 0.4
mathjs.evaluate('5.08 cm in inch'); // 2 inch
mathjs.evaluate('sin(45 deg) ^ 2'); // 0.5
mathjs.evaluate('9 / 3 + 2i'); // 3 + 2i
mathjs.evaluate('det([-1, 2; 3, 1])'); // -7
You can use eval safely for a simple arithmetic calculator by filtering the input- if you only accept digits, decimal points and operators (+,-,*,/) you won't get in much trouble. If you want advanced Math functions, you are better off with the parser suggestions.
function calculate(){
"use strict";
var s= prompt('Enter problem');
if(/[^0-9()*+\/ .-]+/.test(s)) throw Error('bad input...');
try{
var ans= eval(s);
}
catch(er){
alert(er.message);
}
alert(ans);
}
calculate()
I wrote some functions when I had a problem like this. Maybe this can help:
data = [
{id:1,val1:"test",val2:"test2",val3:"test3"},
{id:2,val1:"test",val2:"test2",val3:"test3"},
{id:3,val1:"test",val2:"test2",val3:"test3"}
];
datakey = Object.keys(data[0]);
// here's a fix for e['datakey[f]'] >> e[x]
vix = function(e,f){
a = "string";
e[a] = datakey[f];
x = e.string;
end = e[x];
delete e.string;
return end;
};
// here's a fix to define that variable
vox = function(e,f,string){
a = "string";
e[a] = datakey[f];
x = e.string;
end = e[x] = string;
delete e.string;
};
row = 2 // 3rd row ==> the object with id:3
column = 1 // 2nd key in datakey ==> val1
vox(data[row],column,"new value");
alert(data[2].val1); //the value that we have changed

Doing assignment in VBscript now.. Need to give positions of each "e" in a string

I've done this in JavaScript but needless to say I can't just swap it over.
In Jscript I used this:
var estr = tx_val
index = 0
positions = []
while((index = estr.indexOf("e", index + 1)) != -1)
{
positions.push(index);
}
document.getElementById('ans6').innerHTML = "Locations of 'e' in string the are: "
+ positions;
I tried using the same logic with VBS terms, ie join, I also tried using InStr. I'm just not sure how to yank out that 'e'... Maybe I'll try replacing it with another character.
Here is what I tried with VBScript. I tried using InStr and replace to yank out the first occurrence of 'e' in each loop and replace it with an 'x'. I thought that maybe this would make the next loop through give the location of the next 'e'. -- When I don't get a subscript out of range 'i' error, I only get one location back from the script and it's 0.
(6) show the location of each occurrence of the character "e" in the string "tx_val" in the span block with id="ans6"
countArr = array()
countArr = split(tx_val)
estr = tx_val
outhtml = ""
positions = array()
i=0
for each word in countArr
i= i+1
positions(i) = InStr(1,estr,"e",1)
estr = replace(estr,"e","x",1,1)
next
document.getElementById("ans6").innerHTML = "E is located at: " & positions
What can I do that is simpler than this and works? and thank you in advance, you all help a lot.
EDIT AGAIN: I finally got it working right. I'm not 100% sure how. But I ran through the logic in my head a few dozen times before I wrote it and after a few kinks it works.
local = ""
simon = tx_val
place=(InStr(1,simon,"e"))
i=(len(simon))
count = tx_val
do
local = (local & " " & (InStr((place),simon,"e")))
place = InStr((place+1),simon,"e")
count = (InStr(1,simon,"e"))
loop while place <> 0
document.getElementById("ans6").innerHTML= local
InStr has slightly different parameters to indexOf:
InStr([start, ]string, searchValue[, compare])
start: The index at which to start searching
string: The string to search
searchValue: The string to search for
Also note that Visual Basic indexes strings beginning at 1 so all the input and return index values are 1 more than the original JavaScript.
You can try split(). For example a simple string like this:
string = "thisismystring"
Split on "s", so we have
mystring = Split(string,"s")
So in the array mystring, we have
thi   i    my   tring
 ^    ^    ^     ^
[0]  [1]  [2]   [3]
All you have to do is check the length of each array item using Len(). For example, item 0 has a length of 3 (thi), so the first "s" is at position 4 (which is index 3). Take note of this running position and repeat for the next item. Item 1 has a length of 1, so the next "s" is at position 4 + 1 + 1 = 6 (index 5), and so on.
#Update, here's an example using vbscript
thestring = "thisismystring"
delimiter="str"
mystring = Split(thestring,delimiter)
c=0
For i=0 To UBound(mystring)-1
c = c + Len(mystring(i)) + Len(delimiter)
WScript.Echo "index of " & delimiter & " is: " & c - Len(delimiter)
Next
Trial:
C:\test> cscript //nologo test.vbs
index of str is: 8
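For readers coming from the JavaScript side of the question, the same split-and-accumulate idea can be sketched as follows. The function name positionsViaSplit is made up for this example, the returned positions are 0-based (add 1 to compare with InStr results), and the delimiter is assumed to be non-empty.

```javascript
// Find every 0-based position of `delimiter` in `text` by splitting on it and
// accumulating the lengths of the pieces in between.
function positionsViaSplit(text, delimiter) {
  const parts = text.split(delimiter);
  const positions = [];
  let offset = 0;
  // The last piece is not followed by a delimiter, so stop one short.
  for (let i = 0; i < parts.length - 1; i++) {
    offset += parts[i].length;  // skip the piece before this occurrence
    positions.push(offset);     // the delimiter starts here
    offset += delimiter.length; // skip the delimiter itself
  }
  return positions;
}

console.log(positionsViaSplit('thisismystring', 's'));   // [ 3, 5, 8 ]
console.log(positionsViaSplit('thisismystring', 'str')); // [ 8 ]
```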
