I have this string; it represents a chart of accounts:
"1: Comptes de capitaux
10. Capital et Réserves.
101. Capital.
1011. Capital souscrit - non appelé.
1012. Capital souscrit - appelé, non versé.
1013. Capital souscrit - appelé, versé.
10131. Capital non amorti.
10132. Capital amorti.
1018. Capital souscrit soumis à une réglementation particulière.
105. Ecarts de réévaluation.
108. Compte de l'exploitant. "
I want this output :
"1":
{ id:"1",
accountName: "Comptes de capitaux",
children:{
id:"10",
accountName: "Capital et Réserves" ,
children:{
id:"101",
accountName:"Capital",
children: {
id:"1011",
accountName:"Capital souscrit - non appelé",
children: {...}
},{
id:"1012",
accountName:"Capital souscrit - appelé, non versé",
children: {...}
},...
},{
id:"105",
accountName:"Ecarts de réévaluation",
children: {}
},{...}
}
}
The solution should correctly create the tree structure, with the parent-child relationships based on the account ID. This is my solution, and it is always wrong: it does not create the tree structure above. Can you provide the right answer, or point out the error I have made in my code?
function createChartOfAccountsTree(input) {
// Split the input by line
const lines = input.split("\n");
// Create an object to store the accounts
let accounts = {};
// Loop through each line
for (let i = 0; i < lines.length; i++) {
// Split the line by space
let parts = lines[i].split(" ");
// Extract the ID and account name
let id = parts[0];
let accountName = parts.slice(1).join(" ");
// remove the dot from account name
accountName = accountName.replace(".","");
// Create an account object
let account = {
id,
accountName,
children: {}
};
// Check if the account is a child of an existing account
let parentId = id.slice(0, -1);
let parent = accounts[parentId];
if (parent) {
// If the account has a parent, add it as a child
parent.children[id] = account;
} else {
// If the account does not have a parent, it's a top level account
accounts[id] = account;
}
}
return accounts;
}
console.log(createChartOfAccountsTree("1 : Comptes de capitaux\n10. Capital et Réserves.\n101.Capital.\n1011. Capital souscrit - non appelé.\n1012. Capital souscrit - appelé, non versé.\n1013. Capital souscrit - appelé, versé.\n10131. Capital non amorti.\n10132. Capital amorti.\n1018. Capital souscrit soumis à une réglementation particulière.\n105. Ecarts de réévaluation.\n108. Compte de l'exploitant. "));
There are a few issues:
The input seems to not have the same format on each line. In the example, the first line has a colon after the number (or a space and a colon -- as in your code), while other lines have a point after the number.
In either case, your current code does not remove those delimiters when they come right after the initial number. So there is no child-parent match when you just remove the final character from the id to identify the parent -- as you only remove that point or colon (when there is no space before it). If the input format is really that varying, then you could use a regular expression to identify the two parts in each line.
The code seems to want children to be a plain object whose keys are id values, but your desired output section does not specify such keys -- it represents children as if it is an array but with an invalid syntax.
As you want to identify each parent with accounts[parentId], you must make sure that every entry is logged in accounts, not just the top level. So the else should be revised. Maybe use a separate variable that serves for logging the top level accounts only.
Here is your code with the above issues handled:
function createChartOfAccountsTree(input) {
// Split the input by line, and immediately identify
// the numeric prefix, and exclude final point
const lines = input.matchAll(/^(\d+)[ .:]+(.*?)\.?[ ]*$/gm);
// Create an object to store all the accounts
const accounts = {};
// ...and one for the top-level accounts
const top = {};
// Loop through each line, grabbing the parts
for (const [, id, accountName] of lines) {
// Create an account object
let account = {
id,
accountName,
children: {}
};
// Check if the account is a child of an existing account
let parentId = id.slice(0, -1);
let parent = accounts[parentId];
if (parent) {
// If the account has a parent, add it as a child
parent.children[id] = account;
} else {
// If the account does not have a parent, it's a top level account
top[id] = account;
}
// Log all accounts:
accounts[id] = account;
}
return top;
}
const input = `1: Comptes de capitaux
10. Capital et Réserves.
101. Capital.
1011. Capital souscrit - non appelé.
1012. Capital souscrit - appelé, non versé.
1013. Capital souscrit - appelé, versé.
10131. Capital non amorti.
10132. Capital amorti.
1018. Capital souscrit soumis à une réglementation particulière.
105. Ecarts de réévaluation.
108. Compte de l'exploitant. `;
console.log(createChartOfAccountsTree(input));
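If the array form hinted at in the desired output is what is actually wanted, the same approach could be adapted roughly as follows. This is only a sketch: the name createChartOfAccountsArrayTree is made up for illustration, and it assumes (like the code above) that a parent's id is the child's id minus its last digit and that parents appear before their children.

```javascript
// Sketch: same parsing as above, but `children` is an array instead of an object.
function createChartOfAccountsArrayTree(input) {
  const byId = new Map(); // every account seen so far, keyed by id
  const roots = [];       // accounts for which no parent was found
  const lines = input.matchAll(/^(\d+)[ .:]+(.*?)\.?[ ]*$/gm);
  for (const [, id, accountName] of lines) {
    const account = { id, accountName, children: [] };
    // Assumption: the parent id is the id without its last digit
    const parent = byId.get(id.slice(0, -1));
    (parent ? parent.children : roots).push(account);
    byId.set(id, account);
  }
  return roots;
}

const sample = "1: Comptes de capitaux\n10. Capital et Réserves.\n101. Capital.\n105. Ecarts de réévaluation.";
console.log(JSON.stringify(createChartOfAccountsArrayTree(sample), null, 2));
```

Using an array keeps the accounts in input order and serializes to valid JSON, at the cost of losing direct lookup of a child by id.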
I will receive an array of string-like below.
In each string, there may be three signs: $,%,* in the string
For example,
“I would $rather %be $happy, %if working in a chocolate factory”
“It is ok to play tennis”
“Tennis $is a good sport”
“AO is really *good sport”
However, there may be no signs in it, maybe only one sign in it.
There are only five cases in string,
1. no sign at all,
2. having both $ and %,
3. having only $,
4. having only %,
5. having only *.
If there is no sign, I don’t need to process it.
Otherwise, I need to process it and add an indicator to the left of the first sign that occurs in the sentence.
For example:
“I would ---dollarAndperSign---$rather %be $happy, %if working in a chocolate factory”
“Tennis ---dollarSign---$is a good sport”
This is my idea code.
So, I need to decide if the string contains any sign. If there is no sign, I don’t need to process it.
texts.map((text) => {
if (text.includes("$") || text.includes("%") || text.includes("*")) {
//to get the index of signs
let indexOfdollar, indexOfper, indexOfStar;
indexOfdollar = text.indexOf("$");
indexOfper = text.indexOf("%");
indexOfStar = text.indexOf("*");
//return a completed process text
}
});
Question:
How do I know which index is the smallest, in order to locate the position of the first sign occurring in the text? Simply taking the smallest value may not be correct, because I may get -1 from the code above.
I focussed only on the "get the smallest index" part of your question... Since you will be able to do what you want with it after.
You can have the indexOf() in an array, filter it to remove the -1 and then use Math.min() to get the smallest one.
Edited to output an object instead, which includes the first index and some booleans for the presence of each char.
const texts = [
"I would $rather %be $happy, %if working in a chocolate factory",
"It is ok to play tennis",
"Tennis $is a good sport",
"AO is really *good sport"
]
const minIndexes = texts.map((text,i) => {
//to get the signs
const hasDollard = text.indexOf("$") >= 0
const hasPercent = text.indexOf("%") >= 0
const hasStar = text.indexOf("*") >= 0
//to get the first index
const indexes = [text.indexOf("$"), text.indexOf("%"), text.indexOf("*")].filter((index) => index >= 0)
if(!indexes.length){
return null
}
return {
index: Math.min( ...indexes),
hasDollard,
hasPercent,
hasStar
}
});
console.log(minIndexes)
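Once the smallest index is known, inserting the indicator is a matter of slicing the string at that position. The following is a rough sketch rather than a definitive solution: addIndicator and LABELS are hypothetical names, and the label wording is only guessed from the examples in the question. It uses String.prototype.search with a character class, which returns the first index directly (or -1), instead of combining three indexOf calls.

```javascript
// Hypothetical labels; the question does not fix the exact wording.
const LABELS = { $: "dollar", "%": "per", "*": "star" };

function addIndicator(text) {
  // Index of the first sign, or -1 when the text contains none
  const first = text.search(/[$%*]/);
  if (first === -1) return text; // nothing to process
  // Which of the signs occur anywhere in the text
  const present = Object.keys(LABELS).filter((sign) => text.includes(sign));
  const label = present.map((sign) => LABELS[sign]).join("And") + "Sign";
  // Insert the indicator immediately to the left of the first sign
  return text.slice(0, first) + "---" + label + "---" + text.slice(first);
}

console.log(addIndicator("Tennis $is a good sport"));
```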
const texts = [
"I would $rather %be $happy, %if working in a chocolate factory",
"It is ok to play tennis",
"Tennis $is a good sport",
"AO is really *good sport"
]
texts.forEach(text => {
let sighs = ["%","$","*"];
let chr = text.split('').find(t => sighs.find(s => s==t));
if (!chr)
return;
text = text.replace(chr, "---some text---" + chr);
console.log(text);
})
const data = ['I would $rather %be $happy, %if working in chocolate factory', 'It is ok to play tennis', 'Tennis $is a good sport', 'AO is really *good sport']
const signs = { $: 'dollar', '%': 'per', '*': 'star' }
const replace = s => {
// Collect the sign characters in order of appearance
const characters = Array.from(s).filter(c => '$%*'.includes(c))
// Deduplicate them and map each sign to its label
const headText = [...new Set(characters)].map(c => signs[c]).join('|')
// Insert the label before the first sign; strings without signs are returned unchanged
return s.replace(/[$%*]/, `--${headText}--$&`)
}
const result = data.map(replace)
I am new to JavaScript. I have tried a lot, but could not find information on how to count the occurrences of multiple sub-strings in a long string in one pass.
Further information: I need to get the number of occurrences of these sub-strings, and if a sub-string occurs too often, I need to replace it. So I need to get all the counts at one time.
Here is an example:
The long string Text as below,
Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.
The sub-string is a question, but what I need is to count the occurrences of each word of this sub-string in one pass: for example, the words "name", "NFL", "championship", "game", "is" and "the" in this string.
What is the name of NFL championship game?
One of the problems is that some of these words are not in the text, while others appear many times (and those I might replace).
The code I have tried is below. It is wrong; I have tried many different ways, but with no good results.
$(".showMoreFeatures").click(function(){
var text= $(".article p").text(); // This is to get the text.
var textCount = new Array();
// Because I use match, the first word "what" will return null, so
// this is to avoid that null. I was planning to get the count, and if it is
// more than 7 or even more, I will replace them.
var qus = item2.question; //This is to get the sub-string
var checkQus = qus.split(" "); // I split the question to words
var newCheckQus = new Array();
// This is the array where I planned to put the sub-strings whose count is less than 7, which are the words I really need.
var count = new Array();
// Because the sub-string is a question with many words, I planned to get their counts and put them in an array.
for(var k =0; k < checkQus.length; k++){
textCount = text.match(checkQus[k],"g")
if(textCount == null){
continue;
}
for(var j =0; j<checkQus.length;j++){
count[j] = textCount.length;
}
//count++;
}
I tried many different ways and searched a lot, but with no good results. The above code is only meant to show what I have tried and my thinking (which might be totally wrong); it does not actually work. If you know how to implement this and solve my problem, please just tell me; there is no need to correct my code.
Thanks very much.
If I have understood the question correctly then it seems you need to count the number of times the words in the question (que) appear in the text (txt)...
var txt = "Super Bowl 50 was an American ...etc... Arabic numerals 50.";
var que = "What is the name of NFL championship game?";
I'll go through this in vanilla JavaScript and you can transpose it for JQuery as required.
First of all, to focus on the text we can make things a little simpler by changing the strings to lowercase and removing some of the punctuation.
// both strings to lowercase
txt = txt.toLowerCase();
que = que.toLowerCase();
// remove punctuation
// using double \\ for proper regular expression syntax
var puncArray = ["\\,", "\\.", "\\(", "\\)", "\\!", "\\?"];
puncArray.forEach(function(P) {
// create a regular expresion from each punctuation 'P'
var rEx = new RegExp( P, "g");
// replace every 'P' with empty string (nothing)
txt = txt.replace(rEx, '');
que = que.replace(rEx, '');
});
Now we can create a cleaner array from txt and que, as well as a hash table from que, like so...
// Arrays: split at every space
var txtArray = txt.split(" ");
var queArray = que.split(" ");
// Object, for storing 'que' counts
var queObject = {};
queArray.forEach(function(S) {
// create 'queObject' keys from 'queArray'
// and set value to zero (0)
queObject[S] = 0;
});
queObject will be used to hold the words counted. If you were to console.debug(queObject) at this point it would look something like this...
console.debug(queObject);
/* =>
queObject = {
what: 0,
is: 0,
the: 0,
name: 0,
of: 0,
nfl: 0,
championship: 0,
game: 0
}
*/
Now we want to test each element in txtArray to see if it contains any of the elements in queArray. If the test is true we'll add +1 to the equivalent queObject property, like this...
// go through each element in 'queArray'
queArray.forEach(function(A) {
// create regular expression for testing
var rEx = new RegExp( A );
// test 'rEx' against elements in 'txtArray'
txtArray.forEach(function(B) {
// is 'A' in 'B'?
if (rEx.test(B)) {
// increase 'queObject' property 'A' by 1.
queObject[A]++;
}
});
});
We use RegExp test method here rather than String match method because we just want to know if "is A in B == true". If it is true then we increase the corresponding queObject property by 1. This method will also find words inside words, such as 'is' in 'San Francisco' etc.
All being well, logging queObject to the console will show you how many times each word in the question appeared in the text.
console.debug(queObject);
/* =>
queObject = {
what: 0
is: 2
the: 17
name: 0
of: 2
nfl: 1
championship: 0
game: 4
}
*/
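If words inside words should not be counted, one alternative is to compare whole tokens instead of testing with a regular expression. The sketch below is a variation on the approach above, not a replacement for it: countQuestionWords is a hypothetical name, and it assumes that a "word" is any run of ASCII letters, digits or apostrophes.

```javascript
// Count how often each word of `question` occurs as a whole word in `text`.
function countQuestionWords(text, question) {
  // Lowercase, then split on anything that is not a letter, digit or apostrophe
  const tokenize = (s) => s.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);
  const counts = new Map();
  for (const word of tokenize(question)) counts.set(word, 0);
  // Single pass over the text: only exact token matches are counted
  for (const token of tokenize(text)) {
    if (counts.has(token)) counts.set(token, counts.get(token) + 1);
  }
  return Object.fromEntries(counts);
}

console.log(countQuestionWords(
  "The game is on. This is the game, the NFL game.",
  "What is the name of NFL championship game?"
));
```

With this variant 'is' is no longer found inside 'this', and the text is scanned once regardless of how many words the question contains.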
Hope that helps. :)
See MDN for more information on:
Array.forEach()
Object.keys()
RegExp.test()
I want to filter the following information out of a long piece of text, which I copy
and paste into a text field and then want to process into a table with these columns:
Name
Address
Status
Example snippet:(Kind of randomized the names and addresses etc)
Thuisprikindeling voor: Vrijdag 15 Mei 2015 DE SMART BON 22 afspraken
Pagina 1/4
Persoonlijke mededeling:
Algemene mededeling:
Prikpostgegevens: REEK-Eeklo extern, (-)
Telefoonnummer Fax Mobiel 0499/9999999 Email dummy.dummy#gmail.com
DUMMY FOO V Stationstreet 2 8000 New York F N - Sober BSN: 1655
THUIS Analyses: Werknr: PIN: 000000002038905
Opdrachtgever: Laboratorium Arts:
Mededeling: Some comments // VERY DIFFICULT
FO DUMMY FOO V Butterstreet 6 8740 Melbourne F N - Sober BSN: 15898
THUIS Analyses: Werknr: AFD 3 PIN: 000000002035900
Opdrachtgever: Laboratorium Arts:
Mededeling: ZH BLA / BLA BLA - AFD 3 - SOCIAL BEER
JOHN FOOO V Waterstreet 1 9990 Rome F N - Sober BSN: 17878
THUIS / Analyses: Werknr: K111 PIN: 000000002037888
Opdrachtgever: Laboratorium Arts:
Mededeling: TRYOUT/FOO
FO SMOOTH M.FOO M Queen Elisabethstreet 19 9990 Paris F NN - Not Sober BSN: 14877
What I want to get out of it is this:
DUMMY FOO Stationstreet 2 8000 New York Sober
FO DUMMY FOO Butterstreet 6 8740 Melbourne Sober
JOHN FOOO Waterstreet 1 9990 Rome Sober
FO SMOOTH M.FOO Queen Elisabethstreet 19 9990 Paris Not sober
My strategy for the moment is using the following:
Filter all the lines with at least two words in capitals at the beginning of the line. AND a 4 digit postal code.
Then discard all the other lines, as I only need the lines with the names and addresses
Then I strip out all the information needed for that line
Strip the name / address / status
I use the following code:
//Regular expressions
//Filter all lines which start with at least two UPPERCASE words following a space
pattern = /^(([A-Z'.* ]{2,} ){2,}[A-Z]{1,})(?=.*BSN)/;
postcode = /\d{4}/;
searchSober= /(N - Sober)+/;
searchNotSober= /(NN - Not sober)+/i;
adres = inputText.split('\n');
for (var i = 0; i < adres.length; i++) {
// If in one line And a postcode and which starts with at least
// two UPPERCASE words following a space
temp = adres[i]
if ( pattern.test(temp) && postcode.test(temp)) {
//Remove BSN in order to be able to use digits to sort out the postal code
temp = temp.replace( /BSN.*/g, "");
// Example: DUMMY FOO V Stationstreet 2 8000 New York F N - Sober
//Selection of the name, always take first part of the array
// DUMMY FOO
var name = temp.match(/^([-A-Z'*.]{2,} ){1,}[-A-Z.]{2,}/)[0];
//remove the name from the string
temp = temp.replace(/^([-A-Z'*.]{2,} ){1,}[-A-Z.]{2,}/, "");
// V Stationstreet 2 8000 New York F N - Sober
//filter out gender
//Using jquery trim for whitespace trimming
// V
var gender = $.trim(temp.match(/^( [A-Z'*.]{1} )/)[0]);
//remove gender
temp = temp.replace(/^( [A-Z'*.]{1} )/, "");
// Stationstreet 2 8000 New York F N - Sober
//looking for status
var status = "unknown";
if ( searchNotSober.test(temp) ) {
status = "Not sober";
}
else if ( searchSober.test(temp) ) {
status = "Sober";
}
else {
status = "unknown";
}
//Selection of the address /^.*[0-9]{4}.[\w-]{2,40}/
//Stationstreet 2 8000 New York
var address = $.trim(temp.match(/^.*[0-9]{4}.[\w-]{2,40}/gm));
//assemble into person object.
var person={name: name + "", address: address + "", gender: gender +"", status:status + "", location:[] , marker:[]};
result.push(person);
}
}
The problem I have now is that:
Sometimes the names are not written in CAPITALS
Sometimes the postal code is not added so my code just stops working.
Sometimes they put a * in front of the name
A broader question is what strategy can you take to tackle these type of messy input problems?
Should I make cases for every mistake I see in these snippets I get? I feel like
I don't really know exactly what I will get out of this piece of code every time I run
it with different input.
Here is a general way of handling it:
Find all lines that are most likely matches. Match on "Sober" or whatever makes it unlikely to miss a match, even if it gives you false positives.
Filter out false positives, this you have to update and tweak as you go. Make sure you only filter out what isn't relevant at all.
Strict filtering of input, what doesn't match gets logged/reported for manual handling, what does match now conforms to a known strict pattern
Normalize and extract data should now be much easier since you have limited possible input at this stage
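The steps above could be sketched as follows. Everything here is an assumption made for illustration: the names parsePeople and STRICT are hypothetical, and the strict pattern is only inferred from the sample lines in the question (name, a V or M gender letter, address, a lone F, then the status), so it would need tuning against real input.

```javascript
// Strict pattern for one person line; inferred from the sample data only.
// Optional leading "*", lazy name, gender letter, lazy address, "F", status.
const STRICT = /^\*?\s*(.+?)\s+([VM])\s+(.+?)\s+F\s+(NN - Not [Ss]ober|N - [Ss]ober)\b/;

function parsePeople(inputText) {
  const people = [];
  const rejected = []; // candidate lines that need manual handling
  for (const line of inputText.split("\n")) {
    // Step 1: loose match, unlikely to miss a person line
    if (!/sober/i.test(line)) continue;
    // Step 3: strict validation; anything else is reported, not guessed at
    const m = STRICT.exec(line);
    if (!m) {
      rejected.push(line);
      continue;
    }
    // Step 4: normalize and extract
    const [, name, gender, address, status] = m;
    people.push({ name, gender, address, status: /not/i.test(status) ? "Not sober" : "Sober" });
  }
  return { people, rejected };
}
```

Because the name and address groups accept any characters, lowercase names, a missing postal code and a leading * no longer break the extraction, while lines that do not fit end up in rejected instead of stopping the script.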
Given a predefined set of phrases, I'd like to perform a search based on user's query. For example, consider the following set of phrases:
index  phrase
-----------------------------------------
0      Stack Overflow
1      Math Overflow
2      Super User
3      Webmasters
4      Electrical Engineering
5      Programming Jokes
6      Programming Puzzles
7      Geographic Information Systems
The expected behaviour is:
query      result
------------------------------------------------------------------------
s          Stack Overflow, Super User, Geographic Information Systems
web        Webmasters
over       Stack Overflow, Math Overflow
super u    Super User
user s     Super User
e e        Electrical Engineering
p          Programming Jokes, Programming Puzzles
p p        Programming Puzzles
To implement this behaviour I used a trie. Every node in the trie has an array of indices (empty initially).
To insert a phrase to the trie, I first break it to words. For example, Programming Puzzles has index = 6. Therefore, I add 6 to all the following nodes:
p
pr
pro
prog
progr
progra
program
programm
programmi
programmin
programming
pu
puz
puzz
puzzl
puzzle
puzzles
The problem is, when I search for the query prog p, I first get a list of indices for prog which is [5, 6]. Then, I get a list of indices for p which is [5, 6] as well. Finally, I calculate the intersection between the two, and return the result [5, 6], which is obviously wrong (should be [6]).
How would you fix this?
Key Observation
We can use the fact that two words in a query can match the same word in a phrase only if one query word is a prefix of the other (or if they are the same). So if we process the query words in descending lexicographic order (prefixes come after their "superwords"), we can safely remove words from the phrases at the first match. Doing so leaves no possibility of matching the same phrase word twice. It is safe because a prefix matches a superset of the phrase words that its "superwords" can match, and a pair of query words where neither is a prefix of the other always matches disjoint sets of phrase words.
We don't have to remove words from phrases or the trie "physically", we can do it "virtually".
Implementation of the Algorithm
var PhraseSearch = function () {
var Trie = function () {
this.phraseWordCount = {};
this.children = {};
};
Trie.prototype.addPhraseWord = function (phrase, word) {
if (word !== '') {
var first = word.charAt(0);
if (!this.children.hasOwnProperty(first)) {
this.children[first] = new Trie();
}
var rest = word.substring(1);
this.children[first].addPhraseWord(phrase, rest);
}
if (!this.phraseWordCount.hasOwnProperty(phrase)) {
this.phraseWordCount[phrase] = 0;
}
this.phraseWordCount[phrase]++;
};
Trie.prototype.getPhraseWordCount = function (prefix) {
if (prefix !== '') {
var first = prefix.charAt(0);
if (this.children.hasOwnProperty(first)) {
var rest = prefix.substring(1);
return this.children[first].getPhraseWordCount(rest);
} else {
return {};
}
} else {
return this.phraseWordCount;
}
}
this.trie = new Trie();
}
PhraseSearch.prototype.addPhrase = function (phrase) {
var words = phrase.trim().toLowerCase().split(/\s+/);
words.forEach(function (word) {
this.trie.addPhraseWord(phrase, word);
}, this);
}
PhraseSearch.prototype.search = function (query) {
var answer = {};
var phraseWordCount = this.trie.getPhraseWordCount('');
for (var phrase in phraseWordCount) {
if (phraseWordCount.hasOwnProperty(phrase)) {
answer[phrase] = true;
}
}
var prefixes = query.trim().toLowerCase().split(/\s+/);
prefixes.sort();
prefixes.reverse();
var prevPrefix = '';
var superprefixCount = 0;
prefixes.forEach(function (prefix) {
if (prevPrefix.indexOf(prefix) !== 0) {
superprefixCount = 0;
}
phraseWordCount = this.trie.getPhraseWordCount(prefix);
function phraseMatchedWordCount(phrase) {
return phraseWordCount.hasOwnProperty(phrase) ? phraseWordCount[phrase] - superprefixCount : 0;
}
for (var phrase in answer) {
if (answer.hasOwnProperty(phrase) && phraseMatchedWordCount(phrase) < 1) {
delete answer[phrase];
}
}
prevPrefix = prefix;
superprefixCount++;
}, this);
return Object.keys(answer);
}
function test() {
var phraseSearch = new PhraseSearch();
var phrases = [
'Stack Overflow',
'Math Overflow',
'Super User',
'Webmasters',
'Electrical Engineering',
'Programming Jokes',
'Programming Puzzles',
'Geographic Information Systems'
];
phrases.forEach(phraseSearch.addPhrase, phraseSearch);
var queries = {
's': 'Stack Overflow, Super User, Geographic Information Systems',
'web': 'Webmasters',
'over': 'Stack Overflow, Math Overflow',
'super u': 'Super User',
'user s': 'Super User',
'e e': 'Electrical Engineering',
'p': 'Programming Jokes, Programming Puzzles',
'p p': 'Programming Puzzles'
};
for(var query in queries) {
if (queries.hasOwnProperty(query)) {
var expected = queries[query];
var actual = phraseSearch.search(query).join(', ');
console.log('query: ' + query);
console.log('expected: ' + expected);
console.log('actual: ' + actual);
}
}
}
One can test this code here: http://ideone.com/RJgj6p
Possible Optimizations
Storing the phrase word count in each trie node is not very memory
efficient. But by implementing compressed trie it is possible to
reduce the worst case memory complexity to O(n m), where n is the
number of different words in all the phrases, and m is the total
number of phrases.
For simplicity I initialize answer by adding all the phrases. But
a more time efficient approach is to initialize answer by adding
the phrases matched by the query word matching least number of
phrases. Then intersect with the phrases of the query word matching
second least number of phrases. And so on...
Relevant Differences from the Implementation Referenced in the Question
In trie node I store not only the phrase references (ids) matched by the subtrie, but also the number of matched words in these phrases. So, the result of the match is not only the matched phrase references, but also the number of matched words in these phrases.
I process query words in descending lexicographic order.
I subtract the number of superprefixes (query words of which the current query word is a prefix) from current match results (by using variable superprefixCount), and a phrase is considered matched by the current query word only when the resulting number of matched words in it is greater than zero. As in the original implementation, the final result is the intersection of the matched phrases.
As one can see, changes are minimal and asymptotic complexities (both time and memory) are not changed.
If the set of phrases is defined and does not contain long phrases, maybe you can create not 1 trie, but n tries, where n is the maximum number of words in one phrase.
In i-th trie store i-th word of the phrase. Let's call it the trie with label 'i'.
To process query with m words let's consider the following algorithm:
For each phrase we will store the lowest label of a trie, where the word from this phrase was found. Let's denote it as d[j], where j is the phrase index. At first for each phrase j, d[j] = -1.
Search the first word in each of n tries.
For each phrase j find the label of a trie that is greater than d[j] and where the word from this phrase was found. If there are several such labels, pick the smallest one. Let's denote such label as c[j].
If there is no such index, this phrase can not be matched. You can mark this case with d[j] = n + 1.
If there is such c[j] that c[j] > d[j], then assign d[j] = c[j].
Repeat for every word left.
Every phrase with -1 < d[j] < n is matched.
This is not very optimal. To improve performance you should store only usable values of d array. After first word, store only phrases, matched with this word. Also, instead of assignment d[j] = n + 1, delete index j. Process only already stored phrase indexes.
You can solve it as a Graph Matching Problem in a Bipartite Graph.
For each document, query pair define the graph:
G=(V,E) Where
V = {t1 | for each term t1 in the query} U { t2 | for each term t2 in the document}
E = { (t1,t2) | t1 is a match for t2 }
Intuitively: you have a vertex for each term in the query, a vertex for each term in the document, and an edge between a document term and a query term, only if the query term matches the document term. You have already solved this part with your trie.
You got yourself a bipartite graph, there are only edges between the "query vertices" and the "document vertices" (and not between two vertices of the same type).
Now, invoke a matching problem for bipartite graph, and get an optimal matching {(t1_1,t2_1), ... , (t1_k,t2_k)}.
Your algorithm should return a document d for a query q with m terms in the query, if (and only if) all m terms are satisfied, which means - you have maximal matching where k=m.
In your example, the graph for query="prog p", and document="Programming Jokes", you will get the bipartite graph with the matching: (or with Programming,p matched - doesn't matter which)
And, for the same query, and document="Programming Puzzles", you will get the bipartite graph with the matching:
As you can see, for the first example - there is no matching that covers all the terms, and you will "reject" the document. For the 2nd example - you were able to match all terms, and you will return it.
For performance issues, you can do the suggested algorithm only on a subset of the phrases, that were already filtered out by your initial approach (intersection of documents that have matching for all terms).
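As a rough illustration of the matching step, here is a sketch that checks a single phrase with the classic augmenting-path method for bipartite matching. The function name matchesAllPrefixes is hypothetical, and the edge test is done with startsWith for simplicity, whereas in the setup described above the edges would come from the trie. Both word lists are assumed to be lowercased already.

```javascript
// Returns true when every query prefix can be assigned its own phrase word.
function matchesAllPrefixes(queryWords, phraseWords) {
  // owner[p] = index of the query word currently matched to phrase word p, or -1
  const owner = new Array(phraseWords.length).fill(-1);

  function tryAssign(q, visited) {
    for (let p = 0; p < phraseWords.length; p++) {
      // Edge (q, p) exists only if the query word is a prefix of the phrase word
      if (visited[p] || !phraseWords[p].startsWith(queryWords[q])) continue;
      visited[p] = true;
      // Take a free phrase word, or re-route the query word that holds it
      if (owner[p] === -1 || tryAssign(owner[p], visited)) {
        owner[p] = q;
        return true;
      }
    }
    return false;
  }

  // The phrase is returned only if the matching covers all query words (k = m)
  return queryWords.every((_, q) =>
    tryAssign(q, new Array(phraseWords.length).fill(false))
  );
}

console.log(matchesAllPrefixes(["prog", "p"], ["programming", "jokes"]));   // false
console.log(matchesAllPrefixes(["prog", "p"], ["programming", "puzzles"])); // true
```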
After some thought I came up with a similar idea to dened's - in addition to the index of a matched phrase, each prefix will refer to how many words it is a prefix of in that phrase - then that number can be reduced in the query process by the number of its superfixes among other query words, and the returned results include only those with at least the same number of matched words as the query.
We can implement an additional small tweak to avoid large cross-checks by adding (for the English language) a maximum of approximately 26 choose 2 + 26 choose 3 and even an additional 26 choose 4 special elements to the trie that refer to ordered first-letter intersections. When a phrase is inserted, the special elements in the trie referring to the 2 and 3 first-letter combinations will receive its index. Then match results from larger query words can be cross-checked against these. For example, if our query is "Geo i", the match results for "Geo" would be cross-checked against the special trie element, "g-i", which hopefully would have significantly less match results than "i".
Also, depending on the specific circumstances, large cross-checks could at times be more efficiently handled in parallel (for example, via a bitset &).
Hello all,
I have an odd problem:
//dataText hold current language data that's gathered from another function
//pick one to test it out
//if english data gathered
var dataText = ["Data uploads"];
//if french data gathered
var dataText = ["Envois de données"];
function lang_lib(lang) {
var data_fre = [13, 'Envois de données'];
var data_eng = [14, 'Data uploads'];
var data_lang, rep_lang;
switch(lang) {
case "English":
data_lang = data_eng;
data_rep = rep_eng;
break;
case "Français":
data_lang = data_fre;
data_rep = rep_fre;
break;
default:
$('table.infobox tbody').append('<tr><td id="lang-fail"><ul class="first last"><li>User language is not available</li></ul></td></tr>');
};
this.data_uploads = data_lang[1];
}
_lang = new lang_lib($('#toplinks-language').text());
//if lang_lib("English")
alert($.inArray(_lang.data_uploads, dataText)); // 0
//if lang_lib("Français")
alert($.inArray(_lang.data_uploads, dataText)); // -1
I shortened the code but it should give a general idea of what I'm trying to achieve.
I know it seems weird that I would be using the same data in two arrays, but data_fre and data_eng hold language-specific dataText info plus other language-specific data as well. dataText will hold data that is not tied to a known language, which is why I'm testing it against data_fre or data_eng to find which language to use.
I can't figure out why it would return -1 because I have other languages set (with special character too like Russian text) and they all return 0.
Appreciate the help :)
-1 means false. 0 means 'at position 0'. Without knowing more about the data coming in, I expect it is working properly.
Strings do not match numbers.
Simple Test
var arr = [13, 'Envois de données'];
console.log($.inArray(13,arr)); // 0 - the number 13 is found at index 0
console.log($.inArray("13",arr)); // -1 - the string "13" does not match the number 13
Ok I figured out what it was.
I used $.trim() in the function that collects data for dataText. Since I couldn't see any leading or trailing spaces when I alert()ed the value, it was confusing me why it wouldn't work.
This explains why $.inArray() wouldn't match "Envois de données" with "Envois de données ".
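A small sketch of the effect, using plain Array.prototype.indexOf and String.prototype.trim instead of the jQuery helpers so that it runs on its own. The variable names collected and wanted are made up for illustration.

```javascript
// Assumed situation: the collected value carries an invisible trailing space.
const collected = ["Envois de données "];
const wanted = "Envois de données";

// Strict comparison fails because of the extra space
console.log(collected.indexOf(wanted)); // -1

// Trimming every collected value before the lookup fixes the match
const trimmed = collected.map((s) => s.trim());
console.log(trimmed.indexOf(wanted)); // 0

// JSON.stringify adds quotes, which makes stray whitespace visible when debugging
console.log(JSON.stringify(collected[0]));
```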
Thanks again everybody for taking a look :)