How to make optional word in PEG.js - javascript

I'm trying to build a simple parser with PEG.js. I want the user to be able to input a series of keywords, with an optional "AND" between them, but I can't seem to get the optional and working. It always expects it, even though I've marked it with a ? (zero or one).
Paste this grammar into http://pegjs.majda.cz/online:
parse = pair+
pair = p:word and? { return p }
word = w:char+ { return w.join(""); }
char = c:[^ \r\n\t] { return c; }
and = ws* 'and'i ws*
ws = [ \t]
My goal is to have either of these inputs parse into an array of ["foo", "bar"]:
foo bar
foo and bar

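OK, never mind, I figured it out. The optional whitespace preceding the 'and' was part of the and rule. On input like "foo bar" the rule consumed the space, failed to find 'and', and was abandoned as a whole, so the space stayed unconsumed, the next word could not start, and the parser reported that it expected "and". I just needed to move the whitespace out of the and rule and into the pair rule, like so: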
parse = pair+
pair = p:word ws* and? { return p }
word = w:char+ { return w.join(""); }
char = c:[^ \r\n\t] { return c; }
and = 'and'i ws*
ws = [ \t]

I know this is a very old question, but seeing how it wasn't answered and someone might stumble upon it I'd like to submit my answer:
Program
= w1:$Word _ wds:(_ "and"? _ w:$Word { return w; })* _ {
wds.unshift(w1);
return wds;
}
Word
= [a-zA-Z]+
_
= [ ,\t]*
The "and" is optional and you can add as many words as you want. The parser will skip every "and" and return a list of the words. I took the liberty of skipping commas as well.
You can try it out at https://pegjs.org/online with a string like:
carlos, peter, vincent, thomas, and shirly
Hope it helps someone.

Here is my answer:
start
= words
words
= head:word tail:(and (word))* {
var words = [head];
tail.forEach(function (word) {
word = word[1];
words.push(word);
})
return words;
}
and
= ' and '
/ $' '+
word
= $chars
chars 'chars'
= [a-zA-Z0-9]+

Related

Group array with two words, rather than one

CODE BELOW: Each time a word is typed, the code stores it under its own key with an array of positions, so every single word gets its own array, which is later checked for recurrences.
What I want: instead of creating an array for one word (after the spacebar has been hit), I want it to do so after two words have been written.
I.e., instead of me writing "hello" + spacebar and the code creating an array for "hello", I'd like it to wait until I've written "hello my" + spacebar and then create an array for those two words.
I am guessing this has something to do with the regular expression?
I've tried many different things (I'm a bit of a newbie) and I cannot work out how to get it to group two words together rather than one.
const count = (text) => {
const wordRegex = new RegExp(`([\\p{Alphabetic}\]+)`, 'gu');
let result;
const words = {};
while ((result = wordRegex.exec(text)) !== null) {
const word = result[0].toLowerCase();
if (!words[word]) {
words[word] = [];
}
words[word].push(result.index);
words[word].push(result.index + word.length);
}
return words;
};
You may use
const wordRegex = /\p{Alphabetic}+(?:\s+\p{Alphabetic}+)?/gu;
Details
\p{Alphabetic}+ - 1+ alphabetic chars
(?:\s+\p{Alphabetic}+)? - an optional sequence of:
\s+ - 1+ whitespaces
\p{Alphabetic}+ - 1+ alphabetic chars
The second word is matched optionally so that the final odd word could be matched, too.
See the JS demo below:
const count = (text) => {
const wordRegex = /\p{Alphabetic}+(?:\s+\p{Alphabetic}+)?/gu;
let result;
const words = {};
while ((result = wordRegex.exec(text)) !== null) {
const word = result[0].toLowerCase();
if (!words[word]) {
words[word] = [];
}
words[word].push(result.index);
words[word].push(result.index + word.length);
}
return words;
};
console.log(count("abc def ghi"))
A RegExp constructor way of defining this regex is
const wordRegex = new RegExp("\\p{Alphabetic}+(?:\\s+\\p{Alphabetic}+)?", "gu");
However, since the pattern is static and no variables are used to build it, you can use the regex literal notation shown at the top of the answer.
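The constructor form becomes useful once part of the pattern is variable. As a minimal sketch, assuming you might later want groups of three or more words, the group size can be passed in and spliced into the quantifier. The names wordGroupRegex and groups are illustrative and not part of the original code.

```javascript
// Hypothetical helper: build a regex that matches up to `size` consecutive words.
function wordGroupRegex(size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const word = '\\p{Alphabetic}+';
  // One word, then at most (size - 1) further words separated by whitespace.
  return new RegExp(`${word}(?:\\s+${word}){0,${size - 1}}`, 'gu');
}

// Hypothetical helper: list the lower-cased word groups found in the text.
const groups = (text, size) => text.toLowerCase().match(wordGroupRegex(size)) || [];

console.log(groups('abc def ghi', 2));
```

With a size of 2 this builds the same matching behaviour as the literal above, and a trailing odd word is still returned on its own.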

Javascript toArray to avoid cutting characters

I have a quick question.
This is my code. The problem is that when I run some emojis through it, it displays them as ?, because it cuts the emoji in half.
angular.module('Joe.filters').filter("initials", function() {
return function(string) {
if (string) {
var words = string.split(" ");
if (words.length) {
string = words[0].charAt(0);
if (words[1]) {
string += words[1].charAt(0);
}
}
}
return string;
};
});
Now I'm wondering whether I can solve this with toArray, and if so, how?
Note: this is what I get in the console when I try the "fix" with an array:
j = '📌';
"📌"
j.length
2
s = _.toArray(j)
["📌"]
s.length
1
Thanks in advance!!
In ES6, .charAt and [indexing] still work with 16-bit code units, but the string iterator (String.prototype[Symbol.iterator]) walks full code points. So, to extract a first character that may lie outside plane 0 (the Basic Multilingual Plane), you have to iterate over the string, for example:
word = '📌HELLO';
let b = [...word][0]
// or
let [c] = word
console.log(b, c)
Another option is to extract the first code point and convert it back to a character:
let a = String.fromCodePoint(word.codePointAt(0))
To answer the bonus question, I have this rather trivial function in my "standard repertoire"
let first = ([a]) => a
Using this func, your initials logic can be written as
let initials = str => str.split(' ').slice(0, 2).map(first).join('')
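A minimal sketch of how this could replace the body of the original filter function. The names firstCodePoint and initials are illustrative, and skipping empty pieces from repeated spaces is an assumption that the original filter does not make. It works per code point, so an emoji built from several code points (a flag, for instance) would still be split.

```javascript
// Return the first full code point of a string ('' for an empty string).
const firstCodePoint = (str) => {
  const [first = ''] = str; // destructuring uses the string iterator
  return first;
};

// Hypothetical helper: initials of the first two words of a name.
function initials(name) {
  if (!name) return name; // like the original filter, pass falsy input through
  return name
    .split(' ')
    .filter(Boolean) // assumption: ignore empty pieces caused by repeated spaces
    .slice(0, 2)
    .map(firstCodePoint)
    .join('');
}

console.log(initials('\u{1F4CC}pin board')); // '\u{1F4CC}' is the pushpin emoji
```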

Regex replace if matching group not preceded by `\` except if preceded by `\\`

My Goal
What I want to do is something similar to this:
let json_obj = {
hello: {
to: 'world'
},
last_name: {
john: 'smith'
},
example: 'a ${type}', // ${type} -> json_obj.type
type: 'test'
}
// ${hello.to} -> json_obj.hello.to -> "world"
let sample_text = 'Hello ${hello.to}!\n' +
// ${last_name.john} -> json_obj.last_name.john -> "smith"
'My name is John ${last_name.john}.\n' +
// ${example} -> json_obj.example -> "a test"
'This is just ${example}!';
function replacer(text) {
return text.replace(/\${([^}]+)}/g, (m, gr) => {
gr = gr.split('.');
let obj = json_obj;
while(gr.length > 0)
obj = obj[gr.shift()];
/* I know there is no validation but it
is just to show what I'm trying to do. */
return replacer(obj);
});
}
console.log(replacer(sample_text));
Until now this is pretty easy to do.
But if $ is preceded by a backslash (\), I don't want to replace the thing between the braces. For example, \${hello.to} would not be replaced.
The problem gets harder when I also want to be able to escape the backslashes themselves. What I mean by escaping the backslashes is, for example:
\${hello.to} would become: ${hello.to}
\\${hello.to} would become: \world
\\\${hello.to} would become: \${hello.to}
\\\\${hello.to} would become: \\${hello.to}
etc.
What I've tried?
I haven't tried many things so far because I have absolutely no idea how to achieve that, since from what I know there is no lookbehind pattern in JavaScript regular expressions.
I hope the way I explained it is clear enough to be understood, and I hope someone has a solution.
I recommend you to solve this problem in separate steps :)
1) First step:
Remove the doubled backslashes from your text by replacing every occurrence of "\\" with "". This makes the token replacement part easier, but note that it drops each escaped pair completely instead of turning it into a single backslash.
text = text.replace(/\\\\/g, '');
2) Second step:
To replace the tokens of the text, use this regex: /(^|[^\\])\${([^}]+)}/g. It skips tokens that have a \ directly before them, e.g. \${hello.to}. The first group captures the character in front of the token so that the callback can put it back.
Here is your code with the new expression:
function replacer(text) {
return text.replace(/(^|[^\\])\${([^}]+)}/g, (m, before, gr) => {
gr = gr.split('.');
let obj = json_obj;
while(gr.length > 0)
obj = obj[gr.shift()];
/* I know there is no validation but it
is just to show what I'm trying to do. */
return before + replacer(obj);
});
}
If you still have any problems, let me know :)
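As an alternative, a single pass can count the backslashes in front of each token, which needs no lookbehind. The sketch below assumes the usual escaping convention, which is an interpretation of the question rather than something it states: every pair of backslashes becomes one literal backslash, and a leftover single backslash escapes the token. The names data, lookup and replaceTokens are illustrative, and leaving unknown paths untouched is an assumption as well.

```javascript
// Hypothetical data mirroring the question's json_obj.
const data = {
  hello: { to: 'world' },
  last_name: { john: 'smith' },
  example: 'a ${type}',
  type: 'test'
};

// Resolve a dotted path such as "hello.to"; undefined if a segment is missing.
function lookup(path) {
  return path.split('.').reduce((obj, key) => (obj == null ? undefined : obj[key]), data);
}

function replaceTokens(text) {
  // Capture the run of backslashes directly before the token, then the path.
  return text.replace(/(\\*)\$\{([^}]+)\}/g, (match, slashes, path) => {
    // Each pair of backslashes collapses into one literal backslash.
    const literal = '\\'.repeat(Math.floor(slashes.length / 2));
    // An odd count means the token itself is escaped: keep it verbatim.
    if (slashes.length % 2 === 1) return literal + '${' + path + '}';
    const value = lookup(path);
    // Assumption: unknown paths are left in place instead of throwing.
    if (value === undefined) return literal + '${' + path + '}';
    // Values may contain tokens themselves, so expand them recursively.
    return literal + replaceTokens(String(value));
  });
}

console.log(replaceTokens('This is just ${example}!'));
```

A value that refers back to itself would recurse without end, so real input would need a depth limit or a check for cycles.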

replace all commas within a quoted string

Is there any way to capture and replace all the commas inside a string contained within quotation marks, but not any commas outside of it? I'd like to change them to pipes, however this:
/("(.*?)?,(.*?)")/gm
is only getting the first instance:
JSBIN
If callbacks are okay, you can go for something like this:
var str = '"test, test2, & test3",1324,,,,http://www.asdf.com';
var result = str.replace(/"[^"]+"/g, function (match) {
return match.replace(/,/g, '|');
});
console.log(result);
//"test| test2| & test3",1324,,,,http://www.asdf.com
This is very convoluted compared to the regular expression version, but I wanted to do it just for the sake of experiment:
var PEG = require("pegjs");
var parser = PEG.buildParser(
["start = seq",
"delimited = d:[^,\"]* { return d; }",
"quoted = q:[^\"]* { return q; }",
"quote = q:[\"] { return q; }",
"comma = c:[,] { return ''; }",
"dseq = delimited comma dseq / delimited",
"string = quote dseq quote",
"seq = quoted string seq / quoted quote? quoted?"].join("\n")
);
function flatten(array) {
return (array instanceof Array) ?
[].concat.apply([], array.map(flatten)) :
array;
}
flatten(parser.parse('foo "bar,bur,ber" baz "bbbr" "blerh')).join("");
// 'foo "barburber" baz "bbbr" "blerh'
I don't advise you to do this in this particular case, but maybe it will create some interest :)
PS. pegjs can be found here: (I'm not an author and have no affiliation, I simply like PEG) http://pegjs.majda.cz/documentation

Why is my RegExp ignoring start and end of strings?

I made this helper function to find single words, that are not part of bigger expressions
it works fine on any word that is NOT first or last in a sentence, why is that?
is there a way to add "" to regexp?
String.prototype.findWord = function(word) {
var startsWith = /[\[\]\.,-\/#!$%\^&\*;:{}=\-_~()\s]/ ;
var endsWith = /[^A-Za-z0-9]/ ;
var wordIndex = this.indexOf(word);
if (startsWith.test(this.charAt(wordIndex - 1)) &&
endsWith.test(this.charAt(wordIndex + word.length))) {
return wordIndex;
}
else {return -1;}
}
Also, any improvement suggestions for the function itself are welcome!
UPDATE: example: I want to find the word able in a string. I want it to work in cases like [able] able, #able1 etc., but not in cases where it is part of another word, like disable, enable etc.
A different version:
String.prototype.findWord = function(word) {
return this.search(new RegExp("\\b"+word+"\\b"));
}
Your if will only evaluate to true if endsWith matches after the word. But the last word of a sentence ends with a full stop, which won't match your alphanumeric expression.
Did you try word boundary -- \b?
There is also \w, which matches one word character ([A-Za-z0-9_]). This could help you too, depending on your definition of a word.
See RegExp docs for more details.
If you want your endsWith regexp to also match the empty string, you just need to append |^$ to it:
var endsWith = /[^A-Za-z0-9]|^$/ ;
Anyway, you can easily check if it is the beginning of the text with if (wordIndex == 0), and if it is the end with if (wordIndex + word.length == this.length).
It is also possible to eliminate this issue by operating on a copy of the input string, surrounded with non-alphanumerical characters. For example:
var s = "#" + this + "#";
var wordIndex = s.indexOf(word); // run the checks against s; the index in the original string is wordIndex - 1
But I'm afraid there is another problem with your function:
it would never match "able" in a string like "disable able enable", since the call to indexOf would return 3, then the startsWith test on the preceding character ("s") would return false and the function would exit with -1 without searching further.
So you could try:
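String.prototype.findWord = function (word) {
var startsWith = "[\\[\\]\\.,-\\/#!$%\\^&*;:{}=\\-_~()\\s]";
var endsWith = "[^A-Za-z0-9]";
// Pad with "#" so a word at the very start or end still has a delimiter on each side.
// The match starts on the delimiter before the word, so its index in the padded
// string equals the word's index in the original string (-1 when not found).
return ("#" + this + "#").search(new RegExp(startsWith + word + endsWith));
}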
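The versions above insert word into the pattern unescaped, so a word containing regex metacharacters (such as "c++") would be misread or throw. Here is a minimal sketch of a standalone variant that escapes the word first and leaves String.prototype alone. The names escapeRegExp and findWord are illustrative, and treating any non-alphanumeric character as a valid delimiter in front of the word is a simplifying assumption, not the asker's exact punctuation list.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical standalone variant: index of `word` as a whole word, or -1.
function findWord(text, word) {
  const pattern = new RegExp(
    '(^|[^A-Za-z0-9])' +  // start of text or a non-alphanumeric delimiter
    escapeRegExp(word) +
    '(?![A-Za-z0-9])'     // not followed by a letter or digit (end of text is fine)
  );
  const match = pattern.exec(text);
  if (match === null) return -1;
  // match.index points at the delimiter, if there is one; skip it to reach the word.
  return match.index + match[1].length;
}

console.log(findWord('disable able enable', 'able'));
```

Because the regex engine keeps scanning after a failed position, the "disable able enable" case is handled without a manual loop.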
