How to match input parentheses with a regular expression in JavaScript? - javascript

I have no idea how to check whether the parentheses in an input string match, using JavaScript.
Examples of valid input strings:
(pen)
((pen) and orange)
It should return false if the input string is like any of the following:
(pen
pen)
(pen) and orange)
((pen and orange )
((pen) and orange
)(pen and orange )(
(pen and )orange()

Regular expressions would be messy. It's much easier to go through with a simple counter.
function parenthesesBalanced(string) {
var count = 0;
for (var i = 0, l = string.length; i < l; i++) {
var char = string.charAt(i);
if (char == "(") {
// Opening parenthesis is always OK
count++;
} else if (char == ")") {
// If we're at the outer level already it's not valid
if (count == 0) return false;
count--;
}
}
return (count == 0);
}

Replace every group of "left paren - some chars - right paren" with nothing, until there are no more such groups. If the resulting string still contains a parenthesis, the parens were not balanced.
var balancedParens = function(str) {
var q;
do {
q = str;
str = str.replace(/\([^()]*\)/g, '');
} while(q != str);
return !str.match(/[()]/);
}
var a = "foo ((and) bar and (baz) quux) and (blah)";
var b = "(dddddd()";
alert(balancedParens(a)); // true
alert(balancedParens(b)); // false
http://jsfiddle.net/gvGGT/
It's not possible to match a balanced string with a single regexp in JavaScript, because the JS regex dialect doesn't support recursive expressions.

It is a known hard problem to match parens with regular expressions. While it is possible, it's not particularly efficient.
It's much faster simply to iterate through the string, maintaining a counter, incrementing it every time you hit an open paren, and decrementing it every time you hit a close paren. If the counter ever goes below zero, or the counter is not zero at the end of the string, it fails.
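A minimal sketch of that counter, run against the sample inputs from the question. The name isBalanced and the two sample arrays are made up for this illustration.

```javascript
// Hypothetical helper: true when every ")" closes an earlier "(" and none stay open.
function isBalanced(str) {
  let depth = 0;
  for (const ch of str) {
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      if (depth === 0) return false; // closing paren with nothing open
      depth--;
    }
  }
  return depth === 0;
}

// Samples copied from the question.
const shouldPass = ["(pen)", "((pen) and orange)"];
const shouldFail = [
  "(pen",
  "pen)",
  "(pen) and orange)",
  "((pen and orange )",
  "((pen) and orange",
  ")(pen and orange )("
];

for (const s of shouldPass.concat(shouldFail)) {
  console.log(JSON.stringify(s), isBalanced(s));
}
```

Note that the question's last sample, (pen and )orange(), consists of two properly closed pairs, so a pure balance check like this one accepts it. Rejecting it would need an additional rule that the question does not spell out.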

I made a Node library called balanced to make this a bit more sane. If you just want to get the balanced outer matches, you can do this:
balanced.matches({source: source, open: '(', close: ')'})
My use case was a bit more complicated, and required me to do replacements and support comments.

Related

Generating acronyms using JavaScript

The code below, a function that generates acronyms, was taken from Stanford University lecture notes. The code checks every character and correctly handles strings that have leading, trailing, or multiple spaces, or even hyphens. However, I have difficulty understanding one part of it.
function acronym(str) {
let result = "";
let inWord = false;
for (let i = 0; i < str.length; i++) {
let ch = str.charAt(i);
if (isLetter(ch)) {
if (!inWord) result += ch;
inWord = true;
} else {
inWord = false;
}
}
return result;
}
function isLetter(ch) {
return ch.length === 1 &&
ALPHABET.indexOf(ch.toUpperCase()) !== -1;
}
As shown in the code above, I'm not quite sure how the "inWord" variable works. I'm not sure how it sets word boundaries that are indicated by sequences of nonletters. If you don't mind, can someone please enlighten me?
Your help is much appreciated. Thanks!
The code tries to make an acronym, i.e. take the first letter of every word to create a new word.
Translation of the loop:
1. If the current character is a letter, check whether the boolean flag inWord is false.
2. If the flag is false, this letter starts a new word, so append it to the current acronym value.
3. Set the flag to true, so the remaining letters of the word are skipped until a separator is found.
4. When a separator (a non-alphabetic character) is found, reset the flag to false and start again from step 1.
So basically it just aggregates the first letters of every word into a new string.
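To see the flag in action, here is a runnable sketch of the same function. The question never shows ALPHABET, so this assumes it is simply the 26 uppercase letters; the sample phrase is also made up for the illustration.

```javascript
// Assumption: ALPHABET is not shown in the question; the 26 uppercase letters are used here.
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

function isLetter(ch) {
  return ch.length === 1 && ALPHABET.indexOf(ch.toUpperCase()) !== -1;
}

function acronym(str) {
  let result = "";
  let inWord = false; // true while we are inside a run of letters
  for (let i = 0; i < str.length; i++) {
    const ch = str.charAt(i);
    if (isLetter(ch)) {
      if (!inWord) result += ch; // first letter after a separator
      inWord = true;
    } else {
      inWord = false; // any non-letter ends the current word
    }
  }
  return result;
}

// Leading, trailing, and doubled spaces plus a hyphen all act as separators.
console.log(acronym("  self-contained underwater  breathing apparatus ")); // "scuba"
```

Only the letters s, c, u, b, and a are seen while inWord is false, so they are the only ones appended.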

Finding characters of a word in a string, optimized

I am doing a few code challenges in the hope of learning some new stuff. Currently I have written a piece of code that finds the characters of a given word in a string of random letters.
I thought regexp might be the best performance-wise (it's one of the objectives). This code passes the checks but takes too long with absurdly long strings. Is there any way I can improve this? It's really ugly as is, honestly. I've tried several approaches, but I am probably just a newbie at regexp.
Before adding all the if statements I only used regexp, but if str2 (the word I am looking for) had repeated characters, it would come back 'true' because it would count characters that had already been counted. That is why I am using replace to exclude them. That's all I could come up with.
The goal is to return true if a portion of str1 can be rearranged to form str2, otherwise return false. Only lowercase letters (a-z) will be used. No punctuation or digits will be included. For example, scramble('aabbcamaomsccdd','commas') should return true.
function scramble (str1, str2)
{
var o = 0; // tracks amount of matched letters.
for(var i = 0; i < str2.length; i++)
{
var regex1 = new RegExp (str2[i]) ; // select letter from word that needs to be found
if( regex1.test(str1) == true)// if the selected character is found, use replace to remove it from the random characters string for the next iteration.
{
str1 = str1.replace(regex1 ,"");
o++; // increment o if character is removed from random string.
}
}
//check if amount of removed characters equals total characters of word that we want.
if ( o == str2.length)
{
return true
}
if (o !== str2.length)
{
return false
}
}
Update: I accepted the hash table answer because, as far as I know, this is not doable with regexp alone. I was also able to get a proper result myself with .split and loops, and the hash table achieves it too.
An if-less approach!
I didn't stress-test it, but it looks fine:
function scramble(str1, str2) {
str1 = [...str1];
return [...str2].filter((str => (str == str1.splice(str1.indexOf(str), 1)))).join('') == str2;
}
You could take a hash table, count the wanted characters, and return as soon as no more characters are needed.
function scramble (str1, str2) {
var counter = {},
keys = 0;
for (let i = 0; i < str2.length; i++) {
if (!counter[str2[i]]) {
counter[str2[i]] = 0;
keys++;
}
counter[str2[i]]++;
}
for (let i = 0; i < str1.length; i++) {
if (!counter[str1[i]]) continue;
if (!--counter[str1[i]] && !--keys) return true;
}
return false;
}
console.log(scramble('abc', 'cba'));
console.log(scramble('abc', 'aba'));
console.log(scramble('abcdea', 'aba'));
console.log(scramble('aabbcamaomsccdd', 'commas'));
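Since the challenge guarantees only lowercase a-z, the same counting idea can also be sketched with a fixed 26-slot array instead of an object. This is a variant, not the answer's code; the name scrambleCounts is made up, and the function relies on the a-z assumption (other characters are not handled).

```javascript
// Hypothetical variant: count the letters of str1, then spend them on str2.
// Assumes both strings contain only lowercase a-z, as the challenge states.
function scrambleCounts(str1, str2) {
  const counts = new Array(26).fill(0);
  const base = "a".charCodeAt(0);

  // Tally how many of each letter str1 offers.
  for (let i = 0; i < str1.length; i++) {
    counts[str1.charCodeAt(i) - base]++;
  }

  // Use up one letter per character of str2; fail as soon as one runs out.
  for (let i = 0; i < str2.length; i++) {
    if (--counts[str2.charCodeAt(i) - base] < 0) return false;
  }
  return true;
}

console.log(scrambleCounts('abc', 'cba'));                // true
console.log(scrambleCounts('abc', 'aba'));                // false
console.log(scrambleCounts('abcdea', 'aba'));             // true
console.log(scrambleCounts('aabbcamaomsccdd', 'commas')); // true
```

Each string is scanned once, so the work grows linearly with the input length, unlike the regexp version, which rescans str1 for every character of str2.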

Without using the reverse() method, how do I reverse each word of a string while keeping the original word order, spaces, and punctuation?

I am able to reverse a string with a for loop, without using a helper method. But how do I maintain the original word order, spaces, and punctuation of the string?
Without the reverse() helper method I can reverse the string, but I cannot keep the words and punctuation in their original order.
// Reverse preserving the order, punctuation without using a helper
function reverseWordsPreserveOrder(words) {
let reverse = '';
for (let i = words.length -1; i >= 0; i--) {
reverse += words[i];
}
return reverse;
}
console.log(reverseWordsPreserveOrder('Javascript, can be challenging.'))
// output-> .gnignellahc eb nac ,tpircsavaJ
I expect the result to be like this:
// output-> tpircsavaJ, nac eb gnignellahc.
I'd use a regular expression and a replacer function instead: match consecutive word characters with \w+, and in the replacer function, use your for loop to reverse the substring, and return it:
function reverseSingleWord(word) {
let reverse = '';
for (let i = word.length -1; i >= 0; i--) {
reverse += word[i];
}
return reverse;
}
const reverseWordsPreserveOrder = str => str.replace(/\w+/g, reverseSingleWord);
console.log(reverseWordsPreserveOrder('Javascript, can be challenging.'))
If you are trying to do it manually, with no reverse() or regexes, you could:
• Define what you mean by punctuation. This can just be a set, or an ASCII range for letters, etc. But somehow you need to be able to tell letters from non-letters.
• Maintain a cache of the current word, because you are not reversing the whole sentence, just the words, so you need to treat them individually.
With that you can loop through once with something like:
function reverseWordsPreserveOrder(s){
// some way to know what is a letter and what is punctuation
let punct = new Set([',',' ', '.', '?'])
// current word reversed
let word = ''
// sentence so far
let sent = ''
for (let l of s){
if (punct.has(l)) {
sent += word + l
word = ''
} else {
word = l + word
}
}
sent += word
return sent
}
console.log(reverseWordsPreserveOrder('Javascript, can be challenging.'))
Having said this, it's probably more efficient to use a regex.
If you are only averse to reverse() because you think it can't do the job, here is a more semantic version (based on @CertainPerformance's). In ES6 you can use the spread syntax (...) on the word string, as strings are iterable:
function reverseSingleWord(word) {
return [...word].reverse().join('');
}
const reverseWordsPreserveOrder = str => str.replace(/\w+/g, reverseSingleWord);
console.log(reverseWordsPreserveOrder('Javascript, can be challenging.'))

How to remove string between two characters every time they occur [duplicate]

I need to get rid of any text inside < and >, including the two delimiters themselves.
So for example, from string
<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>
I would like to get this one
that
This is what I've tried so far:
var str = annotation.split(' ');
str.substring(str.lastIndexOf("<") + 1, str.lastIndexOf(">"))
But it doesn't work for every < and >.
I'd rather not use RegEx if possible, but I'm happy to hear if it's the only option.
You can simply use the replace method with /<[^>]*>/g. It matches < followed by any number of non-> characters ([^>]*) up to the next >, globally.
var str = '<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>';
str = str.replace(/<[^>]*>/g, "");
alert(str);
For string removal you can use a RegExp; that is fine.
"<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>".replace(/<\/?[^>]+>/g, "")
Since the text you want always follows a > character, you could split the string at each > and keep the part of every piece that comes before the next <. For example (Java-style code):
String[] strings = stringName.split(">");
String word = "";
for(int i = 0; i < strings.length; i++) {
// keep only the text that comes before the next '<' in this piece
int lt = strings[i].indexOf('<');
word += (lt == -1) ? strings[i] : strings[i].substring(0, lt);
}
This is probably glitchy right now, but I think this would work. You don't need to actually remove the text between the "<>"; just keep the text right after each '>'.
Using a regular expression is not the only option, but it's a pretty good option.
You can easily parse the string to remove the tags, for example by using a state machine where the < and > characters turn a state of ignoring characters on and off. There are other methods of course, some shorter, some more efficient, but they will all be a few lines of code, while a regular expression solution is just a single replace.
Example:
function removeHtml1(str) {
return str.replace(/<[^>]*>/g, '');
}
function removeHtml2(str) {
var result = '';
var ignore = false;
for (var i = 0; i < str.length; i++) {
var c = str.charAt(i);
switch (c) {
case '<': ignore = true; break;
case '>': ignore = false; break;
default: if (!ignore) result += c;
}
}
return result;
}
var s = "<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>";
console.log(removeHtml1(s));
console.log(removeHtml2(s));
There are several ways to do this. Some are better than others. I haven't done one lately for these two specific characters, so I took a minute and wrote some code that may work. I will describe how it works. Create a function with a loop that copies an incoming string, character by character, to an outgoing string. Give the function a string return type so it returns your modified string. Have the loop scan the incoming string from index 0 while the index is less than string.length(). Within the loop, add an if statement. When the if statement sees a "<" character in the incoming string, it stops copying but continues to look at every character in the incoming string until it sees the ">" character. When the ">" is found, it starts copying again. It's that simple.
The following code may need some refinement, but it should get you started on the method described above. It's not the fastest and not the most elegant but the basic idea is there. This did compile, and it ran correctly, here, with no errors. In my test program it produced the correct output. However, you may need to test it further in the context of your program.
string filter_on_brackets(string str1)
{
string str2 = "";
int copy_flag = 1;
for (size_t i = 0 ; i < str1.length();i++)
{
if(str1[i] == '<')
{
copy_flag = 0;
}
if(str1[i] == '>')
{
copy_flag = 2;
}
if(copy_flag == 1)
{
str2 += str1[i];
}
if(copy_flag == 2)
{
copy_flag = 1;
}
}
return str2;
}

Why is my RegExp ignoring start and end of strings?

I made this helper function to find single words that are not part of bigger expressions.
It works fine on any word that is NOT first or last in a sentence. Why is that?
Is there a way to add "" (the empty string) to a regexp?
String.prototype.findWord = function(word) {
var startsWith = /[\[\]\.,-\/#!$%\^&\*;:{}=\-_~()\s]/ ;
var endsWith = /[^A-Za-z0-9]/ ;
var wordIndex = this.indexOf(word);
if (startsWith.test(this.charAt(wordIndex - 1)) &&
endsWith.test(this.charAt(wordIndex + word.length))) {
return wordIndex;
}
else {return -1;}
}
Also, any improvement suggestions for the function itself are welcome!
UPDATE: example: I want to find the word able in a string. I want it to work in cases like [able] able, #able1, etc., but not in cases where it is part of another word, like disable, enable, etc.
A different version:
String.prototype.findWord = function(word) {
return this.search(new RegExp("\\b"+word+"\\b"));
}
Your if will only evaluate to true if endsWith matches after the word. But the last word of a sentence ends with a full stop, which won't match your alphanumeric expression.
Did you try word boundary -- \b?
There is also \w, which matches one word character ([A-Za-z0-9_]); this could help you too (depending on your definition of a word).
See RegExp docs for more details.
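One caveat when building a \b pattern from a variable: characters such as . or ( in the word are treated as regex syntax unless they are escaped first. A minimal sketch, where escapeRegExp and findWholeWord are hypothetical helper names rather than built-ins:

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: index of the first whole-word occurrence, or -1.
function findWholeWord(str, word) {
  return str.search(new RegExp("\\b" + escapeRegExp(word) + "\\b"));
}

console.log(findWholeWord("disable able enable", "able")); // 8
console.log(findWholeWord("able.", "able"));               // 0
console.log(findWholeWord("[able]", "able"));              // 1
console.log(findWholeWord("enable", "able"));              // -1
console.log(findWholeWord("#able1", "able"));              // -1
```

The last call returns -1 because \b treats digits and underscores as word characters, so there is no boundary between "able" and "1". Whether that is the desired behaviour depends on your definition of a word.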
If you want your endsWith regexp to also match the empty string, you just need to append |^$ to it:
var endsWith = /[^A-Za-z0-9]|^$/ ;
Anyway, you can easily check if it is the beginning of the text with if (wordIndex == 0), and if it is the end with if (wordIndex + word.length == this.length).
It is also possible to eliminate this issue by operating on a copy of the input string, surrounded with non-alphanumerical characters. For example:
var s = "#" + this + "#";
var wordIndex = s.indexOf(word) - 1; // position of the word in the original string
But I'm afraid there is another problem with your function:
it would never match "able" in a string like "disable able enable", since the call to indexOf would return 3, then startsWith.test(this.charAt(wordIndex - 1)) would return false and the function would exit with -1 without searching further.
So you could try:
String.prototype.findWord = function (word) {
var startsWith = "[\\[\\]\\.,-\\/#!$%\\^&\*;:{}=\\-_~()\\s]";
var endsWith = "[^A-Za-z0-9]";
// The match starts at the delimiter before the word in the padded string,
// which is the same position as the word itself in the original string.
// search() already returns -1 when nothing matches.
return ("#" + this + "#").search(new RegExp(startsWith + word + endsWith));
};
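The explicit start and end checks mentioned above can also be combined with a loop that keeps searching after a rejected occurrence. The following is a rough sketch under a simplifying assumption: any non-alphanumeric character counts as a delimiter (the question's startsWith set is narrower). The names isAlphanumeric and findWordIndex are made up for the illustration.

```javascript
// Hypothetical helper: true for a single ASCII letter or digit.
function isAlphanumeric(ch) {
  return /^[A-Za-z0-9]$/.test(ch);
}

// Hypothetical helper: index of the first standalone occurrence of word, or -1.
function findWordIndex(text, word) {
  if (word.length === 0) return -1;
  let from = 0;
  while (true) {
    const index = text.indexOf(word, from);
    if (index === -1) return -1; // no further occurrences
    const end = index + word.length;
    // The start or end of the text counts as a boundary.
    const startOk = index === 0 || !isAlphanumeric(text.charAt(index - 1));
    const endOk = end === text.length || !isAlphanumeric(text.charAt(end));
    if (startOk && endOk) return index;
    from = index + 1; // rejected: keep searching after this occurrence
  }
}

console.log(findWordIndex("disable able enable", "able")); // 8
console.log(findWordIndex("able", "able"));                // 0
console.log(findWordIndex("is it able.", "able"));         // 6
console.log(findWordIndex("enable", "able"));              // -1
```

Because indexOf is called again with a start position, the "able" inside "disable" is skipped instead of ending the search.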
