I want to identify whether, in a string, there is no whitespace after "."
I tried a nested if statement, but it doesn't work. I guess I am missing something really simple.
Also, I read Regex might do this, but I couldn't wrap my head around the syntax.
(function() {
'use strict';
var invocationInitial = document.getElementById('spokenNames');
if(invocationInitial) {
var invocation = invocationInitial.innerHTML.trim();
}
var counter = 1;
var message = '';
if(invocation.indexOf('.') !== -1) {
if(/\s/.test(invocationInitial) === false)
{
message = counter + ". No dot in string without subsequent whitespace";
counter = counter +1;
}
}
if(message) {
alert(message);
}
})();
A browser warning ("message") should be displayed if not every dot (".") occurring in "invocation" is followed by whitespace.
var counter is introduced here, because in the full version, multiple browser warnings will be shown based on different conditions.
The regex you need here is pretty simple: /\.\S/. That says "match a dot followed by a non-whitespace character". Note that \s means "match a whitespace character" while \S (capital S) means "match anything that is NOT a whitespace character". A dot at the very end of the string is not reported, because nothing follows it.
So you can simply do this:
if (/\.\S/.test(invocation)) {
// There's a dot followed by non-whitespace!
}
else {
// There is no dot followed by non-whitespace.
}
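A minimal sketch of the difference between "a dot followed by non-whitespace" and "a dot not followed by whitespace" (the second also flags a dot at the very end of the string). The function names are made up for illustration and are not part of the original code.

```javascript
// Hypothetical helper: true when some dot is immediately followed by a non-whitespace character.
function hasDotFollowedByNonSpace(text) {
  return /\.\S/.test(text);
}

// Hypothetical helper: true when some dot is not followed by whitespace,
// which also includes a dot that is the last character of the string.
function hasDotNotFollowedBySpace(text) {
  return /\.(?!\s)/.test(text);
}

console.log(hasDotFollowedByNonSpace('One.Two'));   // true
console.log(hasDotFollowedByNonSpace('One. Two.')); // false
console.log(hasDotNotFollowedBySpace('One. Two.')); // true
console.log(hasDotNotFollowedBySpace('One. Two. ')); // false
```

Which of the two fits depends on whether a trailing dot should trigger the warning.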
Hi, I'm trying to create a JavaScript code editor. You know how string literals are written in JavaScript, so I want every word inside double quotes (") or single quotes (') to be wrapped in an <i> HTML tag. I made up a deliberately messy test string to check that only the words that should count as quoted are matched. This is my string:
`that's "that's" 'that's'. Hanna "loves" 'ice "cream`
expect output like this:
that's <i>that's</i> <i>that's</i>. Hanna <i>loves</i> ice cream
let str_ = `that's "that's" 'that's'. Hanna "loves" 'ice "cream`;
// expected output :
// that's <i>that's</i> <i>that's</i>. Hanna <i>loves</i> ice cream
An approach which takes from the OP's example for granted that the beginning and the end of a valid quotation is always accompanied by at least one leading and at least one trailing space might come up with a regex like this /(?<space>\s+)(?<quote>['"])(?<text>\S+)\2/gm. It uses capturing groups in its named form. The following regex is identical to the first one, just without the naming ... /(\s+)(['"])(\S+)\2/gm. It reads like this ...
1st Capturing Group (\s+)
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (['"])
Match a single character present in the list below ['"]
'" matches a single character in the list (case sensitive)
3rd Capturing Group (\S+)
\S matches any non-whitespace character (equivalent to [^\r\n\t\f\v ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\2 matches the same text as most recently matched by the 2nd capturing group
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
For each match the replacer callback function of String.prototype.replace does provide through its arguments the matched/captured data which will be necessary for the returned replacement value. In this case one can access (match, space, quote, text) which makes it possible to return ${ space }<i>${ text }</i> as computed replacement value.
Since this replacement does not sanitize any crippled/invalid quotes one has to go after them with an extra replacement (based on /(\s+)(['"])/gm) as shown with the loggings of the beneath example code.
// // [https://regex101.com/r/frH2xQ/1]
// const regXSpacedAndQuotedChars = (/(?<space>\s+)(?<quote>['"])(?<text>\S+)\2/gm);
// [https://regex101.com/r/frH2xQ/2]
const regXSpacedAndQuotedChars = (/(\s+)(['"])(\S+)\2/gm);
// [https://regex101.com/r/frH2xQ/3/]
const regXSpacedAndSoleQuote = (/(\s+)(['"])/gm);
console.log(
`that's "that's" 'that's'. Hanna "loves" 'ice "cream`
.replace(regXSpacedAndQuotedChars, (match, space, quote, text) => `${ space }<i>${ text }</i>`)
);
console.log(
`that's "that's" 'that's'. Hanna "loves" 'ice "cream`
.replace(regXSpacedAndQuotedChars, (match, space, quote, text) => `${ space }<i>${ text }</i>`)
.replace(regXSpacedAndSoleQuote, (match, space, quote) => space)
);
Regular expressions are indeed powerful and handy, but for "serious business" it is often recommended to use a more robust approach and avoid regexp for the important heavy lifting (*), i.e. it is best to implement something like a simple state machine parser.
As an exercise I tried to make something akin to this approach, with rules abstracted from the OP's examples, where only "root level" quotes enclosing words are considered relevant and "dangling" quotes are discarded. So the rules are:
first occurrence of any quote after non-word character enters "quote" state where:
other quotes are transferred verbatim,
same quotes not followed by non-word character are transferred verbatim
same quote followed by non-word character ends quote and exits quote state.
An opened but unfinished quote is discarded. (This one breaks the OP's use case, because all dangling "nested" quotes are discarded; handling them would, IMO, massively complicate the state machine. I've left it this way to make this case more apparent. Such an occurrence is detected and reported in the console.)
For the sake of example I've deliberately avoided regexp even for the definition of a non-word character, which I've kept short (space and ,.!?\/-, plus the quotes " and ' that are handled separately).
Please take this more as a POC: I haven't written any FSM before, so it is most probably not the best learning material.
// @ts-check
function state_machine(input_str) {
var input_buffer = [...input_str];
var out = [];
var char = '';
var start_quote_adept_index = -1;
var quote_kind;
function is_word_char() {
if (!char) {
return false;
}
if (" ,.!?\\/-".includes(char)) {
return false;
}
return true;
}
function is_quote() {
if (char == "'" || char == '"') {
return true
}
return false
}
function add(ch) {
out.push(ch);
}
var states = {
outside_word: function() {
if (is_quote()) {
state = states.in_quote;
quote_kind = char;
start_quote_adept_index = i;
add('');
} else if (is_word_char()) {
state = states.in_word
add(char);
} else {
add(char);
}
},
in_quote: function() {
if (char == quote_kind) {
state = states.maybe_after_quote;
} else {
add(char);
}
},
maybe_after_quote: function() {
if (is_word_char()) {
// false alarm: quote inside word
state = states.in_quote;
add(quote_kind);
add(char);
} else {
// yup, "proper" quote
state = states.outside_word;
out[start_quote_adept_index] = '<i>';
add('</i>');
start_quote_adept_index = -1;
add(char);
}
},
in_word: function() {
if (is_word_char()) {
add(char);
} else {
state = states.outside_word;
add(char);
}
}
}
var state = states.outside_word;
var i = -1;
while (++i <= input_buffer.length) {
// <= for one extra assignment of `undefined` for finishing state
char = input_buffer[i];
state();
}
if (start_quote_adept_index > -1) {
console.warn('Non-balanced quote `%s` at index %s', quote_kind, start_quote_adept_index);
var err_info = [...input_str];
err_info.splice(start_quote_adept_index + 1, 0, '%c');
err_info.splice(start_quote_adept_index, 0, '%c');
console.info(err_info.join(''), 'border: 1em solid red');
}
return out.join('');
}
input.oninput()
<input id="input" oninput="console.clear();output.value=state_machine(this.value)" size="80" value="that hat's &quot;that hat's&quot; 'that hat's'. Hanna &quot;loves&quot; '' 'ice &quot;cream">
<br><output id="output" style="white-space: pre-wrap"></output>
Notice that the handling of the last " from the initial value violates the OP's rules. The machine is already in the "single quote" state (which is unfinished and thus discarded, so no ' gets to the output), where double quotes are simply treated as word characters, so " slips into the output.
OTOH, this approach handles words and even sentences inside quotes without problem.
The core of this approach is the loop that invokes the current state function for each character of the input string (plus one extra). The state function checks the character, possibly adds it to the output buffer, and sets the next state. For an opening quote it just saves the index and adds '' to the output; once the matching closing counterpart is found, the opening tag is inserted at the saved index. I'm not sure it is best practice, but I think doing it any other way would introduce unnecessary look-arounds. This way it stays in (I think) desirable O(n) land.
(*) The main advantages of a simple FSM over regexp are avoiding the various catastrophic-backtracking dangers and the poor readability and 'debuggability' of regex literals.
This is my previous solution, but I found the best solution in Peter Seliger's answer. Thank you.
function isValidQuotes(word, dir_){
const wlength = word.length;
for(let n=0; n<wlength; n++){
let a = (dir_==='<') ? wlength-n-1 : (dir_==='>') ? n:false;
if(a === false) return false;
if( isLetter(word[a]) === false ){
if( ["'",'"'].indexOf(word[a]) !== -1 ){ return true; }
}else{
return false;
}
}
}
function isLetter(n) {
return (n.match(/[A-Za-z0-9]/i) !== null && n !== ' ') ?true:false;
}
let str_ = `that's "that's" 'that's'. Hanna "loves" 'ice "cream`;
// expected output :
// that's <i>that's</i> <i>that's</i>. Hanna <i>loves</i> ice cream
let res = "";
for( let a of str_.split(" ") ){
if(
isValidQuotes(a, '>') === true && isValidQuotes(a, '<') === true
){
res += '<b>'+ a +'</b> ';
}else{
res += a+' ';
}
}
console.log(res);
It's not what you want but it makes sense https://regex101.com/r/PdsWVL/2.
Basically, go through the string and find the first occurrence of either a " or a ' and then continue until you find a closing one, then repeat from that position.
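The scan described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name italicizeQuoted is made up, and leaving an unclosed quote untouched is a choice made here, not something the answer specifies. As the answer says, it does not treat apostrophes inside words specially.

```javascript
// Hypothetical sketch: wrap text between a quote and the next identical quote in <i>...</i>.
function italicizeQuoted(input) {
  let out = '';
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch === '"' || ch === "'") {
      // Look for the closing counterpart of the quote that was just found.
      const close = input.indexOf(ch, i + 1);
      if (close !== -1) {
        out += '<i>' + input.slice(i + 1, close) + '</i>';
        i = close + 1; // repeat the search from the position after the closing quote
        continue;
      }
    }
    // Not a quote, or a quote without a closing counterpart: copy it unchanged.
    out += ch;
    i += 1;
  }
  return out;
}

console.log(italicizeQuoted(`Hanna "loves" 'ice cream'`));
// Hanna <i>loves</i> <i>ice cream</i>
```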
Who can help me with the following?
I created a rule with a regex, and I want to remove all characters from the string that are not allowed.
I tried something myself, but I don't get the result that I want.
document.getElementById('item_price').onkeydown = function() {
var regex = /^(\d+[,]+\d{2})$/;
if (regex.test(this.value) == false ) {
this.value = this.value.replace(regex, "");
}
}
The allowed characters are digits and one comma.
Remove all letters, special characters, and double commas.
If the user types k12.40, the code must replace this string with 1240.
Who can point me in the right direction?
This completely removes double occurrences of commas using regex, but keeps single ones.
// This should end up as 1,23243,09
let test = 'k1,23.2,,43d,0.9';
let replaced = test.replace(/[^\d,]|,{2}/g, '');
console.log(replaced);
I don't believe there's an easy way to have a single Regex behave like you want. You can use a function to determine what to replace each character with, though:
// This should end up as 12,324309 - allows one comma and any digits
let test = 'k12,3.2,,43,d0.9';
let foundComma = false;
let replaced = test.replace(/(,,)|[^\d]/g, function (item) {
if (item === ',' && !foundComma) {
foundComma = true;
return ',';
} else {
return '';
}
})
console.log(replaced);
This will loop through each non-digit. If it's the first time a comma has appeared in the string, it will leave it. Otherwise, it must be either another comma or a non-digit, and it will be removed. It will also replace any double commas with nothing, even if they are the first commas in the string; if you want them to be replaced with a single comma, you can remove the (,,) from the regex.
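For comparison, the same "digits plus at most one comma" rule can be sketched without a regex. The name sanitizePrice is hypothetical, and unlike the snippet above this version keeps the first comma of a double comma, which is an assumption about the desired behavior.

```javascript
// Hypothetical helper: keep digits and only the first comma, drop everything else.
function sanitizePrice(raw) {
  let seenComma = false;
  let out = '';
  for (const ch of raw) {
    if (ch >= '0' && ch <= '9') {
      out += ch;
    } else if (ch === ',' && !seenComma) {
      seenComma = true;
      out += ch;
    }
    // Any other character (letters, dots, further commas) is skipped.
  }
  return out;
}

console.log(sanitizePrice('k12.40')); // 1240
console.log(sanitizePrice('12,,40')); // 12,40
console.log(sanitizePrice('1,2,3'));  // 1,23
```

It could be called from an input handler, for example by assigning sanitizePrice(this.value) back to this.value.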
I may just be being thick here but I don't understand why I am receiving this error. Outside of the function the .test() works fine. But inside, I get the error. Was thinking it was something to do with the scope of the .test() function but am I just missing something blindingly obvious here?
function cFunctionfirst() {
firstField = document.getElementById("sname_input_first").value;
document.getElementById("demo").innerHTML = "first: " + firstField;
console.log(firstField);
var regex = "!##$£%^&*()+=[]\\\';,./{}|\":<>?";
if(regex.test(firstField)){
console.log('illegal characters used');
} else {
console.log('Character okay');
};
};
That's because regex is not a RegExp object, but just a string. It should be declared as such (remember to escape special characters using \):
var regex = /[!##\$£%\^&\*\(\)\+=\[\]\\\';,\.\/\{\}\|":<>\?]/;
Not only have I escaped some special regex characters, but you will need to wrap the entire selection inside unescaped [ and ] brackets, so that you test against a set of characters.
P.S. These are the characters that need to be escaped: \ ^ $ * + ? . ( ) | { } [ ]
See proof-of-concept example:
function cFunctionfirst(value) {
var regex = /[!##\$£%\^&\*\(\)\+=\[\]\\\';,\.\/\{\}\|":<>\?]/;
if(regex.test(value)){
console.log('illegal characters used');
} else {
console.log('Character okay');
};
};
cFunctionfirst('Legal string');
cFunctionfirst('Illegal string #$%');
Alternatively, if you don't want to manually escape the characters, you can either use a utility method to do it, or use an ES6 non-regex approach, which is probably a lot less efficient: check out the JSPerf test I have made. Simply add the blacklisted characters literally in a string, split it, and then use Array.prototype.some to check if the incoming string contains any of the blacklisted characters:
function cFunctionfirst(value) {
var blacklist = '!##$£%^&*()+=[]\\\';,./{}|":<>?'.split('');
if (blacklist.some(char => value.includes(char))) {
console.log('illegal characters used');
} else {
console.log('Character okay');
};
};
cFunctionfirst('Legal string');
cFunctionfirst('Illegal string #$%');
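The "utility method" route mentioned above could look roughly like this. It is a sketch only: buildBlacklistRegex is a made-up name, and it relies on the fact that inside a character class only \, ], [, ^ and - need a backslash.

```javascript
// Hypothetical helper: build a character-class RegExp from a plain string of forbidden characters.
function buildBlacklistRegex(chars) {
  // Escape the few characters that are special inside [...] by prefixing a backslash.
  const escaped = chars.replace(/[\\\]\[^-]/g, '\\$&');
  return new RegExp('[' + escaped + ']');
}

// Same blacklist string as in the snippets above.
const illegal = buildBlacklistRegex('!##$£%^&*()+=[]\\\';,./{}|":<>?');

console.log(illegal.test('Legal string'));       // false
console.log(illegal.test('Illegal string #$%')); // true
```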
This should pass the condition:
syntax_search = (){return 0;}
syntax_search = ( ) { fsf return 0;}
syntax_search = ( ) { return 0; }
syntax_search = (){ return; }
syntax_search = (){ if(x){ sdfsdf } return 0;}
syntax_search = (){ char x[20]; return };
It does not pass all of the combinations above. What is the right way?
if( /^\s*(\s*)\s*{[\s\S]*\s+return\s*[0-9]*\s*;\s*}\s*/.test(syntax_search) )
Your regular expression contains a lot of unneeded complexity, and some characters need escaping, such as { and }.
Anyway you can use this modified version of your regex and it should work.
^\s*\(\s*\)\s*\{(.*(return\s*\d*\s*;)\s*)\}\s*;?$
// ^
// |
// There was a ? here
Regex 101 Demo
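A quick way to check the pattern above is to run it over the sample inputs from the question. This is a sketch: the samples array holds the right-hand sides of the question's lines, and the variable names are made up here.

```javascript
// Pattern copied from the answer above.
const pattern = /^\s*\(\s*\)\s*\{(.*(return\s*\d*\s*;)\s*)\}\s*;?$/;

// Right-hand sides of the sample assignments in the question.
const samples = [
  '(){return 0;}',
  '( ) { fsf return 0;}',
  '( ) { return 0; }',
  '(){ return; }',
  '(){ if(x){ sdfsdf } return 0;}',
  '(){ char x[20]; return };',
];

const results = samples.map((sample) => pattern.test(sample));
console.log(results); // [ true, true, true, true, true, false ]
```

The last sample fails because the pattern requires a ; directly after return (and its optional digits), while that input has return followed by }.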
Some issues:
1. As M42 pointed out, you need to escape the curly brackets.
2. The parentheses at the beginning also need to be escaped (otherwise you are defining a capture group).
3. "return" is required by the expression. Your first 2 test cases don't contain the word return and will fail. Is that on purpose?
4. Same as #3 for ;.
5. [\s\S]* matches anything that is whitespace and anything that isn't. Replace it with .* or, if you also need to match newlines, with [^]*.
6. This regex is not anchored to the end of the string, so it will allow invalid strings (you can put anything you want after the last }).
/^\s*\(\s*\)\s*\{[^]*return\s*\d*\s*;\s*\}\s*$/
I made this helper function to find single words, that are not part of bigger expressions
it works fine on any word that is NOT first or last in a sentence, why is that?
is there a way to add "" to regexp?
String.prototype.findWord = function(word) {
var startsWith = /[\[\]\.,-\/#!$%\^&\*;:{}=\-_~()\s]/ ;
var endsWith = /[^A-Za-z0-9]/ ;
var wordIndex = this.indexOf(word);
if (startsWith.test(this.charAt(wordIndex - 1)) &&
endsWith.test(this.charAt(wordIndex + word.length))) {
return wordIndex;
}
else {return -1;}
}
Also, any improvement suggestions for the function itself are welcome!
UPDATE: example: I want to find the word able in a string. I want it to work in cases like [able] able, #able1, etc., but not in cases where it is part of another word like disable, enable, etc.
A different version:
String.prototype.findWord = function(word) {
return this.search(new RegExp("\\b"+word+"\\b"));
}
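One caveat with building the pattern from word directly is that regex metacharacters in the word would be interpreted as pattern syntax. A hedged sketch that escapes the word first, written as a standalone function instead of a String.prototype extension; both function names are made up for illustration.

```javascript
// Hypothetical helper: escape regex metacharacters so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical standalone variant of findWord using word boundaries.
function findWord(text, word) {
  return text.search(new RegExp('\\b' + escapeRegExp(word) + '\\b'));
}

console.log(findWord('disable able enable', 'able')); // 8
console.log(findWord('[able]', 'able'));              // 1
console.log(findWord('able.', 'able'));               // 0
console.log(findWord('disable', 'able'));             // -1
```

Note that \b treats digits and underscores as word characters, so "able" directly followed by a digit is not reported as a separate word.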
Your if will only evaluate to true if endsWith matches after the word. But the last word of a sentence ends with a full stop, which won't match your alphanumeric expression.
Did you try word boundary -- \b?
There is also \w, which matches one word character ([A-Za-z0-9_]); this could help you too (depending on your definition of a word).
See RegExp docs for more details.
If you want your endsWith regexp to also match the empty string, you just need to append |^$ to it:
var endsWith = /[^A-Za-z0-9]|^$/ ;
Anyway, you can easily check if it is the beginning of the text with if (wordIndex == 0), and if it is the end with if (wordIndex + word.length == this.length).
It is also possible to eliminate this issue by operating on a copy of the input string, surrounded with non-alphanumerical characters. For example:
var s = "#" + this + "#";
var wordIndex = s.indexOf(word) - 1; // index of the word in the original string
But I'm afraid there is another problem with your function:
it would never match "able" in a string like "disable able enable", since the call to indexOf would return 3, then the startsWith test on the preceding character ("s") would fail and the function would exit with -1 without searching further.
So you could try:
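String.prototype.findWord = function (word) {
var startsWith = "[\\[\\]\\.,-\\/#!$%\\^&\\*;:{}=\\-_~()\\s]";
var endsWith = "[^A-Za-z0-9]";
// Pad with "#" so a word at the very start or end still has a delimiter on each side.
// The match starts at the delimiter in front of the word; because of the one-character
// padding, that position in the padded string equals the word's index in the original.
// search() returns -1 when nothing matches.
return ("#" + this + "#").search(new RegExp(startsWith + word + endsWith));
}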