indexOf("-") returning -1 every time

var shortLocString = locationString;
console.log("Index of dash in '" + shortLocString + "': " + shortLocString.indexOf('-'));
prints this every time:
Index of dash in '325–333 Avenue C, New York, NY': -1
Why do I feel like this is some stupid thing I'm missing at the end of a long day?
Thanks for the help!
Mike

You have some sort of not-hyphen character in your string. I'm not sure which one; there are a bunch that look like that.

There are many hyphen-like characters available.
If you type "-", it might not be the same character as the one in the string you're testing.
If you have the string you want to test (i.e. that one), an easy solution is to copy the dash that exists inside the string, paste it into your .indexOf call, and try again.
For a more robust solution, collect the Unicode code points for all the hyphen-like characters (including the actual minus sign). If you're dealing with phone numbers or serial numbers that must be formatted the same way every time, do a RegEx replace that converts any of those dashes into the one format you will use for your checks.
Also, be prepared for headaches with MS Word or other rich-text editors.
They will swap your " for fancy quotes in a heartbeat, and then all of your strings are broken. They may also use different character-mappings for accents, et cetera.
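
The RegEx replace described above could look something like the sketch below. The helper name normalizeDashes and the exact list of code points are my own assumptions; the list covers the common look-alikes but is not exhaustive.

```javascript
// Hypothetical helper: map common Unicode dash characters to the ASCII hyphen-minus.
// Covers hyphen (U+2010), non-breaking hyphen (U+2011), figure dash (U+2012),
// en dash (U+2013), em dash (U+2014), horizontal bar (U+2015), minus sign (U+2212).
var DASH_LIKE = /[\u2010\u2011\u2012\u2013\u2014\u2015\u2212]/g;

function normalizeDashes(text) {
  return text.replace(DASH_LIKE, '-');
}

var locationString = '325\u2013333 Avenue C, New York, NY'; // contains an en dash
var normalized = normalizeDashes(locationString);
console.log(normalized.indexOf('-')); // 3
```

Normalizing once, right where the string enters your code, means every later indexOf or split can assume the plain hyphen.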

It's because there's more than one type of dash - you're checking for a hyphen, but the address contains an en dash.
console.log("Index of dash in '" + shortLocString + "': " + shortLocString.indexOf('\u2013'));
should print
Index of dash in '325–333 Avenue C, New York, NY': 3
If your addresses are likely to include en and/or em dashes, you'll want to check for the characters \u2013 (en dash) and \u2014 (em dash).
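
If you would rather check for all three characters in one go, a small sketch using String.prototype.search follows. The helper name indexOfAnyDash is made up for illustration, and it only looks for the hyphen, en dash and em dash.

```javascript
// Hypothetical helper: index of the first hyphen, en dash or em dash, or -1 if none.
function indexOfAnyDash(text) {
  return text.search(/[-\u2013\u2014]/);
}

var address = '325\u2013333 Avenue C, New York, NY';
console.log(indexOfAnyDash(address));             // 3
console.log(address.charCodeAt(3).toString(16));  // "2013", i.e. an en dash
```

Printing the character code like this is also a quick way to find out which dash you actually have.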

Related

Modify (bold) all words before a certain word

I have a list of nearly 1000 awards to appear on a site. Each award is in its own span and follows the format
x for y
example:
<span>Broadcast Film Critics Association Award for Best Director</span>
For each of these spans, I would like to bold all the text before "for". How can I search for any number of words before (and not including) the word "for" (the whole word, not just the characters), and bold them?
I know with an expression like
\S+\s+\S+\s+for
I am searching up to 2 words before (and including) the characters f, o, and r. But I want to match the word "for", and not just the characters, and I don't want to include "for" in what is being bolded.

Regex would seem to be the best solution here, however I would suggest using a Regex which matches groups so you can rebuild the string when adding the <b> tags in. Try this:
$('span').html(function(i, v) {
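// Lazy first group stops at the first whole word "for"; leave the span unchanged if there is none
var matches = /^(.*?)(\bfor\b.*)$/i.exec(v);
return matches ? '<b>' + matches[1] + '</b>' + matches[2] : v;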
});

This assumes that everything before the very first for is what you need to match. If you worry about more than one for in the string, you'll need a different solution.
(.*?\b)for
This is "reluctantly match and capture everything up to the first word break followed by for."
http://rubular.com/r/OLNZjkA2ck
You could also use
.*?\b(?=for)

You can also do regex replace to achieve it as below.
$('span').html(function(index, content) {
return content.replace(/^(.*?)(\bfor\b.*)$/i, '<strong>$1</strong>$2');
});

Logic on a Complicated Regex on My Napkin Math Parser

So I'm writing my own napkin math parser[1] for fun. Napkin math is a part of a custom text editor I am building: when you press space after an equals sign, it detects the equation before the equals sign and does the math for you. E.g., you type 1+1= and it kicks out the 2 for you magically so you have 1+1=2 on the screen.
I'm really struggling with getting a regex to match equations prior to the equals sign. My brain is shot and I am in desperate need of help from a lord of regex. Below are my test strings, with the highlighted sections being what I want the regex to match.
Particularly, I'm having trouble getting the regex to match the true start of an equation. My current regex gets thrown off if there are numbers or words before the equation starts. I feel like I need to somehow work backwards from the equals sign. I started reversing my string characters around and that's when I threw my hands up and came to you for help.
My test strings with desired matches (the true equation part of the string):
test text (1) pi=
test text (1 pi=
test text 1) pi=
test text (2) pi + 10 / 20=
test text (3) test pi ^ 10 / 20=
test text (30) 10 + 5=
test text (500) abs(10 + 5)=
test text (1) pi + 10 / 20=
test text 10*5=
test text pi / phi=
test text 10 mod phi=
test text 50 10 mod phi=
test text pi mod abs(phi)=
apple banana cherry apple 10 apple cherry banana hi 10 99+1=
Here are all of my special key words allowed and useful in napkin math:
var napkinMathKeyWords = [
'pi',
'\\u03C0',
'\\u03A0',
'phi',
'\\u03C6',
'\\u03A6',
'abs',
'acos',
'cos',
'atan',
'tan',
'asin',
'sin',
'ceil',
'floor',
'sqrt',
'log',
'ln',
'mod',
];
EDIT
Here's the regex I've got so far:
(\d|pi|\\u03C0|\\u03A0|phi|\\u03C6|\\u03A6|abs|acos|cos|atan|tan|asin|sin|ceil|floor|sqrt|log|ln|mod).*?=
It's hitting most of my cases, there are just a couple of cases throwing it off:
http://i.imgur.com/QbueeFC.png
[1] http://blogs.msdn.com/b/onenotetips/archive/2008/05/09/napkin-math-in-onenote.aspx

As you implied, this regex gets a little messy. But I have one that works (see regex101). On regex101, I used the PCRE mode, but only so I could use the "extended" option to make the regex more readable. In JavaScript, just collapse all the line-breaks and spaces and it will work.
I realized that your list of keywords contained two distinct groups:
keywords that are functions, to be followed by operations in parens
// Backslashes are doubled because these are string literals passed to new RegExp
var regex_function = "(abs|acos|cos|atan|tan|asin|sin|ceil|floor|sqrt|log|ln)\\([^)]+\\)";
keywords that are constants (stand-alone)
var regex_constant = "(\\d+|pi|\\u03C0|\\u03A0|phi|\\u03C6|\\u03A6)";
Then the combination (alternation) of the above parts represents an operand:
var regex_operand = "(" + regex_constant + "|" + regex_function + ")";
Each operand must have an operator in-between:
var regex_operator = "([+\\-*\\/^]|mod)";
The entire regex can be put together like this:
var regex = new RegExp(regex_operand + "(\\s*" + regex_operator + "\\s*" + regex_operand + ")*\\s*=");
Basically, you have an operand followed by any number of operator/operand pairs, then the equal sign.
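
Here is a rough, self-contained sketch of that "operand, then operator/operand pairs, then =" structure, run against a few of the test strings from the question. The variable names and the findEquation helper are my own, the groups are non-capturing, and nested parentheses inside function calls are not handled.

```javascript
// Pieces are plain strings, so every regex backslash is doubled.
var fn = '(?:abs|acos|cos|atan|tan|asin|sin|ceil|floor|sqrt|log|ln)\\([^)]+\\)';
var constant = '(?:\\d+|pi|\\u03C0|\\u03A0|phi|\\u03C6|\\u03A6)';
var operand = '(?:' + constant + '|' + fn + ')';
var operator = '(?:[+\\-*\\/^]|mod)';

// One operand, any number of operator/operand pairs, optional spaces, then "=".
var equation = new RegExp(
  operand + '(?:\\s*' + operator + '\\s*' + operand + ')*\\s*=');

// Hypothetical helper: return the matched equation text, or null if there is none.
function findEquation(text) {
  var match = equation.exec(text);
  return match ? match[0] : null;
}

console.log(findEquation('test text (30) 10 + 5='));        // "10 + 5="
console.log(findEquation('test text (500) abs(10 + 5)='));  // "abs(10 + 5)="
console.log(findEquation('test text 50 10 mod phi='));      // "10 mod phi="
```

The stray numbers such as (30) or 50 are skipped because nothing that follows them continues into an operator or the equals sign, so the engine moves on to the next possible starting point.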

Looks to me like your best bet is asserting that a space must always be followed by some sort of operator. For instance, "5 5 =" is invalid, while "5 + 5 =" and "5 =" are valid. Thus, I'd say your regex should look something like this:
([numbers and special names like pi][ ]?[operators][ ]?)*[numbers and special names like pi][ ]?\=[ ]?
I may have written the spaces wrong; what I mean is that each can appear at most once. The other bracketed items would just be big alternations ("or" groups). Parentheses might get tricky; if I get the chance I'll edit my answer to handle them properly.

RegEx: Match odd amount of repeat between other characters in each line

I was wondering how to match and replace an odd number of backslashes (\) in every line in JavaScript.
They are used to escape a string, but sometimes the string is wrapped across lines, so the backslash has to move to the next line.
Here is an example: http://regex101.com/r/iI9vO9
I want to match the lines which are marked with "Yes" and ignore the lines marked with "No".
For Example:
"Yes 1\" +
"No 2\\" +
"Yes 3\\\" +
"No 4\\\\" +
"No"
Should be changed to:
"Yes 1" +
"\No 2\\" +
"Yes 3\\" +
"\No 4\\\\" +
"No"
Notice there are characters before and after the backslashes in each line, and the backslash is moved to the next line when it is repeated an odd number of times.
I couldn't get it working with (\\)(\\\\)* or look-around.
This is what I have in mind, if it can be made to work:
text.replace(/([^\\])\\" \+ \n"(.)/gm, '$1\\$2"+ \n "')
If this is not possible with RegEx, I would appreciate any other way to make this possible.
Thanks for your help.
EDIT:
For anyone who finds this on Google, this is exactly what solves the problem:
text.replace(/([^\\])((\\{2})*)\\" \+ \n"/g, '$1$2" + \n"\\')
http://jsfiddle.net/5mGWF/1/

This seems to do what you want:
text = text.replace(/([^\\])((\\{2})*)\\\n/g, "$1$2\n\\")
http://jsfiddle.net/5mGWF/
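
For anyone who wants to check the pattern without the fiddle, here is a small sketch that applies the same regex to lines ending in one to three backslashes. The sample lines are simplified (no surrounding quotes or + signs), and the BS variable and the function name moveOddBackslash are just mine to keep the escaping readable.

```javascript
var BS = '\\'; // a single backslash character

// Same idea as the regex above: a non-backslash character, any number of
// backslash pairs, then one leftover backslash directly before the newline.
// The leftover backslash is moved to the start of the next line.
function moveOddBackslash(text) {
  return text.replace(/([^\\])((\\{2})*)\\\n/g, '$1$2\n' + BS);
}

var input = [
  'Yes 1' + BS,            // 1 backslash: odd, gets moved
  'No 2' + BS + BS,        // 2 backslashes: even, left alone
  'Yes 3' + BS + BS + BS,  // 3 backslashes: odd, one gets moved
  'No 4'
].join('\n');

console.log(moveOddBackslash(input));
```

The pairs are kept on the original line by the second group, so only the single unpaired backslash ever crosses the line break.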

Line continuation characters in JavaScript

What is the best practice for line continuation in JavaScript? I know that you can use \ for strings. But how would you split the following code?
var statement = con.createStatement("select * from t where
(t.a1 = 0 and t.a2 >=-1)
order by a3 desc limit 1");

If I properly understood your question:
var statement = con.createStatement('select * from t where '
+ '(t.a1 = 0 and t.a2 >=-1) '
+ 'order by a3 desc limit 1');
For readability, it is fine to align the + operator at the start of each row, as above.
Anyway, unless you're using ECMAScript 2015, avoid splitting a multiline string with \, because:
It was not part of standard JavaScript before ECMAScript 5
Whitespace after that character can cause a parsing error
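
One more way to avoid the backslash, sketched below, is to put the pieces in an array and join them with a space, which also saves you from forgetting the trailing space in each piece. The variable names are only for illustration; both forms build the same string.

```javascript
// Concatenation: each piece needs its own trailing space.
var concatenated = 'select * from t where ' +
  '(t.a1 = 0 and t.a2 >=-1) ' +
  'order by a3 desc limit 1';

// Array join: the separator supplies the spaces between pieces.
var joined = [
  'select * from t where',
  '(t.a1 = 0 and t.a2 >=-1)',
  'order by a3 desc limit 1'
].join(' ');

console.log(concatenated === joined); // true
```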

I like using backslashes for JavaScript line continuation, like so:
// validation
$(".adjustment, .info input, .includesAndTiming input, \
.independentAdj, .generalAdj, .executiveAdj, \
#officeExpense, #longDistanceExpense, #digitalImages, #milesReimbursment, #driveTime, #statementTranscription").keypress(function (event) {

My personal preference is similar to the first response, but to my eyes it is easier to read:
var statement = con.createStatement
(
'select * from t where ' +
'(t.a1 = 0 and t.a2 >=-1) ' +
'order by a3 desc limit 1'
);
It strikes a close similarity to the SQL syntax format I've been using for nearly 20 years:
SELECT *
FROM t
WHERE
t.a1 = 0 AND
t.a2 >=-1
ORDER BY a3 DESC
LIMIT 1
Keeping the continuation (+ in JavaScript or AND in SQL) on the far right permits the eye to slide evenly down the left edge, checking lvalues & syntax. That's slightly more difficult to do with the continuation on the left - not important unless you do a LOT of this stuff, at which point every calorie you spend is a calorie that might've been saved by a slight improvement in format.
Since this query is so simple, breaking it all out to SQL format is wasteful of space and bandwidth, which is why the suggested JavaScript is on six lines instead of ten. Collapsing the parentheses up one line each brings it to four lines, saving whitespace. Not quite as clear or as simple to edit, though.

The "+" is for concatenating strings, and most of the examples deal with strings. What if you have a statement you need to spread across multiple lines, such as a compound "if" condition? JavaScript does not need a continuation character there: a line break is allowed between tokens, so break the line after an operator such as && or ||. A backslash at the end of a line only works inside a string literal, where it escapes the line terminator so that the newline does not end the string.
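
As a point of comparison, here is a small sketch of a compound condition split across lines in JavaScript with no continuation character at all. The function inRange is hypothetical; the only point is that each line break comes right after an operator.

```javascript
// Hypothetical example: a compound condition broken after each && operator.
function inRange(value, low, high) {
  if (typeof value === 'number' &&
      value >= low &&
      value <= high) {
    return true;
  }
  return false;
}

console.log(inRange(5, 1, 10));   // true
console.log(inRange(50, 1, 10));  // false
```

Ending each line with the operator makes it obvious to both the parser and the reader that the expression continues.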

Regex to strip BBCode

I need a regular expression to strip out any BBCode in a string. I've got the following (and an array with tags):
new RegExp('\\[' + tags[index] + '](.*?)\\[/' + tags[index] + ']');
It picks up [tag]this[/tag] just fine, but fails when using [url=http://google.com]this[/url].
What do I need to change? Thanks a lot.

I came across this thread and found it helpful to get me on the right track, but here's an ultimate one I spent two hours building (it's my first RegEx!) for JavaScript and tested to work very well for crazy nests and even incorrectly nested strings, it just works!:
string = string.replace(/\[\/?(?:b|i|u|url|quote|code|img|color|size)*?.*?\]/img, '');
If string = "[b][color=blue][url=www.google.com]Google[/url][/color][/b]" then the new string will be "Google". Amazing.
Hope someone finds that useful, this was a top match for 'JavaScript RegEx strip BBCode' in Google ;)

You have to allow any character other than ']' after the tag name until you find the closing ']'.
new RegExp('\\[' + tags[index] + '[^\\]]*](.*?)\\[/' + tags[index] + ']');
You could simplify this to the following expression.
\[[^\]]*]([^[]*)\[/[^\]]*]
The problem with that is that it will match [WrongTag]stuff[/WrongTag], too. Matching nested tags requires applying the expression multiple times.

You can check for balanced tags using a backreference:
new RegExp('\\[(' + tags.join('|') + ')[^\\]]*](.*?)\\[/\\1]');
The real problem is that you can't match arbitrarily nested tags with a regular expression (that's the limit of a regular language). Some languages do allow recursive regular expressions, but those are extensions (which technically make them non-regular, but that doesn't change the name most people use for them).
If you don't care about balanced tags, you can just strip out any tag you find:
new RegExp('\\[/?(?:' + tags.join('|') + ')[^\\]]*]');
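
Since nested tags need more than one pass, a rough sketch of the backreference version applied repeatedly until nothing changes is shown below. The tag list and the name stripBalanced are assumptions for the example, and [\s\S] is used instead of the dot so that content spanning several lines still matches.

```javascript
var tags = ['b', 'i', 'url']; // example tag list, adjust to your own

// Hypothetical helper: unwrap balanced [tag]...[/tag] pairs, outermost first,
// repeating until a pass makes no further change.
function stripBalanced(text) {
  var balanced = new RegExp(
    '\\[(' + tags.join('|') + ')[^\\]]*\\]([\\s\\S]*?)\\[/\\1\\]', 'gi');
  var previous;
  do {
    previous = text;
    text = text.replace(balanced, '$2'); // keep only the content between the tags
  } while (text !== previous);
  return text;
}

console.log(stripBalanced('[b][i]x[/i][/b]'));                    // "x"
console.log(stripBalanced('[url=http://google.com]this[/url]'));  // "this"
console.log(stripBalanced('[b]x[/i]'));                           // "[b]x[/i]" (not balanced)
```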

To strip out any BBCode, use something like:
var alltags = tags.join('|');
var stripbb = new RegExp('\\[/?(' + alltags + ')[^\\]]*\\]', 'g');
Replace globally with the empty string. No extra loop necessary.
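
Put together as something you can run, a minimal sketch might look like this. The tag list and the function name stripBBCode are assumptions; anything in brackets that does not start with a listed tag name is left alone.

```javascript
var tags = ['b', 'i', 'u', 'url', 'quote', 'code', 'img', 'color', 'size'];

// Hypothetical helper: remove opening and closing tags for the listed names,
// including any attribute part such as =blue or =http://..., in one global pass.
function stripBBCode(text) {
  var stripbb = new RegExp('\\[/?(?:' + tags.join('|') + ')[^\\]]*\\]', 'gi');
  return text.replace(stripbb, '');
}

console.log(stripBBCode('[b][color=blue][url=www.google.com]Google[/url][/color][/b]'));
// Google
console.log(stripBBCode('Footnote [1] stays, [i]this[/i] goes'));
// Footnote [1] stays, this goes
```

Note that because the part after the tag name accepts any characters up to the closing bracket, a tag such as [bold] would also be removed as if it were [b].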

I had a similar problem - in PHP not Javascript - I had to strip out BBCode [quote] tags and also the quotes within the tags. Added problem in that there is often arbitrary additional stuff inside the [quote] tag, e.g. [quote:7e3af94210="username"]
This worked for me:
$post = preg_replace('/[\r\n]+/', "\n", $post);
$post = preg_replace('/\[\s*quote.*\][^[]*\[\s*\/quote.*\]/im', '', $post);
$post = trim($post);
Lines 1 and 3 are just to tidy up any extra newlines, including any left over as a result of the regex.

I think
new RegExp('\\[' + tags[index] + '(=[^\\]]+)?](.*?)\\[/' + tags[index] + ']');
should do it. Instead of group 1 you have to pick group 2 then.

Remember that in many (most?) regex flavours the DOT metacharacter does not match line terminators by default, which causes a tag like
"[foo]dsdfs
fdsfsd[/foo]"
to fail. Either enable DOTALL by adding "(?s)" to your regex (where the flavour supports it), or replace the DOT metacharacter in your regex with the character class [\S\s].
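
In JavaScript the character-class route is the portable one. A small sketch showing the difference on the two-line example (the variable names are only for illustration):

```javascript
var text = '[foo]dsdfs\nfdsfsd[/foo]';

// The dot stops at the newline, so this pattern cannot reach [/foo].
var dotVersion = /\[foo\](.*?)\[\/foo\]/;

// [\s\S] matches any character, including line terminators.
var classVersion = /\[foo\]([\s\S]*?)\[\/foo\]/;

console.log(dotVersion.test(text));                         // false
console.log(JSON.stringify(classVersion.exec(text)[1]));    // "dsdfs\nfdsfsd"
```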

This worked for me for every tag name. It also supports strings like '[url="blablabla"][/url]'
str = str.replace(/\[([a-z]+)(\=[\w\d\.\,\\\/\"\'\#\,\-]*)*( *[a-z0-9]+\=.+)*\](.*?)\[\/\1\]/gi, "$4")
