Regular expression for apostrophes/ single quotes with double - javascript

I'm currently working with a regular expression (in Javascript) for replacing double quotes with smart quotes:
// ie: "quotation" to “quotation”
Here's the expression I've used for replacing the double quotes:
str = str.replace(/"([A-Za-z ]*)"/ig, "“$1”")
The above works perfectly if the phrase inside the quotes contains no additional punctuation, however, I also need to replace any apostrophes:
// ie: replace "It's raining again!" with “It’s raining again!”
The expression for replacing single quotes/ apostrophes works fine if not encapsulated:
str.replace(/\'\b/g, "’"); // returns it's as it’s correctly
// Using both:
str.replace(/"([A-Za-z ]*)"/ig, "“$1”").replace(/\'\b/g, "’");
// "It's raining again!" returns as "It’s raining again!"
// Ignores double quotes
I know this is because the expression for replacing the double quotes is being matched to letters only, but my limited experience with regular expressions has me flummoxed at how to create a match for quotations that may also contain single quotes!
Any help would be HUGELY appreciated! Thanks in advance.

You can match anything between the quotes except another double quote:
str = str.replace(/"([^"]*)"/ig, "“$1”")
Another option: use non-greedy search:
str = str.replace(/"(.*?)"/ig, "“$1”")
Also, I'm not sure you need to change only the single quotes that are followed by a word character. Maybe it would be better to change all of them?
str = str.replace(/'/g, "’");
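Putting the two replacements together could look like the sketch below. The function name smartQuotes is made up for illustration, and the sketch assumes every remaining straight apostrophe should become a closing curly quote.

```javascript
// Sketch: convert straight quotes to curly quotes in two passes.
// smartQuotes is a hypothetical helper name, not part of the question.
function smartQuotes(str) {
  return str
    // Pair up double quotes; [^"]* cannot run past the closing quote.
    .replace(/"([^"]*)"/g, "“$1”")
    // Replace every remaining straight apostrophe.
    .replace(/'/g, "’");
}

const result = smartQuotes('"It\'s raining again!"');
console.log(result); // “It’s raining again!”
```

The double-quote pass runs first so that the apostrophe pass only has to deal with single quotes.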

You can search for anything that is not a ". I would also make the match lazy with ? in case you had something like "Hey," she said, "what's up?" as your str:
str.replace(/"([^"]*?)"/ig, "“$1”").replace(/\'\b/g, "’");

Just to add to the current answers, you are performing a match on [A-Za-z ]* for the double quote replace, which means "match uppercase, lowercase or a space". This won't match It's raining, since your match expression does not contain the single quote.
Follow the advice of matching "anything but another double quote": with a greedy pattern such as "(.*)", a string like She said "It's raining outside." He said "really?" will result in She said “It's raining outside." He said "really?” (the greedy match skips past the inner double quotes).
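The difference between a greedy match and "anything but another double quote" can be seen side by side. This is a small sketch using the sample sentence from above; the variable names are arbitrary.

```javascript
const text = 'She said "It\'s raining outside." He said "really?"';

// Greedy: .* runs on to the last double quote in the string.
const greedy = text.replace(/"(.*)"/g, "“$1”");

// Negated class: each match stops at the nearest closing quote.
const negated = text.replace(/"([^"]*)"/g, "“$1”");

console.log(greedy);  // She said “It's raining outside." He said "really?”
console.log(negated); // She said “It's raining outside.” He said “really?”
```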

It's a good idea to restrict which characters may appear to the left and right of the quotes, especially if the text is in an HTML file. This is what I use:
str = str.replace(/([\n >*_-])"([A-Za-z0-9 ÆØÅæøå.,:;!##]*)"([ -.,!<\n])/ig, "$1«$2»$3");
This way, you avoid replacing quotes inside HTML tags like href="http.....
Normally there is a space to the left of the opening quote and another to the right of the closing quote. In an HTML document it might instead be a closing bracket, a newline, etc. I have also included the Norwegian characters. :-)
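As a rough check of that idea, the sketch below runs the same pattern (with the duplicated # dropped) over a made-up line of HTML. The sample string is an assumption for illustration only.

```javascript
// Sample HTML with an attribute value and a quoted word (made up for this sketch).
const html = '<a href="http://example.com">Se her</a> og si "hei" til alle.';

// The quote must be preceded and followed by one of the allowed context characters.
const replaced = html.replace(
  /([\n >*_-])"([A-Za-z0-9 ÆØÅæøå.,:;!#]*)"([ -.,!<\n])/g,
  "$1«$2»$3"
);

console.log(replaced);
// <a href="http://example.com">Se her</a> og si «hei» til alle.
```

The href value is left alone because its opening quote is preceded by =, which is not in the first character class.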

Related

The second pattern of a regex not replacing apostrophe

I'm creating a regex that matches straight apostrophes and replaces them with curly ones. Sometimes an apostrophe sits between two characters. Other times it comes at the end of a word (e.g. ellipsis').
So I have two regexes that handle both situations (separated by an or statement).
However, only the first case is being replaced, not the second. In other words, this:
"Wor'd word'".replace(/(?<=\w)\'(?=\w)|(?<=\w)\'(?=\s)/, '’')
Becomes this:
"Wor’d word'"
This confuses me because both types of apostrophes are matching: https://regexr.com/4td7p
Why is this, and how to fix it?
Update: I figured the problem was that there's no space after the last apostrophe, so I changed the second part of the regex to this: (?<=\w)\'(?!\w) (don't match if there's a character after the apostrophe). But I'm getting the same result.
If you want to match (?<=\w)\' followed by a character and also match (?<=\w)\' not followed by a character, why not just drop the logic after it altogether and just use (?<=\w)'? (no need to escape 's in a regex)
You also need the global flag to replace more than one thing at a time:
console.log(
"Wor'd word'".replace(/(?<=\w)'/g, '’')
);
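Update: a word boundary before the apostrophe does the same job without lookbehind, and it also catches apostrophes at the end of a word:
var str = "Wor'd word' that's a good thing'";
var afterReplace = str.replace(/\b'/g, '’');
console.log(afterReplace);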
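To make the effect of the global flag concrete, here is a small sketch comparing the variants on one input. It assumes an engine with lookbehind support (for example a recent Node.js or Chromium); the variable names are arbitrary.

```javascript
const input = "Wor'd word' that's a good thing'";

// Without g, only the first match is replaced.
const firstOnly = input.replace(/(?<=\w)'/, "’");

// With g, every apostrophe preceded by a word character is replaced.
const withLookbehind = input.replace(/(?<=\w)'/g, "’");

// \b' expresses the same condition without lookbehind: since ' is not a
// word character, a boundary before it means the previous character is one.
const withBoundary = input.replace(/\b'/g, "’");

console.log(firstOnly);      // Wor’d word' that's a good thing'
console.log(withLookbehind); // Wor’d word’ that’s a good thing’
console.log(withBoundary);   // Wor’d word’ that’s a good thing’
```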

Why do I have to add double backslash on javascript regex?

When I use a tool like regexpal.com, it lets me write regexes the way I'm used to. For example, I want to check whether a text contains a word that is at least 3 letters long and ends with a whitespace character, so it will match 'now ', 'noww ' and so on.
On regexpal.com this regex works \w{3,}\s this matches both the words above.
But in JavaScript I have to add double backslashes before w and s, like this:
var regexp = new RegExp('\\w{3,}\\s','i');
or else it does not work. I looked around for answers and searched for double backslash javascript regex but all I got was completely different topics about how to escape backslash and so on. Does someone have an explanation for this?
You can write the regex without double backslashes, but you need to put it between forward slashes as delimiters.
/^\w{3,}\s$/.test('foo ')
The anchors ^ (start of the string) and $ (end of the string) make this an exact match of the whole string. You don't need the i modifier, since \w already matches both upper- and lower-case letters.
Why? Because in a string, "\" quotes the following character so "\w" is seen as "w". It essentially says "treat the next character literally and don't interpret it".
To avoid that, the "\" must be quoted too, so "\\w" is seen by the regular expression parser as "\w".
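The effect is easy to observe by printing the strings before they reach the RegExp constructor. The sketch below is only an illustration; String.raw is a standard alternative that avoids the doubling.

```javascript
// In a string literal the backslash is consumed by the string parser first.
const single = "\w{3,}\s";    // the string is: w{3,}s
const doubled = "\\w{3,}\\s"; // the string is: \w{3,}\s

console.log(single);  // w{3,}s
console.log(doubled); // \w{3,}\s

const fromString = new RegExp(doubled);
const literal = /\w{3,}\s/;
const raw = new RegExp(String.raw`\w{3,}\s`);

console.log(fromString.source === literal.source); // true
console.log(raw.test("now "));                     // true
console.log(new RegExp(single).test("now "));      // false
```

With single backslashes the constructor receives w{3,}s, which looks for three or more w characters followed by an s.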

Javascript regex ends with pattern

I'm trying to write a regex in javascript to identify string representations of arbitrary javascript functions found in json, ie. something like
{
"key": "function() { return 'I am a function'; }"
}
It's easy enough to identify the start, but I can't figure out how to identify the ending double quotes since the function might also contain escaped double quotes. My best try so far is
/"\s*function\(.*\)[^"]*/g
which works nicely if there are no double quotes in the function string. The end of a json key value will end with a double quote and a subsequent comma or closing bracket. Is there some way to retrieve all characters (including newline?) until a negated pattern such as
not "\s*, and not "\s*}
... or do I need to take a completely different approach without regex?
Here is the current test data I'm working with:
http://regexr.com/39pvi
Seems like you want something like this,
"\s*function\(.*\)(?:\\.|[^\\"])*
It also matches the escaped double quotes (\") in between.
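A possible way to try this in JavaScript is sketched below. The sample JSON text is made up, and the pattern is a slight variation that drops the greedy .*\) part and adds the closing quote so that each match is a complete JSON string.

```javascript
// Hypothetical JSON text containing a function stored as a string.
const json = String.raw`{ "key": "function() { return \"I am a function\"; }", "other": 1 }`;

// Opening quote and "function(", then any run of escaped characters (\\.)
// or characters that are neither a backslash nor a quote, then the closing quote.
const fnString = /"\s*function\((?:\\.|[^\\"])*"/g;

const matches = json.match(fnString);
console.log(matches.length); // 1

// Each match is itself a valid JSON string, so JSON.parse unescapes it.
const source = JSON.parse(matches[0]);
console.log(source); // function() { return "I am a function"; }
```

If the input is always valid JSON, parsing the whole document with JSON.parse and checking which string values start with function may be simpler than a regex.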

Remove white spaces from everywhere except quotation marks

I'm using the '/\s+/' pattern in preg_replace to remove all whitespace from JavaScript code that I pass in a PHP array:
preg_replace('/\s+/', '',
("
function(uploader)
{
if(uploader.files.length > 1)
{
uploader.files.splice(1, uploader.files.length);
apprise('You can not update more than one file at once!', {});
}
}
"))
This is for a very basic JavaScript minification. In the PHP file (source) I can keep fully readable function code, while in the browser, in the page body, it ends up as a single-line string:
function(uploader,files){console.log('[PluploadUploadComplete]');console.log(files);},'QueueChanged':function(uploader){if(uploader.files.length>1){uploader.files.splice(1,uploader.files.length);apprise('Youcannotupdatemorethanonefileatonce!',{});}}
As you can see (or expect), this also affects strings in quotation marks and produces a message without spaces (like: Youcannotupdatemorethanonefileatonce!).
Is there a workaround for this problem? Can I remove all whitespaces from everywhere in my string, except for part embedded in single quotation marks?
To match space characters except within single quotes, use this:
$regex = "~'[^']*'(*SKIP)(*F)|\s+~";
You can pop that straight into your preg_replace().
For instance: $replaced = preg_replace($regex,"",$input);
Option 2: Multiple Kinds of Quotes
If you may have single or double quotes, use this:
$regex = '~([\'"]).*?\1(*SKIP)(*F)|\s+~'; // single-quoted so that " and \1 reach PCRE unchanged
How Does this Work?
On the left side of the | alternation, we match full "quoted strings", then deliberately fail, and skip to the next position in the string. On the right side, we match any whitespace, and we know it is the right whitespace because it was not matched by the expression on the left.
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
You could simply capture the single-quoted strings and reference them in your replacement. Be aware that this only works for the simple case of single-quoted strings, not nested ones...
$code = preg_replace("/('[^']*')|\s+/", '$1', $code);
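The same capture-and-restore idea can be sketched in JavaScript rather than PHP, which may help when experimenting in a browser console. The helper name stripWhitespace and the shortened sample are assumptions; like the PHP version, it does not handle escaped quotes inside the strings.

```javascript
// Hypothetical helper: strip whitespace except inside single-quoted strings.
function stripWhitespace(code) {
  // Group 1 captures a quoted string, which is put back unchanged;
  // for a whitespace match group 1 is undefined and "" is substituted.
  return code.replace(/('[^']*')|\s+/g, (match, quoted) =>
    quoted === undefined ? "" : quoted
  );
}

const sample = `
function(uploader)
{
    apprise('You can not update more than one file at once!', {});
}
`;

const minified = stripWhitespace(sample);
console.log(minified);
// function(uploader){apprise('You can not update more than one file at once!',{});}
```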

regular expressions explanation in javascript

Can somebody explain what this regular expression does?
document.cookie.match(/cookieInfo=([^;]*).*$/)[1]
Also it would be great if I can strip out the double quotes I'm seeing in the cookieInfo values. i.e. when cookieInfo="xyz+asd" - I want to strip out the double quotes using the above regular expression.
It's basically saying: grab as many characters as possible that are not semicolons and that follow the string 'cookieInfo='.
Try this to eliminate the double quotes:
document.cookie.match(/cookieInfo="([^;]*)".*$/)[1]
It searches the document.cookie string for cookieInfo=.
Next it grabs all of the characters which are not ; (until it hits the first semicolon).
[...] set of all characters included inside.
[^...] any character that is not in the set.
Then it lets the RegEx search through all other characters.
.* any character, 0 or more times.
$ end of string (or in some special cases, end of line).
You could replace " a couple of different ways, but rather than stuffing it into the regex, I'd recommend doing a replace on it after the fact:
var string = document.cookie.match(...)[1],
cleaned_string = string.replace(/^"|"$/g, "");
That second regex says "look at the start of the string and see if there's a ", or look at the end of the string and see if there's a "".
Normally, a RegEx would stop after it did the first thing it found. The g at the end means to keep going for every match it can possibly find in the string that you gave it.
I wouldn't put it in the original RegEx, because playing around with optional quotes can be ugly.
If they're guaranteed to always, always be there, then that's great, but if you assume they are, and you hit one that doesn't have them, then you're going to get a null match.
The regular expression matches a string starting with 'cookieInfo=', followed by (and capturing) 0 or more non-semicolon characters, followed by 0 or more 'anythings'.
To strip out the double quotes you can use the regex /"/ and replace it with an empty string.
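The pieces above can be combined into a small helper that also guards against the null match. This is a sketch under assumptions: readCookieInfo is a made-up name, and a hard-coded string stands in for document.cookie so the code runs outside a browser.

```javascript
// Hypothetical cookie string; in a browser this would be document.cookie.
const cookie = 'session=abc; cookieInfo="xyz+asd"; theme=dark';

function readCookieInfo(cookieString) {
  // Capture everything after cookieInfo= up to the next semicolon.
  const match = cookieString.match(/cookieInfo=([^;]*)/);
  if (match === null) return null; // cookie not present
  // Strip one leading and one trailing double quote, if any.
  return match[1].replace(/^"|"$/g, "");
}

console.log(readCookieInfo(cookie));             // xyz+asd
console.log(readCookieInfo("cookieInfo=plain")); // plain
console.log(readCookieInfo("theme=dark"));       // null
```

Stripping the quotes in a second step keeps the first pattern working whether or not the value is quoted.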
