I'm using the '/\s+/' pattern with preg_replace to remove all whitespace from JavaScript code that I pass in a PHP array:
preg_replace('/\s+/', '',
("
function(uploader)
{
if(uploader.files.length > 1)
{
uploader.files.splice(1, uploader.files.length);
apprise('You can not update more than one file at once!', {});
}
}
"))
This is for very basic JavaScript minification. In the PHP source file I can keep the function code fully readable, while in the browser it ends up in the page body as a single-line string:
function(uploader,files){console.log('[PluploadUploadComplete]');console.log(files);},'QueueChanged':function(uploader){if(uploader.files.length>1){uploader.files.splice(1,uploader.files.length);apprise('Youcannotupdatemorethanonefileatonce!',{});}}
As you can see (or expect), this also affects strings in quotation marks and produces a message without spaces (like: Youcannotupdatemorethanonefileatonce!).
Is there a workaround for this problem? Can I remove all whitespace from everywhere in my string, except for the parts enclosed in single quotation marks?
To match whitespace characters except within single quotes, use this:
$regex = "~'[^']*'(*SKIP)(*F)|\s+~";
You can pop that straight into your preg_replace().
For instance: $replaced = preg_replace($regex,"",$input);
Option 2: Multiple Kinds of Quotes
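If you may have single or double quotes, use this (note the single-quoted PHP string: inside a double-quoted string the " would end the string and \1 would be read as an octal escape):
$regex = '~([\'"]).*?\1(*SKIP)(*F)|\s+~';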
How Does this Work?
On the left side of the | alternation, we match complete "quoted strings"; (*F) then deliberately fails the match and (*SKIP) tells the engine to resume at the position right after the quoted string. On the right side, we match any whitespace, and we know it is the right whitespace because it was not consumed by the expression on the left.
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
You could simply capture the single-quoted strings and reference them in your replacement. Be aware that this only works for simple single-quoted strings, not for strings that contain nested or escaped quotes.
$code = preg_replace("/('[^']*')|\s+/", '$1', $code);
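JavaScript has no (*SKIP)(*F) verbs, but the capture-group variant above ports directly if you ever need the same clean-up on the JavaScript side. This is a minimal sketch under stated assumptions: the function name stripWhitespaceOutsideQuotes is hypothetical, quoted strings are assumed to contain no escaped quotes, and double-quoted strings are accepted as well, in the spirit of Option 2.

```javascript
// Capture quoted strings in group 1 and write them back with "$1". A whitespace
// run outside quotes matches the second alternative, where group 1 is undefined,
// so "$1" inserts nothing. Assumes no escaped quotes inside the strings.
function stripWhitespaceOutsideQuotes(code) {
  return code.replace(/('[^']*'|"[^"]*")|\s+/g, '$1');
}

const source = `function(uploader)
{
if(uploader.files.length > 1)
{
apprise('You can not update more than one file at once!', {});
}
}`;
console.log(stripWhitespaceOutsideQuotes(source));
// function(uploader){if(uploader.files.length>1){apprise('You can not update more than one file at once!',{});}}
```

Keep in mind that removing every whitespace run also glues keywords to identifiers (return x becomes returnx), so this approach only suits code shaped like the sample in the question.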
This is very similar to "Regular expression to find unescaped double quotes in CSV file".
However, the solutions presented don't work with Node.js's regex engine. Given a CSV string where columns are quoted with double quotes, but some columns have unescaped double quotes in them, what regex could be used to match these unescaped quotes and just remove them.
Example row
"123","","SDFDS SDFSDF EEE "S"","asdfas","b","lll"
So the two double quotes surrounding the S in the third column would get matched and removed. Needs to work in Node.js (14.16.1)
I have tried (?m)""(?![ \t]*(,|$)) but get an "Invalid regular expression: /(?m)""(?![ \t]*(,|$))/: Invalid group" exception.
Node.js uses the standard JavaScript regex flavor (through the V8 engine), so here are my comments on the example you took from the prior answer:
Your example is choking on its first element, (?m), because inline flag groups like that are not supported in JavaScript. However, that part is not essential to your task. It only turns on multiline mode (so that ^ and $ match at line breaks), and you don't need that if you feed the regex engine one line at a time. If you still want to feed it a multiline string, you can turn on multiline mode in JavaScript with the "m" flag after the final delimiter: "/myregex/m". All of the other elements, including the negative lookahead, are supported by JavaScript. So drop the (?m) part of your expression and try it again.
Even after you get it to work, the example row you provided will not be parsed according to your expectations by the sample regular expression. Its function is to identify all occurrences of two double-quotes that are not followed by a comma (or end of string). The ONLY two occurrences of doubled quotes in your example each have a comma after, so you will get no matches on this regex in your example.
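Both points can be checked quickly in Node.js. The sketch below confirms that the (?m) version is rejected while the same pattern without it compiles, and that the m-flag version finds nothing in the example row. The helper name compiles is only for illustration.

```javascript
// Inline flag groups such as (?m) are a syntax error in JavaScript regexes.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    return false; // e.g. a SyntaxError for the (?m) group
  }
}

console.log(compiles('(?m)""(?![ \\t]*(,|$))')); // false
console.log(compiles('""(?![ \\t]*(,|$))')); // true

// Same pattern with the m flag instead of (?m): two double quotes that are not
// followed by optional spaces/tabs and then a comma or the end of a line.
const doubledQuoteRe = /""(?![ \t]*(,|$))/gm;
const row = '"123","","SDFDS SDFSDF EEE "S"","asdfas","b","lll"';
console.log(row.match(doubledQuoteRe)); // null: both "" pairs are followed by a comma
```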
It seems like you want some context-sensitive scanning to match and remove the inner pairs of double quotes while leaving the outer ones in place and handling commas inside your strings and possibly correctly quoted double quotes. Regular expression engines are really bad at this kind of processing and I don't think you are going to get satisfactory results whatever you come up with.
You can get an approximate solution to your problem by using one regex to parse the individual fields of the CSV, stripping the outer quotes as you go, and then running a second regex against each parsed field to either remove the remaining single double quotes or add a second double quote where necessary. Then you can reassemble the string under program control.
This will still break if someone embeds a "", sequence in a data field, so it's not perfect, but it might be good enough for you.
The regex for splitting the .csv and stripping the double quotes is:
/(("(.*?)")|([^,]*))(,|$)/gm
This will accept either a quoted "anything", or an unquoted anything, each followed by a comma or the end of the line, repeatedly until the source is exhausted. Because of the capturing groups, the parsed text will be in either $3 (if the field was quoted) or $4 (if it was not quoted), but not both.
Here's the result of replacing each match in your string with $3$4 followed by a semicolon (I took the liberty of adding an unquoted numeric field so you can see that it handles both cases):
"123","","SDFDS SDFSDF EEE "S"",456,"asdfas","b","lll"
RegexpReplace(<above>,"((""(.*?)"")|([^,]*))(,|$)","$3$4;")
=> 123;;SDFDS SDFSDF EEE "S";456;asdfas;b;lll;;
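For reference, here is the same replacement written as a plain JavaScript String.prototype.replace call. This is a sketch; in JavaScript, $3 or $4 for a group that did not participate in the match is substituted as an empty string, which is what makes "$3$4" work.

```javascript
const row = '"123","","SDFDS SDFSDF EEE "S"",456,"asdfas","b","lll"';

// $3: inside of a quoted field, $4: an unquoted field. The final extra ";"
// comes from the empty match the pattern also finds at the end of the string.
const flattened = row.replace(/(("(.*?)")|([^,]*))(,|$)/gm, '$3$4;');
console.log(flattened); // 123;;SDFDS SDFSDF EEE "S";456;asdfas;b;lll;;
```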
See how the outer quotes have been stripped away. Now it's a simple thing to go through all the matches to remove all the remaining quotes, and then you can reconstruct the string from the array of matches.
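Putting the two passes together, a minimal sketch for a single CSV row might look like this. It assumes there are no commas inside field values, takes the "remove" option for the leftover quotes rather than doubling them, and re-adds the outer quotes only to fields that originally had them; stripUnescapedQuotes is a hypothetical name. String.prototype.matchAll is available in Node.js 12 and later, so it works on 14.16.1.

```javascript
// Split one CSV row into fields, strip stray double quotes inside each field,
// then rebuild the row. Assumes no commas inside field values.
function stripUnescapedQuotes(row) {
  // Same pattern as above, without the m flag because we handle one row at a time.
  const fieldRe = /(("(.*?)")|([^,]*))(,|$)/g;
  const fields = [];
  for (const m of row.matchAll(fieldRe)) {
    const quoted = m[3] !== undefined; // group 3 is set only for quoted fields
    const text = quoted ? m[3] : m[4];
    const cleaned = text.replace(/"/g, ''); // second pass: drop remaining quotes
    fields.push(quoted ? '"' + cleaned + '"' : cleaned);
    if (m[5] === '') break; // matched the end of input: skip the trailing empty match
  }
  return fields.join(',');
}

console.log(stripUnescapedQuotes('"123","","SDFDS SDFSDF EEE "S"",456,"asdfas","b","lll"'));
// "123","","SDFDS SDFSDF EEE S",456,"asdfas","b","lll"
```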
I'm trying to write a regex in JavaScript to identify string representations of arbitrary JavaScript functions found in JSON, i.e. something like
{
"key": "function() { return 'I am a function'; }"
}
It's easy enough to identify the start, but I can't figure out how to identify the closing double quote, since the function might also contain escaped double quotes. My best try so far is
/"\s*function\(.*\)[^"]*/g
which works nicely if there are no double quotes in the function string. A JSON value ends with a double quote followed by a comma or a closing brace. Is there some way to retrieve all characters (including newlines?) until a negated pattern such as
not "\s*, and not "\s*}
... or do I need to take a completely different approach without regex?
Here is the current test data I'm working with:
http://regexr.com/39pvi
It seems like you want something like this:
"\s*function\(.*\)(?:\\.|[^\\"])*
It also matches any escaped double quotes (\") in between.
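A sketch of how this might be used in JavaScript, assuming you have the raw JSON text as a string. Two changes from the pattern above are my own assumptions: a closing " is appended so that each match is a complete JSON string literal (which JSON.parse can then unescape), and the parameter list is matched with [^)]* instead of .* so that two functions on the same line are not merged into one match.

```javascript
// Hypothetical raw JSON text with two function strings on one line.
const json = '{"a": "function() { return \\"I am a function\\"; }", "b": "function(x) { return x; }"}';

// "\s*function\(   opening quote and the start of the function
// [^)]*\)          parameter list (assumption: no ")" inside it)
// (?:\\.|[^\\"])*  an escaped character (such as \") or any non-quote, non-backslash
// "                the closing quote of the JSON string
const fnStringRe = /"\s*function\([^)]*\)(?:\\.|[^\\"])*"/g;

const literals = json.match(fnStringRe);
// Each match is a complete JSON string literal, so JSON.parse can unescape it.
const sources = literals.map((literal) => JSON.parse(literal));
console.log(sources);
```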
I need to get rid of unwanted characters, such as multiple spaces and leading and trailing whitespace, and also escape single and double quotes and other characters that may cause problems in my Neo4j Cypher query.
I currently use this (the string.js and jsesc Node modules):
result = S(result).trim().collapseWhitespace().s;
result = jsesc(result, { 'quotes': 'double' });
They work fine; however:
1) I want to find a better, easier way to do it (preferably without those libraries);
2) When the text is in another language (e.g. Russian), jsesc seems to convert it into some encoding other than UTF-8 that the other parts of my script don't understand.
So could you recommend a RegExp that would do all of the above without those modules?
Thank you!
I have a series of regex replace calls that do what you seem to be looking for, or at least address the issues you mentioned. I put together a test string containing several of the items you mentioned.
var testString = ' I start with \"unwanted items and" end with a space". Also I have Quotes ';
var cleanedString = testString.replace(/\s\s+/g, ' ').replace(/^\s|\s$/g, '').replace(/([^\\])(['"])/g, "$1\\$2");
console.log(cleanedString);
This will escape quotes (single or double) that have not yet been escaped, though you would have to worry about the case where the quote is preceded by an escaped backslash: for example, \\' would not be turned into \\\' as it should be. Because each match also consumes the character before the quote, it will also skip a quote at the very start of the string and the second of two adjacent quotes. If you want to escape more characters, just add them to the character class in the final .replace regex. Let me know if there are specific examples you are looking for.
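If lookbehind is available (it is in Node.js 10 and later), the quote-escaping step can handle those cases too. This is a sketch, not a complete Cypher escaper: cleanForCypher is a hypothetical name, only single and double quotes are escaped, and a quote is treated as already escaped only when an odd number of backslashes precedes it. Because it is plain string replacement, non-ASCII text such as Russian passes through unchanged.

```javascript
// Collapse whitespace, trim, and escape quotes that are not already escaped.
// Assumption: a quote is "already escaped" when an odd number of backslashes precedes it.
function cleanForCypher(input) {
  return input
    .replace(/\s+/g, ' ') // every whitespace run (spaces, tabs, newlines) -> one space
    .trim() // drop the leading/trailing space that may remain
    // (?<!\\)      the match must not start right after a backslash
    // ((?:\\\\)*)  zero or more escaped backslashes (pairs), kept as they are
    // (['"])       an unescaped quote, which gets a backslash in front
    .replace(/(?<!\\)((?:\\\\)*)(['"])/g, '$1\\$2');
}

console.log(cleanForCypher('  Привет   "мир"  ')); // Привет \"мир\"
```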
Can somebody explain what this regular expression does?
document.cookie.match(/cookieInfo=([^;]*).*$/)[1]
I'd also like to strip out the double quotes I'm seeing in the cookieInfo values, e.g. when cookieInfo="xyz+asd", using the above regular expression.
It basically says: grab as many characters as possible that are not semicolons and that come right after the string 'cookieInfo='.
Try this to eliminate the double quotes:
document.cookie.match(/cookieInfo="([^;]*)".*$/)[1]
It searches the document.cookie string for cookieInfo=.
Next it grabs all of the characters which are not ; (until it hits the first semicolon).
[...] set of all characters listed inside.
[^...] set of all characters not listed inside.
Then it matches all of the remaining characters.
.* any character, 0 or more times.
$ end of string (or in some special cases, end of line).
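To see those pieces in action outside a browser, here is a sketch run against a hypothetical sample cookie string (document.cookie is only available in browsers):

```javascript
// document.cookie only exists in browsers, so use a hypothetical sample cookie string.
const sampleCookie = 'session=abc123; cookieInfo="xyz+asd"; theme=dark';

// cookieInfo=  the literal name and equals sign
// ([^;]*)      group 1: every character up to (not including) the next semicolon
// .*$          the rest of the string, matched but not captured
const match = sampleCookie.match(/cookieInfo=([^;]*).*$/);

console.log(match[0]); // cookieInfo="xyz+asd"; theme=dark
console.log(match[1]); // "xyz+asd" (the quotes are still there)
```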
You could replace " a couple of different ways, but rather than stuffing it into the regex, I'd recommend doing a replace on it after the fact:
var string = document.cookie.match(...)[1],
cleaned_string = string.replace(/^"|"$/g, "");
That second regex says "look at the start of the string and see if there's a ", or look at the end of the string and see if there's a ".
Normally, a regex replace stops after the first match it finds. The g flag at the end means to keep going and replace every match it can find in the string you gave it.
I wouldn't put it in the original RegEx, because playing around with optional quotes can be ugly.
If they're guaranteed to always, always be there, then that's great, but if you assume they are and you hit a value that doesn't have them, match() returns null and the [1] lookup throws an error.
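A minimal sketch of that advice: match first, check for null, then strip the optional quotes afterwards. The helper name getCookieInfo is hypothetical, and the trailing .*$ is dropped because it does not change what group 1 captures.

```javascript
// Returns the cookieInfo value without surrounding quotes, or null when the
// cookie is absent (instead of throwing on match(...)[1]).
function getCookieInfo(cookieString) {
  const match = cookieString.match(/cookieInfo=([^;]*)/);
  if (match === null) return null;
  // Remove a quote at the start and/or at the end, if present.
  return match[1].replace(/^"|"$/g, '');
}

console.log(getCookieInfo('a=1; cookieInfo="xyz+asd"; b=2')); // xyz+asd
console.log(getCookieInfo('a=1')); // null
```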
The regular expression matches a string starting with 'cookieInfo=', followed by (and capturing) 0 or more non-semicolon characters, followed by 0 or more of any character.
To strip out the double quotes, you can use the regex /"/g and replace the matches with an empty string; without the g flag only the first quote is removed.
I'm currently working with a regular expression (in JavaScript) for replacing double quotes with smart quotes:
// ie: "quotation" to “quotation”
Here's the expression I've used for replacing the double quotes:
str = str.replace(/"([A-Za-z ]*)"/ig, "“$1”")
The above works perfectly if the phrase inside the quotes contains no additional punctuation, however, I also need to replace any apostrophes:
// ie: replace "It's raining again!" with “It’s raining again!”
The expression for replacing single quotes/ apostrophes works fine if not encapsulated:
str.replace(/\'\b/g, "’"); // returns it's as it’s correctly
// Using both:
str.replace(/"([A-Za-z ]*)"/ig, "“$1”").replace(/\'\b/g, "’");
// "It's raining again!" returns as "It’s raining again!"
// Ignores double quotes
I know this is because the expression for replacing the double quotes is being matched to letters only, but my limited experience with regular expressions has me flummoxed at how to create a match for quotations that may also contain single quotes!
Any help would be HUGELY appreciated! Thanks in advance.
You can match everything between the quotes except another double quote:
str = str.replace(/"([^"]*)"/ig, "“$1”")
Another option: use non-greedy search:
str = str.replace(/"(.*?)"/ig, "“$1”")
Also, I'm not sure you need to change only the single quotes that are followed by a word character (which is what the \b in /\'\b/ requires). Maybe it would be better to change all of them?
replace(/\'/g, "’");
You can search for anything not a ". I would also make a lazy match with ? in case you had something like "Hey," she said, "what's up?" as your str:
str.replace(/"([^"]*?)"/ig, "“$1”").replace(/\'\b/g, "’");
Just to add to the current answers, you are performing a match on [A-Za-z ]* for the double quote replace, which means "match uppercase, lowercase or a space". This won't match It's raining, since your match expression does not contain the single quote.
Follow the advice of matching "anything but another double quote". With a greedy pattern like /"(.*)"/, a string such as She said "It's raining outside." He said "really?" would become She said “It's raining outside." He said "really?” (the greedy match skips past the 'inner' double quotes), and your original [A-Za-z ]* pattern pairs the wrong quotes in that sentence, producing She said "It's raining outside.“ He said ”really?".
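The difference is easy to check with that sentence. The sketch below runs the question's letters-only pattern, a greedy /"(.*)"/ pattern, and the negated-class pattern from the answers (followed by the apostrophe replacement from the question):

```javascript
const sentence = 'She said "It\'s raining outside." He said "really?"';

// Question's pattern: only letters and spaces are allowed between the quotes.
const lettersOnly = sentence.replace(/"([A-Za-z ]*)"/g, '“$1”');
// Greedy "anything" pattern: runs from the first quote to the last one.
const greedy = sentence.replace(/"(.*)"/g, '“$1”');
// Negated class: stops at the next double quote, then fix the apostrophe.
const negated = sentence.replace(/"([^"]*)"/g, '“$1”').replace(/'\b/g, '’');

console.log(lettersOnly); // She said "It's raining outside.“ He said ”really?"
console.log(greedy); // She said “It's raining outside." He said "really?”
console.log(negated); // She said “It’s raining outside.” He said “really?”
```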
It's a good idea to restrict the specific characters allowed to the left and right of the quotes, especially if this occurs in an HTML file. I am using this:
str = str.replace(/([\n >*_-])"([A-Za-z0-9 ÆØÅæøå.,:;!#]*)"([ .,!<\n-])/ig, "$1«$2»$3");
This way, you avoid replacing quotes inside HTML tags, like href="http.....
Normally, there is a space to the left of the opening quote and another to the right of the closing quote. In an HTML document it might be a closing bracket, a newline, etc. I have also included the Norwegian characters. :-)
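A small sketch of this pattern as a JavaScript function, with a few assumptions of my own: the last character class is written [ .,!<\n-] so the hyphen is a literal character (in [ -.,!<\n] the " -." part is a range from space to period, which also admits characters such as " and '), the duplicate # is dropped, and so is the i flag, since the classes already list both cases. toGuillemets is a hypothetical name.

```javascript
// Replace "..." with «...» only when the quotes sit next to the allowed
// delimiter characters, so attribute values like href="..." are left alone.
function toGuillemets(str) {
  return str.replace(
    /([\n >*_-])"([A-Za-z0-9 ÆØÅæøå.,:;!#]*)"([ .,!<\n-])/g,
    '$1«$2»$3'
  );
}

console.log(toGuillemets('<a href="http://example.com">Link</a> He said "Hei på deg" today.'));
// <a href="http://example.com">Link</a> He said «Hei på deg» today.
```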