Replace part of a string and upper case the result - javascript

everything good?
I would like some help from you, I have the following scenario:
STRING_ONE:
string_two
STRING_three :
string_four:
stringfive:
I need to identify words from the beginning of the line that end with :
after identifying the words, I need to erase the spaces and convert them to uppercase,
I tried some regexes, but because the words change I can't hard-code the replacement: I need to keep the same word, just removing the spaces and converting it to capital letters.
The result I'm trying to get is this:
STRING_ONE:
string_two
STRING_three :
STRING_FOUR:
STRINGFIVE:
I can capture the words that match this pattern, with the following regex, but I don't know how to replace it by just erasing the spaces, keeping the rest of the string the same, and doing the upper case
^.*\b:
I tried to replace like this but it didn't work
"$1".toUpperCase()
Can anyone help please?
Thanks!

As you are not really replacing anything but are instead looking for a pattern, I used the pattern in a String.prototype.match() call to identify the lines to which .trim() and .toUpperCase() need to be applied. The .split("\n") turns the initial string into an array, over which I can then .map() the individual lines. At the end I .join() everything together again.
const str=`STRING_ONE:
string_two
STRING_three :
string_four:
stringfive:`;
console.log(str
.split("\n")
.map(s=>s.match(/^\s*\w+:\s*$/) && s.trim().toUpperCase() || s )
.join("\n")
);
Our regex patterns differ slightly:
while yours (/^.*\b:/) will match any line that has at least one word-end followed by a colon in it
mine (/^\s*\w+:\s*$/) is stricter and demands that there is exactly one word followed by a colon in a line that can optionally be padded by any number of whitespace characters on either side.
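As a side note on the attempt in the question: "$1".toUpperCase() is evaluated before replace() ever runs, and it simply yields the string "$1" again. A replacement function receives the actual matched text instead. The following is only a sketch of that alternative, not the approach used above; it assumes the stricter pattern with the m flag and uses [ \t]* instead of \s* so that the padding can never run across a line break.

```javascript
const input = `STRING_ONE:
string_two
STRING_three :
string_four:
stringfive:`;

// The callback gets the whole match first (ignored here) and then group 1,
// the word itself. Surrounding spaces and tabs are dropped because only the
// word and the colon are put back.
const result = input.replace(
  /^[ \t]*(\w+):[ \t]*$/gm,
  (_, word) => word.toUpperCase() + ":"
);

console.log(result);
// STRING_ONE:
// string_two
// STRING_three :
// STRING_FOUR:
// STRINGFIVE:
```

"STRING_three :" is left alone because the space before the colon prevents \w+: from matching.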

Related

Is it possible to replace only in a match group - REGEX

I have always tried to avoid regex because I simply can't get my head around how it really works. Most of the time I manage to get the expected result by luck more than actual skill.
However, I am trying to replace any whitespace character in a bundled webpack source with the string-replace-loader or the String-Replace-Plugin (whichever turns out easier). But before I try to do this on the actual source, I want to understand the regex which I am trying to perform.
The problem
I have query strings which always start with dqlParse followed by \n then maybe some \t and other whitespace characters. I have already managed to get my whitespace characters removed in a test string if I match this
/\s+\s/g
and simply replace it with " ".
Since I don't have control over all the strings within my bundle, I thought I could indicate which string is set for replacement by adding dqlParse in front of the string and then match and replace by groups. Unfortunately no luck so far.
What I have tried
So far I have tried something like this
/(^dqlParse)(.*)/g
which basically does what it should since match group $1 is dqlParse and match group $2 is the rest of the string where I would like to do the replacement.
Is it possible to replace only in the second match group?
Thanks! Any help appreciated!
Yes, you can do that with String#replace:
text = text.replace(/^(dqlParse)(.*)/g, function(_, x, y) {return x + y.replace(/\s{2,}/g, ' ');})
This will match and capture dqlParse into Group 1 (x variable in the callback function), and the rest of the line will get captured into Group 2 (y in the callback function). So, once the match is found, the replacement will be the concatenation of x and y with all two or more whitespace chunks replaced with a single space.
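A small runnable sketch of the same idea follows. One assumption is changed on purpose: the question says dqlParse is followed by \n, and . does not match line breaks in JavaScript, so this variant captures the rest with [\s\S]* instead of .* and drops the g flag. The function name collapseQueryWhitespace and the sample string are made up for illustration.

```javascript
// Only the text captured after the dqlParse marker is rewritten;
// strings that do not start with the marker are returned untouched.
function collapseQueryWhitespace(text) {
  return text.replace(/^(dqlParse)([\s\S]*)/, (_, prefix, rest) =>
    prefix + rest.replace(/\s{2,}/g, " ")
  );
}

// Hypothetical sample input.
const sample = "dqlParse\n\t\tSELECT  *\n\tFROM   users";

console.log(collapseQueryWhitespace(sample));
// dqlParse SELECT * FROM users

console.log(collapseQueryWhitespace("other  string"));
// other  string
```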

The second pattern of a regex not replacing apostrophe

I'm creating a regex that matches straight apostrophes and replaces them with curly ones. Sometimes an apostrophe sits between two characters. Other times it comes at the end of a character/word (e.g. ellipsis').
So I have two regexes that handle both situations (separated by an or statement).
However, only the first case is being replaced, not the second. In other words, this:
"Wor'd word'".replace(/(?<=\w)\'(?=\w)|(?<=\w)\'(?=\s)/, '’')
Becomes this:
"Wor’d word'"
This confuses me because both types of apostrophes are matching: https://regexr.com/4td7p
Why is this, and how to fix it?
Update: I figured the problem was that there's no space after the last apostrophe, so I changed the second part of the regex to this: (?<=\w)\'(?!\w) (don't match if there's a character after the apostrophe). But I'm getting the same result.
If you want to match (?<=\w)\' followed by a character and also match (?<=\w)\' not followed by a character, why not just drop the logic after it altogether and just use (?<=\w)'? (no need to escape 's in a regex)
You also need the global flag to replace more than one thing at a time:
console.log(
"Wor'd word'".replace(/(?<=\w)'/g, '’')
);
updated
var str = "Wor'd word' that's a good thing'";
var afterReplace = str.replace(/\b'/g, '’')
console.log(afterReplace);

Regex: Match number between two strings with line breaks

I want to match only the value of this string that ends with ZW-Summe with RegEx (JavaScript, so no lookbehind - please consider: I must use regex):
[here is a lot more data with line breaks and so on...]
2,550%Zinsen 83,72ZW-Summe U
St 83,72Umsatzs? [more lines...]
Problem: There can be line breaks everywhere; even this could happen:
[data....] 2,550%Zinsen 83,7
2ZW-Summe U
St 83,72Umsatzs? [more lines...]
My goal is to match 83,72 only, without ZW-Summe and of course the value can change. Possible values:
1.000,22
0,22
222,22
100.000,22 and so on.
I have to identify the value with the ZW-Summe String because there can be more occurrences of values.
My first attempt is ((\d{0,3}((\.\d{3})){0,2}),\d{2})ZW-Sum but this also matches ZW-Sum and I am not able to access groups - and the main problem is that it does not ignore possible line breaks.
I hope it is even possible to match something (-VALUE-ZW-Summe) and then ignore the ZW-Summe in the result?
Thank you for any suggestions.
This worked for me, and is quite simple: /([\d,.\n]+)(?=ZW-Summe)/g.
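It basically just matches a series of digits, commas, periods, and newlines that is followed by ZW-Summe ((?= is a positive lookahead, so ZW-Summe itself is not part of the match). The \n in the character class is what lets the value span line breaks; the g flag makes it find every occurrence rather than only the first.
After you run it, though, make sure to strip the newlines from each match (i.e. match = match.replace(/\n/g, '');).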
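Put together, the match-then-strip steps might look like the sketch below. The function name extractSubtotals is hypothetical, and the sample is the broken-line excerpt from the question; real input may also contain \r, which this sketch does not handle.

```javascript
// Returns every value that directly precedes "ZW-Summe", with any line
// breaks inside the value removed.
function extractSubtotals(text) {
  const matches = text.match(/[\d,.\n]+(?=ZW-Summe)/g) || [];
  return matches.map((m) => m.replace(/\n/g, ""));
}

// Excerpt from the question, with a line break inside the value.
const sample = "2,550%Zinsen 83,7\n2ZW-Summe U\nSt 83,72Umsatzs?";

console.log(extractSubtotals(sample)); // [ '83,72' ]
```

The other numbers in the sample (2,550 and the second 83,72) are skipped because they are not immediately followed by ZW-Summe.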

Replace comma as separator

I am trying to build a Regex to replace commas used as a separator in between normal text.
The separator can appear in any of these forms:
Space before comma
Comma is between text and/or numbers, without any space
Several commas after each other
Example:
"This is a text separated with comma, that I try to fix. , It can be split in several ways.,1234321 , I try to make all the examples in one string,,4321,"
Results:
This is a text separated with comma, that I try to fix.
It can be split in several ways.
1234321
I try to make all the examples in one string
4321
This is the code I have so far using Node.js / Javascript:
data.replace(/(\S,\S)|( ,)|(,,)|(,([a-z0-9]))/ig,';')
The answer from @torazaburo works best, except for several commas with spaces in between (, , , ,).
var str = "This is a text separated with comma, that I try to fix. , It can be split in several ways.,1234321 , I try to make all the examples in one string,,4321,";
console.log(str.split(/ +, *|,(?=\w|,|$)/));
This will split on any comma preceded by one or more spaces, no matter what follows (and eat the preceding spaces, and following spaces if any); or, any comma followed by an alphanumeric or comma or end-of-string.
There is no easy way with the regexp to get rid of the final empty string in the result, caused by the comma at the very end of the input. You can get rid of that yourself if you don't want it.
To rejoin with semi-colon, add .join(';').
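One way to drop the empty entries yourself is to filter them out after splitting. This is only a sketch using the sample string from the question; note that the doubled comma (,,) also produces an empty entry, not just the trailing comma.

```javascript
const str = "This is a text separated with comma, that I try to fix. , It can be split in several ways.,1234321 , I try to make all the examples in one string,,4321,";

// filter(Boolean) removes the empty strings left by ",," and by the
// comma at the very end of the input.
const pieces = str.split(/ +, *|,(?=\w|,|$)/).filter(Boolean);

console.log(pieces.length); // 5
console.log(pieces.join(";"));
```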
data.replace(/\s*,+\s*/g, ';');
This will yield:
This is a text separated with comma;that I try to fix.;It can be split in several ways.;1234321;I try to make all the examples in one string;4321;
There are three parts to this:
\s*: Match zero or more whitespace characters.
,+: Match one or more commas.
\s*: Match zero or more whitespace characters.
If, instead, you want to replace any number of consecutive commas with a single semi-colon:
data.replace(/,+/g, ';');
Honestly, I'm not sure I understood your requirements. If I did misunderstand, please provide the output string you're expecting.

Regexp: excluding a word but including non-standard punctuation

I want to find strings that contain words in a particular order, allowing non-standard characters in between the words but excluding a particular word or symbol.
I'm using javascript's replace function to find all instances and put into an array.
So, I want select...from, with anything except 'from' in between the words. Or I can separate select...from from select...from (, as long as I exclude nesting. I think the answer is the same for both, i.e. how do I write: find x and not y within the same regexp?
From the internet, I feel this should work: /\bselect\b^(?!from).*\bfrom\b/gi but this finds no matches.
This works to find all select...from: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b/gi but modifying it to exclude the parenthesis "(" at the end prevents any matches: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\(/gi
Can anyone tell me how to exclude words and symbols within this regexp?
Many thanks
Emma
Edit: partial string input:
left outer join [stage].[db].[table14] o on p.Project_id = o.project_id
left outer join
(
select
different_id
,sum(costs) - ( sum(brushes) + sum(carpets) + sum(fabric) + sum(other) + sum(chairs)+ sum(apples) ) as overallNumber
from
(
select ace from [stage].db.[table18] J
Javascript:
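var sequel = stringInputAsAbove; // the SQL text shown above
var selects = [];
// With no capture groups in the regex, the second callback argument is the match offset.
var tst = sequel.replace(/\bselect\b[\s\S]*?\bfrom\b/gi, function(match, offset) { console.log('match: '+match); selects.push(offset); return match; });
console.log(selects);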
console.log(selects) should print an array of numbers, where each number is the starting character offset of a select...from. This works for the second regexp I gave above, printing: [95, 251]. Your \s\S variation does the same, @stribizhev.
The first example ^(?!from).* should do likewise but returns [].
The third example \s*^\( should return 251 only but returns []. However I have just noticed that the positive expression \s*\( does give 95, so some progress! It's the negatives I'm getting wrong.
Your \bselect\b^(?!from).*\bfrom\b regex doesn't work as expected because:
^ means the beginning of a line here, not negation of the next part, so \bselect\b^ means the word select followed by the beginning of a line. After removing ^ the regex starts to match something, but it is still not correct.
In multiline text, .* without modification will not match a newline, so the regex will only match select...from within a single line. If you change it to (.|\n)* (as a simple example) it will match across lines, but the result is still not correct.
* is a greedy quantifier, so it will match as much as possible. If you use the reluctant quantifier *?, the regex will match up to the first occurrence of the word from, and it will start to return relatively correct results.
\bselect\b(?!from) means: match a separate word select that is not directly followed by from. Because of the \b after select, the next character can never be the f of from (that would require selectfrom to somehow be made up of two separate words), so (?!from) always succeeds and is redundant.
In effect you will get a regex very similar to what Stribizhev gave you: \bselect\b(.|\n)*?\bfrom\b
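The difference between the greedy and the reluctant quantifier is easiest to see side by side. The sample string below is made up for illustration and is not the input from the question.

```javascript
// Hypothetical sample with a nested select.
const sample = "select a from t1 where x in (select b from t2)";

// Greedy: [\s\S]* grabs as much as possible, so the match runs to the LAST from.
const greedy = sample.match(/\bselect\b[\s\S]*\bfrom\b/i)[0];

// Reluctant: [\s\S]*? stops at the FIRST from it can reach.
const lazy = sample.match(/\bselect\b[\s\S]*?\bfrom\b/i)[0];

console.log(greedy); // select a from t1 where x in (select b from
console.log(lazy);   // select a from
```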
In the third expression you make the same mistake: \bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\( uses ^ as (I assume) a negation, not as the beginning of a line. Remove ^ and you will again get a relatively valid result (a match from select through from to the opening parenthesis ( ).
Your second regex works similarly to \bselect\b(.|\n)*?\bfrom\b or \bselect\b[\s\S]*?\bfrom\b.
I wrote "relatively valid result" because I also think that parsing SQL with regex can be very complicated, so I am not sure it will work in every case.
You can also try to use a positive lookahead to match just a position in the text, like:
(?=\bselect\b(?:.|\n)*?\bfrom\b)
In the demo, a () was added to the regex just to return the starting index of each match in the groups, so that it is easier to check its validity.
Negation in regex
We use ^ as negation in a character class; for example, [^a-z] means match anything but a letter, so it will match a digit, symbol, whitespace, etc., but not a letter in the range a to z. But this negation works at the level of a single character. If you use [^from], it will prevent the regex from matching the characters f, r, o and m. Likewise, [^from]{4} will avoid matching from, but also form, morf, etc.
To exclude a whole word from being matched, you need a negative lookahead such as (?!from), which fails if the chosen word follows the given position. To avoid matching a whole line that contains from, you could use ^(?!.*from.*).+$.
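The two kinds of negation can be compared with a few quick checks. This is a minimal sketch; the test strings are invented, and the line pattern is written without the trailing .* inside the lookahead, which does not change what it accepts.

```javascript
// Character-class negation works per character: any string of four
// characters that uses f, r, o or m is rejected, not just "from".
const fourCharsNotFrom = /^[^from]{4}$/;
console.log(fourCharsNotFrom.test("from")); // false
console.log(fourCharsNotFrom.test("form")); // false
console.log(fourCharsNotFrom.test("abcd")); // true

// A negative lookahead rejects the word as a whole: only lines that
// actually contain "from" are excluded.
const lineWithoutFrom = /^(?!.*from).+$/;
console.log(lineWithoutFrom.test("select a from t")); // false
console.log(lineWithoutFrom.test("select a, b"));     // true
console.log(lineWithoutFrom.test("form"));            // true
```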
However, in your case you don't need this construction, because if you replace the greedy .*\bfrom with .*?\bfrom, it will match up to the first occurrence of this word. What's more, it would cause problems. A regex that puts (?![\s\S]*from[\s\S]*) after select will not match anything, because the lookahead is not restricted by anything: it succeeds only if there is no from anywhere after select, but we also want to match from! In effect the regex tries to match and exclude from at once, and fails. So the (?!.*word.*) construction works much better for excluding a line that contains a given word.
So what to do if we don't want to match a word inside a fragment of the match? I think select\b([^f]|f(?!rom))*?\bfrom\b is a good solution. With ([^f]|f(?!rom))*? it will match everything between select and from, and the text in between cannot contain from, while the closing from is still matched.
But if you would like to match only a select...from that is not followed by (, then it is a good idea to use (?!\(). In your regex, however (multiline, using (.|\n)*? or [\s\S]*?), it will cause the match to run up to the next select...from part, because the reluctant quantifier will move the place where from is matched in order to make the whole regex succeed. In my opinion, a good solution would again be:
select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()
which will not run over into an additional select...from and will not match if there is a ( after select...from.
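Run against an abbreviated version of the SQL fragment from the question, that pattern might be used as in the sketch below. The sum(...) line is shortened, a leading \b is added before select, the inner group is made non-capturing, and the offsets are collected with matchAll rather than with the replace callback from the question; all of these are choices made here for illustration.

```javascript
// Abbreviated version of the partial input from the question.
const sql = [
  "left outer join [stage].[db].[table14] o on p.Project_id = o.project_id",
  "left outer join",
  "(",
  "select",
  "different_id",
  ",sum(costs) - ( sum(brushes) + sum(carpets) ) as overallNumber",
  "from",
  "(",
  "select ace from [stage].db.[table18] J",
].join("\n");

// The text between select and from may not contain "from", and the
// closing from may not be followed by optional whitespace and "(".
const pattern = /\bselect\b(?:[^f]|f(?!rom))*?\bfrom\b(?!\s*\()/gi;

const starts = [...sql.matchAll(pattern)].map((m) => m.index);

// One entry: the offset of the inner "select ace from". The outer select
// is skipped because its from is followed by "(".
console.log(starts);
```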
