Remove part of a word from a string with JavaScript

I would like to find words in a sentence starting with a prefix and remove the rest of the characters.
Example:
this sentence Type_123 contains a Type_uiy
I would like to remove the characters that come after Type so I can have:
this sentence Type contains a Type
I know how to remove the prefix with a regex, str.replace(/Type_/g,''), but how do I do the opposite?
N.B. Pre-ES6 JavaScript, if possible.

Use the expression \b(Type)\w+ to capture the Type prefix.
Explanation:
\b | Match a word boundary (beginning of word)
(Type) | Capture the word "Type"
\w+ | Match one or more word characters, including an underscore
var str = 'this sentence Type_123 contains a Type_uiy';
var regex = /\b(Type)\w+/g;
console.log(str.replace(regex, '$1'));
The $1 in the replace() method is a reference to the captured characters. In this case, $1 stands for Type. So anywhere in the sentence, Type_xxx will be replaced with Type.
See MDN's documentation on the replace() method.
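If the prefix comes from a variable instead of being hard-coded, the same idea can be wrapped in a small helper. This is an illustrative sketch, not part of the answer: the name keepPrefix is hypothetical, it sticks to ES5 syntax because of the pre-ES6 note, and it assumes the prefix starts with a word character (otherwise the leading \b would not match).

```javascript
// Hypothetical helper: keep `prefix` wherever it starts a word and drop
// the word characters that follow it. ES5 syntax only.
function keepPrefix(str, prefix) {
  // Escape regex metacharacters so a prefix like "a.b" is matched literally.
  var escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // \b anchors the prefix at the start of a word (assumes the prefix begins
  // with a word character); \w+ consumes the rest of the word.
  var regex = new RegExp('\\b(' + escaped + ')\\w+', 'g');
  // $1 puts the captured prefix back in place of the whole match.
  return str.replace(regex, '$1');
}

console.log(keepPrefix('this sentence Type_123 contains a Type_uiy', 'Type'));
// this sentence Type contains a Type
```

Escaping matters only when the prefix contains characters such as . or +; for a plain word like Type the result is the same as the literal regex above.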

Alternatively, you can install string-saw: https://github.com/icodeforlove/string-saw
let str = "Here's a sentence that contains Type_123 and Type_uiy";
let result = saw(str)
.remove(/(?<=Type_)\w+/g)
.toString();
The above would result in:
"Here's a sentence that contains Type_ and Type_"
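The lookbehind in this answer is not specific to string-saw; as a sketch, the built-in String.prototype.replace accepts it too. Note that (?<=...) needs an ES2018-capable engine, so it does not meet the pre-ES6 request, and that putting \bType (instead of Type_) in the lookbehind also removes the underscore, which gives the exact output the question asked for.

```javascript
// Requires lookbehind support (ES2018+).
var str = "Here's a sentence that contains Type_123 and Type_uiy";
// Remove the word characters that directly follow "Type_".
var result = str.replace(/(?<=Type_)\w+/g, '');
console.log(result); // Here's a sentence that contains Type_ and Type_

// Variation: look behind for "Type" at the start of a word, so "_123" goes too.
var exact = 'this sentence Type_123 contains a Type_uiy'.replace(/(?<=\bType)\w+/g, '');
console.log(exact); // this sentence Type contains a Type
```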

Related

How to remove a string from a certain position and add another string in its place using JavaScript?

I am new to programming and I want to replace part of a string, starting at a certain position, with another string.
Suppose I have the string "i am #hello".
I want to replace the text from the # up to the cursor position.
I have the indexes of the # character and of the cursor position, but I don't know how to do it with the replace method.
Below is the snippet:
replace_string = (original_string, value_selected) => {
let new_value;
const string_to_cursor_position = original_string.substring(0,
this.state.cursor_position);
const at_char_position = string_to_cursor_position.lastIndexOf('#');
return this.value.replace(this.state.cursor_position,
value_selected);
}
But this doesn't replace the string. The final output should be "i am someuser".
Could someone help me fix this? Thanks.
You can simply use replace with a regex:
let replace_string = (original_string, replaceBy) => {
return original_string.replace(/#\w+/, replaceBy)
}
console.log(replace_string("i am #hello", 'someuser'))
I have deliberately left out the g flag; if you want to replace every word that is preceded by #, use /#\w+/g
You can use replace with regex.
var myString = "i am #hello";
var replacingString = "someuser";
console.log(myString.replace(/#(\w+)/g, replacingString));
What the /#(\w+)/g regex does:
Finds the # character
Once the # is found, captures the word after that character (\w+)
Repeats every time it finds a # followed by a word (the g flag)
Every match of this expression is replaced by the replace function.
Edit:
As @Jan pointed out in the comments, using \S+ instead of \w+ might work better in your case.
The difference between the two expressions is that \S+ matches one or more non-whitespace characters, so it also covers names with "weird" characters between letters, like -, where \w+ would stop.
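A quick way to see the difference is to run both patterns on a handle that contains a hyphen; the sample string below is made up for illustration.

```javascript
var handle = 'i am #some-user';
// \w+ stops at the hyphen, so "-user" is left behind.
var withWord = handle.replace(/#\w+/, 'someuser');
// \S+ runs until the next whitespace, so the whole handle is replaced.
var withNonSpace = handle.replace(/#\S+/, 'someuser');
console.log(withWord);     // i am someuser-user
console.log(withNonSpace); // i am someuser
```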
I would personally use regular expressions for this kind of tasks.
If you want to replace everything from the # character up to the next whitespace character, you can simply do
yourstring.replace(/#[^\s]+/, "Replacement String")
or
yourstring.replace(/#[\S]+/, "Replacement String")
For example:
const template = "I am #username";
const result = template.replace(/#[^\s]+/, "Ki Jéy")
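The answers switch to a regex, but the question already has the index of the # and the cursor position. As an alternative sketch (the function name replaceTagAtCursor is hypothetical), those two indexes can be used directly with slice, which also leaves any text after the cursor untouched:

```javascript
// Hypothetical helper: replace everything from the last "#" before the
// cursor up to the cursor with `replacement`.
function replaceTagAtCursor(text, cursorPosition, replacement) {
  const hashIndex = text.slice(0, cursorPosition).lastIndexOf('#');
  if (hashIndex === -1) {
    return text; // no "#" before the cursor: leave the text unchanged
  }
  // Keep the text before "#", insert the replacement, keep the text after the cursor.
  return text.slice(0, hashIndex) + replacement + text.slice(cursorPosition);
}

console.log(replaceTagAtCursor('i am #hello', 11, 'someuser'));       // i am someuser
console.log(replaceTagAtCursor('i am #hello world', 11, 'someuser')); // i am someuser world
```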

Remove the hashtag symbol in JS with a regex

I searched the forum but could not find anything precisely similar to what I need. I'm basically trying to remove the # symbol from results that I'm receiving; here is a dummy example of the regex:
let postText = 'this is a #test of #hashtags';
var regexp = new RegExp('#([^\\s])', 'g');
postText = postText.replace(regexp, '');
console.log(postText);
It gives the following result:
this is a est of ashtags
What do I need to change so that it removes just the # symbols without cutting off the first letter of each word?
You need a backreference $1 as the replacement:
let postText = 'this is a #test of #hashtags';
var regexp = /#(\S)/g;
postText = postText.replace(regexp, '$1');
console.log(postText);
// Alternative with a lookahead:
console.log('this is a #test of #hashtags'.replace(/#(?=\S)/g, ''));
Note that I suggest replacing the constructor notation with a regex literal to make the regex a bit more readable, and replacing [^\s] with the shorter \S (any non-whitespace char).
Here, /#(\S)/g matches multiple occurrences (due to g modifier) of # and any non-whitespace char right after it (while capturing it into Group 1) and String#replace will replace the found match with that latter char.
Alternatively, to avoid using backreferences (also called placeholders) you may use a lookahead, as in .replace(/#(?=\S)/g, ''), where (?=\S) requires a non-whitespace char immediately to the right of the current location. If you need to remove # at the end of the string, too, replace (?=\S) with (?!\s) that will fail the match if the next char is a whitespace.
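The difference between the two lookarounds only shows up when a # is the last character of the string. In this made-up sample, a # followed by a space is kept by both, but only (?!\s) removes the trailing one:

```javascript
var tags = 'tags: #one # #two#';
// (?=\S): remove a "#" only when a non-whitespace char follows it.
var lookahead = tags.replace(/#(?=\S)/g, '');
// (?!\s): also remove a "#" at the very end of the string.
var negativeLookahead = tags.replace(/#(?!\s)/g, '');
console.log(lookahead);         // tags: one # two#
console.log(negativeLookahead); // tags: one # two
```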
It may be easier to write your own function, something like this (it also covers the case where the symbol is repeated):
function replaceSymbol(symbol, string) {
if (string.indexOf(symbol) < 0) {
return string;
}
while(string.indexOf(symbol) > -1) {
string = string.replace(symbol, '');
}
return string;
}
var a = replaceSymbol('#', '##s##u#c###c#e###ss is he#re'); // 'success is here'
You might be able to use the following:
let postText = 'this is a #test of #hashtags';
postText = postText.replace(/#\b/g, '');
It relies on the fact that a #hashtag contains a word-boundary between the # and the word that follows it. By matching that word-boundary with \b, we make sure not to match single #.
However, it might match a bit more than you would expect, because the definition of 'word character' in regex isn't obvious: it includes numbers (so #123 would be matched) and, more confusingly, the _ character (so #___ would be matched).
I don't know if there's an authoritative source defining whether those are acceptable hashtags or not, so I'll let you judge whether this suits your needs.
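To see the caveat above in action, here is the #\b pattern run on a made-up sample containing #123, #___ and a lone #:

```javascript
var sample = 'tag #test, #123, #___ and a lone # sign';
// "#" followed by a word char (letter, digit or _) sits on a word boundary;
// a "#" followed by a space does not, so it survives.
var stripped = sample.replace(/#\b/g, '');
console.log(stripped); // tag test, 123, ___ and a lone # sign
```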
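You only need the #; the part in parentheses matches whatever comes after the #:
postText = postText.replace(/#/g, '');
The g flag makes this remove every #; with a plain string pattern such as '#', replace only removes the first occurrence.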

Finding duplicates with regular expressions, how does this actually work? [duplicate]

I'm a regular expression newbie and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as:
Paris in the the spring.
Not that that is related.
Why are you laughing? Are my my regular expressions THAT bad??
Is there a single regular expression that will match ALL of the duplicated words above ("the the", "that that", "my my")?
Try this regular expression:
\b(\w+)\s+\1\b
Here \b is a word boundary and \1 references the captured match of the first group.
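A minimal JavaScript check of this pattern on the three sample sentences. The extra 'the theory' case shows why the trailing \b matters: without it, "the the" inside "the theory" would be reported.

```javascript
// \b(\w+)\s+\1\b : a word, whitespace, then the same word again.
var doubled = /\b(\w+)\s+\1\b/g;
var sentences = [
  'Paris in the the spring.',
  'Not that that is related.',
  'Why are you laughing? Are my my regular expressions THAT bad??',
  'the theory' // trailing \b prevents a match against "the" + "theory"
];
sentences.forEach(function (sentence) {
  var found = sentence.match(doubled); // null when nothing matches
  console.log(found ? found.join(', ') : 'no duplicate');
});
// the the / that that / my my / no duplicate
```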
I believe this regex handles more situations:
/(\b\S+\b)\s+\b\1\b/
A good selection of test strings can be found here: http://callumacrae.github.com/regex-tuesday/challenge1.html
The below expression should work correctly to find any number of duplicated words. The matching can be case insensitive.
String regex = "\\b(\\w+)(\\s+\\1\\b)+";
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(input);
// Check for subsequences of input that match the compiled pattern
while (m.find()) {
input = input.replaceAll(m.group(0), m.group(1));
}
Sample Input : Goodbye goodbye GooDbYe
Sample Output : Goodbye
Explanation:
The regex expression:
\b : Word boundary (start of a word)
\w+ : One or more word characters, captured as group 1
(\s+\1\b)+ : One or more repetitions of whitespace followed by the same word as group 1, ending at a word boundary. The + lets it find more than one repetition.
Grouping :
m.group(0) : Contains the whole match, in the above case Goodbye goodbye GooDbYe
m.group(1) : Contains the first word of the match, in the above case Goodbye
The replaceAll call replaces all consecutive matched words with the first instance of the word.
Try this with the regex below:
\b start-of-word boundary
(\w+) a word, captured as group 1
\W+ one or more non-word characters
\1 the same word matched already
\b end of word
()* repeat the group zero or more times
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {
public static void main(String[] args) {
String regex = "\\b(\\w+)(\\b\\W+\\b\\1\\b)*";
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Scanner in = new Scanner(System.in);
int numSentences = Integer.parseInt(in.nextLine());
while (numSentences-- > 0) {
String input = in.nextLine();
Matcher m = p.matcher(input);
// Check for subsequences of input that match the compiled pattern
while (m.find()) {
// Pattern.quote: the match may contain punctuation captured by \W+ (e.g. "?" or "(")
input = input.replaceAll(Pattern.quote(m.group(0)), m.group(1));
}
// Prints the modified sentence.
System.out.println(input);
}
in.close();
}
}
Regex to Strip 2+ duplicate words (consecutive/non-consecutive words)
Try this regex, which catches 2 or more duplicate words and, when its matches are replaced with an empty string, leaves behind a single occurrence of each. The duplicate words need not even be consecutive.
/\b(\w+)\b(?=.*?\b\1\b)/ig
Here, \b is used for Word Boundary, ?= is used for positive lookahead, and \1 is used for back-referencing.
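A sketch of this regex in JavaScript. Because each match is a word that occurs again later in the string, replacing the matches with an empty string keeps the last occurrence of each duplicate, not the first. The whitespace clean-up step is an addition, not part of the answer; the sample sentence is made up.

```javascript
var sentence = 'the cat and the dog and the bird';
var deduped = sentence
  .replace(/\b(\w+)\b(?=.*?\b\1\b)/ig, '') // drop every word that appears again later
  .replace(/\s+/g, ' ')                     // collapse the spaces left behind
  .trim();
console.log(deduped); // cat dog and the bird
```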
The widely-used PCRE library can handle such situations (you won't achieve the the same with POSIX-compliant regex engines, though):
(\b\w+\b)\W+\1
Here is one that catches multiple words multiple times:
(\b\w+\b)(\s+\1)+
No. That is an irregular grammar. There may be engine-/language-specific regular expressions that you can use, but there is no universal regular expression that can do that.
This is the regex I use to remove duplicate phrases in my twitch bot:
(\S+\s*)\1{2,}
(\S+\s*) looks for any run of non-whitespace characters, followed by optional whitespace.
\1{2,} then requires at least two more copies of that phrase right after it, so the pattern matches only when the same phrase appears three or more times in a row.
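A short JavaScript check of the "three or more" behaviour. The last line illustrates a side effect worth knowing about: because \S+ can shrink to a single character, a run of repeated letters inside a word (made-up example "hmmm") also matches.

```javascript
var repeated = /(\S+\s*)\1{2,}/;
console.log(repeated.test('go go now'));        // false: only two copies
console.log('go go go now'.match(repeated)[0]); // "go go go " (trailing space included)
console.log('hmmm'.match(repeated)[0]);         // "mmm"
```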
Since some developers are coming to this page in search of a solution which not only eliminates duplicate consecutive non-whitespace substrings, but triplicates and beyond, I'll show the adapted pattern.
Pattern: /(\b\S+)(?:\s+\1\b)+/
Replace: $1 (replaces the fullstring match with capture group #1)
This pattern greedily matches a "whole" non-whitespace substring, then requires one or more copies of the matched substring which may be delimited by one or more whitespace characters (space, tab, newline, etc).
Specifically:
\b (word boundary) characters are vital to ensure partial words are not matched.
The second parenthetical is a non-capturing group, because this variable width substring does not need to be captured -- only matched/absorbed.
The + (one-or-more quantifier) on the non-capturing group is more appropriate than * because * will "bother" the regex engine to capture and replace singleton occurrences; this is wasteful pattern design.
Note: if you are dealing with sentences or input strings with punctuation, then the pattern will need to be further refined.
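Applying the pattern and the $1 replacement in JavaScript, on a made-up sentence with a triplicate and a duplicate:

```javascript
var text = 'this is is is a test test';
// Keep capture group 1 and drop the repeated copies that follow it.
var cleaned = text.replace(/(\b\S+)(?:\s+\1\b)+/g, '$1');
console.log(cleaned); // this is a test
```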
The example in JavaScript: The Good Parts can be adapted to do this:
var doubled_words = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1(?:\s|$)/gi;
\b uses \w for word boundaries, where \w is equivalent to [0-9A-Z_a-z]. If you don't mind that limitation, the accepted answer is fine.
This expression (inspired from Mike, above) seems to catch all duplicates, triplicates, etc, including the ones at the end of the string, which most of the others don't:
str.replace(/(^|\s+)(\S+)(($|\s+)\2)+/g, "$1$2")
I know the question asked to match duplicates only, but a triplicate is just 2 duplicates next to each other :)
First, I put (^|\s+) to make sure it starts with a full word, otherwise "child's steak" would go to "child'steak" (the "s"'s would match). Then, it matches all full words ((\b\S+\b)), followed by an end of string ($) or a number of spaces (\s+), the whole repeated more than once.
I tried it like this and it worked well:
var s = "here here here here is ahi-ahi ahi-ahi ahi-ahi joe's joe's joe's joe's joe's the result result result";
console.log(s.replace(/(\b\S+\b)(($|\s+)\1)+/g, "$1"));
--> here is ahi-ahi joe's the result
Try this regular expression; it fits all repeated-word cases:
\b(\w+)\s+\1(?:\s+\1)*\b
I think another solution would be to use named capture groups and backreferences like this:
.* (?<mytoken>\w+)\s+\k<mytoken> .*
OR
.*(?<mytoken>\w{3,}).+\k<mytoken>.*
Kotlin:
val regex = Regex(""".* (?<myToken>\w+)\s+\k<myToken> .*""")
val input = "This is a test test data"
val result = regex.find(input)
println(result!!.groups["myToken"]!!.value)
Java:
var pattern = Pattern.compile(".* (?<myToken>\\w+)\\s+\\k<myToken> .*");
var matcher = pattern.matcher("This is a test test data");
var isFound = matcher.find();
var result = matcher.group("myToken");
System.out.println(result);
JavaScript:
const regex = /.* (?<myToken>\w+)\s+\k<myToken> .*/;
const input = "This is a test test data";
const result = regex.exec(input);
console.log(result.groups.myToken);
// OR (new names, since const cannot be redeclared in the same scope)
const globalRegex = /.* (?<myToken>\w+)\s+\k<myToken> .*/g;
const allMatches = [...input.matchAll(globalRegex)];
console.log(allMatches[0].groups.myToken);
All of the above detect "test" as the duplicate word.
Tested with Kotlin 1.7.0-Beta, Java 11, Chrome and Firefox 100.
You can use this pattern:
\b(\w+)(?:\W+\1\b)+
This pattern can be used to match all duplicated word groups in sentences. :)
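The same pattern works in JavaScript with the i flag, where backreferences also match case-insensitively. Because \W+ also spans punctuation, a duplicate separated by a full stop is collapsed too, which may or may not be what you want (sample strings are made up):

```javascript
// g: replace every run; i: let \1 ignore case.
var dupes = /\b(\w+)(?:\W+\1\b)+/gi;
var greeting = 'Hello hello, HELLO world'.replace(dupes, '$1');
var acrossStop = 'It ends. Ends here'.replace(dupes, '$1');
console.log(greeting);   // Hello world
console.log(acrossStop); // It ends here
```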
Here is a sample util function written in Java 17, which replaces all duplications with the first occurrence:
public String removeDuplicates(String input) {
var regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
var pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
var matcher = pattern.matcher(input);
while (matcher.find()) {
input = input.replaceAll(Pattern.quote(matcher.group()), matcher.group(1)); // quote: \W+ may capture regex metacharacters
}
return input;
}
As far as I can see, none of these would match:
London in the
the winter (with "the winter" on a new line)
Although matching duplicates on the same line is fairly straightforward, I haven't been able to come up with a solution for the situation in which they stretch over two lines (with Perl).
To find duplicate words that have no leading or trailing non-whitespace characters other than word characters, you can use whitespace boundaries on the left and on the right, using lookarounds.
The pattern will have a match in:
Paris in the the spring.
Not that that is related.
The pattern will not have a match in:
This is $word word
(?<!\S)(\w+)\s+\1(?!\S)
Explanation
(?<!\S) Negative lookbehind, assert not a non-whitespace char to the left of the current location
(\w+) Capture group 1, match 1 or more word characters
\s+ Match 1 or more whitespace characters (note that this can also match a newline)
\1 Backreference to match the same as in group 1
(?!\S) Negative lookahead, assert not a non-whitespace char to the right of the current location
See a regex101 demo.
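A sketch of the lookaround version in JavaScript (lookbehind needs ES2018 or later). Since \s+ also matches a newline, it handles the "London in the / the winter" case quoted above, where the duplicate spans two lines:

```javascript
// Requires lookbehind support (ES2018+).
var dup = /(?<!\S)(\w+)\s+\1(?!\S)/;
console.log(dup.test('Paris in the the spring.'));       // true
console.log(dup.test('This is $word word'));             // false
console.log('London in the\nthe winter'.match(dup)[1]); // the
```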
To find 2 or more duplicate words:
(?<!\S)(\w+)(?:\s+\1)+(?!\S)
This part of the pattern (?:\s+\1)+ uses a non-capturing group to repeat, 1 or more times, matching 1 or more whitespace characters followed by the backreference to match the same as in group 1.
See a regex101 demo.
Alternatives without using lookarounds
You could also use a leading and trailing alternation that either matches a whitespace char or asserts the start/end of the string.
Then use a capture group 1 for the value that you want to get, and use a second capture group with a backreference \2 to match the repeated word.
Matching 2 duplicate words:
(?:\s|^)((\w+)\s+\2)(?:\s|$)
See a regex101 demo.
Matching 2 or more duplicate words:
(?:\s|^)((\w+)(?:\s+\2)+)(?:\s|$)
See a regex101 demo.
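Use this if you want case-insensitive checking for duplicate words; it is written as an escaped string literal (e.g. for Java), and the (?i) inline flag turns on case-insensitive matching: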
(?i)\\b(\\w+)\\s+\\1\\b

Regex to get the string between a character and a whitespace and excluding the first delimiter

In the following text, what regex (JavaScript) would match "user" (user is a random name), excluding the "#" character?
I want to tag this #user here
and this #user
#user
I have looked at the following solutions and made the following regexes, which did not work:
RegEx pattern to match a string between two characters, but exclude the characters
\#(.*)\s
Regular Expression to find a string included between two characters while EXCLUDING the delimiters
(?!\#)(.*?)(?=\s)
Regex: Matching a character and excluding it from the results?
^#[^\s]+
Finally I made this regex that works but returns "#user" instead of "user":
#[^\s\n]+
The Javascript used to execute the regex is:
string.match(/#[^\s\n]+/)
I see I need to post a clarification.
If one knows a pattern beforehand in JS, i.e. if you do not build a regex from separate variables, one should be using a RegExp literal notation (e.g. /<pattern>/<flag(s)>).
In this case, you need a capturing group to get a submatch from a match that starts with # and goes on until the next whitespace character. You cannot use String#match if you have multiple values inside one input string, as global regexps with that method lose the captured texts. You need to use RegExp#exec:
var s = "I want to tag this #user here\nand this #user\n#user";
var arr = [];
var re = /#(\S+)\b/g;
var m;
while ((m = re.exec(s)) !== null) {
arr.push(m[1]);
}
document.write(JSON.stringify(arr));
The regex I suggest is #(\S+)\b:
# - matches a literal #
(\S+) - matches and captures into Group 1 one or more non-whitespace characters that finish with
\b - word boundary (remove if you have Unicode letters inside the names).
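On engines that support String.prototype.matchAll (ES2020), the exec loop can be replaced by a sketch like the following; matchAll requires the g flag and returns every match together with its capture groups:

```javascript
var s = 'I want to tag this #user here\nand this #user\n#user';
// matchAll yields one result per match; m[1] is capture group 1 (the name without "#").
var names = Array.from(s.matchAll(/#(\S+)/g), function (m) {
  return m[1];
});
console.log(JSON.stringify(names)); // ["user","user","user"]
```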
If you execute it this way, it should work:
var str = "I want to tag this #user here";
var patt = new RegExp("#([^\\s\\n]+)");
var result = patt.exec(str)[1];

Regular expression to get the string including the starting character

I have a string like this:
var str="|Text|Facebook|Twitter|";
I am trying to get any one of the words with its preceding pipe sign, something like
|Text or |Facebook or |Twitter
I tried the two patterns below, but they didn't work:
/|Facebook/g //returned nothing
/^|Facebook/g // returned "Facebook" but I want "|Facebook"
What should I use to get |Facebook?
The pipe is a special character in a regular expression: A|B matches A or B.
You have to escape the pipe to match | literally.
var str = '|Text|Facebook|Twitter|'
str.match(/\|\w+/g) // => ["|Text", "|Facebook", "|Twitter"]
\w matches any ASCII letter, digit, or _.
You should escape the | character:
/\|Facebook/g
