JS XRegExp Replace all non characters - javascript

My objective is to remove every character in a string that is not a dash (-), a digit, or a letter in any language. Signs such as #!()[] and all other symbols should be replaced with an empty string, while every occurrence of - should be kept.
I have used the XRegExp plugin for this, but I cannot seem to find the magic solution :)
I have tried this:
var txt = "Ad СТИНГ (ALI) - Englishmen In New York";
var regex = new XRegExp('\\p{^N}\\p{^L}',"g");
var b = XRegExp.replace(txt, regex, "")
but the result is: AСТИН(AL EnglishmeINeYork ... which is kind of weird.
If I also try to add the condition for keeping the '-' character, the RegEx becomes invalid.

\\p{^N}\\p{^L} means a non-number followed by a non-letter.
Try [^\\p{N}\\p{L}-] that means a non-number, non-letter, non-dash.
Here is a jsfiddle where you can run some tests... The third XRegExp is the one you asked for.

\p{^N}\p{^L}
is a non-number followed by a non-letter. You probably meant to say a character that is neither a letter nor a number:
[^\p{N}\p{L}]
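For comparison, modern JavaScript engines understand the same Unicode categories natively when the u flag is set (ES2018), so the negated class can also be written without XRegExp. This is only a minimal sketch under that assumption; the helper name keepLettersDigitsDash is made up for illustration and is not part of any library:

```javascript
// Hypothetical helper (not part of XRegExp): removes everything that is not
// a Unicode letter (\p{L}), a Unicode number (\p{N}), or a dash.
// The "u" flag is required so that \p{...} is recognised.
function keepLettersDigitsDash(input) {
  return input.replace(/[^\p{L}\p{N}-]/gu, "");
}

const txt = "Ad СТИНГ (ALI) - Englishmen In New York";
const cleaned = keepLettersDigitsDash(txt);
console.log(cleaned); // AdСТИНГALI-EnglishmenInNewYork
```

Spaces disappear as well, because a space is neither a letter, a number, nor a dash. Add \s inside the brackets if spaces should be kept.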

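// all non letters/numbers in a string => /[^a-zA-Z0-9]/g
I don't know XRegExp,
but with a plain JS RegExp you can do the replacement with
txt.replace(/[^a-zA-Z0-9]/g,'')
Note that this only keeps ASCII letters and digits: it also removes the dash and any non-Latin letters such as СТИНГ.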

Related

JS conditional RegEx that removes different parts of a string between two delimiters

I have a string of text with HTML line breaks. Some of the <br> immediately follow a number between two delimiters «...» and some do not.
Here's the string:
var str = ("«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>");
I’m looking for a conditional regex that’ll remove the number and delimiters (ex. «1») as well as the line break itself without removing all of the line breaks in the string.
So for instance, at the beginning of my example string, when the script encounters »<br> it’ll remove everything between and including the first « to the left, to »<br> (ex. «1»<br>). However it would not remove «2»some text<br>.
I’ve had some help removing the entire number/delimiters (ex. «1») using the following:
var regex = new RegExp(UsedKeys.join('|'), 'g');
var nextStr = str.replace(/«[^»]*»/g, " ");
I sure hope that makes sense.
Just to be super clear, when the string is rendered in a browser, I’d like to go from this…
«1»
«2»some text
«3»
«4»more text
«5»
«6»even more text
To this…
«2»some text
«4»more text
«6»even more text
Many thanks!
Maybe I'm missing a subtlety here, if so I apologize. But it seems that you can just replace with the regex: /«\d+»<br>/g. This will replace all occurrences of a number between « & » followed by <br>
var str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\d+»<br>/g, '')
console.log(newStr)
To match letters and digits you can use \w instead of \d
var str = "«a»<br>«b»some text<br>«hel»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\w+?»<br>/g, '')
console.log(newStr)
This snippet assumes that the input within the brackets will always be a number but I think it solves the problem you're trying to solve.
const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";
console.log(str.replace(/(«(\d+)»<br>)/g, ""));
/(«(\d+)»<br>)/g
«(\d+)» Will match any brackets containing 1 or more digits in a row
If you would prefer to match alphanumeric you could use «(\w+)» or for any characters including symbols you could use «([^»]+)»
<br> Will match a line break
/g The g flag matches globally so that it can find every instance of the substring
Basically we are only removing the bracketed numbers if they are immediately followed by a line break.
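The variants mentioned above can be combined into one pattern. The sketch below assumes that the text between the delimiters may be anything except », and that the line break may be written as <br>, <br/> or <br /> in any letter case; both assumptions go beyond what the question states:

```javascript
// Assumption: the delimited key is any non-empty text without "»",
// and the line break may appear as <br>, <br/> or <br />, in any letter case.
const emptyEntry = /«[^»]+»<br\s*\/?>/gi;

const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";
const result = str.replace(emptyEntry, "");
console.log(result);
// «2»some text<br>«4»more text<br>«6»even more text<br>
```

Entries such as «2»some text<br> survive because the closing » is followed by text rather than by the line break.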

How to check if a string contains specific words in different languages [duplicate]

I have a simple regex which finds a word in text:
var patern = new RegExp("\\bsomething\\b", "gi");
This matches the word when it has spaces or punctuation around it.
So it matches:
I have something.
But doesn't match:
I havesomething.
which is fine and exactly what I need.
But I have an issue with, for example, the Arabic language. If I have the regex:
var patern = new RegExp("\\bرياضة\\b", "gi");
and text:
رياضة أنا أحب رياضتي وأنا سعيد حقا هنا لها حبي
The keyword which I am looking for is at the end of the text.
But this doesn't work, it just doesn't find it.
It works if I remove \b from regex:
var patern = new RegExp("رياضة", "gi");
But that is not what I want, because I don't want to find it if it's part of another word, like in the English example above:
I havesomething.
I know very little about regex, so I would appreciate any help making this work with both English and languages like Arabic.
We first have to understand what \b means:
\b is an anchor that matches at a position that is called a "word boundary".
In your case, the word boundary you are looking for is a position where no other Arabic letter touches the word.
To match only Arabic letters in a regex, we can use a Unicode range:
[\u0621-\u064A]+
Or we can simply use Arabic letters directly
[ء-ي]+
The code above will match any run of Arabic letters. To make a word boundary out of it, we can simply negate the class on both sides:
[^ء-ي]ARABIC TEXT[^ء-ي]
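The code above means: require a non-Arabic character on each side of the Arabic word, which will work in your case. Note that these neighbouring characters become part of the match, and that the word needs a character on both sides, which is why the example string below is padded with a space at each end.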
Consider this example that you gave us which I modified a little bit:
أنا أحب رياضتي رياض رياضة رياضيات وأنا سعيد حقا هنا
If we are trying to match only رياض, a plain search for it will also match inside رياضة, رياضيات, and رياضتي. However, if we add the code above, the match will successfully be on رياض only.
var x = " أنا أحب رياضتي رياض رياضة رياضيات وأنا سعيد حقا هنا ";
x = x.replace(/([^ء-ي]رياض[^ء-ي])/g, '<span style="color:red">$1</span>');
document.write (x);
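The same idea can also be expressed with lookarounds, which check the neighbouring characters without making them part of the match. This is a sketch only: it assumes an engine that supports lookbehind (ES2018), and arabicWholeWord is a made-up helper name:

```javascript
// Hypothetical helper: wraps an Arabic word in lookarounds so that it only
// matches when no Arabic letter (U+0621 to U+064A) touches it on either side.
// Assumption: `word` contains no regex metacharacters.
function arabicWholeWord(word) {
  const letter = "[\\u0621-\\u064A]";
  return new RegExp("(?<!" + letter + ")" + word + "(?!" + letter + ")", "g");
}

const sample = "أنا أحب رياضتي رياض رياضة رياضيات وأنا سعيد حقا هنا";
const hits = sample.match(arabicWholeWord("رياض")) || [];
console.log(hits.length); // 1

// Because nothing around the word is consumed, a word at the very start
// or end of the text is found without padding the string with spaces.
const atEdges = "رياضة أنا أحب رياضتي".match(arabicWholeWord("رياضة")) || [];
console.log(atEdges.length); // 1
```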
If you would like to account for all of أآإا with one pattern, you could use something like [\u0622\u0623\u0625\u0627] or simply list them between square brackets: [أآإا]. Here is a complete example:
var x = "أنا هنا وانا هناك .. آنا هنا وإنا هناك";
x = x.replace(/([أآإا]نا)/g, '<span style="color:red">$1</span>');
document.write (x);
Note: If you want to match every possible Arabic characters in Regex including all Arabic letters أ ب ت ث ج, all diacritics َ ً ُ ٌ ِ ٍ ّ, and all Arabic numbers ١٢٣٤٥٦٧٨٩٠, use this regex: [،-٩]+
Useful link about the ranking of Arabic characters in Unicode: https://en.wikipedia.org/wiki/Arabic_script_in_Unicode
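This doesn't work because \b in JavaScript is defined in terms of \w, which only covers the ASCII characters [A-Za-z0-9_]. Arabic letters count as non-word characters, so the engine sees no word boundary between a space and an Arabic letter.
You could search for the Unicode characters in the text instead (Unicode ranges).
Or you could convert the text into Unicode escapes and then build the regex from those (I have never tried this, but it should work).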
I used this ء-ي٠-٩ and it works for me
If you don't need a complicated RegEx (for instance, because you're looking for a particular word or a short list of words), then I've found that it's actually easier to tokenize the search text and find it that way:
>>> text = 'رياضة أنا أحب رياضتي وأنا سعيد حقا هنا لها حبي '
>>> tokens = text.split()
>>> print(tokens)
['رياضة', 'أنا', 'أحب', 'رياضتي', 'وأنا', 'سعيد', 'حقا', 'هنا', 'لها', 'حبي']
>>> search_words = ['رياضة', 'رياضت']
>>> found = [w for w in tokens if w in search_words]
>>> print(found)
['رياضة'] # returns only full-word match
I'm sure that this is slower than RegEx, but not enough that I've ever noticed.
If your text had punctuation, you could do a more sophisticated tokenization (so it would find things like 'رياضة؟') using NLTK.
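Since the question is about JavaScript, here is a rough equivalent of the tokenizing approach in that language. It is a sketch rather than a port of NLTK: it assumes an engine with Unicode property escapes (the u flag), and tokenize is a made-up helper name:

```javascript
// Split the text into tokens made of letters, combining marks and digits,
// so punctuation such as "؟" never sticks to a word. Needs the "u" flag.
function tokenize(text) {
  return text.match(/[\p{L}\p{M}\p{N}]+/gu) || [];
}

const text = "رياضة؟ أنا أحب رياضتي وأنا سعيد حقا هنا لها حبي";
const searchWords = ["رياضة", "رياضت"];
const found = tokenize(text).filter((w) => searchWords.includes(w));
console.log(found.length); // 1 (only the full-word match رياضة)
```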

How to convert string from PHP to javascript regular expression?

This is my string converted into javascript object.
{"text" : "Must consist of alphabetical characters and spaces only", regexp:"/^[a-z\\s]+$/i"}
I need regexp to use it for validation but it won’t work because of the double quotes and \s escape sequence.
To make it work the value of regexp must be {"text" : "Must consist of alphabetical characters and spaces only", regexp : /^[a-z\s]+$/i}.
I also used this new RegExp(object.regexp) and any other way I can possibly think but with no luck at all.
Any help is appreciated!
Try split-ing out the part that you want, before putting it into the new RegExp constructor:
var regexVariable = new RegExp(object.regexp.split("/")[1]);
That will trim off the string representation of the regex "boundaries", as well as the "i" flag, and leave you with just the "guts" of the regex.
Pushing the result of that to the console results in the following regex: /^[a-z\s]+$/
Edit:
Not sure if you want to "read" the case insensitivity from the value in the object or not, but, if you do, you can expand the use of the split a little more to get any flags included automatically:
var aRegexParts = object.regexp.split("/");
var regexVariable = new RegExp(aRegexParts[1], aRegexParts[2]);
Logging that in the console results in the first regex that I posted, but with the addition of the "i" flag: /^[a-z\s]+$/i
Borrowing the example @RoryMcCrossan made, you can use a regular expression to parse your regular expression.
var object = {
"text": "Must consist of alphabetical characters and spaces only",
"regexp": "/^[a-z\\s]+$/i"
}
// parse out the main regex and any additional flags.
var extracted_regex = object.regexp.match(/\/(.*?)\/([ig]+)?/);
var re = new RegExp(extracted_regex[1], extracted_regex[2]);
// don't use document.write in production! this is just so that it's
// easier to see the values in stackoverflow's editor.
document.write('<b>regular expression:</b> ' + re + '<br>');
document.write('<b>string:</b> ' + object.text + '<br>');
document.write('<b>evaluation:</b> ' + re.test(object.text));
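Both the split("/") approach and the lazy (.*?) pattern cut the string at the first inner slash, so a pattern that itself contains / would be truncated. A minimal sketch that splits at the last slash instead is shown below; parseRegexLiteral is a hypothetical helper name, and the code assumes the string always has the /pattern/flags shape:

```javascript
// Hypothetical helper: turns a string such as "/^[a-z\\s]+$/i" into a RegExp.
// It splits at the LAST slash, so slashes inside the pattern are preserved.
function parseRegexLiteral(source) {
  const lastSlash = source.lastIndexOf("/");
  if (source[0] !== "/" || lastSlash === 0) {
    throw new SyntaxError("Expected a string of the form /pattern/flags");
  }
  return new RegExp(source.slice(1, lastSlash), source.slice(lastSlash + 1));
}

const object = {
  text: "Must consist of alphabetical characters and spaces only",
  regexp: "/^[a-z\\s]+$/i",
};
const re = parseRegexLiteral(object.regexp);
console.log(re.test(object.text)); // true
console.log(re.test("abc123"));    // false
```

Invalid flags still make the RegExp constructor throw, so wrap the call in try/catch if the strings come from an untrusted source.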
I have not used regex in JavaScript, but the regular expression itself should look something like:
"^([a-zA-Z]|\s)*$"
If JavaScript treats regular expressions the way I am used to, [a-z] will only capture lowercase characters unless the case-insensitive i flag is set.
Hope this helps even if it's just a little (would add this as a comment instead of answer but need 50 rep)

Simple Regexp Pattern matching with escape characters

Hopefully a simple one!
I've been trying to get this to work for several hours now but am having no luck, as I'm fairly new to regexp I may be missing something very obvious here and was hoping someone could point me in the right direction. The pattern I want to match is as follows: -
At least 1 or more numbers + "##" + at least 1 or more numbers + "##" + at least 1 or more numbers
so a few examples of valid combinations would be: -
1##2##3
123##123##123
0##0##0
A few invalid combinations would be
a##b##c
1## ##1
I've got the following regexp like so: -
[\d+]/#/#[\d+]/#/#[\d+]
And am using it like so (note the double backslashes as it's inside a string): -
var patt = new RegExp("[\\d+]/#/#[\\d+]/#/#[\\d+]");
if(newFieldValue!=patt){newFieldValue=="no match"}
I also tried these but still nothing: -
if(!patt.text(newFieldValue)){newFieldValue==""}
if(patt.text(newFieldValue)){}else{newFieldValue==""}
But nothing I try is matching, where am I going wrong here?
Any pointers gratefully received, cheers!
1) I can't see any reason to use the RegExp constructor over a RegExp literal for your case. (The former is used primarily where the pattern needs to be dynamic, i.e. is built from variables.)
2) You don't need a character class if there's only one type of character in it (so \d+, not [\d+]).
3) You are not actually checking the pattern against the input. You don't apply a RegEx by creating an instance of it and using ==; you need to use test() or match() to see if a match is made (the former if you only want to check, not capture).
4) You have == where you mean to assign (=).
if (!/\d+##\d+##\d+/.test(newFieldValue)) newFieldValue = "no match";
You put + inside the brackets, so you're matching a single character that's either a digit or +, not a sequence of digits. I also don't understand why you have / before each #, your description doesn't mention anything about this character.
Use:
var patt = /\d+##\d+##\d+/;
You should use the test method of the patt regex
if (!patt.test(newFieldValue)){ newFieldValue = "no match"; }
once you have a valid regular expression.
Try this regex :
^(?:\d+##){2}\d+$
Demo: http://regex101.com/r/mE8aG7
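To see what the ^ and $ anchors add, the anchored pattern can be run against the examples from the question. This is just a quick check, not part of the original answer; the sample list and variable names are chosen for illustration:

```javascript
// Anchored: the whole value must be digits, "##", digits, "##", digits.
const pattern = /^(?:\d+##){2}\d+$/;

const samples = ["1##2##3", "0##0##0", "123##123##123", "123#123#123", "a##b##c", "1## ##1"];
const results = samples.map((s) => pattern.test(s));
console.log(results.join(", ")); // true, true, true, false, false, false

// Without anchors the pattern only has to occur somewhere inside the value.
const unanchored = /\d+##\d+##\d+/;
console.log(unanchored.test("a1##2##3b")); // true
console.log(pattern.test("a1##2##3b"));    // false
```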
With the following regex
[\d+]/#/#[\d+]/#/#[\d+]
You would only match things like:
+/#/#5/#/#+
+/#/#+/#/#+
0/#/#0/#/#0
because the regex engine reads each [\d+] as a single character that is either a digit or a +, and each /#/# as the literal text /#/#.
Something like:
((-\s)?\d+##)+\d+

java script Regular Expressions patterns problem

My problem starts like this:
var str='0|31|2|03|.....|4|2007'
str=str.replace(/[^|]\d*[^|]/,'5');
so the output becomes "0|5|2|03|....|4|2007", i.e. it replaces 31 with 5.
But this doesn't work for replacing other segments when I change the code like this:
str=str.replace(/[^|]{2}\d*[^|]/,'6');
This doesn't change 2 to 6.
What am I actually missing here? Any help?
I think a regular expression is a bad solution for that problem. I'd rather do something like this:
var str = '0|31|2|03|4|2007';
var segments = str.split("|");
segments[1] = "35";
segments[2] = "123";
Can't think of a good way to solve this with a regexp.
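The split approach above stops before putting the string back together. A small sketch of the full round trip follows; replaceSegment is a made-up helper name and the index is zero-based:

```javascript
// Hypothetical helper: replaces the segment at a zero-based index in a
// "|"-separated string and joins the pieces back together.
function replaceSegment(str, index, value) {
  const segments = str.split("|");
  segments[index] = String(value);
  return segments.join("|");
}

const updated = replaceSegment("0|31|2|03|4|2007", 2, 6);
console.log(updated); // 0|31|6|03|4|2007
```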
Here is a specific regex solution which replaces the number following the first | pipe symbol with the number 5:
var re = /^((?:\d+\|){1})\d+/;
return text.replace(re, '$15');
If you want to replace the digits following the third |, simply change the {1} portion of the regex to {3}
Here is a generalized function that will replace any given number slot (zero-based index), with a specified new number:
function replaceNthNumber(text, n, newnum) {
var re = new RegExp("^((?:\\d+\\|){"+ n +'})\\d+');
return text.replace(re, '$1'+ newnum);
}
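One caveat with building the replacement as '$1' + newnum: a digit right after $1 makes the replacement string read like a two-digit group reference such as $15. In practice engines fall back to group 1 when there is no group 15, but a replacer function sidesteps the question entirely. The variant below is a sketch with a made-up name, assuming n is a non-negative integer:

```javascript
// Variant of the function above that uses a replacer callback, so the new
// number is never parsed as part of a "$n" group reference.
// Assumption: n is a non-negative integer (zero-based slot index).
function replaceNthNumberSafe(text, n, newnum) {
  const re = new RegExp("^((?:\\d+\\|){" + n + "})\\d+");
  return text.replace(re, (match, prefix) => prefix + newnum);
}

console.log(replaceNthNumberSafe("0|31|2|03|4|2007", 1, 5)); // 0|5|2|03|4|2007
console.log(replaceNthNumberSafe("0|31|2|03|4|2007", 2, 6)); // 0|31|6|03|4|2007
```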
Firstly, you don't have to escape | in the character set, because it doesn't have any special meaning in character sets.
Secondly, you don't put quantifiers in character sets.
And finally, to create a global matching expression, you have to use the g flag.
[^\|] means anything but a '|', so in your case it only matches a digit. So it will only match anything with 2 or more digits.
Second you should put the {2} outside of the []-brackets
I'm not sure what you want to achieve here.
