Finding the difference between two strings in JavaScript with regex [closed] - javascript

Regex experts please help to see if this problem can be solved by regex:
Given string 1 is any string
And string 2 is any string containing all parts of string 1 (but not a simple match -- I will give example)
How can I use regex to replace all parts of string 1 in string 2 with blanks, so that what remains is the part of string 2 that is not in string 1?
For example:
str1 = "test xyz";
str2 = "test ab xyz"
I want " ab" or "ab " back. What is the regex I can write so that when I run a replace function on str2, it will return " ab"?
Here is some non-regex code:
function findStringDiff(str1, str2) {
var compareString = function(str1, str2) {
var a1 = str1.split("");
var a2 = str2.split("");
var idx2 = 0;
a1.forEach(function(val) {
if (a2[idx2] === val) {
a2.splice(idx2,1);
} else {
idx2 += 1;
}
});
if (idx2 > 0) {
a2.splice(idx2,a2.length);
}
return a2.join("");
}
if (str1.length < str2.length) {
return compareString(str1, str2);
} else {
return compareString(str2, str1);
}
}
console.log(findStringDiff("test xyz","test ab xyz"));

Regexes only recognize if a string matches a certain pattern. They're not flexible enough to do comparisons like you're asking for. You would have to take the first string and build a regular language based on it to recognize the second string, and then use match groups to grab the other parts of the second string and concatenate them together. Here's something that does what I think you want in a readable way.
//assuming "b" contains a subsequence containing
//all of the letters in "a" in the same order
function getDifference(a, b)
{
var i = 0;
var j = 0;
var result = "";
while (j < b.length)
{
if (a[i] != b[j] || i == a.length)
result += b[j];
else
i++;
j++;
}
return result;
}
console.log(getDifference("test fly", "test xy flry"));
Here's a jsfiddle for it: http://jsfiddle.net/d4rcuxw9/1/
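The function above assumes that every character of "a" appears in "b" in the same order. A minimal sketch of how that precondition could be checked first; the name isSubsequence is hypothetical and not part of the original answer:

```javascript
// Hypothetical helper: returns true when every character of `a`
// appears in `b` in the same order (not necessarily adjacent).
function isSubsequence(a, b) {
  var i = 0; // index of the next character of `a` still to be found
  for (var j = 0; j < b.length && i < a.length; j++) {
    if (a[i] === b[j]) {
      i++;
    }
  }
  return i === a.length;
}

console.log(isSubsequence("test fly", "test xy flry")); // true
console.log(isSubsequence("abc", "acb"));               // false
```

Calling it before getDifference lets you reject inputs for which the computed difference would be meaningless.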

I find this question really interesting. Even though I'm a little late, I would like to share my solution on how to accomplish this with regex. The solution is concise but not very readable.
While I like it for its conciseness, I probably would not use it in my code, because its opacity reduces maintainability.
var str1 = "test xyz",
str2 = "test ab xyz",
replacement = '',
difference, i;
// Escape each character of str1, then join the characters with capture groups
var regex = new RegExp(str1.split('').map(function(char){
return char.replace(/[.(){}+*?[|\]\\^$]/, '\\$&');
}).join('(.*)'));
if(regex.test(str2)){
// One $n reference per (.*) group; there are str1.length - 1 groups
for(i=1; i<str1.length; i++) replacement = replacement.concat('$' + i);
difference = str2.replace(regex, replacement);
} else {
alert ('str2 does not contain str1');
}
The regular expression for "test xyz" is /t(.*)e(.*)s(.*)t(.*) (.*)x(.*)y(.*)z/ and replacement is "$1$2$3$4$5$6$7".
The code is no longer concise, but it works now even if str1 contains special characters.
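The same idea can be wrapped in a reusable function. The following is only a sketch under a few assumptions: the names escapeRegExp and regexDifference are hypothetical, the groups are lazy instead of greedy, and the pattern is anchored so that text before and after str1's characters is captured too.

```javascript
// Hypothetical helper: escape regex metacharacters so each character
// of str1 is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the characters of str2 that are not part of str1,
// or null when str2 does not contain str1 as a subsequence.
function regexDifference(str1, str2) {
  var gap = '([\\s\\S]*?)'; // lazy gap that may also span newlines
  var chars = str1.split('').map(escapeRegExp);
  var pattern = new RegExp('^' + gap + chars.join(gap) + '([\\s\\S]*)$');
  var m = pattern.exec(str2);
  if (m === null) {
    return null;
  }
  // m[0] is the whole match; the remaining entries are the captured gaps
  return m.slice(1).join('');
}

console.log(JSON.stringify(regexDifference("test xyz", "test ab xyz"))); // "ab "
```

Because the gaps are lazy, this version returns "ab " for the example in the question, where the greedy version above returns " ab". The question accepts either.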

To find out if there are extra '.' like you are asking for, you can do this:
result = "$1...00".match(/\$1\.(\.*)?00/)[1];
result then holds the extra '.' characters that were found. You cannot compare two strings using regex alone. Perhaps use this to extract the parts, then compare the results.
You can also try this:
result = "$1...00".match(/(\$)(\d+)\.(\.*)?(\d+)/);
// Outputs: ["$1...00", "$", "1", "..", "00"]
Which will extract the various parts to compare.

If you are only concerned with testing whether a given string contains two or more sequential dot '.' characters:
var string = '$1..00',
regexp = /(\.\.+)/;
alert('Is this regular expression ' + regexp + ' found in this string ' + string + '?\n\n' + regexp.test(string) + '\n\n' + 'Match and captures: ' + regexp.exec(string));
If you need it to match the currency format:
var string = '$1..00',
regexp = /\$\d*(\.\.+)(?:\d\d)+/;
alert('Is this regular expression ' + regexp + ' found in this string ' + string + '?\n\n' + regexp.test(string) + '\n\n' + 'Match and captures: ' + regexp.exec(string));
But I caution you that Regular Expressions aren't for comparing the differences between two strings; they are used for defining patterns to match against given strings.
So, while this may directly answer how to find the "multiple dots" pattern, it is useless for "finding the difference between two strings".
The StackOverflow tag wiki provides an excellent overview and basic reference for RegEx. See: https://stackoverflow.com/tags/regex/info

Related

Recombine capture groups in single regexp?

I am trying to handle input groups similar to:
'...A.B.' and want to output '.....AB'.
Another example:
'.C..Z..B.' ==> '......CZB'
I have been working with the following:
'...A.B.'.replace(/(\.*)([A-Z]*)/g, "$1")
returns:
"....."
and
'...A.B.'.replace(/(\.*)([A-Z]*)/g, "$2")
returns:
"AB"
but
'...A.B.'.replace(/(\.*)([A-Z]*)/g, "$1$2")
returns
"...A.B."
Is there a way to return
".....AB"
with a single regexp?
I have only been able to accomplish this with:
'...A.B.'.replace(/(\.*)([A-Z]*)/g, "$1") + '...A.B.'.replace(/(\.*)([A-Z]*)/g, "$2")
==> ".....AB"
If the goal is to move all of the . to the beginning and all of the A-Z to the end, then I believe the answer to "with a single regexp?" is "no."
Separately, I don't think there's a simpler, more efficient way than two replace calls — but not the two you've shown. Instead:
var str = "...A..B...C.";
var result = str.replace(/[A-Z]/g, "") + str.replace(/\./g, "");
console.log(result);
(I don't know what you want to do with non-., non-A-Z characters, so I've ignored them.)
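As a quick check, the two-replace approach can be wrapped in a function and run against both inputs from the question. The name dotsFirst is hypothetical; like the snippet above, it only considers dots and A-Z.

```javascript
// Hypothetical wrapper around the two replace calls shown above.
function dotsFirst(str) {
  var dots = str.replace(/[A-Z]/g, "");  // drop the letters, keep the dots
  var letters = str.replace(/\./g, "");  // drop the dots, keep the letters
  return dots + letters;
}

console.log(dotsFirst("...A.B."));   // ".....AB"
console.log(dotsFirst(".C..Z..B.")); // "......CZB"
```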
If you really want to do it with a single call to replace (e.g., a single pass through the string matters), you can, but I'm fairly sure you'd have to use the function callback and state variables:
var str = "...A..B...C.";
var dots = "";
var nondots = "";
var result = str.replace(/\.|[A-Z]|$/g, function(m) {
if (!m) {
// Matched the end of input; return the
// strings we've been building up
return dots + nondots;
}
// Matched a dot or letter, add to relevant
// string and return nothing
if (m === ".") {
dots += m;
} else {
nondots += m;
}
return "";
});
console.log(result);
That is, of course, incredibly ugly. :-)

How to remove string between two characters every time they occur [duplicate]

I need to get rid of any text inside < and >, including the two delimiters themselves.
So for example, from string
<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>
I would like to get this one
that
This is what i've tried so far:
var str = annotation.split(' ');
str.substring(str.lastIndexOf("<") + 1, str.lastIndexOf(">"))
But it doesn't work for every < and >.
I'd rather not use RegEx if possible, but I'm happy to hear if it's the only option.
You can simply use the replace method with /<[^>]*>/g. It matches a <, followed by any number of non-> characters ([^>]*), up to the closing >, globally.
var str = '<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>';
str = str.replace(/<[^>]*>/g, "");
alert(str);
Using a RegExp for string removal is fine:
"<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>".replace(/<\/?[^>]+>/g, "")
Since the text you want always follows a > character, you could split the string at that point; the text you need is then whatever comes before the next < in each piece. For example (in Java):
String[] parts = stringName.split(">");
StringBuilder word = new StringBuilder();
for (String part : parts) {
// Keep only the text before the next tag opens
int lt = part.indexOf('<');
word.append(lt >= 0 ? part.substring(0, lt) : part);
}
I haven't tested this thoroughly, but I think it would work. You don't need to actually remove the text between the "<>"; just collect the text right after each '>'.
Using a regular expression is not the only option, but it's a pretty good option.
You can easily parse the string to remove the tags, for example by using a state machine where the < and > characters turn a state of ignoring characters on and off. There are other methods of course, some shorter, some more efficient, but they will all be a few lines of code, while a regular expression solution is just a single replace.
Example:
function removeHtml1(str) {
return str.replace(/<[^>]*>/g, '');
}
function removeHtml2(str) {
var result = '';
var ignore = false;
for (var i = 0; i < str.length; i++) {
var c = str.charAt(i);
switch (c) {
case '<': ignore = true; break;
case '>': ignore = false; break;
default: if (!ignore) result += c;
}
}
return result;
}
var s = "<brev-y>th</brev-y><sw-ex>a</sw-ex><sl>t</sl>";
console.log(removeHtml1(s));
console.log(removeHtml2(s));
There are several ways to do this. Some are better than others. I haven't done one lately for these two specific characters, so I took a minute and wrote some code that may work. I will describe how it works. Create a function with a loop that copies an incoming string, character by character, to an outgoing string. Make the function a string type so it will return your modified string. Create the loop to scan the incoming string from string[0] while the index is less than string.length(). Within the loop, add an if statement. When the if statement sees a "<" character in the incoming string, it stops copying, but continues to look at every character in the incoming string until it sees the ">" character. When the ">" is found, it starts copying again. It's that simple.
The following code may need some refinement, but it should get you started on the method described above. It's not the fastest and not the most elegant but the basic idea is there. This did compile, and it ran correctly, here, with no errors. In my test program it produced the correct output. However, you may need to test it further in the context of your program.
string filter_on_brackets(string str1)
{
string str2 = "";
int copy_flag = 1;
for (size_t i = 0 ; i < str1.length();i++)
{
if(str1[i] == '<')
{
copy_flag = 0;
}
if(str1[i] == '>')
{
copy_flag = 2;
}
if(copy_flag == 1)
{
str2 += str1[i];
}
if(copy_flag == 2)
{
copy_flag = 1;
}
}
return str2;
}

Splitting a string by ],[

I have a string like the following:
"[a,b,c],[d,e,f],[g,h,i]"
I was wondering how can I separate the string by ],[ in JavaScript. .split("],[") will remove the brackets. I want to preserve them.
Expected output:
["[a,b,c]","[d,e,f]","[g,h,i]"]
Edit:
Here is a more complicated case that I highlighted in a comment on @Leo's answer (wherein a ],[-delimited string contains ],):
"[dfs[dfs],dfs],[dfs,df,sdfs]]"
Expected output:
["[dfs[dfs],dfs]","[dfs,df,sdfs]]"]
Try this:
"[a,b,c],[d,e,f],[g,h,i]".match(/(\[[^\]]+\])/g)
// ["[a,b,c]", "[d,e,f]", "[g,h,i]"]
EDIT For OP's new case, here's the trick:
"[dfs[dfs],dfs],[dfs,df,sdfs]]".match(/(?!,\[).+?\](?=,\[|$)/g)
// ["[dfs[dfs],dfs]", "[dfs,df,sdfs]]"]
It works for even more complicated cases:
"[dfs[aa,[a],dfs],[dfs[dfs],dfs],[dfs,df,sdfs]]".match(/(?!,\[).+?\](?=,\[|$)/g)
// ["[dfs[aa,[a],dfs]", "[dfs[dfs],dfs]", "[dfs,df,sdfs]]"]
"[dfs[aa,[a],dfs],[dfs[dfs],dfs],[dfs,df,sdfs]],[dfs,df,sdfs]]".match(/(?!,\[).+?\](?=,\[|$)/g)
// ["[dfs[aa,[a],dfs]", "[dfs[dfs],dfs]", "[dfs,df,sdfs]]", "[dfs,df,sdfs]]"]
Below is my personal opinion
JavaScript's RegExp didn't support lookbehind ((?<=...), which is super handy for such requirements) when this was written; it was added in ES2018. Even so, using RegExp here may become a maintainability nightmare. In this situation, I'd suggest an approach like @alienchow's delimiter replacement: not so neat, but more maintainable.
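For engines that implement ES2018 lookbehind assertions, the split can be written directly. This is a sketch under that assumption; the function name splitKeepingBrackets is hypothetical:

```javascript
// Assumes an ES2018+ engine: lookbehind (?<=...) is a syntax error in older ones.
// Splits on a comma that is preceded by "]" and followed by "[",
// so the brackets themselves are never consumed.
function splitKeepingBrackets(str) {
  return str.split(/(?<=\]),(?=\[)/);
}

console.log(splitKeepingBrackets("[a,b,c],[d,e,f],[g,h,i]"));
// ["[a,b,c]", "[d,e,f]", "[g,h,i]"]
console.log(splitKeepingBrackets("[dfs[dfs],dfs],[dfs,df,sdfs]]"));
// ["[dfs[dfs],dfs]", "[dfs,df,sdfs]]"]
```

Only the comma is treated as the delimiter; both assertions are zero-width, which is why the surrounding brackets survive.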
Personally I'd do
"[dfs[dfs],dfs],[dfs,df,sdfs]]".split("],[");
then loop through it to:
Append the first string with a "]".
Prepend the last string with a "[".
Prepend a "[" and append a "]" to all strings in between.
However, if you know what kind of strings and characters you will be receiving and you reaaaaally want a one-liner approach, you could try the hack below.
Replace all instances of "],[" with "]unlikely_string_or_special_unicode[", then split by "unlikely_string_or_special_unicode" - for example:
"[dfs[dfs],dfs],[dfs,df,sdfs]]".replace(/\],\[/g,"]~I_have_a_dream~[").split("~I_have_a_dream~");
Warning: Not 100% foolproof. If your input string has the unlikely string you used as a delimiter, then it implodes and the universe comes to an end.
TMTOWTDI
I prefer doing this with a regex as @Leo explained, but another way to do it in the spirit of TMTOWTDI & completeness is with the map function following the split:
var test = "[a,b,c],[d,e,f],[g,h,i]";
var splitTest = test.split("],[").map(
function(str) {
if (str[0] !== '[') {
str = '[' + str;
}
if (str[str.length - 1] !== ']') {
str += ']';
}
return str;
});
// FORNOW: to see the results
for (var i = 0; i < splitTest.length; i++) {
alert(splitTest[i]);
}
Afterthought:
If you perchance have an empty pair of square brackets in your ],[-delimited string (i.e. "[a,b,c],[d,e,f],[],[g,h,i]" for example), this approach will preserve it too (as would changing @Leo's regex from /(\[[^\]]+\])/g to /(\[[^\]]*\])/g).
TMTOWTDI Redux
With the curveball that ] and [ may be within the ],[-delimited strings (per your comment on @Leo's answer), here is a rehash of my initial approach that is more robust:
var test = "[dfs[dfs],dfs],[dfs,df,sdfs]]";
var splitTest = test.split("],[").map(
function(str, arrIndex, arr) {
if (arrIndex !== 0) {
str = '[' + str;
}
if (arrIndex !== arr.length - 1) {
str += ']';
}
return str;
});
// FORNOW: to see the results
for (var i = 0; i < splitTest.length; i++) {
alert(splitTest[i]);
}

JavaScript check if string contains any of the words from regex

I'm trying to check if a string contains any of these words:
AB|AG|AS|Ltd|KB|University
My current code:
var acceptedwords = '/AB|AG|AS|Ltd|KB|University/g'
var str = 'Hello AB';
var matchAccepted = str.match(acceptedwords);
console.log(matchAccepted);
if (matchAccepted !== null) { // Contains the accepted word
console.log("Contains accepted word: " + str);
} else {
console.log("Does not contain accepted word: " + str);
}
But for some strange reason this does not match.
Any ideas what I'm doing wrong?
That's not the right way to define a literal regular expression in Javascript.
Change
var acceptedwords = '/AB|AG|AS|Ltd|KB|University/g'
to
var acceptedwords = /AB|AG|AS|Ltd|KB|University/;
You might notice I removed the g flag: it's useless here, since you only want to know whether there's a match, not collect them all. You don't even have to use match; you could use test:
var str = 'Hello AB';
if (/AB|AG|AS|Ltd|KB|University/.test(str)) { // Contains the accepted word
console.log("Contains accepted word: " + str);
} else {
console.log("Does not contain accepted word: " + str);
}
If you want to build a regex with strings, assuming none of them contains any special character, you could do
var words = ['AB','AG', ...
var regex = new RegExp(words.join('|'));
If your names may contain special characters, you'll have to use a function to escape them.
If you don't want your words to match as parts of other words (meaning you don't want to match "ABC"), then you should check for word boundaries (again without the g flag, which would make repeated test calls depend on lastIndex):
regex = new RegExp(words.map(function(w){ return '\\b'+w+'\\b' }).join('|'));
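A minimal sketch that combines the escaping and word-boundary steps. The names escapeRegExp and buildWordRegex are hypothetical, and the word list is the one from the question:

```javascript
// Hypothetical helper: escape regex metacharacters in a literal word.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one regex that matches any of the words as a whole word.
// No g flag, so repeated test() calls do not depend on lastIndex.
function buildWordRegex(words) {
  return new RegExp('\\b(?:' + words.map(escapeRegExp).join('|') + ')\\b');
}

var regex = buildWordRegex(['AB', 'AG', 'AS', 'Ltd', 'KB', 'University']);
console.log(regex.test('Hello AB'));  // true
console.log(regex.test('Hello ABC')); // false
```

Note that \b only helps for words that start and end with a letter, digit, or underscore; a word such as "C++" would need a different boundary check.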

get detailed substring

I have a problem I'm trying to solve. I have a JavaScript string (yes, this is the string I have):
<div class="stories-title" onclick="fun(4,'this is test'); navigate(1)
What i want to achieve are the following points:
1) cut characters from start until the first ' character (cut the ' too)
2) cut characters from second ' character until the end of the string
3) put what's remaining in a variable
For example, the result of this example would be the string "this is test"
I would be very grateful if anyone has a solution, especially a simple one so I can understand it.
Thanks all in advance
You can use split() function:
var mystr = str.split("'")[1];
var newstr = str.replace(/[^']+'([^']+).*/,'$1');
No need to cut anything, you just want to match the string between the first ' and the second ' - see similar questions like Javascript RegExp to find all occurences of a a quoted word in an array
var string = "<div class=\"stories-title\" onclick=\"fun(4,'this is test'); navigate(1)";
var m = string.match(/'(.+?)'/);
var result = m ? m[1] : null; // the captured group, or null if there is no match
You can use regular expressions
/\'(.+)\'/
http://rubular.com/r/RcVmejJOmU
http://www.regular-expressions.info/javascript.html
If you want to do the work yourself:
var str = "<div class=\"stories-title\" onclick=\"fun(4,'this is test'); navigate(1)";
var newstr = "";
for (var i = 0; i < str.length; i++) {
if (str[i] == '\'') {
while (++i < str.length && str[i] != '\'') { // stop at the end if the closing quote is missing
newstr += str[i];
}
break;
}
}
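The manual scan can also be written with indexOf, which makes the missing-quote cases explicit. A sketch, with firstQuoted as a hypothetical name:

```javascript
// Hypothetical helper: returns the text between the first and second
// single quote, or null when fewer than two quotes are present.
function firstQuoted(str) {
  var start = str.indexOf("'");
  if (start === -1) {
    return null;
  }
  var end = str.indexOf("'", start + 1);
  if (end === -1) {
    return null;
  }
  return str.slice(start + 1, end);
}

var input = "<div class=\"stories-title\" onclick=\"fun(4,'this is test'); navigate(1)";
console.log(firstQuoted(input)); // "this is test"
```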
