How to use regexp without built-in regexp in js? - javascript

In an unusual case of mine, I cannot use the built-in regexp in JS when I inject JS into a WebView.
What would be the next best thing to use? I basically use regexp to detect:
line feed (someString.match(/\n/))
carriage return (someString.match(/\r/))
split a string into words (manyWords.split(/\s+/))
Other ways to do what regexps do without the built-in regexp would be appreciated as well.

Use JavaScript's built-in String.prototype.indexOf and String.prototype.split; there's really no need for a regex in those cases.
For line feed you can use someString.indexOf('\n') > -1
For carriage return you can use someString.indexOf('\r') > -1
For splitting into words you can use manyWords.split(' ') (unlike /\s+/, this splits on single spaces only, so consecutive spaces produce empty strings)
In case you want to split a string that contains line breaks into words, you need to nest the split like this:
manyWords.split('\n').flatMap(function(line) { return line.split(' ');});
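If the input may contain tabs, carriage returns, or runs of several spaces, a small character scan can stand in for split(/\s+/) without any RegExp. This is only a sketch: splitWords is a hypothetical helper name, and the list of whitespace characters is an assumption covering the common ASCII cases rather than everything \s matches.

```javascript
// Hypothetical helper: split on runs of whitespace without using RegExp.
// Assumption: only these ASCII whitespace characters need to be handled.
function splitWords(text) {
  var whitespace = " \t\n\r\f\v";
  var words = [];
  var current = "";
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (whitespace.indexOf(ch) !== -1) {
      // End of a word: store it and skip the rest of the whitespace run.
      if (current !== "") {
        words.push(current);
        current = "";
      }
    } else {
      current += ch;
    }
  }
  if (current !== "") {
    words.push(current);
  }
  return words;
}
```

Unlike split(/\s+/), this version never returns empty strings for leading or trailing whitespace.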

Related

Efficiently remove common patterns from a string

I am trying to write a function to calculate how likely two strings are to mean the same thing. In order to do this I am converting to lower case and removing special characters from the strings before I compare them. Currently I am removing the strings '.com' and 'the' using String.replace(substring, '') and special characters using String.replace(regex, '')
str = str.toLowerCase()
.replace('.com', '')
.replace('the', '')
.replace(/[&\/\\#,+()$~%.'":*?<>{}]/g, '');
Is there a better regex that I can use to remove the common patterns like '.com' and 'the' as well as the special characters? Or some other way to make this more efficient?
As my dataset grows I may find other common meaningless patterns that need to be removed before trying to match strings and would like to avoid the performance hit of chaining more replace functions.
Examples:
Fish & Chips? => fish chips
stackoverflow.com => stackoverflow
The Lord of the Rings => lord of rings
You can combine the replace calls into a single one with a regexp like this:
str = str.toLowerCase().replace(/\.com|the|[&\/\\#,+()$~%.'":*?<>{}]/g, '');
The different strings to remove are separated by pipes |
This makes it easy to add more strings to the regexp.
If you are storing the words to remove in an array, you can generate the regex using the RegExp constructor, e.g.:
var words = ["\\.com", "the"];
var rex = new RegExp(words.join("|") + "|[&\\/\\\\#,+()$~%.'\":*?<>{}]", "g");
Then reuse rex for each string:
str = str.toLowerCase().replace(rex, "");
Note the additional escaping required because instead of a regular expression literal, we're using a string, so the backslashes (in the words array and in the final bit) need to be escaped, as does the " (because I used " for the string quotes).
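One way to avoid hand-escaping every entry is to escape the words programmatically before joining them. The sketch below assumes the words are meant as literal text; escapeRegExp and buildCleaner are hypothetical names, and the final whitespace collapse is an addition so the output matches the examples in the question.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one reusable cleaner from a list of literal words.
function buildCleaner(words) {
  // Same special-character class as in the question, taken from a literal.
  var specials = /[&\/\\#,+()$~%.'":*?<>{}]/.source;
  var pattern = words.map(escapeRegExp).concat(specials).join("|");
  var rex = new RegExp(pattern, "g");
  return function (str) {
    return str
      .toLowerCase()
      .replace(rex, "")
      .replace(/\s+/g, " ") // collapse the gaps left by removed words
      .trim();
  };
}

var clean = buildCleaner([".com", "the"]);
clean("The Lord of the Rings"); // "lord of rings"
```

As in the original pattern, "the" is also removed when it appears inside a longer word, so word boundaries may be worth adding if that matters for your data.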
The problem with this question is that I'm sure you have a very concrete idea in your mind of what you want to do, but the solution you have arrived at (removing uninformative letters before making an is-identical comparison) may not be the best for the comparison you want to do.
I think perhaps a better idea would be to use a different comparison method and a different data structure than a string. A very simple example would be to condense your strings to sets with set('string') and then compare set similarity/difference. Another method might be to create a directed acyclic graph, or a substring trie. The main point is that it's probably OK to reduce the information from the original string and store/compare that. However, don't underestimate the value of storing the original string, as it will help you down the road if you want to change the way you compare.
Finally, if your strings are really really really long, you might want to use a perceptual hash - which is like an MD5 hash except similar strings have similar hashes. However, you will most likely have to roll your own for short strings, and define what you think is important data, and what is superfluous.
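The set-comparison idea mentioned in this answer could be sketched in JavaScript roughly as follows. The choice of Jaccard similarity over character sets is an assumption made for illustration, not something the answer prescribes, and jaccardSimilarity is a hypothetical name.

```javascript
// Hypothetical helper: Jaccard similarity between the character sets of two strings.
// Returns a number between 0 (no shared characters) and 1 (identical sets).
function jaccardSimilarity(a, b) {
  var setA = new Set(a);
  var setB = new Set(b);
  var shared = 0;
  setA.forEach(function (ch) {
    if (setB.has(ch)) {
      shared++;
    }
  });
  var unionSize = setA.size + setB.size - shared;
  // Two empty strings are treated as identical by convention.
  return unionSize === 0 ? 1 : shared / unionSize;
}

jaccardSimilarity("abc", "abd"); // 0.5
```

Character sets ignore order and repetition, so this is a coarse signal; sets of words or of character n-grams would be a natural variation.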

Split string with various delimiters while keeping delimiters

I have the following string:
"dogs#cats^horses^fish!birds"
How can I get the following array back?
['dogs','#cats','^horses','^fish','!birds']
Essentially I am trying to split the string while keeping the delimiters. I've tried string.match to no avail.
Assuming those are your only separators then you can do this:
var string = "dogs#cats^horses^fish!birds";
string.replace(/(#|\^|!)/g, '|$1').split('|');
We basically add our own separator, in this case | and split it based on that.
This does what you want:
str.match(/((^|[^\w])\w+)/g)
Without more test cases though, it's hard to say how reliable it would be.
This is also assuming a large set of possible delimiters. If it's a small fixed amount, Samer's solution would be a good way to go
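A different technique from the two answers above is to split on a zero-width lookahead, so that each delimiter stays attached to the piece that follows it and no placeholder character is needed. This is a sketch assuming #, ^ and ! are the only delimiters; splitKeepingDelimiters is a hypothetical name.

```javascript
// Hypothetical helper: split before each delimiter without consuming it.
// Assumption: the delimiters are exactly #, ^ and !.
function splitKeepingDelimiters(text) {
  // (?=...) matches the position just before a delimiter, not the delimiter itself.
  return text.split(/(?=[#^!])/);
}

splitKeepingDelimiters("dogs#cats^horses^fish!birds");
// ["dogs", "#cats", "^horses", "^fish", "!birds"]
```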

Replace a string's special characters using regex

The string looks something like this:
/1/2/3/4, however I want to replace it with ?1=2&3=4.
I am planning to use REReplace in ColdFusion.
Could you suggest a regex for this? I also thought of using loops, but I am stuck either way...
Thanks in Advance
This is a bit cumbersome without a loop to make it more manageable, as @Leigh suggested, but you can use the following on string inputs that contain an even number of segments in the format you described:
var s = "/1/2/3/4/5/6";
s.replace(/^\//,'?').replace(/(\d+)\/(\d+)/g,'$1=$2').replace(/\//g,'&')
// => "?1=2&3=4&5=6"
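The loop-based approach mentioned above might look like the following in JavaScript. It is a sketch that assumes the path holds an even number of non-empty segments; pathToQuery is a hypothetical name.

```javascript
// Hypothetical helper: turn "/k1/v1/k2/v2" into "?k1=v1&k2=v2" with a loop.
function pathToQuery(path) {
  // Drop the empty piece produced by the leading slash.
  var parts = path.split("/").filter(function (part) {
    return part !== "";
  });
  if (parts.length % 2 !== 0) {
    throw new Error("expected an even number of segments");
  }
  var pairs = [];
  for (var i = 0; i < parts.length; i += 2) {
    pairs.push(parts[i] + "=" + parts[i + 1]);
  }
  return pairs.length > 0 ? "?" + pairs.join("&") : "";
}

pathToQuery("/1/2/3/4"); // "?1=2&3=4"
```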

Non-capturing groups in Javascript regex

I am matching a string in Javascript against the following regex:
(?:new\s)(.*)(?:[:])
The string I use the function on is "new Tag:var;"
What it is supposed to return is only "Tag", but instead it returns an array containing "new Tag:" as well as the desired result.
I found out that I might need to use a lookbehind instead but since it is not supported in Javascript I am a bit lost.
Thank you in advance!
Well, I don't really get why you make such a complicated regexp for what you want to extract:
(?:new\\s)(.*)(?:[:])
whereas it can be solved using the following:
s = "new Tag:var;";
var out = s.replace(/new\s([^:]*):.*;/, "$1")
where you have only one capturing group, which is the one you're looking for.
\\s (double escaping) is only needed when creating a RegExp instance from a string.
Also, your regex uses the greedy pattern .*, which may match more than desired.
Make it non-greedy:
(?:new\s)(.*?)(?:[:])
Or, better, use a negated character class:
(?:new\s)([^:]*)(?:[:])
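With any of these patterns, the array returned by match always holds the whole match at index 0 and the capturing groups after it, so the desired text can be read from index 1. A minimal sketch, where extractName is a hypothetical helper name:

```javascript
// Hypothetical helper: return the text between "new " and the first colon.
function extractName(text) {
  var match = text.match(/new\s([^:]*):/);
  // match[0] is the whole match ("new Tag:"), match[1] is the first group.
  return match ? match[1] : null;
}

extractName("new Tag:var;"); // "Tag"
```

The null check matters because match returns null when the pattern does not match at all.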

Javascript replace with part of match?

I'm looking over replace() examples and I'm not exactly sure the best way to do this:
Say I have a string something like
{G}{J}{L}...
What's the best way to use string.replace() to change the inner and outer brackets but leave the letter inside them? Do I need to do separate matches for the outer and inner brackets or is it possible/faster to do it in a single statement?
I see that $& can get the whole match and I guess I could remove the first and last characters and replace them after but that seems slow.
> "{G}{J}{L}".replace(/{(.)}/g,"$1")
"GJL"
Is this what you're after? Or maybe this?
> "{G}{J}{L}".replace(/{(.)}/g,"[$1]")
"[G][J][L]"
One pretty straightforward way is to just perform the replacements separately, and the performance difference should be negligible unless your strings are huge:
var string = "{G}{J}{L}";
string = string.replace(/\{/g, "<").replace(/}/g, ">")
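Both brackets can also be swapped in a single pass by capturing the inner text and rebuilding each match. The sketch below uses a replacement function so the new brackets are inserted literally; rebracket and its parameters are hypothetical names.

```javascript
// Hypothetical helper: replace {x} with open + x + close in one pass.
function rebracket(text, open, close) {
  return text.replace(/\{([^{}]*)\}/g, function (whole, inner) {
    // "whole" is the full match such as "{G}", "inner" is the captured "G".
    return open + inner + close;
  });
}

rebracket("{G}{J}{L}", "<", ">"); // "<G><J><L>"
```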
