Mixed results with White Spaces, and add a dash in Javascript? - javascript

How do you replace each run of whitespace and special characters with a single '-' character?
Here's a little Background:
When publishing a job to the career section of my company's site, the ATS turns the job title into part of the URL, e.g. if a job title is:
Olympia, WA: SLP Full or Part Time it will become olympia-wa-slp-full-or-part-time
I've experimented with code from other similar questions, but have only come close with this bit of code:
function newTitle(str) {
var x = str.replace(/[\W]/g, '-').toLowerCase();
return x;
}
Now if I run it, the output generated is olympia--wa--slp-full-or-part-time
(has 2 dashes from the extra spaces). What am I not getting right?
I've tried the other following bits:
str.replace(/\s+/g, '');
and
str.replaceAll("[^a-zA-Z]+", " ");
but neither get close to the desired format.
Thanks!

You got pretty close in your first example, just add + after [\W] to match one or more non-word characters. You can also give it a try in Regexr
function newTitle(str) {
var x = str.replace(/[\W]+/g, '-').toLowerCase();
return x;
}
alert(newTitle('Olympia, WA: SLP Full or Part Time'));
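One edge case worth noting: if the title starts or ends with punctuation, the replacement above leaves a leading or trailing dash. A minimal sketch, assuming you also want those stripped (the name newTitleTrimmed is made up for illustration and is not from the question):

```javascript
// Hypothetical variant of newTitle that also trims stray dashes.
function newTitleTrimmed(str) {
  return str
    .replace(/\W+/g, '-')     // collapse each run of non-word characters into one dash
    .replace(/^-+|-+$/g, '')  // drop any dash left at the very start or end
    .toLowerCase();
}

console.log(newTitleTrimmed('Olympia, WA: SLP (Full or Part Time)'));
// olympia-wa-slp-full-or-part-time
```

Without the second replace, the closing parenthesis would produce a trailing dash.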

What you actually want, it looks like, is to create a slug from a string.
Here is a nice reusable function that also takes care of multiple dashes:
function slugify(s) {
s = s.replace(/[^\w\s-]/g, '').trim().toLowerCase();
s = s.replace(/[-\s]+/g, '-');
return s;
}
console.log(
slugify("Olympia, WA: SLP Full or Part Time")
);

Your last example, [^a-zA-Z]+, almost works if you pass it as a regex and use a dash as the replacement. It uses a negated character class to match everything except the characters you listed, which includes whitespace and special characters.
Note that if a job title contains, for example, a digit or an underscore, that would also be replaced. You could expand the character class with whatever you don't want replaced, like [^a-zA-Z0-9]+, or, if you also want to keep the underscore, use \W+, which is equivalent to [^a-zA-Z0-9_]+.
function newTitle(str) {
return str.replace(/[^a-zA-Z]+/g, '-').toLowerCase();
}
console.log(newTitle("Olympia, WA: SLP Full or Part Time"));
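To make the difference between the three character classes concrete, here is a small comparison sketch. The helper slugWith and the sample title are assumptions for illustration only:

```javascript
// Hypothetical helper: apply a given pattern, then lowercase.
function slugWith(pattern, str) {
  return str.replace(pattern, '-').toLowerCase();
}

var title = 'Level_2 Nurse: 3 Shifts'; // made-up title with a digit and an underscore

console.log(slugWith(/[^a-zA-Z]+/g, title));    // level-nurse-shifts
console.log(slugWith(/[^a-zA-Z0-9]+/g, title)); // level-2-nurse-3-shifts
console.log(slugWith(/\W+/g, title));           // level_2-nurse-3-shifts
```

The first pattern drops digits and underscores, the second keeps digits, and the third keeps both.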

Related

Replace all slashes on a line with a match on at the beginning of the same line

I'm trying to replace every slash on a line with the 3-character block at the beginning of that line (PMC, PAJ, etc. in the example below).
.PMC.89569XX/90051XX/90204XX/89533XX/90554XX/90053XX/90215XX/89874XX/89974XX/90481XX/90221XX/90508XX/90183XX/88526XX/89843XX/88041XX/90446XX/88515XX/89574XX/89847XX/88616XX/90513XX/90015XX/90334XX/89649XX.T00
.PAJ.77998XX/77896XX.T00
.PAG.78116XX/78104XX/77682XX/07616XX/77663XX/77863XX/07634XX/78088XX/77746XX/78148XX.T00
.PKC.22762XX/22358XX/22055XX/22672XX/22684XX/22154XX/22608XX/22768XX/22632XX/22266XX/22714XX/22658XX/22631XX/22288XX/22020XX/22735XX/22269XX/22138XX/22331XX/22387XX/22070XX/22636XX/22629XX/22487XX/22725XX.T00
The desired outcome should be:
PMC.89569XXPMC90051XXPMC90204XXPMC89533XXPMC90554XXPMC90053XXPMC90215XXPMC89874XXPMC89974XXPMC90481XXPMC90221XXPMC90508XXPMC90183XXPMC88526XXPMC89843XXPMC88041XXPMC90446XXPMC88515XXPMC89574XXPMC89847XXPMC88616XXPMC90513XXPMC90015XXPMC90334XXPMC89649XX.T00
I'm not sure how to accomplish this.
This is what I have so far:
(.)([A-Z]{3})(.)(\/)
If you only plan to support ECMAScript 2018 and newer, you may achieve what you need with a single regex:
.replace(/(?<=^\.([^.]+)\..*?)\//g, "$1")
See the regex demo.
Details
(?<=^\.([^.]+)\..*?) - a positive lookbehind that, immediately to left of the current location, requires
^ - start of string
\. - a dot
([^.]+) - Group 1: one or more chars other than a dot
\. - a dot
.*? - any 0+ chars, other than linebreak chars, as few as possible
\/ - a / char.
JS demo:
var strs = ['.PMC.89569XX/90051XX/90204XX/89533XX/90554XX/90053XX/90215XX/89874XX/89974XX/90481XX/90221XX/90508XX/90183XX/88526XX/89843XX/88041XX/90446XX/88515XX/89574XX/89847XX/88616XX/90513XX/90015XX/90334XX/89649XX.T00','.PAJ.77998XX/77896XX.T00','.PAG.78116XX/78104XX/77682XX/07616XX/77663XX/77863XX/07634XX/78088XX/77746XX/78148XX.T00','.PKC.22762XX/22358XX/22055XX/22672XX/22684XX/22154XX/22608XX/22768XX/22632XX/22266XX/22714XX/22658XX/22631XX/22288XX/22020XX/22735XX/22269XX/22138XX/22331XX/22387XX/22070XX/22636XX/22629XX/22487XX/22725XX.T00'];
for (var s of strs) {
console.log(s.replace(/(?<=^\.([^.]+)\..*?)\//g, "$1"));
}
I am not sure you can do it with just one regex; you will probably have to do it as a two-step process. First, capture the three capital letters that follow the first dot using the substring() method, then replace all slashes with those three letters. Here is a demo with JS code:
function transformLine(s) {
var repStr = s.substring(1,4);
var replacedStr = s.replace(/\//g, repStr);
return replacedStr.substring(1,replacedStr.length);
}
var lines = [".PMC.89569XX/90051XX/90204XX/89533XX/90554XX/90053XX/90215XX/89874XX/89974XX/90481XX/90221XX/90508XX/90183XX/88526XX/89843XX/88041XX/90446XX/88515XX/89574XX/89847XX/88616XX/90513XX/90015XX/90334XX/89649XX.T00", ".PAJ.77998XX/77896XX.T00", ".PAG.78116XX/78104XX/77682XX/07616XX/77663XX/77863XX/07634XX/78088XX/77746XX/78148XX.T00", ".PKC.22762XX/22358XX/22055XX/22672XX/22684XX/22154XX/22608XX/22768XX/22632XX/22266XX/22714XX/22658XX/22631XX/22288XX/22020XX/22735XX/22269XX/22138XX/22331XX/22387XX/22070XX/22636XX/22629XX/22487XX/22725XX.T00"];
for (var i = 0;i<lines.length;i++) {
console.log("Before: " + lines[i]);
console.log("After: " + transformLine(lines[i])+"\n\n");
}
I've removed the first dot, as your expected output does not have it.
Let me know if this works for you.
Edit:
I have updated the code to provide a function that takes a string as input and returns the modified string. Please check the demo.
Edit2: Solving it mostly using regex
This one-liner in the function does the whole job of transforming your line into the required one.
function transformLine(s) {
return s.replace(/\//g, /^.(.{3})/.exec(s)[1]).replace(/^./,'');
}
var lines = [".PMC.89569XX/90051XX/90204XX/89533XX/90554XX/90053XX/90215XX/89874XX/89974XX/90481XX/90221XX/90508XX/90183XX/88526XX/89843XX/88041XX/90446XX/88515XX/89574XX/89847XX/88616XX/90513XX/90015XX/90334XX/89649XX.T00", ".PAJ.77998XX/77896XX.T00", ".PAG.78116XX/78104XX/77682XX/07616XX/77663XX/77863XX/07634XX/78088XX/77746XX/78148XX.T00", ".PKC.22762XX/22358XX/22055XX/22672XX/22684XX/22154XX/22608XX/22768XX/22632XX/22266XX/22714XX/22658XX/22631XX/22288XX/22020XX/22735XX/22269XX/22138XX/22331XX/22387XX/22070XX/22636XX/22629XX/22487XX/22725XX.T00"];
for (var i = 0;i<lines.length;i++) {
console.log("Before: " + lines[i]);
console.log("After: " + transformLine(lines[i])+"\n\n");
}
As you can see here, this line,
return s.replace(/\//g, /^.(.{3})/.exec(s)[1]).replace(/^./,'');
does everything you need. It first extracts the three capital letters using /^.(.{3})/.exec(s)[1], then replaces all slashes with that captured text, and finally removes the first character (the dot) using .replace(/^./,'') before returning the string.
Let me know if this is what you wanted, or if you need it done in some other particular way.
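If the lines arrive as one multi-line string and lookbehind support is a concern, a callback-based replace is another option. This is only a sketch under assumptions: the function name replaceSlashesWithPrefix is made up, and it assumes every line of interest looks like .PREFIX.rest (lines that do not match are left untouched):

```javascript
// Hypothetical helper: handles a whole multi-line block in one pass.
function replaceSlashesWithPrefix(text) {
  // With the m flag, ^ and $ match at the start and end of each line.
  return text.replace(/^\.([^.\n]+)\.(.*)$/gm, function (line, prefix, rest) {
    // Drop the leading dot, keep "PREFIX.", and swap every slash for the prefix.
    return prefix + "." + rest.split("/").join(prefix);
  });
}

var input = ".PAJ.77998XX/77896XX.T00\n.PAG.78116XX/78104XX.T00";
console.log(replaceSlashesWithPrefix(input));
// PAJ.77998XXPAJ77896XX.T00
// PAG.78116XXPAG78104XX.T00
```

Unlike the /^.(.{3})/.exec(s)[1] approach, this does not assume the prefix is exactly three characters, and it does not throw when a line has no prefix.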

Uppercase for each new word swedish characters and html markup

I was pointed to this post, which does not seem to cover the criteria I have:
Replace a Regex capture group with uppercase in Javascript
I am trying to make a regex that will:
format a string by adding uppercase for the first letter of each word and lower case for the rest of the characters
ignore HTML markup
Accept swedish characters (åäöÅÄÖ)
Say I've got this string:
<b>app</b>le store östersund
Then I want it to be (changes marked by uppercase characters)
<b>App</b>le Store Östersund
I've been playing around with it and the closest I've got is the following:
(?!([^<])*?>)[åäöÅÄÖ]|\s\b\w
Resulted in
<b>app</b>le Store Östersund
Or this
/(?!([^<])*?>)[åäöÅÄÖ]|\S\b\w/g
Resulted in
<B>App</B>Le store Östersund
Here's a fiddle:
http://refiddle.com/refiddles/598aabef75622d4a531b0000
Any help or advice is much appreciated.
It is not possible to do this with regexp alone, since regexp doesn't understand HTML structure. [*] Instead, we need to process each text node, and carry through our logic for what is the beginning of the word in case a word continues across different text nodes. A character is at start of the word if it is preceded by a whitespace, or if it is at the start of the string and it is either the first text node, or the previous text node ended in whitespace.
function htmlToTitlecase(html, letters) {
let div = document.createElement('div');
let re = new RegExp("(^|\\s)([" + letters + "])", "gi");
div.innerHTML = html;
let treeWalker = document.createTreeWalker(div, NodeFilter.SHOW_TEXT);
let startOfWord = true;
while (treeWalker.nextNode()) {
let node = treeWalker.currentNode;
node.data = node.data.replace(re, function(match, space, letter) {
if (space || startOfWord) {
return space + letter.toUpperCase();
} else {
return match;
}
});
startOfWord = node.data.match(/\s$/);
}
return div.innerHTML;
}
console.log(htmlToTitlecase("<b>app</b>le store östersund", "a-zåäö"));
// <b>App</b>le Store Östersund
[*] Maybe possible, but even if so, it would be horribly ugly, since it would need to cover an awful amount of corner cases. Also might need a stronger RegExp engine than JavaScript's, like Ruby's or Perl's.
EDIT:
Even if just specifying really simple html tags? The only ones I am actually in need of covering is <b> and </b> at the moment.
This was not specified in the question. The solution is general enough to work for any markup (including simple tags). But...
function simpleHtmlToTitlecaseSwedish(html) {
return html.replace(/(^|\s)(<\/?b>|)([a-zåäö])/gi, function(match, space, tag, letter) {
return space + tag + letter.toUpperCase();
});
}
console.log(simpleHtmlToTitlecaseSwedish("<b>app</b>le store östersund"));
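If listing the Swedish letters by hand feels fragile, a Unicode property escape can stand in for the character class. The following is a sketch, not part of the answer above: it assumes an engine with ES2018 support for \p{L} and the u flag, still only handles <b> and </b>, and the name titlecaseWithBoldTags is made up:

```javascript
// Hypothetical variant: \p{L} matches any Unicode letter, so å, ä, ö need no special listing.
function titlecaseWithBoldTags(html) {
  return html.replace(/(^|\s)(<\/?b>)?(\p{L})/gu, function (match, space, tag, letter) {
    // tag is undefined when the word does not start with <b> or </b>
    return space + (tag || "") + letter.toUpperCase();
  });
}

console.log(titlecaseWithBoldTags("<b>app</b>le store östersund"));
// <b>App</b>le Store Östersund
```

Like the function above, it only uppercases the first letter of each word and leaves the remaining characters as they are.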
I have a solution which uses almost only regex. It may not be the most intuitive way to do it, but it should be effective and I find it fun :)
You have to append to the end of your string every lowercase character followed by its uppercase counterpart, like this (it must also be preceded by a space for my regex):
aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZåÅäÄöÖ
(I don't know which letters are missing, I know nothing about swedish alphabet, sorry... I'm counting on you to correct that !)
Then you can use the following regex :
(?![^<]*>)(\s<[^/]*?>|\s|^)([\wåäö])(?=.*\2(.)\S*$)|[\wåÅäÄöÖ]+$
Replace by :
$1$3
Test it here
Here is a working javascript code :
// Initialization
var regex = /(?![^<]*>)(\s<[^/]*?>|\s|^)([\wåäö])(?=.*\2(.)\S*$)|[\wåÅäÄöÖ]+$/g;
var string = "test <b when=\"2>1\">ap<i>p</i></b>le store östersund";
// Processing
var result = string + " aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZåÅäÄöÖ";
result = result.replace(regex, "$1$3");
// Display result
console.log(result);
Edit : I forgot to handle first word of the string, it's corrected :)

Capitalize every letter after / and - characters

I'm trying to capitalize every letter after a / or - character. Meaning, if given the string
this/is/a/pretty-cool/url
Its expected output would look like
This/Is/A/Pretty-Cool/Url
My code:
string = string.replace(/\/(\b[a-z](?!\s))/g, function(i,e) { return '/'+e.toUpperCase() });
Which currently returns
this/Is/A/Pretty-cool/Url
Not quite there, obviously.
How can I get this to work as expected?
Here you have a simple solution:
string = string.replace(/(^|\/|-)(\S)/g, s=>s.toUpperCase())
You just match one character after either the start of the string, a / or a -. It's simple because there's no problem uppercasing one of those chars ('/'.toUpperCase() is '/').
Now, let's imagine that you don't want to uppercase the first part (maybe it's different in your real problem, maybe you care about that poor function which has to uppercase a "/"), then you would have used submatches like this:
string = string.replace(/(^|\/|-)(\S)/g, (_,a,b)=>a+b.toUpperCase())
(but you don't have to go to such extremities here)
Starting from your code, you are missing the - char.
So, changing the code to support that char, you can use:
string = string.replace(/(^|\/|-)(\b[a-z](?!\s))/g, function(match, sep, letter) { return sep + letter.toUpperCase() });
or
string = string.replace(/(^|\/|-)(\b[a-z](?!\s))/g, function(match) { return match.toUpperCase() });
Here's another variant, which uppercases after any non-word character, and also at the start of the string:
string = string.replace(/(^|\W)(\w)/g, (match, a, b) => a + b.toUpperCase());
(or using the same shortcut as @DenysSéguret):
string = string.replace(/(^|\W)(\w)/g, s => s.toUpperCase());
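The separator-based and \W-based variants are not interchangeable on every input. A small comparison sketch, where the function names and the .html suffix on the sample string are assumptions added for illustration:

```javascript
// Uppercase only at the start and after / or - (hypothetical name).
function capitalizeAfterSeparators(str) {
  return str.replace(/(^|\/|-)(\S)/g, (_, sep, ch) => sep + ch.toUpperCase());
}

// Uppercase at the start and after any non-word character (hypothetical name).
function capitalizeAfterNonWord(str) {
  return str.replace(/(^|\W)(\w)/g, (_, sep, ch) => sep + ch.toUpperCase());
}

var path = 'this/is/a/pretty-cool/url.html';
console.log(capitalizeAfterSeparators(path)); // This/Is/A/Pretty-Cool/Url.html
console.log(capitalizeAfterNonWord(path));    // This/Is/A/Pretty-Cool/Url.Html
```

The \W version also capitalizes after the dot, which may or may not be what you want.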

AngularJS filter to remove a certain regular expression

I am attempting to make an angularJS filter which will remove timestamps that look like this: (##:##:##) or ##:##:##.
This is a filter to remove all letters:
.filter('noLetter', function() {
//this filter removes all letters
return function removeLetters(string){
return string.replace(/[^0-9]+/g, " ");
}
})
This is my attempt to make a filter that removes the timestamps, but it is not working. Help is much appreciated.
.filter('noStamps', function () {
return function removeStamps(item) {
return item.replace(/^\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\)$/i, "");
}
})
My goal is for it to delete the timestamps it finds and leave nothing in their place.
edit based on question in comments:
The time stamps are in the text so it would say "this is an example 21:20:19 of what I am 21:20:20 trying to do 21:20:22"
I would want this to be converted into "this is an example of what I am trying to do" by the filter.
You may use
/\s*\(?\b\d{2}:\d{2}:\d{2}\b\)?/g
See regex demo
The main points:
The ^ (start of string) and $ (end of string) anchors should be removed so that the expression becomes unanchored and can match input text partially.
The global flag g is used to match all occurrences.
The limiting quantifier {2} shortens the regex (and the shorthand class \d helps shorten it, too).
\)? and \(? use the ? quantifier to match 1 or 0 occurrences of the round brackets.
\s* at the beginning "trims" the result (as the leading whitespace is matched).
JS snippet:
var str = 'this is an example (21:20:19) of what I am 21:20:20 trying to do 21:20:22';
var result = str.replace(/\s*\(?\b\d{2}:\d{2}:\d{2}\b\)?/g, '');
document.getElementById("r").innerHTML = result;
<div id="r"/>
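Plugged back into the filter from the question, the regex could be used roughly like this. It is a sketch only: the guard for non-string input is an assumption about how the filter might be called, and the registration line is shown as a comment so the snippet runs without AngularJS loaded:

```javascript
// The function the 'noStamps' filter factory would return.
function removeStamps(item) {
  // Assumption: the filter may receive undefined/null before data is loaded.
  if (typeof item !== "string") {
    return item;
  }
  return item.replace(/\s*\(?\b\d{2}:\d{2}:\d{2}\b\)?/g, "");
}

// Registration, as in the question:
// .filter('noStamps', function () { return removeStamps; })

console.log(removeStamps("this is an example 21:20:19 of what I am 21:20:20 trying to do 21:20:22"));
// this is an example of what I am trying to do
```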

Javascript/Jquery - how to replace a word but only when not part of another word?

I am currently doing a regex comparison to remove words (rude words) from a text field when written by the user. At the moment it performs the check when the user hits space and removes the word if matches. However it will remove the word even if it is part of another word. So if you type apple followed by space it will be removed, that's ok. But if you type applepie followed by space it will remove 'apple' and leave pie, that's not ok. I am trying to make it so that in this instance if apple is part of another word it will not be removed.
Is there any way I can perform the comparison on the whole word only or ignore the comparison if it is combined with other characters?
I know that this allows people to write many rude things with no space. But that is the desired effect by the people that give me orders :(
Thanks for any help.
function rude(string) {
var regex = /apple|pear|orange|banana/ig;
//example words because I'm sure you don't need to read profanity
var updatedString = string.replace( regex, function(s) {
var blank = "";
return blank;
});
return updatedString;
}
$(input).keyup(function(event) {
if (event.keyCode == 32) {
var text = rude($(this).val());
$(this).val(text);
$("someText").html(text);
}
});
You can use word boundaries (\b), which match 0 characters, but only at the beginning or end of a word. I'm also using grouping (the parentheses), so it's easier to read and write such expressions.
var regex = /\b(apple|pear|orange|banana)\b/ig;
BTW, in your example you don't need to use a function. This is sufficient:
function rude(string) {
var regex = /\b(apple|pear|orange|banana)\b/ig;
return string.replace(regex, '');
}
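If the word list is kept in an array rather than hard-coded in the pattern, the same word-boundary regex can be built at runtime. A minimal sketch, assuming hypothetical helper names (escapeRegExp, buildWordFilter, removeWords) that are not part of the original code:

```javascript
// Escape characters that have a special meaning in a regex, so each word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build /\b(?:word1|word2|...)\b/gi from an array of words.
function buildWordFilter(words) {
  return new RegExp("\\b(?:" + words.map(escapeRegExp).join("|") + ")\\b", "gi");
}

function removeWords(text, words) {
  return text.replace(buildWordFilter(words), "");
}

var words = ["apple", "pear", "orange", "banana"];
console.log(removeWords("applepie", words)); // applepie (left alone: part of a longer word)
console.log(removeWords("an Apple", words)); // "an " (whole word removed, case-insensitively)
```

Note that \b is based on ASCII word characters, so words containing other letters may need a different boundary check.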
