I'm trying to capitalize every letter after a / or - character. Meaning, if given the string
this/is/a/pretty-cool/url
Its expected output would look like
This/Is/A/Pretty-Cool/Url
My code:
string = string.replace(/\/(\b[a-z](?!\s))/g, function(i,e) { return '/'+e.toUpperCase() });
Which currently returns
this/Is/A/Pretty-cool/Url
Not quite there, obviously.
How can I get this to work as expected?
Here you have a simple solution:
string = string.replace(/(^|\/|-)(\S)/g, s=>s.toUpperCase())
You just match one character after either the start of the string, a / or a -. It's simple because there's no problem uppercasing one of those chars ('/'.toUpperCase() is '/').
Now, let's imagine that you don't want to uppercase the first part (maybe it's different in your real problem, maybe you care about that poor function which has to uppercase a "/"), then you would have used submatches like this:
string = string.replace(/(^|\/|-)(\S)/g, (_,a,b)=>a+b.toUpperCase())
(but you don't have to go to such extremities here)
Starting from your code you have missing the - char.
So, changing the code to support the char you can use:
var string = string.replace(/(^|\/|-)(\b[a-z](?!\s))/g, function(i,e) { return i.charAt(0)+e.toUpperCase() });
or
var string = string.replace(/(^|\/|-)(\b[a-z](?!\s))/g, function(i,e) { return i.toUpperCase() });
Here's another variant, which uppercases after any non-word character, and also at the start of the string:
string = string.replace(/(^|\W)(\w)/g, (match, a, b) => a + b.toUpperCase());
(or using the same short cut as #DenysSéguret):
string = string.replace(/(^|\W)(\w)/g, s => s.toUpperCase());
Related
I'm receiving a list of files in an object and I just need to display a file name and its type in a table.
All files come back from a server in such format: timestamp_id_filename.
Example: 1568223848_12345678_some_document.pdf
I wrote a helper function which cuts the string.
At first, I did it with String.prototype.split() method, I used regex, but then again - there was a problem. Files can have underscores in their names so that didn't work, so I needed something else. I couldn't come up with a better idea. I think it looks really dumb and it's been haunting me the whole day.
The function looks like this:
const shortenString = (attachmentName) => {
const file = attachmentName
.slice(attachmentName.indexOf('_') + 1)
.slice(attachmentName.slice(attachmentName.indexOf('_') + 1).indexOf('_') + 1);
const fileName = file.slice(0, file.lastIndexOf('.'));
const fileType = file.slice(file.lastIndexOf('.'));
return [fileName, fileType];
};
I wonder if there is a more elegant way to solve the problem without using loops.
You can use replace and split, with the pattern we are replacing the string upto the second _ from start of string and than we split on . to get name and type
let nameAndType = (str) => {
let replaced = str.replace(/^(?:[^_]*_){2}/g, '')
let splited = replaced.split('.')
let type = splited.pop()
let name = splited.join('.')
return {name,type}
}
console.log(nameAndType("1568223848_12345678_some_document.pdf"))
console.log(nameAndType("1568223848_12345678_some_document.xyz.pdf"))
function splitString(val){
return val.split('_').slice('2').join('_');
}
const getShortString = (str) => str.replace(/^(?:[^_]*_){2}/g, '')
For input like
1568223848_12345678_some_document.pdf, it should give you something like some_document.pdf
const re = /(.*?)_(.*?)_(.*)/;
const name = "1568223848_12345678_some_document.pdf";
[,date, id, filename] = re.exec(name);
console.log(date);
console.log(id);
console.log(filename);
some notes:
you want to make the regular expression 1 time. If you do this
function getParts(str) {
const re = /expression/;
...
}
Then you're making a new regular expression object every time you call getParts.
.*? is faster than .*
This is because .* is greedy so the moment the regular expression engine sees that it puts the entire rest of the string into that slot and then checks if can continue the expression. If it fails it backs off one character. If that fails it backs off another character, etc.... .*? on the other hand is satisfied as soon as possible. So it adds one character then sees if the next part of the expression works, if not it adds one more character and sees if the expressions works, etc..
splitting on '_' works but it could potentially make many temporary strings
for example if the filename is 1234_1343_a________________________.pdf
you'd have to test to see if using a regular experssion is faster or slower than splitting, assuming speed matters.
You can kinda chain .indexOf to get second offset and any further, although more than two would look ugly. The reason is that indexOf takes start index as second argument, so passing index of the first occurrence will help you find the second one:
var secondUnderscoreIndex = name.indexOf("_",name.indexOf("_")+1);
So my solution would be:
var index = name.indexOf("_",name.indexOf("_")+1));
var [timestamp, name] = [name.substring(0, index), name.substr(index+1)];
Alternatively, using regular expression:
var [,number1, number2, filename, extension] = /([0-9]+)_([0-9]+)_(.*?)\.([0-9a-z]+)/i.exec(name)
// Prints: "1568223848 12345678 some_document pdf"
console.log(number1, number2, filename, extension);
I like simplicity...
If you ever need the date in times, theyre in [1] and [2]
var getFilename = function(str) {
return str.match(/(\d+)_(\d+)_(.*)/)[3];
}
var f = getFilename("1568223848_12345678_some_document.pdf");
console.log(f)
If ever files names come in this format timestamp_id_filename. You can use a regular expression that skip the first two '_' and save the nex one.
test:
var filename = '1568223848_12345678_some_document.pdf';
console.log(filename.match(/[^_]+_[^_]+_(.*)/)[1]); // result: 'some_document.pdf'
Explanation:
/[^]+[^]+(.*)/
[^]+ : take characters diferents of ''
: take '' character
Repeat so two '_' are skiped
(.*): Save characters in a group
match method: Return array, his first element is capture that match expression, next elements are saved groups.
Split the file name string into an array on underscores.
Discard the first two elements of the array.
Join the rest of the array with underscores.
Now you have your file name.
How do you combine eliminating white-spaces and special characters with only a single '-' character?
Here's a little Background:
When publishing a job to my career section for my company, the ATS will turn a job title for the URL, e.g if a job title is:
Olympia, WA: SLP Full or Part Time it will become olympia-wa-slp-full-or-part-time
I've experimented from other similar questions, but have only come close with this bit of code:
function newTitle(str) {
var x = str.replace(/[\W]/g, '-').toLowerCase();
return x;
now if I run it, the output generated is olympia--wa--slp-full-or-part-time
(has 2 dashes from the extra spaces). What am I not getting right?
I've tried the other following bits:
str.replace(/\s+/g, '');
and
str.replaceAll("[^a-zA-Z]+", " ");
but neither get close to the desired format.
Thanks!
You got pretty close in your first example, just add + after [\W] to match one or more non-word characters. You can also give it a try in Regexr
function newTitle(str) {
var x = str.replace(/[\W]+/g, '-').toLowerCase();
return x;
}
alert(newTitle('Olympia, WA: SLP Full or Part Time'));
What you actually want, it looks like, is to create a slug from a string.
Here is a nice reusable function that also takes care of multiple dashes:
function slugify(s) {
s = s.replace(/[^\w\s-]/g, '').trim().toLowerCase();
s = s.replace(/[-\s]+/g, '-');
return s;
}
console.log(
slugify("Olympia, WA: SLP Full or Part Time")
);
Your last example [^a-zA-Z]+ almost works if you use a dash as the replacement. This uses a negated character class to match not what you specified so that would include whitespaces and special characters.
Note that if you have a job with for example a digit or an underscore that that would also be replaced. Your could expand the character class with what you don't want to be replaced like [^a-zA-Z0-9]+ or if you also want to keep the underscore \W+ as that would match [^a-zA-Z0-9_]
function newTitle(str) {
return str.replace(/[^a-zA-Z]+/g, '-').toLowerCase();
}
console.log(newTitle("Olympia, WA: SLP Full or Part Time"));
Let's say I have a string: "We.need..to...split.asap". What I would like to do is to split the string by the delimiter ., but I only wish to split by the first . and include any recurring .s in the succeeding token.
Expected output:
["We", "need", ".to", "..split", "asap"]
In other languages, I know that this is possible with a look-behind /(?<!\.)\./ but Javascript unfortunately does not support such a feature.
I am curious to see your answers to this question. Perhaps there is a clever use of look-aheads that presently evades me?
I was considering reversing the string, then re-reversing the tokens, but that seems like too much work for what I am after... plus controversy: How do you reverse a string in place in JavaScript?
Thanks for the help!
Here's a variation of the answer by guest271314 that handles more than two consecutive delimiters:
var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;
var items = text.split(re).filter(function(val) { return val.length > 0; });
It uses the detail that if the split expression includes a capture group, the captured items are included in the returned array. These capture groups are actually the only thing we are interested in; the tokens are all empty strings, which we filter out.
EDIT: Unfortunately there's perhaps one slight bug with this. If the text to be split starts with a delimiter, that will be included in the first token. If that's an issue, it can be remedied with:
var re = /(?:^|(\.*[^.]+))\./;
var items = text.split(re).filter(function(val) { return !!val; });
(I think this regex is ugly and would welcome an improvement.)
You can do this without any lookaheads:
var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g;
var matches, output = [];
while(matches = regex.exec(subject)) {
output.push(matches[1]);
}
document.write(JSON.stringify(output));
It seemed like it'd work in one line, as it did on https://regex101.com/r/cO1dP3/1, but had to be expanded in the code above because the /g option by default prevents capturing groups from returning with .match (i.e. the correct data was in the capturing groups, but we couldn't immediately access them without doing the above).
See: JavaScript Regex Global Match Groups
An alternative solution with the original one liner (plus one line) is:
document.write(JSON.stringify(
"We.need.to....split.asap".match(/\.?(\.*[^.]+)/g)
.map(function(s) { return s.replace(/^\./, ''); })
));
Take your pick!
Note: This answer can't handle more than 2 consecutive delimiters, since it was written according to the example in the revision 1 of the question, which was not very clear about such cases.
var text = "We.need.to..split.asap";
// split "." if followed by "."
var res = text.split(/\.(?=\.)/).map(function(val, key) {
// if `val[0]` does not begin with "." split "."
// else split "." if not followed by "."
return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/)
});
// concat arrays `res[0]` , `res[1]`
res = res[0].concat(res[1]);
document.write(JSON.stringify(res));
My problem start with like-
var str='0|31|2|03|.....|4|2007'
str=str.replace(/[^|]\d*[^|]/,'5');
so the output becomes like:"0|5|2|03|....|4|2007" so it replaces 31->5
But this doesn't work for replacing other segments when i change code like this:
str=str.replace(/[^|]{2}\d*[^|]/,'6');
doesn't change 2->6.
What actually i am missing here.Any help?
I think a regular expression is a bad solution for that problem. I'd rather do something like this:
var str = '0|31|2|03|4|2007';
var segments = str.split("|");
segments[1] = "35";
segments[2] = "123";
Can't think of a good way to solve this with a regexp.
Here is a specific regex solution which replaces the number following the first | pipe symbol with the number 5:
var re = /^((?:\d+\|){1})\d+/;
return text.replace(re, '$15');
If you want to replace the digits following the third |, simply change the {1} portion of the regex to {3}
Here is a generalized function that will replace any given number slot (zero-based index), with a specified new number:
function replaceNthNumber(text, n, newnum) {
var re = new RegExp("^((?:\\d+\\|){"+ n +'})\\d+');
return text.replace(re, '$1'+ newnum);
}
Firstly, you don't have to escape | in the character set, because it doesn't have any special meaning in character sets.
Secondly, you don't put quantifiers in character sets.
And finally, to create a global matching expression, you have to use the g flag.
[^\|] means anything but a '|', so in your case it only matches a digit. So it will only match anything with 2 or more digits.
Second you should put the {2} outside of the []-brackets
I'm not sure what you want to achieve here.
I'm trying to split up a nucleotide sequence into amino acid strings using a regular expression. I have to start a new string at each occurrence of the string "ATG", but I don't want to actually stop the first match at the "ATG". Valid input is any ordering of a string of As, Cs, Gs, and Ts.
For example, given the input string: ATGAACATAGGACATGAGGAGTCA
I should get two strings: ATGAACATAGGACATGAGGAGTCA (the whole thing) and ATGAGGAGTCA (the first match of "ATG" onward). A string that contains "ATG" n times should result in n results.
I thought the expression /(?:[ACGT]*)(ATG)[ACGT]*/g would work, but it doesn't. If this can't be done with a regexp it's easy enough to just write out the code for, but I always prefer an elegant solution if one is available.
If you really want to use regular expressions, try this:
var str = "ATGAACATAGGACATGAGGAGTCA",
re = /ATG.*/g, match, matches=[];
while ((match = re.exec(str)) !== null) {
matches.push(match);
re.lastIndex = match.index + 3;
}
But be careful with exec and changing the index. You can easily make it an infinite loop.
Otherwise you could use indexOf to find the indices and substr to get the substrings:
var str = "ATGAACATAGGACATGAGGAGTCA",
offset=0, match=str, matches=[];
while ((offset = match.indexOf("ATG", offset)) > -1) {
match = match.substr(offset);
matches.push(match);
offset += 3;
}
I think you want is
var subStrings = inputString.split('ATG');
KISS :)
Splitting a string before each occurrence of ATG is simple, just use
result = subject.split(/(?=ATG)/i);
(?=ATG) is a positive lookahead assertion, meaning "Assert that you can match ATG starting at the current position in the string".
This will split GGGATGTTTATGGGGATGCCC into GGG, ATGTTT, ATGGGG and ATGCCC.
So now you have an array of (in this case four) strings. I would now go and take those, discard the first one (this one will never contain nor start with ATG) and then join the strings no. 2 + ... + n, then 3 + ... + n etc. until you have exhausted the list.
Of course, this regex doesn't do any validation as to whether the string only contains ACGT characters as it only matches positions between characters, so that should be done before, i. e. that the input string matches /^[ACGT]*$/i.
Since you want to capture from every "ATG" to the end split isn't right for you. You can, however, use replace, and abuse the callback function:
var matches = [];
seq.replace(/atg/gi, function(m, pos){ matches.push(seq.substr(pos)); });
This isn't with regex, and I don't know if this is what you consider "elegant," but...
var sequence = 'ATGAACATAGGACATGAGGAGTCA';
var matches = [];
do {
matches.push('ATG' + (sequence = sequence.slice(sequence.indexOf('ATG') + 3)));
} while (sequence.indexOf('ATG') > 0);
I'm not completely sure if this is what you're looking for. For example, with an input string of ATGabcdefghijATGklmnoATGpqrs, this returns ATGabcdefghijATGklmnoATGpqrs, ATGklmnoATGpqrs, and ATGpqrs.