The most efficient way to trim a file name in JavaScript

I was wondering what the most efficient way is to trim a file name in JS so that it shows "..." (or any other appendix) after a certain number of characters, while handling all possible test cases.
Rules
Show the actual file extension, not just the last part after splitting the name on "."
The function should take the input file name (string), the number of characters to keep before trimming (integer) and the appendix (string) as parameters.
By efficient, I mean writing fewer lines of code while still handling all possible edge cases.
Sample Inputs
myAwesomeFile.min.css
my Awesome File.tar.gz
file.png
Expected output (say I want to trim after 5 characters)
myAwe....min.css
my Aw....tar.gz
file.png
Editing the question to show my attempt
function trimFileName(str, noOfChars, appendix) {
  let nameArray = str.split(".");
  let fileType = `.${nameArray.pop()}`;
  let fileName = nameArray.join(" ");
  if (fileName.length >= noOfChars) {
    fileName = fileName.substr(0, noOfChars) + appendix;
  }
  return (fileName + fileType);
}
console.log(trimFileName("myAwesomeFile.min.css", 5, "..."));
console.log(trimFileName("my Awesome File.tar.gz", 5, "..."));
console.log(trimFileName("file.png", 5, "..."));
Edit #2: Feel free to go ahead and edit the question if you think it's not the standard expectation and add more edge cases to the sample inputs and expected outputs.
Edit #3: Added a few more details to the question after the new comments. I know my attempt doesn't fulfill my expected outputs (and I am unsure whether the output I have listed above is a standard expectation or not).
Edit #4 (Final): Removed the rule about not breaking a word in the middle after persistent pushback in the comments, and changed the rules to cater to more realistic and practical use cases.

If we treat the dot character . as a separator for file extensions, what you ask for can be solved with a single line of JavaScript:
name.replace(new RegExp('(^[^\\.]{' + chars + '})[^\\.]+'), '$1' + subst);
Demo code in the following snippet:
function f(name, chars, subst) {
  return name.replace(
    new RegExp('(^[^\\.]{' + chars + '})[^\\.]+'), '$1' + subst);
}
test('myAwesomeFile.min.css', 5, '...', 'myAwe....min.css');
test('my Awesome File.tar.gz', 5, '...', 'my Aw....tar.gz');
test('file.png', 5, '...', 'file.png');
function test(filename, length, subst, expected) {
  let actual = f(filename, length, subst);
  console.log(actual, actual === expected ? 'OK' : 'expected: ' + expected);
}
On Windows, AFAIK, the file extension is only what follows the last dot. Thus, technically, the file extension of "myAwesomeFile.min.css" is just css, and the file extension of "my Awesome File.tar.gz" is just gz.
In this case, what you ask for can still be solved with one line of JavaScript:
name.replace(new RegExp('(^.{' + chars + '}).+(\\.[^\\.]*$)'), '$1' + subst + '$2');
Demo code in the following snippet:
function f(name, chars, subst) {
  return name.replace(
    new RegExp('(^.{' + chars + '}).+(\\.[^\\.]*$)'), '$1' + subst + '$2');
}
test('myAwesomeFile.min.css', 5, '...', 'myAwe....css');
test('my Awesome File.tar.gz', 5, '...', 'my Aw....gz');
test('file.png', 5, '...', 'file.png');
function test(filename, length, subst, expected) {
  let actual = f(filename, length, subst);
  console.log(actual, actual === expected ? 'OK' : 'expected: ' + expected);
}
If you really want to allow edge cases with specific multiple extensions, you probably need to define a comprehensive list of all allowed multiple extensions to know how to deal with cases like "my.awesome.file.min.css". You would need to provide a list of all cases you want to include before it would be possible to determine how efficient any solution could be.
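A minimal sketch of that idea, in the same JavaScript style. The KNOWN_MULTI_EXTENSIONS list and the trimWithKnownExtensions name are hypothetical: the function keeps a listed compound extension as a whole and otherwise treats only the part after the last dot as the extension.

```javascript
// Hypothetical list of compound extensions to preserve as a whole;
// extend it to match the cases you actually need.
const KNOWN_MULTI_EXTENSIONS = ['.tar.gz', '.tar.bz2', '.min.css', '.min.js'];

function trimWithKnownExtensions(name, chars, subst) {
  const lower = name.toLowerCase();
  // Prefer a listed compound extension, else fall back to the last dot
  // (a dot at position 0, as in ".htaccess", does not start an extension).
  const multi = KNOWN_MULTI_EXTENSIONS.find((ext) => lower.endsWith(ext));
  const dot = name.lastIndexOf('.');
  const extLength = multi ? multi.length : dot > 0 ? name.length - dot : 0;

  const base = name.slice(0, name.length - extLength);
  const ext = name.slice(name.length - extLength);

  // Only trim when the base name is longer than the allowed length.
  return base.length > chars ? base.slice(0, chars) + subst + ext : name;
}

console.log(trimWithKnownExtensions('my.awesome.file.min.css', 5, '...'));
console.log(trimWithKnownExtensions('photo.final.jpeg', 5, '...'));
```

Here "my.awesome.file.min.css" keeps .min.css, while "photo.final.jpeg" keeps only .jpeg because .final.jpeg is not on the list. The trade-off is that the list has to be maintained by hand.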

It is really hard to account for all extensions (including edge cases). See this list of common extensions, for example: https://www.computerhope.com/issues/ch001789.htm. Even with that many extensions, the list is not exhaustive.
Your function is OK, but to account for more cases it could be rewritten like this:
function trimFileName(filename, limit = 5, spacer = '.') {
  // treat everything from the first dot as the extension(s);
  // a name without any dot has no extension
  const dot = filename.indexOf(".");
  const split = dot === -1 ? filename.length : dot;
  const name = filename.substring(0, split);
  const ext = filename.substring(split);
  let trimName = name.substring(0, limit);
  // pad with three spacer characters only if something was cut off
  if (name.length > trimName.length)
    trimName = trimName.padEnd(limit + 3, spacer);
  return trimName + ext;
}
console.log(trimFileName("myAwesomeFile.min.css"));
console.log(trimFileName("my Awesome File.tar.gz"));
console.log(trimFileName("file.png"));

Below is a pretty simple approach to achieve shortening in the fashion you desire. Comments are in the code, but let me know if anything needs additional explanation:
//A simple object to hold some configs about how we want to shorten the file names
const config = {
  charsBeforeTrim: 5,
  separator: '.',
  replacementText: '...' // the separator before the extension adds the fourth dot
};
//Given file names to shorten
const files = ['myAwesomeFile.min.css', 'my Awesome File.tar.gz', 'file.png'];
//Function to do the actual file name shortening
const shorten = (s, config) =>
  s.length > config.charsBeforeTrim ? `${s.substring(0, config.charsBeforeTrim)}${config.replacementText}` : s;
//Function to generate a short file name with extension(s)
const generateShortName = (file, config) => {
  //ES6 Array destructuring makes it easy to get the file name in a unique variable while keeping the remaining elements (the extensions) in their own array:
  const [name, ...extensions] = file.split(config.separator);
  //Simply append all remaining extension chunks to the shortName
  return extensions.reduce((accum, extension) => {
    accum += `${config.separator}${extension}`;
    return accum;
  }, shorten(name, config));
};
//Demonstrate usage
const shortFileNames = files.map(file => generateShortName(file, config));
console.log(shortFileNames);

const parse = (filename, extIdx = filename.lastIndexOf('.')) => ({
  name: filename.substring(0, extIdx),
  extension: filename.substring(extIdx + 1),
})
// trim only when the name is longer than `size`, and keep the dot before the extension
const trimFileName = (
  filename, size = 5, fill = '...',
  file = parse(filename),
  head = file.name.substring(0, size)
) => file.name.length > size ? `${head}${fill}.${file.extension}` : filename
/* - - - - - - - - - - - - - - - - - - - - - - - - - */
;[
  'myAwesomeFile.min.css',
  'my.Awesome.File.min.css',
  'my Awesome File.tar.gz',
  'file.png',
].forEach(f => console.log(trimFileName(f)))

You can fairly straightforwardly pull out your extension condition (easily replaced with a list of valid extensions) and use a regex to extract the last part. Then you just add a check on the file name (starting at its beginning) to trim the result.
const trim = (string, x) => {
  // We assume that the last (up to two) dot-delimited alphanumeric chunks are the extension
  const extensionRegex = /(?:(\.[a-zA-Z0-9]+){0,2})$/;
  const { index } = extensionRegex.exec(string);
  // No point in trimming if the result would not be shorter than the input
  // (this also covers names shorter than x, where slicing would cut into the extension)
  if (index <= x + 2) {
    return string;
  }
  // ".." plus the extension's own leading dot renders as "..."
  return string.slice(0, x) + ".." + string.slice(index);
};
/* Assert that we keep the extension */
console.log("cat.tar.gz", trim("cat.tar.gz", 100) == "cat.tar.gz");
console.log("cat.zip", trim("cat.zip", 100) == "cat.zip");
/* Assert that we keep x characters before trim */
console.log("1234567890cat.tar.gz", !trim("1234567890cat.tar.gz", 10).includes("cat"));
console.log("1234567890cat.zip", !trim("1234567890cat.zip", 10).includes("cat"));

Related

How to detect wide characters in JavaScript?

I am writing a small parser for a custom query language which contains Chinese characters. When it detects a syntax error, it outputs an error message like the following (the first line means "Syntax error: expected a number, but got a string."):
語法錯誤：應為數，但為字串。
索引 = '3213茂訊'"
^
The last line has a single '^' character to indicate the position of the erroneous token. Because each Chinese character visually occupies the width of two other characters, I need to detect wide characters to calculate the '^' position so that it points at the right token. Does anyone know of a function that can detect wide characters in JavaScript?

I'm not sure if I understand you correctly, but you might want to try the https://www.npmjs.com/package/wcwidth package, which can be used as follows:
import wcwidth from 'wcwidth';
const getCharAtPosition = (str, position) => {
  let currPos = 0;
  return [...str].find(char => {
    const charWidth = wcwidth(char);
    const isPosition =
      currPos === position || (charWidth === 2 && currPos === position - 1);
    currPos += charWidth;
    return isPosition;
  });
};
const indicatorPos = ' ^'.indexOf('^');
console.log(getCharAtPosition(`索引 = '3213茂訊'"`, indicatorPos));
// will log: '
I didn’t test it, but something like this might work.
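If you would rather avoid a dependency, a rough approximation is to check each code point against the main East Asian Wide/Fullwidth blocks. The ranges below are an assumption covering common CJK, Hangul and fullwidth forms, not the complete Unicode East Asian Width data; isWide, displayWidth and caretLine are hypothetical helper names.

```javascript
// Rough approximation of East Asian Wide/Fullwidth ranges (an assumption,
// not the full Unicode EastAsianWidth data).
const WIDE_RANGES = [
  [0x1100, 0x115f],   // Hangul Jamo initial consonants
  [0x2e80, 0x303e],   // CJK radicals, symbols and punctuation
  [0x3041, 0x33ff],   // Hiragana, Katakana, Bopomofo, CJK compatibility
  [0x3400, 0x4dbf],   // CJK Unified Ideographs Extension A
  [0x4e00, 0x9fff],   // CJK Unified Ideographs
  [0xa000, 0xa4cf],   // Yi syllables and radicals
  [0xac00, 0xd7a3],   // Hangul syllables
  [0xf900, 0xfaff],   // CJK compatibility ideographs
  [0xfe30, 0xfe4f],   // CJK compatibility forms
  [0xff00, 0xff60],   // Fullwidth forms
  [0xffe0, 0xffe6],   // Fullwidth signs
  [0x20000, 0x3fffd], // CJK extensions B and beyond
];

const isWide = (ch) => {
  const cp = ch.codePointAt(0);
  return WIDE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
};

// Display width of a string, counting wide characters as 2 columns.
const displayWidth = (str) =>
  [...str].reduce((width, ch) => width + (isWide(ch) ? 2 : 1), 0);

// Caret line pointing at the character with the given index
// (index counted in code points, as produced by [...str]).
const caretLine = (str, charIndex) =>
  ' '.repeat(displayWidth([...str].slice(0, charIndex).join(''))) + '^';

const line = "索引 = '3213茂訊'";
console.log(line);
console.log(caretLine(line, 10)); // caret under 茂
```

Iterating with [...str] walks code points rather than UTF-16 code units, so characters outside the Basic Multilingual Plane count once.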

Get initials and full last name from a string containing names

Assume there are some strings containing names in different formats (each line is a possible user input):
'Guilcher, G.M., Harvey, M. & Hand, J.P.'
'Ri Liesner, Peter Tom Collins, Michael Richards'
'Manco-Johnson M, Santagostino E, Ljung R.'
I need to transform those names into the format Lastname ABC, so each first name should be reduced to its initial, and the initials appended to the last name.
The example should result in
Guilcher GM, Harvey M, Hand JP
Liesner R, Collins PT, Richards M
Manco-Johnson M, Santagostino E, Ljung R
The problem is the different (possible) input formats. I think my attempts are not very smart, so I'm asking for:
Some hints to optimize the transformation code
How do I put all of this into a single function? I think I first have to test which format the string has.
So let me explain how far I got:
First example string
In the first example, there are initials followed by dots. The dots should be removed, and so should the comma between the name and the initials.
firstString
  .replace(/\./g, '')
  .replace(' &', ',')
I think I need a regex to get rid of the comma after the name and before the initials.
Second example string
In the second example, each name should be split by spaces and the last element treated as the last name:
const toLastnameInitials = name => {
  const elm = name.trim().split(/\s+/)
  const lastname = elm[elm.length - 1]
  // every element except the last one is a first name, so keep only its initial
  const initials = elm.slice(0, -1).map(n => n[0].toUpperCase())
  return lastname + ' ' + initials.join('')
}
...not very elegant
Third example string
The third example already has the correct format; only the dot at the end has to be removed. So nothing else has to be done with that input.

It isn't possible without calling replace() multiple times. The steps in the provided solution are as follows:
Remove all dots in abbreviated names
Swap first names and last names
Replace first names with their initial letters
Remove unwanted characters
Demo:
var s = `Guilcher, G.M., Harvey, M. & Hand, J.P.
Ri Liesner, Peter Tom Collins, Michael Richards
Manco-Johnson M, Santagostino E, Ljung R.`
// Remove all dots in abbreviated names
var b = s.replace(/\b([A-Z])\./g, '$1')
  // Swap first names and last names
  .replace(/([A-Z][\w-]+(?: +[A-Z][\w-]+)*) +([A-Z][\w-]+)\b/g, ($0, $1, $2) => {
    // Replace the full first names with their first letter
    return $2 + " " + $1.replace(/\b([A-Z])\w+ */g, '$1');
  })
  // Move the comma behind the initials and drop ampersands
  .replace(/(,) +([A-Z]+)\b *[,&]?/g, ' $2$1')
  // Remove the comma the previous step leaves at the end of a line
  .replace(/,$/gm, '');
console.log(b);

Given your example data, I would try to make guesses based on the number of name parts (2), since it is very hard to rely on any of ",", "&" or "\n"; instead, treat them all as ",".
Try this against your data and let me know of any use cases where it fails, because I am highly confident that this script will fail at some point with more data :)
let testString = "Guilcher, G.M., Harvey, M. & Hand, J.P.\nRi Liesner, Peter Tom Collins, Michael Richards\nManco-Johnson M, Santagostino E, Ljung R.";
const inputToArray = i => i
  .replace(/\./g, "")
  .replace(/[\n&]/g, ",")
  .replace(/ ?, ?/g, ",")
  .split(',');
const reducer = function(accumulator, value, index, array) {
  let pos = accumulator.length - 1;
  let names = value.split(' ');
  if (names.length > 1) {
    accumulator.push(names);
  } else {
    if (accumulator[pos].length > 1) accumulator[++pos] = [];
    accumulator[pos].push(value);
  }
  return accumulator.filter(n => n.length > 0);
};
console.log(inputToArray(testString).reduce(reducer, [[]]));

Here's my approach. I tried to keep it short, but handling the edge cases turned out to be surprisingly complex.
First, I format the input: I replace & with , and remove all dots.
Then I split the input by \n, then by , and finally by spaces.
Next, I process the chunks. On each new segment (delimited by ,), I process the previous segment. I do this because I need to be sure that the current segment isn't an initial. If it is, I skip that initial-only segment and process the previous one, which then has the correct initial and surname, as I have all the information I need.
I extract the initial from the segment, if there is one. It is used at the start of the next iteration to process the current segment.
After finishing each line, I process the last segment once more, as it wouldn't be processed otherwise.
I understand the complexity is high without using regular expressions, and it probably would have been better to use a state machine to parse the input instead.
const isInitial = s => [...s].every(c => c === c.toUpperCase());
const generateInitial = arr => arr.reduce((a, c, i) => a + (i < arr.length - 1 ? c[0].toUpperCase() : ''), '');
const formatSegment = (words, initial) => {
  if (!initial) {
    initial = generateInitial(words);
  }
  const surname = words[words.length - 1];
  return {initial, surname};
}
const doDisplay = x => x.map(x => x.surname + ' ' + x.initial).join(', ');
// `input`, `output` and `process` refer to the elements with those ids in the HTML below
const doProcess = _ => {
  const formatted = input.value.replace(/\./g, '').replace(/&/g, ',');
  const chunks = formatted.split('\n').map(x => x.split(',').map(x => x.trim().split(' ')));
  const peoples = [];
  chunks.forEach(line => {
    let lastSegment = null;
    let lastInitial = null;
    let lastInitialOnly = false;
    line.forEach(segment => {
      if (lastSegment) {
        // if segment only contains an initial, it's the initial corresponding
        // to the previous segment
        const initialOnly = segment.length === 1 && isInitial(segment[0]);
        if (initialOnly) {
          lastInitial = segment[0];
        }
        // avoid processing last segments that were only initials
        // this prevents adding a segment twice
        if (!lastInitialOnly) {
          // if segment isn't an initial, we need to generate an initial
          // for the previous segment, if it doesn't already have one
          const people = formatSegment(lastSegment, lastInitial);
          peoples.push(people);
        }
        lastInitialOnly = initialOnly;
        // Skip initial only segments
        if (initialOnly) {
          return;
        }
      }
      lastInitial = null;
      // Remove the initial from the words
      // to avoid getting the initial calculated for the initial
      segment = segment.filter(word => {
        if (isInitial(word)) {
          lastInitial = word;
          return false;
        }
        return true;
      });
      lastSegment = segment;
    });
    // Process last segment
    if (!lastInitialOnly) {
      const people = formatSegment(lastSegment, lastInitial);
      peoples.push(people);
    }
  });
  return peoples;
}
process.addEventListener('click', _ => {
  const peoples = doProcess();
  const display = doDisplay(peoples);
  output.value = display;
});
.row {
display: flex;
}
.row > * {
flex: 1 0;
}
<div class="row">
<h3>Input</h3>
<h3>Output</h3>
</div>
<div class="row">
<textarea id="input" rows="10">Guilcher, G.M., Harvey, M. & Hand, J.P.
Ri Liesner, Peter Tom Collins, Michael Richards
Manco-Johnson M, Santagostino E, Ljung R.
Jordan M, Michael Jackson & Willis B.</textarea>
<textarea id="output" rows="10"></textarea>
</div>
<button id="process" style="display: block;">Process</button>

Look for a substring in a string with at most one different character (JavaScript)

I am new to programming and am currently working on a program. It needs to find a substring in a string and return the index where the match starts. I know I can use indexOf for that, but it is not so easy: I want to find substrings with at most one differing character.
I was thinking about regular expressions, but I don't really know how to use them here, because I would need a regular expression for every position of the string. Here is some code which will probably clarify what I want to do:
var A= "abbab";
var B= "ba";
var tb=[];
console.log(A.indexOf(B));
for (var i=0;i<B.length; i++){
  var D=B.replace(B[i],"[a-z]");
  tb.push(A.indexOf(D));
}
console.log(tb);
I know that both the substring B and the string A consist of lowercase letters. It would be nice to get any advice on how to do this using regular expressions. Thanks!
Sample Input:
A B
1) abbab ba
2) hello world
3) banana nan
Expected Output:
1) 1 2
2) No Match!
3) 0 2

While it is probably possible in theory, I think it would be very complicated to try this kind of search by incorporating all possible query variations into one long, complex regular expression. I think a better approach is to use JavaScript to dynamically create several simpler variations and then search with each one separately.
The following code sequentially replaces each character in the initial query string with a regular expression wildcard (i.e. a period, '.') and then searches the target string with that. For example, if the initial query string is 'nan', it will search with '.an', 'n.n' and 'na.'. It only adds the position of a hit to the list of hits if that position has not already been hit by a previous search, i.e. it ensures that the list of hits contains only unique values, even if multiple query variations found a hit at the same location. (This could be implemented even better with ES6 sets, but I couldn't get the Stack Overflow code snippet tool to cooperate with me while trying to use a set, even with the Babel option checked.) Finally, it sorts the hits in ascending order.
Update: The search algorithm has been updated/corrected. Originally, some hits were missed because the exec search for any query variation would only iterate as per the JavaScript default, i.e. after finding a match, it would start the next search at the next character after the end of the previous match, e.g. it would find 'aa' in 'aaaa' at positions 0 and 2. Now it starts the next search at the next character after the start of the previous match, e.g. it now finds 'aa' in 'aaaa' at positions 0, 1 and 2.
const findAllowingOneMismatch = (target, query) => {
  const numLetters = query.length;
  const queryVariations = [];
  // replace each character of the query in turn with the wildcard "."
  for (let variationNum = 0; variationNum < numLetters; variationNum += 1) {
    queryVariations.push(query.slice(0, variationNum) + "." + query.slice(variationNum + 1));
  }
  let hits = [];
  queryVariations.forEach(queryVariation => {
    const re = new RegExp(queryVariation, "g");
    let searchResult;
    while ((searchResult = re.exec(target)) !== null) {
      // restart one character after the start of this match so overlapping hits are found
      re.lastIndex = searchResult.index + 1;
      const hit = searchResult.index;
      // console.log('found a hit with ' + queryVariation + ' at position ' + hit);
      if (hits.indexOf(hit) === -1) {
        hits.push(hit);
      }
    }
  });
  hits = hits.sort((a, b) => (a - b));
  console.log('Found "' + query + '" in "' + target + '" at positions:', JSON.stringify(hits));
};
[
  ['abbab', 'ba'],
  ['hello', 'world'],
  ['banana', 'nan'],
  ['abcde abcxe abxxe xbcde', 'abcd'],
  ['--xx-xxx--x----x-x-xxx--x--x-x-xx-', '----']
].forEach(pair => {findAllowingOneMismatch(pair[0], pair[1])});
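For comparison, here is a sketch of a different technique (not the regex approach above): slide a window of query.length over the target and count mismatching characters directly, stopping as soon as a second mismatch is seen. findWithAtMostOneMismatch is a hypothetical name; like the regex version, it also reports exact matches and overlapping hits.

```javascript
// Sliding-window sketch: a position is a hit if the window starting there
// differs from the query in at most one character.
const findWithAtMostOneMismatch = (target, query) => {
  const hits = [];
  for (let start = 0; start + query.length <= target.length; start += 1) {
    let mismatches = 0;
    // stop comparing once a second mismatch has been found
    for (let i = 0; i < query.length && mismatches <= 1; i += 1) {
      if (target[start + i] !== query[i]) mismatches += 1;
    }
    if (mismatches <= 1) hits.push(start);
  }
  return hits;
};

console.log(findWithAtMostOneMismatch('abbab', 'ba'));    // [ 1, 2 ]
console.log(findWithAtMostOneMismatch('hello', 'world')); // []
console.log(findWithAtMostOneMismatch('banana', 'nan'));  // [ 0, 2 ]
```

This needs no regex escaping and returns the positions already in ascending order, so no deduplication or sorting step is required.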

Check if a string array element is a substring of a URL

I am working on a URL whitelist in a browser extension. What I have currently works, but I need to check the list in two places, and I want to make it more efficient to reduce the chance of increased page load times.
I have to check the list in two places. The first check is in a page mod with an attached content script which is applied to all sites; the content script is changed if the URL is in the whitelist. The second check is in a request observer, which sends different headers if the URL is whitelisted.
I have tried checking only once and passing the result from the page mod to the request observer, or from the request observer to the page mod, but either way it results in timing issues: either the headers are not correct, or the modifications to the content script are not applied when they should be.
Is there a way I can improve the substring-checking code below to make it faster?
I have a list of user entered sites which are sorted alphabetically before saving.
For now the format of the list is simple.
example1.com
b.example2.com/some/content.html
c.exampleN.com
and the url could be
http://example1.com/some/site/content.html
I am currently checking whether the URL contains the value of each array element as a substring:
//check if a url is in the list
function listCheck(list, url) {
  for (var i = 0; i < list.length; i++) {
    if (url.indexOf(list[i]) > -1)
      return true;
  }
  return false;
}

You can use binary search on the first letter of the URL. This will come in handy because whitelists can grow pretty fast. However, you cannot do this with patterns (e.g. *.somedomain.com).
Consider using a hash table to store the whitelist. You can make it efficient and specialized by writing your own hash function.
Regex will make things easier, but it can also make things slow at times. If you use regex, make sure you know what you are doing. You can shrink the comparison list first using one of the methods described above.
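A minimal sketch of the hash-table suggestion, using JavaScript's built-in Set (the spec requires sublinear average access time; engines use hash tables) instead of a hand-written hash function. It assumes the whitelist entries are bare host names such as example1.com; entries with paths, like b.example2.com/some/content.html, would need separate handling. buildWhitelist and isWhitelisted are hypothetical names.

```javascript
// Assumption: whitelist entries are bare host names (no scheme, no path).
const buildWhitelist = (entries) => new Set(entries.map((e) => e.toLowerCase()));

const isWhitelisted = (whitelist, url) => {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // not a parsable absolute URL
  }
  // Check the host and each parent domain, e.g. for "a.b.example1.com":
  // "a.b.example1.com", "b.example1.com", "example1.com", "com"
  const labels = host.split('.');
  for (let i = 0; i < labels.length; i += 1) {
    if (whitelist.has(labels.slice(i).join('.'))) return true;
  }
  return false;
};

const whitelist = buildWhitelist(['example1.com', 'c.exampleN.com']);
console.log(isWhitelisted(whitelist, 'http://example1.com/some/site/content.html')); // true
```

Each check costs one Set lookup per label of the host name, however long the whitelist gets. Matching on the parsed host name is also stricter than the indexOf check in the question, which would accept a URL like http://other.test/?q=example1.com because example1.com appears somewhere in the string.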
Edit: Here's the binary search I was talking about. This is applicable only if wildcards are not used.
function binarySearch(needle, haystack, startIndex, endIndex) {
  //console.log("\ttrying to find " + needle + " between " +
  //  haystack[startIndex] + "(" + startIndex + ") and " +
  //  haystack[endIndex] + "(" + endIndex + ")");
  // the basic case, where the list is narrowed down to 1 or 2 items
  if (startIndex == endIndex || endIndex - startIndex == 1) {
    if (haystack[startIndex] == needle)
      return startIndex;
    if (haystack[endIndex] == needle)
      return endIndex;
    return -1;
  }
  var midIndex = Math.ceil((startIndex + endIndex) / 2);
  //console.log("\t\tgot " + haystack[midIndex] + "(" + midIndex +
  //  ") for middle of the list.");
  var comparison = haystack[midIndex].localeCompare(needle);
  //console.log("\t\tcomparison: " + comparison);
  if (comparison > 0)
    return binarySearch(needle, haystack, startIndex, midIndex);
  if (comparison < 0)
    return binarySearch(needle, haystack, midIndex, endIndex);
  return midIndex; // (comparison == 0)
}
var sitelist = [ // the whitelist (the haystack).
"alpha.com",
"bravo.com",
"charlie.com",
"delta.com",
"echo.com",
"foxtrot.com",
"golf.com",
"hotel.com",
"india.com",
"juliet.com",
"kilo.com",
"lima.com",
"mike.com",
"november.com",
"oscar.com",
"papa.com",
"quebec.com",
"romeo.com",
"sierra.com",
"tango.com",
"uniform.com",
"victor.com",
"whiskey.com",
"xray.com",
"yankee.com",
"zulu.com"
];
function testBinarySearch(needle) {
  console.log("trying to find " + needle);
  var foundIndex = binarySearch(needle, sitelist, 0, sitelist.length - 1);
  if (foundIndex < 0)
    console.log(needle + " not found");
  else
    console.log(needle + " found at: " + foundIndex);
}
// note that the list is already sorted. if the list is not sorted,
// sitelist.sort();
// we can find "uniform.com" using 5 comparisons, instead of 20
testBinarySearch("uniform.com");
// we can confirm the non-existance of "google.com" in 4 comparisons, not 26
testBinarySearch("google.com");
// this is an interesting (worst) case, it takes 5 comparisons, instead of 1
testBinarySearch("alpha.com");
// "zulu.com" takes 4 comparisons instead of 26
testBinarySearch("zulu.com");
When your list grows, binary search scales very well. I won't go into the other pros and cons of binary search, since they are well documented in many places.
More SO questions on JavaScript binary search:
Binary Search in Javascript
Searching a Binary Tree in JavaScript
Binary Search Code
Binary search in JSON object
javascript binary search tree implementation

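Using a regexp will make things easier. With this code you only need to make ONE comparison.
function listCheck(list, url) {
  var exp = new RegExp('(' + list.join('|') + ')');
  return exp.test(url);
}
EDIT: unescaped entries are interpreted as regex syntax (. matches any character, and characters such as ( or [ can even make the RegExp constructor throw), so escape every metacharacter first. Also, an empty list would produce the pattern () which matches every URL, so handle that case explicitly:
function listCheck(list, url) {
  if (list.length === 0) return false;
  // escape all regex metacharacters in each entry
  var escaped = list.map(function (s) { return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); });
  var exp = new RegExp('(' + escaped.join('|') + ')');
  return exp.test(url);
}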

How can I get file extensions with JavaScript?

See code:
var file1 = "50.xsl";
var file2 = "30.doc";
getFileExtension(file1); //returns xsl
getFileExtension(file2); //returns doc
function getFileExtension(filename) {
  /*TODO*/
}

Newer edit: A lot has changed since this question was originally posted; there's a lot of really good information in wallacer's revised answer as well as VisioN's excellent breakdown.
Edit: Just because this is the accepted answer; wallacer's answer is indeed much better:
return filename.split('.').pop();
My old answer:
return /[^.]+$/.exec(filename);
Should do it.
Edit: In response to PhiLho's comment, use something like:
return (/[.]/.exec(filename)) ? /[^.]+$/.exec(filename) : undefined;

return filename.split('.').pop();
Edit:
This is another non-regex solution that I think is more efficient:
return filename.substring(filename.lastIndexOf('.')+1, filename.length) || filename;
There are some corner cases that are better handled by VisioN's answer below, particularly files with no extension (.htaccess etc included).
It's very performant, and handles corner cases in an arguably better way by returning "" instead of the full string when there's no dot or no string before the dot. It's a very well crafted solution, albeit tough to read. Stick it in your helpers lib and just use it.
Old Edit:
A safer implementation, if you're going to run into files with no extension or hidden files with no extension (see VisioN's comment to Tom's answer above), would be something along these lines:
var a = filename.split(".");
if (a.length === 1 || (a[0] === "" && a.length === 2)) {
  return "";
}
return a.pop(); // feel free to tack .toLowerCase() here if you want
If a.length is one, it's a visible file with no extension, i.e. file
If a[0] === "" and a.length === 2, it's a hidden file with no extension, i.e. .htaccess
This should clear up issues with the slightly more complex cases. In terms of performance, I think this solution is a little slower than regex in most browsers. However, for most common purposes this code should be perfectly usable.

The following solution is fast and short enough to use in bulk operations and save extra bytes:
return fname.slice((fname.lastIndexOf(".") - 1 >>> 0) + 2);
Here is another one-line non-regexp universal solution:
return fname.slice((Math.max(0, fname.lastIndexOf(".")) || Infinity) + 1);
Both work correctly with names having no extension (e.g. myfile) or starting with . dot (e.g. .htaccess):
"" --> ""
"name" --> ""
"name.txt" --> "txt"
".htpasswd" --> ""
"name.with.many.dots.myext" --> "myext"
If you care about speed, you may run a benchmark to confirm that the provided solutions are the fastest; the short one is tremendously fast.
How the short one works:
String.lastIndexOf method returns the last position of the substring (i.e. ".") in the given string (i.e. fname). If the substring is not found method returns -1.
The "unacceptable" positions of dot in the filename are -1 and 0, which respectively refer to names with no extension (e.g. "name") and to names that start with dot (e.g. ".htaccess").
The zero-fill right shift operator (>>>) with zero converts negative numbers to large unsigned integers, transforming -1 to 4294967295 and -2 to 4294967294. After adding 2, the slice start lies past the end of the string in exactly these edge cases, so the result is "" (sort of a trick here).
String.prototype.slice extracts the part of the filename starting at the position calculated as described. If the position is greater than the length of the string, the method returns "".
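To make the steps above concrete, here is the short one-liner traced on a few inputs from the table; the ext wrapper name is just for illustration.

```javascript
// The short one-liner from the answer, wrapped in a function for illustration.
// Note: "-" binds tighter than ">>>", so this is ((lastIndexOf - 1) >>> 0) + 2.
const ext = (fname) => fname.slice((fname.lastIndexOf('.') - 1 >>> 0) + 2);

// "name.txt":  lastIndexOf = 4  -> (3 >>> 0) + 2  = 5          -> slice(5) = "txt"
// "name":      lastIndexOf = -1 -> (-2 >>> 0) + 2 = 4294967296 -> ""
// ".htpasswd": lastIndexOf = 0  -> (-1 >>> 0) + 2 = 4294967297 -> ""
console.log(ext('name.txt'));                  // txt
console.log(JSON.stringify(ext('name')));      // ""
console.log(JSON.stringify(ext('.htpasswd'))); // ""
```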
If you want more clear solution which will work in the same way (plus with extra support of full path), check the following extended version. This solution will be slower than previous one-liners but is much easier to understand.
function getExtension(path) {
  var basename = path.split(/[\\/]/).pop(), // extract file name from full path ...
                                            // (supports `\\` and `/` separators)
      pos = basename.lastIndexOf(".");      // get last position of `.`
  if (basename === "" || pos < 1)           // if file name is empty or ...
    return "";                              // `.` not found (-1) or comes first (0)
  return basename.slice(pos + 1);           // extract extension ignoring `.`
}
console.log( getExtension("/path/to/file.ext") );
// >> "ext"
All three variants should work in any web browser on the client side and can be used in the server side NodeJS code as well.

function getFileExtension(filename)
{
  var ext = /^.+\.([^.]+)$/.exec(filename);
  return ext == null ? "" : ext[1];
}
Tested with
"a.b" (=> "b")
"a" (=> "")
".hidden" (=> "")
"" (=> "")
null (=> "")
Also
"a.b.c.d" (=> "d")
".a.b" (=> "b")
"a..b" (=> "b")

There is a standard library function for this in the path module:
import path from 'path';
console.log(path.extname('abc.txt'));
Output:
.txt
So, if you only want the format:
path.extname('abc.txt').slice(1) // 'txt'
If there is no extension, then the function will return an empty string:
path.extname('abc') // ''
If you are using Node, then path is built-in. If you are targeting the browser, Webpack 4 and earlier bundled a path implementation for you automatically; Webpack 5 no longer polyfills Node core modules, so there (as when targeting the browser without Webpack) you need to include path-browserify yourself.
There is no reason to do string splitting or regex.
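For reference, here are a few edge cases of path.extname as described in the Node.js path documentation. The stripDot wrapper is a hypothetical helper that drops the leading dot; note that "index." yields "." from extname and therefore "" from stripDot. The snippet uses require so it runs as a plain CommonJS script.

```javascript
const path = require('path');

// Hypothetical helper: extension without the leading dot ('' if none).
const stripDot = (filename) => path.extname(filename).slice(1);

console.log(path.extname('index.html'));          // .html
console.log(path.extname('index.coffee.md'));     // .md (only the last dot counts)
console.log(path.extname('index.'));              // .
console.log(path.extname('index'));               // '' (empty string)
console.log(path.extname('.index'));              // '' (a leading dot alone is not an extension)
console.log(stripDot('/path/to/archive.tar.gz')); // gz
```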

function getExt(filename)
{
  var ext = filename.split('.').pop();
  if (ext == filename) return "";
  return ext;
}

var extension = fileName.substring(fileName.lastIndexOf('.')+1);

If you are dealing with web urls, you can use:
function getExt(filepath) {
  return filepath.split("?")[0].split("#")[0].split('.').pop();
}
getExt("../js/logic.v2.min.js") // js
getExt("http://example.net/site/page.php?id=16548") // php
getExt("http://example.net/site/page.html#welcome.to.me") // html
getExt("c:\\logs\\yesterday.log"); // log
Demo: https://jsfiddle.net/squadjot/q5ard4fj/
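As an alternative sketch, the built-in URL class (available in browsers and Node) can strip the query string and fragment for you, and taking the extension from the last path segment only avoids picking up dots from directory names: the split-based getExt above returns "dot/file" for http://example.net/dir.with.dot/file. getUrlExtension is a hypothetical name, and the default base URL is an assumption that only exists so that relative paths parse. This version is meant for URLs, not for Windows paths like c:\\logs\\yesterday.log.

```javascript
// Sketch: let the URL parser remove the query string and fragment,
// then take the extension from the last path segment only.
function getUrlExtension(href, base = 'http://localhost/') {
  // `base` (an assumption) lets relative paths like "../js/logic.v2.min.js" parse too
  const { pathname } = new URL(href, base);
  const lastSegment = pathname.split('/').pop();
  const dot = lastSegment.lastIndexOf('.');
  // No dot, or only a leading dot (e.g. ".htaccess"): no extension
  return dot > 0 ? lastSegment.slice(dot + 1) : '';
}

console.log(getUrlExtension('http://example.net/site/page.php?id=16548'));       // php
console.log(getUrlExtension('http://example.net/site/page.html#welcome.to.me')); // html
console.log(getUrlExtension('http://example.net/dir.with.dot/file'));            // '' (empty string)
```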

var parts = filename.split('.');
return parts[parts.length-1];

function file_get_ext(filename)
{
  return typeof filename != "undefined" ? filename.substring(filename.lastIndexOf(".") + 1, filename.length).toLowerCase() : false;
}

Code
/**
 * Extract file extension from URL.
 * @param {String} url
 * @returns {String} File extension or empty string if no extension is present.
 */
var getFileExtension = function (url) {
  "use strict";
  if (url === null) {
    return "";
  }
  // Remove query and fragment first, so a "/" inside them is not mistaken for a path separator
  var index = url.indexOf("?");
  if (index !== -1) {
    url = url.substring(0, index); // Remove query
  }
  index = url.indexOf("#");
  if (index !== -1) {
    url = url.substring(0, index); // Remove fragment
  }
  index = url.lastIndexOf("/");
  if (index !== -1) {
    url = url.substring(index + 1); // Keep path without its segments
  }
  index = url.lastIndexOf(".");
  return index !== -1
    ? url.substring(index + 1) // Only keep file extension
    : ""; // No extension found
};
Test
Notice that in the absence of a query, the fragment might still be present.
"https://www.example.com:8080/segment1/segment2/page.html?foo=bar#fragment" --> "html"
"https://www.example.com:8080/segment1/segment2/page.html#fragment" --> "html"
"https://www.example.com:8080/segment1/segment2/.htaccess?foo=bar#fragment" --> "htaccess"
"https://www.example.com:8080/segment1/segment2/page?foo=bar#fragment" --> ""
"https://www.example.com:8080/segment1/segment2/?foo=bar#fragment" --> ""
"" --> ""
null --> ""
"a.b.c.d" --> "d"
".a.b" --> "b"
".a.b." --> ""
"a...b" --> "b"
"..." --> ""
JSLint
0 Warnings.

Fast and works correctly with paths
(filename.match(/[^\\\/]\.([^.\\\/]+)$/) || [null]).pop()
Some edge cases
/path/.htaccess => null
/dir.with.dot/file => null
Solutions using split are slow and solutions with lastIndexOf don't handle edge cases.

// Get the file extension
function getFileExtension(file) {
  var regexp = /\.([0-9a-z]+)(?:[\?#]|$)/i;
  var extension = file.match(regexp);
  return extension && extension[1];
}
console.log(getFileExtension("https://www.example.com:8080/path/name/foo"));
console.log(getFileExtension("https://www.example.com:8080/path/name/foo.BAR"));
console.log(getFileExtension("https://www.example.com:8080/path/name/.quz/foo.bar?key=value#fragment"));
console.log(getFileExtension("https://www.example.com:8080/path/name/.quz.bar?key=value#fragment"));

I just wanted to share this:
fileName.slice(fileName.lastIndexOf('.'))
However, this has a downside: for a file with no extension, lastIndexOf returns -1 and slice(-1) returns the last character.
Checking for a missing dot fixes that (note that the returned extension includes the dot):
function getExtention(fileName){
    var i = fileName.lastIndexOf('.');
    if (i === -1) return false;
    return fileName.slice(i);
}

A "one-liner" to get the filename and extension using reduce and array destructuring:
var str = "filename.with_dot.png";
var [filename, extension] = str.split('.').reduce((acc, val, i, arr) => (i == arr.length - 1) ? [acc[0].substring(1), val] : [[acc[0], val].join('.')], [])
console.log({filename, extension});
With better indentation:
var str = "filename.with_dot.png";
var [filename, extension] = str.split('.')
.reduce((acc, val, i, arr) => (i == arr.length - 1)
? [acc[0].substring(1), val]
: [[acc[0], val].join('.')], [])
console.log({filename, extension});
// {
// "filename": "filename.with_dot",
// "extension": "png"
// }
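Note that the `reduce` version throws a `TypeError` for a name without any dot (e.g. `"README"`), because `acc[0]` is still `undefined` on the only iteration. A sketch of the same split using `lastIndexOf` instead (the name `splitFileName` is hypothetical), returning an empty extension in that case:

```javascript
// Hypothetical helper: split a file name at its last dot.
// A leading dot (".bashrc") is treated as part of the name rather than
// as an extension separator; this is a design choice, not a standard.
function splitFileName(str) {
  const dot = str.lastIndexOf(".");
  if (dot <= 0) {
    // No dot, or only a leading dot: no extension.
    return { filename: str, extension: "" };
  }
  return { filename: str.slice(0, dot), extension: str.slice(dot + 1) };
}

console.log(splitFileName("filename.with_dot.png"));
// { filename: 'filename.with_dot', extension: 'png' }
console.log(splitFileName("README"));
// { filename: 'README', extension: '' }
```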

There's also a simple approach using ES6 destructuring:
const path = 'hello.world.txt'
const [extension, ...nameParts] = path.split('.').reverse();
console.log('extension:', extension);

function extension(fname) {
    var pos = fname.lastIndexOf(".");
    var strlen = fname.length;
    if (pos != -1 && strlen != pos + 1) {
        var ext = fname.split(".");
        var len = ext.length;
        var extension = ext[len - 1].toLowerCase();
    } else {
        extension = "No extension found";
    }
    return extension;
}
//usage
extension('file.jpeg')
It always returns the extension in lower case, so you can check it on a field change.
Works for:
file.JpEg
file (no extension)
file. (no extension)

This simple solution
function extension(filename) {
var r = /.+\.(.+)$/.exec(filename);
return r ? r[1] : null;
}
Tests
/* tests */
test('cat.gif', 'gif');
test('main.c', 'c');
test('file.with.multiple.dots.zip', 'zip');
test('.htaccess', null);
test('noextension.', null);
test('noextension', null);
test('', null);
// test utility function
function test(input, expect) {
var result = extension(input);
if (result === expect)
console.log(result, input);
else
console.error(result, input);
}
function extension(filename) {
var r = /.+\.(.+)$/.exec(filename);
return r ? r[1] : null;
}

I'm sure someone can, and will, minify and/or optimize my code in the future. But, as of right now, I am 200% confident that my code works in every unique situation (e.g. with just the file name only, with relative, root-relative, and absolute URLs, with fragment # tags, with query ? strings, and whatever else you may decide to throw at it), flawlessly, and with pin-point precision.
For proof, visit: https://projects.jamesandersonjr.com/web/js_projects/get_file_extension_test.php
Here's the JSFiddle: https://jsfiddle.net/JamesAndersonJr/ffcdd5z3/
Not to be overconfident, or blowing my own trumpet, but I haven't seen any block of code for this task (finding the 'correct' file extension, amidst a battery of different function input arguments) that works as well as this does.
Note: By design, if a file extension doesn't exist for the given input string, it simply returns a blank string "", not an error, nor an error message.
It takes two arguments:
String: fileNameOrURL (self-explanatory)
Boolean: showUnixDotFiles (Whether or Not to show files that begin with a dot ".")
Note (2): If you like my code, be sure to add it to your JS libraries and/or repos, because I worked hard on perfecting it, and it would be a shame for it to go to waste. So, without further ado, here it is:
function getFileExtension(fileNameOrURL, showUnixDotFiles)
{
    /* First, let's declare some preliminary variables we'll need later on. */
    var fileName;
    var fileExt;
    /* Now we'll create a hidden anchor ('a') element (Note: No need to append this element to the document). */
    var hiddenLink = document.createElement('a');
    /* Just for fun, we'll add a CSS attribute of [ style.display = "none" ]. Remember: You can never be too sure! */
    hiddenLink.style.display = "none";
    /* Set the 'href' attribute of the hidden link we just created, to the 'fileNameOrURL' argument received by this function. */
    hiddenLink.setAttribute('href', fileNameOrURL);
    /* Now, let's take advantage of the browser's built-in parser, to remove elements from the original 'fileNameOrURL' argument received by this function, without actually modifying our newly created hidden 'anchor' element.*/
    fileNameOrURL = fileNameOrURL.replace(hiddenLink.protocol, ""); /* First, let's strip out the protocol, if there is one. */
    fileNameOrURL = fileNameOrURL.replace(hiddenLink.hostname, ""); /* Now, we'll strip out the host-name (i.e. domain-name) if there is one. */
    fileNameOrURL = fileNameOrURL.replace(":" + hiddenLink.port, ""); /* Now finally, we'll strip out the port number, if there is one (Kinda overkill though ;-)). */
    /* Now, we're ready to finish processing the 'fileNameOrURL' variable by removing unnecessary parts, to isolate the file name. */
    /* Operations for working with [relative, root-relative, and absolute] URLs ONLY [BEGIN] */
    /* Break the possible URL at the [ '?' ] and take the first part, to shave off the entire query string (everything after the '?'), if it exists. */
    fileNameOrURL = fileNameOrURL.split('?')[0];
    /* Sometimes URLs don't have queries, but DO have a fragment [ # ] (i.e. 'reference anchor'), so we should also do the same for the fragment tag [ # ]. */
    fileNameOrURL = fileNameOrURL.split('#')[0];
    /* Now that we have just the URL 'ALONE', let's remove everything up to the last slash in the URL, to isolate the file name. */
    fileNameOrURL = fileNameOrURL.substr(1 + fileNameOrURL.lastIndexOf("/"));
    /* Operations for working with [relative, root-relative, and absolute] URLs ONLY [END] */
    /* Now, 'fileNameOrURL' should just be 'fileName' */
    fileName = fileNameOrURL;
    /* Now, we check if we should show UNIX dot-files, or not. This should be either 'true' or 'false'. */
    if ( showUnixDotFiles == false )
    {
        /* If not ('false'), we should check if the filename starts with a period (indicating it's a UNIX dot-file). */
        if ( fileName.startsWith(".") )
        {
            /* If so, we return a blank string to the function caller. Our job here, is done! */
            return "";
        };
    };
    /* If there is no period at all, there is no extension: return a blank string (without this check, the whole file name would be returned). */
    if ( fileName.lastIndexOf(".") === -1 )
    {
        return "";
    };
    /* Now, let's get everything after the period in the filename (i.e. the correct 'file extension'). */
    fileExt = fileName.substr(1 + fileName.lastIndexOf("."));
    /* Now that we've discovered the correct file extension, let's return it to the function caller. */
    return fileExt;
};
Enjoy! You're quite welcome!

Try this (the argument is the id of a file <input> element, not a file name):
function getFileExtension(inputId) {
    var fileinput = document.getElementById(inputId);
    if (!fileinput)
        return "";
    var filename = fileinput.value;
    if (filename.length == 0)
        return "";
    var dot = filename.lastIndexOf(".");
    if (dot == -1)
        return "";
    // Includes the leading dot, e.g. ".png"
    var extension = filename.substr(dot, filename.length);
    return extension;
}

If you are looking for a specific extension and know its length, you can use substr:
var file1 = "50.xsl";
if (file1.substr(-4) == '.xsl') {
// do something
}
JavaScript reference: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr
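`substr` is a legacy method (MDN marks it as deprecated). When you only need to test for one known extension, `String.prototype.endsWith` avoids counting characters, and lower-casing first also accepts `.XSL`. A minimal sketch, where `hasExtension` is a hypothetical helper name:

```javascript
// Hypothetical helper: case-insensitive check for one known extension.
// `ext` is expected to include the leading dot, e.g. ".xsl".
function hasExtension(fileName, ext) {
  return fileName.toLowerCase().endsWith(ext.toLowerCase());
}

console.log(hasExtension("50.xsl", ".xsl"));  // true
console.log(hasExtension("50.XSL", ".xsl"));  // true
console.log(hasExtension("50.xslt", ".xsl")); // false
```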

I just realized that it's not enough to put a comment on p4bl0's answer, though Tom's answer clearly solves the problem:
return filename.replace(/^.*?\.([a-zA-Z0-9]+)$/, "$1");

For most applications, a simple script such as
return /[^.]+$/.exec(filename);
would work just fine (as provided by Tom). However, this is not foolproof. It returns "jpg?foo=bar" if the following file name is provided:
image.jpg?foo=bar
It may be a bit overkill, but I would suggest using a URL parser such as this one to avoid failure due to unpredictable filenames.
Using that particular function, you could get the file name like this:
var trueFileName = parse_url('image.jpg?foo=bar').file;
This will output "image.jpg" without the url vars. Then you are free to grab the file extension.
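Browsers and Node.js also ship a built-in URL parser, the WHATWG `URL` class, which can stand in for a third-party `parse_url`. A sketch, assuming a dummy base URL (`http://localhost/`, our own choice) so that relative names such as `image.jpg?foo=bar` can be parsed; `getExtensionFromUrl` is a hypothetical name:

```javascript
// Sketch: let URL strip the query and fragment, then take the text after
// the last dot of the final path segment.
// The base "http://localhost/" is an assumption that makes relative or
// bare file names parseable; it never appears in the result.
function getExtensionFromUrl(input) {
  const { pathname } = new URL(input, "http://localhost/");
  const fileName = pathname.slice(pathname.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  // No dot, or only a leading dot (".htaccess"), is treated as no extension.
  return dot > 0 ? fileName.slice(dot + 1) : "";
}

console.log(getExtensionFromUrl("image.jpg?foo=bar")); // jpg
console.log(getExtensionFromUrl("https://www.example.com:8080/a/page.html?foo=bar#frag")); // html
```

Treating `.htaccess` as having no extension is a design choice; change `dot > 0` to `dot >= 0` if you want `"htaccess"` instead.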

function func() {
    var val = document.frm.filename.value;
    var arr = val.split(".");
    // The extension is the LAST dot-separated part, lower-cased for comparison
    var ext = arr[arr.length - 1].toLowerCase();
    alert(ext);
    // Alerts the containing folder of a Windows-style path
    var arr1 = val.split("\\");
    alert(arr1[arr1.length - 2]);
    if (ext == "gif" || ext == "bmp" || ext == "jpeg") {
        alert("this is an image file");
    } else {
        alert("this is not an image file");
    }
}
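A sketch of the same image check without `alert` and without depending on a form named `frm`. The function name `isImageFile` and the extension list are assumptions (`jpg` and `png` are added here; adjust as needed). It takes the part after the last dot and compares it case-insensitively against a `Set`:

```javascript
// Hypothetical list of accepted image extensions (lower case).
const IMAGE_EXTENSIONS = new Set(["gif", "bmp", "jpeg", "jpg", "png"]);

// Returns true when the part after the last dot is a known image type.
function isImageFile(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return false; // no extension at all
  return IMAGE_EXTENSIONS.has(fileName.slice(dot + 1).toLowerCase());
}

console.log(isImageFile("C:\\fakepath\\holiday.photo.GIF")); // true
console.log(isImageFile("notes.txt"));                      // false
```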

I'm many moons late to the party, but for simplicity I use something like this:
var fileName = "I.Am.FileName.docx";
var nameLen = fileName.length;
var lastDotPos = fileName.lastIndexOf(".");
var fileNameSub = false;
if(lastDotPos === -1)
{
    fileNameSub = false;
}
else
{
    //Remove +1 if you want the "." left too
    fileNameSub = fileName.substr(lastDotPos + 1, nameLen);
}
document.getElementById("showInMe").innerHTML = fileNameSub;
<div id="showInMe"></div>

A one-line solution that also accounts for query parameters in a URL (a #fragment is not stripped):
string.match(/(.*)\??/i).shift().replace(/\?.*/, '').split('.').pop()
// Example
// some.url.com/with.in/&ot.s/files/file.jpg?spec=1&.ext=jpg
// jpg

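return filename.replace(/\.([a-zA-Z0-9]+)$/, "$1");
edit: This does not work as intended. The $1 itself is fine, but replace only substitutes the matched part (".ext"), so "file.txt" becomes "filetxt". The pattern has to match the whole name, e.g. /^.*\.([a-zA-Z0-9]+)$/, for the result to be just the extension.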

function fetchFileExtention(fileName) {
    // Returns "" for names without a dot and for dot-files such as ".htaccess"
    return fileName.slice((fileName.lastIndexOf(".") - 1 >>> 0) + 2);
}
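The `>>> 0` converts the number to an unsigned 32-bit integer, so when `lastIndexOf` returns `-1` (no dot) or `0` (a leading dot), the subtraction wraps around to a huge index (4294967294 or 4294967295) and `slice` returns an empty string. A more explicit equivalent, offered as a sketch (the name `fetchFileExtensionReadable` is hypothetical):

```javascript
// Explicit version of the `>>> 0` trick: same results, no bit tricks.
function fetchFileExtensionReadable(fileName) {
  const dot = fileName.lastIndexOf(".");
  // dot === -1: no dot at all; dot === 0: dot-file such as ".htaccess".
  if (dot <= 0) return "";
  return fileName.slice(dot + 1);
}

console.log(fetchFileExtensionReadable("a.b.c"));     // c
console.log(fetchFileExtensionReadable(".htaccess")); // (empty string)
```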

Wallacer's answer is nice, but one more check is needed.
If the file has no extension, it returns the whole file name as the extension, which is wrong.
Try this one:
return ( filename.indexOf('.') > 0 ) ? filename.split('.').pop().toLowerCase() : 'undefined';

Don't forget that some files can have no extension, so:
var parts = filename.split('.');
return (parts.length > 1) ? parts.pop() : '';
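On the server side, Node.js already ships this logic as `path.extname`, which returns the extension including the dot and treats a leading dot as part of the name. A short illustration (Node.js only; `require` is not available in browsers):

```javascript
// Node.js built-in path module.
const path = require("path");

console.log(path.extname("index.html"));      // .html
console.log(path.extname("index.coffee.md")); // .md
console.log(path.extname("index."));          // .
console.log(path.extname("index"));           // (empty string)
console.log(path.extname(".index"));          // (empty string)
```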
