Substring of a Turkish String


I have a string like this
var element = "İstanbul";
and when I convert it to lower case like this:
var element = element.toLowerCase();
it becomes
"istanbul"
I need a substring of the lower-case string "istanbul".
When I do this before the toLowerCase() call
element.substr(0,2)
the output is correct, but when I do the same after lower-casing, it is wrong: substr(0,2) should give "is", but I get "i" instead.
Why is this happening, and how can I correct it?

This happens because of how Unicode defines the lower-case mapping of İ (U+0130): toLowerCase() is not locale-aware, and it turns the İ into 2 characters: "i" ( http://www.fileformat.info/info/unicode/char/0069/index.htm) and "̇" (the latter is a combining diacritical mark, http://www.fileformat.info/info/unicode/char/0307/index.htm).
To prevent it you may split the string into characters using the ES2015 string iteration facilities and lower case the characters separately:
const arr_l_new = [...str].map(s => s.toLowerCase());
Then you can take the first N characters:
const first_2_chars = arr_l_new.slice(0, 2).join('');
Note that if you check the length of first_2_chars, you will see it is 3 because of the combining mark, which is barely visible on the lower-case i.
var str = "İstanbul";
const arr_l = [...str].map(s => s.toLowerCase());
const first_2_l = arr_l.slice(0, 2).join('');
console.log(first_2_l, first_2_l.length);

Try
element.toLowerCase().replace(new RegExp("İ".toLowerCase(), "g"), "i");
instead of
element.toLowerCase();
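A minimal sketch of the same idea as a reusable helper, assuming the only problem is the combining dot above (U+0307) that toLowerCase() leaves after the "i". The name lowerCaseWithoutCombiningDot is made up for this illustration and does not come from the answers above. Where Turkish locale data is available, toLocaleLowerCase('tr-TR') may be another option.

```javascript
// Hypothetical helper (not from the answers above): lower-case a string and
// drop the combining dot above (U+0307) that "İ".toLowerCase() produces.
function lowerCaseWithoutCombiningDot(text) {
  return text.toLowerCase().replace(/i\u0307/g, "i");
}

const lowered = lowerCaseWithoutCombiningDot("İstanbul");
// The plain toLowerCase() result is one code unit longer than the input.
console.log("İstanbul".toLowerCase().length, lowered.length);
console.log(lowered.substr(0, 2));
```

With the combining mark removed, substr(0, 2) works on the visible letters again.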

Related

Replacing letters in specific position in string w/regex(javascript)

I have this regex (regex101).
\b([\w])(\w*?)\1
Substitution: $1$2#
What I need to do in JavaScript is iterate through the chosen and guessed words, checking each position for a match. I have a variable i that holds the current position, and I would like to substitute the character # at that position if both words have the same letter there.
For example, if the chosen word is mummy and the guessed word is mommy, after the first iteration we would have #ummy & #ommy. After checking the 3rd position we would have #u#my & #o#my, and so on until at the end we would have #u##y & #o##y.
Can I also substitute variables into my regex to accommodate different word pairs? On another iteration, if the words were apple and maple, the code would transform the two strings to ap#le & ma#le. My JavaScript looks like:
const regex = /\b([\w])(\w*?)\1/gm;
const subst = `$1$2#`;
wirdleString = wirdleString.replace(regex, subst);
guessString = guessString.replace(regex, subst);
I'm sorry my regex skills are very weak. Thanks in advance....

First of all: I don't think this is a task for regexps. The problem is much easier solved using iteration.
That said, as an academic exercise, here's one way to match all characters that are the same as in "mummy" at the same position:
(?<=.{0})m(?=.{4})|
(?<=.{1})u(?=.{3})|
(?<=.{2})m(?=.{2})|
(?<=.{3})m(?=.{1})|
(?<=.{4})y(?=.{0})
You can create this by a mapping like this:
const string = "mummy";
const length = string.length;
const exp = string.split('').map((char, index) =>
`(?<=.{${index}})${char}(?=.{${length - index -1}})`
).join('|');
const attempt = "momma";
const result = attempt.replace(new RegExp(exp, 'g'), "#");
// '#o##a'
On a general note, I think an iterative approach would be easier and definitely faster:
const result = [...string].map((char, index) =>
attempt[index] == char ? '#' : char
).join('');
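The question asks for both words to be masked, so the iterative approach can be sketched as a helper that returns the pair. The name maskMatches and its mask parameter are assumptions made for this illustration, and the sketch assumes plain letters that each occupy one UTF-16 code unit.

```javascript
// Hypothetical helper: mask every position where both words hold the same
// character. Assumes plain characters, one UTF-16 code unit each.
function maskMatches(word, guess, mask = "#") {
  const maskOne = (source, other) =>
    [...source].map((char, i) => (char === other[i] ? mask : char)).join("");
  return [maskOne(word, guess), maskOne(guess, word)];
}

console.log(maskMatches("mummy", "mommy")); // [ '#u###', '#o###' ]
console.log(maskMatches("apple", "maple")); // [ 'ap###', 'ma###' ]
```

Because the word pair is passed in as arguments, no regex has to be rebuilt for each new pair.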

JavaScript get first name and last name from string as array

I have a string that has the following format: <strong>FirstName LastName</strong>
How can I change this into an array with the first element firstName and second lastName?
I did this, but no luck, it won't produce the right result:
var data = [myString.split('<strong>')[1], myString.split('<strong>')[2]]
How can I produce ["firstName", "lastName"] for any string with that format?

In order to parse HTML, use the best HTML parser out there, the DOM itself!
// create a random element, it doesn't have to be 'strong' (e.g., it could be 'div')
var parser = document.createElement('strong');
// set the innerHTML to your string
parser.innerHTML = "<strong>FirstName LastName</strong>";
// get the text inside the element ("FirstName LastName")
var fullName = parser.textContent;
// split it into an array, separated by the space in between FirstName and LastName
var data = fullName.split(" ");
// voila!
console.log(data);
EDIT
As @RobG pointed out, you could also explicitly use a DOM parser rather than that of an element:
var parser = new DOMParser();
var doc = parser.parseFromString("<strong>FirstName LastName</strong>", "text/html");
console.log(doc.body.textContent.split(" "));
However, both methods work perfectly fine; it all comes down to preference.

Just match everything between <strong> and </strong>.
var matches = "<strong>FirstName LastName</strong>".match(/<strong>(.*)<\/strong>/);
console.log(matches[1].split(' '));
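Note that match returns null when the input does not contain the tags, so matches[1] would throw. A minimal sketch of a null-safe variant follows; parseStrongName is a made-up name, and the pattern assumes exactly two whitespace-separated words inside the tags.

```javascript
// Hypothetical helper: returns [first, last], or null when the input does
// not have the <strong>First Last</strong> shape.
function parseStrongName(html) {
  const match = html.match(/<strong>\s*(\S+)\s+(\S+)\s*<\/strong>/);
  return match ? [match[1], match[2]] : null;
}

console.log(parseStrongName("<strong>FirstName LastName</strong>"));
console.log(parseStrongName("no markup here")); // null
```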

The preferred approach would be to use DOM methods: create an element, get its .textContent, then match one or more word characters or split on the space character. Without the DOM, you can split on the tags and the space:
let str = '<strong>FirstName LastName</strong>';
let [,first, last] = str.split(/<[/\w\s-]+>|\s/g);
console.log(first, last);
/<[/\w\s-]+>|\s/g
Splits on < followed by one or more slash, word, space or dash characters followed by a > character, or on a space, to match the space between the words in the string.
Comma operator , within destructuring assignment is used to omit that index from the result of .split() ["", "FirstName", "LastName", ""].

This is my approach to your problem. Hope it helps!
var str = "<strong>FirstName LastName</strong>";
var result = str.slice(0, -9).substr(8).split(" ");
Edit: it will only work for this specific example.

Another way to do this, in case you had something other than HTML:
var string = "<strong>FirstName LastName</strong>";
string = string.slice(0, -9); // remove last 9 chars
string = string.substr(8); // remove first 8 chars
string = string.split(" "); // split into an array at space
console.log(string);

How can I remove all characters up to and including the 3rd slash in a string?

I'm having trouble removing all characters up to and including the third slash in JavaScript. This is my string:
http://blablab/test
The result should be:
test
Does anybody know the correct solution?

To get the last item in a path, you can split the string on / and then pop():
var url = "http://blablab/test";
alert(url.split("/").pop());
//-> "test"
To specify an individual part of a path, split on / and use bracket notation to access the item:
var url = "http://blablab/test/page.php";
alert(url.split("/")[3]);
//-> "test"
Or, if you want everything after the third slash, split(), slice() and join():
var url = "http://blablab/test/page.php";
alert(url.split("/").slice(3).join("/"));
//-> "test/page.php"

var string = 'http://blablab/test'
string = string.replace(/[\s\S]*?\//,'').replace(/[\s\S]*?\//,'').replace(/[\s\S]*?\//,'')
alert(string)
This applies a regular expression three times, each time removing everything up to and including the next slash. I will explain below.
The regex is /[\s\S]*?\//
/ is the start of the regex
[\s\S] means whitespace or non-whitespace (anything), not to be confused with . which does not match line terminators such as \r and \n.
*? means that we match zero or more of [\s\S], but as few as possible, so the match stops at the first slash instead of the last one
\/ means match a slash character
The last / is the end of the regex

var str = "http://blablab/test";
var index = 0;
for(var i = 0; i < 3; i++){
index = str.indexOf("/",index)+1;
}
str = str.substr(index);
To make it a one liner you could make the following:
str = str.substr(str.indexOf("/",str.indexOf("/",str.indexOf("/")+1)+1)+1);
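The loop above assumes the string contains at least three slashes; when indexOf returns -1, the index is reset to 0 and the search starts over. A minimal sketch of a guarded version follows. The name afterNth is hypothetical, and returning the input unchanged when there are too few separators is an assumption about the desired behaviour.

```javascript
// Hypothetical helper: return what follows the nth occurrence of `sep`,
// or the whole string when there are fewer than n occurrences.
function afterNth(str, sep, n) {
  let index = 0;
  for (let i = 0; i < n; i++) {
    const found = str.indexOf(sep, index);
    if (found === -1) return str; // assumption: leave the input untouched
    index = found + sep.length;
  }
  return str.slice(index);
}

console.log(afterNth("http://blablab/test", "/", 3)); // "test"
console.log(afterNth("http://blablab/test/page.php", "/", 3)); // "test/page.php"
console.log(afterNth("no-slashes", "/", 3)); // "no-slashes"
```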

You can use split to split the string into parts and use slice to return all parts after the third slash.
var str = "http://blablab/test",
arr = str.split("/");
arr = arr.slice(3);
console.log(arr.join("/")); // "test"
// A longer string:
var str = "http://blablab/test/test"; // "test/test";

You could use a regular expression like this one:
'http://blablab/test'.match(/^(?:[^/]*\/){3}(.*)$/);
// -> ['http://blablab/test', 'test']
A string's match method gives you either an array (of the whole match, in this case the whole input, followed by any capture groups, of which we want the first) or null. So, for general use you need to pull out the element at index 1 of the array, or null if a match wasn't found:
var input = 'http://blablab/test',
re = /^(?:[^/]*\/){3}(.*)$/,
match = input.match(re),
result = match && match[1]; // With this input, result contains "test"

let str = "http://blablab/test";
let data = new URL(str).pathname.split("/").pop();
console.log(data);

split string only on first instance of specified character

In my code I split a string based on _ and grab the second item in the array.
var element = $(this).attr('class');
var field = element.split('_')[1];
Takes good_luck and provides me with luck. Works great!
But, now I have a class that looks like good_luck_buddy. How do I get my javascript to ignore the second _ and give me luck_buddy?
I found this var field = element.split(new char [] {'_'}, 2); in a c# stackoverflow answer but it doesn't work. I tried it over at jsFiddle...

Use capturing parentheses:
'good_luck_buddy'.split(/_(.*)/s)
['good', 'luck_buddy', ''] // ignore the third element
They are defined as
If separator contains capturing parentheses, matched results are returned in the array.
So in this case we want to split at _.* (i.e. split separator being a sub string starting with _) but also let the result contain some part of our separator (i.e. everything after _).
In this example our separator (matching _(.*)) is _luck_buddy and the captured group (within the separator) is luck_buddy. Without the capturing parentheses, luck_buddy (matching .*) would not have been included in the result array, because with a simple split the separators are not included in the result.
We use the s regex flag to make . match on newline (\n) characters as well, otherwise it would only split to the first newline.

What do you need regular expressions and arrays for?
myString = myString.substring(myString.indexOf('_')+1)
var myString= "hello_there_how_are_you"
myString = myString.substring(myString.indexOf('_')+1)
console.log(myString)

I avoid RegExp at all costs. Here is another thing you can do:
"good_luck_buddy".split('_').slice(1).join('_')

With help of destructuring assignment it can be more readable:
let [first, ...rest] = "good_luck_buddy".split('_')
rest = rest.join('_')

A simple ES6 way to get both the first key and remaining parts in a string would be:
const [key, ...rest] = "good_luck_buddy".split('_')
const value = rest.join('_')
console.log(key, value) // good, luck_buddy

Nowadays String.prototype.split does indeed allow you to limit the number of splits.
str.split([separator[, limit]])
...
limit Optional
A non-negative integer limiting the number of splits. If provided, splits the string at each occurrence of the specified separator, but stops when limit entries have been placed in the array. Any leftover text is not included in the array at all.
The array may contain fewer entries than limit if the end of the string is reached before the limit is reached.
If limit is 0, no splitting is performed.
caveat
It might not work the way you expect. I was hoping it would just ignore the rest of the delimiters, but instead, when it reaches the limit, it splits the remaining string again, omitting the part after the split from the return results.
let str = 'A_B_C_D_E'
const limit_2 = str.split('_', 2)
limit_2
(2) ["A", "B"]
const limit_3 = str.split('_', 3)
limit_3
(3) ["A", "B", "C"]
I was hoping for:
let str = 'A_B_C_D_E'
const limit_2 = str.split('_', 2)
limit_2
(2) ["A", "B_C_D_E"]
const limit_3 = str.split('_', 3)
limit_3
(3) ["A", "B", "C_D_E"]
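The behaviour hoped for above can be sketched with a small helper that splits normally and then rejoins the tail. The name splitKeepRest is made up for this illustration; the sketch assumes a string separator (not a RegExp) and treats a limit below 1 as "no limit".

```javascript
// Hypothetical helper: split on `sep` at most `limit - 1` times and keep the
// unsplit remainder as the last entry. Assumes `sep` is a plain string.
function splitKeepRest(str, sep, limit) {
  const parts = str.split(sep);
  if (limit < 1 || parts.length <= limit) return parts;
  return [...parts.slice(0, limit - 1), parts.slice(limit - 1).join(sep)];
}

console.log(splitKeepRest("A_B_C_D_E", "_", 2)); // [ 'A', 'B_C_D_E' ]
console.log(splitKeepRest("A_B_C_D_E", "_", 3)); // [ 'A', 'B', 'C_D_E' ]
console.log(splitKeepRest("good_luck_buddy", "_", 2)[1]); // "luck_buddy"
```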

This solution worked for me
var str = "good_luck_buddy";
var index = str.indexOf('_');
var arr = [str.slice(0, index), str.slice(index + 1)];
//arr[0] = "good"
//arr[1] = "luck_buddy"
OR
var str = "good_luck_buddy";
var index = str.indexOf('_');
var [first, second] = [str.slice(0, index), str.slice(index + 1)];
//first = "good"
//second = "luck_buddy"

You can use a regular expression like:
var field = element.split(/_(.*)/)[1];
The second parameter of split only limits how many items are returned, so on its own it can give you the part before the first _ but not the rest:
var first = element.split('_', 1)[0];

Replace the first instance with a unique placeholder then split from there.
"good_luck_buddy".replace(/\_/,'&').split('&')
["good","luck_buddy"]
This is more useful when both sides of the split are needed.

I needed both parts of the string, and a regex lookbehind helped me with this.
const full_name = 'Maria do Bairro';
const [first_name, last_name] = full_name.split(/(?<=^[^ ]+) /);
console.log(first_name);
console.log(last_name);

Non-regex solution
I ran some benchmarks, and this solution won by a wide margin:
str.slice(str.indexOf(delim) + delim.length)
// as function
function gobbleStart(str, delim) {
return str.slice(str.indexOf(delim) + delim.length);
}
// as a String.prototype extension
String.prototype.gobbleStart = function(delim) {
return this.slice(this.indexOf(delim) + delim.length);
};
Performance comparison with other solutions
The only close contender was the same line of code, except using substr instead of slice.
Other solutions I tried involving split or RegExps took a big performance hit and were about 2 orders of magnitude slower. Using join on the results of split, of course, adds an additional performance penalty.
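Why are they slower? A likely reason: any time a new object or array has to be created, the JS engine has to allocate memory for it and garbage-collect it later. split builds an array plus one string per part, whereas indexOf plus slice creates a single string.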
Here are some general guidelines, in case you are chasing benchmarks:
New dynamic memory allocations for objects {} or arrays [] (like the one that split creates) will cost a lot in performance.
RegExp searches are more complicated and therefore slower than string searches.
If you already have an array, destructuring arrays is about as fast as explicitly indexing them, and looks awesome.
Removing beyond the first instance
Here's a solution that will slice up to and including the nth instance. It's not quite as fast, but on the OP's question, gobble(element, '_', 1) is still >2x faster than a RegExp or split solution and can do more:
/*
`gobble`, given a positive, non-zero `limit`, deletes
characters from the beginning of `haystack` until `needle` has
been encountered and deleted `limit` times or no more instances
of `needle` exist; then it returns what remains. If `limit` is
zero or negative, delete from the beginning only until `-(limit)`
occurrences or less of `needle` remain.
*/
function gobble(haystack, needle, limit = 0) {
let remain = limit;
if (limit <= 0) { // set remain to count of delim - num to leave
let i = 0;
while (i < haystack.length) {
const found = haystack.indexOf(needle, i);
if (found === -1) {
break;
}
remain++;
i = found + needle.length;
}
}
let i = 0;
while (remain > 0) {
const found = haystack.indexOf(needle, i);
if (found === -1) {
break;
}
remain--;
i = found + needle.length;
}
return haystack.slice(i);
}
With the above definition, gobble('path/to/file.txt', '/') would give the name of the file, and gobble('prefix_category_item', '_', 1) would remove the prefix like the first solution in this answer.
Tests were run in Chrome 70.0.3538.110 on macOSX 10.14.

Use the string replace() method with a regex:
var result = "good_luck_buddy".replace(/.*?_/, "");
console.log(result);
This regex matches 0 or more characters before the first _, and the _ itself. The match is then replaced by an empty string.

JavaScript's String.split unfortunately has no way of limiting the actual number of splits. It has a second argument that specifies how many of the split items are returned, which isn't useful in your case. The solution would be to split the string, shift the first item off, then rejoin the remaining items:
var element = $(this).attr('class');
var parts = element.split('_');
parts.shift(); // removes the first item from the array
var field = parts.join('_');

Here's one RegExp that does the trick.
'good_luck_buddy'.split(/^.*?_/)[1]
First, the '^' forces the match to start at the beginning of the string. Then '.*?' matches as few characters as possible before the '_' that follows it, in other words all characters before the first '_'. The '_' itself is included in the match as its last character.
split() therefore uses this matching part as its 'splitter' and removes it from the results. So it removes everything up to and including the first '_' and gives you the rest as the 2nd element of the result. The first element is "", representing the part before the match; it is "" because the match starts at the beginning of the string.
Other RegExps work as well, like /_(.*)/ given by Chandu in a previous answer. The benefit of /^.*?_/ is that you can understand what it does without having to know about the special role capturing groups play with split().

if you are looking for a more modern way of doing this:
let raw = "good_luck_buddy"
raw.split("_")
.filter((part, index) => index !== 0)
.join("_")

Mark F's solution is awesome but it's not supported by old browsers. Kennebec's solution is awesome and supported by old browsers but doesn't support regex.
So, if you're looking for a solution that splits your string only once, that is supported by old browsers and supports regex, here's my solution:
String.prototype.splitOnce = function(regex)
{
var match = this.match(regex);
if(match)
{
var match_i = this.indexOf(match[0]);
return [this.substring(0, match_i),
this.substring(match_i + match[0].length)];
}
else
{ return [this, ""]; }
}
var str = "something/////another thing///again";
alert(str.splitOnce(/\/+/)[1]);

For beginners like me who are not used to regular expressions, this workaround worked:
var field = "Good_Luck_Buddy";
var newString = field.slice( field.indexOf("_")+1 );
slice() method extracts a part of a string and returns a new string and indexOf() method returns the position of the first found occurrence of a specified value in a string.

This should be quite fast
function splitOnFirst (str, sep) {
const index = str.indexOf(sep);
return index < 0 ? [str] : [str.slice(0, index), str.slice(index + sep.length)];
}
console.log(splitOnFirst('good_luck', '_')[1])
console.log(splitOnFirst('good_luck_buddy', '_')[1])

This worked for me on Chrome + FF:
"foo=bar=beer".split(/^[^=]+=/)[1] // "bar=beer"
"foo==".split(/^[^=]+=/)[1] // "="
"foo=".split(/^[^=]+=/)[1] // ""
"foo".split(/^[^=]+=/)[1] // undefined
If you also need the key try this:
"foo=bar=beer".split(/^([^=]+)=/) // Array [ "", "foo", "bar=beer" ]
"foo==".split(/^([^=]+)=/) // [ "", "foo", "=" ]
"foo=".split(/^([^=]+)=/) // [ "", "foo", "" ]
"foo".split(/^([^=]+)=/) // [ "foo" ]
//[0] = ignored (holds the string when there's no =, empty otherwise)
//[1] = hold the key (if any)
//[2] = hold the value (if any)

a simple es6 one statement solution to get the first key and remaining parts
let raw = 'good_luck_buddy'
raw.split('_')
.reduce((p, c, i) => i === 0 ? [c] : [p[0], [...p.slice(1), c].join('_')], [])

You could also use non-greedy match, it's just a single, simple line:
const a = "good_luck_buddy"
const [,g,b] = a.match(/(.*?)_(.*)/)
console.log(g,"and also",b)

Regex using javascript to return just numbers

If I have a string like "something12" or "something102", how would I use a regex in javascript to return just the number parts?

Regular expressions:
var numberPattern = /\d+/g;
'something102asdfkj1948948'.match( numberPattern )
This would return an Array with two elements inside, '102' and '1948948'. Operate as you wish. If it doesn't match any it will return null.
To concatenate them:
'something102asdfkj1948948'.match( numberPattern ).join('')
Assuming you're not dealing with complex decimals, this should suffice I suppose.

You could also strip all the non-digit characters (\D or [^0-9]):
let word_With_Numbers = 'abc123c def4567hij89'
let word_Without_Numbers = word_With_Numbers.replace(/\D/g, '');
console.log(word_Without_Numbers)

For number with decimal fraction and minus sign, I use this snippet:
const NUMERIC_REGEXP = /[-]{0,1}[\d]*[.]{0,1}[\d]+/g;
const numbers = '2.2px 3.1px 4px -7.6px obj.key'.match(NUMERIC_REGEXP)
console.log(numbers); // ["2.2", "3.1", "4", "-7.6"]
Update: - 7/9/2018
Found a tool which allows you to edit regular expression visually: JavaScript Regular Expression Parser & Visualizer.
Update:
Here's another one with which you can even debug regexps: Online regex tester and debugger.
Update:
Another one: RegExr.
Update:
Regexper and Regex Pal.

If you want only digits:
var value = '675-805-714';
var numberPattern = /\d+/g;
value = value.match( numberPattern ).join('');
alert(value);
//Show: 675805714
Now you get the digits joined

I guess you want to get number(s) from the string. In which case, you can use the following:
// Returns an array of numbers located in the string
function get_numbers(input) {
return input.match(/[0-9]+/g);
}
var first_test = get_numbers('something102');
var second_test = get_numbers('something102or12');
var third_test = get_numbers('no numbers here!');
alert(first_test); // [102]
alert(second_test); // [102,12]
alert(third_test); // null

IMO the #3 answer at this time by Chen Dachao is the right way to go if you want to capture any kind of number, but the regular expression can be shortened from:
/[-]{0,1}[\d]*[\.]{0,1}[\d]+/g
to:
/-?\d*\.?\d+/g
For example, this code:
"lin-grad.ient(217deg,rgba(255, 0, 0, -0.8), rgba(-255,0,0,0) 70.71%)".match(/-?\d*\.?\d+/g)
generates this array:
["217","255","0","0","-0.8","-255","0","0","0","70.71"]
I've butchered an MDN linear gradient example so that it fully tests the regexp and doesn't need to scroll here. I think I've included all the possibilities in terms of negative numbers, decimals, unit suffixes like deg and %, inconsistent comma and space usage, and the extra dot/period and hyphen/dash characters within the text "lin-grad.ient". Please let me know if I'm missing something. The only thing I can see that it does not handle is a badly formed decimal number like "0..8".
If you really want an array of numbers, you can convert the entire array in the same line of code:
array = whatever.match(/-?\d*\.?\d+/g).map(Number);
My particular code, which is parsing CSS functions, doesn't need to worry about the non-numeric use of the dot/period character, so the regular expression can be even simpler:
/-?[\d\.]+/g
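A minimal sketch that wraps the shortened pattern in a function and guards against the null that match returns when nothing is found. The name extractNumbers is an assumption made for this illustration.

```javascript
// Hypothetical helper: return every signed integer or decimal in `text` as a
// Number, or an empty array when the text contains none.
function extractNumbers(text) {
  const matches = text.match(/-?\d*\.?\d+/g) || [];
  return matches.map(Number);
}

console.log(extractNumbers("2.2px 3.1px 4px -7.6px obj.key")); // [ 2.2, 3.1, 4, -7.6 ]
console.log(extractNumbers("no numbers here!")); // []
```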

var result = input.match(/\d+/g).join('')

Using split and regex :
var str = "fooBar0123".split(/(\d+)/);
console.log(str[0]); // fooBar
console.log(str[1]); // 0123

The answers given don't actually match your question, which implied a trailing number. Also, remember that you're getting a string back; if you actually need a number, convert the result:
item = item.replace(/^.*\D(\d*)$/, '$1');
if (!/^\d+$/.test(item)) throw new Error('parse error: number not found');
item = Number(item);
If you're dealing with numeric item ids on a web page, your code could also usefully accept an Element, extracting the number from its id (or its first parent with an id); if you've an Event handy, you can likely get the Element from that, too.

As per @Syntle's answer, if you have only non-numeric characters you'll get an Uncaught TypeError: Cannot read property 'join' of null.
This will prevent errors if no matches are found and return an empty string:
('something'.match( /\d+/g )||[]).join('')

Here is a solution that converts the string to a valid plain or decimal number using a regex:
//something123.777.321something to 123.777321
const str = 'something123.777.321something';
// strip everything except digits and dots; the g flag covers every run
let initialValue = str.replace(/[^0-9.]+/g, '');
//initialValue = '123.777.321';
// characterCount counts the occurrences of a character in a given string
const characterCount = (s, ch) => s.split(ch).length - 1;
if (characterCount(initialValue, '.') > 1) {
  const splitValue = initialValue.split('.');
  //splitValue = ['123','777','321'];
  // keep the first dot as the decimal point and drop the others
  initialValue = splitValue.shift() + '.' + splitValue.join('');
  //result i.e. initialValue = '123.777321'
}

If you also want numbers that contain a dot (decimal point), then:
\d*\.?\d*
or
[0-9]*\.?[0-9]*
You can use https://regex101.com/ to test your regexes.

Everything that other solutions have, but with a little validation
// value = '675-805-714'
const validateNumberInput = (value) => {
let numberPattern = /\d+/g
let numbers = value.match(numberPattern)
if (numbers === null) {
return 0
}
return parseInt(numbers.join(''), 10)
}
// 675805714

One liner
If you do not care about decimal numbers and only need the digits, I think this one-liner is rather elegant:
/**
* @param {String} str
* @returns {String} - All digits from the given `str`
*/
const getDigitsInString = (str) => str.replace(/[^\d]*/g, '');
console.log([
'?,!_:/42\`"^',
'A 0 B 1 C 2 D 3 E',
' 4 twenty 20 ',
'1413/12/11',
'16:20:42:01'
].map((str) => getDigitsInString(str)));
Simple explanation:
\d matches any digit from 0 to 9
[^n] matches anything that is not n
* matches 0 times or more the predecessor
( It is an attempt to match a whole block of non-digits all at once )
g at the end, indicates that the regex is global to the entire string and that we will not stop at the first occurrence but match every occurrence within it
Together those rules match anything but digits, which we replace with an empty string, resulting in a string that contains digits only.
