Remove last occurrence of invisible unicode character using regex?

I have the string 'my string \u{E0000}', which ends with the invisible character U+E0000. I want to know how to remove this character with a regex so that splitting the string with .split(' ') gives a length of 2, not the 3 it shows right now.
This is the regex I am currently using to remove the character; however, when I split the string the length is still 3 and not 2. The split result should look like ['my', 'string'].
.replace(/[\u034f\u2800(\u{E0000})\u180e\ufeff\u2000-\u200d\u206D]/gu, '');

The invisible character you have there is a single code point, U+E0000, which JavaScript stores as 2 UTF-16 code units (the surrogate pair \udb40\udc00). With the u flag, \u{e0000} matches the whole character, so your regex already removes it; the extra \u{dc00} in the snippet below is redundant but harmless.
The real problem is the way split works. If there is a space at the end of the string, split(' ') still produces a separate (empty) element after it. See the example below, including a case with no special character at all:
// length before (12) and after (10) removing the special character; the trailing space remains
console.log(
"my string 󠀀".length,
"my string 󠀀".replace(/[\u034f\u2800(\u{e0000}\u{dc00})\u180e\ufeff\u2000-\u200d\u206D]/gu, '')
.length
);
console.log(
// use trim to remove trailing space so that it behaves the way you want
"my string 󠀀".replace(/[\u034f\u2800(\u{e0000}\u{dc00})\u180e\ufeff\u2000-\u200d\u206D]/gu, '')
.trim().split(' ')
);
// notice that without trim() the trailing space still produces an empty 3rd element
console.log( //\u0020 is the hex code for space
("my string" + "\u0020").split(' ')
);
Note that you may need to adjust your regex. I haven't checked every escape in it, so it is worth verifying that each one is the character you intend, and that characters outside the Basic Multilingual Plane are written as \u{...} and matched with the u flag.
I've created a function below for extracting the escape sequence of each code point.
// iterate by code point (not by UTF-16 code unit) so that surrogate pairs are not split
var codePoints = (str, pos, end) => Array.from(str, ch => ch.codePointAt(0)).slice(pos || 0, end)
var escapeSequence = (str, pos, end) => codePoints(str, pos, end).map(p=>`\\u{${p.toString(16)}}`).join('')
document.getElementById('btn').onclick=()=>{
const text = document.getElementById('text').value
const start = +document.getElementById('start').value
const end = document.getElementById('end').value||undefined
document.getElementById('result').innerHTML = escapeSequence(text,start,end)
}
console.log(
escapeSequence('1️⃣')
)
console.log(
escapeSequence("󠀀"),
)
console.log(
escapeSequence("my string 󠀀",10)
)
<label for="text">unicode text: </label><input type="text" id="text"><br>
<label for="start">start position to retrieve from: </label><input type="number" id="start"><br>
<label for="end">end position to retrieve from: </label><input type="number" id="end"><br>
<button id="btn">get unicode escaped code points</button><br>
<div id="result"></div>
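Putting the pieces together, here is a minimal sketch of the whole fix. It assumes the character class from the question (without the literal parentheses), writes U+E0000 as an escape so it is visible in the source, and uses a hypothetical helper name, splitWords, that is not part of the original answer.

```javascript
// U+E0000 written as an escape so the invisible character is visible in source.
const input = 'my string \u{E0000}';

// One code point, but two UTF-16 code units (a surrogate pair).
console.log([...input].length); // 11 (code points)
console.log(input.length);      // 12 (UTF-16 code units)

// Character class taken from the question; the u flag makes \u{E0000} match as one unit.
const INVISIBLE = /[\u034f\u2800\u{E0000}\u180e\ufeff\u2000-\u200d\u206d]/gu;

// Hypothetical helper: strip invisible characters, trim, then split on whitespace runs.
function splitWords(text) {
  const cleaned = text.replace(INVISIBLE, '').trim();
  return cleaned === '' ? [] : cleaned.split(/\s+/);
}

console.log(splitWords(input)); // [ 'my', 'string' ]
```

Trimming before splitting is what brings the length down to 2; removing the invisible character alone leaves the trailing space in place.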

Related

How do you move the first n characters of every line in a string to the end of the same line (not to end of string)?

The string fileString contains multiple lines of characters, like this:
1234a6b4ba21ba54f6bde411930b0b1ec6df
3124a6b4ba21ba54f6bde411930b0b1ef248
2134a6b4ba21ba54f6bde411900b89f7dcf3
4123a6b4ba21ba54f6bde411920bbf835b60
I'd like to move the first 4 characters of every line to the end of its respective line, like this:
a6b4ba21ba54f6bde411930b0b1ec6df1234
a6b4ba21ba54f6bde411930b0b1ef2483124
a6b4ba21ba54f6bde411900b89f7dcf32134
a6b4ba21ba54f6bde411920bbf835b604123
I saw another post with a proposed solution, but that code moves the first 4 characters of the string to the end of the string, which is not what I'm trying to do.
So with this code:
var num = 4
fileString = fileString.substring(num) + fileString.substring(0, num)
The initial string stated above turns into this:
a6b4ba21ba54f6bde411930b0b1ec6df
3124a6b4ba21ba54f6bde411930b0b1ef248
2134a6b4ba21ba54f6bde411900b89f7dcf3
4123a6b4ba21ba54f6bde411920bbf835b60
1234

No need for array manipulations
A simple String.replace() using regex capture groups seems to provide the shortest solution:
string.replace(/(\w{4})(\w{32})/g, '$2$1');
Definitions
() = defines the capture groups
\w = matches a word character (letter, digit, or underscore)
{n} = matches exactly n of the preceding token
$n = in the replacement string, the text of the n-th capture group
And "$2$1" creates the replacement string by appending the text of the first capture group to the end of the second.
For more details see: regex101 Fiddle
Snippet
Try the code to see how it works. You may also add brackets and spaces around the numbers and it works the same.
stdout.value = stdin.value.replace(/(\w{4})(\w{32})/g, '$2$1');
textarea {
width: 80%;
height: 5rem;
display: block;
margin-bottom: 1rem;
background-color: aliceblue;
}
Input:
<textarea id="stdin">
1234a6b4ba21ba54f6bde411930b0b1ec6df
3124a6b4ba21ba54f6bde411930b0b1ef248
2134a6b4ba21ba54f6bde411900b89f7dcf3
4123a6b4ba21ba54f6bde411920bbf835b60
</textarea>
Output:
<textarea id="stdout"></textarea>
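The capture-group swap above hard-codes 4 and 32. As a sketch only, the same idea can be parameterized on n by anchoring the pattern to each line with the m flag. The helper name moveFirstN is hypothetical, and the code assumes n is a non-negative integer and that lines are separated by "\n".

```javascript
// Hypothetical helper: move the first n characters of every line to the end of that line.
function moveFirstN(text, n) {
  // With the m flag, ^ and $ match at line boundaries;
  // "." never matches "\n", so each match stays within one line.
  const pattern = new RegExp(`^(.{${n}})(.*)$`, 'gm');
  return text.replace(pattern, '$2$1');
}

const fileString = [
  '1234a6b4ba21ba54f6bde411930b0b1ec6df',
  '3124a6b4ba21ba54f6bde411930b0b1ef248',
].join('\n');

console.log(moveFirstN(fileString, 4));
// a6b4ba21ba54f6bde411930b0b1ec6df1234
// a6b4ba21ba54f6bde411930b0b1ef2483124
```

Lines shorter than n do not match the pattern and are left unchanged.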

Split the string into lines and then rejoin them later:
const string = `\
1234a6b4ba21ba54f6bde411930b0b1ec6df
3124a6b4ba21ba54f6bde411930b0b1ef248
2134a6b4ba21ba54f6bde411900b89f7dcf3
4123a6b4ba21ba54f6bde411920bbf835b60`;
const result = string
.split("\n")
.map((line) => line.substring(4) + line.substring(0, 4))
.join("\n");
console.log(result);

const str = document.querySelector('div')
// Get the text (from wherever)
.textContent
// Split on the line-break
.split('\n')
// Filter out empty strings
.filter(l => l.length)
// Map over the array that `filter` returns
// and move the characters around
.map(str => `${str.slice(4)}${str.slice(0, 4)}`)
// Join the array up with line breaks
.join('\n');
console.log(str);
<div>
1234a6b4ba21ba54f6bde411930b0b1ec6df
3124a6b4ba21ba54f6bde411930b0b1ef248
2134a6b4ba21ba54f6bde411900b89f7dcf3
4123a6b4ba21ba54f6bde411920bbf835b60
</div>

regex to extract numbers starting from second symbol

Sorry for adding one more to the tons of regexp questions, but I can't find anything similar to my needs. I want to output a string which can contain a number or the letter 'A' as the first symbol and numbers only in the other positions. Input is any string, for example:
---INPUT--- -OUTPUT-
A123asdf456 -> A123456
0qw#$56-398 -> 056398
B12376B6f90 -> 12376690
12A12345BCt -> 1212345
What I tried is replace(/[^A\d]/g, '') (I use JS), which almost does the job except for the case when there's an A in the middle of the string. I tried to use the ^ anchor, but then the pattern doesn't match other numbers in the string. I'm not sure which is easier: extracting the matching characters or removing the non-matching ones.

I think you can do it like this, using a negative lookahead and then replacing with an empty string.
In a non-capturing group (?:, use a negative lookahead (?! to assert that what follows is neither an A at the beginning of the string (^A) nor a digit (\d). If that assertion holds, match any character with . and repeat the group one or more times.
(?:(?!^A|\d).)+
var pattern = /(?:(?!^A|\d).)+/g;
var strings = [
"A123asdf456",
"0qw#$56-398",
"B12376B6f90",
"12A12345BCt"
];
for (var i = 0; i < strings.length; i++) {
console.log(strings[i] + " ==> " + strings[i].replace(pattern, ""));
}

You can match and capture desired and undesired characters within two different sides of an alternation, then replace those undesired with nothing:
^(A)|\D
JS code:
var inputStrings = [
"A-123asdf456",
"A123asdf456",
"0qw#$56-398",
"B12376B6f90",
"12A12345BCt"
];
console.log(
inputStrings.map(v => v.replace(/^(A)|\D/g, "$1"))
);

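You can use the following regex: /(^A|\d)/g. It matches either an A at the start of the string or a single digit; collect the matches and join them: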
var arr = ['A123asdf456','0qw#$56-398','B12376B6f90','12A12345BCt', 'A-123asdf456'],
result = arr.map(s => s.match(/(^A|\d)/g).join(''));
console.log(result);
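One caveat with the match-and-join approach is that match() returns null when nothing matches, so calling .join on the result would throw for an input such as 'xyz'. A small sketch with a fallback follows; the helper name leadingAAndDigits is made up for illustration.

```javascript
// Hypothetical helper: keep a leading "A" (if any) plus every digit.
function leadingAAndDigits(s) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return (s.match(/^A|\d/g) || []).join('');
}

const samples = ['A123asdf456', '0qw#$56-398', 'B12376B6f90', '12A12345BCt', 'xyz'];
console.log(samples.map(leadingAAndDigits));
// [ 'A123456', '056398', '12376690', '1212345', '' ]
```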

Split string by all spaces except those in parentheses

I'm trying to split text like the following on spaces:
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}"
but I want it to ignore the spaces within parentheses. This should produce an array with:
var words = ["Text", "(what is)|what's", "a", "story|fable", "called|named|about", "{Search}|{Title}"];
I know this should involve some sort of regex with line.match(). Bonus points if the regex removes the parentheses. I know that word.replace() would get rid of them in a subsequent step.

Use the following approach with a specific regex pattern (based on negative lookahead assertions):
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}",
words = line.split(/(?!\(.*)\s(?![^(]*?\))/g);
console.log(words);
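(?!\(.*) asserts that the character at the split position is not an opening parenthesis (; since that character has to be whitespace anyway, this check is always satisfied
(?![^(]*?\)) ensures that the separator \s is not followed by a closing parenthesis ) without an opening one before it, i.e. that the space is not inside a parenthesized group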
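As an alternative to splitting, the tokens can be matched directly, which also makes the "bonus" step of removing the parentheses easy. This is only a sketch and assumes the parentheses are balanced and not nested.

```javascript
const line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}";

// Each token is a run of either a whole "(...)" group or a single non-space character.
const tokens = line.match(/(?:\([^)]*\)|\S)+/g) || [];

// Optional second step: drop the parentheses themselves.
const words = tokens.map(t => t.replace(/[()]/g, ''));

console.log(tokens); // second element: "(what is)|what's"
console.log(words);  // second element: "what is|what's"
```

Because a parenthesized group is consumed as one unit, the space inside it never ends a token.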

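Not a single regexp, but it does the job if you don't need to keep the parenthesized text together: it removes the parentheses and splits the text by spaces, so "what is" ends up as two separate elements.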
var words = line.replace(/[\(\)]/g,'').split(" ");

One approach which is useful in some cases is to replace spaces inside parens with a placeholder, then split, then unreplace:
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}";
var result = line.replace(/\((.*?)\)/g, m => m.replace(/ /g, 'SPACE'))
.split(' ')
.map(x => x.replace(/SPACE/g, ' '));
console.log(result);

single regex to capitalize first letter and replace dot

I'm trying to solve a simple problem with a regex. My input string is
firstname.ab
and I'm trying to output it as
Firstname AB
So the main aim is to capitalize the first letter of the string and replace the dot with a space. So I chose to write two regexes to solve it.
First one: to replace the dot with a space, /\./g
Second one: to capitalize the first letter, /\b\w/g
And my question is: can we do both operations with a single regex?
Thanks in advance !!

You can use a callback function inside the replace:
var str = 'firstname.ab';
var result = str.replace(/^([a-zA-Z])(.*)\.([^.]+)$/, function (match, grp1, grp2, grp3, offset, s) {
return grp1.toUpperCase() + grp2 + " " + grp3.toUpperCase();
});
alert(result);
grp1, grp2 and grp3 represent the capturing groups in the callback function. grp1 is the leading letter ([a-zA-Z]). Then we capture any number of characters other than a newline ((.*); if you have line breaks, use [\s\S]*). Then comes the literal dot \., which we do not capture since we want to replace it with a space. Lastly, ([^.]+)$ matches and captures the remaining substring, 1 or more characters other than a literal dot, up to the end of the string.
We can use capturing groups to re-build the input string this way.
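A shorter variant of the same callback idea, offered as a sketch, uses an alternation so that a single regex handles both the first character and the dotted suffix. The name formatName is hypothetical, and the pattern assumes input shaped like "firstname.ab" with one trailing dotted part.

```javascript
// Hypothetical helper for input shaped like "firstname.ab".
function formatName(str) {
  // Alternative 1: the first character of the string (no capture group involved).
  // Alternative 2: a dot plus the word characters after it, up to the end (group 1).
  return str.replace(/^\w|\.(\w+)$/g, (match, suffix) =>
    suffix === undefined
      ? match.toUpperCase()         // first character
      : ' ' + suffix.toUpperCase()  // ".ab" becomes " AB"
  );
}

console.log(formatName('firstname.ab')); // Firstname AB
```

The callback tells the two cases apart by checking whether the capture group took part in the match.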

var $input = $('#input'),
value = $input.val(),
value = value.split( '.' );
value[0] = value[0].charAt( 0 ).toUpperCase() + value[0].slice( 1 );
value[1] = value[1].toUpperCase();
value = value.join( ' ' );
$input.val( value );
It would be much easier if you simply split the value, process the string in the array, and join them back.
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" value="first.ab" id="input">

javascript regular expression test for 6 digit numbers only. comma separated

So this must pass:
454555, 939999 , 019999 ,727663
It's for a user entering 6-digit invoice numbers. It should fail if any number has 5 or 7 digits instead of 6. So 1234567, 123456 should fail, as one set has more than 6 digits.
So far I have:
[0-9]{6}(\s*,*,\s*[0-9]{6})*
whose only drawback is that it accepts numbers with 7 or more digits. I can't figure out if it's even possible to do both at this point: test for 6-digit numbers separated by a comma and spaces, and fail if any number is not exactly 6 digits.
Any help appreciated. Regular expressions are not my forte.
thanks
Norman

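You can write it using a regex, like the function below. Note that it validates a single six-digit value, so for a comma-separated list you would apply it to each item after splitting.
const isPassword = (password: string) => /^\d{6}$/.test(password);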
And here is an example test file below.
test('should recognize a valid password', () => {
expect(isPassword('123456')).toBe(true);
expect(isPassword('000000')).toBe(true);
});
test('should recognize an invalid password', () => {
expect(isPassword('asdasda1234')).toBe(false);
expect(isPassword('1234567')).toBe(false);
expect(isPassword('a123456a')).toBe(false);
expect(isPassword('11.11.11')).toBe(false);
expect(isPassword('aaaaaa')).toBe(false);
expect(isPassword('eeeeee')).toBe(false);
expect(isPassword('......')).toBe(false);
expect(isPassword('werwerwerwr')).toBe(false);
});

In order to validate the full string you can use this regex.
^(\s*\d{6}\s*)(,\s*\d{6}\s*)*,?\s*$
It accepts only six-digit numbers, and you have to enter at least one of them.
It also accepts a trailing comma followed by whitespace.

It's accepting more than six digit numbers because you're not anchoring the text, and for some odd reason you're optionally repeating the comma. Try something like this:
^[0-9]{6}(?:\s*,\s*[0-9]{6})*$
Also note that [0-9] is equivalent to \d, so this can be rewritten more concisely as:
^\d{6}(?:\s*,\s*\d{6})*$

Your regex does not match 7 digits in a row, but it also doesn't enforce that it matches the whole string. It just has to match some substring in the string, so it would also match each of these:
"1234512345612345612345"
"NaNaNaN 123456, 123456 BOOO!"
"!##$%^&*({123456})*&^%$##!"
Just add the start of string (^) and end of string ($) anchors to enforce that the whole string matches and it will work correctly:
^[0-9]{6}(\s*,*,\s*[0-9]{6})*$
Also note that ,*, could be shortened to ,+, and if you only want one comma in a row, just use ,, not ,* or ,+.
You can also replace [0-9] with \d:
^\d{6}(\s*,\s*\d{6})*$
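To see the effect of the anchors, here is a quick sketch that runs the question's original pattern and an anchored version against the two sample inputs from the question. The variable names are made up for illustration.

```javascript
const unanchored = /[0-9]{6}(\s*,*,\s*[0-9]{6})*/; // pattern from the question
const anchored = /^\d{6}(?:\s*,\s*\d{6})*$/;       // whole string must match

const good = '454555, 939999 , 019999 ,727663';
const bad = '1234567, 123456';

console.log(unanchored.test(good), unanchored.test(bad)); // true true
console.log(anchored.test(good), anchored.test(bad));     // true false
```

The unanchored pattern accepts the bad input because it only needs to find six digits somewhere inside it.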

Using only regex:
var commaSeparatedSixDigits = /^(?:\d{6}\s*,\s*)*\d{6}$/;
if (commaSeparatedSixDigits.test(myInput)) console.log( "Is good!" );
This says:
^ - Starting at the beginning of the string
(?:…)* - Find zero or more of the following:
\d{6} - six digits
\s* - maybe some whitespace
, - a literal comma
\s* - maybe some whitespace
\d{6} - Followed by six digits
$ - Followed by the end of the string
Alternatively:
var commaSeparatedSixDigits = /^\s*\d{6}(?:\s*,\s*\d{6})*\s*$/;
I leave it as an exercise to you to decipher what's different about this.
Using JavaScript + regex:
function isOnlyCommaSeparatedSixDigitNumbers( str ){
var parts = str.split(/\s*,\s*/);
for (var i=parts.length;i--;){
// Ensure that each part is exactly six digit characters
if (! /^\d{6}$/.test(parts[i])) return false;
}
return true;
}

I see a lot of complication here. Sounds to me like what you want is pretty simple:
/^(\d{6},)*\d{6}$/
Then we account for whitespace:
/^\s*(\d{6}\s*,\s*)*\d{6}\s*$/
But as others have noted, this is actually quite simple in JavaScript without using regex:
function check(input) {
var parts = input.split(',');
for (var i = 0, n = parts.length; i < n; i++) {
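var part = parts[i].trim();
// Each part must be exactly six characters, all of them digits
if (part.length !== 6 || part.split('').some(function (c) { return c < '0' || c > '9'; })) {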
return false;
}
}
return true;
}
Tested in the Chrome JavaScript console.

There isn't any real need for a regexp. Limit the input to only 6 characters, only accept digits, and ensure that the input has 6 digits (not shown here). So you would need:
HTML
<input type='text' name='invoice' size='10' maxlength='6' value='' onkeypress='evNumbersOnly(event);'>
JavaScript
<script>
function evNumbersOnly( evt ) {
//--- only accepts digits
//--- this handles incompatibilities between browsers
var theEvent = evt || window.event;
//--- this handles incompatibilities between browsers
var key = theEvent.keyCode || theEvent.which;
//--- convert the key code to a character
key = String.fromCharCode( key );
var regex = /[0-9]/; // Allowable characters: 0-9 only
if( !regex.test(key) ) {
theEvent.returnValue = false;
//--- this prevents the character from being displayed
if(theEvent.preventDefault) theEvent.preventDefault();
}
}
</script>
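A keypress handler only sees typed keys, so pasted text can bypass it, and the keypress event is marked as deprecated in current browser documentation. As a hedged alternative sketch, the value can be cleaned up after every change instead. The function name sanitizeInvoice and the commented wiring below are assumptions, not part of the original answer.

```javascript
// Hypothetical helper: keep digits only and cap the length at six.
function sanitizeInvoice(value) {
  return value.replace(/\D/g, '').slice(0, 6);
}

// Assumed wiring in a browser, using the input named "invoice" from the HTML above:
// const field = document.querySelector('input[name="invoice"]');
// field.addEventListener('input', () => { field.value = sanitizeInvoice(field.value); });

console.log(sanitizeInvoice('12a34-5678')); // 123456
```

This still leaves the final "exactly six digits" check to be done before the value is used.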
