Tell Browsers to replace UTF-8 word break character with hyphen?

I just discovered the great Unicode character \u200b (zero-width space), which tells browsers where they can break a word if it doesn't fit its parent container:
MySuperLongWordThat\u200bWontFitItsParentContainer
will be displayed as
MySuperLongWordThat
WontFitItsParentContainer
Is there any way to tell the Browser to automatically replace \u200b with a hyphen - in case the word will break?
I thought about replacing it manually with JavaScript, but I do not know any event that will fire when the word breaks.

That's not what zero-width space is intended for.
The CSS hyphens property can be used to help, but you'll notice from its documentation that if you want to manually insert word-wrap points, you should use &shy; (U+00AD), the "soft hyphen".
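A minimal sketch of that suggestion, assuming the text already carries \u200b at the intended break points: swapping each one for a soft hyphen (U+00AD) leaves the hyphen invisible until the browser actually breaks the word there. The helper name zwspToSoftHyphen is made up for illustration.

```javascript
// Hypothetical helper: swap every zero-width space (U+200B)
// for a soft hyphen (U+00AD). Browsers only draw the hyphen
// when they break the line at that point.
function zwspToSoftHyphen(text) {
  return text.replace(/\u200B/g, '\u00AD');
}

const word = 'MySuperLongWordThat\u200BWontFitItsParentContainer';
const hyphenatable = zwspToSoftHyphen(word);
// hyphenatable can now be assigned to an element's textContent.
```

No resize or line-break event is needed with this approach, since the browser decides at layout time whether to show the hyphen.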

Related

Terminology: Is tab a non-breaking space?

AFAIK, nbsp (non-breaking space) is this: &nbsp;. But tab (\t) is also a non-breaking space, right? I mean, it doesn't create a new line.
If everything above is correct, then what would you call a variable that can contain either &nbsp; or \t? Something ugly like tabOrNbsp?
I am asking because currently in my code a variable called nbsp is used for that purpose, but the ambiguity makes me sick. Or is it correct as it is now?
P.S. This question is so dumb, but don't hate, now I see how dumb it was. In the end the commenters and the answerer really helped to sort things out!

I think you've misunderstood the term "non-breaking space".
Normally, although a space or tab character doesn't require a line break, it allows line-wrapping. So if a paragraph goes on long enough, it will eventually spread across multiple lines, even if you use lots of spaces and tabs.
A "non-breaking space" is a space that does not allow line-wrapping; if two words have a non-breaking space between them, then those two words will always end up on the same line, even if they're at the end of a line and you would otherwise expect line-wrapping between them. In Unicode, non-breaking space is coded as a specific character, U+00A0 NO-BREAK SPACE, and in HTML, you can use the entity reference &nbsp; to conveniently embed this character. This character is different from the normal space character, which in Unicode is coded as U+0020 SPACE.
If I've correctly understood the idea that you have in mind, the closest term is probably "linear white space" (LWS or LWSP), which means a sequence of space or tab characters (the idea being that these are whitespace characters "within a line", not forcing a line break).
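To make the distinction concrete, here is a small sketch in JavaScript. The constant and function names (SPACE, NBSP, TAB, isLws) are made up for illustration, and "linear white space" is taken to mean a run of plain spaces and tabs, as described above.

```javascript
const SPACE = '\u0020'; // ordinary space, allows line-wrapping
const NBSP = '\u00A0';  // NO-BREAK SPACE, forbids line-wrapping
const TAB = '\t';       // tab, also allows line-wrapping

// Linear white space as described above: one or more spaces or tabs.
const linearWhiteSpace = /^[ \t]+$/;

function isLws(s) {
  return linearWhiteSpace.test(s);
}
```

Note that JavaScript's \s class matches U+00A0 as well, so /\s/ alone cannot tell a non-breaking space from an ordinary one.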

RegExp must have \w+ and \s+ characters

I've been trying to create a RegExp that makes sure a user has entered at least one word and at least one space. I tried to use this:
/\w+\s+/
But that requires the space to come AFTER the word. I just want to make sure there are both in a string. They don't need to be in the order of the above RegExp.
How can I make the RegExp work, but without matching the order?

/(?=.*?\w)(?=.*?\s)/
?= means "look-ahead", and .*? means "any number of characters, as few as possible".
So the pattern reads: "find any number of characters, then a \w" and "find any number of characters, then a \s".
Another thing to note about how this works: look-aheads are zero-width. They don't consume any characters, so both checks start from the same position, and that is why the two can appear in any order.
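A quick way to check that behaviour is to run the pattern against a few inputs. This is only a sketch; the variable name hasWordAndSpace is an assumption.

```javascript
// Both look-aheads are tried from the same position,
// so the word character and the whitespace may come in either order.
const hasWordAndSpace = /(?=.*?\w)(?=.*?\s)/;

const results = {
  wordThenSpace: hasWordAndSpace.test('hello world'), // true
  spaceThenWord: hasWordAndSpace.test(' hello'),      // true
  wordOnly: hasWordAndSpace.test('hello'),            // false
  spaceOnly: hasWordAndSpace.test('   '),             // false
};
```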

You have two things:
Is there a word character?
Is there a space?
Two things.
str.match(/\w/)
str.match(/\s/)
So why are you trying to do them as one step?
if( str.match(/\w/) && str.match(/\s/))
There are a lot of answers to my question. However, I do not want to simply pick the one that is upvoted. Please give a detailed explanation of why your regex works, and maybe why mine doesn't.
My answer provides the simplest solution. It is very clear to anyone reading it that we are checking "if it has a word character, and if it contains a space character". It is also very easy to expand on, such as if you want to add another check.
zyklus' answer (/(?=.*?\w)(?=.*?\s)/) is the fastest when speed-tested on a 50 KB string of input. In more common cases (i.e. 100 characters at most), this speed difference will be practically non-existent. It is twice as fast as my answer, but "2 * very small number = very small number". It's easy enough to add new test cases (just add another (?=.*something) block), but it is less obvious to a human reader what it does.
Jacob's answer ((\w+.*\s+)|(\s+.*\w+)) does quite literally what you asked: it checks first for a word character followed by a space character, then checks the other way around before failing. It works, but it is slower. Furthermore, if you decide to add a new test case, you'd get something like (\w+.*\s+.*\d+)|(\w+.*\d+.*\s+)|(\s+.*\w+.*\d+)|(\s+.*\d+.*\w+)|(\d+.*\w+.*\s+)|(\d+.*\s+.*\w+). It only gets worse if you add a fourth test (24 arrangements to check), and the result is unreadably ugly. Do not use this answer.
Other answers are variants of existing ones.
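For reference, the two-step check from this answer could be wrapped up roughly like this. The function names are made up, and RegExp test is used instead of match because only a yes/no result is needed.

```javascript
// Two independent checks, one per requirement.
function hasWordAndSpace(str) {
  return /\w/.test(str) && /\s/.test(str);
}

// Adding another requirement is one more check, e.g. a digit.
function hasWordSpaceAndDigit(str) {
  return hasWordAndSpace(str) && /\d/.test(str);
}
```

Each requirement stays readable on its own, which is the point made above about ease of extension.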

If you need to do it in one RegEx for some reason:
(\w+.*\s+)|(\s+.*\w+)
Can be handy if you're working with a library that only enables you to use a single regular expression.

Trim line breaks from a string in JavaScript without frameworks [duplicate]

Possible Duplicate:
How do I trim a string in javascript?
Using JavaScript without any frameworks, how would you trim line breaks? "Trim" here is as defined by PHP: remove the leading and trailing line breaks while preserving any line breaks between the first and the last non-white-space characters.
By default, most people will want to remove leading/trailing spaces as well as line breaks, though some may want to keep the spaces and trim just the line breaks. It also generally helps people learning to code to see two working examples and how they relate, so I'm looking for a way to trim both line breaks and spaces.
It'd also be good to see how to trim just the leading/trailing line breaks while preserving spaces (which may or may not be included in the main answer).

I think this works:
string.replace(/^\s+|\s+$/g,"");

trim in general can be defined as .replace(/^\s+|\s+$/g,''), but since you want only vertical whitespace you should use .replace(/^[\r\n]+|[\r\n]+$/g,'').
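Side by side, the two variants might look like this. It is a sketch only, and the function names trimWhitespace and trimLineBreaks are assumptions.

```javascript
// Remove all leading/trailing white space (spaces, tabs, line breaks).
function trimWhitespace(str) {
  return str.replace(/^\s+|\s+$/g, '');
}

// Remove only leading/trailing line breaks; spaces are left alone.
function trimLineBreaks(str) {
  return str.replace(/^[\r\n]+|[\r\n]+$/g, '');
}

const sample = '\n\n  a\nb  \r\n';
const all = trimWhitespace(sample);       // 'a\nb'
const onlyBreaks = trimLineBreaks(sample); // '  a\nb  '
```

In both cases the line break between a and b survives, because without the m flag ^ and $ only match at the very start and end of the string.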

Backreference each character

For the sake of simplicity & learning something new, please don't suggest using two separate replace functions. I know that's an option but I would rather also know how to do this (or if it's not possible).
'<test></test>'.replace(/<|>/g,'$&'.charCodeAt(0))
This is what I've got so far. This sample code is, as you can tell, for another piece of code to escape HTML entities while still using innerHTML (because I do intend to include a few HTML entities such as small images, so again please don't suggest textContent).
Since I'm trying to replace both < and >, the problem is converting each individual one to their respective character codes. Since regular expressions allow for this "OR" condition as well as backreferences to each one, I'm hoping there's a way to get the reference of each individual character as they're replaced. $& will return <><> (because they're replaced in that order), but I don't know how to get them as they're replaced and take their character codes for the HTML entities. The problem is, I don't know what to use in this case if anything.
If that explanation wasn't clear enough, I want it to be something like this (and this is obviously not going to work, it'll best convey what I mean):
Assuming x is the index of the character being replaced,
'<test></test>'.replace(/<|>/g,'$&'.charCodeAt(x))
Hopefully that makes more sense. So, is this actually possible in some way?

'<test></test>'.replace(/[<>]/g,function(a) {return '&#'+a.charCodeAt(0)+';';});
I've put the characters in a square-bracket-thing (don't know its proper name). That way you can add whatever characters you want.
The above will return:
&#60;test&#62;&#60;/test&#62;
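As a sketch of the "add whatever characters you want" point, here is the same callback approach with & and " added to the square brackets. The function name escapeToNumericEntities is made up for illustration.

```javascript
// The replacer function is called once per match and receives the
// matched character, so each one gets its own character code.
function escapeToNumericEntities(text) {
  return text.replace(/[<>&"]/g, function (ch) {
    return '&#' + ch.charCodeAt(0) + ';';
  });
}

const escapedTags = escapeToNumericEntities('<test></test>');
// escapedTags is '&#60;test&#62;&#60;/test&#62;'
```

A plain string such as '$&'.charCodeAt(0) is evaluated once, before replace even runs, which is why a function is needed here.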

Javascript regex syntax for HTML5 input validation

So there have been plenty of questions, and after filtering through a few, I still don't know how to go about this...
Pattern for:
Alphabets ONLY, no case sensitivity, no limit on character count or words, minimum 3 characters...
I have
pattern="[A-z]{3,}"
That gives me everything, except that I'm limited to one word only... :-(
Edit: Let me be a little more clear on what I want the validation to achieve for me...
I'm using it to capture a person's name. But I do not want any special characters or numerals involved, so no "John Doe Jr.", as the '.' will get rejected, but I want to be able to capture double, or even single character portions, whilst maintaining 'global' 3 char minimum limit...

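All you have to do to allow spaces as well is to add a space to the character class where you have [A-z]. While you're there, write the range as A-Za-z: in ASCII, A-z also covers the six punctuation characters between Z and a ([, \, ], ^, _ and `).
So it becomes:
pattern="[A-Za-z ]{3,}"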
Hope that helps.
Note, however, that this will prevent other types of white space characters. I assume this is what you want, since you're being quite restrictive with the rest of the character set, but it's worth pointing out that non-breaking spaces, carriage returns, and other white space will be blocked in the above. If you want to allow them, use \s instead of just a space: this will match any white space character.
Finally, it's worth pointing out that the standard alphabet is often insufficient even for plain English text. There are valid English words with accents, as well as apostrophes and other punctuation. You haven't specified what the field is being used for, so I'll assume this is not an issue, but I felt it was worth pointing out nevertheless.
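To try such patterns outside a form, the following sketch mimics how browsers apply the pattern attribute, namely by requiring the whole value to match. The helper name matchesPattern is an assumption, and the sketch ignores the regex flags a browser adds.

```javascript
// Hypothetical helper: wrap the pattern so that it must match
// the entire value, as the pattern attribute does.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$').test(value);
}

const checks = {
  fullName: matchesPattern('[A-Za-z ]{3,}', 'John Doe'),       // true
  shortParts: matchesPattern('[A-Za-z ]{3,}', 'J R Doe'),      // true
  tooShort: matchesPattern('[A-Za-z ]{3,}', 'Jo'),             // false
  withPeriod: matchesPattern('[A-Za-z ]{3,}', 'John Doe Jr.'), // false
  underscoreAz: matchesPattern('[A-z]{3,}', 'a_b'),            // true
};
```

The last check shows why the A-z range is risky: the underscore sits between Z and a in ASCII, so it slips through.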

It is difficult to see what the question is. You are matching a String, not a set of words.
If your pattern is "a list of words, each of the words alphabetical only and separated by whitespace", then the regex would be
([A-Za-z]{3,}\\s*)+
Edited to answer to updated question.
[A-Za-z\\s]*([A-Za-z]{3,})+[A-Za-z\\s]* (works in Java)

How about pattern = "[A-z\s]{3,}"?

Develop Reference

JavaScript is the programming language of the Web.
