I have a string that comes from HTML without tags but with escaped symbols, like:
abc&symbol1;def&symbol2;ghi&symbol3;jkl...
In JavaScript or TypeScript, how can I replace all sequences like &symbolN; with one fixed character like X so I get:
abcXdefXghiXjkl...
(by the way, the target is to get the length of a string with distinct HTML escaped characters like &pound; so that each one of them is counted like one character)
Update: maybe I've not explained accurately: symbol1, symbol2, ... do not mean that the string "symbol" repeats; they are completely distinct symbols that DO NOT repeat, e.g. "abc&pound;def&nbsp;ghi&euro;...". So there is no way to use a repeating textual pattern like "&symbol;".
Just to calculate length, you can cheat, as you say:
html.replace(/&[^;]+;/g, 'X').length
To convert HTML into text properly, one should use an HTML parser, not a regexp. For example, in a browser:
let e = document.createElement('div');
e.innerHTML = html;
let text = e.textContent;
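For the length-only shortcut, a somewhat stricter pattern avoids swallowing ordinary text between a stray & and a later ;. This is only a minimal sketch, assuming the references are named (&pound;), decimal (&#163;) or hexadecimal (&#xA3;); the name visibleLength is made up for illustration:

```javascript
// Hypothetical helper: counts each HTML character reference as one character.
// Assumes well-formed references: &name;, &#123; or &#x1F;.
const ENTITY = /&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);/g;

function visibleLength(html) {
  // The g flag makes replace() touch every reference, not just the first one.
  return html.replace(ENTITY, 'X').length;
}

console.log(visibleLength('abc&pound;def&nbsp;ghi&euro;jkl')); // 15
```

Unlike the DOM approach, this does not check that a named reference actually exists, so it is only suitable for counting.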
My case is: I have a string with HTML elements:
<a href="something+specific_string" title="testing">This is a text and "specific_string"</a>
I need a Regex to match only the one that is not in a HTML attribute.
This is my current Regex, it works but it gives a false positive when the string is wrapped by double quotes
((?!\"[\w\s]*)specific_string(?![\w\s]*\"))
If you want to get what's inside the tag, you could try split(), cutting the string at every ">" and "<", basically like this:
let string = "<a href='something+specific_string' title='testing'>This is a text and 'specific_string'</a>";
string = string.split('>');
string = string[1].split('<');
console.log(string)
So, when you want to manipulate it, just use element 0 of the resulting array. It's not a regex like you want, but it's an idea.
Though it can suffice in simple cases, you should know it's often said that RegExp is ill-suited for parsing HTML, and depending on the environment you could be better off using more robust techniques. (There's http://htmlparsing.com/ dedicated to the topic, but it doesn't yet discuss JS.)
That said, the following works in Chrome 107 and Node 16.13.
(s=>s.match(/(?<=>[^<]*|^[^<]*)specific_string/))
('<a href="something+specific_string" title="testing">This is a text and "specific_string"</a>')
It uses look-behind. In lieu of that you could use /(>[^<]*|^[^<]*)(specific_string)/ and compensate index/lengths to get the position of a match...
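The variant without look-behind could be sketched roughly like this. The helper name indexInText is hypothetical, and the sketch assumes the search term contains no regex metacharacters and that no "<" or ">" appears inside attribute values:

```javascript
// Hypothetical helper: index of `needle` in text content (outside tags), or -1.
// Assumes `needle` has no regex metacharacters and attributes contain no "<" or ">".
function indexInText(html, needle) {
  const re = new RegExp('(>[^<]*|^[^<]*)(' + needle + ')');
  const m = re.exec(html);
  // m.index points at the start of group 1, so skip over its length.
  return m ? m.index + m[1].length : -1;
}

const html = '<a href="something+specific_string" title="testing">' +
             'This is a text and "specific_string"</a>';
const pos = indexInText(html, 'specific_string');
console.log(html.slice(pos - 1, pos + 16)); // "specific_string" (with its quotes)
```

The occurrence inside href is skipped because no ">" precedes it and the string does not start with text.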
As you answer in a comment that you'll replace in user-provided HTML, I encourage you to consider security implications (namely XSS).
Back on the topic of parsing HTML w/o RegExp we obviously have the techniques in a web browser and I couldn't stop myself writing a quick and dirty textNode replacer in web JS, working in Chrome 107:
((html, fun) => {
const el = document.createElement('body')
el.innerHTML = html
const X = new XPathEvaluator, R = X.evaluate('//*[text()]', el)
const A = []; for (let n; n = R.iterateNext();) A.push(n) // mutating el while iterating XPathResult is illegal
for (let n of A) fun(n)
return el.innerHTML})
('<a href="something+specific_string" title="testing">This is a text and "specific_string"</a>',
n => n.innerHTML = n.innerHTML
.replace(/specific_string/, '<b>replaced</b>'))
I'm trying to split a string by either three or more pound signs or three or more spaces.
I'm using a function that looks like this:
var produktDaten = dataMatch[0].replace(/\x03/g, '').trim().split('/[#\s]/{3,}');
console.log(produktDaten + ' is the data');
I need to clean the data up a bit, hence the replace and trim.
The output I'm getting looks like this:
##########################################################################MA-KF6###Beckhoff###EL1808 BECK.EL1808###MA-KF7###Beckhoff###EL1808 BECK.EL1808###MA-KF12###Beckhoff###EL1808 BECK.EL1808###MA-KF13###Beckhoff###EL1808 BECK.EL1808###MA-KF14###Beckhoff###EL1808 BECK.EL1808###MA-KF15###Beckhoff###EL1808 BECK.EL1808###MA-KF16###Beckhoff###EL1808 BECK.EL1808###MA-KF19###Beckhoff###EL1808 BECK.EL1808 is the data
How is this possible? Irrespective of the input, shouldn't the pound and multiple spaces be deleted by the split?
You passed a string to split(), and the input does not contain that literal string, so nothing gets split. I think you wanted to use
/[#\s]{3,}/
like here:
var produktDaten = "##########################################################################MA-KF6###Beckhoff###EL1808 BECK.EL1808###MA-KF7###Beckhoff###EL1808 BECK.EL1808###MA-KF12###Beckhoff###EL1808 BECK.EL1808###MA-KF13###Beckhoff###EL1808 BECK.EL1808###MA-KF14###Beckhoff###EL1808 BECK.EL1808###MA-KF15###Beckhoff###EL1808 BECK.EL1808###MA-KF16###Beckhoff###EL1808 BECK.EL1808###MA-KF19###Beckhoff###EL1808 BECK.EL1808";
console.log(produktDaten.replace(/\x03/g, '').trim().split(/[#\s]{3,}/));
This /[#\s]{3,}/ regex matches 3 or more chars that are either # or whitespace.
NOTE: just removing the ' around it won't fix the issue, since the pattern would still contain an unescaped / with the quantifier applied to it. You actually need to quantify the character class, [#\s].
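To make the difference concrete, here is a small sketch on a shortened sample of the same shape (the sample string is abbreviated for illustration). It also drops the empty first element that the leading run of # produces:

```javascript
// Shortened sample in the same shape as the data above.
const raw = '######MA-KF6###Beckhoff###EL1808 BECK.EL1808###MA-KF7###Beckhoff';

// A string separator is searched for literally, so nothing is split here.
console.log(raw.split('/[#\\s]/{3,}').length); // 1

// A regex literal splits on runs of 3+ "#"/whitespace characters. The leading
// run yields an empty first element, which filter(Boolean) removes.
const fields = raw.split(/[#\s]{3,}/).filter(Boolean);
console.log(fields);
// [ 'MA-KF6', 'Beckhoff', 'EL1808 BECK.EL1808', 'MA-KF7', 'Beckhoff' ]
```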
I have a string produced by JSON.stringify in JavaScript (Node). I want to replace every piece of text in the string that starts with 'ab' followed by some digits (at least one) with 'ab^^^^^^', where the number of '^'s equals the number of digits after ab. Text starting with ab occurs at least once; in this example it occurs twice. I need help with the regex and with replacing the string.
string - in this, text starting with ab occurs twice.
var str = JSON.stringify({"abc":{"idcardno":"ertyuiop","form":{"somestring":"This string:\n- can have multiple \nab12345ab5677\n","flag":"true","flag2":"false"},"anothertext":"samplestring","numbetstr":"7"}});
after the regex replace it should be like this
{"abc":{"idcardno":"ertyuiop","form":{"somestring":"This string:\n- can have multiple \nab^^^^^ab^^^^\n","flag":"true","flag2":"false"},"anothertext":"samplestring","numbetstr":"7"}}
Edit
As per the post below, the following will be the contents of obj.abc.form.somestring, spanning multiple lines. How do I do the regex replace mentioned above on this object?
This string:
- can have multiple
ab12345ab56778
Don't process stringified JSON with regexp. Process the JavaScript object itself, then stringify. In your case, assuming obj is the input:
obj.abc.form.somestring = transform(obj.abc.form.somestring);
str = JSON.stringify(obj);
where transform is a regexp/replace making the transformation you want.
@torazaburo is right, it's bad practice to manipulate JSON directly. Once you get ahold of the string in obj.abc.form.somestring, though, you can use replace, passing a function:
str.replace(/ab\d+/g, function(match) {return match.replace(/\d/g,'^')})
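Putting both answers together, the whole flow might look like the sketch below. The object is a trimmed copy of the one in the question, and maskDigits is a hypothetical name for the transform:

```javascript
// Trimmed sample object shaped like the one in the question.
const obj = {
  abc: {
    idcardno: 'ertyuiop',
    form: { somestring: 'This string:\n- can have multiple \nab12345ab5677\n' }
  }
};

// Hypothetical helper: masks the digits that follow "ab" with "^", one per digit.
function maskDigits(text) {
  return text.replace(/ab\d+/g, match => 'ab' + '^'.repeat(match.length - 2));
}

// Transform the property first, then stringify the object.
obj.abc.form.somestring = maskDigits(obj.abc.form.somestring);
const masked = JSON.stringify(obj);
console.log(masked);
```

Working on the property value means the real newlines in the multi-line string need no special handling.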
I'm trying to do something which seems fairly basic, but can't seem to get it working.
I'm trying to strip the characters after the last instance of an underscore.
I have this long Query String:
json_data=demo_title=Demo+title&proc1_script=script.sh+parameters&proc1_chk_make=on&outputp2_value=&demo_input_description=hola+mundo&outputp4_visible=on&outputp4_info=&inputdata1_max_pixels=1024000&tag=&outputp1_id=nanana&proc1_src_compresion=zip&proc1_chk_cmake=off&outputp3_description=&outputp3_value=&inputdata1_description=input+data+description&inputp2_description=bien%3F&inputp3_description=funciona&proc1_cmake=-D+CMAKE_BUILD_TYPE%3Astring%3DRelease+&outputp2_visible=on&outputp3_visible=on&outputp1_type=header&inputp1_type=text&demo_params_description=va+bien&outputp1_description=&inputdata1_type=image2d&proc1_chk_script=off&demo_result_description=win%3F&outputp2_id=nanfdsvfa&inputp1_description=funciona&demo_wait_description=boh&outputp4_description=&inputp2_type=integer&inputp2_id=papapa&outputp1_value=&outputp3_id=nananartrtrt&inputp3_id=pepepe&outputp3_type=header&inputp3_visible=+off&outputp1_visible=on&inputdata1_id=id_lsd&outputp4_value=&inputp2_visible=on&proc1_source=lsd-1.5.zip&inputp3_value=si&proc1_make=-j4+-C+&images_config_file=cfgmydemo.cfg&outputp2_type=header&proc1_subdir=xxx-1.5&proc1_url=http%3A%2F%2Fwww.ipol.im%2Fpub%2Falgo%2F...&inputdata1_image_depth=1x8i&inputp1_id=popopo&inputp1_value=si&inputp2_value=no&demo_data_filename=data_saved.cfg&inputdata1_info=info_lsd&outputp3_info=&inputdata1_image_format=.pgm&outputp1_info=&inputdata1_compress=False&inputp1_visible=on&proc1_id=lsd&outputp4_id=nana&outputp2_description=&outputp4_type=header&outputp2_info=&inputp3_type=float&&tag&inputp4_iddcksmdclk&inputp4_typetext&inputp4_descriptionkldmsclk&inputp4_valueklcdmkl&inputp4_infoclkdmscdl
Now I replace the separator = with %24+ and & with +%23+, using fr=fr.replace(/\&/g,"+%23+");
Separator
javascript    Mako
=             %24+
&             +%23+
But the result is:
json_data%24+demo_title%24+Demo+title+%23+proc1_script%24+script.sh+parameters+%23+proc1_chk_make%24+on+%23+outputp2_value%24++%23+demo_input_description%24+hola+mundo+%23+outputp4_visible%24+on+%23+outputp4_info%24++%23+inputdata1_max_pixels%24+1024000+%23+tag%24++%23+outputp1_id%24+nanana+%23+proc1_src_compresion%24+zip+%23+proc1_chk_cmake%24+off+%23+outputp3_description%24++%23+outputp3_value%24++%23+inputdata1_description%24+input+data+description+%23+inputp2_description%24+bien%3F+%23+inputp3_description%24+funciona+%23+proc1_cmake%24+-D+CMAKE_BUILD_TYPE%3Astring%3DRelease++%23+outputp2_visible%24+on+%23+outputp3_visible%24+on+%23+outputp1_type%24+header+%23+inputp1_type%24+text+%23+demo_params_description%24+va+bien+%23+outputp1_description%24++%23+inputdata1_type%24+image2d+%23+proc1_chk_script%24+off+%23+demo_result_description%24+win%3F+%23+outputp2_id%24+nanfdsvfa+%23+inputp1_description%24+funciona+%23+demo_wait_description%24+boh+%23+outputp4_description%24++%23+inputp2_type%24+integer+%23+inputp2_id%24+papapa+%23+outputp1_value%24++%23+outputp3_id%24+nananartrtrt+%23+inputp3_id%24+pepepe+%23+outputp3_type%24+header+%23+inputp3_visible%24++off+%23+outputp1_visible%24+on+%23+inputdata1_id%24+id_lsd+%23+outputp4_value%24++%23+inputp2_visible%24+on+%23+proc1_source%24+lsd-1.5.zip+%23+inputp3_value%24+si+%23+proc1_make%24+-j4+-C++%23+images_config_file%24+cfgmydemo.cfg+%23+outputp2_type%24+header+%23+proc1_subdir%24+xxx-1.5+%23+proc1_url%24+http%3A%2F%2Fwww.ipol.im%2Fpub%2Falgo%2F...+%23+inputdata1_image_depth%24+1x8i+%23+inputp1_id%24+popopo+%23+inputp1_value%24+si+%23+inputp2_value%24+no+%23+demo_data_filename%24+data_saved.cfg+%23+inputdata1_info%24+info_lsd+%23+outputp3_info%24++%23+inputdata1_image_format%24+.pgm+%23+outputp1_info%24++%23+inputdata1_compress%24+False+%23+inputp1_visible%24+on+%23+proc1_id%24+lsd+%23+outputp4_id%24+nana+%23+outputp2_description%24++%23+outputp4_type%24+header+%23+outputp2_info%24++%23+inputp3_type%24+float+%23++%23+tag+%
23+inputp4_iddcksmdclk+%23+inputp4_typetext+%23+inputp4_descriptionkldmsclk+%23+inputp4_valueklcdmkl+%23+inputp4_infoclkdmscdl
Now I would like to know how to restore the = after the value json_data.
To explain: in the query string there is the string json_data+%23+, and I want to replace this +%23+ with =.
How?
Strip the characters after the last instance of an underscore:
json_data.substring(0, json_data.lastIndexOf("_"));
Replace +%23+ with =
json_data.replace("+%23+", "=");
However, if you're trying to turn all the %xx sequences back into the characters they stand for, you should URL-decode the string instead.
That would probably have to be something like this (note the g flag, so that every + is replaced, not just the first one):
decodeURIComponent(json_data.replace(/\+/g, '%20'));
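As a rough sketch of that decoding step applied to single values from the query string above (decodeFormValue is a hypothetical helper name):

```javascript
// Hypothetical helper: decodes one form-encoded value.
// "+" must become a space before percent-decoding; a plain string pattern
// passed to replace() would only replace the first "+".
function decodeFormValue(value) {
  return decodeURIComponent(value.replace(/\+/g, ' '));
}

console.log(decodeFormValue('-D+CMAKE_BUILD_TYPE%3Astring%3DRelease+'));
// "-D CMAKE_BUILD_TYPE:string=Release "
console.log(decodeFormValue('hola+mundo')); // "hola mundo"
```

decodeURIComponent throws a URIError on malformed percent sequences, so untrusted input may need a try/catch.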
I am trying to read in a list of words separated by spaces from a textbox with Javascript. This will eventually be in a website.
Thank you.
This should pretty much do it:
<textarea id="foo">some text here</textarea>
<script>
var element = document.getElementById('foo');
var wordlist = element.value.split(' ');
// wordlist now contains 3 values: 'some', 'text' and 'here'
</script>
A more accurate way to do this is to use regular expressions to strip extra spaces first, and then use @Aron's method. Otherwise, if you have something like "a   b    c  d e" (with runs of several spaces), you will get an array with a lot of empty string elements, which I'm sure you don't want.
Therefore, you should use:
<textarea id="foo">
this is some very bad
formatted text a
</textarea>
<script>
var str = document.getElementById('foo').value;
str = str.replace(/\s+/g, ' ').replace(/^\s+|\s+$/g, '');
var words = str.split(' ');
// words will have exactly 8 items (the 8 words in the textarea)
</script>
The first .replace() function replaces all consecutive spaces with 1 space and the second one trims the whitespace from the start and the end of the string, making it ideal for word parsing :)
Instead of splitting by whitespaces, you can also try matching sequences of non-whitespace characters.
var words = document.getElementById('foo').value.match(/\S+/g);
The problem with the splitting method is that when there is leading or trailing whitespace, you get an empty element for it. For example, " hello world " would give you ["", "hello", "world", ""].
You may strip the whitespace before and after the text, but there is another problem: an empty string. For example, splitting "" will give you [""].
Instead of finding what we don't want and splitting on it, I think it is better to look for what we want.
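One thing to watch for with the matching approach: match() with the g flag returns null rather than an empty array when nothing matches. A small sketch with a fallback (getWords is a hypothetical helper name):

```javascript
// Hypothetical helper: returns the words in `text`, or [] when there are none.
function getWords(text) {
  // match() with the g flag returns null (not an empty array) when nothing matches.
  return text.match(/\S+/g) || [];
}

console.log(getWords('  hello   world  ')); // [ 'hello', 'world' ]
console.log(getWords(''));                  // []
```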