Replace only first occurrence with javascript - javascript

I have a string that comes from HTML without tags but with escaped symbols, like:
abc&symbol1;def&symbol2;ghi&symbol3;jkl...
In JavaScript or TypeScript, how can I replace all sequences like &symbolN; with one fixed character like X so I get:
abcXdefXghiXjkl...
(By the way, the goal is to get the length of a string containing distinct HTML-escaped characters like &pound;, so that each one of them is counted as one character.)
Update: maybe I didn't explain this accurately. symbol1, symbol2, ... do not mean that the string "symbol" repeats; they stand for completely distinct symbols that DO NOT repeat, e.g. "abc&pound;def&nbsp;ghi&euro;...". So there is no way to use a repeating textual pattern like "&symbol;".

Just to calculate length, you can cheat, as you say:
html.replace(/&[^;]+;/g, 'X').length
To convert HTML into text properly, one should use an HTML parser, not a regexp. For example, in a browser:
let e = document.createElement('div');
e.innerHTML = html;
let text = e.textContent;
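As a minimal sketch of the length trick (not a full HTML decoder), the helper below counts each &...; sequence as one character. The name visibleLength is made up for illustration, and the pattern assumes every & in the input starts a well-formed entity that ends with a semicolon.

```javascript
// Hypothetical helper: counts every "&...;" entity as a single character.
// Assumes each "&" starts a well-formed entity terminated by ";".
function visibleLength(html) {
  // The g flag is required; without it only the first entity is replaced.
  return html.replace(/&[^;]+;/g, 'X').length;
}

console.log(visibleLength('abc&pound;def&nbsp;ghi&euro;jkl')); // 15
```

A bare & that is not part of an entity would throw the count off, which is one reason the parser-based approach is the safer general answer.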

Related

Regex to find a specific string that is not in a HTML attribute

My case is: I have a string with HTML elements:
<a href='something+specific_string' title='testing'>This is a text and "specific_string"</a>
I need a Regex to match only the one that is not in a HTML attribute.
This is my current regex. It works, but it gives a false positive when the string is wrapped in double quotes:
((?!\"[\w\s]*)specific_string(?![\w\s]*\"))
If you want to get what's inside the tag, you could use split() to cut the string at every ">" or "<", like this:
let string = "<a href='something+specific_string' title='testing'>This is a text and 'specific_string'</a>";
string = string.split('>');
string = string[1].split('<');
console.log(string)
So, when you want to manipulate it, just use index 0 of the resulting array. It's not a regex like you wanted, but it's an idea.
Though it can suffice in simple cases, you should know it's often said that RegExp is ill-suited for parsing HTML, and depending on the environment you could be better off using more robust techniques. (http://htmlparsing.com/ is dedicated to the topic, though it doesn't discuss JS yet.)
That said, the following works in Chrome 107 and Node 16.13.
(s=>s.match(/(?<=>[^<]*|^[^<]*)specific_string/))
('This is a text and "specific_string"')
It uses look-behind. In lieu of that, you could use /(>[^<]*|^[^<]*)(specific_string)/ and add the length of the first group to the match index to get the position of the match.
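The capture-group alternative could look roughly like this. It is only a sketch: findOutsideTags is a hypothetical name, and the quantifiers are made lazy (a small change from the pattern quoted above) so that the first occurrence inside a text run is reported, as the look-behind version would do.

```javascript
// Sketch of the capture-group alternative for engines without look-behind.
// findOutsideTags is a hypothetical name used only for this example.
function findOutsideTags(html, needle) {
  // Escape regex metacharacters so the needle is matched literally.
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Group 1 is the text run that precedes the needle, group 2 the needle itself.
  const re = new RegExp('(>[^<]*?|^[^<]*?)(' + escaped + ')');
  const m = re.exec(html);
  if (m === null) return -1;
  // m.index points at the start of group 1, so skip over its length.
  return m.index + m[1].length;
}

console.log(findOutsideTags('This is a text and "specific_string"', 'specific_string')); // 20
```

Like the look-behind version, this only looks at "<" and ">" characters, so it inherits the usual limits of handling HTML with a regex.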
As you answer in a comment that you'll replace in user-provided HTML, I encourage you to consider security implications (namely XSS).
Back on the topic of parsing HTML w/o RegExp we obviously have the techniques in a web browser and I couldn't stop myself writing a quick and dirty textNode replacer in web JS, working in Chrome 107:
((html, fun) => {
  const el = document.createElement('body')
  el.innerHTML = html
  const X = new XPathEvaluator(), R = X.evaluate('//*[text()]', el)
  // Collect the nodes first: mutating el while iterating an XPathResult is illegal
  const A = []; for (let n; n = R.iterateNext();) A.push(n)
  for (let n of A) fun(n)
  return el.innerHTML
})
('<a href="something+specific_string" title="testing">This is a text and "specific_string"</a>',
  n => n.innerHTML = n.innerHTML
    .replace(/specific_string/, '<b>replaced</b>'))

why isn't this javascript regex split function working?

I'm trying to split a string by either three or more pound signs or three or more spaces.
I'm using a function that looks like this:
var produktDaten = dataMatch[0].replace(/\x03/g, '').trim().split('/[#\s]/{3,}');
console.log(produktDaten + ' is the data');
I need to clean the data up a bit, hence the replace and trim.
The output I'm getting looks like this:
##########################################################################MA-KF6###Beckhoff###EL1808 BECK.EL1808###MA-KF7###Beckhoff###EL1808 BECK.EL1808###MA-KF12###Beckhoff###EL1808 BECK.EL1808###MA-KF13###Beckhoff###EL1808 BECK.EL1808###MA-KF14###Beckhoff###EL1808 BECK.EL1808###MA-KF15###Beckhoff###EL1808 BECK.EL1808###MA-KF16###Beckhoff###EL1808 BECK.EL1808###MA-KF19###Beckhoff###EL1808 BECK.EL1808 is the data
How is this possible? Irrespective of the input, shouldn't the pound and multiple spaces be deleted by the split?
You passed a string to split, and the input does not contain that literal string. I think you wanted to use
/[#\s]{3,}/
like here:
var produktDaten = "##########################################################################MA-KF6###Beckhoff###EL1808 BECK.EL1808###MA-KF7###Beckhoff###EL1808 BECK.EL1808###MA-KF12###Beckhoff###EL1808 BECK.EL1808###MA-KF13###Beckhoff###EL1808 BECK.EL1808###MA-KF14###Beckhoff###EL1808 BECK.EL1808###MA-KF15###Beckhoff###EL1808 BECK.EL1808###MA-KF16###Beckhoff###EL1808 BECK.EL1808###MA-KF19###Beckhoff###EL1808 BECK.EL1808";
console.log(produktDaten.replace(/\x03/g, '').trim().split(/[#\s]{3,}/));
This /[#\s]{3,}/ regex matches 3 or more chars that are either # or whitespace.
NOTE: just removing the ' around it won't fix the issue, since the unescaped / would end the regex literal before the {3,} quantifier. You need to quantify the character class itself, [#\s].
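To make the difference concrete, here is a small sketch using a shortened, made-up sample in the same shape as the data above. It also shows one way to drop the empty first element that the leading run of # produces.

```javascript
// Shortened sample in the same shape as the question's data (an assumption).
const data = '######MA-KF6###Beckhoff###EL1808 BECK.EL1808';

// A string separator is matched literally, so nothing is split here.
const asString = data.split('/[#\s]/{3,}');

// A regex separator splits on runs of three or more "#" or whitespace characters.
// The leading run of "#" yields an empty first element, which filter() removes.
const asRegex = data.split(/[#\s]{3,}/).filter(part => part !== '');

console.log(asString.length); // 1
console.log(asRegex); // [ 'MA-KF6', 'Beckhoff', 'EL1808 BECK.EL1808' ]
```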

string replace using a regex

I have a string produced by JSON.stringify in JavaScript (Node). I want to replace each piece of text that starts with 'ab' followed by one or more digits with 'ab^^^^^^', where the number of '^' characters equals the number of digits after 'ab'. Text starting with 'ab' occurs at least once; in this example it occurs twice. I need help with the regex and the replacement.
The string (text starting with ab occurs twice):
var str = JSON.stringify({"abc":{"idcardno":"ertyuiop","form":{"somestring":"This string:\n- can have multiple \nab12345ab5677\n","flag":"true","flag2":"false"},"anothertext":"samplestring","numbetstr":"7"}});
After the regex replace, it should look like this:
{"abc":{"idcardno":"ertyuiop","form":{"somestring":"This string:\n- can have multiple \nab^^^^^ab^^^^\n","flag":"true","flag2":"false"},"anothertext":"samplestring","numbetstr":"7"}}
Edit
As per the answer below, the following will be the contents of obj.abc.form.somestring, spanning multiple lines. How do I apply the regex replace described above to this object?
This string:
- can have multiple
ab12345ab56778
Don't process stringified JSON with a regexp. Process the JavaScript object itself, then stringify. In your case, assuming obj is the input:
obj.abc.form.somestring = transform(obj.abc.form.somestring);
str = JSON.stringify(obj);
where transform is a regexp/replace making the transformation you want.
@torazaburo is right, it's bad practice to manipulate JSON directly. Once you get ahold of the string in obj.abc.form.somestring, though, you can use replace, passing a function:
str.replace(/ab\d+/g, function(match) {return match.replace(/\d/g,'^')})
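Putting the two answers together, a minimal sketch could look like this. The function name maskDigits and the trimmed-down obj are made up for illustration; only the somestring field from the question is kept.

```javascript
// Hypothetical transform: masks the digits that follow "ab" with "^".
function maskDigits(text) {
  // The outer replace finds each "ab" + digits run; the inner one masks its digits.
  return text.replace(/ab\d+/g, match => match.replace(/\d/g, '^'));
}

// Trimmed-down version of the object from the question (an assumption).
const obj = { abc: { form: { somestring: 'This string:\n- can have multiple \nab12345ab5677\n' } } };

// Transform the plain value first, then stringify the whole object.
obj.abc.form.somestring = maskDigits(obj.abc.form.somestring);
const masked = JSON.stringify(obj);
console.log(masked);
```

Because the replacement runs on the plain string rather than on the JSON text, escaped newlines and quotes in the serialized form never get in the way.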

Javascript:Replace single characters after the string

I'm trying to do something which seems fairly basic, but can't seem to get it working.
I'm trying to strip the characters after the last instance of an underscore.
I have this long Query String:
json_data=demo_title=Demo+title&proc1_script=script.sh+parameters&proc1_chk_make=on&outputp2_value=&demo_input_description=hola+mundo&outputp4_visible=on&outputp4_info=&inputdata1_max_pixels=1024000&tag=&outputp1_id=nanana&proc1_src_compresion=zip&proc1_chk_cmake=off&outputp3_description=&outputp3_value=&inputdata1_description=input+data+description&inputp2_description=bien%3F&inputp3_description=funciona&proc1_cmake=-D+CMAKE_BUILD_TYPE%3Astring%3DRelease+&outputp2_visible=on&outputp3_visible=on&outputp1_type=header&inputp1_type=text&demo_params_description=va+bien&outputp1_description=&inputdata1_type=image2d&proc1_chk_script=off&demo_result_description=win%3F&outputp2_id=nanfdsvfa&inputp1_description=funciona&demo_wait_description=boh&outputp4_description=&inputp2_type=integer&inputp2_id=papapa&outputp1_value=&outputp3_id=nananartrtrt&inputp3_id=pepepe&outputp3_type=header&inputp3_visible=+off&outputp1_visible=on&inputdata1_id=id_lsd&outputp4_value=&inputp2_visible=on&proc1_source=lsd-1.5.zip&inputp3_value=si&proc1_make=-j4+-C+&images_config_file=cfgmydemo.cfg&outputp2_type=header&proc1_subdir=xxx-1.5&proc1_url=http%3A%2F%2Fwww.ipol.im%2Fpub%2Falgo%2F...&inputdata1_image_depth=1x8i&inputp1_id=popopo&inputp1_value=si&inputp2_value=no&demo_data_filename=data_saved.cfg&inputdata1_info=info_lsd&outputp3_info=&inputdata1_image_format=.pgm&outputp1_info=&inputdata1_compress=False&inputp1_visible=on&proc1_id=lsd&outputp4_id=nana&outputp2_description=&outputp4_type=header&outputp2_info=&inputp3_type=float&&tag&inputp4_iddcksmdclk&inputp4_typetext&inputp4_descriptionkldmsclk&inputp4_valueklcdmkl&inputp4_infoclkdmscdl
Now I replace the separator = with %24+ and & with +%23+, using fr=fr.replace(/\&/g,"+%23+");
Separator
javascript    Mako
=             %24+
&             +%23+
But the result is:
json_data%24+demo_title%24+Demo+title+%23+proc1_script%24+script.sh+parameters+%23+proc1_chk_make%24+on+%23+outputp2_value%24++%23+demo_input_description%24+hola+mundo+%23+outputp4_visible%24+on+%23+outputp4_info%24++%23+inputdata1_max_pixels%24+1024000+%23+tag%24++%23+outputp1_id%24+nanana+%23+proc1_src_compresion%24+zip+%23+proc1_chk_cmake%24+off+%23+outputp3_description%24++%23+outputp3_value%24++%23+inputdata1_description%24+input+data+description+%23+inputp2_description%24+bien%3F+%23+inputp3_description%24+funciona+%23+proc1_cmake%24+-D+CMAKE_BUILD_TYPE%3Astring%3DRelease++%23+outputp2_visible%24+on+%23+outputp3_visible%24+on+%23+outputp1_type%24+header+%23+inputp1_type%24+text+%23+demo_params_description%24+va+bien+%23+outputp1_description%24++%23+inputdata1_type%24+image2d+%23+proc1_chk_script%24+off+%23+demo_result_description%24+win%3F+%23+outputp2_id%24+nanfdsvfa+%23+inputp1_description%24+funciona+%23+demo_wait_description%24+boh+%23+outputp4_description%24++%23+inputp2_type%24+integer+%23+inputp2_id%24+papapa+%23+outputp1_value%24++%23+outputp3_id%24+nananartrtrt+%23+inputp3_id%24+pepepe+%23+outputp3_type%24+header+%23+inputp3_visible%24++off+%23+outputp1_visible%24+on+%23+inputdata1_id%24+id_lsd+%23+outputp4_value%24++%23+inputp2_visible%24+on+%23+proc1_source%24+lsd-1.5.zip+%23+inputp3_value%24+si+%23+proc1_make%24+-j4+-C++%23+images_config_file%24+cfgmydemo.cfg+%23+outputp2_type%24+header+%23+proc1_subdir%24+xxx-1.5+%23+proc1_url%24+http%3A%2F%2Fwww.ipol.im%2Fpub%2Falgo%2F...+%23+inputdata1_image_depth%24+1x8i+%23+inputp1_id%24+popopo+%23+inputp1_value%24+si+%23+inputp2_value%24+no+%23+demo_data_filename%24+data_saved.cfg+%23+inputdata1_info%24+info_lsd+%23+outputp3_info%24++%23+inputdata1_image_format%24+.pgm+%23+outputp1_info%24++%23+inputdata1_compress%24+False+%23+inputp1_visible%24+on+%23+proc1_id%24+lsd+%23+outputp4_id%24+nana+%23+outputp2_description%24++%23+outputp4_type%24+header+%23+outputp2_info%24++%23+inputp3_type%24+float+%23++%23+tag+%23+inputp4_iddcksmdclk+%23+inputp4_typetext+%23+inputp4_descriptionkldmsclk+%23+inputp4_valueklcdmkl+%23+inputp4_infoclkdmscdl
Now I'd like to know how to restore the = after json_data.
To explain: the query string contains json_data+%23+, and I want to replace this +%23+ with =.
How?
Strip the characters after the last instance of an underscore:
json_data.substring(0, json_data.lastIndexOf("_"));
Replace +%23+ with =
json_data.replace("+%23+", "=");
However, if you're trying to turn all the %xx sequences into the characters they stand for, you should URL-decode the string instead.
That would probably have to be something like:
decodeURIComponent(json_data.replace(/\+/g, '%20'));
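As a rough sketch of the difference between the two calls, using a shortened, made-up sample in the shape the question describes: a string pattern passed to replace() changes only the first occurrence, while a regex with the g flag changes all of them.

```javascript
// Shortened sample in the shape described in the question (an assumption).
const encoded = 'json_data+%23+demo_title%24+Demo+title+%23+tag%24+';

// A string pattern replaces only the first occurrence.
const firstOnly = encoded.replace('+%23+', '=');

// A regex with the g flag replaces every "+"; the "+" must be escaped in the pattern.
const decoded = decodeURIComponent(encoded.replace(/\+/g, '%20'));

console.log(firstOnly); // json_data=demo_title%24+Demo+title+%23+tag%24+
console.log(decoded);   // json_data # demo_title$ Demo title # tag$ (plus a trailing space)
```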

How do I read a list from a textarea with Javascript?

I am trying to read in a list of words separated by spaces from a textbox with Javascript. This will eventually be in a website.
Thank you.
This should pretty much do it:
<textarea id="foo">some text here</textarea>
<script>
var element = document.getElementById('foo');
var wordlist = element.value.split(' ');
// wordlist now contains 3 values: 'some', 'text' and 'here'
</script>
A more accurate way to do this is to use regular expressions to strip the extra spaces first, and then use @Aron's method. Otherwise, if you have something like "a   b  c    d e", you will get an array with a lot of empty string elements, which I'm sure you don't want.
Therefore, you should use:
<textarea id="foo">
this is some very bad
formatted text a
</textarea>
<script>
var str = document.getElementById('foo').value;
str = str.replace(/\s+/g, ' ').replace(/^\s+|\s+$/g, '');
var words = str.split(' ');
// words will have exactly 8 items (the 8 words in the textarea)
</script>
The first .replace() function replaces all consecutive spaces with 1 space and the second one trims the whitespace from the start and the end of the string, making it ideal for word parsing :)
Instead of splitting by whitespaces, you can also try matching sequences of non-whitespace characters.
var words = document.getElementById('foo').value.match(/\S+/g);
The problem with the splitting method is that leading or trailing whitespace gives you an empty element. For example, " hello world " would give you ["", "hello", "world", ""].
You can strip the whitespace before and after the text, but there is another problem: when the string is empty, splitting "" gives you [""].
Instead of finding what we don't want and split it, I think it is better to look for what we want.
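One caveat with the matching approach: match() returns null rather than an empty array when nothing matches, so an empty or whitespace-only textarea needs a fallback. A minimal sketch, with splitWords as a made-up helper name:

```javascript
// Hypothetical helper: returns the whitespace-separated words of a string.
function splitWords(text) {
  // match() returns null when there is no match, so fall back to an empty array.
  return text.match(/\S+/g) || [];
}

console.log(splitWords(' hello world ')); // [ 'hello', 'world' ]
console.log(splitWords(''));              // []
```

In the page itself this would be called as splitWords(document.getElementById('foo').value).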
