I was helped on here some time ago in writing a regular expression to compare the location of the current page to that regex, and taking action if it's something specific. An example of that code is:
var re1 = new RegExp('^http://([^\.]+)\.domain\.com/subpage(.*)$');
if(window.location.href.match(re1))
{
// Do more
}
In this case, I could write some code that would only execute on that subpage. It's worked beautifully so far. But I've run into a problem where I need further assistance with regular expressions.
Imagine a site like this: http://websitenamepreview.testserver.designcompanyname.com/subpage
How can I adapt this code to work on such a URL? The only thing that will probably change, so long as it's run from this test server, is the subpage.
Using you original code, you could change it to ^http://([^\.]+\.)+domain\.com/subpage(.*)$
Of course, for this particular test data, it wouldn't match, because you don't have "domain.com" as the end of the domain in your test URL.
If you can give us a little more information about what parts are important, what parts are likely to change in the data that you's be matching, etc. we could probably make it even less complex.
Related
I am useless at Regex and I want to remove parts of a URL that are not always consistent.
The URL might be:
www.test.com /en/ restOfPath
or
www.test.com /en/en_gb/ restOfPath
Then depending on the country values might change to:
www.test.com /es/ restOfPath
or
www.test.com /es/es_es/ restOfPath
I am therefore looking to alway remove, the parts in bold, so that I can split the remained of the path, to create a logical naming that is language/location agnostic.
I am doing this as a work around to build out a data layer until the client can implement it properly when they launch their new website. I have managed to build an if else statement as a workaround which is a bit clunky but would like a cleaner solution.
Probably this will help you
(?:\/([a-z]{2})(?:\/([a-z]{2}_[A-Z]{2}))?)
This example is about to find first / with two alpha after that, and probably another / with aa_AA construction.
I got you code samples at regex101
I believe this is what you're after:
\/.*(?=\/.*?)
https://regex101.com/r/OZIseI/4
It uses a positive look ahead to exclude the last / from the match
I am trying to do some website testing through selenium and python. I did fill the page http://www.flightcentre.co.nz/ and submitted the form. But now the search result is taking me to a new page with URL - https://secure.flightcentre.co.nz/eyWD/results . How does my web driver now will handle this? I am doing this for the first time. Could any one help me by providing an example or point me to a right tutorial of this sort.
Thanks.
Ok since I tried to answer your other question I'll give it a go on this one although you are not exactly explaining what you want.
One thing to remember is Selenium is running your browser and not a traditional web scraper. Which means if the url changes it's not a big deal, the only time you have to change how you approach scraping in selenium is when you get a popup.
One thing you can do from your other code is when looking for a flight do a
driver.implicitly_wait(40)//40 is the amount of seconds
this will wait for at least 40 seconds before crashing, and then start whenever the page finishes loading, or whatever you want to do next is active in the dom.
Now if you are trying to scrape all of the flight data that comes up, that'll be fairly tricky. You could do a for loop and grab every element on a page and write it to a csv file.
class_for_departure_flight = driver.find_elements_by_xpath('//div[#class="iata"]')
for flights in class_for_departure_flight:
try:
with open('my_flights.csv', 'a', newline='') as flights_book:
csv_writer = csv.writer(flights_book, delimiter = ',')
csv_writer.writerow(flights.text)
except:
print("Missed a flight")
Things to take notice in this second part is I am using the CSV library in Python to write rows of data. A note you can append a bunch of data together and write it as one row like:
data = (flights, dates, times)
csv_writer.writerow(data)
and it will right all of those different things on the same row in a spreadsheet.
The other two big things that are easily missed are:
class_for_departure_flight = driver.find_elements_by_xpath('//div[#class="iata"]')
that is driver.find_elements_by_xpath, you'll notice elements is plural, which means it is looking for multiple objects with the same class_name and it will store them in an array so you can iterate over them in a for loop.
The next part is csv_writer.writerow(flights.text) when you iterate over your flights, you need to grab the text to do that you do flights.text. If you were to do this with just a search function you could do something like this as well.
class_for_departure_flight = driver.find_elements_by_xpath('//div[#class="iata"]').text
hopefully this helps!
This is a good place to start: http://selenium-python.readthedocs.org/getting-started.html
Here are some things about Selenium I've learned the hard way:
1) When the DOM refreshes, you lose your references to page objects (i.e. return from something like element = driver.find_element_by_id("passwd-id"), element is now stale)
2) Test shallow; each test case should do only one assert/validation of page state, maybe two. This enables you to take screen shots when there's a failure and save you from dealing with "is it a failure in the test, or the app?"
3) It's a big race condition between any JavaScript on the page, and Selenium. Use Explicit Waits to block Selenium while the JavaScript works to refresh the DOM.
To be clear, this is my experience with using Selenium; thus not universal to everyone's experience.
Good luck! Hope this is helpful.
Before anyone jumps in and says, "Oh!! that's a bad idea", I know it is.
I want to keep both the key and value in the query string to be not easily visible to the end user.
I have something like this google.com/?category=textile&user=user1
I need to make it unintelligible like this: google.com/?kasjdhfkashasdfsf32423
Is there any way to achieve this in javascript. I have already seen this
I have already seen this
and this.
but I don't think encoding will solve the problem. Also, this code is entirely in client side. I know that it is not secure but I just need this is a naive, weak defense.
Please help.
Edit
I apologize if my question was not clear earlier.
The URL google.com/?category=textile&user=user1 is being passed on from a different application.
The values passed in the query string directly controls what is being displayed to the user. As is, anyone with no technical knowledge can easily change the value and view the data corresponding to a different category or user. I need to make this unintelligible so that it is not obvious. If a user is a techie and figures out the encryption used, then it is fine. I need a stop-gap solution till we have a better architecture in place
You can use base64. Javascript has native functions to do that :
alert(btoa("category=textile&user=user1")); // ==> Y2F0ZWdvcnk9dGV4dGlsZSZ1c2VyPXVzZXIx
and to reverse it :
alert(atob("Y2F0ZWdvcnk9dGV4dGlsZSZ1c2VyPXVzZXIx")); // ==> category=textile&user=user1
Be careful to read the doc if you have unicode strings, it's a little different : https://developer.mozilla.org/en-US/docs/Web/API/Window.btoa
If you don't looking for serious strong crypto, you can use ROT13:
http://en.wikipedia.org/wiki/ROT13
This is enough for slightly obfuscate keys/values in the your URLs.
So i am using the jQuery UI library to open new dialog windows, when the new dialog windows are opened I am passing some parameters like this
open modal
The site works fine and no issues at all, my custompage.html just picks up those values that were passed and they are being used on the page, something like this:
var a = customfunctionget(param1); var b = customfunctionget(param2)....
I just received a report that we are vulnerable to Cross-Site Scripting attacks by replacing any of the params with something like this:
><script>alert(123)</script><param
Which I understand correctly what is supposed to happen but on any browser that I try to inject the script the alert is never displayed so the "script/injection" is not being processed, the custompage.html stops working as expected since we need the values to be entered correctly but there is nothing I can do on that respect.
Is there a magic pill that I am missing here? Most of the XSS information that I find does the same thing, try to inject an alert through a tag but other than me denying to display any content if the parameter is not well formed I dont know what else can be done.
Any recommendations, tutorials welcome.
One of the easiest things you can encode all <, >, and & characters with <, >, and &, respectively. Whenever a browser sees a <something> it thinks its a dom element. If you encode those characters, the browser will actually display them. This will foil people trying to execute <script>badstuff</script> on your site.
Note that people won't be able to do things like add <b> tags to things if you do this.
The above suggestion is a first step, but is by no means exhaustive.
I just found this, which seems like a good guide.
There is encodeURIComponent() function in Javascripts to encode special characters to avoid inserting scripts
I use DropBox and I've had some trouble reaching to my files from other computers:
I not always want to login to anything when I'm in a public computer, but I like being able to reach my stuff from wherever I am.
So I've made a simple application that when put in the public folder, ran and given the right UID, creates (still in your public folder) an HTML of all the content in the folder (including subfolders) as a tree of links.
But I didn't risk loading it anywhere, since there are slightly private things in there (yes, I know that the folder's name is "PUBLIC").
So I've came up with the idea to make it a simple login page, given the right password, the rest of the page should load. brilliant!, but how?
If I did this by redirecting to other HTML on the same folder, I'd still put the html link in the web history and the "url's accessed" history of the administrator. So I should generate itin the same page.
I've done it:
alt text http://dl.dropbox.com/u/3045472/validate.png
And currently the page is a textbox and a button, and only if you type in the right password (defined in the generator) the rest of the page (with the link-tree) loads. The fault is that everything (password, URL's) is easily reachable through the source code.
Now, assuming I only want to avoid silly people to get it all too easily, not make a bulletproof all-content-holding NSA certified website, I though about some ways to make these information a bit harder to get.
As you may have already figured, I use a streamwritter to write an html file (head, loop through links, bottom), then it's extremely configurable, and I can come up with a pretty messy-but-working c# code, though my javascript knowledge is not that good.
Public links in DropBox look like this:
Summarizing: How do I hide the URL's ande the password to show them (MAINLY the password, of course) in my source-code so that no that it should require some effort on reading ?
P.S.: It's not that personal, if someone REALLY wants it, it could never be 100% protected, and if it was that important, I wouldnt put it in the public folder, also, if the dude really wants to get it that hard, he should deserve it.
P.S. 2.: "Use the ultra-3000'tron obfuscator!!11" is not a real answer, since my javascript is GENERATED by my c# program.
P.S. 3.: I don't want other solutions as "use a serverside application and host it somewhere to redirect and bla bla" or "compress the links in a .RAR file and put a password in it" since I'm doing this ALSO to learn, and I want the thrill of it =)
Update 1:
The one answer so far gives a perfect way (according to this question) to hide my password.
Now I want a good way to hide the URL's, maby a code snippet of the example URL I gave being composed, and if it's too tricky, maby how to generate it in C#, or anything ?
Update 2:
I thought about maybe making three "obfuscating methods" and choosing them randomly in the runtime. So anyone who figures out how to read one XML, could only read about one third of them, and maybe having a hard time finding the other rest of this third..
Update 3:
Just thought about REGEX, the URL could be neatly crowded by dummy not-url-allowed characters added randomly that would be removed by something like:
regex.replace(url, ^[^\w\d/:-\.%]+$,"")
So the nosy dude should have to be pretty advanced into programming somehow, eh? could anyone tell me if it would work or not ?
Well, as it seems you already know, this is a rather poor choice of security mechanism, but if you insist...
Don't store the actual string in the source. Store, for example, its MD5 hash. Then, when the user types in a password, compute its MD5 hash and compare it with the expected one.
Check out:
MD5 in JavaScript
MD5 in C#
To elaborate on miorel's idea, you can also encrypt the whole page, using password as a key. Basically, encode all content into one big string, ask for the password and decrypt that string. If the password is wrong, it will show loads of rubbish, that is it. Like
content = "encrypted string"
function decrypt(str, key) { your algorithm of choice here }
document.write(decrypt(content, prompt('Password?')))
The only thing you need is a decrypt implementation in javascript - but that's easy to google out, for example here or here.
This also renders the separate 'login' page useless.
Granted, this is akin to asking how you can strip in public without people seeing you, but given that, I'm assuming that the password you are trying to store is the one to DropBox. I suppose you could obfuscate the password and store it in a cookie. That would at least prevent someone from simply viewing the source to see the password, but obviously wouldn't stop someone running something like Fiddler and seeing it.
[snipped server side suggestion]
EDIT: To munge the Urls, why don't you simply build the urls on the fly and have the links call a javascript function to get the url? Your server-side code would populate an array in this function with obfuscated urls and the calling code would simply pass an index into the array. Thus, on viewing the source, there would be no instances of "http" anywhere other than static unsecure links.
ADDITION Ok. now that I have a better bead on the problem, it is easier to devise solution. There are libraries for doing encryption on the net in javascript (e.g. http://point-at-infinity.org/jsaes/) but the problem comes down to key management. Since its javascript, it is going to be public but there are hoops you can devise to make it harder to determine the key. In general, those tricks involve indirection. For example, store a lengthy stream of random characters (e.g. 40-50 or more) that is generated by your C# code and stored in the HTM file. In addition, the C# code would would store into your javascript function an array numeric values that represent pointers into the long stream of text that were used by the C# code to encrypt the passwords (or just the whole url).