Simple hashing function that is HTML id-friendly and case sensitive - javascript

I get some strings from an external source, and I display them in spans on my page.
I need a way to get back to those strings using document.getElementById() or jQuery's $("#XXXX"), so each string comes with some sort of identifier, and I use that identifier as the ID of the span.
The problem is that the identifier I get could contain characters such as +, which is not allowed as a value for the id attribute: http://www.w3schools.com/tags/att_standard_id.asp
Additionally, these identifiers are case-sensitive. So I thought of using a hashing function like SHA or MD5 to hash the identifiers I get and use the hashes as IDs for my spans; I can then apply the hashing function again to find my element.
This seems complicated for such simple functionality. Is there a better way to do this? Or maybe a very simple hashing function that would guarantee id-friendly characters and case sensitivity? (I believe HTML's id is not case-sensitive; that's another reason to consider hashing functions.)

Can you ditch the identifier you get and just implement something simple like this:
var counter = 0;
function uniqueId(){
    return "Id" + ++counter;
}

You could just build the IDs from a fixed string prefix followed by an incrementing number.
The span IDs would be "a1", "a2", etc.

I'm guessing that the problem is that you're thinking later you'll be getting the same strings and will want to transform them in the same way, and then use these to find the original corresponding elements?
If so, you'll just need to sanitize your strings carefully. A series of regular expressions could help you map from invalid to valid characters, and make the capitals unique. For instance, you could transform "A" into "-a-", and "+" into "-plus-".
A carefully chosen scheme should make the chance of a collision (i.e. someone giving you a string that looks like an escaped version of another string) very small and, in any case, immediately detectable.
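As a rough sketch of such a scheme, the function below passes lowercase letters and digits through unchanged and replaces every other character with its hexadecimal character code wrapped in hyphens. The name toSafeId, the "id-" prefix, and the use of character codes instead of names such as "-plus-" are assumptions made for illustration, not part of the suggestion above.

```javascript
// Hypothetical helper: maps an arbitrary identifier to an id-friendly string.
// Lowercase letters and digits are kept; everything else (including
// uppercase letters and "-" itself) becomes "-<hex char code>-", so two
// different identifiers never produce the same result.
function toSafeId(identifier) {
  var encoded = "";
  for (var i = 0; i < identifier.length; i++) {
    var ch = identifier.charAt(i);
    if (/[a-z0-9]/.test(ch)) {
      encoded += ch;
    } else {
      encoded += "-" + identifier.charCodeAt(i).toString(16) + "-";
    }
  }
  // The fixed prefix guarantees that the id starts with a letter.
  return "id-" + encoded;
}

var spanId = toSafeId("aB+"); // "id-a-42--2b-"
```

Applying the same function to an incoming identifier later gives the id to look up, for example document.getElementById(toSafeId(identifier)).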

Related

Target search results number in variable

I'm trying to target the number of search results on our website for each search term so that I can see how many results each one pulls in.
I'm working off of this article, but I can't get the JavaScript function right to pull out the number (which could be as high as 2000) and put it into a variable.
<div class="search-results-text"><strong>732 results</strong> found for ‘<strong>search term</strong>’</div>
Hoping someone can help me out with the JavaScript function that would grab that number before "results". Thanks!
You would probably get away with a custom JavaScript variable like this:
function() {
    return document.querySelector('.search-results-text strong').innerText.split(" ")[0];
}
querySelector with the CSS selector gets the element, innerText is its text without the markup, and split breaks the string up at each space, which gives you an array. The first element of that array is your number (arrays are indexed starting at zero, so [0] refers to the first element).
This is not particularly elegant (for one you probably want to add some sort of error handling), and you could actually replace document.querySelector('.search-results-text strong').innerText with a DOM type variable in GTM (which by default returns the text of the element).
I don't think you can get the number with CSS selectors alone.
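The error handling mentioned above could look roughly like the following sketch. The helper name parseResultCount is hypothetical, and the handling of comma thousands separators is an assumption about how larger counts might be displayed.

```javascript
// Hypothetical helper: extracts the leading number from text such as
// "732 results". Returns null when the text does not start with a number.
function parseResultCount(text) {
  var match = /^\s*(\d[\d,]*)/.exec(text || "");
  if (!match) {
    return null;
  }
  // Assumption: thousands separators, if present, are commas ("2,000").
  return parseInt(match[1].replace(/,/g, ""), 10);
}

// Inside a GTM custom JavaScript variable this might be used as follows,
// with parseResultCount defined inside the same function:
// function() {
//   var el = document.querySelector('.search-results-text strong');
//   return el ? parseResultCount(el.innerText) : null;
// }

var count = parseResultCount("732 results"); // 732
```

Returning a number instead of a string also makes the value easier to compare or aggregate later.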

QuerySelector on ID with curly bracket in name

I'm creating a dynamic filter and it is working fine, but I have one problem. I'm selecting all filters with a querySelectorAll call combined with a PHP get function. Unfortunately, some of the dynamic content has weird names like:
(art) and more
With a split join function this will result in the following code:
document.querySelector('#(art)_and_more');
This results in an error because it's not a valid selector. Does anyone know how to solve this?
I would like to keep my names as they are because they're part of a big system.
If it's an ID, then you'd use getElementById since by definition there can be only one match (IDs must be unique).
var element = document.getElementById("(art)_and_more");
In the general case, you'd use a quoted attribute selector:
var list = document.querySelectorAll("[id='(art)_and_more']");
// or
var list = document.querySelectorAll('[id="(art)_and_more"]');
...but again, IDs must be unique.
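If the names are generated dynamically, the quoted attribute selector can be built by a small helper. This is only a sketch: idSelector is a hypothetical name, and it escapes just backslashes and double quotes, which is what a double-quoted CSS string needs.

```javascript
// Hypothetical helper: builds a quoted attribute selector for an id,
// escaping backslashes and double quotes so the CSS string stays valid.
function idSelector(id) {
  var escaped = String(id).replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return '[id="' + escaped + '"]';
}

var selector = idSelector("(art)_and_more"); // '[id="(art)_and_more"]'
// In a browser: document.querySelectorAll(selector)
```

In browsers that provide it, CSS.escape(id) is an alternative for building a "#..." selector from an arbitrary id.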

How to search for closest tag set match in JavaScript?

I have a set of documents, each annotated with a set of tags, which may contain spaces. The user supplies a set of possibly misspelled tags and I want to find the documents with the highest number of matching tags (optionally weighted).
There are several thousand documents and tags but at most 100 tags per document.
I am looking for a lightweight and performant solution where the search runs fully on the client side using JavaScript, but some preprocessing of the index with node.js is possible.
My idea is to create an inverted index of tags to documents using a multiset, and a fuzzy index that finds the correct spelling of a misspelled tag. Both are created in a preprocessing step in node.js and serialized as JSON files. In the search step, for each item of the query set I want to first consult the fuzzy index to get the most likely correct tag and, if one exists, consult the inverted index and add the result set to a bag (counted set). After doing this for all input tags, the contents of the bag, sorted in descending order of count, should provide the best matching documents.
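A minimal sketch of the approach described above, assuming plain Levenshtein distance as the fuzzy step and a linear scan over all known tags in place of a dedicated fuzzy index. The sample documents, the function names, and the distance threshold are hypothetical.

```javascript
// Hypothetical sample data; tags may contain spaces.
var documents = [
  { id: 1, tags: ["foo", "bar baz"] },
  { id: 2, tags: ["foo", "hurp"] }
];

// Inverted index: tag -> list of ids of the documents carrying that tag.
function buildInvertedIndex(docs) {
  var index = Object.create(null);
  docs.forEach(function (doc) {
    doc.tags.forEach(function (tag) {
      (index[tag] = index[tag] || []).push(doc.id);
    });
  });
  return index;
}

// Levenshtein distance between two strings (row-by-row dynamic programming).
function editDistance(a, b) {
  var prev = [];
  for (var j = 0; j <= b.length; j++) {
    prev[j] = j;
  }
  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
      curr[k] = Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Fuzzy step: the known tag closest to the query, or null if no tag is
// within maxDistance edits.
function closestTag(index, query, maxDistance) {
  var best = null;
  var bestDistance = maxDistance + 1;
  Object.keys(index).forEach(function (tag) {
    var distance = editDistance(query, tag);
    if (distance < bestDistance) {
      best = tag;
      bestDistance = distance;
    }
  });
  return best;
}

// Counts, per document, how many query tags matched, then sorts descending.
// Ids come back as strings because they are used as object keys in the bag.
function search(index, queryTags, maxDistance) {
  var bag = Object.create(null);
  queryTags.forEach(function (queryTag) {
    var tag = closestTag(index, queryTag, maxDistance);
    if (tag === null) {
      return;
    }
    index[tag].forEach(function (id) {
      bag[id] = (bag[id] || 0) + 1;
    });
  });
  return Object.keys(bag)
    .map(function (id) {
      return { id: id, matches: bag[id] };
    })
    .sort(function (a, b) {
      return b.matches - a.matches;
    });
}

var index = buildInvertedIndex(documents);
var results = search(index, ["fob", "hurp"], 1);
```

With the sample data, "fob" resolves to "foo" and "hurp" matches exactly, so document 2 collects two matches and document 1 collects one. The linear scan in closestTag is the part a precomputed fuzzy index would replace.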
My Questions
This seems like a common problem, is there already an implementation for it that I can reuse? I looked at lunr.js and fuse.js but they seem to have a different focus.
Is this a sensible approach to the problem? Do you see any obvious improvements?
Is it better to keep the fuzzy step separate from the inverted index or is there a way to combine them?
You should be able to achieve what you want using Lunr, here is a simplified example (and a jsfiddle):
var documents = [{
    id: 1, tags: ["foo", "bar"],
},{
    id: 2, tags: ["hurp", "durp"]
}]
var idx = lunr(function (builder) {
    builder.ref('id')
    builder.field('tags')
    documents.forEach(function (doc) {
        builder.add(doc)
    })
})
console.log(idx.search("fob~1"))
console.log(idx.search("hurd~2"))
This takes advantage of a couple of features in Lunr:
If a document field is an array, then Lunr assumes the elements are already tokenised. This allows you to index tags that include spaces as-is, i.e. "foo bar" would be treated as a single tag (if this is what you wanted; it wasn't clear from the question).
Fuzzy search is supported, here using the query string format. The number after the tilde is the maximum edit distance; there is some more documentation that goes into the details.
The results will be sorted by how well each document matches the query. In simple terms, documents that contain more matching tags will rank higher.
Is it better to keep the fuzzy step separate from the inverted index or is there a way to combine them?
As ever, it depends. Lunr maintains two data structures, an inverted index and a graph. The graph is used for doing the wildcard and fuzzy matching. It keeps separate data structures to facilitate storing extra information about a term in the inverted index that is unrelated to matching.
Depending on your use case, it would be possible to combine the two. An interesting approach would be a finite state transducer, so long as the data you want to store is simple, e.g. an integer (think document id). There is an excellent article about this data structure, which is similar to what is used in Lunr: http://blog.burntsushi.net/transducers/

Regex to return all attributes of a web page that starts by a specific value

The question is simple: I need to get the value of all attributes whose value starts with http://example.com/api/v3?. For example, if a page contains
<iframe src="http://example.com/api/v3?download=example%2Forg">
<meta twitter="http://example.com/api/v3?return_to=%2F">
Then I should get an array/list with 2 members: http://example.com/api/v3?return_to=%2F and http://example.com/api/v3?download=example%2Forg (the order doesn’t matter).
I don’t want the elements, just the attributes’ values.
Basically I need the regex that returns strings starting with http://example.com/api/v3? and ending with a space.
There is the CSS selector * meaning "any element".
There is no CSS selector meaning "any attribute with this value". Attribute names are arbitrary. While there are several attributes defined in the HTML specs, it's possible to use custom ones like the twitter attribute in your example. This means you'll have to iterate over all the attributes on a given element.
Without a global attribute value selector, you will need to manually iterate over all elements and values. It may be possible for you to determine some heuristics to help narrow down your search before going brute force.
A regular expression would likely look like this:
/http:\/\/example\.com\/api\/v3\?\S+/g
Make sure to escape each / and ? with a backslash. \S+ matches all subsequent non-space characters. You can also try [^\s"]+ instead of \S+ if you also want to exclude quote marks.
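As a quick illustration, here is the quote-excluding variant of that regular expression applied to the markup from the question. The html string is sample input assembled from the question, not a real page.

```javascript
// Sample markup taken from the question.
var html =
  '<iframe src="http://example.com/api/v3?download=example%2Forg">\n' +
  '<meta twitter="http://example.com/api/v3?return_to=%2F">';

// [^\s"]+ stops at whitespace or a double quote, so the closing quote of
// the attribute is not included in the match.
var pattern = /http:\/\/example\.com\/api\/v3\?[^\s"]+/g;

// String.prototype.match returns null when nothing matches, hence the || [].
var urls = html.match(pattern) || [];
// urls[0] is "http://example.com/api/v3?download=example%2Forg"
// urls[1] is "http://example.com/api/v3?return_to=%2F"
```

With plain \S+ each match would also swallow the trailing `">`, which is why the character class is used here.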
In my experience, though, regexes are usually slower than working on already parsed objects directly, so I’d recommend you try these Array and DOM functions instead:
Get all elements, map them to their attributes and filter those that start with http://example.com/api/v3?, reduce all attributes lists to one Array and map those attributes to their values.
Array.from(document.querySelectorAll("*"))
    .map(elem => Object.values(elem.attributes)
        .filter(attr => attr.value.startsWith("http://example.com/api/v3?")))
    .reduce((list, attrList) => list.concat(attrList), [])
    .map(attr => attr.value);
You can find polyfills for ES6 and ES5 functions and can use Babel or related tools to convert the code to ES5 (or replace the arrow functions by hand).

Safely using eval to use variable as an object name

As shown in this example
javascript-use-variable-as-object-name
I am using eval to use a DOM attribute to select an element from an array. Though there is no direct way for the user to change the input, I want to be as secure as possible and make sure that the variable is indeed an integer before I evaluate it.
Which of the following would be the best, most secure, way?
$(".listitem").click(function(){
    var id = $(this).attr("record-id");
    if(!isNaN(new Number(id))){
        Storage.search.nearby.currec = rowsHolder[eval(id)];
    }else{
        // send email to admin, shut down
    }
});
or
$(".listitem").click(function(){
    var id = $(this).attr("record-id");
    if(parseInt(id)){
        Storage.search.nearby.currec = rowsHolder[eval(id)];
    }else{
        // send email to admin, shut down
    }
});
More, but not required info:
Basically I am pulling down a large JSON string from online, containing an array of records. While building a table from the info using a for statement ( for(i in array) ), I push each row into an array called rowsHolder and give the tr an attribute of record-id="i". Then when the user clicks the row, I call the method you see above. I am using PhoneGap with jQuery Mobile.
As always, thanks for the input
-D
There is absolutely no reason to use eval here.
If your id is some kind of number, use parseFloat(id) to get it. Unnecessary as it would be converted back to a string when used as a property name, though.
If your id is an integer, use parseInt(id, 10) to get it. Unnecessary as it would be converted back to a string when used as a property name, though.
If your id is a string, just let it be a string. The property name you use it for would be one anyway.
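For the integer case, a sketch of the lookup without eval might look like this. lookupRecord is a hypothetical helper, and the strict digits-only check is an assumption about what counts as a valid record-id; rowsHolder and Storage.search.nearby.currec are the names from the question.

```javascript
// Hypothetical helper: returns rows[index] when rawId is a plain
// non-negative integer string, otherwise undefined. No eval involved.
function lookupRecord(rows, rawId) {
  if (!/^\d+$/.test(String(rawId))) {
    return undefined;
  }
  return rows[parseInt(rawId, 10)];
}

// Possible use inside the click handler from the question:
// $(".listitem").click(function () {
//   var record = lookupRecord(rowsHolder, $(this).attr("record-id"));
//   if (record !== undefined) {
//     Storage.search.nearby.currec = record;
//   }
// });

var record = lookupRecord(["first", "second"], "1"); // "second"
```

Unlike the parseInt(id) check in the question, this also accepts "0", which parseInt turns into the falsy value 0.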
