Space complexity of finding non-repeating character in string


Here is a simple algorithm exercise. The problem is to return the first non-repeating character. For example, I have this string: 'abbbcdd' and the answer is 'a' because 'a' appears before 'c'. In case it doesn't find any non-repeating characters, it will return '_'.
My solution works correctly, but my question is about the performance. The problem statement says: "Write a solution that only iterates over the string once and uses O(1) additional memory."
Here is my code:
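console.log(solution('abbbcdd')) // 'a'
function solution(str) {
  const chars = buildCharMap(str)
  // Note: for...in visits integer-like keys (such as '1') before other keys,
  // so the result is only in first-occurrence order when str has no digit characters.
  for (const char in chars) {
    if (chars[char] === 1) {
      return char
    }
  }
  return '_'
}
// Counts how many times each character occurs in str.
function buildCharMap(str) {
  const charMap = {}
  for (let i = 0; i < str.length; i++) {
    charMap[str[i]] = (charMap[str[i]] || 0) + 1
  }
  return charMap
}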
Does my answer meet the requirement for space complexity?

The time complexity is straightforward: you have a loop over a string of length n, and another loop over an object with strictly at most n keys. The operations inside the loops take O(1) time, and the loops are consecutive (not nested), so the running time is O(n).
The space complexity is slightly more subtle. If the input were a list of numbers instead of a string, for example, then we could straightforwardly say that charMap takes O(n) space in the worst case, because all of the numbers in the list might be different. However, for problems on strings we have to be aware that there is a limited alphabet of characters which those strings could be formed of. If that alphabet has size a, then your charMap object can have at most a keys, so the space complexity is O(min(a, n)).
That alphabet is often explicit in the problem - for example, if the input is guaranteed to contain only lowercase letters, or only letters and digits. Otherwise, it may be implicit in the fact that strings are formed of Unicode characters (or in older languages, ASCII characters). In the former case, a = 26 or 62. In the latter case, a = 65,536 or 1,112,064 depending on if we're counting code units or code points, because Javascript strings are encoded as UTF-16. Either way, if a is a constant, then O(a) space is O(1) space - although it could be quite a large constant.
That means that in practice, your algorithm does use O(1) space. In theory, it uses O(1) space if the problem statement specifies a fixed alphabet, and O(min(a, n)) space otherwise; not O(n) space. Assuming the former, then your solution does meet the space-complexity requirement of the problem.
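To make the fixed-alphabet case concrete, here is a minimal sketch. It assumes the input contains only the lowercase letters 'a' to 'z', which is an assumption of this example and not something the problem statement quoted above guarantees. The name firstNonRepeatingLowercase is made up for illustration. The two tables always have 26 slots, whatever the length of the string, and the string itself is scanned only once.

```javascript
// Sketch only: assumes every character of str is in 'a'..'z'.
function firstNonRepeatingLowercase(str) {
  const A = 'a'.charCodeAt(0);
  const counts = new Array(26).fill(0);      // occurrences per letter
  const firstIndex = new Array(26).fill(-1); // index of first occurrence per letter
  // Single pass over the string.
  for (let i = 0; i < str.length; i++) {
    const slot = str.charCodeAt(i) - A;
    if (counts[slot] === 0) firstIndex[slot] = i;
    counts[slot]++;
  }
  // Pass over the 26 slots (constant work): pick the earliest letter seen exactly once.
  let best = -1;
  for (let slot = 0; slot < 26; slot++) {
    if (counts[slot] === 1 && (best === -1 || firstIndex[slot] < best)) {
      best = firstIndex[slot];
    }
  }
  return best === -1 ? '_' : str[best];
}

console.log(firstNonRepeatingLowercase('abbbcdd')); // 'a'
```

With a larger but still fixed alphabet the same idea applies; only the table size changes.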
This raises the question of why, when analysing algorithms on lists of numbers, we don't likewise say that Javascript numbers have a finite "alphabet" defined by the IEEE 754 specification for floating point numbers. The answer is a bit philosophical; we analyse running time and auxiliary space using abstract models of computation which generally assume numbers, lists and other data structures don't have a fixed limit on their size. But even in those models, we assume strings are formed from some alphabet, and if the alphabet isn't fixed in the problem then we let the alphabet size be a variable a which we assume is independent of n. This is a sensible way to analyse algorithms on strings, because alphabet size and string length are independent in the problems we're usually interested in.

Related

How is this O(1) space and not O(n) space. firstNotRepeatingCharacter Challenge solution

I am having trouble understanding how the following solution is O(1) space and not O(n) space. The coding challenge is as follows:
Write a solution that only iterates over the string once and uses O(1) additional memory, since this is what you would be asked to do during a real interview.
Given a string s, find and return the first instance of a non-repeating character in it. If there is no such character then return '_'.
The following is a solution that is O(1) space.
function firstNotRepeatingCharacter(s: string): string {
  const chars: string[] = s.split('');
  // Per character: how often it occurred and the index where it was last seen.
  const duplicates: Record<string, { count: number; index: number }> = {};
  let answer = '_';
  let indexAnswer = Number.MAX_SAFE_INTEGER;
  chars.forEach((element, index) => {
    if (!duplicates.hasOwnProperty(element)) {
      duplicates[element] = {
        count: 1,
        index
      };
    } else {
      duplicates[element].count++;
      duplicates[element].index = index;
    }
  });
  // Among the characters seen exactly once, keep the one with the smallest index.
  for (const key in duplicates) {
    if (duplicates[key].count === 1 && duplicates[key].index < indexAnswer) {
      answer = key;
      indexAnswer = duplicates[key].index;
    }
  }
  return answer;
}
console.log(firstNotRepeatingCharacter('abacabad'));
console.log(firstNotRepeatingCharacter('abacabaabacaba'));
I do not understand how the above solution is O(1) space. As we iterate through the array, we map each element to an entry in an object (duplicates). I would think this would be considered O(n). Could somebody clarify how this is O(1) for me? Thanks.

The memory usage is proportional to the number of distinct characters in the string. The number of distinct characters has an upper limit of 52 (or some other finite value), and the potential memory usage does not increase as n increases once each of the distinct characters has been seen.
Thus, there exists an upper limit on the memory usage that is constant (does not depend on n), so the memory usage is O(1).
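A small sketch can make this visible. The helper distinctCharCount below is hypothetical (it is not part of the solution in the question); it just counts how many keys an object like duplicates would end up with. Making the input a thousand times longer does not add any keys.

```javascript
// Hypothetical helper: number of keys a per-character object would hold for s.
function distinctCharCount(s) {
  const seen = {};
  for (let i = 0; i < s.length; i++) {
    seen[s[i]] = true; // one key per distinct character, however often it repeats
  }
  return Object.keys(seen).length;
}

console.log(distinctCharCount('abacabad'));              // 4
console.log(distinctCharCount('abacabad'.repeat(1000))); // 4
```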

Indeed this is O(1), but only in terms of space, and only because we have an upper limit. This limit could be the size of UTF-16, or it could be the number of English letters.
This is a constraint given by the developer. That said, it is only O(1) in space if the code above runs on a finite set of possible characters.
A String is limited by the implementation. In some engines the maximum length of a String is 2147483647 (2^31 - 1) characters. That's not really what O(1) represents, so virtually it's O(N) in space.
Now the situation is totally different for time complexity. In the optimal scenario it should be O(N) + O(N - E) + O(N).
Explaining:
1. The first O(N): the first loop goes through all the elements.
2. The second O(N) is about the deletion. The code deletes elements from the array.
3. O(N - E): the second forEach loops over the final popped array, so we have a constant E.
And that's supposing that the data structure is an Array.
There's a lot to dig into here.
TL;DR
It's not O(1).

The algorithm has O(min(a,n)) space complexity (where a is the number of letters available in the text encoding, e.g. for UTF-8 a > 1M). In the worst case, a string of unique characters (in which case n <= a) such as abcdefgh, the duplicates object has as many keys as the input string has letters, and in this case it is clear that the amount of memory used depends on n.
O(1) holds only for the case when the string contains one repeated letter, e.g. aaaaaaa.
Bonus: Your code can be "compressed" in this way :)
function firstNotRepeatingCharacters(s, d={}, r="_") {
  for(let i=0; i<s.length; i++) d[s[i]]=++d[s[i]]|0;
  for(let i=s.length-1; i>=0; i--) if(!d[s[i]]) r=s[i];
  return r;
}
console.log(firstNotRepeatingCharacters('abacabad'));
console.log(firstNotRepeatingCharacters('abacabaabacaba'));

Why Javascript ===/== string equality sometimes has constant time complexity and sometimes has linear time complexity?

After I found that common, recent JavaScript implementations use string interning for a performance boost (Do common JavaScript implementations use string interning?), I thought === for strings would take constant O(1) time. So I gave a wrong answer to this question:
JavaScript string equality performance comparison
According to the OP of that question it is O(N): doubling the string input doubles the time the equality check needs. He didn't provide any jsPerf, so more investigation is needed.
So my scenario using string interning would be:
var str1 = "stringwithmillionchars"; //stored in address 51242
var str2 = "stringwithmillionchars"; //stored in address 12313
The "stringwithmillionchars" would be stored once let's say in address 201012 of memory
and both str1 and str2 would be "pointing" to this address 201012. This address could then be determined with some kind of hashing to map to specific locations in memory.
So when doing
"stringwithmillionchars" === "stringwithmillionchars"
would look like
getContentOfAddress(51242)===getContentOfAddress(12313)
or 201012 === 201012
which would take O(1)/constant time
JSPerfs/Performance updates:
JSPerf seems to show constant time even if the string is 16 times longer?? Please have a look:
http://jsperf.com/eqaulity-is-constant-time
Probably the strings are too small on the above:
This probably shows linear time (thanks to sergioFC); the strings are built with a loop. I tried without functions and it is still linear time. I changed it a bit: http://jsfiddle.net/f8yf3c7d/3/ .
According to https://www.dropbox.com/s/8ty3hev1b109qjj/compare.html?dl=0 (12MB file that sergioFC made) when you have a string and you already have assigned the value in quotes no matter how big the t1 and t2 are (e.g 5930496 chars), it is taking it 0-1ms/instant time.
It seems that when you build a string using a for loop or a function then the string is not interned. So interning happens only when you directly assign a string with quotes like var str = "test";

Based on all the performance tests (see original post), for strings a and b the operation a === b takes:
constant time O(1) if the strings are interned. From the examples it seems that interning only happens with directly assigned strings like var str = "test"; and not if you build the string by concatenation in for-loops or functions.
linear time O(N) in all other cases, because the lengths of the two strings are compared first. If the lengths are equal, the strings are compared character by character; otherwise they are of course not equal. N is the length of the string.

According to the ECMAScript 5.1 Specification's Strict Equal Comparison algorithm, even if the type of Objects being compared is String, all the characters are checked to see if they are equal.
If Type(x) is String, then return true if x and y are exactly the same sequence of characters (same length and same characters in corresponding positions); otherwise, return false.
Interning is strictly an implementation thingy, to boost performance. The language standard doesn't impose any rules in that regard. So, it's up to the implementers of the specification to intern strings or not.
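As a rough illustration of the quoted rule, here is a sketch of what "same length and same characters in corresponding positions" amounts to. The function name sameSequence is made up, and this is not a claim about how any engine implements ===; an engine is free to short-cut the scan, for example when both operands refer to the same interned string.

```javascript
// Sketch of the quoted rule for two String operands; not an engine implementation.
function sameSequence(x, y) {
  if (x.length !== y.length) return false; // different lengths: decided immediately
  for (let i = 0; i < x.length; i++) {
    // The first mismatching code unit ends the scan.
    if (x.charCodeAt(i) !== y.charCodeAt(i)) return false;
  }
  return true; // every position matched: linear in the length of the strings
}

console.log(sameSequence('1234', '1234')); // true
console.log(sameSequence('1234', '1235')); // false
```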

First of all, it would be nice to see a JSPerf test which demonstrates the claim that doubling the string size doubles the execution time.
Next, let's take that as granted. Here's my (unproven, unchecked, and probably unrelated to reality) theory.
Comparing two memory addresses is fast, no matter how much data is referenced. But you have to INTERN these strings first. If you have in your code
var a = "1234";
var b = "1234";
Then the engine first has to understand that these two strings are the same and can point to the same address. So at least once these strings have to be compared fully. So basically there are the following options:
The engine compares and interns strings directly when parsing the code. In this case equal strings should get the same address.
The engine may say "these strings are too big, I don't want to intern them" and keep two copies.
The engine may intern these strings later.
In the two latter cases string comparison will influence the test results. In the last case, even if the strings are finally interned.
But as I wrote, a wild theory, for theory's sake. I'd first like to see some JSPerf.

Performance of regex within jQuery data selector: dependence on certain string length

The setup: I have a div with a bunch of radio buttons, each of which has been associated with a custom attribute and value using $(element).data(attr_name,attr_value);. When an underlying data structure is changed, I iterate over the fields and set the appropriate buttons to checked:true by using the ':data' selector found here: https://stackoverflow.com/a/2895933/1214731
$($('#style-options').find(':radio').filter(':data('+key+'=='+value+')'))
.prop('checked',true).button('refresh');
This works great: it finds the appropriate elements, even with floating-point values.
Performance depends on value:
I noticed that when I clicked on certain buttons, the page took fractionally longer to respond (for most buttons there was no noticeable delay). Digging a little deeper, this seems to be occurring when certain floating point values are being searched for.
Using chrome dev tools, I logged the following:
> key='fill-opacity';
"fill-opacity"
> value=.2*2;
0.4
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 43.352ms undefined
> value=.2*3;
0.6000000000000001
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 10322.866ms undefined
The difference in speed is a factor of >200!
Next, I tried typing the number in manually (e.g. decimal place, six, 14x zeros, one) - same speed. All numbers with the same number of digits were the same speed. I then reduced the number of digits progressively:
# of digits    time (ms)
16             10300
15             5185
14             2665
13             1314
12             673
11             359
10             202
9              116
8              77
7              60
6              50
5              41
4              39
I quickly ruled out the equality check between numeric and string - no dependence on string length there.
The regex execution is strongly dependent on string length
In the linked answer above, the regex that parses the data string is this:
var matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
The string passed in is of the form [name operator value]. The length of name doesn't seem to make much difference; the length of value has a big impact on speed however.
Specific questions:
1) Why does the length of name have minimal effect on performance, while the length of value has a large effect?
2) Doubling the execution time with each additional character in value seems excessive - is this just a characteristic of the particular regex the linked solution uses, or is it a more general feature?
3) How can I improve performance without sacrificing a lot of flexibility? I'd like to still be able to pass arguments as a single string to a jQuery selector so type checking up front seems difficult, though I'm open to suggestions.
Basic test code for regex matching speeds:
matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.1111111111111)}; console.timeEnd('regex')
regex: 538.018ms
//add an extra digit - doubles duration of test
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.11111111111111)}; console.timeEnd('regex')
regex: 1078.742ms
//add a bunch to the length of 'name' - minimal effect
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('xxxxxxxxxxxxxxxxxxxx=='+.11111111111111)}; console.timeEnd('regex')
regex: 1084.367ms

A characteristic of regexp quantifiers is that they are greedy by default. If you try to match the expression a.*b to the string abcd, it will happen in these steps:
the first "a" will match
the .* will match the second char, then the third, till the end of the string
reaching the end of the string, there is still a "b" to be matched, so the matching will fail
the regexp processing starts to backtrack
the last char will be "unmatched" and it will try to match "b" to "d". Fails again. More backtracking
tries to match "b" to "c". Fail. Backtrack.
match "b" to "b". Success. Matching ends.
Although you matched just a few chars, you iterated over the whole string. If you have more than one greedy operator, you can easily get an input string that will match with exponential complexity.
Understanding backtracking will prevent a lot of errors and performance problems. For example, 'a.*b' will match the whole string 'abbbbbbb', instead of just the first 'ab'.
The easiest way to prevent these kinds of errors in modern regexp engines is to use the non-greedy version of the operators * and +. They are usually represented by the same operators followed by a question mark: *? and +?.
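The difference between the greedy and the non-greedy form can be checked directly on the 'abbbbbbb' example from above. This is only a small demonstration of which match each form returns; a non-greedy quantifier changes the order in which alternatives are tried and does not by itself rule out backtracking.

```javascript
const text = 'abbbbbbb';

// Greedy: .* first grabs as much as possible, then gives characters back.
const greedy = text.match(/a.*b/)[0];

// Non-greedy: .*? first grabs as little as possible, then extends as needed.
const lazy = text.match(/a.*?b/)[0];

console.log(greedy); // 'abbbbbbb'
console.log(lazy);   // 'ab'
```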
I confess that I really didn't stop to debug the complicated regexp that you posted, but I believe that the problem occurs before matching the '=' symbol. The greedy operator is in this subexpression:
(?:(?:\\\.|[^.,])+\.?)+
I'd try to change it to a non-greedy version:
(?:(?:\\\.|[^.,])+\.?)+?
but this is really just a wild guess. I'm using pattern recognition to solve the problem :-) It makes sense because it is backtracking for each character of the "value" till matching the operator. The name is matched linearly.
This regular expression is just too complex for my taste. I love regular expressions, but it looks like this one matches this famous quotation:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.

Working with string (array?) of bits of an unspecified length

I'm a javascript code monkey, so this is virgin territory for me.
I have two "strings" that are just zeros and ones:
var first = "00110101011101010010101110100101010101010101010";
var second = "11001010100010101101010001011010101010101010101";
I want to perform a bitwise & (which I've never before worked with) to determine if there's any index where 1 appears in both strings.
These could potentially be VERY long strings (in the thousands of characters). I thought about adding them together as numbers, then converting to strings and checking for a 2, but javascript can't hold precision in large intervals and I get back numbers as strings like "1.1111111118215729e+95", which doesn't really do me much good.
Can I take two strings of unspecified length (they may not be the same length either) and somehow use a bitwise & to compare them?
I've already built the loop-through-each-character solution, but 1001^0110 would strike me as a major performance upgrade. Please do not give the javascript looping solution as an answer, this question is about using bitwise operators.

As you already noticed yourself, JavaScript has limited capabilities when it comes to integer values. You'll have to chop your strings into "edible" portions and work your way through them. Since the parseInt() function accepts a base, you could convert up to 32 characters at a time to a 4 byte int (JavaScript's bitwise operators work on 32-bit integers, so a 64-character chunk would lose bits) and use an and-operator to test for set bits (if ((a & b) !== 0)). Note the parentheses: != binds more tightly than &.
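The chunking idea could be sketched as follows. The function name hasCommonOne and the chunk size of 30 are choices made for this example, not part of the answer above; 30 characters keeps every chunk comfortably inside the 32-bit range that the bitwise operators use. Chunks are cut at the same offsets in both strings, so a set bit in the AND means a '1' at the same index in both.

```javascript
// Hypothetical helper: true if some index holds '1' in both bit strings.
function hasCommonOne(first, second) {
  const CHUNK = 30; // assumption: any size up to 32 would fit the bitwise operators
  // Indexes beyond the shorter string cannot hold a '1' in both strings.
  const length = Math.min(first.length, second.length);
  for (let i = 0; i < length; i += CHUNK) {
    const end = Math.min(i + CHUNK, length);
    // Both slices have the same length, so their bits line up index by index.
    const a = parseInt(first.slice(i, end), 2);
    const b = parseInt(second.slice(i, end), 2);
    if ((a & b) !== 0) return true;
  }
  return false;
}

console.log(hasCommonOne('0011', '1100')); // false
console.log(hasCommonOne('0110', '0100')); // true
```

This still loops, but over chunks rather than over single characters.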

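// Caveat: parseInt returns a floating-point number (53 bits of precision) and ^ works on
// 32-bit integers, so for strings longer than 32 characters, like these, the result below
// is not the true bitwise XOR of the two strings. The approach is only exact per chunk of
// at most 32 characters.
var first = "00110101011101010010101110100101010101010101010010001001010001010100011111",
    second = "10110101011101010010101110100101010101010101010010001001010001010100011100",
    firstInt = parseInt(first, 2),
    secondInt = parseInt(second, 2),
    xorResult = firstInt ^ secondInt, //524288
    xorString = xorResult.toString(2); //"10000000000000000000"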

Javascript array sort speed affected by string length?

Just wondering, I have seen diverging opinions on this subject.
If you take an array of strings, say 1000 elements and use the sort method. Which one would be faster? An array in which the strings are 100 characters long or one in which the strings are only 3 characters long?
I tried to test but I have a bug with Firebug at the moment and Date() appears too random.
Thank you!

It depends on what the strings contain. If they differ in an early character, the rest of the string doesn't have to be checked for the comparison, so the length doesn't matter.
For example, "abc" < "bca" Here only the first character had to be checked.
You can read the specs for this: http://ecma-international.org/ecma-262/5.1/#sec-11.8.5
Specifically:
Else, both px and py are Strings
If py is a prefix of px, return false. (A String value p is a prefix of String value
q if q can be the result of concatenating p and some other String r. Note that any
String is a prefix of itself, because r may be the empty String.)
If px is a prefix of py, return true.
Let k be the smallest nonnegative integer such that the character at position k within px is
different from the character at position k within py. (There must be such a k, for neither
String is a prefix of the other.)
Let m be the integer that is the code unit value for the character at position k within
px.
Let n be the integer that is the code unit value for the character at position k within
py.
If m < n, return true. Otherwise, return false.
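The quoted steps can be sketched in a few lines. The function name stringLessThan is made up for illustration, and this is not how an engine or Array.prototype.sort is implemented; it only shows that the comparison stops at the first position where the two strings differ, so long strings that differ early cost no more to compare than short ones.

```javascript
// Sketch of the quoted steps for two String operands (px < py).
function stringLessThan(px, py) {
  const shared = Math.min(px.length, py.length);
  for (let k = 0; k < shared; k++) {
    const m = px.charCodeAt(k); // code unit at position k within px
    const n = py.charCodeAt(k); // code unit at position k within py
    if (m !== n) return m < n;  // the first differing position decides
  }
  // No difference within the shared length: one string is a prefix of the other.
  // px < py only if px is the shorter one (a proper prefix of py).
  return px.length < py.length;
}

console.log(stringLessThan('abc', 'bca')); // true, decided at the first character
console.log(stringLessThan('abc', 'ab'));  // false, 'ab' is a prefix of 'abc'
```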

It really depends on how different the strings are, but I guess the differences would be minimal due to the fact that what's called to do the comparison is way slower than actually comparing the strings.
But then again, modern browsers use some special optimizations for sort, so they cut some comparisons to speed things up. And this would happen more often sorting an array of short strings.
And FYI, if you want to run a benchmark, use a reliable tool like jsPerf.
