
FTR: This can be done with "only" two loops (as opposed to the "naive" 3 in the article) -- without any extra data structures. It's sufficient to keep track of how many characters have been unique so far. For each "new" character, check whether it's different from all of them. If so, increase the count. Otherwise, reset the count to the distance to the match.

  # Sample input (AoC 2022 day 6 example); prints 5
  SIGNAL = "bvwbjplbgvbhsrlpgdmjqwftvncz"

  def main():
    count = 1  # length of the current run of unique characters
    for i in range(1, len(SIGNAL)):
      # compare SIGNAL[i] against the previous `count` unique characters
      for j in range(0, count):
        if SIGNAL[i - j - 1] == SIGNAL[i]:
          count = j  # reset to the distance to the match
          break

      count = count + 1

      if count == 4:
        print(i + 1)
        break

  main()
If we consider the word length a constant this should be O(n).


I think you can also speed this up by incrementing i to the next candidate window end (increment by window_length - count). I.e. given "abcdefggqwertyu", if your previous window was "[abcdefg]" and you see that "g" conflicted so you reset count to 0, the next possible window is "[gqwerty]". So you don't have to go through the intermediate windows like "[efgqgq]" because you know that count can never be 4 within there.

Edit: I think your solution already effectively does this, it's just that because you maintain the invariant that the window always contains unique chars to avoid any additional data structure, once the window start gets reset to "[g]" it needs to rebuild the intermediate state one char at a time to work its way back up to [gqwerty]. So the best case is approximately linear if count gets reset often; the worst case is O(NW). The method I was thinking of directly tests the next window starting from its endpoint backwards; to do this you need a hashset. I think that would achieve a best case of O(N/W), effectively sublinear, though the worst case is still O(NW).
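A rough sketch of that endpoint-backwards test (the function name, the `w` parameter, and the sample strings are my own illustrations, not from the thread): scan each candidate window backwards from its last character with a set, and on the first repeat, jump the window end past the earlier occurrence.

```python
def first_marker_skip(signal: str, w: int = 4) -> int:
    """Return the 1-based position just past the first window of w unique chars."""
    i = w - 1  # index of the current candidate window's last character
    while i < len(signal):
        seen = set()
        j = i
        # scan the candidate window backwards from its endpoint
        while j > i - w:
            if signal[j] in seen:
                break
            seen.add(signal[j])
            j -= 1
        else:
            return i + 1  # all w characters were unique
        # signal[j] repeats later in the window, so no window containing
        # position j can succeed: jump so the next window starts at j + 1
        i = j + w
    return -1  # no marker found
```

In the best case (a repeat near the window's end every time) each jump advances by almost a full window, giving roughly N/W iterations.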


That was my conclusion too, this is a super easy problem with a fixed window length. The bit hashing is neat but I was too confused by what seemed to be an over-complication of the problem to really appreciate it. Did you and I both miss something here?


I don't think so. As much as I love using bit operations: in this case I'd actually prefer a table of character counts for a "true" O(n) solution, as bit counting isn't guaranteed to be a "native" operation.
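For reference, a count-table sliding window might look like this (my own sketch, with illustrative names): keep per-character counts for the current window plus a tally of characters that appear more than once, so each step is O(1).

```python
def first_marker_counts(signal: str, w: int = 4) -> int:
    """Return the 1-based position just past the first window of w unique chars."""
    counts = {}   # occurrences of each character inside the current window
    repeats = 0   # number of characters appearing more than once in the window
    for i, c in enumerate(signal):
        counts[c] = counts.get(c, 0) + 1
        if counts[c] == 2:
            repeats += 1
        if i >= w:  # slide: drop the character leaving the window
            old = signal[i - w]
            counts[old] -= 1
            if counts[old] == 1:
                repeats -= 1
        if i >= w - 1 and repeats == 0:
            return i + 1
    return -1  # no marker found
```

This is O(n) regardless of whether the hardware has a native popcount.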


The article started with 3 loops, took it down to 2, then to 1. Why are you talking about 2 loops?


The article took it down to 2 by adding an extra hashtable. It's possible to go down to 2 without using an extra data structure.



