Challenge

The DNA sequence is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’.
For example, “ACGAATTCCG” is a DNA sequence.
When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA chain.

Solution 1 - Iteration

class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        res = set()
        _set = set()
        for i in range(len(s)):
            seq = s[i:i+10]
            if seq in _set and seq not in res:
                res.add(seq)
            _set.add(seq)
        return res

Time Complexity: 0(n*m)

Solution 2 - Rabin-Karp

class Solution:

    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        h = 0
        p  = 4
        tl = 10
        dic = {'A':1, 'C':2, 'G':3, 'T':4}
        nums = [dic[e] for e in s]
        seen = set()
        res = set()
        # mod no need
        for e in range(len(s) - 10 + 1):
            if e == 0:
                j = 9
                for i in range(10):
                    h += nums[i] * p ** j
                    j -=1
            else:
                previous_hash = nums[e-1] * (p ** 9)
                h = (h - previous_hash) * p + (nums[e + 9])
            if h in seen:
                res.add(s[e:e+10])
            seen.add(h)
        return res

Time complexity: O(n)