Skill Set - Heap for Top K Numbers


Skills for tackling LeetCode problems - Heap for Top K Numbers.

Heap can be used to efficiently solve problems dealing with K elements at a time from a set of given elements.

1. Top K Numbers

Given an unsorted array of numbers, find the K largest numbers in it.

We use a Min-Heap storing numbers in ascending order. Every time we find a larger number than the smallest number in the heap, we take out the smallest number and insert the larger number into the heap.

# Time complexity O(Klog K + (N - K)*log K) = O(N log K). It takes O(log K) to extract or insert the min number from the heap. We insert K numbers in the heap and then iterate through the remaining numbers, at every step, we extract the min and insert a new.
# Space complexity O(K)

from heapq import *

def find_k_largest_numbers(nums, k):
    minHeap = []
    for i in range(k): # add first k numbers
        heappush(minHeap, nums[i])
        
    for i in range(k, len(nums)): # replace the min in heap with the larger number in array.
        if nums[i] > minHeap[0]:
            heappop(minHeap)
            heappush(minHeap, nums[i])
            
    return list(minHeap)

2. Top K Elements

Given an unsorted array of numbers, find kth smallest number in it. Please note that it is the kth smallest number in the sorted order, not the kth distinct element.

We use a Max-Heap storing numbers in descending order. Every time we find a smaller number than the max number in the heap, we take out that largest number and insert the smaller number into the heap.

# Time complexity O(N log K). Space complexity O(K).
def find_Kth_smallest_number(nums, k):
    maxHeap = []
    for i in range(k): # add first k numbers
        heappush(maxHeap, -nums[i])
        
    for i in range(k, len(nums)): # replace the max in heap with the smaller number in array
        if -nums[i] > maxHeap[0]:
            heappop(maxHeap)
            heappush(maxHeap, -nums[i])
    return -maxHeap[0]

Alternative approach: we can also use a Min Heap storing all the numbers and extract top K from it.

# Time complexity O(N). Space complexity O(N).
def find_Kth_smallest_number(nums, k):
    minHeap = []
    for i in range(len(nums)):
        heappush(minHeap, nums[i])
    return minHeap[k-1]

3. K Closest Points to the Origin

Given an array of points in a 2D plane, find K closest points to the origin.

# Time complexity O(N log K). Space complexity O(K).

# class Point:
#     def __init__(self, x, y):
#         self.x = x
#         self.y = y
#     def __lt__(self, other):
#         return self.distance_from_origin() > other.distance_from_origin()
#     def distance_from_origin(self):
#         return (self.x * self.x) + (self.y * self.y)
#     def print_point(self):
#         print("[" + str(self.x) + ", " + str(self.y) + "] ", end='')
        
def find_closest_points(points, k):
    maxHeap = []
    
    for i in range(k):
        heappush(maxHeap, points[i])
        
    for i in range(k, len(points)):
        if points[i].distance_from_origin() < maxHeap[0].distance_from_origin():
            heappop(maxHeap)
            heappush(maxHeap, points[i])
            
    return list(maxHeap) # the numbers are not sorted

4. Connect Ropes

Given N ropes with different lengths, we need to connect these ropes into one big rope with minimum cost. The cost of connecting two ropes is equal to the sum of their lengths.

Following a greedy approach to connect the smallest ropes first will ensure the lowest cost. We use Min Heap to find the smallest ropes. Once we connect two ropes, we need to insert the resulting rope back in the heap so that it can connect with the remaining ropes.

# Time complexity O(N log N). Space complexity O(N).
def minimum_cost_to_connect_ropes(ropeLengths):
    minHeap = []
    for i in ropeLengths: # add all numbers to the heap
        heappush(minHeap, i)
        
    result, tmp = 0, 0
    while len(minHeap) > 1:
        tmp = heappop(minHeap) + heappop(minHeap) # add two smallest numbers
        result += tmp 
        heappush(minHeap, tmp) # insert back to the heap
        
    return result

5. Top K Frequent Numbers

Given an unsorted array of numbers, find the top K frequently occurring numbers in it.

We use a HeapMap to count the frequency of numbers. We then use Min Heap to find the K most frequently occurring numbers.

# Time complexity O(N + N log K). Space complexity O(N).
def find_k_frequent_numbers(nums, k):
    
    Hash = {} # get the frequency of numbers
    for i in nums:
        Hash[i] = Hash.get(i, 0) + 1
        
    minHeap = []
    for num, count in Hash.items():
        heappush(minHeap, (count, num))
        if len(minHeap) > k: 
            heappop(minHeap) # remove smallest numbers
            
    result = []
    while minHeap:
        result.append(heappop(minHeap)[1])
        
    return result

6. Frequency Sort

Given a string, sort it based on the decreasing frequency of its characters.

# Time complexity O(D log D) where D is the number of distinct characters in the input string. 
# Space complexity O(N).

def sort_character_by_frequency(str):
    Hash = {}
    for char in str:
        Hash[char] = Hash.get(char, 0) + 1
    maxHeap = []
    
    for char, count in Hash.items():
        heappush(maxHeap, (-count, char))
        
    result = []
    while maxHeap:
        count, char = heappop(maxHeap)
        for _ in range(-count):
            result.append(char)
    return ''.join(result)

7. Kth Largest Number in a Stream

Design a class to find the kth largest element in a stream of numbers. The class should have the following two things.1) the constructor of the class should accept an integer array containing initial numbers from the stream and an integer K. 2) The class should expose a function add(int num) which will store the given number and return the kth largest number.

# Time complexity O(log K) taken by 'add' to insert element to heap. Space complexity O(K).
class KthLargestNumberInStream:
    minHeap = []
    def __init__(self, nums, k):
        self.k = k
        
        for num in nums:
            self.add(num)
            
    def add(self, num):
        heappush(self.minHeap, num)
        if len(self.minHeap) > self.k:
            heappop(self.minHeap)
        return self.minHeap[0] # the kth largest number

8. K Closest Numbers

Given a sorted number array and two integers ‘K’ and ‘X’, find ‘K’ closest numbers to ‘X’ in the array. Return the numbers in the sorted order. ‘X’ is not necessarily present in the array.

First find the closest number ‘Y’ to ‘X’ by binary search. We then can search in both direction to find the K closest numbers. This can be done by pushing the neighbor numbers in a Min Heap, sorted by the absolute difference from ‘X’. Finally we can extract the top ‘K’ numbers from the Min Heap.

# Time complexity O(log N + K log K). O(log N) for binary search and O(K log K) for inserting  numbers in Min Heap, and sort the result list. 
# Space complexity O(K).
def find_closest_elements(arr, K, X):
    index = binary_search(arr, X) # find X by binary search
    
    l, r = index - K, index + K # search the neighbors
    l = max(l, 0)
    r = min(r, len(arr) - 1)
    
    minHeap = []
    for i in range(l, r + 1):
        heappush(minHeap, (abs(arr[i] - X), arr[i])) # sort by the abs difference between the neighbor and X
        
    result = []
    for _ in range(K): # append the closest K number into result
        result.append(heappop(minHeap)[1])
        
    result.sort()
    return result
    
def binary_search(arr, target):
    l, r  = 0, len(arr) - 1
    while l < r:
        m = l + (r - l) // 2
        if arr[m] == target:
            return m
        if arr[m] < target:
            l = m + 1
        else:
            r = m
    return l

Alternative approach using two pointers: Use binary search to find the number X. Then use two pointers to move back and forward to find the closest neighbors which gives the smallest difference. To keep the resulting list sorted we can use a Queue.

# Time complexity O(log N + K). O(log N) for binary search and O(K) for using two pointers. 
# Space complexity O(1).

from collections import deque
def find_closest_elements(arr, K, X):
    result = deque()
    index = binary_search(arr,X) # find the number X by binary search
    p1, p2 = index, index + 1 # two pointers
    n = len(arr)
    
    for _ in range(K):
        if p1 >= 0 and p2 < n: # two pointers are in the range
            if abs(X - arr[p1]) <= abs(X - arr[p2]): # arr[p1] is closer
                result.appendleft(arr[p1]) # since arr[p1] is smaller than X
                p1 -= 1 # move to the left
            else: # arr[p2] is closer
                result.append(arr[p2]) # since arr[p2] is larger than X
                p2 += 1 # move to the right
        elif p1 >= 0: # p2 is out of range
            result.appendleft(arr[p1]) # only consider arr[p1]
            p1 -= 1 # move to the left
        elif p2 < n: # p1 is out of range
            result.append(arr[p2]) # only ocnsider arr[p2]
            p2 += 1 # move to the right
    return result

9. Maximum Distinct Elements

Given an array of numbers and a number K, we need to remove K numbers from the array such that we are left with maximum distinct numbers.

We first count the frequency by HashMap and then use a Min Heap sorted by the frequencies. Follow a greedy approach, we will remove the least frequent number from the heap and try to make it distinct.

# Time complexity O(N log N + K log N). Inserting N numbers to the heap takes O(N log N). Extracting K numbers will take O(K log N).
# Space complexity O(N) taken by HashMap.

def find_maximum_distinct_elements(nums, k):
    distinctCount = 0
    if len(nums) <= k: # nothing left after removal
        return distinctCount
    
    Hash = {} # count the frequencies
    for i in nums: 
        Hash[i] = Hash.get(i, 0) + 1
    
    minHeap = []
    for num, count in Hash.items():
        if count == 1: # distinct
            distinctCount += 1
        else: # only include duplicates
            heappush(minHeap, (count, num))
            
    while k > 0 and minHeap:
        count, num = heappop(minHeap)
        k -= count - 1 # delete duplicates to make num distinct
        if k >= 0: # if k < 0, we cannot remove anymore
            distinctCount += 1
            
    if k > 0: # if we still need to remove, remove distinct numbers
        distinctCount -= k
        
    return distinctCount

10. Sum of Elements

Given an array, find the sum of all numbers between the k1’th and k2’th smallest elements of that array.

We first insert all numbers into a Min Heap, then remove the first ‘k1’ smallest numbers, then extract the next ‘k2 - k1 - 1’ numbers out of the heap. The sum of them will be the answer.

# Time complexity O(N log N). Space complexity O(N). N is the total input numbers.

def find_sum_of_elements(nums, k1, k2):
    minHeap = []
    for i in nums:
        heappush(minHeap, i)
        
    for _ in range(k1):
        heappop(minHeap)
    
    SUM = 0
    for _ in range(k2 - k1 - 1):
        SUM += heappop(minHeap)
    
    return SUM

Alternative approach: we can use a Max Heap to keep track of the top k2 numbers, and then add the top k2-k1-1 numbers in the max heap to find the sum of all numbers.

# Time complexity O(N log K2). Space complexity O(K2)
def find_sum_of_elements(nums, k1, k2):
    maxHeap = []
    for i in range(len(nums)):
        if i < k2 - 1: # add k2 numbers into Max Heap
            heappush(maxHeap, -nums[i])
        elif nums[i] < -maxHeap[0]: # add smaller numbers
            heappop(maxHeap)
            heappush(maxHeap, -nums[i])
            
    SUM = 0
    for _ in range(k2 - k1 -1):
        SUM += -heappop(maxHeap)
    return SUM

11. Rearranging String

Given a string, find if its letters can be rearranged in such a way that no two same characters come next to each other.

Use a HashMap to count the frequencies of the letters. In each step, we should append the letter to the result list in the order from the highest frequency to the lowest frequency.

# Time complexity O(N log N). Space complexity O(N).
def rearrange_string(str):
    Hash = {}
    for char in str: # count the frequencies of the letters
        Hash[char] = Hash.get(char, 0) + 1
        
    maxHeap = [] 
    for char, count in Hash.items(): # append to max heap ordered by frequencies
        heappush(maxHeap, (-count, char))
    
    preChar, preCount = None, 0
    result = []
    while maxHeap:
        count, char = heappop(maxHeap)
        
        if preChar and -preCount > 0: # add back if the frequency is still greater than 0
            heappush(maxHeap, (preCount, preChar))
            
        result.append(char)
        preChar = char
        preCount = count + 1
        
    # there cannot be two same letters near to each other, if it happens the while loop will terminate early
    return ''.join(result) if len(result) == len(str) else ""

  TOC