Python Reference Sheet for Interviews
A quick reference for coding interviews in Python, along with a few exercises.
Note: This is a non-exhaustive and ad-hoc post that I created for a peer.
Collections
- For transformations and filters, use list comprehensions:
nums = [-4, -2, 1, 5, 9] squares = [i**2 for i in nums]
- For iterating through a collection of tuples that are record-like, i.e. fixed format, use tuple-unpacking:
records = [ ("Steve", 10), ("Okoye", 5), ("Thor", 15), ("Steve", 9), ("Steve", 2) ] for name, votes in records: # do something
- Use sets when the desired output has unique elements:
candidates = {name for name, _ in records}
- Use generators for collections that are iterated through only once/are expensive to hold in memory at once:
def memory_heavy(database_conn): # Loads 1 million records into memory all_records = [database_conn.get(k) for k in range(1000000)] # Makes a list of 1 million scores scores = [score for name, score in all_records] return max(scores)
vs
def memory_light(database_conn): # all_records has the type `generator` # It is lazily evaluated, i.e. each item is fetched only on use all_records = (database_conn.get(k) for k in range(1000000)) # Similarly, scores is also lazily evaluated # If you were to use # [score for name, votes in all_records] # It would generate the entire list of 1 million items in memory scores = (score for name, score in all_records) return max(scores)
Remember that generators can be iterated through only once.
- Use memoization for recursive functions that may be queried more than once. For example, if you need to find the number of permutations of
n
distinct boys andm
distinct girls inn
boy slots andm
girl slots respectively, you needn! * m!
.def no_of_permutations(n: int, m: int) -> int: _max = max(m, n) + 1 factorial = [1] * _max for i in range(2, _max): factorial[i] = factorial[i - 1] * i return factorial[m] * factorial[n]
- If you are unable to come up with the memoized version, you can leverage the
@lru_cache
decorator:from functools import lru_cache @lru_cache() def factorial(n: int) -> int: if n == 0 or n == 1: return 1 return n * factorial(n - 1)
This will compute the factorial for each
n
only once. - Use
zip
to iterate through two collections in lock-step:names = ["Peter", "Carl", "Emma"] votes = [20, 10, 50] for name, vote in zip(names, votes): print(f"{name} received {vote} votes")
Sorting
Use sorted
along with the key
and reverse
args to leverage the most out of Python’s built-in sorting capabilities.
Sorting by number of votes
# Assuming records have the format (name, total_votes, timestamp)
def sort_by_votes(records):
# key is a function that accepts a single element
# and returns the parameter which you wish to use for sorting
return sorted(records, key=lambda x: x[1])
Sorting by timestamp, with latest record first
def sort_by_timestamp(records):
return sorted(records, key=lambda x: x[2], reverse=True)
Sorting by property
Suppose records
was a list of elements of the type VoteRecord
class VoteRecord:
def __init__(self, name, total_votes, timestamp):
self._name = name
self._total_votes = total_votes
self._timestamp = timestamp
then your key can be more descriptive:
sorted(records, key=lambda x: x._total_votes)
If you have a function that extracts some property from the element, you can use it directly
def name_length(record):
return len(record._name)
# Sorts records by increasing length of name
sorted(records, key=name_length)
Multi-key sorting a.k.a breaking ties
Tuples also have a natural sorting order, which you can leverage to sort collections based on more than one property.
For example, this sorts the records in ascending order of votes, breaking ties by ascending order of name
sorted(records, key=lambda row:(row._total_votes, row._name))
Graphs
Assume that the input is of the format List[Tuple[Int, Int, Int]]
, i.e. a list of tuples where each tuple is of the format (src, dest, weight). This is flexible enough to all four kinds of the Punnett square of graphs (directed/undirected * weighted/unweighted)
The total number of nodes in the graph can either be specified, or derived as:
def number_of_nodes(tuples: List[Tuple[Int, Int, Int]]) -> int:
srcs = (src for src, _, _ in tuples)
dests = (dest for _, dest, _ in tuples)
return max(max(srcs), max(dests))
Representations
There are two popular formats for dealing with graphs:
Adjacency Matrix
def make_adjacency_matrix(
tuples: List[Tuple[Int, Int, Int]],
n: int
) -> List[List[Int]]:
"""
Arguments
---------
tuples: A list of graph records of the format (src, dest, weight)
n: The total number of nodes in the graph
Returns
-------
matrix: A 2-D matrix where matrix[src][dest] is equal to
the weight of the edge from src to dest
"""
# [[math.inf]*n]*n does not work because arrays are pass-by-reference
# So the same internal array reference is copied n times
# which is not what we want
matrix = [[math.inf]*n for _ in range(n)]
for src, dest, weight in tuples:
matrix[src][dest] = weight
return matrix
The advantage this offers is that you can query for the weight between two nodes in O(1)
.
Adjacency List
def make_adjacency_list(
tuples: List[Tuple[Int, Int, Int]],
n: int
) -> Dict[Int, List[Tuple[Int, Int]]]:
"""
Arguments
---------
tuples: A list of graph records of the format (src, dest, weight)
n: The total number of nodes in the graph
Returns
-------
adj_list: A dict where adj_list[src] is a list of tuples (dest, weight)
"""
adj_list = defaultdict(list)
for src, dest, weight in tuples:
adj_list[src].append((dest, weight))
return adj_list
The advantage is that you can get all the neighbors of a node in O(1)
. This is particularly useful when the graph is sparse, and a adjacency matrix would be mostly empty (count(math.inf
) ~= n2)
Algorithms
We use a deque
i.e. a double-ended queue that serves as a stack as well as a queue, depending on how you use it.
We assume an unweighted graph for these.
BFS
from collections import deque
def bfs(adj_list, start):
queue = deque([start])
visited = set()
while queue:
# Dequeue the next node to visit
current = queue.popleft()
# If we have already processed this node, skip
if current in visited:
continue
# Do something with the current node
# You could print it, add it to a data structure, perform some mutation on it
process(current)
# Add before we see adjacent nodes
# This is important because a node may have an edge to itself
# And we want to avoid cycles
visited.add(current)
# all nodes reachable from the current node that have not been visited
# Note that this is a generator
to_visit = (
node
for node in adj_list[current]
if node not in visited
)
# Add all these nodes to the queue
queue.extend(to_visit)
# Note that the queue may contain duplicate elements, but it is guaranteed
# that we process each node only once because of the visited set
DFS
from collections import deque
def dfs(adj_list, start):
stack = deque([start])
visited = set()
while stack:
# Pop the next node to visit
# Note that this is the only difference from bfs
current = stack.pop()
if current in visited:
continue
process(current)
visited.add(current)
to_visit = (
node
for node in adj_list[current]
if node not in visited
)
stack.extend(to_visit)
Note how using an adjacency list makes these algorithms fairly simple. If we were using an adjacency matrix, the to_visit
step would need an extra condition:
to_visit = (
node
for node in range(n)
if (
adj_matrix[current][node] and
i not in visited
)
)
This is not just extra code, it also has higher complexity: finding adjacent nodes becomes an O(n)
operation.
Miscellaneous
- Python
math
provides:fabs
: Absolute valuegcd
: Greatest Common Divisor (LCM can be derived as(m*n)/math.gcd(m, n)
)math.sqrt
,math.factorial
: Self-explanatorymath.log
,math.log2
,math.log10
: Natural, base-2 and base-10 logmath.degrees
andmath.radians
: Convert radians to degrees, and degrees to radians, respectively
- Python
itertools
provides:combinations(iterable, r)
: Combinations of sizer
fromiterable
in lexicographic order.combinations(["cat"], 2)
would result in[("c", "a"), ("c", "t"), ("a", "t")]
permutations
is similarproduct(iterable1, iterable2)
results in a cross product. Soproduct("AB", "CD")
would give[('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
- Use the
join
method on strings to concatenate iterables into strings:"".join(["c", "a", "t"]) # gives "cat" ".".join(["10", "0", "0", "1"]) # gives "10.0.0.1"
- Instead of
5 < a and a < 100
, you can use5 < a < 100
which evaluates to the same thing.
Exercises
- Given a list of numbers, print their GCD and LCM:
gcd_and_lcm([10, 6, 12]) # (2, 60) gcd_and_lcm([18, 12, 32, 5]) # (1, 1440)
- Given a string
s
, generate all possible subsequences of lengthk
:subsequences("cats", 2) # ["ca", "ct", "cs", "at", "as", "ts"] subsequences("cats", 3) # ["cat", "cas", "ats"]
- Given a graph in the above canonical format, find the length of the shortest path from a given node
src
to a nodedest
- Given integers
n
andk
, print everyk
th Tribonacci number up to (and including)n
, where the Tribonacci relation is given by:
Note that $T_0$ is not defined.
tribonacci(n=15, k=5) # [7, 147, 3136]