time complexity
big O notation
Big O notation measures the asymptotic growth of a function. $f(n) = O(g(n))$ if for all sufficiently large $n$, $f(n)$ is at most a constant factor larger than $g(n)$.
Ω and Θ notation
We say $f(n) = \Omega(g(n))$ if $g(n) = O(f(n))$.
We say $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $g(n) = O(f(n))$.
types of complexity
Worst-case complexity: what is the largest possible running time on any input of size n?
Average-case complexity: what is the average running time on a random input of size n?
Best-case complexity: what is the smallest possible running time on any input of size n?
Graph algorithms
ways of representing graphs
adjacency matrix
For a graph with n vertices this is an n × n matrix A, where $A_{ij} = 1$ if there is an edge from node i to node j, and $A_{ij} = 0$ otherwise.
If the graph is undirected, the matrix is symmetric.
adjacency lists
For each vertex, keep a list of its neighbors.
incidence matrix
The incidence matrix of an undirected graph with n vertices and m edges is an n × m matrix B, where $B_{ij} = 1$ if the i’th vertex is part of the j’th edge, and $B_{ij} = 0$ otherwise.
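As a concrete illustration, here is a small Python sketch (the graph and variable names are illustrative) building all three representations for the same undirected 4-vertex graph:

```python
# The same undirected graph with vertices 0..3 and four edges.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: n x n, symmetric since the graph is undirected.
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    adj_matrix[u][v] = 1
    adj_matrix[v][u] = 1

# Adjacency lists: for each vertex, the list of its neighbors.
adj_list = [[] for _ in range(n)]
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)

# Incidence matrix: n x m, B[i][j] = 1 if vertex i is part of edge j.
m = len(edges)
inc_matrix = [[0] * m for _ in range(n)]
for j, (u, v) in enumerate(edges):
    inc_matrix[u][j] = 1
    inc_matrix[v][j] = 1
```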
Two fundamental graph exploration algorithms
Depth First Search (DFS)
Breadth First Search (BFS)
Running BFS from a vertex s yields a BFS tree containing the shortest (fewest number of steps) paths from s to all other reachable nodes.
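A minimal BFS sketch in Python, assuming the adjacency-list representation above; it returns the fewest-steps distance from s to every vertex:

```python
from collections import deque

def bfs_distances(adj_list, s):
    """Breadth First Search from s; returns the fewest-steps distance
    to every reachable vertex (None for unreachable ones)."""
    dist = [None] * len(adj_list)
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj_list[u]:
            if dist[v] is None:  # first time v is reached: shortest path found
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```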
Greedy Algorithms
Greedy algorithms are algorithms that build a solution step by step by always choosing the currently best option.
Interval Scheduling
Input: A list of intervals
Output: Maximum number of these intervals that can be chosen without getting any overlaps.
Solution: Repeatedly pick the remaining interval that ends first among those compatible with the intervals already chosen.
Proving correctness of such an algorithm: a common strategy for greedy algorithms is to show that the algorithm always “stays ahead” of an optimal solution; for interval scheduling, the i’th interval picked by the greedy ends no later than the i’th interval of any optimal solution.
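A possible Python sketch of the greedy rule, assuming intervals are given as (start, end) pairs and that intervals sharing only an endpoint do not count as overlapping:

```python
def max_non_overlapping(intervals):
    """Greedy interval scheduling: sort by end time and repeatedly
    take the interval that ends first among the compatible ones."""
    count, last_end = 0, float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:  # compatible with everything chosen so far
            count += 1
            last_end = end
    return count
```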
Job Scheduling With Minimum Lateness
Input: A list of jobs, where each job has a deadline $d_i$ and a duration $t_i$ (how long it takes to finish the job).
Output: The smallest possible maximum lateness in a schedule for doing all jobs.
Solution: Schedule the jobs in order of increasing deadline $d_i$ (Earliest Deadline First).
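A sketch of this Earliest Deadline First rule in Python, assuming jobs are (deadline, duration) pairs and run back to back starting at time 0:

```python
def min_max_lateness(jobs):
    """jobs is a list of (deadline d_i, duration t_i) pairs.
    Earliest Deadline First: process jobs in order of increasing d_i,
    which is optimal for minimizing the maximum lateness."""
    time, max_lateness = 0, 0
    for d, t in sorted(jobs):            # sort by deadline
        time += t                        # job finishes at this time
        max_lateness = max(max_lateness, time - d)
    return max_lateness
```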
Shortest path
It is helpful to instead consider a more general problem: find the shortest paths from s to all other vertices.
Dijkstra’s algorithm: maintain a set D of vertices for which the shortest path has been found. In each step, add to D the vertex outside D that is closest to s when using only vertices in D as intermediate vertices.
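A compact Python sketch of Dijkstra’s algorithm using a binary heap, assuming non-negative edge weights and an adjacency structure adj mapping each vertex to a list of (neighbor, weight) pairs:

```python
import heapq

def dijkstra(adj, s):
    """Returns shortest distances from s to all reachable vertices.
    The set of finished vertices (D in the notes) grows by the
    closest unfinished vertex in each step."""
    dist = {s: 0}
    done = set()                         # the set D from the notes
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                     # stale heap entry
        done.add(u)
        for v, w in adj.get(u, []):
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist
```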
Divide & Conquer
Algorithms that split the input into significantly smaller parts, recursively solve each part, and then combine the subresults (somehow).
Merge sort
Sorts in O(n log n) time: split the array in half, recursively sort each half, and merge the two sorted halves in O(n).
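One way this might look in Python (a sketch, not tuned for performance):

```python
def merge_sort(a):
    """Divide & conquer: split, sort halves recursively, merge in O(n)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):   # merge step
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```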
Polynomial multiplication
Divide & conquer with three half-size multiplications (Karatsuba’s trick applied to polynomials) gives $T(n) = 3T(n/2) + O(n) = O(n^{1.59})$.
Using the FFT, polynomial multiplication can instead be done in O(n log n) time.
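A sketch of the FFT approach using NumPy’s FFT routines; rounding the floating-point result makes it exact for modest integer coefficients:

```python
import numpy as np

def poly_mul_fft(a, b):
    """Multiply polynomials given by coefficient lists a and b in
    O(n log n): evaluate both at roots of unity via the FFT,
    multiply pointwise, then interpolate back."""
    n = len(a) + len(b) - 1                  # size of the product
    size = 1 << (n - 1).bit_length()         # next power of two
    fa = np.fft.rfft(a, size)                # evaluate both polynomials
    fb = np.fft.rfft(b, size)
    c = np.fft.irfft(fa * fb, size)[:n]      # pointwise multiply, interpolate
    return np.rint(c).astype(int).tolist()

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
assert poly_mul_fft([1, 2], [3, 4]) == [3, 10, 8]
```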
Unit cost model and Bit cost model
Unit cost model: assume all numbers fit in machine registers so that basic arithmetic operations take constant time.
Bit cost model: account for the size of numbers and the time it takes to manipulate them.
Integer multiplication
Karatsuba’s algorithm: $T(n) = O(n^{1.59})$.
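A sketch of Karatsuba’s three-multiplication trick for non-negative integers (splitting by decimal digits for readability; real implementations split by bits):

```python
def karatsuba(x, y):
    """Three recursive multiplications on half-size numbers:
    T(n) = 3T(n/2) + O(n) = O(n^1.59)."""
    if x < 10 or y < 10:                    # base case: single digits
        return x * y
    m = max(len(str(x)), len(str(y))) // 2  # split position (in digits)
    high_x, low_x = divmod(x, 10 ** m)
    high_y, low_y = divmod(y, 10 ** m)
    z0 = karatsuba(low_x, low_y)
    z2 = karatsuba(high_x, high_y)
    # The middle term reuses z0 and z2 instead of a fourth multiplication.
    z1 = karatsuba(low_x + high_x, low_y + high_y) - z0 - z2
    return z2 * 10 ** (2 * m) + z1 * 10 ** m + z0

assert karatsuba(1234, 5678) == 1234 * 5678
```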
Master Theorem
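In its common simplified form: if $T(n) = aT(n/b) + O(n^d)$ for constants $a > 0$, $b > 1$, $d \ge 0$, then

$$T(n) = \begin{cases} O(n^d) & \text{if } d > \log_b a, \\ O(n^d \log n) & \text{if } d = \log_b a, \\ O(n^{\log_b a}) & \text{if } d < \log_b a. \end{cases}$$

For merge sort ($a = b = 2$, $d = 1$) this gives $O(n \log n)$; for Karatsuba ($a = 3$, $b = 2$, $d = 1$) it gives $O(n^{\log_2 3}) \approx O(n^{1.59})$.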
Dynamic Programming
Split a problem into smaller subproblems such that results from one subproblem can be reused when solving others.
Fibonacci numbers
The Fibonacci numbers are a classic number sequence in mathematics, defined by the linear recurrence $f_0 = 0$, $f_1 = 1$, and $f_n = f_{n-1} + f_{n-2}$ for $n \ge 2$.
Weighted Interval Scheduling
Input: A list of intervals $[s_1, t_1], [s_2, t_2], \ldots, [s_n, t_n]$, where each interval $[s_i, t_i]$ has a weight $w_i$.
Output: The maximum total weight of these intervals that can be chosen without getting any overlaps.
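A possible DP sketch in Python: sort by end time and binary-search for the compatible intervals (here intervals that merely touch are treated as compatible):

```python
import bisect

def max_weight_schedule(intervals):
    """intervals is a list of (s_i, t_i, w_i) triples.
    DP over intervals sorted by end time: opt[j] is the best total
    weight using only the first j intervals."""
    intervals = sorted(intervals, key=lambda iv: iv[1])
    ends = [t for _, t, _ in intervals]
    opt = [0] * (len(intervals) + 1)
    for j, (s, t, w) in enumerate(intervals, start=1):
        # p = number of earlier intervals ending no later than s
        p = bisect.bisect_right(ends, s, 0, j - 1)
        opt[j] = max(opt[j - 1],        # skip interval j
                     opt[p] + w)        # take interval j
    return opt[-1]
```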
Knapsack
Input: A capacity C and a list of objects, where each object has a value $v_i$ and a weight $w_i$.
Output: A subset S of objects such that $\sum_{i \in S} w_i \le C$ and $\sum_{i \in S} v_i$ is maximized.
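A standard O(nC) DP sketch in Python (pseudo-polynomial, i.e., polynomial only in the unit cost model), using a one-dimensional table:

```python
def knapsack(capacity, items):
    """items is a list of (value v_i, weight w_i) pairs.
    best[c] = best total value achievable with capacity c
    using the items considered so far."""
    best = [0] * (capacity + 1)
    for v, w in items:
        for c in range(capacity, w - 1, -1):   # backwards: each item used once
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```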
top-down and bottom-up fashion
Top-down fashion: start from the end result and recursively compute results for the relevant subproblems, storing (memoizing) each result so it is computed only once.
Bottom-up fashion: iteratively compute results for larger and larger subproblems.
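The two styles, illustrated on the Fibonacci recurrence from above (a sketch; lru_cache does the memoization in the top-down version):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_top_down(n):
    """Top-down: recurse from the end result; memoization ensures
    each subproblem is computed only once."""
    return n if n < 2 else fib_top_down(n - 1) + fib_top_down(n - 2)

def fib_bottom_up(n):
    """Bottom-up: fill in a table from small subproblems to large ones."""
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```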
Characteristics of dynamic programming
A problem is amenable to dynamic programming if we can define a set of subproblems such that:
- The number of different subproblems is small (ideally polynomial in the input size).
- There is some ordering of subproblems from “small” to “large”.
- The value of a subproblem can be efficiently computed given the values of some set of smaller subproblems.
Sequence Alignment
Input: Strings x and y of lengths m and n, and parameters $\sigma$ and $\alpha$.
Output: Minimum cost of an alignment of x and y with parameters $\sigma$ and $\alpha$.
$\alpha$ is the cost of aligning two different characters with each other (a mismatch).
$\sigma$ is the cost of not aligning a character (a gap).
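A possible DP sketch in Python, where cost[i][j] is the minimum cost of aligning the first i characters of x with the first j characters of y:

```python
def alignment_cost(x, y, sigma, alpha):
    """Minimum alignment cost: sigma is the gap cost, alpha the
    mismatch cost (aligning identical characters is free)."""
    m, n = len(x), len(y)
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i * sigma               # align x[:i] against gaps only
    for j in range(1, n + 1):
        cost[0][j] = j * sigma
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            mismatch = 0 if x[i - 1] == y[j - 1] else alpha
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,  # align the two chars
                             cost[i - 1][j] + sigma,         # gap in y
                             cost[i][j - 1] + sigma)         # gap in x
    return cost[m][n]
```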
Matrix Chain Multiplication
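Given matrices $M_1, \ldots, M_n$, where $M_i$ has dimensions $d_{i-1} \times d_i$, find the parenthesization minimizing the total number of scalar multiplications. A possible DP sketch in Python, iterating over chain lengths:

```python
def matrix_chain(dims):
    """dims[i-1] x dims[i] are the dimensions of the i'th matrix.
    cost[i][j] = minimum scalar multiplications to compute the
    product of matrices i..j, trying every split point k."""
    n = len(dims) - 1                         # number of matrices
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):            # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                for k in range(i, j)
            )
    return cost[1][n]

# (10x30)(30x5)(5x60): best is (M1 M2) M3 = 1500 + 3000 multiplications.
assert matrix_chain([10, 30, 5, 60]) == 4500
```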
Network Flow
The Max-Flow problem
Input: Flow network G.
Output: A flow f maximizing the value v(f).
Solution: The Ford-Fulkerson algorithm in $O(C(m + n))$ time, the scaling algorithm in $O(m^2 \log C)$ time, or the Edmonds-Karp algorithm in $O(nm(n + m))$ time.
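A sketch of Edmonds-Karp in Python (Ford-Fulkerson with BFS augmenting paths), assuming capacities are given as a nested dict capacity[u][v]:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest path
    in the residual graph until none exists."""
    # Residual capacities, with reverse edges initialized to 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                      # no augmenting path left
        # Recover the path, find the bottleneck, push flow along it.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck     # allow undoing via reverse edge
        flow += bottleneck
```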
Edge Cuts
An edge cut of a graph is a set of edges such that their removal would disconnect the graph.
Minimum s-t-Cut
Input: A flow network G with source s and sink t.
Output: An s-t cut (A, B) of G minimizing the capacity c(A, B).
The Max-Flow-Min-Cut Theorem
For every flow network G, the maximum flow from s to t equals the
minimum capacity of an s-t cut in G.
Vertex Cuts
A vertex cut in a graph is a set of vertices such that if we remove
them, the graph splits into more than one connected component.
Matchings
A matching in a graph is a set M of edges such that no vertex appears in more than one edge of M. Of particular interest to us will be bipartite graphs.
Maximum Bipartite Matching
Input: A bipartite graph G
Output: A matching M in G of maximum possible size.
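One way to solve this without explicitly building a flow network is the augmenting-path method, equivalent to unit-capacity max flow. A Python sketch, assuming adj[u] lists the right-side neighbors of left vertex u:

```python
def max_bipartite_matching(adj, n_left, n_right):
    """Repeatedly try to find an augmenting path from each
    unmatched left vertex (Kuhn's algorithm)."""
    match_right = [None] * n_right           # match_right[v] = left partner of v

    def try_augment(u, visited):
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                # v is free, or v's partner can be re-matched elsewhere.
                if match_right[v] is None or try_augment(match_right[v], visited):
                    match_right[v] = u
                    return True
        return False

    return sum(try_augment(u, set()) for u in range(n_left))
```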
Edge-Disjoint Paths
Given a directed graph with source s and sink t, what is the maximum number of edge-disjoint paths from s to t?
(Edge-disjoint = no edge is used by more than one path.)
Project Selection
Choose a subset of projects (each with a profit or a cost, and with prerequisite dependencies between them) maximizing the total profit; this can be solved by a reduction to Minimum s-t-Cut.