- Sorting Algorithm
- Sorting 分类:稳定排序和不稳定排序
- Quick Sort
- Merge Sort - Divide and Conquer
- Binary Search
- Insertion Sort
- Bubble Sort
- Graph Algorithm
- BFS 广度优先搜索
- DFS 深度优先搜索
- Dijkstra - 最短路径算法 - TODO
- 二叉树遍历算法 - 基本思想:递归
- 最小生成树算法
- Dynamic Programming
- 基本思想
- 经典动态规划问题总结
- KMP算法
Sorting Algorithm
Sorting 分类:稳定排序和不稳定排序
- 稳定排序:排序前后两个相等的数相对位置不变,则算法稳定。
e.g. 插入排序,冒泡排序,归并排序。
- 不稳定排序:排序前后两个相等的数相对位置发生了变化,则算法不稳定。
e.g. 希尔排序、直接选择排序、堆排序、快速排序
Quick Sort
- 与merge sort一样,快速排序也是一种divide and conquer algorithm。它选择一个元素作为pivot,并围绕所选pivot对给定数组进行partition。有以下选择pivot的方式:
- 始终选择第一个元素作为pivot。
- 始终选择最后一个元素作为pivot。
- 选择一个随机元素作为pivot。
- 选择中位数作为pivot。
- quickSort 中的关键过程是 partition()。partition的目标是,给定一个数组和数组的一个元素 x 作为 pivot,将 x 放在排序数组中的正确位置并将所有较小的元素(小于 x)放在 x 之前,并将所有更大的元素(大于比 x) 在 x 之后。所有这些都应该在线性时间内完成。
- 一种partition的逻辑为,从最左边的元素开始,跟踪较小(或等于)元素的索引为 i。在遍历时,如果我们找到一个更小的元素,我们将当前元素与 arr[i] 交换。否则,我们忽略当前元素。
Pseudo Code
Quicksort(A, left, right)
if |A| = 0 then return //A is empty
end if
split = Partition(A, left, right)
Quicksort(A, left, split − 1)
Quicksort(A, split + 1, right)
Initial call: Quicksort(A, 1, n)
Partition(A, left, right)
pivot = A[right]
split = left − 1
for j=left to right−1 do
if A[j] ≤ pivot then
swap(A[j], A[split + 1])
split = split + 1
end if
end for
swap(pivot, A[split + 1]) //place pivot after A[split] (why?)
return split + 1 //the final position of pivot
Implementation by 选择最后一个元素作为pivot
# Python3 implementation of QuickSort
# Function to find the partition position
def partition(array, low, high):
# Choose the rightmost element as pivot
pivot = array[high]
# Pointer for greater element
i = low - 1
# Traverse through all elements
# compare each element with pivot
for j in range(low, high):
if array[j] <= pivot:
# If element smaller than pivot is found
# swap it with the greater element pointed by i
i = i + 1
# Swapping element at i with element at j
(array[i], array[j]) = (array[j], array[i])
# Swap the pivot element with
# e greater element specified by i
(array[i + 1], array[high]) = (array[high], array[i + 1])
# Return the position from where partition is done
return i + 1
# Function to perform quicksort
def quick_sort(array, low, high):
if low < high:
# Find pivot element such that
# element smaller than pivot are on the left
# element greater than pivot are on the right
pi = partition(array, low, high)
# Recursive call on the left of pivot
quick_sort(array, low, pi - 1)
# Recursive call on the right of pivot
quick_sort(array, pi + 1, high)
- Time complexity - O(n*logn)
Merge Sort - Divide and Conquer
- Merge sort 的工作原理是将数组划分为更小的子数组,对每个子数组进行排序,然后将已排序的子数组合并回一起以形成最终的排序数组。
- 可以把它想象成一种递归算法,不断地将数组分成两半,直到无法进一步分割为止。
Pseudo Code
step 1: start
step 2: declare array and left, right, mid variable
step 3: perform merge function.
if left == right
return
mid= (left+right)/2
mergesort(array, left, mid)
mergesort(array, mid+1, right)
merge(array, left, mid, right)
step 4: Stop
merge (A, left, right, mid)
L = A[left, mid]
R = A[mid + 1, right]
Maintain two pointers p_L, p_R, initialized to point to the first
elements of L, R, respectively
while both lists are nonempty do
Let x, y be the elements pointed to by pL, pR
Compare x, y and append the smaller to the output
Advance the pointer in the list with the smaller of x, y
end while
Append the remainder of the non-empty list to the output.
Remark: the output is stored directly in A[left, right], thus the
subarray A[left, right] is sorted after merge(A, left, right, mid).
Implementation
# Python program for implementation of MergeSort
def mergeSort(arr):
if len(arr) > 1:
# Finding the mid of the array
mid = len(arr)//2
# Dividing the array elements
L = arr[:mid]
# into 2 halves
R = arr[mid:]
# Sorting the first half
mergeSort(L)
# Sorting the second half
mergeSort(R)
i = j = k = 0
# Copy data to temp arrays L[] and R[]
while i < len(L) and j < len(R):
if L[i] <= R[j]:
arr[k] = L[i]
i += 1
else:
arr[k] = R[j]
j += 1
k += 1
# Checking if any element was left
while i < len(L):
arr[k] = L[i]
i += 1
k += 1
while j < len(R):
arr[k] = R[j]
j += 1
k += 1
- Time complexity
- by Master Theorem
Binary Search
- Given a sorted array arr[] of n elements, write a function to search a given element x in arr[] and return the index of x in the array.
- Algorithm
- 将array升序排列。
- 将第一个元素设为low index,最后一个元素设为high index。
- 将middle index设为 (high + low) / 2
- 如果middle index上的元素就是target,则返回middle。
- 如果target小于middle index上的元素,则将 high 设为 middle - 1。
- 如果target大于middle index上的元素,则将 low 设为 middle + 1。
- 重复3-6直至找到target或者no such target found when no more element。
Recursive Method
# Python3 Program for recursive binary search.
# Returns index of x in arr if present, else -1
def binarySearch(arr, l, r, x):
# Check base case
if r >= l:
mid = l + (r - l) // 2
# If element is present at the middle itself
if arr[mid] == x:
return mid
# If element is smaller than mid, then it
# can only be present in left subarray
elif arr[mid] > x:
return binarySearch(arr, l, mid-1, x)
# Else the element can only be present
# in right subarray
else:
return binarySearch(arr, mid + 1, r, x)
else:
# Element is not present in the array
return -1
Iteration Method
def binarySearch(v, To_Find):
lo = 0
hi = len(v) - 1
# This below check covers all cases, so need to check
# for mid=lo-(hi-lo)/2
while hi - lo > 1:
mid = (hi + lo) // 2
if v[mid] < To_Find:
lo = mid + 1
else:
hi = mid
if v[lo] == To_Find:
print("Found At Index", lo)
elif v[hi] == To_Find:
print("Found At Index", hi)
else:
print("Not Found")
- Running Time
The time complexity of the binary search algorithm is O(log n). The best-case time complexity would be O(1) when the central index would directly match the desired value. Binary search worst case differs from that. The worst-case scenario could be the values at either extremity of the list or values not in the list.
Insertion Sort
过程
该算法通过重复从数组的未排序部分中取出一个元素并将其插入数组已排序部分中的正确位置来对项目数组进行排序。
Input: Array A
n = length(A)
1. 外部 for loop 从索引“1”开始并运行“n-1”次迭代,其中“n”是数组的长度。
2. 内部 while loop从外部 for 循环的当前索引 i 开始,
并将每个元素与其左邻居进行比较。 如果一个元素小于它的左邻居,则交换元素。
3. 内部 while loop 继续将元素向左移动,直到它小于其左侧的元素停止while loop。
4. 内部 while loop 完成后,当前索引"j"处的元素即为数组已排序部分中的正确位置。
5. 外层的 for 循环继续遍历数组,直到所有元素都在正确的位置并且数组已完全排序。
insertionSort(A: list of sortable items)
n = length(A)
for i = 1 to n - 1 do
j = i
while j > 0 and A[j-1] > A[j] do
swap(A[j], A[j-1])
j = j - 1
end while
end for
end algorithm
- Time complexity - O(N^2)
Bubble Sort
- 比较相邻的元素,如果第一个比第二个大就进行交换,对每一对相邻元素做同样的工作。排序算法稳定,时间复杂度 O(n2),空间复杂度 O(1)。
Graph Algorithm
BFS 广度优先搜索
- Breadth-first search (BFS): explore G starting from s outward in all possible directions, adding reachable nodes one layer at a time.
- First add all nodes that are joined by an edge to the source: these nodes form the first layer. If G is unweighted, these are the nodes at distance 1 from s.
- Then add all nodes that are joined by an edge to a node in the first layer: these nodes form the second layer.
- If G is unweighted, these are the nodes at distance 2 from s.
- And so on and so forth.
- Facts of BFS
- is the set of nodes that are at distance from s. Equivalently, the length of the shortest s-v path for all equals .
Pseudo Code with explanation
BFS(G = (V, E), s ∈ V )
# discovered array表示一个vertex是否已被访问过
# 如果被访问过则记为1,没有则记为0
array discovered[V] initialized all vertices to 0 ---- O(n)
# The dist array记录了source到每一个vertex的最短路径
# 除了source自身,所有vertex的初始路径都为无穷大
array dist[V] initialized to infinity array ---- O(n)
# The parent array记录了每一个vertex的parent
parent[V] initialized to NIL ---- O(n)
# q记录正在被处理或者当前被访问的vertices
queue q ---- O(1)
# Initialize with source ---- O(1)
discovered[s] = 1
dist[s] = 0
parent[s] = NIL
# 因为source是第一个visited的vertex, 将当前要处理的vertex s加入queue
enqueue(q, s) ---- O(1)
# 当当前需要处理的queue不为空时
while size(q)>0 do
# 取出第一个vertex 'u'并检索所有从'u'延伸出的edges(u, v)
u =dequeue(q)
# 对于每一个与'u'相连的vertex 'v'
for (u, v) ∈ E do
# 如果'v'还未被访问过,即discovered[v] == 0
if discovered[v] == 0 then
# 则标记'v'为已访问过,即discovered[v] == 1
discovered[v] = 1
# 对于unweighted graph,source到'v'的距离 = source到'u'的距离 + 1
dist[v] = dist[u] + 1
# 更新当前vertex 'v'的parent为'u'
parent[v] = u
# 把'v'添加进q作为下一个layer要检索的vertex
enqueue(q, v)
# 如果vertex'v'已被访问过,则进入下一个for loop检索未被访问的'v'
end if
end for
end while
- Time complexity analysis
前面初始化部分的时间复杂度为O(n),因为只需要遍历每个节点一次。对于while循环来讲,q的最大数量就是vertices的总数,所以while loop不会超过|V|次迭代,即O(|V|)。对于for loop,检索每一个和’u’临近的节点v,显然不会超过|E|次迭代,这一步的时间实际上与’u’的degrees成比例。因此,检查图中与所有顶点相邻的所有边所花费的总时间与所有顶点的degrees之和成正比,即 2|E| (因为每条边都对恰好两个顶点)。
综上所述,BFS的时间复杂度为O(|V| + |E|)。
Implementation - without tracking the distance and parent
# Python3 Program to print BFS traversal
# from a given source vertex. BFS(int s)
# traverses vertices reachable from s.
from collections import defaultdict
# This class represents a directed graph
# using adjacency list representation
class Graph:
# Constructor
def __init__(self):
# default dictionary to store graph
self.graph = defaultdict(list)
# function to add an edge to graph
def addEdge(self, u, v):
self.graph[u].append(v)
# Function to print a BFS of graph
def BFS(self, s):
# Mark all the vertices as not visited
visited = [False] * (max(self.graph) + 1)
# Create a queue for BFS
queue = []
# Mark the source node as
# visited and enqueue it
queue.append(s)
visited[s] = True
while queue:
# Dequeue a vertex from
# queue and print it
s = queue.pop(0)
print(s, end=" ")
# Get all adjacent vertices of the
# dequeued vertex s. If a adjacent
# has not been visited, then mark it
# visited and enqueue it
for i in self.graph[s]:
if visited[i] == False:
queue.append(i)
visited[i] = True
应用
- 1. A natural way to solve 2-colorability(bipartite-ness)
- If a graph contains an odd-length cycle, then it is not 2-colorable.
bipartite(G = (V,E)) 1. Start BFS from any vertex; color it red. 2. Color white all nodes in the first layer L1 of the BFS tree. If there is an edge between two nodes in L1, output no and stop. 3. Otherwise, continue from layer L1, coloring red the vertices in even layers and white in odd layers. 4. If BFS terminates and all nodes in V have been explored (hence 2-colored), output yes.
- 2. 找到所有的SCC
AllConnectedComponents(G = (V, E)) 1. Start with an arbitrary node s; run BFS(G, s) and output the resulting BFS tree as one connected component. 2. Continue with any node u that has not been visited by BFS(G, s); run BFS from u and output the resulting BFS tree as one connected component. 3. Repeat until all nodes in V have been visited.
DFS 深度优先搜索
- Depth-first search is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible along each branch before backtracking.
- Basic idea: Start from the root or any arbitrary node and mark the node and move to the adjacent unmarked node and continue this loop until there is no unmarked adjacent node. Then backtrack and check for other unmarked nodes and traverse them. Finally, print the nodes in the path.
Algorithm with explanation
# array explored来记录一个vertex是否已被访问
DFS(G = (V, E))
# Initialize the global variable time to 0
time = 0
# 对于每一个V中的vertex,标记每一个vertex的初始状态为未被访问
for u ∈ V do
explored[u] = 0
end for
# 对于每一个V中的vertex u
for u ∈ V do
# 如果'u'尚未被访问,则call检索function search(u)
if explored[u] == 0 then
Search(u)
end if
end for
Search(u)
previsit(u)
#更改'u'的标记为已被访问
explored[u] = 1
# 对于每一个与'u'相连的vertex 'v'
for (u, v) ∈ E do
# 如果'v'未被访问,则call search(v)
if explored[v] == 0 then
Search(v)
end if
end for
postvisit(u)
# start(u) is the time when u is pushed in the stack
# finish(u) is the time when u is popped from the stack
previsit(u)
time += 1
start(u) = time
push(stack, u)
postvisit(u)
time += 1
finish(u) = time
pop(stack, u)
Here is a flow chart of how this algorithm works.
- Time complexity analysis
The time complexity of the DFS algorithm is O(|V| + |E|), where |V| is the number of vertices in the graph, and |E| is the number of edges in the graph. The initialization of the 'explored' array takes O(|V|) time, and the Search function is called for each vertex in the graph, which takes O(|V|) time. The total time taken to examine all edges in the graph is proportional to the sum of the degrees of all vertices, which is 2|E|. Therefore, the time complexity of examining all edges in the graph is O(|E|). Overall, the time complexity of the DFS algorithm is O(|V| + |E|).
Implementation
# Python3 program to print DFS traversal
# from a given graph
from collections import defaultdict
# This class represents a directed graph using
# adjacency list representation
class Graph:
# Constructor
def __init__(self):
# default dictionary to store graph
self.graph = defaultdict(list)
# function to add an edge to graph
def addEdge(self, u, v):
self.graph[u].append(v)
# A function used by DFS
def DFSUtil(self, v, visited):
# Mark the current node as visited
# and print it
visited.add(v)
print(v, end=' ')
# Recur for all the vertices
# adjacent to this vertex
for neighbour in self.graph[v]:
if neighbour not in visited:
self.DFSUtil(neighbour, visited)
# The function to do DFS traversal. It uses
# recursive DFSUtil()
def DFS(self, v):
# Create a set to store visited vertices
visited = set()
# Call the recursive helper function
# to print DFS traversal
self.DFSUtil(v, visited)
- 应用
- 1. 检测cycle: G = (V, E) has a cycle if and only if DFS(G) yields a back edge.
- 2. Topological sorting in DAGs (DAG - Directed acyclic graph)
Dijkstra - 最短路径算法 - TODO
- Given a graph and a source vertex in the graph, find the shortest paths from the source to all vertices in the given graph.
Pseudocode with an explanation - Greedy Algorithm 贪心算法
Dijkstra-v1(G = (V, E, w), s ∈ V)
Initialize(G, s)
# Initialize the set S with only source vertex s.
S = {s}
# This loop continues until S contains all vertices in the graph.
while S != V do
# This loop iterates over all vertices x
# that are not in S and have at least one edge from S.
For every x ∈ V − S with at least one edge from S compute
d(x) = min {dist[u] + w_{ux}}
u ∈ S,(u,x) ∈ E
Select v such that d(v) = min d(x)
x ∈ V−S
S = S ∪ {v}
dist[v] = d(v)
prev[v] = u
end while
# This function initializes the distances and predecessors of all vertices in the graph.
# It sets the distance of the source vertex s to 0 and the distance of all other vertices to infinity.
Initialize(G, s)
for v ∈ V do
dist[v] = ∞
prev[v] = NIL
end for
dist[s] = 0
二叉树遍历算法 - 基本思想:递归
前序遍历 - Preorder - 根结点 ---> 左子树 ---> 右子树
def preorderTraversal(root):
if root != None:
# Visit the root node
print(root.val)
# Recursively traverse the left subtree
preorderTraversal(root.left)
# Recursively traverse the right subtree
preorderTraversal(root.right)
中序遍历 - Inorder - 左子树 ---> 根结点 ---> 右子树
def inorderTraversal(root):
if root != None:
# Recursively traverse the left subtree
inorderTraversal(root.left)
# Visit the root node
print(root.val)
# Recursively traverse the right subtree
inorderTraversal(root.right)
后序遍历 - Postorder - 左子树 ---> 右子树 ---> 根结点
def postorderTraversal(root):
if root != None:
# Recursively traverse the left subtree
postorderTraversal(root.left)
# Recursively traverse the right subtree
postorderTraversal(root.right)
# Visit the root node
print(root.val)
最小生成树算法
- 最小权重生成树(minimum weight spanning tree)的简称。对于有 n 个结点的原图,生成原图的极小连通子图,使其包含原图中的所有 n 个结点,并且有保持图连通的最少的边,或者权重和最小的边。
- 普里姆算法:取图中任意一个顶点 v 作为生成树的根,之后往生成树上添加新的顶点 w。在添加的顶点 w 和已经在生成树上的顶点v 之间必定存在一条边,并且该边的权值在所有连通顶点 v 和 w 之间的边中 取值最小。之后继续往生成树上添加顶点,直至生成树上含有 n-1 个顶点为止。
- 克鲁斯卡尔算法:先构造一个只含 n 个顶点的子图 SG,然后从权值最小的边开始,若它的添加不使 SG 中产生回路,则在 SG 上加上这条边,如此重复,直至加上 n-1 条边为止。
Dynamic Programming
基本思想
讲一个问题拆解成子问题,直到子问题可以解决,把子问题的答案保存起来,从而减少重复计算。随后根据子问题的答案进行反推,得到原问题解。
经典动态规划问题总结
KMP算法
前缀:不包含尾字符的所有子串
后缀:不包含首字符的所有子串
KMP:找到最长的相等前后缀
前缀表
e.g. aabaaf
例题:匹配串:aabaabaaf;模式串:aabaaf
从匹配串的首开始匹配,到模式串中的f出现时,与匹配串中的b不匹配, 那么不需要一个一个向后移动匹配串中的指针,而是直接跳到模式串中最长相等前缀的下一个位置继续匹配,此题中相当于index=2,即b字符的位置。