ISOCPP std-proposals List: [std-proposals] Enhanced binomial min heap as high speed priority

From: David G. Pickett <dgpickett_at_[hidden]>
Date: Wed, 14 Feb 2024 19:02:57 +0000 (UTC)

A binomial heap, min or max, nay be the fastest sort and priority queue. It has very high locality of reference, averaging 1/2 log2(N) nodes. I implemented one using a simple node structure of templated value, node pointers next and child, and char rank. Since rank is the tree size 2^rank, a char is quite adequate. The root is a rank ordered, forward linked list of partially sorted binomial trees, a forest. In comparison to online reference implementations of binomial heap, I made a few improvements. The locality of reference comes from pushing losers into the basement to be utterly ignored until the winner is removed by pop() as min or max.

My first improvement is that I defer final determination of min or max out of push() into to top()/pop(), as the user may push() many times without any pop() or top() activity, so this improves the already low O(1) push() speed. Since many problems are solved without emptying the heap (Dijkstra GPS, Prim MST, running median), low push() cost with higher pop()/top() cost is usually a substantial time savings.

My second improvement is, in top() and pop(), to force merge all trees in the root forest into one tree, possibly assigning a fake higher rank to a smaller tree with a better key. For a forest of N trees, the loser trees are pushed (N-1)/2 on average levels down. This has very small cost and saves future sorting effort, narrowing near future root forest size, another enhancement if the heap is discarded not empty. Just finding the min or max is slightly less effort than merging, but merely finding saves no future sort effort.
I keep the root forest of trees in a forward linked list sorted by ascending rank. Each single node insert is a rank 0 tree (2^0 elements). For a heap of N elements, there is a root tree for each one bit in the binary value of N, like for n=7 root contains rank 0, 1, and 2 trees. (Any of my forced merges may muddy that distribution a bit. If you top() a heap of 7, it merges rank 0 with 1 making a fake rank 2 that it merges with the rank 2 into a fake rank 3 single root tree containing 7 not 8 elements. It still has greater size than any rank 2 tree and smaller size than any rank 4 tree, but I am sure it gives math purists a rash! :D)
Compared to array based priority queues and to any BST, the push speed is O(1) not O(log N), and the locality of reference in mainly just 1/2 log2(N) root nodes/values on average. Many array based priority queues also swap values up to log(N) times, while a tree based heap does not do repetitive swaps. Array based queues have a copy cost to expand if the original or last capacity is inadequate, but a heap does not.
In my simple test version, I did not use a comparison or less object, just requiring the type to support or override operator<.

Heaps like this also have a meld cost of O(log N) of the smaller heap, as it only requires the root trees of the smaller to be merged into the forest of the larger. Array based heaps would be O(N). A copy constructor with deep copy would be O(N), as all nodes and values need to be copied. A search, delete, or key update would be O(N), as half the nodes on average need to be compared. Strict Fibonacci heap designs, a variation on the binomial heap, require delete and update. I have yet to discern if they search by key or retain a reference provided at push() time. I am not sure such operations are common. They are often better supported by using an AVL BST as a min/max heap/priority queue, where the search is O(log N), but of course the push is then also O(log N), depending on N and how often you search. The JAVA PriorityQueue has a O(N) contains() and remove(), and has copy expense to expand if the original or last size in inadquate. https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/PriorityQueue.html My JAVA binomial heap outperforms the JAVA PriorityQueue as well, so I am trying to introduce it there!
As a sort, it is probably the fastest sort, because the high locality of reference enhances cache hits, and reduces VM page faults. In base form, it does not provide a stable sort, but it could be enhanced with a hidden secondary key of unsigned long push sequence added to the node. While this would increase the memory footprint, the locality of reference makes this still a likely winner in speed in a modern CPU and system, compared to either any array based priority queue similarly modified to be a stable sort or any BST (where equal values can be pushed right for a stable sort).

Received on 2024-02-14 19:03:00