log in | about 

I have written a rather thorough description of algorithms that one can use to merge sorted lists or arrays of integers. Feel free to vote for this description on Quora. Here I decided to duplicate my answer (slightly revised and improved).

The choice of the merging algorithm depends on (1) the distribution of data (2) the hardware that you use. In that, there are several major approaches or a combination thereof that can be used:

  1. Classic k-way intersection with the priority queue. I believe it's described in Knuth. All the lists should be sorted in advance. You read the smallest values from each list and put them into the queue. More specifically, you put the pair (value, list id). Then you extract the smallest value using the queue and output it. If it came from list K, you extract the smallest value from the list K and push the smallest pair (value, K) to the priority queue (while simultaneously removing it from list K) . And so on so forth.

    Priority queue is not especially fast, in particular, because working with a queue entails a lot of branching (can be slow on both CPUs and GPUs due to branch misprediction). Therefore, other approaches may be more efficient sometimes.

  2. Pairwise merge sort. It is a well-known algorithm, so I won't describe it here. However, if you merge two lists, where one is much shorter than other methods can do better.

    In particular, you can iterate through a shorter list and find an insertion point in the large list using an exponential search (a fancier and more efficient version of the binary search). We used this approach in the context of list intersection, but the same method works well for unions.

  3. Using bitmasks. If your lists are represented as bitmasks, merge is super fast. Extraction of the result can be a bit tricky. However, using modern CPU instructions, you can do it rather easily. Daniel Lemire covered the topic of bitmap decoding extensively. Alternatively, one can use hashing.

    Encoding the whole list as a bitmap can be wasteful. This is why, people use some hybrid approaches where only a part of the list is encoded as a bitmap. If you have a sorted list as an input, it can actually may make sense to convert it first to a bitmap and then carry out a union/intersection using the bitmap.

  4. Using the ScanCount algorithm. Imagine that the minimum number is zero and the maximum number is M. You can create a table with M+1 elements that are all set to zero initially. To carry out a merge, you have to iterate over lists that you merge. If, during the iteration you encounter the number X, you set the element X in the table to one (or increment it if you need to know the number of lists that contain the number). Finally, you iterate over the (M+1)-element table and check which elements are non-zero. Bonus: input lists do not have to be sorted!

    The table may have byte or bit elements. Zeroing table elements before merging can be done in several ways. One very simple approach is via the library function memset (it's memset in C/C++ may have different name in other languages). Though this seems to be naive, memset can zero about 10 billion integers per second for cache-resident data. See the test program here.
    ScanCount can be surprisingly efficient.

    To fit data into cache, you need to reuse the same small table M ~ 60-100K elements. In practice, of course your numbers will be larger than M. However, you can split your inputs and process each split separately.

  5. To conclude, I would mention that there are more advanced so-called adaptive algorithms, which I don't remember off the top of my head. Google something like "adaptive list intersection", or "adaptive list merging".