Skip to content

Benchmarking packingcubes

Setup

We use the code in benchmarks/performance to run our benchmarks. It's similar in functionality to the timeit module (which it was based on), with further customization.

For each search object, we construct a dataset of size \(n\) with random data and pass it to the creation method. Psuedocode:

dataset = InMemory(positions=random_positions)
for name, creation_method in creation_dict.items():
    search_objects[name], bench[name] = benchmark(
        creation_method,
        dataset.copy()
    )
Since the packingcubes versions actually sort the data, we reset the dataset before each timing run. This is excluded from the results.

We then generate a number of random spheres (10 for now) that overlap the dataset (they do not need to be entirely contained). These are our "search balls" and we set their radii such that \(\sim m\) particles are contained:

for i in range(10):
    centers[i] = random_point_in_dataset(dataset)
    radii[i], actual_numbers[i] = smallest_radii_containing_points(dataset, center, m)

Finally, for every search method, we average the time taken to search over all search balls:

for name, (so_name, search_method) in search_dict.items():
    search_results, bench[name] = benchmark(
        search_objects[so_name].search_method, 
        (centers, radii)
    )
    if can_verify(search_method):
        verify_search(search_results, actual_numbers)
We do the equivalent for query type methods, except we use 100 particles.

For some of the search methods (anywhere a strict search is available), we verify that we return the correct number of particles as well.

We benchmark across the following dataset sizes and search ball sizes:

\(m \\ n\) \(10^{4}\) \(10^{5}\) \(10^{6}\) \(10^{7}\) \(10^{8}\)
\(10^{3}\)
\(10^{4}\) 1
\(10^{5}\) 1 1

We benchmark the following functions:

  • Creation

    Search Object (Library)
    PackedTree
    Cubes
    OpTree
    KDTree (SciPy)
  • Search

    Name Search Object Function/Method
    packed-search PackedTree get_particle_indices_in_sphere
    packli-search2 PackedTree get_particle_index_list_in_sphere
    packnumb-search2 PackedTree get_particle_indices_in_sphere3
    cubes-search2 ParticleCubes get_particle_indices_in_sphere
    cubeli-search ParticleCubes get_particle_index_list_in_sphere
    optree-search OpTree query_ball_point
    kdtree-search4 KDTree query_ball_point
  • Query

    Name5 Search Object Function/Method
    packed2 PackedTree get_closest_particles
    packed (jitted) PackedTree get_closest_particles3
    optree Optree query
    kdtree KDTree query

Finally, we show results across 3 different machines:

Name Details
Laptop 8-core laptop, used for development
Stampede3 80 cores per node, NVDIMM partition on the TACC Stampede3 cluster (link)
CI 4-core VM. The ubuntu-latest runner on GitHub (link)

Results

Raw Timing

Here we show the raw timing results. For all benchmarks, we do an untimed trial run first, to sort out any precompiling/warmup.

  • Creation Creation - Raw Times

  • Search Search - Raw Times

  • Query Query - Raw Times

Expected Scaling

Here we show the timing for the various tasks divided by the expected growth rate, normalized at \(n=10^6\).

Consider, for example, tree creation, which involves (and is dominated by) a sorting of the data. Sorting is classically an \(O(n\log n)\) process, and, if we normalize and divide out \(n\log n\), we would expect a flat line. We can see this behavior in the first panel for the PackedTree. The KDTree does a little worse than \(n \log n\) (the slight upwards trend); Cubes does slightly better6.

  • Creation Creation - Expected

  • Search Search - Expected

  • Query Query - Expected

VS SciPy

Here we normalize to the SciPy results from KDTree.

  • Creation Creation - vs. SciPy

  • Search Search - vs. SciPy

  • Query Query - vs. SciPy

Latest CI Run

In the following images, we show the latest CI run only, highlighting the most up-to-date performance:

  • Creation times CI Creation - Raw Times

  • Search times CI Search - Raw Times

  • Query times CI Query - Raw Times


  1. For these cases, the search result should include the entire dataset. This is trivially easy to do with PackedTrees (it's effectively just np.arange(len(ds))). It's slightly more complicated for ParticleCubes, since you need to append multiple cubes, but there's still no tree traversal. This explains the extremely small results for those cases. 

  2. Used in regression testing only 

  3. Called from within another jitted function 

  4. Used in comparison benchmarking only 

  5. Names in caption. Actual option names are generally name + 'q', e.g. --packq-search or --packnumbq-search. Use the help option (-h or --help for more. 

  6. This is much more likely due more to the normalization and/or parallelization then because we've achieved an improvement over general search.