General performance optimizations
Tasks
For the past couple of weeks I’ve been investigating datashader’s performance and how we can improve it. I’m now documenting my remaining tasks in case I get pulled away on a different project. Below is a list of the tasks/issues I’m currently addressing:
- Extend the filetimes.py and filetimes.yml benchmarking environment to find the optimal file format for datashader/dask (Issue #129)
- Benchmark numba compared to handwritten ufuncs in vaex (Issue #310)
- Gather perf information about dask locking behavior (Issue #314)
- Investigate why Cachey leads to better runtime performance for repeat datashader aggregations
- Document memory usage findings (Issue #305)
- Investigate how datashader’s performance changes with data types (doubles vs floats, etc) (Issue #305)
- Verify that repeat aggregations no longer depend on file format (Issue #129)
- Investigate distributed scheduler vs threaded scheduler for single-machine use case (#331, #332, #334)
- Identify issues hindering the distributed scheduler from performing more effectively - credit goes to @martindurant (#332, #336, #337)
Performance takeaways
Below are some performance-related takeaways that fell out of my experiments and optimizations with datashader and dask:
General
- Use the latest version of numba (>=0.33). This includes bugfixes providing ~3-5x speedups for many cases (numba/numba#2345, numba/numba#2349, numba/numba#2350)
- When interacting with data on the filesystem, store it in the Apache Parquet format when possible. Use snappy compression when writing out Parquet files, and convert columns to categorical dtypes (when possible) before writing them, since Parquet supports categoricals in its binary format (#129). A sketch of writing such a file appears after this list.
- Use the categorical dtype for columns with data that takes on a limited, fixed number of possible values. Categorical columns use a more memory-efficient data representation and are optimized for common operations such as sorting and finding uniques. Example of how to convert a column to the categorical dtype: `df[colname] = df[colname].astype('category')`
- There is promise in improving datashader’s performance even further by using single-precision floats (`np.float32`) instead of double-precision floats (`np.float64`). In past experiments this cut down both the time to load data off of disk (assuming the data was written out in single precision) and datashader’s aggregation times. Use this approach with care: single precision (in any software application, not just datashader) leads to different numerical results than double precision (#305)
- When using pandas dataframes, there will be a speedup if you cache the `cvs.x_range` and `cvs.y_range` variables and pass them back into the `Canvas()` constructor during future instantiations (a sketch appears after this list). As of #344, dask dataframes automatically memoize the x_range and y_range calculations; this works for dask because dask’s dataframes are immutable, unlike pandas (#129)
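
The Parquet and single-precision recommendations above can be combined when preparing data. Below is a minimal sketch of that preparation step; the column names, file path, and partition count are illustrative, and depending on the Parquet engine and version you may need to pass a `categories=` argument to `read_parquet` to get categoricals back on read.

```python
# Minimal sketch (column names, path, and partition count are illustrative).
import multiprocessing

import pandas as pd
from dask import dataframe as dd

df = pd.read_csv('points.csv')

# Optional: downcast coordinates to single precision (changes numerical results).
df['x'] = df['x'].astype('float32')
df['y'] = df['y'].astype('float32')

# Convert low-cardinality columns to the categorical dtype before writing,
# since Parquet stores categoricals natively.
df['cat'] = df['cat'].astype('category')

# Write Parquet with snappy compression via dask.
ddf = dd.from_pandas(df, npartitions=multiprocessing.cpu_count())
ddf.to_parquet('points.parq', compression='snappy')

# Later runs load the Parquet file directly.
ddf = dd.read_parquet('points.parq')
```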
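
For the pandas range-caching point, here is a minimal sketch that computes the ranges once up front and passes them into each `Canvas`; the plot dimensions and column names are illustrative.

```python
# Minimal sketch: compute the x/y ranges once and reuse them so repeat
# aggregations skip the min/max scan over the data.
import datashader as ds

x_range = (float(df['x'].min()), float(df['x'].max()))
y_range = (float(df['y'].min()), float(df['y'].max()))

cvs = ds.Canvas(plot_width=900, plot_height=600,
                x_range=x_range, y_range=y_range)
agg = cvs.points(df, 'x', 'y')

# Later instantiations reuse the cached ranges instead of rescanning df.
cvs2 = ds.Canvas(plot_width=900, plot_height=600,
                 x_range=x_range, y_range=y_range)
agg2 = cvs2.points(df, 'x', 'y')
```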
Single machine
- A rule of thumb for the number of partitions to use while converting pandas dataframes into dask dataframes is `multiprocessing.cpu_count()`. This allows dask to use one thread per core for parallelizing computations (#129)
- When the entire dataset fits into memory at once, persist the dataframe as a dask dataframe prior to passing it into datashader (#129). One example of how to do this:

  ```python
  from dask import dataframe as dd
  import multiprocessing

  dask_df = dd.from_pandas(df, npartitions=multiprocessing.cpu_count()).persist()
  ...
  cvs = datashader.Canvas(...)
  agg = cvs.points(dask_df, ...)
  ```
- When the entire dataset doesn’t fit into memory at once, use the distributed scheduler (#331) without persisting; there is an outstanding issue, #332, that illustrates the problem with the distributed scheduler + `persist`. See the sketch after this list.
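
Here is a minimal sketch of the larger-than-memory, single-machine case, assuming the data has already been written to a Parquet file; `Client()` with no arguments starts a local distributed scheduler plus workers and registers them as the default scheduler.

```python
# Minimal sketch (file path, plot dimensions, and column names are illustrative).
import datashader as ds
from dask import dataframe as dd
from dask.distributed import Client

client = Client()  # local distributed scheduler + workers, used by default

dask_df = dd.read_parquet('points.parq')  # larger than RAM: do not persist

cvs = ds.Canvas(plot_width=900, plot_height=600)
agg = cvs.points(dask_df, 'x', 'y')
```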
Multiple machines
- Use the distributed scheduler to farm computations out to remote machines. `client.persist(dask_df)` may help in certain cases, but be sure to include `distributed.wait()` to block until the data is read into RAM on each worker (#332). See the sketch below.
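
Here is a minimal sketch of the multi-machine case, assuming a scheduler is already running at an illustrative address; `client.persist` starts loading the partitions onto the workers and `wait` blocks until they are all in RAM.

```python
# Minimal sketch (scheduler address, file path, and column names are illustrative).
import datashader as ds
from dask import dataframe as dd
from dask.distributed import Client, wait

client = Client('tcp://scheduler-host:8786')  # address of the running scheduler

dask_df = dd.read_parquet('points.parq')
dask_df = client.persist(dask_df)  # may help in certain cases (#332)
wait(dask_df)                      # block until the data is in RAM on each worker

cvs = ds.Canvas(plot_width=900, plot_height=600)
agg = cvs.points(dask_df, 'x', 'y')
```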
Top GitHub Comments
Benchmarking with numba exposed a bug affecting performance under multithreaded workloads. Once it is fixed, there should be a significant performance increase to datashader (at least 3x in many cases): https://github.com/numba/numba/issues/2345
Thanks for all the great work, @gbrener! Reflecting these recommendations into our documentation is now on our to-do list.