Metadata-Version: 2.1
Name: threadpoolctl
Version: 3.5.0
Summary: threadpoolctl
Home-page: https://github.com/joblib/threadpoolctl
License: BSD-3-Clause
Author: Thomas Moreau
Author-email: thomas.moreau.2010@gmail.com
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules

# Thread-pool Controls [![Build Status](https://dev.azure.com/joblib/threadpoolctl/_apis/build/status/joblib.threadpoolctl?branchName=master)](https://dev.azure.com/joblib/threadpoolctl/_build/latest?definitionId=1&branchName=master) [![codecov](https://codecov.io/gh/joblib/threadpoolctl/branch/master/graph/badge.svg)](https://codecov.io/gh/joblib/threadpoolctl)

Python helpers to limit the number of threads used in the thread pools of
common native libraries used for scientific computing and data science
(e.g. BLAS and OpenMP).

Fine control of the underlying thread-pool size can be useful in workloads
that involve nested parallelism, so as to mitigate oversubscription issues.
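Oversubscription is easy to quantify: under nested parallelism, each outer worker spins up its own native thread pool, so the total thread count is the product of the two levels. A stdlib-only sketch (the worker and pool sizes below are made up for illustration):

```python
import os


def total_compute_threads(n_workers, n_native_threads):
    """Total busy threads when each of n_workers outer workers triggers a
    native thread pool of n_native_threads threads (illustration only)."""
    return n_workers * n_native_threads


n_cpus = os.cpu_count() or 1
# e.g. 4 outer Python workers, each calling into an 8-thread BLAS pool:
total = total_compute_threads(4, 8)  # 32 compute threads in flight
oversubscribed = total > n_cpus
```

Limiting the inner pools to one thread (as shown in the usage sections below) keeps the total equal to the number of outer workers.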
## Installation

- For users, install the latest published version from PyPI:

  ```bash
  pip install threadpoolctl
  ```

- For contributors, install from the source repository in developer mode:

  ```bash
  pip install -r dev-requirements.txt
  flit install --symlink
  ```

  Then run the tests with pytest:

  ```bash
  pytest
  ```

## Usage

### Command Line Interface

Get a JSON description of the thread pools initialized when importing Python
packages such as numpy or scipy:

```
python -m threadpoolctl -i numpy scipy.linalg
[
  {
    "filepath": "/home/ogrisel/miniconda3/envs/tmp/lib/libmkl_rt.so",
    "prefix": "libmkl_rt",
    "user_api": "blas",
    "internal_api": "mkl",
    "version": "2019.0.4",
    "num_threads": 2,
    "threading_layer": "intel"
  },
  {
    "filepath": "/home/ogrisel/miniconda3/envs/tmp/lib/libiomp5.so",
    "prefix": "libiomp",
    "user_api": "openmp",
    "internal_api": "openmp",
    "version": null,
    "num_threads": 4
  }
]
```

The JSON information is written to STDOUT. If some of the packages are
missing, a warning message is displayed on STDERR.
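Because the CLI prints plain JSON to STDOUT, its output can be piped into other tools or parsed directly. A minimal sketch using only the standard library; the JSON literal below is copied from the sample output above (paths and versions are machine-specific):

```python
import json

# Output as printed by `python -m threadpoolctl -i numpy scipy.linalg` above.
cli_output = """
[
  {"filepath": "/home/ogrisel/miniconda3/envs/tmp/lib/libmkl_rt.so",
   "prefix": "libmkl_rt", "user_api": "blas", "internal_api": "mkl",
   "version": "2019.0.4", "num_threads": 2, "threading_layer": "intel"},
  {"filepath": "/home/ogrisel/miniconda3/envs/tmp/lib/libiomp5.so",
   "prefix": "libiomp", "user_api": "openmp", "internal_api": "openmp",
   "version": null, "num_threads": 4}
]
"""

pools = json.loads(cli_output)

# Group the reported thread counts by user_api.
by_api = {}
for pool in pools:
    by_api.setdefault(pool["user_api"], []).append(pool["num_threads"])
# by_api == {'blas': [2], 'openmp': [4]}
```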
### Python Runtime Programmatic Introspection

Introspect the current state of the threadpool-enabled runtime libraries
that are loaded when importing Python packages:

```python
>>> from threadpoolctl import threadpool_info
>>> from pprint import pprint
>>> pprint(threadpool_info())
[]

>>> import numpy
>>> pprint(threadpool_info())
[{'filepath': '/home/ogrisel/miniconda3/envs/tmp/lib/libmkl_rt.so',
  'internal_api': 'mkl',
  'num_threads': 2,
  'prefix': 'libmkl_rt',
  'threading_layer': 'intel',
  'user_api': 'blas',
  'version': '2019.0.4'},
 {'filepath': '/home/ogrisel/miniconda3/envs/tmp/lib/libiomp5.so',
  'internal_api': 'openmp',
  'num_threads': 4,
  'prefix': 'libiomp',
  'user_api': 'openmp',
  'version': None}]

>>> import xgboost
>>> pprint(threadpool_info())
[{'filepath': '/home/ogrisel/miniconda3/envs/tmp/lib/libmkl_rt.so',
  'internal_api': 'mkl',
  'num_threads': 2,
  'prefix': 'libmkl_rt',
  'threading_layer': 'intel',
  'user_api': 'blas',
  'version': '2019.0.4'},
 {'filepath': '/home/ogrisel/miniconda3/envs/tmp/lib/libiomp5.so',
  'internal_api': 'openmp',
  'num_threads': 4,
  'prefix': 'libiomp',
  'user_api': 'openmp',
  'version': None},
 {'filepath': '/home/ogrisel/miniconda3/envs/tmp/lib/libgomp.so.1.0.0',
  'internal_api': 'openmp',
  'num_threads': 4,
  'prefix': 'libgomp',
  'user_api': 'openmp',
  'version': None}]
```

In the above example, `numpy` was installed from the default anaconda channel
and comes with MKL and its
Intel OpenMP (`libiomp5`) implementation, while `xgboost` was installed from
pypi.org and links against GNU OpenMP (`libgomp`), so both OpenMP runtimes
are loaded in the same Python program.

The state of these libraries is also accessible through the object-oriented API:

```python
>>> from threadpoolctl import ThreadpoolController, threadpool_info
>>> from pprint import pprint
>>> import numpy
>>> controller = ThreadpoolController()
>>> pprint(controller.info())
[{'architecture': 'Haswell',
  'filepath': '/home/jeremie/miniconda/envs/dev/lib/libopenblasp-r0.3.17.so',
  'internal_api': 'openblas',
  'num_threads': 4,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.17'}]

>>> controller.info() == threadpool_info()
True
```

### Setting the Maximum Size of Thread-Pools

Control the number of threads used by the underlying runtime libraries in
specific sections of your Python program:

```python
>>> from threadpoolctl import threadpool_limits
>>> import numpy as np

>>> with threadpool_limits(limits=1, user_api='blas'):
...     # In this block, calls to the BLAS implementation (like OpenBLAS or
...     # MKL) are limited to a single thread, so they can safely be combined
...     # with thread-level parallelism.
...     a = np.random.randn(1000, 1000)
...     a_squared = a @ a
```

The thread pools can also be controlled via the object-oriented API, which is
especially useful to avoid searching through all the loaded shared libraries
each time.
It will however not act on libraries loaded after the instantiation of the
`ThreadpoolController`:

```python
>>> from threadpoolctl import ThreadpoolController
>>> import numpy as np
>>> controller = ThreadpoolController()

>>> with controller.limit(limits=1, user_api='blas'):
...     a = np.random.randn(1000, 1000)
...     a_squared = a @ a
```

### Restricting the limits to the scope of a function

`threadpool_limits` and `ThreadpoolController` can also be used as decorators
to set the maximum number of threads used by the supported libraries at a
function level. The decorators are accessible through their `wrap` method:

```python
>>> from threadpoolctl import ThreadpoolController, threadpool_limits
>>> import numpy as np
>>> controller = ThreadpoolController()

>>> @controller.wrap(limits=1, user_api='blas')
... # or @threadpool_limits.wrap(limits=1, user_api='blas')
... def my_func():
...     # Inside this function, calls to the BLAS implementation (like
...     # OpenBLAS or MKL) are limited to a single thread.
...     a = np.random.randn(1000, 1000)
...     a_squared = a @ a
...
```

### Switching the FlexiBLAS backend

`FlexiBLAS` is a BLAS wrapper whose backend can be switched at runtime.
`threadpoolctl` exposes Python bindings for this feature.
Here's an example; note that this part of the API is experimental and subject
to change without deprecation:

```python
>>> from threadpoolctl import ThreadpoolController
>>> import numpy as np
>>> controller = ThreadpoolController()

>>> controller.info()
[{'user_api': 'blas',
  'internal_api': 'flexiblas',
  'num_threads': 1,
  'prefix': 'libflexiblas',
  'filepath': '/usr/local/lib/libflexiblas.so.3.3',
  'version': '3.3.1',
  'available_backends': ['NETLIB', 'OPENBLASPTHREAD', 'ATLAS'],
  'loaded_backends': ['NETLIB'],
  'current_backend': 'NETLIB'}]

# Retrieve the FlexiBLAS controller
>>> flexiblas_ct = controller.select(internal_api="flexiblas").lib_controllers[0]

# Switch to one of the backends predefined at build time
# (listed in "available_backends")
>>> flexiblas_ct.switch_backend("OPENBLASPTHREAD")
>>> controller.info()
[{'user_api': 'blas',
  'internal_api': 'flexiblas',
  'num_threads': 4,
  'prefix': 'libflexiblas',
  'filepath': '/usr/local/lib/libflexiblas.so.3.3',
  'version': '3.3.1',
  'available_backends': ['NETLIB', 'OPENBLASPTHREAD', 'ATLAS'],
  'loaded_backends': ['NETLIB', 'OPENBLASPTHREAD'],
  'current_backend': 'OPENBLASPTHREAD'},
 {'user_api': 'blas',
  'internal_api': 'openblas',
  'num_threads': 4,
  'prefix': 'libopenblas',
  'filepath': '/usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so',
  'version': '0.3.8',
  'threading_layer': 'pthreads',
  'architecture': 'Haswell'}]

# It's also possible to directly give the path to a shared library
>>> flexiblas_ct.switch_backend("/home/jeremie/miniforge/envs/flexiblas_threadpoolctl/lib/libmkl_rt.so")
>>> controller.info()
[{'user_api': 'blas',
  'internal_api': 'flexiblas',
  'num_threads': 2,
  'prefix': 'libflexiblas',
  'filepath': '/usr/local/lib/libflexiblas.so.3.3',
  'version': '3.3.1',
  'available_backends': ['NETLIB', 'OPENBLASPTHREAD', 'ATLAS'],
  'loaded_backends': ['NETLIB',
                      'OPENBLASPTHREAD',
                      '/home/jeremie/miniforge/envs/flexiblas_threadpoolctl/lib/libmkl_rt.so'],
  'current_backend': '/home/jeremie/miniforge/envs/flexiblas_threadpoolctl/lib/libmkl_rt.so'},
 {'user_api': 'openmp',
  'internal_api': 'openmp',
  'num_threads': 4,
  'prefix': 'libomp',
  'filepath': '/home/jeremie/miniforge/envs/flexiblas_threadpoolctl/lib/libomp.so',
  'version': None},
 {'user_api': 'blas',
  'internal_api': 'openblas',
  'num_threads': 4,
  'prefix': 'libopenblas',
  'filepath': '/usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so',
  'version': '0.3.8',
  'threading_layer': 'pthreads',
  'architecture': 'Haswell'},
 {'user_api': 'blas',
  'internal_api': 'mkl',
  'num_threads': 2,
  'prefix': 'libmkl_rt',
  'filepath': '/home/jeremie/miniforge/envs/flexiblas_threadpoolctl/lib/libmkl_rt.so.2',
  'version': '2024.0-Product',
  'threading_layer': 'gnu'}]
```

You can observe that the previously linked OpenBLAS shared object remains
loaded in the Python program for its entire lifetime, but FlexiBLAS itself no
longer delegates BLAS calls to OpenBLAS, as indicated by the
`current_backend` attribute.
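As the example shows, a backend passed to `switch_backend` is either a name listed in `available_backends` or a path to a shared library. A small validation sketch along those lines; `resolve_backend` is a hypothetical helper, not part of the `threadpoolctl` API, and the info dict is trimmed from the example output above:

```python
import os


def resolve_backend(flexiblas_info, requested):
    """Return `requested` if it is a predefined backend name or an existing
    shared-library path; raise ValueError otherwise.
    Hypothetical helper, not part of the threadpoolctl API."""
    if requested in flexiblas_info["available_backends"]:
        return requested
    if os.path.exists(requested):  # direct path to a shared library
        return requested
    raise ValueError(f"unknown FlexiBLAS backend: {requested!r}")


# Info dict trimmed from the example output above.
info = {
    "internal_api": "flexiblas",
    "available_backends": ["NETLIB", "OPENBLASPTHREAD", "ATLAS"],
    "current_backend": "NETLIB",
}
backend = resolve_backend(info, "OPENBLASPTHREAD")  # "OPENBLASPTHREAD"
```

Validating up front avoids handing FlexiBLAS a name it cannot load, at the cost of duplicating a check the library performs itself.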
### Writing a custom library controller

Currently, `threadpoolctl` has support for `OpenMP` and the main `BLAS`
libraries. However, it can also be used to control the thread pool of other
native libraries, provided that they expose an API to get and set the limit
on the number of threads. For that, one must implement a controller for this
library and register it with `threadpoolctl`.

A custom controller must be a subclass of the `LibController` class and
implement the attributes and methods described in the docstring of
`LibController`. Then this new controller class must be registered using the
`threadpoolctl.register` function. A complete example can be found [here](
https://github.com/joblib/threadpoolctl/blob/master/tests/_pyMylib/__init__.py).

### Sequential BLAS within OpenMP parallel region

When one wants sequential BLAS calls within an OpenMP parallel region, it's
safer to set `limits="sequential_blas_under_openmp"`, since setting `limits=1`
and `user_api="blas"` might not lead to the expected behavior in some
configurations (e.g. OpenBLAS with the OpenMP threading layer;
https://github.com/xianyi/OpenBLAS/issues/2985).

### Known Limitations

- `threadpool_limits` can fail to limit the number of inner threads when
  nesting parallel loops managed by distinct OpenMP runtime implementations
  (for instance libgomp from GCC and libomp from clang/llvm or libiomp from
  ICC).

  See the `test_openmp_nesting` function in [tests/test_threadpoolctl.py](
  https://github.com/joblib/threadpoolctl/blob/master/tests/test_threadpoolctl.py)
  for an example.
  More information can be found at:
  https://github.com/jeremiedbb/Nested_OpenMP

  Note however that this problem does not happen when `threadpool_limits` is
  used to limit the number of threads used internally by BLAS calls that are
  themselves nested under OpenMP parallel loops. `threadpool_limits` works as
  expected, even if the inner BLAS implementation relies on a distinct OpenMP
  implementation.

- Using Intel OpenMP (ICC) and LLVM OpenMP (clang) in the same Python program
  under Linux is known to cause problems. See the following guide for more
  details and workarounds:
  https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md

- Setting the maximum number of threads of the OpenMP and BLAS libraries has
  a global effect and impacts the whole Python process. There is no
  thread-level isolation, as these libraries do not offer thread-local APIs
  to configure the number of threads to use in nested parallel calls.


## Maintainers

To make a release:

- Bump the version number (`__version__`) in `threadpoolctl.py` and update
  the release date in `CHANGES.md`.

- Build the distribution archives:

  ```bash
  pip install flit
  flit build
  ```

  and check the contents of `dist/`.

- If everything is fine, make a commit for the release, tag it and push the
  tag to GitHub:

  ```bash
  git tag -a X.Y.Z
  git push git@github.com:joblib/threadpoolctl.git X.Y.Z
  ```

- Upload the wheels and source distribution to PyPI using flit.
  Since PyPI no longer allows password authentication, the username needs to
  be set to the generic name `__token__`:

  ```bash
  FLIT_USERNAME=__token__ flit publish
  ```

  and a PyPI token has to be passed in place of the password.

- Create a PR for the release on the
  [conda-forge feedstock](https://github.com/conda-forge/threadpoolctl-feedstock)
  (or wait for the bot to make it).

- Publish the release on GitHub.

### Credits

The initial dynamic library introspection code was written by @anton-malakhov
for the smp package, available at https://github.com/IntelPython/smp.

threadpoolctl extends this for other operating systems. Contrary to smp,
threadpoolctl does not attempt to limit the size of Python multiprocessing
pools (threads or processes) or set operating system-level CPU affinity
constraints: threadpoolctl only interacts with native libraries via their
public runtime APIs.