Fix ddp_notebook CUDA fork check to allow passive initialization (#21402)

* Fix ddp_notebook CUDA fork check to allow passive initialization

The previous implementation used torch.cuda.is_initialized() which returns
True even when CUDA is passively initialized (e.g., during library imports
or device availability checks). This caused false positives in environments
like Kaggle notebooks where libraries may query CUDA without creating a
context.

This fix uses PyTorch's internal torch.cuda._is_in_bad_fork() function,
which more accurately detects when we're in an actual bad fork state (i.e.,
CUDA was initialized with a context and then the process was forked).

The change allows passive CUDA initialization while still catching genuinely
problematic cases, and falls back to the old check on older PyTorch versions
that don't have _is_in_bad_fork.

Fixes #21389

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: Add callable check and improve test coverage for CUDA fork check

- Add callable() check before calling _is_in_bad_fork to ensure robustness
- Add test_check_for_bad_cuda_fork_with_is_in_bad_fork() to test new detection path
- Ensures test coverage for both the new _is_in_bad_fork and fallback paths
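A sketch of how both detection paths can be exercised by mocking _is_in_bad_fork (the helper name `check` and the stand-in namespace are hypothetical; the real test patches torch.cuda):

```python
import types
from unittest import mock

# Stand-in for torch.cuda so the sketch runs without PyTorch installed.
fake_cuda = types.SimpleNamespace(
    _is_in_bad_fork=lambda: False,
    is_initialized=lambda: False,
)


def check(cuda=fake_cuda) -> bool:
    """Return True when the process is in a bad-fork state (sketch)."""
    fn = getattr(cuda, "_is_in_bad_fork", None)
    return fn() if callable(fn) else cuda.is_initialized()


# New detection path: mock _is_in_bad_fork to simulate a bad fork.
with mock.patch.object(fake_cuda, "_is_in_bad_fork", return_value=True):
    assert check() is True

# Fallback path: a namespace without _is_in_bad_fork uses the old check.
legacy = types.SimpleNamespace(is_initialized=lambda: True)
assert check(legacy) is True
```

Patching the attribute rather than calling real CUDA keeps the test deterministic on machines with or without a GPU.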

* docs: Update Fabric changelog for DDP notebook CUDA fork check fix (#21402)

* test: Add mock for torch.cuda._is_in_bad_fork in test_check_for_bad_cuda_fork

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bhimraj Yadav <[email protected]>
Adele committed
419b37b61997c2ec2ac53391d9ede27a80315054
Parent: 5130530
Committed by GitHub <[email protected]> on 12/18/2025, 9:20:12 AM