Insights: deepspeedai/DeepSpeed
Overview
5 Pull requests merged by 5 people
-
hf tp+zero training doc.
#7151 merged
Mar 21, 2025 -
Update container version that runs on A6000 tests.
#7153 merged
Mar 20, 2025 -
Enhance Gaudi2 CI/Nightly Coverage with Model Parallelism and Linear Tests
#7146 merged
Mar 19, 2025 -
Correct the BACKWARD_PREFETCH_SUBMIT mismatch
#7120 merged
Mar 17, 2025 -
Conditionally quote env vars
#7071 merged
Mar 17, 2025
8 Pull requests opened by 7 people
-
[NFC] Typo fix in SP layer.
#7152 opened
Mar 19, 2025 -
DeepCompile for enhanced compiler integration
#7154 opened
Mar 19, 2025 -
Avoid graph break by removing redundant requires_grad attr change
#7158 opened
Mar 20, 2025 -
Add destroy to tests to free memory
#7160 opened
Mar 20, 2025 -
fixed: Modified the topkgating function and modified the test_moe file for testing
#7163 opened
Mar 21, 2025 -
Add DataStates-LLM: Asynchronous Checkpointing Engine Support
#7166 opened
Mar 21, 2025 -
Link AutoTP blog in the front page
#7167 opened
Mar 21, 2025 -
Fix pre-compile on cpu-only machines
#7168 opened
Mar 22, 2025
3 Issues closed by 3 people
-
[BUG] `PydanticDeprecatedSince20` coming from `DeepSpeedZeroConfig`
#7149 closed
Mar 21, 2025 -
Does DeepSpeed support autotp and pp in llm training?
#7147 closed
Mar 19, 2025
12 Issues opened by 12 people
-
nv-ds-chat CI test failure
#7169 opened
Mar 23, 2025 -
Install DeepSpeed fail with setuptools-77.0.3
#7165 opened
Mar 21, 2025 -
[BUG] Enabling hpZ causes an abnormally large loss.
#7164 opened
Mar 21, 2025 -
Add DCO check guidance.
#7162 opened
Mar 21, 2025 -
[BUG] circular import on `DeepSpeedTransformerInference`
#7159 opened
Mar 20, 2025 -
[BUG] AttributeError: module 'deepspeed' has no attribute 'init_inference'
#7157 opened
Mar 20, 2025 -
[REQUEST] Support for Expert Optimizer State Partitioning with ZeRO Optimization in DeepSpeed MoE
#7156 opened
Mar 20, 2025 -
[BUG] Receiving CUDA error: invalid argument using pytorch 2.7 with deepspeed 0.16.4 with Cuda 12.8
#7150 opened
Mar 19, 2025 -
[REQUEST]Does the current version support distributed fine-tuning on mac devices (M2-M4)?
#7148 opened
Mar 18, 2025 -
static loss scale with stage 0 occurred error
#7145 opened
Mar 17, 2025 -
[REQUEST] Support for Nvidia 50 Series GPUs: Pytorch >=2.6 and CUDA 12.8 required
#7144 opened
Mar 17, 2025
17 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Enable torch.autocast with ZeRO
#6993 commented on
Mar 21, 2025 • 2 new comments -
async tp allreduce
#7115 commented on
Mar 21, 2025 • 1 new comment -
[XPU] Support XCCL on deepspeed side
#7113 commented on
Mar 17, 2025 • 0 new comments -
Variable batch size and LR scheduler
#7104 commented on
Mar 21, 2025 • 0 new comments -
Unpin once transformers latest is fixed
#7088 commented on
Mar 20, 2025 • 0 new comments -
Enable ZeRO set/get APIs for NVMe offload
#7046 commented on
Mar 21, 2025 • 0 new comments -
Add `pyproject.toml` with legacy build backend to keep most logic in `setup.py`
#7033 commented on
Mar 20, 2025 • 0 new comments -
Improve overflow handling in ZeRO
#6976 commented on
Mar 21, 2025 • 0 new comments -
Fix: forbid repeated deepspeed.initialize on training objects
#6874 commented on
Mar 21, 2025 • 0 new comments -
[BUG] Zero2 offload overflow
#5241 commented on
Mar 23, 2025 • 0 new comments -
nv-nightly CI test failure
#7140 commented on
Mar 23, 2025 • 0 new comments -
safe_get_full_grad & safe_set_full_grad
#7117 commented on
Mar 21, 2025 • 0 new comments -
[BUG] DeepSpeed accuracy issue for torch.compile if activation checkpoint function not compiler disabled
#6718 commented on
Mar 21, 2025 • 0 new comments -
[BUG] Batch inference DDP + zero stage 3 = inference code hangs
#7128 commented on
Mar 20, 2025 • 0 new comments -
[BUG] OOM when train 70B models using deepspeed 0.16.4
#7116 commented on
Mar 19, 2025 • 0 new comments -
[REQUEST] torch.compile + DeepSpeed
#4677 commented on
Mar 19, 2025 • 0 new comments -
[BUG?] zero++ init problem
#7066 commented on
Mar 18, 2025 • 0 new comments