Running Stable Diffusion locally means free, private, unlimited image generation — if you have the right hardware. This guide covers setup on both Mac (Apple Silicon) and Windows (NVIDIA GPU).
Hardware Requirements
Minimum (functional):
- NVIDIA GPU with 6GB VRAM (GTX 1060 6GB or better; the 3GB variant of that card falls short)
- OR Apple Silicon Mac (M1 or better, 16GB RAM recommended)
- 20GB free storage for models
Recommended:
- NVIDIA RTX 3080/4070 or better (10-24GB VRAM)
- OR Apple Silicon Mac with 32GB+ RAM
- 50-100GB storage for multiple models and LoRAs
CPU-only: Works but generation takes minutes per image instead of seconds.
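As a quick sanity check before installing anything, the tiers above can be written down as a small function. This is only a sketch: the thresholds are copied from the lists above, and the function names (hardware_tier, free_gb) are made up for illustration, not part of any Stable Diffusion tool.

```python
import shutil


def hardware_tier(vram_gb=0.0, apple_silicon=False, ram_gb=0.0):
    """Map a machine onto the tiers listed in this guide."""
    if apple_silicon:
        # Any Apple Silicon Mac counts as functional; 32GB+ RAM is the recommended tier.
        return "recommended" if ram_gb >= 32 else "minimum"
    if vram_gb >= 10:
        return "recommended"
    if vram_gb >= 6:
        return "minimum"
    return "below minimum"


def free_gb(path="."):
    """Free space on the drive that holds `path`, in gigabytes."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    print(hardware_tier(vram_gb=8))
    print(f"{free_gb():.0f} GB free (the guide suggests at least 20 GB for models)")
```

Pass the VRAM figure reported by your GPU vendor's tools; the function does not detect hardware itself.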
Choose Your Interface
AUTOMATIC1111 (A1111): Most features, largest community, steeper learning curve. Best for power users.
ComfyUI: Node-based workflow builder. Harder to learn, more powerful for complex pipelines. Best for automation.
Forge: Fork of A1111 with better performance on newer models. Recommended for 2026.
This guide covers both A1111/Forge and a quick ComfyUI start.
Windows Setup: AUTOMATIC1111 / Forge
Prerequisites
Install these in order:
- Python 3.10.x (the A1111 README names 3.10.6), NOT 3.11+, NOT the latest. Tick "Add Python to PATH" in the installer.
- Git
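- Current NVIDIA GPU drivers (from the NVIDIA website, if using an NVIDIA GPU). A separate CUDA toolkit install is normally not needed, because the PyTorch build the installer downloads bundles its own CUDA runtime.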
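Since the wrong Python version is a common cause of failed installs, it can help to check which interpreter is on your PATH before cloning. A minimal sketch, assuming the 3.10.x requirement stated above; the helper name is_supported_python is hypothetical.

```python
import sys


def is_supported_python(version=None):
    """Return True only for a 3.10.x interpreter, the series this guide asks for."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) == (3, 10)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "OK" if is_supported_python() else "not 3.10.x, install 3.10 first"
    print(f"Python {found}: {verdict}")
```

Run it with the same python command the web UI will use, since several Python installs can coexist on one machine.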
Installation
# Clone the repository
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
# On first run, it installs everything automatically
# Run from Command Prompt in the repository folder (in PowerShell, use .\webui-user.bat)
webui-user.bat
First run takes 15-30 minutes — it downloads dependencies automatically.
Alternative: Forge (Recommended for Performance)
git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
cd stable-diffusion-webui-forge
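webui-user.bat
Forge typically runs 20-40% faster than A1111 on the same hardware, though the gain varies with the GPU and the model, so treat the figure as a rough guide.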
Mac Setup (Apple Silicon)
Prerequisites
# Install Homebrew if not installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install dependencies
brew install cmake protobuf rust python@3.10 git wget
Installation
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
./webui.sh
Mac uses MPS (Metal Performance Shaders) instead of CUDA. Performance is good on M2/M3 chips.
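To confirm which backend PyTorch will actually use, a short check like the one below can be run inside the web UI's Python environment after the first launch has installed PyTorch. It is a sketch, not part of A1111: the function name detect_backend is made up, and it assumes a PyTorch version recent enough to expose torch.backends.mps.

```python
def detect_backend():
    """Report the accelerator PyTorch can see: 'cuda', 'mps', or 'cpu'."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no torch.backends.mps, hence the getattr guard.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(detect_backend())
```

On an Apple Silicon Mac the expected answer is mps; cpu there usually points to an Intel (x86) Python build running under Rosetta.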
Downloading Models
Models go in: stable-diffusion-webui/models/Stable-diffusion/ (for Forge, the same path under stable-diffusion-webui-forge/)
Recommended Starting Models
Realistic Imagery:
- Stable Diffusion XL Base 1.0 (from HuggingFace: stabilityai/stable-diffusion-xl-base-1.0)
- Realistic Vision v6 (civitai.com — excellent for portraits)
Artistic:
- Dreamshaper XL
- SDXL-Lightning (4-step, very fast)
Download via command line:
# From HuggingFace
cd stable-diffusion-webui/models/Stable-diffusion/
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
Or download .safetensors files from Civitai.com and move them to the models directory.
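Windows does not ship wget, so a small Python downloader is one cross-platform alternative. The sketch below rebuilds the same HuggingFace URL used above; hf_resolve_url and download_model are illustrative names, and the URL pattern is inferred from that single example rather than from any official client.

```python
import urllib.request
from pathlib import Path


def hf_resolve_url(repo_id, filename, revision="main"):
    """Build a direct-download URL in the same shape as the wget example above."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_model(url, models_dir):
    """Download `url` into `models_dir` unless the file is already there."""
    target = Path(models_dir) / url.rsplit("/", 1)[-1]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        # Checkpoints are several gigabytes, so this call can take a while.
        urllib.request.urlretrieve(url, str(target))
    return target


if __name__ == "__main__":
    url = hf_resolve_url("stabilityai/stable-diffusion-xl-base-1.0",
                         "sd_xl_base_1.0.safetensors")
    print(download_model(url, "stable-diffusion-webui/models/Stable-diffusion"))
```

The script does not resume interrupted downloads; delete a partial file before retrying.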
First Generation
- Open http://localhost:7860 in your browser (A1111 starts a web server)
- Select a model from the Checkpoint dropdown
- Type a positive prompt and optionally a negative prompt
- Click Generate
Basic Prompt Structure
[Subject], [setting/context], [style/medium], [lighting], [quality modifiers]
Example:
Positive: A woman reading a book in a cozy library, afternoon sunlight through tall windows, oil painting, warm colors, highly detailed, masterpiece
Negative: ugly, bad anatomy, deformed, low quality, blurry, watermark, text, nsfw
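If you generate many variations, the bracketed template above is easy to assemble in code. A minimal sketch, with build_prompt as a hypothetical helper; it simply joins the non-empty parts with commas in the template's order.

```python
def build_prompt(subject, setting="", style="", lighting="", quality=()):
    """Join the template's slots into one comma-separated prompt, skipping empty ones."""
    parts = [subject, setting, style, lighting, *quality]
    return ", ".join(part.strip() for part in parts if part and part.strip())


if __name__ == "__main__":
    print(build_prompt(
        "A woman reading a book",
        setting="cozy library",
        style="oil painting",
        lighting="afternoon sunlight",
        quality=["highly detailed", "masterpiece"],
    ))
```

Keeping the slots separate makes it simple to swap only the style or lighting while holding the subject fixed.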
Key Settings Explained
Sampling Method: DPM++ 2M Karras (balanced), Euler a (creative), DDIM (fast)
Sampling Steps:
- Fast: 15-20 steps (lower quality)
- Standard: 25-30 steps (good balance)
- Detailed: 40-50 steps (diminishing returns)
CFG Scale: How closely to follow the prompt
- Low (3-5): More creative, less literal
- Medium (7): Default, good balance
- High (10-15): Strict prompt following, can be harsh
Size:
- SDXL native: 1024×1024
- Portrait: 832×1216 or 768×1024
- Landscape: 1216×832
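The same settings can be sent to A1111 programmatically. The sketch below assumes the web UI was started with the --api flag and is listening on the default port; it targets the /sdapi/v1/txt2img endpoint. The helper names are made up, the defaults mirror the "standard" values above, and newer A1111 releases may expect the sampler and scheduler as separate fields, so treat the sampler string as an assumption.

```python
import base64
import json
import urllib.request


def txt2img_payload(prompt, negative="", steps=28, cfg_scale=7.0,
                    width=1024, height=1024, sampler="DPM++ 2M Karras"):
    """Collect the settings from this section into one request body."""
    if width % 8 or height % 8:
        raise ValueError("width and height should be multiples of 8")
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
        "sampler_name": sampler,
    }


def generate(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload to a running web UI and return the images as PNG bytes."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        images = json.load(response)["images"]  # base64-encoded strings
    return [base64.b64decode(image) for image in images]


if __name__ == "__main__":
    portrait = txt2img_payload("a lighthouse at dusk", width=832, height=1216)
    print(json.dumps(portrait, indent=2))
```

Only generate() needs the server; txt2img_payload() can be used on its own to inspect or log the settings for a batch.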
ComfyUI Setup
ComfyUI is more complex but enables powerful automation:
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
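# NVIDIA users: install a CUDA build of PyTorch first (see the ComfyUI README for the current command);
# otherwise pip may pull a CPU-only build on Windows
pip install -r requirements.txt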
# Run
python main.py
Access at http://localhost:8188
ComfyUI uses a node-based workflow. Each node represents an operation; you connect them to build generation pipelines. Start with the default workflow and expand from there.
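To make the node idea concrete, here is roughly what a basic text-to-image graph looks like in ComfyUI's API JSON format, where each input either holds a value or a [node_id, output_index] link to another node. This is a sketch under assumptions: the node class names match ComfyUI's built-in nodes as far as I know, the checkpoint filename must match a file in ComfyUI's own models/checkpoints folder, and sdxl_workflow, dangling_links, and queue_prompt are illustrative helper names.

```python
import json
import urllib.request


def sdxl_workflow(positive, negative, ckpt_name, seed=0):
    """Build a minimal text-to-image graph; links are [source_node_id, output_index]."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                         "latent_image": ["4", 0], "seed": seed, "steps": 28, "cfg": 7.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "guide"}},
    }


def dangling_links(workflow):
    """Return (node_id, input_name) pairs whose link points at a node that is missing."""
    missing = []
    for node_id, node in workflow.items():
        for name, value in node["inputs"].items():
            is_link = isinstance(value, list) and len(value) == 2 and isinstance(value[0], str)
            if is_link and value[0] not in workflow:
                missing.append((node_id, name))
    return missing


def queue_prompt(workflow, base_url="http://127.0.0.1:8188"):
    """Submit the graph to a running ComfyUI server and return its JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the graph as plain data is what makes ComfyUI convenient for automation: a script can change the prompt or seed and queue hundreds of jobs without touching the browser.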
LoRA Models
LoRA (Low-Rank Adaptation) models fine-tune specific styles or characters.
Install: Drop .safetensors files into models/Lora/
Use in prompt:
<lora:model_name:0.7>
The number (0.7) is the weight, typically between 0.1 and 1.0. Higher = stronger influence.
Find LoRAs: Civitai.com has thousands of community-trained LoRAs for styles, characters, and concepts.
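When prompts are generated by a script, the LoRA tag syntax above is easy to produce and to read back. A small sketch with hypothetical helpers lora_tag and find_loras; the regular expression only covers the name:weight form shown above.

```python
import re

# Matches tags of the form <lora:name:weight>, as used in the prompt example above.
LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9]*\.?[0-9]+)>")


def lora_tag(name, weight=0.7):
    """Format a LoRA tag; `name` is the file name without the .safetensors extension."""
    return f"<lora:{name}:{weight:g}>"


def find_loras(prompt):
    """Return (name, weight) pairs for every LoRA tag found in a prompt."""
    return [(name, float(weight)) for name, weight in LORA_TAG.findall(prompt)]


if __name__ == "__main__":
    prompt = "portrait of a knight, " + lora_tag("model_name", 0.7)
    print(prompt)
    print(find_loras(prompt))
```

Parsing tags back out is handy for logging which LoRAs and weights produced a given image.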
Optimization Tips
For NVIDIA GPUs:
- Add --xformers to webui-user.bat for faster generation
- Use --medvram with 6-8GB VRAM GPUs
- Use --lowvram with 4GB VRAM (slower but functional)
For Apple Silicon:
- Use --no-half if you see color artifacts
- Generation speed is best on M2 Pro/Max and later chips
Model format: Prefer .safetensors over .ckpt files. They load faster and are safer, because a .ckpt is a Python pickle that can run arbitrary code when loaded.
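The NVIDIA flags above end up on the COMMANDLINE_ARGS line of webui-user.bat. The sketch below turns the VRAM guidance into that line; suggest_flags and commandline_args_line are made-up names, and the cut-offs (4GB and 8GB) are this guide's rules of thumb rather than hard limits.

```python
def suggest_flags(vram_gb, nvidia=True):
    """Pick launch flags from the VRAM guidance in this section."""
    flags = []
    if nvidia:
        flags.append("--xformers")
        if vram_gb <= 4:
            flags.append("--lowvram")   # slower but functional
        elif vram_gb <= 8:
            flags.append("--medvram")
    return flags


def commandline_args_line(flags):
    """Format the flags as the line to place in webui-user.bat."""
    return "set COMMANDLINE_ARGS=" + " ".join(flags)


if __name__ == "__main__":
    print(commandline_args_line(suggest_flags(8)))
```

Edit webui-user.bat in a text editor, replace the existing COMMANDLINE_ARGS line with the printed one, and relaunch.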
Common Issues
CUDA out of memory: Reduce the image resolution or add the --medvram flag
Black/green images on Mac: Add the --no-half flag
Slow on CPU: Normal — CPU generation is 50-100x slower than GPU. Consider cloud options.
Model not showing: Check the file is in the correct folder and refresh the checkpoint list
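For the "model not showing" case, listing what is actually in the models folder often reveals the problem (wrong folder, or a download that saved with a different extension). A minimal sketch; list_checkpoints is a hypothetical helper and the default path assumes you run it from the folder that contains stable-diffusion-webui.

```python
from pathlib import Path

CHECKPOINT_SUFFIXES = {".safetensors", ".ckpt"}


def list_checkpoints(models_dir="stable-diffusion-webui/models/Stable-diffusion"):
    """Return checkpoint files under `models_dir`, or [] if the folder is missing."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in CHECKPOINT_SUFFIXES
    )


if __name__ == "__main__":
    found = list_checkpoints()
    print("\n".join(found) if found else "No checkpoints found; check the folder path.")
```

An empty result means the web UI has nothing to load from that folder, whatever the dropdown shows.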
Privacy
Everything runs locally. No images or prompts are sent anywhere. This makes local Stable Diffusion ideal for:
- NSFW content (subject to your local laws)
- Unreleased product images
- Client work with confidentiality requirements
- Any use case where you don’t want a third party seeing your generations