How GPUse Billing Works
You pay only for GPU time from your first request onward, not for provisioning or model loading, and billing stops once an idle instance scales to zero.
No Warmup Charges
Model loading is FREE
(~2-5 min)
Per-Second Billing
$0.0002028/second
(~$0.73/hr)
Auto Scale-to-Zero
Stops billing when idle
(after 5-15 min no requests)
When Billing Starts and Stops
PROVISIONING
FREE (~2-5 min)
- Container starts and downloads model weights
- Model loads into GPU memory
- Health checks verify readiness
Billing: $0.00
FIRST REQUEST
BILLING STARTS
- First HTTP request hits endpoint
- Billing activates immediately
- Actual work begins
Billing: $0.0002028/sec
ACTIVE USAGE
BILLING CONTINUES
- Workload is processing
- Instance stays warm for fast responses
- Charges continue during pauses
Billing: $0.0002028/sec
SCALING DOWN
BILLING CONTINUES
- No requests for 5-15 minutes
- Instance scales to zero
- Billing stops when scaled to zero
Billing: $0.0002028/sec
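The phases above can be read as a lookup of which states accrue charges. The sketch below is illustrative only: `Phase`, `BILLED_PHASES`, and `timeline_cost` are hypothetical names, not part of any GPUse API. Only the rate and the billed/free split are taken from this page.

```python
from enum import Enum

RATE_PER_SECOND = 0.0002028  # USD per second, as quoted on this page


class Phase(Enum):
    PROVISIONING = "provisioning"      # free: container start, model load, health checks
    ACTIVE = "active"                  # billed: from the first request onward
    SCALING_DOWN = "scaling_down"      # billed: idle but still warm
    SCALED_TO_ZERO = "scaled_to_zero"  # free: instance has stopped


# Phases that accrue charges according to the description above.
BILLED_PHASES = {Phase.ACTIVE, Phase.SCALING_DOWN}


def timeline_cost(timeline):
    """Return the cost in USD for a list of (phase, seconds) pairs."""
    billed_seconds = sum(sec for phase, sec in timeline if phase in BILLED_PHASES)
    return billed_seconds * RATE_PER_SECOND


# Hypothetical timeline: 3 min provisioning, 5 min work, 15 min scale-down, then stopped.
example = [
    (Phase.PROVISIONING, 180),
    (Phase.ACTIVE, 300),
    (Phase.SCALING_DOWN, 900),
    (Phase.SCALED_TO_ZERO, 3600),
]
print(f"${timeline_cost(example):.2f}")
```

Only the 300 active seconds and the 900 scale-down seconds contribute to the total; the provisioning and scaled-to-zero entries cost nothing.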
What You're Actually Paying For
Serverless GPU Instance
| Component | Per Second | Per Hour | Per Day (8 hrs) |
|---|---|---|---|
| GPUse Total Rate | $0.0002028 | ~$0.73 | ~$5.84 |
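The hourly and daily figures in the table follow directly from the per-second rate. A minimal sketch of the conversion, where `cost` is a hypothetical helper name used only for illustration:

```python
RATE_PER_SECOND = 0.0002028  # USD per second, from the table above


def cost(seconds, rate=RATE_PER_SECOND):
    """Cost in USD for a number of billed seconds."""
    return seconds * rate


per_hour = cost(60 * 60)
per_8_hour_day = cost(8 * 60 * 60)

print(f"per hour: ${per_hour:.2f}")
print(f"per 8-hour day: ${per_8_hour_day:.2f}")
```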
Billing Granularity
- Minimum: 100 milliseconds (0.1 seconds)
- Increment: Per-second billing
- Instance Minimum: 1 minute from startup to termination
- Grace Period: 5 minutes FREE per project (before any billing)
What Does This Actually Cost?
Quick Inference Job
Transcribe a 30-minute podcast with Whisper
- Provisioning: 3 minutes (FREE)
- Processing: 5 minutes ($0.06)
- $0.0002028 × 300 sec = $0.06
- Scaling down: 5-15 min (billed, up to ~$0.18)
Total: ~$0.24 (assuming the full 15-minute scale-down)
Development Session
Test and iterate on a model deployment
- Provisioning: 4 minutes (FREE)
- Active development: 2 hours
- $0.73/hr × 2 = $1.46
- Scaling down: 5-15 min (billed, up to ~$0.18)
Total: ~$1.64 (assuming the full 15-minute scale-down)
Production Batch Job
Process 1000 images with SDXL
- Provisioning: 5 minutes (FREE)
- Processing: 8 hours continuous
- $0.73/hr × 8 = $5.84
- Scaling down: 5-15 min (billed, up to ~$0.18)
Total: ~$6.02 (assuming the full 15-minute scale-down)
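The totals in the three examples correspond to processing time plus a full 15-minute scale-down at the per-second rate. Because scale-down can take anywhere from 5 to 15 minutes, the actual total falls in a range. The sketch below computes that range for each scenario; `total_range` and the scenario labels are illustrative names, not GPUse tooling.

```python
RATE_PER_SECOND = 0.0002028          # USD per second, as quoted on this page
SCALE_DOWN_RANGE = (5 * 60, 15 * 60)  # billed idle window in seconds (5-15 min)


def total_range(processing_seconds):
    """Return (low, high) total cost in USD, rounded to cents.

    Provisioning is free, so only processing and scale-down time are counted.
    """
    low, high = (
        (processing_seconds + idle) * RATE_PER_SECOND for idle in SCALE_DOWN_RANGE
    )
    return round(low, 2), round(high, 2)


scenarios = {
    "Quick inference job (5 min)": 5 * 60,
    "Development session (2 hr)": 2 * 60 * 60,
    "Production batch job (8 hr)": 8 * 60 * 60,
}

for name, seconds in scenarios.items():
    low, high = total_range(seconds)
    print(f"{name}: ${low:.2f} to ${high:.2f}")
```

The upper bound of each range matches the totals quoted above; calling stop_compute as soon as the work finishes would bring the cost below the lower bound, since the scale-down window is skipped.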
GPUse vs Traditional GPU Providers
| Feature | Traditional Providers | GPUse |
|---|---|---|
| Charged for provisioning | [✓] Yes (you pay for setup) | [✗] No (FREE warmup period) |
| Charged for model loading | [✓] Yes (~5-10 min) | [✗] No (FREE until first request) |
| Billing starts | At deployment | At first request |
| Billing granularity | Per hour (rounded up) | Per second |
| Scale to zero | Manual or not available | Automatic (5-15 min idle) |
| Minimum charge | 1 hour | 1 minute |
| Cold start time | Charged | FREE |
| Idle time cost | Full rate | Auto scales to zero |
Bottom Line: Traditional providers charge ~$0.84-1.20/hr from deployment.
GPUse charges $0.73/hr ONLY when you're using the GPU.
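To make the comparison concrete, the sketch below models both columns of the table for a short job. It is a simplified model under stated assumptions: the "traditional" side bills from deployment and rounds up to whole hours at the $0.84-1.20/hr range quoted above, which will not match every provider. The function names are hypothetical.

```python
import math

GPUSE_RATE_PER_SECOND = 0.0002028        # USD per second, as quoted on this page
TRADITIONAL_HOURLY_RANGE = (0.84, 1.20)  # USD per hour, range quoted above


def traditional_cost(provisioning_s, processing_s, hourly_rate):
    """Billed from deployment, rounded up to whole hours (assumed model)."""
    hours = math.ceil((provisioning_s + processing_s) / 3600)
    return hours * hourly_rate


def gpuse_cost(processing_s, scale_down_s):
    """Provisioning is free; processing and scale-down are billed per second."""
    return (processing_s + scale_down_s) * GPUSE_RATE_PER_SECOND


# Example: 3 min provisioning, 5 min processing, worst-case 15 min scale-down.
low, high = (traditional_cost(180, 300, rate) for rate in TRADITIONAL_HOURLY_RANGE)
print(f"Traditional: ${low:.2f} to ${high:.2f}")
print(f"GPUse:       ${gpuse_cost(300, 900):.2f}")
```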
You Stop Paying When...
Billing Stops When
- Instance completes scaling to zero
- You manually stop the instance via stop_compute
- Grace period expires without checkout
Billing Continues During
- Active processing and requests
- Scale-to-zero period (5-15 min after last request)
Frequently Asked Questions
How does billing differ from traditional providers?
Traditional providers charge from the moment you click "deploy". GPUse charges from your first request - meaning provisioning and model loading are FREE.
Why do you charge during the scale-to-zero period?
Serverless GPUs stay warm for a while after each request so that follow-up requests do not incur a 2-5 minute cold start. The instance scales to zero within 5-15 minutes of the last request, so you stop paying for compute you are not using.
What if I forget to stop an instance?
The instance automatically scales to zero after 5-15 minutes without requests. Billing continues during this scale-down period and stops once the instance has fully scaled to zero. You can also call stop_compute manually to terminate immediately and avoid the scale-down charges.
How accurate is the scale-to-zero timeline?
It varies from 5 to 15 minutes depending on traffic patterns. GPUse does not control this timeline; it is managed by the underlying infrastructure. Scale-to-zero consistently completes within this range, often toward the shorter end.
Do I pay for failed requests or errors?
Yes - billing is based on instance runtime, not successful requests. If your workload fails, you still pay for the GPU time used. This is why the grace period is valuable for testing.
What happens if I run out of credits?
Your instances will continue running until they scale to zero or are manually stopped. You'll receive low-balance warnings via the MCP tools and dashboard.
Is the grace period per account or per project?
Per account. Use unique project_id values to get 5 minutes FREE for each account.
Why is provisioning FREE if it uses cloud resources?
We believe you should only pay for actual work. Model loading and container startup are infrastructure overhead, not your compute work. This policy sets GPUse apart from traditional providers.
How do I view my billing costs?
We provide detailed billing per GPU instance. Log in to your dashboard to view usage history showing each instance with timestamps and costs. You can filter by active instances or view terminated instances in the history tab. Each entry shows the exact runtime and cost for that GPU instance.
Why do I see irregular billing intervals in my usage history?
Our system checks your GPU usage every 30 seconds, but each check takes a few seconds to complete. This means billing intervals appear slightly irregular in your usage history—you might see entries at 33, 35, or 38-second intervals instead of exactly 30 seconds.
You are always billed for the exact time your GPU was running. Note that billing continues during the scale-to-zero period (up to 15 minutes after your last request) until the instance fully terminates.
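A small sketch of why irregular intervals do not change the total. It assumes each usage entry covers the time elapsed since the previous check, which is an assumption about the usage history rather than documented behavior. The timestamps are made up for illustration.

```python
RATE_PER_SECOND = 0.0002028  # USD per second, as quoted on this page

# Hypothetical timestamps (seconds since billing started) at which checks completed.
check_times = [0, 33, 68, 106]

# Each usage entry covers the gap between two consecutive checks.
intervals = [later - earlier for earlier, later in zip(check_times, check_times[1:])]

# The gaps are uneven, but together they cover the full runtime exactly once.
billed_seconds = sum(intervals)
total_cost = billed_seconds * RATE_PER_SECOND

print(intervals)
print(billed_seconds, f"${total_cost:.4f}")
```

However unevenly the checks land, the intervals sum to the span between the first and last check, so the billed time equals the runtime.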
Start with 5 Minutes FREE
Test GPUse risk-free with our grace period. No credit card required.
Your agent provisions a GPU, completes the work, then surfaces the checkout link only if you want to continue.
Get Started
Questions about billing? Email support@gpuse.com or log in to view detailed cost breakdowns for each GPU instance via your dashboard.