Self-Hosting n8n on Microsoft Azure (Part 2): Performance Tuning, Backups, Memory Issues, and Scaling

In Part 1, I covered how I deployed a production-ready, self-hosted n8n instance on Microsoft Azure using Docker, Caddy, and DuckDNS.
Once the system was live and real workflows started running, a new phase began — operational reality.
This is the part most tutorials skip:
Workflows start failing randomly
Memory usage spikes
OAuth tokens expire
The VM freezes at 3 a.m.
You realize “working” is not the same as “stable”
This article documents what I learned while tuning performance, solving memory issues, implementing backups, and preparing for scale on Azure.
1. Understanding n8n Performance in Production
n8n performance depends on four main factors:
VM resources (CPU, RAM, disk)
Workflow design (loops, concurrency, API calls)
Execution mode (main vs queue mode)
External services (APIs, databases, webhooks)
Early on, I learned that most performance problems are not n8n bugs — they’re architecture problems.
2. Memory Issues: The Most Common Failure Mode
One of the most frequent errors I encountered was:
“Workflow did not finish, possible out of memory issue”
Why This Happens
Common causes:
Large JSON payloads held in memory
Long-running loops
Multiple workflows executing simultaneously
Running everything on a low-RAM VM (1–2 GB)
Default Node.js memory limits
n8n keeps execution data in memory during runtime, which makes RAM the first bottleneck.
3. Fixing Out-of-Memory (OOM) Problems
3.1 Upgrading the VM
The fastest fix was increasing VM resources.
Recommended minimum for production:
4 GB RAM
2 vCPUs
On Azure, resizing a VM is straightforward and often cheaper than debugging memory crashes for days.
3.2 Increasing Node.js Memory Limits
By default, Node.js limits memory usage. This can be adjusted via environment variables:
NODE_OPTIONS=--max-old-space-size=4096
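The value is in megabytes and sets the ceiling for the V8 heap, not for the whole process. Keep it below the RAM actually available to the container: on a 4 GB VM, 4096 leaves nothing for the OS and Docker, so a value closer to 3072 is safer.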
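As a rough way to pick that number, here is a minimal sketch that subtracts a fixed reserve from the VM's RAM. The 1024 MB reserve and the 512 MB floor are my own assumptions, not n8n or Node.js recommendations, so adjust them to what else runs on the box.

```python
def suggested_heap_mb(total_ram_mb: int, reserved_mb: int = 1024, floor_mb: int = 512) -> int:
    """Heap limit in MB that leaves `reserved_mb` for the OS, Docker, and non-heap memory.

    reserved_mb and floor_mb are illustrative defaults (assumptions).
    """
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return max(floor_mb, total_ram_mb - reserved_mb)


def node_options(total_ram_mb: int) -> str:
    """Render the NODE_OPTIONS line for the container's environment."""
    return f"NODE_OPTIONS=--max-old-space-size={suggested_heap_mb(total_ram_mb)}"


if __name__ == "__main__":
    for ram in (2048, 4096, 8192):
        print(ram, node_options(ram))
```

On a 4 GB VM this suggests 3072, which matches the idea of leaving about a quarter of the machine to everything that is not the n8n heap.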
3.3 Reducing Execution Data Size
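In the n8n settings:
Disable saving execution progress and full execution data unless you need it for debugging
Set "Save successful executions" to none where you can, and keep failed executions so you can still investigate errors
Enable pruning so old execution records are deleted automatically
This shrinks the database and cuts disk writes. It does less for peak memory inside a single run, which depends mostly on how much data the workflow itself holds.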
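The same settings can be applied instance-wide through environment variables. The sketch below renders them as an env file; the variable names are the ones I understand n8n to document for execution data, and the 168-hour retention is an assumption, so check both against the docs for your n8n version.

```python
def execution_data_env(keep_successful: bool = False, max_age_hours: int = 168) -> dict[str, str]:
    """Environment variables that limit how much execution data n8n stores.

    max_age_hours=168 (7 days) is an illustrative default, not an n8n recommendation.
    """
    if max_age_hours <= 0:
        raise ValueError("max_age_hours must be positive")
    return {
        "EXECUTIONS_DATA_SAVE_ON_SUCCESS": "all" if keep_successful else "none",
        "EXECUTIONS_DATA_SAVE_ON_ERROR": "all",   # keep failures for debugging
        "EXECUTIONS_DATA_PRUNE": "true",          # delete old records automatically
        "EXECUTIONS_DATA_MAX_AGE": str(max_age_hours),
    }


def to_env_file(env: dict[str, str]) -> str:
    """Render a dict as KEY=value lines, one per line."""
    return "\n".join(f"{key}={value}" for key, value in env.items()) + "\n"


if __name__ == "__main__":
    print(to_env_file(execution_data_env()), end="")
```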
3.4 Workflow Design Optimization
Bad workflow design can kill even a powerful VM.
Best practices:
Avoid huge loops where possible
Use pagination instead of fetching everything at once
Split large workflows into smaller ones
Use “Execute Workflow” nodes to modularize logic
Clear unnecessary fields using Set nodes
Small design changes often produce massive stability improvements.
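To make the pagination and field-trimming advice concrete, here is a language-neutral sketch in Python. Inside n8n the same idea would live in the HTTP Request node's pagination options or a Code node. The `fetch_page` callable and the field names are hypothetical stand-ins for whatever API you are calling.

```python
from typing import Callable, Iterator

Page = list[dict]


def iter_pages(fetch_page: Callable[[int, int], Page], page_size: int = 100) -> Iterator[Page]:
    """Yield one page at a time so only `page_size` items are held in memory."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield page
        if len(page) < page_size:  # a short page means there is nothing left
            return
        offset += len(page)


def keep_fields(item: dict, fields: tuple[str, ...]) -> dict:
    """Drop everything except the fields later steps actually use."""
    return {key: item[key] for key in fields if key in item}


if __name__ == "__main__":
    # Fake data source standing in for a real API (hypothetical).
    data = [{"id": i, "name": f"user{i}", "blob": "x" * 1000} for i in range(250)]

    def fake_fetch(offset: int, limit: int) -> Page:
        return data[offset:offset + limit]

    processed = 0
    for page in iter_pages(fake_fetch, page_size=100):
        slim = [keep_fields(item, ("id", "name")) for item in page]
        processed += len(slim)
    print(processed)
```

The point of the generator is that each page can be processed and discarded before the next one is fetched, instead of accumulating the full result set first.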
4. Performance Tuning at the Workflow Level
4.1 Controlling Concurrency
Running too many workflows simultaneously causes:
CPU spikes
Memory exhaustion
API rate limit issues
Control concurrency by:
Scheduling workflows intelligently
Avoiding heavy workflows running at the same time
Using queues (covered later)
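One low-tech way to keep heavy workflows from overlapping is to stagger their schedules. The sketch below generates five-field cron expressions spaced a fixed gap apart, which could then be pasted into each workflow's schedule trigger. The workflow names, the 02:00 start, and the 10-minute gap are assumptions for illustration.

```python
def staggered_crons(workflows: list[str], hour: int = 2, gap_minutes: int = 10) -> dict[str, str]:
    """Assign each workflow a daily cron slot, `gap_minutes` after the previous one."""
    if gap_minutes <= 0:
        raise ValueError("gap_minutes must be positive")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    crons = {}
    for index, name in enumerate(workflows):
        total_minutes = index * gap_minutes
        slot_hour = (hour + total_minutes // 60) % 24  # roll over into later hours
        slot_minute = total_minutes % 60
        crons[name] = f"{slot_minute} {slot_hour} * * *"
    return crons


if __name__ == "__main__":
    # Hypothetical workflow names.
    for name, cron in staggered_crons(["crm-sync", "report-export", "cleanup"]).items():
        print(f"{name}: {cron}")
```

The gap should be at least as long as the slowest workflow usually takes; otherwise the runs still overlap.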
4.2 Webhook vs Polling
Where possible:
Prefer webhooks over polling
Polling wastes CPU cycles and memory
Webhooks trigger workflows only when needed
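The difference adds up: a workflow that polls every minute runs 1,440 times a day even when nothing has changed, while the webhook version runs only when an event arrives.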
5. Backups: The Thing You’ll Regret Not Doing
Sooner or later, something will break:
Accidental deletion
Disk failure
VM corruption
Bad updates
Backups are non-negotiable.
5.1 What Needs to Be Backed Up
At minimum:
n8n database (SQLite or PostgreSQL)
Encryption key
Docker volumes
Environment variables
Workflow data
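If you lose the encryption key, all credentials are unrecoverable. n8n encrypts stored credentials with this key (set through N8N_ENCRYPTION_KEY, or generated into the config file in the .n8n directory on first start), so restoring the database without it leaves every credential unreadable.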
5.2 Moving from SQLite to PostgreSQL
For production, SQLite is not ideal.
PostgreSQL offers:
Better performance
Safer concurrent access
Easier backups
Better scaling support
On Azure, PostgreSQL can be:
Self-hosted in Docker
Or managed using Azure Database for PostgreSQL
Managed PostgreSQL reduces operational overhead significantly.
5.3 Automated Backups
Best practices:
Daily database dumps
Store backups outside the VM
Use Azure Blob Storage or another cloud bucket
Automate via cron or workflows
Backups should be tested, not just created.
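A daily backup job along these lines could look like the sketch below, assuming PostgreSQL, the `pg_dump` and `pg_restore` client tools on the PATH, and the `azure-storage-blob` package installed. The connection strings, container name, and file prefix are placeholders. Note that passing the database URL on the command line exposes it in the process list, so a `.pgpass` file may be a better fit on shared machines.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def backup_name(now: datetime, prefix: str = "n8n") -> str:
    """Timestamped file name so backups sort chronologically."""
    return f"{prefix}-{now.strftime('%Y%m%dT%H%M%SZ')}.dump"


def pg_dump_command(db_url: str, out_file: Path) -> list[str]:
    """pg_dump in custom format, which pg_restore can read and list."""
    return ["pg_dump", "--format=custom", f"--file={out_file}", f"--dbname={db_url}"]


def verify_dump(path: Path) -> None:
    """Cheap sanity check: pg_restore must be able to read the archive's table of contents."""
    subprocess.run(["pg_restore", "--list", str(path)], check=True, stdout=subprocess.DEVNULL)


def upload_to_blob(path: Path, conn_str: str, container: str) -> None:
    """Copy the dump off the VM into Azure Blob Storage."""
    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

    service = BlobServiceClient.from_connection_string(conn_str)
    blob = service.get_blob_client(container=container, blob=path.name)
    with path.open("rb") as handle:
        blob.upload_blob(handle, overwrite=True)


def run_backup(db_url: str, conn_str: str, container: str, workdir: Path = Path("/tmp")) -> str:
    """Dump, verify, upload, then remove the local copy. Returns the blob name."""
    out_file = workdir / backup_name(datetime.now(timezone.utc))
    subprocess.run(pg_dump_command(db_url, out_file), check=True)
    verify_dump(out_file)
    upload_to_blob(out_file, conn_str, container)
    out_file.unlink()
    return out_file.name
```

Listing the archive only proves the file is readable. A real restore into a scratch database every so often is still the only test that counts.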
6. Scaling n8n on Azure
At some point, vertical scaling (bigger VM) is not enough.
6.1 Vertical Scaling (Simplest)
Increase RAM and CPU
Easiest option
Works well up to a point
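Azure allows VM resizing with a short downtime: the VM restarts during the resize, and it may need to be deallocated first if the new size is not available on its current hardware. Schedule the change for a quiet window.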
6.2 Horizontal Scaling with Queue Mode
For serious workloads, n8n supports queue mode.
Queue mode:
Separates execution from the main instance
Uses Redis as a message broker
Allows multiple workers
Improves reliability and throughput
Architecture:
One main n8n instance (UI + API)
Multiple worker containers
Redis for job distribution
PostgreSQL as the database
This is where n8n becomes truly production-grade.
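In configuration terms, the main instance and every worker share the same queue, database, and encryption key settings, and differ only in the command they start with. The sketch below renders that shared environment. The variable names and the `n8n worker` command are the ones I understand n8n to document for queue mode, and the host names are placeholders, so verify them against the docs for your version.

```python
def queue_mode_env(
    redis_host: str,
    postgres_host: str,
    encryption_key: str,
    redis_port: int = 6379,
) -> dict[str, str]:
    """Environment shared by the main instance and all workers in queue mode."""
    if not encryption_key:
        # Workers cannot decrypt credentials unless they share the main instance's key.
        raise ValueError("encryption_key must be the same non-empty value everywhere")
    return {
        "EXECUTIONS_MODE": "queue",
        "QUEUE_BULL_REDIS_HOST": redis_host,
        "QUEUE_BULL_REDIS_PORT": str(redis_port),
        "DB_TYPE": "postgresdb",
        "DB_POSTGRESDB_HOST": postgres_host,
        "N8N_ENCRYPTION_KEY": encryption_key,
    }


# Same image and environment; only the start command differs.
PROCESS_COMMANDS = {"main": "n8n start", "worker": "n8n worker"}


if __name__ == "__main__":
    # Placeholder host names and key.
    env = queue_mode_env("redis.internal", "postgres.internal", "replace-me")
    for key, value in env.items():
        print(f"{key}={value}")
```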
6.3 Azure Considerations for Queue Mode
On Azure:
Use a VM Scale Set or multiple VMs
Use managed Redis (Azure Cache for Redis)
Use managed PostgreSQL
Separate compute from storage
This reduces single points of failure.
7. Monitoring and Stability
Production systems need visibility.
Key things to monitor:
RAM usage
CPU usage
Disk space
Docker container health
Workflow execution failures
Azure Monitor + basic Docker logs go a long way.
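For a basic check that can run from cron on the VM itself, something like the sketch below covers disk, memory, and whether n8n responds. It assumes a Linux host (it reads the `/proc/meminfo` format) and that n8n's `/healthz` endpoint is reachable at the given base URL. The thresholds are arbitrary starting points, not recommendations.

```python
import shutil
import urllib.request


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def mem_available_percent(meminfo_text: str) -> float:
    """Parse /proc/meminfo content and return MemAvailable as a share of MemTotal."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def n8n_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers connection errors, timeouts, and HTTP errors
        return False


def collect_alerts(disk_pct: float, mem_avail_pct: float, healthy: bool,
                   disk_limit: float = 85.0, mem_floor: float = 10.0) -> list[str]:
    """Turn raw readings into human-readable alerts (thresholds are assumptions)."""
    alerts = []
    if disk_pct > disk_limit:
        alerts.append(f"disk usage {disk_pct:.0f}% exceeds {disk_limit:.0f}%")
    if mem_avail_pct < mem_floor:
        alerts.append(f"available memory {mem_avail_pct:.0f}% below {mem_floor:.0f}%")
    if not healthy:
        alerts.append("n8n health check failed")
    return alerts


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        memory = mem_available_percent(handle.read())
    # http://localhost:5678 is n8n's default port; adjust if yours differs.
    for alert in collect_alerts(disk_used_percent("/"), memory, n8n_healthy("http://localhost:5678")):
        print(alert)
```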
8. Security and Stability Improvements
Additional production tips:
Restrict VM inbound ports (only 80/443/22)
Use SSH keys only
Rotate credentials regularly
Use environment variables, never hard-code secrets
Keep Docker images updated (Watchtower helps)
9. Real-World Lessons Learned
Most failures are architectural, not software bugs
Memory issues are normal when scaling
Backups are boring — until they save you
Vertical scaling is underrated
Queue mode is worth it when traffic grows
Azure is beginner-friendly if you respect its complexity
Final Thoughts
Running n8n in production teaches you far more than any managed automation platform ever will.
You stop thinking like:
“How do I make this workflow work?”
And start thinking like:
“How do I make this reliable at 2 a.m. when I’m asleep?”
That shift is the real value of self-hosting.



