🧰 Building a 1-Person MLOps Stack That Works
Because your team is just you, and that model still needs to ship

Gradient Descent Weekly — Issue #14
You’re training models. You’re deploying APIs. You’re monitoring logs.
You are… the ML team.
In today’s fast-moving landscape, startups, indie hackers, and intrapreneurs are often building ML systems solo. But MLOps isn’t just for 10-person DevOps teams at unicorns. It’s for anyone deploying machine learning—even if you’re a team of one.
So what does a lean, effective, maintainable MLOps stack look like for a solo ML engineer?
In this issue, we’ll break it down:
Tools that don’t require an army to maintain
Processes that keep you sane
Automation where it matters most
Let’s build an MLOps stack that just works.
🧠 First: What Do You Actually Need?
Let’s define the core lifecycle you need to support:
Data ingestion
Experiment tracking
Model training & versioning
Deployment (API or batch)
Monitoring & alerting
Retraining & rollback
Your stack should cover these without 50 services and a YAML nightmare.
🧱 The 1-Person MLOps Stack (Opinionated & Battle-Tested)
| Function | Tool (Recommended) | Why? |
| --- | --- | --- |
| Data versioning | DVC or just Git + S3 | Lightweight, reproducible |
| Experiment tracking | MLflow (local or hosted) | Easy to integrate, minimal setup |
| Model training | Python scripts + Hydra | Flexible, config-driven |
| Packaging | Docker | Reusable, portable |
| Deployment (API) | FastAPI or BentoML | Minimal boilerplate |
| Deployment (infra) | AWS SageMaker, Render, or EC2 | Pick based on your skill/budget |
| Monitoring | Prometheus + Grafana, or Evidently | Visual, open-source |
| CI/CD | GitHub Actions | Built-in, free, flexible |
| Alerts | Slack + webhooks | Get notified when it breaks |
💡Golden Rule:
Keep it simple. If a tool feels like overhead, replace it with a script or a cron job.
🔧 Setting It Up — End to End
🔹 1. Local Dev + Versioning
Organize with src/, data/, models/, and configs/
Use Hydra or argparse for param control
Version your code with Git
Store data and models in S3, GCS, or the Hugging Face Hub
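dvc init
dvc remote add -d storage s3://my-bucket/dvc   # hypothetical bucket; dvc push needs a remote
dvc add data/train.csv
git add data/train.csv.dvc data/.gitignore     # commit the pointer file, not the data itself
git commit -m "Track training data with DVC"
dvc push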
🔹 2. Track Experiments
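Use MLflow, either self-hosted or local. Point it at your tracking server with mlflow.set_tracking_uri(), and configure an S3 artifact location if you want model files stored there.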
import mlflow
import mlflow.sklearn

with mlflow.start_run():
    mlflow.log_params(params)                  # dict of hyperparameters
    mlflow.log_metrics({"accuracy": acc})
    mlflow.sklearn.log_model(model, "model")   # stored with the run's artifacts
✅ Track: model name, accuracy, F1, timestamp, config hash
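Of the fields above, the config hash is the one people tend to skip. Here is a minimal sketch of one way to compute it, assuming your config is a JSON-serializable dict. The function names `config_hash` and `run_record` are made up for this example, and the metric values in the demo are placeholders, not results.

```python
import hashlib
import json
from datetime import datetime, timezone


def config_hash(config: dict) -> str:
    """Return a short, stable hash of a config dict (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def run_record(model_name: str, config: dict, metrics: dict) -> dict:
    """Bundle the fields worth tracking for one training run."""
    return {
        "model_name": model_name,
        "config_hash": config_hash(config),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **metrics,
    }


if __name__ == "__main__":
    # Placeholder config and metric values, for illustration only.
    cfg = {"model": "random_forest", "n_estimators": 200, "max_depth": 8}
    record = run_record("fraud-clf", cfg, {"accuracy": 0.0, "f1": 0.0})
    print(json.dumps(record, indent=2))
```

Two runs with the same hash used the same settings, which makes it easy to spot accidental config changes. If you are on MLflow, the hash could be attached to a run with mlflow.set_tag("config_hash", ...).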
🔹 3. Train and Save Models
Write modular scripts like:
train.py
inference.py
evaluate.py
Save model as:
joblib (sklearn)
SavedModel (TensorFlow)
state_dict in a .pt file (PyTorch)
Log it with version + metadata.
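If you are not using a model registry, "version + metadata" can be as simple as a numbered folder with a JSON file next to the model. A rough sketch under that assumption follows. The `v<N>` layout, the `save_model` and `next_version` names, and the use of pickle as a stand-in serializer are all choices made for this example.

```python
import json
import pickle
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def next_version(model_dir: Path) -> int:
    """Return 1 + the highest existing v<N> subdirectory number (1 if none exist)."""
    versions = [
        int(p.name[1:])
        for p in model_dir.glob("v*")
        if p.is_dir() and p.name[1:].isdigit()
    ]
    return max(versions, default=0) + 1


def save_model(model, root: Path, name: str, metadata: dict) -> Path:
    """Write the model and metadata.json into <root>/<name>/v<N>/ and return that path."""
    model_dir = root / name
    model_dir.mkdir(parents=True, exist_ok=True)
    version = next_version(model_dir)
    target = model_dir / f"v{version}"
    target.mkdir()
    # pickle is a stdlib stand-in; swap in joblib.dump, torch.save, etc. as needed.
    with open(target / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    meta = {
        "name": name,
        "version": version,
        "saved_at": datetime.now(timezone.utc).isoformat(),
        **metadata,
    }
    (target / "metadata.json").write_text(json.dumps(meta, indent=2))
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fake_model = {"weights": [0.1, 0.2]}  # placeholder object, not a real model
        print(save_model(fake_model, Path(tmp), "fraud-clf", {"note": "first"}))
        print(save_model(fake_model, Path(tmp), "fraud-clf", {"note": "second"}))
```

Because every version keeps its own folder, rollback is just pointing the API at an older `v<N>` directory.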
🔹 4. Package as API
Use FastAPI to turn the model into a REST endpoint.
from fastapi import FastAPI

app = FastAPI()

@app.post("/predict")
def predict(payload: InputData):
    preds = model.predict(payload.to_array())  # model is loaded once at startup
    return {"prediction": preds.tolist()}
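The endpoint above calls a `to_array()` method on the request model without showing it. Here is a minimal sketch of what that model might look like, written as a plain dataclass so it runs without extra dependencies. The three feature names are hypothetical. With FastAPI this would more typically be a Pydantic BaseModel carrying the same fields and method.

```python
from dataclasses import dataclass, fields


@dataclass
class InputData:
    # Hypothetical features for illustration; replace with your model's inputs.
    amount: float
    account_age_days: float
    num_prior_transactions: float

    def to_array(self) -> list[list[float]]:
        """Return one row as a 2D list, with columns in field-declaration order."""
        return [[float(getattr(self, f.name)) for f in fields(self)]]


if __name__ == "__main__":
    sample = InputData(amount=12.5, account_age_days=30, num_prior_transactions=4)
    print(sample.to_array())
```

Keeping the column order inside the request model means the API and the training code cannot silently disagree about which feature goes where.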
Wrap with Docker:
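FROM python:3.10
# Set the working directory so requirements.txt resolves during pip install
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]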
🔹 5. Deploy It
Options:
Render (zero infra)
SageMaker (scalable, pay-per-use)
EC2 with Docker (cheap, DIY)
Cloud Run / Lambda (event-based)
One-liner deploy with Render or SageMaker CLI:
render.yaml + git push = deployed endpoint
🔹 6. Monitor & Alert
Use:
Evidently AI for drift and data quality checks
Prometheus + Grafana to monitor latency and throughput
A Slack webhook for alerts on an accuracy drop or a latency spike
Log all inputs and outputs for observability.
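The alerting piece needs very little code. Below is a small sketch that checks two thresholds and posts to a Slack incoming webhook, which accepts a JSON body with a `text` field. The metric keys (`accuracy`, `p95_latency_ms`), the function names, and any threshold values you pass in are assumptions for this example.

```python
import json
import urllib.request


def build_alerts(metrics: dict, min_accuracy: float, max_p95_latency_ms: float) -> list[str]:
    """Return human-readable alert messages for every threshold that is breached."""
    alerts = []
    if metrics["accuracy"] < min_accuracy:
        alerts.append(
            f"Accuracy dropped to {metrics['accuracy']:.3f} (min {min_accuracy:.3f})"
        )
    if metrics["p95_latency_ms"] > max_p95_latency_ms:
        alerts.append(
            f"p95 latency at {metrics['p95_latency_ms']:.0f} ms (max {max_p95_latency_ms:.0f} ms)"
        )
    return alerts


def send_slack_alert(webhook_url: str, message: str, timeout: float = 5.0) -> int:
    """POST a message to a Slack incoming webhook and return the HTTP status code."""
    body = json.dumps({"text": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage (webhook URL comes from your own Slack app settings):
# for msg in build_alerts(latest_metrics, min_accuracy=0.90, max_p95_latency_ms=300):
#     send_slack_alert(SLACK_WEBHOOK_URL, msg)
```

Run it from a cron job or a scheduled GitHub Actions workflow and you have alerting without a separate service to maintain.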
🔄 7. Retraining Workflow
Use GitHub Actions or Airflow (lightweight setup) to trigger retraining:
Weekly, on a schedule
On new data arrival
On drift detection
✅ Version everything. Retraining should always be reproducible.
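The three triggers can live in one small function that a scheduled job calls. Here is a sketch, using the Population Stability Index (PSI) as a simple stand-in drift score for a single numeric feature. This is not necessarily how Evidently computes drift. The default thresholds (7 days, 1000 rows, PSI above 0.2) are assumptions to tune, with 0.2 being a common rule of thumb.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all reference values are equal

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps log() defined for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(days_since_last: int, new_rows: int, drift_score: float,
                   max_age_days: int = 7, min_new_rows: int = 1000,
                   drift_threshold: float = 0.2) -> list[str]:
    """Return the reasons to retrain; an empty list means no retrain is needed."""
    reasons = []
    if days_since_last >= max_age_days:
        reasons.append("schedule")
    if new_rows >= min_new_rows:
        reasons.append("new_data")
    if drift_score > drift_threshold:
        reasons.append("drift")
    return reasons
```

Returning the reasons rather than a bare boolean gives you something useful to write into the retraining run's metadata.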
💡 Tips to Stay Lean and Sane
🧪 Automate tests for data sanity and model behavior
🔁 Reuse components (data loaders, models, configs)
🎯 Track only what matters (no vanity metrics)
🛑 Don’t over-engineer — YAML fatigue is real
🧘 Aim for reproducibility, not perfection
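For the first tip, a data sanity test does not need a framework. Here is a minimal sketch that checks for missing values and out-of-range numbers in a list of row dicts. The `check_rows` name and the column names in the demo are made up for this example.

```python
import math


def check_rows(rows: list[dict], required: list[str], ranges: dict) -> list[str]:
    """Return a list of problems found in the rows; an empty list means the data passed."""
    if not rows:
        return ["dataset is empty"]
    problems = []
    for i, row in enumerate(rows):
        # Required columns must be present and not NaN.
        for col in required:
            value = row.get(col)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append(f"row {i}: missing {col}")
        # Numeric columns must fall inside their allowed [lo, hi] range.
        for col, (lo, hi) in ranges.items():
            value = row.get(col)
            if isinstance(value, (int, float)) and not math.isnan(value) and not lo <= value <= hi:
                problems.append(f"row {i}: {col}={value} outside [{lo}, {hi}]")
    return problems


if __name__ == "__main__":
    sample = [{"amount": 10.0, "label": 0}, {"amount": -5.0, "label": 1}]
    for problem in check_rows(sample, ["amount", "label"], {"amount": (0, 1000)}):
        print(problem)
```

Calling this at the top of train.py and failing fast on a non-empty result catches bad data before it costs you a training run. Model behavior tests can follow the same pattern: a function that returns a list of violations.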
🚀 Real-World Solo Stack in Action
Use case: Fraud detection ML API for a fintech MVP
Stack:
FastAPI + scikit-learn
Docker + Render
MLflow for tracking
S3 for model storage
Evidently for data drift
GitHub Actions for retraining
One person. One model. Live in prod.
Maintained with less than 2 hours/week.
🧠 Final Thoughts: You Don’t Need a Team—Just a Plan
MLOps isn’t about fancy stacks.
It’s about repeatability, reliability, and speed—even solo.
You don’t need Kubernetes, Kafka, or an enterprise platform to be production-grade.
You need:
A way to test
A way to deploy
A way to monitor
A way to fix things fast
That’s it.
Keep it lean. Keep it yours. And keep shipping.
🔮 Up Next on Gradient Descent Weekly:
- The MLOps Tool Fatigue Problem: Too Many Tools, Too Little ROI






