🧰 Building a 1-Person MLOps Stack That Works

Gradient Descent Weekly — Issue #14

You’re training models. You’re deploying APIs. You’re monitoring logs.
You are… the ML team.

Startups, indie hackers, and intrapreneurs often build ML systems solo. MLOps isn’t just for 10-person DevOps teams at unicorns. It’s for anyone deploying machine learning, even a team of one.

So what does a lean, effective, maintainable MLOps stack look like for a solo ML engineer?
In this issue, we’ll break it down:

Tools that don’t require an army to maintain
Processes that keep you sane
Automation where it matters most

Let’s build an MLOps stack that just works.

🧠 First: What Do You Actually Need?

Let’s define the core lifecycle you need to support:

Data ingestion
Experiment tracking
Model training & versioning
Deployment (API or batch)
Monitoring & alerting
Retraining & rollback

Your stack should cover these without 50 services and a YAML nightmare.

🧱 The 1-Person MLOps Stack (Opinionated & Battle-Tested)

Function	Tool (Recommended)	Why?
Data versioning	DVC or just Git + S3	Lightweight, reproducible
Experiment tracking	MLflow (local or hosted)	Easy to integrate, minimal setup
Model training	Python scripts + Hydra	Flexible, config-driven
Packaging	Docker	Reusable, portable
Deployment (API)	FastAPI or BentoML	Minimal boilerplate
Deployment (infra)	AWS SageMaker, Render, or EC2	Pick based on your skill/budget
Monitoring	Prometheus + Grafana, or Evidently	Visual, open-source
CI/CD	GitHub Actions	Built into GitHub, free tier, flexible
Alerts	Slack + webhooks	Get notified when it breaks

💡Golden Rule:
Keep it simple. If a tool feels like overhead, replace it with a script or a cron job.

🔧 Setting It Up — End to End

🔹 1. Local Dev + Versioning

Organize with src/, data/, models/, and configs/
Use Hydra or argparse for param control
Version your code with Git
Store data and models in S3, GCS, or the Hugging Face Hub

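dvc init
dvc remote add -d storage s3://your-bucket/dvcstore   # placeholder bucket; needed before dvc push
dvc add data/train.csv
git add data/train.csv.dvc data/.gitignore
git commit -m "Track training data with DVC"
dvc push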
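
For param control, a minimal argparse sketch for train.py could look like this. The flag names and defaults (--data, --lr, --epochs, --seed) are assumptions for illustration only; with Hydra, the same values would live in YAML files under configs/.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line parameters for a hypothetical train.py."""
    parser = argparse.ArgumentParser(description="Train a model")
    parser.add_argument("--data", default="data/train.csv", help="training data path")
    parser.add_argument("--lr", type=float, default=0.001, help="learning rate")
    parser.add_argument("--epochs", type=int, default=10, help="number of epochs")
    parser.add_argument("--seed", type=int, default=42, help="random seed")
    return parser


def parse_params(argv=None) -> dict:
    """Parse CLI flags into a plain dict that can be logged as-is."""
    return vars(build_parser().parse_args(argv))
```

Calling parse_params(["--lr", "0.01"]) returns a dict with lr overridden and the other defaults intact, which makes it easy to hand the whole thing to your experiment tracker.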

🔹 2. Track Experiments

Use MLflow, either locally or against a self-hosted tracking server (set with mlflow.set_tracking_uri()), with S3 as the artifact store.

import mlflow
import mlflow.sklearn

# params, acc, and model come from your training code
with mlflow.start_run():
    mlflow.log_params(params)
    mlflow.log_metrics({"accuracy": acc})
    mlflow.sklearn.log_model(model, "model")

✅ Track: model name, accuracy, F1, timestamp, config hash
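
The config hash in that list can be as simple as a short digest of the config dict. Here is a minimal sketch, assuming the config is JSON-serializable; the function name config_hash and the 12-character length are arbitrary choices, not an MLflow feature.

```python
import hashlib
import json


def config_hash(config: dict, length: int = 12) -> str:
    """Return a short, stable hash of a JSON-serializable config dict."""
    # sort_keys makes the hash independent of key order
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

Two runs with the same hash used the same config, whatever order the keys were written in. Inside a run, the value could be attached with mlflow.set_tag("config_hash", ...).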

🔹 3. Train and Save Models

Write modular scripts like:

train.py
inference.py
evaluate.py

Save the model as:

a joblib file (scikit-learn)
SavedModel (TensorFlow)
a state_dict in a .pt file (PyTorch)

Log it with a version and metadata.
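
One lightweight way to do the "version + metadata" part is a JSON file written next to the artifact. This is a sketch under assumptions: the file naming scheme and metadata fields are made up for illustration, and pickle is only a stdlib stand-in for joblib.dump or torch.save.

```python
import hashlib
import json
import pickle
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_with_metadata(model, out_dir, version: str, metrics: dict) -> Path:
    """Serialize a model and write a JSON metadata file next to it."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    model_path = out / f"model-{version}.pkl"
    # pickle is a stand-in here; swap in joblib.dump or torch.save as needed
    model_path.write_bytes(pickle.dumps(model))
    metadata = {
        "version": version,
        "created_at": datetime.now(timezone.utc).isoformat(),
        # the checksum lets you verify later that the artifact was not swapped
        "sha256": hashlib.sha256(model_path.read_bytes()).hexdigest(),
        "metrics": metrics,
    }
    meta_path = out / f"model-{version}.json"
    meta_path.write_text(json.dumps(metadata, indent=2))
    return meta_path


# Demo with a dummy "model" and a dummy metric, written to a temp directory
with tempfile.TemporaryDirectory() as tmp:
    meta_file = save_with_metadata({"weights": [0.1, 0.2]}, tmp, "1.0.0", {"f1": 0.9})
    record = json.loads(meta_file.read_text())
```

Upload both files to S3 together and you can always tell which model is which without opening it.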

🔹 4. Package as API

Use FastAPI to turn the model into a REST endpoint.

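from fastapi import FastAPI
app = FastAPI()
# InputData (request schema with a to_array() helper) and model are defined elsewhere

@app.post("/predict")
def predict(payload: InputData):
    preds = model.predict(payload.to_array())
    return {"prediction": preds.tolist()}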
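
The endpoint relies on an InputData type with a to_array() method that isn't shown. Here is one possible shape for it, sketched with a stdlib dataclass to stay dependency-free; in a FastAPI app a pydantic BaseModel would be the more usual choice. The single features field is an assumption, so adapt it to your own schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputData:
    """Hypothetical request schema: one row of numeric features."""
    features: List[float]

    def __post_init__(self):
        if not self.features:
            raise ValueError("features must not be empty")
        # Coerce ints to floats and reject non-numeric values early
        self.features = [float(x) for x in self.features]

    def to_array(self) -> List[List[float]]:
        """Return a 1 x n_features nested list, the 2D shape predict() expects."""
        return [self.features]
```

Validating at the boundary like this means bad requests fail before they ever reach the model.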

Wrap with Docker:

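FROM python:3.10
WORKDIR /app
# Install dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]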

🔹 5. Deploy It

Options:

Render (zero infra)
SageMaker (scalable, pay-per-use)
EC2 with Docker (cheap, DIY)
Cloud Run / Lambda (serverless, scales to zero)

The closest thing to a one-liner deploy is Render: describe the service in render.yaml, connect the repo once, and each push redeploys it:

render.yaml + git push = deployed endpoint

🔹 6. Monitor & Alert

Use:

Evidently AI for drift and data quality checks
Prometheus + Grafana to monitor latency, throughput
Slack webhook for alerts on drop in accuracy or latency spike
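
The Slack alert can be a few lines of stdlib Python. This is a sketch, not a drop-in: the metric names (accuracy, p95_latency_ms) and the default thresholds are assumptions, and the webhook URL is whatever Slack gives you when you create an incoming webhook.

```python
import json
import urllib.request


def check_thresholds(metrics: dict, min_accuracy: float = 0.90,
                     max_p95_latency_ms: float = 500.0) -> list:
    """Return a list of human-readable alert messages (empty if all is well)."""
    alerts = []
    if metrics.get("accuracy", 1.0) < min_accuracy:
        alerts.append(
            f"Accuracy dropped to {metrics['accuracy']:.3f} (min {min_accuracy:.2f})"
        )
    if metrics.get("p95_latency_ms", 0.0) > max_p95_latency_ms:
        alerts.append(
            f"p95 latency is {metrics['p95_latency_ms']:.0f} ms "
            f"(max {max_p95_latency_ms:.0f} ms)"
        )
    return alerts


def build_slack_request(webhook_url: str, alerts: list) -> urllib.request.Request:
    """Build (but do not send) the POST request for a Slack incoming webhook."""
    payload = {"text": "\n".join(f":rotating_light: {a}" for a in alerts)}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alerts(webhook_url: str, metrics: dict) -> int:
    """Send one Slack message if any threshold is breached; return the alert count."""
    alerts = check_thresholds(metrics)
    if alerts:
        with urllib.request.urlopen(build_slack_request(webhook_url, alerts), timeout=10):
            pass
    return len(alerts)
```

Run send_alerts from a cron job or a scheduled GitHub Actions workflow and you have alerting without another service to babysit.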

Log all inputs and outputs for observability.
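
For the input/output log, an append-only JSON Lines file goes a long way. A minimal sketch, assuming features and predictions are JSON-serializable; the field names and the log_prediction helper are illustrative, not part of any library.

```python
import json
import time
import uuid
from pathlib import Path


def log_prediction(log_path, features, prediction, model_version: str) -> dict:
    """Append one prediction record to a JSON Lines file and return it."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # One JSON object per line keeps the file easy to tail, grep, and load later
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

The same file doubles as the "current" dataset for drift checks and as raw material for the next retrain.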

🔄 7. Retraining Workflow

Use GitHub Actions or a lightweight Airflow setup to trigger retraining:

Weekly, on a schedule
When new data arrives
When drift is detected

✅ Version everything. Retraining should always be reproducible.
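
Those three triggers fit in one small decision function that a scheduled job can call. In this sketch, drift is scored with the Population Stability Index (PSI) as a plain-Python stand-in for an Evidently report. The thresholds (7 days, 1000 new rows, PSI of 0.2) are assumptions to tune for your own data.

```python
import math
from datetime import datetime, timedelta


def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # small floor keeps log() defined for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(last_trained: datetime, now: datetime, new_rows: int,
                   drift_score: float, max_age_days: int = 7,
                   min_new_rows: int = 1000, drift_threshold: float = 0.2) -> list:
    """Return the list of triggers that fired (empty means no retrain)."""
    reasons = []
    if now - last_trained >= timedelta(days=max_age_days):
        reasons.append("schedule")
    if new_rows >= min_new_rows:
        reasons.append("new_data")
    if drift_score >= drift_threshold:
        reasons.append("drift")
    return reasons
```

Returning the reasons rather than a bare boolean means the retrain run can log why it was started.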

💡 Tips to Stay Lean and Sane

🧪 Automate tests for data sanity and model behavior
🔁 Reuse components (data loaders, models, configs)
🎯 Track only what matters (no vanity metrics)
🛑 Don’t over-engineer — YAML fatigue is real
🧘 Aim for reproducibility, not perfection
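
For the first tip, data sanity checks don't need a framework to start with. Here is a minimal sketch that validates a batch of rows as plain dicts; the check_rows helper and the column names in the usage note are hypothetical.

```python
def check_rows(rows: list, required: set, ranges: dict) -> list:
    """Return a list of problems found in a batch of row dicts."""
    if not rows:
        return ["batch is empty"]
    problems = []
    for i, row in enumerate(rows):
        # Required columns must be present in every row
        missing = required - set(row)
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
        # Numeric columns must fall inside their allowed [lo, hi] range
        for col, (lo, hi) in ranges.items():
            value = row.get(col)
            if value is not None and not (lo <= value <= hi):
                problems.append(f"row {i}: {col}={value} outside [{lo}, {hi}]")
    return problems
```

For example, check_rows(batch, {"amount", "country"}, {"amount": (0, 1000000)}) flags rows with a missing country or a negative amount. Fail the CI job or the retrain run when the returned list is non-empty.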

🚀 Real-World Solo Stack in Action

Use case: Fraud detection ML API for a fintech MVP
Stack:

FastAPI + scikit-learn
Docker + Render
MLflow for tracking
S3 for model storage
Evidently for data drift
GitHub Actions for retraining

One person. One model. Live in prod.
Maintained with less than 2 hours/week.

🧠 Final Thoughts: You Don’t Need a Team—Just a Plan

MLOps isn’t about fancy stacks.
It’s about repeatability, reliability, and speed—even solo.

You don’t need Kubernetes, Kafka, or an enterprise platform to be production-grade.
You need:

A way to test
A way to deploy
A way to monitor
A way to fix things fast

That’s it.

Keep it lean. Keep it yours. And keep shipping.

🔮 Up Next on Gradient Descent Weekly:

The MLOps Tool Fatigue Problem: Too Many Tools, Too Little ROI
