🧰 Building a 1-Person MLOps Stack That Works

Because your team is just you, and that model still needs to ship


Forward-thinking IT Operations Leader with cross-domain expertise spanning incident & change management, cloud infrastructure (Azure, AWS, GCP), and automation engineering. Proven track record in building and leading high-performance operations teams that drive reliability, innovation, and uptime across mission-critical enterprise systems. Adept at aligning IT services with business goals through strategic leadership, cloud-native transformation, and process modernization. Currently spearheading application operations and monitoring for digital modernization initiatives. Deeply passionate about coding in Rust, Go, and Python, and solving real-world problems through machine learning, model inference, and Generative AI. Actively exploring the intersection of AI engineering and infrastructure automation to future-proof operational ecosystems and unlock new business value.

Gradient Descent Weekly — Issue #14

You’re training models. You’re deploying APIs. You’re monitoring logs.
You are… the ML team.

In today’s fast-moving landscape, startups, indie hackers, and intrapreneurs are often building ML systems solo. But MLOps isn’t just for 10-person DevOps teams at unicorns. It’s for anyone deploying machine learning—even if you’re a team of one.

So what does a lean, effective, maintainable MLOps stack look like for a solo ML engineer?
In this issue, we’ll break it down:

  • Tools that don’t require an army to maintain

  • Processes that keep you sane

  • Automation where it matters most

Let’s build an MLOps stack that just works.

🧠 First: What Do You Actually Need?

Let’s define the core lifecycle you need to support:

  1. Data ingestion

  2. Experiment tracking

  3. Model training & versioning

  4. Deployment (API or batch)

  5. Monitoring & alerting

  6. Retraining & rollback

Your stack should cover these without 50 services and a YAML nightmare.

🧱 The 1-Person MLOps Stack (Opinionated & Battle-Tested)

| Function | Tool (Recommended) | Why? |
| --- | --- | --- |
| Data versioning | DVC or just Git + S3 | Lightweight, reproducible |
| Experiment tracking | MLflow (local or hosted) | Easy to integrate, minimal setup |
| Model training | Python scripts + Hydra | Flexible, config-driven |
| Packaging | Docker | Reusable, portable |
| Deployment (API) | FastAPI or BentoML | Minimal boilerplate |
| Deployment (infra) | AWS SageMaker, Render, or EC2 | Pick based on your skill/budget |
| Monitoring | Prometheus + Grafana, or Evidently | Visual, open-source |
| CI/CD | GitHub Actions | Built-in, free, flexible |
| Alerts | Slack + webhooks | Get notified when it breaks |

💡Golden Rule:
Keep it simple. If a tool feels like overhead, replace it with a script or a cron job.

🔧 Setting It Up — End to End

🔹 1. Local Dev + Versioning

  • Organize with src/, data/, models/, and configs/

  • Use Hydra or argparse for param control

  • Version your code with Git

  • Store data and models in S3, GCS, or the Hugging Face Hub

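dvc init                                        # run inside an existing Git repo
dvc remote add -d storage s3://your-bucket/dvc  # placeholder bucket; dvc push needs a remote
dvc add data/train.csv
git add data/train.csv.dvc data/.gitignore      # commit the pointer file, not the data
git commit -m "Track training data with DVC"
dvc push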
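For the param-control bullet above, here is a minimal sketch using argparse from the standard library. The flag names and defaults (--data-path, --learning-rate, --epochs, --seed) are illustrative assumptions, not something this stack prescribes.

```python
import argparse


def parse_args(argv=None):
    """Parse training hyperparameters from the command line.

    All flag names and defaults here are hypothetical examples.
    """
    parser = argparse.ArgumentParser(description="Train a model")
    parser.add_argument("--data-path", default="data/train.csv")
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--seed", type=int, default=42)
    # argv=None makes argparse read sys.argv; pass a list to call it from tests
    return parser.parse_args(argv)
```

Calling parse_args() at the top of train.py and passing vars(args) to your experiment tracker keeps the logged parameters identical to the ones actually used.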

🔹 2. Track Experiments

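Use MLflow, either with a local store or a self-hosted tracking server (point the client at it with mlflow.set_tracking_uri()). Artifacts such as models can live in S3 if the server's artifact store is configured that way.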

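import mlflow
import mlflow.sklearn

# params (dict), acc (float), and model (fitted sklearn estimator) come from your training code
with mlflow.start_run():
    mlflow.log_params(params)
    mlflow.log_metrics({"accuracy": acc})
    mlflow.sklearn.log_model(model, "model")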

✅ Track: model name, accuracy, F1, timestamp, config hash
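The "config hash" in the list above can be computed in a few lines. This is one possible sketch, assuming the config is a JSON-serializable dict; the function name config_hash and the example parameters are hypothetical.

```python
import hashlib
import json


def config_hash(config: dict, length: int = 12) -> str:
    """Return a short, stable hash of a JSON-serializable config dict."""
    # sort_keys makes the hash independent of key insertion order
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:length]


# Hypothetical config; in practice this would come from Hydra or argparse
params = {"model": "random_forest", "n_estimators": 200, "max_depth": 8}
run_tag = config_hash(params)
```

The result can be attached to a run with mlflow.set_tag("config_hash", run_tag), which makes it easy to find every run that used the same settings.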

🔹 3. Train and Save Models

Write modular scripts like:

train.py
inference.py
evaluate.py

Save model as:

  • joblib (sklearn)

  • SavedModel (TensorFlow)

  • state_dict.pt (PyTorch)

Log each saved model with a version and metadata (metrics, timestamp, config hash).
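One lightweight way to attach a version and metadata to a saved model is a JSON sidecar file next to the artifact. The sketch below assumes the model file has already been written (for example by joblib); the function name write_model_metadata, the field names, and the file naming scheme are all hypothetical choices.

```python
import hashlib
import json
import time
from pathlib import Path


def write_model_metadata(model_path, version, metrics, out_dir="models"):
    """Write a JSON sidecar describing an already-saved model artifact."""
    model_path = Path(model_path)
    # Hash the artifact so a later load can verify it is the same file
    sha256 = hashlib.sha256(model_path.read_bytes()).hexdigest()
    metadata = {
        "version": version,
        "artifact": model_path.name,
        "sha256": sha256,
        "metrics": metrics,
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    # e.g. models/model-v1.json for model.joblib at version "v1"
    out_path = Path(out_dir) / f"{model_path.stem}-{version}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(metadata, indent=2))
    return out_path
```

Keeping the sidecar in Git while the artifact itself lives in S3 gives you a readable history of what was trained and when.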

🔹 4. Package as API

Use FastAPI to turn the model into a REST endpoint.

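from fastapi import FastAPI

app = FastAPI()

# InputData (a pydantic model with a to_array() method) and model are assumed to be defined in api.py
@app.post("/predict")
def predict(payload: InputData):
    preds = model.predict(payload.to_array())
    return {"prediction": preds.tolist()}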

Wrap with Docker:

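FROM python:3.10
WORKDIR /app
# Copy requirements first so the dependency layer is cached (uvicorn must be listed there)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]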

🔹 5. Deploy It

Options:

  • Render (zero infra)

  • SageMaker (scalable, pay-per-use)

  • EC2 with Docker (cheap, DIY)

  • Cloud Run / Lambda (event-based)

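The simplest path is Render: commit a render.yaml to the repo and every push redeploys the service. SageMaker needs more setup (a model artifact, an endpoint configuration, and an endpoint).

render.yaml + git push = deployed endpoint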

🔹 6. Monitor & Alert

Use:

  • Evidently AI for drift and data quality checks

  • Prometheus + Grafana to monitor latency and throughput

  • Slack webhook for alerts on accuracy drops or latency spikes
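The Slack alert in the last bullet can be sketched with the standard library alone. Slack incoming webhooks accept a JSON body with a "text" field; the function names, the thresholds, and the message wording below are hypothetical.

```python
import json
import urllib.request


def build_alert(metric, value, threshold, higher_is_better=True):
    """Return a Slack message payload if the metric breaches its threshold, else None."""
    breached = value < threshold if higher_is_better else value > threshold
    if not breached:
        return None
    direction = "below" if higher_is_better else "above"
    text = f":rotating_light: {metric}={value:.3f} is {direction} threshold {threshold:.3f}"
    return {"text": text}


def send_to_slack(webhook_url, payload, timeout=5):
    """POST the payload as JSON to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A scheduled job could call build_alert("accuracy", current_accuracy, 0.85) and, if the result is not None, pass it to send_to_slack with the webhook URL read from an environment variable rather than hard-coded.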

Log all inputs and outputs for observability.
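A simple way to log inputs and outputs is one JSON object per line in an append-only file, which later feeds drift checks or debugging. This is a sketch under assumptions: the function name log_prediction, the record fields, and the log path are hypothetical.

```python
import json
import time
import uuid
from pathlib import Path


def log_prediction(log_path, features, prediction, model_version):
    """Append one prediction record as a JSON line and return its request id."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier records; one JSON object per line
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record["request_id"]
```

Calling this inside the /predict handler, with features and prediction already converted to plain Python types, gives each response a request id you can trace back to the exact input and model version.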

🔄 7. Retraining Workflow

Use GitHub Actions or Airflow (lightweight setup) to trigger retraining:

  • Weekly, on a schedule

  • On new data arrival

  • On drift detection

✅ Version everything. Retraining should always be reproducible.
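The three retraining triggers can be folded into one small decision function that a scheduled job calls before kicking off training. The sketch below is illustrative only: the function name should_retrain and the default thresholds (7 days, 1000 rows, a drift score of 0.2) are assumptions, and the drift score is whatever your monitoring tool reports.

```python
from datetime import datetime, timedelta


def should_retrain(last_trained, now, new_rows, drift_score,
                   max_age=timedelta(days=7), min_new_rows=1000,
                   drift_threshold=0.2):
    """Return the list of triggers that fired; an empty list means skip retraining."""
    reasons = []
    if now - last_trained >= max_age:
        reasons.append("schedule")
    if new_rows >= min_new_rows:
        reasons.append("new_data")
    if drift_score >= drift_threshold:
        reasons.append("drift")
    return reasons
```

Returning the reasons rather than a bare boolean means the retraining run can log why it started, which helps when you later audit model versions.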

💡 Tips to Stay Lean and Sane

  • 🧪 Automate tests for data sanity and model behavior

  • 🔁 Reuse components (data loaders, models, configs)

  • 🎯 Track only what matters (no vanity metrics)

  • 🛑 Don’t over-engineer — YAML fatigue is real

  • 🧘 Aim for reproducibility, not perfection
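As an example of the first tip, a data sanity check can be a plain function that runs in CI before training. This sketch uses only the standard library; the column names, the allowed range, and the function name check_rows are hypothetical.

```python
import csv
import io


def check_rows(rows, required_columns, numeric_ranges):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    for index, row in enumerate(rows):
        for column in required_columns:
            if row.get(column) in (None, ""):
                problems.append(f"row {index}: missing {column}")
        for column, (low, high) in numeric_ranges.items():
            raw = row.get(column)
            if raw in (None, ""):
                continue  # already reported above if the column is required
            try:
                value = float(raw)
            except ValueError:
                problems.append(f"row {index}: {column} is not numeric")
                continue
            if not low <= value <= high:
                problems.append(f"row {index}: {column}={value} outside [{low}, {high}]")
    return problems


# Hypothetical sample; in practice read data/train.csv with csv.DictReader
sample = io.StringIO("amount,label\n12.5,0\n-3,1\n,0\n")
issues = check_rows(csv.DictReader(sample), ["amount", "label"], {"amount": (0, 10000)})
```

Failing the CI job whenever the returned list is non-empty stops a bad data drop from silently reaching a retraining run.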

🚀 Real-World Solo Stack in Action

Use case: Fraud detection ML API for a fintech MVP
Stack:

  • FastAPI + scikit-learn

  • Docker + Render

  • MLflow for tracking

  • S3 for model storage

  • Evidently for data drift

  • GitHub Actions for retraining

One person. One model. Live in prod.
Maintained in under 2 hours a week.

🧠 Final Thoughts: You Don’t Need a Team—Just a Plan

MLOps isn’t about fancy stacks.
It’s about repeatability, reliability, and speed—even solo.

You don’t need Kubernetes, Kafka, or an enterprise platform to be production-grade.
You need:

  • A way to test

  • A way to deploy

  • A way to monitor

  • A way to fix things fast

That’s it.

Keep it lean. Keep it yours. And keep shipping.

🔮 Up Next on Gradient Descent Weekly:

  • The MLOps Tool Fatigue Problem: Too Many Tools, Too Little ROI