The challenge
A growing logistics network with unpredictable demand. The existing forecasting pipeline ran overnight in batch mode — too late, too coarse, blind to every seasonal or regional shift. Dispatchers were reacting to yesterday while today was already being decided.
The approach
We designed a real-time inference platform: a streaming pipeline on Kafka, GPU inference on Kubernetes, and a multi-tenant Python API built with FastAPI. Models were trained in PyTorch and adapted live: no more weekly retraining cycles, but continuous learning from every new shipment.
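The idea of continuous learning from a stream can be sketched in a few lines. This is not the production PyTorch model; it is a minimal pure-Python stand-in, with a hypothetical `OnlineForecaster` that takes one gradient step per shipment event, the way the real models are updated as events arrive from Kafka:

```python
from dataclasses import dataclass, field


@dataclass
class OnlineForecaster:
    """Illustrative linear model updated on every event (plain SGD).

    A stand-in for the GPU-backed PyTorch models in the real pipeline;
    the class name and features here are hypothetical.
    """
    lr: float = 0.01
    weights: dict = field(default_factory=dict)
    bias: float = 0.0

    def predict(self, features: dict) -> float:
        return self.bias + sum(
            self.weights.get(k, 0.0) * v for k, v in features.items()
        )

    def update(self, features: dict, actual: float) -> float:
        """One gradient step on squared error for a single shipment event."""
        error = self.predict(features) - actual
        for k, v in features.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.bias -= self.lr * error
        return error


# A list stands in for the Kafka consumer loop here.
model = OnlineForecaster()
for _ in range(2000):
    for features, demand in [({"weekday": 1.0}, 3.0), ({"weekday": 0.0}, 1.0)]:
        model.update(features, demand)
```

The point of the sketch: there is no retraining batch anywhere, only an `update()` per event, so the model tracks demand shifts as fast as events arrive.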
The focus was not on the biggest model, but on operational reliability: versioning, rollback safety, tenant isolation, and observability down to the individual request.
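Versioning with per-tenant rollback can be sketched as a small registry. This is an illustrative toy, not the actual platform code; a "model" here is any callable, and all names are assumptions:

```python
class ModelRegistry:
    """Minimal sketch of per-tenant model versioning with rollback.

    Each tenant has its own version history and its own active pointer,
    so a rollback for one tenant never touches another (tenant isolation).
    """

    def __init__(self):
        self._versions = {}  # tenant -> list of (version, model)
        self._active = {}    # tenant -> index into that tenant's list

    def publish(self, tenant: str, version: str, model) -> None:
        """Append a new version and make it the active one for this tenant."""
        self._versions.setdefault(tenant, []).append((version, model))
        self._active[tenant] = len(self._versions[tenant]) - 1

    def rollback(self, tenant: str) -> str:
        """Reactivate the previous version for one tenant only."""
        if self._active.get(tenant, 0) == 0:
            raise RuntimeError(f"no earlier version for tenant {tenant!r}")
        self._active[tenant] -= 1
        return self._versions[tenant][self._active[tenant]][0]

    def predict(self, tenant: str, features):
        _version, model = self._versions[tenant][self._active[tenant]]
        return model(features)


registry = ModelRegistry()
registry.publish("acme", "v1", lambda f: 1.0)
registry.publish("acme", "v2", lambda f: 2.0)
registry.publish("globex", "v1", lambda f: 10.0)
registry.rollback("acme")  # acme serves v1 again; globex is unaffected
```

The design choice this illustrates: rollback is a pointer move, not a redeploy, so a bad model version can be withdrawn for a single tenant in milliseconds.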
The outcome
18% better forecasting accuracy compared to the legacy system. Sub-200ms latency on 99% of predictions. Dispatch now decides in the moment — not the next morning.
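"Sub-200ms on 99% of predictions" is a p99 claim: 99 out of 100 requests finish under the threshold. A small sketch of how such a percentile is computed, using simulated latencies (the real figures come from request-level observability, not from this simulation):

```python
import random
import statistics

random.seed(7)

# Hypothetical per-request latencies in milliseconds.
latencies_ms = [random.gauss(mu=120, sigma=25) for _ in range(10_000)]

# 99th percentile: the value 99% of requests stay under.
p99 = statistics.quantiles(latencies_ms, n=100)[98]
```

Percentiles, unlike averages, expose the tail that dispatchers actually feel, which is why the latency target is stated as a p99.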
Stack
- Python
- FastAPI
- PyTorch
- Kafka
- Kubernetes
- PostgreSQL