Hands-on labs and code to help you learn, measure, and build using architectural best practices.
Updated Jan 14, 2026 - Python
Chaos Engineering Toolkit & Orchestration for Developers
Reliability engineering toolkit for Python - https://reliability.readthedocs.io/en/latest/
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, Code Ready Containers, Podman, Buildah, and Kubernetes.
The k6 documentation website.
The Chaos Toolkit core library
A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can also easily fit distributions with 'offsets', for example the three-parameter Weibull distribution.
ARF is an agentic reliability intelligence platform that separates decision intelligence (OSS) from governed execution (Enterprise), enabling autonomous operations with deterministic safety guarantees.
Sample applications of supported integrations by Last9 Products
The open source toolbox for resilient operations
Reliability Report - A collaborative curated content site about Reliability Engineering
PARANSYS (Python pArametric Reliability Analysis on ANSYS) is a module that can connect the ANSYS software to Python scripts using APDL scripts. It has one connection class and two other classes for reliability analysis, using explicit or implicit limit states.
Deterministic runtime for agent evaluation
A resilient, fault‑tolerant telemetry analytics pipeline designed to validate, benchmark, and stress‑test high‑frequency sensor data streams under real‑world failure conditions. Includes chaos testing, DLQ repair, GPU‑accelerated ingestion, and end‑to‑end reliability validation for motorsport‑grade telemetry environments.
This work proposes a joint probabilistic model of remaining life and inspection observations, which is then used to perform prognostics on currently installed assets. At every new observation, the forward-looking belief about the asset's remaining life is updated via Bayes' rule, yielding dynamic estimates of its failure probability. Consequent…
The Open-PSA Model Exchange Format
A free safety engineering tool for autonomous driving systems. It covers qualitative and quantitative analyses for UL 4600, ISO/TR 5083, ISO 26262, ISO 21448, ISO 21434, and ISO/PAS 8800, plus an additional method for assessing risk on prototypes, called PAL, with levels from PAL 1 to PAL 5.
Crash-safe distributed job execution with fencing tokens, lease recovery and deterministic failure validation.
An autonomous framework that detects, diagnoses, and remediates failures in data pipelines using telemetry, metadata lineage, and policy-driven recovery workflows.
A read-only SRE sentinel that detects system instability via variance, allocation rate, and amplification signals.