By thecrazygm on Skatehive
Hey everyone,

In my previous post, I talked about why Hive-Engine nodes sometimes "limp along" instead of failing over cleanly when a primary RPC goes bad. I pitched two paths: a minimal operational fix (Option A) and a broader architecture redesign (Option B). The consensus (and my own gut feeling for a first step) was clear: make it work first.

I’ve spent the last week running that "Option A" fix on a live node, and I’m happy to report that the PR to the QA branch is now open.

The PR is here: https://github.com/hive-engine/hivesmartcontracts/pull/134

What’s in the PR?

I didn't just dump a theory into a pull request. I wanted a result that an operator could actually rely on. The final implementation includes:

Request-level failover: Block reads now treat your streamNodes list as a proper failover chain. If one fetch fails, it tries the next node immediately instead of hanging.

Scheduler-level demotion: If a node fails repeatedly, the scheduler "cools it down" and gives other nodes a s