mirror of
https://github.com/ipfs/ipfs-blog.git
synced 2026-05-18 07:15:46 +02:00
fixing relative paths
@@ -50,7 +50,7 @@ Through a combination of crawling the network and attempting connections to all
1. As a first step, we wanted to find out which buckets of the nodes’ routing tables the affected nodes occupied. We found that they occupied the higher buckets, which suggested that PUTs would most likely get slower while GETs should not be affected much. This is because the DHT lookup of a GET operation terminates when it hits *one* of the 20 closest peers to the target key, whereas a PUT operation terminates only once it has found all 20 closest peers. Since a significant portion of the network was unresponsive, a PUT operation would almost certainly hit at least one unresponsive node, but a GET operation had a good chance of finding at least one responsive node among the 20 closest.
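The asymmetry between PUT and GET can be illustrated with a back-of-the-envelope calculation. This is a simplified sketch, not the methodology behind the measurements: it assumes that each of the k = 20 closest peers is independently unresponsive with probability `p`, a hypothetical parameter (the real fraction and its distribution across buckets differ).

```python
def put_hits_unresponsive(p: float, k: int = 20) -> float:
    """Probability that a PUT, which must reach all k closest peers,
    runs into at least one unresponsive peer (independence assumed)."""
    return 1.0 - (1.0 - p) ** k


def get_finds_no_responsive(p: float, k: int = 20) -> float:
    """Probability that a GET, which can stop at any one of the k closest
    peers, finds none of them responsive (independence assumed)."""
    return p ** k


if __name__ == "__main__":
    p = 0.5  # hypothetical fraction of unresponsive peers, for illustration only
    print(f"PUT hits >=1 unresponsive peer: {put_hits_unresponsive(p):.6f}")
    print(f"GET finds no responsive peer:   {get_finds_no_responsive(p):.2e}")
```

With `p = 0.5`, the PUT is all but guaranteed to hit an unresponsive peer, while the chance that a GET finds no responsive peer among the 20 closest is below one in a million.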


2. After further investigation, and given the very large percentage of nodes affected by the resource manager misconfiguration, we started looking into the impact of the incident on GET performance.
@@ -66,12 +66,12 @@ Through a combination of crawling the network and attempting connections to all
The PUT operation was slowed down by approximately 10%.


The GET operation was also disrupted (contrary to our initial assumption) and was slowed down by approximately 15%, at times approaching 20%.


4. We also experimented with even higher concurrency factors, in particular `alpha = 20`, as a potential mitigation strategy. We repeated the same experiment with one extra set of runs: the case where we interact with all nodes in the network (i.e., we do not ignore unresponsive peers) but use a higher concurrency factor.
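A toy model helps show why a larger `alpha` can mask unresponsive peers. It treats a lookup as a queue of candidate peers served by `alpha` concurrent request slots: a responsive peer answers after `rtt`, an unresponsive one holds its slot until `timeout`, and the lookup ends once `needed` responsive replies have arrived. The function name, the `rtt`/`timeout` values, and the stopping rule are illustrative assumptions; a real Kademlia lookup also learns new, closer peers from each reply, which this sketch ignores.

```python
import heapq
from typing import Optional, Sequence


def lookup_time(responsive: Sequence[bool], alpha: int, needed: int,
                rtt: float = 0.1, timeout: float = 10.0) -> Optional[float]:
    """Hypothetical model: time until `needed` responsive replies arrive
    with at most `alpha` requests in flight. Returns None if never."""
    slots = [0.0] * alpha  # time at which each request slot becomes free
    heapq.heapify(slots)
    replies = []
    for ok in responsive:
        start = heapq.heappop(slots)          # dispatch on the earliest free slot
        end = start + (rtt if ok else timeout)
        heapq.heappush(slots, end)            # slot is busy until the reply or timeout
        if ok:
            replies.append(end)
    if len(replies) < needed:
        return None
    return sorted(replies)[needed - 1]


if __name__ == "__main__":
    peers = [False, True] * 10  # alternating unresponsive / responsive candidates
    for a in (3, 20):
        print(a, lookup_time(peers, alpha=a, needed=3))
```

With alternating candidates, `alpha=3` only collects its third reply after a slot frees up from a timeout, whereas `alpha=20` queries every candidate at once and finishes after a single round trip.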
@@ -91,11 +91,11 @@ In parallel, we kept monitoring the situation by instrumenting a PUT and GET mea
`kubo-v0.18.1` was [released on 30 January 2023](https://github.com/ipfs/kubo/releases/tag/v0.18.1), and within the first 10 days more than 8.5k nodes updated to this release. Our monitoring software gave us an accurate view of the state of the network and showed that the update to the new kubo release brought a significant performance improvement for the GET operation: more than 40% at the 95th percentile on a sample of ~2k requests, compared to the situation before the `kubo-v0.18.1` release.
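A tail-latency comparison like this one can be sketched as the relative drop of a high percentile between two samples. This assumes the nearest-rank percentile definition; the post does not state which definition the monitoring software used, and the sample latencies below are made up.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], q: int) -> float:
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def tail_improvement(before: Sequence[float], after: Sequence[float], q: int = 95) -> float:
    """Relative reduction of the q-th percentile latency (0.4 == 40% faster)."""
    b = percentile(before, q)
    a = percentile(after, q)
    return (b - a) / b


if __name__ == "__main__":
    before = [float(ms) for ms in range(1, 101)]  # hypothetical latencies
    after = [ms * 0.5 for ms in before]           # hypothetical post-upgrade latencies
    print(tail_improvement(before, after))        # 0.5
```

Looking at the 95th percentile rather than the mean keeps the comparison focused on the slow lookups that unresponsive peers cause, which an average would dilute.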


We also compared the situation against pre-incident performance by running the experiment in which we ignored the set of PeerIDs identified as affected by the misconfiguration. Based on a sample of more than 20k GET operations, the figure below shows that the impact had been reduced to ~5% (mid-February 2023).
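The comparison can be sketched as two runs over the same lookups: one that contacts every peer and one that first removes the affected PeerIDs. The helper names and the use of the median as the summary statistic are assumptions for illustration (the post does not say which statistic the ~5% figure is based on), and PeerIDs are represented here as plain strings.

```python
import statistics
from typing import Iterable, List, Sequence, Set


def skip_affected(candidates: Iterable[str], affected: Set[str]) -> List[str]:
    """Hypothetical filter: drop peers whose PeerID is in the affected set."""
    return [peer for peer in candidates if peer not in affected]


def relative_impact(with_all: Sequence[float], with_skip: Sequence[float]) -> float:
    """Median slowdown of lookups that contact every peer, relative to
    lookups that skip the affected set (0.05 == 5% slower)."""
    baseline = statistics.median(with_skip)
    return (statistics.median(with_all) - baseline) / baseline


if __name__ == "__main__":
    print(skip_affected(["peerA", "peerB", "peerC"], {"peerB"}))
    print(relative_impact([1.05, 1.05, 1.05], [1.0, 1.0, 1.0]))
```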


## Addressing the Root Cause 🔧