I have the following issue, have tried everything I can think of, and can't figure out what the root cause is, how it can be explained, or how to fix it.
Cable Router 192.168.0.1 <-> OpenWRT Linux router, 192.168.0.2 (WAN), 10.0.0.199 (LAN) <-> LAN 10.0.0.x
So I am using double NAT, with masquerading on the WAN port of the OpenWRT router.
The issue is that downloads randomly slow down on all Windows clients after a few seconds. They start at the maximum speed of 500 Mbit/s and then drop rapidly to around 60-120 Mbit/s. Sometimes it works fine for 20 minutes or so and then breaks again. It can also happen that a download gets stuck at around 5 MB/s, over and over again.
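To pin down the slowdown more precisely than "after a few seconds", the download rate could be sampled once per second. This is only a minimal Python sketch: the function names are made up for illustration, the URL in the usage comment is a placeholder, and it assumes a plain HTTP(S) download shows the same behaviour as my browser downloads.

```python
import time
import urllib.request


def interval_rates_mbit(samples):
    """Convert (seconds, cumulative_bytes) samples into Mbit/s per interval."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        rates.append((b1 - b0) * 8 / (t1 - t0) / 1e6)
    return rates


def sample_download(url, duration_s=60, chunk=1 << 16):
    """Download from url and record roughly one sample per second."""
    samples = [(0.0, 0)]
    start = time.monotonic()
    total = 0
    next_mark = 1.0
    with urllib.request.urlopen(url) as resp:
        while True:
            data = resp.read(chunk)
            if not data:
                break
            total += len(data)
            elapsed = time.monotonic() - start
            if elapsed >= next_mark:
                samples.append((elapsed, total))
                next_mark = elapsed + 1.0
            if elapsed >= duration_s:
                break
    return samples


# Hypothetical usage (placeholder URL, replace with a large test file):
# print(interval_rates_mbit(sample_download("http://example.com/bigfile")))
```

Running the same script on a Windows client and on a Linux client against the same server would give two rate curves that can be compared directly.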
It seems to be a weird issue exclusive to Windows combined with double NAT in this setup.
The really weird part, though, is that some servers don't trigger the issue, even on the Windows clients.
The issue does not happen on the OpenWRT router itself, or on a Linux client on the LAN side. It only happens on the Windows 10 clients, and on all of them.
I have absolutely no idea how this could be explained.
I have tried everything I could so far, including resetting the TCP stack on Windows, with no change.
There is no SQM / QoS set up on the OpenWRT router.
My theory is that it somehow has to do with the TCP receive window and the TCP autotuning algorithm in Windows. I am not an expert in this, so I would welcome input from someone who is. I looked into it over the past day and, as I understand it, Windows 10 no longer has an RWIN value you can set manually; it just uses an autotuning algorithm for the TCP receive window.
If I run "netsh int tcp set global autotuninglevel=disabled" on the Windows clients, my download rate instantly gets stuck at exactly 3 MB/s and never goes above it. I don't think this is normal either. Going back to the default with "netsh int tcp set global autotuninglevel=normal" brings back the issue described above.
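As a sanity check on the 3 MB/s figure: a TCP connection cannot have more than one receive window of data in flight per round trip, so throughput is capped at roughly window / RTT. Here is a minimal Python sketch of that arithmetic. It assumes that with autotuning disabled the window is limited to the 65,535 bytes that fit in the unscaled 16-bit window field, and it assumes an RTT of 20 ms, which is a placeholder and not something I measured.

```python
def max_throughput_bytes_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput for a given receive window and RTT."""
    return window_bytes / rtt_s


def required_window_bytes(rate_bits_per_s: float, rtt_s: float) -> float:
    """Receive window (bandwidth-delay product) needed to sustain a rate."""
    return rate_bits_per_s / 8 * rtt_s


UNSCALED_WINDOW = 65535  # assumption: fixed window when autotuning is disabled
ASSUMED_RTT_S = 0.020    # assumption: 20 ms round trip, not measured

cap = max_throughput_bytes_per_s(UNSCALED_WINDOW, ASSUMED_RTT_S)
needed = required_window_bytes(500e6, ASSUMED_RTT_S)

print(f"cap with 64 KB window: {cap / 1e6:.2f} MB/s")
print(f"window needed for 500 Mbit/s: {needed / 1e6:.2f} MB")
```

Under these assumptions the cap comes out at about 3.3 MB/s, which is close to the flat 3 MB/s I see, and sustaining 500 Mbit/s would need a window of about 1.25 MB. So the disabled case may simply be the expected limit of a fixed 64 KB window rather than a second problem.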
I read up on the TCP autotuning algorithm and, as far as I understand it, it works by communicating with the routers and the end server. So maybe the issue is that OpenWRT or the double NAT somehow blocks this autotuning "talk", but I have no idea how. Or maybe the double NAT interferes with it by adding some latency to the timestamp calculation that autotuning relies on, so that Windows shrinks the TCP receive window on the fly and the issue above happens. This is just a theory, of course, and I am not an expert in this. Can anyone say something about it?
Microsoft says this about it: "When the receive window auto tuning feature is enabled, older routers, older firewalls, and older operating systems that are incompatible with the receive window auto tuning feature may sometimes cause slow data transfer or a loss of connectivity between Vista clients. When this occurs, users may experience slow performance. Or, the applications may crash. These older devices do not comply with the RFC 1323 standard. Some device manufacturers provide software that works around the hardware limitations."
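From what I read in RFC 1323, the relevant mechanism is the window scale option: the TCP header only has a 16-bit window field, and each side announces a shift count (0 to 14) in its SYN that the other side applies to that field. The Python sketch below only illustrates that arithmetic. The field value and shift count are made-up example numbers, and I am not claiming that a lost or mangled scale option is what actually happens in my setup.

```python
MAX_SHIFT = 14  # largest shift count allowed for the window scale option


def effective_window(field: int, shift: int) -> int:
    """Receive window in bytes for a 16-bit window field and a shift count."""
    if not 0 <= field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return field << shift


# Made-up example: the client advertises field 4883 with shift count 8.
with_scaling = effective_window(4883, 8)     # what the client means
without_scaling = effective_window(4883, 0)  # what a sender sees if the scale is lost

print(with_scaling, without_scaling)
```

If a device on the path dropped or rewrote the option, the two ends would disagree about the window size by a factor of 2^shift, which is the kind of incompatibility the Microsoft text seems to describe.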
If anyone has an idea how this can be explained and fixed, I would very much appreciate it!