We've been working through a problem recently where largish uploads of files would reliably fail - but not in the way you'd expect. A typical course of events would be:
- Client uploads file via SSH, SFTP or FTP
- Upload might drop out a couple of times during the process, but resume OK - or might not have any dropouts
- Upload finishes. Client and server compare checksums of the uploaded file. They don't match, so the client retries the upload
- Repeat ad infinitum.
Our particular problem was with StorageCraft ImageManager uploading images to a remote server.
There were several things that we thought might be the problem, including firewall firmware, client software, server software, rate limiting at the server and / or client end, and general Internet goodness / badness. So we worked through testing each of these areas:
- Updated versions of software for each of the components in question - didn't fix it
- Tested with different server / client software and different protocols (for example SSH vs SFTP vs FTP) - didn't help
- Upgraded firewall firmware to the latest recommended release - didn't fix it either
In the end the problem turned out to be the TCP MSS (maximum segment size). There are plenty of good articles out there about what this is, so I won't try to go into detail on how it works. But in short, this value controls the maximum amount of data that can be sent in a single TCP packet.
Using Australia Post as an analogy... if I have a 1 kg parcel post bag, then the total weight of my parcel including the box / packaging cannot exceed 1 kg. So if I have a very light box, then I can put more inside it. But if I have a heavy box with a lot of padding, then the actual contents of the box must be lighter to still fit inside the 1 kg limit.
Internet traffic works the same way - there are limits that control how large each parcel (aka packet) can be. And often there are things that increase the size of the packaging without you being aware of it, such as ADSL encapsulation and encryption headers. So although in theory your computer generates packets of up to 1500 bytes, on regular ADSL you won't be able to use more than about 1350 of those bytes to send data. In some cases it will be more extreme, and you might find (as we did) that your limit is actually closer to 1000 bytes (about 67% of 1500) for a particular ADSL provider.
The problem is complicated by the fact that other networks sometimes add more to the packaging as the packet makes its way across the Internet. This would be like sending your 1 kg parcel overseas - and then Customs sticking a big label on it that takes it over 1 kg... so the post office at the remote end rejects it!
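The packaging arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name max_mss is made up for this post, the 20-byte header sizes assume IPv4 and TCP without options, and the 460-byte overhead is a hypothetical figure chosen to land on the 1000 we ended up with.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_mss(link_mtu: int, extra_overhead: int = 0) -> int:
    """Largest TCP payload that still fits in one packet.

    extra_overhead is whatever the path adds to the "packaging"
    (encapsulation, tunnel or encryption headers), in bytes.
    """
    mss = link_mtu - extra_overhead - IPV4_HEADER - TCP_HEADER
    if mss <= 0:
        raise ValueError("overhead leaves no room for data")
    return mss


print(max_mss(1500))       # plain Ethernet, no extra packaging: 1460
print(max_mss(1500, 8))    # PPPoE adds 8 bytes: 1452
print(max_mss(1500, 460))  # hypothetical heavy overhead: 1000
```

If anything on the path adds overhead your computer doesn't know about, the MSS it advertises is too large for the real limit, which is why clamping it on the firewall helps.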
So... how do you work around this on a FortiGate?
- Define a firewall object for the destination. For example "MyDestinationServer"
- Create a new policy that is triggered *before* your other outbound policies
- Verify that traffic is hitting the policy once you've created it
Defining the firewall object:
config firewall address
    edit "MyDestinationServer"
        set associated-interface "wan1"
        set subnet 220.127.116.11 255.255.255.255
    next
end
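Creating a new policy with the TCP-MSS clamp ("edit 0" asks the FortiGate to use the next free policy ID; "MyInternalClient" is assumed to be an address object you have already defined):
config firewall policy
    edit 0
        set srcintf "switch"
        set dstintf "wan1"
        set srcaddr "MyInternalClient"
        set dstaddr "MyDestinationServer"
        set action accept
        set schedule "always"
        set service "FTP" "FTP_GET" "FTP_PUT" "SSH"
        set nat enable
        set tcp-mss-sender 1000
    next
end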
The "tcp-mss-sender" is for outbound traffic, "tcp-mss-receiver" is for inbound. You need to tweak this number until you find a value that reliably works for you. In our case this happened to be 1000 - yours might be higher or lower.
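Finding the right number by hand is slow, so here is a minimal sketch of how the search could be narrowed down with a binary search. It is not what we actually ran: the function largest_working_size and the probe callback are hypothetical, the 500 to 1460 range is an assumption, and the search assumes that if one size works then every smaller size works too.

```python
from typing import Callable


def largest_working_size(probe: Callable[[int], bool],
                         low: int = 500, high: int = 1460) -> int:
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes probe is monotonic: if a size works, all smaller sizes work.
    """
    if not probe(low):
        raise ValueError("even the smallest size fails")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always shrinks
        if probe(mid):
            low = mid        # mid works, so the answer is mid or larger
        else:
            high = mid - 1   # mid fails, so the answer is below mid
    return low


# Stand-in probe for demonstration: pretend anything up to 1000 gets through.
def fake_probe(size: int) -> bool:
    return size <= 1000


print(largest_working_size(fake_probe))  # 1000
```

In practice the probe would be whatever test you trust, such as a real upload with the clamp set to that value. On Linux, something like "ping -M do -s SIZE host" (don't-fragment set) can give a quick first estimate; an ICMP payload of SIZE bytes makes a packet of SIZE + 28 bytes, so the matching MSS is roughly SIZE - 12.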
You can then verify that the policy is being hit by heading to Firewall / Policy / Monitor by Policy / Bytes (this particular rule will typically show up quickly as a high usage policy if you're actively uploading something).
On the receive side you're looking to see things like:
- A reduced number of disconnects / dropouts / resumes
- A reduced number of failed uploads
- A resumed upload will often show an instantaneous upload speed far in excess of your link capacity. For example, if it has previously uploaded 80% of a 1 GB file, then drops out and resumes, the resume might come back and say that the upload speed is currently 300 MB / second when your link speed is actually 10 MB / second :) . This is because of how it calculates the average speed for the current resumed upload.
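To see where a number like that can come from, here is a small sketch of the arithmetic. It assumes (and this is only a guess at what the client does) that the displayed speed is the total bytes already on the server divided by the time since the resume started. The 3-second figure and the variable names are made up for illustration.

```python
MB = 1_000_000

file_size = 1000 * MB          # 1 GB file
already_uploaded = 800 * MB    # 80% was sent before the dropout
link_speed = 10 * MB           # assumed real link speed, bytes per second
seconds_since_resume = 3       # hypothetical moment we look at the display

sent_this_session = link_speed * seconds_since_resume

# Assumed (naive) calculation: counts the bytes from before the dropout
# but only the time elapsed since the resume.
displayed_speed = (already_uploaded + sent_this_session) / seconds_since_resume

# What the link is actually doing in this session.
actual_speed = sent_this_session / seconds_since_resume

print(f"displayed: {displayed_speed / MB:.0f} MB/s")  # displayed: 277 MB/s
print(f"actual:    {actual_speed / MB:.0f} MB/s")     # actual:    10 MB/s
```

The displayed figure drifts back towards the real link speed the longer the resumed session runs.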