tcp: TCP SACK documentation
@@ -752,10 +752,46 @@ are then asked to lower such value, and to return it.
PktsAcked is used in case the algorithm needs timing information (such as
RTT), and it is called each time an ACK is received.

TCP SACK and non-SACK
+++++++++++++++++++++
To avoid code duplication and the effort of maintaining two different versions
of the TCP core, namely RFC 6675 (TCP-SACK) and RFC 5681 (TCP congestion
control), we have merged RFC 6675 into the current code base. If the receiver
supports the SACK option, the sender bases its retransmissions on the received
SACK information. In the absence of that option, the best it can do is
to follow the RFC 5681 specification (Fast Retransmit/Fast Recovery) and to
employ the NewReno modifications in case of partial ACKs.

The merge work consisted of emulating SACK options in the sender (when the
receiver does not support SACK), following RFC 5681 rules. The generation is
straightforward: each duplicate ACK (as defined in RFC 5681) is treated as
carrying a new SACK option that indicates (in increasing order) the blocks
transmitted after SND.UNA, not including the block starting at SND.UNA
itself.

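As an illustration of the generation rule above, the following is a minimal
sketch and not the ns-3 implementation. It assumes that all outstanding
segments have the same size and it does not treat sequence-number wraparound
specially; ``SackBlock``, ``EmulatedSackBlock`` and ``EmulatedSackBlocks`` are
hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical half-open byte range [left, right) covered by one SACK block.
using SackBlock = std::pair<uint32_t, uint32_t>;

// Emulated block for the k-th duplicate ACK (k starts at 1).
// Assumption: every outstanding segment is segmentSize bytes long.
SackBlock EmulatedSackBlock(uint32_t sndUna, uint32_t segmentSize, uint32_t k)
{
  assert(k >= 1 && segmentSize > 0);
  // Skip the segment starting at SND.UNA (presumed missing), then select
  // the k-th segment after it.
  uint32_t left = sndUna + k * segmentSize;
  return {left, left + segmentSize};
}

// Blocks emulated for the first dupAcks duplicate ACKs, in increasing order.
std::vector<SackBlock> EmulatedSackBlocks(uint32_t sndUna, uint32_t segmentSize,
                                          uint32_t dupAcks)
{
  std::vector<SackBlock> blocks;
  for (uint32_t k = 1; k <= dupAcks; ++k)
    {
      blocks.push_back(EmulatedSackBlock(sndUna, segmentSize, k));
    }
  return blocks;
}
```

With SND.UNA at 1000 and 500-byte segments, three duplicate ACKs would yield
the blocks [1500, 2000), [2000, 2500) and [2500, 3000), leaving the segment at
SND.UNA uncovered.
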
With this emulated SACK information, the sender behaviour is unified in the
two cases. By carefully generating these SACK blocks, we are able to employ all
the algorithms outlined in RFC 6675 (e.g., Update(), NextSeg(), IsLost()) during
non-SACK transfers. In the case of RTO expiration, no guess about SACK blocks
can be made, so they are not generated (consequently, the implementation
re-sends all segments starting from SND.UNA, even the ones correctly
received). Please note that the SACK options generated by the sender (in the
case of a non-SACK receiver) never leave the sender node itself; they are
created locally by the TCP implementation and then consumed.

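To make the role of the scoreboard more concrete, the IsLost() check named
above can be sketched from its textual definition in RFC 6675: a sequence
number is considered lost when either DupThresh discontiguous SACKed sequences
have arrived above it, or more than (DupThresh - 1) * SMSS bytes above it have
been SACKed. The sketch works at segment granularity; ``Segment`` and its
fields are hypothetical, the list is assumed to be sorted by sequence number,
and wraparound is not handled. It is not the TcpTxBuffer code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scoreboard entry: one transmitted segment.
struct Segment
{
  uint32_t startSeq; // first byte of the segment
  uint32_t size;     // length in bytes
  bool sacked;       // covered by a (real or emulated) SACK block
};

// Sketch of IsLost() following the prose definition in RFC 6675.
// Assumption: 'sent' is sorted by startSeq.
bool IsLost(const std::vector<Segment>& sent, uint32_t seq,
            uint32_t dupThresh, uint32_t smss)
{
  assert(dupThresh >= 1);
  uint32_t sackedBytes = 0; // SACKed bytes above seq
  uint32_t sackedRuns = 0;  // discontiguous SACKed ranges above seq
  bool inRun = false;
  for (const Segment& s : sent)
    {
      if (s.startSeq <= seq)
        {
          continue; // only segments above seq matter
        }
      if (s.sacked)
        {
          sackedBytes += s.size;
          if (!inRun)
            {
              ++sackedRuns; // a new SACKed range starts here
              inRun = true;
            }
        }
      else
        {
          inRun = false; // a hole ends the current range
        }
    }
  return sackedRuns >= dupThresh || sackedBytes > (dupThresh - 1) * smss;
}
```

With DupThresh = 3 and 100-byte segments, the head segment would be reported
as lost once three later segments are marked as SACKed, which matches the
usual three-duplicate-ACK trigger in the non-SACK case.
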
A similar concept is used in Linux with the function tcp_add_reno_sack. Our
implementation resides in the TcpTxBuffer class, which implements a scoreboard
through two different lists of segments. TcpSocketBase actively uses the API
provided by TcpTxBuffer to query the scoreboard; please refer to the Doxygen
documentation (and to the in-code comments) if you want to learn more about
this implementation.

When the SACK attribute is enabled on the receiver socket, the sender does not
craft any SACK option and relies only on what it receives from the network.

Current limitations
+++++++++++++++++++

* SACK is not supported
* TcpCongestionOps interface does not contain every possible Linux operation
* Fast retransmit / fast recovery are bound with TcpSocketBase, thereby preventing easy simulation of TCP Tahoe

@@ -192,11 +192,11 @@ public:
  * \brief A base class for implementation of a stream socket using TCP.
  *
  * This class contains the essential components of TCP, as well as a sockets
- * interface for upper layers to call. This serves as a base for other TCP
- * functions where the sliding window mechanism is handled here. This class
- * provides connection orientation and sliding window flow control. Part of
- * this class is modified from the original NS-3 TCP socket implementation
- * (TcpSocketImpl) by Raj Bhattacharjea <raj.b@gatech.edu> of Georgia Tech.
+ * interface for upper layers to call. This class provides connection orientation
+ * and sliding window flow control; congestion control is delegated to subclasses
+ * of TcpCongestionOps. Part of TcpSocketBase is modified from the original
+ * NS-3 TCP socket implementation (TcpSocketImpl) by
+ * Raj Bhattacharjea <raj.b@gatech.edu> of Georgia Tech.
  *
  * For IPv4 packets, the TOS set for the socket is used. The Bind and Connect
  * operations set the TOS for the socket to the value specified in the provided
@@ -225,10 +225,15 @@ public:
  * Congestion control interface
  * ---------------------------
  *
- * Congestion control, unlike older releases of ns-3, has been splitted from
+ * Congestion control, unlike older releases of ns-3, has been split from
  * TcpSocketBase. In particular, each congestion control is now a subclass of
  * the main TcpCongestionOps class. Switching between congestion algorithms is
- * now a matter of setting a pointer into the TcpSocketBase class.
+ * now a matter of setting a pointer into the TcpSocketBase class. The idea
+ * and the interfaces are inspired by the Linux operating system, and in
+ * particular by the structure tcp_congestion_ops.
  *
+ * Transmission Control Block (TCB)
+ * --------------------------------
+ *
  * The variables needed by congestion control classes to operate correctly have
  * been moved inside the TcpSocketState class. It contains information on the
@@ -240,13 +245,13 @@ public:
  * (see for example cWnd trace source).
  *
  * Fast retransmit
- * ---------------------------
+ * ----------------
  *
  * The fast retransmit enhancement is introduced in RFC 2581 and updated in
- * RFC 5681. It basically reduces the time a sender waits before retransmitting
+ * RFC 5681. It reduces the time a sender waits before retransmitting
  * a lost segment, through the assumption that if it receives a certain number
  * of duplicate ACKs, a segment has been lost and it can be retransmitted.
- * Usually it is coupled with the Limited Transmit algorithm, defined in
+ * Usually, it is coupled with the Limited Transmit algorithm, defined in
  * RFC 3042.
  *
  * In ns-3, these algorithms are included in this class, and it is implemented inside
@@ -257,15 +262,39 @@ public:
  * recovery phase, the method EnterRecovery is called.
  *
  * Fast recovery
- * --------------------------
+ * -------------
  *
- * The fast recovery algorithm is introduced RFC 2001, and it basically
+ * The fast recovery algorithm is introduced in RFC 2001, and it
  * avoids resetting cWnd to 1 segment after sensing a loss on the channel. Instead,
  * the slow start threshold is halved, and the cWnd is set equal to such value,
  * plus segments for the cWnd inflation.
  *
  * The algorithm is implemented in the ProcessAck method.
  *
+ * RTO expiration
+ * --------------
+ *
+ * When the Retransmission Time Out expires, TCP suffers a large performance
+ * drop. The expiration event is managed in the ReTxTimeout method, which
+ * sets the cWnd to 1 segment and starts "from scratch" again.
+ *
+ * Options management
+ * ------------------
+ *
+ * SYN and SYN-ACK options, which are allowed only at the beginning of the
+ * connection, are managed in the DoForwardUp and SendEmptyPacket methods.
+ * To read all the others, we have set up a loop inside ReadOptions. There is
+ * no single place for adding them, since the options (and the information
+ * available to build them) are scattered around the code. For instance,
+ * the SACK option is built in SendEmptyPacket only under certain conditions.
+ *
+ * SACK
+ * ----
+ *
+ * The SACK generation/management is delegated to the buffer classes, namely
+ * TcpTxBuffer and TcpRxBuffer. Please take a look at their documentation if
+ * you need more information.
+ *
  */
 class TcpSocketBase : public TcpSocket
 {
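The window arithmetic summarised in the "Fast recovery" comment above can be
sketched as follows. This follows the RFC 5681 rules (ssthresh set to half of
the flight size, but at least two segments; cWnd set to ssthresh plus one
segment per duplicate ACK) and is an assumption about what the comment
describes, not a copy of the ProcessAck code. ``CongState`` and
``FastRecoveryWindow`` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical pair of congestion variables, both in bytes.
struct CongState
{
  uint32_t cWnd;
  uint32_t ssThresh;
};

// Window values on entering fast recovery, per RFC 5681.
// dupAcks is the number of duplicate ACKs that inflate the window
// (3 when recovery starts at the third duplicate ACK).
CongState FastRecoveryWindow(uint32_t bytesInFlight, uint32_t smss,
                             uint32_t dupAcks)
{
  assert(smss > 0);
  // Halve the flight size, but never go below two segments.
  uint32_t ssThresh = std::max(bytesInFlight / 2, 2 * smss);
  // Inflate by one segment for each duplicate ACK received.
  return {ssThresh + dupAcks * smss, ssThresh};
}
```

For example, with 10000 bytes in flight, 1000-byte segments and three
duplicate ACKs, the sketch yields ssThresh = 5000 and cWnd = 8000, instead of
the single-segment cWnd that follows an RTO expiration.
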
@@ -69,33 +69,55 @@ public:
  *
  * \brief Tcp sender buffer
  *
- * The class keeps track of all data that the application wishes to transmit
- * to the other end. When the data is acknowledged, it is removed from the buffer.
+ * The class keeps track of all data that the application wishes to transmit to
+ * the other end. When the data is acknowledged, it is removed from the buffer.
  * The buffer has a maximum size, and data is not saved if the amount exceeds
- * the limit. Packets can be added to the class through the method Add().
- * An important thing to remember is that all the data managed is strictly
- * sequential. It can be divided in blocks, but all the data follow a strict
- * ordering. That ordering is managed through SequenceNumber.
+ * the limit. Packets can be added to the class through the method Add(). An
+ * important thing to remember is that all the data managed is strictly
+ * sequential. It can be divided into blocks, but all the data follow a strict
+ * ordering. That order is managed through SequenceNumber.
  *
- * In other words, this buffer contains numbered bytes (e.g 1,2,3), and the class
- * is allowed to return only ordered (using "<" as operator) subsets (e.g. 1,2
- * or 2,3 or 1,2,3).
+ * In other words, this buffer contains numbered bytes (e.g., 1,2,3), and the
+ * class is allowed to return only ordered (using "<" as operator) subsets
+ * (e.g., 1,2 or 2,3 or 1,2,3).
  *
  * The data structure underlying this is composed of two distinct packet lists.
  *
- * The first (SentList) is initially empty, and it contains the packets returned
- * by the method CopyFromSequence.
- *
- * The second (AppList) is initially empty, and it contains the packets coming
- * from the applications, but that are not transmitted yet as segments.
- *
- * To discover how the chunk are managed and retrieved from these lists, check
- * CopyFromSequence documentation.
+ * The first (SentList) is initially empty, and it contains the packets
+ * returned by the method CopyFromSequence. The second (AppList) is initially
+ * empty, and it contains the packets coming from the applications that have
+ * not yet been transmitted as segments. To discover how the chunks are managed
+ * and retrieved from these lists, check the CopyFromSequence documentation.
  *
  * The head of the data is represented by m_firstByteSeq, and it is returned by
- * HeadSequence(). The last byte is returned by TailSequence().
- * In this class we store also the size (in bytes) of the packets inside the
- * SentList in the variable m_sentSize.
+ * HeadSequence(). The last byte is returned by TailSequence(). In this class,
+ * we also store the size (in bytes) of the packets inside the SentList in the
+ * variable m_sentSize.
  *
+ * SACK management
+ * ---------------
+ *
+ * The SACK information is usually saved in a data structure referred to as
+ * the scoreboard. In this implementation, the scoreboard is developed on top
+ * of the existing classes. In particular, instead of keeping raw pointers to
+ * packets in TcpTxBuffer, we added the capability to store some flags
+ * associated with every segment sent. This is done through the use of the
+ * class TcpTxItem: instead of storing a list of packets, we store a list of
+ * TcpTxItem. Each item has different flags (check the corresponding
+ * documentation), and maintaining the scoreboard is a matter of traversing
+ * the list and setting the SACK flag on the corresponding sent segment.
+ *
+ * Inefficiencies
+ * --------------
+ *
+ * The algorithms outlined in RFC 6675 are inefficient when implemented
+ * literally. In particular, traversing the whole sent list each time the
+ * bytes in flight must be computed is expensive. We try to overcome the issue
+ * by maintaining a pointer to the highest sequence SACKed; in this way, we can
+ * avoid traversing the whole list in some cases. Another option could be
+ * keeping a count of each critical value (e.g., the number of packets SACKed).
+ * However, this would differ from the algorithms in the RFC. There are some
+ * other possible improvements; if you wish, take a look and try to add some
+ * earlier exit conditions in the loops.
+ *
  * \see Size
  * \see SizeFromSequence
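The scoreboard idea and the traversal cost discussed in the comment above can
be illustrated with a minimal sketch. ``TxItem`` is a hypothetical stand-in
for TcpTxItem: the three flags are assumptions made for this example, and the
real class should be checked for its actual members. The byte count follows
the spirit of the pipe computation in RFC 6675 at segment granularity, with
the ``retrans`` flag standing in for "at or below the highest retransmitted
sequence".

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Hypothetical stand-in for TcpTxItem: a sent segment plus its flags.
struct TxItem
{
  uint32_t startSeq; // first byte of the segment
  uint32_t size;     // length in bytes
  bool sacked;       // covered by a SACK block
  bool lost;         // considered lost by the loss detection
  bool retrans;      // retransmitted at least once
};

// Set the SACK flag on every item fully covered by the block [left, right).
void MarkSacked(std::list<TxItem>& sentList, uint32_t left, uint32_t right)
{
  assert(left < right);
  for (TxItem& item : sentList)
    {
      if (item.startSeq >= left && item.startSeq + item.size <= right)
        {
          item.sacked = true;
        }
    }
}

// Bytes presumed to be in the network, computed by a full list traversal.
uint32_t BytesInFlight(const std::list<TxItem>& sentList)
{
  uint32_t bytes = 0;
  for (const TxItem& item : sentList)
    {
      if (item.sacked)
        {
          continue; // already received, no longer in flight
        }
      if (!item.lost)
        {
          bytes += item.size; // original transmission still outstanding
        }
      if (item.retrans)
        {
          bytes += item.size; // retransmitted copy outstanding
        }
    }
  return bytes;
}
```

Every call to ``BytesInFlight`` walks the whole list, which is the cost the
comment refers to; caching a position in the list or keeping running counters
are the kinds of shortcuts it mentions.
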