May 1995

The QNX 4 Network "Raw Packet" Interface
-----------------------------------------

In addition to transmitting QNX packets over the various local area
networks supported by QNX, there exists an operating system interface,
or "hook", which allows a user-written program to transmit and receive
non-QNX (we call them raw) packets over the network.

Let's look at ethernet. An ethernet packet looks like the following:

-------------------------------------------------------
| dst_nid | src_nid | type/length |   data   |  crc   |
-------------------------------------------------------
  6 bytes   6 bytes     2 bytes    1500 bytes  4 bytes

The type/length field is an odd one. In the old Digital/Intel/Xerox
(DIX) ethernet specification, it was defined as a "type", which allowed
drivers to figure out what protocol a packet belongs to when they
received it. There are many defined protocol "types", but here are a
few relevant ones:

    0x0800    IP
    0x0806    ARP
    0x6004    DEC LAT
    0x8014    SGI network games
    0x809B    Appletalk
    0x8137    Novell
    0x8138    Novell
    0x814C    SNMP
    0x8203    QNX

Then, the Institute of Electrical and Electronics Engineers jumped in
with their specification of ethernet, IEEE 802.3. They changed some
electrical stuff, which everyone pretty much went along with. But they
also slightly changed the packet layout, replacing the two-byte "type"
with a two-byte "length" of the following "data". Few people actually
use the intended IEEE 802.3 packet layout.

Fortunately, a valid "length" is always less than the smallest possible
"type" of 0x800 (2048), so when someone receives a packet they can
always distinguish between an old DIX v2 packet with a "type", and an
IEEE 802.3 packet with a "length".

We actually tried to be "standard" and use an IEEE 802.3 "length" in our
QNX ethernet packets early on, but ran into problems because we weren't
doing it the way everyone else (like TCP/IP) was, with a "type" field.
So, we applied for and got a protocol "type" field for QNX 4.

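The type/length distinction described above can be sketched in a few
lines of C. This is only an illustration, not QNX driver code: the
names eth_type_or_length, eth_classify and the FRAME_* constants are
made up for this example. It assumes the frame starts at dst_nid (so
the field sits at byte offset 12) and that the field is sent most
significant byte first, as ethernet does. Anything up to the 1500-byte
data limit is treated as a "length"; everything else is treated as a
"type".

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes taken from the packet layout in the text. */
#define ETH_HDR_LEN    14      /* dst_nid + src_nid + type/length */
#define ETH_MAX_DATA   1500    /* largest valid 802.3 "length"    */

/* A few of the "type" values listed in the text. */
#define ETHERTYPE_IP   0x0800
#define ETHERTYPE_ARP  0x0806
#define ETHERTYPE_QNX  0x8203

/* Hypothetical result codes for this sketch. */
enum frame_kind { FRAME_TOO_SHORT, FRAME_8023_LENGTH, FRAME_DIX_TYPE };

/* Read the 16-bit type/length field (bytes 12 and 13, big-endian). */
uint16_t eth_type_or_length(const uint8_t *frame)
{
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/* Decide whether the field is an 802.3 "length" or a DIX "type". */
enum frame_kind eth_classify(const uint8_t *frame, size_t len)
{
    if (len < ETH_HDR_LEN)
        return FRAME_TOO_SHORT;
    if (eth_type_or_length(frame) <= ETH_MAX_DATA)
        return FRAME_8023_LENGTH;
    return FRAME_DIX_TYPE;
}
```

A receiver would call eth_classify first and compare the field against
values such as ETHERTYPE_QNX only when the result is FRAME_DIX_TYPE.
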
Packet Reception
----------------

When a packet arrives in from the ethernet, the network driver looks at
the "type" field, and if it's 0x8203, it's a QNX 4 packet. If it isn't,
it must be for someone else. This allows many different applications,
handling many different protocols, to share the one ethernet network
card being serviced by the QNX 4 network driver.

For example, TCP/IP in QNX 4 uses the "raw packet" interface to the
Network Manager to transmit and receive IP and ARP packets. When
"Socket" starts up, it sends a "register" message to the Network
Manager, saying that it wants to receive all packets with an ethernet
type of 0x800 (IP). Then, it sends another message, saying that it
wants to also handle packets with a type of 0x806 (ARP). Socket is
actually handling two different ethernet protocols.

In this "register" message, Socket includes segments and offsets
pointing to functions which Net can quickly "far call", and which
perform the following things:

    1) allocate a buffer      (rx, step 1)
    2) buffer is now filled   (rx, step 2)
    3) transmit complete      (tx)

This is pretty neat. What Net does is create an alias of Socket's code
and data segments in his local descriptor table (LDT). For example, in
Net's LDT, his code segment might be x5, and his data segment might be
xD. And Socket's code segment of x5 might be aliased into Net's LDT as
x25, and Socket's data segment of xD might be aliased into Net's LDT as
x2D.

Socket              Net             Net.ether1000
------            -------           -------------
5 code ----+      5 code      +---- 5 code
D data --+ |      D data      | +-- D data
         | |      15 <--------+ |
         | |      1D <----------+
         | +----> 25
         +------> 2D

We can also see that the Network Driver, Net.ether1000, has similarly
aliased his code and data segments into Net's LDT. Net drivers get far
called into all the time, so we want that interface to be very fast,
with low overhead.

So.

When a packet arrives in from the ethernet, Net examines the header and
if the type is 0x800 (IP), Net then far calls the function in Net's
code segment of x25 to allocate a buffer for this packet. Socket's code
then executes in Net's LDT as Net, to quickly allocate a buffer, with a
bare minimum of overhead.

After Socket returns a pointer to the buffer, Net passes that pointer
down in his far call back into the network driver, who is the only one
who actually knows exactly how to talk to the network card. The network
driver then copies the IP packet directly from the card to the buffer
provided by Socket. This avoids a redundant copy of the packet. Extra
copies are Evil and are to Be Avoided (tm).

Now that the buffer has been filled with the packet, Net far calls into
Socket again (via Net's code segment of x25) to function #2 above,
telling Socket that the buffer is now filled. Socket is then free to
put that buffer on a queue of received packets.

See, until the second function is called, the packet buffer allocated
in the first far call into Socket is kind of in limbo - Socket can't
really peek into it, because the contents of it are indeterminate, sort
of like Schroedinger's cat :)

Packet Transmission
-------------------

Transmission is actually simpler. The "raw app", Socket, allocates a
"queue packet" (a data structure which points to a generic transmit
request) and puts it on Net's input queue, just as Proc or the kernel
would.

When Net pulls Socket's queue packet off his input queue, he sees that
it is a "raw" transmit request, and simply passes it down to the
appropriate network driver. The network driver copies the data directly
from Socket's buffer to the network card. Again, no extra copies.

When the network driver completes transmitting the packet, he gives its
queue packet back to Net, and Net puts it on a queue of Socket's, and
far calls Socket's function #3, which usually just returns a proxy for
Net to trigger.
Triggering the proxy wakes up Socket, and Socket can look at his queue
of transmitted packets, and process them appropriately.
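
The register / receive / transmit-complete handshake described above
can be modeled in portable C. This is a rough sketch only, and every
name in it (struct raw_handler, net_register, net_rx, net_tx_complete,
the sock_* functions) is hypothetical. The real interface registers
with a message to Net and is entered by far calls through LDT aliases.
Here ordinary function pointers in a single process stand in for the
three entry points, a memcpy stands in for the driver's copy from the
card, and a plain int stands in for the proxy. The sketch also assumes
that the whole frame, header included, lands in the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HANDLERS 8
#define ETH_HDR_LEN  14

/* Hypothetical stand-in for the three entry points in the register message. */
struct raw_handler {
    uint16_t type;                               /* ethernet type claimed   */
    void *(*alloc_buf)(size_t len);              /* 1) allocate a buffer    */
    void  (*buf_filled)(void *buf, size_t len);  /* 2) buffer is now filled */
    int   (*tx_done)(void);                      /* 3) transmit complete,
                                                       returns a proxy id   */
};

static struct raw_handler handlers[MAX_HANDLERS];
static int n_handlers;

/* Models one "register" message: one call per ethernet type. */
int net_register(const struct raw_handler *h)
{
    if (n_handlers == MAX_HANDLERS)
        return -1;
    handlers[n_handlers++] = *h;
    return 0;
}

static const struct raw_handler *find_handler(uint16_t type)
{
    for (int i = 0; i < n_handlers; i++)
        if (handlers[i].type == type)
            return &handlers[i];
    return NULL;
}

/* Models packet arrival. Returns 1 if a raw handler took the packet. */
int net_rx(const uint8_t *frame, size_t len)
{
    if (len < ETH_HDR_LEN)
        return 0;
    uint16_t type = (uint16_t)((frame[12] << 8) | frame[13]);
    const struct raw_handler *h = find_handler(type);
    if (h == NULL)
        return 0;                      /* nobody registered this type */
    void *buf = h->alloc_buf(len);     /* rx, step 1                  */
    if (buf == NULL)
        return 0;                      /* no buffer: drop the packet  */
    memcpy(buf, frame, len);           /* the one and only copy       */
    h->buf_filled(buf, len);           /* rx, step 2                  */
    return 1;
}

/* Models transmit completion: returns the proxy to trigger, or -1. */
int net_tx_complete(uint16_t type)
{
    const struct raw_handler *h = find_handler(type);
    return h ? h->tx_done() : -1;
}

/* A toy "Socket" side with a single receive buffer. */
uint8_t sock_buf[1514];
size_t  sock_queued_len;               /* nonzero once a packet is queued */

void *sock_alloc(size_t len)
{
    return len <= sizeof sock_buf ? sock_buf : NULL;
}

void sock_filled(void *buf, size_t len)
{
    (void)buf;
    sock_queued_len = len;             /* buffer contents are valid now */
}

int sock_tx_done(void)
{
    return 7;                          /* made-up proxy id */
}
```

Note that sock_filled is the only place the toy Socket records the
packet. Between sock_alloc and sock_filled the buffer is "in limbo",
and Socket never looks at it.
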