Writing Fsys Drivers for QNX 4.0
Revision 0.9, Beta Document.

Caveat:
This is a beta document and is error-free to the best of my knowledge. Quantum Software Systems Limited presents this information and any accompanying software "as is" without expressed or implied warranty of any kind.

Related Documents:
    QNX 4.0 Administrator's Guide
    QNX 4.0 User's Guide
    WATCOM C Library Reference for QNX
    WATCOM C Language Reference
    Intel iAPX 286 Programmer's Reference Manual
    Intel 80386 Programmer's Reference Manual
    IBM Technical Reference Personal Computer AT
    Fsys Driver Library for QNX 4.0

Abstract:
This paper describes the interface between the filesystem and the device-specific driver routines. It provides a starting point for writing drivers for new devices. This paper assumes familiarity with multi-tasking operating systems, device drivers, and low-level programming considerations.

Overview:
Traditional file systems impose a structure consisting of files and directories onto some form of permanent storage. Programs may then access and create files within this framework, without requiring any particular knowledge of the details of the underlying devices. The File System, in turn, need only treat the storage device as a set of logically contiguous data blocks of some fixed size, which may be accessed in any order, much like an array of memory blocks. The role of the device driver is to convert this 'abstract' representation of a storage device into the operations that transfer data to and from a particular type of device.

For maximum generality, the File System (fsys) contains NO device-specific code, and only maintains information on general classifications of drives. In particular, it is aware of HARD, FLOPPY, RAMDISK, REMOVABLE and TAPE. These may be extended in the future as need be.

The disk driver is a process which 'links' to the filesystem by providing a set of defined 'entry points', or subroutines, and a shared structure.
Since this linking is performed at runtime, it is possible to start and stop various drivers at any time.

In order to attain the maximum throughput from devices, the filesystem (fsys) engineers its device requests according to a general model of devices. For example, if there are 5 blocks to be read from the disk, it will be faster if the request is sorted by increasing block address. This rule minimises the amount of time the drive spends relocating its read-write heads. To further increase throughput, the filesystem attempts to present requests for multiple contiguous blocks. This minimises the communication overhead associated with each request.

To keep filesystem throughput from bottlenecking on slow devices, the IO requests are designed to be 'asynchronous'. That is, the filesystem requests a transfer of data, and the driver wakes up the filesystem when the transfer is complete. Between the time of the request and the time of the 'wakeup', the filesystem is free to service or enqueue other requests, provided it does not attempt to issue any more requests for a drive which is busy. Restated as the 'golden rule' of device drivers: "The driver shall not inhibit the progress of the filesystem".

The driver is given a proxy by fsys at driver startup. The driver clears its busy flag and triggers this proxy to tell the filesystem that the event is complete and that it can process the results. There are two types of entry points into the driver: synchronous and asynchronous. An asynchronous entry point must ALWAYS trigger the proxy, even if the request can be satisfied immediately.

The remainder of this document delves into the details of building device drivers. The first section, 'A little INTEL architecture', provides some of the basic concepts of the Intel architecture, and some of the caveats on driver writing in this architecture. The second section, 'the driver interface', describes the mechanism used to link the driver into fsys, and introduces the data structures used.
The third section, 'the data structures', describes the memory shared between the driver and the filesystem. The fourth section, 'the entry points', describes the function calls made by the filesystem.

1: A little INTEL architecture:
The Intel processors implement address space protection based upon arbitrary-length segments of memory. The upper bound for the segment length is 65536 (2**16) for the Intel '86, '186, '286 series, and 4294967296 (2**32) for the '386, '486 series. A 'logical address' consists of 2 components: the 'selector', which specifies the desired memory segment, and the 'offset', which is added to the base address of the memory segment to form a physical address.

A selector is an index into a table of segment descriptors. The descriptor holds such information as the base physical address of the segment and the maximum size of the segment. It also maintains some flags relating to the type of segment, i.e. code, data, read-only, system. These flags are of little interest to the driver writer.

The selector contains three fields within a 16-bit value: Privilege Level, Descriptor Table, and Table Index. The Privilege Level allows the operating system to control access to certain instructions. The Descriptor Table field (1 bit) specifies one of two descriptor tables: the Local Descriptor Table (LDT) or the Global Descriptor Table (GDT). Each process 'owns' an LDT. All processes may access the GDT. For further information, consult the Intel Programmer's Reference Manual for the desired component.

When disk drivers are coded, some attention must be paid to details of addressability. Fsys adopts the principal code and data selectors of the driver process. Unfortunately, due to the architecture of the descriptor tables, the selectors have different values when duplicated in fsys's tables, although they refer to the same memory segments. For example, a selector of value '0x0004' in the driver process may be duplicated to '0x0064' in fsys's space.
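To make the selector layout described above concrete, here is a minimal sketch in portable C that splits a 16-bit selector into its three fields and rebuilds it. It assumes the bit positions documented by Intel for the 286/386 (bits 0-1 privilege level, bit 2 table indicator, bits 3-15 table index). The names decode_selector, make_selector and struct selector_fields are hypothetical and are not part of QNX or the Fsys Driver Library.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 16-bit segment selector (bit layout per Intel's manuals). */
struct selector_fields {
    unsigned rpl;      /* bits 0-1: privilege level               */
    unsigned use_ldt;  /* bit 2: 0 = GDT, 1 = LDT                 */
    unsigned index;    /* bits 3-15: index into descriptor table  */
};

/* Split a selector into its three fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.rpl     = sel & 0x3u;
    f.use_ldt = (sel >> 2) & 0x1u;
    f.index   = (unsigned)sel >> 3;
    return f;
}

/* Rebuild a selector from its fields (the inverse of decode_selector). */
uint16_t make_selector(unsigned index, unsigned use_ldt, unsigned rpl)
{
    assert(index < 8192u && use_ldt < 2u && rpl < 4u);
    return (uint16_t)((index << 3) | (use_ldt << 2) | rpl);
}
```

Applied to the two example values from the text, both selectors decode as LDT entries at privilege level 0: 0x0004 is table index 0 and 0x0064 is table index 12. The same segment therefore sits at different table positions in the two processes, which is why driver code must not hard-code selector numbers.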
It is important that any code in the driver which may be executed directly by fsys (via far call or interrupt) NEVER explicitly references a selector number in its LDT. Fortunately, the Watcom compiler, wcc, in small model will not generate references to either the 'code' or the 'data' segment unless explicitly required (to promote a 'near' pointer to a 'far' pointer). Further, when Stack Segment != Data Segment is specified (-zu), it will promote addresses using the current values of the segment registers, rather than using constants for them. On occasions where the driver must promote a pointer, it should do so as follows:

    MK_FP(my_ds(), pointer_value)

1: The driver as a process:
The driver *may* run as a process. This is a necessity for any driver which operates in an 'io-polled' mode as opposed to interrupt-driven, or which requires other system processing (perhaps access to network or other services). Since a driver running as a process will have a separate address space from the File System, accessing cache blocks requires some extra work. The current release of the File System does not support a mechanism to permit sharing the address space very effectively (see DMA controller). The driver may establish addressability by 'arming' Fsys's LDT to its process (see qnx_segment_arm()). qnx_segment_info() may then be used to establish the size of the LDT, and thus gain alias selectors for the File System address space. A future release of the filesystem will support a mechanism to ease this process.

2: Using the DMA controller.
DMA buffers, using standard AT DMA channels, must be 'DMA-aligned'. That is, the buffer's physical memory addresses cannot cross a 64 Kbyte page boundary. This is because the DMA controller is only 16 bits wide, and the upper 8 bits of the physical address are provided by a special page register. If the 16-bit lower address overflows, the page register will not receive the carry, and memory corruption may occur.
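The 64 Kbyte restriction above can be checked with simple arithmetic. The sketch below is an illustration only, assuming the 16-bit counter plus 8-bit page register model described in the text. The function names dma_aligned and dma_effective_addr are hypothetical and are not part of the Fsys Driver Library.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if the buffer [phys, phys + nbytes) lies within a single
 * 64 Kbyte page, 0 if it crosses a page boundary. */
int dma_aligned(uint32_t phys, uint32_t nbytes)
{
    uint32_t last;

    assert(nbytes > 0);
    last = phys + nbytes - 1;          /* address of the final byte */
    return (phys >> 16) == (last >> 16);
}

/* Model of the address the hardware would really drive for byte 'i'
 * of a transfer: the page register stays fixed while the 16-bit
 * counter wraps, so the carry out of bit 15 is lost. */
uint32_t dma_effective_addr(uint32_t phys, uint32_t i)
{
    uint32_t page = phys & 0xFF0000u;        /* latched page register  */
    uint32_t low  = (phys + i) & 0xFFFFu;    /* wrapping 16-bit count  */
    return page | low;
}
```

For example, a transfer starting at physical address 0x1FFFF would place its second byte at 0x10000 rather than 0x20000 under this model, which is the corruption the text warns about.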
Some hardware interfaces are not limited by this overflow problem, notably the Adaptec SCSI controllers, IBM PS/2 ESDI and IBM PS/2 SCSI.
Using a mechanism in the driver_init() call,

2: the driver interface
A disk driver is a process, which may execute based upon its own scheduling parameters. At some point, the process binds to the filesystem by sending it a message specifying its desired parameters. There is a standard routine for performing this operation, outlined below:

    int fsys_mount_driver(
            char            *fsys_name,
            unsigned short   ndevs,
            struct _disk    *device_table,
            int            (**jmp_table)(),
            unsigned short   max_string,
            unsigned short   max_par,
            void            *stack_ptr);

This routine finds a process with the locally registered name 'fsys_name', and builds a message to bind the driver to this process.

The parameter 'ndevs' tells fsys the maximum number of physical devices attached to this driver.

The parameter 'device_table' is a communication area between fsys and the driver. There is assumed to be one structure for each of the 'ndevs' devices, with the structure 'struct _disk' described later in this document.

The parameter 'jmp_table' (called 'jmp_tbl' in the rest of this document) contains the entry points to the driver. The structure of this table is described later in this document.

The parameter 'max_string' specifies the largest string of sequential IO to be performed in one request. This parameter should reflect such needs as the size of any DMA-type buffers, or limitations in the hardware.

The parameter 'max_par' specifies how many of the physical devices may be performing IO operations simultaneously. Some devices, such as SCSI Host Adapters, are capable of simultaneously tracking IO requests to all attached devices. To increase throughput, the filesystem can attempt to keep all devices busy if it knows the driver is capable of handling multiple requests.

The parameter 'stack_ptr' specifies the stack frame to be used when the driver is called by the filesystem.
While the process is Reply Blocked on fsys, fsys calls the driver_init and driver_open routines. This allows the driver to perform any necessary configuration steps while it has access to the fsys address space and still exists as a process in its own right. Typically, the driver installs any interrupt handlers and allocates DMA buffers during driver_init, and determines which devices and parameters are available during driver_open.

3: Data Structures.
The shared structure for a physical disk is as follows:

    struct _disk {
        long                    d_num_blks;
        unsigned short          d_type;
        unsigned short          d_cyls;
        unsigned short          d_sctr_cyl;
        unsigned short          d_heads;
        unsigned short          d_sctr_trk;
        unsigned short          d_sctr_base;
        unsigned short          d_precomp;
        volatile unsigned short d_busy;
        volatile unsigned short d_nblks;
        volatile unsigned short d_retval;
        unsigned short          d_reserved[4];
    };

The fields have the following semantics:

    d_num_blks:   The total number of blocks on the media.
    d_type:       The classification of the drive:
                      _UNMOUNTED  (0)
                      _FLOPPY     (1)
                      _HARD       (2)
                      _RAMDISK    (3)
                      _REMOVABLE  (4)
    d_cyls:       Total number of cylinders.
    d_sctr_cyl:   Total number of sectors per cylinder.
    d_heads:      Total number of heads.
    d_sctr_trk:   Total number of sectors per track.
    d_sctr_base:  The starting sector number (typically 1).
    d_precomp:    Write precompensation cylinder.
    d_busy:       A lock set by the driver informing the filesystem that the device is busy.
    d_nblks:      At the termination of an IO request, this field contains the number of blocks successfully transferred. After a driver control request, this field should be set to "1".
    d_retval:     A success/fail code for the request. The following values are defined:
                      _DRVR_OK            (0)       Request was successful.
                      _DRVR_PART_IO       (1)       The first d_nblks blocks were successful. The next block caused an error.
                      _DRVR_RESERVED      (2)       Future provision.
                      _DRVR_UNSPEC        (-1)      General-purpose failure.
                      _DRVR_ERROR         (0x8001)  Hardware failed temporarily (i.e. floppy door open).
                      _DRVR_DISK_FAILURE  (0x8002)  Hardware failed completely!
The purpose of these codes is to aid fsys in its caching logic. For example, if there are 100 blocks to write to disk, and the first block returns _DRVR_ERROR, it makes little sense to attempt the next 99 blocks.

Cache Blocks:

    struct _block {
        unsigned short  b_next_off;
        unsigned short  b_reserved;
        long            b_block;
        char __far     *b_ptr;
    };

The cache blocks form a linked list of the above structure. For performance, each list resides within a single 80x86 segment, thus only the offset is required to traverse the list. The following macro will return the next cache block:

    #define CACHE_NEXT(fp) (((char __far *)(fp) - FP_OFFSET(fp)) + (fp)->b_next_off)

The b_block field is the physical disk address of the block. The b_ptr field points to where the block resides, or will reside, in memory.

4: The entry points.
The 'jmp_tbl' contains the addresses of 8 routines, with the following semantics:

jmp_tbl[_DRIVER_INITIALIZE]
    int (*)(pid_t drvr_proxy);
This routine is called once, synchronously, when the driver is started. The 'drvr_proxy' should be remembered in the driver's space, as it is the proxy to trigger to wake up the filesystem at the completion of an IO request. Typically, this routine is used to load an interrupt handler, if the handler must be attached to 'fsys'. Since this routine is run in 'fsys' space, it is a convenient place to allocate DMA buffers, if necessary. This routine should return 0. If the driver returns non-zero, the attaching of the driver will be abandoned.

jmp_tbl[_DRIVER_TERMINATE]
    int (*)(void);
This routine is called once, synchronously, to detach the driver. The driver should release any resources it has acquired, such as DMA buffers and attached interrupts.

jmp_tbl[_DRIVER_DISK_OPEN]
    int (*)(short int drive_id, short int protected_mode);
This routine is called once for each device, [a]synchronously, to set up the parameters of each device. Any startup processing required for the device should be performed at this step.
The call is designed to be asynchronous, but is currently synchronous. Existing drivers treat the call as asynchronous, but perform a "Receive(driver_proxy,0,0);" before returning. The protected mode flag informs the process whether it is running in segment-translation mode or real-address mode of the processor. This is useful for locating 'bios-parameter variables' in the setup code.

jmp_tbl[_DRIVER_DISK_CLOSE]
    int (*)(int drive_id);
This routine is called once for each device, [a]synchronously.
@@@ This allows a device to be 'de-referenced', for what purpose?

jmp_tbl[_DRIVER_IO]
    int (*)(short unsigned disknum, short unsigned request, short unsigned numblocks, struct _block __far *cacheptr);
This routine initiates an IO request. The disknum is an index into the 'struct _disk' table passed when mounted. The request is either READ_BLK (0) or WRITE_BLK (1). The numblocks informs the driver of the number of sequential blocks to perform I/O on. The cacheptr is a pointer to the first in a chain of cache blocks. This routine must be declared as "#pragma aux (drvr_io) func_name;" to match the calling convention used by fsys. This routine is asynchronous, thus at completion of the request the proxy must be triggered to wake fsys.

jmp_tbl[_DRIVER_CONTROL]
    int (*)(short unsigned disknum, short unsigned request, short unsigned numblocks, void __far *cdata);
This routine allows the driver to perform a specific service based upon a user request. The user request is issued by qnx_ioctl(); the short request value comes from that call, and 'cdata' is a 'copy-in, copy-out' 512-byte buffer from/to the user data space. The semantics are the same as a driver_io request, except that driver_control must ALWAYS set d_nblks to 1.

5: Driver Library.

int alloc_dma(dma_t *dbuf, pid_t opid, long nbytes)
Description: alloc_dma provides a convenient cover function for allocating DMA-aligned buffers from proc.
At some point, it should be changed to use 'nbytes' as a "target", and return how close to that "target" can be achieved.

void cache_to_buf(char __far *bufp, BLOCK __far *cp, unsigned nblocks)
Description: cache_to_buf copies nblocks from 'cp' to 'bufp', walking the appropriate cache chain. Very handy for 'copy-in' to a DMA buffer.

void buf_to_cache(BLOCK __far *cp, char __far *bufp, unsigned nblocks)
Description: buf_to_cache copies nblocks from 'bufp' to 'cp', walking the cache chain. Very handy for 'copy-out' from a DMA buffer.

int get_cpu_speed(void)
Description: Returns the 'cpu_speed' portion from qnx_os_info. This value is stored in a static for use by 'usec_delay'. Handy for biasing timing loops.

void usec_delay(int x)
Description: Busy waits for >= 'x' microseconds. It errs on the conservative side. May have to be changed if the '586 goes super-scalar.

int setup_dma(unsigned dflags, dma_t *iobufp, unsigned iosize, int channel)
Description: setup_dma programs the DMA controller for the appropriate action. The action is encoded in dflags as the OR of the following manifests:
    DMA_PS2:     use PS/2 specific DMA.
    DMA_WORD:    iosize is in 16-bit quantities.
    DMA_OUTPUT:  the action is OUTPUT (to io port) rather than INPUT.
The phys_addr portion of '*iobufp' is loaded into the controller, the terminal count is defined by iosize, and the channel (0-7) is set. The DMA controller will now respond to DMA requests on the desired channel.

int clear_dma(unsigned dflags, int channel)
Description: Clears the given channel. The only flag of interest is "DMA_PS2". This should be performed if an operation is being terminated, to return the DMA controller to a known state (idle).

long phys_addr(void __far *bufp)
Description: Returns the physical address of the object pointed to by 'bufp'. Currently, this routine uses a call to 'proc', but this can easily be changed to dereference the LDT/GDT for the value.

int get_cmos(int addr)
Description: Returns entry 'addr' from the CMOS memory.
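Returning to cache_to_buf() as described earlier in this section, the copy loop can be sketched in portable C. This is a simulation under stated assumptions, not the library's source. Far pointers are replaced by ordinary pointers, the segment base is passed explicitly (the real CACHE_NEXT macro recovers it from the far pointer), and the block size is assumed to be 512 bytes. The names sim_block, sim_cache_next and sim_cache_to_buf are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512   /* assumption: one cache block is 512 bytes */

/* Flat-memory stand-in for 'struct _block'. */
struct sim_block {
    unsigned short b_next_off;  /* offset of next block header in segment */
    unsigned short b_reserved;
    long           b_block;     /* disk block number                      */
    char          *b_ptr;       /* stands in for 'char __far *'           */
};

/* Equivalent of CACHE_NEXT: segment base plus the stored offset. */
struct sim_block *sim_cache_next(char *seg_base, struct sim_block *cp)
{
    return (struct sim_block *)(seg_base + cp->b_next_off);
}

/* Gather 'nblocks' cache blocks, in chain order, into one contiguous
 * buffer, as a driver might do before starting a DMA write. */
void sim_cache_to_buf(char *bufp, char *seg_base,
                      struct sim_block *cp, unsigned nblocks)
{
    while (nblocks-- > 0) {
        assert(cp->b_ptr != NULL);
        memcpy(bufp, cp->b_ptr, BLOCK_SIZE);
        bufp += BLOCK_SIZE;
        cp = sim_cache_next(seg_base, cp);
    }
}
```

A buf_to_cache() counterpart would follow the same walk with the memcpy() source and destination exchanged.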
This is a package of routines for dumping info to the screen. Very handy from interrupt handlers that must be cautious of affecting timing or making kernel calls.

int set_display(void __far *videoptr, int nrows, int ncols)
Description: Defines the location of the screen, and the number of rows and columns to use. The 'screen' is always assumed to be 80 columns wide (80*2 bytes), but 'ncols' allows you to use only a corner of the screen, and by specifying an offset in 'videoptr' you can move that region horizontally. Setting 'videoptr' to NULL will effectively disable the display.

int set_attr(int attr)
Description: Use 'attr' when printing on the screen.

int disp_char(int c)
Description: Displays character c, then moves to the next location on the screen. The screen wraps horizontally and vertically. Newline ('\n') is interpreted to clear the remainder of the line and move to the next line.

int disp_str(char *s)
Description: Displays the null-terminated string 's'.

int disp_hex(int h)
Description: Displays the value 'h' in hex, with one trailing space.

int disp_lhex(long h)
Description: Displays the long value 'h' in hex, with one trailing space.
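The behaviour described for set_display(), disp_char() and disp_hex() can be sketched as follows. This is a simulation written from the descriptions above, not the library's code. It assumes a text-mode layout of two bytes per cell (character, then attribute), a default attribute of 0x07, and four upper-case hex digits for a 16-bit int. All sim_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SCREEN_COLS 80              /* physical row width, per the text */

static unsigned char *video;        /* 2 bytes per cell: char, attribute */
static int rows, cols, cur_row, cur_col;
static int cur_attr = 0x07;         /* assumed default attribute */

int sim_set_display(void *videoptr, int nrows, int ncols)
{
    assert(videoptr == NULL ||
           (nrows > 0 && ncols > 0 && ncols <= SCREEN_COLS));
    video = videoptr;
    rows = nrows;
    cols = ncols;
    cur_row = cur_col = 0;
    return 0;
}

static void put_cell(int c)
{
    unsigned char *cell = video + 2 * (cur_row * SCREEN_COLS + cur_col);
    cell[0] = (unsigned char)c;
    cell[1] = (unsigned char)cur_attr;
}

static void next_line(void)
{
    cur_col = 0;
    cur_row = (cur_row + 1) % rows;     /* wrap vertically */
}

int sim_disp_char(int c)
{
    if (video == NULL)
        return 0;                       /* display disabled */
    if (c == '\n') {                    /* clear rest of line, advance */
        while (cur_col < cols) {
            put_cell(' ');
            cur_col++;
        }
        next_line();
        return 0;
    }
    put_cell(c);
    if (++cur_col >= cols)
        next_line();                    /* wrap horizontally */
    return 0;
}

int sim_disp_hex(int h)
{
    static const char digits[] = "0123456789ABCDEF";
    int shift;

    for (shift = 12; shift >= 0; shift -= 4)
        sim_disp_char(digits[((unsigned)h >> shift) & 0xF]);
    return sim_disp_char(' ');          /* one trailing space */
}
```

Because the routines only store bytes into a buffer and never make kernel calls, a design along these lines stays usable from an interrupt handler, which is the use the text has in mind.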