Setting up the Fourth Berkeley Software Tape*

Draft of: November 15, 1980

William N. Joy
Ozalp Babaoglu
Keith Sklower

Computer Science Division
Department of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, California 94720

The basic distribution tape can be used only on a DEC VAX-11/780** with RM03, RM05 or RP06 disks and with TE16, TU45 or TU77 tape drives. We have the ability to make tapes for systems with UNIBUS** disks, but such tapes are inherently rather system-specific, and will not be discussed here.

The tape consists of some preliminary bootstrapping programs followed by one dump of a filesystem (see dump(8)|-|-) and one tape archive image (see tar(1)); if needed, individual files can be extracted after the initial construction of the filesystems.

If you are set up to do it, it is a good idea immediately to make a copy of the tape to guard against disaster. The tape is 9-track 1600 BPI and contains some 512-byte records followed by many 10240-byte records. There are interspersed tapemarks; end-of-tape is signalled by a double end-of-file.

The tape contains binary images of the system and all the user level programs, along with source and manual sections for them. There are about 5600 UNIX|- files altogether. The first tape file contains bootstrapping programs. The second tape file is to be put on one filesystem called the `root filesystem', and contains essential binaries and enough other files to allow the system to run. The third tape file has all of the source and documentation. Altogether the files provided on the tape occupy approximately 52000 512-byte blocks.|=

-----------
*Portions of this document are adapted from ``Setting Up Unix/32V Version 1.0'' by Thomas B. London and John F. Reiser.
** DEC, VAX, UNIBUS and MASSBUS are trademarks of Digital Equipment Corporation.
|- UNIX is a Trademark of Bell Laboratories.
|-|- References of the form X(Y) mean the subsection named X in section Y of the UNIX programmer's manual.
|= UNIX traditionally talks in terms of 512 character blocks, and for consistency across different versions of UNIX and to avoid mass confusion, user programs in the Virtual Vax version of the system also talk in terms of 512-byte blocks, despite the fact that the file system allocates 1024-byte blocks of disk space. All user programs such as ls(1) and du(1) speak in terms of 512-byte blocks; only system maintenance programs such as mkfs(8), fsck(8), dump(8), and df(1) speak in terms of 1024-byte blocks. It is true that i/o is most efficient in 1024-byte quantities, but it is most natural for the user to think of this as ``2 blocks at a time.'' In any case, packs remain sectored 512 bytes per sector, and at the lowest driver levels the system deals with 512-byte disk records.

Making a disk from tape

Before you begin to work on the remainder of this document, be sure you have an up-to-date manual, and that you have applied all updates to the manual which were provided with it, in the correct order.

Perform the following bootstrap procedure to obtain a disk with a root filesystem on it.

1. Mount the magtape on drive 0 at load point, making sure that the ring is not inserted.

2. Mount a disk pack on drive 0. It is preferable that the pack be formatted (e.g. using the standard DEC standalone utility); we provide programs for formatting RM03's and RP06's below, but RM05's must be pre-formatted by the DEC utility.

3. Key in at location 50000 and execute the following boot program. You may enter in lower-case; the LSI-11 will echo in upper-case. On each line the machine's printout (a prompt or message) comes first and whatever you type follows it; explanatory comments are within ( ). Terminate each line you type by carriage return or line-feed.
     >>>HALT
     >>>UNJAM
     >>>INIT
     >>>D 50000 20009FDE
     >>>D+ D0512001
     >>>D+ 3204A101
     >>>D+ C114C08F
     >>>D+ A1D40424
     >>>D+ 008FD00C
     >>>D+ C1800000
     >>>D+ 8F320800
     >>>D+ 10A1FE00
     >>>D+ 00C139D0
     >>>D+ 00000004
     >>>E 50000/NE:A
     ... (machine prints out values, check typing)
     >>>START 50000

   The tape should move and the CPU should halt at location 5002A. If it doesn't, you probably have entered the program incorrectly. Start over and check your typing.

4. Start the CPU with

     >>>START 0

5. The console should type

     =

   If you have an RM-05 you must already have a formatted pack, and should skip to step 7. If the disk pack is otherwise already formatted, skip to step 6. Otherwise, format the pack with:

     (bring in standalone RP06 formatter)
     =rp6fmt
     format : Format RP06/RM03 Disk
     MBA no. : 0                   (format spindle on mba)
     unit : 0                      (format unit zero)
     (this procedure should take about 20 minutes for an RP06, 10 for an RM03)
     (some diagnostic messages may appear here)
     unit : -1                     (exit from formatter)
     =                             (back at tape boot level)

6. Next, verify the readability of the pack via

     (bring in RP06 verifier)
     =rpread
     dread : Read RP06/RM03 Disk
     disk unit : 0                 (specify unit zero)
     start block : 0               (start at block zero)
     no. blocks :                  (default is entire pack)
     (this procedure should take about 10 minutes for an RP06)
     (some diagnostic messages may appear here)
     # Data Check errors : nn      (number of soft errors)
     # Other errors : xx           (number of hard errors)
     disk unit: -1                 (exit from rpread)
     =                             (back to tape boot)

   If the number of `Other errors' is not zero, consideration should be given to obtaining a clean pack before proceeding further.

7. Create the root file system with the following procedure:

     (bring in a standalone version of the mkfs(8) program)
     =mkfs
     file sys size: 7942           (number of 1024 byte blocks in root)
     file system: hp(0,0)          (root is on drive zero; first filsys there)
     isize = 5072                  (number of inodes in root filesystem)
     m/n = 3 500                   (interleave parameters)
     =                             (back at tape boot level)

You now have an empty UNIX root filesystem. To restore the data which you need to boot the system, type

     (bring in a standalone restor(8) program)
     =restor
     Tape? ht(0,1)                 (unit 0, second tape file)
     Disk? hp(0,0)                 (into root file system)
     Last chance before scribbling on disk. (just hit return)
     (30 second pause then tape should move)
     (tape moves for a few minutes)
     End of tape
     =                             (back at tape boot level)

Now you are ready to boot up.

Booting UNIX

Now boot UNIX:

     (load bootstrap program)
     =boot
     Boot : hp(0,0)vmunix          (bring in vmunix off root system)

The bootstrap should then print out the sizes of the different parts of the system (text, initialized and uninitialized data) and then the system should start with a message which looks like:

     87844+15464+130300 start 0x530
     VM/UNIX (Berkeley Version 4.1) 11/10/80
     real mem = xxx
     avail mem = yyy
     WARNING: preposterous time in file system -- CHECK AND RESET THE DATE!
     ERASE IS CONTROL-H!!!
     #

The mem messages give the amount of real (physical) memory and the memory available to user programs in bytes. For example, if your machine has only 512K bytes of memory, then xxx will be 524288, i.e. exactly 512K.

The ``ERASE-IS'' message is part of /.profile which was executed by the root shell when it started. You will probably want to change /.profile somewhat.
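
Two of the figures above mix units: the standalone mkfs takes the root size in 1024-byte blocks, while the boot messages report memory in bytes. The sketch below does both conversions with shell arithmetic. The function names are hypothetical helpers made up for illustration; they are not programs on the tape.

```shell
#!/bin/sh
# Sketch only: these helper names are illustrative, not part of the system.

# Convert a count of 1024-byte blocks (the unit given to mkfs) into
# 512-byte blocks (the unit used by user programs such as ls and du).
blocks_1k_to_512() {
    echo $(( $1 * 2 ))
}

# Convert a memory size in K bytes into bytes, the unit of the
# "real mem" boot message.
kbytes_to_bytes() {
    echo $(( $1 * 1024 ))
}

blocks_1k_to_512 7942    # root file system size from the mkfs step
kbytes_to_bytes 512      # a machine with 512K bytes of memory
```

Run as written, the two calls print 15884 and 524288.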

UNIX is now running, and the `UNIX Programmer's manual' applies. The `#' is the prompt from the Shell, and indicates you are the super-user. You should first check the integrity of the root file system by giving the command

     # fsck /dev/rrp0a

The output from fsck should look something like:

     /dev/rrp0a
     File System: Volume:
     ** Checking /dev/rrp0a
     ** Phase 1 - Check Blocks and Sizes
     ** Phase 2 - Check Pathnames
     ** Phase 3 - Check Connectivity
     ** Phase 4 - Check Reference Counts
     ** Phase 5 - Check Free List
     282 files 1766 blocks 5677 free

If there are inconsistencies in the file system, you may be prompted as to whether to apply corrective action; see the document describing fsck for information.

Extracting /usr.

The next thing to do is to extract the rest of the data from the tape. Comments are enclosed in ( ); don't type these. The number in the mkfs command is the size of the filesystem to be created, in 1024 character blocks, just as given to the standalone version of mkfs above. (If you have an RM-03 rather than an RM-05 or RP-06 use ``41040'' rather than ``145673'' in the procedure below.)

     # date yymmddhhmm               (set date, see date(1))
     # passwd root                   (set password for super-user)
     New password:                   (password will not echo)
     Retype new password:
     # /etc/mkfs /dev/rrp0g 145673   (create empty user filesystem)
     isize = 65488                   (this is the number of available inodes)
     m/n = 3 500                     (freelist interleave parameters)
     (this takes a few minutes)
     # /etc/mount /dev/rp0g /usr     (mount the usr filesystem)
     # cd /usr                       (make /usr the current directory)
     # cp /dev/rmt12 /dev/null       (skip first tape file (tp format))
     # cp /dev/rmt12 /dev/null       (skip second tape file (root))
     # tar xpb 20                    (extract the usr filesystem)
     (this takes about 20 minutes)
     # rmdir lost+found
     # /etc/mklost+found             (a directory for fsck)
     # dd if=/usr/mdec/uboot of=/dev/rrp0a bs=1b count=1
     (write boot block so reboot(8) disk boot scheme will work)
     # cd /                          (back to root)
     # chmod 755 / /usr
     # /etc/umount /dev/rp0g         (unmount /usr)

All of the data on the tape has now been extracted. The tape will rewind automatically. You should now check the consistency of the /usr file system by doing

     # fsck /dev/rrp0g

In order to use the /usr file system, you should now remount it by saying

     # /etc/mount /dev/rp0g /usr

Making a UNIX boot floppy

The next thing to do is to make a new UNIX boot floppy, by adding some files to a copy of your current console floppy, using flcopy and arff(8). Place your current floppy in the console, and issue the following commands:

     # cd /usr/src/sys/floppy
     # mkdir fromdec                 (scratch sub-directory)
     # cd fromdec
     # arff xv                       (extract all files from floppy)
     (list of files prints out)
     # flcopy -t3
     (system reads header information off current floppy)
     Change Floppy, Hit return when done.
     (waits for you to put clean floppy in console)
     (copies header information back out after you hit return)
     # rm floppy                     (don't need copy of old header information)
     # rm dm* db* vmb.exe            (don't need these files with UNIX)
     # arff cr *                     (add basic files)
     Are you sure you want to clobber the floppy? yes
     (clobbering is, essentially, a mkfs)
     # cd ..
     # rm -r fromdec                 (remove scratch directory)
     # arff r *                      (add UNIX boot files to floppy)

More copies of this floppy can be made using flcopy.

You should now be able to reboot using the procedures in reboot(8). First you should turn on the auto-reboot switch on the machine. Then try a reboot, saying

     # /etc/reboot -s

which you can read about in reboot(8).

Taking the system up and down

Normally the system reboots itself and no intervention is needed at the console command level (e.g. to the LSI-11 ``>>>'' prompt.)|- In fact, after such a reboot, the system normally checks the disks and then goes back to multi-user mode. Such a reboot can be stopped (after it prints the date) with a delete (interrupt).

-----------
|- If you are going to make your root device be something other than a MASSBUS disk, you will have to change the file RESTAR.CMD on the floppy, to pass the block device index of the major device to be booted from to the system in a register after, e.g., a power-fail.

If booting from the console command level is needed, then the command

     >>> B RPS

will boot from unit 0 on mba 0 (RM-03, RM-05 or RP-06), bringing in the file ``hp(0,0)vmunix''. Other possibilities are ``B RPM'' which boots and runs the automatic reboot procedure or ``B ANY'' which boots and asks for the name of the file to be booted. Note that ``B'' with no arguments is used internally by the reboot code, and shouldn't be used.

To bring the system up to a multi-user configuration from the single-user status after, e.g., a ``B RPS'', all you have to do is hit control-d on the console. The system will then perform /etc/rc, a multi-user restart script, and come up on the terminals which are indicated in /etc/ttys. See init(8) and ttys(5). Note, however, that this does not cause a file system check to be performed. Unless the system was taken down cleanly, you should run ``fsck -p'' or force a reboot with reboot(8) to have the disks checked.

To take the system down to a single user state you can use

     # kill 1

when you are up multi-user. This will kill all processes and give you a shell on the console, as if you had just booted.

If you wish to change the terminal lines which are active you can edit the file /etc/ttys, changing the first characters of lines, and then do

     # kill -1 1

See init(8) and getty(8) for more information.

Backing up the system

Before you change the source for the kernel it is wise to make a backup. The following will do this:

     # cd /usr/src
     # mkdir distsys distsys/h distsys/sys distsys/dev distsys/conf distsys/stand
     # cd sys/sys
     # cp * /usr/src/distsys/sys
     # cd ../h
     # cp * /usr/src/distsys/h
     # cd ../dev
     # cp * /usr/src/distsys/dev
     # cd ../conf
     # cp * /usr/src/distsys/conf
     # cd ../stand
     # cp * /usr/src/distsys/stand

This allows you to find out what you have done to the distribution system by later running the command diff(1), comparing these directories.

Organization

The system source is kept in the subdirectories of /usr/src/sys. The directory sys contains the mainline kernel code, implementing system calls, the file system, virtual memory, etc. The directory dev contains device drivers and other low-level routines. The directory conf contains CPU-dependent information on physical and logical device configuration.
The directory h contains all the header files, defining structures and system constants.

N.B.: The system header files in /usr/src/sys/h are copies of the files in /usr/include/sys. Since programs which depend on constants in /usr/include/sys/param.h must correspond to the running system, you should be careful to make new header files available whenever you resize the system or otherwise change the header files.

You should expect to have to make some changes to the conf directory and perhaps to some of the header files in the h directory to resize some things.

The conf, sys and dev directories each have their own makefile controlling recompilation. The sys makefile is the master makefile; new versions of the kernel are created as ``vmunix'' in the sys directory. If changes are made in other directories, then the following commands:

     # cd ../sys
     # rm vmunix
     # make

will cause all needed recompilations to be done. It is often convenient when making changes in dev or conf to just run make there first.

Finally, note the directory stand, which is not part of the system proper. If you add new peripherals such as tapes or disks you will want to extend the standalone system to be able to deal with these. It, too, has a makefile which you can look at.

Isolating local changes conditionally

You will notice that the system, as distributed, has conditional code in it. The current Berkeley system, on ``Ernie Co-vax'', is made by defining IDENT in the makefiles to be

     IDENT= -DERNIE -DUCB

This enables the conditional code both for Berkeley and for this particular machine.
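
A site other than Berkeley would define IDENT differently. The sketch below shows one way to produce a makefile with a new IDENT line. It assumes the makefile holds a single line beginning with ``IDENT='' as quoted above; set_ident is a hypothetical helper and MYVAX is only a placeholder name, and neither is part of the distribution.

```shell
#!/bin/sh
# Sketch only. set_ident is a hypothetical helper and MYVAX a placeholder
# name; neither appears on the distribution tape.

# Print the makefile named by $1 with its IDENT line replaced by the
# flags given in $2. The file itself is not modified.
set_ident() {
    sed "s/^IDENT=.*/IDENT= $2/" "$1"
}

# Try it on a scratch file holding the line quoted in the text.
scratch=${TMPDIR:-/tmp}/ident.$$
echo 'IDENT= -DERNIE -DUCB' > "$scratch"
set_ident "$scratch" "-DMYVAX"
rm -f "$scratch"
```

Because the helper only prints the result, its output can be redirected to a new file and compared against the original with diff(1) before anything is replaced.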
It is traditional to pick a monicker for your machine, and change IDENT to reflect it, and to then put in changes conditionally whenever this makes sense. You can be guided by the ERNIE conditional code, which you will probably want to disable.

Device drivers

The running UNIX system is configured to run with the given disk and tape, a console, 32 DZ11 lines, 32 DH11 lines, a Varian printer/plotter, a Versatec printer/plotter, and 2 AMPEX 9300 disks on an EMULEX controller on the UNIBUS. This is probably not the correct configuration. It is easy to correct the configuration information to reflect the true state of your machine.

For each device driver there are certain magic numbers and configuration parameters kept in header files (suffixed .h) in the conf directory that you can change. The device addresses of each device are defined there, as are other specifications such as the number of devices. You should edit each .h file in conf and change it as appropriate for your machine.

If you have any non-standard device addresses, just change the address and recompile. If the device's interrupt vector address(es) are different from those currently known to the system (this is likely), then the file /usr/src/sys/conf/univec.c must be modified appropriately: namely, the proper interrupt routine addresses must be placed in the table `UNIvec'.

Now, make sure you add any new drivers which you have to the list of DRIVERS in the makefile in dev, and to the FILES and CFILES variables in this makefile and FILES3 and CFILES3 in ../sys/makefile.

As described in ``Basics of disk layout'' below, the first line of the makefile in the sys directory controls the root device and swap disk layout by picking one of the confhp.c, confup.c, etc. files of the conf directory. This should be chosen to reflect the desired disk layout before the system is recompiled.

Non-integrated device drivers

As of this date, several device drivers which were available at various times have not been integrated into the conditional compilation of the dev and conf directories. The source for these device drivers is in the directory /usr/src/sys/newdev. These device drivers will be made part of /usr/src/sys/dev and /usr/src/sys/conf as we have opportunity to test them. It should not be hard for you to integrate these drivers; if you wish, you can contact us as we will be able to supply integrated versions very soon. Here is a list of the drivers which are currently awaiting integration:

     dhdm    DM11 modem control driver
     kl      KL/DL-11 communications driver
     lp      LP11 line printer driver
     rk7     RK07 driver
     tm      TM11 unibus tape driver

In addition, there is another rk07 driver (hk.c). Code to allow mixing of tapes and disks among and across several MBA's is also in the works, but is not supplied on this tape. Contact us if you have need for this code.

If you integrate these or other drivers we would appreciate receiving a copy in the mail so that we can avoid duplication of effort. (Other comments on the system or these instructions are, of course, also welcomed.)

Makefile entry points

The makefiles have several additional useful entry points:

     clean
          Cleans out the directory, removing .o files and the like.

     lint
          (In sys) Runs lint on the system.

     depend
          Creates a new makefile indicating dependencies on header files by running a search through .c files looking for ``#include'' lines. Make sure you format your code like the rest of the system so that this will work.

     print
          (In sys) Produces a nice listing of most everything in the system directory in a canonical order.

     symbols.sort
          (In sys) Creates a new file for sorting symbols in the system namelist.
          If you have locally written programs which use the system namelist you can put the symbols which they reference in symbols.raw and they will be moved at system generation to the front of the system namelist for quicker access.

     tags
          Creates a tags file for ex, to make editing of the system much easier.

Basic constants

Before running make, you should check the definition of the constants in /usr/src/sys/h/param.h. The constants NBUF, NINODE, NFILE, NPROC, and NTEXT can be changed, and also TIMEZONE and perhaps HZ if you run on 50 cycles.|- There are also tunable constants in the file /usr/src/sys/h/vmtune.h but ignore them for the time being. As distributed, the system is tuned for a fairly large machine (i.e. 2+ Megabytes of memory and 2-3 disk arms).

To generate a completely recompiled UNIX do

     # cd /usr/src/sys/sys
     # make clean ; cd ../dev ; make clean ; cd ../conf ; make clean
     # cd ../sys
     # make depend ; cd ../dev ; make depend ; cd ../conf ; make depend
     # cd ../sys
     # make -k > ERRS 2>&1 &
     (compilation will finish about 15 minutes later leaving output in ERRS)

The final object file (vmunix) should be moved to the root, and then booted to try it out. It is best to name it /newvmunix so as not to destroy the working system until you're sure it does work. It is also a good idea to keep the old working version around under some other name. A systematic scheme for numbering and saving old versions of the system is best. Be sure to always have the current system in /vmunix when you are running multi-user or commands such as ps(1) and w(1) will not work.

Special Files

Next you should remove any unnecessary device entries from the directory /dev, as devices were provided for all the things initially configured into the system.
See how the devices were made using MAKE in the directory /dev, and make another directory /newdev, copy MAKE into it, edit MAKE to provide an entry for local needs, and rerun it to generate a /newdev directory. You can then do

     # cd /
     # mv dev olddev ; mv newdev dev

If you prefer, you can just whittle away at /dev using rm, mv and mknod(8) to make what you need, but if you do this stuff manually you may have to do it manually again someday, and the devices will appear much more ``magic'' to those who follow you.

-----------
|- If you change NINODE, NFILE, NPROC or NTEXT, then the programs analyze(1), ps(1), pstat(1) and w(1) will have to be recompiled. A procedure for doing this is given below.

Notes on the configuration file and device names

Print the configuration file /usr/src/sys/conf/conf.c. This is the major device switch of each device class (block and character). There is one line for each device configured in your system and a null line for place holding for those devices not configured. The essential block special files were installed above; for any new devices, the major device number is selected by counting the line number (from zero) of the device's entry in the block configuration table. Thus the first entry in the table bdevsw would be major device zero. This number is also printed in the table along the right margin.

The minor device is the drive number, unit number or partition as described under each device in section 4. For tapes where the unit is dial selectable, a special file may be made for each possible selection. You can also add entries for other devices.

Each device is typically given several special files in /dev. The name is made from the name of the device driver, sometimes varying for historical reasons.
Thus the ``up'' disk driver has disks ``up0a'', ``up0b'', etc., the partitions of drive 0 being given letters a-h.

The tape drives are given names not dependent on the tape driver hardware, since they are used directly by a number of programs. The file mt0 is a 1024 byte blocked tape drive at 800 bpi; mt8 is 1600 bpi. By adding 4 to the unit number you get non-rewinding tapes.

The disk and magtape drivers provide a `raw' interface to the device which provides direct transmission between the user's core and the device and allows reading or writing large records. The raw device counts as a character device, and conventionally has the name of the corresponding standard block special file with `r' prepended. Thus the raw magtape files are called /dev/rmtX.

Terminals should be named /dev/ttyX, where X is some string (as in `0' or `d0'). While it is possible to use truly arbitrary strings here, the accounting and notably the ps(1) command make good use of the fact that tty names (at Berkeley) are distinct in the last 2 characters. In fact, we use the following convention: ``ttyN'', with N the minor device number for normal DZ ports; ``ttydX'' with X a single hex digit (starting from 0) for dialups, ``ttyhX'' and ``ttyiX'' with X a hex digit for dh ports, and ``console'' (abbrev ``co'') for the console. This works out well and ps(1) uses a heuristic algorithm based on these conventions to speed up name determination from device numbers it otherwise obtains.

Whenever special files are created, care should be taken to change the access modes (chmod(1)) on these files to appropriate values.

Basics of Disk Layout

If there are to be more filesystems mounted than just the root and /usr, use mkfs(8) and mklost+found(8) to create any new filesystem and put the information about it into the file /etc/fstab (see fstab(5)).
You should also look at /etc/rc to see what commands are executed there and investigate how fstab is used.

Each physical disk drive can be divided into up to 8 partitions; we typically use only 3 partitions (or 4 on 300M drives). The first partition, e.g. rp0a, is used for a root file system, a backup thereof, or a small file system like /tmp. The second partition, rp0b, is used for paging and swapping. The third partition, rp0g, is used to hold a user file system. We partition our 300M disks so that almost all of a large disk will fit onto a 200M disk, just dropping the last partition. This makes it easier to deal with hardware failures by just copying data.

There are several considerations in deciding how to adjust the arrangement of things on your disks: the most important is making sure there is adequate space for what is required; secondarily, throughput should be maximized. Paging space is an important parameter. The system as distributed has 33440 (512 byte) blocks in which to page on the primary device; additional devices may be provided, with the paging interleaved between them. This is controlled by entries in /etc/fstab and in a configuration file in the conf directory. The first line of the makefile in the sys directory selects a file such as confrp.c in the conf directory. This file specifies the root and pipe devices and the devices which are to be used for swapping and paging. Each device which is to be used for paging (except the primary one) should be specified as a ``sw'' device in fstab.

Many common system programs (C, the editor, the assembler etc.) create intermediate files in the /tmp directory, so the filesystem where this is stored also should be made large enough to accommodate most high-water marks; if you have several disks, it makes sense to mount this in one of the other ``root'' (i.e. first partition) file systems.
The root filesystem as distributed is quite large, and there should be no problem. All the programs that create files in /tmp take care to delete them, but most are not immune to events like being hung up upon, and can leave dregs. The directory should be examined every so often and the old files deleted.

Exhaustion of user-file space is certain to occur now and then; the only mechanisms for controlling this phenomenon are occasional use of du(1), df(1), quot(8), threatening messages of the day, and personal letters.

The efficiency with which UNIX is able to use the CPU is often strongly affected by the configuration of disk controllers. For general time-sharing applications, the best strategy is to try to split the root filesystem (/), system binaries (/usr), the temporary files (/tmp), and the user files among several disk arms, and to interleave the paging activity among a number of arms. We will discuss such considerations more below.

Moving filesystem data

Once you have decided how to make best use of your hardware, the question is how to initialize it. If you have the equipment, the best way to move a filesystem is to dump it to magtape using dump(8), to use mkfs(8) and mklost+found(8) to create the new filesystem, and restore (restor(8)) the tape. If for some reason you don't want to use magtape, dump accepts an argument telling where to put the dump; you might use another disk. Sometimes a filesystem has to be increased in logical size without copying. The super-block of the device has a word giving the highest address which can be allocated. For relatively small increases, this word can be patched using the debugger (adb(1)) and the free list reconstructed using fsck(8). The size should not be increased very greatly by this technique, however, since although the allocatable space will increase the maximum number of files will not (that is, the i-list size can't be changed).
Read and understand the description given in filsys(5) before playing around in this way.

If you have to merge a filesystem into another, existing one, the best bet is to use tar(1). If you must shrink a filesystem, the best bet is to dump the original and restore it onto the new filesystem. However, this will not work if the i-list on the smaller filesystem is smaller than the maximum allocated inode on the larger. If this is the case, reconstruct the filesystem from scratch on another filesystem (perhaps using tar(1)) and then dump it. If you are playing with the root filesystem and only have one drive the procedure is more complicated. What you do is the following:

1. GET A SECOND PACK!!!!

2. Dump the root filesystem to tape using dump(8).

3. Bring the system down and mount the new pack.

4. Load the standalone versions of mkfs(8) and restor(8) from the floppy with a procedure like:

     >>>UNJAM
     >>>INIT
     >>>LOAD MKFS
     LOAD DONE, xxxx BYTES LOADED
     >>>ST 2
     ...
     >>>H
     HALTED AT yyyy
     >>>U
     >>>I
     >>>LOAD RESTOR
     LOAD DONE, zzzz BYTES LOADED
     ...
     etc

5. Boot normally using the newly created disk filesystem.

Note that if you change the disk partition tables or add new disk drivers they should also be added to the standalone system in /usr/src/sys/stand.

System Identification

You should edit the files:

     /usr/include/ident.h
     /usr/include/whoami.h
     /usr/include/whoami
     /usr/src/cmd/uucp/uucp.h

to correspond to your system, and then recompile and install getty(8), binmail(1), who(1) and uucp(1) via:

     # cd /usr/src/cmd
     # DESTDIR=/
     # export DESTDIR
     # MAKE getty.c mail.c who.c uucp

This will arrange for an appropriate banner to be printed on terminals before users log in, and also arrange that uucp knows what site you are.

The program delivermail(8) needs to know the topology of your network configuration to be able to forward mail properly; if you are on any networks other than uucp be sure to print the file /usr/src/cmd/delivermail/READ_ME, edit the appropriate tables, and recompile and install delivermail as you did uucp above.

If you run the Berkeley network net(1) be sure to change the constant LOCAL in /usr/src/cmd/ucbmail/v7.local.h and recompile when you bring up a network machine.

Adding New Users

See adduser(8); local needs will undoubtedly dictate a somewhat different procedure.

Multiple Users

If UNIX is to support simultaneous access from more than just the console terminal, the file /etc/ttys (ttys(5)) has to be edited. To add a new terminal be sure the device is configured and the special file exists, then set the first character of the appropriate line of /etc/ttys to 1 (or add a new line). You should also edit the file /etc/ttytype, placing the type of the new terminal there (see ttytype(5)). The file /etc/ttywhere is also a useful one to keep up to date.

Note that /usr/src/cmd/init.c and /usr/src/cmd/comsat.c will have to be recompiled if there are to be more than 100 terminals. Also note that if the special file is inaccessible when init tries to create a process for it, the system will thrash trying and retrying to open it.

File System Health

Periodically (say every week or so in the absence of any problems) and always (usually automatically) after a crash, all the filesystems should be checked for consistency by fsck(8). The procedures of reboot(8) should be used to get the system to a state where a file system check can be performed manually or automatically.

Dumping of the filesystems should be done regularly, since once the system is going it is very easy to become complacent.
Complete and incremental dumps are easily done with dump(8). You should arrange to do a towers-of-hanoi dump sequence; we tune ours so that almost all files are dumped on two tapes and kept for at least a week in most every case. We take full dumps every month (and keep these indefinitely). Operators can execute ``dump w'' at login which will tell them what needs to be dumped (based on the fstab information). Be sure to create a group operator in the file /etc/group so that dump can notify logged-in operators when it needs help.*

Dumping of files by name is best done by tar(1) but the number of files is somewhat limited. Finally if there are enough drives entire disks can be copied with dd(1) using the raw special files and an appropriate block size.

-----------
* More precisely, we have three sets of dump tapes: 10 daily tapes, 5 weekly sets of 2 tapes, and fresh sets of three tapes monthly. We do daily dumps circularly on the daily tapes with sequence `3 2 5 4 7 6 9 8 9 9 9 ...'. Each weekly is a level 1 and the daily dump sequence level restarts after each weekly dump. Full dumps are level 0 and the daily sequence restarts after each full dump also. Thus a typical dump sequence would be:

     tape name   level number   date           opr   size
     ----------------------------------------------------
     FULL        0              Nov 24, 1979   jkf   137K
     D1          3              Nov 28, 1979   jkf   29K
     D2          2              Nov 29, 1979   rrh   34K
     D3          5              Nov 30, 1979   rrh   19K
     D4          4              Dec 1, 1979    rrh   22K
     W1          1              Dec 2, 1979    etc   40K
     D5          3              Dec 4, 1979    rrh   15K
     D6          2              Dec 5, 1979    jkf   25K
     D7          5              Dec 6, 1979    jkf   15K
     D8          4              Dec 7, 1979    rrh   19K
     W2          1              Dec 9, 1979    etc   118K
     D9          3              Dec 11, 1979   rrh   15K
     D10         2              Dec 12, 1979   rrh   26K
     D1          5              Dec 15, 1979   rrh   14K
     W3          1              Dec 17, 1979   etc   71K
     D2          3              Dec 18, 1979   etc   13K
     FULL        0              Dec 22, 1979   etc   135K

We do weekly's often enough that daily's always fit on one tape and in fact never get to the sequence of 9's in the daily level numbers.
-----------

Converting 32/V Filesystems

The best way to convert filesystems from 32/V to the new format is to use tar(1). After converting, you can still restore files from your old-format dump tapes (yes the dump format is different, sorry about that), by using ``512restor'' instead of ``restor''. If you wish, you can move whole file systems from 32/V to the new system by using ``dump'' and then ``512restor''.

Regenerating the system

It is quite easy to regenerate the system, and it is a good idea to try this once right away to build confidence. The system consists of three major parts: the kernel itself (/usr/src/sys/sys), the user programs (/usr/src/cmd and subdirectories), and the libraries (/usr/src/lib*). The major part of this is /usr/src/cmd.

We have already seen how to recompile the system itself. The three major libraries are the C library in /usr/src/libc and the FORTRAN libraries /usr/src/libI77 and /usr/src/libF77. In each case the library is remade by changing into the corresponding directory and doing

     # make

and then installed by

     # make install

Similar to the system,

     # make clean

cleans up. The source for all other libraries is kept in subdirectories of /usr/src/lib; each has a makefile and can be recompiled by the above recipe.

Recompiling all user programs in /usr/src/cmd is accomplished by using the MAKE shell script which resides there and its associated file DESTINATIONS. For instance, to recompile ``date.c'', all one has to do is

     # cd /usr/src/cmd
     # MAKE date.c

This will place a stripped version of the binary of ``date'' in /4bsd/bin/date, since date normally resides in /bin, and Admin is building a file-system like tree rooted at /4bsd. You will have to make the directory 4bsd for this to work. It is possible to use any directory for the destination; it isn't necessary to use the default /4bsd. Just change the instance of ``4bsd'' at the front of MAKE.
You can also override the default target by doing:

     # DESTDIR=pathname
     # export DESTDIR

To regenerate all the system source you can do

     # DESTDIR=/usr/newsys
     # export DESTDIR
     # cd /usr
     # rm -r newsys
     # mkdir newsys
     # cd /usr/src/cmd
     # MAKE * > ERRS 2>&1 &

This will take about 4 hours on a reasonably configured machine. When it has finished you can move the hierarchy into the normal places using mv(1) and cp(1), and then execute

     # DESTDIR=/
     # export DESTDIR
     # cd /usr/src/cmd
     # MAKE ALIASES
     # MAKE MODES

to link files together as necessary and to set all the right set-user-id bits.

Making orderly changes

In order to keep track of changes to system source we migrate changed versions of commands in /usr/src/cmd in through the directory /usr/src/new and out of /usr/src/cmd into /usr/src/old for a time before removing them. Locally written commands which aren't distributed are kept in /usr/src/local and their binaries are kept in /usr/local. This allows /usr/bin, /usr/ucb and /bin to correspond to the distribution tape (and to the manuals that people can buy). People wishing to use /usr/local commands are made aware that they aren't in the base manual. As manual updates incorporate these commands they are moved to /usr/ucb.

A directory /usr/junk to throw garbage into, as well as binary directories /usr/old and /usr/new, are very useful. The man command supports manual directories such as /usr/man/manj for junk and /usr/man/manl for local to make this or something similar practical.

Interpreting system activity

The vmstat program provided with the system is designed to be an aid to monitoring systemwide activity. Together with the ps(1) command (as in ``ps av''), it can be used to investigate systemwide virtual activity.
By running vmstat when the system is active you can judge the system activity in several dimensions: job distribution, virtual memory load, paging and swapping activity, disk and cpu utilization. Ideally, there should be few blocked (B) jobs, there should be little paging or swapping activity, there should be available bandwidth on the disk devices (most single arms peak out at about 30-35 tps in practice), and the user cpu utilization (US) should be high (above 60%).

If the system is busy, then the number of active jobs may be large, and several of these jobs may often be blocked (B). If the virtual memory is very active, then the paging demon may be running (SR will be non-zero). It is healthy for the paging demon to free pages when the virtual memory gets active; it is triggered by the amount of free memory dropping below a threshold and increases its pace as free memory goes to zero.

If you run vmstat when the system is busy (a ``vmstat 5'' is best, since that is how often most of the numbers are recomputed by the system), you can find imbalances by noting abnormal job distributions. If a large number of jobs are blocked (B), then the disk subsystem is overloaded or imbalanced. If you have a large number of non-dma devices or open teletype lines which are ``ringing'', or user programs which are doing high-speed non-buffered input/output, then the system time may go very high (60-70% or higher). It is often possible to pin down the cause of high system time by looking to see if there is excessive context switching (CS), interrupt activity (IN) or system call activity (SY). Cumulatively on one of our large machines we average about 60 context switches and interrupts per second and about 90 system calls per second.

If the system is very heavily loaded, or if you have very little memory relative to your load (1M is little in most any case), then the system may be forced to swap.
This is likely to be accompanied by a noticeable reduction in system performance as the system does not swap ``working sets'', but rather forces jobs to reinitialize their resident sets by demand paging. If you expect to be in a memory-poor environment for an extended period you might consider administratively limiting system load, and should be sure to downsize the system.

Tunable constants

There is a modicum of tuning available in the virtual memory management mechanism if it appears to be badly tuned for your configuration. The page replacement (clock) algorithm is run whenever there are not LOTSFREE pages available (this and all other constants discussed here are defined in the system header file /usr/src/sys/h/vmtune.h). This sets up resistance to consumption of the remaining free memory at a minimal rate SLOWSCAN, which gives the desired number of seconds between successive examinations of each page. The rate at which the clock algorithm is run increases linearly to a desired rate of FASTSCAN when there is no free memory. Thus as the available free memory decreases, the clock algorithm works harder to hold on to what is left.

If less than DESFREE pages are available and the paging rate is high, then the system will begin to swap processes out. If less than MINFREE pages are available then the system will begin to swap, regardless of the paging rate.

When it has to swap, the system first tries to find a process which has been blocked for a long time and swap it out first. If there are no jobs of this flavor, then it will choose among the 4 largest jobs in-core which it can swap, picking the one of these which has been core resident longest. It attempts to guarantee (during periods of very heavy load) enough core residency to a process to allow it to at least rebuild its set of active pages (since it must do so by demand paging).
Processes which are swapped out with large numbers of active pages similarly receive lower priority for swapin, favoring small jobs to return to the core resident set quickly.

It is very desirable that the system run under reasonably heavy load with little swapping, with the memory partitioning being done by the clock replacement algorithm, rather than by the swapping algorithm. The costs associated with paging activity are the time spent in the paging demon, the overhead associated with reclaim page faults (RE), and the extra disk activity associated with pageins and pageouts. We will discuss disk considerations later; when kept to about 40 reclaim faults per second, the cost of reclaims is less than 1% of total processor time. The cpu time (shown by ``ps u2'') accumulated by the pageout demon will show how much overhead it is generating.

The system, as distributed, runs the replacement algorithm whenever less than 1/4 of the total user memory is free. This is done starting with a 25 second revolution time of the clock algorithm and increasing to a 15 second revolution time when there is no free memory. The goal here is to use as much memory as possible (i.e. have the free list short) but to not have the system run out and start to swap. You can experiment with changing the writable copies of these variables (e.g. ``lotsfree'' is the writable copy of LOTSFREE) using adb, as in:

     # adb -w /vmunix /dev/kmem
     lotsfree/D
     ---adb prints value of lotsfree---
     /W 0t100

Here the ``/W 0t100'' command changed the value of lotsfree to be 100 (decimal).

One final constant which can be changed is klin, which controls the page-in clustering in the system. As distributed, it is set to 2, which allows the system to pre-page an even numbered (1k byte) page when an odd numbered page is faulted and vice-versa.
In extreme circumstances, for special purpose applications which cause heavy paging activity, you might try setting it to 4 to increase the amount of pre-paging, allowing the system to pre-page up to 3 adjacent pages. This will increase the rate at which memory is consumed on an active system, so you should be aware that this can overload the page replacement mechanism's ability to maintain enough free memory and can thus cause swapping. With this volume of page traffic it may be necessary to set the global constant fifo to 1 in the system, causing the paging algorithm to ignore page-referencing behavior and favoring, rather, circular replacement.

Klin should be changed only experimentally on systems with abnormal amounts of paging activity. This is not necessary or appropriate for most any normal timesharing load.

Balancing disk load

It is critical for good performance to balance disk load. There are at least five components of the disk load which you can divide between the available disks:

1. The root file system.
2. The /tmp file system.
3. The /usr file system.
4. The user files.
5. The paging activity.

The following possibilities are ones we have actually used at times when we had 2, 3 and 4 disks:

     +--------+-------------------+
     |        |       disks       |
     |what    |  2  |  3  |   4   |
     +--------+-----+-----+-------+
     |/       |  1  |  2  |   2   |
     |tmp     |  1  |  3  |   4   |
     |usr     |  1  |  1  |   1   |
     |paging  | 1+2 | 1+3 | 1+3+4 |
     |users   |  2  | 2+3 |  2+3  |
     |archive |  x  |  x  |   4   |
     +--------+-----+-----+-------+

Splits such as these should get you going. The most important things to remember are to even out the disk load as much as possible, and to do this by decoupling file systems (on separate arms) between which heavy copying occurs. Note that a long term average balanced load is not important... it is much more important to have instantaneously balanced load when the system is busy.
Intelligent experimentation with a few file system arrangements can pay off in much improved performance. It is particularly easy to move the root, the /tmp file system and the paging areas. Place the user files and the /usr directory as space needs dictate and experiment with the other, more easily moved file systems.

Process size limitations

As distributed, the system provides for a maximum of 64M bytes of resident user virtual address space. The sizes of the text and data segments of a single process are currently limited to 6M bytes each, and the stack segment size is limited to 512K bytes as a soft, user-changeable limit, and may be increased to 6M by calling vlimit(2). If these are insufficient, they can be increased by changing the constants MAXTSIZ, MAXDSIZ and MAXSSIZ in the file /usr/src/sys/h/vm.h, while changing the definitions in /usr/src/sys/h/dmap.h and /usr/src/sys/h/text.h. You must be careful in doing this that you have adequate paging space. As configured above, the system has only 16M bytes of paging area, since there is only one paging area. The best way to get more space is to provide multiple, thereby interleaved, paging areas by using a file other than confhp.c; see the first line of the makefile in the sys directory and the disk layout section above.

To increase the amount of resident virtual space possible, you can alter the constant USRPTSIZE (in /usr/src/sys/h/vm.h) and correspondingly change the definition of Usrptmap in /usr/src/sys/sys/locore.s. Thus to allow 128 megabytes of resident virtual space one would declare

     _Usrptmap: .space 16*NBPG

The system has 6 pages of page tables for its text+data+bss areas and the per-page system information. This limits this part of the system to 6*64K = 384K bytes. The per-page system information uses 12 bytes per 1024 bytes of user available physical memory.
This (conservatively) takes 55K bytes on a 4 megabyte machine, limiting the ``size'' of the system to about 330K bytes. You can increase the size of the system page table to, for example, 8 pages by defining, in locore.s:

     _Sysmap: .space 8*NBPG

You will then also have to change the definitions of the constants UBA0, MBA0, and MBA1 in the files uba.h, mba.h, mba.m, and uba.m. The 6 in the numbers here is the 6 from the number of pages in Sysmap; thus UBA0 would then be defined by

     #define UBA0 0x80080000

You should also change the constant RELOC in the bootstrap programs in /usr/src/sys/stand to be large enough to relocate past the end of the system. Grep(1) for RELOC in this directory and change the constants, recompile the bootstrap programs and replace them on the floppy.

Other limitations

Due to the fact that the file system block numbers are stored in page table pg_blkno entries, the maximum size of a file system is limited to 2^20 1024 byte blocks. Thus no file system can be larger than 1024M bytes.

The number of mountable file systems is limited to 15 (which should be enough; if you have a lot of disks it makes sense to make some of them single file systems, and the paging areas don't count in this total). To increase this it will be necessary to change the core-map /usr/src/sys/h/cmap.h since there is a 4 bit field used here. The size of the core-map will then expand to 16 bytes per 1024 byte page and you should change /usr/src/sys/h/cmap.m also. (Don't forget to change MSWAPX and NMOUNT in /usr/src/sys/h/param.h also.)

The maximum value NOFILE (number of open files per process) can be raised to is 30 because of a bit field in the page table entry in /usr/src/sys/h/pte.h.
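
The size figures quoted in the two preceding sections can be checked with a little arithmetic. What follows is only a sketch, written in Python for convenience; none of it is part of the distribution. It assumes a 512 byte hardware page (NBPG), 4 byte page table entries, and system space beginning at 0x80000000, none of which is stated above. The rule used for UBA0 is inferred from the single example given and is not taken from the system source, and all function names are hypothetical.

```python
# Sketch of the sizing arithmetic quoted in the text. Assumed, not stated
# in the text: NBPG = 512 bytes per VAX page, 4 bytes per page table
# entry, and system space starting at 0x80000000.

NBPG = 512                # bytes per hardware page (assumption)
PTE_BYTES = 4             # bytes per page table entry (assumption)
SYSTEM_BASE = 0x80000000  # start of system space (assumption)
KB = 1024
MB = 1024 * KB


def bytes_mapped_per_pt_page():
    """Address space mapped by one page of page tables (the text's 64K)."""
    return (NBPG // PTE_BYTES) * NBPG


def system_size_limit(sysmap_pages):
    """Upper bound on system text+data+bss plus per-page information."""
    return sysmap_pages * bytes_mapped_per_pt_page()


def per_page_info_bytes(phys_mem_bytes, bytes_per_entry=12):
    """Per-page system information: one entry per 1024 bytes of memory."""
    return (phys_mem_bytes // 1024) * bytes_per_entry


def uba0_address(sysmap_pages):
    """Inferred rule: UBA0 sits just past the region mapped by Sysmap."""
    return SYSTEM_BASE + system_size_limit(sysmap_pages)


def usrptmap_pages(resident_virtual_bytes):
    """Pages of Usrptmap needed; each entry maps one page of user page tables."""
    per_usrptmap_page = (NBPG // PTE_BYTES) * bytes_mapped_per_pt_page()
    return resident_virtual_bytes // per_usrptmap_page


def max_filesystem_bytes(blkno_bits=20, block_bytes=1024):
    """Largest file system addressable with a block number of this width."""
    return (1 << blkno_bits) * block_bytes


if __name__ == "__main__":
    print(system_size_limit(6) // KB, "K system limit with 6 Sysmap pages")
    print(per_page_info_bytes(4 * MB) // KB, "K per-page info on a 4M machine")
    print(hex(uba0_address(8)), "UBA0 with 8 Sysmap pages")
    print(usrptmap_pages(128 * MB), "Usrptmap pages for 128M")
    print(max_filesystem_bytes() // MB, "M maximum file system size")
```

Under these assumptions the 4 megabyte case comes to 48K of per-page information, a little under the 55K budgeted conservatively above, and 8 pages of Sysmap reproduce the 0x80080000 shown for UBA0. Treat the results as a cross-check on the text, not as a substitute for reading locore.s and the header files.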
Scaling down

If you have 1.5M bytes of memory or less you may wish to scale the paging system down, by reducing some fixed table sizes not directly related to the paging system. For instance, you could reduce NBUF from 128 to 40, NCLIST from 500 to 150, NPROC from 250 to 125, NINODE from 400 to 200, NFILE from 350 to 175, NMOUNT from 15 to 8, and NTEXT from 60 to 40. You can use pstat(8) with the -T option to find out how many of these structures are typically in use. Although the document is somewhat outdated for the VAX, you can see the last few pages of ``Regenerating System Software'' in Volume 2B of the programmer's manual for hints on setting some of these constants.

Files which need attention

The following files require periodic attention or are system specific:

     /etc/fstab              how disk partitions are used
     /etc/group              group memberships
     /etc/motd               message of the day
     /etc/passwd             password file; each account has a line
     /etc/rc                 system restart script; runs reboot; starts daemons
     /etc/ttys               enables/disables ports
     /etc/ttytype            terminal types connected to ports
     /etc/ttywhere           lists physical locations of terminals
     /usr/lib/crontab        commands which are run periodically
     /usr/lib/mailaliases    mail forwarding and distribution groups
     /usr/adm/acct           raw process account data
     /usr/adm/dnacct         raw autodialer account data
     /usr/adm/messages       system error log
     /usr/adm/wtmp           login session accounting

That's all for now.

Good luck.

                              William N. Joy
                              Ozalp Babaoglu
                              Keith Sklower