The small set of tools you can expect to find on any machine can still include some limitations you should be aware of.
Do not leave white space before the parentheses in user function calls; GNU awk rejects it:

    $ gawk 'function die () { print "Aaaaarg!" }
            BEGIN { die () }'
    gawk: cmd. line:2:         BEGIN { die () }
    gawk: cmd. line:2:                      ^ parse error
    $ gawk 'function die () { print "Aaaaarg!" }
            BEGIN { die() }'
    Aaaaarg!

If you want your program to be deterministic, don't depend on the order in which `for' iterates over arrays:
    $ cat for.awk
    END {
      arr["foo"] = 1
      arr["bar"] = 1
      for (i in arr)
        print i
    }
    $ gawk -f for.awk </dev/null
    foo
    bar
    $ nawk -f for.awk </dev/null
    bar
    foo

Some AWK, such as HPUX 11.0's native one, have regex engines that mishandle inner anchors:
    $ echo xfoo | $AWK '/foo|^bar/ { print }'
    $ echo bar | $AWK '/foo|^bar/ { print }'
    bar
    $ echo xfoo | $AWK '/^bar|foo/ { print }'
    xfoo
    $ echo bar | $AWK '/^bar|foo/ { print }'
    bar

Either do not depend on such patterns (i.e., use `/^(.*foo|bar)/'), or use a simple test to reject such AWK.
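The "simple test" mentioned above could look like the following sketch. The helper name `awk_handles_inner_anchors' and the candidate list are assumptions made for illustration; only the pattern and the input come from the session above.

```shell
# awk_handles_inner_anchors is a hypothetical helper name.
# Usage: awk_handles_inner_anchors AWK-COMMAND
# Succeeds only if the given awk finds `foo' in `xfoo' with /foo|^bar/,
# which is the case that fails in the session above.
awk_handles_inner_anchors () {
  test "`echo xfoo | $1 '/foo|^bar/ { print }' 2>/dev/null`" = xfoo
}

# Keep the first candidate that passes; AWK stays empty if none does.
AWK=
for candidate in gawk mawk nawk awk; do
  if awk_handles_inner_anchors "$candidate"; then
    AWK=$candidate
    break
  fi
done
```

A script would then refuse to continue, or avoid such patterns, when `$AWK' is empty.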
link (or, in newer systems, rename).
dirname can be emulated with expr, e.g.:

    dir=`expr "x$file" : 'x\(.*\)/[^/]*' \| '.' : '\(.\)'`

But there are a few subtleties, e.g., under UN*X, should `//1' give `/'? Paul Eggert answers:
No, under some older flavors of Unix, leading `//' is a special path name: it refers to a "super-root" and is used to access other machines' files. Leading `///', `////', etc. are equivalent to `/'; but leading `//' is special. I think this tradition started with Apollo Domain/OS, an OS that is still in use on some older hosts.
POSIX.2 allows but does not require the special treatment for `//'. It says that the behavior of dirname on path names of the form `//([^/]+/*)?' is implementation defined. In these cases, GNU dirname returns `/', but it's more portable to return `//' as this works even on those older flavors of Unix.
I have heard rumors that this special treatment of `//' may be dropped in future versions of POSIX, but for now it's still the standard.
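As an illustration of the advice above, here is a sketch of a dirname emulation that returns `//' for names such as `//1'. The function name `my_dirname' is hypothetical, and the chain of patterns is one possible arrangement, not necessarily the one any particular tool uses. Because expr treats a result of `0' as a failure, a directory literally named `0' (as in `0/foo') falls through to the final `.' alternative; a robust script would need a fallback for that case.

```shell
# my_dirname is a hypothetical helper name.
# Usage: my_dirname PATH
my_dirname () {
  # Alternatives, tried in order:
  #   1. everything before the last component, trailing slashes dropped
  #   2. a leading `//' followed by a single component: keep `//'
  #   3. the string `//' itself
  #   4. a component directly under the root: `/'
  #   5. no directory part at all: `.'
  expr "X$1" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \
       "X$1" : 'X\(//\)[^/]' \| \
       "X$1" : 'X\(//\)$' \| \
       "X$1" : 'X\(/\)' \| \
       . : '\(.\)'
}
```

For instance, `my_dirname /usr/lib/foo' prints `/usr/lib', while `my_dirname //1' prints `//'.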
The empty alternative is not portable; some egrep take the `|' next to it literally:

    > printf "foo\n|foo\n" | egrep '^(|foo|bar)$'
    |foo
    > printf "bar\nbar|\n" | egrep '^(foo|bar|)$'
    bar|
    > printf "foo\nfoo|\n|bar\nbar\n" | egrep '^(foo||bar)$'
    foo
    |bar

egrep also suffers the limitations of grep.
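One way to avoid an empty alternative is to make the whole group optional with `?', which accepts the same strings. The sketch below assumes an egrep command is available; the helper name `matches_optional_word' is hypothetical.

```shell
# matches_optional_word is a hypothetical helper name.
# Usage: matches_optional_word STRING
# Succeeds if STRING is empty, `foo', or `bar'.
matches_optional_word () {
  # `^(foo|bar)?$' replaces the non-portable `^(|foo|bar)$'.
  printf '%s\n' "$1" | egrep '^(foo|bar)?$' >/dev/null 2>&1
}
```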
length, substr, match and index.
    expr '' \| ''

GNU/Linux and POSIX.2-1992 return the empty string for this case, but traditional Unix returns `0' (Solaris is one such example). In the latest POSIX draft, the specification has been changed to match traditional Unix's behavior (which is bizarre, but it's too late to fix this). Please note that the same problem arises when the empty string results from a computation, as in:

    expr bar : foo \| foo : bar

Avoid this portability problem by avoiding the empty string.
    expr a : b \| ''

Unfortunately this behaves exactly as the original expression; see the `expr' (`:') entry for more information.

Older expr implementations (e.g., SunOS 4 expr and Solaris 8 /usr/ucb/expr) have a silly length limit that causes expr to fail if the matched substring is longer than 120 bytes. In this case, you might want to fall back on `echo|sed' if expr fails.

Don't leave, there is some more!

The QNX 4.25 expr, in addition to preferring `0' to the empty string, has a funny behavior in its exit status: it's always 1 when parentheses are used!
    $ val=`expr 'a' : 'a'`; echo "$?: $val"
    0: 1
    $ val=`expr 'a' : 'b'`; echo "$?: $val"
    1: 0
    $ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
    1: a
    $ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
    1: 0

In practice this can be a big problem if you are ready to catch failures of expr programs with some other method (such as using sed), since you may get twice the result. For instance
    $ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'

will output `a' on most hosts, but `aa' on QNX 4.25. A simple workaround consists in testing expr and using a variable set to expr or to false according to the result.
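The workaround just described might be sketched as follows. The variable name `my_expr' and the helper `file_ext' are hypothetical. The captured group always starts with a dot, so a successful match can be neither empty nor `0', and the sed fallback runs only when expr is unusable or finds nothing.

```shell
# my_expr is a hypothetical variable name: it holds `expr' when expr
# reports success for a parenthesized match, and `false' otherwise.
if expr a : '\(a\)' >/dev/null 2>&1; then
  my_expr=expr
else
  my_expr=false
fi

# file_ext is a hypothetical helper: print the last extension of NAME,
# including the dot, or nothing if NAME contains no dot.
file_ext () {
  $my_expr "x$1" : 'x.*\(\.[^.]*\)$' 2>/dev/null ||
    echo "x$1" | sed -n 's/^x.*\(\.[^.]*\)$/\1/p'
}
```

When `my_expr' is `false', the first command prints nothing and fails, so the result comes from sed alone and is never printed twice.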
To suppress output, redirect the output of grep to `/dev/null'. Check the exit status of grep to determine whether it found a match.

Don't use multiple regexps with `-e', as some grep will only honor the last pattern (e.g., IRIX 6.5 and Solaris 2.5.1). Anyway, Stardent Vistra SVR4 grep lacks `-e'... Instead, use alternation and egrep.
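Both recommendations can be combined in a small sketch: a single egrep alternation instead of several `-e' patterns, with the output discarded and only the exit status used. The helper name `mentions_foo_or_bar' and the patterns are hypothetical, and an egrep command is assumed to exist.

```shell
# mentions_foo_or_bar is a hypothetical helper name.
# It reads standard input and succeeds if some line contains
# `foo' or `bar'.
mentions_foo_or_bar () {
  # One alternation replaces `grep -e foo -e bar'; nothing is printed.
  egrep 'foo|bar' >/dev/null 2>&1
}
```

A caller would write, for example, `if mentions_foo_or_bar <config.log; then ...; fi', where the file name is only an illustration.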
Don't include extra `;', as some sed try to interpret the second one as a command:

    $ echo a | sed 's/x/x/;;s/x/x/'
    sed: 1: "s/x/x/;;s/x/x/": invalid command code ;

Input lines should be kept reasonably short, since some sed have an input buffer limited to 4000 bytes.

Alternation, `\|', is common but not portable. Anchors (`^' and `$') inside groups are not portable.

Nested groups are extremely portable, but there is at least one sed (System V/68 Base Operating System R3V7.1) that does not support them.

Of course the option `-e' is portable, but it is not needed. No valid Sed program can start with a dash, so it does not help disambiguate. Its sole use is to help enforce indentation, as in:
    sed -e instruction-1 \
        -e instruction-2

as opposed to
    sed instruction-1;instruction-2

Contrary to yet another urban legend, you may portably use `&' in the replacement part of the s command to mean "what was matched".
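As a small illustration of `&' and of the `-e' layout, the following sketch wraps every lowercase word in angle brackets and then prefixes the line. The helper name `bracket_words' and both instructions are made up for the example.

```shell
# bracket_words is a hypothetical helper name; it filters standard input.
bracket_words () {
  # In the first instruction `&' stands for the word that was matched.
  sed -e 's/[a-z][a-z]*/<&>/g' \
      -e 's/^/> /'
}
```

With the input `hello world', this prints `> <hello> <world>'.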
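Some sed fail to reset their `t' flag when starting a new cycle. For instance, if you run the following sed script (the trailing `#' labels are not part of the text):

    s/keep me/kept/g  # a
    t end             # b
    s/.*/deleted/g    # c
    : end             # d

on

    delete me         # 1
    delete me         # 2
    keep me           # 3
    delete me         # 4

you get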
    deleted
    delete me
    kept
    deleted

instead of
    deleted
    deleted
    kept
    deleted

Why? When processing line 1, command c matches, therefore sets the t flag, and the output is produced. When processing line 2, the t flag is still set (this is the bug). Command a fails to match, but sed is not supposed to clear the t flag when a substitution fails. Command b sees that the flag is set, therefore it clears it, and jumps to d, hence you get `delete me' instead of `deleted'. When processing line 3, t is clear, a matches, so the flag is set, hence b clears the flag and jumps. Finally, since the flag is clear, line 4 is processed properly.

There are two things one should remember about `t' in sed. Firstly, always remember that `t' jumps if some substitution succeeded, not only the immediately preceding substitution; therefore, always use a fake `t clear; : clear' to reset the t flag where needed. Secondly, you cannot rely on sed to clear the flag at each new cycle.

One portable implementation of the script above is:
    t clear
    : clear
    s/keep me/kept/g
    t end
    s/.*/deleted/g
    : end
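The portable script can be tried on the four sample lines with a wrapper such as the one below. The function name `mark_lines' is hypothetical, and the script is passed as a single multi-line argument so that no extra `;' is needed.

```shell
# mark_lines is a hypothetical helper name; it filters standard input.
mark_lines () {
  # `t clear' followed by `: clear' resets the t flag at the start of
  # every cycle, so a stale flag cannot trigger `t end'.
  sed 't clear
: clear
s/keep me/kept/g
t end
s/.*/deleted/g
: end'
}

printf 'delete me\ndelete me\nkeep me\ndelete me\n' | mark_lines
```

On a sed that handles the flag correctly, this prints `deleted', `deleted', `kept' and `deleted', one per line.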
echo as a workaround.
GNU touch 3.16r (and presumably all before that) fails to work on SunOS 4.1.3 when the empty file is on an NFS-mounted 4.2 volume.