This file documents awk, a program that you can use to select particular records in a file and perform operations upon them. This is Edition 5. Arnold Robbins and I are good friends. We were introduced by circumstances, and by our favorite programming language, AWK.
|Published (Last):||10 November 2013|
The awk command uses a set of user-supplied instructions to compare a set of files, one line at a time, to extended regular expressions supplied by the user. Then actions are performed upon any line that matches the extended regular expressions. The pattern searching of the awk command is more general than that of the grep command, and it allows the user to perform multiple actions on input text lines. The awk command programming language requires no compiling, and allows the user to use variables, numeric functions, string functions, and logical operators.
The awk command takes two types of input: input text files and program instructions. Searching and actions are performed on input text files. Input files can be named on the command line with the File variable; if multiple files are specified this way, they are processed in the order specified. Instructions provided by the user control the actions of the awk command. If multiple program files are specified, the files are concatenated in the order specified and the resultant order of instructions is used.
The awk command produces three types of output from the data within the input text file. All of these types of output can be performed on the same file.
The programming language recognized by the awk command allows the user to redirect output.
The BEGIN statement in the awk programming language allows the user to specify a set of instructions to be done before the first record is read. This is particularly useful for initializing special variables.
A record is a set of data separated by a record separator. The default value for the record separator is the new-line character, which makes each line in the file a separate record. The record separator can be changed by setting the RS special variable.
The command instructions can specify that a specific field within the record be compared. By default, fields are separated by white space (blanks or tabs). Each field is referred to by a field variable. The field separator can be changed by using the -F flag on the command line or by setting the FS special variable. The FS special variable can be set to a blank, a single character, or an extended regular expression.
The END statement in the awk programming language allows the user to specify actions to be performed after the last record is read. This is particularly useful for sending messages about what work was accomplished by the awk command.
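As a minimal sketch of how BEGIN, the FS special variable, and END fit together, the following shell snippet feeds awk two invented colon-separated records. The input data and the variable names out and total are assumptions made for illustration only.

```shell
# Hypothetical colon-separated input; the names and numbers are made up.
out=$(printf 'alice:3\nbob:5\n' | awk '
BEGIN { FS = ":" }                            # runs before the first record is read
      { total += $2 }                         # runs for every record; $2 is the second field
END   { print NR " records, total " total }   # runs after the last record is read
')
echo "$out"
```

Setting FS inside BEGIN has the same effect here as passing -F: on the command line.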
The awk command programming language consists of statements in the form Pattern { Action }. If a record matches the specified pattern, or contains a field which matches the pattern, the associated action is then performed. A pattern can be specified without an action, in which case the entire line containing the pattern is written to standard output.
An action specified without a pattern is performed for every input record. There are four types of patterns used in the awk command language syntax. The extended regular expressions used by the awk command are similar to those used by the grep or egrep command. The simplest form of an extended regular expression is a string of characters enclosed in slashes.
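To illustrate a pattern made of a string enclosed in slashes, here is a small sketch. The three input lines are an invented stand-in for a file such as testfile, and the pattern /smi/ is likewise only an assumption chosen for the example. Because no action is given, each matching line is written to standard output unchanged.

```shell
# Invented sample lines standing in for the contents of a file.
out=$(printf 'smawley, andy\nsmiley, allen\nsmithern, harry\n' | awk '/smi/')
echo "$out"
```

Only the lines containing the string smi are printed; the first line is skipped.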
Special characters are used to form extended regular expressions. To specify the backslash itself as a character, use a double backslash.
See the following item on escape sequences for more information on the backslash and its uses. The awk command recognizes most of the escape sequences used in C language conventions, as well as several that are used as special characters by the awk command itself.
Note: Except in the gsub, match, split, and sub built-in functions, the matching of extended regular expressions is based on input records. Record-separator characters (the new-line character by default) cannot be embedded in the expression, and no expression matches the record-separator character. If the record separator is not the new-line character, then the new-line character can be matched. In the four built-in functions specified, matching is based on text strings, and any character (including the record separator) can be embedded in the pattern so that the pattern matches the appropriate character. However, in all regular-expression matching with the awk command, the use of one or more NULL characters in the pattern produces undefined results.
The relational operators also work with string values. String values can also be matched on collation values. If no other information is given, field variables are compared as string values.
Actions specified with the END pattern are performed after all input has been read. If a program consists only of END statements, all the input is read prior to any actions being taken.
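The sketch below shows relational patterns and an END action side by side. The input lines and thresholds are invented. It assumes the behavior of current POSIX-style awk implementations, where a numeric-looking field compared against a numeric constant is compared as a number, while a comparison against a string constant is a string comparison.

```shell
# Invented two-column input: a name and a count.
out=$(printf 'apple 5\ncherry 12\nbanana 7\n' | awk '
$2 > 6   { print "numeric:", $1 }   # second field compared with a numeric constant
$1 < "b" { print "string:", $1 }    # first field compared with a string constant
END      { print "done" }           # performed after all input has been read
')
echo "$out"
```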
If the statements are specified without a pattern, they are performed on every record. Multiple actions can be specified within the braces, but must be separated by new-line characters or semicolons (;), and the statements are processed in the order they appear. The post-increment and post-decrement statements operate as in the C programming language.
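A short sketch of several semicolon-separated actions in one set of braces, using post-increment and post-decrement. The variable names n, a, and b are arbitrary.

```shell
# a receives the value of n before the increment; b receives it before the decrement.
out=$(awk 'BEGIN { n = 5; a = n++; b = n--; print a, b, n }')
echo "$out"
```

As in C, the expression yields the old value and the variable is updated afterwards, so a is 5, b is 6, and n ends up back at 5.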
The awk command language uses arithmetic functions, string functions, and general functions. The close Subroutine statement is necessary if you intend to write a file, then read it later in the same program. The arithmetic functions perform the same actions as the C language subroutines by the same name.
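The write-then-read case that close is needed for can be sketched as follows. The temporary file created with mktemp and the variable names f, line, and n are assumptions for illustration. The loop relies on getline returning 1 for each record read, 0 at end of file, and -1 on error.

```shell
tmp=$(mktemp)   # scratch file used only for this example
out=$(awk -v f="$tmp" 'BEGIN {
    print "one" > f
    print "two" > f
    close(f)                         # finish writing before reading the file back
    while ((getline line < f) > 0)   # stops at end of file (0) or on error (-1)
        n++
    close(f)
    print n " lines read back"
}')
rm -f "$tmp"
echo "$out"
```

Testing the return value with > 0 rather than using it directly as a condition keeps the loop from spinning forever if the file cannot be read.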
Note: All forms of the getline function return 1 for successful input, zero for end of file, and -1 for an error. A function can be referred to anywhere in an awk command program, and its use can precede its definition. The scope of the function is global. Function parameters can be either scalars or arrays.
Parameter names are local to the function; all other variable names are global. The same name should not be used for different entities; for example, a parameter name should not be duplicated as a function name, or special variable. Variables with global scope should not share the name of a function. Scalars and arrays should not have the same name in the same scope. The number of parameters in the function definition does not have to match the number of parameters used when the function is called.
Excess formal parameters can be used as local variables. Extra scalar parameters are initialized with a string value equivalent to the empty string and a numeric value of 0 (zero); extra array parameters are initialized as empty arrays.
When invoking a function, no white space is placed between the function name and the opening parenthesis. Function calls can be nested and recursive. Upon return from any nested or recursive function call, the values of all the calling function's parameters shall be unchanged, except for array parameters passed by reference. The return statement can be used to return a value. The function average is passed an array, g, and a variable, n, with the number of elements in the array.
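A function along the lines of the average described here might look like the following sketch. It is not necessarily the original example: the body, the extra parameters sum and i (excess formal parameters used as local variables), and the sample array values are assumptions.

```shell
out=$(awk '
# g is an array (passed by reference); n is the number of elements in it.
# sum and i are excess formal parameters, so they act as local variables.
function average(g, n,    sum, i) {
    for (i in g)
        sum += g[i]
    return sum / n
}
BEGIN {
    g[1] = 2; g[2] = 4; g[3] = 9
    print average(g, 3)
}')
echo "$out"
```

The extra spaces before sum and i in the parameter list are only a common convention for marking locals; awk itself does not require them.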
The function then obtains an average and returns it. Most conditional statements in the awk command programming language have the same syntax and function as conditional statements in the C programming language. Six conditional statements are shared with the C language, and five conditional statements in the awk command programming language do not follow C-language rules.
The for...in statement: see the delete statement for an example of a for...in statement. The if...in statement: the statement is performed if the Array element is found. The delete statement deletes both the array element specified by the Array parameter and the index specified by the Expression parameter. The exit statement first invokes all END actions in the order they occur, then terminates the awk command with an exit status specified by the Expression parameter.
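The following sketch exercises delete together with the in operator, and then exit with an exit status. The array contents, the status value 3, and the shell variable names are invented for illustration.

```shell
# delete removes the element and its index; "in" tests for existence.
out_delete=$(awk 'BEGIN {
    x[1] = "a"; x[2] = "b"; x[3] = "c"
    delete x[2]
    if (2 in x)
        print "index 2 still present"
    for (i in x)          # visits the remaining indexes in no particular order
        n++
    print n " elements left"
}')
echo "$out_delete"

# exit runs the END actions first, then terminates with the given status.
status=0
out_exit=$(printf 'a\nb\nc\n' | awk '
NR == 2 { exit 3 }        # stop when the second record is reached
        { seen++ }
END     { print "END saw " seen " record(s)" }') || status=$?
echo "$out_exit (exit status $status)"
```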
The # statement places comments. Comments should always end with a new-line but can begin anywhere on a line. Two output statements in the awk command programming language are print and printf. The print statement writes the value of each expression specified by the ExpressionList parameter to standard output. Each expression is separated by the current value of the OFS special variable, and each record is terminated by the current value of the ORS special variable. The printf statement writes to standard output the expressions specified by the ExpressionList parameter in the format specified by the Format parameter.
The Redirection and Expression parameters function the same as in the print statement. For the c conversion specification: if the argument has a numeric value, the character whose encoding is that value will be output.
If the value is zero or is not the encoding of any character in the character set, the behavior is undefined. If the argument does not have a numeric value, the first character of the string value will be output; if the string does not contain any characters the behavior is undefined.
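A brief sketch of print with the OFS and ORS special variables, followed by printf with the c conversion. The separator values and arguments are arbitrary choices, and the result for 65 assumes an ASCII-compatible character set.

```shell
out=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"                      # output field and record separators
    print "a", "b", "c"                         # expressions joined by OFS, ended by ORS
    printf "%c %c %.2f\n", 65, "hello", 3.14159
    # %c with a number prints the character with that encoding;
    # %c with a string prints the first character of the string.
}')
echo "$out"
```

Note that printf does not append ORS, so the format string supplies its own new-line.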
Note: If the Expression parameter specifies a path name for the Redirection parameter, the Expression parameter should be enclosed in double quotes to ensure that it is treated as a string. Variables can be scalars, field variables, arrays, or special variables. Variable names cannot begin with a digit.
Gawk also provides more recent Bell Laboratories awk extensions, and a number of GNU-specific extensions. Pgawk is the profiling version of gawk. It is identical in every way to gawk, except that programs run more slowly, and it automatically produces an execution profile in the file awkprof. See the --profile option, below.
The GNU Awk User’s Guide
AWK is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems. The AWK language is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data (either run directly on files or used as part of a pipeline) for purposes of extracting or transforming text, such as producing formatted reports. The language extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. While AWK has a limited intended application domain and was especially designed to support one-liner programs, the language is Turing-complete, and even the early Bell Labs users of AWK often wrote well-structured large AWK programs. AWK was created by Alfred Aho, Peter Weinberger (who worked on tiny relational databases), and Brian Kernighan; it takes its name from their respective initials. According to Kernighan, one of the goals of AWK was to have a tool that would easily manipulate both numbers and strings.