Method sscanf()


Method sscanf

int sscanf(string data, string format, mixed ... lvalues)

Description

The purpose of sscanf is to match a string data against a format string and place the matching results into a list of variables. The list of lvalues are destructively modified (which is only possible because sscanf really is a special form, rather than a pike function) with the values extracted from the data according to the format specification. Only the variables up to the last matching directive of the format string are touched.

The format string may contain strings separated by special matching directives like %d, %s %c and %f. Every such directive corresponds to one of the lvalues, in the order they are listed. An lvalue is the name of a variable, a name of a local variable, an index into an array, mapping or object. It is because of these lvalues that sscanf can not be implemented as a normal function.

Whenever a percent character is found in the format string, a match is performed, according to which operator and modifiers follow it:

"%b"

Reads a binary integer ("0101" makes 5)

"%d"

Reads a decimal integer ("0101" makes 101).

"%o"

Reads an octal integer ("0101" makes 65).

"%x"

Reads a hexadecimal integer ("0101" makes 257).

"%D"

Reads an integer that is either octal (leading zero), hexadecimal (leading 0x) or decimal. ("0101" makes 65).

"%c"

Reads one character and returns it as an integer ("0101" makes 48, or '0', leaving "101" for later directives). Using the field width and endianness modifiers, you can decode integers of any size and endianness. For example "%-2c" decodes "0101" into 12592, leaving "01" for later directives. The sign modifiers can be used to modify the signature of the data, making "%+1c" decode "รค" into -28.

"%n"

Returns the current character offset in data. Note that any characters matching fields scanned with the "!"-modifier are removed from the count (see below).

"%f"

Reads a float ("0101" makes 101.0).

"%F"

Reads a float encoded according to the IEEE single precision binary format ("0101" makes 6.45e-10, approximately). Given a field width modifier of 8 (4 is the default), the data will be decoded according to the IEEE double precision binary format instead. (You will however still get a float, unless your pike was compiled with the configure argument --with-double-precision.)

"%s"

Reads a string. If followed by %d, %s will only read non-numerical characters. If followed by a %[], %s will only read characters not present in the set. If followed by normal text, %s will match all characters up to but not including the first occurrence of that text.

"%H"

Reads a Hollerith-encoded string, i.e. first reads the length of the string and then that number of characters. The size and byte order of the length descriptor can be modified in the same way as %c. As an example "%2H" first reads "%2c" and then the resulting number of characters.

"%[set]"

Matches a string containing a given set of characters (those given inside the brackets). Ranges of characters can be defined by using a minus character between the first and the last character to be included in the range. Example: %[0-9H] means any number or 'H'. Note that sets that includes the character '-' must have it first (not possible in complemented sets, see below) or last in the brackets to avoid having a range defined. Sets including the character ']' must list this first too. If both '-' and ']' should be included then put ']' first and '-' last. It is not possible to make a range that ends with ']'; make the range end with '\' instead and put ']' at the beginning of the set. Likewise it is generally not possible to have a range start with '-'; make the range start with '.' instead and put '-' at the end of the set. If the first character after the [ bracket is '^' (%[^set]), and this character does not begin a range, it means that the set is complemented, which is to say that any character except those inside brackets is matched. To include '-' in a complemented set, it must be put last, not first. To include '^' in a non-complemented set, it can be put anywhere but first, or be specified as a range ("^-^").

"%{format%}"

Repeatedly matches 'format' as many times as possible and assigns an array of arrays with the results to the lvalue.

"%O"

Match a Pike constant, such as string or integer (currently only integer, string and character constants are functional).

"%%"

Match a single percent character (hence this is how you quote the % character to just match, and not start an lvalue matcher directive).

Similar to sprintf, you may supply modifiers between the % character and the operator, to slightly change its behaviour from the default:

"*"

The operator will only match its argument, without assigning any variable.

number

You may define a field width by supplying a numeric modifier. This means that the format should match that number of characters in the input data; be it a number characters long string, integer or otherwise ("0101" using the format %2c would read an unsigned short 12337, leaving the final "01" for later operators, for instance).

"-"

Supplying a minus sign toggles the decoding to read the data encoded in little-endian byte order, rather than the default network (big-endian) byte order.

"+"

Interpret the data as a signed entity. In other words, "%+1c" will read "\xFF" as -1 instead of 255, as "%1c" would have.

"!"

Ignore the matched characters with respect to any following "%n".

Note

Sscanf does not use backtracking. Sscanf simply looks at the format string up to the next % and tries to match that with the string. It then proceeds to look at the next part. If a part does not match, sscanf immediately returns how many % were matched. If this happens, the lvalues for % that were not matched will not be changed.

Example
// a will be assigned "oo" and 1 will be returned
sscanf("foo", "f%s", a);

// a will be 4711 and b will be "bar", 2 will be returned
sscanf("4711bar", "%d%s", a, b);

// a will be 4711, 2 will be returned
sscanf("bar4711foo", "%*s%d", a);

// a will become "test", 2 will be returned
sscanf(" \t test", "%*[ \t]%s", a);

// Remove "the " from the beginning of a string
// If 'str' does not begin with "the " it will not be changed
sscanf(str, "the %s", str);

// It is also possible to declare a variable directly in the sscanf call;
// another reason for sscanf not to be an ordinary function:

sscanf("abc def", "%s %s", string a, string b);
Returns

The number of directives matched in the format string. Note that a string directive (%s or %[]) counts as a match even when matching just the empty string (which either may do).

See also

sprintf, array_sscanf