This text was automatically converted from troff me macros to HTML. Information may have been lost, added, or changed in the process. Lars Aronsson and Lysator do not guarantee the correctness of this HTML document.
define function *
- define a new function
define function function_name ( language = {"c" | "postquel"}, returntype = type-r [ , percall_cpu = "costly{!*}" ] [ , perbyte_cpu = "costly{!*}" ] [ , outin_ratio = percentage] [ , byte_pct = percentage] ) arg is ( type-1 { , type-n } ) as {"/full/path/filename.o" | "list-of-postquel-queries"}
With this command, a POSTGRES user can register a function with POSTGRES. Subsequently, this user is treated as the owner of the function.
When defining the function, the input data types, type-1 , type-2 , &..., type-n , and the return data type, type-r must be specified, along with the language, which may be "c" or "postquel". The input types may be base or complex types. The output type may be specified as a base type, complex type, or "setof <type>". The setof modifier indicates that the function will return a set of items, rather than a single item. The as clause of the command is treated differently for C and POSTQUEL functions, as explained below.
Functions written in C can be defined to POSTGRES, which will dynamically load them into its address space. The loading happens either via the load command, or automatically the first time the function is necessary for execution. Repeated execution of a function will cause negligible additional overhead, as the function will remain in a main memory cache.
The percall_cpu, perbyte_cpu, outin_ratio, and byte_pct flags are provided for C functions to give a rough estimate of the function's running time, allowing the query optimizer to postpone applying expensive functions used in a query's where clause. The percall_cpu flag captures the overhead of the function's invocation (regardless of input size), while the perbyte_cpu flag captures the sensitivity of the function's running time to the size of its inputs. The magnitude of these two parameters is determined by the number of exclamation points appearing after the word costly: specifically, each exclamation point can be thought of as another order of magnitude in cost, i.e., "cost = 10number-of-exclamation-points." The default value for percall_cpu and perbyte_cpu is 0. Examples of reasonable cost values may be found in the system catalog pg_proc; most simple functions on base types have costs of 0.
The outin_ratio is provided for functions which return variable-length types, such as text or bytea. It should be set to the size of the function's output as a percentage of the size of the input. For example, a function which compresses its operands by 2 should have "outin_ratio = 50." The default value is 100.
The byte_pct flag should be set to the percentage of the bytes of the arguments that actually need to be examined in order to compute the function. This flag is particularly useful for functions which generally take a large object as an argument, but only examine a small fixed portion of the object. The default value is 100.
The body of a C function following as should be the FULL PATH of the object code (.o file) for the function, bracketed by quotation marks. (POSTGRES will not compile a function automatically -- it must be compiled before it is used in a define function command.)
C functions with base type arguments can be written in a straightforward fashion. The C equivalents of built-in POSTGRES types are accessible in a C file if
$POSTGRESHOME/src/lib/H/utils/builtins.his included as a header file. This can be achieved by having
at the top of the C source file and by compiling all C files with the following include options:#include <utils/builtins.h>
before any ".c" programs in the "cc" command line, e.g.:-I$POSTGRESHOME/src/lib/H -I$POSTGRESHOME/src/port/$PORTNAME -I$POSTGRESHOME/O/lib/H
The directorycc -I$POSTGRESHOME/src/lib/H\
-I$POSTGRESHOME/src/port/$PORTNAME\
-I$POSTGRESHOME/O/lib/H\
-c progname.c
$POSTGRESHOME/O/lib/Hcontains "tags.h", which is generated in the build process. The directory
$POSTGRESHOME/src/port/$PORTNAMEcontains "machine.h". Typical values for PORTNAME are sunos4 and ultrix4.
The convention for passing arguments to and from the user's C functions is to use pass-by-value for data types that are 32 bits (4 bytes) or smaller, and pass-by-reference for data types that require more than 32 bits. The following table gives the C type required for parameters in the C functions that will be loaded into POSTGRES. The "Defined In" column gives the actual header file (in the
$POSTGRESHOME/src/lib/Hdirectory) that the equivalent C type is defined. However, if you include "utils/builtins.h", these files will automatically be included.
center; l l l l l l. Built-In Type C Type Defined In _ abstime AbsoluteTime utils/nabstime.h bool bool tmp/c.h box (BOX *) utils/geo-decls.h bytea (bytea *) tmp/postgres.h char char N/A char16 Char16 or (char16 *) tmp/postgres.h cid CID tmp/postgres.h int2 int2 tmp/postgres.h int28 (int28 *) tmp/postgres.h int4 int4 tmp/postgres.h float4 float32 or (float4 *) tmp/c.h or tmp/postgres.h float8 float64 or (float8 *) tmp/c.h or tmp/postgres.h lseg (LSEG *) tmp/geo-decls.h oid oid tmp/postgres.h oid8 (oid8 *) tmp/postgres.h path (PATH *) utils/geo-decls.h point (POINT *) utils/geo-decls.h regproc regproc or REGPROC tmp/postgres.h reltime RELTIME tmp/postgres.h text (text *) tmp/postgres.h tid ItemPointer storage/itemptr.h tinterval TimeInterval tmp/nabstime.h uint2 uint16 tmp/c.h uint4 uint32 tmp/c.h xid (XID *) tmp/postgres.h
Complex arguments to C functions are passed into the C function as a special C type, TUPLE, defined in
$POSTGRESHOME/src/lib/H/tmp/libpq-fe.h.Given a variable t of this type, the C function may extract attributes from the function using the function call:
where isnull is a pointer to a bool, which the function sets to true if the field is null. The result of this function should be cast appropriately as shown in the examples below.GetAttributeByName(t, "fieldname", &isnull)
POSTQUEL functions execute an arbitrary list of POSTQUEL queries, returning the results of the last query in the list. POSTQUEL functions in general return sets. If their returntype is not specified as a setof, then an arbitrary element of the last query's result will be returned. The expensive function parameters percall_cpu, perbyte_cpu, outin_ratio, and byte_pct are not used for POSTQUEL functions; their costs are determined dynamically by the planner.
The body of a POSTQUEL function following
as
should be a list of queries separated by
whitespace characters and bracketed within quotation marks. Note that
quotation marks used in the queries must be escaped, by preceding them
with two backslashes (i.e. \
").
Arguments to the POSTQUEL function may be referenced in the queries using a $n syntax: $1 refers to the first argument, $2 to the second, and so on. If an argument is complex, then a "dot" notation may be used to access attributes of the argument (e.g. $1.emp), or to invoke functions via a nested dot syntax.
The following command defines a C function, overpaid, of two basetype arguments.
The C file "overpaid.c" might look something like:define function overpaid (language = "c", returntype = bool) arg is (float8, int4) as "/usr/postgres/src/adt/overpaid.o"
The overpaid function can be used in a query, e.g:#include <utils/builtins.h> bool overpaid(salary, age) float8 *salary; int4 age; { if (*salary > 200000.00) return(TRUE); if ((age < 30) && (*salary > 100000.00)) return(TRUE); return(FALSE) }
retrieve (EMP.name) where overpaid(EMP.salary, EMP.age)
One can also write this as a function of a single argument of type EMP:
The following query is now accepted:define function overpaid_2 (language = "c", returntype = bool) arg is (EMP) as "/usr/postgres/src/adt/overpaid_2.o"
In this case, in the body of the overpaid_2 function, the fields in the EMP record must be extracted. The C file "overpaid_2.c" might look something like:retrieve (EMP.name) where overpaid_2(EMP)
#include <utils/builtins.h> #include <tmp/libpq-fe.h> bool overpaid_2(t) TUPLE t; { float8 *salary; int4 age; bool salnull, agenull; salary = (float8 *)GetAttributeByName(t, "salary", &salnull); age = (int4)GetAttributeByName(t, "age", &agenull); if (!salnull && *salary > 200000.00) return(TRUE); if (!agenull && (age<30) && (*salary > 100000.00)) return(TRUE); return(FALSE) }
To illustrate a simple POSTQUEL function, consider the following, which might be used to debit a bank account:
A user could execute this function to debit account 17 by $100.00 as follows:define function TP1 (language = "postquel", returntype = int4) arg is (int4, float8) as "replace BANK (balance = BANK.balance - $2) where BANK.accountno = $1 retrieve(x = 1)"
retrieve (x = TP1( 17,100.0))
The following more interesting examples take a single argument of type EMP, and retrieve multiple results:
define function hobbies (language = "postquel", returntype = setof HOBBIES) arg is (EMP) as "retrieve (HOBBIES.all) where $1.name = HOBBIES.person" define function children (language = "postquel", returntype = setof KIDS) arg is (EMP) as "retrieve (KIDS.all) where $1.name = KIDS.dad or $1.name = KIDS.mom"
Then the following query retrieves overpaid employees, their hobbies, and their children:
Note that attributes can be projected using function syntax (e.g. name(EMP)), as well as the traditional dot syntax (e.g. EMP.name).retrieve (name=name(EMP), hobby=name(hobbies(EMP)), kid=name(children(EMP))) where overpaid_2(EMP)
An equivalent expression of the previous query is:
This "nested dot" notation for functions can be used to cascade functions of single arguments. Note that the function after a dot must have only one argument, of the type returned by the function before the dot.retrieve (EMP.name, hobby=EMP.hobbies.name, kid=EMP.children.name) where overpaid_2(EMP)
POSTGRES flattens the target list of the queries above. That is, it produces the cross-product of the hobbies and the children of the employees. For example, given the schema:
create BANK (accountno = int4, balance = float8) append BANK (accountno = 17, balance = "10000.00"::float8) create EMP (name = char16, salary = float8, dept = char16, age = int4) create HOBBIES (name = char16, person = char16) create KIDS (name = char16, dad = char16, mom = char16) append EMP (name = "joey", salary = "100000.01"::float8, dept = "toy", age = 24) append EMP (name = "jeff", salary = "100000.01"::float8, dept = "shoe", age = 23) append EMP (name = "wei", salary = "100000"::float8, dept = "tv", age = 30) append EMP (name = "mike", salary = "500000"::float8, dept = "appliances", age = 30) append HOBBIES (name = "biking", person = "jeff" ) append HOBBIES (name = "jamming", person = "joey" ) append HOBBIES (name = "basketball", person = "wei") append HOBBIES (name = "swimming", person = "mike") append HOBBIES (name = "philately", person = "mike") append KIDS (name = "matthew", dad = "mike", mom = "teresa") append KIDS (name = "calvin", dad = "mike", mom = "teresa")
the query above returns
center; l l l l l l. name hobby kid _ jeff biking (null) joey jamming (null) mike swimming matthew mike philately matthew mike swimming calvin mike philately calvin
Note that flattening preserves the name and hobby fields even when the kid field is null.
information(unix), load(commands), remove function(commands).
The percall_cpu and perbyte_cpu flags can take integers surrounded by quotes instead of the """costly{!*}""" syntax described above. This allows a finer grain of distinction between function costs, but is not encouraged since such distinctions are difficult to estimate accurately.
On Ultrix, all .o files that POSTGRES is expected to load dynamically must be compiled under cc with the
"-G 0"option turned on.
The name of the C function must be a legal C function name, and the name of the function in C code must be exactly the same as the name used in define function.
Function names must be unique per database, except for the fact that there may be attributes of the same name as a function. In the case that a there is an ambiguity between a function on a complex type and an attribute of the complex type, the attribute will always be used.
C functions cannot return a set of values.
The dynamic loader for DECstation Ultrix has exceedingly bad performance.