WARNING

This text was automatically converted from troff me macros to HTML. Information may have been lost, added, or changed in the process. Lars Aronsson and Lysator do not guarantee the correctness of this HTML document.

NAME

define function *- define a new function

SYNOPSIS

define function function_name (
          language = {"c" | "postquel"},
          returntype = type-r
          [ , percall_cpu = "costly{!*}" ]
          [ , perbyte_cpu = "costly{!*}" ]
          [ , outin_ratio =  percentage]
          [ , byte_pct =  percentage]
        )
    arg is ( type-1  { , type-n } )
    as {"/full/path/filename.o" | "list-of-postquel-queries"}

DESCRIPTION

With this command, a POSTGRES user can register a function with POSTGRES. Subsequently, this user is treated as the owner of the function.

When defining the function, the input data types, type-1 , type-2 , &..., type-n , and the return data type, type-r must be specified, along with the language, which may be "c" or "postquel". The input types may be base or complex types. The output type may be specified as a base type, complex type, or "setof <type>". The setof modifier indicates that the function will return a set of items, rather than a single item. The as clause of the command is treated differently for C and POSTQUEL functions, as explained below.

C FUNCTIONS

Functions written in C can be defined to POSTGRES, which will dynamically load them into its address space. The loading happens either via the load command, or automatically the first time the function is necessary for execution. Repeated execution of a function will cause negligible additional overhead, as the function will remain in a main memory cache.

The percall_cpu, perbyte_cpu, outin_ratio, and byte_pct flags are provided for C functions to give a rough estimate of the function's running time, allowing the query optimizer to postpone applying expensive functions used in a query's where clause. The percall_cpu flag captures the overhead of the function's invocation (regardless of input size), while the perbyte_cpu flag captures the sensitivity of the function's running time to the size of its inputs. The magnitude of these two parameters is determined by the number of exclamation points appearing after the word costly: specifically, each exclamation point can be thought of as another order of magnitude in cost, i.e., "cost = 10number-of-exclamation-points." The default value for percall_cpu and perbyte_cpu is 0. Examples of reasonable cost values may be found in the system catalog pg_proc; most simple functions on base types have costs of 0.

The outin_ratio is provided for functions which return variable-length types, such as text or bytea. It should be set to the size of the function's output as a percentage of the size of the input. For example, a function which compresses its operands by 2 should have "outin_ratio = 50." The default value is 100.

The byte_pct flag should be set to the percentage of the bytes of the arguments that actually need to be examined in order to compute the function. This flag is particularly useful for functions which generally take a large object as an argument, but only examine a small fixed portion of the object. The default value is 100.

The body of a C function following as should be the FULL PATH of the object code (.o file) for the function, bracketed by quotation marks. (POSTGRES will not compile a function automatically -- it must be compiled before it is used in a define function command.)

C functions with base type arguments can be written in a straightforward fashion. The C equivalents of built-in POSTGRES types are accessible in a C file if

$POSTGRESHOME/src/lib/H/utils/builtins.h

is included as a header file. This can be achieved by having

#include <utils/builtins.h>

at the top of the C source file and by compiling all C files with the following include options:

 -I$POSTGRESHOME/src/lib/H
 -I$POSTGRESHOME/src/port/$PORTNAME
 -I$POSTGRESHOME/O/lib/H

before any ".c" programs in the "cc" command line, e.g.:

cc -I$POSTGRESHOME/src/lib/H \
        -I$POSTGRESHOME/src/port/$PORTNAME \
        -I$POSTGRESHOME/O/lib/H \
        -c progname.c

The directory

$POSTGRESHOME/O/lib/H

contains "tags.h", which is generated in the build process. The directory

$POSTGRESHOME/src/port/$PORTNAME

contains "machine.h". Typical values for PORTNAME are sunos4 and ultrix4.

The convention for passing arguments to and from the user's C functions is to use pass-by-value for data types that are 32 bits (4 bytes) or smaller, and pass-by-reference for data types that require more than 32 bits. The following table gives the C type required for parameters in the C functions that will be loaded into POSTGRES. The "Defined In" column gives the actual header file (in the

$POSTGRESHOME/src/lib/H

directory) that the equivalent C type is defined. However, if you include "utils/builtins.h", these files will automatically be included.

Equivalent C Types for Built-In POSTGRES Types

center;
l l l
l l l.
Built-In Type C Type Defined In
_
abstime  AbsoluteTime utils/nabstime.h
bool bool tmp/c.h
box (BOX *)  utils/geo-decls.h
bytea (bytea *) tmp/postgres.h
char char N/A
char16 Char16 or (char16 *) tmp/postgres.h
cid CID tmp/postgres.h
int2 int2 tmp/postgres.h
int28 (int28 *) tmp/postgres.h
int4 int4 tmp/postgres.h
float4 float32 or (float4 *) tmp/c.h or tmp/postgres.h
float8 float64 or (float8 *) tmp/c.h or tmp/postgres.h
lseg (LSEG *) tmp/geo-decls.h
oid oid tmp/postgres.h
oid8 (oid8 *) tmp/postgres.h
path (PATH *) utils/geo-decls.h
point (POINT *) utils/geo-decls.h
regproc  regproc or REGPROC tmp/postgres.h
reltime  RELTIME  tmp/postgres.h
text (text *) tmp/postgres.h
tid ItemPointer storage/itemptr.h
tinterval TimeInterval tmp/nabstime.h
uint2 uint16 tmp/c.h
uint4 uint32 tmp/c.h
xid (XID *)  tmp/postgres.h

Complex arguments to C functions are passed into the C function as a special C type, TUPLE, defined in

$POSTGRESHOME/src/lib/H/tmp/libpq-fe.h.

Given a variable t of this type, the C function may extract attributes from the function using the function call:

GetAttributeByName(t, "fieldname", &isnull)

where isnull is a pointer to a bool, which the function sets to true if the field is null. The result of this function should be cast appropriately as shown in the examples below.

POSTQUEL FUNCTIONS

POSTQUEL functions execute an arbitrary list of POSTQUEL queries, returning the results of the last query in the list. POSTQUEL functions in general return sets. If their returntype is not specified as a setof, then an arbitrary element of the last query's result will be returned. The expensive function parameters percall_cpu, perbyte_cpu, outin_ratio, and byte_pct are not used for POSTQUEL functions; their costs are determined dynamically by the planner.

The body of a POSTQUEL function following as should be a list of queries separated by whitespace characters and bracketed within quotation marks. Note that quotation marks used in the queries must be escaped, by preceding them with two backslashes (i.e. \").

Arguments to the POSTQUEL function may be referenced in the queries using a $n syntax: $1 refers to the first argument, $2 to the second, and so on. If an argument is complex, then a "dot" notation may be used to access attributes of the argument (e.g. $1.emp), or to invoke functions via a nested dot syntax.

EXAMPLES: C Functions

The following command defines a C function, overpaid, of two basetype arguments.

define function overpaid
        (language = "c", returntype = bool)
        arg is (float8, int4)
        as "/usr/postgres/src/adt/overpaid.o"

The C file "overpaid.c" might look something like:

#include <utils/builtins.h>

bool overpaid(salary, age)
        float8 *salary;
        int4    age;
{
        if (*salary > 200000.00)
                return(TRUE);
        if ((age < 30) && (*salary > 100000.00))
                return(TRUE);
        return(FALSE)
}

The overpaid function can be used in a query, e.g:

retrieve (EMP.name)
    where overpaid(EMP.salary, EMP.age)

One can also write this as a function of a single argument of type EMP:

define function overpaid_2
        (language = "c", returntype = bool)
        arg is (EMP)
        as "/usr/postgres/src/adt/overpaid_2.o"

The following query is now accepted:

retrieve (EMP.name) where overpaid_2(EMP)

In this case, in the body of the overpaid_2 function, the fields in the EMP record must be extracted. The C file "overpaid_2.c" might look something like:

#include <utils/builtins.h>
#include <tmp/libpq-fe.h>

bool overpaid_2(t)
TUPLE t;
{
    float8 *salary;
    int4    age;
    bool    salnull, agenull;

    salary = (float8 *)GetAttributeByName(t, "salary",
                                          &salnull);
    age = (int4)GetAttributeByName(t, "age", &agenull);
    if (!salnull && *salary > 200000.00)
        return(TRUE);
    if (!agenull && (age<30) && (*salary > 100000.00))
        return(TRUE);
    return(FALSE)
}

EXAMPLES: POSTQUEL Functions

To illustrate a simple POSTQUEL function, consider the following, which might be used to debit a bank account:

define function TP1
        (language = "postquel", returntype = int4)
        arg is (int4, float8)
        as "replace BANK (balance = BANK.balance - $2)
              where BANK.accountno = $1
            retrieve(x = 1)"

A user could execute this function to debit account 17 by $100.00 as follows:

retrieve (x = TP1( 17,100.0))

The following more interesting examples take a single argument of type EMP, and retrieve multiple results:

define function hobbies
    (language = "postquel", returntype = setof HOBBIES)
    arg is (EMP)
    as "retrieve (HOBBIES.all)
          where $1.name = HOBBIES.person"

define function children
    (language = "postquel", returntype = setof KIDS)
    arg is (EMP)
    as "retrieve (KIDS.all)
          where $1.name = KIDS.dad
             or $1.name = KIDS.mom"

Then the following query retrieves overpaid employees, their hobbies, and their children:

retrieve (name=name(EMP), hobby=name(hobbies(EMP)),
 kid=name(children(EMP)))
    where overpaid_2(EMP)

Note that attributes can be projected using function syntax (e.g. name(EMP)), as well as the traditional dot syntax (e.g. EMP.name).

An equivalent expression of the previous query is:

retrieve (EMP.name, hobby=EMP.hobbies.name,
 kid=EMP.children.name)
    where overpaid_2(EMP)

This "nested dot" notation for functions can be used to cascade functions of single arguments. Note that the function after a dot must have only one argument, of the type returned by the function before the dot.

POSTGRES flattens the target list of the queries above. That is, it produces the cross-product of the hobbies and the children of the employees. For example, given the schema:

create BANK (accountno = int4, balance = float8) 
append BANK (accountno = 17,
             balance = "10000.00"::float8)
create EMP (name = char16, salary = float8,
            dept = char16, age = int4)
create HOBBIES (name = char16, person = char16) 
create KIDS (name = char16, dad = char16, mom = char16) 
append EMP (name = "joey", salary = "100000.01"::float8,
            dept = "toy", age = 24) 
append EMP (name = "jeff", salary = "100000.01"::float8,
            dept = "shoe", age = 23)
append EMP (name = "wei", salary = "100000"::float8,
            dept = "tv", age = 30) 
append EMP (name = "mike", salary = "500000"::float8,
            dept = "appliances", age = 30) 
append HOBBIES (name = "biking", person = "jeff" ) 
append HOBBIES (name = "jamming", person = "joey" ) 
append HOBBIES (name = "basketball", person = "wei") 
append HOBBIES (name = "swimming", person = "mike") 
append HOBBIES (name = "philately", person = "mike") 
append KIDS (name = "matthew", dad = "mike",
             mom = "teresa")
append KIDS (name = "calvin", dad = "mike",
             mom = "teresa")

the query above returns


center;
l l l
l l l.
name hobby kid
_
jeff biking (null)
joey jamming (null)
mike swimming matthew
mike philately matthew
mike swimming calvin
mike philately calvin

Note that flattening preserves the name and hobby fields even when the kid field is null.

NOTES

The percall_cpu and perbyte_cpu flags can take integers surrounded by quotes instead of the """costly{!*}""" syntax described above. This allows a finer grain of distinction between function costs, but is not encouraged since such distinctions are difficult to estimate accurately.

On Ultrix, all .o files that POSTGRES is expected to load dynamically must be compiled under cc with the

"-G 0"

option turned on.

RESTRICTIONS

The name of the C function must be a legal C function name, and the name of the function in C code must be exactly the same as the name used in define function.

BUGS

Function names must be unique per database, except for the fact that there may be attributes of the same name as a function. In the case that a there is an ambiguity between a function on a complex type and an attribute of the complex type, the attribute will always be used.

C functions cannot return a set of values.

The dynamic loader for DECstation Ultrix has exceedingly bad performance.