A Function is a predefined program that performs a task.
A DUAL
table is a special one row, one column table that is useful for running unit tests (testing functions). Most Database engines either have a DUAL
table or allow you to easily create one. Oracle has a preset DUAL
table in every database. Simply use the DUAL
keyword as a table name to access it:
SELECT * FROM DUAL;
A Single Row Function (SRF) a function that acts on one row at a time. For every row that function is called, and every task that it is designed to do, it will return a single value every time. Most database engines have a huge selection of pre-made functions that you can use at your disposal. Here are some of them:
Function | Attributes | Description |
---|---|---|
CONCAT(<att>, <att>) |
1. Column/String 2. Column/String |
Concatenate 2 parts of a string/column without using the concatenation operator. Unlike CONCAT however, the concatenation operator (|| ) can be used indefinitely. |
UPPER(<att>) |
1. Column/String | Converts the characters in a string/column to Upper Case. |
LOWER(<att>) |
1. Column/String | Converts the characters in a string/column to Lower Case. |
INITCAP(<att>) |
1. Column/String | Capitalizes the first letter in every word. |
LENGTH(<att>) |
1. Column/String | Returns the number of characters in a string/column. |
INSTR(<att1>, <att2>, <att3>, <att4>) |
1. Column/String 2. Column/String 3. Number (opt) 4. Number (opt) |
Locates a string within another string. The second attribute determines the string to look for. The third attribute indicates the character position to start looking and the last parameter indicates which occurrence (Ex. If <att4> was 2, then it would skip the first occurence and return the character position of the second). |
SUBSTR(<att1>,<br><att2>,<att3>) |
1. Column/String 2. Number 3. Number (opt) |
Extracts parts of a column/string. The second attribute indicates the starting character of the string. The optional third attribute indicates the number of characters to extract. |
LPAD(<att1>,<attt2>,<att3>) |
1. Column/String 2. Number 3. Character |
Inserts (a padding of) characters on the left side of the column/string. The second attribute indicates the total number of characters the column/string should have. The third attribute indicates the character to insert to make up that total. |
RPAD(<att1>,<att2>,<att3>) |
1. Column/String 2. Number 3. Character |
Does the exact same thing as LPAD, but inserts the character(s) on the right side of the column/string. |
LTRIM(<att1>,<att2>) |
1. Column/String 2. Character |
Removes all instances of a specified character from the left of the column/string. It will only trim from the left until a different character is present. The second attribute indicates the type of character to remove. |
RTRIM(<att1>,<att2>) |
1. Column/String 2. Character |
Does the exact same thing as LTRIM, but removes the charater(s) from the right side of the column/string until a different character is present. |
SOUNDEX(<att>) |
1. Column/String | Translates a source string into its SOUNDEX code. SOUNDEX codes are used to translate English words into sound-alike patterns. The first letter remains the same, the next three (translatable) letters are turned into numbers, and the remaining letters are ignored. (Example: ‘Worthington’ = W635) |
Letter | SOUNDEX Code |
---|---|
B, F, P, V | 1 |
C, G, J, K, Q, S, X, Z | 2 |
D, T | 3 |
L | 4 |
M, N | 5 |
R | 6 |
All other letters (A, E, H, I, O, U, W, Y) | Ignored |
Function | Attributes | Description |
---|---|---|
ROUND(<att1>,<att2>) |
1. Column/Number 2. Number (opt) |
Rounds the column/number to the specified number of places from the left. The second attribute indicates the number of places to round to. The default for the second attribute is 0. |
CEIL(<att>) |
1. Column/Number | Returns the smallest integer that is greater than or equal to the attribute. |
FLOOR(<att>) |
1. Column/Number | Returns the largest integer that is less than or equal to the attribute. |
REMAINDER(<att1>, <att2>) |
1. Column/Number 2. Column/Number |
Identifies the multiple of the first attribute that is nearest to the second attribute and returns the difference between those two values. |
MOD(<att1>, <att2>) |
1. Column/Number 2. Column/Number |
Identifies the multiple of the second attribute that is nearest to the first attribute and returns the difference between those two values. |
TRUNC(<att1>, <att2>) |
1. Column/Number 2. Number |
Removes a specified number of fractional digits from the right of the column/number without rounding. The second attribute indicates the total number of fractional digits to include. The default for the second attribute is 0. |
Function | Attributes | Description |
---|---|---|
SYSDATE |
Returns the current date in M/D/Y format. | |
SYSTIMESTAMP |
Returns the current date in D-M-Y format as well as the time in h-m-s-f AM/PM {timezone}. | |
ADD_MONTHS(<att1>,<att2>) |
1. Column/Date 2. Number |
Returns a date (M/D/Y) with the specified number of months added to the column/date. If the second attribute is a negative number, the function returns a date with the specified number of months removed from it. |
MONTHS_BETWEEN(<att1>, <att2>) |
1. Column/Date 2. Column/Date |
Returns the number of months between two specified columns/dates. |
LAST_DAY(<att>) |
1. Column/Date | Returns the last day of the month for the specified column/date. |
NEXT_DAY(<att1>, <att2>) |
1. Column/Date 2. Date (DAY format) |
Returns the date of the next “day of the week” specified. I.e: the next Monday or the next Tuesday, etc. |
ROUND(<att1>, <att2>) |
1. Column/Date 2. Date Format |
Rounds the first attribute to the nearest date value specified by the second attribute. The default for the second attribute is to the nearest hour. |
TRUNC(<att1>, <att2>) |
1. Column/Date 2. Date Format |
Performs the same task as ROUND, except that it always rounds down. |
Function | Attributes | Description |
---|---|---|
TO_CHAR(<att1>, <att2>) |
1. Column/Number/Date 2. String Format |
Converts a column/number/date into a specified string format. |
TO_NUMBER(<att1>, <att2>, <att3>) |
1. Column/String/Date 2. Number Format 3. NLS Parameter (opt) |
Converts a column/string/date into a specified number format. An optional NLS parameter can be included. |
TO_DATE(<att1>,<att2>) |
1. Column/String/Number 2. Date Format |
Converts a column/string/number into a specified date format by using a valid date format. The first argument must math the second arguments format in order for the function to work. |
TO_TIMESTAMP(<att1>, <att2>) |
1. Column/Character 2. Date Format |
Converts the character string to the TIMESTAMP date format. |
TO_TIMESTAMP_TZ(<att1>, <att2>) |
1. Column/Character 2. Date Format |
Converts the character string to the TIMESTAMP date format with the Time Zone. |
TO_YMINTERVAL(<att1>-<att2>) |
1. Column/Number 2. Column/Number |
Converts the “n-n” parameter into a valid year-month date. |
TO_DSINTERVAL(<att>) |
1. Column/SQL Format | Converts a String in a valid INTERVAL DAT TO SECOND FORMAT ('DAYS HH24:MI:SS.FF') into a date. |
NUMTOYMINTERVAL(<att1>, <att2>) |
1. Column/Number 2. ‘YEAR’/’MONTH’ |
Converts the first attribute from a number into a (Year/Month) interval value of time. |
NUMTODSINTERVAL(<att1>, <att2>) |
1. Column/Number 2. ‘DAY’/’HOUR’/’MINUTE’/’SECOND’ |
Converts the first attribute from a number into a (Day/Hour/Minute/Second) interval of time. |
CAST(<att1> AS <att2>) |
1. Any 2. Any |
Converts the first attribute to the second attribute. This is particularly useful for converting text representations of datetime information into datetime formats, particularly TIMESTAMP WITH LOCAL TIME ZONE. |
Function | Attributes | Description |
---|---|---|
NULLIF(<att1>,<att2>) |
1. Any 2. Any |
Returns a NULL if attribute 1 and 2 match. |
NVL(<att1>,<att2>) |
1. Column/Any 2. String/Number |
Returns a new string/number for every NULL or EMPTY value in a column. |
The Attribute of a function can also be a nested function. Example:
SELECT CONCAT(
CONCAT('Pet Species: ', UPPER(species)),
CONCAT(' --> Name: ', UPPER(name))
) AS "Pets" FROM ex_pets;
Pets |
---|
Pet Species: DOG —> Name: OLIVER |
Pet Species: CAT —> Name: TIGER |
Pet Species: CAT —> Name: SILVERBELL |
Pet Species: DOG —> Name: ANNIE |
Pet Species: GUINEA_PIG —> Name: SUSIE |
A Group Function takes many inputs, but only returns a single output. Here are some of them:
Function | Attributes | Description |
---|---|---|
AVG(<att>) |
1. Column/Number | Returns the average of all the values in a column. |
COUNT(<att>) |
1. Column/Any/* | Returns the number of (non-null) values in a column. If a * is used as the attribute, the amount of rows in the table is returned. DISTINCT and ALL can be used before the attribute. |
MIN(<att>) |
1. Column/Any | Returns the minimum (smallest) value in a column. Lower numbers, earlier dates, and first alphabetical letters or lower charter type numbers are MIN. |
MAX(<att>) |
1. Column/Any | Returns the maximum (largest) value in a column. Higher numbers, later dates, and last alphabetical letters or higher character type numbers are MAX. |
MEDIAN(<att>) |
1. Column/Number | Sorts the data, and returns the middle value. If there is an even number of values, the two middle values are interpolated to create a single value. |
RANK(<att1>) WITHIN GROUP (ORDER BY <att2>) |
1. Column(s) 2. Column(s) |
Calculates the rank of a value within a group of values. Ranks may not be consecutive numbers since SQL counts tied rows individually, so if three rows are tied for first, they will each be ranked 1, 1, and 1, and the next row will be ranked 4. |
DENSE_RANK(<att1>) WITHIN GROUP (ORDER BY <att2>) |
1. Column(s) 2. Column(s) |
Same as RANK, but will not skip the next sequential number(s) following a tie, so if three rows are tied for first, they will each be ranked 1, 1, and 1, and the next row will be ranked 2. |
SUM(<att>) |
1. Column/Number | Returns the sum of all the values in a column. |
For the date: Tuesday, March 25th, 2009 at 9:50:15 PM
Format Element | How it Works | Example Value Returned |
---|---|---|
FM | Used in combination with other elements to direct the suppression of leading or trailing blanks | |
DD | Numeric day of the month | 25 |
DDD | Day of the year, 1 through 366. | 84 |
DDTH | Numeric day of the month with tense | 25th |
DY Dy dy | Three letter abbreviation of day | TUE Tue tue |
DAY Day day | Full name of the day | TUESDAY Tuesday tuesday |
HH, HH12 | Hours of the day in 12-hour-clock | 09 |
HH24 | Hour of the day, 1 through 24. 12 midnight is represented by 00 | 21 |
MI | Minute. 0 through 59 | 50 |
MON Mon mon | Three letter abbreviation of month | MAR Mar mar |
MM | Two digit value of month | 04 |
MONTH Month month | Full name of month | MARCH March march |
Q | Quarter of year. | 1 |
RM | Roman numeral month. | III |
RR | Accepts twentieth-century dates in the twenty-first century using only two digits. 00 through 49 is interpreted as 2000 through 2049. 50 through 99 is interpreted as 1950 through 1999. | 09 |
RRRR | The four-digit year. If provided a two-digit year, it returns the same value as RR. | 2009 |
SS | Seconds, 0 through 59. | 15 |
SSSS | Seconds past midnight, 0 through 86399. | 78615 |
TS | The short time format. Allowable only when specified with the DL or DS format model element, separated by white space. | 9:50:15 PM |
AM, PM | The AM/PM portion of the 12-hour clock. Either AM or PM will display the same result | PM |
WW | The week of the year, 1 through 53. Week 1 starts on the first day of the year and ends on the seventh day of the year. | 13 |
W | The week of the month, 1 through 5. Week 1 starts on the first day of the month and ends on the seventh day of the month. | 3 |
YYYY | Four digit year | 2009 |
Y,YYY | Year with a comma | 2,009 |
YEAR | Year Spelled out (Upper) | TWO THOUSAND NINE |
Year | Year Spelled out (InitCap) | Two Thousand Nine |
year | Year Spelled out (lower) | two thousand nine |
“text” | Literal value. Display as is | text |
For the number 2780
Format Element | How it works | Format Usage | Example Value Returned |
---|---|---|---|
9 | Wildcard for a number | 999 | 278 |
0 | Adds leading and fractional 0(s) to match width of format | 00000 | 02780 |
$ | Indicates location of dollar sign | $9999 | $2780 |
. | Indicates location of decimal | 9999.99 | 2780.00 |
, | Indicates location of comma for thousands | 99,999 | 2,780 |
B | Leading blank for integers. | B9999 | 2780 |
C | The ISO currency symbol as defined in the NLS_ISO_CURRENCY parameter. | C9999 | $2780 |
D | Returns the current decimal character as defined by the NLS_NUMERIC_CHARACTERS parameter. The default value is a period. | 9999D99 | 2780.00 |
EEEE | Returns a value in scientific notation. | 9.9EEE | 2.8x10^2 |
G | Returns the group separator (e.g., a comma). | 9G999 | 2,780 |
L | Returns the local currency symbol. | L9999 | $2780 |
MI | Returns negative value with trailing minus sign; returns positive value with a trailing blank. | 9999MI | 2780 |
PR | Returns negative values in angle brackets. | 9999PR | 2780 |
RN rn |
Returns values in Roman numerals, uppercase. Put RN in lowercase (as in "rn") for Roman numerals in lowercase. | RN | CCLXXVIII |
S (prefix) | Returns negative values with a leading minus sign, positive values with a leading positive sign. Note: Can appear only in the first or last position of a format mask. |
S9999 | +2780 |
S (suffix) | Returns negative values with a trailing minus sign, positive values with a trailing positive sign. Note: Can appear only in the first or last position of a format mask. |
9999S | 2780+ |
TM | The text minimum number format model returns the smallest number of characters possible. | TM | 2780 |
U | Returns the Euro currency symbol or whatever is indicated by the NLS_DUAL_CURRENCY parameter. | U9999 | €2780 |
V | Returns a value multiplied by 10n, where n is the number of 9s after the V. | 9999V99 | 278000 |
X | Returns the hexadecimal value. | XXXX | 0ADC |
NLS Parameter | Description |
---|---|
NLS_NUMERIC_CHARACTERS = 'dg' | d = decimal character (see D in Table 6-1) g = group separator (see G in Table 6-1) |
NLS_CURRENCY = 'text' | text = local currency symbol (see L in Table 6-1) |
NLS_ISO_CURRENCY = 'currency' | currency = international currency symbol (see C in Table 6-1) |
Analytical functions have the ability to return multiple rows from within a group of rows. Because a given SELECT
statement processes data row by row, an analytic function can operate across a window of rows. You cannot include analytic functions anywhere other than the SELECT
list or the ORDER BY
clause.
This is a requirement for any analytic function to work. It essentially defines the scope of the function. Without a parameter, OVER
will identify the scope of the function to span across the entire table. You can use OVER
to turn an otherwise scalar function into an analytic function.
By itself, the SUM
function exists as a scalar function and will return one result per row.
SELECT SUM(worker_hours) FROM employees;
This would return something like:
+--------------+
| worker_hours |
+--------------+
| 62 |
+--------------+
However, if you were to use the OVER function, you could display the accumulated hours for each row:
SELECT worker_dept, worker_id, worker_hours,
SUM(worker_hours) OVER (ORDER BY worker_hours) "Accumulated Hours"
FROM employees;
This would return something like:
+-------------+-----------+--------------+-------------------+
| worker_dept | worker_id | worker_hours | Accumulated Hours |
+-------------+-----------+--------------+-------------------+
| mnft | 132 | 18 | 18 |
| mnft | 127 | 20 | 38 |
| sale | 212 | 24 | 62 |
+-------------+-----------+--------------+-------------------+
PARTITION BY
will allow you to subdivide the results of the analytical function based on a column.
Lets modify the query above to include a partition that will separate the Accumulated Hours by the worker_dept
id.
SELECT worker_dept, worker_id, worker_hours,
SUM(worker_hours) OVER (
PARTITION BY worker_dept ORDER BY worker_hours ) "Accumulated Hours"
FROM employees;
This would return something like:
+-------------+-----------+--------------+-------------------+
| worker_dept | worker_id | worker_hours | Accumulated Hours |
+-------------+-----------+--------------+-------------------+
| mnft | 132 | 18 | 18 |
| mnft | 127 | 20 | 38 |
| sale | 212 | 24 | 24 |
+-------------+-----------+--------------+-------------------+
Notice that the accumulated hours counter reset when the worker_dept
changed to sale
.
The RANK
function calculates the rank of a value within a group of values. Ranks may not be consecutive numbers since SQL counts tied rows individually, so if three rows are tied for first, they will each be ranked 1, 1, and 1, and the next row will be ranked 4.
SELECT RANK() OVER (PARTITION BY [partition1] ORDER BY [expression1])
Select worker_dept, worker_hours,
RANK() OVER (PARTITION BY worker_dept ORDER BY worker_hours) AS "Rank"
FROM employees;
This would return something like:
+-------------+--------------+------+
| worker_dept | worker_hours | Rank |
+-------------+--------------+------+
| mnft | 18 | 1 |
| mnft | 20 | 2 |
| sale | 24 | 1 |
+-------------+--------------+------+
SQL is ordering and grouping the worker_dept and then ranking the worker_hours in each department.
This works the same as RANK
, but will not skip the next sequential number(s) following a tie, so if three rows are tied for first, they will each be ranked 1, 1, and 1, and the next row will be ranked 2.
SELECT DENSE_RANK() OVER (PARTITION BY [partition1] ORDER BY [expression1])
The functions FIRST
and LAST
are similar. Both are considered aggregate functions as well as analytic functions. For a given range of sorted values, each returns either the first value (FIRST) or the last value (LAST) of the population of rows defining e1, in the sorted order.
SELECT [aggregate_function] KEEP (DENSE_RANK FIRST ORDER BY [expression1])
SELECT [aggregate_function] KEEP (DENSE_RANK LAST ORDER BY [expression1])
For a given row within a window, LAG
shows a column's value in the prior row, and LEAD
shows the next value.
SELECT worker_dept, worker_id, worker_hours,
LAG(worker_hours) OVER (ORDER BY worker_hours) "Lag",
LEAD(worker_hours) OVER (ORDER BY worker_hours) "Lead",
FROM employees;
This would return something like:
+-------------+-----------+--------------+-----+------+
| worker_dept | worker_id | worker_hours | Lag | Lead |
+-------------+-----------+--------------+-----+------+
| mnft | 132 | 18 | | 20 |
| mnft | 127 | 20 | 18 | 24 |
| sale | 212 | 24 | 20 | |
+-------------+-----------+--------------+-----+------+
The STDDEV
function returns the sample standard deviation of a set of numeric values. Standard deviation is a mathematical value representing the degree of distribution for a given numeric value within a larger range of values. The standard deviation for a given number is the square root of its variance.
STDDEV
is useful when analyzing a set of numbers by looking for correlations among those numbers—those numbers whose STDDEV
values are low are closer to the mean than those numbers with relatively higher values of STDDEV
.
Lets assume the numbers are: 1, 2, 4, and 13
The PERCENTILE_CONT
function uses linear interpolation between a given row's row set ceiling and floor to calculate the percentile value for that given row. The PERCENTILE_CONT
function takes as input a percentage you specify, say, .4 (for 40 percent), and it analyzes a group of rows you specify; then it determines the equivalent numeric value equal to that percentage of the whole, by using linear extrapolation.