It's All About ORACLE

Oracle - the number one Database Management System. We hope this blog will teach you a lot about Oracle.

Invisible Indexes in Oracle Database 11g Release 1

Oracle 11g allows indexes to be marked as invisible. Invisible indexes are maintained like any other index, but they are ignored by the optimizer unless the OPTIMIZER_USE_INVISIBLE_INDEXES parameter is set to TRUE at the instance or session level. Indexes can be created as invisible by using the INVISIBLE keyword, and their visibility can be toggled using the ALTER INDEX command.
CREATE INDEX index_name ON table_name(column_name) INVISIBLE;

ALTER INDEX index_name INVISIBLE;
ALTER INDEX index_name VISIBLE;
The following script creates and populates a table, then creates an invisible index on it.
CREATE TABLE ii_tab (
  id  NUMBER
);

BEGIN
  FOR i IN 1 .. 10000 LOOP
    INSERT INTO ii_tab VALUES (i);
  END LOOP;
  COMMIT;
END;
/

CREATE INDEX ii_tab_id ON ii_tab(id) INVISIBLE;

EXEC DBMS_STATS.gather_table_stats(USER, 'ii_tab', cascade=> TRUE);
A query using the indexed column in the WHERE clause ignores the index and does a full table scan.
SET AUTOTRACE ON
SELECT * FROM ii_tab WHERE id = 9999;

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     1 |     3 |     7   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| II_TAB |     1 |     3 |     7   (0)| 00:00:01 |
----------------------------------------------------------------------------
Setting the OPTIMIZER_USE_INVISIBLE_INDEXES parameter makes the index available to the optimizer.
ALTER SESSION SET OPTIMIZER_USE_INVISIBLE_INDEXES=TRUE;
SELECT * FROM ii_tab WHERE id = 9999;

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |     1 |     3 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| II_TAB_ID |     1 |     3 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------
Making the index visible means it is still available to the optimizer when the OPTIMIZER_USE_INVISIBLE_INDEXES parameter is reset.
ALTER SESSION SET OPTIMIZER_USE_INVISIBLE_INDEXES=FALSE;
ALTER INDEX ii_tab_id VISIBLE;

SELECT * FROM ii_tab WHERE id = 9999;

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |     1 |     3 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| II_TAB_ID |     1 |     3 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------
Invisible indexes can be useful for processes with specific indexing needs, where the presence of the indexes may adversely affect other functional areas. They are also useful for testing the impact of dropping an index.


The current visibility status of an index is indicated by the VISIBILITY column of the [DBA|ALL|USER]_INDEXES views.
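
For example, the status of our index can be checked with a query like the following (a minimal sketch against USER_INDEXES):
SELECT index_name, visibility
FROM   user_indexes
WHERE  table_name = 'II_TAB';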

External Table Syntax from SQL*Loader

In the external tables article below, a number of advantages of external tables over SQL*Loader were listed. It was also noted that in some cases we might use external tables as a replacement for SQL*Loader but continue to use staging tables. A new feature of SQL*Loader, the EXTERNAL_TABLE option, helps us to develop this quickly. This option generates an external table "load" script from an existing SQL*Loader control file. Once we have this, the script can be cleaned and modified as required. For existing loads, this dramatically reduces the development time for external tables and removes much of the time spent learning the new syntax through the inevitable "trial and error" phases with the access parameters.

In the following example, we'll create a small SQL*Loader control file to load a staging table named EMP_STG. We will use the new option EXTERNAL_TABLE=GENERATE_ONLY in the control file and run it to generate a SQL script. 

options (external_table=generate_only)
load data
infile 'd:\oracle\dir\emp.dat'
badfile 'd:\oracle\dir\emp.bad'
truncate
into table emp_stg
fields terminated by ','
trailing nullcols
(
  empno
, ename
, job
, mgr
, hiredate date "dd/mm/yyyy"
, sal
, comm
, deptno
)

Note that invoking SQL*Loader with this control file will not load any data. The external_table=generate_only clause in the OPTIONS line makes SQL*Loader run "silently" and generate a logfile only; SQL*Loader itself is invoked as normal, as sketched below. The resulting logfile contains SQL statements, based on the contents of the control file, that support an external table load to replace our SQL*Loader job.
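
The SQL*Loader call itself might look like the following (the control file name and login are assumptions for illustration):
sqlldr userid=scott/tiger control=emp_stg.ctl

The statements generated in the logfile are shown below.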

CREATE DIRECTORY statements needed for files
------------------------------------------------------------------------
CREATE DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000 AS 'd:\oracle\dir\'


CREATE TABLE statement for external table:
------------------------------------------------------------------------
CREATE TABLE "SYS_SQLLDR_X_EXT_EMP_STG"
(
  EMPNO NUMBER(4),
  ENAME VARCHAR2(10),
  JOB VARCHAR2(9),
  MGR NUMBER(4),
  HIREDATE DATE,
  SAL NUMBER(7,2),
  COMM NUMBER(7,2),
  DEPTNO NUMBER(2)
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8MSWIN1252
    BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'emp.bad'
    LOGFILE 'emp.log_xt'
    READSIZE 1048576
    FIELDS TERMINATED BY "," LDRTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    (
      EMPNO CHAR(255)
        TERMINATED BY ",",
      ENAME CHAR(255)
        TERMINATED BY ",",
      JOB CHAR(255)
        TERMINATED BY ",",
      MGR CHAR(255)
        TERMINATED BY ",",
      HIREDATE CHAR(255)
        TERMINATED BY ","
        DATE_FORMAT DATE MASK "dd/mm/yyyy",
      SAL CHAR(255)
        TERMINATED BY ",",
      COMM CHAR(255)
        TERMINATED BY ",",
      DEPTNO CHAR(255)
        TERMINATED BY ","
    )
  )
  location
  (
    'emp.dat'
  )
)REJECT LIMIT UNLIMITED


INSERT statements used to load internal tables:
------------------------------------------------------------------------
INSERT /*+ append */ INTO EMP_STG
(
  EMPNO,
  ENAME,
  JOB,
  MGR,
  HIREDATE,
  SAL,
  COMM,
  DEPTNO
)
SELECT
  EMPNO,
  ENAME,
  JOB,
  MGR,
  HIREDATE,
  SAL,
  COMM,
  DEPTNO
FROM "SYS_SQLLDR_X_EXT_EMP_STG"


statements to cleanup objects created by previous statements:
------------------------------------------------------------------------
DROP TABLE "SYS_SQLLDR_X_EXT_EMP_STG"

DROP DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000

Oracle has done the work for us and provided a script to create an external table to load our EMP_STG staging table. We will probably choose to clean up certain elements of the generated code (the object names, for example), but the hard work of converting a SQL*Loader load to an external table load is done.


External Table in Oracle 9i Explored

External tables enable us to read flat-files (stored on the O/S) using SQL. They were introduced in Oracle 9i as an alternative to SQL*Loader. External tables are essentially stored SQL*Loader control files, but because they are defined as tables, we can access our flat-file data using all available read-only SQL and PL/SQL operations. We can also read flat-files in parallel and join files to other files or tables, views and so on.

This article is a short introduction to external tables and how we can benefit from them.

Creating an external table

We will begin by creating a simple external table that mimics the EMP table. Before we can create the table, however, we require an Oracle directory. A directory (also new in 9i) is an Oracle object that "points" to a filesystem location where our source flat-files are stored (as an aside, note that directory objects can also be used to replace utl_file_dir in UTL_FILE read-write operations). In the following example, we will create a directory named XT_DIR.
SQL> CREATE DIRECTORY xt_dir AS 'd:\oracle\dir';

Directory created.
Note that to create a directory, we require the CREATE ANY DIRECTORY system privilege. Directories are system-wide, so if we cannot get the privilege, a DBA can create the directory and grant READ or WRITE to our users as required (READ for external table users and READ/WRITE as appropriate for UTL_FILE users).
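
For example, a DBA might issue grants such as the following (SCOTT is an assumed grantee):
GRANT READ ON DIRECTORY xt_dir TO scott;
GRANT WRITE ON DIRECTORY xt_dir TO scott;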

We will reference this directory in our external table definition as follows. Some explanation of the new syntax follows the table creation.
SQL> CREATE TABLE emp_xt
  2  (  empno      NUMBER(4)
  3  ,  ename      VARCHAR2(10)
  4  ,  job        VARCHAR2(9)
  5  ,  mgr        NUMBER(4)
  6  ,  hiredate   DATE
  7  ,  sal        NUMBER(7,2)
  8  ,  comm       NUMBER(7,2)
  9  ,  deptno     NUMBER(2)
 10  )
 11  ORGANIZATION EXTERNAL
 12  (
 13    TYPE ORACLE_LOADER
 14    DEFAULT DIRECTORY xt_dir
 15    ACCESS PARAMETERS
 16    (
 17       RECORDS DELIMITED by NEWLINE
 18       BADFILE 'emp_xt.bad'
 19       LOGFILE 'emp_xt.log'
 20       NODISCARDFILE
 21       FIELDS TERMINATED BY ','
 22       (  empno
 23       ,  ename
 24       ,  job
 25       ,  mgr
 26       ,  hiredate  CHAR(20) DATE_FORMAT DATE MASK "DD/MM/YYYY"
 27       ,  sal
 28       ,  comm
 29       ,  deptno
 30       )
 31    )
 32    LOCATION ('emp.dat')
 33  )
 34  REJECT LIMIT UNLIMITED;

Table created.
We have now created an external table and we can see a wide range of new and extended syntax. In particular, we can see something that looks similar to (but not quite the same as) a SQL*Loader control file. Some points to note are as follows.
  • Line 11: the ORGANIZATION EXTERNAL clause tells Oracle that we are creating an external table;
  • Line 13: the ORACLE_LOADER driver is a new type that "powers" external tables;
  • Line 14: we can set a DEFAULT DIRECTORY once for the entire table if we wish. This states that, unless we tell Oracle otherwise, all files created or read as part of this table will reside in this named directory. In most cases, we will wish to write log/bad/discardfiles to a logging directory and read our incoming data files from a data directory. For simplicity, we have used a single XT_DIR in our examples for all files (a sketch of the two-directory pattern follows this list);
  • Lines 15-31: the ACCESS PARAMETERS clause contains the SQL*Loader-style reference to enable Oracle to parse the flat-file into rows and columns. At this time, external tables do not offer the extensive range of parse options that SQL*Loader provides, yet still cater for most loading requirements. Note that if we have made any syntactical errors in our ACCESS PARAMETERS, we can still create the external table. Access parameters themselves are not parsed until we issue a SELECT against the external table;
  • Lines 22-30: in this example, we have listed each of the fields in the file and table. Like SQL*Loader, the ORACLE_LOADER external table driver assumes that all incoming fields are CHAR(255) unless we state otherwise. In the EMP_XT table, we have a DATE column that needs to be converted as on line 26, hence the need to list all fields. Otherwise, we can simply use the default parsing specified in the FIELDS clause, which in our simple example only lists the field delimiter;
  • Line 32: the LOCATION clause is where we specify the input file(s). Note that we don't have a file named "emp.dat" at the time of table creation; this is not necessary (we shall discuss this later); and
  • Line 34: similar to the ERRORS= clause of SQL*Loader, we can specify a REJECT LIMIT for an external table. This is the number of bad records we will tolerate before the load is failed.
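
As an aside, a minimal sketch of the two-directory pattern mentioned in the DEFAULT DIRECTORY note (the directory names and paths are assumptions):
CREATE DIRECTORY xt_data_dir AS 'd:\oracle\data';
CREATE DIRECTORY xt_log_dir AS 'd:\oracle\log';

The output files can then be qualified individually in the access parameters (e.g. BADFILE 'XT_LOG_DIR':'emp_xt.bad' and LOGFILE 'XT_LOG_DIR':'emp_xt.log') while the incoming data file is read from the DEFAULT DIRECTORY xt_data_dir.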

Using external tables

Now that we have created our EMP_XT external table, we can use it as follows. Remember that at this stage we don't have the emp.dat file that the table expects (as per the LOCATION clause). We can generate a simple csv-file from the existing EMP table as follows (using the oracle-developer.net DATA_DUMP utility).
SQL> BEGIN
  2     data_dump( query_in        => 'SELECT * FROM emp',
  3                file_in         => 'emp.dat',
  4                directory_in    => 'XT_DIR',
  5                nls_date_fmt_in => 'DD/MM/YYYY' );
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL> host dir d:\oracle\dir\emp.dat
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of d:\oracle\dir

04/08/2002  18:57               633 emp.dat
               1 File(s)            633 bytes
               0 Dir(s)  26,696,744,960 bytes free

Now we are ready to select from our external table.
SQL> SELECT * FROM emp_xt;

     EMPNO ENAME      JOB              MGR HIREDATE          SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ---------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17/12/1980        800                    20
      7499 ALLEN      SALESMAN        7698 20/02/1981       1600        300         30
      7521 WARD       SALESMAN        7698 22/02/1981       1250        500         30
      7566 JONES      MANAGER         7839 02/04/1981       2975                    20
      7654 MARTIN     SALESMAN        7698 28/09/1981       1250       1400         30
      7698 BLAKE      MANAGER         7839 01/05/1981       2850                    30
      7782 CLARK      MANAGER         7839 09/06/1981       2450                    10
      7788 SCOTT      ANALYST         7566 09/12/1982       3000                    20
      7839 KING       PRESIDENT            17/11/1981       5000                    10
      7844 TURNER     SALESMAN        7698 08/09/1981       1500          0         30
      7876 ADAMS      CLERK           7788 12/01/1983       1100                    20
      7900 JAMES      CLERK           7698 03/12/1981        950                    30
      7902 FORD       ANALYST         7566 03/12/1981       3000                    20
      7934 MILLER     CLERK           7782 23/01/1982       1300                    40

14 rows selected.
It is clear from the above that if we can query a flat-file via an external table, then we have fundamentally changed the way in which we can load external data into our database! In fact, we may not even need to load the file at all as we can query it directly (why bother to stage data that will be used once and replaced every day?).
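
For example, both styles are sketched below (EMP_COPY is an assumed target table):
-- Query the flat-file directly with no staging step...
SELECT job, SUM(sal)
FROM   emp_xt
GROUP  BY job;

-- ...or direct-path load it straight into a target table.
INSERT /*+ APPEND */ INTO emp_copy
SELECT * FROM emp_xt;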

External Table metadata

We can see information about our EMP_XT table in several dictionary views as follows (note that XXX is a placeholder for USER, ALL and DBA).
  • XXX_TABLES;
  • XXX_ALL_TABLES;
  • XXX_EXTERNAL_TABLES; and
  • XXX_EXTERNAL_LOCATIONS.
The last two are specific to external tables only. For general information on our external table we can query USER_EXTERNAL_TABLES, which is structured as follows.
SQL> desc USER_EXTERNAL_TABLES
 Name                                Null?    Type
 ----------------------------------- -------- -----------------
 TABLE_NAME                          NOT NULL VARCHAR2(30)
 TYPE_OWNER                                   CHAR(3)
 TYPE_NAME                           NOT NULL VARCHAR2(30)
 DEFAULT_DIRECTORY_OWNER                      CHAR(3)
 DEFAULT_DIRECTORY_NAME              NOT NULL VARCHAR2(30)
 REJECT_LIMIT                                 VARCHAR2(40)
 ACCESS_TYPE                                  VARCHAR2(7)
 ACCESS_PARAMETERS                            VARCHAR2(4000)
As its name suggests, the XXX_EXTERNAL_LOCATIONS views list the file(s) that an external table is currently "pointed to". We can read multiple files via a single external table and these can be in different directories. Currently, our EMP_XT location is as follows.
SQL> SELECT *
  2  FROM   user_external_locations
  3  WHERE  table_name = 'EMP_XT';

TABLE_NAME   LOCATION     DIRECTORY_OWNER   DIRECTORY_NAME
------------ ------------ ----------------- ----------------
EMP_XT       emp.dat      SYS               XT_DIR

1 row selected.
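
We can point the table at several files in one statement, for example (the file names are assumptions):
ALTER TABLE emp_xt LOCATION ('emp1.dat', 'emp2.dat');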

Modifying Location

Remember that the emp.dat file did not exist at the time we created the EMP_XT table. If we try to query an external table that has an incorrect location clause, we receive the following error.
SQL> host del d:\oracle\dir\emp.dat

SQL> host dir d:\oracle\dir\emp.dat
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of d:\oracle\dir

File Not Found

SQL> SELECT * FROM emp_xt;
SELECT * FROM emp_xt
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file emp.dat in XT_DIR not found
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1
Our EMP_XT external table currently has a single incoming flat-file (emp.dat) and this doesn't exist. In operational systems, we are more likely to receive new files every day and these are often uniquely named to distinguish one day's delivery from another (e.g. with a business date of some format in the file name). The location clause of an external table can be modified to cater for this without invalidating any dependencies (such as views or packages).

In the following example, we will create an emp_20020804.dat file and modify the EMP_XT table to reference this new file. We will complete the example by selecting from it.
SQL> BEGIN
  2     data_dump( query_in        => 'SELECT * FROM emp',
  3                file_in         => 'emp_20020804.dat',
  4                directory_in    => 'XT_DIR',
  5                nls_date_fmt_in => 'DD/MM/YYYY' );
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL> ALTER TABLE emp_xt LOCATION ('emp_20020804.dat');

Table altered.

SQL> SELECT * FROM emp_xt;

     EMPNO ENAME      JOB              MGR HIREDATE          SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ---------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17/12/1980        800                    20
      7499 ALLEN      SALESMAN        7698 20/02/1981       1600        300         30
      7521 WARD       SALESMAN        7698 22/02/1981       1250        500         30
      7566 JONES      MANAGER         7839 02/04/1981       2975                    20
      7654 MARTIN     SALESMAN        7698 28/09/1981       1250       1400         30
      7698 BLAKE      MANAGER         7839 01/05/1981       2850                    30
      7782 CLARK      MANAGER         7839 09/06/1981       2450                    10
      7788 SCOTT      ANALYST         7566 09/12/1982       3000                    20
      7839 KING       PRESIDENT            17/11/1981       5000                    10
      7844 TURNER     SALESMAN        7698 08/09/1981       1500          0         30
      7876 ADAMS      CLERK           7788 12/01/1983       1100                    20
      7900 JAMES      CLERK           7698 03/12/1981        950                    30
      7902 FORD       ANALYST         7566 03/12/1981       3000                    20
      7934 MILLER     CLERK           7782 23/01/1982       1300                    40

14 rows selected.

Investigating Errors

External tables can optionally generate various log files in the same manner as SQL*Loader. In our EMP_XT table, remember that we opted to create a logfile and badfile but didn't need a discardfile (as we do not have a LOAD WHEN clause). For developers who are unfamiliar with these three SQL*Loader-style files, their short descriptions are:
  • logfile: contains information on how the input files were parsed, the positions and error messages for any rejected records, and some other general information on the load, such as the number of rows successfully read;
  • badfile: contains the rejected records (i.e. the records that couldn't be loaded for the reasons given in the logfile); and
  • discardfile: contains the records that failed the LOAD WHEN clause (this clause is a simple filter to prevent records with certain data characteristics from being loaded); a sketch of a LOAD WHEN clause follows this list.
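
A minimal sketch of access parameters with a LOAD WHEN filter and a discardfile (the DEPTNO condition is purely illustrative):
RECORDS DELIMITED BY NEWLINE
LOAD WHEN (deptno != '40')
BADFILE 'emp_xt.bad'
LOGFILE 'emp_xt.log'
DISCARDFILE 'emp_xt.dsc'
FIELDS TERMINATED BY ','
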
To investigate errors with an external table read or load, therefore, we have the same information available to us as we did with SQL*Loader. It is likely that, when putting together access parameters for the first few times, we will make mistakes. Oracle will not parse the parameters when we create the table; rather, they are invoked when we try to read from the external table (i.e. SELECT from it). Any syntactical errors will show up as a KUP-% message and will need to be investigated in line with the online documentation. Errors with the data, however, can be investigated by reference to the output files.

Note that logfiles are appended to on each select from an external table, so in a regular batch system we might wish to uniquely name each output file for support and diagnostic purposes. Oracle provides two switches to append to the file names in the LOGFILE, BADFILE and DISCARDFILE clauses. These are %p (process ID) and %a (agent number for parallel query) and are specified in the access parameters (e.g. LOGFILE 'emp_xt_%p.log'). Assuming each select from the external table is performed in a new session, each output file is generated with a new process ID.

It is unfortunate that we cannot use ALTER TABLE directly to modify the output file names, as we are more likely to want to append a date stamp to them (we cannot do this because the file names are part of the ACCESS PARAMETERS clause). We can, however, take advantage of the fact that we can alter the access parameters, and build a utility loosely based on the following example. This is a rough demonstration of the principles we can adopt for a log/bad/discardfile naming utility (obviously a "real" utility would be far more flexible and parameterised). The following anonymous block finds the existing logfile name and replaces it with "today's" logfile before using the external table.
SQL> DECLARE
  2
  3     v_access  VARCHAR2(32767);
  4     v_replace VARCHAR2(128);
  5     v_oldfile VARCHAR2(128);
  6     v_newfile VARCHAR2(128) := 'emp_20020804.log';
  7
  8  BEGIN
  9
 10     SELECT access_parameters INTO v_access
 11     FROM   user_external_tables
 12     WHERE  table_name = 'EMP_XT';
 13
 14     IF INSTR(v_access,'NOLOGFILE') > 0 THEN
 15        v_access := REPLACE(
 16                       v_access, 'NOLOGFILE',
 17                       'LOGFILE ''' || v_newfile || ''''
 18                       );
 19     ELSE
 20        v_oldfile := SUBSTR(
 21                        v_access,
 22                        INSTR(v_access,'LOGFILE')+8,
 23                        (INSTR(v_access,'.log')+4)-(INSTR(v_access,'LOGFILE')+7)
 24                        );
 25        v_replace := REPLACE(REPLACE(v_oldfile, '"', ''''), '''');
 26        v_access := REPLACE(v_access, v_replace, v_newfile);
 27     END IF;
 28
 29     EXECUTE IMMEDIATE
 30        'ALTER TABLE emp_xt ACCESS PARAMETERS (' || v_access || ')';
 31
 32  END;
 33  /

PL/SQL procedure successfully completed.

SQL> SELECT access_parameters
  2  FROM   user_external_tables
  3  WHERE  table_name = 'EMP_XT';

ACCESS_PARAMETERS
-----------------------------------------------------------------------
RECORDS DELIMITED by NEWLINE
     BADFILE 'emp_xt.bad'
     LOGFILE 'emp_20020804.log'
     NODISCARDFILE
     FIELDS TERMINATED BY ','
     (  empno
     ,  ename
     ,  job
     ,  mgr
     ,  hiredate  CHAR(20) DATE_FORMAT DATE MASK "DD/MM/YYYY"
     ,  sal
     ,  comm
     ,  deptno
     )

1 row selected.

SQL> SELECT COUNT(*) FROM emp_xt;

  COUNT(*)
----------
        14

1 row selected.

SQL> host dir d:\oracle\dir\emp*
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of d:\oracle\dir

04/08/2002  19:09               633 emp_20020804.dat
04/08/2002  21:54             2,558 emp_20020804.log
04/08/2002  19:04            11,850 emp_xt.log
               3 File(s)         15,041 bytes
               0 Dir(s)  26,696,704,000 bytes free

Advantages Over SQL*Loader

We have seen some examples of external table syntax but have not yet explored why we might use them over SQL*Loader. It is the case that SQL*Loader can parse and load almost any flat-file we wish to throw at it. External tables, on the other hand, cater for the more common processing requirements. Despite this, their advantages over SQL*Loader are numerous, some of which are as follows:
  • Ease of use: External tables can be selected, sorted, filtered, joined, intersected, minused, unioned and so on using SQL, the database language most familiar to Oracle developers. Anything we can do in a SELECT statement with a "normal" table can be done with an external table. This makes working with external flat-files very simple;
  • Performance (1): reads/loads involving external tables can be parallelised. When combined with direct path (in the case of INSERTs), this dramatically outperforms SQL*Loader, which has to load in serial mode. Parallelism can be set at the external table level or more selectively in hints (see the sketch after this list);
  • Performance (2): as seen above, loading a table from an external table is faster than SQL*Loader due to parallel query and DML. In addition to this, ETL processes that read directly from external tables rather than pre-loaded staging tables are faster because they do not incur the SQL*Loader step (i.e. data is only read once). For example, if an ETL process takes 10 minutes in SQL*Loader and 10 minutes in SQL or PL/SQL loading, using external tables directly in the latter process can eradicate up to 50% of the ETL time;
  • Disk Space: External tables do not require any database space; only the space consumed by the flat-files on the filesystem. SQL*Loader requires two copies of each file (one inside the database in a staging table), so external tables are "cheaper" on disk;
  • Error-trapping: SQL*Loader returns different codes based on its outcome. For batch systems, this can be tricky because the error code is sometimes ambiguous. For example, exit code 2 means "a bad file has been created". Of course, this may or may not be a cause for concern. If we allow 50 bad records (errors=50), then the fact that a bad file contains 1-49 bad records should not signal a batch failure. We therefore need to write some clever code to interrogate the badfile or logfile to determine whether this is a "bad code 2" or an "acceptable code 2". With external tables, it is much simpler. Until we reach the REJECT LIMIT, the SQL statement continues or completes successfully. If we reach this limit, the statement fails;
  • Debugging and support: External tables are equally useful as debugging and support aids. For example, we can create further external tables over logfiles and badfiles to investigate errors easily with SQL. DBAs can also create external tables over critical files such as the alert log.
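
A minimal parallel direct-path load sketch (the degree of 4 is an arbitrary choice and EMP_STG is an assumed staging table; parallel DML must be enabled in the session):
ALTER TABLE emp_xt PARALLEL 4;

ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND */ INTO emp_stg
SELECT * FROM emp_xt;
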
We can see that the advantages of external tables are compelling. In our batch systems, we can use them in one of two ways and still achieve their benefits. First, and for maximum benefit, we can completely remove the need for staging tables altogether and use the tables directly in our ETL code. Second, and for medium benefit, we can use them as a straight replacement for SQL*Loader and therefore load our staging tables in parallel direct-path.

There are, however, a few issues to be aware of that might limit the scope of how we use these tables. For example, external tables can only have one location set at a time (i.e. different sessions cannot share a table but set different locations). This means that their use for loading is serialised for the length of time a specific location is required and in use (and probably should be protected as such). If multiple sessions need to share the same table but load different files at the same time, then either multiple tables must be created or some form of locking will be required. For the latter scenario, the length of time a table is locked should be reduced to a minimum to maximise concurrency. For this reason, multi-user external tables will probably not figure in long-running batch processes.

In addition to this, external tables cannot be indexed. For most staging tables, this won't be an issue as we would prefer hash joins for ETL batch queries, but in some cases indexes might be required. For these, it would probably be sensible to use external tables to load the physical staging tables rather than use them directly in the ETL queries.

More on Performance

All external table reads are direct path and, with direct path inserts and parallel DML, external tables will usually be quicker than SQL*Loader (which is serial). We've also seen that ETL processes involving external tables can be faster because they do not require the "lead-in" time of loading a staging table first. However, we have not yet looked at external tables compared with "internal" tables (note that this is how Oracle described such tables in the SQL*Loader logfile above). In the following simple example, we will generate 1 million records in a heap table and a flat-file and compare the time taken to scan this data.

First we will create a large table based on 1 million EMP records as follows.
SQL> CREATE TABLE million_emps
  2  NOLOGGING
  3  AS
  4     SELECT e1.*
  5     FROM   emp e1
  6     ,      emp, emp, emp, emp, emp
  7     WHERE  ROWNUM <= 1000000;

Table created.
We will write these records to a flat-file for our existing EMP_XT table to use.
SQL> BEGIN
  2     data_dump( query_in        => 'SELECT * FROM million_emps',
  3                file_in         => 'million_emps.dat',
  4                directory_in    => 'XT_DIR',
  5                nls_date_fmt_in => 'DD/MM/YYYY' );
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL> ALTER TABLE emp_xt LOCATION ('million_emps.dat');

Table altered.
Using the wall-clock and autotrace, we will compare a simple fetch of the data from both the MILLION_EMPS and the EMP_XT tables, starting with the "internal" table.
SQL> set timing on
SQL> set autotrace traceonly statistics

SQL> SELECT * FROM million_emps;

1000000 rows selected.

Elapsed: 00:00:18.05

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      72337  consistent gets
       5675  physical reads
          0  redo size
   13067451  bytes sent via SQL*Net to client
     733825  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed
Next we can test the EMP_XT external table.
SQL> SELECT * FROM emp_xt;

1000000 rows selected.

Elapsed: 00:00:22.08

Statistics
----------------------------------------------------------
        192  recursive calls
          0  db block gets
       6282  consistent gets
          0  physical reads
          0  redo size
   13067451  bytes sent via SQL*Net to client
     733825  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
    1000000  rows processed
We can see that the external table is approximately 22% slower than the internal table on a single read. If we re-run the fetches several times, similar timings are recorded. Note that autotrace does not show any physical reads in its statistics for the external table (this is possibly a bug).



Analytic Functions - LAG, LEAD, LISTAGG, COLLECT

Both LAG and LEAD functions have the same usage, as shown below.
LAG  (value_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)
LEAD (value_expression [,offset] [,default]) OVER ([query_partition_clause] order_by_clause)

  • value_expression - Can be a column or a built-in function, but not another analytic function.
  • offset - The number of rows preceding/following the current row, from which the data is to be retrieved. The default value is 1.
  • default - The value returned if the offset is outside the scope of the window. The default value is NULL.
Looking at the EMP table, we query the data in salary (SAL) order.

SELECT empno,
       ename,
       job,
       sal
FROM   emp
ORDER BY sal;

     EMPNO ENAME      JOB              SAL
---------- ---------- --------- ----------
      7369 SMITH      CLERK            800
      7900 JAMES      CLERK            950
      7876 ADAMS      CLERK           1100
      7521 WARD       SALESMAN        1250
      7654 MARTIN     SALESMAN        1250
      7934 MILLER     CLERK           1300
      7844 TURNER     SALESMAN        1500
      7499 ALLEN      SALESMAN        1600
      7782 CLARK      MANAGER         2450
      7698 BLAKE      MANAGER         2850
      7566 JONES      MANAGER         2975
      7788 SCOTT      ANALYST         3000
      7902 FORD       ANALYST         3000
      7839 KING       PRESIDENT       5000

SQL>

LAG

The LAG function is used to access data from a previous row. The following query returns the salary from the previous row to calculate the difference between the salary of the current row and that of the previous row. Notice that the ORDER BY of the LAG function is used to order the data by salary.
SELECT empno,
       ename,
       job,
       sal,
       LAG(sal, 1, 0) OVER (ORDER BY sal) AS sal_prev,
       sal - LAG(sal, 1, 0) OVER (ORDER BY sal) AS sal_diff
FROM   emp;

     EMPNO ENAME      JOB              SAL   SAL_PREV   SAL_DIFF
---------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK            800          0        800
      7900 JAMES      CLERK            950        800        150
      7876 ADAMS      CLERK           1100        950        150
      7521 WARD       SALESMAN        1250       1100        150
      7654 MARTIN     SALESMAN        1250       1250          0
      7934 MILLER     CLERK           1300       1250         50
      7844 TURNER     SALESMAN        1500       1300        200
      7499 ALLEN      SALESMAN        1600       1500        100
      7782 CLARK      MANAGER         2450       1600        850
      7698 BLAKE      MANAGER         2850       2450        400
      7566 JONES      MANAGER         2975       2850        125
      7788 SCOTT      ANALYST         3000       2975         25
      7902 FORD       ANALYST         3000       3000          0
      7839 KING       PRESIDENT       5000       3000       2000

SQL>
In LAG(sal, 1, 0), the second parameter '1' specifies how many rows back to look. When there is no preceding row, the function returns the default value '0' given as the third parameter.
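
The query_partition_clause in the syntax above restarts the calculation for each group of rows. For example, to compare each salary with the previous salary within the same department (a sketch against the same EMP table):

SELECT empno,
       deptno,
       sal,
       LAG(sal, 1, 0) OVER (PARTITION BY deptno ORDER BY sal) AS sal_prev_in_dept
FROM   emp;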

LEAD

The LEAD function is used to return data from the next row. The following query returns the salary from the next row to calculate the difference between the salary of the current row and that of the following row.

SELECT empno,
       ename,
       job,
       sal,
       LEAD(sal, 1, 0) OVER (ORDER BY sal) AS sal_next,
       LEAD(sal, 1, 0) OVER (ORDER BY sal) - sal AS sal_diff
FROM   emp;

     EMPNO ENAME      JOB              SAL   SAL_NEXT   SAL_DIFF
---------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK            800        950        150
      7900 JAMES      CLERK            950       1100        150
      7876 ADAMS      CLERK           1100       1250        150
      7521 WARD       SALESMAN        1250       1250          0
      7654 MARTIN     SALESMAN        1250       1300         50
      7934 MILLER     CLERK           1300       1500        200
      7844 TURNER     SALESMAN        1500       1600        100
      7499 ALLEN      SALESMAN        1600       2450        850
      7782 CLARK      MANAGER         2450       2850        400
      7698 BLAKE      MANAGER         2850       2975        125
      7566 JONES      MANAGER         2975       3000         25
      7788 SCOTT      ANALYST         3000       3000          0
      7902 FORD       ANALYST         3000       5000       2000
      7839 KING       PRESIDENT       5000          0      -5000

SQL>
Similar to the LAG function, in LEAD(sal, 1, 0) the second parameter '1' specifies how many rows forward to look. When there is no following row, the function returns the default value '0' given as the third parameter.

LISTAGG Analytic Function in 11g Release 2

The LISTAGG analytic function was introduced in Oracle 11g Release 2, making it very easy to aggregate strings. The nice thing about this function is that it also allows us to order the elements in the concatenated list. If you are using 11g Release 2, you should use this function for string aggregation.

COLUMN employees FORMAT A50

SELECT deptno, LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
FROM   emp
GROUP BY deptno;

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.
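
LISTAGG also has a true analytic form with an OVER clause, which repeats the aggregated list on every row of its partition (a sketch against the same EMP table):

SELECT empno,
       deptno,
       LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename)
         OVER (PARTITION BY deptno) AS dept_employees
FROM   emp;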

WM_CONCAT Built-in Function (Not Supported)

If you are not running 11g Release 2, but are running a version of the database where the WM_CONCAT function is present, then it is a zero-effort solution, as it performs the aggregation for you. It is actually an example of a user-defined aggregate function, but Oracle has done all the work for you.
COLUMN employees FORMAT A50

SELECT deptno, wm_concat(ename) AS employees
FROM   emp
GROUP BY deptno;

    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,FORD,ADAMS,SCOTT,JONES
        30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD

3 rows selected.

Note. WM_CONCAT is an undocumented function and as such is not supported by Oracle for user applications (MOS Note ID 1336219.1). 

COLLECT function in Oracle 10g

An example on oracle-developer.net uses the COLLECT function in Oracle 10g to get the same result. This method requires a table type and a function to convert the contents of the table type to a string. I've altered the original method slightly to bring it in line with this article.
CREATE OR REPLACE TYPE t_varchar2_tab AS TABLE OF VARCHAR2(4000);
/

CREATE OR REPLACE FUNCTION tab_to_string (p_varchar2_tab  IN  t_varchar2_tab,
                                          p_delimiter     IN  VARCHAR2 DEFAULT ',') RETURN VARCHAR2 IS
  l_string     VARCHAR2(32767);
BEGIN
  FOR i IN p_varchar2_tab.FIRST .. p_varchar2_tab.LAST LOOP
    IF i != p_varchar2_tab.FIRST THEN
      l_string := l_string || p_delimiter;
    END IF;
    l_string := l_string || p_varchar2_tab(i);
  END LOOP;
  RETURN l_string;
END tab_to_string;
/
The query below shows the COLLECT function in action.
COLUMN employees FORMAT A50

SELECT deptno,
       tab_to_string(CAST(COLLECT(ename) AS t_varchar2_tab)) AS employees
FROM   emp
GROUP BY deptno;
       
    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
        
3 rows selected.
