Technical Solutions Using Oracle and sas



Download 455.21 Kb.
Page2/2
Date28.05.2018
Size455.21 Kb.
#50895
1   2
SAS/ACCESS Products

The Access Research and Development Division (ARD) at SAS are responsible for implementation and support of the DBMS access products that interface SAS software to an underlying DMBS. We have developed and supported the Oracle interface for many years in my group on many different platforms and OS. Some basic terminology concerning SAS/ACCESS libname engine development:



SAS/ACCESS to Oracle Libname Engine – The SAS Libname Statement can be issued via the SAS 4GL or by external SAS applications via communication paths to MVA. Once issued the libname statement (e.g. libname x oracle user=scott pass=tiger;) is processed by the libname engine then connection is made to the DBMS server and the engine waits for application requests. Requests received by the libname engine are processed against the

DBMS tables identified in the request. The engine provides different modes of ACCESS against the DBMS tables (e.g. select, insert, update, delete and direct DDL and SQL statement processing).



MVA – Multi-Vendor Architecture provides SAS with a framework for portability and I/O between layers of the SAS system. The libname engine architecture is driven by a SAS I/O model that is used by all SAS applications that need to access data within the MVA framework.

SAS I/O – This identifies a system at SAS that processes application request for data. This API provides applications with a consistent methodology to access data from any source supported by SAS.

SAS Options – provides user control of the SAS/ACCESS Interface to Oracle Libname Engine and other SAS Procedures and processes. These options provide a granularity of control that allows you great latitude in tuning the SAS system and applications, such as the Oracle Libname Engine.

SAS Procedure – a program that can be executed using the SAS 4GL Language. This program performs a desired set of operations depending on parameters passed in on the execute line or set as SAS system options. Many procedures have the capability of interacting with the Oracle DBMS using the SAS I/O model and the SAS/ACCESS Interface to Oracle Libname Engine.

SAS V9a release of SAS that contains threaded I/O functionality that has been added to the SAS I/O model. This allows SAS applications and the underlying engines to access data in parallel in both engine extract and application processing functions. Implementation of this model has resulted in significant performance increases

Key features/what’s new in SAS/ACCESS Interface to Oracle in SAS V9:

There are many key features and new features in the SAS/ACCESS Interface to Oracle in SAS 9. Note that most features are geared toward performance improvements, and information display concerning how we interact with the Oracle DBMS. A general knowledge of the SAS/ACCESS Interface to Oracle product is a necessary aid in understanding the impact of these features.



The following options and features allow you to control interaction of SAS with Oracle are:

  • Threaded reads - divide resource-intensive tasks into multiple independent units of work and execute those units in parallel. This extends the SAS MVA I/O model and provides blocked data in parallel to many multi-threaded SAS applications.

  • Temporary table support - enables DBMS temporary tables to persist from one SAS step to the next. This support involves establishing a SAS connection to the DBMS that persists across SAS procedures and DATA steps.

  • The new SQL options MULTI_DATASRC_OPT= and DBMASTER= optimize the performance of the SQL procedure. More detailed information is available about passing joins to the DBMS, determining when joins will fail, and optimizing WHERE clauses passed to Oracle.

  • The SASTRACE= system option now provides improved debugging capabilities. Many layers of debugging are now available that supply information that can be used to tune SAS and Oracle.

  • DBMS metadata can be stored and processed using the SAS Open Metadata Repository. Information needed to drive the Oracle engine is stored in the repository. Using the repository the SAS user can define metadata that will be used by SAS vertical products and solutions.

  • The MULTI_DATASRC_OPT= option in the LIBNAME statement can be used in place of the DBKEY= option to improve performance when you are processing a join between two data sources.

  • The DIRECT_EXE= option in the LIBNAME statement enables you to pass an SQL statement directly to a database using explicit pass-through, when you are using PROC SQL with a libref.

  • You now have the ability to encode the DBMS password that appears in SAS source code so that it does not appear as text in SAS programs.

  • The CHANGE statement can be used to rename SAS/ACCESS tables.

  • BL_PARFILE = data set option creates a file that contains the SQL*Loader parameters.

  • DBSLICE= data set option specifies user-supplied WHERE clauses to partition a DBMS query into component queries for threaded reads.

  • DBSLICEPARM= data set option and option in the LIBNAME statement controls the scope of DBMS threaded reads and the number of threads.

  • OR_UPD_NOWHERE= is now a data set option, in addition to being an option in the LIBNAME statement.

The new SAS I/O model for providing parallel data to SAS 9 applications. New in SAS 9 and enhanced in subsequent SAS 9 releases. This process gives you the ability to scale up with SAS and Oracle.

With the data streaming back and processed on multiple threads within the SAS application has netted significant performance boosts as compared with the previous SAS I/O model. This technology will be generally available with the SAS 9 technology release in 2003.



Current Oracle Technology Adoption at SAS:

Currently SAS/Access Interface utilizes the following Oracle features:



  1. OCI programming

  2. Partitioning and parallelism

  3. RAC support

  4. LOBS

Oracle Database 10g Features and Optimizations

When using SAS/Access interface to Oracle 10g the best optimization should be considered. The following new technologies introduced in 10g could prove beneficial top many SAS applications running on Oracle:



  1. Wide Table Selects

  2. New CBO algorithms and costing model

  3. LOB improvements

  4. Data pump

Performance Issues and Concerns with Oracle and SAS

Joint Customer Concerns

SAS technical support processes many questions from SAS customers that use Oracle as their DBMS. These questions from our current customer base, along with SAS Technical support analysis of new SAS 9 features, should provide insight into what our joint customers see:



  1. Implicit vs. Explicit pass through –

Customer ability to pass critical components of the SQL statement to Oracle may be directly proportional to performance. Joint customers ask about when and how components are passed to Oracle and how they can control them. Explicit pass through is where you write the Oracle SQL that is to be sent to the DBMS server. This is Oracle SQL that is not interpreted or changed by SAS. It is simply passed to the DBMS for processing.

Implicit pass through is where the SAS SQL, procedure or datastep code is interpreted and DBMS SQL is generated on your behalf. Components of Implicit SQL where statement may be passed to the DBMS and components may be processed within SAS. How the user constructs the where criteria and how the user sets up and tunes DBMS components is critical.

1 Performance issues –

“Who is to blame for my poor performance?” The idea here is not to attach blame but to help joint customers work through performance issues. The key to this is obtaining information about what is happening under the covers. To do this the SASTRACE option in SAS 9 was given a considerable face lift. It now can report details on SQL, time to execute, where the execution time is spent and other details. This information along with Oracle tools such as the explain process, detailed later in this paper, give clues on how to increase performance.

2 How SAS accommodates new releases of Oracle –

Each new Oracle release drives support questions through SAS Technical support. The main question is do we support the new Oracle release? The rule of thumb here is yes. If there have been significant changes in the client server structure then a new SAS/ACCESS image will be provided. For example with SAS 9 we will only support 64 bit access to Oracle. Given that we will provide 64 bit shared library images so you can access Oracle 8.1.7.x, 9.i and 10G.

3 Many options to choose from, which is best –

The SAS/ACCESS product in SAS 9 provides many options that allow users to control the interactions between SAS and Oracle. Many of these options are performance related and should be studied carefully. Some of the options you need to investigate are, threaded data access using DBSLICE and DBSLICEPARM, data buffering using READBUFF and tracing information using SASTRACE.

4 SAS support for SAS 6.12, 8.2 and SAS 9 components –

Support of old SAS/ACCESS components continues in SAS 9. For example PROC ACCESS and PROC DBLOAD can still be used in SAS V9; however, they are not being recommended. Support for these components continues at a SAS V6 level. They will not be enhanced or modified.

How you would move forward involves investigation into PROC MIGRATE and PROC CV2VIEW with the use of VALIDVARNAME options to maintain compatibility to existing jobs. You should investigate to see what component is right for you as you convert to SAS 9.

5 Masking passwords –

The ability to mask Oracle passwords in SAS Datastep code generates a good deal of customer concern. We have recommended $OPSYSID as a solution; however, with customers where this is not an option mutual customers are asked to consider:


  • Permissions on SAS files that contain code with passwords. SAS V6, V8 and V9 feature.

  • Storing libname statements in the SAS registry where passwords are encrypted. SAS V8 and V9 feature.

  • Using PROC PWENCODE for encrypting SAS/ACCESS passwords and include them in the SAS Datastep and procedure code. This combined with SAS Metadata Repository password management helps secure DBMS passwords used in SAS. SAS V9 feature.

Performance Issues

Concerns when using Oracle and SAS can be alleviated with a little design and process consideration. The following four common topics and included SAS code snips show how you can increase performance. What is required is consideration of the process that must take place in Oracle and SAS then choosing the best course of action.



SQL Statement Not Written for Index

When it comes to putting together SQL that will process against the Oracle DBMS it is critical to form the SQL correctly. The following two SAS code snips detail what you might be doing vs. what you should be doing. The Oracle explain output is attached for comparison between the two SAS procedures. Note the execution difference between the first and second proc SQL is more than 4x with the second proc SQL running faster than the first.

proc sql;

/* Performs a full table scan the "function on column problem" */

create table work.customers_substr

as

select * from oradata.customers



where substr(cust_main_phone_number,1,5) = '123-4';

quit;
Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=106 Card=500 Bytes=76500)

1 0 TABLE ACCESS (FULL) OF 'CUSTOMERS' (Cost=106 Card=500 Bytes=76500)

proc sql;

/* uses an index and returns the same data */

create table work.customers_like

as

select * from oradata.customers



where cust_main_phone_number like '123-4%';

quit;


Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=84 Card=91 Bytes=13923)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'CUSTOMERS' (Cost=84 Card=91 Bytes=13923)

2 1 INDEX (RANGE SCAN) OF 'CUST_PHONE_NBR_IDX' (NON-UNIQUE)(Cost=2 Card=91)

How do I generate an execution plan:

Using SQLPLUS enter the following commands:

SQL> SET AUTOTRACE TRACEONLY

SQL>
How do I get the SQL statement to give to Oracle for the execution plan:

Turn on SASTRACE, for example enter the following statement in your SAS code –

option sastrace=’,,,d’;
And then pluck the SQL you want to analyze.

Tables and Indexes are not Analyzed

For the Oracle optimizer to function properly the tables and indexes must be analyzed. Consider the following SAS code:

proc sql;

create table work.friday_ny_sales

as

SELECT T.DAY_NAME ,



C.CUST_CITY ,

COUNT(*) AS SALE_COUNT,

SUM(S. AMOUNT_SOLD ) AS SALE_DOLLARS

FROM oradata.CUSTOMERS C, oradata.SALES S, oradata.TIMES T

WHERE (C.CUST_ID = S.CUST_ID ) AND (T.TIME_ID = S.TIME_ID )

AND (c.cust_state_province = 'NY')

AND (t.day_name = 'Friday ')

GROUP BY T.DAY_NAME , C.CUST_CITY;

quit;
Running the proc above and then after the following proc was executed showed a 2X performance gain. It is critical that tables and indexes in Oracle be analyzed before using SAS/ACCESS.

proc sql;

connect to ORACLE as ORACON (user=sh password=sh1 path='demodb.na.sas.com');

execute ( ANALYZE TABLE CUSTOMERS COMPUTE STATISTICS

FOR TABLE FOR ALL INDEXES FOR ALL INDEXED COLUMNS) by ORACON;

execute ( ANALYZE TABLE SALES COMPUTE STATISTICS

FOR TABLE FOR ALL INDEXES FOR ALL INDEXED COLUMNS) by ORACON;

execute ( ANALYZE TABLE TIMES COMPUTE STATISTICS

FOR TABLE FOR ALL INDEXES FOR ALL INDEXED COLUMNS) by ORACON;

disconnect from ORACON;

quit;

SAS Functions Not Being Passed to Oracle

Passing functions to the DBMS server for processing, especially in where components is critical for performance. Consider the following SAS code snips. The second is 10X faster than the first as far as execution time. The functions that SAS can map to Oracle can be found in the SAS/ACCESS documentation.


proc sql;

/* Send the SAS function which is mapped to the Oracle function vs. just the oracle function. */

create table work.ny_sales_by_day_up1

as

select c.cust_last_name, c.cust_city, t.day_name, sum(s.amount_sold)



from oradata.sales s, oradata.times t, oradata.customers c

where


t.time_id = s.time_id and

c.cust_id = s.cust_id and



UPPER(c.cust_state_province) = 'NY'

group by c.cust_last_name, c.cust_city, t.day_name;

quit;

proc sql;



/* Mapps UPCASE to UPPER and passes the correct SQL to ORACLE */

create table work.ny_sales_by_day_up2

as

select c.cust_last_name, c.cust_city, t.day_name, sum(s.amount_sold)



from oradata.sales s, oradata.times t, oradata.customers c

where


t.time_id = s.time_id and

c.cust_id = s.cust_id and

UPCASE(c.cust_state_province) = 'NY'

group by c.cust_last_name, c.cust_city, t.day_name;

quit;

SAS Explicit/Implicit Pass Through

There are two methods for pass SQL to Oracle from SAS. One is explicit SQL where you type Oracle SQL using Proc SQL which is sent “as is” to Oracle. The second is using SAS SQL which is converted into Oracle SQL before being sent to Oracle. You need to be performance aware when using the two methods to help you decide when to use each one. In this example the explicit SQL was almost a 4X winner of implicit SQL:

/*--- implicit sql ---*/

proc sql;

INSERT INTO oradata.quartly_city_iias

SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY, COUNT(*) AS SALE_COUNT,

SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS

FROM


oradata.CUSTOMERS C, oradata.SALES S, oradata.TIMES T

WHERE C.CUST_ID = S.CUST_ID

AND T.TIME_ID = S.TIME_ID

GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;

quit;

/*--- Explicit SQL ---*/



proc sql;

connect to ORACLE as ORACON (user=sh password=sh1 path='demo.na.sas.com');

execute (insert /*+ APPEND */ into quartly_city_iias

select T.FISCAL_QUARTER_DESC, C.CUST_CITY, COUNT(*) AS SALE_COUNT,

SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS

FROM


CUSTOMERS C, SALES S, TIMES T

WHERE C.CUST_ID = S.CUST_ID AND T.TIME_ID = S.TIME_ID

GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY) by ORACON;

disconnect from ORACON;

quit;

Notice with explicit SQL above you are able to send information to Oracle that will speed performance (i.e. note the + APPEND hint).



Using ORACLE FEATURES with SAS

The SAS/ACCESS Interface to Oracle product provides you with different interfaces into Oracle. These range from direct SQL interaction to SAS SQL generation. The interaction with the Oracle server is enhanced with a variety of options that give you control of everything from buffer sizes to formatting. As mentioned above this functionality provides for full read, write and update controls.

The sample code below gives you insight into Oracle features in SAS and how they interact with the Oracle server. The process of tuning your application is provided as well.


  1. Threading data access to Oracle – fast data access

  2. Temporary tables usage scenarios – push processing to the DBMS server

  3. Optimize the data loading process – SQL loader interactions

Threaded data access to Oracle – fast data access

/*--- this is a SAS macro that will get executed later in the process ---*/



%macro makeRegData(nObs,nVars,oraSize);

%let ORASIZE=&oraSize;


data work.xxx;

array x{&nVars} x1-x&nVars;

drop i;

do n=1 to &nObs;

do i=1 to &nVars;

x{i}=ranuni(12345);

end;

y=x{1}+3*x{&Nvars}+rannor(12345);



output;

end;


run;

%mend;
/*--- create the data to load into Oracle – execute the macro above ---*/

%makeRegData(500000,49,20);


/*--- issue the SAS libname statement – window to Oracle ---*/

libname x oracle user=scott pass=tiger bulkcopy=yes;


/*--- using SAS bulk load the SAS data into Oracle ---*/

/*--- this invokes the Oracle SQL Loader under the covers ---*/

data x.tdata_init;

set work.xxx;

run;
/*--- using SAS create the Oracle partition table ---*/

/*--- direct access to Oracle using Oracle SQL ---*/

proc sql;

connect to oracle( user=scott pass=tiger);

exec( create table tdata

partition by range(n)

(partition obs1 values less than (126000) tablespace tabspace1,

partition obs2 values less than (251000) tablespace tabspace2,

partition obs3 values less than (376000) tablespace tabspace3,

partition obs4 values less than (500001) tablespace tabspace4)

nologging as select * from tdata_init) by oracle;

quit;
Processing the sample deata – using the SAS 9 threaded SAS process


/*--- Using SAS options control the CPU resources and Oracle connections for this process ---*/

options CPUCOUNT=4;

options DBSLICEPARM=(ALL,4);

proc reg data=x.tdata;

model y=x1-x&nVars;

performance details;

quit;
/*--- execute the regression macro ---*/

%runReg(49);

Using threaded reads in this case vs. non-threaded reads result in about a 37% percent performance gain. It is expected that your results may vary; however, reading data in parallel and utilizing threaded SAS procedures will lead to performance gains.

Temporary Oracle table process – pushing joins to the DBMS server

The ability to persist Oracle temporary tables across SAS procedure and data step boundaries has significant performance implications in the SAS/ACCESS Interface to Oracle. The SAS 9 coding example and comments below detail this process:


/*--- step one create the Oracle temporary table – using explicit SQL ---*/

/*--- (ORACLE SQL sent from SAS) ---*/

proc sql;

connect to oracle (user=scott password=tiger path=oraclev10);

execute (create global temporary table oratemp

(empid number, salary number)) by oracle;

run;

/*--- load the Oracle temporary table with SAS data and push a join ---*/



/*--- to the Oracle DBMS – the shared connection is the key – by ---*/

/*--- permitting us to share a connection made by SAS we are able to ---*/

/*--- use and process Oracle temporary tables. ---*/

libname ora oracle user=scott password=tiger path=oraclev10 connection=shared;


/*--- load the Oracle temporary table with SAS transaction data ---*/

proc append base=ora.oratemp set work.trans;

run;
/*--- push the join to Oracle ---*/

proc sql;

select lastname, firstname, salary from ora.employees t1, ora.oratemp t2

where t1.empno = t2.empno;

run ;

The option of pushing joins to the DBMS by taking advantage of Oracle temporary tables gives you a significant new process in your SAS datastep or procedure code. Performance improvements can be significant.


Optimizing the Data Loading Process – SQL Loader Interactions

The ability to load data quickly using SAS is detailed by the simple code snips below. By using the documented Oracle SQL loader interface options you can significantly reduce the time it takes you to load data into Oracle.

/* Conventional Path Load */

data oradata.customer_conv_2M(BULKLOAD=YES);

set work.customers_2M;



run;
/* Direct path load */

data oradata.customer_direct_2M(BULKLOAD=YES BL_DIRECT_PATH=YES);

set work.customers_2M;



run;
By setting the BL_DIRECT_PATH=YES you are using Oracle SQL*Loader direct path loads. This method showed an increase in performance between the first and second data step above. In some rudimentary tests we have seem marked performance differences as the number of rows reaches 2 million.

Paper #

Download 455.21 Kb.

Share with your friends:
1   2




The database is protected by copyright ©ininet.org 2024
send message

    Main page