今天在一个报表数据库后台发现了这个错误简单描述一下问题的解决过程 详细的错误信息为 Fri Feb :: Errors in file /u/oracle/admin/repdb/bdump/repdb_j_trc: ORA: internal error code arguments: [] [xADEEE] [] [] [] [] [] [] Fri Feb :: Errors in file /u/oracle/admin/repdb/bdump/repdb_j_trc: ORA: internal error code arguments: [] [xADEEE] [] [] [] [] [] [] 进一步检查对应的trace文件 bash$ more /u/oracle/admin/repdb/bdump/repdb_j_trc /u/oracle/admin/repdb/bdump/repdb_j_trc Oraclei Enterprise Edition Release bit Production With the Partitioning OLAP and Oracle Data Mining options JServer Release Production ORACLE_HOME = /data/oracle/product/ System name: SunOS Node name: newreport Release: Version: Generic_ Machine: sunu Instance name: repdb Redo thread mounted by this instance: Oracle process number: Unix process pid: image: oracle@newreport (J) *** SESSION ID:() :: *** :: ksedmp: internal or fatal error ORA: internal error code arguments: [] [xADEEE] [] [] [] [] [] [] Current SQL statement for this session: DECLARE job BINARY_INTEGER := :job; next_date DATE := :mydate; broken BOOLEAN := FALSE; BEGIN P_GENERATE_REPDATA(FRT ); :mydate := next_date; IF broken THEN :b := ; ELSE :b := ; END IF; END; Call Stack Trace calling call entry argument values in hex location type point (? means dubious value) ksedmp()+ CALL ksedst()+ FFFFFFFFFFF ? ? ? E ? FFFFFFFFFFFCC ? DC ? kgeriv()+ PTR_CALL ? ? D ? C ? D ? DC ? kgesiv()+ CALL kgeriv()+ DE ? C ? ? C ? FFFFFFFFFFF ? DB ? kgesic()+ CALL kgesiv()+ DE ? C ? AD ? ? FFFFFFFFFFF ? ? kglgob()+ CALL kgesic()+ DE ? C ? AD ? ? ADEEE ? A ? kgldpo()+ CALL kglgob()+ ? ? ADEEE ? FFFFFFFFFFFA ? ? ? kgldon()+ CALL kgldpo()+ ? ? ADA ? ? ? FFFFFFFFFFFBEE ? pkldon()+ CALL kgldon()+ DE ? FFFFFFFFFFFDE ? ADA ? ? ? FFFFFFFFFFFDE ? pkloud()+ CALL pkldon()+ FFFFFFFFFFFAA ? FFFFFFFFFFFDE ? ADA ? ? ? FFFFFFFFFFFDE ? phnnrl_name_resolve CALL pkloud()+ FCA ? _by_loading()+ FFFFFFFFFFFEC ? ? ? ? ACE ? phngdl_get_defining CALL phnnrl_name_resolve ? ? _libunit()+ _by_loading()+ FFFFFFFFFFF ? FFFFFFFFFFF ? ? ? phnrpls_resolve_pre CALL phngdl_get_defining FFFFFFFFFFF ? fix_libscope()+ _libunit()+ FFFFFFFFFFFA ? FFFFFFFFFFFC ? ? ? ? 无论是从trace文件对应的名称还是从trace文件中对应的语句都可以确定引起问题的是一个JOB检查metalinkOracle在文档Doc ID 中对这个错误的已知bug进行了汇总不过这些bug的描述似乎没有和当前十分相符的 查看文档的描述发现ORA错误的第二个参数这里是xADEEE代表Library Cache Object Handle看来问题可能和LATCH有关 但是根据信息在V$LATCH和V$LATCH_CHILDREN视图中没有找到有价值的信息 这个JOB由于失败会自动再次执行检查JOB运行时的V$LOCK信息 SQL> SELECT ADDR TYPE ID ID LMODE REQUEST BLOCK FROM V$LOCK WHERE SID = ; ADDR TY ID ID LMODE REQUEST BLOCK CU E+ F JQ 从V$LOCK中看不到什么特别有价值的信息接着检查V$SESSION_WAIT看看这个JOB在等待什么 SQL> SELECT EVENT PTEXT PRAW PTEXT PRAW STATE FROM V$SESSION_WAIT WHERE SID = ; EVENT PTEXT PRAW PTEXT PRAW STATE library cache pin handle address ADEEE pin address BAA WAITING 这次的信息就明显了ORA错误的第二个参数就是V$SESSION_WAIT视图的PRAW的值而且从等待事件上也可以看到问题就是出现在LIBRARY CACHE PIN的过程中 重新查看METALINK的信息这个错误可能发生在一个长时间运行的进程在其运行过程中所依赖的对象被编译或者删除了 检查JOB调用的过程的状态 SQL> SELECT OWNER OBJECT_NAME OBJECT_TYPE STATUS FROM DBA_OBJECTS WHERE OWNER = FUJIANREP AND OBJECT_NAME = P_GENERATE_REPDATA; OWNER OBJECT_NAME OBJECT_TYPE STATUS FUJIANREP 果然问题过程处于不正常的状态 将JOB至于BROKEN状态避免JOB再次运行 SQL> EXEC DBMS_JOBBROKEN( TRUE) PL/SQL procedure successfully completed SQL> COMMIT; Commit complete 杀掉JOB对应的PROCESS SQL> SELECT SPID FROM V$PROCESS WHERE ADDR IN (SELECT PADDR FROM V$SESSION WHERE SID = ); SPID
SQL> HOST kill 下面用重新编译该过程 SQL> ALTER PROCEDURE P_GENERATE_REPDATA COMPILE;ALTER PROCEDURE P_GENERATE_REPDATA COMPILE * ERROR at line : ORA: timeout occurred while waiting to lock object FUJIANREPP_GENERATE_REPDATA 由于从V$LOCK和V$LATCH无法得到信息只能看看有没有其他人当前在访问P_GENERATE_REPDATA所依赖的对象 SQL> SELECT * FROM V$ACCESS WHERE (OWNER OBJECT) IN (SELECT REFERENCED_OWNER REFERENCED_NAME FROM DBA_DEPENDENCIES WHERE OWNER = FUJIANREP AND NAME = P_GENERATE_REPDATA); SID OWNER OBJECT TYPE FUJIANREP CAT_BUYER SYNONYM FUJIANREP CAT_CATEGORY SYNONYM FUJIANREP CAT_DOSEAGE_FORM SYNONYM FUJIANREP CAT_DRUG SYNONYM FUJIANREP CAT_ENTERPRISE SYNONYM FUJIANREP CAT_METRIC SYNONYM FUJIANREP CAT_ORG SYNONYM FUJIANREP CAT_PRODUCT SYNONYM FUJIANREP CAT_QUALITY_DEFINE SYNONYM FUJIANREP GOV_CAT_BUYER TABLE FUJIANREP GOV_CAT_ENTERPRISE TABLE FUJIANREP GOV_S_MO_BU TABLE FUJIANREP GOV_S_MO_BU_EN TABLE FUJIANREP GOV_S_MO_BU_PR TABLE FUJIANREP GOV_S_MO_EN TABLE FUJIANREP GOV_S_MO_ME TABLE FUJIANREP GOV_S_MO_ME_CA TABLE FUJIANREP GOV_S_MO_ME_PR TABLE FUJIANREP GOV_S_MO_ORDER TABLE FUJIANREP GOV_S_YE_ORDER TABLE FUJIANREP GRP_HOSPITAL TABLE FUJIANREP GRP_LEVEL TABLE FUJIANREP ORD_ORDER TABLE FUJIANREP ORD_ORDER_ITEM TABLE FUJIANREP ORD_ORDER_ITEM_REP CURSOR FUJIANREP ORD_ORDER_RECEIVE TABLE FUJIANREP ORD_ORDER_RECEIVE_REP SYNONYM FUJIANREP ORD_ORDER_REP CURSOR FUJIANREP ORD_ORDER_RETURN TABLE FUJIANREP ORD_ORDER_RETURN_REP CURSOR FUJIANREP PLT_PLAT CURSOR FUJIANREP USER_TAB_PARTITIONS CURSOR NDMAIN CAT_BUYER TABLE NDMAIN CAT_CATEGORY TABLE NDMAIN CAT_DOSEAGE_FORM TABLE NDMAIN CAT_DRUG TABLE NDMAIN CAT_ENTERPRISE TABLE NDMAIN CAT_METRIC TABLE NDMAIN CAT_ORG TABLE NDMAIN CAT_PRODUCT TABLE NDMAIN CAT_QUALITY_DEFINE TABLE NDMAIN ORD_ORDER VIEW NDMAIN ORD_ORDER_ITEM VIEW NDMAIN ORD_ORDER_RECEIVE VIEW NDMAIN ORD_ORDER_RETURN VIEW NDMAIN PLT_PLAT TABLE PUBLIC USER_TAB_PARTITIONS SYNONYM SYS STANDARD PACKAGE SYS STANDARD PACKAGE SYS SYS_STUB_FOR_PURITY_ANALYSIS PACKAGE SYS USER_TAB_PARTITIONS VIEW rows selected 对象果然被其他人所访问看看这个会话在做什么 SQL> SELECT SID SERIAL# USERNAME PROGRAM TERMINAL FROM V$SESSION WHERE SID = ; SID SERIAL# USERNAME PROGRAM TERMINAL FUJIANREP PlSqlDevexe LIBY 没想到是同事的连接的会话看看他在干什么 SQL> SELECT SQL_TEXT FROM V$SQL WHERE ADDRESS IN (SELECT SQL_ADDRESS FROM V$SESSION WHERE SID = ); SQL_TEXT
ALTER TABLE GOV_S_MO_EN TRUNCATE PARTITION P 居然是TRUNCATE分区操作难怪会导致过程处于INVALID状态不过这个操作应该不会持续很长时间的难道这个操作一直没有完成吗 SQL> SELECT EVENT PTEXT P PTEXT P PTEXT P SECONDS_IN_WAIT FROM V$SESSION_WAIT WHERE SID = ; EVENT PTEXT P PTEXT P PTEXT P SECONDS_IN_WAIT db file sequential read file# block# blocks 这个等待已经发生了几十天了显然这是一个僵死的会话 从后台kill掉对应的进程 SQL> SELECT SPID FROM V$PROCESS WHERE ADDR IN (SELECT PADDR FROM V$SESSION WHERE SID = ); SPID
SQL> HOST kill 切换为FUJIANREP用户再次编译过程 SQL> ALTER PROCEDURE P_GENERATE_REPDATA COMPILE; Procedure altered 至此问题解决将JOB重新设置BROKEN即可 SQL> EXEC DBMS_JOBBROKEN( FALSE) PL/SQL procedure successfully completed SQL> COMMIT; Commit complete |