电脑故障

位置:IT落伍者 >> 电脑故障 >> 浏览文章

ORA-600(17069)错误的解决过程


发布日期:2018/6/19
 

今天在一个报表数据库后台发现了这个错误简单描述一下问题的解决过程

详细的错误信息为

Fri Feb ::

Errors in file /u/oracle/admin/repdb/bdump/repdb_j_trc:

ORA: internal error code arguments: [] [xADEEE] [] [] [] [] [] []

Fri Feb ::

Errors in file /u/oracle/admin/repdb/bdump/repdb_j_trc:

ORA: internal error code arguments: [] [xADEEE] [] [] [] [] [] []

进一步检查对应的trace文件

bash$ more /u/oracle/admin/repdb/bdump/repdb_j_trc

/u/oracle/admin/repdb/bdump/repdb_j_trc

Oraclei Enterprise Edition Release bit Production

With the Partitioning OLAP and Oracle Data Mining options

JServer Release Production

ORACLE_HOME = /data/oracle/product/

System name: SunOS

Node name: newreport

Release:

Version: Generic_

Machine: sunu

Instance name: repdb

Redo thread mounted by this instance:

Oracle process number:

Unix process pid: image: oracle@newreport (J)

*** SESSION ID:() ::

*** ::

ksedmp: internal or fatal error

ORA: internal error code arguments: [] [xADEEE] [] [] [] [] [] []

Current SQL statement for this session:

DECLARE job BINARY_INTEGER := :job; next_date DATE := :mydate; broken BOOLEAN := FALSE; BEGIN P_GENERATE_REPDATA(FRT

); :mydate := next_date; IF broken THEN :b := ; ELSE :b := ; END IF; END;

Call Stack Trace

calling call entry argument values in hex

location type point (? means dubious value)

ksedmp()+ CALL ksedst()+ FFFFFFFFFFF ?

? ?

E ?

FFFFFFFFFFFCC ?

DC ?

kgeriv()+ PTR_CALL ? ?

D ? C ?

D ? DC ?

kgesiv()+ CALL kgeriv()+ DE ? C ?

? C ?

FFFFFFFFFFF ?

DB ?

kgesic()+ CALL kgesiv()+ DE ? C ?

AD ? ?

FFFFFFFFFFF ?

?

kglgob()+ CALL kgesic()+ DE ? C ?

AD ? ?

ADEEE ? A ?

kgldpo()+ CALL kglgob()+ ? ?

ADEEE ?

FFFFFFFFFFFA ?

? ?

kgldon()+ CALL kgldpo()+ ? ?

ADA ? ?

?

FFFFFFFFFFFBEE ?

pkldon()+ CALL kgldon()+ DE ?

FFFFFFFFFFFDE ?

ADA ? ?

?

FFFFFFFFFFFDE ?

pkloud()+ CALL pkldon()+ FFFFFFFFFFFAA ?

FFFFFFFFFFFDE ?

ADA ? ?

?

FFFFFFFFFFFDE ?

phnnrl_name_resolve CALL pkloud()+ FCA ?

_by_loading()+ FFFFFFFFFFFEC ?

? ?

? ACE ?

phngdl_get_defining CALL phnnrl_name_resolve ? ?

_libunit()+ _by_loading()+ FFFFFFFFFFF ?

FFFFFFFFFFF ?

? ?

phnrpls_resolve_pre CALL phngdl_get_defining FFFFFFFFFFF ?

fix_libscope()+ _libunit()+ FFFFFFFFFFFA ?

FFFFFFFFFFFC ?

? ?

?

无论是从trace文件对应的名称还是从trace文件中对应的语句都可以确定引起问题的是一个JOB检查metalinkOracle在文档Doc ID 中对这个错误的已知bug进行了汇总不过这些bug的描述似乎没有和当前十分相符的

查看文档的描述发现ORA错误的第二个参数这里是xADEEE代表Library Cache Object Handle看来问题可能和LATCH有关

但是根据信息在V$LATCH和V$LATCH_CHILDREN视图中没有找到有价值的信息

这个JOB由于失败会自动再次执行检查JOB运行时的V$LOCK信息

SQL> SELECT ADDR TYPE ID ID LMODE REQUEST BLOCK

FROM V$LOCK

WHERE SID = ;

ADDR TY ID ID LMODE REQUEST BLOCK

CU E+

F JQ

从V$LOCK中看不到什么特别有价值的信息接着检查V$SESSION_WAIT看看这个JOB在等待什么

SQL> SELECT EVENT PTEXT PRAW PTEXT PRAW STATE

FROM V$SESSION_WAIT

WHERE SID = ;

EVENT PTEXT PRAW PTEXT PRAW STATE

library cache pin handle address ADEEE pin address BAA WAITING

这次的信息就明显了ORA错误的第二个参数就是V$SESSION_WAIT视图的PRAW的值而且从等待事件上也可以看到问题就是出现在LIBRARY CACHE PIN的过程中

重新查看METALINK的信息这个错误可能发生在一个长时间运行的进程在其运行过程中所依赖的对象被编译或者删除了

检查JOB调用的过程的状态

SQL> SELECT OWNER OBJECT_NAME OBJECT_TYPE STATUS

FROM DBA_OBJECTS

WHERE OWNER = FUJIANREP

AND OBJECT_NAME = P_GENERATE_REPDATA;

OWNER OBJECT_NAME OBJECT_TYPE STATUS

FUJIANREP

果然问题过程处于不正常的状态

将JOB至于BROKEN状态避免JOB再次运行

SQL> EXEC DBMS_JOBBROKEN( TRUE)

PL/SQL procedure successfully completed

SQL> COMMIT;

Commit complete

杀掉JOB对应的PROCESS

SQL> SELECT SPID FROM V$PROCESS WHERE ADDR IN (SELECT PADDR FROM V$SESSION WHERE SID = );

SPID

SQL> HOST kill

下面用重新编译该过程

SQL> ALTER PROCEDURE P_GENERATE_REPDATA COMPILE;

ALTER PROCEDURE P_GENERATE_REPDATA COMPILE

*

ERROR at line :

ORA: timeout occurred while waiting to lock object FUJIANREPP_GENERATE_REPDATA

由于从V$LOCK和V$LATCH无法得到信息只能看看有没有其他人当前在访问P_GENERATE_REPDATA所依赖的对象

SQL> SELECT * FROM V$ACCESS

WHERE (OWNER OBJECT) IN

(SELECT REFERENCED_OWNER REFERENCED_NAME

FROM DBA_DEPENDENCIES

WHERE OWNER = FUJIANREP

AND NAME = P_GENERATE_REPDATA);

SID OWNER OBJECT TYPE

FUJIANREP CAT_BUYER SYNONYM

FUJIANREP CAT_CATEGORY SYNONYM

FUJIANREP CAT_DOSEAGE_FORM SYNONYM

FUJIANREP CAT_DRUG SYNONYM

FUJIANREP CAT_ENTERPRISE SYNONYM

FUJIANREP CAT_METRIC SYNONYM

FUJIANREP CAT_ORG SYNONYM

FUJIANREP CAT_PRODUCT SYNONYM

FUJIANREP CAT_QUALITY_DEFINE SYNONYM

FUJIANREP GOV_CAT_BUYER TABLE

FUJIANREP GOV_CAT_ENTERPRISE TABLE

FUJIANREP GOV_S_MO_BU TABLE

FUJIANREP GOV_S_MO_BU_EN TABLE

FUJIANREP GOV_S_MO_BU_PR TABLE

FUJIANREP GOV_S_MO_EN TABLE

FUJIANREP GOV_S_MO_ME TABLE

FUJIANREP GOV_S_MO_ME_CA TABLE

FUJIANREP GOV_S_MO_ME_PR TABLE

FUJIANREP GOV_S_MO_ORDER TABLE

FUJIANREP GOV_S_YE_ORDER TABLE

FUJIANREP GRP_HOSPITAL TABLE

FUJIANREP GRP_LEVEL TABLE

FUJIANREP ORD_ORDER TABLE

FUJIANREP ORD_ORDER_ITEM TABLE

FUJIANREP ORD_ORDER_ITEM_REP CURSOR

FUJIANREP ORD_ORDER_RECEIVE TABLE

FUJIANREP ORD_ORDER_RECEIVE_REP SYNONYM

FUJIANREP ORD_ORDER_REP CURSOR

FUJIANREP ORD_ORDER_RETURN TABLE

FUJIANREP ORD_ORDER_RETURN_REP CURSOR

FUJIANREP PLT_PLAT CURSOR

FUJIANREP USER_TAB_PARTITIONS CURSOR

NDMAIN CAT_BUYER TABLE

NDMAIN CAT_CATEGORY TABLE

NDMAIN CAT_DOSEAGE_FORM TABLE

NDMAIN CAT_DRUG TABLE

NDMAIN CAT_ENTERPRISE TABLE

NDMAIN CAT_METRIC TABLE

NDMAIN CAT_ORG TABLE

NDMAIN CAT_PRODUCT TABLE

NDMAIN CAT_QUALITY_DEFINE TABLE

NDMAIN ORD_ORDER VIEW

NDMAIN ORD_ORDER_ITEM VIEW

NDMAIN ORD_ORDER_RECEIVE VIEW

NDMAIN ORD_ORDER_RETURN VIEW

NDMAIN PLT_PLAT TABLE

PUBLIC USER_TAB_PARTITIONS SYNONYM

SYS STANDARD PACKAGE

SYS STANDARD PACKAGE

SYS SYS_STUB_FOR_PURITY_ANALYSIS PACKAGE

SYS USER_TAB_PARTITIONS VIEW

rows selected

对象果然被其他人所访问看看这个会话在做什么

SQL> SELECT SID SERIAL# USERNAME PROGRAM TERMINAL

FROM V$SESSION

WHERE SID = ;

SID SERIAL# USERNAME PROGRAM TERMINAL

FUJIANREP PlSqlDevexe LIBY

没想到是同事的连接的会话看看他在干什么

SQL> SELECT SQL_TEXT FROM V$SQL

WHERE ADDRESS IN

(SELECT SQL_ADDRESS FROM V$SESSION

WHERE SID = );

SQL_TEXT

ALTER TABLE GOV_S_MO_EN TRUNCATE PARTITION P

居然是TRUNCATE分区操作难怪会导致过程处于INVALID状态不过这个操作应该不会持续很长时间的难道这个操作一直没有完成吗

SQL> SELECT EVENT PTEXT P PTEXT P PTEXT P SECONDS_IN_WAIT

FROM V$SESSION_WAIT WHERE SID = ;

EVENT PTEXT P PTEXT P PTEXT P SECONDS_IN_WAIT

db file sequential read file# block# blocks

这个等待已经发生了几十天了显然这是一个僵死的会话

从后台kill掉对应的进程

SQL> SELECT SPID FROM V$PROCESS

WHERE ADDR IN (SELECT PADDR FROM V$SESSION WHERE SID = );

SPID

SQL> HOST kill

切换为FUJIANREP用户再次编译过程

SQL> ALTER PROCEDURE P_GENERATE_REPDATA COMPILE;

Procedure altered

至此问题解决将JOB重新设置BROKEN即可

SQL> EXEC DBMS_JOBBROKEN( FALSE)

PL/SQL procedure successfully completed

SQL> COMMIT;

Commit complete

上一篇:如何评 估OS 安 全

下一篇:全面解读 startx