Recovery Manager Troubleshooting

Troubleshooting User-Managed Media Recovery

This chapter describes how to troubleshoot user-managed media recovery, and includes the following topics:

- About User-Managed Media Recovery Problems
- Investigating the Media Recovery Problem: Phase 1
- Trying to Fix the Recovery Problem Without Corrupting Blocks: Phase 2
- Deciding Whether to Allow Recovery to Corrupt Blocks: Phase 3
- Allowing Recovery to Corrupt Blocks: Phase 4
- Performing Trial Recovery

About User-Managed Media Recovery Problems

Table 20-1 describes potential problems that can occur during media recovery.

Table 20-1 Media Recovery Problems

Problem: Missing or misnamed archived log
Description: Recovery stops because the database cannot find the archived log recorded in the control file.

Problem: When you attempt to open the database, error ORA-1113 indicates that a datafile needs media recovery
Description: This error commonly occurs
because:
- You are performing incomplete recovery but failed to restore all needed datafile backups.
- Incomplete recovery stopped before datafiles reached a consistent SCN.
- You are recovering datafiles from an online backup, but not enough redo was applied to make the datafiles consistent.
- You are performing recovery with a backup control file, and did not specify the location of a needed online redo log.
- A datafile is undergoing media recovery when you attempt to open the database.
- Datafiles needing recovery were not brought online before executing RECOVER DATABASE, and so were not recovered.

Problem: Redo record problems
Description: Two possible cases are as follows:
- Recovery stops because of failed consistency checks, a problem called stuck recovery. Stuck recovery can occur when an underlying operating system or storage system loses a write issued by the database during normal operation.
- The database signals an internal error when applying the redo. This problem can be caused by an Oracle bug. If checksums are not being used, it can also be caused by corruptions to the redo or data blocks.

Problem: Corrupted archived logs
Description: Logs may be corrupted while they are stored on or copied between storage systems. If DB_BLOCK_CHECKSUM is enabled, then the database usually signals checksum errors. If checksumming is not on, then log corruption may appear as a problem with redo.

Problem: Archived logs with incompatible parallel redo format
Description: If you enable the parallel redo feature, then the database generates redo logs in a new format. Prior releases of Oracle are unable to apply parallel redo logs. However, releases prior to Oracle9i Release 2 (9.2) can detect the parallel redo format and indicate the inconsistency with the following error message: External error 00303, 00000, cannot process Parallel Redo.
See Also: Oracle Database Performance Tuning Guide to learn about the parallel redo feature

Problem: Corrupted data blocks
Description: A datafile backup may have contained a corrupted data block, or the data block may become corrupted either during recovery or when it was copied to the backup. If checksums are being used, then the database signals a checksum error. Otherwise, the problem may also appear as a redo corruption.

Problem: Random problems
Description: Memory corruptions and other transient problems can occur during recovery.

The symptoms of media recovery problems are usually external or internal errors signaled during recovery. For example, an external error indicates that a redo block or a data block has failed checksum verification checks. Internal errors can be caused by either bugs in the database or errors arising from the underlying operating system and hardware.

If media recovery encounters a problem while recovering a database backup,
whether it is a stuck recovery problem or a problem during redo application, the database always stops and leaves the datafiles undergoing recovery in a consistent state, that is, at a consistent SCN preceding the failure. You can then do one of the following:
- Open the database read-only to investigate the problem.
- Open the database with the RESETLOGS option, as long as the requirements for opening RESETLOGS have been met. Note that the RESETLOGS restrictions apply to opening the standby database as well, because a standby database is updated by a form of media recovery.

In general, opening the database read-only or opening it with the RESETLOGS option requires all online datafiles to be recovered to the same SCN. If this requirement is not met, then the database may signal ORA-1113 or other errors when you attempt to open. Some common causes of ORA-1113 are described in Table 20-1.

The basic methodology for responding to media recovery problems occurs in the following phases:

1. Try to identify the cause of the problem. Run a trial recovery if needed.
2. If the problem is related to missing redo logs or you suspect there is a redo log, memory, or data block corruption, then try to resolve it using the methods described in Table 20-2.
3. If you cannot resolve the problem using the methods described in Table 20-2, then do one of the following:
   o Open the database with the RESETLOGS option if you are recovering a whole database backup. If you have performed serial media recovery, then the database contains all the changes up to but not including the changes at the SCN where the corruption occurred. No changes from this SCN onward are in the recovered part of the database. If you have restored online backups, then opening RESETLOGS succeeds only if you have recovered through all the
     ALTER ... END BACKUP operations in the redo stream.
   o Proceed with recovery by allowing media recovery to corrupt data blocks. After media recovery completes, try performing block media recovery using RMAN.
   o Call Oracle Support Services as a last resort.

See Also: "Performing Block Media Recovery with RMAN" to learn about block media recovery

Investigating the Media Recovery Problem: Phase 1

If media recovery encounters a problem, then obtain as much information as possible after recovery halts. You do not want to waste time fixing the wrong problem, which may in fact make matters worse.

The goal of this initial investigation is to determine whether the problem is caused by incorrect setup, corrupted redo logs, corrupted data blocks, memory corruption, or other problems. If you see a checksum error on a data block, then the data block is corrupted. If you see a checksum error on a redo log block, then the redo log is corrupted.

Sometimes the cause of a recovery problem can be difficult to determine. Nevertheless, the methods in this chapter allow you to quickly recover a database even when you do not completely understand the cause of the problem.

To investigate media recovery problems:

1. Examine the alert_SID.log to see whether the error messages give general information about the nature of the problem. For example, does the alert_SID.log indicate any checksum failures? Does the alert_SID.log indicate that media recovery may have to corrupt data blocks in order to continue?
2. Check the trace file generated by the Oracle process during recovery. It may contain additional error information.

Trying to Fix the Recovery Problem Without Corrupting Blocks: Phase 2

Depending on the type of media recovery problem you suspect, you have different solutions at your disposal. You can try one or a combination of the methods described in Table 20-2. Note that these methods are fairly safe: in almost all cases, they should not cause any damage to the database.

Table 20-2 Media Recovery Solutions

If you suspect: Missing/misnamed archived logs
Then: Determine whether you entered the correct filename. If you did, then check to see whether the log is missing from the operating system. If it is missing, and you have a backup, then restore the backup and apply the log. If you do not have a backup, then if possible perform incomplete recovery up to the point of the missing log.

If you suspect: ORA-1113 for ALTER DATABASE OPEN
Then: Review the causes of this error in Table 20-1. Make sure that all read/write datafiles requiring recovery are online. If you use a backup control file for recovery, then the control file and datafiles must be at a consistent SCN for the database to be opened. If you do not have the necessary redo, then you must re-create the control file.

If you suspect: Corrupt archived logs
Then: The log is corrupted if the checksum verification on the log redo block fails. If DB_BLOCK_CHECKSUM is not enabled either during the recovery session or when the database generated the redo, then recovery problems may
be caused by corrupted logs. If the log is corrupt and an alternate copy of the corrupt log is available, then try to apply it and see whether this tactic fixes the problem. The DB_BLOCK_CHECKSUM initialization parameter determines whether checksums are computed for redo log and data blocks.

If you suspect: Archived logs with incompatible parallel redo format
Then: If you are running an Oracle release prior to Oracle9i Release 2, and if you are attempting to apply redo logs created with the parallel redo format, then you must do the following steps:
1. Upgrade the database to a later release.
2. Perform media recovery.
3. Shut down the database consistently and back up the database.
4. Downgrade the database to the original release.
See Also: Oracle Database Performance Tuning Guide to learn about the parallel redo feature

If you suspect: Memory corruption or transient problems
Then: You may be able to fix the problem by shutting down the database and restarting recovery. The database should be left in a consistent state if the second attempt also fails.

If you suspect: Corrupt data blocks
Then: Restore and recover the datafile again with user-managed methods, or restore and recover individual data blocks with the RMAN BLOCKRECOVER command. This tactic may fix the problem. A data block is corrupted if the checksum verification on the block fails. If DB_BLOCK_CHECKING is disabled, a corrupted data block problem may appear as a redo problem.
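
The Phase 1 investigation (examining the alert log for checksum failures and other error messages) can be partly automated. The following is a minimal Python sketch, not an Oracle-supplied tool. It scans log lines for ORA- error codes and counts them by problem category. The function names, the choice of error codes, and the category labels are assumptions made for illustration; adjust the mapping to the errors you actually see. The sample lines at the bottom are hypothetical and are not real alert log output.

```python
import re
from collections import Counter

# Hypothetical mapping from ORA- error codes to problem categories in the
# spirit of Table 20-1. The selection of codes is an assumption of this sketch.
CATEGORY_BY_CODE = {
    "ORA-00308": "missing or misnamed archived log",
    "ORA-01113": "datafile needs media recovery",
    "ORA-00353": "corrupted redo log",
    "ORA-00354": "corrupted redo log",
    "ORA-01578": "corrupted data block",
    "ORA-00600": "internal error",
}

# Matches codes written with or without zero padding (ORA-1113, ORA-01113).
ORA_PATTERN = re.compile(r"ORA-(\d{1,5})")


def classify_alert_lines(lines):
    """Count how often each known problem category appears in the lines."""
    counts = Counter()
    for line in lines:
        for digits in ORA_PATTERN.findall(line):
            code = "ORA-" + digits.zfill(5)  # normalize to five digits
            category = CATEGORY_BY_CODE.get(code)
            if category is not None:
                counts[category] += 1
    return counts


def classify_alert_file(path):
    """Convenience wrapper: classify the alert log stored at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return classify_alert_lines(handle)


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "ORA-00283: recovery session canceled due to errors",
        "ORA-00353: log corruption near block 12 change 3456",
        "ORA-1113: file 4 needs media recovery",
    ]
    for category, count in classify_alert_lines(sample).items():
        print(f"{category}: {count}")
```

A summary like this only narrows down which row of Table 20-2 to try first. The trace file generated during recovery should still be checked by hand.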