Manual Recovery

Manual Recovery Process

This document outlines the steps required to manually recover a degraded volume using the Volumez platform. The manual recovery process is essential when automatic recovery cannot proceed due to specific limitations, such as insufficient permissions or unavailable media.

Overview

Manual recovery involves a series of actions to restore the resiliency and full capabilities of a volume that has become degraded due to media loss. This process ensures that the volume can return to its optimal state, whether through partial or full recovery depending on the availability of data and media.

Steps for Manual Recovery

Monitor Volume Status
- Action: Continuously monitor the status of the volume.
- Target / Interface: Use the Volumez console / API.
- Command: GET /volumes/{volume}/
- Details: Check the 'state' attribute to determine the current status of the volume.
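
As an illustration of this status check, here is a minimal Python sketch using only the standard library. The path GET /volumes/{volume}/ and the 'state' attribute come from the step above. The base URL, the bearer-token header, and the JSON response format are assumptions made for the example and should be replaced with the values for your environment.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder base URL; substitute the real API endpoint.
BASE_URL = "https://api.example.invalid"


def volume_url(volume, base_url=BASE_URL):
    """Build the URL for GET /volumes/{volume}/ (path taken from the step above)."""
    return f"{base_url.rstrip('/')}/volumes/{urllib.parse.quote(volume, safe='')}/"


def extract_state(body):
    """Return the 'state' attribute from a JSON response body (bytes or str)."""
    # Assumption: the response is a JSON object; None is returned if 'state' is absent.
    return json.loads(body).get("state")


def get_volume_state(volume, token, base_url=BASE_URL):
    """Fetch the volume and return its 'state' attribute."""
    request = urllib.request.Request(
        volume_url(volume, base_url),
        # Assumption: bearer-token authentication.
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return extract_state(response.read())
```

Keeping URL construction and response parsing in separate functions lets both be checked without network access.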
Check Instance Status
- Action: Verify the status of the instances attached to the volume.
- Target / Interface: Use the Cloud Console or API.
- Details: Ensure that the instances are running and operational.
Create a Replacement Instance
- Action: Provision a new instance to replace the failed one.
- Target / Interface: Use the Cloud Console or API.
- Details: Configure the new instance with the necessary specifications to match the failed instance.
Install Connector on New Instance
- Action: Install the required Volumez connector on the newly provisioned instance.
- Target / Interface: Access the shell of the newly provisioned instance.
- Details: Follow the installation instructions for the Volumez connector to ensure proper setup.
Assign Media to New Instance
- Action: Assign the necessary media resources to the newly provisioned instance.
- Target / Interface: Use the Volumez console / API.
- Command: GET /media/{media}/assign
- Details: Ensure the media is correctly attached to the new instance for recovery operations.
Initiate the Recovery Process
- Action: Start the recovery process for the volume.
- Target / Interface: Use the Volumez console / API.
- Command: POST /volumes/{volume}/recovery
- Details: Initiate the recovery, allowing the system to start rebuilding the volume data.
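
The recovery request could be issued from a script along the following lines. Only the method and path (POST /volumes/{volume}/recovery) come from this step. The base URL, the authentication header, and the absence of a request body are assumptions, so treat this as a sketch rather than the definitive call.

```python
import urllib.parse
import urllib.request


def build_recovery_request(volume, token, base_url="https://api.example.invalid"):
    """Build the POST /volumes/{volume}/recovery request without sending it."""
    # Assumption: placeholder base URL and bearer-token authentication.
    url = f"{base_url.rstrip('/')}/volumes/{urllib.parse.quote(volume, safe='')}/recovery"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",  # Assumption: the call needs no request body.
    )


def start_recovery(volume, token, base_url="https://api.example.invalid"):
    """Send the recovery request and return the HTTP status code."""
    request = build_recovery_request(volume, token, base_url)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.status
```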
Check Volume Status Post-Recovery
- Action: Monitor the status of the volume after initiating the recovery process.
- Target / Interface: Use the Volumez console / API.
- Command: GET /volumes/{volume}/
- Details: Check the 'state', 'volumerecoveryjob', and 'progress' attributes to track recovery progress.
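
When scripting this check, it can help to reduce the volume response to the three attributes named above. The helper below is a small sketch that assumes the response has already been decoded into a Python dict. The attribute names are taken from this step, while the value types and the sample values in the usage note are hypothetical.

```python
def recovery_summary(volume):
    """Pick the attributes named in this step from a decoded volume object."""
    # .get() yields None for an absent attribute instead of raising KeyError.
    return {key: volume.get(key) for key in ("state", "volumerecoveryjob", "progress")}
```

For example, calling recovery_summary({"state": "recovering", "progress": 42}) returns a dict in which 'volumerecoveryjob' is None, which makes a missing job reference easy to spot.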
Monitor Recovery Process
- Action: Continuously monitor the recovery job to ensure successful completion.
- Target / Interface: Use the Volumez console / API.
- Command: GET /jobs/{job_id}
- Details: Monitor the job status, addressing any issues that may arise during recovery.
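
Monitoring the job until it completes amounts to a polling loop. The sketch below shows one way to write it in Python. It is deliberately independent of the HTTP layer: fetch_job is any callable that returns the decoded result of GET /jobs/{job_id}, and is_finished decides whether that result is terminal. The job fields and state values used in the usage note are hypothetical, and the 8-hour default timeout is simply an assumption borrowed from the upper bound quoted for full recovery.

```python
import time


def wait_for_job(fetch_job, is_finished, interval_s=10.0, timeout_s=8 * 3600,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job() until is_finished(job) is true, then return the job.

    Raises TimeoutError if the job is still unfinished once timeout_s has elapsed.
    sleep and clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_job()
        if is_finished(job):
            return job
        if clock() >= deadline:
            raise TimeoutError("recovery job did not finish before the timeout")
        sleep(interval_s)
```

A caller might pass is_finished=lambda job: job.get("state") in {"done", "failed"}, with the state names adjusted to whatever the job endpoint actually reports, and then inspect the returned job to decide whether any issue needs attention.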

Important Considerations

Media Visibility: Visibility into media status is critical for selecting the appropriate recovery method and prioritizing the rebuild process.
Permissions: If Volumez does not have the necessary permissions to create compute instances or access media, these steps must be performed manually. To let Volumez automate the recovery process instead, grant it the required permissions.
Recovery Types:
- Quick Recovery: Applies when the missing media becomes available again. The latest snapshot is used as a baseline, so only data newer than the snapshot is recovered.
- Full Recovery: Copies all data from the active copy to the new media. This can take up to 8 hours, depending on the volume size and data reservation rate.

By following these steps and considerations, users can effectively manage the manual recovery of volumes, ensuring minimal downtime and restoring system resilience efficiently.