OCI: Troubleshoot a Boot Volume

I had a case where I was unable to log into an OCI compute instance. Any attempts to connect via SSH resulted in a “connection refused” error. A console connection did not work either.

To troubleshoot the issue I decided to detach the boot volume from my current instance, and attach it as a regular block volume on another instance. That way I could look into the boot volume to see what was going on.

Working with the OCI CLI was the quickest way for me to do the detach/attach actions. To make things a bit easier to read, I’ll use environment variables.

I worked with two compute instances. INST1 is the one I could not log into; INST2 is another instance to which I’ll attach the boot volume from INST1.

You will need to collect the necessary OCIDs either through the console or via the OCI CLI.
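
For the CLI route, the lookups could be wrapped roughly as follows. This is a sketch, assuming the compartment OCID, availability domain name, and CLI profile are already known and exported under the variable names used below (CMPOCID, ADNAME, PRF). The function names are my own and not part of the OCI CLI.

```shell
# Sketch: look up instance and boot volume OCIDs with the OCI CLI.
# Assumes CMPOCID, ADNAME and PRF are already exported.

# List instances in the compartment with display name and OCID.
list_instances() {
  oci --profile "$PRF" compute instance list \
    --compartment-id "$CMPOCID" \
    --query 'data[].{name:"display-name", id:id}' \
    --output table
}

# List boot volumes in the availability domain with display name and OCID.
list_boot_volumes() {
  oci --profile "$PRF" bv boot-volume list \
    --compartment-id "$CMPOCID" \
    --availability-domain "$ADNAME" \
    --query 'data[].{name:"display-name", id:id}' \
    --output table
}
```

Calling `list_instances` and `list_boot_volumes` then prints the values to copy into the exports.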

$ export INST1=ocid1.instance.oc1.iad.xxxexamplexxx         # Malfunctioning instance
$ export INST2=ocid1.instance.oc1.iad.xxxexamplexxx         # Good instance for testing
$ export CMPOCID=ocid1.compartment.oc1..xxxexamplexxx       # Compartment for instances
$ export ADNAME=XYZ:US-ASHBURN-AD-1                         # Availability domain for volume
$ export PRF=my-oci                                         # OCI CLI profile to connect to tenancy
$ export BOOTVOLUME=ocid1.bootvolume.oc1.iad.xxxexamplexxx  # Boot volume of malfunctioning instance

Next, look up the attachment ID of the boot volume:

$ oci --profile $PRF compute boot-volume-attachment list \
  -c $CMPOCID --availability-domain $ADNAME \
  --query "data[?\"boot-volume-id\" == '$BOOTVOLUME'].{id:id}"

    "id": "ocid1.instance.oc1.iad.xxxexamplexxx"

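Note that the boot volume attachment ID is the same as the OCID of our malfunctioning instance. Also keep in mind that OCI only detaches a boot volume from a stopped instance, so INST1 has to be stopped first.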
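
A mistyped variable name expands to an empty string, and the CLI then receives an option without a value. A small guard like the one below can catch that before anything is detached. It is only a sketch; `require_vars` is a helper name I made up for illustration.

```shell
# Return non-zero and name the first variable that is unset or empty.
# Only pass trusted variable names, since eval is used for the lookup.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      return 1
    fi
  done
}
```

For example, `require_vars PRF INST1 INST2 BOOTVOLUME || echo "fix the exports first"` stops early when one of the exports was skipped.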

$ oci --profile $PRF compute boot-volume-attachment detach \
  --boot-volume-attachment-id $INST1

After a few seconds, the boot volume will have detached.
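
Rather than guessing when the detach has finished, the attachment state can be polled. The sketch below assumes the attachment reports a `lifecycle-state` of `DETACHED` once it is done. `TRIES` and `DELAY` are hypothetical tuning variables of this script, not OCI settings. Recent CLI versions also document a `--wait-for-state` option on the detach command, which may make such a loop unnecessary.

```shell
# Run a state-reporting command until it prints the expected state.
# Usage: wait_for_state EXPECTED COMMAND [ARGS...]
# TRIES and DELAY are hypothetical knobs of this sketch.
wait_for_state() {
  expected=$1
  shift
  tries=${TRIES:-30}
  delay=${DELAY:-5}
  while [ "$tries" -gt 0 ]; do
    state=$("$@" 2>/dev/null)
    if [ "$state" = "$expected" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Print the lifecycle state of the boot volume attachment.
# The attachment ID equals the OCID of INST1, as noted above.
boot_attachment_state() {
  oci --profile "$PRF" compute boot-volume-attachment get \
    --boot-volume-attachment-id "$INST1" \
    --query 'data."lifecycle-state"' --raw-output
}
```

Usage would be `wait_for_state DETACHED boot_attachment_state`, which returns non-zero if the state is not reached within the given number of tries.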

Now attach the boot volume as a regular volume to the second instance.

$ oci --profile $PRF compute volume-attachment attach \
  --instance-id $INST2 \
  --volume-id $BOOTVOLUME \
  --type paravirtualized

Once the volume is attached, log into INST2 and mount it. In my case, the volume showed up as device /dev/sdb, as the output of lsblk shows.

$ sudo lsblk
sdb      8:16   0  500G  0 disk
├─sdb1   8:17   0  250M  0 part
├─sdb2   8:18   0   48G  0 part
└─sdb3   8:19   0  250M  0 part
sda      8:0    0   47G  0 disk
├─sda2   8:2    0    8G  0 part [SWAP]
├─sda3   8:3    0 38.4G  0 part /
└─sda1   8:1    0  200M  0 part /boot/efi

Since partition sdb2 was the largest, it is the one I decided to mount:
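
The same "pick the largest partition" rule can be expressed as a small filter over lsblk output. This is a sketch of that heuristic only, and `largest_partition` is a name I chose for illustration. It expects rows of name, size in bytes, and type on standard input.

```shell
# Read "NAME SIZE TYPE" rows (sizes in bytes) on stdin and print the
# name of the largest row whose type is "part".
largest_partition() {
  awk '$3 == "part" && $2 + 0 > max { max = $2 + 0; name = $1 }
       END { if (name != "") print name }'
}

# Usage on the rescue instance (device name taken from the text):
#   lsblk -b -l -n -o NAME,SIZE,TYPE /dev/sdb | largest_partition
```

The largest partition is usually the root filesystem on a default image, but that is an assumption worth checking before mounting.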

$ sudo mkdir /mnt/vol1
$ sudo mount /dev/sdb2 /mnt/vol1
$ cd /mnt/vol1

At this point I was able to dive into the boot volume to see what was going on. In my case, the volume turned out to be 100% full, and some investigation showed that a few large log files had been generated. I truncated those logs, unmounted the volume, detached it from INST2, and re-attached it to INST1 as its boot volume. Afterwards I was able to log back in.
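
The investigation and the hand-back can be sketched as a few shell helpers. All function names here are hypothetical. The volume attachment OCID passed to `detach_from_rescue` is the one reported when the volume was attached to INST2, and I am assuming the instance is started again afterwards through the console or CLI.

```shell
# Print the N largest regular files under DIR as "SIZE_KB<TAB>PATH",
# largest first. -xdev keeps find on the mounted volume.
largest_files() {
  dir=$1
  count=${2:-10}
  find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null |
    sort -rn | head -n "$count"
}

# Empty a file in place, keeping its owner and permissions.
truncate_log() {
  : > "$1"
}

# Detach the volume from the rescue instance. The argument is the
# volume attachment OCID reported when the volume was attached to INST2.
detach_from_rescue() {
  oci --profile "$PRF" compute volume-attachment detach \
    --volume-attachment-id "$1"
}

# Attach the volume to INST1 again as its boot volume.
attach_as_boot() {
  oci --profile "$PRF" compute boot-volume-attachment attach \
    --boot-volume-id "$BOOTVOLUME" \
    --instance-id "$INST1"
}
```

On INST2, `df -h /mnt/vol1` confirms how full the volume is, and `largest_files /mnt/vol1 20` from a root shell lists candidates for `truncate_log`. After leaving the directory and running `sudo umount /mnt/vol1`, call `detach_from_rescue` with the attachment OCID, wait until the detach has completed, and then call `attach_as_boot`.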