Posts Tagged VM Snapshot
Like many of you, I am frustrated by how little information I get while one or more snapshots are being deleted. If you monitor the process in the GUI, you won’t have much luck working out what stage the deletion has reached: the task sticks at 99% for minutes, or even hours, depending on the number of snapshots and their sizes.
There is no way to estimate the time left. The only option is to SSH to the ESX or ESXi host, browse to the datastore folder where that machine’s files reside, and run the following command:
# ls -lhut
This should display files in the following format:
-rw------- 1 root root 40.0G Sep 6 08:27 ServerName-000001-delta.vmdk
This sorts the files by access time, most recent first, which should indicate which .vmdk file the deletion process is currently working on. Each snapshot file has a 6-digit number in its name to indicate its position within the snapshot chain (e.g. ServerName-000001-delta.vmdk, ServerName-000002-delta.vmdk, ServerName-000003-delta.vmdk, etc.). The deletion process goes through the files in sequence, starting with ServerName-000001-delta.vmdk.
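The same check can be scripted. The following is a minimal sketch, assuming Python 3 is available wherever you run it and that the snapshot files follow the `-NNNNNN-delta.vmdk` naming shown above; the function name `deltas_by_access_time` is made up for this example. It lists the delta files in a VM folder, most recently accessed first, along with each file's position in the chain.

```python
import os
import re
from pathlib import Path

# Matches names such as ServerName-000002-delta.vmdk and captures the chain number.
DELTA_RE = re.compile(r"-(\d{6})-delta\.vmdk$")


def deltas_by_access_time(vm_dir):
    """Return (name, chain_number, size_bytes) tuples, most recently accessed first.

    Hypothetical helper: it mirrors what `ls -lhut` shows, restricted to delta files.
    """
    entries = []
    for path in Path(vm_dir).iterdir():
        match = DELTA_RE.search(path.name)
        if match is None:
            continue  # not a snapshot delta file
        info = path.stat()
        entries.append((info.st_atime, path.name, int(match.group(1)), info.st_size))
    entries.sort(reverse=True)  # newest access time first, like `ls -ut`
    return [(name, number, size) for _, name, number, size in entries]
```

Calling `deltas_by_access_time(".")` from inside the VM's folder every few minutes and watching the first entry change gives a rough idea of how far through the chain the deletion has got. It is still not a time estimate.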
I am sure you have read or come across VMware’s best practices regarding snapshots. We use vDR for some of our backups, in addition to Backup Exec to back up specific applications such as SQL and Exchange.
As a note, VMware snapshots are crash-consistent, not application-consistent. For that reason, VMware does not recommend using snapshots on DC servers. If you need an application-consistent backup, Backup Exec and other third-party backup applications can provide it.
Now, back to our issue: running on multiple snapshots is bad, and it is even worse when a machine has multiple snapshots older than 3 days. The machine struggles to read from all those delta .vmdk disks to reconstruct its data. Snapshots can also grow very large and degrade the performance of the server immensely! We had this issue with an Exchange server that had 7 snapshots of various sizes, each over 3 days old, and the server became unusable. Deleting those snapshots took close to 10 hours, and that wasn’t fun!
To avoid this crisis, set up alarms in vCenter that cascade to all nodes in your cluster and alert you to any VM running on snapshots.
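The alarm itself is configured in vCenter, but the rule behind it is simple enough to sketch. The example below is only an illustration of that rule, assuming you already have a list of snapshots with their creation times from whatever inventory source you use. The function name, the tuple layout and the 3-day default are assumptions for this sketch, not part of any VMware API.

```python
from datetime import datetime, timedelta


def stale_snapshots(snapshots, now, max_age=timedelta(days=3)):
    """Group snapshot names by VM for snapshots older than max_age.

    snapshots: iterable of (vm_name, snapshot_name, created) tuples, where
    created is a datetime. This layout is hypothetical; adapt it to your data.
    """
    stale = {}
    for vm_name, snapshot_name, created in snapshots:
        if now - created > max_age:
            stale.setdefault(vm_name, []).append(snapshot_name)
    return stale
```

Any VM that shows up in the returned dictionary is one you want to hear about before it ends up like our Exchange server.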
There is a great KB article by VMware.
Cannot create a quiesced snapshot because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine.
That was a persistent error every time the VMware Data Recovery appliance tried to take a snapshot of the server for backup.
I had to make sure a VSS provider exists by running vssadmin list providers on the affected server, and then list all the VSS writers and make sure none of them was reporting an error (by running vssadmin list writers).
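On a server with many writers, the output of vssadmin list writers is long and easy to misread. Here is a rough sketch of a filter, assuming the usual output layout of a `Writer name:` line followed by `State:` and `Last error:` lines. The function name `failing_writers` and the sample text are illustrative only and are not captured from a real server.

```python
import re

# Illustrative, trimmed sample of `vssadmin list writers` output (not real data).
SAMPLE = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   State: [8] Failed
   Last error: Non-retryable error
"""


def failing_writers(vssadmin_output):
    """Return (writer_name, state, last_error) for writers that look unhealthy."""
    problems = []
    name = state = None
    for raw in vssadmin_output.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            state = None
        elif line.startswith("State:"):
            # "State: [1] Stable" -> "Stable"
            state = re.sub(r"^\[\d+\]\s*", "", line.split(":", 1)[1].strip())
        elif line.startswith("Last error:") and name is not None:
            last_error = line.split(":", 1)[1].strip()
            # Anything other than Stable / No error is worth a closer look.
            if last_error != "No error" or state != "Stable":
                problems.append((name, state, last_error))
            name = None
    return problems
```

To use it, save the command output to a text file on the affected server, read the file, and pass its contents to `failing_writers`. An empty list means every writer reported Stable with no error.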
Make sure the MS VSS service is set to start manually, and that the COM+ System Application service is set to start automatically and is running.
If the above doesn’t resolve the issue, re-install VMware Tools on the server to re-register VSS.