Storware Backup & Recovery
    • Some backups failing

      Hello.
      I use vProtect to back up VMs on an oVirt infrastructure.

      Everything is working great except for some backups that sometimes fail with this message:
      ExternalAPIException: No file size in transfer url response.

      After retrying the backup a few times it works.

      Could you help me figure out what's happening?

      Logs for a failed backup of VM0001 - https://pastebin.com/Y3tLrH6S
      Logs for the successful backup of the same VM a few minutes later - https://pastebin.com/DkifBdVQ

    • Hi!

      Sorry for the delay.

      This error is mostly caused by connectivity issues: we receive information from the Manager, but not the correct information from the host. The size of the file to transfer is the first thing we check.

      It can also be caused by the SSL certificate chain, when we can communicate over SSL with the Manager but not with all of the hosts.

      You can go to the Virtual Providers setup, clear all certificates, and switch on "Trust all Certificates" on the Manager and on each of the hosts. Then run Sync and try the backup again.
      Regards
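
One way to narrow down which endpoint presents a certificate chain that is not trusted is to attempt a verified TLS handshake against the Manager and each host. The following is a minimal Python sketch and not part of vProtect; the function name `check_tls_trust` is hypothetical, and it assumes the endpoint being tested speaks TLS on the given port.

```python
import socket
import ssl


def check_tls_trust(host, port, cafile=None, timeout=5.0):
    """Try a verified TLS handshake and report the outcome.

    Returns a (status, detail) tuple where status is one of:
      "trusted"   - handshake succeeded and the chain was verified
      "untrusted" - the server answered but its chain failed verification
      "error"     - connection or protocol problem unrelated to trust
    cafile is an optional CA bundle; when None, the system store is used.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake and verifies the chain
            with context.wrap_socket(raw, server_hostname=host):
                return ("trusted", "")
    except ssl.SSLCertVerificationError as exc:
        return ("untrusted", exc.verify_message)
    except OSError as exc:
        # Covers refused connections, timeouts, and other SSL errors
        return ("error", str(exc))
```

Running it once per host (with a placeholder CA bundle path such as `/path/to/ca.pem`) would show whether only some hosts return "untrusted", which would match the pattern described above where the Manager is reachable over SSL but not all hosts are.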

    • @krzysztof-sz

      Hello, no worries.

      I cleared all certs, ran Sync (OK), and tried a new backup, but I keep getting the same errors:
      https://pastebin.com/Meqp1GtT

      Any idea what else I can do to get this working?

    • @carvalhoiv it could also be a network connection problem to the oVirt manager or its hosts. To connect, we require two ports to be open on the firewall: 54322 and 54323. Please make sure that these ports are open on the oVirt manager and its hosts.
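
A quick reachability test for those two ports can be scripted from the machine that runs the backups. This is a minimal sketch using only the Python standard library; the helper names `port_is_open` and `check_ports` are made up for illustration, and a successful TCP connect only shows the port is reachable, not that the service behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timeout), or unresolvable host
        return False


def check_ports(targets, ports=(54322, 54323), timeout=3.0):
    """Check every port on every target; returns {(host, port): bool}."""
    return {
        (host, port): port_is_open(host, port, timeout)
        for host in targets
        for port in ports
    }
```

Calling `check_ports` with the manager and host names as `targets` returns a mapping that makes it easy to spot a single host with a blocked port.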

    • @lsroga

      Just tested: the connection to hosted_engine on port 54323 is OK, and the connection to the hosts on port 54322 is also OK.

      I'll now restart one host, move a VM to it after the restart, and try the backup again.

    • Ok, I rebooted a host but keep getting the same errors.

      I tried to back up a VM from another oVirt cluster and that ran correctly.

      What else could I look into?

    • I enabled DEBUG mode and found this in the appserver log:

      [2024-03-07 17:41:12.737] INFO [pool-5-thread-1] CertificateCache.getOrCreateTrustedCertificateWithDecodedChain:227 
      [039badd6-c94d-44ad-b8e2-8ac8b5f58761] Getting certificate chain for TrustedCertificateChainId(sourceId=f6fa7cb3-6c6d-4903-be7e-7bc41a5dcb6a, sourceType=HYPERVISOR_MANAGER, chainType=X509) and task: 039badd6-c94d-44ad-b8e2-8ac8b5f58761: [Export] from 2024-03-07T17:39Z to 2024-03-08T03:39Z
      
      [2024-03-07 17:41:12.739] DEBUG [pool-5-thread-1] CertificateCache.getLocalCertificateChainIfAvailableAndUpToDate:265 
      [039badd6-c94d-44ad-b8e2-8ac8b5f58761] Getting locally stored certificate for TrustedCertificateChainId(sourceId=f6fa7cb3-6c6d-4903-be7e-7bc41a5dcb6a, sourceType=HYPERVISOR_MANAGER, chainType=X509)
      
      [2024-03-07 17:41:12.739] DEBUG [pool-5-thread-1] CertificateCache.getLocalCertificateChainIfAvailableAndUpToDate:268 
      [039badd6-c94d-44ad-b8e2-8ac8b5f58761] Locally stored certificate chain not available for TrustedCertificateChainId(sourceId=f6fa7cb3-6c6d-4903-be7e-7bc41a5dcb6a, sourceType=HYPERVISOR_MANAGER, chainType=X509)
      
      [2024-03-07 17:41:12.740] DEBUG [pool-5-thread-1] CertificateCache.getCertificateChainFromServer:298 
      [039badd6-c94d-44ad-b8e2-8ac8b5f58761] Getting certificate chain from SBR server: TrustedCertificateChainId(sourceId=f6fa7cb3-6c6d-4903-be7e-7bc41a5dcb6a, sourceType=HYPERVISOR_MANAGER, chainType=X509)
      
      [2024-03-07 17:41:12.786] DEBUG [pool-5-thread-1] CertificateCache.decodeCertificates:309 
      [039badd6-c94d-44ad-b8e2-8ac8b5f58761] Attempting to decode certificate chain for HYPERVISOR_MANAGER f6fa7cb3-6c6d-4903-be7e-7bc41a5dcb6a
      
      [2024-03-07 17:41:12.789] DEBUG [pool-5-thread-1] CertificateCache.refreshInMemoryKeyStore:103 
      [039badd6-c94d-44ad-b8e2-8ac8b5f58761] Refreshed in-memory key store - 2 aliases stored
      
      [2024-03-07 17:41:12.894] DEBUG [pool-5-thread-1] CompositeTrustManager.lambda$anyTrustManagerTrustsServerCertificateChain$1:91 
      [039badd6-c94d-44ad-b8e2-8ac8b5f58761] Trust manager sun.security.ssl.X509TrustManagerImpl@72c0c2cc doesn't trust server certificate chain
      
      [2024-03-07 17:41:12.895] DEBUG [pool-5-thread-1] CompositeTrustManager.lambda$anyTrustManagerTrustsServerCertificateChain$1:91 
      [039badd6-c94d-44ad-b8e2-8ac8b5f58761] Trust manager sun.security.ssl.X509TrustManagerImpl@61a1843c doesn't trust server certificate chain
      
      [2024-03-07 17:41:12.897] ERROR [pool-5-thread-1] ChunkApiDownloader.lambda$getFileSizeRetryPolicy$2:157 
      [039badd6-c94d-44ad-b8e2-8ac8b5f58761] Attempt failed
      

      Maybe it gives some clue on what's happening.
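
When the debug log is long, the trust-related entries can be pulled out with a small script. The sketch below assumes the two-line entry layout seen in the excerpt above (a header line with timestamp, level, thread and location, followed by a line with the task ID and message); the function names `parse_entries` and `trust_failures` are hypothetical.

```python
import re

# Header line: [timestamp] LEVEL [thread] Class.method:line
HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+(?P<where>\S+)$"
)
# Message line: [task-uuid] free text
MESSAGE = re.compile(r"^\[(?P<task>[0-9a-f-]{36})\]\s+(?P<text>.*)$")


def parse_entries(lines):
    """Pair each header line with the message line that follows it."""
    entries = []
    pending = None
    for raw in lines:
        # Drop whitespace and any stray '*' markers a forum paste may add
        line = raw.strip().strip("*").strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            pending = header.groupdict()
            continue
        message = MESSAGE.match(line)
        if message and pending is not None:
            entries.append({**pending, **message.groupdict()})
            pending = None
    return entries


def trust_failures(entries):
    """Keep only entries where a trust manager rejected the chain."""
    needle = "doesn't trust server certificate chain"
    return [e for e in entries if needle in e["text"]]
```

Feeding it the lines of the log file (for example via `open(path).readlines()`) and grouping the result by `task` would show which export tasks hit the certificate rejection before the "Attempt failed" error.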

    • Hello again.

      After doing what @krzysztof-sz suggested, all backups started failing with the same error.

      Today I tried to remove the Hypervisor Manager and add it again, but I keep getting the same error.

      Here is the backup log with debug mode enabled: https://pastebin.com/XKi437bK

      Any help would be great as I can no longer protect these VMs.

    • Hello again.

      Is there anything I could do to solve this issue?

      Thanks!

    • Hello

      It looks like you did not clear the certificates and then enable Trust All Certificates.

      The Certificates tab should look like this:
      [image: bd1a13e4-53a4-4408-9e69-07d4fc63728f-image.png]

      You should also set the same on all hypervisors inside the HVM:
      [image: d0c39fb2-9773-41ca-8b22-3c1bf81eda99-image.png]

      After clearing the certificates and enabling Trust All Certificates, please rerun Inventory Sync and run the backups.

      If you have already tried that, you could clear all information about the hypervisors by removing them from that tab. Please be careful: first check that you have switched off auto-removal of non-present VMs in the Backup Policy:
      [image: 511aab1d-abc0-43b4-acf9-fb4bba7b55f9-image.png]

      Regards

    • @krzysztof-sz Yes, I tried that, but it didn't work.
      I could do one backup, then the next one failed.
      I think I'll just reinstall it all.