When gsutil rm is slow, add "-m"

2016-01-08 10:17:34

I wanted to delete several tens of GB of data that had piled up in Cloud Storage on GCP, so I ran

$ gsutil rm -rf gs://path/to/bucket/

but it took an extremely long time.

After starting it, I noticed a message suggesting "-m", so I interrupted the command and read the help.

$ gsutil help options
NAME
  options - Top-Level Command-Line Options

    (omitted)

  -m          Causes supported operations (acl ch, acl set, cp, mv, rm, rsync,
              and setmeta) to run in parallel. This can significantly improve
              performance if you are performing operations on a large number of
              files over a reasonably fast network connection.

              gsutil performs the specified operation using a combination of
              multi-threading and multi-processing, using a number of threads
              and processors determined by the parallel_thread_count and
              parallel_process_count values set in the boto configuration
              file. You might want to experiment with these values, as the
              best values can vary based on a number of factors, including
              network speed, number of CPUs, and available memory.

              Using the -m option may make your performance worse if you
              are using a slower network, such as the typical network speeds
              offered by non-business home network plans. It can also make
              your performance worse for cases that perform all operations
              locally (e.g., gsutil rsync, where both source and desination URLs
              are on the local disk), because it can "thrash" your local disk.

              If a download or upload operation using parallel transfer fails
              before the entire transfer is complete (e.g. failing after 300 of
              1000 files have been transferred), you will need to restart the
              entire transfer.

              Also, although most commands will normally fail upon encountering
              an error when the -m flag is disabled, all commands will
              continue to try all operations when -m is enabled with multiple
              threads or processes, and the number of failed operations (if any)
              will be reported at the end of the command's execution.

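Apparently acl ch, acl set, cp, mv, rm, rsync, and setmeta can run in parallel when -m is given. It is a top-level option, so it goes before the subcommand:

$ gsutil -m rm -rf gs://path/to/bucket/

With this, the deletion got faster.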
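
The help text above says the degree of parallelism comes from parallel_process_count and parallel_thread_count in the boto configuration file, and that the best values depend on the environment. As a minimal sketch of experimenting with them, the snippet below builds a command line that overrides both values for a single run via gsutil's top-level -o option. The helper name build_parallel_rm and the default counts (4 processes, 8 threads) are assumptions made for illustration, not recommended settings, and the helper only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) a parallel "gsutil rm" command.
# The default counts are illustrative assumptions, not tuned recommendations.
build_parallel_rm() {
  url=$1              # e.g. gs://path/to/bucket/
  processes=${2:-4}   # assumed default for parallel_process_count
  threads=${3:-8}     # assumed default for parallel_thread_count
  # -o overrides a boto config value for this invocation only.
  printf 'gsutil -m -o GSUtil:parallel_process_count=%s -o GSUtil:parallel_thread_count=%s rm -rf %s\n' \
    "$processes" "$threads" "$url"
}

# Print the command for review before running it by hand.
build_parallel_rm gs://path/to/bucket/
```

Printing the command first lets you check the target URL before deleting anything, since rm -rf on a bucket path cannot be undone. As the help text warns, higher counts can make things worse on a slow network, so it is worth trying a few values.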