Sunday, 17 April 2022
Show HN: GTR, Toolkit to backup Google Takeout at 6GB/s+ to Azure https://bit.ly/36mdh23
After seeing all those posts about Google accounts being banned for frivolous and automated reasons, I started using Google Takeout more and more to prepare for the worst. If you aren't aware of Google Takeout, it is a Google service that lets you download archives of all your data from Google. I understand this may be kind of niche, but if your Google Takeout is large and prohibitive to transfer and back up, this toolkit I made may be right for you.

The problem is, my Takeout jobs are 1.25TB because they also include the videos I've uploaded to my YouTube account. Without them it's 300GB, which is still a very large amount to me. Transferring 1.25TB by hand got old fast. It's a pain even on a gigabit connection, and it's also a pain inside a VPS. At most I got 300MB/s doing it inside a VPS, but every session took one to three hours to complete and was rather high-touch. The Google Takeout interface is hostile to automation, and download links obtained from it are only valid for 15 minutes before you must re-enter your credentials. You can't queue up downloads. On top of that, you need temporary storage on whatever computer you're using before you can send the data off to its final archival storage. What a pain!

In HN-overkill fashion, I came up with a toolkit to make this whole process much, much faster. I noticed that each connection to a Google Takeout archive download seemed to be limited to 30MB/s, but multiple connections scaled well: 5 connections gave 150MB/s. How about 200 connections? 6000MB/s!

I noticed that Azure has functionality to do "server-to-server" transfers of data from public URLs with different byte ranges; it seems this is used for built-in transfer of resources from external object storage services such as S3 or GCS. You can send as many of these commands to Azure in parallel as you want, and since the source is Google, I'm sure their infrastructure can handle it. I also noticed that there are extensions for Chromium browsers that can intercept downloads and grab their final download link. So I glued all of this together.

Unfortunately, there were bugs in Azure that prevented it from downloading the Google links directly, and Azure only exposes its endpoints over HTTP/1.1, which greatly limits how many parallel downloads a browser can run. Cloudflare Workers can work around both limitations: the Google URLs are base64-encoded and proxied through a Worker before being handed to Azure, and the Azure endpoint is fronted so it can be reached over HTTP/3. Another great thing is that Cloudflare Workers doesn't charge for ingress or egress bandwidth, and, like Google, Cloudflare has an absurd amount of bandwidth, compute, and peering.

With all this combined, I can transfer my 50GB archives from Google Takeout to Azure Storage at 6GB/s+ and back them up periodically without having to set up a VPS, find storage, find bandwidth, or really have any "large" computing or networking resources. I use this toolkit a lot myself, and it may be useful to you too if you're in the same situation. It takes about an hour to set up, but only about 3 minutes every time you want to back up.

https://bit.ly/37USRh8

April 18, 2022 at 05:00AM
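To make the pieces above concrete, here are a few rough sketches in the order the data flows; none of them is the toolkit's actual code. First, the browser-extension step: Chromium's chrome.downloads API can report a download's final URL, and a minimal MV3 background worker could capture it and cancel the local download. The "takeout" filter and logging are assumptions for illustration, and the extension would need the "downloads" permission (and @types/chrome to type-check).

```ts
/// <reference types="chrome" />
// Hypothetical MV3 background service worker: capture the resolved Takeout
// download URL so it can be handed off, and stop the local download.
chrome.downloads.onCreated.addListener((item) => {
  const finalUrl = item.finalUrl || item.url;
  if (finalUrl.includes("takeout")) {           // assumed filter for Takeout links
    console.log("captured download URL:", finalUrl);
    chrome.downloads.cancel(item.id);           // don't download locally
  }
});
```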
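Second, the proxy step. A minimal Cloudflare Worker sketch, assuming a /p/&lt;base64-url&gt; path convention (my assumption, not necessarily the toolkit's): it decodes the Takeout URL from the path, forwards any Range header, and streams the response back, so Azure can pull from the Worker instead of from Google directly.

```ts
// Sketch of a Cloudflare Worker that proxies a base64-encoded Takeout URL.
// The /p/<base64-url> path layout is an illustrative assumption.
export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const encoded = url.pathname.replace(/^\/p\//, "");
    let target: string;
    try {
      target = atob(encoded); // decode the original Google Takeout link
    } catch {
      return new Response("bad url", { status: 400 });
    }
    // Pass the Range header through so Azure's ranged, server-to-server
    // copies only pull the slice they asked for.
    const range = request.headers.get("Range");
    const upstream = await fetch(target, {
      headers: range ? { Range: range } : undefined,
    });
    return new Response(upstream.body, {
      status: upstream.status,
      headers: upstream.headers,
    });
  },
};
```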
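Finally, the Azure step. A rough sketch of driving the server-to-server copy in parallel: each Put Block From URL call tells Azure to pull one byte range from the proxied Takeout URL, all blocks are requested at once, and a final Put Block List stitches them into the blob. The SAS URL, source URL, chunk size, and archive size are placeholder assumptions.

```ts
// Sketch only: parallel Put Block From URL against Azure Blob Storage.
// SAS_BLOB_URL, SOURCE_URL, TOTAL_BYTES, and CHUNK are illustrative placeholders.
const SAS_BLOB_URL =
  "https://myaccount.blob.core.windows.net/takeout/archive-001.tgz?sv=...";
const SOURCE_URL = "https://my-worker.example.workers.dev/p/<base64-takeout-url>";
const TOTAL_BYTES = 50 * 1024 ** 3; // ~50GB archive
const CHUNK = 256 * 1024 * 1024;    // 256MiB per block

// Azure block IDs must be base64 strings of equal length within one blob.
const blockId = (i: number) => btoa(String(i).padStart(10, "0"));

async function putBlockFromUrl(i: number, start: number, end: number) {
  const url = `${SAS_BLOB_URL}&comp=block&blockid=${encodeURIComponent(blockId(i))}`;
  const res = await fetch(url, {
    method: "PUT",
    headers: {
      "x-ms-version": "2020-10-02",
      "x-ms-copy-source": SOURCE_URL,           // Azure pulls this URL itself
      "x-ms-source-range": `bytes=${start}-${end}`,
    },
  });
  if (!res.ok) throw new Error(`block ${i} failed: ${res.status}`);
}

async function main() {
  const blocks = Math.ceil(TOTAL_BYTES / CHUNK);
  // Fire every ranged copy at once; the archive bytes never touch this machine.
  await Promise.all(
    Array.from({ length: blocks }, (_, i) =>
      putBlockFromUrl(i, i * CHUNK, Math.min((i + 1) * CHUNK, TOTAL_BYTES) - 1),
    ),
  );
  // Commit the blocks in order to assemble the final blob.
  const body =
    `<?xml version="1.0" encoding="utf-8"?><BlockList>` +
    Array.from({ length: blocks }, (_, i) => `<Latest>${blockId(i)}</Latest>`).join("") +
    `</BlockList>`;
  const res = await fetch(`${SAS_BLOB_URL}&comp=blocklist`, {
    method: "PUT",
    headers: { "x-ms-version": "2020-10-02", "Content-Type": "application/xml" },
    body,
  });
  if (!res.ok) throw new Error(`commit failed: ${res.status}`);
}

main().catch(console.error);
```

With a 50GB archive and 256MiB blocks this issues 200 ranged copies at once, which lines up with the 200-connection figure above.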
Labels: Hacker News