Pietro Albini

Downloading all the crates on crates.io

Written on April 9, 2020.

There are a lot of reasons you might want to down­load all the crates ever uploaded to crates.io, Rust’s pack­age registry: code analy­sis across the whole public ecosys­tem, host­ing a mirror for your compa­ny, or count­less other ideas and projects.

The team behind crates.io receives a lot of support request asking what’s the best and least impact­ful way to do this, so here is a little guide on how to do that!

Getting a list of all the crates

crates.io offers multi­ple way to inter­act with its data: the crates.io-index GitHub repos­i­to­ry, exper­i­men­tal daily data­base dumps and the crates.io API.

The way I recom­mend to get the list of all the crates is to rely on the index: the exper­i­men­tal data­base dumps are more heavy­weight and are only updated daily, while usage of the API is governed by the crawlers policy (lim­it­ing you to one API call per second). If you absolutely need to use the API please talk with us by email­ing help@crates.io, and we’ll figure out a solution.

The index is a git repository, and the format of its content is defined by RFC 2141. There are crates such as crates-index that allow you to easily query its contents, and I recom­mend using them when­ever possible.

Downloading the packages

The best way to down­load the pack­ages is to fetch them directly from our CDN. Compared to call­ing the crates.io API, the CDN does not have rate limits and is faster (as the API redi­rects you to the CDN after updat­ing the down­load coun­t). The CDN URLs follow this pattern:

https://static.crates.io/crates/{name}/{name}-{version}.crate

For exam­ple, here is the link to down­load Serde 1.0.0. Pack­ages are tar.gz files.

If you want to ensure the contents of the CDN were not tampered with you can verify the SHA256 check­sum of the file you down­loaded by compar­ing it with the cksum field in the index.

Keeping your local copy up to date

The best way to keep your local copy up to date is to fetch a fresh list of crates avail­able on crates.io and check if all of them are present in the local system, down­load­ing the ones you’re miss­ing. I person­ally recom­mend this approach as it’s less error-prone, and it heals your copy auto­mat­i­cally if for what­ever reason some of the changes are lost during a previ­ous update.

Another inter­est­ing approach you could imple­ment is to get the differ­ence since the last update of the index with git diff, pars­ing its output to get the list of crates that were added. There are also third-­party crates such as crates-index-diff that auto­mate this process for you. This approach is more frag­ile and error-prone, but it might be the only sensi­ble solu­tion if check­ing whether you down­loaded a crate or not is slow or expensive.

Common issues to be aware of

While the basics of down­load­ing the contents of crates.io are simple, there are a couple of issues to be aware of when imple­ment­ing such tooling: