Indexing packages that have been uploaded to packagecloud and making them available in the package manager's metadata is handled by a set of background jobs. How much this process benefits from additional parallelism depends on the package type being indexed: some package metadata formats cannot be generated concurrently because of technical restrictions in the format itself.

To adjust the number of background workers that handle indexing, edit your /etc/packagecloud/packagecloud.rb file and set the resque['index_worker_count'] option to the desired number of index worker processes. The following sections explain how parallelization affects each package type.
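For example, a minimal fragment of /etc/packagecloud/packagecloud.rb that sets this option might look like the following. The value 4 is only an illustration, not a recommendation, and the rest of the file is omitted.

```ruby
# /etc/packagecloud/packagecloud.rb (fragment)
# Run four index worker processes. 4 is an example value; pick a count
# that suits your workload and the package types you host.
resque['index_worker_count'] = 4
```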

Fast indexing requires:

In our internal tests we are able to achieve the following reindexing times for APT and YUM repositories:

Debian packages and APT repositories

APT metadata for a single version of Debian or Ubuntu within a single repository cannot be generated by multiple processes in parallel. Uploads to different repositories, or to different Debian/Ubuntu versions within the same repository, can be indexed in parallel.
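One way to reason about this rule is to treat each APT indexing job as owning a key made of the repository, the distribution, and the distribution version: jobs that share a key run one after another, and jobs with different keys can run side by side. The sketch below is a model only; the `Upload` struct, its fields, the helper methods, and the repository names are hypothetical and are not packagecloud's internal API.

```ruby
# Hypothetical model of an upload; these field names are illustrative only.
Upload = Struct.new(:repo, :distro, :version, :package)

# APT metadata is regenerated per repository per distribution version,
# so uploads that share this key must be indexed serially.
def apt_index_key(upload)
  [upload.repo, upload.distro, upload.version]
end

# Group uploads into queues: each queue is processed serially,
# while different queues can be handled by different workers.
def serial_queues(uploads)
  uploads.group_by { |u| apt_index_key(u) }
end

uploads = [
  Upload.new("acme/tools", "ubuntu", "focal", "foo_1.0_amd64.deb"),
  Upload.new("acme/tools", "ubuntu", "focal", "bar_2.1_amd64.deb"),
  Upload.new("acme/tools", "ubuntu", "jammy", "foo_1.0_amd64.deb"),
  Upload.new("acme/other", "ubuntu", "focal", "baz_0.3_amd64.deb")
]

serial_queues(uploads).each do |key, queue|
  puts "#{key.join('/')}: #{queue.map(&:package).join(', ')}"
end
```

This prints three queues, so at most three workers can be busy with these uploads at once; the two focal uploads to acme/tools wait for each other.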

Examples to help illustrate what can and cannot be parallelized:

You can increase reindex speed by disabling file list metadata for your repository. This metadata is very large, takes a long time to generate, and is generally used by only a small percentage of users. Disabling it for an APT repository means users can no longer run "apt-file" to determine which package from the repository provided a given file. For extremely large packages (e.g., Chef Omnibus-style packages), file list metadata may not be particularly useful anyway, since a user probably already knows that all files under /opt/example come from the Example package.

Whether to provide file list metadata is your decision, but for large repositories, disabling it can significantly reduce indexing time.

RPM packages and YUM repositories

YUM metadata for a single repository, for a single version of CentOS, Enterprise Linux, or any other YUM-based system, and for a particular CPU architecture cannot be parallelized. If you upload multiple packages to the same repository for the same version of Enterprise Linux or CentOS and the same CPU architecture, those uploads are indexed serially. Uploads that differ in repository, distribution version, or CPU architecture can be indexed in parallel.
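Compared with APT, the CPU architecture is an extra part of the serialization key. The sketch below expresses the rule as a predicate over two uploads. It is a hedged model: the `RpmUpload` struct, the `must_serialize?` helper, and the repository and distribution names are hypothetical.

```ruby
# Hypothetical model of an RPM upload; field names are illustrative only.
RpmUpload = Struct.new(:repo, :distro, :version, :arch)

# YUM metadata is built per repository, per distribution version and
# per CPU architecture, so two uploads must be indexed one after the
# other only when all four fields match.
def must_serialize?(a, b)
  a.repo == b.repo && a.distro == b.distro &&
    a.version == b.version && a.arch == b.arch
end

el8_x86       = RpmUpload.new("acme/tools", "el", "8", "x86_64")
el8_x86_again = RpmUpload.new("acme/tools", "el", "8", "x86_64")
el8_arm       = RpmUpload.new("acme/tools", "el", "8", "aarch64")
el9_x86       = RpmUpload.new("acme/tools", "el", "9", "x86_64")

puts must_serialize?(el8_x86, el8_x86_again) # => true
puts must_serialize?(el8_x86, el8_arm)       # => false (different arch)
puts must_serialize?(el8_x86, el9_x86)       # => false (different version)
```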

Examples to help illustrate what can and cannot be parallelized:

You can increase reindex speed by disabling file list metadata for your repository. This metadata is very large, takes a long time to generate, and is generally used by only a small percentage of users. Disabling it for a YUM repository means users can no longer run "yum whatprovides" to determine which package from the repository provided a given file. For extremely large packages (e.g., Chef Omnibus-style packages), file list metadata may not be particularly useful anyway, since a user probably already knows that all files under /opt/example come from the Example package.

Whether to provide file list metadata is your decision, but for large repositories, disabling it can significantly reduce indexing time.

Python packages and PyPI repositories

PyPI metadata for a given repository can only be generated serially. Adding worker processes will not speed up reindexing of an individual PyPI repository.

Multiple repositories, however, can be reindexed in parallel. With additional worker processes, two separate PyPI repositories, for example, can be reindexed concurrently.
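Because PyPI indexing is serialized per repository, the number of PyPI reindex jobs that can run at once is bounded both by the worker count and by the number of distinct repositories with pending work. A small sketch of that arithmetic, using a hypothetical helper and hypothetical repository names:

```ruby
# Upper bound on concurrent PyPI reindex jobs: at most one job per
# repository at a time, and never more than the configured workers.
def max_concurrent_pypi_jobs(pending_repos, worker_count)
  [pending_repos.uniq.size, worker_count].min
end

# Three pending jobs for one repository, one for another.
pending = ["acme/python", "acme/python", "acme/python", "acme/internal-py"]

puts max_concurrent_pypi_jobs(pending, 8) # => 2
puts max_concurrent_pypi_jobs(pending, 1) # => 1
```

In this example, raising the worker count beyond two does not make these PyPI jobs finish any sooner.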

Node.js and NPM registries

Multiple Node.js uploads to the same repository cause multiple reindex jobs to be queued; jobs for distinct packages within the same repository can be processed concurrently. Multiple uploads of the same package (e.g., example-1.0, example-1.1, example-1.5, ...) to the same repository are processed serially.

Adding worker processes allows multiple distinct packages in the same repository to be reindexed concurrently.
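For npm, the serialization key is the repository plus the package name, so different versions of one package form a single serial queue. The sketch below models this with a hypothetical `NpmUpload` struct and hypothetical package names; it illustrates the rule described above rather than packagecloud's actual job code.

```ruby
# Hypothetical model of an npm upload; field names are illustrative only.
NpmUpload = Struct.new(:repo, :name, :version)

# Reindex jobs for the same package in the same repository run serially;
# distinct packages in the same repository can be reindexed concurrently.
def npm_index_key(upload)
  [upload.repo, upload.name]
end

uploads = [
  NpmUpload.new("acme/js", "example", "1.0.0"),
  NpmUpload.new("acme/js", "example", "1.1.0"),
  NpmUpload.new("acme/js", "example", "1.5.0"),
  NpmUpload.new("acme/js", "left-pad", "1.3.0")
]

# One serial queue per [repo, package name]; queues can run in parallel.
queues = uploads.group_by { |u| npm_index_key(u) }
queues.each do |(repo, name), jobs|
  puts "#{repo} #{name}: #{jobs.map(&:version).join(' -> ')}"
end
```

Here the three example versions share one queue while left-pad has its own, so a second worker can index left-pad while the example versions are processed in order.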

RubyGem packages and repositories

Multiple RubyGem uploads to different repositories will cause multiple reindex jobs to be queued, which can be processed concurrently for each of the repositories. Reindex jobs for the same RubyGem repository will be processed serially.

Adding worker processes allows multiple distinct RubyGem repositories to be reindexed concurrently.
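The effect of a per-repository restriction on a pool of workers can be simulated with one lock per repository. The toy program below is an assumption-laden sketch, not packagecloud's implementation: it starts four worker threads (standing in for index worker processes), gives each hypothetical repository its own `Mutex`, and records the highest number of jobs seen running at once for each repository.

```ruby
# Toy worker pool: each job holds a per-repository lock while it
# "reindexes", so jobs for the same repository never overlap, while
# jobs for different repositories may. Not packagecloud's code.
repo_locks  = Hash.new { |h, k| h[k] = Mutex.new }
locks_guard = Mutex.new   # protects lazy creation of per-repo locks
stats_guard = Mutex.new   # protects the counters below
active = Hash.new(0)      # repo => jobs currently running
peak   = Hash.new(0)      # repo => highest concurrency observed

jobs = Queue.new
%w[acme/gems acme/gems acme/gems acme/other-gems].each { |r| jobs << r }
jobs.close # pop returns nil once the queue is drained

workers = Array.new(4) do
  Thread.new do
    while (repo = jobs.pop)
      lock = locks_guard.synchronize { repo_locks[repo] }
      lock.synchronize do
        stats_guard.synchronize do
          active[repo] += 1
          peak[repo] = [peak[repo], active[repo]].max
        end
        sleep 0.01 # stand-in for regenerating the gem index
        stats_guard.synchronize { active[repo] -= 1 }
      end
    end
  end
end
workers.each(&:join)

puts peak.values.max # => 1
```

Even with four workers, the three acme/gems jobs never overlap; the extra workers only help when other repositories have pending work.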

Java packages and Maven repositories

Currently, no background processes are used to generate Maven repositories, so index workers are not involved for this repository type and the worker count does not affect it.