5 Package-URL Specification

PURL stands for Package-URL.

A purl is a URL composed of seven components:

Table 1: Components of a PURL
Component Requirement Description
scheme Required The URL scheme with the constant value of "pkg". One of the primary reasons for this single scheme is to facilitate the future official registration of the "pkg" scheme for package URLs.
type Required The package "type" or package "protocol" such as maven, npm, nuget, gem, pypi, etc.
namespace Optional A name prefix such as a Maven groupid, a Docker image owner, a GitHub user or organization. Namespace is type-specific.
name Required The name of the package.
version Optional The version of the package.
qualifiers Optional Qualifier data for a package such as OS, architecture, repository, etc. Qualifiers are type-specific.
subpath Optional Subpath within a package, relative to the package root.

Components are separated by a specific character for unambiguous parsing. Components are designed such that they form a hierarchy from the most significant on the left to the least significant on the right.

5.1 A PURL is a URL

  • A purl is a valid URL and URI that conforms to the URL definitions or specifications at:
    • https://tools.ietf.org/html/rfc3986
    • https://en.wikipedia.org/wiki/URL#Syntax
    • https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax
    • https://url.spec.whatwg.org/
  • A purl is a valid URL because it is a locator even though it has no Authority URL component: each type has a default repository location when defined.
  • The purl components are mapped to these URL components:
    • scheme: this is a URL scheme with a constant value: pkg
    • purl type, namespace, name and version components: these are collectively mapped to a URL path
    • purl qualifiers: this maps to a URL query
    • purl subpath: this is a URL fragment
  • Special URL schemes as defined in https://url.spec.whatwg.org/ such as file://, https://, http:// and ftp:// are NOT valid purl types. They are valid URL or URI schemes but they are not purl. They may be used to reference URLs in separate attributes outside of a purl or in a purl qualifier.
  • Version control system (VCS) URLs such as git://, svn:// and hg://, or those defined in Python pip or SPDX download locations, are NOT valid purl types. They are valid URL or URI schemes but they are not purl. They are a closely related, compact and uniform way to reference VCS URLs. They may be used as references in separate attributes outside of a purl or in a purl qualifier.
  • A purl must NOT contain a URL Authority because there is no support for username, password, host or port components. A namespace segment may sometimes look like a host, but its interpretation is specific to a type.
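
The mapping between purl components and URL components can be observed with a generic URL splitter. The following is a minimal sketch using Python's standard urllib.parse.urlsplit; the sample purl is illustrative only, and a generic URL splitter is not a conforming purl parser.

```python
from urllib.parse import urlsplit

# Illustrative purl; the package coordinates are only an example.
purl = "pkg:maven/org.apache.commons/io@1.3.4?type=jar#src/main"

parts = urlsplit(purl)

print(parts.scheme)    # URL scheme    <- purl scheme
print(parts.netloc)    # URL authority <- always empty for a purl
print(parts.path)      # URL path      <- purl type, namespace, name and version
print(parts.query)     # URL query     <- purl qualifiers
print(parts.fragment)  # URL fragment  <- purl subpath
```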

5.2 Permitted characters

A canonical purl is composed of these permitted ASCII characters:

  • the Alphanumeric Characters: A to Z, a to z, 0 to 9,
  • the Punctuation Characters: .-_~ (period '.', dash '-', underscore '_' and tilde '~'),
  • the Plus Character: + (plus '+'),
  • the Percent Character: % (percent sign '%'), and
  • the Separator Characters :/@?=&# (colon ':', slash '/', at sign '@', question mark '?', equal sign '=', ampersand '&' and pound sign '#').
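
As an illustration, the permitted character set can be expressed as a simple membership check. This is a sketch only: the constant names and the function uses_only_permitted_characters are hypothetical, and passing this check does not by itself make a string a valid purl.

```python
import string

# The five character groups listed above.
ALPHANUMERIC = string.ascii_letters + string.digits
PUNCTUATION = ".-_~"
PLUS = "+"
PERCENT = "%"
SEPARATORS = ":/@?=&#"

PERMITTED = frozenset(ALPHANUMERIC + PUNCTUATION + PLUS + PERCENT + SEPARATORS)


def uses_only_permitted_characters(purl: str) -> bool:
    """Return True if every character of `purl` is a permitted ASCII character."""
    return all(ch in PERMITTED for ch in purl)
```

A string containing a raw space or a non-ASCII character fails this check until it has been percent-encoded.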

5.3 Separator characters

A canonical purl uses the following separator characters:

  • ':' (colon) is the separator between scheme and type
  • '/' (slash) is the separator between type, namespace and name
  • '/' (slash) is the separator between subpath segments
  • '@' (at sign) is the separator between name and version
  • '?' (question mark) is the separator before qualifiers
  • '=' (equals) is the separator between a key and a value of a qualifier
  • '&' (ampersand) is the separator between qualifiers (each being a key=value pair)
  • '#' (number sign) is the separator before subpath
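
The separators above suggest a straightforward way to split a purl into its raw components. The sketch below assumes a canonical input, in which separator characters that are not acting as separators have already been percent-encoded; the function name split_purl is hypothetical, and no decoding or validation is performed.

```python
def split_purl(purl: str) -> dict:
    """Split a canonical purl on its separators, without decoding or validation."""
    remainder, _, subpath = purl.partition("#")        # '#' precedes the subpath
    remainder, _, qualifiers = remainder.partition("?")  # '?' precedes the qualifiers
    scheme, _, path = remainder.partition(":")          # ':' follows the scheme
    path = path.strip("/")

    # '@' separates the name from the version; split on the last one.
    if "@" in path:
        path, _, version = path.rpartition("@")
    else:
        version = ""

    # The first '/' ends the type; the last '/' separates namespace and name.
    type_, _, rest = path.partition("/")
    namespace, _, name = rest.rpartition("/")

    return {
        "scheme": scheme,
        "type": type_,
        "namespace": namespace,
        "name": name,
        "version": version,
        "qualifiers": qualifiers,
        "subpath": subpath,
    }
```

Absent optional components are returned as empty strings.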

5.4 Character encoding

  • In the "Rules for each purl component" section, each component defines when and how to apply percent-encoding and decoding to its content.
  • When percent-encoding is required by a component definition, the component string MUST first be encoded as UTF-8.
  • In the component string, each "data octet" MUST be replaced by the percent-encoded "character triplet" applying the percent-encoding mechanism defined in RFC 3986 section 2.1 (https://datatracker.ietf.org/doc/html/rfc3986#section-2.1), including the RFC definition of "data octet" and "character triplet", and using these definitions for RFC's "allowed set" and "delimiters":
    • "allowed set" is composed of the Alphanumeric Characters and the Punctuation Characters
    • "delimiters" is composed of the Separator Characters
  • The following characters MUST NOT be percent-encoded:
    • the Alphanumeric Characters,
    • the Punctuation Characters,
    • the Separator Characters when being used as purl separators,
    • the colon ':', whether used as a Separator Character or otherwise, and
    • the percent sign '%' when used to represent a percent-encoded character.
  • Where the space ' ' is permitted, it MUST be percent-encoded as '%20'.
  • With the exception of the percent-encoding mechanism, the rules regarding percent-encoding are defined by this specification alone.
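
One possible way to apply these rules to a single component string is sketched below with Python's standard urllib.parse.quote (Python 3.7 or later, where '~' is left unencoded). The helper names encode_component and decode_component are hypothetical. Note that this sketch also encodes '+' as '%2B', since '+' is outside the "allowed set" defined above.

```python
from urllib.parse import quote, unquote


def encode_component(value: str) -> str:
    """Percent-encode one component string (a sketch, not a full purl builder)."""
    # quote() encodes the string as UTF-8, leaves ASCII letters, digits and
    # '_', '.', '-', '~' untouched, and percent-encodes every other octet.
    # safe=":" additionally keeps the colon unencoded.
    return quote(value, safe=":")


def decode_component(value: str) -> str:
    """Reverse encode_component(): percent-decode and interpret as UTF-8."""
    return unquote(value)
```

For example, a space becomes '%20' and the at sign in an npm scope becomes '%40', while a colon is left as is.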

5.5 Component-level rules

A purl string is an ASCII URL string composed of seven components. Except as expressly stated otherwise in this section, each component:

  • MAY be composed of any of the characters defined in the "Permitted characters" section
  • MUST be encoded as defined in the "Character encoding" section

The rules for each component are:

5.5.1 Scheme

  • The scheme is a constant with the value "pkg".
  • The scheme MUST be followed by an unencoded colon ':'.
  • PURL parsers MUST accept URLs where the scheme and colon ':' are followed by one or more slash '/' characters, such as 'pkg://', and MUST ignore and remove all such '/' characters.
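
A parser might handle the scheme as sketched below. The function name strip_scheme is hypothetical, and comparing the scheme exactly against "pkg" is an assumption of this sketch, since the rules above do not state how other letter cases are treated.

```python
def strip_scheme(purl: str) -> str:
    """Check the 'pkg' scheme and return the rest of the purl without leading slashes."""
    scheme, colon, remainder = purl.partition(":")
    # Assumption: the scheme is compared exactly against the constant "pkg".
    if not colon or scheme != "pkg":
        raise ValueError(f"not a purl: {purl!r}")
    # 'pkg://type/name' and 'pkg:/type/name' are accepted; the slashes are dropped.
    return remainder.lstrip("/")
```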

5.5.2 Type

  • The package type MUST be composed only of ASCII letters and numbers, period '.', plus '+', and dash '-'.
  • The type MUST start with an ASCII letter.
  • The type MUST NOT be percent-encoded.
  • The type is case insensitive. The canonical form is lowercase.
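
The type rules can be checked with a short regular expression, as in the following sketch. The function name normalize_type is hypothetical.

```python
import re

# An ASCII letter first, then ASCII letters, digits, '.', '+' or '-'.
# '%' is not in the class, so percent-encoded input is rejected.
_TYPE_RE = re.compile(r"[A-Za-z][A-Za-z0-9.+-]*")


def normalize_type(type_: str) -> str:
    """Validate a purl type and return its canonical (lowercase) form."""
    if not _TYPE_RE.fullmatch(type_):
        raise ValueError(f"invalid purl type: {type_!r}")
    return type_.lower()
```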

5.5.3 Namespace

  • The namespace is optional, unless required by the package's type definition.
  • If present, the namespace MAY contain one or more segments, separated by a single unencoded slash '/' character.
  • All leading and trailing slashes '/' are not significant and SHOULD be stripped in the canonical form. They are not part of the namespace.
  • Each namespace segment MUST be a percent-encoded string.
  • When percent-decoded, a segment:
    • MUST NOT contain any slash '/' characters.
    • MUST NOT be empty.
    • MAY contain any Unicode character other than '/' unless the package's type definition provides otherwise.
  • A URL host or Authority MUST NOT be used as a namespace. Use instead a repository_url qualifier. Note however that for some types, the namespace may look like a host.
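
The namespace rules could be applied as in the sketch below, which ignores any type-specific rules. The function name normalize_namespace is hypothetical, and rejecting an empty segment with an error (rather than silently dropping it) is an assumption of this sketch.

```python
from urllib.parse import quote, unquote


def normalize_namespace(namespace: str) -> str:
    """Return the canonical form of a raw namespace string, or '' if absent."""
    stripped = namespace.strip("/")  # leading/trailing slashes are not significant
    if not stripped:
        return ""
    segments = []
    for raw in stripped.split("/"):
        decoded = unquote(raw)
        # A decoded segment must be non-empty and must not contain '/'.
        if decoded == "" or "/" in decoded:
            raise ValueError(f"invalid namespace segment: {raw!r}")
        # Re-encode so the result is percent-encoded, keeping ':' as is.
        segments.append(quote(decoded, safe=":"))
    return "/".join(segments)
```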

5.5.4 Name

  • The name is prefixed by a single slash '/' separator when the namespace is not empty.
  • All leading and trailing slashes '/' are not significant and SHOULD be stripped in the canonical form. They are not part of the name.
  • A name MUST be a percent-encoded string.
  • When percent-decoded, a name MAY contain any Unicode character unless prohibited by the package's type definition in PURL-TYPES.rst.

5.5.5 Version

  • The version is prefixed by a '@' separator when not empty.
  • This '@' is not part of the version.
  • A version MUST be a percent-encoded string.
  • When percent-decoded, a version MAY contain any Unicode character unless the package's type definition provides otherwise.
  • A version is a plain and opaque string.

5.5.6 Qualifiers

  • The qualifiers component MUST be prefixed by an unencoded question mark '?' separator when not empty. This '?' separator is not part of the qualifiers component.
  • The qualifiers component is composed of one or more key=value pairs. Multiple key=value pairs MUST be separated by an unencoded ampersand '&'. This '&' separator is not part of an individual qualifier.
  • A key and value MUST be separated by the unencoded equal sign '=' character. This '=' separator is not part of the key or value.
  • A value MUST NOT be an empty string: a key=value pair with an empty value is the same as if no key=value pair exists for this key.
  • For each key=value pair:
    • The key MUST be composed only of lowercase ASCII letters and numbers, period '.', dash '-' and underscore '_'.
    • A key MUST start with an ASCII letter.
    • A key MUST NOT be percent-encoded.
    • Each key MUST be unique among all the keys of the qualifiers component.
    • A value MAY be composed of any character and all characters MUST be encoded as described in the "Character encoding" section.
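
A sketch of parsing the qualifiers component under these rules follows. The function name parse_qualifiers is hypothetical, and raising an error on an invalid or duplicate key is one possible choice rather than required behavior.

```python
import re
from urllib.parse import unquote

# A lowercase ASCII letter first, then lowercase letters, digits, '.', '-' or '_'.
_KEY_RE = re.compile(r"[a-z][a-z0-9._-]*")


def parse_qualifiers(qualifiers: str) -> dict:
    """Parse a raw qualifiers string (without the leading '?') into a dict."""
    result = {}
    seen = set()
    if not qualifiers:
        return result
    for pair in qualifiers.split("&"):
        key, _, value = pair.partition("=")
        if not _KEY_RE.fullmatch(key):
            raise ValueError(f"invalid qualifier key: {key!r}")
        if key in seen:
            raise ValueError(f"duplicate qualifier key: {key!r}")
        seen.add(key)
        # A pair with an empty value is treated as if it were absent.
        if value:
            result[key] = unquote(value)
    return result
```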

5.5.7 Subpath

  • The subpath string is prefixed by a '#' separator when not empty.
  • This '#' is not part of the subpath.
  • The subpath contains zero or more segments, separated by slash '/'.
  • Leading and trailing slashes '/' are not significant and SHOULD be stripped in the canonical form.
  • Each subpath segment MUST be a percent-encoded string.
  • When percent-decoded, a segment:
    • MUST NOT contain a '/'
    • MUST NOT be any of '..' or '.'
    • MUST NOT be empty
  • The subpath MUST be interpreted as relative to the root of the package.
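
The subpath rules could be applied as in the following sketch. The function name normalize_subpath is hypothetical, and rejecting '.', '..' and empty segments with an error (rather than discarding them) is an assumption of this sketch.

```python
from urllib.parse import quote, unquote


def normalize_subpath(subpath: str) -> str:
    """Return the canonical form of a raw subpath string, or '' if absent."""
    stripped = subpath.strip("/")  # leading/trailing slashes are not significant
    if not stripped:
        return ""
    segments = []
    for raw in stripped.split("/"):
        decoded = unquote(raw)
        # A decoded segment must not be empty, '.', '..', or contain '/'.
        if decoded in ("", ".", "..") or "/" in decoded:
            raise ValueError(f"invalid subpath segment: {raw!r}")
        segments.append(quote(decoded, safe=":"))
    return "/".join(segments)
```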