boltons/docs/urlutils.rst

190 lines
5.8 KiB
ReStructuredText

``urlutils`` - Structured URL
=============================
.. automodule:: boltons.urlutils
.. versionadded:: 17.2
The URL type
------------
.. autoclass:: boltons.urlutils.URL
.. attribute:: URL.scheme
The scheme is an ASCII string, normally lowercase, which
specifies the semantics for the rest of the URL, as well as
network protocol in many cases. For example, "http" in
"http://hatnote.com".
.. attribute:: URL.username
The username is a string used by some schemes for
authentication. For example, "public" in
"ftp://public@example.com".
.. attribute:: URL.password
The password is a string also used for
authentication. Technically deprecated by `RFC 3986 Section
7.5`_, they're still used in cases when the URL is private or
the password is public. For example "password" in
"db://private:password@127.0.0.1".
.. _RFC 3986 Section 7.5: https://tools.ietf.org/html/rfc3986#section-7.5
.. attribute:: URL.host
The host is a string used to resolve the network location of the
resource, either empty, a domain, or IP address (v4 or
v6). "example.com", "127.0.0.1", and "::1" are all good examples
of host strings.
Per spec, fully-encoded output from :attr:`~URL.to_text()` is
`IDNA encoded`_ for compatibility with DNS.
.. _IDNA encoded: https://en.wikipedia.org/wiki/Internationalized_domain_name#Example_of_IDNA_encoding
.. attribute:: URL.port
The port is an integer used, along with :attr:`host`, in
connecting to network locations. ``8080`` is the port in
"http://localhost:8080/index.html".
.. note::
As is the case for 80 for HTTP and 22 for SSH, many schemes
have default ports, and `Section 3.2.3 of RFC 3986`_ states
that when a URL's port is the same as its scheme's default
port, the port should not be emitted::
>>> URL(u'https://github.com:443/mahmoud/boltons').to_text()
u'https://github.com/mahmoud/boltons'
Custom schemes can register their port with
:func:`~boltons.urlutils.register_scheme`. See
:attr:`URL.default_port` for more info.
.. _Section 3.2.3 of RFC 3986: https://tools.ietf.org/html/rfc3986#section-3.2.3
.. attribute:: URL.path
The string starting with the first leading slash after the
authority part of the URL, ending with the first question
mark. Often percent-quoted for network use. "/a/b/c" is the path
of "http://example.com/a/b/c?d=e".
.. attribute:: URL.path_parts
The :class:`tuple` form of :attr:`~URL.path`, split on
slashes. Empty slash segments are preserved, including that of
the leading slash::
>>> url = URL(u'http://example.com/a/b/c')
>>> url.path_parts
(u'', u'a', u'b', u'c')
.. attribute:: URL.query_params
An instance of :class:`~boltons.urlutils.QueryParamDict`, an
:class:`~boltons.dictutils.OrderedMultiDict` subtype, mapping
textual keys and values which follow the first question mark
after the :attr:`path`. Also available as the handy alias
``qp``::
>>> url = URL('http://boltons.readthedocs.io/en/latest/?utm_source=docs&sphinx=ok')
>>> url.qp.keys()
[u'utm_source', u'sphinx']
Also percent-encoded for network use cases.
.. attribute:: URL.fragment
The string following the first '#' after the
:attr:`query_params` until the end of the URL. It has no
inherent internal structure, and is percent-quoted.
.. automethod:: URL.from_parts
.. automethod:: URL.to_text
.. autoattribute:: URL.default_port
.. autoattribute:: URL.uses_netloc
.. automethod:: URL.get_authority
.. automethod:: URL.normalize
.. automethod:: URL.navigate
Related functions
~~~~~~~~~~~~~~~~~
.. autofunction:: boltons.urlutils.find_all_links
.. autofunction:: boltons.urlutils.register_scheme
Low-level functions
-------------------
A slew of functions used internally by :class:`~boltons.urlutils.URL`.
.. autofunction:: boltons.urlutils.parse_url
.. autofunction:: boltons.urlutils.parse_host
.. autofunction:: boltons.urlutils.parse_qsl
.. autofunction:: boltons.urlutils.resolve_path_parts
.. autoclass:: boltons.urlutils.QueryParamDict
:members:
Quoting
~~~~~~~
URLs have many parts, and almost as many individual "quoting"
(encoding) strategies.
.. autofunction:: boltons.urlutils.quote_userinfo_part
.. autofunction:: boltons.urlutils.quote_path_part
.. autofunction:: boltons.urlutils.quote_query_part
.. autofunction:: boltons.urlutils.quote_fragment_part
There is however, only one unquoting strategy:
.. autofunction:: boltons.urlutils.unquote
Useful constants
----------------
.. attribute:: boltons.urlutils.SCHEME_PORT_MAP
A mapping of URL schemes to their protocols' default
ports. Painstakingly assembled from the `IANA scheme registry`_,
`port registry`_, and independent research.
Keys are lowercase strings, values are integers or None, with None
indicating that the scheme does not have a default port (or may not
support ports at all)::
>>> boltons.urlutils.SCHEME_PORT_MAP['http']
80
>>> boltons.urlutils.SCHEME_PORT_MAP['file']
None
See :attr:`URL.port` for more info on how it is used. See
:attr:`~boltons.urlutils.NO_NETLOC_SCHEMES` for more scheme info.
Also `available in JSON`_.
.. _IANA scheme registry: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
.. _port registry: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml
.. _available in JSON: https://gist.github.com/mahmoud/2fe281a8daaff26cfe9c15d2c5bf5c8b
.. attribute:: boltons.urlutils.NO_NETLOC_SCHEMES
This is a :class:`set` of schemes explicitly do not support network
resolution, such as "mailto" and "urn".