``urlutils`` - Structured URL
=============================

.. automodule:: boltons.urlutils

.. versionadded:: 17.2

The URL type
------------

.. autoclass:: boltons.urlutils.URL

   .. attribute:: URL.scheme

      The scheme is an ASCII string, normally lowercase, which
      specifies the semantics for the rest of the URL, as well as
      the network protocol in many cases. For example, "http" in
      "http://hatnote.com".

   .. attribute:: URL.username

      The username is a string used by some schemes for
      authentication. For example, "public" in
      "ftp://public@example.com".

   .. attribute:: URL.password

      The password is a string also used for
      authentication. Although technically deprecated by `RFC 3986
      Section 7.5`_, passwords are still used in cases when the URL is
      private or the password is public. For example, "password" in
      "db://private:password@127.0.0.1".

      .. _RFC 3986 Section 7.5: https://tools.ietf.org/html/rfc3986#section-7.5

   .. attribute:: URL.host

      The host is a string used to resolve the network location of the
      resource, either empty, a domain, or IP address (v4 or
      v6). "example.com", "127.0.0.1", and "::1" are all good examples
      of host strings.

      Per spec, fully-encoded output from :meth:`~URL.to_text` is
      `IDNA encoded`_ for compatibility with DNS.

      .. _IDNA encoded: https://en.wikipedia.org/wiki/Internationalized_domain_name#Example_of_IDNA_encoding

   .. attribute:: URL.port

      The port is an integer used, along with :attr:`host`, in
      connecting to network locations. ``8080`` is the port in
      "http://localhost:8080/index.html".

      .. note::

         As is the case for 80 for HTTP and 22 for SSH, many schemes
         have default ports, and `Section 3.2.3 of RFC 3986`_ states
         that when a URL's port is the same as its scheme's default
         port, the port should not be emitted::

           >>> URL(u'https://github.com:443/mahmoud/boltons').to_text()
           u'https://github.com/mahmoud/boltons'

         Custom schemes can register their port with
         :func:`~boltons.urlutils.register_scheme`. See
         :attr:`URL.default_port` for more info.

   .. _Section 3.2.3 of RFC 3986: https://tools.ietf.org/html/rfc3986#section-3.2.3

   .. attribute:: URL.path

      The string starting with the first leading slash after the
      authority part of the URL, ending before the first question
      mark. Often percent-quoted for network use. "/a/b/c" is the path
      of "http://example.com/a/b/c?d=e".

   .. attribute:: URL.path_parts

      The :class:`tuple` form of :attr:`~URL.path`, split on
      slashes. Empty slash segments are preserved, including that of
      the leading slash::

        >>> url = URL(u'http://example.com/a/b/c')
        >>> url.path_parts
        (u'', u'a', u'b', u'c')

   .. attribute:: URL.query_params

      An instance of :class:`~boltons.urlutils.QueryParamDict`, an
      :class:`~boltons.dictutils.OrderedMultiDict` subtype, mapping
      textual keys and values which follow the first question mark
      after the :attr:`path`. Also available as the handy alias
      ``qp``::

        >>> url = URL('http://boltons.readthedocs.io/en/latest/?utm_source=docs&sphinx=ok')
        >>> url.qp.keys()
        [u'utm_source', u'sphinx']

      Also percent-encoded for network use cases.

   .. attribute:: URL.fragment

      The string following the first '#' after the
      :attr:`query_params` until the end of the URL. It has no
      inherent internal structure, and is percent-quoted.

   .. automethod:: URL.from_parts
   .. automethod:: URL.to_text

   .. autoattribute:: URL.default_port
   .. autoattribute:: URL.uses_netloc

   .. automethod:: URL.get_authority

   .. automethod:: URL.normalize
   .. automethod:: URL.navigate

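As a rough illustration of how the attributes above fit together, the
following sketch parses a single URL and reads back each component.
The FTP URL is made up for this example, and the values shown in the
comments assume the parsing behavior described in this section.

.. code-block:: python

   from boltons.urlutils import URL

   # A made-up URL that exercises every component described above.
   url = URL(u'ftp://public:secret@example.com:2121/pub/file.txt?mode=ro#top')

   print(url.scheme)                # ftp
   print(url.username)              # public
   print(url.password)              # secret
   print(url.host)                  # example.com
   print(url.port)                  # 2121 (an int, not a string)
   print(url.path)                  # /pub/file.txt
   print(url.path_parts)            # ('', 'pub', 'file.txt') on Python 3
   print(url.query_params['mode'])  # ro
   print(url.fragment)              # top

Note that ``port`` comes back as an integer while every other
component is text, matching the attribute descriptions above.
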
Related functions
~~~~~~~~~~~~~~~~~

.. autofunction:: boltons.urlutils.find_all_links

.. autofunction:: boltons.urlutils.register_scheme

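The sketch below shows one way a custom scheme might be registered so
that its default port is dropped from serialized URLs, as described
for :attr:`URL.port`. The scheme name ``myproto`` and port ``9000``
are hypothetical, chosen only for illustration. Registration changes
module-level state, so it affects every later use of the scheme in the
same process.

.. code-block:: python

   from boltons.urlutils import URL, register_scheme

   # 'myproto' and 9000 are hypothetical values for this example.
   register_scheme('myproto', uses_netloc=True, default_port=9000)

   url = URL(u'myproto://example.com:9000/status')

   print(url.default_port)  # 9000
   # The port equals the registered default, so it is not emitted.
   print(url.to_text())     # myproto://example.com/status
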
Low-level functions
-------------------

A slew of functions used internally by :class:`~boltons.urlutils.URL`.

.. autofunction:: boltons.urlutils.parse_url
.. autofunction:: boltons.urlutils.parse_host
.. autofunction:: boltons.urlutils.parse_qsl
.. autofunction:: boltons.urlutils.resolve_path_parts

.. autoclass:: boltons.urlutils.QueryParamDict
   :members:

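For a feel of what these helpers return without going through
:class:`~boltons.urlutils.URL`, here is a minimal sketch. It assumes
that ``parse_url`` returns a plain dictionary of components and that
``parse_host`` returns a ``(family, host)`` pair. The function
reference above remains the authority on the exact return values.

.. code-block:: python

   import socket

   from boltons.urlutils import parse_url, parse_host

   # parse_url splits URL text into a dict of components.
   parts = parse_url(u'http://localhost:8080/index.html')
   print(parts['scheme'], parts['host'], parts['port'], parts['path'])

   # parse_host reports the address family for IP literals,
   # and None for anything else, such as a domain name.
   family, host = parse_host(u'127.0.0.1')
   print(family == socket.AF_INET, host)

   name_family, name_host = parse_host(u'example.com')
   print(name_family, name_host)
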
Quoting
~~~~~~~

URLs have many parts, and almost as many individual "quoting"
(encoding) strategies.

.. autofunction:: boltons.urlutils.quote_userinfo_part
.. autofunction:: boltons.urlutils.quote_path_part
.. autofunction:: boltons.urlutils.quote_query_part
.. autofunction:: boltons.urlutils.quote_fragment_part

There is, however, only one unquoting strategy:

.. autofunction:: boltons.urlutils.unquote

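To make the difference between strategies concrete, the following
sketch quotes the same text as a path part and as a fragment, then
reverses it with the single unquoting function. The sample string is
arbitrary, and the default arguments of each function are assumed.

.. code-block:: python

   from boltons.urlutils import (quote_path_part, quote_fragment_part,
                                 unquote)

   text = u'a/b c'

   # A slash is a delimiter inside a path part, so it is escaped.
   print(quote_path_part(text))      # a%2Fb%20c

   # A slash is allowed in a fragment, so only the space is escaped.
   print(quote_fragment_part(text))  # a/b%20c

   # Unquoting is the same regardless of which part was quoted.
   print(unquote(u'a%2Fb%20c'))      # a/b c
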
Useful constants
----------------

.. attribute:: boltons.urlutils.SCHEME_PORT_MAP

   A mapping of URL schemes to their protocols' default
   ports. Painstakingly assembled from the `IANA scheme registry`_,
   `port registry`_, and independent research.

   Keys are lowercase strings, values are integers or None, with None
   indicating that the scheme does not have a default port (or may not
   support ports at all)::

     >>> boltons.urlutils.SCHEME_PORT_MAP['http']
     80
     >>> print(boltons.urlutils.SCHEME_PORT_MAP['file'])
     None

   See :attr:`URL.port` for more info on how it is used. See
   :attr:`~boltons.urlutils.NO_NETLOC_SCHEMES` for more scheme info.

   Also `available in JSON`_.

.. _IANA scheme registry: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
.. _port registry: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml
.. _available in JSON: https://gist.github.com/mahmoud/2fe281a8daaff26cfe9c15d2c5bf5c8b

.. attribute:: boltons.urlutils.NO_NETLOC_SCHEMES

   This is a :class:`set` of schemes that explicitly do not support
   network resolution, such as "mailto" and "urn".

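As a small sketch of how this set relates to
:attr:`URL.uses_netloc`, the example below checks membership and
compares a ``mailto`` URL with an ``http`` one. The addresses are
placeholders.

.. code-block:: python

   from boltons.urlutils import URL, NO_NETLOC_SCHEMES

   print('mailto' in NO_NETLOC_SCHEMES)  # True

   # Placeholder addresses, used only to show the two cases.
   mail_url = URL(u'mailto:user@example.com')
   web_url = URL(u'http://example.com')

   print(mail_url.uses_netloc)  # False
   print(web_url.uses_netloc)   # True
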