215 lines
6.6 KiB
Plaintext
215 lines
6.6 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# HttpParser\n",
|
|
"\n",
|
|
"`HttpParser` class is at the heart of everything related to HTTP. It is used by `Web server` and `Proxy server` core and their plugin eco-system. As the name suggests, it is capable of parsing both HTTP request and response packets. It can also parse HTTP look-a-like protocols like ICAP, SIP etc. Most importantly, remember that `HttpParser` was originally written to handle HTTP packets arriving in the context of a proxy server and till date its default behavior favors the same flavor."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## HTTP Web Requests\n",
|
|
"\n",
|
|
"Let's parse a typical HTTP web request using `HttpParser`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"b'GET / HTTP/1.1\\r\\nHost: jaxl.com\\r\\n\\r\\n'\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from proxy.http.methods import httpMethods\n",
|
|
"from proxy.http.parser import HttpParser, httpParserTypes, httpParserStates\n",
|
|
"from proxy.common.constants import HTTP_1_1\n",
|
|
"\n",
|
|
"get_request = HttpParser(httpParserTypes.REQUEST_PARSER)\n",
|
|
"get_request.parse(memoryview(b'GET / HTTP/1.1\\r\\nHost: jaxl.com\\r\\n\\r\\n'))\n",
|
|
"\n",
|
|
"# Rebuild the raw request\n",
|
|
"print(get_request.build())\n",
|
|
"\n",
|
|
"assert get_request.is_complete\n",
|
|
"assert get_request.method == httpMethods.GET\n",
|
|
"assert get_request.version == HTTP_1_1\n",
|
|
"assert get_request.host == None\n",
|
|
"assert get_request.port == 80\n",
|
|
"assert get_request._url != None\n",
|
|
"assert get_request._url.remainder == b'/'\n",
|
|
"assert get_request.has_header(b'host')\n",
|
|
"assert get_request.header(b'host') == b'jaxl.com'\n",
|
|
"assert len(get_request.headers) == 1"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"> NOTE:\n",
|
|
">\n",
|
|
"> `get_request.host` is `None`\n",
|
|
"> \n",
|
|
"> We use `get_request.header(b'host') == b'jaxl.com'` to get the expected host value"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## HTTP Proxy Requests\n",
|
|
"\n",
|
|
"Next, let's parse a HTTP proxy request using `HttpParser`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"b'GET / HTTP/1.1\\r\\nHost: jaxl.com\\r\\n\\r\\n'\n",
|
|
"b'GET http://jaxl.com:80/ HTTP/1.1\\r\\nHost: jaxl.com\\r\\n\\r\\n'\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"proxy_request = HttpParser(httpParserTypes.REQUEST_PARSER)\n",
|
|
"proxy_request.parse(memoryview(b'GET http://jaxl.com/ HTTP/1.1\\r\\nHost: jaxl.com\\r\\n\\r\\n'))\n",
|
|
"\n",
|
|
"print(proxy_request.build())\n",
|
|
"print(proxy_request.build(for_proxy=True))\n",
|
|
"\n",
|
|
"assert proxy_request.is_complete\n",
|
|
"assert proxy_request.method == httpMethods.GET\n",
|
|
"assert proxy_request.version == HTTP_1_1\n",
|
|
"assert proxy_request.host == b'jaxl.com'\n",
|
|
"assert proxy_request.port == 80\n",
|
|
"assert proxy_request._url != None\n",
|
|
"assert proxy_request._url.remainder == b'/'\n",
|
|
"assert proxy_request.has_header(b'host')\n",
|
|
"assert proxy_request.header(b'host') == b'jaxl.com'\n",
|
|
"assert len(proxy_request.headers) == 1"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"> NOTE:\n",
|
|
">\n",
|
|
"> Notice how `proxy_request.build()` and `proxy_request.build(for_proxy=True)` behave\n",
|
|
">\n",
|
|
"> Also, here `proxy_request.host` field is populated\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## HTTPS Proxy Requests\n",
|
|
"\n",
|
|
"To conclude, let's parse a HTTPS proxy request"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"b'CONNECT / HTTP/1.1\\r\\nHost: jaxl.com:443\\r\\n\\r\\n'\n",
|
|
"b'CONNECT jaxl.com:443 HTTP/1.1\\r\\nHost: jaxl.com:443\\r\\n\\r\\n'\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"connect_request = HttpParser(httpParserTypes.REQUEST_PARSER)\n",
|
|
"connect_request.parse(memoryview(b'CONNECT jaxl.com:443 HTTP/1.1\\r\\nHost: jaxl.com:443\\r\\n\\r\\n'))\n",
|
|
"\n",
|
|
"print(connect_request.build())\n",
|
|
"print(connect_request.build(for_proxy=True))\n",
|
|
"\n",
|
|
"assert connect_request.is_complete\n",
|
|
"assert connect_request.is_https_tunnel\n",
|
|
"assert connect_request.version == HTTP_1_1\n",
|
|
"assert connect_request.host == b'jaxl.com'\n",
|
|
"assert connect_request.port == 443\n",
|
|
"assert connect_request._url != None\n",
|
|
"assert connect_request._url.remainder == None\n",
|
|
"assert connect_request.has_header(b'host')\n",
|
|
"assert connect_request.header(b'host') == b'jaxl.com:443'\n",
|
|
"assert len(connect_request.headers) == 1"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Take Away\n",
|
|
"\n",
|
|
"- `host` and `port` attributes represent the `host:port` present in the HTTP packet request line. For comparison purposes, below are all the three request lines again. Notice how there is no `host:port` available only for the web server get request\n",
|
|
" | Request Type | Request Line |\n",
|
|
" | ------------------| ---------------- |\n",
|
|
" | `get_request` | `GET / HTTP/1.1` |\n",
|
|
" | `proxy_request` | `GET http://jaxl.com HTTP/1.1` |\n",
|
|
" | `connect_request` | `CONNECT jaxl.com:443 HTTP/1.1` |\n",
|
|
"- `_url` attribute is an instance of `Url` parser and contains parsed information about the URL found in the request line\n",
|
|
"\n",
|
|
"Few of the other handy properties within `HttpParser` are:\n",
|
|
"\n",
|
|
"- `is_complete`\n",
|
|
"- `is_http_1_1_keep_alive`\n",
|
|
"- `is_connection_upgrade`\n",
|
|
"- `is_https_tunnel`\n",
|
|
"- `is_chunked_encoded`\n",
|
|
"- `content_expected`\n",
|
|
"- `body_expected`"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"interpreter": {
|
|
"hash": "da9d6927d62b2b95bde149eedfbd5367cb7f465aad65a736f49c99ee3db39df7"
|
|
},
|
|
"kernelspec": {
|
|
"display_name": "Python 3.10.0 64-bit ('venv310': venv)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.0"
|
|
},
|
|
"orig_nbformat": 4
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|