# HttpParser

The `HttpParser` class is at the heart of everything related to HTTP. It is used by the web server and proxy server cores and by their plugin ecosystem. As the name suggests, it can parse both HTTP request and response packets. It can also parse HTTP lookalike protocols such as ICAP and SIP. Most importantly, remember that `HttpParser` was originally written to handle HTTP packets arriving at a proxy server, and to this day its default behavior favors that use case.

Let's start by parsing an HTTP web request using `HttpParser`.

In [2]:
from proxy.http.methods import httpMethods
from proxy.http.parser import HttpParser, httpParserTypes, httpParserStates
from proxy.common.constants import HTTP_1_1

get_request = HttpParser(httpParserTypes.REQUEST_PARSER)
get_request.parse(memoryview(b'GET / HTTP/1.1\r\nHost: jaxl.com\r\n\r\n'))

print(get_request.build())

assert get_request.is_complete
assert get_request.method == httpMethods.GET
assert get_request.version == HTTP_1_1
assert get_request.host is None
assert get_request.port == 80
assert get_request._url is not None
assert get_request._url.remainder == b'/'
assert get_request.has_header(b'host')
assert get_request.header(b'host') == b'jaxl.com'
assert len(get_request.headers) == 1

b'GET / HTTP/1.1\r\nHost: jaxl.com\r\n\r\n'


Next, let's parse an HTTP proxy request using `HttpParser`.

In [3]:
proxy_request = HttpParser(httpParserTypes.REQUEST_PARSER)
proxy_request.parse(memoryview(b'GET http://jaxl.com/ HTTP/1.1\r\nHost: jaxl.com\r\n\r\n'))

print(proxy_request.build())
print(proxy_request.build(for_proxy=True))

assert proxy_request.is_complete
assert proxy_request.method == httpMethods.GET
assert proxy_request.version == HTTP_1_1
assert proxy_request.host == b'jaxl.com'
assert proxy_request.port == 80
assert proxy_request._url is not None
assert proxy_request._url.remainder == b'/'
assert proxy_request.has_header(b'host')
assert proxy_request.header(b'host') == b'jaxl.com'
assert len(proxy_request.headers) == 1

b'GET / HTTP/1.1\r\nHost: jaxl.com\r\n\r\n'
b'GET http://jaxl.com:80/ HTTP/1.1\r\nHost: jaxl.com\r\n\r\n'


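Notice how `proxy_request.build()` and `proxy_request.build(for_proxy=True)` differ: the former emits a request line containing only the path (`GET / HTTP/1.1`), while the latter emits the absolute URL with an explicit port (`GET http://jaxl.com:80/ HTTP/1.1`). Also notice that the `proxy_request.host` field is populated for an HTTP proxy packet but not for the earlier HTTP web request packet.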
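
To make the difference between the two printed request lines concrete, here is a minimal standalone sketch that reproduces them for the plain HTTP case. It is not how `HttpParser.build()` is implemented; `build_request_line` and its parameters are hypothetical names used only for illustration, and the `CONNECT` case is deliberately left out.

```python
def build_request_line(method: bytes, host: bytes, port: int, path: bytes,
                       for_proxy: bool = False) -> bytes:
    """Hypothetical helper: format a request line for an origin server or a proxy."""
    if for_proxy:
        # A proxy needs the full URL, so include scheme, host and explicit port.
        target = b'http://' + host + b':' + str(port).encode('ascii') + path
    else:
        # An origin server only needs the path portion of the URL.
        target = path
    return method + b' ' + target + b' HTTP/1.1'


print(build_request_line(b'GET', b'jaxl.com', 80, b'/'))
print(build_request_line(b'GET', b'jaxl.com', 80, b'/', for_proxy=True))
```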

To conclude, let's parse an HTTPS proxy request.

In [4]:
connect_request = HttpParser(httpParserTypes.REQUEST_PARSER)
connect_request.parse(memoryview(b'CONNECT jaxl.com:443 HTTP/1.1\r\nHost: jaxl.com:443\r\n\r\n'))

print(connect_request.build())
print(connect_request.build(for_proxy=True))

assert connect_request.is_complete
assert connect_request.is_https_tunnel
assert connect_request.version == HTTP_1_1
assert connect_request.host == b'jaxl.com'
assert connect_request.port == 443
assert connect_request._url is not None
assert connect_request._url.remainder is None
assert connect_request.has_header(b'host')
assert connect_request.header(b'host') == b'jaxl.com:443'
assert len(connect_request.headers) == 1

b'CONNECT / HTTP/1.1\r\nHost: jaxl.com:443\r\n\r\n'
b'CONNECT jaxl.com:443 HTTP/1.1\r\nHost: jaxl.com:443\r\n\r\n'


### Take Away

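- The `host` and `port` attributes represent the `host:port` present in the request line of the HTTP packet. For comparison, all three request lines are repeated below. Only the web server GET request has no host in its request line, which is why `get_request.host` is `None`. When the request line carries no explicit port, `port` falls back to 80, as the asserts above show.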
 | Request Type | Request Line |
 | ------------------| ---------------- |
 | `get_request` | `GET / HTTP/1.1` |
 | `proxy_request` | `GET http://jaxl.com/ HTTP/1.1` |
 | `connect_request` | `CONNECT jaxl.com:443 HTTP/1.1` |
- The `_url` attribute is an instance of the `Url` parser and holds the parsed information about the URL found in the request line.
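
The relationship between the request line and the `host` / `port` values can be sketched with the standard library alone. The function below is an illustrative approximation, not the library's `Url` parser: `target_host_port` is a hypothetical name, and the fallback to port 80 is an assumption chosen to mirror the asserts shown earlier.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def target_host_port(request_line: bytes) -> Tuple[Optional[bytes], int]:
    """Hypothetical helper: extract (host, port) from an HTTP request line."""
    method, target, _version = request_line.split(b' ', 2)
    if method == b'CONNECT':
        # CONNECT targets look like host:port
        host, _, port = target.rpartition(b':')
        return host, int(port)
    if target.startswith(b'/') or target == b'*':
        # Web server style request: the request line carries no host at all.
        return None, 80
    # Proxy style request: the target is a full URL.
    parts = urlsplit(target)
    default_port = 443 if parts.scheme == b'https' else 80
    return parts.hostname, parts.port or default_port


for line in (b'GET / HTTP/1.1',
             b'GET http://jaxl.com/ HTTP/1.1',
             b'CONNECT jaxl.com:443 HTTP/1.1'):
    print(line, target_host_port(line))
```

The three branches correspond to the three rows of the table above.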

A few other handy properties of `HttpParser` are:

- `is_complete`
- `is_http_1_1_keep_alive`
- `is_connection_upgrade`
- `is_https_tunnel`
- `is_chunked_encoded`
- `content_expected`
- `body_expected`
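
As a rough intuition for what flags like these usually summarize, the sketch below derives similar booleans from a version string and a header mapping using general HTTP/1.1 conventions. This is an assumption based on the property names, not the library's actual logic; `describe_message` and the dictionary keys it returns are hypothetical.

```python
from typing import Dict


def describe_message(version: bytes, headers: Dict[bytes, bytes]) -> Dict[str, bool]:
    """Hypothetical helper: derive common HTTP flags from version and headers."""
    # Header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    connection = h.get(b'connection', b'')
    # Chunked must be the final transfer coding when it is present.
    chunked = h.get(b'transfer-encoding', b'').endswith(b'chunked')
    has_content = int(h.get(b'content-length', b'0')) > 0
    return {
        # HTTP/1.1 connections persist unless "Connection: close" is sent.
        'keep_alive': version == b'HTTP/1.1' and connection != b'close',
        'connection_upgrade': connection == b'upgrade' and b'upgrade' in h,
        'chunked': chunked,
        'content_expected': has_content,
        'body_expected': chunked or has_content,
    }


flags = describe_message(b'HTTP/1.1', {b'Host': b'jaxl.com', b'Content-Length': b'11'})
print(flags)
```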