{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Welcome\n",
"\n",
"## Background\n",
"\n",
"`proxy.py` was released on 20th August, 2013 as a single file HTTP proxy server implementation with no external dependencies. See the [first commit](https://github.com/abhinavsingh/proxy.py/commit/75044a72d9c7b4b8910ba551006b801eafdf3c47) and [read introductory blog](https://abhinavsingh.com/proxy-py-a-lightweight-single-file-http-proxy-server-in-python/) to get an insight about why `proxy.py` was created.\n",
|
|
"\n",
|
|
"## Introduction\n",
|
|
"\n",
|
|
"Today, `proxy.py` has matured into a full blown networking library with focus on being lightweight, ability to deliver maximum performance while being extendible. Unlike other Python servers, `proxy.py` doesn't need a `WSGI` or `UWSI` frontend, which then usually has to be placed behind a reverse proxy e.g. `Nginx` or `Apache`. Of-course, `proxy.py` can be placed directly behind a load-balancer _(optionally capable of speaking HA proxy protocol)_.\n",
"\n",
"## Working with proxy.py\n",
"\n",
"To work with `proxy.py`, you must keep these critical concepts in mind:\n",
"\n",
"1. Avoid using synchronous IO operations within your code\n",
"\n",
"   `proxy.py` is asynchronous in nature; making a synchronous call in your plugin code may block the entire core event loop. For asynchronous operations, you must tie into the `proxy.py` event loop using the provided plugin APIs _(see the sketch after this list)_.\n",
"\n",
"2. Plugin instances are NOT global\n",
"\n",
"   Plugin instances are created for every request. Hence, your plugin code must be written to handle a single request; `proxy.py` internally takes care of concurrency for you.\n",
"\n",
"## The Concept Of Work\n",
"\n",
"The `proxy.py` core is written around a high-level concept of `work`.\n",
"\n",
"- A running instance can receive `work` from one or more `sources`\n",
"  - For example, when `proxy.py` starts, each accepted client connection is `work` coming from a TCP socket `source`\n",
"- Handlers can be written to process various types of `work`\n",
"  - For example, `HttpProtocolHandler` handles HTTP client connection `work`\n",
"- A client connection can come from a variety of `sources`:\n",
"  - TCP sockets\n",
"  - UDP sockets\n",
"  - Unix sockets\n",
"  - Raw sockets\n",
"\n",
"In fact, `work` can be any processing unit. It doesn't have to be a client connection. For example:\n",
"\n",
"- A file on disk can act as the `source` and each line in that file as the `work` definition\n",
"- Imagine tailing a file on disk as the `source` and processing each line as a separate `work` object\n",
"- If you want, each line in the file can also be a URL to be scraped or downloaded\n",
"- If you want, your `work` handlers can append new URLs _(discovered by scraping previous URL entries)_ back into the file, creating an infinite feedback loop with the `work` processing core _(sketched below)_.\n",
"\n",
"And just like that we have created a web scraper!!!\n",
"\n",
"To extend this generic concept, now imagine a distributed queue as the `source` of our `work`, where each published message in the queue is our `work` payload. Some examples of such `sources` include:\n",
"\n",
"- A `Redis` channel\n",
"- A Google Cloud Pub/Sub topic\n",
"- Kafka topics\n",
"\n",
"And just like that we have created a distributed `work` executor!!!"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}