OCR Subnet Tutorial
In this tutorial you will learn how to quickly convert your validated idea into a functional Bittensor subnet. This tutorial begins with a Python notebook that contains the already validated code for optical character recognition (OCR). We demonstrate how straightforward it is to start with such notebooks and produce a working subnet.
Motivation
Bittensor subnets are:
- Naturally suitable for continuous improvement of the subnet miners.
- High-throughput environments in which to accomplish such improvements.
This is the motivation for creating an OCR subnet for this tutorial. By using the OCR subnet, one can extract the text from an entire library of books in a matter of hours or days. Moreover, when we expose the subnet miners, during training, to examples of real-world use-cases, the OCR subnet can be fine-tuned to be maximally effective.
Takeaway lessons
When you complete this tutorial, you will know the following:
- How to convert your Python notebook containing the validated idea into a working Bittensor subnet.
- How to use the Bittensor Subnet Template to accomplish this goal.
- How to perform subnet validation and subnet mining.
- How to design your own subnet incentive mechanism.
Tutorial code
Python notebook
The Python notebook we use in this tutorial contains all three essential components of the OCR subnet.
OCR subnet repository
- We will use the OCR subnet repository as our starting point and then incorporate the notebook code to build the OCR subnet.
Tutorial method
For the rest of this tutorial we will proceed by demonstrating which blocks of Python notebook code are copied into specific sections of the OCR subnet repository.
Prerequisites
Required reading
If you are new to Bittensor, read the following sections before you proceed:
- Introduction that describes how subnets form the heartbeat of the Bittensor network.
- Bittensor Building Blocks that presents the basic building blocks you use to develop your subnet incentive mechanism.
- Anatomy of Incentive Mechanism that introduces the general concept of a subnet incentive mechanism.
OCR subnet summary
The OCR subnet in this tutorial works as follows. The numbered items below correspond to the numbers in the diagram:
1. The subnet validator sends a challenge simultaneously to multiple subnet miners. In this tutorial the challenge consists of an image file of a synthetic invoice document. The serialized image file is attached to a synapse object called OCRSynapse. This step constitutes the query from the subnet validator to the subnet miners.
2. The subnet miners respond after performing the challenge task. After receiving the synapse object containing the image data, each miner extracts the contents of the image: the text, the positional information of the text, the fonts used in the text, and the font sizes.
3. The subnet validator then scores each subnet miner based on the quality of the response and how quickly the miner completed the task. The subnet validator uses the original synthetic invoice document as the ground truth for this step.
4. Finally, the subnet validator sets the weights for the subnet miners by sending the weights to the blockchain.
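The scoring and weight-setting steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the reward function used in the OCR subnet repository: the function names, the 80/20 blend of quality and speed, the 10-second timeout, and the use of difflib for text similarity are all hypothetical choices made for this example. Nothing here talks to the blockchain; the normalized list only stands in for the weights a validator would submit.

```python
from difflib import SequenceMatcher
from typing import List


def text_similarity(predicted: str, truth: str) -> float:
    """Return a similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, predicted, truth).ratio()


def score_response(predicted: str, truth: str, elapsed: float,
                   timeout: float = 10.0, time_weight: float = 0.2) -> float:
    """Blend text quality with a speed bonus (the weights are assumptions)."""
    if elapsed >= timeout:
        return 0.0  # a late response earns nothing in this sketch
    quality = text_similarity(predicted, truth)
    speed = 1.0 - elapsed / timeout
    return (1.0 - time_weight) * quality + time_weight * speed


def normalize(scores: List[float]) -> List[float]:
    """Scale scores so they sum to 1, as a stand-in for on-chain weights."""
    total = sum(scores)
    if total == 0:
        return [0.0 for _ in scores]
    return [s / total for s in scores]


truth = "Invoice 1042 Total 99.50"  # ground truth from the synthetic invoice
responses = [
    ("Invoice 1042 Total 99.50", 2.0),   # exact and fast
    ("Invoice 1O42 Total 99.5O", 2.0),   # two OCR confusions (O for 0)
    ("Invoice 1042 Total 99.50", 12.0),  # exact but past the timeout
]
scores = [score_response(text, truth, t) for text, t in responses]
weights = normalize(scores)
```

In this sketch the exact, fast response receives the largest weight, the response with character confusions receives a smaller one, and the late response receives zero.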
Step 1: Generate challenge and query the miners
Step 1.1: Synthetic PDF as challenge
In this tutorial, the subnet validator will generate synthetic data, which is a PDF document containing an invoice. The subnet validator will use this synthetic PDF as the basis for assessing the subnet miner performance. Synthetic data is an appropriate choice as it provides an unlimited source of customizable validation data. It also enables the subnet validators to gradually increase the difficulty of the task so that the miners are required to continuously improve. This is in contrast to using a pre-existing dataset from the web, where subnet miners can look up the answers on the web.
The contents of the PDF document are the ground truth labels. The subnet validator uses them to score the miner responses. The synthetic PDF document is corrupted with different types of noise to mimic poorly scanned documents. The amount of noise can also be gradually increased to make the task more challenging.
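To make the idea of an adjustable noise level concrete, here is a minimal sketch that uses only the Python standard library. It is not the corrupt_image code from the notebook, which applies several kinds of corruption to a real page image. The function names and the representation of an image as a list of pixel rows are assumptions made for this example, and only one kind of noise (salt-and-pepper) is shown.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def add_salt_and_pepper(image: Image, noise: float, seed: int = 0) -> Image:
    """Return a copy with roughly a `noise` fraction of pixels set to 0 or 255."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # copy so the clean page is untouched
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < noise:
                row[x] = rng.choice((0, 255))
    return noisy


def count_changed(a: Image, b: Image) -> int:
    """Count the pixels that differ between two same-sized images."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


clean = [[128] * 40 for _ in range(40)]  # a flat mid-gray "page"
easy = add_salt_and_pepper(clean, noise=0.05)  # a mildly degraded scan
hard = add_salt_and_pepper(clean, noise=0.30)  # a heavily degraded scan
```

Raising the noise argument over time is one way a validator could make the challenge progressively harder while the ground truth stays the same.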
To generate this challenge, the subnet validator applies the following steps:
- Creates a synthetic invoice document using the Python Faker library.
- Converts this synthetic data into PDF using the ReportLab Python library.
- Finally, the validator creates the challenge by converting this PDF into a corrupted image, called noisy_image.
Code snapshot
See below for a snapshot view of the code.
# Generates a PDF invoice from the raw data passed in as "invoice_data" dictionary
# and saves the PDF with "filename"
def create_invoice(invoice_data, filename):
    ...

# Using Faker, generate sample data for the invoice
invoice_info = {
    "company_name": fake.company(),
    "company_address": fake.address(),
    "company_city_zip": f'{fake.city()}, {fake.zipcode()}',
    ...
}
...

# Pass the "invoice_info" containing the Faker-generated raw data
# to create_invoice() method and generate the synthetic invoice PDF
pdf_filename = "sample_invoice.pdf"
data = create_invoice(invoice_info, pdf_filename)
...

# Loads PDF and converts it into usable PIL image using Pillow library
# Used by the corrupt_image() method
def load_image(pdf_path, page=0, zoom_x=1.0, zoom_y=1.0):
    ...

# Accepts a PDF, uses load_image() method to convert to image
# and adds noise, blur, spots, rotates the page, curls corners, darkens edges so
# that the overall result is noisy. Saves back in PDF format.
# This is our corrupted synthetic PDF document.
def corrupt_image(input_pdf_path, output_pdf_path, border=50, noise=0.1, spot=(100,100), scale=0.95, theta=0.2, blur=0.5):
    ...
Colab Notebook source: The validated code for the above synthetic PDF generation logic is in the Validation flow cell.
All we have to do is copy the above Notebook code into the proper place in the OCR subnet repo.
...
├── ocr_subnet
│   ├── __init__.py
│   ...
│   └── validator
│       ├── __init__.py
│       ├── corrupt.py
│       ├── forward.py
│       ├── generate.py
│       ├── reward.py
│       └── utils.py
...
We copy the above Notebook code into the following code files. Click on the OCR repo file names to see the copied code:
| Python Notebook source | OCR repo destination |
|---|---|
| Methods: create_invoice, random_items, load_image, and lists items_list and invoice_info and all the import statements in cell 34. | ocr_subnet/validator/generate.py |
| Method: corrupt_image | ocr_subnet/validator/corrupt.py |
Step 1.2: Query miners
Next, the subnet validator sends this noisy_image to the miners, tasking them to perform OCR and content extraction.
Colab Notebook source: In the validated Colab Notebook code, this step is accomplished by directly passing the path of the noisy_image from the Validator cell to the miner.
Define OCRSynapse class
However, in a Bittensor subnet, any communication between a subnet validator and a subnet miner must use an object of the type Synapse. Hence, the subnet validator must attach the corrupted image to a Synapse object and send this object to the miners. Each miner will then update the passed synapse by attaching its response to this same object and send it back to the subnet validator.
Code snapshot
import typing

import bittensor as bt

# OCRSynapse class, using bt.Synapse as its base.
# This protocol enables communication between the miner and the validator.
# Attributes:
# - base64_image: A base64-encoded image to be processed by the miner.
# - response: List[dict] containing data extracted from the image.
class OCRSynapse(bt.Synapse):
    """
    A simple OCR synapse protocol representation which uses bt.Synapse as its base.
    This protocol enables communication between the miner and the validator.
    Attributes:
    - base64_image: A base64-encoded image to be processed by the miner.
    - response: List[dict] containing data extracted from the image.
    """
    # Required request input, filled by sending dendrite caller. It is a base64 encoded string.
    base64_image: str
    # Optional request output, filled by receiving axon.
    response: typing.Optional[typing.List[dict]] = None
The OCRSynapse object can only contain serializable objects. This is because both the subnet validators and the subnet miners must be able to deserialize after receiving the object.
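The following sketch illustrates why the image travels as a base64 string. It deliberately does not use bittensor: FakeOCRSynapse is a hypothetical plain dataclass that only mirrors the two fields of OCRSynapse, the JSON round trip is an assumed stand-in for whatever serialization happens on the wire, and the keys of the response dictionary are made up for this example.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class FakeOCRSynapse:
    """Hypothetical stand-in for OCRSynapse; NOT a bt.Synapse subclass."""
    base64_image: str
    response: Optional[List[dict]] = None


def encode_image(raw: bytes) -> str:
    """Turn raw image bytes into a text-safe base64 string."""
    return base64.b64encode(raw).decode("utf-8")


def decode_image(encoded: str) -> bytes:
    """Recover the raw image bytes from the base64 string."""
    return base64.b64decode(encoded)


# Validator side: raw bytes are not JSON-serializable, so encode them first.
image_bytes = b"\x89PNG\r\n\x1a\n placeholder image bytes"
synapse = FakeOCRSynapse(base64_image=encode_image(image_bytes))

# Simulate the trip over the wire: every field must survive serialization.
wire = json.dumps(asdict(synapse))
received = FakeOCRSynapse(**json.loads(wire))

# Miner side: recover the bytes, then attach a (made-up) extraction result.
recovered = decode_image(received.base64_image)
received.response = [{"text": "Invoice", "position": [10, 20, 90, 40]}]
```

The same constraint applies to the response: a list of plain dictionaries serializes cleanly, whereas an arbitrary Python object such as a PIL image would not.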
See the OCRSynapse class definition in ocr_subnet/protocol.py.
...
├── ocr_subnet
│   ├── __init__.py
│   ├── base
│   │   ├── __init__.py
│   │   ...
│   ├── protocol.py
...
Send OCRSynapse to miners
With the OCRSynapse class defined, we next use the dendrite network client of the subnet validator to send queries to the Axon server of the subnet miners.
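The pattern of one client querying many servers at once, with a timeout, can be illustrated without bittensor. The sketch below is not the dendrite API: fake_miner, query_one, query_all, the miner names and the delays are all hypothetical, and asyncio stands in for the real network layer. It only shows that the queries run concurrently and that a slow miner yields no response.

```python
import asyncio
from typing import List, Optional


async def fake_miner(name: str, delay: float, challenge: str) -> str:
    """Hypothetical miner: waits `delay` seconds, then returns a canned answer."""
    await asyncio.sleep(delay)
    return f"{name} processed {len(challenge)} chars"


async def query_one(name: str, delay: float, challenge: str,
                    timeout: float) -> Optional[str]:
    """Return the miner's answer, or None if it misses the timeout."""
    try:
        return await asyncio.wait_for(fake_miner(name, delay, challenge), timeout)
    except asyncio.TimeoutError:
        return None


async def query_all(challenge: str, timeout: float = 0.2) -> List[Optional[str]]:
    """Send the same challenge to every miner at once."""
    miners = [("miner-a", 0.01), ("miner-b", 0.05), ("miner-c", 1.0)]
    tasks = [query_one(n, d, challenge, timeout) for n, d in miners]
    # gather() runs all queries concurrently and keeps the input order.
    return await asyncio.gather(*tasks)


responses = asyncio.run(query_all("base64-image-data"))
```

Because the results come back in the same order as the miners were listed, the validator can match each response (or missing response) to the miner that produced it.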