apps.deed.models.py
- class apps.deed.models.DeedPage(*args, **kwargs)
Each DeedPage object represents a single page of a property record that has been uploaded, processed, and ingested into the Django portion of the Deed Machine. Multi-page documents like multi-page TIF files will be split into individual DeedPage objects during initial processing.
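The sketch below shows what "one DeedPage per page" can look like for a multi-page source file. It only covers the per-page values discussed in this reference: split_page_num and a public_uuid generated at initial processing. It is not the Deed Machine's implementation. The `SplitPage` class, the `split_into_pages` helper, the `_SPLITPAGE_<n>` key suffix, and the use of a hex `uuid4` are all hypothetical. The frame count would come from the image library in practice, for example Pillow's `Image.n_frames` for a multi-page TIF.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SplitPage:
    """Hypothetical stand-in for the per-page values a new DeedPage receives."""
    s3_lookup: str
    split_page_num: Optional[int]
    # Assumption: a random hex UUID, generated once per page at split time.
    public_uuid: str = field(default_factory=lambda: uuid.uuid4().hex)


def split_into_pages(source_key: str, frame_count: int) -> List[SplitPage]:
    """Return one record per frame of a source image file.

    Single-page files keep split_page_num=None; frames of a multi-page file
    get 1-based split page numbers and a derived s3_lookup (the
    "_SPLITPAGE_<n>" suffix is a hypothetical naming scheme).
    """
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    if frame_count == 1:
        return [SplitPage(s3_lookup=source_key, split_page_num=None)]
    return [
        SplitPage(s3_lookup=f"{source_key}_SPLITPAGE_{n}", split_page_num=n)
        for n in range(1, frame_count + 1)
    ]
```

Keeping split_page_num empty for single-page files mirrors the description of split_page_num as numbering that exists only when a multi-page file has been split.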
- Parameters:
id (BigAutoField) – Primary key: ID
s3_lookup (CharField) –
S3 lookup
Unique identifier of this page’s image on S3
doc_num (CharField) –
Doc num
County document number. Not unique, as some documents have several or many pages. Some county documents don’t have document numbers and are instead identified by Book and Page. In those cases, a doc_num is constructed from the book and page combination. Doc num can come from S3 metadata extracted via regex, or it can be manually created and linked using a supplemental information file at the time of Django import after initial processing.
doc_alt_id (CharField) –
Doc alt id
An optional alternate document ID.
batch_id (CharField) –
Batch id
An optional designator of batch, often referring to the parent folder an image was supplied in.
book_id (CharField) –
Book id
Book ID. Not present for all records or all counties.
page_num (IntegerField) –
Page num
Page number of the original record. Not present for all records or all counties. Note the difference from split_page_num, which is an automatically generated page number used to differentiate pages programmatically split by the Deed Machine, usually from a multi-page TIF file.
split_page_num (IntegerField) –
Split page num
Alternate page numbering assigned when a multi-page image file (generally a TIF) has been split by the ingestion process. Note the difference from page_num, which is a page number supplied with the original record in the county’s numbering system.
doc_date (DateField) –
Doc date
Date of document.
doc_type (CharField) –
Doc type
Document type, e.g. deed, Torrens certificate, mortgage, etc.
public_uuid (CharField) –
Public uuid
The randomly generated UUID created for the web version of this image in order to deter systematic scraping of publicly visible data. Generated by the initial processing stage.
page_image_web (ImageField) –
Page image web
ImageField that stores a link to the web-friendly, watermarked JPEG used for transcription.
page_stats (FileField) –
Page stats
FileField that stores a link to the metadata JSON file generated about the image during the OCR process, including how much of the page is determined to be handwritten.
page_ocr_text (FileField) –
Page ocr text
FileField that stores a link to a .txt file containing the concatenated full text of the page extracted by OCR.
page_ocr_json (FileField) –
Page ocr json
FileField that stores a link to the complete Textract JSON OCR result for this page, including much more metadata and contextual information than page_ocr_text.
bool_match (BooleanField) –
Bool match
Is there a suspected covenant?
bool_exception (BooleanField) –
Bool exception
Was a disqualifying term found that exempts this page from transcription (e.g., a death certificate or military discharge)?
bool_manual (BooleanField) –
Bool manual
Has bool_match or bool_exception been manually overwritten?
doc_page_count (IntegerField) –
Doc page count
How many pages does the Deed Machine think make up this document? Aids in Zooniverse setup.
prev_page_image_web (ImageField) –
Prev page image web
ImageField link to web-friendly image of previous page (by Deed Machine’s calculation). Used to show previous page in Zooniverse if this is a potential covenant needing transcription.
next_page_image_web (ImageField) –
Next page image web
ImageField link to web-friendly image of following page (by Deed Machine’s calculation). Used to show following page in Zooniverse if this is a potential covenant needing transcription.
next_next_page_image_web (ImageField) –
Next next page image web
ImageField link to the web-friendly image of the page after the following page (i.e., this page + 2, by the Deed Machine’s calculation). Used to show that page in Zooniverse if this is a potential covenant needing transcription.
page_image_web_highlighted (ImageField) –
Page image web highlighted
ImageField that stores a link to the web-friendly, watermarked, and highlighted JPEG used for transcription.
prev_page_image_lookup (CharField) –
Prev page image lookup
Same idea as prev_page_image_web, except this is the s3_lookup of the previous page rather than a link to the image.
next_page_image_lookup (CharField) –
Next page image lookup
Same idea as next_page_image_web, except this is the s3_lookup of the next page rather than a link to the image.
next_next_page_image_lookup (CharField) –
Next next page image lookup
Same idea as next_next_page_image_web, except this is the s3_lookup of the page after next (this page + 2) rather than a link to the image.
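The doc_num description names three possible sources: a value from a supplemental information file, a regex applied to S3 metadata, and a Book and Page combination. The following sketch shows one way to combine them. The key layout, the regex, the `<book>-<page>` format, and the precedence order are all assumptions for illustration; each county's naming scheme differs, and the helper names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical key layout "<county>/<batch>/<doc_num>_<page>.tif"; each county's
# naming scheme differs, so this pattern is an illustrative assumption.
DOC_NUM_PATTERN = re.compile(r"/(?P<doc_num>\d+)_(?P<page>\d+)\.tif$", re.IGNORECASE)


def doc_num_from_key(s3_key: str) -> Optional[str]:
    """Extract a document number from an S3 key, or None if the key doesn't match."""
    match = DOC_NUM_PATTERN.search(s3_key)
    return match.group("doc_num") if match else None


def doc_num_from_book_page(book_id: str, page_num: int) -> str:
    """Build a synthetic doc_num for records identified only by Book and Page."""
    # Hypothetical "<book>-<page>" format.
    return f"{book_id}-{page_num}"


def resolve_doc_num(
    s3_key: str,
    supplemental: Optional[str] = None,
    book_id: Optional[str] = None,
    page_num: Optional[int] = None,
) -> Optional[str]:
    """Assumed precedence: supplemental file value, then regex, then Book and Page."""
    if supplemental:
        return supplemental
    extracted = doc_num_from_key(s3_key)
    if extracted:
        return extracted
    if book_id is not None and page_num is not None:
        return doc_num_from_book_page(book_id, page_num)
    return None
```

Because doc_num is not unique, every page of a multi-page document would resolve to the same value here; page order is carried separately by page_num or split_page_num.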
Relationship fields:
- Parameters:
workflow (ForeignKey to ZooniverseWorkflow) –
Workflow (related name: deedpage)
Foreign key to associated workflow.
zooniverse_subject (ForeignKey to ZooniverseSubject) –
Zooniverse subject (related name: subject_legacy)
After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit.
zooniverse_subject_1st_page (ForeignKey to ZooniverseSubject) –
Zooniverse subject 1st page (related name: subject_1st_page)
After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among several Zooniverse Subjects’ prev/next images.
zooniverse_subject_2nd_page (ForeignKey to ZooniverseSubject) –
Zooniverse subject 2nd page (related name: subject_2nd_page)
After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among several Zooniverse Subjects’ prev/next images.
zooniverse_subject_3rd_page (ForeignKey to ZooniverseSubject) –
Zooniverse subject 3rd page (related name: subject_3rd_page)
After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among several Zooniverse Subjects’ prev/next images.
matched_terms (ManyToManyField to MatchTerm) –
Matched terms (related name: deedpage)
M2M field storing which potential racial covenant (or exception) terms were matched, if any.
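The three positional subject foreign keys rely on the stated assumption that a page occupies a given position in at most one Zooniverse Subject. The sketch below shows what that assumption means when linking subjects back to pages: a page may be the 1st page of one subject and the 2nd page of another, but a second subject claiming the same position is rejected. It uses plain dictionaries instead of the ORM, and `PageSubjectLinks`, `link_subject_pages`, and the integer subject IDs are hypothetical. The real ingestion may not perform this check at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Position names mirror the three DeedPage foreign keys documented above.
POSITIONS = (
    "zooniverse_subject_1st_page",
    "zooniverse_subject_2nd_page",
    "zooniverse_subject_3rd_page",
)


@dataclass
class PageSubjectLinks:
    """Hypothetical in-memory stand-in for one DeedPage's subject foreign keys."""
    links: Dict[str, Optional[int]] = field(
        default_factory=lambda: {position: None for position in POSITIONS}
    )


def link_subject_pages(
    subject_id: int,
    page_lookups: List[str],
    pages: Dict[str, PageSubjectLinks],
) -> None:
    """Record subject_id in the position each page occupies in the subject."""
    if len(page_lookups) > len(POSITIONS):
        raise ValueError("a subject has at most three positioned pages")
    for position, lookup in zip(POSITIONS, page_lookups):
        page = pages.setdefault(lookup, PageSubjectLinks())
        current = page.links[position]
        # A single FK per position only works if no other subject already claims it.
        if current is not None and current != subject_id:
            raise ValueError(f"{lookup} already has {position}={current}")
        page.links[position] = subject_id
```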
- exception DoesNotExist
- exception MultipleObjectsReturned
- batch_id
Type:
CharField – Batch id
An optional designator of batch, often referring to the parent folder an image was supplied in.
- bool_exception
Type:
BooleanField – Bool exception
Was a disqualifying term found that exempts this page from transcription (e.g., a death certificate or military discharge)?
- bool_manual
Type:
BooleanField – Bool manual
Has bool_match or bool_exception been manually overwritten?
- bool_match
Type:
BooleanField – Bool match
Is there a suspected covenant?
- doc_num
Type:
CharField – Doc num
County document number. Not unique, as some documents have several or many pages. Some county documents don’t have document numbers and are instead identified by Book and Page. In those cases, a doc_num is constructed from the book and page combination. Doc num can come from S3 metadata extracted via regex, or it can be manually created and linked using a supplemental information file at the time of Django import after initial processing.
- doc_page_count
Type:
IntegerField – Doc page count
How many pages does the Deed Machine think make up this document? Aids in Zooniverse setup.
- matched_terms
Type:
ManyToManyField to MatchTerm – Matched terms (related name: deedpage)
M2M field storing which potential racial covenant (or exception) terms were matched, if any.
- next_next_page_image_lookup
Type:
CharField – Next next page image lookup
Same idea as next_next_page_image_web, except this is the s3_lookup of the page after next (this page + 2) rather than a link to the image.
- next_next_page_image_web
Type:
ImageField – Next next page image web
ImageField link to the web-friendly image of the page after the following page (i.e., this page + 2, by the Deed Machine’s calculation). Used to show that page in Zooniverse if this is a potential covenant needing transcription.
- next_page_image_lookup
Type:
CharField – Next page image lookup
Same idea as next_page_image_web, except this is the s3_lookup of the next page rather than a link to the image.
- next_page_image_web
Type:
ImageField – Next page image web
ImageField link to web-friendly image of following page (by Deed Machine’s calculation). Used to show following page in Zooniverse if this is a potential covenant needing transcription.
- page_image_web
Type:
ImageField – Page image web
ImageField that stores a link to the web-friendly, watermarked JPEG used for transcription.
- page_image_web_highlighted
Type:
ImageField – Page image web highlighted
ImageField that stores a link to the web-friendly, watermarked, and highlighted JPEG used for transcription.
- page_num
Type:
IntegerField – Page num
Page number of the original record. Not present for all records or all counties. Note the difference from split_page_num, which is an automatically generated page number used to differentiate pages programmatically split by the Deed Machine, usually from a multi-page TIF file.
- page_ocr_json
Type:
FileField – Page ocr json
FileField that stores a link to the complete Textract JSON OCR result for this page, including much more metadata and contextual information than page_ocr_text.
- page_ocr_text
Type:
FileField – Page ocr text
FileField that stores a link to a .txt file containing the concatenated full text of the page extracted by OCR.
- page_stats
Type:
FileField – Page stats
FileField that stores a link to the metadata JSON file generated about the image during the OCR process, including how much of the page is determined to be handwritten.
- prev_page_image_lookup
Type:
CharField – Prev page image lookup
Same idea as prev_page_image_web, except this is the s3_lookup of the previous page rather than a link to the image.
- prev_page_image_web
Type:
ImageField – Prev page image web
ImageField link to web-friendly image of previous page (by Deed Machine’s calculation). Used to show previous page in Zooniverse if this is a potential covenant needing transcription.
- public_uuid
Type:
CharField – Public uuid
The randomly generated UUID created for the web version of this image in order to deter systematic scraping of publicly visible data. Generated by the initial processing stage.
- split_page_num
Type:
IntegerField – Split page num
Alternate page numbering assigned when a multi-page image file (generally a TIF) has been split by the ingestion process. Note the difference from page_num, which is a page number supplied with the original record in the county’s numbering system.
- property thumbnail_preview
Used to display thumbnail of DeedPage in admin view.
- workflow
Type:
ForeignKey to ZooniverseWorkflow – Workflow (related name: deedpage)
Foreign key to associated workflow.
- zooniverse_subject
Type:
ForeignKey to ZooniverseSubject – Zooniverse subject (related name: subject_legacy)
After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit.
- zooniverse_subject_1st_page
Type:
ForeignKey to ZooniverseSubject – Zooniverse subject 1st page (related name: subject_1st_page)
After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among several Zooniverse Subjects’ prev/next images.
- zooniverse_subject_2nd_page
Type:
ForeignKey to ZooniverseSubject – Zooniverse subject 2nd page (related name: subject_2nd_page)
After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among several Zooniverse Subjects’ prev/next images.
- zooniverse_subject_3rd_page
Type:
ForeignKey to ZooniverseSubject – Zooniverse subject 3rd page (related name: subject_3rd_page)
After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among several Zooniverse Subjects’ prev/next images.
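The prev/next/next-next lookups and doc_page_count all derive from the order of pages within one document "by the Deed Machine’s calculation". The sketch below computes those values for a single document, assuming the pages have already been grouped by doc_num and that split_page_num or page_num gives their order. How the Deed Machine actually groups and orders pages is not described here, so `NeighborFields` and `neighbor_fields` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NeighborFields:
    """Hypothetical container for the neighbor-related DeedPage values."""
    s3_lookup: str
    prev_page_image_lookup: Optional[str]
    next_page_image_lookup: Optional[str]
    next_next_page_image_lookup: Optional[str]
    doc_page_count: int


def neighbor_fields(doc_pages: List[Tuple[str, int]]) -> List[NeighborFields]:
    """Compute neighbor lookups for the pages of ONE document.

    doc_pages holds (s3_lookup, page_order) pairs; page_order is assumed to be
    split_page_num or page_num, whichever orders the document.
    """
    ordered = [lookup for lookup, _ in sorted(doc_pages, key=lambda p: p[1])]
    count = len(ordered)

    def at(i: int) -> Optional[str]:
        # None outside the document, so edge pages have no neighbor on that side.
        return ordered[i] if 0 <= i < count else None

    return [
        NeighborFields(
            s3_lookup=lookup,
            prev_page_image_lookup=at(i - 1),
            next_page_image_lookup=at(i + 1),
            next_next_page_image_lookup=at(i + 2),
            doc_page_count=count,
        )
        for i, lookup in enumerate(ordered)
    ]
```

Storing lookups next to the image fields lets later steps find the neighboring DeedPage records by s3_lookup without re-deriving the document order.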
- class apps.deed.models.MatchTerm(*args, **kwargs)
A term found in a DeedPage during covenant detection in the initial processing stage. Could be a term indicating a racial covenant OR an exception, like an indicator of a birth certificate or military discharge.
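The sketch below illustrates how matched terms could feed the DeedPage flags: covenant terms set bool_match, exception terms set bool_exception, and bool_manual protects an editor’s override. It assumes simple case-insensitive substring matching on whitespace-normalized OCR text. The term lists, the dictionary stand-in for a DeedPage, and the helper names (`match_terms`, `apply_result`, `needs_transcription`) are hypothetical and do not reflect the Deed Machine’s actual term list or matching method.

```python
from typing import Dict, Iterable, NamedTuple, Set


class MatchResult(NamedTuple):
    matched_terms: Set[str]
    bool_match: bool
    bool_exception: bool


def match_terms(
    ocr_text: str, covenant_terms: Iterable[str], exception_terms: Iterable[str]
) -> MatchResult:
    """Case-insensitive substring search over whitespace-normalized OCR text."""
    # Collapse line breaks and runs of spaces so multi-word terms can match across OCR lines.
    text = " ".join(ocr_text.lower().split())
    hits = {term for term in covenant_terms if term.lower() in text}
    exceptions = {term for term in exception_terms if term.lower() in text}
    return MatchResult(hits | exceptions, bool(hits), bool(exceptions))


def apply_result(page: Dict, result: MatchResult) -> Dict:
    """Store matched terms; only overwrite the flags if bool_manual is not set."""
    page["matched_terms"] = sorted(result.matched_terms)
    if not page.get("bool_manual"):
        page["bool_match"] = result.bool_match
        page["bool_exception"] = result.bool_exception
    return page


def needs_transcription(page: Dict) -> bool:
    """A suspected covenant that no exception term shields."""
    return bool(page.get("bool_match")) and not page.get("bool_exception")
```

For example, with a hypothetical covenant term "caucasian race" and exception term "birth certificate", a page whose OCR text splits "Caucasian" and "race" across lines still matches, while a birth certificate page is flagged as an exception.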
- Parameters:
id (BigAutoField) – Primary key: ID
term (CharField) – Term
Reverse relationships:
- Parameters:
deedpage (Reverse ManyToManyField from DeedPage) – All deed pages of this match term (related name of matched_terms)
- exception DoesNotExist
- exception MultipleObjectsReturned
- class apps.deed.models.SearchHitReport(id, workflow, report_csv, num_hits, num_exceptions, created_at)
- Parameters:
id (BigAutoField) – Primary key: ID
report_csv (FileField) –
Report csv
FileField linking to a CSV stored on S3 showing the results of hit ingestion.
num_hits (IntegerField) –
Num hits
Number of DeedPage records with at least one hit
num_exceptions (IntegerField) –
Num exceptions
Number of DeedPage records with at least one exception term, which will shield it from transcription.
created_at (DateTimeField) – Created at
Relationship fields:
- Parameters:
workflow (ForeignKey to ZooniverseWorkflow) –
Workflow (related name: searchhitreport)
Foreign key to associated workflow.
- exception DoesNotExist
- exception MultipleObjectsReturned
- num_exceptions
Type:
IntegerField – Num exceptions
Number of DeedPage records with at least one exception term, which will shield it from transcription.
- num_hits
Type:
IntegerField – Num hits
Number of DeedPage records with at least one hit
- report_csv
Type:
FileField – Report csv
FileField linking to a CSV stored on S3 showing the results of hit ingestion.
- workflow
Type:
ForeignKey to ZooniverseWorkflow – Workflow (related name: searchhitreport)
Foreign key to associated workflow.
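A SearchHitReport summarizes one hit-ingestion run as a CSV plus two counts. The sketch below derives num_hits and num_exceptions from page flags and writes the CSV text in memory. It assumes a page counts as a hit when bool_match is set and as an exception when bool_exception is set. The CSV columns, the choice to list only flagged pages, and `build_hit_report` are hypothetical, not the real report format.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical column layout for illustration only.
REPORT_COLUMNS = ["s3_lookup", "matched_terms", "bool_match", "bool_exception"]


def build_hit_report(pages: List[Dict]) -> Tuple[str, int, int]:
    """Return (csv_text, num_hits, num_exceptions) for a batch of page dicts."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(REPORT_COLUMNS)
    num_hits = 0
    num_exceptions = 0
    for page in pages:
        if page["bool_match"]:
            num_hits += 1
        if page["bool_exception"]:
            num_exceptions += 1
        # Assumption: only pages with a hit or an exception appear in the CSV.
        if page["bool_match"] or page["bool_exception"]:
            writer.writerow([
                page["s3_lookup"],
                ";".join(page["matched_terms"]),
                page["bool_match"],
                page["bool_exception"],
            ])
    return buffer.getvalue(), num_hits, num_exceptions
```

In a Django setting, the CSV text could then be wrapped in `django.core.files.base.ContentFile` and saved to report_csv through the FileField’s `save()` method, with the configured storage backend handling the upload to S3.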