apps.deed.models.py

class apps.deed.models.DeedPage(*args, **kwargs)

Each DeedPage object represents a single page of a property record that has been uploaded, processed, and ingested into the Django portion of the Deed Machine. Multi-page documents like multi-page TIF files will be split into individual DeedPage objects during initial processing.

Parameters:
  • id (BigAutoField) – Primary key: ID

  • s3_lookup (CharField) –

    S3 lookup

    Unique identifier of this page’s image on S3

  • doc_num (CharField) –

    Doc num

    County document number. Not unique, because a document can span several or many pages. Some county documents don’t have document numbers and are instead identified by book and page. In those cases, a doc_num is constructed from the book and page combination. The doc_num can come from S3 metadata extracted via regex, or it can be created manually and linked through a supplemental information file at the time of Django import, after initial processing.

  • doc_alt_id (CharField) –

    Doc alt id

    An optional alternate document ID.

  • batch_id (CharField) –

    Batch id

    An optional designator of batch, often referring to the parent folder an image was supplied in.

  • book_id (CharField) –

    Book id

    Book ID. Not present for all records or all counties.

  • page_num (IntegerField) –

    Page num

    Page number of the original record. Not present for all records or all counties. Note the difference from split_page_num, which is an automatically generated page number used to differentiate pages that were programmatically split during the Deed Machine process, usually from a multi-page TIF file.

  • split_page_num (IntegerField) –

    Split page num

    Alternate page numbering assigned when a multi-page image file (generally a TIF) has been split by the ingestion process. Note the difference from page_num, which is a page number value supplied with the original record in the county’s numbering system.

  • doc_date (DateField) –

    Doc date

    Date of document.

  • doc_type (CharField) –

    Doc type

    Document type, e.g. deed, Torrens certificate, mortgage, etc.

  • public_uuid (CharField) –

    Public uuid

    The randomly generated UUID created for the web version of this image in order to deter systematic scraping of publicly visible data. Generated by the initial processing stage.

  • page_image_web (ImageField) –

    Page image web

    ImageField that stores a link to web-friendly, watermarked JPEG used for transcription

  • page_stats (FileField) –

    Page stats

    FileField that stores a link to a metadata JSON file generated about the image during the OCR process, including how much of the page is determined to be handwritten.

  • page_ocr_text (FileField) –

    Page ocr text

    FileField that stores a link to a .txt file containing the concatenated full text of the page extracted by OCR.

  • page_ocr_json (FileField) –

    Page ocr json

    FileField that stores a link to the complete Textract JSON OCR result for this page, including much more metadata and contextual information than page_ocr_text.

  • bool_match (BooleanField) –

    Bool match

    Is there a suspected covenant?

  • bool_exception (BooleanField) –

    Bool exception

    Was a disqualifying term found that exempts this page from transcription (e.g., a death certificate or military discharge)?

  • bool_manual (BooleanField) –

    Bool manual

    Has bool_match or bool_exception been manually overridden?

  • doc_page_count (IntegerField) –

    Doc page count

    How many pages does the Deed Machine think make up this document? Aids in Zooniverse setup.

  • prev_page_image_web (ImageField) –

    Prev page image web

    ImageField link to web-friendly image of previous page (by Deed Machine’s calculation). Used to show previous page in Zooniverse if this is a potential covenant needing transcription.

  • next_page_image_web (ImageField) –

    Next page image web

    ImageField link to web-friendly image of following page (by Deed Machine’s calculation). Used to show following page in Zooniverse if this is a potential covenant needing transcription.

  • next_next_page_image_web (ImageField) –

    Next next page image web

    ImageField link to web-friendly image of the page after the following page (i.e., this page + 2) (by Deed Machine’s calculation). Used to show that page in Zooniverse if this is a potential covenant needing transcription.

  • page_image_web_highlighted (ImageField) –

    Page image web highlighted

    ImageField that stores a link to web-friendly, watermarked, and highlighted JPEG used for transcription

  • prev_page_image_lookup (CharField) –

    Prev page image lookup

    Same idea as prev_page_image_web, except this is the s3_lookup of the previous page rather than a link to the image.

  • next_page_image_lookup (CharField) –

    Next page image lookup

    Same idea as next_page_image_web, except this is the s3_lookup of the following page rather than a link to the image.

  • next_next_page_image_lookup (CharField) –

    Next next page image lookup

    Same idea as next_next_page_image_web, except this is the s3_lookup of the page after the following page rather than a link to the image.

Relationship fields:

Parameters:
  • workflow (ForeignKey to ZooniverseWorkflow) –

    Workflow (related name: deedpage)

    Foreign key to associated workflow

  • zooniverse_subject (ForeignKey to ZooniverseSubject) –

    Zooniverse subject (related name: subject_legacy)

    After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit.

  • zooniverse_subject_1st_page (ForeignKey to ZooniverseSubject) –

    Zooniverse subject 1st page (related name: subject_1st_page)

    After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among the prev/next images of more than one Zooniverse Subject.

  • zooniverse_subject_2nd_page (ForeignKey to ZooniverseSubject) –

    Zooniverse subject 2nd page (related name: subject_2nd_page)

    After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among the prev/next images of more than one Zooniverse Subject.

  • zooniverse_subject_3rd_page (ForeignKey to ZooniverseSubject) –

    Zooniverse subject 3rd page (related name: subject_3rd_page)

    After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among the prev/next images of more than one Zooniverse Subject.

  • matched_terms (ManyToManyField to MatchTerm) –

    Matched terms (related name: deedpage)

    M2M field storing which potential racial covenant (or exception) terms were matched, if any

exception DoesNotExist
exception MultipleObjectsReturned
batch_id

Type: CharField

Batch id

An optional designator of batch, often referring to the parent folder an image was supplied in.

book_id

Type: CharField

Book id

Book ID. Not present for all records or all counties.

bool_exception

Type: BooleanField

Bool exception

Was a disqualifying term found that exempts this page from transcription (e.g., a death certificate or military discharge)?

bool_manual

Type: BooleanField

Bool manual

Has bool_match or bool_exception been manually overridden?

bool_match

Type: BooleanField

Bool match

Is there a suspected covenant?
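
The three boolean fields are easiest to read together. The sketch below is one plausible interpretation, not the Deed Machine's actual logic: PageFlags, needs_transcription, and apply_automated_result are hypothetical names, and the rule that a manual override survives automated reprocessing is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PageFlags:
    """Hypothetical stand-in for the three boolean fields on a DeedPage."""
    bool_match: bool = False
    bool_exception: bool = False
    bool_manual: bool = False


def needs_transcription(flags: PageFlags) -> bool:
    # A suspected covenant goes to transcription unless an exception
    # term exempts the page.
    return flags.bool_match and not flags.bool_exception


def apply_automated_result(flags: PageFlags, match: bool, exception: bool) -> PageFlags:
    # Assumption: once a person has set the flags by hand, automated
    # reprocessing leaves them alone.
    if flags.bool_manual:
        return flags
    return PageFlags(bool_match=match, bool_exception=exception, bool_manual=False)
```

In a Django context the same test would likely be expressed as a queryset filter on bool_match and bool_exception rather than a Python function.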

doc_alt_id

Type: CharField

Doc alt id

An optional alternate document ID.

doc_date

Type: DateField

Doc date

Date of document.

doc_num

Type: CharField

Doc num

County document number. Not unique, because a document can span several or many pages. Some county documents don’t have document numbers and are instead identified by book and page. In those cases, a doc_num is constructed from the book and page combination. The doc_num can come from S3 metadata extracted via regex, or it can be created manually and linked through a supplemental information file at the time of Django import, after initial processing.
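
As a minimal sketch of the book-and-page fallback described above: build_doc_num is a hypothetical helper, and the "BK...PG..." layout is invented for illustration, since the real format of a constructed doc_num is not specified here.

```python
from typing import Optional


def build_doc_num(
    doc_num: Optional[str],
    book_id: Optional[str],
    page_num: Optional[int],
) -> Optional[str]:
    """Return the county document number, or a book/page stand-in."""
    # Prefer the county's own document number when one was supplied.
    if doc_num:
        return doc_num
    # Otherwise fall back to a book-and-page combination.
    # The "BK...PG..." layout is a made-up format for illustration only.
    if book_id and page_num is not None:
        return f"BK{book_id}PG{page_num}"
    # Not enough information to identify the document.
    return None
```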

doc_page_count

Type: IntegerField

Doc page count

How many pages does the Deed Machine think make up this document? Aids in Zooniverse setup.

doc_type

Type: CharField

Doc type

Document type, e.g. deed, Torrens certificate, mortgage, etc.

matched_terms

Type: ManyToManyField to MatchTerm

Matched terms (related name: deedpage)

M2M field storing which potential racial covenant (or exception) terms were matched, if any
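
To illustrate what ends up in matched_terms, the sketch below does a plain case-insensitive substring search over OCR text. This is an assumption made for illustration: the actual detection step may use different matching rules, and find_matched_terms and the term list passed to it are hypothetical.

```python
from typing import Iterable, List


def find_matched_terms(ocr_text: str, terms: Iterable[str]) -> List[str]:
    """Return the terms that occur in ocr_text, ignoring case."""
    lowered = ocr_text.lower()
    # Keep the caller's spelling of each term so results can be looked up
    # against stored MatchTerm records.
    return [term for term in terms if term.lower() in lowered]
```

Each returned term would then correspond to one MatchTerm linked to the page through the many-to-many relationship.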

next_next_page_image_lookup

Type: CharField

Next next page image lookup

Same idea as next_next_page_image_web, except this is the s3_lookup of the page after the following page rather than a link to the image.

next_next_page_image_web

Type: ImageField

Next next page image web

ImageField link to web-friendly image of the page after the following page (i.e., this page + 2) (by Deed Machine’s calculation). Used to show that page in Zooniverse if this is a potential covenant needing transcription.

next_page_image_lookup

Type: CharField

Next page image lookup

Same idea as next_page_image_web, except this is the s3_lookup of the following page rather than a link to the image.

next_page_image_web

Type: ImageField

Next page image web

ImageField link to web-friendly image of following page (by Deed Machine’s calculation). Used to show following page in Zooniverse if this is a potential covenant needing transcription.

page_image_web

Type: ImageField

Page image web

ImageField that stores a link to web-friendly, watermarked JPEG used for transcription

page_image_web_highlighted

Type: ImageField

Page image web highlighted

ImageField that stores a link to web-friendly, watermarked, and highlighted JPEG used for transcription

page_num

Type: IntegerField

Page num

Page number of the original record. Not present for all records or all counties. Note the difference from split_page_num, which is an automatically generated page number used to differentiate pages that were programmatically split during the Deed Machine process, usually from a multi-page TIF file.

page_ocr_json

Type: FileField

Page ocr json

FileField that stores a link to the complete Textract JSON OCR result for this page, including much more metadata and contextual information than page_ocr_text.

page_ocr_text

Type: FileField

Page ocr text

FileField that stores a link to a .txt file containing the concatenated full text of the page extracted by OCR.
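
The relationship between page_ocr_json and page_ocr_text can be sketched as follows. Textract responses carry a "Blocks" list in which blocks of type LINE hold a "Text" value; joining those lines with newlines is an assumption about how the concatenated text is produced, and textract_lines_to_text is a hypothetical name.

```python
from typing import Any, Dict


def textract_lines_to_text(response: Dict[str, Any]) -> str:
    """Concatenate the text of all LINE blocks in a Textract response."""
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        # WORD blocks repeat the same text, so only LINE blocks are used.
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return "\n".join(lines)
```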

page_stats

Type: FileField

Page stats

FileField that stores a link to a metadata JSON file generated about the image during the OCR process, including how much of the page is determined to be handwritten.

prev_page_image_lookup

Type: CharField

Prev page image lookup

Same idea as prev_page_image_web, except this is the s3_lookup of the previous page rather than a link to the image.

prev_page_image_web

Type: ImageField

Prev page image web

ImageField link to web-friendly image of previous page (by Deed Machine’s calculation). Used to show previous page in Zooniverse if this is a potential covenant needing transcription.
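
The prev/next/next_next fields all describe neighbors of a page within its document. The sketch below shows one way such neighbors could be derived from an ordered list of s3_lookup values. It is not the Deed Machine's own calculation, and neighbor_lookups is a hypothetical helper.

```python
from typing import Dict, List, Optional


def neighbor_lookups(ordered_lookups: List[str], index: int) -> Dict[str, Optional[str]]:
    """Return the s3_lookup values of the pages around ordered_lookups[index]."""

    def at(i: int) -> Optional[str]:
        # Positions before the first page or after the last have no neighbor.
        return ordered_lookups[i] if 0 <= i < len(ordered_lookups) else None

    return {
        "prev_page_image_lookup": at(index - 1),
        "next_page_image_lookup": at(index + 1),
        "next_next_page_image_lookup": at(index + 2),
    }
```

For the first page of a three-page document, the previous lookup is None and the other two point at pages two and three.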

public_uuid

Type: CharField

Public uuid

The randomly generated UUID created for the web version of this image in order to deter systematic scraping of publicly visible data. Generated by the initial processing stage.
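
A random identifier of this kind can be produced with Python's standard uuid module. The sketch below is illustrative only: new_public_uuid is a hypothetical name, and storing the value as a 32-character hex string is an assumption about the format.

```python
import uuid


def new_public_uuid() -> str:
    """Return a random identifier that cannot be derived from record metadata."""
    # uuid4 is built from random bits, so public image names cannot be
    # guessed from values such as doc_num or s3_lookup.
    return uuid.uuid4().hex
```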

s3_lookup

Type: CharField

S3 lookup

Unique identifier of this page’s image on S3

split_page_num

Type: IntegerField

Split page num

Alternate page numbering assigned when a multi-page image file (generally a TIF) has been split by the ingestion process. Note the difference from page_num, which is a page number value supplied with the original record in the county’s numbering system.
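
Because page_num and split_page_num can each be missing, ordering the pages of a document needs a rule for combining them. The sketch below sorts by page_num first and split_page_num second, treating a missing value as 0. That rule, and the names PageRef and order_pages, are assumptions for illustration rather than the Deed Machine's documented behavior.

```python
from typing import List, NamedTuple, Optional, Tuple


class PageRef(NamedTuple):
    """Hypothetical minimal view of a DeedPage used for ordering."""
    s3_lookup: str
    page_num: Optional[int]
    split_page_num: Optional[int]


def page_sort_key(page: PageRef) -> Tuple[int, int]:
    # County page number first, then the position within a split file.
    # Missing values sort as 0 (an assumption).
    return (page.page_num or 0, page.split_page_num or 0)


def order_pages(pages: List[PageRef]) -> List[PageRef]:
    return sorted(pages, key=page_sort_key)
```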

property thumbnail_preview

Used to display thumbnail of DeedPage in admin view.

workflow

Type: ForeignKey to ZooniverseWorkflow

Workflow (related name: deedpage)

Foreign key to associated workflow

zooniverse_subject

Type: ForeignKey to ZooniverseSubject

Zooniverse subject (related name: subject_legacy)

After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit.

zooniverse_subject_1st_page

Type: ForeignKey to ZooniverseSubject

Zooniverse subject 1st page (related name: subject_1st_page)

After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among the prev/next images of more than one Zooniverse Subject.

zooniverse_subject_2nd_page

Type: ForeignKey to ZooniverseSubject

Zooniverse subject 2nd page (related name: subject_2nd_page)

After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among the prev/next images of more than one Zooniverse Subject.

zooniverse_subject_3rd_page

Type: ForeignKey to ZooniverseSubject

Zooniverse subject 3rd page (related name: subject_3rd_page)

After post-Zooniverse ingestion of subjects, used to join this page to its matching subject if it’s a hit. Helps give editors access to the DeedPage record during the manual correction process. This assumes that each page can occupy a given position in only one Zooniverse Subject, even though the same page could appear among the prev/next images of more than one Zooniverse Subject.

class apps.deed.models.MatchTerm(*args, **kwargs)

A term found in a DeedPage during covenant detection in the initial processing stage. Could be a term indicating a racial covenant OR an exception, like an indicator of a birth certificate or military discharge.

Parameters:

Reverse relationships:

Parameters:

deedpage (Reverse ManyToManyField from DeedPage) – All deed pages of this match term (related name of matched_terms)

exception DoesNotExist
exception MultipleObjectsReturned
class apps.deed.models.SearchHitReport(id, workflow, report_csv, num_hits, num_exceptions, created_at)
Parameters:
  • id (BigAutoField) – Primary key: ID

  • report_csv (FileField) –

    Report csv

    FileField linking to a CSV stored on S3 showing the results of hit ingestion

  • num_hits (IntegerField) –

    Num hits

    Number of DeedPage records with at least one hit

  • num_exceptions (IntegerField) –

    Num exceptions

    Number of DeedPage records with at least one exception term, which exempts them from transcription.

  • created_at (DateTimeField) – Created at

Relationship fields:

Parameters:

workflow (ForeignKey to ZooniverseWorkflow) –

Workflow (related name: searchhitreport)

Foreign key to associated workflow

exception DoesNotExist
exception MultipleObjectsReturned
num_exceptions

Type: IntegerField

Num exceptions

Number of DeedPage records with at least one exception term, which exempts them from transcription.

num_hits

Type: IntegerField

Num hits

Number of DeedPage records with at least one hit

report_csv

Type: FileField

Report csv

FileField linking to a CSV stored on S3 showing the results of hit ingestion
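
The three report fields fit together roughly as sketched below: two counts plus a CSV of the underlying rows. The column names, the use of bool_match and bool_exception as the counting criteria, and the summarize_hits helper are all assumptions for illustration. The real report may contain different columns.

```python
import csv
import io
from typing import Any, Dict, Iterable, Tuple


def summarize_hits(pages: Iterable[Dict[str, Any]]) -> Tuple[int, int, str]:
    """Return (num_hits, num_exceptions, csv_text) for a batch of page rows."""
    rows = list(pages)
    num_hits = sum(1 for row in rows if row["bool_match"])
    num_exceptions = sum(1 for row in rows if row["bool_exception"])

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["s3_lookup", "bool_match", "bool_exception"],
        extrasaction="ignore",  # skip any columns not listed above
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return num_hits, num_exceptions, buffer.getvalue()
```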

workflow

Type: ForeignKey to ZooniverseWorkflow

Workflow (related name: searchhitreport)

Foreign key to associated workflow