Data scraping

Recovering summary information from an invoice print run, for indexing or web publication, is a typical application for EscapeE's sophisticated data extraction features. In our example, some of the data we want to recover, such as the "name and address block", is in the same place on every page. The position of other data elements may vary: for example, it can depend on the number of line items on the invoice.

  1. Sweep the mouse through the name and address field. This block is always in the same position on every page.
  2. The location of the "Net total" depends on the number of line items on the invoice. Its position is always marked by the caption "Invoice Net total:". Sweep out an area where the caption could occur and define this as a "Search Tag" called "NETTAG" with the parameter "Invoice Net total:". The red bracket indicates that the search tag parameter was found.
  3. Lastly, mark up the "NET" field and set its reference field to "NETTAG". The field is then always located at a fixed position relative to the caption "Invoice Net total:".
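The three mark-up steps amount to a small location algorithm: read a fixed region, search an area for a caption, then read a field at an offset from wherever the caption was found. The Python sketch below illustrates that idea only. It is not EscapeE's implementation or API. It assumes a hypothetical page model in which each page is a list of positioned text items (`TextItem`), with coordinates in arbitrary page units and y increasing down the page. All names (`Region`, `find_tag`, `extract_relative`, `scrape`) and the sample coordinates are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class TextItem:
    x: float   # left edge of the text (hypothetical page units)
    y: float   # baseline position, increasing down the page
    text: str

@dataclass
class Region:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, item: TextItem) -> bool:
        return self.left <= item.x <= self.right and self.top <= item.y <= self.bottom

    def shifted(self, dx: float, dy: float) -> "Region":
        return Region(self.left + dx, self.top + dy, self.right + dx, self.bottom + dy)

def extract_fixed(page: List[TextItem], region: Region) -> str:
    """Step 1: join the text inside a region that is the same on every page."""
    hits = sorted((i for i in page if region.contains(i)), key=lambda i: (i.y, i.x))
    return " ".join(i.text for i in hits)

def find_tag(page: List[TextItem], search_area: Region, caption: str) -> Optional[TextItem]:
    """Step 2: a 'search tag'; return the first item in the area containing the caption."""
    for item in page:
        if search_area.contains(item) and caption in item.text:
            return item
    return None  # caption not found on this page

def extract_relative(page: List[TextItem], tag: Optional[TextItem], offset: Region) -> Optional[str]:
    """Step 3: read a field at a fixed offset from the tag's position."""
    if tag is None:
        return None
    return extract_fixed(page, offset.shifted(tag.x, tag.y))

# Hypothetical field definitions for the example invoice.
ADDRESS = Region(0, 0, 100, 30)          # fixed name-and-address block
NET_SEARCH = Region(0, 100, 200, 300)    # area where "Invoice Net total:" may occur
NET_OFFSET = Region(100, -2, 150, 2)     # NET field, relative to the NETTAG position

def scrape(page: List[TextItem]) -> Dict[str, Optional[str]]:
    nettag = find_tag(page, NET_SEARCH, "Invoice Net total:")
    return {"ADDRESS": extract_fixed(page, ADDRESS),
            "NET": extract_relative(page, nettag, NET_OFFSET)}

# Made-up sample pages: the total sits lower on the invoice with more line items.
short_invoice = [TextItem(10, 10, "ACME Ltd"), TextItem(10, 15, "1 High Street"),
                 TextItem(10, 120, "Widget"), TextItem(10, 150, "Invoice Net total:"),
                 TextItem(120, 150, "20.00")]
long_invoice = [TextItem(10, 10, "Bloggs plc"), TextItem(10, 15, "2 Low Road"),
                TextItem(10, 120, "Gadget"), TextItem(10, 140, "Bolt"),
                TextItem(10, 190, "Invoice Net total:"), TextItem(120, 190, "10.00")]
```

The same three field definitions work on both sample pages even though the caption moves from y=150 to y=190, which is the point of referencing "NET" to "NETTAG" rather than giving it an absolute position. Returning `None` when the caption is missing plays the role of the missing red bracket.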
Repeat the SEARCH and REFERENCE process for other important fields, such as "Account reference", "Invoice number" and "Invoice date", and then choose the appropriate export format.
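Once several fields are defined, each invoice in the print run yields one record, and an export format such as CSV gives one row per invoice. The sketch below shows that final stage under a deliberate simplification: each invoice is a list of text lines, and a field's value is taken to be whatever follows its caption on the same line, instead of EscapeE's positional mark-up. Only the "Invoice Net total:" caption comes from the example above; the other captions, the `FIELDS` table, the function names and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical field definitions: field name -> caption that precedes its value.
# Only "Invoice Net total:" comes from the example; the other captions are assumed.
FIELDS: Dict[str, str] = {
    "ACCOUNT": "Account reference:",
    "INVOICE_NO": "Invoice number:",
    "INVOICE_DATE": "Invoice date:",
    "NET": "Invoice Net total:",
}

def find_after_caption(lines: List[str], caption: str) -> Optional[str]:
    """Return the text after the caption on the first line that contains it."""
    for line in lines:
        pos = line.find(caption)
        if pos != -1:
            return line[pos + len(caption):].strip()
    return None  # caption not present on this invoice

def extract_invoice(lines: List[str]) -> Dict[str, Optional[str]]:
    """Apply every search-and-reference field definition to one invoice."""
    return {name: find_after_caption(lines, caption) for name, caption in FIELDS.items()}

def export_csv(invoices: List[List[str]]) -> str:
    """Write one CSV row per invoice, one column per field (None becomes empty)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(FIELDS))
    writer.writeheader()
    for lines in invoices:
        writer.writerow(extract_invoice(lines))
    return buf.getvalue()

# Made-up sample print run with a different number of line items per invoice.
run = [
    ["Account reference: AC001", "Invoice number: 1001", "Invoice date: 2003-01-15",
     "Widget  2 x 10.00", "Invoice Net total: 20.00"],
    ["Account reference: AC002", "Invoice number: 1002", "Invoice date: 2003-01-16",
     "Gadget  1 x 5.00", "Bolt  10 x 0.50", "Invoice Net total: 10.00"],
]
csv_text = export_csv(run)
```

Keeping the field definitions in a single table means adding another field to the export is one new entry rather than new extraction code; the CSV writer itself is Python's standard `csv.DictWriter`.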


Typical invoice
Copyright © RedTitan Technology Limited 2003.