What is data collection?
Data capture, also called data collection, is the process of gathering information so that a computer can process it.
Indirect and direct methods of capture
Indirect methods - The data is first prepared in a form that a machine can read and is then fed into the computer, e.g. key-to-disk.
Direct methods - Because people make errors when handling data, methods have been developed that let the computer read the information directly, without anyone keying it in, e.g. MICR, OCR, barcodes, and mark sensing.
MICR (magnetic ink character recognition)
With this form of data capture, the computer reads certain numbers printed in magnetic ink. You can see these numbers along the bottom of bank cheques. Without MICR, someone would have to key all the information contained on each cheque into the computer.
OCR (Optical character recognition)
Optical character readers read characters printed in ordinary ink by measuring the amount of light they reflect. Usually, a reader can only be used with characters of a particular style or font.
Machines are now available which can read people’s handwriting directly. This is quite an improvement because people’s handwriting styles are very different.
Optical character readers are expensive devices because they are very complex. Before long we will see many more of these machines in use.
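To make the font restriction concrete, here is a minimal sketch of template matching, one simple way a reader could recognise characters. It assumes the reflected light has already been thresholded into a small grid of dark ("#") and light (".") cells. The 3x5 templates, the function names, and the max_distance cut-off are all made up for illustration; real readers are far more sophisticated.

```python
# Hypothetical 3x5 templates for a single known font ("#" = dark, "." = light).
TEMPLATES = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
}


def distance(a, b):
    """Count the cells in which two glyph grids differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognise(glyph, max_distance=2):
    """Return the closest template character, or None if nothing is close enough."""
    best_char, best_dist = min(
        ((ch, distance(glyph, tpl)) for ch, tpl in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    # A glyph from an unfamiliar font differs from every template and is rejected.
    return best_char if best_dist <= max_distance else None


# A slightly smudged "7" still matches; a solid block matches nothing.
smudged_seven = ("###", "..#", ".#.", ".#.", "##.")
solid_block = ("###", "###", "###", "###", "###")
```

Because the templates describe one font only, a character drawn in a different style ends up far from every template, which is why a reader of this kind is tied to a particular font.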
Barcodes
When a laser beam passes across a barcode, the scanner detects the width of the lines, and the computer can decode the information they represent, such as price, manufacturer, and size.
After capturing the details, barcode scanners link to a host computer or tablet and transmit the data in real time, without additional human intervention. This helps to automate data collection processes such as inventory tracking and point-of-sale transactions, and it reduces errors.
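As an illustration of how a misread barcode can be caught automatically, the sketch below checks the final digit of an EAN-13 retail barcode and then looks the product up. It assumes the common arrangement in which the barcode carries only a product number and the details are held in a table on the host computer. The PRODUCTS table, its contents, and the function names are hypothetical.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit from the first 12 digits."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Digits are weighted 1, 3, 1, 3, ... from left to right.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Return True if the 13-digit code ends with the correct check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


# Hypothetical product table: the barcode is only the key, the details live here.
PRODUCTS = {
    "4006381333931": {"name": "Highlighter pen", "price": 1.20, "size": "single"},
}


def lookup(code: str) -> dict:
    """Validate a scanned code, then fetch its product record."""
    if not is_valid_ean13(code):
        raise ValueError(f"misread barcode: {code}")
    return PRODUCTS[code]
```

A scan that changes any single digit fails the check, so the scanner can ask for a rescan instead of passing a wrong product to the till.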
Mark sensing
Mark sensing involves detecting pencil or ink marks made on a document. Usually, the person filling in the document draws a line or marks a certain area on it.
Mark sensing is used mainly for marking multiple-choice exam papers and questionnaires, and for recording electricity or gas meter readings so that they can be input directly into the computer. Football pool coupons are also read using mark sensing.
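The marking of a multiple-choice paper can be sketched as follows. The example assumes the reader has already measured how dark each answer box is, on a scale from 0.0 (blank) to 1.0 (fully shaded). The threshold value, the function names, and the sample readings are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def read_answer(darkness: List[float], threshold: float = 0.5) -> Optional[str]:
    """Return the marked option letter, or None if the row is blank or ambiguous."""
    marked = [i for i, level in enumerate(darkness) if level >= threshold]
    if len(marked) != 1:
        return None  # no mark, or more than one box shaded
    return chr(ord("A") + marked[0])


def score_sheet(
    rows: List[List[float]], answer_key: List[str], threshold: float = 0.5
) -> Tuple[List[Optional[str]], int]:
    """Read every row of the sheet and count the answers that match the key."""
    answers = [read_answer(row, threshold) for row in rows]
    score = sum(a == k for a, k in zip(answers, answer_key))
    return answers, score


# Four questions with four options each (made-up darkness readings).
sheet = [
    [0.9, 0.1, 0.0, 0.1],  # clearly A
    [0.1, 0.2, 0.8, 0.0],  # clearly C
    [0.7, 0.6, 0.1, 0.0],  # two boxes shaded: ambiguous
    [0.0, 0.1, 0.1, 0.2],  # nothing shaded: blank
]
answers, score = score_sheet(sheet, ["A", "C", "B", "D"])
```

Rows with no mark or with several marks are returned as None rather than guessed, so they can be passed to a person for checking.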
Magnetic encoding
If you look at a bank credit card, or a card that can be used in a cash dispenser, you will notice that it has a dark band on the back. This stripe consists of a magnetic material that can be used to hold information. This is called magnetic encoding.
Magnetic encoding is sometimes used on price labels and goods in certain shops. The information contained on the price ticket is read directly into the computer using a wand reader.
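To show what the information on a stripe can look like once it has been read, here is a minimal sketch that parses the second track of a bank card. It assumes the usual ISO/IEC 7813 layout: a ";" start marker, the card number, "=", the expiry date as YYMM, a three-digit service code, optional extra digits, and a "?" end marker. The card number in the example is made up, and the names Track2 and parse_track2 are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Track2(NamedTuple):
    card_number: str
    expiry_year: int  # two-digit year, as stored on the stripe
    expiry_month: int
    service_code: str
    discretionary: str


# ;<card number>=<YY><MM><service code><extra digits>?
TRACK2 = re.compile(r";([0-9]{1,19})=([0-9]{2})([0-9]{2})([0-9]{3})([0-9]*)\?")


def parse_track2(raw: str) -> Optional[Track2]:
    """Split a raw track 2 string into its fields, or return None if malformed."""
    match = TRACK2.fullmatch(raw.strip())
    if match is None:
        return None
    pan, yy, mm, service, extra = match.groups()
    if not 1 <= int(mm) <= 12:
        return None  # the month must be 01 to 12
    return Track2(pan, int(yy), int(mm), service, extra)


# Made-up card data for illustration only.
card = parse_track2(";4000001234567899=25121010000000000?")
```

A string that does not fit the layout returns None, which a reader could treat as a bad swipe and ask for the card again.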
ICR (Intelligent character recognition)
Intelligent Character Recognition (ICR), also known as Intelligent OCR, is a technology for extracting handwritten text from image files. It uses machine learning algorithms and AI to interpret data in forms and physical documents by recognizing a variety of handwriting styles and fonts.
ICR allows us to quickly read handwritten information on paper and convert it into a digital format. ICR engines work with OCR to automate data capture from forms and reduce the need for manual keying. When accuracy is high, this saves time in processing a wide variety of information.
IDR (Intelligent Document Recognition)
In Oracle Payables, Intelligent Document Recognition (IDR) extracts invoice information from emailed documents, creates invoices from it, and imports them into Payables.
Many suppliers and customers choose to send and receive invoices electronically by email or message. IDR can process these documents and extract the fields needed to create the invoices.
Manual keying
Manual keying is still relevant for certain types of unstructured data where automated capture methods achieve low accuracy, or where volumes are so low and variable that automation is not justified.
Digital forms (web or app)
When collecting information from users that doesn’t already exist anywhere, it often makes sense to capture the data through a digital form on the web, on an intranet page, or in a smartphone app. Digital forms can be designed to structure the answers and data collected, for example by offering fixed choices and avoiding too many free-text answers. They can also adapt dynamically to responses and prepopulate fields where the information exists already, e.g. an address.
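The three ideas in the paragraph above (structured answers, adapting to responses, and prepopulating) can be sketched in a few lines. The form definition, the field names, and the "required_if" rule are hypothetical and only meant to show the principle.

```python
# Hypothetical form definition: "choice" fields restrict answers to fixed options.
FORM = {
    "name": {"type": "text", "required": True},
    "contact_method": {
        "type": "choice",
        "options": ["email", "phone", "post"],
        "required": True,
    },
    # The address is only required when the user asks to be contacted by post.
    "address": {"type": "text", "required_if": ("contact_method", "post")},
}


def prepopulate(form, known):
    """Start each field with the value already on record, or an empty string."""
    return {field: known.get(field, "") for field in form}


def is_required(spec, answers):
    """A field is required always, or when another answer has a given value."""
    if spec.get("required"):
        return True
    condition = spec.get("required_if")
    return condition is not None and answers.get(condition[0]) == condition[1]


def validate(form, answers):
    """Return a dict mapping each invalid field to a message (empty if all is well)."""
    errors = {}
    for field, spec in form.items():
        value = answers.get(field, "").strip()
        if not value:
            if is_required(spec, answers):
                errors[field] = "required"
            continue
        if spec["type"] == "choice" and value not in spec["options"]:
            errors[field] = "must be one of: " + ", ".join(spec["options"])
    return errors
```

For example, prepopulate(FORM, {"address": "1 High St"}) fills in the stored address, and validate() reports a missing address only when "post" has been chosen as the contact method.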
Web scraping
Since there is now a huge amount of data on the web, web scraping tools, also called web bots or crawlers (e.g. Google's spiders), are used to crawl through web pages and their code to collect, analyze, and index specific data. Web scraping is used to capture and monitor many types of web data, such as news, updates, prices, contacts, policies, share data, currencies, connected devices, comments, and reviews: basically anything accessible via the web.
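As a small illustration of the extraction step, the sketch below uses Python's standard html.parser module to collect links and prices from a page. To stay self-contained it works on an HTML string rather than downloading anything. The sample page and the assumption that prices sit in span elements with the class "price" are made up.

```python
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Collect link targets and the text of <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        if tag == "span" and attributes.get("class") == "price":
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        # Only keep text that appears inside a price element.
        if self._in_price and data.strip():
            self.prices.append(data.strip())


# Made-up sample page standing in for a downloaded product listing.
PAGE = """
<ul>
  <li><a href="/item/1">Pen</a> <span class="price">1.20</span></li>
  <li><a href="/item/2">Notebook</a> <span class="price">3.50</span></li>
</ul>
"""


def scrape(html):
    """Parse the page and return (links, prices)."""
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    return parser.links, parser.prices
```

A crawler would repeat this for each page it downloads and then follow the collected links to find further pages.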
Augmented reality
This method of data collection is closely associated with video analysis and involves processing camera footage in real time to look for programmed “trigger” objects. If a trigger object is identified, a process is executed, e.g. the display of an overlay graphic, video, or other web data. AR applications are increasing in popularity as the digital and physical worlds get closer, e.g. Google Street View, SkyView, and Pokémon Go.
Artificial intelligence and data capture
Artificial intelligence (AI) is an umbrella term for a range of techniques, and it is best viewed in the context of a specific use case and application. Examples of AI techniques used in data capture are:
- Computer vision and image or pattern recognition, to improve the recognition of any type of image.
- Neural networks and machine learning, to improve recognition accuracy by training on large data sets and assisted learning.
- Natural language processing, to interpret sentences and their meaning.