Friday, January 11, 2019

Java annotation – create/populate a bean object using annotation and reflection API


Java Annotation:


Annotations are a form of metadata for the compiler or the runtime environment. They provide data about a program but are not part of the program itself, and they have no direct effect on the operation of the code they annotate.

Annotations usage:

1)      Information for the compiler: Compilers can process annotations to detect errors or as a hint to do some processing. E.g.

@Override on a method indicates that the method overrides an implementation inherited from a parent class. Any mistake in the signature is then detected by the compiler.

2)      Compile-time and deployment-time processing: A software tool can process annotations to generate new code, XML files, etc.

3)      Runtime processing: Annotations can be read at runtime to do metadata-based processing. E.g. getter/setter methods can be invoked on an object based on its annotations.
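The compiler use case from item 1 can be sketched as follows. The classes Shape, Square and OverrideDemo are hypothetical names chosen only for this illustration.

```java
// Hypothetical parent class, used only to illustrate @Override.
class Shape {
    double area() {
        return 0.0;
    }
}

class Square extends Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    // The compiler checks that Shape really declares a method with this signature.
    // Misspelling the name (for example aera()) while keeping @Override
    // would turn a silent bug into a compile-time error.
    @Override
    double area() {
        return side * side;
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        Shape shape = new Square(3.0);
        System.out.println(shape.area()); // prints 9.0
    }
}
```

Without the annotation, a misspelled method would simply be treated as a new, unrelated method and the parent implementation would keep being called.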


package com.bigdatacoder.examples;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

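@Retention(RetentionPolicy.RUNTIME) // keep the annotation available to reflection at runtime
@Target(ElementType.FIELD) // can be used on fields only
public @interface MapElement {

    // Identifier used to look up the value for the annotated field.
    String id();
}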
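One way the annotation above could be used to create and populate a bean is sketched below. This is a minimal illustration under stated assumptions, not a definitive implementation: the Employee bean, the populate helper and the map keys are hypothetical, and the sketch writes to the annotated fields directly instead of calling setters, because MapElement targets fields. The annotation is redeclared as a nested type only so that the example compiles on its own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class BeanPopulator {

    // Same shape as the MapElement annotation; redeclared so the sketch is self-contained.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface MapElement {
        String id();
    }

    // Hypothetical bean; the ids are assumed keys of the input map.
    static class Employee {
        @MapElement(id = "emp_name")
        private String name;

        @MapElement(id = "emp_age")
        private Integer age;

        private String notMapped; // not annotated, so populate() leaves it alone

        String getName() { return name; }
        Integer getAge() { return age; }
        String getNotMapped() { return notMapped; }
    }

    // Creates a bean through its no-argument constructor and copies map values
    // into every field that carries a MapElement annotation with a matching id.
    static <T> T populate(Class<T> beanClass, Map<String, Object> values) {
        try {
            T bean = beanClass.getDeclaredConstructor().newInstance();
            for (Field field : beanClass.getDeclaredFields()) {
                MapElement element = field.getAnnotation(MapElement.class);
                if (element == null || !values.containsKey(element.id())) {
                    continue; // unannotated field, or no value supplied for it
                }
                field.setAccessible(true); // allow writing to private fields
                field.set(bean, values.get(element.id()));
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not populate " + beanClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("emp_name", "Arya");
        row.put("emp_age", 18);

        Employee employee = populate(Employee.class, row);
        System.out.println(employee.getName() + ", " + employee.getAge()); // prints Arya, 18
    }
}
```

The RUNTIME retention policy is what makes field.getAnnotation return the annotation here; with the default CLASS retention the lookup would return null.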

Differentiate between Structured, Semi structured and Unstructured data


Structured Data: A dataset that fits into a pre-defined row and column format.

E.g. data stored in relational databases such as Oracle, MySQL, etc.
Example: An employee dataset in structured format

emp_id | emp_name       | emp_age | emp_sal | emp_address_id
-------|----------------|---------|---------|---------------
1      | Michael Conway | 45      | 250000  | ADD1
2      | Sameer P       | 33      | 150000  | ADD2
3      | Arya           | 18      | 85000   | ADD3

Structured data can be linked to other structured datasets using relational keys. E.g. emp_address_id might refer to a row in a separate address table.

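emp_address_id | addr_1        | addr_2   | city        | state | zip
---------------|---------------|----------|-------------|-------|------
ADD1           | 9232 abc rd   | Apt A222 | Marlborough | MA    | 02345
ADD2           | 40098 main st |          | Boston      | MA    | 06789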
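The link between the two tables can be sketched in code as a lookup by key, which is roughly what an inner join on emp_address_id does in a relational database. The class and method names below are hypothetical, and only a few of the columns are modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RelationalJoinDemo {

    static final class Employee {
        final int id;
        final String name;
        final String addressId; // foreign key into the address table

        Employee(int id, String name, String addressId) {
            this.id = id;
            this.name = name;
            this.addressId = addressId;
        }
    }

    static final class Address {
        final String id; // primary key
        final String city;
        final String state;

        Address(String id, String city, String state) {
            this.id = id;
            this.city = city;
            this.state = state;
        }
    }

    // Resolves each employee's address through the key; employees whose key has
    // no matching address row are skipped, as in an inner join.
    static List<String> join(List<Employee> employees, Map<String, Address> addressesById) {
        List<String> rows = new ArrayList<>();
        for (Employee employee : employees) {
            Address address = addressesById.get(employee.addressId);
            if (address != null) {
                rows.add(employee.name + " -> " + address.city + ", " + address.state);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
                new Employee(1, "Michael Conway", "ADD1"),
                new Employee(2, "Sameer P", "ADD2"),
                new Employee(3, "Arya", "ADD3"));

        Map<String, Address> addressesById = new HashMap<>();
        addressesById.put("ADD1", new Address("ADD1", "Marlborough", "MA"));
        addressesById.put("ADD2", new Address("ADD2", "Boston", "MA"));

        // Prints two rows; Arya is skipped because ADD3 has no address row.
        join(employees, addressesById).forEach(System.out::println);
    }
}
```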

Semi-structured data: Semi-structured data cannot be represented in a simple row and column format, but it does have a format that makes it easy to process.

XML and JSON are examples of such formats.

Example: The above Employee dataset in semi-structured JSON format

{
    "emp_id": 1,
    "emp_name": "Michael Conway",
    "emp_age": 45,
    "emp_sal": 250000,
    "emp_address": {
        "emp_address_id": "ADD1"
    }
}

Please note that the structure is not as simple as that of a structured dataset, but it can still be analyzed. Using a library (for example, Java code that converts JSON to CSV) or a tool (for example, a Chrome extension that converts JSON to CSV), a semi-structured dataset can be converted into a structured one.
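A minimal sketch of such a conversion is shown below. It assumes the JSON has already been parsed into nested Map objects by some JSON library (the parsing step is not shown), and the class and method names are hypothetical. Nested keys are joined with a dot to form column names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JsonToCsvSketch {

    // Recursively copies nested entries into a flat map, joining key paths with '.'.
    static void flatten(String prefix, Map<String, Object> node, Map<String, String> out) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            String key = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flatten(key, child, out);
            } else {
                out.put(key, String.valueOf(value));
            }
        }
    }

    // Returns a two-line CSV: a header row followed by one data row.
    // Simplification: values are not quoted or escaped, so commas inside values would break it.
    static String toCsv(Map<String, Object> record) {
        Map<String, String> flat = new LinkedHashMap<>();
        flatten("", record, flat);
        return String.join(",", flat.keySet()) + "\n" + String.join(",", flat.values());
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("emp_address_id", "ADD1");

        Map<String, Object> employee = new LinkedHashMap<>();
        employee.put("emp_id", 1);
        employee.put("emp_name", "Michael Conway");
        employee.put("emp_address", address);

        System.out.println(toCsv(employee));
        // prints:
        // emp_id,emp_name,emp_address.emp_address_id
        // 1,Michael Conway,ADD1
    }
}
```

A real converter would also need to handle arrays, missing keys across records and CSV escaping, which is why a dedicated library or tool is usually the better choice.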

Unstructured dataset: Any dataset which does not fall into the above categories. By commonly cited estimates, roughly 80% of the world's data is unstructured. Unstructured data includes raw text, multimedia content, audio/video, images and even your favorite novel or book. While such data might seem to have an internal structure (a book has an index and chapters, for example), it does not closely fit the concept of a pre-defined data format.

Example: the above employee data could be part of unstructured text data
xyz Inc is a new startup company and it's growing fast. We had a chance to interview its founder and CEO, Michael Conway (Age: 45), at his residence at 9232 abc rd, Apt A222, Marlborough, MA 02345. Michael is busy strategizing the business plan for the company.

Analyzing and processing unstructured data can be tedious because the data is often incomplete, erroneous or simply not useful. Given that most data is unstructured, big data tools and ecosystems (like Hadoop and Spark) provide standard ways to process it.
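To illustrate why this is tedious, here is a minimal sketch that pulls one field (the age) out of the interview text with a regular expression. The class name and the pattern are assumptions made for this example: the pattern only recognizes the exact "(Age: 45)" style, so any other way of writing the age is missed.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AgeExtractor {

    // Assumed pattern: matches mentions written exactly like "(Age: 45)".
    private static final Pattern AGE = Pattern.compile("\\(Age:\\s*(\\d+)\\)");

    // Returns the first age found in the text, or an empty Optional if none matches.
    static Optional<Integer> extractAge(String text) {
        Matcher matcher = AGE.matcher(text);
        if (matcher.find()) {
            return Optional.of(Integer.parseInt(matcher.group(1)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String text = "We had a chance to interview its founder and CEO "
                + "Michael Conway (Age: 45) at his residence.";
        System.out.println(extractAge(text));                 // prints Optional[45]
        System.out.println(extractAge("He is 45 years old")); // prints Optional.empty
    }
}
```

Every additional field (name, address, zip) needs its own rule, and each rule breaks as soon as the wording changes, which is the kind of problem large-scale text processing tools are meant to help with.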