Aug 3, 2022

Big Data Analytic Tools and Comparisons

 Most effective in analyzing big data

Big data analysis is a crucial part of business analysis and business intelligence, so one of the biggest concerns in choosing a big data analytics tool is the benefit it brings to the business system. A business intelligence analytics tool needs to cover the following four categories of analysis (crm.org, 2021): (1) Descriptive analytics, which asks what is happening. (2) Diagnostic analytics, which asks why something happened. (3) Predictive analytics, which forecasts what will happen in the future. (4) Prescriptive analytics, which recommends what we should do.
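To make the four categories concrete, here is a minimal Python sketch on a toy dataset. The monthly `ad_spend` and `sales` figures, the function names, and the `margin` parameter are all made up for illustration and do not come from any of the cited sources. Each function stands in for one category in the simplest possible way: a summary, a correlation, a straight-line forecast, and a choice among candidate actions.

```python
from statistics import mean

# Hypothetical monthly figures (in thousands), invented for illustration only.
ad_spend = [10, 12, 15, 18, 20, 24]
sales = [110, 118, 131, 142, 150, 166]


def descriptive(values):
    """Descriptive: summarize what the data looks like."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def diagnostic(x, y):
    """Diagnostic: Pearson correlation between a possible driver and an outcome."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fit_line(x, y):
    """Least-squares slope and intercept of y as a function of x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def predictive(x, y, new_x):
    """Predictive: forecast the outcome for an unseen input value."""
    slope, intercept = fit_line(x, y)
    return slope * new_x + intercept


def prescriptive(x, y, candidates, margin=0.5):
    """Prescriptive: pick the candidate spend with the best predicted profit.

    Profit is assumed to be predicted sales times a margin, minus the spend.
    """
    return max(candidates, key=lambda c: predictive(x, y, c) * margin - c)


print(descriptive(sales))
print(diagnostic(ad_spend, sales))
print(predictive(ad_spend, sales, 30))
print(prescriptive(ad_spend, sales, [20, 25, 30]))
```

Real analytics tools use far richer models for each step, but the division of labor is the same: describe, explain, forecast, then recommend.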

Another essential piece of background knowledge for choosing the best-fit data analytics tool is the core features of a data management architecture. The core features of data analytics include data preparation, data mining, modeling, discovery, warehousing, data processing, data integration, and data transformation. Understanding these core features gives a clearer, bigger picture of the five Vs of big data: Volume, Variety, Velocity, Veracity, and Value.

Here are the pros and cons of the top trending data analytics tools (Baig, 2019).

Field: Data Storage

Data Analytic Tool: Hadoop with HDFS
Pros:
- High bandwidth
- High scalability
- Write once, read many
Cons:
- Clusters are difficult to manage
- Join operations are slow
Reference: (Lee, 2011)

Data Analytic Tool: HBase
Pros:
- Highly flexible
- Consistent
- Fault tolerant
Cons:
- Not good for complicated applications
Reference: (Bakshi, 2012)

Field: Data Processing

Data Analytic Tool: Hadoop
Pros:
- Processes huge volumes of data very easily
Cons:
- Hard to install
- Hard to organize
- Needs experts
Reference: (Mukherjee, 2012)

Data Analytic Tool: MapReduce
Pros:
- Supports the Java language
- Processes data independently
Cons:
- Only suited to batch-oriented processing
Reference: (Moon, 2014)

Data Analytic Tool: YARN
Pros:
- Maintains resources efficiently
- Continuity
- Process scalability
Cons:
- None listed
Reference: (Ranjan, 2014)

Field: Data Access

Data Analytic Tool: Pig
Pros:
- Preserves the originality of data by reducing replication and lines of code
- Fast read/write operations
Cons:
- No web interface
- No JDBC or ODBC network support
Reference: (Herodotou, 2011)

Data Analytic Tool: Hive
Pros:
- Data accessibility
- Good loading and querying interface
- Direct data extraction
- Can integrate with HBase
Cons:
- Does not support unstructured data sets
- Does not support complicated tasks
Reference: (Dhyani, 2014)

Data Analytic Tool: Cassandra
Pros:
- High throughput and efficient response time
- Supports ACID properties
Cons:
- Does not support join operations
- Does not support sub-queries
- Limited storage space
Reference: (Abramova, 2013)

Data Analytic Tool: Mahout
Pros:
- Supports different kinds of data mining
- Supports patterns
- Supports huge volumes of data
Cons:
- No decision tree algorithm
Reference: (Condie, 2013)

Data Analytic Tool: Jaql
Pros:
- Supports semi-structured data
- Supports physical transparency
Cons:
- Needs a consistent format in select statement queries and transform operations
Reference: (Rathee, 2013)

Field: Data Management

Data Analytic Tool: Zookeeper
Pros:
- Highly reliable
- Offers atomicity
- Offers synchronization
- Ensures availability of data
Cons:
- Requires maintenance of multiple stacks
Reference: (Fan, 2013)

Data Analytic Tool: Oozie
Pros:
- Supports workflow execution in case of error
- Includes web API services
Cons:
- Not good for off-grid development
Reference: (Islam, 2012)
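
The MapReduce entry above notes that records are processed independently and that the model is batch-oriented. The following is a minimal single-process sketch of that idea in Python, not Hadoop's actual API: real Hadoop MapReduce jobs are usually written in Java and run across a cluster. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample input are hypothetical, chosen only to show the map, shuffle, and reduce steps of a word count.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record into (word, 1) pairs.

    Each record is handled on its own, which is what lets a real
    cluster spread the map work across many machines.
    """
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run the whole batch: map every record, shuffle, then reduce."""
    mapped = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = run_job(["big data tools", "big data analytics"])
print(counts)
```

The job only produces output after the whole input batch has been mapped and shuffled, which illustrates why the model fits batch workloads better than streaming ones.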



References

Abramova, V., & Bernardino, J. (2013). NoSQL databases: MongoDB vs Cassandra. In Proceedings of the international conference on computer science and software engineering (pp. 14-22). ACM.

Baig, M. I., Shuib, L., & Yadegaridehkordi, E. (2019). Big data tools: Advantages and disadvantages. Journal of Soft Computing and Decision Support Systems, 6(6), 14-20. 

Bakshi, K. (2012). Considerations for big data: Architecture and approach. In 2012 Aerospace Conference (pp. 1-7). IEEE.

Condie, T., Mineiro, P., Polyzotis, N., & Weimer, M. (2013). Machine learning on big data. In 29th International Conference on Data Engineering (ICDE) (pp. 1242-1244). IEEE. 

crm.org. (2021, December 13). Top 15 Best Data Analytics Tools & Software Comparison 2022. CRM.Org. Retrieved 2022, from https://crm.org/news/best-data-analytics-tools

Islam, M., Huang, A. K., Battisha, M., Chiang, M., Srinivasan, S., Peters, C., & Abdelnur, A. (2012). Oozie: Towards a scalable workflow management system for Hadoop. In Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies (p. 4). ACM.

Fan, W., & Bifet, A. (2013). Mining big data: Current status and forecast to the future. ACM SIGKDD Explorations Newsletter, 14(2), 1-5.

Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F. B., & Babu, S. (2011). Starfish: A self-tuning system for big data analytics. In CIDR, 11(2), 261-272.

Lee, Y., Kang, W., & Lee, Y. (2011). A Hadoop-based packet trace processing tool. In International Workshop on Traffic Monitoring and Analysis (pp. 51-63). Springer, Berlin, Heidelberg. 

Moon, S., Lee, J., & Kee, Y. S. (2014). Introducing SSDs to the Hadoop MapReduce tool. In 7th International Conference on Cloud Computing (pp. 272-279). IEEE.

Mukherjee, A., Datta, J., Jorapur, R., Singhvi, R., Haloi, S., & Akram, W. (2012). Shared disk big data analytics with Apache Hadoop. In 19th International Conference on High Performance Computing (pp. 1-6). IEEE.

Ranjan, R. (2014). Streaming big data processing in datacenter clouds. IEEE Cloud Computing, 21(1), 78-83.

Rathee, S. (2013). Big data and Hadoop with components like Flume, Pig, Hive and Jaql. In International conference on cloud, big data and trust (Vol. 15). 

Dhyani, B., & Barthwal, A. (2014). Big data analytics using Hadoop. International Journal of Computer Applications, 108(12), 265-270.
