Background: Clouds and MapReduce have proven to be a broadly useful approach to scientific computing, especially for parallel, data-intensive applications. However, they have limited applicability to some areas, such as data mining, because MapReduce performs poorly on problems with the iterative structure present in the linear algebra that underlies much of data analysis. Such problems can be run efficiently on clusters using MPI, which leads to a hybrid cloud and cluster environment. This motivates the design and implementation of Twister, an open-source Iterative MapReduce system.
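The iterative structure described in the Background can be made concrete with K-means clustering, a data-mining kernel in which every iteration is one map/reduce pass over the same input. The sketch below is a minimal single-process illustration, assuming in-memory partitions of 2-D points. All function names are hypothetical, and it is not Twister's API. It shows the idea behind iterative MapReduce: the static data partitions are loaded once and reused across iterations, while only the small variable data (the centroids) changes between passes. Plain MapReduce typically runs each iteration as a separate job that re-reads its input.

```python
# Minimal single-process sketch of iterative MapReduce, using K-means as the example.
# All names are hypothetical illustrations; this is not Twister's API.
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Partial = Tuple[float, float, int]  # (sum_x, sum_y, count)


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
    )


def map_partition(partition: List[Point], centroids: Sequence[Point]) -> List[Tuple[int, Partial]]:
    """Map: assign each point of one cached partition to a cluster, emit partial sums per cluster."""
    acc: Dict[int, list] = defaultdict(lambda: [0.0, 0.0, 0])
    for p in partition:
        s = acc[nearest(p, centroids)]
        s[0] += p[0]
        s[1] += p[1]
        s[2] += 1
    return [(k, (s[0], s[1], int(s[2]))) for k, s in acc.items()]


def reduce_cluster(values: List[Partial]) -> Partial:
    """Reduce: add up the partial sums emitted for one cluster id."""
    return (
        sum(v[0] for v in values),
        sum(v[1] for v in values),
        sum(v[2] for v in values),
    )


def kmeans(partitions: List[List[Point]], centroids: Sequence[Point],
           max_iter: int = 100, tol: float = 1e-9) -> Tuple[List[Point], int]:
    """Drive map -> shuffle -> reduce -> combine until the centroids stop moving."""
    current = list(centroids)
    for it in range(1, max_iter + 1):
        grouped: Dict[int, List[Partial]] = defaultdict(list)
        # The static partitions are reused every iteration; only `current` (small) changes.
        for part in partitions:
            for k, v in map_partition(part, current):
                grouped[k].append(v)  # shuffle: group map outputs by cluster id
        # Combine reduce outputs into new centroids; an empty cluster keeps its old centroid.
        new = list(current)
        for k, vals in grouped.items():
            sx, sy, n = reduce_cluster(vals)
            new[k] = (sx / n, sy / n)
        shift = max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(current, new))
        current = new
        if shift <= tol:
            return current, it
    return current, max_iter
```

For example, with partitions `[[(0, 0), (0, 1)], [(10, 10), (10, 11)], [(1, 0), (11, 10)]]` and initial centroids `[(0, 0), (10, 10)]`, the loop converges after 2 iterations to centroids of about (0.33, 0.33) and (10.33, 10.33). In a distributed setting, the loop body is what an iterative MapReduce runtime would repeat, while an MPI implementation would express the same step as a local computation followed by a collective reduction.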

Results: Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability results in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Furthermore, we have released the open-source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications.

Conclusions: The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment, while Twister promises a uniform programming environment for many life sciences applications.

Methods: We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce, and Twister in these different environments.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040529
DOI: http://dx.doi.org/10.1186/1471-2105-11-S12-S3

Similar Publications

Addressing the issue of excessive manual intervention in discharging fermented grains from underground tanks in traditional brewing technology, this paper proposes an intelligent grains-out strategy based on a multi-degree-of-freedom hybrid robot. The robot's structure and control system are introduced, along with analyses of kinematics solutions for its parallel components and end-effector speeds. According to its structural characteristics and working conditions, a visual-perception-based motion control method of discharging fermented grains is determined.

Sensor networks generate vast amounts of data in real time, which challenges existing predictive maintenance frameworks due to high latency, energy consumption, and bandwidth requirements. This research addresses these limitations by proposing an edge-cloud hybrid framework, leveraging edge devices for immediate anomaly detection and cloud servers for in-depth failure prediction. A K-Nearest Neighbors (KNN) model is deployed on edge devices to detect anomalies in real time, reducing the need for continuous data transfer to the cloud.

State-of-the-Art Trends in Data Compression: COMPROMISE Case Study.

Entropy (Basel)

November 2024

Faculty of Electrical Engineering and Computer Science, University of Maribor, Koroška cesta 46, SI-2000 Maribor, Slovenia.

After a boom that coincided with the advent of the internet, digital cameras, digital video and audio storage and playback devices, the research on data compression has rested on its laurels for a quarter of a century. Domain-dependent lossy algorithms of the time, such as JPEG, AVC, MP3 and others, achieved remarkable compression ratios and encoding and decoding speeds with acceptable data quality, which has kept them in common use to this day. However, recent computing paradigms such as cloud computing, edge computing, the Internet of Things (IoT), and digital preservation have gradually posed new challenges, and, as a consequence, development trends in data compression are focusing on concepts that were not previously in the spotlight.

The maximum power delivered by a photovoltaic system is greatly influenced by atmospheric conditions such as irradiation and temperature and by surrounding objects like trees, raindrops, tall buildings, animal droppings, and clouds. The partial shading caused by these surrounding objects and the rapidly changing atmospheric parameters make maximum power point tracking (MPPT) challenging. This paper proposes a hybrid MPPT algorithm that combines the benefits of the salp swarm algorithm (SSA) and hill climbing (HC) techniques.

A hybrid AI based framework for enhancing security in satellite based IoT networks using high performance computing architecture.

Sci Rep

December 2024

Computer Engineering Department, UET Taxila, Rawalpindi, Punjab, 47050, Pakistan.

IoT device security has become a major concern as a result of the rapid expansion of the Internet of Things (IoT) and the growing adoption of cloud computing for central monitoring and management. In order to provide centrally managed services, each IoT device has to connect to its respective High-Performance Computing (HPC) cloud. The ever-increasing number of IoT devices linked to HPC clouds use various media, such as wired and wireless links.
