Saturday, June 23, 2012

Testing Tools in SOA and Cloud Era


Traditionally, when an organization or project team talks about quality, the focus is on functional testing, regression testing, and load/stress testing, and the marketplace has been dominated by a few players such as IBM, Micro Focus, and HP/Mercury supplying the tools needed to meet customer demand. But recent technological advances in the areas of SOA/ESB, cloud, and Business Process Management (BPM) have redefined the quality management market, and many new players are trying to unsettle the so-called leaders in this space.

Nowadays organizations are looking for tools not only to improve software quality but also to improve their productivity in quality-related activities, which eventually reduces time to market and improves the business bottom line. Competition in the quality market space is also heating up with the advent of new players such as iTKO's LISA, Parasoft's SOAtest, and Crosscheck Networks' SOAPSonar, at a time when most applications being developed are shifting from client/server or vanilla Web applications to SOA-based systems (built on Web Services and ESBs) and cloud-based systems that combine services running internally with services deployed on cloud platforms.

At a macro level, a few open source tools are also playing a vital role, such as:
  • soapUI, a complete and automated testing solution that supports SOAP- and REST-based Web services, JMS enterprise messaging layers, databases, Rich Internet Applications, and much more (see the sketch after this list).
  • WebInject, a Perl-based tool for automated testing of web applications and web services.
  • TestMaker, which offers an easier way to expose performance bottlenecks and functional issues in Web, Rich Internet Application (Ajax, Flex), SOA, and BPM applications.
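
To give a flavor of what these tools automate, here is a minimal Python sketch of a SOAP-style functional check: post an XML envelope to a service endpoint and assert on the response. The endpoint URL, SOAPAction, and expected content below are hypothetical placeholders; tools like soapUI wrap this same pattern in reusable test suites, assertions, and reports.

    # A minimal sketch of the kind of functional check web-service testing
    # tools automate: send a SOAP envelope, then assert on the response.
    # The endpoint, SOAPAction, and expected text are hypothetical.
    import urllib.request

    ENDPOINT = "http://example.com/orderService"   # placeholder service URL
    ENVELOPE = b"""<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body><getOrderStatus><orderId>42</orderId></getOrderStatus></soap:Body>
    </soap:Envelope>"""

    request = urllib.request.Request(
        ENDPOINT,
        data=ENVELOPE,
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": "getOrderStatus"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")

    # The assertions a testing tool would manage for us:
    assert response.status == 200, "service did not return HTTP 200"
    assert "<status>" in body, "response did not contain an order status"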


With the quality management area still very volatile, as a growing number of vendors continue to enter the market on one side while vendor consolidation happens on the other (such as the acquisition of iTKO by CA Technologies and of Green Hat by IBM), the coming years will be very interesting for the organizations, developers, and QA teams involved in SOA, BPM, and cloud application development.

Monday, June 4, 2012

Hadoop in Big Data Era


Hadoop is a generic processing framework designed to execute queries and batch read operations against massive datasets across clusters of computers. It lets organizations scan through tons of data (first loaded into the Hadoop Distributed File System, HDFS) and produce results that are meaningful to them. Simply put, Hadoop is the key open source technology that provides a Big Data engine.
Hadoop operates on massive datasets by horizontally scaling the processing across very large numbers of servers through an approach called MapReduce, rather than by vertical scaling, which requires a single powerful server to process the huge volume of data in a timely manner.

Hundreds or thousands of small, inexpensive commodity servers do have the power, provided the processing can be horizontally scaled and executed in parallel. Using the MapReduce approach, Hadoop splits up a problem, sends the sub-problems to different servers, and lets each server solve its sub-problem in parallel. It then merges all the sub-problem solutions together and writes the solution out to files, which may in turn be used as inputs to additional MapReduce steps.
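
To make the MapReduce idea concrete, here is a toy, single-process Python sketch of the classic word-count pattern. The real framework runs the map and reduce phases in parallel on many servers and shuffles the intermediate results between them, but the division of labor is the same.

    # A toy, in-process illustration of the MapReduce flow: map emits
    # key/value pairs, a shuffle groups them by key, and reduce merges
    # each group into a final result. Hadoop distributes these phases
    # across servers; here everything runs in one process.
    from collections import defaultdict

    def map_phase(document):
        # Emit (word, 1) for every word, like a word-count mapper.
        for word in document.split():
            yield word.lower(), 1

    def reduce_phase(word, counts):
        # Merge all partial counts for one key.
        return word, sum(counts)

    documents = ["Hadoop splits up a problem",
                 "Hadoop merges the solutions"]

    grouped = defaultdict(list)            # the "shuffle" step
    for doc in documents:                  # each split could go to a different server
        for word, count in map_phase(doc):
            grouped[word].append(count)

    results = [reduce_phase(w, c) for w, c in grouped.items()]
    print(sorted(results))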
Although Hadoop provides a platform for data storage and parallel processing, the real value comes from add-on subprojects (ZooKeeper, Pig, Hive, Lucene, HBase, etc.), which add functionality and new capabilities to the platform. Most implementations of a Hadoop platform will include at least some of these subprojects; for example, an organization might choose HDFS as the primary distributed file system, HBase as the database to store billions of rows of data, and MapReduce as the framework for distributed processing.

A number of companies are emerging with different plans to help organizations use Hadoop: by extending support, by providing professional services, by producing tools that work alongside Hadoop and make it easier to use, or by providing a complete platform (based on Hadoop) that addresses many enterprise needs. It is worthwhile to look at a few of the players in this segment.

InfoSphere BigInsights
IBM took the open source Big Data technology, Hadoop, and extended it into an enterprise-ready Big Data platform.
IBM delivers a Hadoop platform that is hardened for enterprise use, with deep consideration for high availability, scalability, performance, ease of use, and the other things one normally expects of a solution deployed in a production environment.
InfoSphere BigInsights also flattens the time-to-value curve associated with Big Data analytics by providing the development and runtime environments developers need to build advanced analytical applications, along with tools for business users to analyze the data.
Cloudera CDH
(Cloudera's Distribution for Hadoop)
Cloudera delivers an integrated Apache Hadoop-based stack containing all the components needed for production use, tested and packaged to work together. It incorporates only software from open source projects (no forks or proprietary underpinnings) and comes with Cloudera Manager, an end-to-end management application for Apache Hadoop that includes features such as proactive health checks and intelligent log management.
M5
MapR’s M5 makes Hadoop more reliable (full data protection, no single point of failure), more affordable, more manageable, better performing, and significantly easier to use.


To put this in perspective, Hadoop should never be considered a replacement for relational databases or data warehousing, but something that will coexist with and complement the traditional data store to provide richer capabilities to the organization. While traditional warehouses are ideal for analyzing structured data from various systems, the sheer magnitude of unstructured and semi-structured data involved makes it very sensible to use the cheap cycles of server farms to transform masses of unstructured data with low information density into smaller amounts of dense, structured data that is then loaded into a traditional database for further analysis.

To conclude, open source Hadoop offers a great deal of potential for enterprises to harness data (structured, semi-structured, or with no structure at all) that was until now difficult to manage and analyze. Hadoop is also gaining wider acceptance, with vendors coming out with various Hadoop-based stacks that provide a significantly better user experience.

Friday, May 25, 2012

NoSQL Database in Big Data Era


For quite some time, the on-premises relational database has been the undisputed model for database management in most organizations. But now, options such as cloud-based and NoSQL databases are gaining popularity as an alternative model.

Some of the reasons for enterprises to look at NoSQL are as follows. First, the volumes of data being stored have increased massively, and managing them with a single relational database management system is becoming practically impossible; this has led enterprises to look at their database scaling options, horizontal or vertical. Vertical scaling forces the enterprise into a high-investment mode to buy ever more powerful hardware, whereas horizontal scaling uses inexpensive commodity hardware. Since traditional relational databases do not scale out easily on commodity clusters, while NoSQL databases are usually designed with low-cost commodity hardware in mind, enterprises seeking these economic benefits are moving towards horizontal scaling based on NoSQL database systems to meet the demand for powerful data processing. And finally, change management is becoming a big challenge in large production relational database systems; the realization that NoSQL imposes comparatively few data-model restrictions, together with the fact that a NoSQL key-value store allows an application to store virtually any structure it wants in a data element, is compelling enterprises to look at alternatives to the relational database system.
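
To illustrate that last point, here is a deliberately tiny Python sketch using a plain dict as a stand-in for a key-value store: the application, not the database, decides the structure of each value, so two records in the same logical "table" need not share a schema.

    # A plain dict standing in for a key-value store: the application,
    # not the database, decides what structure each value has.
    store = {}

    # Two "user" records with different shapes -- no schema change needed.
    store["user:1001"] = {"name": "Asha", "emails": ["asha@example.com"]}
    store["user:1002"] = {"name": "Ravi", "address": {"city": "Chennai"},
                          "tags": ["beta"]}

    # Lookups are by key; the store neither knows nor cares about the shape.
    print(store["user:1001"]["name"])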

Even though NoSQL databases look promising, factors like product maturity, the availability of expertise in the market, and the level of enterprise support offered by vendors are hindering adoption in many mainstream enterprises. It is worthwhile to look at a few of the products in this segment.


FlockDB
An open source, distributed, fault-tolerant graph database (a data store made up of a network of nodes, storing the relationships between nodes). Promoted by Twitter, which built it to store its database of users and manage their relationships to one another. It is simpler than more powerful graph databases such as Neo4j, but it has not gained major acceptance in the community so far.
Cassandra
An open source distributed database system based on a key-value store. Promoted by Facebook as a good tool for tracking large amounts of data across a network of computers (such as Facebook status updates), it is built on a principle of "eventual consistency" rather than perfect consistency. It is well accepted in the community, and many companies, including Acunu and DataStax, offer commercial support for Cassandra.
CouchDB
CouchDB stores documents, each made up of a set of pairs that link a key with a value. It searches for documents using two functions that map and reduce the data: the map function decides which documents to include and what each emits, and the reduce function aggregates the results. Provides interfaces based on JavaScript/JSON and enjoys rapidly increasing community support.
MongoDB
An open source document database system. MongoDB stores structured data as JSON-like documents with dynamic schemas and allows direct data queries using JavaScript (a sketch of a typical document-store query appears after this list).
Riak
Riak is one of the most sophisticated data stores, and it comes in open source and enterprise variants. Built on the principle that "eventual consistency is no excuse for losing data", it offers most of the features found in the others and adds more control over replication. Although the basic structure stores pairs of keys and values, the options for retrieving them and guaranteeing their consistency are quite rich.
SimpleDB
A highly available and flexible non-relational data store promoted by Amazon. Built on the principle of eventual consistency. Provides a simple web services interface to create and store multiple data sets and to query the data.
HBase
An open source, distributed, column-oriented store modeled after Google's BigTable. Written in Java and developed under Apache, it runs on top of the Hadoop Distributed File System (HDFS) and provides BigTable-like capabilities for Hadoop. HBase is not an "eventually consistent" data store; it is designed to be strongly consistent, even if that consistency causes availability problems. Provides programmatic access through a Java API, Thrift, and REST.
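
To make the document-store model concrete, here is a minimal sketch of the kind of query MongoDB supports, using the pymongo driver. The database, collection, and documents below are made up for illustration, and a MongoDB server is assumed to be running locally.

    # Minimal pymongo sketch: insert JSON-like documents with different
    # shapes into one collection, then query by a nested field. Assumes
    # a local mongod and the pymongo driver; the data is hypothetical.
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    orders = client["demo_db"]["orders"]       # hypothetical db/collection

    orders.insert_one({"order_id": 1, "customer": {"city": "Chennai"},
                       "total": 120.0})
    orders.insert_one({"order_id": 2, "customer": {"city": "Pune"},
                       "rush": True})

    # Query on a nested field -- no schema was declared up front.
    for doc in orders.find({"customer.city": "Chennai"}):
        print(doc["order_id"], doc.get("total"))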

Monday, April 30, 2012

The new frontier in Business Analytics: Big Data


Most organizations have access to tons and tons of information, but they don’t know how to get value out of it because most of it is in an unstructured or semi-structured format. Yet many of these organizations, witnessing the data explosion around them, have started to realize that analyzing these large data sets (or Big Data) will sooner or later become a key basis of competition and growth.

Nowadays, most organizations have taken the first step in data analytics by putting data warehouses in place. Traditional data warehouses are mostly for analyzing structured data from various systems and producing insights with known and relatively stable measurements. What goes unaddressed in the traditional solution is the unstructured or semi-structured data, which forms more than 70% of the data generated in an organization: email communication, server logs, customer service call logs, status updates on social media sites, and so on.

So what characteristics should a data set have to qualify as Big Data? According to IBM, Big Data has three important characteristics: Volume, Variety, and Velocity, that is, the amount of data involved, the types of data, and the speed at which the data is generated for analysis.

Big Data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. There are many, and to begin with, let us get familiar with NoSQL and Hadoop, the technologies most synonymous with Big Data, in my subsequent blog posts.