Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure.
The important innovation of MapReduce is the ability to take a query over a dataset, divide it, and run it in parallel over multiple nodes. Distributing the computation solves the issue of data too large to fit onto a single machine. Combine this technique with commodity Linux servers and you have a cost-effective alternative to massive computing arrays.
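To make the idea concrete, here is a toy word count showing the map, shuffle, and reduce phases. It is written in TypeScript purely as an illustration and only simulates the split across nodes; real Hadoop jobs are written against the Java MapReduce API mentioned below.

```typescript
// Toy word count: split the input into chunks, map each chunk
// independently (the part Hadoop runs in parallel on many nodes),
// group the intermediate pairs, then reduce each group.

type Pair = [word: string, count: number];

// Map phase: each "node" turns its chunk of text into (word, 1) pairs.
function map(chunk: string): Pair[] {
  return chunk
    .toLowerCase()
    .split(/\W+/)
    .filter(Boolean)
    .map((w): Pair => [w, 1]);
}

// Shuffle phase: group pairs from all chunks by word.
function shuffle(pairs: Pair[]): Map<string, number[]> {
  const groups = new Map<string, number[]>();
  for (const [word, n] of pairs) {
    groups.set(word, [...(groups.get(word) ?? []), n]);
  }
  return groups;
}

// Reduce phase: collapse each group into a final count.
function reduce(groups: Map<string, number[]>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const [word, ones] of groups) {
    counts.set(word, ones.reduce((a, b) => a + b, 0));
  }
  return counts;
}

const chunks = ["big data is big", "data about data"]; // one chunk per "node"
console.log(reduce(shuffle(chunks.flatMap(map))));
// big -> 2, data -> 3, is -> 1, about -> 1
```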
Programming Hadoop at the MapReduce level is a case of working with the Java APIs, and manually loading data files into HDFS (Hadoop Distributed File System).
The Hadoop ecosystem offers two solutions for making this programming easier: Pig and Hive.
Pig is a programming language that simplifies the common tasks of working with Hadoop: loading data, expressing transformations on the data, and storing the final results.
Hive enables Hadoop to operate as a data warehouse. It superimposes structure on data in HDFS and then permits queries over the data using a familiar SQL-like syntax. As with Pig, Hive's core capabilities are extensible.
Choosing between Hive and Pig can be confusing. Hive is more suitable for data warehousing tasks, with predominantly static structure and the need for frequent analysis. Hive's closeness to SQL makes it an ideal point of integration between Hadoop and other business intelligence tools.
Pig gives the developer more agility for the exploration of large datasets, allowing the development of succinct scripts for transforming data flows for incorporation into larger applications.
Improved interoperability with the rest of the data world is provided by Sqoop and Flume. Sqoop is a tool designed to import data from relational databases into Hadoop, either directly into HDFS or into Hive. Flume is designed to import streaming flows of log data directly into HDFS.
Hive's SQL friendliness means that it can be used as a point of integration with the vast universe of database tools capable of making connections via JDBC or ODBC database drivers.
As computing nodes can come and go, members of the cluster need to synchronize with each other, know where to access services, and know how they should be configured. This is the purpose of ZooKeeper.
The Oozie component provides features to manage the workflow and dependencies, removing the need for developers to code custom solutions.
Ambari is intended to help system administrators deploy and configure Hadoop, upgrade clusters, and monitor services. Through an API, it may be integrated with other system management tools.
Whirr is a highly complementary component. It offers a way of running services, including Hadoop, on cloud platforms. Whirr is cloud neutral and currently supports the Amazon EC2 and Rackspace services.
Every organization's data are diverse and particular to their needs. However, there is much less diversity in the kinds of analyses performed on the data. The Mahout project is a library of Hadoop implementations of common analytical computations. Use cases include user collaborative filtering, user recommendations, clustering, and classification.
Big data is data that exceeds the processing capacity of conventional database systems.
The value of big data to an organization falls into two categories: analytical use and enabling new products.
Input data to big data systems could be chatter from social networks, web server logs, traffic flow sensors, satellite imagery, broadcast audio streams, banking transactions, MP3s of rock music, the content of web pages, scans of government documents, GPS trails, telemetry from automobiles, financial market data; the list goes on.
To clarify matters, the three V's of Volume, Velocity and Variety are commonly used to characterize different aspects of big data.
The benefit gained from the ability to process large amounts of information is the main attraction of big data analytics.
Many companies already have large amounts of archived data, perhaps in the form of logs, but not the capacity to process it.
It's not just the velocity of the incoming data that's the issue: it's possible to stream fast-moving data into bulk storage for later batch processing.
Rarely does data present itself in a form perfectly ordered and ready for processing. A common theme in big data systems is that the source data is diverse, and doesn't fall into neat relational structures. It could be text from social networks, image data, a raw feed directly from a sensor source. None of these things come ready for integration into an application.
There are many methods to secure your API, but two are the most widely used: token-based authentication and OAuth 2 + SSL.
For most APIs, I prefer a simple token-based authentication, where the token is a random hash assigned to the user and they can reset it at any point if it has been stolen. Allow the token to be passed in through POST or an HTTP header.
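As a rough sketch of the token approach, assuming an Express-style Node server (the header name, the in-memory token set, and the /api/friendlist route are all illustrative):

```typescript
import express, { NextFunction, Request, Response } from "express";

const app = express();

// Illustrative token store; a real API would keep per-user tokens in a
// database so a user can revoke and reissue theirs at any time.
const validTokens = new Set(["3f7a9c0d5e8b41fa"]);

app.use(express.json());

// Accept the token either in a custom header or in the POST body.
app.use((req: Request, res: Response, next: NextFunction) => {
  const token = req.header("X-Api-Token") ?? req.body?.token;
  if (!token || !validTokens.has(token)) {
    res.status(401).json({ error: "Missing or invalid API token." });
    return;
  }
  next();
});

app.get("/api/friendlist", (_req, res) => {
  res.json([{ id: 1, name: "Alice" }]);
});

app.listen(3000);
```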
Another very good option is OAuth 2 + SSL. You should be using SSL anyway, but OAuth 2 is reasonably simple to implement on the server side, and libraries are available for many common programming languages.
Here are some other important things to keep in mind:
Don't expose actions as part of the URL, for example /user/delete/{id}; let the HTTP verb describe the action instead (DELETE /user/{id}).
Check that the content type the client asks for is one your API actually supports. If it doesn't, then send back an error message such as a 406 Not Acceptable response.
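Here is a minimal sketch of that check, again assuming an Express-style server that only produces JSON (the route is illustrative):

```typescript
import express, { NextFunction, Request, Response } from "express";

const app = express();

// Reject requests whose Accept header rules out JSON, the only format
// this example API can produce.
app.use((req: Request, res: Response, next: NextFunction) => {
  if (!req.accepts("application/json")) {
    res.status(406).json({ error: "This API only serves application/json." });
    return;
  }
  next();
});

app.get("/api/friendlist", (_req, res) => {
  res.json([{ id: 1, name: "Alice" }]);
});

app.listen(3000);
```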
Let's say you have an API at http://niisar.com/api/friendlist and it responds with JSON data. This seems fine at first. But what happens when you need to modify the format of the JSON? Everyone that's already integrated with you is going to break. Oops.
So do some planning ahead, and version your API from the outset, explicitly incorporating a version number into the URL, like http://niisar.com/api/v1/friendlist, so that people rely on v1 of the API.
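One common way to do this, sketched here with an Express-style router (the v2 handler and its response shape are made up for illustration), is to mount each version under its own URL prefix:

```typescript
import express from "express";

const app = express();

// v1 keeps returning the original JSON shape, so existing clients never break.
const v1 = express.Router();
v1.get("/friendlist", (_req, res) => {
  res.json([{ id: 1, name: "Alice" }]);
});

// v2 is free to change the format; only clients that opt in to /api/v2 see it.
const v2 = express.Router();
v2.get("/friendlist", (_req, res) => {
  res.json({ friends: [{ id: 1, displayName: "Alice" }] });
});

app.use("/api/v1", v1);
app.use("/api/v2", v2);

app.listen(3000);
```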
Also use inheritance or a shared architecture to reuse the same naming conventions and data handling consistently throughout your API.
Finally, you need to record and publish a changelog to show differences between versions of your API so that users know exactly how to upgrade.
Documentation may be boring, but if you want anyone to use your API, documentation is essential. You've simply got to get this right. It's the first thing users will see, so in some ways it's like the gift wrap. Present it well, and people are more likely to use your API.
Fortunately, there are a number of software tools that facilitate and simplify the task of generating documentation. Or you can write something yourself for your API.
But what separates great documentation from adequate documentation is the inclusion of usage examples and, ideally, tutorials. This is what helps the user understand your API and where to start. It orients them and helps them load your API into their brain.
Make sure that people can get up and running with at least a basic implementation of your API, even if it's just following a tutorial, within a few minutes. I think 15 minutes is a good goal.
You can use CSS to change the appearance of your web page when it's printed on paper. You can specify one font for the screen version and another for the print version.
You just need to press Ctrl + P to print, or call the print function from JavaScript: window.print(); Both do the same thing.
The CSS for printing looks something like the example below.
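This is an illustrative print stylesheet rather than one from any specific site: the selectors (nav, .sidebar, .no-print) and the font choices are placeholders to adapt to your own page.

```css
/* Styles applied only when the page is printed (or previewed with Ctrl + P). */
@media print {
  /* Use a print-friendly serif font instead of the screen font. */
  body {
    font-family: Georgia, "Times New Roman", serif;
    color: #000;
    background: #fff;
  }

  /* Hide elements that make no sense on paper. */
  nav,
  .sidebar,
  .no-print {
    display: none;
  }
}
```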