Pydoop: a Python interface for Apache Hadoop
Over the years, the list of tools for big data analysis has grown steadily. However, not all of them offer a multi-language API. Apache Hadoop, for instance, is written in Java, and its native API expects users to write their applications in Java. Given the popularity of Python across many domains, most notably scientific computing, it is highly desirable to bring its rich toolset to the Hadoop environment.
Pydoop is a Python interface for Apache Hadoop that covers both HDFS access and MapReduce job submission. It is:
- simple to use;
- compatible with most existing Python libraries, including SciPy and NumPy (it’s built as a CPython extension).
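
As a rough illustration of the HDFS side only, the sketch below counts words in a text file stored on HDFS. It assumes Pydoop is installed and a Hadoop cluster is reachable; the `pydoop.hdfs.open` call and its `"rt"` mode follow the Pydoop documentation as best recalled and may differ between versions, and the function names and the example path are hypothetical.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words over any iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


def count_words_in_hdfs(path):
    """Count words in a text file on HDFS.

    Assumption: Pydoop is installed and `path` (hypothetical) exists on a
    reachable HDFS instance.
    """
    # Imported lazily so that count_words() stays usable without Hadoop.
    import pydoop.hdfs as hdfs

    # Assumption: "rt" opens the file in text mode, as in recent Pydoop releases.
    with hdfs.open(path, "rt") as f:
        return count_words(f)
```

With a cluster available, a call such as `count_words_in_hdfs("/user/me/input.txt")` would read the file through Pydoop, while the counting itself is plain Python and can be swapped for NumPy or SciPy routines.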
Intended audience: anyone who needs to process huge amounts of data in Python.
Topics: distributed computing, scientific computing, big data analysis.