Pydoop: a Python interface for Apache Hadoop



Over the years, the list of tools for big data analysis has grown steadily. However, not all of them offer a multi-language API. Apache Hadoop, for instance, is written in Java and expects users to write their applications in Java. Given Python's popularity across many domains, most notably scientific computing, it is highly desirable to bring its rich toolset to the Hadoop environment.


Pydoop is a Python interface for Apache Hadoop that covers both HDFS access and MapReduce job submission.
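As a rough illustration of the MapReduce model that Pydoop makes available to Python code, here is a minimal word-count sketch in plain Python. It does not import Pydoop or require a Hadoop cluster: the function names (`map_words`, `reduce_counts`, `run_local`) are hypothetical, and the shuffle phase is simulated locally with a dictionary. In a real Pydoop job the same map and reduce logic would be handed to Hadoop through Pydoop's own API, which should be checked against the Pydoop documentation.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in a line of text."""
    for word in line.split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_local(lines):
    """Simulate the shuffle locally: group map output by key, then reduce.

    This stands in for the work Hadoop does between the map and reduce
    phases; it is an assumption of this sketch, not part of Pydoop.
    """
    groups = defaultdict(list)
    for line in lines:
        for word, count in map_words(line):
            groups[word].append(count)
    return dict(reduce_counts(w, c) for w, c in sorted(groups.items()))


result = run_local(["big data big tools", "data in python"])
print(result)
```

Keeping the map and reduce steps as small, side-effect-free functions makes them easy to test locally before running them on a cluster.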

Innovative features

  • simple to use;
  • compatible with most existing Python libraries, including SciPy and NumPy (it’s built as a CPython extension).

Potential users

Anyone who needs to process large amounts of data in Python.

Impact sectors

Distributed computing - scientific computing - big data analysis.

Other resources

  2. S. Leo, G. Zanetti. Pydoop: a Python MapReduce and HDFS API for Hadoop. In HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pages 819-825. Chicago, Illinois, June 21-25, 2010.
