{"id":5272,"date":"2024-07-28T23:37:41","date_gmt":"2024-07-28T21:37:41","guid":{"rendered":"https:\/\/nwww.crs4.it\/?p=5272"},"modified":"2025-04-29T16:12:55","modified_gmt":"2025-04-29T14:12:55","slug":"pydoop","status":"publish","type":"post","link":"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/","title":{"rendered":"Pydoop"},"content":{"rendered":"<p><script src=\"\/crs4_js\/people-details.js\"><\/script><\/p>\n<h3>Pydoop: a Python interface for Apache Hadoop<\/h3>\n<div class=\"sm_hr\"><\/div>\n<h4>Contacts<\/h4>\n<div><a href=\"javascript:PeopleDetails.showAuthorDetails('31')\">Simone Leo<\/a>, Gianluigi Zanetti. E-mail:\u00a0<a class=\"linkurl\" href=\"mailto:valorisation@crs4.it\">valorisation@crs4.it<\/a><\/div>\n<h4>Challenge<\/h4>\n<p>Over the years, the list of tools for big data analysis has kept growing. However, not all of them offer a multi-language API. Apache Hadoop, for instance, is written in Java and expects users to write their applications in Java. Due to the overwhelming popularity of Python across all domains, most notably scientific computing, it is highly desirable to bring its rich toolset to the Hadoop environment.<\/p>\n<h4>Overview<\/h4>\n<p>Pydoop is a Python interface for Apache Hadoop that covers both HDFS access and MapReduce job submission.<\/p>\n<h4>Innovative features<\/h4>\n<ul>\n<li>simple to use;<\/li>\n<li>compatible with most existing Python libraries, including SciPy and NumPy (it\u2019s built as a CPython extension).<\/li>\n<\/ul>\n<h4>Potential users<\/h4>\n<p>Anyone who needs to process huge amounts of data in Python.<\/p>\n<h4>Impact sectors<\/h4>\n<p>Distributed computing &#8211; scientific computing &#8211; big data analysis.<\/p>\n<h4>Other resources<\/h4>\n<ol>\n<li><a href=\"https:\/\/crs4.github.io\/pydoop\/\">https:\/\/crs4.github.io\/pydoop\/<\/a><\/li>\n<li><a href=\"https:\/\/dl.acm.org\/citation.cfm?id=1851594\" class=\"broken_link\">S. Leo, G. 
Zanetti, Pydoop: a Python MapReduce and HDFS API for Hadoop. Proceeding HPDC &#8217;10, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. Pages 819-825 Chicago, Illinois &#8211; June 21 &#8211; 25, 2010.<\/a><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Pydoop a Python interface for Apache Hadoop Contacts Simone Leo, Gianluigi Zanetti. E-mail:\u00a0valorisation@crs4.it Challenge Over the years, the list of tools for big data analysis kept growing constantly. However, not all of them offer a multi-language API. Apache Hadoop, for instance, is written in Java and expects users to write their applications in Java. Due [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[93,90],"tags":[],"class_list":["post-5272","post","type-post","status-publish","format-standard","hentry","category-life-sciences","category-technology-catalogue"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.9 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Pydoop - CRS4<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Pydoop - CRS4\" \/>\n<meta property=\"og:description\" content=\"Pydoop a Python interface for Apache Hadoop Contacts Simone Leo, Gianluigi Zanetti. E-mail:\u00a0valorisation@crs4.it Challenge Over the years, the list of tools for big data analysis kept growing constantly. However, not all of them offer a multi-language API. 
Apache Hadoop, for instance, is written in Java and expects users to write their applications in Java. Due [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/\" \/>\n<meta property=\"og:site_name\" content=\"CRS4\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/pages\/CRS4\/153623948010688\" \/>\n<meta property=\"article:published_time\" content=\"2024-07-28T21:37:41+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-29T14:12:55+00:00\" \/>\n<meta name=\"author\" content=\"Paolo Sirigu\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/\"},\"author\":{\"name\":\"Paolo Sirigu\",\"@id\":\"https:\/\/www.crs4.it\/en\/#\/schema\/person\/d6d18aa42b5f98236124cab354b7f22f\"},\"headline\":\"Pydoop\",\"datePublished\":\"2024-07-28T21:37:41+00:00\",\"dateModified\":\"2025-04-29T14:12:55+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/\"},\"wordCount\":190,\"publisher\":{\"@id\":\"https:\/\/www.crs4.it\/en\/#organization\"},\"articleSection\":[\"life sciences\",\"Technology catalogue\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/\",\"url\":\"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/\",\"name\":\"Pydoop - 
CRS4\",\"isPartOf\":{\"@id\":\"https:\/\/www.crs4.it\/en\/#website\"},\"datePublished\":\"2024-07-28T21:37:41+00:00\",\"dateModified\":\"2025-04-29T14:12:55+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.crs4.it\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Pydoop\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.crs4.it\/en\/#website\",\"url\":\"https:\/\/www.crs4.it\/en\/\",\"name\":\"CRS4\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.crs4.it\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.crs4.it\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.crs4.it\/en\/#organization\",\"name\":\"CRS4\",\"url\":\"https:\/\/www.crs4.it\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.crs4.it\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.crs4.it\/wp-content\/uploads\/CRS4.trentennale_3.png\",\"contentUrl\":\"https:\/\/www.crs4.it\/wp-content\/uploads\/CRS4.trentennale_3.png\",\"width\":1518,\"height\":577,\"caption\":\"CRS4\"},\"image\":{\"@id\":\"https:\/\/www.crs4.it\/en\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/pages\/CRS4\/153623948010688\",\"https:\/\/www.instagram.com\/crs4.it\/\",\"https:\/\/www.youtube.com\/CRS4video\",\"https:\/\/www.linkedin.com\/company\/crs4\",\"https:\/
\/www.slideshare.net\/CRS4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.crs4.it\/en\/#\/schema\/person\/d6d18aa42b5f98236124cab354b7f22f\",\"name\":\"Paolo Sirigu\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.crs4.it\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b8b44484d86fad28cb7ed89c8cf7ca1057f60adcf3113c1a0f24d057dbf8005d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b8b44484d86fad28cb7ed89c8cf7ca1057f60adcf3113c1a0f24d057dbf8005d?s=96&d=mm&r=g\",\"caption\":\"Paolo Sirigu\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Pydoop - CRS4","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/","og_locale":"en_US","og_type":"article","og_title":"Pydoop - CRS4","og_description":"Pydoop a Python interface for Apache Hadoop Contacts Simone Leo, Gianluigi Zanetti. E-mail:\u00a0valorisation@crs4.it Challenge Over the years, the list of tools for big data analysis kept growing constantly. However, not all of them offer a multi-language API. Apache Hadoop, for instance, is written in Java and expects users to write their applications in Java. 
Due [&hellip;]","og_url":"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/","og_site_name":"CRS4","article_publisher":"https:\/\/www.facebook.com\/pages\/CRS4\/153623948010688","article_published_time":"2024-07-28T21:37:41+00:00","article_modified_time":"2025-04-29T14:12:55+00:00","author":"Paolo Sirigu","twitter_card":"summary_large_image","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/#article","isPartOf":{"@id":"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/"},"author":{"name":"Paolo Sirigu","@id":"https:\/\/www.crs4.it\/en\/#\/schema\/person\/d6d18aa42b5f98236124cab354b7f22f"},"headline":"Pydoop","datePublished":"2024-07-28T21:37:41+00:00","dateModified":"2025-04-29T14:12:55+00:00","mainEntityOfPage":{"@id":"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/"},"wordCount":190,"publisher":{"@id":"https:\/\/www.crs4.it\/en\/#organization"},"articleSection":["life sciences","Technology catalogue"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/","url":"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/","name":"Pydoop - 
CRS4","isPartOf":{"@id":"https:\/\/www.crs4.it\/en\/#website"},"datePublished":"2024-07-28T21:37:41+00:00","dateModified":"2025-04-29T14:12:55+00:00","breadcrumb":{"@id":"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.crs4.it\/en\/technology-catalogue\/pydoop\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.crs4.it\/en\/"},{"@type":"ListItem","position":2,"name":"Pydoop"}]},{"@type":"WebSite","@id":"https:\/\/www.crs4.it\/en\/#website","url":"https:\/\/www.crs4.it\/en\/","name":"CRS4","description":"","publisher":{"@id":"https:\/\/www.crs4.it\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.crs4.it\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.crs4.it\/en\/#organization","name":"CRS4","url":"https:\/\/www.crs4.it\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.crs4.it\/en\/#\/schema\/logo\/image\/","url":"https:\/\/www.crs4.it\/wp-content\/uploads\/CRS4.trentennale_3.png","contentUrl":"https:\/\/www.crs4.it\/wp-content\/uploads\/CRS4.trentennale_3.png","width":1518,"height":577,"caption":"CRS4"},"image":{"@id":"https:\/\/www.crs4.it\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/pages\/CRS4\/153623948010688","https:\/\/www.instagram.com\/crs4.it\/","https:\/\/www.youtube.com\/CRS4video","https:\/\/www.linkedin.com\/company\/crs4","https:\/\/www.slideshare.net\/CRS4"]},{"@type":"Person","@id":"https:\/\/www.crs4.it\/en\/#\/schema\/person\/d6d18aa42b5f98236124cab354b7f22f","name":"Paolo 
Sirigu","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.crs4.it\/en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b8b44484d86fad28cb7ed89c8cf7ca1057f60adcf3113c1a0f24d057dbf8005d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b8b44484d86fad28cb7ed89c8cf7ca1057f60adcf3113c1a0f24d057dbf8005d?s=96&d=mm&r=g","caption":"Paolo Sirigu"}}]}},"_links":{"self":[{"href":"https:\/\/www.crs4.it\/en\/wp-json\/wp\/v2\/posts\/5272","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.crs4.it\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.crs4.it\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.crs4.it\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.crs4.it\/en\/wp-json\/wp\/v2\/comments?post=5272"}],"version-history":[{"count":1,"href":"https:\/\/www.crs4.it\/en\/wp-json\/wp\/v2\/posts\/5272\/revisions"}],"predecessor-version":[{"id":5273,"href":"https:\/\/www.crs4.it\/en\/wp-json\/wp\/v2\/posts\/5272\/revisions\/5273"}],"wp:attachment":[{"href":"https:\/\/www.crs4.it\/en\/wp-json\/wp\/v2\/media?parent=5272"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.crs4.it\/en\/wp-json\/wp\/v2\/categories?post=5272"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.crs4.it\/en\/wp-json\/wp\/v2\/tags?post=5272"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}