Pig (software)

Pig (software)
Type	Analytics
Developer	Apache Software Foundation
Initial release	11 September 2008 and 16 May 2011
Latest release	0.17.0 (19 June 2017)
Operating system	Cross-platform
Written in	Java
License	Apache License (free license)
Website	pig.apache.org

Pig^[1] is a high-level platform for creating MapReduce programs that run on Apache Hadoop. The language for this platform is called Pig Latin.

Pig Latin abstracts programming away from the Java MapReduce idiom into a higher-level notation, much as SQL does for RDBMSs. Pig Latin can be extended with user-defined functions (UDFs), which users can write in Java, Python, JavaScript, Ruby, or Groovy^[2] and then call directly from the language.
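
As an illustration of the UDF mechanism described above, the following is a minimal sketch of a Python UDF. It assumes Pig's `pig_util` helper module, which provides the `outputSchema` decorator when Pig runs the script as a CPython UDF; a stand-in decorator is defined so that the file also runs on its own. The function name `normalize_word` and the schema string are hypothetical, chosen only for this example.

```python
# Sketch of a Python UDF for Pig Latin (illustrative only).
try:
    # Assumption: pig_util is importable when Pig executes the script.
    from pig_util import outputSchema
except ImportError:
    # Stand-in so the sketch can be run and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema("word:chararray")
def normalize_word(word):
    """Lower-case a word and strip surrounding whitespace; pass nulls through."""
    if word is None:
        return None
    return word.strip().lower()
```

Under the same assumptions, a Pig Latin script would register the file with a statement along the lines of `REGISTER 'udfs.py' USING streaming_python AS udfs;` (the file name is hypothetical) and then call `udfs.normalize_word(word)` inside a `FOREACH ... GENERATE` statement.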

Pig was originally developed at Yahoo Research around 2006^[3] to give researchers an ad hoc way of creating and executing MapReduce jobs on very large data sets. In 2007^[4] it moved to the Apache Software Foundation^[5].

Example

An example of a "word count" program in Pig Latin:

 input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);
 
 -- Extract words from each line and put them into a pig bag
 -- datatype, then flatten the bag to get one word on each row
 words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
 
 -- filter out any words that are just white spaces
 filtered_words = FILTER words BY word MATCHES '\\w+';
 
 -- create a group for each word
 word_groups = GROUP filtered_words BY word;
 
 -- count the entries in each group
 word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
 
 -- order the records by count
 ordered_word_count = ORDER word_count BY count DESC;
 STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';

The program above generates tasks that execute in parallel and can be distributed across multiple machines in a Hadoop cluster, in order to count the words in a data set such as all the web pages on the internet.
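
To make the data flow of the script above concrete, the following Python sketch mimics each Pig Latin statement on a small in-memory list. It is only an analogy: it runs on a single machine, whitespace splitting is a simplification of `TOKENIZE`, and the helper name `word_count` and the sample lines are hypothetical.

```python
import re
from collections import Counter


def word_count(input_lines):
    """Emulate the Pig Latin word-count script on an in-memory list of lines."""
    # FOREACH ... GENERATE FLATTEN(TOKENIZE(line)): one word per row.
    # Whitespace splitting is a simplification of Pig's TOKENIZE.
    words = [word for line in input_lines for word in line.split()]

    # FILTER ... BY word MATCHES '\\w+': MATCHES must cover the whole string.
    filtered_words = [w for w in words if re.fullmatch(r"\w+", w)]

    # GROUP ... BY word, then COUNT the entries of each group.
    counts = Counter(filtered_words)

    # ORDER ... BY count DESC; ties are additionally sorted by word here
    # so that the result is deterministic (an extra choice of this sketch).
    return sorted(((n, w) for w, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox !"]
    for count, word in word_count(sample):
        print(count, word)
```

In Pig, these steps are compiled into jobs that run in parallel across the cluster, which this single-process sketch does not attempt to reproduce.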

Notes

1. ^ Hadoop: Apache Pig, pig.apache.org. Retrieved 2 September 2011.
2. ^ Pig user defined functions, pig.apache.org. Retrieved 3 May 2013.
3. ^ Yahoo Blog: Pig – The Road to an Efficient High-level language for Hadoop, developer.yahoo.com. Retrieved 23 May 2015 (archived from the original on 3 February 2016).
4. ^ Pig into Incubation at the Apache Software Foundation, developer.yahoo.com. Retrieved 23 May 2015 (archived from the original on 3 February 2016).
5. ^ The Apache Software Foundation, apache.org. Retrieved 1 November 2010.

External links

Official website, pig.apache.org.
Pig source repository, svn.apache.org.
Bug tracker, issues.apache.org.


