Practical Labs - #SMARTDataSprint

Universidade Nova de Lisboa | NOVA FCSH | iNOVA Media Lab
Digital Media Winter Institute | SMART Data Sprint 2021
The current state of platformisation | 01 – 05 February 2021 | 9:00 – 18:00 (Lisbon time) #SMARTdatasprint | Research Blog | Facebook Group: SMART Data Sprint | @iNOVAmedialab

SLACK: SMARTDataSprint 2021

˚˚ Practical Labs 2021 (timetable)

MONDAY (AFTERNOON), 1 Feb 2021
BEGINNERS TO INTERMEDIATE Zoom Link		INTERMEDIATE TO ADVANCED Zoom Link
13h00 – 13h50	Querying digital platforms & extracting data Janna Joceli Omena	13h00 – 14h00	Opening up the black-box of mobile apps’ traffic Jason Chao
13h50 – 15h00	Using APIs with Facepager Jakob Jünger	14h00 – 15h30	Studying memes through platform data spreadsheets Elena Pilipets
10m break
15h10 – 16h40	Content Analysis on Instagram Ana Marta M. Flores	15h40 – 17h10	Visualising image clusters with Gephi Density Design Lab Team
16h40 – 18h00	Basic tricks for working with data Fábio Gouveia	17h10 – 18h00	Querying digital platforms & extracting data Janna Joceli Omena

TUESDAY (MORNING), 2 Feb 2021
BEGINNERS TO INTERMEDIATE Zoom Link		INTERMEDIATE TO ADVANCED Zoom Link
09h00 – 10h00	Webscraping with Facepager Jakob Jünger	09h00 – 10h20	Using AI to enrich image data Jason Chao
10h00 – 10h45	Reading digital networks Janna Joceli Omena	10h20 – 11h40	Analysing Images by content similarity with computer vision (CLARIFAI) Density Design Lab Team
10h45 – 11h45	Gephi for Beginners Leonardo Melgaço

TUESDAY (AFTERNOON), 2 Feb 2021
BEGINNERS TO INTERMEDIATE Zoom Link		INTERMEDIATE TO ADVANCED Zoom Link		INTERMEDIATE TO ADVANCED Zoom Link
15h30 – 17h00	Visualising image clusters with Gephi Density Design Lab Team	15h30 – 18h00	Mapping Twitter social media networks with NodeXL Pro Marc Smith	15h30 – 16h30	Reading digital networks Janna Joceli Omena
17h00 – 18h00	Mapping gender issue-networks with Wikipedia: from co-word occurrences to topics of interest Leonardo Melgaço, Gracila Vilaça			16h30 – 18h00	Advanced tricks for working with data Fábio Gouveia

˚˚ Practical Labs 2021 (info)

1. Practical Lab	Querying digital platforms & extracting data
#slides/folder URL:	https://drive.google.com/file/d/1nzOKYJPnRoaof8Sbl4NwCt3DxnJglc6u/view?usp=sharing
2. Facilitators	Janna Joceli Omena
3. Short Description	This workshop addresses some aspects to be considered when querying digital platforms and extracting data. From the formulation of research questions as queries (query design) to the use of data extraction tools (software practices), we will reflect on situations in which both software and the researcher’s decision intervene, re-adjust and re-shape representations of online activity.
4. Requirements	We will work with YouTube Data Tools (Rieder, 2015) and Google Spreadsheets.

1. Practical Lab	Content analysis on Instagram
#slides URL/pdf:	http://bit.ly/instagram_AMF
2. Facilitators	Ana Marta M. Flores
3. Short Description	What kind of questions can one answer with access to Instagram data? We are going to work with metrics (like count, comment count, etc.) and content (image URLs, time/date, captions) from a specific Instagram dataset. To do so, we will start by organizing and cleaning the dataset using some shortcuts and cell formulas in Google Spreadsheets. Then, we will perform a preliminary content analysis by identifying and combining: (1) visual patterns/categories in the dataset; (2) high and low engagement posts; and (3) the most frequently used emojis. Finally, we will develop graphs with this same dataset in Raw Graphs. Visualizations such as a treemap (hierarchy), streamgraph (time series) or alluvial diagram (multi-categorical) can be built to better present the findings.
4. Requirements	Participants must have a basic knowledge of Google Spreadsheets. Web-based and open tools such as Text Analysis and Raw Graphs will be used in this Practical Lab.
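
As a rough illustration of steps (2) and (3) above, the following Python sketch ranks posts by engagement and counts emojis in captions. It is not part of the lab materials, which rely on spreadsheets and web tools. The sample rows, the field names, the definition of engagement as likes plus comments, and the approximate emoji character ranges are all assumptions made for this example.

```python
import re
from collections import Counter

# Rough approximation of emoji code points (assumption: covers common
# pictographs only; skin-tone modifiers and joined sequences are not handled).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Hypothetical rows standing in for an Instagram dataset export.
posts = [
    {"id": "a", "likes": 120, "comments": 8, "caption": "Sunset 🌅 at the pier"},
    {"id": "b", "likes": 15, "comments": 1, "caption": "Monday again 😴"},
    {"id": "c", "likes": 300, "comments": 40, "caption": "Beach day 🌅🌊"},
]


def engagement(post):
    """Engagement is taken here as likes plus comments (an assumption)."""
    return post["likes"] + post["comments"]


def rank_by_engagement(rows):
    """Return post ids ordered from highest to lowest engagement."""
    return [p["id"] for p in sorted(rows, key=engagement, reverse=True)]


def emoji_counts(rows):
    """Count emoji characters across all captions."""
    counts = Counter()
    for p in rows:
        counts.update(EMOJI_PATTERN.findall(p["caption"]))
    return counts


ranking = rank_by_engagement(posts)
top_emoji = emoji_counts(posts).most_common(1)
print(ranking, top_emoji)
```

The same ranking can be obtained in a spreadsheet by adding a likes-plus-comments column and sorting on it.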

1. Practical Lab	Studying memes through platform data spreadsheets
#slides/folder URL:	http://bit.ly/memes_data2021
2. Facilitators	Elena Pilipets
3. Short Description	This practical lab focuses on the possibilities of exploring memes and other visual vernaculars (e.g., screenshots, GIFs) through platform data spreadsheets. We will first practice how to filter and organize images according to their digital attributes (e.g., time of posting, engagement metrics, hashtags, image captions) in Google Spreadsheets. In the second step, we will learn how to download the images using a list of image URLs and extensions such as DownThemAll, and then analyze the result based on visual color patterns and temporality. To this end, we will use ImageSorter. In addition, this practical lab offers a discussion of variously designed data visualizations and their narrative possibilities with regard to: (1) the contextual situatedness of memes provided through co-tag relations; (2) the restrictions of looking only at the content with the most exposure; and (3) the patterns of image adaptation over time as an indicator of shifts in relations of relevance.
4. Requirements	Please create a Google account to use Google Drive and install the following extensions/visualization tools: DownThemAll for Google Chrome; Image Sorter Windows or Image Sorter Mac. During the workshop, we will also use text analysis tools such as TagCrowd and TextAnalysis. A paper by Sabine Niederer and Gabriele Colombo presenting a digital methods approach to image research can be found here.

1. Practical Lab	Using APIs with Facepager
#slides/folder URL:	Facepager Slides | Facepager Cheat sheet | Facepager Usergroup. Please use the hashtag #SMARTdatasprint when asking questions in the usergroup.
2. Facilitators	Jakob Jünger
3. Short Description	In this session you will learn how to use APIs from online platforms such as Facebook, YouTube, Twitter, or Wikipedia for automated data collection. After a short introduction to the basics of application programming interfaces, we will collect comments for text analysis and links for network analysis. The practical lab introduces you to Facepager, a versatile tool for automated data collection.
4. Requirements	Install and run the latest version of Facepager from https://github.com/strohne/Facepager. You will also need Excel, Numbers, R, Python or similar software for reading spreadsheet data.

1. Practical Lab	Webscraping with Facepager
#slides/folder URL:	Facepager Slides | Facepager Cheat sheet | Facepager Usergroup. Please use the hashtag #SMARTdatasprint when asking questions in the usergroup.
2. Facilitators	Jakob Jünger
3. Short Description	Webscraping refers to the automated extraction of data from webpages. It can be used whenever no API is available. After a short introduction to different techniques and hurdles we will extract data from news pages for text analysis and URLs for network analysis. The practical lab introduces you to Facepager, a versatile tool for automated data collection.
4. Requirements	Install and run the latest version of Facepager from https://github.com/strohne/Facepager. You will also need Excel, Numbers, R, Python or similar software for reading spreadsheet data.
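
To give a rough idea of what extracting URLs from a page for network analysis involves, here is a minimal Python sketch that uses only the standard library. It does not show how Facepager works internally. The sample HTML, the example.* domains and the helper names (LinkCollector, extract_edges) are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/story-1" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_edges(page_url, html):
    """Return sorted (source domain, target domain) pairs for external links."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    source = urlparse(page_url).netloc
    edges = set()
    for link in collector.links:
        target = urlparse(link).netloc
        if target and target != source:
            edges.add((source, target))
    return sorted(edges)


# Hypothetical page content; a real run would load the HTML from a news site.
SAMPLE_HTML = """
<html><body>
<a href="/politics/story-1">Internal story</a>
<a href="https://example.org/report">External report</a>
<a href="https://example.net/data?id=3">External data</a>
</body></html>
"""

edges = extract_edges("https://news.example.com/front", SAMPLE_HTML)
for source, target in edges:
    print(source, "->", target)
```

Each pair can serve as one edge in a domain-to-domain network; internal links are dropped here, which is a design choice for the example rather than a rule.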

1. Practical Lab	Basic tricks for working with data
#slides/folder URL:	Slides
2. Facilitators	Fábio Gouveia
3. Short Description	Dealing with data in text files can sometimes be challenging for beginners. Keeping track of where files come from, avoiding issues during import, and joining them may require knowledge that is not part of everyday office work. The main goal of this lab is to give some basic tips and tricks for dealing with files for further data analysis and visualization. This practical lab will focus on name attribution concerns, file preparation and data consolidation. A basic approach to shell (command prompt) usage and text code page issues will also be part of this practical lab.
4. Requirements	Participants need to have their own computer. We will mainly work with a simple text editor such as Notepad (Sublime Text 3 or Notepad++ is best) and a spreadsheet application such as Google Sheets or Excel. We will also briefly explore shell commands (command prompt) to perform some tasks, so administrator privileges may be necessary.
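
The sketch below illustrates, in Python rather than with the text editor and shell used in the lab, one way to consolidate several CSV files while keeping track of their origin and handling different code pages. The file names, the column names and the consolidate helper are hypothetical, and the encoding of each file is assumed to be known in advance.

```python
import csv
import tempfile
from pathlib import Path


def consolidate(paths, out_path, encodings):
    """Merge CSV files with identical headers into one UTF-8 file.

    A source_file column is appended so each row keeps track of its origin.
    `encodings` maps a file name to its code page; UTF-8 is the default.
    """
    header = None
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            enc = encodings.get(path.name, "utf-8")
            with open(path, newline="", encoding=enc) as f:
                reader = csv.reader(f)
                file_header = next(reader)
                if header is None:
                    header = file_header
                    writer.writerow(header + ["source_file"])
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {path.name}")
                for row in reader:
                    writer.writerow(row + [path.name])


# Hypothetical input files: same columns, different code pages.
workdir = Path(tempfile.mkdtemp())
(workdir / "2021-02-01_comments.csv").write_text(
    "author,text\nana,olá\n", encoding="utf-8")
(workdir / "2021-02-02_comments.csv").write_text(
    "author,text\njoão,até já\n", encoding="cp1252")

merged = workdir / "merged.csv"
consolidate(
    sorted(workdir.glob("*_comments.csv")),
    merged,
    {"2021-02-02_comments.csv": "cp1252"},
)

with open(merged, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

Putting the collection date at the start of each file name, as in the example, makes the files sort chronologically and keeps the origin readable in the merged output.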

1. Practical Lab	Advanced tricks for working with data
#slides/folder URL:	Slides
2. Facilitators	Fábio Gouveia
3. Short Description	Dealing with large numbers of text files is part of any data scientist’s regular activity. Challenges arise when you find that you can no longer rely on simple text editors or spreadsheets, even in their 64-bit versions. Also, structured file formats, although powerful, start to challenge the now intermediate data scientist. The main goal of this lab is to give some intermediate tips and tricks for dealing with files for further data analysis and visualization. This practical lab will focus on understanding some more structured file formats and performing some operations to prepare them for further use. Some shell (command prompt) usage, simple software execution and an introduction to the OpenRefine tool will also be part of this practical lab.
4. Requirements	Participants need to have their own computer. We will mainly work with a simple text editor such as Notepad (Sublime Text 3 or Notepad++ is best), spreadsheets such as Google Sheets or Excel, and OpenRefine. We will also explore shell commands (command prompt) to perform some tasks, so administrator privileges may be necessary. Knowledge related to the beginner practical labs is desirable.

1. Practical Lab	Using AI to enrich image data
#slides/folder URL:	https://github.com/jason-chao/workshops/blob/main/2021/Feb-SMART2021/enrich_image_data.md
2. Facilitators	Jason Chao
3. Short Description	This practical lab will introduce the affordances of Google Vision API in social research and the tool Memespector-GUI. Google Vision API is a powerful tool widely used in business to derive intelligence from images. The participants will learn how to: 1. Make sense of the semi-structured output of the API; 2. Repurpose the API to analyse social media images, as an example; and 3. Apply secret tricks to keep using the API for free (or at least paying as little as possible).
4. Requirements	Participants need to bring their own computer and have at least one Google/Gmail account and a payment card.

1. Practical Lab	Mapping gender issue-networks with Wikipedia: from co-word occurrences to topics of interest
#slides/folder URL:	http://bit.ly/WomenInTech_WORDij
2. Facilitators	Leonardo Melgaço, Gracila Vilaça
3. Short Description	In this practical lab, we will use WORDij, a system based on the linkage strength between words (DANOWSKI, 2013), to process text files from Wikipedia articles. We will focus on the network output from WORDij based on co-word occurrences. By critically analysing word associations within a network in Gephi (BASTIAN; HEYMANN; JACOMY, 2019), we will be able to map topics of interest within an issue-network (ROGERS, 2018). Empirically, we investigate gender gap issues related to the ambivalent movement “women in technology” (CHAU, 2017).
4. Requirements	Please make sure WORDij (<https://www.wordij.net/index.html>) and Gephi (<https://gephi.org>) are installed and running beforehand. We recommend that participants download the JDK (Java Development Kit). An optical mouse facilitates network navigation in Gephi.
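
To make the idea of co-word occurrences concrete, the following Python sketch counts word pairs that appear close to each other in a text. It is a simplified stand-in, not WORDij’s actual procedure. The window size, the sample sentence and the coword_pairs helper are assumptions for this illustration.

```python
import re
from collections import Counter


def coword_pairs(text, window=2):
    """Count unordered word pairs whose positions differ by less than `window`.

    With window=2 only directly adjacent words are paired.
    """
    # ASCII letters only, to keep the sketch short; accented words are split.
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


# Hypothetical sentence standing in for the text of a Wikipedia article.
text = ("women in technology face a gender gap; "
        "women in science face a gender gap too")
pairs = coword_pairs(text, window=2)

# Print the pairs as a weighted edge list, strongest links first.
for (a, b), weight in pairs.most_common(3):
    print(a, b, weight)
```

Each pair and its count can be read as a weighted edge between two words, which is the kind of network that is then explored in Gephi.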

1. Practical Lab	Opening up the black-box of mobile apps’ traffic
#slides/folder URL:	https://github.com/jason-chao/workshops/blob/main/2021/Feb-SMART2021/blackbox_of_mobile_apps_traffic.md
2. Facilitators	Jason Chao
3. Short Description	Unbeknown to users, many mobile applications (apps) send out questionably large amounts of data to suspicious destinations, which calls the purposes of those apps into question. Inspecting the network traffic of digital devices used to require a lab set-up or some technical knowledge. This practical lab will introduce AppTraffic, a newly developed tool aimed at empowering researchers from different backgrounds to easily decrypt and study the data travelling in and out of mobile apps.
4. Requirements	Participants need to bring a mobile device and their own computer. For the mobile device, an Apple iOS device (iPhone or iPad) is recommended.

1. Practical Lab	Gephi for beginners
#slides/folder URL:	http://bit.ly/GephiForBeginners
2. Facilitators	Leonardo Melgaço
3. Short Description	In this practical lab we will learn how to create and, most importantly, how to unfold a complex network in Gephi (https://gephi.org). We are going to explore the software’s interface and navigate through a network while addressing three central aspects: network spatialization (ForceAtlas2 layout algorithm), network aesthetics (node colors and sizes) and statistical modularity (cluster analysis).
4. Requirements	Participants need to install Gephi (https://gephi.org) and make sure it is running. Tip: download the JDK (Java Development Kit). An optical mouse facilitates network navigation in Gephi.

1. Practical Lab	Mapping Twitter social media networks with NodeXL Pro
#slides/folder URL:	TBA
2. Facilitators	Marc Smith
3. Short Description	In this practical lab, we will use NodeXL Pro (https://nodexlgraphgallery.org) to collect, analyze, visualize and report on social media networks from Twitter. A range of topics, hashtags, URLs, and usernames can be mapped. The variety of social media network structures will be reviewed. We will learn to interpret and narrate these networks to others. The practical lab will include the automation features in NodeXL that ensure the creation of readable and complete network and content analysis and visualization. We will explore the many stories and insights that can be extracted and presented through social media network analysis: who are the leaders or most influential contributors, how are sub-groups or internal divisions formed in the population, and what topics, URLs, hashtags, and users are most commonly discussed? Using these analytic elements, we will explore the ways these insights tell a story about populations, leaders, and topics over time.
4. Requirements	A computer or tablet is helpful. Windows and Office users will be provided with a courtesy license for the NodeXL Pro application. Mac and tablet users will be provided with a courtesy month of access to the NodeXL Pro Cloud edition service. Download NodeXL Pro from: https://nodexlgraphgallery.org/Pages/Default.aspx These YouTube videos are useful introductions to NodeXL and mapping Twitter social media networks: https://www.youtube.com/watch?v=kDiGl-2m868 https://youtu.be/mjAq8eA7uOM This article from Pew Research is a good written introduction to social media network maps of Twitter: http://www.pewinternet.org/2014/02/20/mapping-twitter-topic-networks-from-polarized-crowds-to-community-clusters/

1. Practical Lab	Reading Digital Networks
#slides/folder URL:	bit.ly/reading-digital-networks-JJO bit.ly/gephi-basic-guide_JJO
2. Facilitators	Janna Joceli Omena
3. Short Description	This workshop introduces methodological strategies to read digital networks according to their visual affordances and technological grammar, also taking into account three different but related aspects: the triad grammatisation-cultures of use-software. We will ask and respond to the following questions: What should we look at when reading networks? What do node position and connections mean? How can networks be interpreted visually?
4. Requirements	We will work with Gephi (Bastian, Heymann & Jacomy, 2019).

1. Practical Lab	Analysing Images by content similarity with computer vision (CLARIFAI)
#slides/folder URL:	https://drive.google.com/drive/folders/1Ak7vKGB831073OIs_V_nBZvUSA1Ox2by?usp=sharing
2. Facilitators	Antonella Autuori, Matteo Bettini, Andrea Elena Febres Medina
3. Short Description	This approach helps to cluster and visualize images in a series according to how their content is classified by machine learning algorithms. It can be used to define as well as measure thematic visual clusters within a series of images. It is similar to a co-hashtag overview, but done with visual content. In practice, one creates tags for each image with the assistance of computer vision and then uses mutual tags to visually cluster related images. There are four major steps. First, the images are tagged with the help of a computer vision API. Second, the images are downloaded and saved locally. Third, a network of images and tags is constructed and visualized in Gephi. Finally, the images are loaded into the network and exported.
4. Requirements	Please have Gephi installed, with the Image Preview plugin, and sign up for a Clarifai account.
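
The third step, building a network of images and tags, can be sketched in Python as follows. This is an illustration under stated assumptions, not the lab’s own workflow: the tags are hand-written stand-ins for the output of a computer vision service, and the helper names are hypothetical. The only external convention relied on is that Gephi can import an edge list CSV with Source and Target columns.

```python
import csv
import io
from itertools import combinations

# Hypothetical tags, as if returned by a computer vision service.
image_tags = {
    "img_001.jpg": ["beach", "sea", "people"],
    "img_002.jpg": ["sea", "boat"],
    "img_003.jpg": ["people", "street"],
}


def bipartite_edges(tags_by_image):
    """One (image, tag) edge per tag; images cluster around the tags they share."""
    return [(image, tag)
            for image, tags in sorted(tags_by_image.items())
            for tag in tags]


def shared_tags(tags_by_image):
    """For each pair of images, list the tags they have in common."""
    shared = {}
    for a, b in combinations(sorted(tags_by_image), 2):
        common = set(tags_by_image[a]) & set(tags_by_image[b])
        if common:
            shared[(a, b)] = sorted(common)
    return shared


def to_edge_list_csv(edges):
    """Serialize edges as CSV text with Source and Target columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()


edges = bipartite_edges(image_tags)
common = shared_tags(image_tags)
edge_csv = to_edge_list_csv(edges)
print(edge_csv)
```

In a force-directed layout, images connected to the same tag nodes are pulled together, which is what produces the visual clusters described above.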

1. Practical Lab	Visualising image clusters with Gephi
#slides/folder URL:	https://drive.google.com/drive/folders/1BrbisV6K3_DBHlI0RAdfykorVMqhOOya?usp=sharing
2. Facilitators	Antonella Autuori, Matteo Bettini, Andrea Elena Febres Medina
3. Short Description	This practical lab helps to cluster and visualize images in a hashtag-image network in the context of social media research. This means that images with similar hashtags can be analysed and visualised based on co-hashtag relations. The main steps to be discussed are how to extract a meaningful network from a spreadsheet file using Table2Net and how to visualize, spatialize and analyze it using Gephi’s image preview plugin.
4. Requirements	Please have Gephi and Image Preview Plugin installed and please install DownThemAll before the workshop.