Universidade Nova de Lisboa | NOVA FCSH | iNOVA Media Lab
Digital Media Winter Institute | SMART Data Sprint 2021
The current state of platformisation | 01 – 05 February 2021 | 9:00 – 18:00 (Lisbon time) | #SMARTdatasprint | Research Blog | Facebook Group: SMART Data Sprint | @iNOVAmedialab
SLACK: SMARTDataSprint 2021
˚˚ Practical Labs 2021 (timetable)
MONDAY (AFTERNOON), 1 Feb 2021
| BEGINNERS TO INTERMEDIATE | INTERMEDIATE TO ADVANCED |
|---|---|
| 13h00 – 13h50: Querying digital platforms & extracting data | 13h00 – 14h00: Opening up the black-box of mobile apps’ traffic |
| 13h50 – 15h00: Using APIs with Facepager | 14h00 – 15h30: Studying memes through platform data spreadsheets |
| 10m break | 10m break |
| 15h10 – 16h40: Content Analysis on Instagram | 15h40 – 17h10: Visualising image clusters with Gephi |
| 16h40 – 18h00: Basic tricks for working with data | 17h10 – 18h00: Querying digital platforms & extracting data |
TUESDAY (MORNING), 2 Feb 2021
| BEGINNERS TO INTERMEDIATE | INTERMEDIATE TO ADVANCED |
|---|---|
| 09h00 – 10h00: Webscraping with Facepager | 09h00 – 10h20: Using AI to enrich image data |
| 10h00 – 10h45: Reading digital networks | 10h20 – 11h40: Analysing Images by content similarity with computer vision (CLARIFAI) |
| 10h45 – 11h45: Gephi for Beginners | |
TUESDAY (AFTERNOON), 2 Feb 2021
| BEGINNERS TO INTERMEDIATE | INTERMEDIATE TO ADVANCED | INTERMEDIATE TO ADVANCED |
|---|---|---|
| 15h30 – 17h00: Visualising image clusters with Gephi | 15h30 – 18h00: Mapping Twitter social media networks with NodeXL Pro | 15h30 – 16h30: Reading digital networks |
| 17h00 – 18h00: Mapping gender issue-networks with Wikipedia: from co-word occurrences to topics of interest | | 16h30 – 18h00: Advanced tricks for working with data |
˚˚ Practical Labs 2021 (info)
1. Practical Lab |
Querying digital platforms & extracting data |
#slides/folder URL: |
https://drive.google.com/file/d/1nzOKYJPnRoaof8Sbl4NwCt3DxnJglc6u/view?usp=sharing |
2. Facilitators |
Janna Joceli Omena |
3. Short Description |
This workshop addresses some aspects to be considered when querying digital platforms and extracting data. From the formulation of research questions as queries (query design) to the use of data extraction tools (software practices), we will reflect on situations in which both software and the researcher’s decision intervene, re-adjust and re-shape representations of online activity. |
4. Requirements |
We will work with YouTube Data Tools (Rieder, 2015) and Google Spreadsheets. |
1. Practical Lab |
Content analysis on Instagram |
#slides URL/pdf: |
http://bit.ly/instagram_AMF |
2. Facilitators |
Ana Marta M. Flores |
3. Short Description |
What kinds of questions can one answer with access to Instagram data? We will work with metrics (like counts, comment counts, etc.) and content (image URLs, time/date, captions) from a specific Instagram dataset. We will start by organizing and cleaning the dataset using shortcuts and cell formulas in Google Spreadsheets. Then we will perform a preliminary content analysis by identifying and combining: (1) visual patterns/categories in the dataset; (2) high- and low-engagement posts; and (3) the most frequently used emojis. Finally, we will build graphs from the same dataset in Raw Graphs. Visualizations such as a treemap (hierarchy), streamgraph (time series) or alluvial diagram (multi-categorical) can be built to better present the findings. |
4. Requirements |
Participants must have a basic knowledge of Google Spreadsheets. Web-based and open tools such as Text Analysis and Raw Graphs will be used in this Practical Lab. |
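The engagement ranking and emoji count described above are done in Google Spreadsheets during the lab. For readers who prefer a script, here is a minimal Python sketch of the same two steps. The `posts` rows, the column names and the emoji ranges in `is_emoji` are illustrative assumptions, not part of the lab material, and the range check is only a rough approximation of what counts as an emoji.

```python
from collections import Counter

# Hypothetical rows standing in for an exported Instagram dataset.
posts = [
    {"caption": "Sunset at the beach 🌅🌊", "likes": 120, "comments": 14},
    {"caption": "Monday mood ☕", "likes": 8, "comments": 1},
    {"caption": "Beach again 🌊🌊", "likes": 300, "comments": 40},
]

def is_emoji(ch):
    """Rough emoji check based on two common Unicode ranges (an approximation)."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF

def emoji_counts(rows):
    """Count every emoji character across all captions."""
    counts = Counter()
    for row in rows:
        counts.update(ch for ch in row["caption"] if is_emoji(ch))
    return counts

def rank_by_engagement(rows):
    """Sort posts from highest to lowest engagement (likes + comments)."""
    return sorted(rows, key=lambda r: r["likes"] + r["comments"], reverse=True)
```

Calling `emoji_counts(posts).most_common(5)` lists the most frequent emojis, and the first and last items of `rank_by_engagement(posts)` give the high- and low-engagement ends of the dataset.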
1. Practical Lab |
Studying memes through platform data spreadsheets |
#slides/folder URL: |
|
2. Facilitators |
Elena Pilipets |
3. Short Description |
This practical lab focuses on the possibilities of exploring memes and other visual vernaculars (e.g., screenshots, GIFs) through platform data spreadsheets. We will first practice filtering and organizing images according to their digital attributes (e.g., time of posting, engagement metrics, hashtags, image captions) in Google Spreadsheets. In the second step, we will learn how to download the images using a list of image URLs and extensions such as DownThemAll, and then analyze the results in ImageSorter based on visual color patterns and temporality. In addition, this practical lab discusses differently designed data visualizations and their narrative possibilities with regard to: (1) the contextual situatedness of memes provided through co-tag relations; (2) the restrictions of looking only at the content with the most exposure; and (3) the patterns of image adaptation over time as an indicator of shifts in relations of relevance. |
4. Requirements |
Please create a Google account to use Google Drive and install the following extensions/visualization tools: DownThemAll for Google Chrome; ImageSorter for Windows or ImageSorter for Mac. During the workshop, we will also use text analysis tools such as TagCrowd and TextAnalysis. For background, see the paper by Sabine Niederer and Gabriele Colombo presenting a digital methods approach to image research. |
1. Practical Lab |
Using APIs with Facepager |
#slides/folder URL: |
Facepager Slides, Facepager Cheat sheet, Facepager Usergroup. Please use the hashtag #SMARTdatasprint |
2. Facilitators |
Jakob Jünger |
3. Short Description |
In this session you will learn how to use the APIs of online platforms such as Facebook, YouTube, Twitter, or Wikipedia for automated data collection. After a short introduction to the basics of application programming interfaces (APIs), we will collect comments for text analysis and links for network analysis. The practical lab introduces you to Facepager, a versatile tool for automated data collection. |
4. Requirements |
Install and run the latest version of Facepager from https://github.com/strohne/Facepager. You will also need Excel, Numbers, R, Python or similar software for reading spreadsheet data. |
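Platform APIs usually return nested JSON, which has to be flattened into rows before it can be read as a spreadsheet. The sketch below illustrates that idea in Python. It is not Facepager's implementation, and the `payload` is a hand-written, hypothetical response rather than the output of any specific platform.

```python
import json

def flatten(obj, prefix=""):
    """Turn nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

# Hand-written payload in the style of a paginated comments endpoint (hypothetical).
payload = json.loads("""
{"items": [
  {"id": "c1", "author": {"name": "ana"}, "text": "Great video", "likes": 3},
  {"id": "c2", "author": {"name": "rui"}, "text": "Thanks!", "likes": 0}
],
 "nextPageToken": "abc"}
""")

# One flat row per comment, ready to be written to a CSV file.
rows = [flatten(item) for item in payload["items"]]
```

Each row now has simple columns such as `author.name` and `text`. A pagination token like the hypothetical `nextPageToken` would be sent with the next request to fetch further pages.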
1. Practical Lab |
Webscraping with Facepager |
#slides/folder URL: |
Facepager Slides, Facepager Cheat sheet, Facepager Usergroup. Please use the hashtag #SMARTdatasprint |
2. Facilitators |
Jakob Jünger |
3. Short Description |
Webscraping refers to the automated extraction of data from webpages. It can be used whenever no API is available. After a short introduction to different techniques and hurdles we will extract data from news pages for text analysis and URLs for network analysis. The practical lab introduces you to Facepager, a versatile tool for automated data collection. |
4. Requirements |
Install and run the latest version of Facepager from https://github.com/strohne/Facepager. You will also need Excel, Numbers, R, Python or similar software for reading spreadsheet data. |
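To make the idea of extracting URLs from a web page concrete, here is a minimal sketch using only Python's standard library. It shows the general technique, not how Facepager works internally. The HTML snippet and the `news.example.com` base URL are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, href))

# Made-up page content; a real scraper would download this first.
html = """
<html><body>
  <h1>Headline</h1>
  <a href="/politics/story-1">Story</a>
  <a href="https://example.org/source">Source</a>
  <a name="top">No link here</a>
</body></html>
"""

parser = LinkExtractor("https://news.example.com/")
parser.feed(html)
```

After `feed`, `parser.links` holds the two absolute URLs. These can serve as the edges of a link network between pages.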
1. Practical Lab |
Basic tricks for working with data |
#slides/folder URL: |
|
2. Facilitators |
Fábio Gouveia |
3. Short Description |
Dealing with data in text files can be challenging for beginners. Keeping track of where files came from, avoiding issues during import, and joining files together may require knowledge that is not part of everyday office work. The main goal of this lab is to give some basic tips and tricks for handling files for further data analysis and visualization. It will focus on file naming, file preparation and data consolidation. A basic introduction to the shell (command prompt) and to text encoding (code page) issues will also be part of this practical lab. |
4. Requirements |
Participants need to have their own computer. We will mainly work with a simple text editor such as Notepad (Sublime Text 3 or Notepad++ is best) and a spreadsheet application such as Google Sheets or Excel. We will also briefly explore shell (command prompt) commands to perform some tasks, so administrator privileges may be necessary. |
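The lab itself relies on a text editor, a spreadsheet and the shell. As a complementary illustration, the Python sketch below merges several CSV files into one while recording each row's origin and normalising the text encoding. It assumes that all files share the same header row. The list of candidate encodings is a simple heuristic, not a reliable detector.

```python
import csv
import io
from pathlib import Path

def read_text(path, encodings):
    """Try each candidate encoding in turn (a simple heuristic)."""
    data = Path(path).read_bytes()
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode {path}")

def consolidate(paths, out_path, encodings=("utf-8-sig", "cp1252")):
    """Merge CSV files with identical headers into one UTF-8 file,
    adding a source_file column that records each row's origin."""
    writer = None
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        for path in paths:
            text = read_text(path, encodings)
            for row in csv.DictReader(io.StringIO(text, newline="")):
                row["source_file"] = Path(path).name
                if writer is None:
                    # The first row fixes the column order of the output.
                    writer = csv.DictWriter(out, fieldnames=list(row))
                    writer.writeheader()
                writer.writerow(row)
```

With the `source_file` column, every row in the merged table can still be traced back to its file.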
1. Practical Lab |
Advanced tricks for working with data |
#slides/folder URL: |
|
2. Facilitators |
Fábio Gouveia |
3. Short Description |
Dealing with large volumes of text files is part of any data scientist's regular activity. Challenges arise when you can no longer rely on simple text editors or spreadsheets, even in their 64-bit versions. Structured file formats, although powerful, also start to challenge the now-intermediate data scientist. The main goal of this lab is to give some intermediate tips and tricks for handling files for further data analysis and visualization. It will focus on understanding some more structured file formats and on performing operations that prepare them for further use. Shell (command prompt) usage, simple software execution and an introduction to the OpenRefine tool will also be part of this practical lab. |
4. Requirements |
Participants need to have their own computer. We will mainly work with a simple text editor such as Notepad (Sublime Text 3 or Notepad++ is best), spreadsheets such as Google Sheets or Excel, and OpenRefine. We will also explore shell (command prompt) commands to perform some tasks, so administrator privileges may be necessary. Familiarity with the material from the beginner practical lab is desirable. |
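When a file is too large for a text editor or spreadsheet, one common workaround is to read it in batches so that only a small part is in memory at once. The following Python sketch shows this pattern. It is an illustration of the general idea rather than the lab's own material, and the function names and default batch size are arbitrary choices.

```python
import csv
from itertools import islice

def iter_batches(path, batch_size=10_000, encoding="utf-8"):
    """Yield lists of row dicts, reading the file lazily so that only
    one batch is held in memory at a time."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            yield batch

def count_values(path, column, **kwargs):
    """Example aggregation: count how often each value appears in a column."""
    counts = {}
    for batch in iter_batches(path, **kwargs):
        for row in batch:
            counts[row[column]] = counts.get(row[column], 0) + 1
    return counts
```

The same loop can filter rows or write smaller files instead of counting values.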
1. Practical Lab |
Using AI to enrich image data |
#slides/folder URL: |
https://github.com/jason-chao/workshops/blob/main/2021/Feb-SMART2021/enrich_image_data.md |
2. Facilitators |
Jason Chao |
3. Short Description |
This practical lab will introduce the affordances of the Google Vision API in social research and the tool Memespector-GUI. The Google Vision API is a powerful tool widely used in business to derive intelligence from images. Participants will learn how to: (1) make sense of the semi-structured output of the API; (2) repurpose the API to analyse social media images, as an example; and (3) apply secret tricks to keep using the API for free (or at least pay as little as possible). |
4. Requirements |
Participants need to bring their own computer and have at least one Google/Gmail account and a payment card. |
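As an illustration of what "semi-structured output" means here, the sketch below turns a label-detection response into flat rows of image, label and score. It assumes the JSON shape of a Vision API annotate response: a `responses` list, in the same order as the submitted images, whose items carry either `labelAnnotations` or an `error`. The sample text is hand-written, not real API output. The image names and the `min_score` threshold are hypothetical.

```python
import json

# Hand-written sample in the assumed response shape (not real API output).
response_text = """
{"responses": [
  {"labelAnnotations": [
     {"description": "Crowd", "score": 0.97},
     {"description": "Protest", "score": 0.91},
     {"description": "Flag", "score": 0.62}
  ]},
  {"error": {"code": 3, "message": "Bad image data."}}
]}
"""

def labels_per_image(response, image_ids, min_score=0.7):
    """Return (image_id, label, score) rows for labels at or above min_score.
    Images whose result has no labelAnnotations (e.g. errors) yield no rows."""
    rows = []
    for image_id, result in zip(image_ids, response["responses"]):
        for label in result.get("labelAnnotations", []):
            if label["score"] >= min_score:
                rows.append((image_id, label["description"], label["score"]))
    return rows

rows = labels_per_image(json.loads(response_text), ["img_001.jpg", "img_002.jpg"])
```

The resulting rows can be pasted into a spreadsheet or used as an image-label edge list.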
1. Practical Lab |
Mapping gender issue-networks with Wikipedia: from co-word occurrences to topics of interest |
#slides/folder URL: |
|
2. Facilitators |
Leonardo Melgaço |
3. Short Description |
In this practical lab, we will use WORDij, a system based on the linkage strength between words (DANOWSKI, 2013), to process text files from Wikipedia articles. We will focus on the network output from WORDij based on co-word occurrences. By critically analysing word associations within a network in Gephi (BASTIAN; HEYMANN; JACOMY, 2019), we will be able to map topics of interest within an issue-network (ROGERS, 2018). Empirically, we investigate gender gap issues related to the ambivalent movement “women in technology” (CHAU, 2017). |
4. Requirements |
Please make sure WORDij (<https://www.wordij.net/index.html>) and Gephi (<https://gephi.org>) are installed and running beforehand. We recommend that participants download the JDK (Java Development Kit). An optical mouse makes network navigation in Gephi easier. |
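To give a feel for what a co-word network contains, the Python sketch below counts word pairs that occur close to each other in a text. It is a simplified stand-in, not WORDij's actual procedure. The `window` size, the tokenisation rule and the stopword list are all assumptions, and sentence boundaries are ignored.

```python
import re
from collections import Counter

def coword_pairs(text, window=3, stopwords=frozenset()):
    """Count unordered word pairs that appear within `window` positions
    of each other (each word is paired with the next window - 1 words)."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stopwords]
    pairs = Counter()
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs

sample = "Women in technology face a gender gap. Women in technology organise."
pairs = coword_pairs(sample, window=3, stopwords=frozenset({"in", "a"}))
```

Each pair and its count can be written out as a weighted edge (source, target, weight) and opened in Gephi.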
1. Practical Lab |
Opening up the black-box of mobile apps’ traffic |
#slides/folder URL: |
|
2. Facilitators |
Jason Chao |
3. Short Description |
Unbeknown to their users, many mobile applications (apps) send questionably large amounts of data to suspicious destinations, which calls the purposes of those apps into question. Inspecting the network traffic of digital devices used to require a lab set-up or some technical knowledge. This practical lab will introduce AppTraffic, a newly developed tool aimed at empowering researchers from different backgrounds to easily decrypt and study the data travelling in and out of mobile apps. |
4. Requirements |
Participants need to bring a mobile device and their own computer. For the mobile device, an Apple iOS device (iPhone or iPad) is recommended. |
1. Practical Lab |
Gephi for beginners |
#slides/folder URL: |
|
2. Facilitators |
Leonardo Melgaço |
3. Short Description |
In this practical lab we will learn how to create and, most importantly, how to unfold a complex network in Gephi (https://gephi.org). We are going to explore the software’s interface and navigate through a network while addressing three central aspects: network spatialization (ForceAtlas2 layout algorithm), network aesthetics (node colors and sizes) and statistical modularity (cluster analysis). |
4. Requirements |
Participants need to install Gephi (https://gephi.org) and make sure it is running. Tip: download the JDK (Java Development Kit). An optical mouse makes network navigation in Gephi easier. |
1. Practical Lab |
Mapping Twitter social media networks with NodeXL Pro |
#slides/folder URL: |
TBA |
2. Facilitators |
Marc Smith |
3. Short Description |
In this practical lab, we will use NodeXL Pro (https://nodexlgraphgallery.org) to collect, analyze, visualize and report on social media networks from Twitter. A range of topics, hashtags, URLs, and usernames can be mapped. The variety of social media network structures will be reviewed. We will learn to interpret and narrate these networks to others. The practical lab will include the automation features in NodeXL that ensure the creation of readable and complete network and content analysis and visualization. We will explore the many stories and insights that can be extracted and presented through social media network analysis, and the ways these insights tell a story about populations, leaders, and topics over time. |
4. Requirements |
A computer or tablet is helpful. Windows and Office users will be provided with a courtesy license for the NodeXL Pro application. Mac and tablet users will be provided with a courtesy month of access to the NodeXL Pro Cloud edition service.
Download NodeXL Pro from: https://nodexlgraphgallery.org/Pages/Default.aspx
These YouTube videos are useful introductions to NodeXL and mapping Twitter social media networks:
https://www.youtube.com/watch?v=kDiGl-2m868
https://youtu.be/mjAq8eA7uOM
This article from Pew Research is a good written introduction to social media network maps of Twitter:
http://www.pewinternet.org/2014/02/20/mapping-twitter-topic-networks-from-polarized-crowds-to-community-clusters/ |
1. Practical Lab |
Reading Digital Networks |
#slides/folder URL: |
|
2. Facilitators |
Janna Joceli Omena |
3. Short Description |
This workshop introduces methodological strategies to read digital networks according to their visual affordances and technological grammar, also taking into account three different but related aspects: the triad grammatisation-cultures of use-software. We will ask and respond to the following questions: What should we look at when reading networks? What do node positions and connections mean? How can we visually interpret networks? |
4. Requirements |
We will work with Gephi (Bastian, Heymann & Jacomy, 2019). |
1. Practical Lab |
Analysing Images by content similarity with computer vision (CLARIFAI) |
#slides/folder URL: |
https://drive.google.com/drive/folders/1Ak7vKGB831073OIs_V_nBZvUSA1Ox2by?usp=sharing |
2. Facilitators |
Antonella Autuori, Matteo Bettini, Andrea Elena Febres Medina |
3. Short Description |
This approach clusters and visualizes a series of images according to how their content is classified by machine learning algorithms. It can be used to define and measure thematic visual clusters within the series. It is similar to a co-hashtag overview, but done with visual content: for each image, one creates tags with the assistance of computer vision and then uses shared tags to visually cluster related images. There are four major steps. First, the images are tagged with the help of a computer vision API. Second, the images are downloaded and saved locally. Third, a network of images and tags is constructed and visualized in Gephi. Finally, the images are loaded into the network and the result is exported. |
4. Requirements |
Please have Gephi installed, with the Image Preview plugin added, and sign up for a Clarifai account. |
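The third step, building a network of images and tags, comes down to an edge list in which each row links one image to one of its tags. The Python sketch below produces such a list as CSV text with `Source` and `Target` columns, which Gephi's spreadsheet import recognises for edge tables. The `image_tags` dictionary is hypothetical and stands in for the output of the tagging step.

```python
import csv
import io

# Hypothetical tagging results: image file name -> tags from a vision API.
image_tags = {
    "img_01.jpg": ["beach", "sunset"],
    "img_02.jpg": ["beach", "people"],
    "img_03.jpg": ["city", "people"],
}

def edge_list_csv(tags_by_image):
    """Return CSV text with one Source,Target row per image-tag pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for image, tags in tags_by_image.items():
        for tag in tags:
            writer.writerow([image, tag])
    return buffer.getvalue()

def shared_tags(tags_by_image, a, b):
    """Tags that two images have in common, i.e. what pulls them together."""
    return sorted(set(tags_by_image[a]) & set(tags_by_image[b]))
```

In the resulting network, images that share tags such as `beach` sit close together once a force-directed layout is applied.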
1. Practical Lab |
Visualising image clusters with Gephi |
#slides/folder URL: |
https://drive.google.com/drive/folders/1BrbisV6K3_DBHlI0RAdfykorVMqhOOya?usp=sharing |
2. Facilitators |
Antonella Autuori, Matteo Bettini, Andrea Elena Febres Medina |
3. Short Description |
This practical lab helps to cluster and visualize images in a hashtag-image network in the context of social media research. This means that images with similar hashtags can be analysed and visualised based on co-hashtag relations. The main steps to be discussed are how to extract a meaningful network from a spreadsheet file using Table2Net and how to visualize, spatialize and analyze it using Gephi’s image preview plugin. |
4. Requirements |
Please have Gephi and the Image Preview plugin installed, and install DownThemAll before the workshop. |