Practical Labs

27 - 31 January 2020 

9:30 – 18:00 | #SMARTdatasprint | Research Blog

Facebook Group: SMART Data Sprint |@iNOVAmedialab

Universidade Nova de Lisboa  | NOVA FCSH | iNOVA Media Lab

˚˚ Practical Labs  2020 (timetable)


MONDAY

Morning, 12:15-13:30
Torre B, Auditório 1 (First floor) | Researching YouTube | Bernhard Rieder
Torre B, Auditório 2 (Third floor) | Gephi for beginners | Fábio Gouveia
Torre B, T5 (Second floor) | Extracting meaning from spreadsheets through dataviz | Elena Aversa
Torre B, T6 (Second floor) | Getting to know data extraction + text analysis tools | Flores & Pilipets
Torre B, T9 (Third floor) | RawGraphs: an open-source visualisation platform | Serena Del Nero

Afternoon, 14:30-15:40
Torre B, Auditório 1 (First floor) | Researching YouTube | Bernhard Rieder
Torre B, Auditório 2 (Third floor) | Gephi for beginners | Fábio Gouveia
Torre B, T5 (Second floor) | Extracting meaning from spreadsheets through dataviz | Elena Aversa
Torre B, T6 (Second floor) | Data beautification | Giacomo Flaim & Beatrice Gobbo
Torre B, T9 (Third floor) | RawGraphs: an open-source visualisation platform | Serena Del Nero
 
TUESDAY

Morning, 9:30-11:00
Torre B, Auditório 2 (Third floor) | Visual Network Analysis [part 1] | Tommaso Venturini
Torre B, T9 (Third floor) | Shaping questions for Trends Studies through Digital Methods | Ana Marta Flores
Torre B, T5 (Second floor) | Researching YouTube | Bernhard Rieder
Torre B, T6 (Third floor) | Doing research with digital images: Clarifai API interface | Elena Aversa & Serena Del Nero
Torre B, T10 (Third floor) | Building Hashtag-Image Networks | Elena Pilipets

Morning, 11:00-12:30
Torre B, Auditório 2 (Third floor) | Visual Network Analysis [part 1] | Tommaso Venturini
Torre B, T9 (Third floor) | Using natural language processing tools for data analysis and extraction | Benjamin Meindl
Torre B, T5 (Second floor) | Creating Maps and Measures with NodeXL | Marc Smith
Torre B, T6 (Third floor) | Doing research with digital images: Clarifai API interface | Elena Aversa & Serena Del Nero
Torre B, T10 (Third floor) | Building Hashtag-Image Networks | Elena Pilipets

Extra Activities & Optional Meetings

Afternoon, 17:00-18:00
Torre B, T5 (Second floor) | Q&A with Tommaso Venturini on visual network analysis (VNA)
Torre B, T9 (Third floor) | Q&A with Marc Smith & getting ready to work with NodeXL
 
WEDNESDAY

Morning, 9:30-11:00
Torre B, T5 (Second floor) | Visual Network Analysis [part 2] | Tommaso Venturini
Torre B, T10 (Third floor) | Data beautification | Giacomo Flaim & Beatrice Gobbo
Torre B, T11 (Third floor) | Creating Maps and Measures with NodeXL | Marc Smith
Torre B, T9 (Third floor) | Data mining and visualisation with R | Hamdan Azhar | cancelled
Torre B, T6 (Second floor) | Using natural language processing tools for data analysis and extraction | Benjamin Meindl

Afternoon, 14:20-15:50
Torre B, T5 (Second floor) | (no session)
Torre B, T10 (Third floor) | When dataviz is ugly | Beatrice Gobbo & Giacomo Flaim
Torre B, T11 (Third floor) | Creating Maps and Measures with NodeXL | Marc Smith
Torre B, T9 (Third floor) | Data mining and visualisation with R | Hamdan Azhar | cancelled
Torre B, T6 (Second floor) | Using natural language processing tools for data analysis and extraction | Benjamin Meindl

Optional Meeting

Afternoon, 17:00-18:00
Torre B, T6 (Second floor) | Q&A with Tommaso Venturini on visual network analysis

˚˚ Practical Labs  2020 (info)
1. Practical Lab 
Visual Network Analysis  I – How to Make Networks Readable 
#slides URL/pdf:
 
2. Facilitators 
Tommaso Venturini
3. Short Description 
In this practical workshop, we will use Gephi (https://gephi.org) to play with an example network of Wikipedia pages on geoengineering. We will learn not only how to read networks but, most importantly, how to make them readable. Through a process of manual hermeneutics, we will experiment with the unfolding of a complex network and discover a few conceptual and practical tricks to go from a hairball to a structured graph.
4. Requirements 
Everyone 
 
1. Practical Lab 
Visual Network Analysis  II – Six Network Stories 
#slides URL/pdf:
 
2. Facilitators 
Tommaso Venturini
3. Short Description 
In this practical workshop, we will start by using Table2Net (https://medialab.github.io/table2net/) to extract a network from a table of literary characters. Using Gephi (https://gephi.org) on the resulting network, we will discuss six different story-telling strategies that can be used to narrate a network in order to make it understandable for anyone other than its creator.
4. Requirements 
Everyone 
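
As a rough illustration of the table-to-network step described above, the sketch below links two characters whenever they appear in the same row of a table. It is not Table2Net's own code: the sample table, the column name `characters`, the `;` separator and both function names are assumptions made for this example. The output uses Source, Target and Weight columns, a layout that Gephi's spreadsheet import can read as an edge list.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical sample table: one row per story, characters separated by ";".
SAMPLE_TABLE = """story,characters
Story A,Alice;Bob;Carol
Story B,Bob;Dave
Story C,Alice;Bob
"""


def cooccurrence_edges(table_text, column="characters", separator=";"):
    """Count how often two values of `column` appear together in a row."""
    edges = Counter()
    for row in csv.DictReader(io.StringIO(table_text)):
        # Deduplicate and sort so that each pair has one canonical order.
        names = sorted({n.strip() for n in row[column].split(separator) if n.strip()})
        for source, target in combinations(names, 2):
            edges[(source, target)] += 1
    return edges


def to_edge_list_csv(edges):
    """Serialise weighted edges as Source,Target,Weight rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()


edges = cooccurrence_edges(SAMPLE_TABLE)
print(to_edge_list_csv(edges))
```

Characters that share many rows end up connected by heavier edges, which is what a layout algorithm later uses to pull them closer together.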
 
1. Practical Lab 
Visual Network Analysis  Q&A
#slides URL/pdf:
 
2. Facilitators 
Tommaso Venturini
3. Short Description 
During the Q&A, groups and individual participants will receive personalized feedback and support in the creation and analysis of their networks.
4. Requirements 
Everyone 
 
1. Practical Lab 
Extracting meaning from spreadsheets through dataviz
2. Facilitators 
Elena Aversa 
#slides URL/pdf:
 
3. Short Description 
Data visualization is a medium for communicating the subtleties and complexities of what is hidden in large sets of measured data (Glenn J. Myatt & Wayne P. Johnson, 2009). Presenting massive amounts of information in visual form makes it easier for the human brain to understand complex encoded data, to identify hidden patterns and to grasp concepts. In short, data graphics allow for transparent and effective communication of data, while also serving as a tool for developing research. Although data visualisation in academia has traditionally been used to summarise outcomes, it is crucial to shift the focus to its role in the research process itself. Beyond being a tool to better present findings and arguments, data visualization can also benefit academic research at the initial stage, when theory is still being developed (Iliinsky, 2012), by providing an excellent approach for exploring data. New paths and hypotheses can suddenly emerge just by plotting the values displayed in a table of descriptive statistics. Visualizing different parameters can reveal different patterns. For instance, comparing time series can reveal significant changes over time, plotting related values can help to recognize causal relationships and their mechanisms, and mind maps and flow charts can help to refine theory.

The main goal of this lab is to understand this essential use of data visualization as a developmental aid that assists academics in building and refining theory. In short, the lab focuses on how data visualization supports meaning-making about data.

The lab is structured as follows. First, we discuss and show relevant examples of data visualisation used as a research tool and as a dissemination tool. Second, we focus on a given dataset and on the different ways to extract meaning from it. This second phase is crucial for understanding how to work with data visualization during the exploratory stage of academic research. In short, we try to answer the frequent “after-scraping” question: what do I do now with this .csv? Finally, the last challenge is a hands-on activity during which you can play with your dataset and visualize its potential paths.
4. Requirements 
Participants need to bring their own computer. We will mainly work using spreadsheets and RawGraphs (take a look at https://rawgraphs.io/ to get familiar with it).

Beginners
 
1. Practical Lab 
Building Hashtag-Image Networks 
#slides URL/pdf:
https://drive.google.com/drive/u/0/folders/1bATYXMx6-dSik6URH4RCB0sOan5rjiO8
2. Facilitators 
Elena Pilipets 
3. Short Description 
This practical lab is a step-by-step walkthrough of the possibilities of hashtag-image networks in the context of social media research and visual digital methods. Specifically, we will discuss how to extract a meaningful network from a spreadsheet file using Table2Net and how to visualize, spatialize and analyze it using Gephi’s image plugin. In addition, we offer a discussion of variously designed networks and their narrative possibilities with regard to: 1. the contextual situatedness of the images provided through co-tag relations; 2. the restrictions of looking only at the content with the most exposure; 3. the density of associations between images and tags as an indicator of shifts in relations of relevance. A project discussing the topic of the practical lab in the context of issue mapping and bot engagement can be found here. A paper by Gabriele Colombo presenting the context of the digital methods approach to image research can be found here. A paper by Venturini, Jacomy & Jensen discussing the possibilities of Gephi spatialisation algorithms for network analysis can be found here.
4. Requirements 
Please bring your own computer and install Gephi beforehand. To download the images from a list of image URLs, please also install DownThemAll before the workshop. Amateurs-Advanced.
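
To make the idea of a hashtag-image network more concrete, here is a minimal sketch of a two-mode (bipartite) edge list in which every image is linked to each hashtag it carries. The spreadsheet layout, the column names `image_url` and `hashtags`, the example URLs and the function names are all assumptions for illustration; the lab itself uses Table2Net and Gephi for this step.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet export: one row per image, hashtags separated by spaces.
SAMPLE_ROWS = """image_url,hashtags
https://example.org/img1.jpg,#cats #memes
https://example.org/img2.jpg,#cats
https://example.org/img3.jpg,#memes #cats #art
"""


def hashtag_image_edges(table_text):
    """Return (image, hashtag) pairs, i.e. a bipartite edge list."""
    edges = []
    for row in csv.DictReader(io.StringIO(table_text)):
        # Lower-case and deduplicate tags so "#Cats" and "#cats" become one node.
        for tag in sorted(set(row["hashtags"].lower().split())):
            edges.append((row["image_url"], tag))
    return edges


def images_per_tag(edges):
    """Degree of each hashtag node: the number of distinct images attached to it."""
    images = defaultdict(set)
    for image, tag in edges:
        images[tag].add(image)
    return {tag: len(urls) for tag, urls in images.items()}


edges = hashtag_image_edges(SAMPLE_ROWS)
tag_degree = images_per_tag(edges)

# List tags from most to least connected.
for tag, n_images in sorted(tag_degree.items(), key=lambda kv: (-kv[1], kv[0])):
    print(tag, n_images)
```

Hashtags attached to many images become hubs in the network, while images that share several tags tend to end up close to each other once a force-directed layout is applied.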
 
1. Practical Lab 
Researching YouTube (Communities)
#slides URL/pdf:
 
2. Facilitators 
Bernhard Rieder
3. Short Description 
This workshop introduces methodological strategies for researching YouTube, with a focus on identifying and demarcating specific sub-communities on the platform. Starting from the various entry points for data extraction afforded by the YouTube API, we will look at different sampling strategies and how they relate to different research interests. The goal is to assemble sets of data points that can be analyzed further to develop insights into how specific topics are covered on the platform and what kind of channel ecosystem supports this coverage.

With the help of the YouTube Data Tools, we will particularly look into iterative and combinatorial strategies for the construction of datasets of both channels and videos.
4. Requirements 
Please bring your own computer and install Gephi beforehand. Please also watch this video introducing the YouTube Data Tools.
 
1. Practical Lab 
Doing research with digital images: Clarifai API interface
#slides URL/pdf:
http://bit.ly/DS20-Clarifai
2. Facilitators 
Elena Aversa, Serena Del Nero
3. Short Description 
Visual processing involves a large percentage of the human brain, allowing humans to deal with images at incredible speeds. Since our attention is easily grabbed, pictures are without doubt the perfect way to communicate in today’s short-attention world. The Internet and social media are overrun with visual information from every corner of the world. Whether it’s a meme, an artsy photo or a selfie, our social feeds are increasingly filled with more images and less text. Analyses of the complexities of these new “visual” datasets are becoming more and more widespread as a way to better understand social perceptions, human habits and global debates.

The main goal of this practical lab is to look more deeply into doing research with images. We will go through different approaches to image analysis, focusing for instance on visual attributes like color and similarity, as well as on content features, to better spot patterns and clusters.

The lab is structured in two parts. The first is a more analytical section, during which we will explore the subject matter and learn how to use the Clarifai API interface. The second will be more fun: we will put what we have learned to the test and play with some images.
4. Requirements 
Participants need to bring their own computer and sign up on Clarifai. We will work using spreadsheets, the Image Tagging Tool Interface and Clarifai. You should also download the Gephi Image Preview plugin. Amateurs.
 
1. Practical Lab 
RawGraphs: an open-source visualisation platform
#slides URL/pdf:
http://bit.ly/DS20-RawGraphs
2. Facilitators 
Serena Del Nero
3. Short Description 
The amount of data with which we interact is increasing in complexity: we surf the web and produce a huge amount of data every day. To visualize these data, we need tools and software that transform them into visual representations. RawGraphs provides the missing link between spreadsheet applications and vector-graphics editors.

The main goal of this workshop is to learn how to visualize data using this tool and to understand the importance of visual models. Choosing the right one is essential to display the data correctly.

The workshop is structured in three parts. First, we will focus on the data and the visual models used to represent them. Second, we will learn RawGraphs together by following a step-by-step tutorial. Finally, in a more practical and interactive activity, participants will be assigned a dataset and will have to create a visualization using the visual model most suitable for that type of data.
4. Requirements 
Participants need to bring their own computer. We will work using spreadsheets and RawGraphs (take a look at https://rawgraphs.io/ to get familiar with it).

Beginners
 
1. Practical Lab 
Using natural language processing tools for data analysis and extraction
#slides URL/pdf:
http://bit.ly/nlp_smart 
2. Facilitator
Benjamin Meindl
3. Short Description 
Advances in machine learning and increasing computational power make it easy to use machine learning tools on your own computer. One field of machine learning is natural language processing (NLP), which makes it possible to extract information from text. In this tutorial, we will explore simple tools for NLP through two frequent use cases:

  • Text classification, which can help, e.g., to identify sentiment or to categorize texts; and
  • Named entity recognition, which extracts certain types of words from a text, e.g., all location descriptions, drugs, disease names, plants, people, etc.

We will explore simple but powerful tools, which may be beneficial for the data sprint and for your research.
4. Requirements 
No previous skills are required. However, some basic programming skills (i.e., Python) are beneficial, particularly for the final parts of the tutorial, where we use our classifiers to extract data. We will provide a Jupyter notebook, which allows everyone to run and/or adjust the code on their own laptop.

To try it yourself, please install Python and Jupyter Notebook on your laptop. A description of the installation process and the required packages can be found here: https://jupyter.readthedocs.io/en/latest/install.html.

Finally, I also recommend installing the annotation tool Prodigy, which makes annotation and language processing very simple. To get it, please apply for the free student trial at https://prodi.gy/buy and install it afterwards. This tool will be helpful not only for the tutorial but may also be your entry point to NLP.
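
For readers who want a feel for the two use cases before the session, the sketch below shows both in plain Python. It is a toy illustration, not the tutorial's notebook and not Prodigy: the classifier is a small naive Bayes model with add-one smoothing, the "entity recognition" is a simple dictionary lookup standing in for a trained model, and the training sentences, the entity lists and every name in the code are assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                count = self.word_counts[label][token] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocabulary)))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical labelled examples for sentiment classification.
TRAINING = [
    ("I love this workshop", "positive"),
    ("great tools and great people", "positive"),
    ("this was a boring talk", "negative"),
    ("I hate slow software", "negative"),
]

# Hypothetical entity lists; a real NER model learns from context instead.
GAZETTEER = {"LOCATION": ["Lisbon", "Porto"], "PERSON": ["Ada Lovelace"]}


def find_entities(text, gazetteer):
    """Return (entity, label) pairs in the order they occur in the text."""
    found = []
    for label, names in gazetteer.items():
        for name in names:
            for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((match.start(), match.group(), label))
    return [(name, label) for _, name, label in sorted(found)]


classifier = NaiveBayesClassifier()
classifier.train(TRAINING)
print(classifier.predict("great workshop"))
print(find_entities("Ada Lovelace never visited Lisbon.", GAZETTEER))
```

A dictionary lookup only finds names it already knows, which is why trained models that use the surrounding words are preferred for real projects.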
 
1. Practical Lab 
Charting Collections of Connections in Social Media: Creating Maps and Measures with NodeXL
#slides URL/pdf:
UPDATED:

https://drive.google.com/file/d/1fLgumqcyoyTwr20NLkW3u7K_J_yM2pIC/view?usp=sharing
2. Facilitators 
Marc Smith
3. Short Description 
Abstract: Networks are a data structure commonly found in any social media service that allows populations to author collections of connections. The Social Media Research Foundation's NodeXL project makes analysis of social media networks accessible to most users of the Excel spreadsheet application. With NodeXL, network charts become as easy to create as pie charts. Recent research created by applying the tool to a range of social media networks has already revealed the variations in network structures present in online social spaces. A review of the tool and images of Twitter, Flickr, YouTube, Facebook, Wikis and email networks will be presented.

Description: We now live in a sea of tweets, posts, blogs, and updates coming from a significant fraction of the people in the connected world. Our personal and professional relationships are now made up as much of texts, emails, phone calls, photos, videos, documents, slides, and game play as by face-to-face interactions. Social media can be a bewildering stream of comments, a daunting fire hose of content. With better tools and a few key concepts from the social sciences, the social media swarm of favorites, comments, tags, likes, ratings, updates and links can be brought into clearer focus to reveal key people, topics and sub-communities. As more social interactions move through machine-readable data sets, new insights and illustrations of human relationships and organizations become possible. But new forms of data require new tools to collect, analyze, and communicate insights.
4. Requirements 
 Get NodeXL Installed: https://www.nodexlgraphgallery.org/NodeXLSetup/NodeXLProExcelTemplate2014Setup.exe 


Review these links:
 
1. Practical Lab 
Shaping questions for Trends Studies through Digital Methods
#slides URL/pdf:
http://tiny.cc/trends_digitalmethods 
2. Facilitators 
Ana Marta M. Flores
3. Short Description 
A trend is more than a product or object. Trends Studies understands socio-cultural movements as behaviour patterns that can be translated into strategies for any kind of sector. When we observe society to identify trends, we must start with the right questions. To that end, we propose a short course that uses some of the thinking and structures of digital methods to shape research focused on trends.

The main objective is to expand the Digital Methods approach by combining it with Trends Studies theory and practice. Initially, participants should understand key concepts such as trend (micro and macro), how trends can be identified and what they can be used for. Next, we will get familiar with examples and possibilities drawn from culture and journalism, alongside tools such as YTDT, Image Scraper and Raw Graphs.
4. Requirements 
Computers are optional, as the course will be mostly expository.

It will be important, however, that participants have a basic understanding of spreadsheets, YouTube Data Tools, Raw Graphs and Google Image Scraper. Amateurs.
 
1. Practical Lab 
When data visualization is ugly — or bad —
#slides URL/pdf:
 
2. Facilitators 
Beatrice Gobbo, Giacomo Flaim
3. Short Description 
The practical lab is divided into two phases: 
  • a theoretical introduction;
  • a practical workshop. 
The aim of the lab is to provide tools for avoiding errors in visualizations.

While browsing the web, it's easy to encounter incorrect, bad and ugly visualizations, which can lead to misunderstandings and misleading interpretations. The information visualization designer's role is not only to create visualizations but also to understand which visual elements could make them bad and ugly*.

In the design process, several mistakes can lead to bad and ugly final outputs: from the process of data selection (more → Extracting meaning from spreadsheets through DataViz by Elena Aversa), to the process of mapping variables and values (more → RawGraphs: an open-source visualization platform by Serena Del Nero), to the staging of the final visualization.

After a brief theoretical introduction to basic visualization concepts, such as the use of visual variables (J. Bertin, 1967) and the data-ink ratio (E. Tufte, 1983), participants will be asked to find errors in a collection of visualizations (deliberately chosen because the DataViz community considers them bad and ugly) and to identify the phase(s) of the design process to which each error can be traced back.

Then during the workshop session, participants will be asked to analyse a visualisation, spot errors, re-design and present it in order to open the discussion in the last part of the practical lab.

* In the tutorial, we won't use bad and ugly as the opposite of beautiful, but as the opposite of correct.
4. Requirements 
Participants need to bring their own computer.
  • Spreadsheet editor – Google Sheets or Excel
  • Image editor – vectr.com or Adobe Illustrator
Suggested practical labs to follow:

Extracting meaning from spreadsheets through dataviz

(Elena Aversa)

RawGraphs: an open-source visualisation platform

(Serena Del Nero)

Amateurs and Experts
 
1. Practical Lab 
Data beautification
2. Facilitators 
Giacomo Flaim & Beatrice Gobbo
3. Short Description 
 
4. Requirements 
http://bit.ly/Data-beautification-2020
 
1. Practical Lab 
Data extraction and Text analysis
#slides URL/pdf:
https://drive.google.com/drive/u/0/folders/1bATYXMx6-dSik6URH4RCB0sOan5rjiO8
2. Facilitators 
Ana Marta M. Flores & Elena Pilipets 
3. Short Description 
In this practical lab, we discuss and practice how to query, collect, download and analyse data using DMI and other tools for data extraction and analysis. In the second part of the workshop, we specifically focus on how to work with textual material (e.g. with post captions, emojis, or hashtags) by introducing and practicing different tools (see e.g. Textanalysis, TagCrowd, Wordle, WordTree, WordIJ) and designing different forms of visualisation.
4. Requirements 
Please bring your computers, familiarize yourself with the tools mentioned above and install Gephi beforehand. Beginners.
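
As a small taste of working with textual material such as captions and hashtags, the sketch below counts hashtags and remaining words across a handful of post captions. It does not reproduce any of the tools listed above: the captions, the stopword list and the function names are assumptions for illustration, and the simple patterns used here only handle unaccented Latin letters.

```python
import re
from collections import Counter

# Hypothetical post captions.
CAPTIONS = [
    "Sunset in Lisbon #travel #sunset",
    "More #travel photos coming soon",
    "Data sprint day one #SMARTdatasprint #Travel",
]

HASHTAG = re.compile(r"#\w+")
WORD = re.compile(r"[a-z]+")
# Tiny illustrative stopword list; real projects use much longer ones.
STOPWORDS = {"a", "and", "in", "more", "one", "the"}


def hashtag_counts(captions):
    """Count hashtags, ignoring differences in capitalisation."""
    return Counter(
        tag.lower() for caption in captions for tag in HASHTAG.findall(caption)
    )


def word_counts(captions):
    """Count the words left after removing hashtags and stopwords."""
    counts = Counter()
    for caption in captions:
        text = HASHTAG.sub(" ", caption).lower()
        counts.update(w for w in WORD.findall(text) if w not in STOPWORDS)
    return counts


tags = hashtag_counts(CAPTIONS)
words = word_counts(CAPTIONS)
print(tags.most_common(3))
print(words.most_common(5))
```

Frequency tables like these are the kind of input that word clouds and similar visualisations are built from.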