Workshop: Multi-Campus Cyber-Security Data Curation for Research and Education

Workshop Goal

The goal of the workshop is to engage the community to formulate a vision and roadmap for the creation of a multi-campus data collection and sharing infrastructure for use by machine-learning cybersecurity and privacy researchers. Such a federated infrastructure will be invaluable for detecting zero-day (new, previously unseen) attacks and large-scale attacks with complex kill chains, e.g., the WannaCry ransomware attack, Mirai Distributed Denial of Service (DDoS) attacks, and Advanced Persistent Threat (APT) attacks. Discussion will encompass legal, ethical, privacy, organizational, and sustainability considerations.

New! Post-Workshop Report

The workshop report can be found here.


  • Jack Davidson, University of Virginia
  • Howie Huang, George Washington University
  • Von S. Welch, Indiana University

Save the dates

The workshop will be held virtually from Tuesday, July 27 through Thursday, July 29, 2021.

  • Tuesday July 27: 11am - 4pm EDT
  • Wednesday July 28: 11am - 4pm EDT
  • Thursday July 29: 11am - 4pm EDT

Tentative schedule outline.

Tuesday 7/27

The first day will feature three invited talks and set up the context and framing for the rest of the workshop.

Time Slot Agenda
1100 - 1200

Organizers and attendee introduction
Opening Remarks, Marilyn McClure, National Science Foundation, CNS Program Director

1200 - 1300

Setting the Context. Von Welch, Jack Davidson, Howie Huang

Von Welch, Multi-Campus Cyber-Security Data Curation for Research and Education: Vision and Path Forward

1300 - 1330 Break/Lunch
1330 - 1430

Invited Talk: Reflections on WOMBIR: Workshop on Overcoming Measurement Barriers to Internet Research

kc claffy, Director, Center for Applied Internet Data Analysis (CAIDA)

Kimberly Claffy ("kc claffy") is founder and director of the Center for Applied Internet Data Analysis (CAIDA), a resident research scientist of the San Diego Supercomputer Center (SDSC) at UC San Diego, and an Adjunct Professor in the Computer Science and Engineering Department at UC San Diego. Her research interests span Internet topology, routing, security, economics, future Internet architectures, and policy. She leads CAIDA research and infrastructure efforts in Internet cartography, aimed at characterizing the changing nature of the Internet's topology, routing, and traffic dynamics, and investigating the implications of these changes for network science, architecture, infrastructure security and stability, and public policy. She has been at SDSC since 1991 and holds a Ph.D. in Computer Science from UC San Diego.

1430 - 1440 Break
1440 - 1540

Invited Talk: Machine Learning and Data Privacy in Security, an Industry Perspective

William Hewlett, Director, AI Research, Palo Alto Networks.

Billy went to Stanford for undergrad (Symbolic Systems) and masters (Computer Science) with a focus in Artificial Intelligence (AI). He worked at a few computer game companies (including EA and Blizzard) building AI for games before getting a PhD in CS at UCLA. He went straight from UCLA to Palo Alto Networks, a leader in Network Security, where he has been for the last 8 years. In one form or another, he has been working in the field of AI for more than 20 years. Billy is the Director of the AI Research Team, which puts machine learning in Palo Alto Networks products to protect its customers.

1540 - 1600 Wrap-up

Wednesday 7/28

The second day features a panel discussion, followed by concurrent breakout sessions.

Time Slot Agenda
1100 - 1130 Welcome, logistics, introduction and summary of Day 1
1130 - 1230

Panel: Exploring the benefits of using multi-campus IT data for cybersecurity research and the barriers to allowing that research

  • Fred Cate, Vice President for Research, Indiana University
  • Ronald Hutchins, Vice Provost of Information Technology, University of Virginia
  • Anita Nikolich, Director of Research and Technology Innovation and Research Scientist, University of Illinois, Urbana-Champaign
  • Tejas Patel, Defense Advanced Research Projects Agency
  • Melur K "Ram" Ramasubramanian, Vice President for Research, University of Virginia
1230 - 1300 Break/Lunch
1300 - 1430

Concurrent Breakout Sessions (10 participants per breakout)

Breakout Groups

Breakout Group 1: Tijay Chung, Tom Barton, Molly Buchanan, Myles Frantz, William Burke, Zain Shamsi

Breakout Group 2: Sean Peisert, Alastair Nottingham, Daphne Yao, Alireza Sarmadi, Yixin Sun, Cheryl Washington

Breakout Group 3: Peng Gao, Anh Nguyen, Hao Fu, Salman Ahmed, Richard Biever, Alina Oprea, Yizhe Zhang

Breakout Group 4: Mark Gardner, Tanmoy Sarkar Pias, Jay Yang, Jason Hiser, Kent Wada, Hongning Dong

Questions and Issues To Be Considered

  1. What data and datasets can and should be collected to facilitate cybersecurity research?
  2. What resources are needed to collect, store, and analyze the data?
  3. What are the privacy, legal and ethical considerations?
  4. What are the considerations for sharing data across organizations?
  5. How is such an effort sustained?
  6. What are other key considerations and issues?
1430 - 1445 Break
1445 - 1600 Breakout reporting and wrap-up

Thursday 7/29

The last day continues with additional breakout sessions and yields a final set of workshop recommendations.

Time Slot Agenda
1100 - 1130 Welcome, logistics, introduction and summary of Day 2
1130 - 1300 Concurrent Breakouts: Topics to be discussed based on Day 2 discussions
1300 - 1330 Break/Lunch
1330 - 1500 Breakout reporting and wrap-up
1500 - 1510 Break
1510 - 1600

Summary of key issues
Recommendations, next steps
Final report and recommendations capture