Summer 2024 Research Projects – Computer Science

Below is a list of faculty who will be conducting research during the summer of 2024 and are looking for research students.

Methods for Analyzing Tumor Evolutionary Trees (Layla Oesper)
Trust and Expertise Modeling in Tech Support Relationships (Amy Csizmar Dalal)
Developing a 1/10th-Scale Autonomous Vehicle (Tanya Amert)
Exploring the Impact of Critical-Section Granularity When Accessing Shared Resources (Tanya Amert)
Understanding Dispute Resolution and Community Stature on English Wikipedia (Sneha Narayan)

Descriptions of the projects are below:

Methods for Analyzing Tumor Evolutionary Trees (Layla Oesper)

2 students, 6 weeks, start date is likely to be July 8, may not be combined with SLAI

Cancer is a disease resulting from the accumulation of genomic alterations that occur during an individual’s lifetime and cause the uncontrolled growth of a collection of cells into a tumor. These mutations occur as part of an evolutionary process that may have begun decades before a patient’s diagnosis. A better understanding of the history of a tumor’s evolution may yield important insight into how and why tumors develop, as well as which mutations drive their growth. While recent algorithmic progress has led to improved inference of tumor evolutionary histories as a type of rooted tree, this is still a very challenging task.

Students will work on one of two possible projects, depending on student interest.

Project 1: Exploration of Tumor Tree Space

The trees used to describe the evolution of a tumor are a particular type of labeled, rooted tree. In this project you will explore the space of these trees using both empirical and theoretical analysis. Specific tasks could include learning about algorithms to enumerate these trees and then implementing these methods in order to create several datasets. We will then thoroughly analyze these datasets, including exploring how features of these trees change across the space of all trees in each dataset.
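To give a feel for what enumerating a tree space might involve, here is a minimal Python sketch. It is only an illustration under simplifying assumptions: it enumerates every labeled tree on nodes 0..n-1 rooted at node 0 by brute force over parent assignments, and the function names (`enumerate_rooted_trees`, `height`) are hypothetical. The trees studied in the project may carry additional constraints, and the actual methods may be quite different.

```python
from collections import Counter
from itertools import product


def _reaches_root(parent, v):
    """Return True if following parent pointers from v ends at the root (node 0)."""
    seen = set()
    while v != 0:
        if v in seen:  # revisiting a node means we are stuck in a cycle
            return False
        seen.add(v)
        v = parent[v]
    return True


def enumerate_rooted_trees(n):
    """Yield a parent tuple for every labeled tree on nodes 0..n-1 rooted at 0.

    parent[v] is the parent of node v; the root has parent None.
    Brute force: try every parent assignment and keep the acyclic ones.
    """
    for choice in product(range(n), repeat=n - 1):
        parent = (None,) + choice
        if all(_reaches_root(parent, v) for v in range(1, n)):
            yield parent


def height(parent):
    """Number of edges on the longest root-to-leaf path (one example tree feature)."""
    def depth(v):
        d = 0
        while v != 0:
            v = parent[v]
            d += 1
        return d
    return max(depth(v) for v in range(len(parent)))


trees = list(enumerate_rooted_trees(4))
height_counts = Counter(height(t) for t in trees)
print(len(trees), dict(height_counts))
```

The brute-force approach is only practical for very small n, since the number of candidate parent assignments grows as n^(n-1); that growth is one reason dedicated enumeration algorithms are worth studying.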

Project 2: Tumor Evolution Visualization

Students will join an existing project to build a web-based tool for visualizing and comparing tumor evolution trees. Student tasks may include modifying the code base to implement needed updates, and designing and implementing new features or modules, including tree layout algorithms. Students may also take part in analysis of the tool, including curation of datasets and case studies, potentially to be used as part of user studies. This project is being done as part of an ongoing collaboration with Professor Eri Alexander.

Ideally, students should be available to participate in an independent study during the spring of 2024 to read papers, familiarize themselves with related tools/concepts, and have discussions to begin planning the project. Applicants should have completed at a minimum CS 201 by the end of Spring term 2024. Students who have taken Computational Biology (CS 362), Algorithms (CS 252), or Bioinformatics & Genomics (BIOL 338) are also strongly encouraged to apply. No specific biology background is expected or required, just an interest in applying computational techniques to important biological problems. Please indicate which of the two projects you are most interested in when applying.

Trust and Expertise Modeling in Tech Support Relationships (Amy Csizmar Dalal)

2 students, 6 weeks, June 10-July 19. Can be combined with SLAI.

Prerequisites: CS 201 completed by the end of Spring Term.

When the WiFi stops working, the printer stops responding, or we can’t remember that particular Python command syntax, we think nothing of going online to consult any number of sources. We Google, we visit StackOverflow, we check the ITS website to see if a service is down. Maybe we even submit a ticket to ITS or call the HelpDesk. Rarely, however, do we ask ourselves as we’re doing so: why do I trust this particular source to give me the correct answer to my problem?

When we access these online (or on-phone) resources, we trust that someone (the Google algorithm, people hanging out on StackOverflow, ITS workers) will be able to understand our problem as it’s phrased, even if we don’t know the exact technical terms to use. We trust that the responses are accurate, useful, understandable to us, and not malicious.

There is a complex and implicit negotiation that happens in these interactions. Whom do we, as help-seekers, deem to be a trustworthy expert? Which help-seekers do help-givers deem worthy to take the time and energy to assist? What cues can we use from written language to discern how this negotiation happens, and when it’s successful? How do help-seekers and help-givers arrive at common ground so that they can understand each other? These are all questions I aim to answer in my research, and that you’ll be working with me to answer this summer!

The Project:

You’ll be working with a dataset containing 10 years’ worth of Carleton ITS HelpDesk troubleshooting tickets (2009-2019) to determine how help-seekers and help-givers assess each other’s technical expertise and establish trust. This summer, you’ll be working on one of the following questions:

1. What can duplicate questions tell us about technical expertise? What kinds of questions, or issues, do Carleton students, faculty, and staff ask most often? How have these questions changed over time? What does this tell us about the tech-savviness of the Carleton population? How do ITS workers respond to these common questions? How have their responses changed over time? Can we extract a set of best practices for responding to these questions for ITS?

2. How do ITS workers and help-seekers categorize incoming tickets? What ticket categories are most commonly used? Can we reconstruct relationships between categories? How often are tickets “miscategorized”? What categories serve as “catch-all” categories? What can ticket categorization tell us about technical expertise, both from help-seekers and ITS staff?

We’ll be using a combination of quantitative and qualitative analysis methods, statistics, and natural language processing libraries to answer these research questions.
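As a rough illustration of how duplicate questions might be flagged, here is a minimal Python sketch that compares tickets by the overlap of their word sets (Jaccard similarity). This is an assumption about one possible starting point, not the method used in the project; the sample tickets are invented for illustration and do not come from the HelpDesk dataset, and the function names and the 0.5 threshold are hypothetical.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts: shared words divided by total distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def near_duplicates(tickets, threshold=0.5):
    """Return index pairs (i, j) of tickets whose similarity meets the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(tickets), 2)
        if jaccard(a, b) >= threshold
    ]


# Invented example tickets (hypothetical, not real data).
sample = [
    "Cannot connect to wifi in the library",
    "cannot connect to WiFi in library",
    "Printer is out of toner",
]
pairs = near_duplicates(sample)
print(pairs)
```

Word-overlap measures like this miss paraphrases that share few words, which is one reason natural language processing libraries would likely be needed for a real analysis.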

Ideally, students should be available to participate in a 1- or 2-credit independent study Spring Term to get up to speed on the project, read relevant papers, and plan for the summer.

Developing a 1/10th-Scale Autonomous Vehicle (Tanya Amert)

2 students, up to 8 weeks, dates yet to be determined, may be combined with SLAI

Prerequisites: CS 201 completed by the end of Spring Term.

Self-driving capabilities are being continually added to real vehicles. Many of these capabilities rely on algorithms that are designed in isolation. The task of integrating diverse functionality onto a single vehicle can pose a challenge, especially when such a vehicle has limited size, weight, and power. This challenge is amplified when we impose timing constraints in the system; certain tasks have tighter deadlines or higher priorities, and therefore the scheduling of the various computations becomes a key factor in safety.

Small-scale vehicles provide a viable avenue to explore this complexity: they are less expensive than their full-size counterparts, and are subject to even more stringent size, weight, and power limitations. However, the same driving approaches can be applicable in a smaller setting, allowing us to test different algorithms and scheduling considerations in a cheaper, smaller, and safer context.

This summer, we will be developing our own 1/10th-scale autonomous vehicle. At the beginning, we will have a vehicle and computer, and we will need to get them communicating to enable the car to perform simple tasks (e.g., driving in a straight line) before moving on to more complex navigation challenges. We will use tutorials and other materials provided by f1tenth.org as our starting point. With the aim of developing a vehicle that can traverse an unknown track, students will explore more deeply in several areas:

Perception: computer-vision and other techniques to allow the vehicle to perceive the environment
Planning: mapping out the track and choosing a specific route to minimize lap time or satisfy some other optimization metric
Scheduling: changing the priorities of various tasks in the system to impact their respective response times
GPU use: using existing tools to monitor whether the graphics processing unit (GPU) is being used as effectively as possible
Other areas that seem interesting!
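To illustrate how task priorities can affect response times, here is a minimal Python sketch of the classic response-time analysis for fixed-priority preemptive scheduling on one processor. It is offered only as background under stated assumptions: tasks are periodic, deadlines equal periods, and there is no blocking or overhead. The task parameters and the function name `response_times` are hypothetical and are not taken from the vehicle's actual software.

```python
def response_times(tasks):
    """Worst-case response times under fixed-priority preemptive scheduling.

    tasks: list of (wcet, period) pairs with integer values, ordered from
    highest to lowest priority. Deadlines are assumed equal to periods.
    Returns one entry per task: the response time, or None if it would
    exceed the task's period (i.e., the deadline could be missed).
    """
    results = []
    for i, (c_i, t_i) in enumerate(tasks):
        r = c_i
        while True:
            # Interference from every higher-priority task released in [0, r).
            # -(-r // t_j) is an integer ceiling of r / t_j.
            interference = sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
            r_next = c_i + interference
            if r_next > t_i:
                results.append(None)
                break
            if r_next == r:  # fixed point reached
                results.append(r)
                break
            r = r_next
    return results


# Hypothetical task set: (worst-case execution time, period).
short_period_first = response_times([(1, 4), (2, 6), (3, 12)])
long_period_first = response_times([(3, 12), (2, 6), (1, 4)])
print(short_period_first, long_period_first)
```

With these made-up numbers, the same three tasks all meet their deadlines under one priority ordering, while the reversed ordering leaves the shortest-period task unable to meet its deadline.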

Students working on this project will be exposed to a wide array of computer science topics, including computer vision, robotics, real-time systems, and parallel computing. However, no prior experience in these topics or in hardware is required or expected. At a minimum, applicants should have taken CS 201. Courses such as CS 208 and CS 252 or CS 257 are helpful but not required. Additionally, experience in the Maker Space is convenient, but not necessary.

If interested, students working on this project could be involved in a 1-credit independent study during the Spring 2024 term to help with assembling the physical car.

Exploring the Impact of Critical-Section Granularity When Accessing Shared Resources (Tanya Amert)

1 student, up to 8 weeks, dates yet to be determined, may be combined with SLAI

Consider a shared resource, for example a hardware resource like a graphics processing unit (GPU) or a software resource like a shared queue. Safety requirements may necessitate that only one task can access a shared resource at a time; safety can be guaranteed through a mutual-exclusion locking protocol (i.e., with a “mutex”). A task can only access the resource by requesting it via the mutex; if another task is using the resource (i.e., “holding the lock”), the new requestor must wait.

A typical assumption is that any time spent holding a lock must be short to avoid unnecessary waiting. However, in a system in which tasks have different deadlines, it is more important to meet deadlines than to be fast. A question then arises: are there situations in which a task could combine multiple accesses into a single “lock request”, potentially forcing other tasks to wait longer, with all tasks still meeting their deadlines? Are there task sets for which deadlines can only be guaranteed to be met if such accesses are combined? (Spoiler: yes!)
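The difference between fine-grained and combined lock requests can be sketched in a few lines of Python. This is only an illustration using ordinary threads and a `threading.Lock`; it says nothing about deadlines or real-time scheduling, and the names `CountingLock` and `run` are hypothetical helpers introduced for this example.

```python
import threading


class CountingLock:
    """A mutex wrapper that records how many times it was acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1  # safe: updated only while holding the lock
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def run(num_tasks, accesses_per_task, combined):
    """Run tasks that each access a shared counter several times.

    combined=False: one lock request per access (fine-grained).
    combined=True: one lock request covering all of a task's accesses.
    Returns (final counter value, number of lock acquisitions).
    """
    lock = CountingLock()
    shared = {"value": 0}

    def task():
        if combined:
            with lock:  # a single, longer critical section
                for _ in range(accesses_per_task):
                    shared["value"] += 1
        else:
            for _ in range(accesses_per_task):
                with lock:  # many short critical sections
                    shared["value"] += 1

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"], lock.acquisitions


fine = run(4, 100, combined=False)
coarse = run(4, 100, combined=True)
print(fine, coarse)
```

Both variants produce the same final value, but the combined variant makes far fewer lock requests while holding the lock longer each time, which is the trade-off the questions above are about.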

Early work is already ongoing for simple uniprocessor systems in which tasks have fixed priorities. However, there are several future directions for extending this work:

uniprocessor systems with priorities dynamically assigned based on current deadlines, rather than statically assigned for each task
multiprocessor systems, for either static or dynamic priorities
optimal solutions to any of the above by posing the problem as an integer linear program

A student working on this project will be actively involved in work that is planned for submission to a future real-time systems conference. This means helping design the algorithms, run experiments, and write up results. As such, prior experience in CS 202 is required, and experience in CS 252, CS 332, or CS 348 would be helpful but not strictly necessary.

Ideally, a student working on this project would be available to participate in a 1-credit independent study or reading group during the Spring 2024 term to read papers, learn about the experimental approaches in this work, and have early conversations planning for the summer.

Understanding Dispute Resolution and Community Stature on English Wikipedia (Sneha Narayan)

2 students, 8-10 weeks, starting the first week of July 2024. Cannot be combined with SLAI.

Wikipedia exemplifies the power of collaboration on the internet, and demonstrates how large groups of people can work together to produce important shared resources. As in any large collaborative effort (and even many smaller ones), disputes inevitably arise among Wikipedia’s editors. Imagine you’re an editor updating an article on Wikipedia, when a different editor stops by, decides they don’t like what you’re doing, and starts harassing you. On platforms like Instagram, you might have the ability to report hostile posts, but in volunteer-run communities like Wikipedia and many subreddits, community management and conflict moderation are handled largely by experienced users.

On English Wikipedia, the thorniest disputes among editors are adjudicated by an elected group of Wikipedians known as the Arbitration Committee (ArbCom). Along with input from other active editors, ArbCom does the work of interpreting and applying Wikipedia’s complex system of policies and principles to determine how to sanction users who break the rules (sort of like how lawyers and judges interpret and apply the law).

Last summer, my research students assembled a dataset of 1,483 proposed arbitration cases that occurred between 2004 and 2023 on English Wikipedia, with the goal of understanding how a Wikipedian’s stature within the editing community (in terms of their role, duration of participation, and number of edits contributed) impacts the dispute resolution process. Our initial findings show that Wikipedians who participate in ArbCom deliberations are dramatically more experienced than the average Wikipedia editor, and that the stature of a user filing a case to be heard by ArbCom appears to be strongly associated with the likelihood of ArbCom accepting their case.

This summer, I hope to work with students to continue analysis of this dataset, and focus on understanding the dynamics of retention and turnover among editors participating in such dispute resolution processes. Students interested in working with me should have taken CS classes at least through CS 201 (Data Structures). I’m especially interested in working with students who have taken a class or two in statistics, have strong writing and data presentation skills, and are interested in engaging with academic literature from sociology, communication, and political science in addition to computer science. Additionally, if you have any experience moderating or managing online communities of any sort, please mention this in your application (though this is not a requirement for applying).