Week 2
Week 2 (June 26th - June 30th)
June 26th:
Starting the next week I continued reading the two articles Vicente gave me to start with. This included the article from last week plus the second article called DataComp: In search of the next generation of multimodal datasets. Here are some notes from my reading:
- Recent advances in multimodal learning include CLIP, DALL-E, Stable Diffusion, Flamingo, and GPT-4, enabling impressive capabilities in zero-shot classification and image generation.
- These models rely on large datasets containing image-text pairs, often with billions of examples, far surpassing previous datasets like ImageNet.
- Limited transparency in the design of these datasets hinders progress in multimodal learning.
- The authors introduce DataComp, a benchmark for multimodal dataset design.
- DataComp challenges researchers to propose new training sets while keeping training code and computational resources constant.
- Evaluation involves a suite of diverse classification and retrieval tasks.
- DataComp focuses on two key challenges: data source selection and filtering.
- CommonPool, a dataset of 12.8 billion image-text pairs, is used for the filtering track, and another track allows participants to bring their own data sources.
- Smaller, more rigorously filtered datasets can outperform larger, unfiltered ones, emphasizing the importance of dataset design.
- DataComp-1B, a new state-of-the-art multimodal dataset, is achieved through promising filtering techniques, leading to improved model performance.
- The authors release their candidate pool index, tooling, filtering baselines, and code publicly, aiming to advance empirical research in dataset design and promote the next generation of multimodal datasets.
June 28th:
After going through the articles I had to set up my computer to be able to use the servers again as well as anything else that I needed for the duration of my internship. I also added the GitHub repository I used last year to my server. Since being here last Summer some of the PhD students in the Vislang lab have condensed or updated my code. Therefore, I went through it all again to familiarize myself with the code.
Aside from my work, I started talking to the other intern Sindhu about what I had done over the weekend. Since her parents were in town, they also explored Houston over the weekend. This past weekend I went to the Houston Zoo where students get in for free, it was definitely an experience. I think sometimes it’s hard to understand the ethical concerns of certain establishments. Moreover, the other REU students and I went to the Color Factory which was extremely fun. We took so many cute photos and it was nice making friends with other people my age on the Rice University campus even if they were in the Chemistry REU and not Computer Science. Here are some pictures from the weekend:
June 30th:
Today, was a very productive day, I got a lot of work done. I went through all of my old code and started pulling images from the internet for my new datasets. After meeting with Vicente yesterday we decided that we would expand my United States Politicians Benchmark into the Public Figure Benchmark. This means that I am going to create more datasets of other Public Figures to expand the range of my Benchmark. I started looking at the top 100 actors. For this, I found an IMDB list of The Top 100 Actors Working in Hollywood Today.
After working, Vicente took the Vislang lab out to lunch, this included the 2 PhD students Jefferson and Ernesto, and Sindhu the other intern through DREU. We went to a Mexican restaurant named Cucharita which was really good. I loved the atmosphere and it was nice talking to the lab outside of the office. Here are some pictures of the dishes:
TO-DO FOR NEXT WEEK
- Download images for new Public Figure Datasets
- Begin running the old tests on your new images from the Dataset
- Have the Base Benchmark print out individual scores per person