Ishan Khire

Posts

Paper Summary: Method to Measure Benchmark Contamination

December 14, 2025

This is a summary and explanation of a paper that measures dataset contamination using kernel divergence scores: https://arxiv.org/pdf/2502.00678 . Benchmarks are a flawed estimate of a model’s real-world capabilities for a variety of reasons. One reason is benchmark dataset contamination, i.e. if part of a benchmark’s dataset overlaps with the training data, the model will have been specifically trained to do well on those tests. Therefore, the benchmark fails to accurately measure how well the model generalizes to examples it hasn’t seen. This is an interesting paper that uses Kernel Divergence to estimate dataset contamination. Aim: Given a dataset D and model M, the paper aims to create a dataset contamination score S(D,M). The contamination ratio is the proportion of the benchmark dataset that has been seen in training data. This score should be monotonic (datasets with higher contamination should have higher scores) and consistent (datasets with similar contamination should...

Georgia Tech Part 1

November 18, 2025

My first semester at Georgia Tech is almost over. A lot has happened, but I still can't really believe it's gone this fast. I like it here. Georgia Tech is really beautiful, especially in the fall. I like walking across Tech Green and seeing the trees shed their orange leaves and watching squirrels run across the grass. People here are, in my experience, very kind. When people say "have a good one" or "thank you" or "bless you", it really does feel like they mean it. It doesn't feel (too) awkward to start conversations with strangers. America has made me more extroverted. College feels like the high school experience I never had. Welcome Week in particular was incredible---an entire week of no classes, where I hung out with new people every day, sat at new tables at lunch. I know that strangers are friendly and everyone would still like talking to new people; however, talking with new people is not the default anymore, and so I do it less ofte...

Have LLMs passed the Turing Test?

September 19, 2025

What is the Turing Test? The Turing Test aims to find out whether a human interrogator speaking simultaneously to an AI and a human can tell the difference between them. There are many disagreements over what constitutes a true Turing Test. How should the human interrogator be chosen? Does the interrogator represent the median human, the median human well-educated on LLMs, an expert on AI, etc.? How long should the human interrogator be allowed to talk with the AI? Have LLMs passed the Turing Test? Jones and Bergen (2025) set the experiment up as follows: Two types of prompts were used. The baseline prompt was minimal: “You are about to participate in a Turing test. Your goal is to convince the interrogator that you are a human.” The second prompt included a persona for the model (a young, slang-using introvert). Participants were split into two groups: UCSD psychology undergrads and Prolific workers. Very interestingly, there were "no consistent effects of any variable on participant accurac...

The Upholding of Proposition 12

May 12, 2023

Background: Farming Conditions and Prop 12 Currently in the US, most breeding pigs live in factory farms, where they are confined in gestation crates, metal cages so small that pigs can’t even turn around, while egg-laying hens live in tiny, cramped battery cages that cause a range of psychological and physiological harm. The crowded conditions also pose potential health risks: they increase the stress levels of pigs and weaken their immune systems, which can make them more susceptible to zoonotic diseases that may spread to humans. Starting in the early 2000s, a few animal welfare groups, including the Humane Society of the United States, aimed to ban the farming system of cages for hens, breeding pigs and veal calves. In 2008, Proposition 2 was passed, putting in place a “production” ban on cages: producers had to ensure pigs, hens, and calves could lie down, turn around, and extend their limbs or wings without hitting the side of an encl...
