How I program with LLMs

-



The second issue is we can do better. I am happy we now live in a time when programmers write their own tests, but we do not hold ourselves to the same standards with tests as we do with production code. That is a reasonable trade-off; there are only so many hours in the day. But what LLMs lack in arithmetical prowess, they make up for in enthusiasm.

Let’s ask for an even better test.

In the tests, implement the simplest, most readable version of the standard code for quartiles over a fixed set of known values in a slice. Then pass the test cases through the standard code and the reservoir sampler and confirm they are within an epsilon of each other. Structure the comparison code such that it can be used in a fuzz test, too.

This got us some new test code:

// referenceQuartiles calculates the exact quartiles for a slice of float64 values  
// using linear interpolation, matching the behavior expected from the sampler.  
func referenceQuartiles(data []float64) (q1, median, q3 float64) { … }

// compareQuartiles checks if two sets of quartiles are within epsilon of each other.  
// Returns true if they match within the tolerance, false otherwise.  
func compareQuartiles(q1a, meda, q3a, q1b, medb, q3b, epsilon float64) bool { … }

// checkQuartiles is a test helper that compares sampler output against the reference  
// implementation and reports any differences.  
func checkQuartiles(t *testing.T, data []float64, epsilon float64) {  
   t.Helper()  
    
   // Get reference values  
   wantQ1, wantMed, wantQ3 := referenceQuartiles(data)  
    
   // Get sampler values using a large reservoir for accuracy  
   qs := NewQuartileSampler(1000)  
   for _, v := range data {  
       qs.Add(v)  
   }  
   gotQ1, gotMed, gotQ3 := qs.Quartiles()  
    
   if !compareQuartiles(gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon) {  
       t.Errorf("Quartiles mismatch:\ngot  (q1=%v, med=%v, q3=%v)\nwant (q1=%v, med=%v, q3=%v)\nepsilon=%v",  
           gotQ1, gotMed, gotQ3, wantQ1, wantMed, wantQ3, epsilon)  
   }  
}

The original test from above has been reworked to use checkQuartiles, and we have something new:

func FuzzQuartileSampler(f *testing.F) {  
   // Add some seed corpus  
   f.Add([]float64{1, 2, 3, 4, 5})  
    
   f.Fuzz(func(t *testing.T, data []float64) {  
       // Use a larger epsilon for fuzzing since we might get more extreme values  
       checkQuartiles(t, data, 0.2)  
   })  
}

This is fun because it’s wrong. My running gopls tool immediately says:

fuzzing arguments can only have the following types:
    string, bool, float32, float64,
    int, int8, int16, int32, int64,
    uint, uint8, uint16, uint32, uint64,
    []byte

Pasting that error back into the LLM gets it to regenerate the fuzz test such that it is built around a func(t *testing.T, data []byte) function that uses math.Float64frombits to extract floats from the data slice. Interactions like this point us toward automating the feedback from tools; all it needed was the obvious error message to make solid progress toward something useful. I was not needed.

Doing a quick survey of the last few weeks of my LLM chat history shows (which, as I mentioned earlier, is not a proper quantitative analysis by any measure) that more than 80 percent of the time there is a tooling error, the LLM can make useful progress without me adding any insight. About half the time, it can completely resolve the issue without me saying anything of note. I am just acting as the messenger.



Source link

Latest news

CBP Searched a Record Number of Phones at the US Border Over the Past Year

The recent spike in searches at the border has mostly been driven by an increase in the last...

Disrupt 2025: Day 2 | Tech Zone Daily

It’s time for day two of Tech Zone Daily Disrupt 2025, which means San Francisco’s Moscone West will...

From Robot Vacuums to Dysons, Here Are All the Best Vacuum Cleaners We’ve Ever Tried

Comparing Our Favorite Vacuum CleanersHonorable MentionsBlack and Decker Dustbuster Flex for $89: This is another cool handheld vacuum...

Adobe Now Lets You Generate Soundtracks and Speech in Firefly

Adobe hasn’t shared a specific date yet, but Firefly Image Model 5 will launch “in the months to...

China Dives in on the World’s First Wind-Powered Undersea Data Center

The project is environmentally sustainable in other ways. More than 95 percent of its electricity comes from offshore...

The Alienware 16X Aurora Is My Favorite Alienware Laptop in Years

But with the 115-watt RTX 5060 in the Alienware 16X Aurora, you can expect to play most games...

Must read

You might also likeRELATED
Recommended to you