Data-informed code generation

Jun 08, 2025

Here’s an idea for a next-gen coding assistant – make it act like a Python-based query engine over my dataset.

When writing analytics code, it’d be far more useful if the assistant could reference the dataset itself (or at least a decent-sized sample). Its suggestions could then go beyond generic completions: they could account for feature distributions and even flag issues in the data, like missing or unnormalized values.
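To make this concrete, here's a rough sketch of the kind of data profile such an assistant might build from a sample before suggesting any code. Everything in it is my own assumption, not an existing tool: the function name `profile_sample`, the `scale_threshold` heuristic for "possibly unnormalized", and the text format of the summary.

```python
import math
import statistics
from collections import Counter


def profile_sample(rows, scale_threshold=100.0):
    """Summarize a sample (list of dicts) as one text line per column.

    Hypothetical helper: the output is meant to be prepended to an
    assistant's prompt as extra context about the data.
    """
    columns = sorted({key for row in rows for key in row})
    lines = []
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and NaN as missing.
        present = [
            v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))
        ]
        missing = len(values) - len(present)
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if not present:
            desc = "no values"
        elif len(numeric) == len(present):
            lo, hi = min(numeric), max(numeric)
            desc = (
                f"numeric, min={lo:g}, max={hi:g}, "
                f"mean={statistics.fmean(numeric):.3g}, "
                f"std={statistics.pstdev(numeric):.3g}"
            )
            # Crude heuristic (an assumption): large magnitudes suggest
            # the column has not been scaled.
            if max(abs(lo), abs(hi)) > scale_threshold:
                desc += ", possibly unnormalized"
        else:
            top = Counter(str(v) for v in present).most_common(3)
            desc = "categorical, top values: " + ", ".join(
                f"{value} ({count})" for value, count in top
            )
        lines.append(f"- {col}: {desc}, missing={missing}/{len(values)}")
    return "\n".join(lines)


sample = [
    {"age": 34, "income": 52000.0, "plan": "pro"},
    {"age": None, "income": 87000.0, "plan": "free"},
    {"age": 51, "income": 61000.0, "plan": "free"},
]
print(profile_sample(sample))
```

With a summary like this in its context, the assistant would have a reason to suggest imputing `age` or scaling `income` instead of emitting a generic pipeline.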

Even better: for modeling workflows, imagine if you could feed it evaluation outputs, e.g. PR curves, confusion matrices, or any relevant metric, and it could tailor suggestions accordingly. That’s something I’d gladly pay for!
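As a sketch of what "feeding it evaluation outputs" could look like, the snippet below turns a confusion matrix into a short text summary that an assistant could read alongside the code. The function name `summarize_confusion`, the row/column convention, and the output format are all assumptions of mine for illustration.

```python
def summarize_confusion(labels, matrix):
    """Render a confusion matrix as text context for an assistant.

    Assumed convention: matrix[i][j] is the number of examples whose
    true class is labels[i] and predicted class is labels[j].
    """
    lines = []
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        lines.append(
            f"- {label}: precision={precision:.2f}, "
            f"recall={recall:.2f}, support={actual}"
        )

    # Point out the single most frequent off-diagonal error, if any.
    errors = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and matrix[i][j] > 0
    ]
    if errors:
        count, true_label, predicted_label = max(errors, key=lambda e: e[0])
        lines.append(
            f"Most frequent error: true {true_label} "
            f"predicted as {predicted_label} ({count} cases)"
        )
    return "\n".join(lines)


print(summarize_confusion(["cat", "dog"], [[8, 2], [1, 9]]))
```

Given this, a suggestion like "add more dog-like cat examples" or "adjust the decision threshold" becomes something the assistant can actually justify from the numbers.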
