This is a continuation of previous work done in the godot-dodo project (https://github.com/minosvasilias/godot-dodo), which involved finetuning LLaMA models on GitHub-scraped GDScript code.
StarCoder performs significantly better than LLaMA when finetuned on the same dataset, and exceeds the evaluation scores of both gpt-4 and gpt-3.5-turbo. This suggests that single-language finetunes of smaller models can be a competitive option for coding assistants, especially for less commonplace languages such as GDScript.
The Twitter thread also details some drawbacks of the current approach, namely increasingly frequent occurrences of the model referencing out-of-scope objects in its generated code, a problem that worsens as the number of training epochs increases.