AST and semantic errors
Jun 25, 2024
One of the problems we still face in software development is the steep learning curve of a programming language. This barrier often exists because our tools are not designed for teaching; they are optimized for professional use. My idea here is to write a bit about compilers and about how new programmers interact with their tools.
In my master's thesis, I am exploring how to improve Java error messages, in the sense of giving them more context and helping developers understand where they went wrong and why it is an error. For professional developers this may not make much sense, because longer explanations could slow them down. Students, however, are not working with large codebases, so the focus here is educational.
Not all errors are the same
This work involves many compiler concepts, since the compiler is where error messages are produced and therefore where we want to expand them. The concern is not new: languages like Rust have made real progress here, but bringing other languages to the same level requires a lot of tooling work. The first concept is understanding the different kinds of errors. Syntax errors are the ones that stop the front end from building a parse tree: the lexer rejects character sequences that are not valid tokens (incorrect words), and the parser rejects token sequences that the grammar does not allow (invalid structures).
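To make the split between lexical and syntactic errors concrete, here is a minimal sketch of a toy front end for a one-statement grammar, `IDENT "=" NUMBER ";"`. The class `ToyFrontEnd`, the grammar, and the message wording are hypothetical illustrations, not how javac is organized internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy front end for the one-statement grammar:  IDENT "=" NUMBER ";"
public class ToyFrontEnd {

    enum Kind { IDENT, NUMBER, EQUALS, SEMI }

    record Token(Kind kind, String text, int pos) {}

    // Lexical error: a character that cannot start any token.
    static class LexicalError extends RuntimeException {
        LexicalError(String msg) { super(msg); }
    }

    // Syntax error: valid tokens arranged in an order the grammar does not allow.
    static class SyntaxError extends RuntimeException {
        SyntaxError(String msg) { super(msg); }
    }

    // Lexer: turns characters into tokens, rejecting characters it does not recognize.
    static List<Token> lex(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add(new Token(Kind.IDENT, src.substring(start, i), start));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(new Token(Kind.NUMBER, src.substring(start, i), start));
            } else if (c == '=') {
                tokens.add(new Token(Kind.EQUALS, "=", i++));
            } else if (c == ';') {
                tokens.add(new Token(Kind.SEMI, ";", i++));
            } else {
                throw new LexicalError("lexical error at " + i + ": unexpected character '" + c + "'");
            }
        }
        return tokens;
    }

    // Parser: checks that the token sequence is exactly IDENT "=" NUMBER ";".
    static void parse(List<Token> tokens) {
        Kind[] expected = { Kind.IDENT, Kind.EQUALS, Kind.NUMBER, Kind.SEMI };
        for (int k = 0; k < expected.length; k++) {
            if (k >= tokens.size()) {
                throw new SyntaxError("syntax error: expected " + expected[k] + " but input ended");
            }
            Token t = tokens.get(k);
            if (t.kind() != expected[k]) {
                throw new SyntaxError("syntax error at " + t.pos() + ": expected "
                        + expected[k] + " but found '" + t.text() + "'");
            }
        }
        if (tokens.size() > expected.length) {
            Token extra = tokens.get(expected.length);
            throw new SyntaxError("syntax error at " + extra.pos() + ": unexpected '" + extra.text() + "'");
        }
    }

    // Returns "ok", or the message of the first error found.
    static String check(String src) {
        try {
            parse(lex(src));
            return "ok";
        } catch (LexicalError | SyntaxError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("x = 5;"));   // valid statement
        System.out.println(check("x = 5#;"));  // fails in the lexer
        System.out.println(check("x 5 = ;"));  // fails in the parser
    }
}
```

Both invalid inputs are rejected, but at different stages: `x = 5#;` never gets past the lexer, while every token in `x 5 = ;` is valid on its own and only the parser notices that their order breaks the grammar.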
However, there is a category that causes more confusion: semantic errors. These occur when what you wrote is valid according to the grammar but does not make sense in practice. For example, when you assign a number to a text variable, you used every token correctly, but the statement violates the type system. These errors usually come with very localized, direct messages that make sense to more experienced developers (though not always as much as they should), and developers tend to accept them. Another example is passing a floating-point value to a method that expects an integer. An experienced developer will probably check the variable's declaration (or change the method signature and see whether it compiles).
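Semantic checks like these run on the abstract syntax tree (AST) after parsing has succeeded. The following is a minimal sketch of that idea, assuming a hypothetical mini-AST with only literals and variable declarations; `ToyTypeChecker`, `VarDecl`, and the message wording are illustrative inventions, not javac's internals. It applies Java's own rule for these three types: an `int` may widen to `double`, a `double` is not implicitly narrowed to `int`, and a number cannot initialize a `String`. Each message also states why the statement is rejected, in the spirit of the more explanatory messages described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-AST and type checker; node names and message wording are illustrative.
public class ToyTypeChecker {

    enum Type {
        INT("int"), DOUBLE("double"), STRING("String");
        final String javaName;
        Type(String javaName) { this.javaName = javaName; }
    }

    // Expression nodes: only literals, to keep the sketch small.
    interface Expr {}
    record IntLit(int value) implements Expr {}
    record DoubleLit(double value) implements Expr {}
    record StringLit(String value) implements Expr {}

    // Statement node for a declaration such as:  String name = 42;
    record VarDecl(int line, Type declared, String name, Expr init) {}

    static Type typeOf(Expr e) {
        if (e instanceof IntLit) return Type.INT;
        if (e instanceof DoubleLit) return Type.DOUBLE;
        if (e instanceof StringLit) return Type.STRING;
        throw new IllegalArgumentException("unknown expression node: " + e);
    }

    // Java's assignment rule restricted to these three types: identical types are fine
    // and int widens to double; every other combination is rejected.
    static boolean assignable(Type target, Type value) {
        return target == value || (target == Type.DOUBLE && value == Type.INT);
    }

    // Walks the declarations and returns one explanatory message per semantic error.
    static List<String> check(List<VarDecl> program) {
        List<String> errors = new ArrayList<>();
        for (VarDecl d : program) {
            Type actual = typeOf(d.init());
            if (assignable(d.declared(), actual)) continue;
            StringBuilder msg = new StringBuilder()
                .append("line ").append(d.line()).append(": '").append(d.name())
                .append("' is declared as ").append(d.declared().javaName)
                .append(" but is initialized with a value of type ").append(actual.javaName)
                .append(". The statement follows the grammar, but the types do not match.");
            if (d.declared() == Type.INT && actual == Type.DOUBLE) {
                msg.append(" A double can have a fractional part, so it is not converted to int"
                        + " automatically; write an explicit (int) cast if truncation is intended.");
            }
            errors.add(msg.toString());
        }
        return errors;
    }

    public static void main(String[] args) {
        List<VarDecl> program = List.of(
            new VarDecl(1, Type.DOUBLE, "ratio", new IntLit(3)),     // accepted: int widens to double
            new VarDecl(2, Type.STRING, "name", new IntLit(42)),     // semantic error
            new VarDecl(3, Type.INT, "count", new DoubleLit(2.5)));  // semantic error (lossy narrowing)
        check(program).forEach(System.out::println);
    }
}
```

Every declaration in `main` would parse without complaint; only the walk over the AST, which knows the declared type of each variable, can tell that lines 2 and 3 are wrong.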