When I look at the programming languages available today, it seems that each of them optimizes either execution speed or developer time at the expense of the other. For example, although compiled C code is extremely fast, it can take many more programmer hours to write a robust application in C than in a higher-level language. On the other side of the coin, a Ruby programmer can solve a moderately complex problem in a few minutes, but the resulting code runs slowly: it executes on an interpreter (soon to be a VM, but still slower than native code) and requires garbage collection and lots of runtime checking. Are these sacrifices necessary? I don’t think so. How is it possible to make a language that simultaneously optimizes execution speed and developer time? I believe the answer lies in static code analysis, particularly at the compiler level.
(To get a brief overview of my language ideas, see my five-point summary of a better language below.)
You may be thinking: “It takes too long to compile code; I’d much rather run an interpreter because I can develop faster.” I agree: when you are developing code, it is faster to use an interpreter. Furthermore, you benefit from runtime safety checks that would be absent in native code. But when you are ready to distribute your code, it is much better to compile it. Unfortunately, current language implementations tend to focus on either a native compiler or an interpreter/bytecode compiler, not both. For example, there is a plethora of C compilers, but few widely used C interpreters. The same is true in reverse for traditionally interpreted languages: Ruby, Python, and Java programs are seldom distributed as native binaries because no widely used native-code compilers exist for them. A language with both a well-supported interpreter and a well-supported compiler (ideally with reference implementations of each) lets developers use it for quick development and for fast execution of shipped binaries.
Then why don’t we just make a really good interpreter for C or a really good native code compiler for Ruby, Python, Java, etc.? The answer to the first part is that C programs take a long time to develop primarily because of the language’s design, not its compile time. The answer to the second part is that a true native code compiler is often impossible (i.e., in a language with an eval that accepts arbitrary strings) or, where it is possible, the native code is necessarily slower than a functionally equivalent C program because of language-required runtime checks and garbage collection.
To me, a better programming language would optimize both execution speed and developer time, and would be designed from the start to be both easily compiled to native code and easily interpreted. As I mentioned earlier, I believe these goals can largely be achieved through compiler-level static code analysis. My focus is on the compiler implementation of such a language rather than the interpreter implementation, because there are already many interpreted languages that optimize developer time well.
I probably won’t have time to implement any of my ideas in this area for a while, so for now I’ll just enumerate some of them in this and upcoming blog posts. Through this, I hope to start a conversation about how we can overcome some of the issues traditionally seen in truly compilable languages. Areas where I think such issues can be solved (which I hope to write about in future posts) include:
- time- and memory-efficient automatic dynamic memory allocation, particularly with growing and shrinking objects like strings, arrays, and hashtables
- optimized reference counting using properties known at compile-time to reduce the number of updates required to reference counters
- compile-time checks and optional runtime checks for bad behavior such as accessing an out-of-bounds array index or dereferencing a null pointer
This list is by no means exhaustive. Feel free to comment on other useful properties of developer-friendly languages that are difficult to put into a compilable language, hard to implement in a time- and memory-efficient manner, or just seldom seen in compilable languages for whatever reason.
I am also interested in designing the language to make compiler optimizations to native code easier. For example, a foreach (PHP/Perl) or each (Ruby) construct could let the compiler parallelize operations or split them along cache lines. Such optimizations cannot be done safely in C because you cannot express “do this operation on each of these elements in any order”. Instead, you must use a loop construct, which means “do these operations in this order”. That restricts the meaning of your operation further than necessary and thus reduces the compiler’s ability to optimize it.
To summarize, here are some of the main properties I would like to see in a better programming language:
- Compilable into native code that is as fast or faster than functionally-equivalent C code
- Optional runtime safety checks, performed by the interpreter and inserted into compiled code by the compiler, that can be used during development but removed (through a flag) for speed
- Easy-to-use constructs (such as strings, regular expressions, and hashtables) integrated into the language (like in Ruby and others)
- Very strict typing to increase the number of safety guarantees the compiler can provide
- Interpreter support, including the ability to automatically link with compiled libraries
To be fair, most of these will require longer compile times. However, as long as an interpreter is available, this will not be a problem, since the code will only need to be compiled occasionally (perhaps only by an automated build system, so the developer never has to wait).
I realize that I have not provided any empirical evidence that current interpreted or bytecode-compiled language implementations are significantly slower than native code implementations. If I had the time and resources, I would research this myself. I encourage readers of this post to point to existing research, or to perform their own, showing where major speed improvements can be made over existing interpreted and bytecode-compiled implementations.
I hope that my suggestions and brief explanations have shown you that optimizing a language for execution speed and ease of use does not have to be a zero-sum game. With some clever design, I believe we can have the speed benefits of native code along with the developer-friendliness of today’s interpreted languages.