Our Story – RV16, Fast, Simple, Open

Building a Community

A couple of decades ago, a coworker built an 8-bit CPU from TTL logic and ported a small C compiler to it. Running at 4 MHz, it could run “Hello, World” and, eventually, even a Minix port! The architecture was asymmetrical and is by now long in the tooth, but it was still very exciting to watch its progress.

In recent years, building retro CPUs on FPGAs or in browser emulators has become common. But in the 2020s, there are far better options, and that’s exactly where the miniCore journey begins.

The miniCore project was founded with a single vision: to document and share the complete journey of designing and implementing a modern RISC CPU, from RTL to C and MicroPython support. The target audience is hobbyists, students, educators, engineers, and makers.

While RISC CPUs dominate modern commercial designs, including the micro-cores inside Intel x86 processors, they are typically 32-bit or 64-bit designs, and the RTL is rarely open. RISC-V is an option, but with few exceptions it isn’t well suited for implementation on a small, low-cost FPGA like the Lattice iCE40UP5K.

With a clean 16-bit design, the entire RV16 stack – from RTL to software port – can be understood and shared without baggage.

Clean Slate Hardware Design

RV16 is a clean-room 16-bit CPU design with an orthogonal ISA intended to run C and MicroPython. The RTL is written to minimize FPGA LUT usage while still providing high performance, including a 16×16 MUL that uses the FPGA’s built-in DSP block. RV16 is expected to use between 2,000 and 2,500 LUTs, only about 30% more than a simple 8-bit CPU, while delivering roughly 2× the performance and leaving two to three thousand LUTs available for future iterations.

Unix came of age on a 16-bit machine (Digital Equipment’s PDP-11), and for embedded uses, a 16-bit design is the sweet spot: it keeps memory usage low while providing high performance.

Microarchitecture

RV16 is structured as a classic 3-stage pipeline: Instruction Fetch (IF), Decode and Execute (ID/EX), and Writeback (WB). To minimize LUT usage, the ALU is split into two stages within the ID/EX pipeline stage.

There is no data forwarding in the initial design, and stalls are fully exposed to the compiler, making the RTL design easily verifiable.
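
To make "exposed to the compiler" concrete, here is a minimal sketch of the padding a compiler or assembler might do when there is no forwarding. Everything in it is an assumption for illustration: the `insn_t` summary, the `REG_NONE` marker, and especially `HAZARD_SLOTS`, which assumes a result cannot be read by the instruction immediately following its producer. The real hazard distance depends on the RTL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction summary: one destination and up to two
   source registers. REG_NONE marks an unused slot. */
#define REG_NONE 0xFFu

typedef struct {
    uint8_t dst;
    uint8_t src1;
    uint8_t src2;
} insn_t;

/* Assumed hazard window (not taken from the RV16 RTL): a result is not
   readable by the HAZARD_SLOTS instructions right after its producer. */
#define HAZARD_SLOTS 1

/* Count the NOPs needed so that no instruction reads a register before
   it becomes readable under the assumption above. */
static size_t count_nops(const insn_t *code, size_t n)
{
    size_t ready_at[256] = {0}; /* earliest slot each register may be read */
    size_t slot = 0;            /* next emit slot, including inserted NOPs */
    size_t nops = 0;

    for (size_t i = 0; i < n; i++) {
        size_t need = slot;
        if (code[i].src1 != REG_NONE && ready_at[code[i].src1] > need)
            need = ready_at[code[i].src1];
        if (code[i].src2 != REG_NONE && ready_at[code[i].src2] > need)
            need = ready_at[code[i].src2];

        nops += need - slot;    /* pad with NOPs up to the required slot */
        slot = need;

        if (code[i].dst != REG_NONE)
            ready_at[code[i].dst] = slot + 1 + HAZARD_SLOTS;
        slot++;
    }
    return nops;
}
```

A real backend would first try to fill those slots with independent instructions and fall back to NOPs only when nothing useful can be scheduled.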

PIO

The programmable I/O (PIO) block is controlled through memory-mapped I/O (MMIO) registers, so no special I/O instructions take up instruction encoding space. On the FPGA, PIO uses the FPGA’s built-in blocks and consumes only a couple hundred LUTs.
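
From C, MMIO-controlled I/O is just loads and stores through a pointer. The sketch below shows the general pattern only. The register names (`dir`, `out`, `in`), their layout, and `PIO_BASE_ADDR` are hypothetical and not taken from the RV16 specification.

```c
#include <stdint.h>

/* Hypothetical register block: names and offsets are illustrative only. */
typedef struct {
    volatile uint16_t dir;  /* 1 = pin is an output */
    volatile uint16_t out;  /* output latch */
    volatile uint16_t in;   /* sampled pin state */
} pio_regs_t;

/* Assumed base address in the data space; a real port would take this
   from the published memory map. */
#define PIO_BASE_ADDR 0xFF00u

/* Configure one pin (0..15) as an output and drive it high or low,
   using only ordinary loads and stores. */
static void pio_write_pin(pio_regs_t *pio, unsigned pin, int level)
{
    uint16_t mask = (uint16_t)(1u << pin);

    pio->dir |= mask;
    if (level)
        pio->out |= mask;
    else
        pio->out &= (uint16_t)~mask;
}

/* Read the sampled state of one pin (0..15). */
static int pio_read_pin(const pio_regs_t *pio, unsigned pin)
{
    return (pio->in >> pin) & 1u;
}
```

On hardware the caller would pass `(pio_regs_t *)PIO_BASE_ADDR`. Taking the block as a pointer also lets the same code run against an ordinary struct in a host-side test.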

Note: The full PIO with dedicated instruction memory is planned for the ASIC (V2+); the FPGA implementation uses MMIO-controlled PIO via built-in blocks.

Memory

RV16 uses a Harvard architecture to provide two flat 64K address spaces, one for instructions and one for data. In addition, an architecture-defined 8-bit Data Page register extends the addressable data space to up to 16 MB (256 pages of 64K each).
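
The arithmetic behind the 16 MB figure can be sketched as follows. This assumes the simplest possible scheme, in which the page register supplies address bits 23..16 and the 16-bit data address supplies bits 15..0. The function names are hypothetical, and the actual RV16 paging semantics may differ.

```c
#include <stdint.h>

/* Assumption: page register = address bits 23..16,
   16-bit data address = address bits 15..0. */
static uint32_t rv16_data_addr(uint8_t page, uint16_t offset)
{
    return ((uint32_t)page << 16) | offset;
}

/* Total data bytes reachable through all 256 pages of 64K each. */
static uint32_t rv16_data_space_bytes(void)
{
    return UINT32_C(256) * UINT32_C(65536); /* 16,777,216 bytes = 16 MB */
}
```

For example, under this assumption page `0x02` with offset `0x8000` selects extended address `0x028000`.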

The initial release will include 32K for program and data, plus 16K for heap and stack, which is comfortable for most embedded uses.

RTOS-Ready by Design

The ISA is designed for future RTOS ports: an extendable interrupt vector table, SWI and FENCE instructions, and separate user and system stacks.

Software Stack

The ISA is designed to make an LLVM backend port simple: it has regular registers, LLVM-friendly bit-manipulation instructions (CLZ, CTZ, POPCOUNT, and BSWAP), and auto-increment addressing for fast C/C++ execution.
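
These four operations correspond to LLVM's `llvm.ctlz`, `llvm.cttz`, `llvm.ctpop`, and `llvm.bswap` intrinsics, so a backend whose ISA provides them natively can select one instruction each instead of expanding a software sequence. As a reference for what they compute on 16-bit values, here is a plain C sketch. The function names are hypothetical, and returning 16 for a zero input is an assumed convention, not something confirmed for RV16.

```c
#include <stdint.h>

/* Count leading zeros; returns 16 for x == 0 (assumed convention). */
static unsigned clz16(uint16_t x)
{
    unsigned n = 0;
    for (uint16_t bit = 0x8000u; bit != 0 && !(x & bit); bit >>= 1)
        n++;
    return n;
}

/* Count trailing zeros; returns 16 for x == 0 (assumed convention). */
static unsigned ctz16(uint16_t x)
{
    unsigned n = 0;
    for (uint16_t bit = 1u; bit != 0 && !(x & bit); bit <<= 1)
        n++;
    return n;
}

/* Count set bits by repeatedly clearing the lowest one. */
static unsigned popcount16(uint16_t x)
{
    unsigned n = 0;
    while (x) {
        x &= (uint16_t)(x - 1);
        n++;
    }
    return n;
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t bswap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}
```

The loops here are what a compiler has to emit, or call into a runtime library for, when the target lacks these instructions.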

Unlike on 8-bit processors, RV16’s flat 64K address map and 16-bit registers make it possible to run MicroPython.

The RV16 design also allows for the companion-processor use case – fitting the gap where 32-bit is too large and 8-bit isn’t enough, with the FPGA fabric providing programmable I/O capability that goes well beyond what fixed-silicon solutions can offer.

Release Plans

V0 – Building the Foundation

ISA definition, initial RTL, and prototype LLVM/Clang toolchain support.

V1 – Metal to the Ground (Plane)

Prototype FPGA implementation.

V2 – Ready for the World

Refined ISA. FPGA on a prototype-friendly DIP-40 package, programmable and powered through USB-C. Full PIO engine, JTAG, and expanded peripheral support.

V3 – Looking Beyond

ASIC-ready RTL validated through open PDK synthesis (OpenLane / Sky130). The RTL is designed for third-party adoption: by universities with shuttle programs, ASIC houses, or system integrators building custom SoCs. A self-funded tape-out is possible if the community gets there, but the primary goal is open, auditable silicon that others can build on.

Join Us

This is what my coworker’s TTL breadboard was pointing toward, decades earlier. The journey has just begun, and the best is ahead. Check out the blog for design updates, star the project on GitHub, and sign up for the mailing list to follow along as RV16 comes to life, moving from silicon to software.