Introduction
The GemmaPro model is based on Gemma3 4B and uses Q4 quantization. After weight reorganization, the model is only about 2.5 GB and runs smoothly with just 4 GB of RAM, which means most office computers and higher-end phones can run it.
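The quoted size is easy to sanity-check with rough arithmetic. A minimal sketch, assuming a ~4 billion parameter count and a ~25% overhead for quantization scales, embeddings, and metadata (both figures are ballpark assumptions, not official numbers):

```python
# Rough sanity check of the ~2.5 GB figure (illustrative arithmetic only;
# the 4B parameter count and 25% overhead are assumptions).
params = 4e9                              # Gemma3 4B: roughly 4 billion weights
bytes_per_weight = 4 / 8                  # Q4 quantization: 4 bits per weight
raw_gb = params * bytes_per_weight / 1e9  # packed weights alone
total_gb = raw_gb * 1.25                  # plus scales, embeddings, metadata
print(f"raw: {raw_gb:.1f} GB, with overhead: {total_gb:.1f} GB")
```

The remaining headroom up to the 4 GB RAM requirement is plausibly consumed by the KV cache and runtime buffers during inference.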
Any GPU with 4 GB or more of VRAM can load the model fully. Even on computers without a GPU, inference reaches 7-8 tokens per second, which is close to normal human speaking speed, so conversations feel smooth with no noticeable delay. With a GPU, generation is faster still.
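The speaking-speed comparison can also be checked with quick arithmetic. A sketch, assuming conversational speech of about 150 words per minute and roughly 1.3 tokens per word (both are ballpark assumptions):

```python
# Compare CPU generation speed with a conversational speech rate.
# Ballpark assumptions: ~150 words/min speech, ~1.3 tokens per word.
speech_wpm = 150
tokens_per_word = 1.3
speech_tps = speech_wpm / 60 * tokens_per_word  # tokens/s a speaker "emits"
model_tps = 7                                   # lower end of the quoted range
print(f"speech ~{speech_tps:.2f} tok/s, model >= {model_tps} tok/s")
```

Under these assumptions, speech works out to roughly 3 tokens per second, so 7-8 tokens per second comfortably keeps pace with a spoken conversation.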
GemmaPro Model Features
- Small size and low resource requirements
- Stronger reasoning, with deeper and more logical answers than the original Gemma3 4B
- Supports both a normal mode and a reasoning mode
- Handles Chinese-language tasks smoothly
Model Download Location
Hugging Face - SciMaker/GemmaPro