What do Tensor and Neural cores mean?

Hello. According to the specs of Alta, it has 1 general-purpose core, 4 tensor cores, and 8 neural cores. Can someone please explain to me what these mean? Which operations/instructions can each execute? Which ones are only for inference, and which can be used for training as well?

Thank you in advance.

Alta’s processor is Amlogic’s A311D, whose NPU is adopted from VeriSilicon’s Vivante VIP9000 series.

I don’t know exactly what the cores described in Alta’s spec are, but judging from the hardware architecture diagram of the VIP9000, they are probably the Programmable Engine (Parallel Processing Units) and the Neural Network Engine (NN Cores and Tensor Processing Fabric).

I couldn’t find more detailed information on the VIP9000, but slides from a VeriSilicon webinar describe its predecessor, the VIP8000:

Programmable Engine

  • 128-bit vector processing unit (shader)
  • OpenCL shader instruction set
  • Enhanced vision instruction set (EVIS)
  • INT 8/16/32b, Float 16/32b

NN Engine

  • Convolution and inner products
  • 128 MACs/cycle per core
  • INT8 or Float16

Tensor Processing Fabric

  • Data shuffling, normalization, pooling/unpooling, LUT, etc.
  • Network pruning support, zero skipping, compression
  • Accepts INT8 and Float16 (Float16 internal)

The A311D’s NPU is known to have 768 INT8 multiply–accumulate units (MACs), or 384 INT16 MACs, per core. You can google the (confidential :wink:) A311D datasheet.
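To give a feel for what a MACs/cycle figure means in practice, here is a rough sketch that counts the multiply–accumulate operations in one convolution layer and divides by the 768 INT8 MACs/cycle quoted above. The layer shape is made up for illustration, and the cycle count is an ideal lower bound (it ignores memory stalls and utilization).

```python
# Rough sketch: MAC count for one convolution layer, and an ideal
# cycle estimate assuming 768 INT8 MACs/cycle (figure from above).
# The layer dimensions below are hypothetical.

def conv_macs(out_h, out_w, out_c, in_c, k_h, k_w):
    """Total multiply-accumulate ops for a standard (dense) convolution:
    one MAC per output element per input channel per kernel tap."""
    return out_h * out_w * out_c * in_c * k_h * k_w

macs = conv_macs(out_h=56, out_w=56, out_c=64, in_c=64, k_h=3, k_w=3)
print(macs)        # 115605504 MACs for this layer
print(macs // 768) # 150528 ideal cycles at 768 MACs/cycle
```

Features like zero skipping and pruning support (mentioned for the Tensor Processing Fabric) can reduce the effective MAC count below this dense figure.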


Hello. Thanks for your reply. The problem for me isn’t a lack of information. I’m not an NN engineer, and I don’t know what these terms (tensor, neural, etc.) mean. Which ones are used for training, and which only for inference?

All parts of the NPU will be used for both training and inference. (I’m not sure how the Programmable Engine would be used.)

What training additionally needs is memory: backpropagation must store weights, activations, gradients, and gradient moments. Inference needs memory for the weights only, because it doesn’t do backpropagation.
