cuda synatx
https://docs.nvidia.com/cuda/cuda-c-programming-guide/#execution-configuration
https://blog.csdn.net/qq_39575835/article/details/83027440
1  | kernal_func<<< Dg, Db, Ns, S >>>(...);  | 
dim3 index
calc index
1  | int idx = (gridDim.x*gridDim.y*blockIdx.z+gridDim.x*blockIdx.y+blockIdx.x)*blockDim.x*blockDim.y*blockDim.z+ blockDim.x * blockDim.y * threadIdx.z + blockDim.x * threadIdx .y + threadIdx.x;  | 
as you can see, blockIdx is corrdinate in GridDim, threadIdx is corrdinate in blockDim, there are all dim3{int x,y,z;}