I have a simple program that allocates an unsigned __int64 (8 bytes on the stack) and then attempts to register that memory with the GPU using cudaHostRegister. The section of the program making this call is shown below:
unsigned __int64 mem;             // 8-byte variable on the stack
unsigned __int64 *pMem = &mem;
cudaError_t result;
result = cudaHostRegister(pMem, sizeof(unsigned __int64), cudaHostRegisterMapped);
if(result != cudaSuccess) {
    printf("Error in cudaHostRegister: %s.\n", cudaGetErrorString(result));
    return -1;
}
I am compiling in Visual Studio 2010 Premium with the nvcc flags compute_11 and sm_11, and everything works correctly on my laptop, which has a Quadro K1000M with compute capability 3.0.
I recently switched to my desktop, where I tried running with a GeForce 8600 GT and a GeForce 9500 GT, both of which have compute capability 1.1.
According to NVIDIA's documentation for cudaHostRegister, cards with compute capability 1.1 and above should allow the use of cudaHostRegisterMapped:
cudaHostRegisterMapped: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer(). This feature is available only on GPUs with compute capability greater than or equal to 1.1.
After some searching, it seemed that cudaHostRegisterMapped might require page-aligned memory. I thought that might be the difference between my 3.0 card and my 1.1 cards, so I masked off the low bits of the address to get a page-aligned address and passed the size of a page (4096 bytes) as the size argument, as shown below:
unsigned __int64 mem;
unsigned __int64 *pMem = &mem;
unsigned __int64 memAddr = (unsigned __int64)pMem;
cudaError_t result;
// Round the address down to the start of its 4096-byte page
pMem = (unsigned __int64 *)(memAddr & 0xFFFFFFFFFFFFF000);
result = cudaHostRegister(pMem, 4096, cudaHostRegisterMapped);
if(result != cudaSuccess) {
    printf("Error in cudaHostRegister: %s.\n", cudaGetErrorString(result));
    return -1;
}
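The mask in the snippet above rounds the start address down to a page boundary and then registers a fixed 4096 bytes. That covers an 8-byte variable, but not an arbitrary buffer that crosses into the next page. A minimal sketch of the general calculation, assuming a power-of-two page size (4096, as in the question); `covering_pages` and `page_range` are hypothetical names, not part of the CUDA API, and the code is plain C with no CUDA calls:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: a page-aligned base plus a length that is a
   whole number of pages. */
typedef struct {
    uintptr_t base;   /* start of the first page touched by the buffer */
    size_t    length; /* bytes from base to the end of the last page touched */
} page_range;

/* Hypothetical helper: computes the smallest page-aligned range covering
   [addr, addr + size). page_size must be a power of two. */
static page_range covering_pages(uintptr_t addr, size_t size, size_t page_size)
{
    page_range r;
    uintptr_t mask   = ~(uintptr_t)(page_size - 1);
    uintptr_t end    = addr + size;                     /* one past the last byte */
    uintptr_t end_up = (end + page_size - 1) & mask;    /* round the end up */

    r.base   = addr & mask;                             /* round the start down */
    r.length = (size_t)(end_up - r.base);
    return r;
}
```

For example, an 8-byte object at 0x1234 yields base 0x1000 and length 4096, while one at 0x1FFC spans two pages and yields base 0x1000 and length 8192. The resulting `base` and `length` would then be passed to cudaHostRegister in place of the hard-coded mask and 4096.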
This code also works on my 3.0 card, but fails with the same result as before on my 1.1 cards: cudaHostRegister returns the error cudaErrorInvalidValue, indicating that:
one or more of the parameters passed to the API call is not within an acceptable range of values
I haven't been able to find much more about why this function might fail like this. Thanks for any help anyone can provide.
[Edit]
Based on talonmies' response, I verified that at least one of my cards (the 9500 GT; I didn't run it on the 8600 GT) does support memory mapping according to NVIDIA's deviceQuery executable that comes with the SDK.
Mapped memory is supported on some compute capability 1.1 devices, but not all of them. The MCP79 family of integrated chipsets (so Ion, and 9300M/9400M) do support mapped memory. Older compute capability 1.1 devices like your 8600GT and 9500GT, however, do not support mapped memory.
You can check for this programmatically with the cudaGetDeviceProperties API call: the canMapHostMemory field of the returned cudaDeviceProp structure tells you whether a given device supports mapped memory.