I have a simple program that allocates an unsigned __int64 (8 bytes on the stack) and then attempts to register that memory on the GPU using cudaHostRegister. The section of the program making this call is shown below:

unsigned __int64 mem;
unsigned __int64 *pMem = &mem;
cudaError_t result;
result = cudaHostRegister(pMem, sizeof(unsigned __int64), cudaHostRegisterMapped);
if(result != cudaSuccess) {
    printf("Error in cudaHostRegister: %s.\n", cudaGetErrorString(result));
    return -1;
}

I am compiling in Visual Studio 2010 Premium with the nvcc flags compute_11 and sm_11, and everything works correctly on my laptop, which has a Quadro K1000m with CUDA compute capability 3.0.

I recently switched to my desktop, where I tried running with a GeForce 8600 GT and a GeForce 9500 GT, both of which have compute capability 1.1.

According to NVIDIA's documentation for cudaHostRegister, cards with compute capability 1.1 and above should allow the use of cudaHostRegisterMapped:

cudaHostRegisterMapped: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer(). This feature is available only on GPUs with compute capability greater than or equal to 1.1.

After some searching, it seemed that cudaHostRegisterMapped may require page-aligned memory. I thought that may be the difference between my 3.0 card and my 1.1 cards, so I masked off the address to get a page-aligned address and used the size of a page (4096 bytes) in the size field, as shown below:

unsigned __int64 mem;
unsigned __int64 *pMem = &mem;
unsigned __int64 memAddr = (unsigned __int64)pMem;
cudaError_t result;
pMem = (unsigned __int64 *)(memAddr & 0xFFFFFFFFFFFFF000);
result = cudaHostRegister(pMem, 4096, cudaHostRegisterMapped);
if(result != cudaSuccess) {
    printf("Error in cudaHostRegister: %s.\n", cudaGetErrorString(result));
    return -1;
}
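
The masking step above can be written as a small helper, which also makes it easier to see which memory ends up being registered. This is only a sketch: `PageRange`, `coveringPages`, and `kPageSize` are hypothetical names introduced for illustration, and the 4096-byte page size is the value assumed in the question rather than one queried from the operating system. The helper computes the smallest page-aligned range that covers a buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 4096-byte pages, as used in the question.
constexpr std::uintptr_t kPageSize = 4096;

// Hypothetical helper type: a page-aligned base address and a size in bytes.
struct PageRange {
    std::uintptr_t base;
    std::size_t size;
};

// Returns the smallest page-aligned range that covers [addr, addr + len).
// The base is rounded down and the end is rounded up to a page boundary.
inline PageRange coveringPages(std::uintptr_t addr, std::size_t len) {
    const std::uintptr_t mask = ~(kPageSize - 1);
    const std::uintptr_t base = addr & mask;
    const std::uintptr_t end = (addr + len + kPageSize - 1) & mask;
    return PageRange{base, static_cast<std::size_t>(end - base)};
}
```

For an 8-byte variable the result is a whole page, so almost all of the registered range is neighbouring stack memory that does not belong to the variable. That is the risk raised in the comments below. Registering a separately allocated, page-aligned buffer would avoid it.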

This code also works on my 3.0 card, but fails with the same result as before on my 1.1 cards. The cudaHostRegister function returns with the error cudaErrorInvalidValue, indicating that:

one or more of the parameters passed to the API call is not within an acceptable range of values

I haven't been able to find much more about why this function might fail like this. Thanks for any help anyone can provide.

[Edit] Based on talonmies' response, I verified that at least one of my cards (the 9500 GT; I didn't run it on the 8600 GT) does support memory mapping, according to NVIDIA's deviceQuery executable that comes with the SDK.

Your alignment code looks risky to me. Sure that the resulting address is really always yours? Seems to me you'd need at least mem[128] to be on the safe side. – Dude Sep 7, 2012 at 8:53

SM 1.1 and later hardware can do mapped pinned memory, so the mapped flag is valid for all GPUs except the original G80 (GeForce GTX 8800). If the call passes when you remove that flag, report it as a bug to NVIDIA. What are the platform differences, if any, between the two machines? If mem is on the stack, some platforms don't support page-locking that memory. – ArchaeaSoftware Sep 8, 2012 at 16:38

@ArchaeaSoftware: The call does indeed pass when I change the flag to cudaHostRegisterPortable instead of cudaHostRegisterMapped. Both systems (a laptop with a Quadro K1000m and a desktop with a 9500GT) are running Windows 7 Ultimate x64 SP1. The desktop has an AMD processor while the laptop is running Intel, if that is significant. I will report this as a bug to NVIDIA; thanks for your help! – fortenbt Sep 10, 2012 at 17:43

Mapped memory is supported on some compute capability 1.1 devices, but not all of them. The MCP79 family of integrated chipsets (so Ion, and 9300M/9400M) do support mapped memory. Older compute capability 1.1 devices like your 8600GT and 9500GT, however, do not support mapped memory.

You can check for this programmatically using the cudaGetDeviceProperties API call; canMapHostMemory will tell you whether a given device supports mapped memory or not.
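
A minimal sketch of that check, using only the CUDA runtime calls cudaGetDeviceCount, cudaGetDeviceProperties, and cudaGetErrorString. The helper name `deviceCanMapHostMemory` is hypothetical. Note that in the comments below the asker reports this property as "Yes" on the 9500GT even though cudaHostRegister fails, so treat the flag as a necessary check, not a guarantee that registration with cudaHostRegisterMapped will succeed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports whether the given device advertises support
// for mapping page-locked host memory into the device address space.
static bool deviceCanMapHostMemory(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return false;
    }
    printf("Device %d (%s), compute capability %d.%d, canMapHostMemory = %d\n",
           device, prop.name, prop.major, prop.minor, prop.canMapHostMemory);
    return prop.canMapHostMemory != 0;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    // Query every visible device, since a machine may mix GPU generations.
    for (int d = 0; d < count; ++d) {
        deviceCanMapHostMemory(d);
    }
    return 0;
}
```

Compiling this with nvcc and running it on each machine prints the same field that deviceQuery labels "Support host page-locked memory mapping".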

Using the deviceQuery executable provided with the SDK, the result for the 9500GT shows: Support host page-locked memory mapping: Yes. Is this not the canMapHostMemory field? – fortenbt Sep 6, 2012 at 16:46

I just verified in the source of deviceQuery that the field shown for Support host page-locked memory mapping is the canMapHostMemory device property. – fortenbt Sep 6, 2012 at 16:54

So cudaHostRegister() fails on the same 9500GT that returns canMapHostMemory = true? If so, sounds like a bug that you should report.
(1) Go to nvidia.com/content/cuda/cuda-toolkit.html
(2) If you have an existing NVdeveloper account (e.g. via partners.nvidia.com) click on the green "Login to nvdeveloper" link on right half of the screen (otherwise click "Join nvdeveloper" to apply for new account)
(3) Log in at prompt with your email address and password
(4) In side bar on left, click third link from top titled "Bug Report"
(5) Fill in bug reporting form and submit
– njuffa Sep 7, 2012 at 3:39

@talonmies: All SM 1.1 and later hardware, including 8600GT and 9500GT, support mapped pinned memory. The only CUDA-capable GPU that does not support mapped pinned memory is G80. – ArchaeaSoftware Sep 8, 2012 at 16:41

@ArchaeaSoftware: pinned + mapped = "zero copy", doesn't it? Pre Ion/GT200 devices definitely do not support zero copy. – talonmies Sep 9, 2012 at 7:14
