Tips and Tricks For Writing NIFs

What is covered:

  • Iterating Through a List.
  • Filling an Array From a List.
  • Returning a C Pointer for Later Use.
  • Cleaning Up a C Pointer.
  • Normal and Dirty Versions of a NIF.
  • Built in Hashing.

Background

After writing my exor_filter nif library, which was basically just a wrapper for the xorfilter_singleheader, I found that the official documentation and various guides on the internet lacked good examples of how to do certain things, such as loading a list into a nif or providing dirty and non-dirty versions of a function. Here are some concepts that I picked up; hopefully they’ll be useful to a larger audience.

Big shoutout to jiffy for being a great reference resource.

A good starting point is to follow the rebar3 guide for setting up nifs. These examples will use some of the functions defined there, like mk_error, which isn’t a standard function. Any function that returns an ERL_NIF_TERM should be assumed to have accompanying Erlang code that isn’t shown. Reference the linked rebar3 guide on what that should look like.

Note: Some of these examples print to stdout using printf. I’m pretty sure that isn’t recommended in production, but it works just fine for development.

Iterating Through a List

I feel as though list iteration for a nif can be a bit fuzzy when looking at the documentation. It’s actually pretty simple. This example expects a list of unsigned 64 bit numbers. The steps are as follows:

  1. Check to see if the first argument is a list.
  2. Iterate over the elements.
  3. Return an error if a list element isn’t a uint64.

// Standard nif code above

// Nif function for iterating through a list and printing to stdout.
// printf and PRIu64 need <stdio.h> and <inttypes.h>.
static ERL_NIF_TERM
list_iteration(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
   // Check that the arity of the function is correct.
   if(argc != 1)
   {
      return enif_make_badarg(env);
   }

   // First check that the argument is in fact a list.
   if(!enif_is_list(env, argv[0]))
   {
      return enif_make_badarg(env);
   }

   // Iterate through the list.  Each call stores the current element in
   // head and advances list to the remaining tail.
   ERL_NIF_TERM head;
   uint64_t current = 0;
   ERL_NIF_TERM list = argv[0];
   while(enif_get_list_cell(env, list, &head, &list))
   {
      if(!enif_get_uint64(env, head, &current))
      {
         return mk_error(env, "list_element_not_uint64");
      }

      // Weird uint64_t printing.  PRIu64 is the unsigned format macro.
      printf("%" PRIu64 "\n", current);
   }

   return mk_atom(env, "ok");
}

Usage:

erl> test_nif:list_iteration([1, 2, 3]).
1
2
3
ok
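
On the "weird" printing: the width of uint64_t differs between platforms, so the portable way to print one is a format macro from <inttypes.h>. PRIu64 is the unsigned variant, while PRId64 is meant for signed int64_t values. Here is a minimal standalone sketch, outside of any nif, that formats a value into a buffer. The helper name format_u64 is made up for this example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: write value into buf as decimal text.
// Returns the number of characters written (not counting the
// terminating NUL), or -1 if the buffer was too small.
int
format_u64(char* buf, size_t size, uint64_t value)
{
   assert(buf != NULL);

   // The macro expands to the right length modifier for this platform.
   int written = snprintf(buf, size, "%" PRIu64, value);
   if(written < 0 || (size_t) written >= size)
   {
      return -1;
   }
   return written;
}
```

A 21 byte buffer is always enough, since the largest uint64_t has 20 decimal digits.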

Filling an Array From a List

Various libraries operate on arrays passed to them. Here is a way to set up an array to be manipulated. Much of the code is similar, except we need to snag the list length and use the built in nif allocator function to allocate it. I won’t include the first check, as it is implied.

// Standard nif code above

// Nif function to fill an array.
static ERL_NIF_TERM
list_iteration(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
   unsigned list_length;
 
   // First check that the argument is in fact a list, see above.

   // Get the length of the list.  This fails for improper lists.
   if(!enif_get_list_length(env, argv[0], &list_length)) 
   {
      return mk_error(env, "get_list_length");
   }

   // Create array of the determined size.
   uint64_t* value_list = enif_alloc(sizeof(uint64_t) * list_length);
   if(value_list == NULL)
   {
      return mk_error(env, "could_not_allocate_memory_error");
   }

   ERL_NIF_TERM list = argv[0];
   ERL_NIF_TERM head;
   uint64_t current = 0;

   // Similar to the while loop, except keep a counter.
   for(unsigned i = 0; enif_get_list_cell(env, list, &head, &list); i++) 
   {
      if(!enif_get_uint64(env, head, &current))
      {
         // Free the array before bailing out, otherwise it leaks.
         enif_free(value_list);
         return mk_error(env, "list_element_not_uint64");
      }
      value_list[i] = current;
   }

   // Do whatever to the array

   // Free the array.
   enif_free(value_list);

   return mk_atom(env, "ok");
}

Returning a C Pointer for Later Use

This one is a bit more tricky to work out from the documentation. We are going to:

  • Create an ErlNifResourceType.
  • Define a struct to wrap our data.
  • Initialize the type in the nif_load function.
  • Return a resource for later use.

This should be paired with the section below, Cleaning Up a C Pointer, but I’m splitting them into two sections to be a bit more digestible. Our resource is just going to wrap a pointer to an array, but you can sub in anything you need.

// Define resource type.
static ErlNifResourceType* test_resource;

// Resource we are going to return to the Erlang code.
typedef struct 
{
   int         is_buffer_allocated;
   uint64_t*   buffer;
} test_resource_t;

// Destructor callback, defined in the next section.
void destroy_test_resource(ErlNifEnv* env, void* obj);

// Standard nif code

// Nif function that returns a pointer via a resource to be used later.
static ERL_NIF_TERM
create_test_resource(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
   // No need to check arguments here.

   // Number of elements in the buffer.  A fixed size is used purely for
   // illustration, sub in whatever your data needs.
   unsigned list_length = 16;

   // Allocate our resource.
   test_resource_t* resource = 
      enif_alloc_resource(test_resource, sizeof(test_resource_t));
   resource->is_buffer_allocated = 0;
   resource->buffer = NULL;

   uint64_t* value_list = enif_alloc(sizeof(uint64_t) * list_length);

   // Be sure that we actually allocated the buffer.
   if(value_list == NULL)
   {
      // Cleanup after ourselves, more will be covered in the next section.
      enif_release_resource(resource);
      return mk_error(env, "could_not_allocate_memory_error");
   }

   // Initialize our resource.
   resource->buffer = value_list;
   resource->is_buffer_allocated = 1;

   ERL_NIF_TERM term = enif_make_resource(env, resource);
   return term;
}

// Open the resource type.  This helper needs a different name from the
// nif above, since two C functions can't share one.
static ErlNifResourceType*
open_test_resource_type(ErlNifEnv* env) 
{
   return enif_open_resource_type(
      env,
      NULL,
      "test_resource_t",
      destroy_test_resource, // Will be defined in the next section.
      ERL_NIF_RT_CREATE | ERL_NIF_RT_TAKEOVER,
      NULL
   );
}

// Code called on nif loading by the BEAM.  Stores the resource type in
// the static pointer defined in the first line.
static int
nif_load(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM load_info)
{
   test_resource = open_test_resource_type(env);

   // A non-zero return value makes loading the nif fail.
   return test_resource == NULL ? -1 : 0;
}

// Other nif code.

Usage is simple: calling the function returns a resource term, to be saved for later.

Cleaning Up a C Pointer

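Above we see that enif_release_resource is called in case of error. It is also what we call when it’s time to clean the resource up. Once the last reference has been released and the Erlang term has been garbage collected, the VM calls the destructor that we registered with enif_open_resource_type, which was destroy_test_resource.

Because nothing releases the resource right after enif_make_resource, free_test_resource has to be called exactly once per resource: skipping it leaks the buffer, and calling it twice releases a reference we no longer hold. A common alternative is to call enif_release_resource immediately after enif_make_resource and let the garbage collector trigger the destructor.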

// Nif function that frees a passed resource.
static ERL_NIF_TERM
free_test_resource(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
   // Arity check
   if(argc != 1)
   {
      return enif_make_badarg(env);
   }

   test_resource_t* resource;
   if(!enif_get_resource(env, argv[0], test_resource, (void**) &resource)) 
   {
      return mk_error(env, "could_not_retrieve_test_resource");
   }

   enif_release_resource(resource);

   return mk_atom(env, "ok");
}

// Callback to free the resource.  Check the resource's flag first,
// just in case the buffer wasn't initialized due to an allocation error.
void
destroy_test_resource(ErlNifEnv* env, void* obj)
{
   test_resource_t* resource = (test_resource_t*) obj;

   if(obj != NULL && resource->is_buffer_allocated)
   {
      enif_free(resource->buffer);
   }
}

Returned C Pointer Usage

Here is what it looks like when used from Erlang code:

Resource = test_nif:create_test_resource(),
%% Do stuff.
ok = test_nif:free_test_resource(Resource).

Normal and Dirty Versions of a NIF

I ran into an issue where the time it took to use a library increased as the size of the passed data structure increased. I believe this is a common use case. I wanted to use a dirty nif when the dataset being operated on was large, and a regular non-dirty nif when the dataset was small. For small inputs, the overhead of moving the call onto a dirty scheduler and back outweighs the time that the library function takes to run, so I only wanted to take on that overhead when absolutely necessary.

I also didn’t want to repeat code by keeping two versions of the function. It turns out this is actually pretty easy. All that has to be done is to register the same C function twice, with the dirty entry carrying the ERL_NIF_DIRTY_JOB_CPU_BOUND flag. The Erlang code can then decide which one to use, based on the size of the inputs.

Here is the C example:

// nif code defined above

// Notice they're both using the same underlying function!
static ErlNifFunc nif_funcs[] = {
   {"test_nif", 1, test_nif_fun},
   {"test_nif_dirty", 1, test_nif_fun, 
      ERL_NIF_DIRTY_JOB_CPU_BOUND}
};

And the Erlang code that makes the decision:

my_fun(InputList) when length(InputList) > 10000 ->
   test_nif_dirty(InputList);

my_fun(InputList) ->
   test_nif(InputList).

It’s been pointed out that the length/1 function has a runtime of O(n). A function is_dirty could easily be made to iterate to the 10001st (or whatever makes sense) element and return true!
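
The bounded walk is easy to get off by one, so here is a rough sketch of just the counting logic. It is written in plain C with a hand-rolled linked list standing in for an Erlang list; cell_t, longer_than and longer_than_demo are names invented for this example. In Erlang the same shape would be a small recursive function, and inside a nif the walk would go through enif_get_list_cell instead of following tail pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Stand-in for an Erlang list cell: a value plus a pointer to the tail.
typedef struct cell
{
   int          head;
   struct cell* tail;
} cell_t;

// Returns true if the list has more than limit elements.  Visits at
// most limit + 1 cells, so the cost is bounded by the limit rather
// than by the length of the list.
bool
longer_than(const cell_t* list, size_t limit)
{
   size_t seen = 0;
   while(list != NULL)
   {
      seen++;
      if(seen > limit)
      {
         return true;
      }
      list = list->tail;
   }
   return false;
}

// Small self-check on the three element list [1, 2, 3].
void
longer_than_demo(void)
{
   cell_t c = {3, NULL};
   cell_t b = {2, &c};
   cell_t a = {1, &b};

   assert(longer_than(&a, 2));
   assert(!longer_than(&a, 3));
}
```

With a limit of 10000, a list of a million elements is abandoned after 10001 cells instead of being walked to the end.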

Built in Hashing

The nif library has two options for hashing Erlang terms inside a nif. This is clearly stated in the documentation, but I didn’t really see it used anywhere so I thought I’d mention it. The first option, ERL_NIF_PHASH2, corresponds to the erlang:phash2/1 function, which is consistent across nodes and VMs. According to the erl_nif documentation it ignores the salt and only produces hashes in the range 0..2^27-1. The second option, ERL_NIF_INTERNAL_HASH, is the fast hash. It is faster and takes a 32 bit salt, but it is only guaranteed to give the same hash for the same term within a single VM instance, and the documentation gives its range as 0..2^32-1. Here are their uses:

// test is any term, for example one of the nif's arguments.
ERL_NIF_TERM test;
uint64_t hashed_value = enif_hash(ERL_NIF_PHASH2, test, 0);
uint64_t fast_hashed_value = enif_hash(ERL_NIF_INTERNAL_HASH, test, 0);

Hope this guide was helpful! Example usages can be found in the exor_filter C code.

Matthew


2020-01-11