I read the 1993 USENIX paper on BPF, which I found via Wikipedia:
http://en.wikipedia.org/wiki/Berkeley_Packet_Filter
http://www.tcpdump.org/papers/bpf-usenix93.pdf
The document describes a "pseudo machine" language for BPF similar to the machine languages used on a Motorola 6800 or IBM z machines.
It is big-endian (unlike Intel/AMD machines, which are little-endian) and uses 32-bit words.
So, to answer your question about why to use words, half-words and bytes:
A byte is 8 bits.
A half-word is 2 bytes or 16 bits.
A word is 4 bytes or 32 bits.
Which type to use depends on the size of the field you want to read, as defined in the TCP/IP packet headers:
http://nmap.org/book/tcpip-ref.html
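To make the size rule concrete, here is a minimal Python sketch (not BPF code) that reads one field of each size from an IPv4 header. The header is hand-built with made-up values: the 192.0.2.x addresses are documentation placeholders, and the variable names are my own. The offsets are the standard IPv4 ones, counted from the start of the IP header, so a preceding link-layer header is not included.

```python
import struct

# Hand-built 20-byte IPv4 header with made-up values (assumption for illustration).
# "!" means network byte order (big-endian), with no padding.
header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45,                    # version 4, header length 5 words
    0,                       # type of service
    40,                      # total length (half-word)
    0,                       # identification
    0,                       # flags + fragment offset
    64,                      # time to live
    6,                       # protocol (byte): 6 = TCP
    0,                       # checksum (left as 0 here)
    bytes([192, 0, 2, 1]),   # source address (word)
    bytes([192, 0, 2, 2]),   # destination address (word)
)

# Byte: the protocol field is 1 byte at offset 9.
(protocol,) = struct.unpack_from("!B", header, 9)

# Half-word: the total length field is 2 bytes at offset 2.
(total_length,) = struct.unpack_from("!H", header, 2)

# Word: the source address is 4 bytes at offset 12.
(source_address,) = struct.unpack_from("!I", header, 12)

print(protocol, total_length, hex(source_address))
```

The same idea applies in a filter: a 1-byte field needs a byte load, a 2-byte field a half-word load, and a 4-byte field a word load.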
For example, the IPv4 address of daemonforums.org is 94.142.245.224.
This is a 4-byte word, which in big-endian hexadecimal is 0x5E8EF5E0.
One can break it down into the 4 bytes:
94 = 0x5E
142 = 0x8E
245 = 0xF5
224 = 0xE0
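
The same breakdown can be checked with a few lines of Python. This is only a sketch to verify the arithmetic; the function names are my own and have nothing to do with BPF itself.

```python
def octets_to_word(octets):
    """Combine 4 octets into one 32-bit word, most significant byte first (big-endian)."""
    word = 0
    for octet in octets:
        word = (word << 8) | octet
    return word


def word_to_octets(word):
    """Split a 32-bit word back into its 4 octets, most significant byte first."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


address = [94, 142, 245, 224]
word = octets_to_word(address)

print(hex(word))              # the big-endian word
print(word_to_octets(word))   # back to the 4 octets

# For comparison, the same 4 bytes read as a little-endian word (as on Intel/AMD):
print(hex(int.from_bytes(bytes(address), "little")))
```

Read in little-endian order the same 4 bytes give a different number, which is why the byte order matters when comparing a word against an address.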