package float16


import "github.com/x448/float16"

Index

Variables
type Float16

func FromNaN32ps(nan float32) (Float16, error)
func Frombits(u16 uint16) Float16
func Fromfloat32(f32 float32) Float16
func Inf(sign int) Float16
func NaN() Float16
func (f Float16) Bits() uint16
func (f Float16) Float32() float32
func (f Float16) IsFinite() bool
func (f Float16) IsInf(sign int) bool
func (f Float16) IsNaN() bool
func (f Float16) IsNormal() bool
func (f Float16) IsQuietNaN() bool
func (f Float16) Signbit() bool
func (f Float16) String() string

type Precision

func PrecisionFromfloat32(f32 float32) Precision

Variables

var ErrInvalidNaNValue = errors.New("float16: invalid NaN value, expected IEEE 754 NaN")

ErrInvalidNaNValue is the error returned by FromNaN32ps when its argument is not a NaN.

Types

type Float16

type Float16 uint16

Float16 represents IEEE 754 half-precision floating-point numbers (binary16).

func FromNaN32ps

func FromNaN32ps(nan float32) (Float16, error)

FromNaN32ps converts nan to an IEEE 754 binary16 NaN while preserving its signaling state and as much of its payload as fits in binary16. Unlike Fromfloat32(), which can only return a qNaN because it sets the quiet bit to 1, FromNaN32ps can return both sNaN and qNaN. If the result would be infinity (an sNaN whose payload becomes empty after truncation), the lowest bit of the payload is set to make the result a NaN. The function is kept simple so that it can be inlined.
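
A minimal sketch of the conversion described above, assuming the usual IEEE 754 convention that the most significant significand bit is the quiet bit. The hypothetical helper nanBitsToHalf works on raw float32 bits (as returned by math.Float32bits) rather than on a float32, so the example does not depend on how a platform moves signaling NaNs through registers. It is not the package's source.

```go
package main

import "fmt"

// nanBitsToHalf is a hypothetical sketch of the FromNaN32ps idea. It takes raw
// float32 bits and returns binary16 bits that keep the sign, the quiet/signaling
// bit, and the top 10 of the 23 payload bits. ok is false when u32 is not a NaN.
func nanBitsToHalf(u32 uint32) (bits uint16, ok bool) {
	sign := u32 & 0x80000000
	exp := u32 & 0x7f800000
	frac := u32 & 0x007fffff
	if exp != 0x7f800000 || frac == 0 {
		return 0, false // finite or infinite: not a NaN
	}
	// Bit 22 of the float32 significand (the quiet bit) lands in bit 9.
	u16 := uint16(sign>>16) | 0x7c00 | uint16(frac>>13)
	if u16&0x03ff == 0 {
		// Only low payload bits were set, so truncation would produce an
		// infinity; set the lowest payload bit to keep the result a NaN.
		u16 |= 0x0001
	}
	return u16, true
}

func main() {
	q, _ := nanBitsToHalf(0x7fc00000) // float32 quiet NaN
	s, _ := nanBitsToHalf(0x7f800001) // float32 signaling NaN with payload 1
	fmt.Printf("%#x %#x\n", q, s)     // 0x7e00 0x7c01
}
```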

func Frombits

func Frombits(u16 uint16) Float16

Frombits returns the Float16 corresponding to the IEEE 754 binary16 representation u16, with the sign bit of u16 and the result in the same bit position. Frombits(x.Bits()) == x.
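
Frombits reinterprets bits; it does not convert a number. The sketch below uses a hypothetical stand-in type, half, to show the difference: converting the integer 1 directly gives the bit pattern 0x0001 (the smallest subnormal, 2^-24), whereas 1.0 is encoded as 0x3c00.

```go
package main

import "fmt"

// half is a hypothetical stand-in for float16.Float16 (a uint16 holding binary16 bits).
type half uint16

// frombits reinterprets u as binary16 bits; no numeric conversion happens.
func frombits(u uint16) half { return half(u) }

// bits returns the raw binary16 bits.
func (h half) bits() uint16 { return uint16(h) }

func main() {
	one := frombits(0x3c00) // the binary16 encoding of 1.0
	tiny := half(1)         // bits 0x0001: the smallest subnormal (2^-24), not 1.0
	fmt.Printf("%#x %#x\n", one.bits(), tiny.bits()) // 0x3c00 0x1
}
```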

func Fromfloat32

func Fromfloat32(f32 float32) Float16

Fromfloat32 returns a Float16 value converted from f32. Conversion uses IEEE default rounding (round to nearest, ties to even).

func Inf

func Inf(sign int) Float16

Inf returns a Float16 infinity with the specified sign. A sign >= 0 returns positive infinity; a sign < 0 returns negative infinity.
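
In bit terms, an infinity has all five exponent bits set and a zero significand, so the two possible results differ only in the sign bit. infBits is a hypothetical helper that mirrors the sign rule above.

```go
package main

import "fmt"

// infBits returns the binary16 bit pattern for an infinity (hypothetical helper):
// exponent bits all set, significand zero, sign chosen like Inf's sign argument.
func infBits(sign int) uint16 {
	if sign >= 0 {
		return 0x7c00 // +Inf
	}
	return 0xfc00 // -Inf: the same pattern with the sign bit set
}

func main() {
	fmt.Printf("%#x %#x %#x\n", infBits(1), infBits(0), infBits(-1)) // 0x7c00 0x7c00 0xfc00
}
```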

func NaN

func NaN() Float16

NaN returns an IEEE 754 binary16 not-a-number (NaN) value as a Float16. The returned value, 0x7e01, has all exponent bits set to 1 and the first and last bits of the significand set to 1. This is consistent with Go's 64-bit math.NaN(). Canonical CBOR in RFC 7049 uses 0x7e00.

func (Float16) Bits

func (f Float16) Bits() uint16

Bits returns the IEEE 754 binary16 representation of f, with the sign bit of f and the result in the same bit position. Frombits(x).Bits() == x.

func (Float16) Float32

func (f Float16) Float32() float32

Float32 returns f (a Float16) converted to float32. The conversion is lossless: every binary16 value is exactly representable as a float32.

func (Float16) IsFinite

func (f Float16) IsFinite() bool

IsFinite returns true if f is neither infinite nor NaN.

func (Float16) IsInf

func (f Float16) IsInf(sign int) bool

IsInf reports whether f is an infinity (inf). A sign > 0 reports whether f is positive inf. A sign < 0 reports whether f is negative inf. A sign == 0 reports whether f is either inf.
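
At the bit level, both IsInf and IsFinite reduce to tests on the exponent field and, for IsInf, the sign bit. isInf and isFinite below are hypothetical sketches of those checks on raw bits, not the package's methods.

```go
package main

import "fmt"

// isInf mirrors the documented sign rule of IsInf on raw binary16 bits (hypothetical).
func isInf(u uint16, sign int) bool {
	switch {
	case sign > 0:
		return u == 0x7c00
	case sign < 0:
		return u == 0xfc00
	default:
		return u&0x7fff == 0x7c00 // ignore the sign bit
	}
}

// isFinite is true unless the exponent field is all ones (Inf or NaN).
func isFinite(u uint16) bool { return u&0x7c00 != 0x7c00 }

func main() {
	// Prints: false true true
	fmt.Println(isInf(0xfc00, 1), isInf(0xfc00, -1), isInf(0xfc00, 0))
	// Prints: true false
	fmt.Println(isFinite(0x7bff), isFinite(0x7e00))
}
```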

func (Float16) IsNaN

func (f Float16) IsNaN() bool

IsNaN reports whether f is an IEEE 754 binary16 “not-a-number” value.

func (Float16) IsNormal

func (f Float16) IsNormal() bool

IsNormal returns true if f is not zero, infinite, subnormal, or NaN.
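
IsNormal amounts to checking that the 5-bit exponent field is neither all zeros (zero or subnormal) nor all ones (infinity or NaN). isNormal is a hypothetical bit-level sketch of that check.

```go
package main

import "fmt"

// isNormal is a hypothetical bit-level version of the IsNormal check: the
// exponent field must be neither 0 (zero/subnormal) nor all ones (Inf/NaN).
func isNormal(u uint16) bool {
	exp := u & 0x7c00
	return exp != 0 && exp != 0x7c00
}

func main() {
	for _, u := range []uint16{0x0000, 0x03ff, 0x0400, 0x3c00, 0x7bff, 0x7c00, 0x7e00} {
		fmt.Printf("%04x normal=%v\n", u, isNormal(u))
	}
}
```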

func (Float16) IsQuietNaN

func (f Float16) IsQuietNaN() bool

IsQuietNaN reports whether f is a quiet (non-signaling) IEEE 754 binary16 “not-a-number” value.
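
A sketch of the two NaN tests on raw bits, assuming the common IEEE 754-2008 convention that a NaN is quiet when the most significant significand bit (bit 9) is set. isNaN and isQuietNaN are hypothetical helpers for illustration.

```go
package main

import "fmt"

// isNaN: exponent all ones and a nonzero significand (hypothetical helper).
func isNaN(u uint16) bool { return u&0x7c00 == 0x7c00 && u&0x03ff != 0 }

// isQuietNaN: exponent all ones and significand bit 9 set, which also
// guarantees a nonzero significand (hypothetical helper).
func isQuietNaN(u uint16) bool { return u&0x7e00 == 0x7e00 }

func main() {
	for _, u := range []uint16{0x7e00, 0x7e01, 0x7c01, 0x7c00} {
		fmt.Printf("%04x nan=%v quiet=%v\n", u, isNaN(u), isQuietNaN(u))
	}
}
```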

func (Float16) Signbit

func (f Float16) Signbit() bool

Signbit reports whether f is negative or negative zero.
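
A sign-bit test matters because +0 and -0 compare equal numerically. signbit below is a hypothetical sketch that tests bit 15 directly; with that approach the result is also true for negative infinity and for NaNs whose sign bit is set.

```go
package main

import (
	"fmt"
	"math"
)

// signbit reports whether bit 15 of the binary16 bits is set (hypothetical helper).
func signbit(u uint16) bool { return u&0x8000 != 0 }

func main() {
	fmt.Println(signbit(0x0000), signbit(0x8000)) // false true: +0 vs -0

	// Numeric comparison cannot tell the zeros apart (shown here with float32).
	negZero := math.Float32frombits(0x80000000)
	fmt.Println(negZero == 0, math.Signbit(float64(negZero))) // true true
}
```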

func (Float16) String

func (f Float16) String() string

String satisfies the fmt.Stringer interface.

type Precision

type Precision int

Precision indicates whether the conversion to Float16 is exact, subnormal without dropped bits, inexact, underflow, or overflow.

const (

	// PrecisionExact is for non-subnormals that don't drop bits during conversion.
	// All of these can round-trip. Should always convert to float16.
	PrecisionExact Precision = iota

	// PrecisionUnknown is for subnormals that don't drop bits during conversion,
	// but not all of these can round-trip, so precision is unknown without more effort.
	// Only 2046 of these can round-trip; the rest cannot.
	PrecisionUnknown

	// PrecisionInexact is for dropped significand bits and cannot round-trip.
	// Some of these are subnormals. Cannot round-trip float32->float16->float32.
	PrecisionInexact

	// PrecisionUnderflow is for underflows. Cannot round-trip float32->float16->float32.
	PrecisionUnderflow

	// PrecisionOverflow is for overflows. Cannot round-trip float32->float16->float32.
	PrecisionOverflow
)

func PrecisionFromfloat32

func PrecisionFromfloat32(f32 float32) Precision

PrecisionFromfloat32 returns the Precision of converting f32 to Float16 without performing the conversion. Conversions from Infinity and NaN values always report PrecisionExact, even if the NaN payload or quiet bit is lost. The function is kept simple so that it can be inlined and run in under 0.5 ns/op, making it suitable as a fast filter.

Source Files

float16.go

Version: v0.8.2
Published: Jan 12, 2020
Platform: js/wasm
Imports: 3 packages