SIMD library
Types
-
A SIMD matrix type with 12 (3x4) packed single-precision (32-bit) floating-point elements. Depending on the implementation, the matrix may be stored in two 256-bit registers (AVX) or four 128-bit registers (SSEn / Neon).
-
A SIMD matrix type with 16 (4x4) packed single-precision (32-bit) floating-point elements. Depending on the implementation, the matrix may be stored in two 256-bit registers (AVX) or four 128-bit registers (SSEn / Neon).
Functions
-
Reinterprets a SIMD variable of type int4 as type float4 without changing the variable's data. -
Reinterprets a SIMD variable of type float4 as type int4 without changing the variable's data. -
float4 load_f2(f32 const *mem_addr)
Loads 64 bits (2 packed single-precision (32-bit) floating-point elements) from memory into the first two elements of dst. The remaining two elements are set to 0. -
float4 load_f4(f32 const *mem_addr)
Loads 128 bits (4 packed single-precision (32-bit) floating-point elements) from memory into dst. -
void store_f2(f32 *mem_addr, float4 a)
Stores the lower 2 single-precision (32-bit) floating-point elements from a into memory. -
void store_f4(f32 *mem_addr, float4 a)
Stores 128 bits (4 packed single-precision (32-bit) floating-point elements) from a into memory. -
float4 set_f4(f32 e0, f32 e1, f32 e2, f32 e3)
Sets packed single-precision (32-bit) floating-point elements in dst with the supplied values. -
int4 set_i4(i32 e0, i32 e1, i32 e2, i32 e3)
Sets packed 32-bit integers in dst with the supplied values. -
Returns a vector of type float4 with all elements set to zero. -
Broadcasts the single-precision (32-bit) floating-point value e0 to all elements of dst. -
Stores the first single-precision (32-bit) floating-point element from a into dst. -
float4 setw_f4(float4 a, f32 b)
Replaces the fourth element of a with b, and stores the results in dst. -
Broadcasts the first element of a to every element of dst. -
Broadcasts the second element of a to every element of dst. -
Broadcasts the third element of a to every element of dst. -
Broadcasts the fourth element of a to every element of dst. -
int4 cmpeq_f4(float4 a, float4 b)
Compares packed single-precision (32-bit) floating-point elements in a and b for equality, and stores the results in dst. -
int4 cmpneq_f4(float4 a, float4 b)
Compares packed single-precision (32-bit) floating-point elements in a and b for inequality, and stores the results in dst. -
int4 cmpgt_f4(float4 a, float4 b)
Compares packed single-precision (32-bit) floating-point elements in a and b for greater-than, and stores the results in dst. -
int4 cmplt_f4(float4 a, float4 b)
Compares packed single-precision (32-bit) floating-point elements in a and b for less-than, and stores the results in dst. -
int4 cmpge_f4(float4 a, float4 b)
Compares packed single-precision (32-bit) floating-point elements in a and b for greater-than-or-equal, and stores the results in dst. -
int4 cmple_f4(float4 a, float4 b)
Compares packed single-precision (32-bit) floating-point elements in a and b for less-than-or-equal, and stores the results in dst. -
Converts the comparison result mask to one 32-bit integer.
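The comparison functions above return an int4 mask, and this entry collapses such a mask into one integer. The following scalar sketch models that pipeline. The names Float4Ref, Int4Ref, cmpgt_ref and mask_to_int_ref are hypothetical stand-ins, not library API. It assumes that a true lane is all ones and that bit i of the integer is taken from the sign bit of lane i, as in the SSE movemask convention. The library's actual encoding may differ.

```cpp
#include <cassert>
#include <cstdint>

// Scalar stand-ins for float4 and int4 (hypothetical names, not library types).
struct Float4Ref { float v[4]; };
struct Int4Ref { std::int32_t v[4]; };

// Models cmpgt_f4, assuming a true lane is all ones (-1) and a false lane is 0.
Int4Ref cmpgt_ref(const Float4Ref& a, const Float4Ref& b) {
    Int4Ref r{};
    for (int i = 0; i < 4; ++i) {
        r.v[i] = (a.v[i] > b.v[i]) ? -1 : 0;
    }
    return r;
}

// Models the mask-to-integer step, assuming bit i takes the sign bit of lane i.
int mask_to_int_ref(const Int4Ref& m) {
    int bits = 0;
    for (int i = 0; i < 4; ++i) {
        if (m.v[i] < 0) bits |= (1 << i);
    }
    return bits;
}
```

Under these assumptions, comparing (1, 5, 3, 0) against (2, 4, 3, -1) sets lanes 1 and 3, so the integer is binary 1010, that is 10.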
-
float4 add_f4(float4 a, float4 b)
Adds packed single-precision (32-bit) floating-point elements in a and b, and stores the results in dst. -
float4 sub_f4(float4 a, float4 b)
Subtracts packed single-precision (32-bit) floating-point elements in b from packed elements in a, and stores the results in dst. -
float4 mul_f4(float4 a, float4 b)
Multiplies packed single-precision (32-bit) floating-point elements in a and b, and stores the results in dst. -
float4 div_f4(float4 a, float4 b)
Divides packed single-precision (32-bit) floating-point elements in a by packed elements in b, and stores the results in dst.
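The element-wise arithmetic entries above share one shape: a binary operation applied independently to each of the four lanes. A minimal scalar sketch of that shape follows. Float4Ref, map_lanes_ref, sub_ref and div_ref are hypothetical names used only for illustration, and the operand order for subtraction (a minus b) is an assumption.

```cpp
#include <cassert>

struct Float4Ref { float v[4]; };  // hypothetical scalar stand-in for float4

// Applies a binary operation lane by lane; each lane is independent of the others.
template <typename Op>
Float4Ref map_lanes_ref(const Float4Ref& a, const Float4Ref& b, Op op) {
    Float4Ref r{};
    for (int i = 0; i < 4; ++i) {
        r.v[i] = op(a.v[i], b.v[i]);
    }
    return r;
}

// Assumed operand order: lane i of a minus lane i of b.
Float4Ref sub_ref(const Float4Ref& a, const Float4Ref& b) {
    return map_lanes_ref(a, b, [](float x, float y) { return x - y; });
}

// Lane i of a divided by lane i of b.
Float4Ref div_ref(const Float4Ref& a, const Float4Ref& b) {
    return map_lanes_ref(a, b, [](float x, float y) { return x / y; });
}
```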
-
float4 scale_f4(float4 a, f32 b)
Scales packed single-precision (32-bit) floating-point elements in a by the single-precision (32-bit) floating-point value b, and stores the results in dst. -
float4 muladd_f4(float4 a, float4 b, float4 c)
Multiplies packed single-precision (32-bit) floating-point elements in a and b, adds the intermediate result to packed elements in c, and stores the results in dst. -
float4 negmuladd_f4(float4 a, float4 b, float4 c)
Multiplies packed single-precision (32-bit) floating-point elements in a and b, adds the negated intermediate result to packed elements in c, and stores the results in dst. -
float4 scaleadd_f4(float4 a, f32 b, float4 c)
Scales packed single-precision (32-bit) floating-point elements in a by the single-precision (32-bit) floating-point value b, adds the intermediate result to packed elements in c, and stores the results in dst. -
Computes the square root of packed single-precision (32-bit) floating-point elements in a, and stores the results in dst. -
Computes the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and stores the results in dst. SSE specific: The maximum relative error for this approximation is less than 1.5*2^-12. -
Computes the reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and stores the results in dst. -
float4 max_f4(float4 a, float4 b)
Compares packed single-precision (32-bit) floating-point elements in a and b, and stores the packed maximum values in dst. SSE specific: dst does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values. -
float4 min_f4(float4 a, float4 b)
Compares packed single-precision (32-bit) floating-point elements in a and b, and stores the packed minimum values in dst. SSE specific: dst does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values. -
Computes the bitwise AND of every bit in a and b, and stores the results in dst. -
Computes the bitwise OR of every bit in a and b, and stores the results in dst. -
f32 dot2_f4(float4 a, float4 b)
Computes the dot product of the first two elements of a and b, and stores the result in dst. -
f32 dot3_f4(float4 a, float4 b)
Computes the dot product of the first three elements of a and b, and stores the result in dst. -
f32 dot4_f4(float4 a, float4 b)
Computes the dot product of all elements of a and b, and stores the result in dst. -
float4 dot2v_f4(float4 a, float4 b)
Computes the dot product of the first two elements of a and b, and stores the result in each element of dst. -
float4 dot3v_f4(float4 a, float4 b)
Computes the dot product of the first three elements of a and b, and stores the result in each element of dst. -
float4 dot4v_f4(float4 a, float4 b)
Computes the dot product of all elements of a and b, and stores the result in each element of dst. -
float4 cross2_f4(float4 a, float4 b)
Computes the cross product of the first two elements of a and b, and stores the result in dst. -
float4 cross3_f4(float4 a, float4 b)
Computes the cross product of the first three elements of a and b, and stores the result in each element of dst. -
float4 cross4_f4(float4 a, float4 b, float4 c)
Computes the cross product of the elements of a, b and c, and stores the result in each element of dst. -
float4 normalize2_f4(float4 a)
Normalizes the first two elements of a, and stores the result in dst. -
float4 normalize3_f4(float4 a)
Normalizes the first three elements of a, and stores the result in dst. -
float4 normalize4_f4(float4 a)
Normalizes all elements of a, and stores the result in dst. -
float4 reflect2_f4(float4 i, float4 n)
Reflects the first two elements of i (incident vector) about n (normal vector), and stores the reflected vector in dst. -
float4 reflect3_f4(float4 i, float4 n)
Reflects the first three elements of i (incident vector) about n (normal vector), and stores the reflected vector in dst. -
float4 reflect4_f4(float4 i, float4 n)
Reflects all elements of i (incident vector) about n (normal vector), and stores the reflected vector in dst. -
float4 refract2_f4(float4 i, float4 n, f32 index)
Performs a refraction operation based on the first two elements of i (incident vector), n (normal vector) and the scalar value index (refraction index), and stores the refracted vector in dst. -
float4 refract3_f4(float4 i, float4 n, f32 index)
Performs a refraction operation based on the first three elements of i (incident vector), n (normal vector) and the scalar value index (refraction index), and stores the refracted vector in dst. -
float4 refract4_f4(float4 i, float4 n, f32 index)
Performs a refraction operation based on all elements of i (incident vector), n (normal vector) and the scalar value index (refraction index), and stores the refracted vector in dst. -
float4 lerp_f4(float4 a, float4 b, f32 t)
Computes linear interpolation on packed single-precision (32-bit) floating-point elements in a and b using the single-precision (32-bit) floating-point value t, and stores the results in dst. -
float4 lerpv_f4(float4 a, float4 b, float4 t)
Computes linear interpolation on packed single-precision (32-bit) floating-point elements in a and b using the corresponding packed single-precision (32-bit) floating-point element in t, and stores the results in dst. -
float4 barycentric_f4(float4 a, float4 b, float4 c, f32 f, f32 g)
Computes barycentric interpolation on packed single-precision (32-bit) floating-point elements in a, b and c using the single-precision (32-bit) floating-point values f and g, and stores the results in dst. -
float4 catmull_rom_f4(float4 a, float4 b, float4 c, float4 d, f32 t)
Computes Catmull-Rom spline interpolation on packed single-precision (32-bit) floating-point elements in a, b, c and d using the single-precision (32-bit) floating-point value t, and stores the results in dst. -
float4 hermite_f4(float4 v0, float4 t0, float4 v1, float4 t1, f32 t)
Computes Hermite spline interpolation on packed single-precision (32-bit) floating-point elements in v0, t0, v1 and t1 using the single-precision (32-bit) floating-point value t, and stores the results in dst. -
Shuffles single-precision (32-bit) floating-point elements in a based on the control parameters _SelectX, _SelectY, _SelectZ and _SelectW, and stores the results in dst. -
float4 select_f4(float4 a, float4 b)
Performs a per-component selection between a and b based on the control parameters _SelectX, _SelectY, _SelectZ and _SelectW, and stores the results in dst. -
float4 permute2_f4(float4 a, float4 b)
Shuffles single-precision (32-bit) floating-point elements in a and b based on the control parameters _SelectX, _SelectY, _SelectZ and _SelectW, and stores the results in dst. -
float3x4 load_f3x4(f32 const *mem_addr)
Loads 12 packed single-precision (32-bit) floating-point elements from mem_addr into dst. The highest 4 packed single-precision (32-bit) floating-point elements of dst are left uninitialized. -
float4x4 load_f4x4(f32 const *mem_addr)
Loads 16 packed single-precision (32-bit) floating-point elements from mem_addr into dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated. -
float4x4 setf4_f4x4(float4 r0, float4 r1, float4 r2, float4 r3)
Creates a 4x4 matrix from the four vectors r0, r1, r2 and r3, and stores the result in dst. -
void store_f3x4(f32 *mem_addr, float3x4 m)
Stores the first 12 packed single-precision (32-bit) floating-point elements from m to memory at mem_addr. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated. -
void store_f4x4(f32 *mem_addr, float4x4 m)
Stores packed single-precision (32-bit) floating-point elements from m to memory at mem_addr. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated. -
Returns a matrix of type float4x4 with all elements set to zero. -
float3x4 matmul_f3x3(float3x4 a, float3x4 b)
Performs 3x3 matrix multiplication on a and b, and stores the result in dst. -
float4x4 matmul_f4x4(float4x4 a, float4x4 b)
Performs 4x4 matrix multiplication on a and b, and stores the result in dst. -
float4x4 transpose_f4x4(float4x4 a)
Transposes the matrix a, and stores the result in dst. -
f32 determinant_f3x3(float3x4 a)
Computes the determinant of the 3x3 matrix a, and stores the result in dst. -
float4 determinantv_f3x3(float3x4 a)
Computes the determinant of the 3x3 matrix a, and stores the result in every element of dst. -
float3x4 inverse_f3x3(float3x4 a, f32 *out_determinant)
Computes the determinant and the inverse matrix of a, stores the determinant in out_determinant, and stores the inverse matrix in dst. -
f32 determinant_f4x4(float4x4 a)
Computes the determinant of the 4x4 matrix a, and stores the result in dst. -
float4 determinantv_f4x4(float4x4 a)
Computes the determinant of the 4x4 matrix a, and stores the result in every element of dst. -
float4x4 inverse_f4x4(float4x4 a, f32 *out_determinant)
Computes the determinant and the inverse matrix of a, stores the determinant in out_determinant, and stores the inverse matrix in dst. -
Rounds each component of a to the nearest integer, with halfway cases rounded to the nearest even integer. -
Computes the per-component angle modulo 2PI for a, and stores the results in dst. The angle is expressed in radians. The result lies in the range [-PI, PI]. -
Computes the sine of packed single-precision (32-bit) floating-point elements in a, expressed in radians, and stores the results in dst. -
Computes the cosine of packed single-precision (32-bit) floating-point elements in a, expressed in radians, and stores the results in dst. -
float4 sincos_f4(float4 &out_cos, float4 a)
Computes the sine and cosine of packed single-precision (32-bit) floating-point elements in a, expressed in radians, and stores the sine in dst and the cosine in out_cos. -
float4 mulquat_f4(float4 a, float4 b)
Multiplies two quaternions a and b, and stores the result in dst. -
float4 quatinverse_f4(float4 a)
Inverts the quaternion a, and stores the result in dst. -
float4 quatnormalangle_f4(float4 n, f32 a)
Computes a rotation quaternion from the given normal n and angle a, and stores the resulting quaternion in dst. -
float4 quateulerangles_f4(float4 a)
Computes a rotation quaternion based on a vector containing the Euler angles (pitch, yaw, and roll), and stores the result in dst. -
float4 quatlerp_f4(float4 a, float4 b, f32 t)
Interpolates between two unit quaternions a and b using linear interpolation, and stores the result in dst. -
float4 quatslerp_f4(float4 a, float4 b, f32 t)
Interpolates between two unit quaternions a and b using spherical linear interpolation, and stores the result in dst. -
float3x4 transform2d_f3x4(float4 translation, f32 rotation, float4 scaling)
Builds a 2D affine matrix from translation, rotation and scaling.
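As an illustration of what a 2D affine matrix built from translation, rotation and scaling can look like, here is a minimal scalar sketch. It assumes a row-vector convention (p' = p * M), a counter-clockwise rotation in radians, and the order scale, then rotate, then translate. None of these conventions is confirmed by this page, and Mat3Ref, transform2d_ref and transform_point_ref are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Mat3Ref { float m[3][3]; };  // hypothetical row-major 3x3 stand-in

// Builds scale -> rotate -> translate for row vectors (assumed convention).
Mat3Ref transform2d_ref(float tx, float ty, float rotation, float sx, float sy) {
    const float c = std::cos(rotation);
    const float s = std::sin(rotation);
    return Mat3Ref{{
        { sx * c, sx * s, 0.0f},
        {-sy * s, sy * c, 0.0f},
        { tx,     ty,     1.0f}}};
}

// Transforms the point (x, y, 1) as a row vector multiplied by the matrix.
void transform_point_ref(const Mat3Ref& t, float x, float y, float& ox, float& oy) {
    ox = x * t.m[0][0] + y * t.m[1][0] + t.m[2][0];
    oy = x * t.m[0][1] + y * t.m[1][1] + t.m[2][1];
}
```

With scaling (2, 3), a quarter-turn rotation and translation (10, 20), the point (1, 0) is scaled to (2, 0), rotated to (0, 2) and translated to approximately (10, 22).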
-
float3x4 transform2d_translation_f3x4(float4 translation)
Builds a 2D affine matrix from translation.
-
float3x4 transform2d_rotation_f3x4(f32 rotation)
Builds a 2D affine matrix from rotation.
-
float3x4 transform2d_scaling_f3x4(float4 scaling)
Builds a 2D affine matrix from scaling.
-
float4x4 transform3d_f4x4(float4 translation, float4 rotation_quaternion, float4 scaling)
Builds a 3D affine matrix from translation, rotation and scaling.
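The effect of such a matrix on a point can be sketched without building the matrix itself. The sketch below assumes the order scale, then rotate, then translate, a unit quaternion stored as (x, y, z, w), and hypothetical names (Vec3Ref, QuatRef, rotate_ref, transform_point_ref). The library's composition order and quaternion layout are not stated on this page.

```cpp
#include <cassert>
#include <cmath>

struct Vec3Ref { float x, y, z; };      // hypothetical scalar stand-in
struct QuatRef { float x, y, z, w; };   // assumed (x, y, z, w) layout, unit length

Vec3Ref cross_ref(const Vec3Ref& a, const Vec3Ref& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Rotates v by the unit quaternion q using
// v' = v + 2w(u x v) + 2 u x (u x v), where u = (q.x, q.y, q.z).
Vec3Ref rotate_ref(const QuatRef& q, const Vec3Ref& v) {
    const Vec3Ref u{q.x, q.y, q.z};
    const Vec3Ref t = cross_ref(u, v);
    const Vec3Ref tt = cross_ref(u, t);
    return {v.x + 2.0f * (q.w * t.x + tt.x),
            v.y + 2.0f * (q.w * t.y + tt.y),
            v.z + 2.0f * (q.w * t.z + tt.z)};
}

// Applies scaling, then rotation, then translation (assumed order).
Vec3Ref transform_point_ref(const Vec3Ref& p, const Vec3Ref& translation,
                            const QuatRef& rotation, const Vec3Ref& scaling) {
    const Vec3Ref s{p.x * scaling.x, p.y * scaling.y, p.z * scaling.z};
    const Vec3Ref r = rotate_ref(rotation, s);
    return {r.x + translation.x, r.y + translation.y, r.z + translation.z};
}
```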
-
float4x4 transform3d_translation_f4x4(float4 translation)
Builds a 3D affine matrix from translation.
-
float4x4 transform3d_rotation_quaternion_f4x4(float4 quaternion)
Builds a 3D affine matrix from rotation represented by one quaternion.
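For reference, the standard conversion from a unit quaternion to the 3x3 rotation part of such a matrix can be sketched as follows. The (x, y, z, w) layout and the row-vector convention (v' = v * M) are assumptions, and QuatRef, Mat3Ref and rotation_from_quat_ref are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct QuatRef { float x, y, z, w; };  // assumed (x, y, z, w) layout, unit length
struct Mat3Ref { float m[3][3]; };     // hypothetical row-major 3x3 stand-in

// Rotation part for row vectors (v' = v * M); transpose for column vectors.
Mat3Ref rotation_from_quat_ref(const QuatRef& q) {
    const float xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    const float xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    const float wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;
    return Mat3Ref{{
        {1.0f - 2.0f * (yy + zz), 2.0f * (xy + wz),        2.0f * (xz - wy)},
        {2.0f * (xy - wz),        1.0f - 2.0f * (xx + zz), 2.0f * (yz + wx)},
        {2.0f * (xz + wy),        2.0f * (yz - wx),        1.0f - 2.0f * (xx + yy)}}};
}
```

In this convention row 0 is the image of the X axis, so a quarter turn about Z yields a first row of approximately (0, 1, 0).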
-
float4x4 transform3d_rotation_x_f4x4(f32 rotation)
Builds a 3D affine matrix from a rotation about the X axis.
-
float4x4 transform3d_rotation_y_f4x4(f32 rotation)
Builds a 3D affine matrix from a rotation about the Y axis.
-
float4x4 transform3d_rotation_z_f4x4(f32 rotation)
Builds a 3D affine matrix from a rotation about the Z axis.
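The rotation part of the three axis-rotation matrices can be sketched as below. The sketch assumes angles in radians, a row-vector convention (v' = v * M) and counter-clockwise rotation when looking down the axis toward the origin. The library's handedness and sign conventions may differ, and Mat3Ref and the rotation_*_ref functions are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Mat3Ref { float m[3][3]; };  // hypothetical row-major 3x3 stand-in

// Rotation about X: maps the Y axis toward the Z axis.
Mat3Ref rotation_x_ref(float angle) {
    const float c = std::cos(angle), s = std::sin(angle);
    return Mat3Ref{{{1.0f, 0.0f, 0.0f}, {0.0f, c, s}, {0.0f, -s, c}}};
}

// Rotation about Y: maps the Z axis toward the X axis.
Mat3Ref rotation_y_ref(float angle) {
    const float c = std::cos(angle), s = std::sin(angle);
    return Mat3Ref{{{c, 0.0f, -s}, {0.0f, 1.0f, 0.0f}, {s, 0.0f, c}}};
}

// Rotation about Z: maps the X axis toward the Y axis.
Mat3Ref rotation_z_ref(float angle) {
    const float c = std::cos(angle), s = std::sin(angle);
    return Mat3Ref{{{c, s, 0.0f}, {-s, c, 0.0f}, {0.0f, 0.0f, 1.0f}}};
}
```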
-
float4x4 transform3d_rotation_normal_angle_f4x4(float4 normal, f32 angle)
Builds a 3D affine matrix from rotation represented by rotation axis and rotation angle.
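A rotation given by an axis and an angle can be applied to a vector directly with Rodrigues' rotation formula, which is the same rotation such a matrix would encode. The sketch below assumes a unit-length axis, an angle in radians and a counter-clockwise sense. Vec3Ref and rotate_axis_angle_ref are hypothetical names, and the library may build its matrix differently.

```cpp
#include <cassert>
#include <cmath>

struct Vec3Ref { float x, y, z; };  // hypothetical scalar stand-in

// Rodrigues' formula: v' = v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t)),
// where k is the unit rotation axis and t the rotation angle.
Vec3Ref rotate_axis_angle_ref(const Vec3Ref& k, float angle, const Vec3Ref& v) {
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    const Vec3Ref kxv{k.y * v.z - k.z * v.y,
                      k.z * v.x - k.x * v.z,
                      k.x * v.y - k.y * v.x};
    const float kdotv = k.x * v.x + k.y * v.y + k.z * v.z;
    const float f = kdotv * (1.0f - c);
    return {v.x * c + kxv.x * s + k.x * f,
            v.y * c + kxv.y * s + k.y * f,
            v.z * c + kxv.z * s + k.z * f};
}
```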
-
float4x4 transform3d_rotation_euler_angles_f4x4(float4 pitch_yaw_roll)
Builds a 3D affine matrix from rotation represented by Euler angles (pitch, yaw, roll).
-
float4x4 transform3d_scaling_f4x4(float4 scaling)
Builds a 3D affine matrix from scaling.
-
float4x4 transform3d_look_to_f4x4(float4 eye, float4 eyedir, float4 updir)
Creates an affine matrix that transforms points and directions from world space to view space. eyedir and updir must be normalized.
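What a look-to view transform does to a world-space point can be sketched without the matrix: build an orthonormal camera basis, then project the offset from the eye onto it. The sketch assumes a left-handed view space with +Z pointing along eyedir, which this page does not confirm. Vec3Ref and world_to_view_ref are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3Ref { float x, y, z; };  // hypothetical scalar stand-in

float dot_ref(const Vec3Ref& a, const Vec3Ref& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3Ref cross_ref(const Vec3Ref& a, const Vec3Ref& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vec3Ref normalize_ref(const Vec3Ref& a) {
    const float inv = 1.0f / std::sqrt(dot_ref(a, a));
    return {a.x * inv, a.y * inv, a.z * inv};
}

// Maps a world-space point into view space for a camera at eye looking along
// eyedir. Assumes eyedir and updir are unit length and not parallel.
Vec3Ref world_to_view_ref(const Vec3Ref& eye, const Vec3Ref& eyedir,
                          const Vec3Ref& updir, const Vec3Ref& p) {
    const Vec3Ref zaxis = eyedir;                                  // forward
    const Vec3Ref xaxis = normalize_ref(cross_ref(updir, zaxis));  // right
    const Vec3Ref yaxis = cross_ref(zaxis, xaxis);                 // corrected up
    const Vec3Ref d{p.x - eye.x, p.y - eye.y, p.z - eye.z};
    return {dot_ref(d, xaxis), dot_ref(d, yaxis), dot_ref(d, zaxis)};
}
```

The three basis vectors correspond to the columns of the rotation part of a row-vector view matrix, and the negated projections of eye onto them give its translation row.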