Search Results

Search found 17 results on 1 page for 'mantissa'.

  • Floating point mantissa bias

    - by user69514
    Does anybody know how to go about solving this problem? Given a = 1.0 × 2^9, b = -1.0 × 2^9 and c = 1.0 × 2^1, and using a 14-bit floating-point representation (5 bits for the exponent with a bias of 16, a normalized mantissa of 8 bits, and a single sign bit for the number), perform the following two calculations, paying close attention to the order of operations: b + (a + c) = ? and (b + a) + c = ?
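
One way to explore the question above is to simulate the rounding step in Python. This is only a sketch under stated assumptions: `round_to_bits` and `fp_add` are hypothetical helper names, the 8-bit mantissa is assumed to include the leading 1 (no hidden bit), ties are assumed to round to even, and the 5-bit exponent range is not modelled. With a hidden bit, 514 would be exactly representable and both orderings would agree.

```python
import math

def round_to_bits(x, bits=8):
    """Round x to `bits` significant binary digits (ties to even)."""
    if x == 0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * (1 << bits)), e - bits)

def fp_add(x, y, bits=8):
    """Add exactly, then round the sum as the small format would."""
    return round_to_bits(x + y, bits)

a, b, c = 1.0 * 2**9, -1.0 * 2**9, 1.0 * 2**1

# a + c = 514 needs 9 significant bits, so it rounds back to 512.
first = fp_add(b, fp_add(a, c))
# b + a cancels exactly, so c survives.
second = fp_add(fp_add(b, a), c)
print(first, second)
```

Under these assumptions the two orderings give different answers, which is presumably the point of the exercise.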

  • Floating point conversion from Fixed point algorithm

    - by Viks
    Hi, I have an application that uses 24-bit fixed-point calculations. I am porting it to hardware that does support floating point, so for speed I need to convert all the fixed-point calculations to floating-point ones. For example, this code snippet calculates a mantissa:

        for (i = 0; i < 8207; i++) {
            // Do n^8/7 calculation and store
            // it in mantissa and exponent, scaled to
            // fixed point precision.
        }

    This calculation converts an integer to a mantissa and exponent scaled to fixed-point precision (23 bits). When I tried converting it to float by dividing the mantissa part by the precision bits and subtracting the precision bits from the exponent part, it didn't work. Please suggest a better way of doing it.
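
The relationship between a scaled mantissa/exponent pair and a float can be sketched in Python. Everything here is an assumption made for illustration: the question does not show how its table is laid out, so `to_fixed_pair`, `to_float` and the 23-bit scale are hypothetical and the original code may use a different convention.

```python
import math

PRECISION_BITS = 23  # assumed fixed-point scale, taken from the question text

def to_fixed_pair(value):
    """Split value into (mantissa, exponent) with the mantissa scaled by 2**23."""
    m, e = math.frexp(value)  # value == m * 2**e with 0.5 <= |m| < 1
    return round(m * (1 << PRECISION_BITS)), e

def to_float(mantissa, exponent):
    """Undo the scaling: mantissa / 2**23 * 2**exponent, done in one step."""
    return math.ldexp(mantissa, exponent - PRECISION_BITS)

mant, exp = to_fixed_pair(1234.5678)
restored = to_float(mant, exp)
print(mant, exp, restored)
```

If the hardware has a floating-point unit, it may be simpler to store the computed value directly as a float and drop the pair altogether.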

  • Is there a good radixsort-implementation for floats in C#

    - by CommuSoft
    I have a data structure with a field of type float. A collection of these structures needs to be sorted by the value of the float. Is there a radix-sort implementation for this? If there isn't, is there a fast way to access the exponent, the sign and the mantissa? If you sort the floats first on the mantissa, then on the exponent, and last on the sign, you sort floats in O(n).
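
A common approach is to map each float's bit pattern to an unsigned integer key that sorts in the same order, then run a byte-wise radix sort on the keys. The sketch below is in Python rather than C#, and `float_key` and `radix_sort_floats` are hypothetical names; it assumes 32-bit IEEE 754 floats and does not handle NaN.

```python
import struct

def float_key(x):
    """Map a float32 to an unsigned 32-bit key with the same ordering."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if bits & 0x80000000:
        return bits ^ 0xFFFFFFFF  # negative: flip every bit
    return bits | 0x80000000      # non-negative: set the top bit

def radix_sort_floats(values):
    """LSD radix sort, one byte of the key per pass (4 passes)."""
    items = [(float_key(v), v) for v in values]
    for shift in (0, 8, 16, 24):
        buckets = [[] for _ in range(256)]
        for item in items:
            buckets[(item[0] >> shift) & 0xFF].append(item)
        items = [item for bucket in buckets for item in bucket]
    return [v for _, v in items]

data = [3.5, -1.25, 0.0, 2.0, -7.0, 0.5]
print(radix_sort_floats(data))
```

Because the key transformation makes plain unsigned comparison agree with float comparison, there is no need to extract sign, exponent and mantissa separately.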

  • Floating point computer - Trouble with getting back correct results

    - by Francisco P.
    I'm having trouble with a challenge. Let's say I have a theoretical base-10 floating-point calculator with the following characteristics: only 3 digits for the mantissa, 1 digit for the exponent, and a sign for both the mantissa and the exponent. How would this machine compute the following? 300 + \sum_{i=1}^{100} 0.2 The correct result is 320. The machine's result is 300. But why? I can't see where the 20 goes missing. Thanks for your time.
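
The behaviour can be reproduced with Python's `decimal` module by limiting the context to 3 significant digits. This is a sketch, assuming the machine adds 0.2 to the running total one term at a time and rounds after every addition; the 1-digit exponent limit is not modelled.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

ctx = Context(prec=3, rounding=ROUND_HALF_EVEN)  # 3 significant digits

# Order assumed for the machine: add 0.2 to the running total 100 times.
running = Decimal(300)
for _ in range(100):
    # 300 + 0.2 = 300.2, which needs 4 digits and rounds back to 300.
    running = ctx.add(running, Decimal("0.2"))

# Alternative order: sum the small terms first, then add 300.
small = Decimal(0)
for _ in range(100):
    small = ctx.add(small, Decimal("0.2"))
total = ctx.add(Decimal(300), small)

print(running, total)
```

Each individual 0.2 is smaller than the last digit the machine can keep next to 300, so it is lost every time; summed on their own, the small terms reach 20 and survive.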

  • Read half precision float (float16 IEEE 754r) binary data in matlab

    - by Michael
    You were a great help last time, and I hope you can give me some advice this time, too. I read a binary file into MATLAB with bit16 (format = bitn) and I get a string of ones and zeros: bin = '1 00011 1111111111' (16 bits: 1 = sign, 2-6 = exponent, 7-16 = mantissa). According to ftp://www.fox-toolkit.org/pub/fasthalffloatconversion.pdf it can be 'converted' like out = (-1)^bin(1) * 2^(bin(2:6)-15) * 1.bin(7:16) [are exponent and mantissa still binary?]. Can someone help me out and tell me how to deal with the 'eeeee' and '1.mmmmmmmmmm' as mentioned in the PDF, please? Thanks a lot! Michael
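
The formula above can be worked through by converting each bit field to an integer first. The sketch below is in Python rather than MATLAB, `decode_half` is a hypothetical name, and it assumes the standard IEEE 754 half-precision layout (bias 15, implicit leading 1 for normal numbers). The result is cross-checked against Python's own half-precision decoder (`struct` format `e`).

```python
import struct

def decode_half(bit_string):
    """Decode a 16-character string of '0'/'1' (spaces allowed) as float16."""
    bits = bit_string.replace(" ", "")
    sign = int(bits[0], 2)
    exponent = int(bits[1:6], 2)    # 'eeeee' read as an integer, 0..31
    mantissa = int(bits[6:16], 2)   # 'mmmmmmmmmm' read as an integer, 0..1023
    if exponent == 0:               # zero or subnormal: no implicit leading 1
        magnitude = (mantissa / 1024) * 2.0 ** -14
    elif exponent == 31:            # infinity or NaN
        magnitude = float("inf") if mantissa == 0 else float("nan")
    else:                           # normal: 1.mmmmmmmmmm * 2^(eeeee - 15)
        magnitude = (1 + mantissa / 1024) * 2.0 ** (exponent - 15)
    return -magnitude if sign else magnitude

value = decode_half("1 00011 1111111111")
reference = struct.unpack(">e", int("1000111111111111", 2).to_bytes(2, "big"))[0]
print(value, reference)
```

So '1.mmmmmmmmmm' means 1 plus the mantissa integer divided by 2^10, and 'eeeee' is the exponent field as a plain integer before the bias is subtracted.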

  • How to implement " char * ftoa(float num) " without sprintf() library function in C, C++ and JAVA

    - by SIVA
    Today I appeared for an interview, and the question was to write my own "char * ftoa(float num)" in C, C++ and Java. Yes, I know float numbers follow the IEEE standard for their memory layout, but I don't know how to do the float-to-char conversion using the mantissa and exponent in C, and I don't have any idea how to solve the problem in C++ or Java. Input to ftoa(): 1.23. Output from ftoa(): 1.23 (char format). Thanks in advance ...
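
One simple strategy that avoids any formatting library is to scale the number to an integer and emit the digits by repeated division. The sketch below shows the idea in Python; the same steps would carry over to C, C++ or Java. The `decimals` parameter and the nested `digits` helper are assumptions made for illustration, and the approach does not handle very large values, NaN or infinity.

```python
def ftoa(num, decimals=2):
    """Format num with a fixed number of decimals, one digit at a time."""
    sign = "-" if num < 0 else ""
    scale = 10 ** decimals
    scaled = int(abs(num) * scale + 0.5)   # round to the last kept digit
    int_part, frac_part = divmod(scaled, scale)

    def digits(n, width):
        """Decimal digits of n, zero-padded on the left to `width`."""
        out = []
        while n > 0 or len(out) < width:
            out.append(chr(ord("0") + n % 10))
            n //= 10
        return "".join(reversed(out))

    text = sign + digits(int_part, 1)
    if decimals > 0:
        text += "." + digits(frac_part, decimals)
    return text

print(ftoa(1.23))
```

Working from the mantissa and exponent directly is also possible, but it requires arbitrary-precision arithmetic to print every digit correctly, which is probably more than an interview answer needs.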

  • Why does multiplying a double by -1 not give the negative of the current answer

    - by Ankur
    I am trying to multiply a double value by -1 to get the negative value, but it continues to give me a positive value:

        double man = Double.parseDouble(mantissa);
        double exp;
        if (sign.equals("plus")) {
            exp = Double.parseDouble(exponent);
        } else {
            exp = Double.parseDouble(exponent);
            exp = exp * -1;
        }
        System.out.println(man + " - " + sign + " - " + exp);

    The printed result is 13.93 - minus - 2.0, which is correct except that 2.0 should be -2.0.

  • Floating point arithmetics restricted to integers

    - by user396672
    I use doubles for a uniform implementation of some arithmetic calculations. These calculations may actually be applied to integers too, but there are no C++-like templates in Java and I don't want to duplicate the implementation code, so I simply use the "double" version for ints. Does the JVM spec guarantee the correctness of integer operations such as <=, =, +, -, *, and / (in the case of remainder == 0) when the operations are emulated as the corresponding floating-point ops? (Any integer, of course, has a reasonable size to be represented in a double's mantissa.)
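
The limit in question can be probed empirically. Python's `float` is an IEEE 754 double, the same format as Java's `double`, so the sketch below checks that integer operations stay exact as long as operands and results fit in the 53-bit significand. `exact_int_ops` is a hypothetical helper, and a check like this is only an experiment, not a substitute for reading the specification.

```python
LIMIT = 2 ** 53  # integers up to this magnitude are exactly representable

def exact_int_ops(x, y):
    """Compare integer arithmetic with the same operations done in doubles."""
    fx, fy = float(x), float(y)
    return (
        fx + fy == x + y
        and fx - fy == x - y
        and fx * fy == x * y
        and (fx <= fy) == (x <= y)
        and (x % y != 0 or fx / fy == x // y)  # division only when exact
    )

print(exact_int_ops(123456, 789))
# Beyond 2**53 neighbouring integers collapse onto the same double.
print(float(LIMIT) + 1.0 == float(LIMIT))
```

Note that the product of two 32-bit integers can need up to 64 bits, so multiplication is the operation most likely to leave the exact range.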

  • Reading ASCII numbers using "D" instead of "E" for scientific notation using C

    - by Arrieta
    Hello, I have a list of numbers which look like this: 1.234D+1 or 1.234D-02. I want to read the file using C. The function atof will merely ignore the D and translate only the mantissa. The function fscanf will not accept the format '%10.6e' because it expects an E instead of a D in the exponent. When I ran into this problem in Python, I gave up and merely used a string substitution before converting from string to float. But in C, I am sure there must be another way. So, how would you read a file with numbers using D instead of E for scientific notation? Note that I do not mean how to read the strings themselves, but rather how to convert them to floats. Thanks.

  • Can I have a negative value as constant expression in Scala?

    - by Klinke
    I have a Java annotation that returns a double value:

        @Retention(RetentionPolicy.RUNTIME)
        @Target(ElementType.FIELD)
        public @interface DoubleValue {
            double value();
        }

    When I try to attach the annotation to a field in a Scala class and the value is negative, like here:

        class Test {
            @DoubleValue(-0.05) var a = _
        }

    I get a compiler error with the message: "annotation argument needs to be a constant; found: 0.05.unary_-". I understand that I need a numerical literal, and I looked into the Scala Language Specification; it seems that the - sign is only used for the exponent but not for the mantissa. Does someone have an idea how I can have a negative value as runtime information using annotations? Thanks, Klinke

  • Float32 to Float16

    - by Goz
    Can someone explain to me how I convert a 32-bit floating point value to a 16-bit floating point value? (s = sign, e = exponent and m = mantissa.) If a 32-bit float is 1s7e24m and a 16-bit float is 1s5e10m, then is it as simple as doing the following?

        int fltInt32;
        short fltInt16;
        memcpy( &fltInt32, &flt, sizeof( float ) );
        fltInt16 = (fltInt32 & 0x00FFFFFF) >> 14;
        fltInt16 |= ((fltInt32 & 0x7f000000) >> 26) << 10;
        fltInt16 |= ((fltInt32 & 0x80000000) >> 16);

    I'm assuming it ISN'T that simple ... so can anyone tell me what you DO need to do?
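
A simplified version of the conversion can be sketched in Python. Note that IEEE 754 single precision is laid out as 1 sign bit, 8 exponent bits and 23 fraction bits, not 1s7e24m, so the exponent must be re-biased (127 to 15) rather than just shifted. The function name `float32_to_float16_bits` is hypothetical, and the sketch makes deliberate simplifications: it truncates the fraction instead of rounding and flushes values too small for a normal half to zero.

```python
import struct

def float32_to_float16_bits(value):
    """Return the 16-bit pattern of value as a half (truncating, no subnormals)."""
    bits32 = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = (bits32 >> 16) & 0x8000   # move the sign to bit 15
    exp32 = (bits32 >> 23) & 0xFF    # 8-bit exponent, bias 127
    frac32 = bits32 & 0x7FFFFF       # 23-bit fraction
    if exp32 == 0xFF:                # infinity or NaN stays infinity or NaN
        return sign | 0x7C00 | (0x200 if frac32 else 0)
    exp16 = exp32 - 127 + 15         # re-bias for the 5-bit exponent
    if exp16 >= 0x1F:                # too large: overflow to infinity
        return sign | 0x7C00
    if exp16 <= 0:                   # too small: flush to zero (simplification)
        return sign
    return sign | (exp16 << 10) | (frac32 >> 13)  # keep the top 10 fraction bits

print(hex(float32_to_float16_bits(-2.5)))
```

A production-quality conversion would also round to nearest and produce subnormal halves for small inputs.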

  • Is this time related process accounting stats gathering appropriate?

    - by Ceko Cakata
    Based on sys/acct.h (V1, not V3) I need to gather some user usage statistics with a parser that parses the acct file line by line. The parser will run and parse the entire file every N seconds, and I need to gather the user statistics accumulated since the last run (N seconds back). I'm not sure what the most appropriate way to do it is, based on the info provided by sys/acct.h. Maybe something like this: if ((ac_btime + ac_etime) < (current_time - N)) { gather; } Also, comp_t is said to be a "floating-point value consisting of a 3-bit, base-8 exponent, and a 13-bit mantissa", but I think u_int16_t is just an unsigned short int. Should I be converting it to long with the provided formula or not?
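
Although comp_t is stored in 16 bits, the raw integer is not the value itself: the top 3 bits are a base-8 exponent and the low 13 bits are the mantissa. The decoding can be sketched in Python as follows, with `comp_t_to_int` as a hypothetical name; each exponent step multiplies by 8, which is a left shift by 3 bits.

```python
def comp_t_to_int(c):
    """Expand a 16-bit comp_t into the integer it encodes."""
    exponent = (c >> 13) & 0x7     # 3-bit base-8 exponent
    mantissa = c & 0x1FFF          # 13-bit mantissa
    return mantissa << (3 * exponent)

print(comp_t_to_int((2 << 13) | 3))
```

So values up to 8191 read the same either way, but anything with a non-zero exponent would be misread without the conversion.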

  • Why does division yield a vastly different result than multiplication by a fraction in floating point?

    - by Avram
    I understand why floating point numbers can't be compared, and know about the mantissa and exponent binary representation, but I'm no expert and today I came across something I don't get. Let's say you have something like:

        float denominator, numerator, resultone, resulttwo;
        resultone = numerator / denominator;
        float buff = 1 / denominator;
        resulttwo = numerator * buff;

    To my knowledge, different flops can yield different results and this is not unusual. But in some edge cases these two results seem to be vastly different. To be more specific, in my GLSL code calculating the Beckmann facet slope distribution for the Cook-Torrance lighting model:

        float a = 1 / (facetSlopeRMS * facetSlopeRMS * pow(clampedCosHalfNormal, 4));
        float b = clampedCosHalfNormal * clampedCosHalfNormal - 1.0;
        float c = facetSlopeRMS * facetSlopeRMS * clampedCosHalfNormal * clampedCosHalfNormal;
        facetSlopeDistribution = a * exp(b/c);

    yields very different results from:

        float a = (facetSlopeRMS * facetSlopeRMS * pow(clampedCosHalfNormal, 4));
        facetSlopeDistribution = exp(b/c) / a;

    Why is that? The second form of the expression is problematic. If I try to add the second form of the expression to a color, I get blacks, even though the expression should always evaluate to a positive number. Am I getting an infinity? A NaN? If so, why?
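
For scale, the rounding difference between dividing and multiplying by a reciprocal can be measured directly. The sketch below uses Python doubles rather than GLSL floats, and the `classify` helper is hypothetical. It only shows that ordinary rounding accounts for a gap in the last bit or so, which suggests that a "vastly different" result comes from something else, such as an infinity or NaN in an intermediate value.

```python
import math

numerator, denominator = 3.0, 10.0

direct = numerator / denominator                  # one rounding step
via_reciprocal = numerator * (1.0 / denominator)  # two rounding steps
relative_gap = abs(direct - via_reciprocal) / abs(direct)

def classify(x):
    """Label a value so that NaN and infinity are easy to spot when debugging."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf"
    return "finite"

print(direct, via_reciprocal, relative_gap, classify(via_reciprocal))
```

Applying a check like `classify` to each intermediate term is one way to find where a non-finite value first appears.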

  • How do I work out IEEE 754 64-bit Floating Point Double Precision?

    - by yousef gassar
    Hello, I have done this for 32 bits but I can't do it for 64 bits. I am stuck on this question and need help; I don't know how to work it out. This is the question:

    Below are two numbers represented in IEEE 754 64-bit Floating Point Double Precision; the bias of the signed exponent is -1023. Any particular real number 'N' represented in 64-bit form (i.e. with the following bit fields: 1-bit Sign, 11-bit Exponent, 52-bit Fraction) can be expressed in the form ±1.F_2 × 2^X by substituting the bit-field values using formula (IV.I):

        N = (-1)^S × 1.F_2 × 2^(E - 1023) for 0 < E < 2047    (IV.I)

    where N = the number represented, S = Sign bit-value, E = Exponent = X + 1023, and F = Fraction or Mantissa are the values in the 1-, 11- and 52-bit fields respectively in the IEEE 754 64-bit FP representation. Using formula (IV.I), express the 64-bit FP representation of each number as: (i) a binary number of the form ±1.F_2 × 2^X; (ii) a decimal number of the form ±0.F_10 × 10^Y {limit F_10 to 10 decimal places}.

        Number 1
          Sign (1 bit):       0
          Exponent (11 bits): 1000 0001 001
          Fraction (52 bits): 1111 0111 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

        Number 2
          Sign (1 bit):       1
          Exponent (11 bits): 1000 0000 000
          Fraction (52 bits): 1001 0010 0001 1111 1011 0101 0100 0100 0100 0010 1101 0001 1000

    I know I have to use the formula for each of these, but how do I work it out? Is it like this? N = (-1)^S × 1.F_2 × 2^(E - 1023) = 1 x 1.1111 0111 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 x 1000 0001 00111 (-1023)?
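
Formula (IV.I) can be evaluated mechanically once each bit field is read as an integer. The Python sketch below does that for the two numbers in the question; `decode_double` is a hypothetical name, and the formula only covers normal numbers (0 < E < 2047), as stated in the question.

```python
import math

def decode_double(sign_bits, exponent_bits, fraction_bits):
    """Apply N = (-1)^S * 1.F * 2^(E - 1023) to the three bit fields."""
    s = int(sign_bits.replace(" ", ""), 2)
    e = int(exponent_bits.replace(" ", ""), 2)   # 11-bit field as an integer
    f = int(fraction_bits.replace(" ", ""), 2)   # 52-bit field as an integer
    if not 0 < e < 2047:
        raise ValueError("formula (IV.I) only covers normal numbers")
    # 1.F in binary equals 1 + F / 2^52.
    return (-1) ** s * (1 + f / 2 ** 52) * 2.0 ** (e - 1023)

first = decode_double("0", "1000 0001 001", "1111 0111" + " 0000" * 11)
second = decode_double(
    "1", "1000 0000 000",
    "1001 0010 0001 1111 1011 0101 0100 0100 0100 0010 1101 0001 1000",
)
print(first, second, second == -math.pi)
```

For the first number the exponent field is 1033, so X = 1033 - 1023 = 10 and the binary form is +1.11110111 × 2^10. For the second the exponent field is 1024, so X = 1.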

  • Fast sign in C++ float...are there any platform dependencies in this code?

    - by Patrick Niedzielski
    Searching online, I have found the following routine for calculating the sign of a float in IEEE format. This could easily be extended to a double, too.

        // returns 1.0f for positive floats, -1.0f for negative floats, 0.0f for zero
        inline float fast_sign(float f) {
            if (((int&)f & 0x7FFFFFFF)==0) return 0.f; // test exponent & mantissa bits: is input zero?
            else {
                float r = 1.0f;
                (int&)r |= ((int&)f & 0x80000000); // mask sign bit in f, set it in r if necessary
                return r;
            }
        }

    (Source: "Fast sign for 32 bit floats", Peter Schoffhauzer) I am wary of using this routine, though, because of the bitwise operations. I need my code to work on machines with different byte orders, but I am not sure how much of this the IEEE standard specifies, as I couldn't find the most recent version, published this year. Can someone tell me if this will work, regardless of the byte order of the machine? Thanks, Patrick
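
The same bit logic can be expressed in Python with an explicit byte order, which makes the dependency visible. In this sketch the float and the integer are packed and unpacked with the same byte order (`<`), so the masks apply to the same logical bits. It assumes, as is usual but not guaranteed everywhere, that a platform stores floats and integers with the same endianness. `fast_sign` mirrors the routine in the question.

```python
import struct

def fast_sign(f):
    """Return 1.0, -1.0 or 0.0 using only the float32 bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    if (bits & 0x7FFFFFFF) == 0:          # exponent and mantissa all zero
        return 0.0
    one = struct.unpack("<I", struct.pack("<f", 1.0))[0]
    signed_one = one | (bits & 0x80000000)  # copy the sign bit onto 1.0
    return struct.unpack("<f", struct.pack("<I", signed_one))[0]

print(fast_sign(3.5), fast_sign(-2.0), fast_sign(0.0))
```

In C++ the reference casts `(int&)f` are a separate concern from byte order, because they rely on type punning; copying the bits with `memcpy` is the usual way around that.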

  • C# error casting from double to int32

    - by orfix
        using NUF = NUnit.Framework;

        [NUF.Test]
        public void DifferentCastingTest() {
            NUF.Assert.That((int)0.499999D, NUF.Is.EqualTo(0));
            NUF.Assert.That((int)0.500000D, NUF.Is.EqualTo(0)); // !!! row 1
            NUF.Assert.That((int)1.499999D, NUF.Is.EqualTo(1));
            NUF.Assert.That((int)1.500000D, NUF.Is.EqualTo(1)); // !!! row 2
            NUF.Assert.That(System.Convert.ToInt32(0.499999D), NUF.Is.EqualTo(0));
            NUF.Assert.That(System.Convert.ToInt32(0.500000D), NUF.Is.EqualTo(0)); // !!!
            NUF.Assert.That(System.Convert.ToInt32(1.499999D), NUF.Is.EqualTo(1));
            NUF.Assert.That(System.Convert.ToInt32(1.500000D), NUF.Is.EqualTo(2)); // !!! row 3
        }

    The same double value (1.5D) is converted in different ways by casting and by Convert.ToInt32 (see rows 2 and 3), and two doubles with the same mantissa (0.5 and 1.5) are rounded in different modes (see rows 1 and 2). Is it a bug?
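
The two behaviours correspond to two different, well-defined rules: a cast truncates toward zero, while Convert.ToInt32 rounds to the nearest integer and sends ties to the even neighbour. Python happens to offer the same pair (`int()` truncates, `round()` rounds half to even), so the pattern can be reproduced as an analogue. The function names below are hypothetical, and this is Python, not the C# runtime itself.

```python
def cast_like_int(x):
    """Analogue of the C# (int) cast: drop the fractional part."""
    return int(x)

def convert_like_to_int32(x):
    """Analogue of Convert.ToInt32: nearest integer, ties go to even."""
    return round(x)

samples = (0.499999, 0.5, 1.499999, 1.5)
print([cast_like_int(v) for v in samples])
print([convert_like_to_int32(v) for v in samples])
```

Under round-half-to-even, 0.5 goes to 0 and 1.5 goes to 2 because those are the even neighbours, which matches the rows marked in the test.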
