
Eric Douglas (08-26-19, 10:51 PM)
I want a custom class that works like the java.awt.Point class but a bit more complicated.
They use this hashcode:
public int hashCode() {
    long bits = java.lang.Double.doubleToLongBits(getX());
    bits ^= java.lang.Double.doubleToLongBits(getY()) * 31;
    return (((int) bits) ^ ((int) (bits >> 32)));
}
I want a custom class with, say, six String properties that produces the same hash code whenever all of them match, so that I can use instances of the class as keys in a HashMap.
Eric Sosman (08-26-19, 11:41 PM)
On 8/26/2019 4:51 PM, Eric Douglas wrote:
> I want a custom class that works like the java.awt.Point class but a bit more complicated.
> They use this hashcode:
> public int hashCode() {
> long bits = java.lang.Double.doubleToLongBits(getX());
> bits ^= java.lang.Double.doubleToLongBits(getY()) * 31;
> return (((int) bits) ^ ((int) (bits >> 32)));
> }
> I want a custom class that has say 6 String type properties to be able to generate a common hashcode if they all match so I can use the class instances as keys to a hashmap.


Just stir the Strings' hashes into what you've already got:

    // needs: import java.util.Objects;
    public int hashCode() {
        long bits = java.lang.Double.doubleToLongBits(getX());
        bits ^= java.lang.Double.doubleToLongBits(getY()) * 31;
        bits ^= Objects.hashCode(getStringThing()) * 29;
        bits ^= Objects.hashCode(getStrungOut()) * 37;
        bits ^= Objects.hashCode(getStringBikini()) * 59;
        ...
        return ...
    }

Choice of multipliers is up to you; perhaps you'd want a few large long
values to stir some string bits into the high order half (might not make
much difference post-fold, though).

Note that all the String values must also be tested in equals().
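
For the six-String case in the question, a minimal sketch might look like the following. The class name SixStringKey and the field names s1 to s6 are placeholders, not anything from the original code, and java.util.Objects.hash is swapped in for the hand-rolled XOR mixing above purely for brevity. The point it illustrates is that equals() and hashCode() look at exactly the same fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class: the name and the six field names are placeholders.
class SixStringKey {
    private final String s1, s2, s3, s4, s5, s6;

    SixStringKey(String s1, String s2, String s3, String s4, String s5, String s6) {
        this.s1 = s1; this.s2 = s2; this.s3 = s3;
        this.s4 = s4; this.s5 = s5; this.s6 = s6;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof SixStringKey)) return false;
        SixStringKey other = (SixStringKey) obj;
        // Every field that feeds hashCode() is compared here; Objects.equals is null-safe.
        return Objects.equals(s1, other.s1) && Objects.equals(s2, other.s2)
            && Objects.equals(s3, other.s3) && Objects.equals(s4, other.s4)
            && Objects.equals(s5, other.s5) && Objects.equals(s6, other.s6);
    }

    @Override
    public int hashCode() {
        // Null-safe; combines the six String hashes with a 31-based accumulator.
        return Objects.hash(s1, s2, s3, s4, s5, s6);
    }
}

class SixStringKeyDemo {
    public static void main(String[] args) {
        Map<SixStringKey, String> map = new HashMap<>();
        map.put(new SixStringKey("a", "b", "c", "d", "e", null), "found");
        // A second, equal instance locates the entry stored under the first.
        System.out.println(map.get(new SixStringKey("a", "b", "c", "d", "e", null))); // prints "found"
    }
}
```

The fields are final here so that a key cannot change its hash while it sits in a map; that is a design choice of the sketch, not a requirement stated in the thread.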
Arne Vajhøj (08-27-19, 01:03 AM)
On 8/26/2019 4:51 PM, Eric Douglas wrote:
> I want a custom class that works like the java.awt.Point class but a bit more complicated.
> They use this hashcode:
> public int hashCode() {
> long bits = java.lang.Double.doubleToLongBits(getX());
> bits ^= java.lang.Double.doubleToLongBits(getY()) * 31;
> return (((int) bits) ^ ((int) (bits >> 32)));
> }
> I want a custom class that has say 6 String type properties to be able to generate a common hashcode if they all match so I can use the class instances as keys to a hashmap.


Have you tried letting your IDE generate hashCode and equals based
on the fields you want?

Arne
Eric Douglas (08-27-19, 01:59 PM)
On Monday, August 26, 2019 at 7:04:02 PM UTC-4, Arne Vajhøj wrote:
> Have you tried letting your IDE generate hashCode and equals based
> on the fields you want?
> Arne


Didn't realize it could do that...so here's what it comes up with.

import java.util.HashMap;

public class TestClass1 {
    String s1 = "abc";
    String s2 = "def";
    String s3 = "ghi";
    int x = 1;
    int y = 2;
    float z = 3;

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((s1 == null) ? 0 : s1.hashCode());
        result = prime * result + ((s2 == null) ? 0 : s2.hashCode());
        result = prime * result + ((s3 == null) ? 0 : s3.hashCode());
        result = prime * result + x;
        result = prime * result + y;
        result = prime * result + Float.floatToIntBits(z);
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        TestClass1 other = (TestClass1) obj;
        if (s1 == null) {
            if (other.s1 != null) return false;
        } else if (!s1.equals(other.s1)) return false;
        if (s2 == null) {
            if (other.s2 != null) return false;
        } else if (!s2.equals(other.s2)) return false;
        if (s3 == null) {
            if (other.s3 != null) return false;
        } else if (!s3.equals(other.s3)) return false;
        if (x != other.x) return false;
        if (y != other.y) return false;
        if (Float.floatToIntBits(z) != Float.floatToIntBits(other.z)) return false;
        return true;
    }

    public static void main(String[] args) {
        TestClass1 c1 = new TestClass1();
        TestClass1 c2 = new TestClass1();
        HashMap<TestClass1,String> m1 = new HashMap<>();
        m1.put(c1, "test");
        System.out.println(m1.containsKey(c2)); //true
    }

}
Eric Douglas (08-27-19, 03:24 PM)
On Monday, August 26, 2019 at 5:41:48 PM UTC-4, Eric Sosman wrote:
[..]
> values to stir some string bits into the high order half (might not make
> much difference post-fold, though).
> Note that all the String values must also be tested in equals().


Another option is to do what Apache did in the POI project: they just return the .toString().hashCode() of a single object containing all the properties of the class. This object extends org.apache.xmlbeans.XmlObject.
Eric Sosman (08-27-19, 03:39 PM)
On 8/27/2019 9:24 AM, Eric Douglas wrote:
> On Monday, August 26, 2019 at 5:41:48 PM UTC-4, Eric Sosman wrote:
> Another option is to do what Apache did in the POI project, they just return the .toString().hashCode() of a single object containing all the properties of the class, this object extends org.apache.xmlbeans.XmlObject.


I'm not familiar with POI, but this sounds expensive: Why create
(most likely) a StringBuilder and several Strings just to compute a
hash value? Remember, the reason for wanting a hash value in the
first place is because you need speed; don't burn extra cycles on
making and collecting needless garbage. (Does the toString() in
this case produce an entire XML document? Oh, Lordy!)

You'd also need to be concerned about the "invariance" of the
String representations of the component sub-objects. If the Strings
change with locale or look-and-feel or something, the hash codes
could also change and mess up everything. Even the mere passage of
time might do it: A date that renders as "today" might spontaneously
change to "yesterday" as midnight passes ...

If you're *sure* the String representations are invariant, and if
you're *confident* that they're not too costly, well, okay, maybe.
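
To make the invariance concern concrete, here is a small sketch with a made-up Price class (nothing to do with POI or XmlBeans) whose text form depends on the locale it is rendered in. A hash taken from the rendered text changes with the locale; a hash taken from the value itself does not.

```java
import java.util.Locale;

// Hypothetical value class whose text form depends on the locale it is given.
class Price {
    private final double amount;

    Price(double amount) {
        this.amount = amount;
    }

    // Renders with the locale's decimal separator: "1.50" for US, "1,50" for Germany.
    String render(Locale locale) {
        return String.format(locale, "%.2f", amount);
    }

    // Hash derived from the rendered text: it changes whenever the text does.
    int textHash(Locale locale) {
        return render(locale).hashCode();
    }

    // Hash derived from the value itself: independent of any rendering.
    int valueHash() {
        return Double.hashCode(amount);
    }
}

class PriceDemo {
    public static void main(String[] args) {
        Price p = new Price(1.5);
        System.out.println(p.render(Locale.US));      // prints "1.50"
        System.out.println(p.render(Locale.GERMANY)); // prints "1,50"
        // Same object, two different text-derived hashes.
        System.out.println(p.textHash(Locale.US) == p.textHash(Locale.GERMANY)); // prints "false"
    }
}
```

If a key were stored in a HashMap under one rendering and looked up under the other, the lookup would most likely land in the wrong bucket and miss.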
Eric Douglas (08-27-19, 04:29 PM)
On Tuesday, August 27, 2019 at 9:39:24 AM UTC-4, Eric Sosman wrote:
[..]
> If you're *sure* the String representations are invariant, and if
> you're *confident* that they're not too costly, well, okay, maybe.


It extends XmlObject not document. That returns all properties that have value. By default that could look like
<xml-fragment borderId="21" applyBorder="true"/>

I expect these fragments will still match if you change the locale, assuming all the xml objects you're mapping still have the same locale.
The advantage is you can add more properties without changing the hashcode method. This would also mean your objects can match if they're generated from 2 different versions of the API as long as they only reference fields available in both.

Does calling toString(), which calls toPrettyString() and iterates over every non-null property to return a single String to hash, add significantly more overhead than the simple String-to-int computation we see in the hashcode generated if we right-click the class and select generate hashcode in Eclipse? Is there any chance they were thinking about mixing versions of POI, or is it more likely that they wanted to be able to change the property fields without updating the hashcode method?
Eric Sosman (08-27-19, 05:49 PM)
On 8/27/2019 10:29 AM, Eric Douglas wrote:
> On Tuesday, August 27, 2019 at 9:39:24 AM UTC-4, Eric Sosman wrote:
> It extends XmlObject not document. That returns all properties that have value. By default that could look like
> <xml-fragment borderId="21" applyBorder="true"/>
> I expect these fragments will still match if you change the locale, assuming all the xml objects you're mapping still have the same locale.
> The advantage is you can add more properties without changing the hashcode method. This would also mean your objects can match if they're generated from 2 different versions of the API as long as they only reference fields available in both.
> Does calling the toString() which calls toPrettyString() iterating over every non-null property to return a single String hashcode add significantly more overhead than the single String to int computation we see in the hashcode generated if we right click the class and select generate hashcode in Eclipse? Any chance they were thinking about mixing versions with the POI, or more likely they wanted to change the property fields without updating the hashcode method?


(Repeating: I'm not familiar with POI, so take my opinions as
guesses rather than as Gospel.)

The generated hashCode() exhibited up-thread computed the hash of
an *existing* String value. It expended no cycles in generating or
collecting a new String, and performed the hash calculation only the
first time it was asked for (unless it was extraordinarily unlucky).
Note that last point: Somebody felt hash speed was important enough
to warrant caching the hash value of a simple String.

The iteration you describe appears to create a brand-new String
every single time, quite likely more than one as the individual
pieces are generated and concatenated. Then there's a StringBuilder
to think about, too, and the (several?) char[] arrays that underlie
it as it grows. Finally, calling hashCode() on the brand-new String
will be the first such call and thus will compute the hash afresh.

So, yes: It's "significantly more overhead" (unless you've left
out a bunch of mitigating detail).

As for why the Apaches chose this seemingly inefficient way to
compute hashes -- well, "I'm not familiar with POI," and certainly
not familiar with the thinking of its developers. Maybe they figured
"Nobody in his right mind would use these things as hash keys anyhow"
and opted for minimal development time, or maybe there's more going
on than is apparent from your brief description, I dunno.
Eric Douglas (08-27-19, 08:05 PM)
On Tuesday, August 27, 2019 at 11:50:10 AM UTC-4, Eric Sosman wrote:
[..]
> "Nobody in his right mind would use these things as hash keys anyhow"
> and opted for minimal development time, or maybe there's more going
> on than is apparent from your brief description, I dunno.


It is possible they chose the 'quick and dirty' route, making their code a lot slower than it could be. It is possible there's a StringBuilder in there somewhere. It's hard to find the actual code that makes the hashcode because it runs through so many factories and interfaces. Suffice to say it's xmlbeans, apparently the most generic way to make any Java object serializable?



/**
* Returns an XML string for this XML object.
* <p>
* The string is pretty-printed. If you want a non-pretty-printed
* string, or if you want to control options precisely, use the
* xmlText() methods.
* <p>
* Note that when producing XML any object other than very root of the
* document, then you are guaranteed to be looking at only a fragment
* of XML, i.e., just the contents of an element or attribute, and
* and we will produce a string that starts with an <code>&lt;xml-fragment&gt;</code> tag.
* The XmlOptions.setSaveOuter() option on xmlText can be used to produce
* the actual element name above the object if you wish.
*/
String toString();

So, if I make a custom class that holds the values I need to generate one of their objects, and I don't need to be so super dynamic and super generic, the hard-coded hashcode as generated by the IDE is much more efficient (just not interchangeable, in the sense of being created by one version of the class and compared to another, should there be any changes that affect the hashcode).
Eric Sosman (08-27-19, 09:16 PM)
On 8/27/2019 2:05 PM, Eric Douglas wrote:
> [...]
> So, if I make a custom class that holds the values I need to generate one of their objects, and I don't need to be so super dynamic and super generic, the hard-coded hashcode as generated by the IDE is much more efficient..(just not interchangeable as to be created by one version of the class and compared to another should there be any changes that affect the hashcode)


It looks like direct hash computation should beat the pants off
render-as-string-and-hash-the-string, almost certainly and almost
always. (Suggested experiment, if you're interested: Compute a few
thousand hashes each way to warm up, then compute a few thousand
more and note the elapsed time. Additional fillip: Use the methods
of Runtime to approximate the memory used before and after -- but
realize this will only give an approximation.)

Interchangeability of hash values is probably not a concern.
It is not unheard-of for instances of different classes to equals()
each other, but it is fairly unusual.
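
The suggested experiment could be sketched roughly as follows. Everything here is an assumption made for illustration: the class name HashTiming, the two sample strings, and the XML-ish concatenation that stands in for a toString() rendering. It uses only System.nanoTime() and Runtime, so the numbers are indicative at best; a proper harness such as JMH would be more trustworthy.

```java
import java.util.function.IntSupplier;

class HashTiming {
    // Results are accumulated here so the JIT cannot simply discard the hash calls.
    static int sink;

    // Runs the supplier `iterations` times and returns the elapsed nanoseconds.
    static long timeNanos(IntSupplier hasher, int iterations) {
        int acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += hasher.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        sink += acc;
        return elapsed;
    }

    public static void main(String[] args) {
        String a = "21";   // hypothetical sample property values
        String b = "true";

        // Direct computation from existing Strings (their hashes are cached after first use).
        IntSupplier direct = () -> 31 * a.hashCode() + b.hashCode();
        // Stand-in for render-then-hash: builds a fresh String on every call.
        IntSupplier viaString = () -> ("<x borderId=\"" + a + "\" applyBorder=\"" + b + "\"/>").hashCode();

        // Warm-up passes, as suggested above.
        timeNanos(direct, 5_000);
        timeNanos(viaString, 5_000);

        Runtime rt = Runtime.getRuntime();
        long usedBefore = rt.totalMemory() - rt.freeMemory();
        long directNanos = timeNanos(direct, 5_000);
        long stringNanos = timeNanos(viaString, 5_000);
        long usedAfter = rt.totalMemory() - rt.freeMemory();

        System.out.println("direct:     " + directNanos + " ns");
        System.out.println("via String: " + stringNanos + " ns");
        // Only an approximation: a garbage collection in between can make this negative.
        System.out.println("approx. memory delta: " + (usedAfter - usedBefore) + " bytes");
    }
}
```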
Eric Douglas (08-27-19, 10:04 PM)
On Tuesday, August 27, 2019 at 3:16:32 PM UTC-4, Eric Sosman wrote:
> It looks like direct hash computation should beat the pants off
> render-as-string-and-hash-the-string, almost certainly and almost
> always. (Suggested experiment, if you're interested: Compute a few
> thousand hashes each way to warm up, then compute a few thousand
> more and note the elapsed time. Additional fillip: Use the methods
> of Runtime to approximate the memory used before and after -- but
> realize this will only give an approximation.)
> Interchangeability of hash values is probably not a concern.
> It is not unheard-of for instances of different classes to equals()
> each other, but it is fairly unusual.


I've been trying to use POI to generate .xlsx files, but their output is not clean by default. They allow a lot of unnecessary duplicate tags, and I found it takes exponentially longer the more cells I have. So I've been writing my own Excel API as an intermediate layer to gather what I need to write and get it clean, then call the POI methods once to write it out.

I believe there are a lot of possible property strings on a cell style object, but that may include many that we'll never need. I can hard-code the properties we actually use into my own style class to generate the more efficient hashcode and merge cell properties into unique objects. This should run a lot faster.

Some fun challenges with this project. I found that the POI API lets you write 'invalid' information to the xlsx file; LibreOffice Calc may do what you wanted even if it is coded wrong, while Excel 365 will come up with a 'we fixed it for you' and just delete what it doesn't like. Then, while Excel 365 does have a function UI that pops up a window listing all valid functions, it also has a conditional formatting UI that lets you format cell values using functions with absolutely no help on the syntax. I think they want you to read a manual and/or take a course on Excel to get it right.
Daniele Futtorovic (08-27-19, 11:51 PM)
On 2019-08-27 22:04, Eric Douglas wrote:
> I believe there are a lot of possible property strings to a cell
> style object, but that may include a lot that we'll never need. I
> can hard code which properties we're using as needed into my own
> style class to generate the more efficient hashcode to merge cell
> properties into unique objects.


It shouldn't exactly be difficult to write a hash function for a set of
key-value associations that doesn't involve rendering all those into a
String.
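
One possible shape for such a function, assuming the properties are held in a Map<String, String> (an assumption; the thread does not say how they are stored), is to sum a per-entry hash. The class and method names below are made up. The formula is the one the java.util.Map contract specifies for hashCode(), so in practice calling hashCode() on the map itself gives the same number.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

class PropertyHash {
    // Order-independent hash over key/value pairs, with no String building.
    // Each entry contributes hash(key) XOR hash(value); the contributions are summed.
    static int hashOf(Map<String, String> properties) {
        int sum = 0;
        for (Map.Entry<String, String> e : properties.entrySet()) {
            sum += Objects.hashCode(e.getKey()) ^ Objects.hashCode(e.getValue());
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("borderId", "21");
        first.put("applyBorder", "true");

        Map<String, String> second = new LinkedHashMap<>();
        second.put("applyBorder", "true");   // same pairs, inserted in the other order
        second.put("borderId", "21");

        System.out.println(hashOf(first) == hashOf(second));   // prints "true"
        System.out.println(hashOf(first) == first.hashCode()); // prints "true"
    }
}
```

Because addition is commutative, the result does not depend on iteration order, which matters when the same set of properties can be assembled in different sequences.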
Silvio (08-28-19, 01:27 AM)
On 26-08-19 23:41, Eric Sosman wrote:
> On 8/26/2019 4:51 PM, Eric Douglas wrote:
>     Just stir the String's hashes into what you've already got:
>     public int hashCode() {
>         long bits = java.lang.Double.doubleToLongBits(getX());
>         bits ^= java.lang.Double.doubleToLongBits(getY()) * 31;
>         bits ^= Objects.hashCode(getStringThing()) * 29;
>         bits ^= Objects.hashCode(getStrungOut()) * 37;
>         bits ^= Objects.hashCode(getStringBikini()) * 59;


Don't do this. Use the multiplier (no need for more than one if you pick
a good prime) for the accumulator and not the individual hash-codes.
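
One way to see the difference is a deliberately contrived pair of inputs. The sketch below works on raw int hashes rather than Strings, and the names xorMix and accumulate are invented for the illustration. With per-field multipliers XORed together, the pairs (37, 0) and (0, 29) both collapse to 29 * 37; with a single prime applied to the running accumulator they stay apart.

```java
class MixingDemo {
    // Per-field multipliers XORed together, in the style quoted above.
    static int xorMix(int h1, int h2) {
        return (h1 * 29) ^ (h2 * 37);
    }

    // One prime applied to the running accumulator, then the next hash added.
    static int accumulate(int... hashes) {
        int result = 1;
        for (int h : hashes) {
            result = 31 * result + h;
        }
        return result;
    }

    public static void main(String[] args) {
        // 37 * 29 ^ 0 and 0 ^ 29 * 37 are both 1073.
        System.out.println(xorMix(37, 0) == xorMix(0, 29));         // prints "true"
        // 31 * (31 + 37) + 0 = 2108, whereas 31 * (31 + 0) + 29 = 990.
        System.out.println(accumulate(37, 0) == accumulate(0, 29)); // prints "false"
    }
}
```

This is only one hand-picked collision, not a measurement of how either scheme behaves on real data.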
[..]
Arne Vajhøj (08-28-19, 03:08 AM)
On 8/27/2019 7:59 AM, Eric Douglas wrote:
> On Monday, August 26, 2019 at 7:04:02 PM UTC-4, Arne Vajhøj wrote:
>> Have you tried letting your IDE generate hashCode and equals based
>> on the fields you want?

> Didn't realize it could do that...so here's what it comes up with.


[..]
> result = prime * result + Float.floatToIntBits(z);
> return result;
> }


Until proven otherwise I would assume that is "good enough".

Arne
ross.finlayson (09-06-19, 05:27 AM)
On Tuesday, August 27, 2019 at 6:08:29 PM UTC-7, Arne Vajhøj wrote:
> On 8/27/2019 7:59 AM, Eric Douglas wrote:
> Until proven otherwise I would assume that is "good enough".
> Arne


A flag?

If it's less than 64-many, it fits in a word.

Or for example a prime number in an alphabet
of objects that have no two objects share prime numbers.

That's a good hash code....

Hash codes have a good usual probability of not
colliding, i.e., even if hashcodes are unique,
hashcodes usually have a low linear number in
the bucket, the hash bucket.

Then, people give it to those but then the
buckets are large.

Hashing hashcodes, these days many of the
sources are under the distributed.

There's automatic containers and constructors
and such - might as well start giving objects
prime numbers.

The prime number as a type, is an int under the
guarantee that it's also a prime number.

Given these hashCodes of immutable object setters,
their value is of course way usually the composition.

I used Eclipse to make equals() and hashCode() was fine
but equals() was like "throw me false, I'll throw", and
I was like "what is this, Photon, that's not equals()
what Eclipse makes".

Bit flags or primes - efficient hash codes.
