
spectrum (01-26-19, 02:24 AM)
Hello,

I'm now playing around with a "sequence" type (a derived type with a SEQUENCE
statement), with which I have had zero experience until now, and I'm having a hard
time understanding its behavior (and use cases)... For example, in the following code,
I print the size of a sequence type and the addresses of its components.

program main
    use iso_c_binding
    implicit none

    type Nonseq_type
        integer :: n
        real :: x    !! (1-a)
        ! double precision :: x    !! (1-b)
    endtype
    type(Nonseq_type), target :: nonseq

    type Seq_type
        sequence
        integer :: n
        real :: x    !! (2-a)
        ! double precision :: x    !! (2-b)
    endtype
    type(Seq_type), target :: seq

    print *, "nonseq:"
    print *, "sizeof = ", sizeof( nonseq )
    print *, "loc(n) = ", c_loc( nonseq % n )
    print *, "loc(x) = ", c_loc( nonseq % x )
    print *
    print *, "seq:"
    print *, "sizeof = ", sizeof( seq )
    print *, "loc(n) = ", c_loc( seq % n )
    print *, "loc(x) = ", c_loc( seq % x )
end

If I use REAL for the component x [in (1-a), (2-a)], gfortran-8.2 gives

nonseq:
sizeof = 8
loc(n) = 140734586615400
loc(x) = 140734586615404

seq:
sizeof = 8
loc(n) = 140734586615392
loc(x) = 140734586615396

which shows that both Seq_type and Nonseq_type occupy 8 bytes and that
the components n and x take 4 bytes each (no padding). On the other hand,
if I use DOUBLE PRECISION for the component x [in (1-b), (2-b)], I get

nonseq:
sizeof = 16
loc(n) = 140734635312736
loc(x) = 140734635312744

seq:
sizeof = 16
loc(n) = 140734635312720
loc(x) = 140734635312728

showing that both types occupy 16 bytes. Looking at the addresses, it seems
there are 4 unused bytes between n and x (probably due to padding). So, I'm wondering

* Does the SEQUENCE statement not necessarily mean that components
in a type are aligned with no padding?

* If so, what is the practical difference between the above Seq_type and Nonseq_type?
Are they perhaps equivalent in this simple case, with more important differences
appearing in other usage...?
spectrum (01-26-19, 02:28 AM)
PS. Here, I mean the section in Modern Fortran Explained (with the green cover).
spectrum (01-26-19, 02:30 AM)
Hmm, I'm sorry again, this part

> So, essentially, does the SEQUENCE statement has nothing with contiguity of memory...?


is a typo of
Steve Lionel (01-26-19, 03:30 AM)
On 1/25/2019 7:30 PM, spectrum wrote:
> So, essentially, does the SEQUENCE statement has nothing_to_do_ with contiguity of memory (used by the components)...?


Correct. SEQUENCE disallows a compiler from rearranging components,
primarily so that storage association works. Compilers could (and
sometimes do) choose to also remove padding for SEQUENCE types, but
there's nothing in the standard saying so.

BIND(C) comes closer to that, but even there if the "companion C
processor" inserts padding, BIND(C) will as well.
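
As a rough illustration of the BIND(C) point above, the following sketch declares an interoperable type and reports its size and the byte offset of its second component. The type name CPair and the variable names are made up for this example; the offset is obtained by converting the C addresses to integer(c_intptr_t) with transfer(). The numbers printed depend on the layout rules of the companion C compiler, so no particular output is claimed here.

```fortran
program bindc_layout
  use iso_c_binding
  implicit none

  ! Hypothetical type, meant to match:  struct { int n; double x; };
  type, bind(c) :: CPair
    integer(c_int) :: n
    real(c_double) :: x
  end type
  type(CPair), target :: p
  integer(c_intptr_t) :: base, off_x

  ! Reinterpret the C addresses as integers so they can be subtracted.
  base  = transfer(c_loc(p%n), base)
  off_x = transfer(c_loc(p%x), base) - base

  print *, "c_sizeof(p) = ", c_sizeof(p)
  print *, "offset of x = ", off_x
end program
```

If the offset of x is larger than the size of n, the difference is padding that the C compiler would insert as well.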
gah4 (01-26-19, 03:57 AM)
On Friday, January 25, 2019 at 5:30:13 PM UTC-8, Steve Lionel wrote:
> On 1/25/2019 7:30 PM, spectrum wrote:
> > So, essentially, does the SEQUENCE statement has nothing_to_do_
> > with contiguity of memory (used by the components)...?


> Correct. SEQUENCE disallows a compiler from rearranging components,
> primarily so that storage association works. Compilers could (and
> sometimes do) choose to also remove padding for SEQUENCE types, but
> there's nothing in the standard saying so.


It seems that compilers could rearrange things, to minimize
padding, in those unfortunate cases where lots of padding would
be required.

In Fortran 66 days, programmers learned to put variables with
more restrictive alignment (larger data types) first.

> BIND(C) comes closer to that, but even there if the "companion C
> processor" inserts padding, BIND(C) will as well.


There are enough systems around that require data to be
aligned appropriately, that, at least for those, they must
do padding. (Or else move at run time, which is slow.)

For those that don't require padding, access is usually faster
for aligned data, so again, one expects padding.

I suspect, though, that it is easiest to design compilers to
always SEQUENCE, except for BIND(C).

Though there are stories about compilers optimizing for
specific SPEC programs, doing things that otherwise would
not be allowed. (Such as changing between array of structure
and structure of array.) That might be more than even
non-SEQUENCE allows, though.
Ron Shepard (01-26-19, 08:10 AM)
On 1/25/19 7:57 PM, gah4 wrote:
> On Friday, January 25, 2019 at 5:30:13 PM UTC-8, Steve Lionel wrote:
> It seems that compilers could rearrange things, to minimize
> padding, in those unfortunate cases where lots of padding would
> be required.


Perhaps it wasn't clear, but that is the normal behavior. Compilers are
free to rearrange things within a derived type to eliminate padding, to
force alignment, or for other reasons. It is only with SEQUENCE that
this is not allowed.

$.02 -Ron Shepard
gah4 (01-26-19, 04:14 PM)
On Friday, January 25, 2019 at 10:10:42 PM UTC-8, Ron Shepard wrote:

(snip, I wrote)

> > It seems that compilers could rearrange things, to minimize
> > padding, in those unfortunate cases where lots of padding would
> > be required.


> Perhaps it wasn't clear, but that is the normal behavior. Compilers are
> free to rearrange things within a derived type to eliminate padding, to
> force alignment, or for other reasons. It is only with SEQUENCE that
> this is not allowed.


Yes, but how often do they do that?

They are free to do it, but also free not to do it.
Steve Lionel (01-26-19, 05:53 PM)
On 1/26/2019 1:10 AM, Ron Shepard wrote:
> Perhaps it wasn't clear, but that is the normal behavior. Compilers are
> free to rearrange things within a derived type to eliminate padding, to
> force alignment, or for other reasons. It is only with SEQUENCE that
> this is not allowed.


A correction - compilers are allowed to insert padding for SEQUENCE
types. The compilers I am familiar with don't do so by default.
spectrum (01-26-19, 06:02 PM)
On Saturday, January 26, 2019 at 10:30:13 AM UTC+9, Steve Lionel wrote:
> Correct. SEQUENCE disallows a compiler from rearranging components,
> primarily so that storage association works. Compilers could (and
> sometimes do) choose to also remove padding for SEQUENCE types, but
> there's nothing in the standard saying so.


Thanks very much for clarification. Indeed, I did not know that a compiler
may rearrange the actual order of components in memory.
My initial assumption (understanding) was something like this:

* A derived type without SEQUENCE (i.e., a usual one with no bind(C))
--> Components are aligned in order, but spaces may be inserted between
the components in memory by a compiler (i.e. padding).

* A derived type with SEQUENCE
--> Components are aligned in order, _and_ no space between them is allowed
(i.e., all the components are guaranteed to be contiguous in memory).

So I think my assumption above is basically wrong...

-----

But then, I still cannot understand what is a typical "use case" for enforcing
the order of components, while allowing the possibility of padding.
For example, this explanation

> SEQUENCE disallows a compiler from rearranging components,
> primarily so that storage association works.


seems to imply that the goal of SEQUENCE is to facilitate some storage association.
Modern Fortran Explained also gives various explanations about storage association.
However, I cannot think of good examples of how they are used for actual purposes...

More specifically, my understanding of "storage association" is something like:

1) COMMON blocks
2) argument association between actual and formal (dummy) arguments
(e.g., via explicit-shape array declaration in F77 style)
3) equivalence

etc. Namely, anything that allows the access of the same memory
in two different ways (or from different parts of the same program).

So I cannot think of a typical pattern in which a SEQUENCE type
would be used in a similar way to facilitate "storage association"...
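
One pattern the standard does permit, sketched here on the assumption that COMMON is the kind of storage association in question: an object of derived type may appear in a COMMON block only if its type has the SEQUENCE (or BIND) attribute, and two separately written SEQUENCE definitions with the same name and the same components in the same order count as the same type. The names Pair, shared, show and show_raw are invented for this example.

```fortran
program common_demo
  implicit none
  type Pair
    sequence
    integer :: x, y
  end type
  type(Pair) :: p
  common /shared/ p

  p = Pair(1, 2)
  call show()
  call show_raw()
end program

subroutine show()
  implicit none
  ! Same name, same components, same order, SEQUENCE in both places:
  ! this is treated as the same type as Pair in the main program.
  type Pair
    sequence
    integer :: x, y
  end type
  type(Pair) :: q
  common /shared/ q
  print *, "as Pair:     ", q%x, q%y
end subroutine

subroutine show_raw()
  implicit none
  ! Pair holds only default integers (a "numeric sequence type"),
  ! so the same storage may also be viewed as two default integers.
  integer :: raw(2)
  common /shared/ raw
  print *, "as integers: ", raw
end subroutine
```

Without SEQUENCE, neither the COMMON declaration nor the repeated type definition would be allowed, which is roughly what "so that storage association works" refers to.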
spectrum (01-26-19, 06:11 PM)
On Saturday, January 26, 2019 at 10:57:37 AM UTC+9, ga...@u.washington.edu wrote:
> In Fortran 66 days, programmers learned to put variables with
> more restrictive alignment (larger data types) first.
> There are enough systems around that require data to be
> aligned appropriately, that, at least for those, they must
> do padding. (Or else move at run time, which is slow.)
> For those that don't require padding, access is usually faster
> for aligned data, so again, one expects padding.
> I suspect, though, that it is easiest to design compilers to
> always SEQUENCE, except for BIND(C).


Indeed, I guess there is a long history behind the current standard specification...

Compared to SEQUENCE, the purpose of BIND(C) seems very clear to me.
My understanding of BIND(C) (at the moment) is to guarantee the same memory
alignment of components between C and Fortran compilers in the same
family (e.g., gcc and gfortran, or icc and ifort), so that they can be used from either side.

By the way, even in the C language, is it possible that rearrangement of components
(or fields) in a struct can occur, depending on compilers?
spectrum (01-26-19, 06:17 PM)
PS. Here, by "rearrangement" I mean a change in the actual order of fields in
memory. Again, I imagined that only padding is allowed in a C struct (but I'm not sure at all).
But, as is often the case, the C standard might say nothing about such a "concrete" thing :)
spectrum (01-26-19, 07:37 PM)
More specifically, I imagined that a sequence type might be used for
providing an alternative way to access existing data, for example,

------------------------
program main
    implicit none
    integer a( 1000 ), k
    a = [( k, k=1,size(a) )]

    call test( a, size( a ) / 2 )
end

subroutine test( pairs, np )   !! implicit interface
    implicit none
    type IntPair
        sequence
        integer x, y
    endtype
    integer :: np, ip
    type(IntPair) :: pairs( np )

    print *, "first pair: ", pairs( 1 )
    print *, "last pair: ", pairs( np )
end
------------------------

which gives (gfortran, pgi, with a warning about type mismatch)

first pair: 1 2
last pair: 999 1000

And similarly, using c_f_pointer():

------------------------
program main
    use iso_c_binding
    implicit none
    integer, target :: a( 1000 ), k
    type IntPair
        sequence
        integer x, y
    endtype
    type(IntPair), pointer :: pairs(:)

    a = [( k, k=1,size(a) )]

    call c_f_pointer( c_loc(a), pairs, [ 500 ] )

    print *, "first pair: ", pairs( 1 )
    print *, "last pair: ", pairs( 500 )
end
------------------------

but I guess I should avoid this kind of (implicit) approach to reinterpret
an existing array....
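
If the goal is only to view an integer array as pairs, an explicit copy avoids relying on how the compiler lays out IntPair. A minimal sketch, reusing the IntPair type from above; the parameter n and the program name are only for illustration.

```fortran
program explicit_pairs
  implicit none
  integer, parameter :: n = 1000
  type IntPair
    sequence
    integer :: x, y
  end type
  integer :: a(n), k
  type(IntPair) :: pairs(n/2)

  a = [( k, k = 1, n )]

  ! Odd-indexed elements become x, even-indexed elements become y.
  pairs%x = a(1:n:2)
  pairs%y = a(2:n:2)

  print *, "first pair: ", pairs(1)
  print *, "last pair:  ", pairs(n/2)
end program
```

This costs one copy of the data, but it has no type mismatch at the call site and does not depend on padding or component order.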
Spiros Bousbouras (01-26-19, 08:03 PM)
On Sat, 26 Jan 2019 08:17:32 -0800 (PST)
spectrum <septcolor7> wrote:
> PS. Here, my meaning of "rearrangement" is a change in the actual order of fields in
> memory. Again, I imagined that only padding is allowed in a C struct (but I'm not sure at all).
> But, as often the case, the C standard might say nothing about such "concrete" thing :)


It does. Paragraph 15 of "6.7.2.1 Structure and union specifiers" says

Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at
its beginning.
spectrum (01-26-19, 11:03 PM)
On Sunday, January 27, 2019 at 3:03:10 AM UTC+9, Spiros Bousbouras wrote:
> It does. Paragraph 15 of "6.7.2.1 Structure and union specifiers" says
> Within a structure object, the non-bit-field members and the units in
> which bit-fields reside ___have addresses that increase___ in the order in
> which they are declared. (...snip...)
> There may be unnamed padding within a structure object, but not at
> its beginning.


// the above emphasis (__) by me

Thanks very much! So the C standard requires that the fields are
placed in the same order as the declaration...

# This is one page that I came across, and it also says "Reordering is not allowed."
It is also interesting that C# allows such reordering (according to the OP in the page).
Thomas Koenig (01-27-19, 03:12 AM)
Steve Lionel <steve> schrieb:
> On 1/26/2019 1:10 AM, Ron Shepard wrote:
> A correction - compilers are allowed to insert padding for SEQUENCE
> types. The compilers I am familiar with don't do so by default.


For gfortran, I get

program main
    use iso_c_binding
    type foo
        sequence
        character(1) :: a
        double precision :: b
    end type foo
    type(foo), target :: x
    print '(Z16)',c_loc(x%a)
    print '(Z16)',c_loc(x%b)
end

$ gfortran sequence.f90
$ ./a.out
7FFC01020470
7FFC01020478
$

So, there clearly is padding.
