Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for timestamp type #209

Merged
merged 3 commits into from
Aug 10, 2017
Merged

Proposal for timestamp type #209

merged 3 commits into from
Aug 10, 2017

Conversation

frsyuki
Copy link
Member

@frsyuki frsyuki commented Dec 22, 2015

Follows up #130 and #207.
See #207 for discussion.

* Timestamp 32 format can represent a timestamp in [1970-01-01 00:00:00 UTC, 2106-02-07 06:28:16 UTC) range. Nanoseconds part is 0.
* Timestamp 64 format can represent a timestamp in [1970-01-01 00:00:00.000000000 UTC, 2514-05-30 01:53:04.000000000 UTC) range.
* Timestamp 96 format can represent a timestamp in [-584554047284-02-23 16:59:44 UTC, 584554051223-11-09 07:00:16.000000000 UTC) range.
* In timestamp 64 and timestamp 96 formats, nanoseconds must not be larger than 999999999.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. This line seems to be the first rule regarding "canonical" representation of data when multiple representations would otherwise be available. None of the other types have such restrictions. For example a uint16 doesn't say the value must be at least 256 (since if it's less than that, it could have been represented by a uint8.)

Not that I'm complaining. I do think being strict about representation is a good thing, and I'd love it if the spec were strict everywhere else whenever multiple representations are available. UTF-8 forbids overlong sequences after all. (Although looking at the Protocol Buffers spec, it doesn't explicitly forbid overlong varints. I'm not sure why. Maybe it's unimportant.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this restriction specifically because deserialization becomes unexpectedly complicated if nanosecond part includes carry over, and following this restriction doesn't make serialization code complicated.

@vmihailenco
Copy link

This proposal looks good and I would like to implement it for https://github.com/vmihailenco/msgpack. Any chance it is going to be merged/changed soon?

@tagomoris
Copy link
Member

@vmihailenco msgpack organization doesn't have golang implementation, and msgpack.org introduces ugorji/go/codec. AFAIK, msgpack organization want not to increase # of repositories under itself.
Of course, you can implement anything on your own repository, but it's off topic for this pull-request.

@vmihailenco
Copy link

I am asking if this proposal going to be merged. Reference to the repository is just a proof that I am really interested in it.

@tagomoris
Copy link
Member

I got it. Sorry for misunderstanding.

@tagomoris
Copy link
Member

IMO, we can merge this new spec because there's no objections...

@drewnoakes
Copy link

+1 for merging this. Looks solid and very useful.

It'd be useful to know the process, if any, for evolving the spec. Publishing such details would build confidence in using the MsgPack format.

@tagomoris
Copy link
Member

@frsyuki ?

@potterdai
Copy link

Will this be merged?

spec.md Outdated
timestamp 64 stores the number of seconds and nanoseconds that have elapsed since 1970-01-01 00:00:00 UTC
in 2 32-bit unsigned integers:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xd7 | -1 |nanoseconds in 30-bit unsigned int| seconds in 34-bit unsigned int |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not "2 32-bit unsigned integers".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Fixed.

@frsyuki frsyuki merged commit 40e3d3d into master Aug 10, 2017
@frsyuki frsyuki deleted the extension-timestamp branch August 10, 2017 05:42
@tagomoris
Copy link
Member

🎉

@ludocode
Copy link

When de-serializing these new timestamps, does it make sense to consider an extension of type -1 with unrecognized length an error?

I'm currently implementing this in MPack and I'm trying to decide how to handle this. The spec doesn't explicitly say that such data should be considered invalid, but the pseudocode in the spec does say "error" if the length is not 4, 8 or 12. So I'm guessing raising an error is the expected behaviour, and judging by the implementors linking to this issue so far, this seems to be the consensus.

It seems to me that this is a bit of a problem though because data that was previously considered valid, like say an extension of type -1 and length 7, is now considered invalid and will flag errors in the newest parsers. Although negative extension types were reserved, this still feels like a backwards-incompatible tightening of the rules. It seems to contradict the new description of extensions which is that it is the application (not the library) that gets to decide whether to treat it as opaque or reject it as invalid.

I had considered reporting an extension of type -1 as a timestamp if it's parseable (i.e. the length is 4, 8 or 12 and the nanoseconds are in bounds), and otherwise just reporting it as an opaque extension object like any other instead of raising an error. Extension functions (like accessing the raw data) would still be available for extensions of type -1 regardless of length and regardless of whether it's recognized as a timestamp. This way if someone for whatever reason was using -1, this change would not break their code or data.

I suppose we can just say that nobody should have been using an extension of type -1 in the first place. So maybe it doesn't matter and I'm worrying over nothing. Still, as far as I know most libraries do not differentiate between reserved extensions and application-defined extensions (at least not through anything more than documentation.) So there's nothing stopping a user from accidentally choosing a reserved type and having it break later when libraries start rejecting their data.

For this reason I'm starting to think that maybe libraries and the spec should completely differentiate between reserved extensions and application-defined extensions. I may decide to handle reserved extensions as an entirely separate type within MPack, that way if users use them, it will be obvious that they may break in the future and they're on their own.

Something else to keep in mind with raising errors on unknown lengths is that this prevents us from inserting new lengths for timestamp in the future. For example lots of people wanted timezones, so I had previously suggested that we could extend this later to add one or two bytes to each length to specify it. This is actually not possible because it won't be backwards compatible. Existing libraries will report a parsing error (probably discarding the whole message) when they encounter a timestamp with the new lengths. So this means that this definition of timestamps is frozen forever: no other lengths can be added, and any change or addition will have to be introduced as a different extension type.

@tagomoris
Copy link
Member

@ludocode It looks an issue about merged spec. Could you create another issue for it?
We can't follow the discussion under closed pull-requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants