-
Notifications
You must be signed in to change notification settings - Fork 524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal for timestamp type #209
Conversation
* Timestamp 32 format can represent a timestamp in [1970-01-01 00:00:00 UTC, 2106-02-07 06:28:16 UTC) range. Nanoseconds part is 0. | ||
* Timestamp 64 format can represent a timestamp in [1970-01-01 00:00:00.000000000 UTC, 2514-05-30 01:53:04.000000000 UTC) range. | ||
* Timestamp 96 format can represent a timestamp in [-584554047284-02-23 16:59:44 UTC, 584554051223-11-09 07:00:16.000000000 UTC) range. | ||
* In timestamp 64 and timestamp 96 formats, nanoseconds must not be larger than 999999999. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. This line seems to be the first rule regarding "canonical" representation of data when multiple representations would otherwise be available. None of the other types have such restrictions. For example a uint16 doesn't say the value must be at least 256 (since if it's less than that, it could have been represented by a uint8.)
Not that I'm complaining. I do think being strict about representation is a good thing, and I'd love it if the spec were strict everywhere else whenever multiple representations are available. UTF-8 forbids overlong sequences after all. (Although looking at the Protocol Buffers spec, it doesn't explicitly forbid overlong varints. I'm not sure why. Maybe it's unimportant.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this restriction specifically because deserialization becomes unexpectedly complicated if nanosecond part includes carry over, and following this restriction doesn't make serialization code complicated.
This proposal looks good and I would like to implement it for https://github.com/vmihailenco/msgpack. Any chance it is going to be merged/changed soon? |
@vmihailenco msgpack organization doesn't have golang implementation, and msgpack.org introduces |
I am asking if this proposal going to be merged. Reference to the repository is just a proof that I am really interested in it. |
I got it. Sorry for misunderstanding. |
IMO, we can merge this new spec because there's no objections... |
+1 for merging this. Looks solid and very useful. It'd be useful to know the process, if any, for evolving the spec. Publishing such details would build confidence in using the MsgPack format. |
@frsyuki ? |
Will this be merged? |
spec.md
Outdated
timestamp 64 stores the number of seconds and nanoseconds that have elapsed since 1970-01-01 00:00:00 UTC | ||
in 2 32-bit unsigned integers: | ||
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+ | ||
| 0xd7 | -1 |nanoseconds in 30-bit unsigned int| seconds in 34-bit unsigned int | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not "2 32-bit unsigned integers".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Fixed.
🎉 |
When de-serializing these new timestamps, does it make sense to consider an extension of type -1 with unrecognized length an error? I'm currently implementing this in MPack and I'm trying to decide how to handle this. The spec doesn't explicitly say that such data should be considered invalid, but the pseudocode in the spec does say "error" if the length is not 4, 8 or 12. So I'm guessing raising an error is the expected behaviour, and judging by the implementors linking to this issue so far, this seems to be the consensus. It seems to me that this is a bit of a problem though because data that was previously considered valid, like say an extension of type -1 and length 7, is now considered invalid and will flag errors in the newest parsers. Although negative extension types were reserved, this still feels like a backwards-incompatible tightening of the rules. It seems to contradict the new description of extensions which is that it is the application (not the library) that gets to decide whether to treat it as opaque or reject it as invalid. I had considered reporting an extension of type -1 as a timestamp if it's parseable (i.e. the length is 4, 8 or 12 and the nanoseconds are in bounds), and otherwise just reporting it as an opaque extension object like any other instead of raising an error. Extension functions (like accessing the raw data) would still be available for extensions of type -1 regardless of length and regardless of whether it's recognized as a timestamp. This way if someone for whatever reason was using -1, this change would not break their code or data. I suppose we can just say that nobody should have been using an extension of type -1 in the first place. So maybe it doesn't matter and I'm worrying over nothing. Still, as far as I know most libraries do not differentiate between reserved extensions and application-defined extensions (at least not through anything more than documentation.) So there's nothing stopping a user from accidentally choosing a reserved type and having it break later when libraries start rejecting their data. For this reason I'm starting to think that maybe libraries and the spec should completely differentiate between reserved extensions and application-defined extensions. I may decide to handle reserved extensions as an entirely separate type within MPack, that way if users use them, it will be obvious that they may break in the future and they're on their own. Something else to keep in mind with raising errors on unknown lengths is that this prevents us from inserting new lengths for timestamp in the future. For example lots of people wanted timezones, so I had previously suggested that we could extend this later to add one or two bytes to each length to specify it. This is actually not possible because it won't be backwards compatible. Existing libraries will report a parsing error (probably discarding the whole message) when they encounter a timestamp with the new lengths. So this means that this definition of timestamps is frozen forever: no other lengths can be added, and any change or addition will have to be introduced as a different extension type. |
@ludocode It looks an issue about merged spec. Could you create another issue for it? |
Follows up #130 and #207.
See #207 for discussion.