Before I start, there are so many strange things in the NMEA specification,
but this is one of my favorites:
"Since there is no provision for guaranteed delivery of messages and only limited error checking capability, this standard should be used with caution in all safety applications."

Wahoo! And off we go with safety-of-life applications without additional thought.
I finally started looking at the NMEA TAG block specification. I paid for version 4.0 of the spec, so I should use it, right? I have to say it is slightly painful. I'll do my best to describe what a TAG block is, the issues that it is trying to address, and how well the authors did in their design. I really hate how standards often do not list their authors or contributors: there is no way to start up a discussion with them.
The motivation for a format change
The basic NMEA 0183 (3.0 and lower) sentence is missing some things. First, it is limited to 79 characters per line (between the leading "$" or "!" and the line ending), so one line is often too short for everything you would like to send. AIS introduced messages that span multiple lines, with fields to help reassemble the complete message. This falls apart when a single feed combines several devices that may all be doing the same thing, producing intermixed multi-line messages. If we can identify the source of each line, then we can reassemble these multi-line messages without fear of mixing unrelated parts.
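A minimal sketch of why the source matters, assuming each line arrives already labeled with the receiver it came from (the `reassemble` helper and the `r1`/`r2` labels are hypothetical, and neither checksums nor fill bits are checked). Fragments are keyed by (source, sequential message id, channel), so two receivers that happen to reuse sequence id 9 on channel A are not spliced together. The `r1` lines come from the 2009 data later in this post; the `r2` payloads are made up.

```python
from collections import defaultdict


def reassemble(tagged_sentences):
    """Yield (source, payload) for each complete AIVDM/AIVDO message.

    tagged_sentences is an iterable of (source, sentence) pairs. The source
    label is an assumption: plain NMEA 0183 does not carry one.
    Checksums and fill bits are ignored to keep the sketch short.
    """
    pending = defaultdict(dict)  # (source, seq_id, channel) -> {part: payload}
    for source, sentence in tagged_sentences:
        fields = sentence.split('*')[0].split(',')
        total, part = int(fields[1]), int(fields[2])
        seq_id, channel, payload = fields[3], fields[4], fields[5]
        if total == 1:
            yield source, payload
            continue
        key = (source, seq_id, channel)
        pending[key][part] = payload
        if len(pending[key]) == total:
            parts = pending.pop(key)
            yield source, ''.join(parts[i] for i in range(1, total + 1))


# Two receivers interleaving two-part messages that share sequence id 9.
feed = [
    ('r1', '!AIVDM,2,1,9,A,55MwpAh000000000000<OCO;GF2222222222220k1H4,0*2C'),
    ('r2', '!AIVDM,2,1,9,A,AAAA,0*1C'),
    ('r1', '!AIVDM,2,2,9,A,3140002P00000000000000000000,2*79'),
    ('r2', '!AIVDM,2,2,9,A,BBBB,0*1F'),
]
for source, payload in reassemble(feed):
    print(source, payload)
```

Drop `source` from the key and the second receiver's first part overwrites the first receiver's, producing a spliced, corrupt message. That is exactly the failure described above.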
The next issue is the lack of metadata in the NMEA 0183 message format. Unless a message has a timestamp built in, there is no standard way to log when it was received. We may have additional metadata requirements, and there is no place to put them.
History
The USCG created an initial metadata format somewhere before 2005 that appends fields after the existing message. I've documented that format elsewhere, but to summarize, it consists of comma-separated fields that start with a letter code, except for the last field, which is the Unix UTC timestamp. It works, and it is what I've used since 2006 for logging and processing in software like ais-py, noaadata-py and libais.
I created a Python regular expression that parses the basic structure of these "old USCG format" messages. Here is an example message and the regular expression being used to pull apart the fields:
```python
#!/usr/bin/env python3
import re

nmea_re = re.compile(r'''^!(?P<talker>AI)(?P<string_type>VD[MO])
    ,(?P<total>\d?)
    ,(?P<sen_num>\d?)
    ,(?P<seq_id>[0-9]?)
    ,(?P<chan>[AB])
    ,(?P<body>[;:=@a-zA-Z0-9<>\?\'\`]*)
    ,(?P<fill_bits>\d)\*(?P<checksum>[0-9A-F][0-9A-F])
    (
        (,S(?P<slot>\d*))
      | (,s(?P<s_rssi>\d*))
      | (,d(?P<signal_strength>[-0-9]*))
      | (,t(?P<t_recver_hhmmss>(?P<t_hour>\d\d)(?P<t_min>\d\d)(?P<t_sec>\d\d[.]\d*)))
      | (,T(?P<time_of_arrival>[^,]*))
      | (,x(?P<x_station_counter>[0-9]*))
      | (,(?P<station>(?P<station_type>[rbB])[a-zA-Z0-9_]*))
    )*
    ,(?P<time_stamp>\d+([.]\d+)?)?''', re.VERBOSE)

match = nmea_re.match('!AIVDM,1,1,,A,14eG>3@01kqiIs8ICROownFn0D03,0*02,d-106,S0993,'
                      't140726.00,T26.49646933,r09STWO1,1370786847')
msg = match.groupdict()
for key in sorted(msg):
    print('%s: %s' % (key, msg[key]))

# result:
'''
body: 14eG>3@01kqiIs8ICROownFn0D03
chan: A
checksum: 02
fill_bits: 0
s_rssi: None
sen_num: 1
seq_id:
signal_strength: -106
slot: 0993
station: r09STWO1
station_type: r
string_type: VDM
t_hour: 14
t_min: 07
t_recver_hhmmss: 140726.00
t_sec: 26.00
talker: AI
time_of_arrival: 26.49646933
time_stamp: 1370786847
total: 1
x_station_counter: None
'''
```

Remember that you can use kodos to try out Python regular expressions.
TAG block
Apparently, that style was not good enough, so some group of people tried to create something better. I'm going to ignore the control aspect of TAG blocks, since I typically only deal with logs from devices that are outside my influence. We get what we get.
The basic structure of TAG is to put additional data in front of the original NMEA 0183 message, between two "\" characters. This is extra interesting in that most programming languages use the backslash to start escape codes (e.g. "\n" and "\r"; cf. "man ascii" and "man printf").
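To make that layout concrete, here is a minimal sketch that peels the TAG block off the front of a line. `split_tag_block` is a hypothetical helper of mine, not something from the spec, and the sample line is taken from the 2009 data shown below. It also shows the escaping nuisance: a raw string keeps each backslash literal, while an ordinary string literal needs every backslash doubled.

```python
def split_tag_block(line):
    """Return (tag_block, sentence) for a line that starts with a TAG block.

    The TAG block is whatever sits between the first two backslashes;
    (None, line) is returned when there is no leading TAG block.
    """
    if not line.startswith('\\'):
        return None, line
    parts = line.split('\\', 2)  # ['', tag_block, sentence]
    if len(parts) < 3:
        return None, line  # no closing backslash
    return parts[1], parts[2]


# Raw string: each backslash is literal.
raw = r'\g:2-2-73874,n:157037*1D\$ARVSI,r003669945,,172036.69698935,1376,-095,0*15'
# Ordinary string: each backslash must be written as '\\'.
plain = '\\g:2-2-73874,n:157037*1D\\$ARVSI,r003669945,,172036.69698935,1376,-095,0*15'
assert raw == plain

print(split_tag_block(raw))
# ('g:2-2-73874,n:157037*1D', '$ARVSI,r003669945,,172036.69698935,1376,-095,0*15')
```

Using `split` with a limit of 2 leaves any later backslashes in the sentence untouched.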
Here is an example to show what this looks like. I got this chunk back in 2009:
```
\g:1-2-73874,n:157036,s:r003669945,c:1241544035*4A\!AIVDM,1,1,,B,15N4cJ`005Jrek0H@9n`DW5608EP,0*13
\g:2-2-73874,n:157037*1D\$ARVSI,r003669945,,172036.69698935,1376,-095,0*15
\g:1-2-73875,n:157038,s:r003669945,c:1241544035*45\!AIVDM,1,1,,B,15NEcU0001JriI4H@2DEN038069@,0*00
\g:2-2-73875,n:157039*12\$ARVSI,r003669945,,172036.77711928,1379,-097,0*1B
\g:1-2-5624390,n:2281546,s:r003669959,c:1241544037*7D\!AIVDM,1,1,,A,15PlLL@P1@JmA=DGWcw9M?w:069@,0*1D
\g:2-2-5624390,n:2281547*25\$ARVSI,r003669959,,246060,9999,-094,0*34
\g:1-2-5624393,n:2281555,s:r003669959,c:1241544037*7C\!AIVDM,1,1,,A,15PlLL0OhgJn2ORG`TQ6nEC:28ES,0*3F
\g:2-2-5624393,n:2281556*26\$ARVSI,r003669959,,246060,9999,-077,0*39
\g:1-2-5624394,n:2281557,s:r003669959,c:1241544037*79\!AIVDM,1,1,,A,18KKL00025rsc0<G<pMbI8E80@ET,0*73
\g:2-2-5624394,n:2281558*2F\$ARVSI,r003669959,,246060,9999,-112,0*3B
\g:1-2-5624396,n:2281561,s:r003669959,c:1241544037*7E\!AIVDM,1,1,,B,181:JhP025Jl`t8Fo0U3c2q80<01,0*02
\g:2-2-5624396,n:2281562*24\$ARVSI,r003669959,,246060,9999,-117,0*3E
\g:1-2-5624398,n:2281566,s:r003669959,c:1241544037*77\!AIVDM,1,1,,A,13:4<:002CJniU4G;qe:MHG40@E`,0*00
\g:2-2-5624398,n:2281567*2F\$ARVSI,r003669959,,246060,9999,-114,0*3D
\g:1-2-5624421,n:2281618,s:r003669959,c:1241544037*78\!AIVDM,1,1,,A,15N9wuUPAJJnNGrGV>8J983:0D04,0*23
\g:2-2-5624421,n:2281619*20\$ARVSI,r003669959,,246060,9999,-053,0*3F
\g:1-3-60450,n:131065,s:r003669946,c:1241544038*4B\!AIVDM,2,1,9,A,55MwpAh000000000000<OCO;GF2222222222220k1H4,0*2C
\g:2-3-60450,n:131066*10\!AIVDM,2,2,9,A,3140002P00000000000000000000,2*79
\g:3-3-60450,n:131067*10\$ARVSI,r003669946,9,172038.11029899,1429,-091,0*2C
\g:1-3-60451,n:131068,s:r003669946,c:1241544038*47\!AIVDM,2,1,0,B,55MwpAh000000000000<OCO;GF2222222222220k1H4,0*26
\g:2-3-60451,n:131069*1E\!AIVDM,2,2,0,B,3140002P00000000000000000000,2*73
```

Pretty overwhelming and hard to read. Let's start off by creating and decoding a message that gives the Unix UTC timestamp, which is indicated by the 'c:' field.
We won't include an NMEA 0183 sentence to go with it. My reading is also that the checksum within a TAG block (the "*hh" just before the closing backslash) is optional, but I will start off by including it.
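```python
#!/usr/bin/env python3
from functools import reduce
from operator import xor
import time


def checksum(sentence):
    """XOR of every character after the leading '$', '!' or '\\' and
    before any '*', as two uppercase hex digits."""
    return '%02X' % reduce(xor, map(ord, sentence.split('*')[0][1:]), 0)


timestamp = int(time.time())  # e.g. 1370791724
body = 'c:%d' % timestamp
msg = '\\{body}*{checksum}\\'
# Prefix the body with the TAG block's opening backslash so that
# checksum() strips that delimiter rather than the 'c'.
print(msg.format(body=body, checksum=checksum('\\' + body)))
# \c:1370791724*52\   (for a timestamp of 1370791724)
```

We now need to start creating the regular expression to parse these TAG things. Let's start without worrying about the internal fields. I came up with: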
```
\\.*\*(?P<checksum>[0-9A-F]{2})?\\.*
```

The backslashes need to be escaped, so a single backslash gets written as "\\" in the regex. Now we can add the section for the timestamp. Sadly, the spec asserts that the timestamp is an integer. They then add the twist that the value can be either seconds or milliseconds, but they do not give us an indicator for which. Ouch. It's not like they had run out of letter codes. Currently, timestamps in seconds have 10 digits (13 in milliseconds), and TAG block started in 2008.
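```python
import datetime

int(1e9)  # 1000000000
datetime.datetime.fromtimestamp(1e9, tz=datetime.timezone.utc)
# datetime.datetime(2001, 9, 9, 1, 46, 40, tzinfo=datetime.timezone.utc)
```

Therefore, any timestamp in seconds from after September 2001 has at least 10 digits, so our integer needs 10 or more digits. I came up with this regex: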
```
\\c:(?P<timestamp>\d{10,15})\*(?P<checksum>[0-9A-F]{2})?\\.*
```
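That regex only handles a lone c: field. As a sketch of one possible next step (my assumptions, not the spec's: the names `TAG_RE` and `parse_tag_block`, raising on a bad checksum, and treating any c: value longer than 10 digits as milliseconds), the following splits every name:value field, verifies the checksum only when one is present, and converts c: to a UTC datetime. It is checked against the first line of the 2009 data above.

```python
import re
from datetime import datetime, timezone
from functools import reduce
from operator import xor

# Hypothetical, more general pattern: everything between the first two
# backslashes, with the '*hh' checksum treated as optional.
TAG_RE = re.compile(
    r'^\\(?P<fields>[^*\\]*)(?:\*(?P<checksum>[0-9A-F]{2}))?\\(?P<sentence>.*)$')


def parse_tag_block(line):
    """Return (fields, sentence), or None if line has no TAG block."""
    match = TAG_RE.match(line)
    if match is None:
        return None
    body = match.group('fields')
    given = match.group('checksum')
    if given is not None:
        # The checksum covers every character between '\' and '*'.
        computed = '%02X' % reduce(xor, map(ord, body), 0)
        if computed != given:
            raise ValueError('TAG block checksum %s, computed %s' % (given, computed))
    fields = {}
    for field in body.split(','):
        if field:
            key, _, value = field.partition(':')
            fields[key] = value
    if 'c' in fields:
        # Assumption: the spec gives no seconds/milliseconds flag, so
        # more than 10 digits is taken to mean milliseconds.
        raw = int(fields['c'])
        seconds = raw / 1000.0 if len(fields['c']) > 10 else raw
        fields['c'] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return fields, match.group('sentence')


line = (r'\g:1-2-73874,n:157036,s:r003669945,c:1241544035*4A'
        r'\!AIVDM,1,1,,B,15N4cJ`005Jrek0H@9n`DW5608EP,0*13')
fields, sentence = parse_tag_block(line)
print(fields['c'])  # 2009-05-05 17:20:35+00:00
print(fields['g'], sentence)
```

Capturing the whole field list and splitting it afterwards avoids writing one regex alternative per letter code, at the cost of doing no validation of individual values.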
So we have a start! More in a future post.