Welcome, Guest. Please Login
Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
  HomeHelpSearchLogin  
 
Pages: 1 2 
Send Topic Print
Practical tips for spreadsheet import (Read 13858 times)
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Practical tips for spreadsheet import
Reply #15 - Apr 03rd, 2013, 8:30am
 
For the today/yesterday part of the problem, you might find this stackoverflow thread of use.

I think I know a reason why my suggested regex for 'reply ' part of the source string didn't work. I typed \n (line break) when I meant \d (digit). corrected up-thread too.
Back to top
 
 

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
WWW shoantel   IP Logged
Sumner Gerard
Full Member
*
Offline



Posts: 359

Re: Practical tips for spreadsheet import
Reply #16 - Apr 3rd, 2013, 4:45pm
 
Thanks for helpful pointers. The paragraph in green perhaps could be added to aTbRef?  The revised cleandate() function below now:

1. Receives the original Tinderbox Forum data and time, which often looks something like this:

           Reply #3 - Mar 21st, 2013, 10:48pm

2. Uses Mark A's cleaner regex to strip any "Reply #" preamble, giving:

           Mar 21st, 2013, 10:48pm

3. Replaces ordinals (number suffixes like 'st') and commas with nothing, giving:

           Mar 21 2013 10:48pm

4. Grabs the components from the above with Python's strptime(), which has been told to expect them in that order and format.

5. Rearranges the components with strftime() to a machine-specific 'short' date and 'time' string, and returns that value to be included in the tsv data for import into Tinderbox.

The function now uses the Python datetime module referenced in Mark A's helpful stackoverflow link to replace any "Today at" or "Yesterday at" with the appropriate date (formatted as in step 3).

The Python module import lines at the top of the script must now be:

import codecs, urllib2, re, time
from datetime import datetime, timedelta
from bs4 import BeautifulSoup

Except for BeautfulSoup (a quick install) all of these come pre-installed on the Mac.

When this cleandate() function is substituted in, does the script now import a forum thread correctly on other machines, using, say, British format?

UK-style sample formats:  my_date_format = '%d/%m/%Y'  my_time_format = '%H:%M'
US-style sample formats:  my_date_format = '%m/%d/%Y'  my_time_format = '%I:%M%p'  

Code:
def cleandate(s):                     # remove ordinals, commas; return system 'short' date time
	my_date_format = '%Y.%m.%d'        # change to match machine-specific 'short' date format
	my_time_format = '%I:%M%p'         # change to match machine-specific 'short' time format
	src_date_format = '%b %d %Y'	   # source (Tinderbox forum) date format - do not change
	src_time_format = '%I:%M%p'	   # source (Tinderbox forum) time format - do not change
	s = re.sub(r'^Reply #\d+ - ','',s)   # regex to remove any leading 'Reply #d - '
	s = s.replace('th','').replace('rd,','').replace('nd','').replace('st,','').replace(',','')
	s = s.replace('Today at',datetime.now().strftime(src_date_format))
	s = s.replace('Yesterday at',(datetime.now()-timedelta(days=1)).strftime(src_date_format))
	try:
		s = datetime.strptime(s,src_date_format + ' ' + src_time_format)
	except TypeError:                  # workaround for some Python installations:
		s = datetime(*(time.strptime(s,src_date_format + ' ' + src_time_format)[0:5]))
	return s.strftime(my_date_format + ' ' + my_time_format)  

Back to top
 
« Last Edit: Apr 3rd, 2013, 11:06pm by Sumner Gerard »  
  IP Logged
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Practical tips for spreadsheet import
Reply #17 - Apr 4th, 2013, 8:23am
 
Thanks for sharing.  I'm deep in testing other stuff right now but your choice of Python date/time placeholders look fine for a UK date/time, e.g. 02/03/2013 03:40.

The paragraph in green was indeed written with aTbRef in mind, though i'm not sure where to put this in the Import/Export/Formatting. Perhaps a new article "Date strings in imported text"? That way it can link to several import methods where the user may expect to use this concept (dropped text file, spreadsheet import, etc.).
Back to top
 
 

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
WWW shoantel   IP Logged
Sumner Gerard
Full Member
*
Offline



Posts: 359

Re: Practical tips for spreadsheet import
Reply #18 - Apr 4th, 2013, 5:55pm
 
Understood. When you or someone else has the few minutes it takes to give the script a whirl would love to hear how(if) it works elsewhere.

Studying results of running the two scripts here shows that getting stuff in and out of Tinderbox, integrating the program into workflows with other Mac applications, has been a subject of perennial interest on the forum. These are just examples of one way to get stuff in. In many respects, this way is easier for the user than the create-a-record AppleScriptability for which some users have, literally, begged. Just arrange the data in a simple "tsv" tabular "spreadsheet" format and Tinderbox does the rest.

While the concept is simple, the practicalities of arranging data in the right format are not always obvious, especially with so few Tinderbox-centric examples to study. So I think it would be great if you could add the paragraph in green to aTbRef, linked to related topics, and in due course also expand coverage of points that will help those working with other applications. Perhaps give Import, Export, and Formatting their own "sections"(??) And definitely (I think) add a note to the effect that a dropped text file, if its contents are in tsv format, will create a container with the file's name and child notes with all the needed attributes already set up and populated, etc.
Back to top
 
 
  IP Logged
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Practical tips for spreadsheet import
Reply #19 - Apr 5th, 2013, 4:20am
 
New aTbRef page here. Lots of inbound (web) links to notes in this section make re-factoring into an import sub-section impractical for aTbRef5.

For v6 I'm open to ideas. I've still no input on a possible better structure for aTbRef6. In part it will depend on app changes but for some of the more general areas Im sure the layout can improve. It's only grown from my best guess at the time of creation!
Back to top
 
 

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
WWW shoantel   IP Logged
Sumner Gerard
Full Member
*
Offline



Posts: 359

Re: Practical tips for spreadsheet import
Reply #20 - Apr 5th, 2013, 4:21pm
 
Thanks for the new aTbRef entry. Didn't expect it so soon! (Note that some text in para 3, 6, and 7 needs untangling when time permits.)

Given the sheer volume and scope and history of the opus, there's probably no ideal way to organize, but eventually it might help to have some sort of grouping some under the general idea (probably not that wording!) of "ways to get data into Tinderbox--even better than AppleScriptability."

Google must not have reindexed yet, as over here the new page doesn't seem to be listed among search results for "formatting dates" or "spreadsheet import" or "tsv" (the only hit on "tsv" is an entry about DEVONthink).

Mention of the OPML date format exception and the outward link is really helpful (just the kind of thing I would surely stub my toe on). Could the entry on OPML Import link back to the new page? A search for "OPML" seems the more likely entry point on that, at least for me.  Maybe also a link from Spreadsheet (Tab-Delimited Text) Import  and mention in that one, just as in the OPML Import entry, that Tinderbox imports (tsv) files dropped onto Tinderbox views (as well as accepting data tables dragged or pasted onto a view)?

A general observation from my amateur dabbling in scripting: When the source data comes via a custom user script (written in AppleScript, Python, whatever) rather than from another application with built-in export functionality, tsv format can often be *much* easier to achieve, making it a good choice in cases where preserving hierarchy of the data is not critical.  
Back to top
 
« Last Edit: Apr 5th, 2013, 4:32pm by Sumner Gerard »  
  IP Logged
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Practical tips for spreadsheet import
Reply #21 - Apr 5th, 2013, 5:30pm
 
Thanks. I've corrected - I hope - the errors in the new page.  If more correction, drop me an email; I'm happy to accept suggestions. Otherwise, I suggest following up in the "Documentation and Tutorials" forum so we don't drift ob
ff-topic.

OPML Export and Import page already link to the new page. Have you refreshed cached pages in your browser? I've added link to the Spreadsheet import page.

Google updates in its own sweet time. I suspect my little site get re-indexed less often than key sites like icanhascheezburger or justinbieber.
Back to top
 
« Last Edit: Apr 5th, 2013, 5:32pm by Mark Anderson »  

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
WWW shoantel   IP Logged
Pages: 1 2 
Send Topic Print