
Peter Darche

1368 days ago
Unfiled. Edited by Julia Marden 1368 days ago
Crisis Text Line helps teens in crisis by connecting them with a trained professional counselor through the medium they are most comfortable with: text messages. Crisis Text Line has anonymized metadata on over 3,000 conversations.
 
1371 days ago
Unfiled. Edited by Laura Kiku Rodriguez Takeuchi , Peter Darche 1371 days ago
Laura T: Sara, do we have the data for all the Barometers already? I can only see the African one in the Dropbox. Let me know if I can help obtain the remaining ones (note that I don't have SPSS).
Peter D: Alex and I have been munging away and have barometer data, IIAG data, WGI data and HDI data. It's currently in a GitHub repo that I'll be linking to shortly.
 
Additional datasets: 
 
Arabbarometer http://arabbarometer.org/?q=instruments-and-data-files
Asiabarometer http://www.asianbarometer.org/newenglish/surveys/DataRelease3.htm
 
1380 days ago
Unfiled. Edited by Dave Goodsmith , Peter Darche 1380 days ago
  • Project-related Data
 
Please include the following categories and sub-categories in your project pad:
 
1384 days ago
Unfiled. Edited by Dave Goodsmith 1384 days ago
Tagline: Streamline MODA's Data Pipeline
 
Goal:
Develop a better script to convert address data to BIN and BBL
 
Opportunity:
Citizen reports -- including noise complaints, crime reports, trash citings, building safety, etc. -- use addresses, aka geocodes, to identify location.  MODA, tasked with analyzing and aggregating data across the entire city, needs to convert geocodes to BIN and BBL in order to generate analyses that can get at the heart of real change.  Nearly every MODA analysis starts with the conversion from geocode to BIN and BBL, and the process could use a revamp.
 
Technical Specifics:
There's a SAS script currently written to convert Geo to BIN and BBL, but it could use improvement in speed, clarity and reliability. The Data Scientists at a 'Dive could be the perfect saviors to tackle the problem -- possibly rewriting it in R or another open source language, and nailing the algorithm.
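As a starting point for discussion, here is a minimal Python sketch of the conversion as a normalized lookup against a reference table. It is not the existing SAS logic. The column names (`address`, `borough`, `block`, `lot`, `bin`), the abbreviation table and the sample row are all hypothetical and do not reflect the actual reference schema. The only fixed fact used is that a BBL is a 10-digit key: a 1-digit borough code, a 5-digit block and a 4-digit lot.

```python
import re

# Hypothetical abbreviation table; a real one would be much larger.
STREET_ABBREVIATIONS = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
    "WEST": "W",
    "EAST": "E",
}


def normalize_address(raw: str) -> str:
    """Uppercase, strip punctuation, and abbreviate common street words."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", raw.upper())
    tokens = [STREET_ABBREVIATIONS.get(tok, tok) for tok in cleaned.split()]
    return " ".join(tokens)


def make_bbl(borough: int, block: int, lot: int) -> str:
    """Build a 10-digit BBL: 1-digit borough, 5-digit block, 4-digit lot."""
    return f"{borough}{block:05d}{lot:04d}"


def build_index(reference_rows):
    """Index reference rows by (normalized address, borough code)."""
    index = {}
    for row in reference_rows:
        key = (normalize_address(row["address"]), row["borough"])
        index[key] = {
            "bin": row["bin"],
            "bbl": make_bbl(row["borough"], row["block"], row["lot"]),
        }
    return index


def lookup(index, address: str, borough: int):
    """Return {'bin': ..., 'bbl': ...}, or None when there is no exact match."""
    return index.get((normalize_address(address), borough))


# Made-up reference row, for illustration only.
reference = [
    {"address": "123 Example Street", "borough": 1, "block": 42, "lot": 7, "bin": "1000001"},
]
index = build_index(reference)
result = lookup(index, "123 example st.", 1)
print(result)
```

An exact-match dictionary keeps each lookup constant-time, which may help with speed, but it only matches addresses that normalize to the same string. Anything fuzzier (misspellings, address ranges) would need more than this sketch.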
 
Impact:
If they can beat the current procedure, volunteers can make an impact on every single city-wide analysis conducted by MODA and NYC. Imagine MODA being able to do twice as many analyses, leading to twice as much monetary or social impact, or improvement on other metrics. Furthermore, improving this conversion process can have positive implications for cities across the country.
 
Data:
- Geo Database
- BIN Database
- BBL Database
- SAS Files
 
 
Notes on the Current Algorithm
  • Input Data Problems:
      • Geocodes only by address (not by BIN or business name, for example)
      • Input data can only be a CSV file
  • Data Cleaning:
      • Not all addresses can be cleaned, parsed, normalized and standardized
  • Reference Data:
      • No automated checks for new updates of PAD from www.nyc.gov
      • No self-learning of common misspellings and abbreviations
      • More underlying geographic databases could be added
  • Matching Process:
      • Doesn't match 100%; the best results are ~98% with clean, perfect data
      • Matches only to points (x, y coordinates, BIN, BBL); more matching options could be added, such as matching to lines or to polygons (ideally to tax lot polygons)
  • Output Data Problems:
      • No history/time series
      • No additional information:
          • # of units for each BIN (Melissa data could be used)
          • Some PLUTO characteristics
      • Outputs only a CSV file
  • No Quality Control (accuracy, false positives after cleaning, matching)
  • Running time on personal computers (geocoding ~1 gigabyte takes ~8 hours on a workstation with ~4 gigabytes of memory and ~100 gigabytes of disk space)
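
Two of the notes above, the missing quality control and the memory-bound running time, could be approached along these lines. This is a rough Python sketch under stated assumptions, not a description of the current script: the `id` and `address` input columns, the status labels, the in-memory `index` dictionary and the sample rows are all hypothetical. It reads the input CSV one row at a time so memory use does not grow with file size, and it counts outcomes so a match rate can be reported after each run.

```python
import csv
import io
from collections import Counter


def normalize(raw: str) -> str:
    """Very rough cleanup: uppercase and collapse runs of whitespace."""
    return " ".join(raw.upper().split())


def geocode_stream(in_file, out_file, index):
    """Read addresses row by row, write results, and count outcomes.

    `index` maps a normalized address to a dict with 'bin' and 'bbl'.
    Only one row is held in memory at a time.
    """
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=["address", "bin", "bbl", "status"])
    writer.writeheader()
    counts = Counter()
    for row in reader:
        cleaned = normalize(row["address"])
        if not cleaned:
            status, hit = "uncleanable", {}
        elif cleaned in index:
            status, hit = "matched", index[cleaned]
        else:
            status, hit = "unmatched", {}
        counts[status] += 1
        writer.writerow({
            "address": row["address"],
            "bin": hit.get("bin", ""),
            "bbl": hit.get("bbl", ""),
            "status": status,
        })
    return counts


def match_rate(counts) -> float:
    """Share of input rows that matched; 0.0 for empty input."""
    total = sum(counts.values())
    return counts["matched"] / total if total else 0.0


# Made-up data, for illustration only.
index = {"123 EXAMPLE ST": {"bin": "1000001", "bbl": "1000420007"}}
source = io.StringIO("id,address\n1,123  example st\n2,999 Nowhere Ave\n3,\n")
output = io.StringIO()
counts = geocode_stream(source, output, index)
print(dict(counts), f"match rate: {match_rate(counts):.0%}")
```

Writing a status column next to each output row makes it possible to audit unmatched and uncleanable addresses afterwards, which is one simple form of the quality control the notes ask for. Measuring false positives would additionally need a hand-checked sample.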
 
  
 
 
 
 
 
 
