CHANGES.txt
v0.2.8, 2011-09-07 -- Bugfixes and betas
* Fix log parsing crash dealing with timeout errors
* Make mr_travelling_salesman.py work with simplejson
* Add emr_additional_info option, to support EMR beta features
* Remove debian packaging (should be handled separately)
* Fix crash when creating tmp bucket for job in us-east-1
v0.2.7, 2011-07-12 -- Hooray for interns!
* All runner options can be set from the command line (Issue #121)
  * Including for mrjob.tools.emr.create_job_flow (Issue #142)
* New EMR options:
  * availability_zone (Issue #72)
  * bootstrap_actions (Issue #69)
  * enable_emr_debugging (Issue #133)
* Read counters from EMR log files (Issue #134)
* Clean old files out of S3 with mrjob.tools.emr.s3_tmpwatch (Issue #9)
* EMR parses and reports job failure due to steps timing out (Issue #15)
* EMR bootstrap files are no longer made public on S3 (Issue #70)
* mrjob.tools.emr.terminate_idle_job_flows handles custom hadoop streaming
  jars correctly (Issue #116)
* LocalMRJobRunner separates out counters by step (Issue #28)
* bootstrap_python_packages works regardless of tarball name (Issue #49)
* mrjob always creates temp buckets in the correct AWS region (Issue #64)
* Catch abuse of __main__ in jobs (Issue #78)
* Added mr_travelling_salesman example
v0.2.6, 2011-05-24 -- Hadoop 0.20 in EMR, inline runner, and more
* Set Hadoop to run on EMR with --hadoop-version (Issue #71).
  * Default is still 0.18, but will change to 0.20 in mrjob v0.3.0.
* New inline runner, for testing locally with a debugger
* New --strict-protocols option, to catch unencodable data (Issue #76)
* Added steps_python_bin option (for use with virtualenv)
* mrjob no longer chokes when asked to run on an EMR job flow running
  Hadoop 0.20 (Issue #110)
* mrjob no longer chokes on job flows with no LogUri (Issue #112)
v0.2.5, 2011-04-29 -- Hadoop input and output formats
* Added hadoop_input/output_format options
* You can now specify a custom Hadoop streaming jar (hadoop_streaming_jar)
* extra args to hadoop now come before -mapper/-reducer on EMR, so
  that e.g. -libjar will work (worked in hadoop mode since v0.2.2)
* hadoop mode now supports s3n:// URIs (Issue #53)
v0.2.4, 2011-03-09 -- fix bootstrapping mrjob
* Fix bootstrapping of mrjob in hadoop and local mode (Issue #89)
* SSH tunnels try to use the same port for the same job flow (Issue #67)
* Added mr_postfix_bounce and mr_pegasos_svm to examples.
* Retry on spurious 505s from EMR API
v0.2.3, 2011-02-24 -- boto compatibility
* Fix incompatibility with boto 2.0b4 (Issue #91)
v0.2.2, 2011-02-15 -- GET/POST EMR issue
* Use POST requests for most EMR queries (EMR was choking on large GETs)
* find_probable_cause_of_failure() ignores transient errors (Issue #31)
* --hadoop-arg now actually works (Issue #79)
  * on Hadoop, extra args are added first, so you can set e.g. -libjar
* S3 buckets may now have . in their names
* MRJob scripts now respect --quiet (Issue #84)
* added --no-output option for MRJob scripts (Issue #81)
* added --python-bin option (Issue #54)
v0.2.1, 2010-11-17 -- laststatechangereason bugfix
* Don't assume EMR sets laststatechangereason
v0.2.0, 2010-11-15 -- Many bugfixes, Windows support
* New Features/Changes:
  * EMRJobRunner now prints % of mappers and reducers completed when you
    enable the SSH tunnel.
  * Added mr_page_rank example
  * Added mrjob.tools.emr.audit_usage script (Issue #21)
  * You can specify alternate job owners with the "owner" option. Useful for
    auditing usage. (Issue #59)
  * The job_name_prefix option has been renamed to label (the old name still
    works but is deprecated)
  * bootstrap_cmds and bootstrap_scripts no longer automatically invoke sudo
* Bugs Fixed/Cleanup:
  * bootstrap files no longer get uploaded to S3 twice (Issue #8)
  * When using add_file_option(), show_steps() can now see the local version
    of the file (Issue #45)
  * Now works on Windows (Issue #46)
  * No longer requires external jar, tar, or zip binaries (Issue #47)
  * mrjob-* scratch bucket is only created as needed (Issue #50)
  * Can now specify us-east-1 region explicitly (Issue #58)
  * mrjob.tools.emr.terminate_idle_job_flows leaves Hive jobs alone (Issue #60)
v0.1.0, 2010-10-28 -- Same code, better version. It's official!
v0.1.0-pre3, 2010-10-27 -- Pre-release to run Yelp code against
* Added debian packaging
* mrjob bootstrapping can now deal with symlinks in site-packages/mrjob
* MRJobRunner.stream_output() can now be called multiple times
v0.1.0-pre2, 2010-10-25 -- Second pre-release after testing
* Fixed small bugs that broke Python 2.5.1 and Python 2.7
* Fixed reading mrjob.conf without yaml installed
* Fix tests to work with modern simplejson and pipes.quote()
* Auto-create temp bucket on S3 if we don't have one (Issue #16)
* Auto-infer AWS region from bucket (Issue #7)
* --steps now passes in all extra args (e.g. --protocol) (Issue #4)
* Better docs
v0.1.0-pre1, 2010-10-21 -- Initial pre-release. YMMV!